Message boards : Number crunching : Client Errors
Author | Message |
---|---|
AlphaLaser Send message Joined: 19 Aug 06 Posts: 52 Credit: 3,327,939 RAC: 0 |
Just got a batch of Ralph WUs and I can confirm that my host has been able to successfully complete Ralph while failing here at Rosetta. I set the runtimes to be 1 hr on both projects. I also attached about a week ago but only yesterday did I receive any work. Ralph seems to only have work available very sporadically. I did set Ralph to a very high resource share so that BOINC would try to request work more often. |
Sky King Send message Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
Just an FYI... I bumped my NVIDIA drivers up to the 295.73 (or more completely, 8.17.12.9573) from the previous 285.62. Had BOINC grab a few Rosetta WUs, just in case this fixed something... BOINC updated me with Rosetta 3.26, but no joy, still the same "client error" result under the new drivers/client combination. |
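The two version strings in the post above (295.73 and 8.17.12.9573) appear to be related by a simple digit rule: the last five digits of the Windows-style version give the release number. The sketch below assumes that convention holds in general; the function name is hypothetical and is not part of any NVIDIA or BOINC tool.

```python
import re


def nvidia_release_from_windows_version(win_version: str) -> str:
    """Map a Windows-style driver version to the NVIDIA release number.

    Assumption (inferred from the pair quoted in this thread): the
    release number is formed by the last five digits of the version,
    e.g. 8.17.12.9573 -> 295.73.
    """
    digits = re.sub(r"\D", "", win_version)  # drop the dots, keep digits only
    if len(digits) < 5:
        raise ValueError(f"not enough digits in {win_version!r}")
    last5 = digits[-5:]
    return f"{last5[:3]}.{last5[3:]}"


if __name__ == "__main__":
    print(nvidia_release_from_windows_version("8.17.12.9573"))
```

If the rule holds, the same function can be used to compare the version shown in Device Manager with the release numbers people quote in the thread.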
wbblakemore Send message Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Just an FYI... I bumped my NVIDIA drivers up to the 295.73 (or more completely, 8.17.12.9573) from the previous 285.62. Had BOINC grab a few Rosetta WUs, just in case this fixed something... BOINC updated me with Rosetta 3.26, but no joy, still the same "client error" result under the new drivers/client combination. The driver version under which I first experienced the problem is 296.10 (the latest production version for GeForce series 5 cards). |
woland Send message Joined: 17 Dec 05 Posts: 5 Credit: 124,792 RAC: 0 |
I've just uninstalled latest NVIDIA drivers and got back to Windows default one (8.17.12.6830) and the problem disappeared. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,186,455 RAC: 3,447 |
I've just uninstalled latest NVIDIA drivers and got back to Windows default one (8.17.12.6830) and the problem disappeared. But now you aren't able to crunch with the GPU anymore. The Windows drivers are good for minimal things like changing screens, but not nearly capable enough to crunch with. If you look in the Messages tab of BOINC, near the top it should say 'no gpu found'. For Rosetta it makes no difference, but if you game or use the GPU to crunch for another project, it does make a difference. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,186,455 RAC: 3,447 |
Just an FYI... I bumped my NVIDIA drivers up to the 295.73 (or more completely, 8.17.12.9573) from the previous 285.62. Had BOINC grab a few Rosetta WUs, just in case this fixed something... BOINC updated me with Rosetta 3.26, but no joy, still the same "client error" result under the new drivers/client combination. I believe the newest is 3.13 or something like that. And I believe it was the 296 versions that had a serious screen saver flaw that caused Boinc to crash whenever the screen saver came on. |
wbblakemore Send message Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Just an FYI... I bumped my NVIDIA drivers up to the 295.73 (or more completely, 8.17.12.9573) from the previous 285.62. Had BOINC grab a few Rosetta WUs, just in case this fixed something... BOINC updated me with Rosetta 3.26, but no joy, still the same "client error" result under the new drivers/client combination. Yes, there is a new version of the NVidia driver out, in the 300 series, but so far it's only for the new GTX 680 card. They're in the process of retrofitting it for other cards, but it's not released to production yet. Hopefully, it will be in the next couple of weeks. Actually, the bug you're referring to isn't about the screen saver at all - the specifics are as follows: If you use a DVI connection to your monitor and the monitor enters power saving mode (due to inactivity), the CUDA drivers can't recover (at least in terms of crunching apps, as opposed to pure graphics processing). The simple work-around for this is to turn off the "sleep" function for power saving in Windows. As long as the monitor doesn't enter sleep mode, everything works fine. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,186,455 RAC: 3,447 |
Just an FYI... I bumped my NVIDIA drivers up to the 295.73 (or more completely, 8.17.12.9573) from the previous 285.62. Had BOINC grab a few Rosetta WUs, just in case this fixed something... BOINC updated me with Rosetta 3.26, but no joy, still the same "client error" result under the new drivers/client combination. Thank you for the clarification! |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Just an FYI... I bumped my NVIDIA drivers up to the 295.73 (or more completely, 8.17.12.9573) from the previous 285.62. Had BOINC grab a few Rosetta WUs, just in case this fixed something... BOINC updated me with Rosetta 3.26, but no joy, still the same "client error" result under the new drivers/client combination. And it appears to be limited to CUDA, as my graphics driver is an Intel PCI driver and I have never had problems with tasks crashing due to the monitor going into sleep mode. |
In Memory of Kimsey M Fowler Sr Send message Joined: 10 Mar 12 Posts: 26 Credit: 39,033,222 RAC: 0 |
Per request from Rocco, Wireshark was used to collect packets moving between my machine and the Rosetta server under both success and failure modes of operation. ExamDiff is being used to identify differences in the packet contents. I've only made a preliminary review of the data so far, but one obvious difference is that when the NVIDIA driver is installed (failure condition), some additional information related to CUDA is included in the outgoing information packets transferred to the server. Here is a snippet extracted from a packet: <os_name>Microsoft Windows 7</os_name> <os_version>Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)</os_version> </host_info> <disk_usage> <d_boinc_used_total>2148698524.000000</d_boinc_used_total> <d_boinc_used_project>2147898783.000000</d_boinc_used_project> </disk_usage> <coprocs> <coproc_cuda> <count>2</count> <name>GeForce GTX 580</name> <req_secs>345601.728000</req_secs> <req_instances>2.000000</req_instances> <estimated_delay>0.000000</estimated_delay> <drvVersion>28562</drvVersion> <cudaVersion>4010</cudaVersion> <totalGlobalMem>3221225472</totalGlobalMem> <sharedMemPerBlock>49152</sharedMemPerBlock> <regsPerBlock>32768</regsPerBlock> <warpSize>32</warpSize> <memPitch>2147483647</memPitch> <maxThreadsPerBlock>1024</maxThreadsPerBlock> <maxThreadsDim>1024 1024 64</maxThreadsDim> <maxGridSize>65535 65535 65535</maxGridSize> <totalConstMem>65536</totalConstMem> <major>2</major> <minor>0</minor> <clockRate>1800000</clockRate> <textureAlignment>512</textureAlignment> <deviceOverlap>1</deviceOverlap> <multiProcessorCount>16</multiProcessorCount> </coproc_cuda> </coprocs> <result> <name>heterodimer_design_18_pose_C_abinitio_SAVE_ALL_OUT_46199_4020_0</name> <final_cpu_time>10643.500000</final_cpu_time> <final_elapsed_time>11430.226067</final_elapsed_time> <exit_status>0</exit_status> <state>5</state> <platform>windows_x86_64</platform> <version_num>326</version_num> <app_version_num>326</app_version_num> <stderr_out> 
<core_client_version>6.12.34</core_client_version> <![CDATA[ <stderr_txt> [2012- 4- 8 19:55:12:] :: BOINC:: Initializing ... ok. [2012- 4- 8 19:55:12:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Everything between <coprocs> and </coprocs> is added information. Notice that soon after that data comes the version number of Rosetta: <version_num>326</version_num> which is lost in the final WU's web page. I wonder if the server software is robust enough to handle unexpected XML tags? For example, perhaps these tags get output only for the Win 7 64-bit version of the driver. ---KMF, Jr. |
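For anyone who wants to compare captured packets without reading raw XML, here is a minimal Python sketch that pulls the GPU block and the reported application version out of a fragment like the one quoted above. The sample is trimmed from that packet; the `<fragment>` wrapper element, the function names, and the `drvVersion` decoding rule (28562 read as 285.62) are assumptions made for illustration. It says nothing about how the Rosetta server actually parses requests.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the packet fragment quoted in the post above.
SAMPLE = """
<coprocs>
  <coproc_cuda>
    <count>2</count>
    <name>GeForce GTX 580</name>
    <drvVersion>28562</drvVersion>
    <cudaVersion>4010</cudaVersion>
  </coproc_cuda>
</coprocs>
<result>
  <name>heterodimer_design_18_pose_C_abinitio_SAVE_ALL_OUT_46199_4020_0</name>
  <exit_status>0</exit_status>
  <version_num>326</version_num>
</result>
"""


def decode_drv_version(raw: str) -> str:
    """Assumed rule: 28562 -> '285.62' (matches the driver named in the thread)."""
    value = int(raw)
    return f"{value // 100}.{value % 100:02d}"


def summarize_fragment(fragment: str) -> dict:
    """Summarize GPU and result info from a root-less XML fragment."""
    # The fragment has several top-level elements, so wrap it in a
    # hypothetical root element to make it parseable.
    root = ET.fromstring(f"<fragment>{fragment}</fragment>")
    gpus = [
        {
            "name": gpu.findtext("name"),
            "count": int(gpu.findtext("count", "0")),
            "driver": decode_drv_version(gpu.findtext("drvVersion", "0")),
        }
        for gpu in root.iterfind("./coprocs/coproc_cuda")
    ]
    results = [
        {
            "name": res.findtext("name"),
            "exit_status": res.findtext("exit_status"),
            "version_num": res.findtext("version_num"),
        }
        for res in root.iterfind("./result")
    ]
    return {"has_coprocs": root.find("coprocs") is not None,
            "gpus": gpus, "results": results}


if __name__ == "__main__":
    print(summarize_fragment(SAMPLE))
```

Running it on a capture taken with and without the NVIDIA driver installed would show whether `has_coprocs` is the only field that differs.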
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,186,455 RAC: 3,447 |
Everything between <coprocs> and </coprocs> is added information. Notice that soon after that data comes the version number of Rosetta: <version_num>326</version_num> which is lost in the final WU's web page. If that is the problem, one would think a simple call to Dr. Anderson and his team would solve that in a hurry! After all, it is they who support BOINC and the server side of the software directly; they also support the client side as well. Now each project DOES modify the server software some (well, most do), some MUCH more than others. But if Rosetta is having these kinds of problems and HASN'T contacted the team yet, they need to right away! The worst that can happen is that Dr. A and team say no, they won't help, but if they do agree to help, the problem could be solved by today! ESPECIALLY if it is a simple coding error! I once worked on a webpage for over an hour trying to make it work; it turned out I had left out a period, ONE PERIOD, and the page REFUSED to do ANYTHING!!! This was long before web-authoring programs, back in the days of entering the code as text. I had a friend help and he found it within 5 minutes: new eyes and the problem was fixed! Today's BOINC projects need to be able to handle BOTH CPU and GPU processing AND the PCs of those users that do both, not just for their own project but for others too! |
Borg_XMZ Send message Joined: 7 Apr 12 Posts: 2 Credit: 163,771 RAC: 0 |
I have a client error too. May I post here? Yesterday I got my new notebook. I installed it and let it work overnight, using just the standard installation with no special work unit settings. All work units ended with a client error. HP notebook: i7-2860QM, 16 GB RAM, Windows 7 x64 Professional. In the task ID log I only see …OK. Or is there another log file where I could check for errors? Server state: Over Outcome: Client error Client state: New Exit status: 0 (0x0) <core_client_version>7.0.25</core_client_version> <![CDATA[ <stderr_txt> [2012- 4-25 2:34:41:] :: BOINC:: Initializing ... ok. [2012- 4-25 2:34:41:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_CASP9_ff_benchmark_hybridization_run58_T0528_0_C1_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. [2012- 4-25 3:54:31:] :: BOINC:: Initializing ... ok.
[2012- 4-25 3:54:31:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_CASP9_ff_benchmark_hybridization_run58_T0528_0_C1_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. ====================================================== DONE :: 2 starting structures 7213.6 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== BOINC :: WS_max 4.99876e+008 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state: Invalid Thanks in advance |
In Memory of Kimsey M Fowler Sr Send message Joined: 10 Mar 12 Posts: 26 Credit: 39,033,222 RAC: 0 |
I have a client error too. May I post here? Hi: Yes, certainly, if it's the same problem. Without knowing your work unit number, we cannot look at the full results: The very last line where it says "application version"... was the version number absent? If so then that is a good sign that you are having the same problem. Your computing platform seems to have the same characteristics as well, the i7 processor and Win 7 64-bit anyway. What video card are you using? Here's a seemingly silly question for any of you experiencing this problem: Does your machine have a CD/DVD drive installed? One sort of odd thing with my problem machine is that I do not have one built in. I use a USB DVD drive I plug in when necessary. Thanks. ---KMF, Jr. |
Sky King Send message Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
I have a client error too. May I post here? Does your machine have a CD/DVD drive installed? One sort of odd thing with my problem machine is that I do not have one built in. I use a USB DVD drive I plug in when necessary. I have a stock SATA DVD reader/burner in my rig. |
A.M. Send message Joined: 13 Jun 06 Posts: 12 Credit: 954,586 RAC: 0 |
Here's a seemingly silly question for any of you experiencing this problem: Does your machine have a CD/DVD drive installed? Yes I do. It's a SATA3 "reads and writes everything" combo drive. |
Borg_XMZ Send message Joined: 7 Apr 12 Posts: 2 Credit: 163,771 RAC: 0 |
Yes, there is no application version shown, just three dashes: (application version: ---). My video card is an NVIDIA 5010M and I have a built-in optical drive. Here are some of the IDs, as Task ID, Work unit ID pairs: 501092173, 456890145; 501090187, 456888335; 501089545, 456887802; 501088693, 456887038; 501084098, 456882913; 501069583, 456869975; 501063628, 454749580; 501063549, 456865034; 501060433, 456862352; 501055615, 456859394; 501054481, 456858393; 501052588, 456856650; 501051508, 454710911; 501047597, 454736825; 501027182, 456836619; 501026609, 456836107; 501025987, 456835540; 501022620, 456832669; 501021861, 456831989; 501020591, 456830849. Thanks |
Fi and Charlie Shaw Send message Joined: 7 May 07 Posts: 8 Credit: 346,961 RAC: 0 |
I've had approx. 14 client errors out of my last 100 WUs. My other projects are working fine. I suspect these are dud WUs, especially as many others are experiencing similar errors. I'll carry on crunching, as I'm sure that matters will eventually be sorted out. Yes, I am using BOINC 7.0.25. Swordfish |
Rocco Moretti Send message Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
I've had approx 14 client errors out of my last 100 WU's You have a slightly different issue ("Incorrect function" with 7.0.25) than the one discussed in this thread (reported as client error, but with exit code 0), so I'll point you to the thread for it (though there aren't any updates yet). |
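To make the distinction above concrete, here is a small Python sketch that checks whether a task report matches the symptom this thread is about: outcome reported as "Client error" while the exit status is 0. The `TaskReport` class and the function name are hypothetical, and the extra check for "called boinc_finish" in the stderr text is only an assumption based on the log posted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class TaskReport:
    """Fields copied by hand from a task's result page (hypothetical structure)."""
    outcome: str
    exit_status: int
    stderr_txt: str


def matches_thread_symptom(report: TaskReport) -> bool:
    """True when the task looks like the exit-code-0 client error discussed here.

    Assumption: a clean finish is signalled by 'called boinc_finish' in
    the stderr text, as in the log posted in this thread.
    """
    return (
        report.outcome.strip().lower() == "client error"
        and report.exit_status == 0
        and "called boinc_finish" in report.stderr_txt
    )


if __name__ == "__main__":
    sample = TaskReport(
        outcome="Client error",
        exit_status=0,
        stderr_txt="BOINC :: BOINC support services shutting down cleanly ... called boinc_finish",
    )
    print(matches_thread_symptom(sample))
```

A task that reports "Incorrect function" or a non-zero exit status would return False here and belongs in the other thread.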
AlphaLaser Send message Joined: 19 Aug 06 Posts: 52 Credit: 3,327,939 RAC: 0 |
|
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,186,455 RAC: 3,447 |
I've had approx 14 client errors out of my last 100 WU's You guys DO realize that version 7.0.25 is the RELEASE version now and has been for more than a week, right?! EVEN WCG handles it properly! |
©2024 University of Washington
https://www.bakerlab.org