Message boards : Number crunching : Client error for ALL tasks since a month. (Linux 64 bits boinc 7.0.27)
Author | Message |
---|---|
Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,107,163 RAC: 354 |
Hello all, I just checked my stats and noticed that all the Rosetta tasks on my quad 95550 have been failing for about a month. The computer ID is 1549082. I will not paste the complete error messages here because there are so many tasks. You can browse them from the forum, I think. Here is ONE of the results I got. <core_client_version>7.0.27</core_client_version> <![CDATA[ <stderr_txt> [2012- 8-12 17:57:47:] :: BOINC:: Initializing ... ok. [2012- 8-12 17:57:47:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/C6_hexamer_ferredoxin_2klo_0001_INPUT__16_0001_A_fragments_fold_data.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
Starting work on structure: _00001 Starting work on structure: _00002 Starting work on structure: _00003 Starting work on structure: _00004 Starting work on structure: _00005 Starting work on structure: _00006 Starting work on structure: _00007 Starting work on structure: _00008 Starting work on structure: _00009 Starting work on structure: _00010 Starting work on structure: _00011 Starting work on structure: _00012 Starting work on structure: _00013 Starting work on structure: _00014 Starting work on structure: _00015 Starting work on structure: _00016 Starting work on structure: _00017 Starting work on structure: _00018 Starting work on structure: _00019 Starting work on structure: _00020 Starting work on structure: _00021 Starting work on structure: _00022 Starting work on structure: _00023 Starting work on structure: _00024 Starting work on structure: _00025 Starting work on structure: _00026 Starting work on structure: _00027 Starting work on structure: _00028 Starting work on structure: _00029 Starting work on structure: _00030 Starting work on structure: _00031 ====================================================== DONE :: 1 starting structures 10478.3 cpu seconds This process generated 31 decoys from 31 attempts ====================================================== BOINC :: WS_max 3.75376e+255 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> I do not know how to debug these codes and would appreciate any help. I set the project to refuse new work. By the way, it continues to hog my memory in spite of my preferences. |
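For anyone comparing many such stderr dumps, a small script can pull out the run summary instead of reading each one by eye. This is only a sketch: `summarize_stderr` is a hypothetical helper, not part of BOINC or Rosetta, and the patterns assume the summary lines look like the ones in the output quoted above.

```python
import re

# Excerpt copied from the tail of the stderr output quoted in the post above.
STDERR_EXCERPT = (
    "Starting work on structure: _00030 Starting work on structure: _00031 "
    "====================================================== "
    "DONE :: 1 starting structures 10478.3 cpu seconds "
    "This process generated 31 decoys from 31 attempts "
    "====================================================== "
    "BOINC :: WS_max 3.75376e+255 BOINC :: Watchdog shutting down... "
    "BOINC :: BOINC support services shutting down cleanly ... called boinc_finish"
)


def summarize_stderr(text):
    """Extract the end-of-run summary from a stderr dump (hypothetical helper)."""
    done = re.search(
        r"DONE\s*::\s*(\d+) starting structures\s+([\d.]+) cpu seconds", text
    )
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    return {
        # Number of "Starting work on structure" lines present in the text.
        "structures_started": len(re.findall(r"Starting work on structure:", text)),
        "cpu_seconds": float(done.group(2)) if done else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        # True when the application logged a normal shutdown.
        "called_boinc_finish": "called boinc_finish" in text,
    }


summary = summarize_stderr(STDERR_EXCERPT)
print(summary)
```

A dump that reports decoys and ends with `called boinc_finish`, like this one, shows no error on the client side, so the log alone does not say why the task was marked as a client error.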
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There is an existing thread on problems that some are having when running under the newer v7 versions of BOINC: Link to thread Rosetta Moderator: Mod.Sense |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
This bug is not bound to BOINC 7.x! One of my team members faced the same problem: link to host Among other things (after a few reboots of the computer and resetting the project), they tried installing a few different versions of BOINC 7.x (7.0.25, 7.0.28) as well as 6.x (6.12.34), including a "clean install" with full deletion of all BOINC-related files. The problem persisted. In the end, they even reinstalled the operating system (Windows 7), but this did not help either. So it seems this bug is on the server side (validator?). But it is a rare bug. The only thing we found out: when this error hits, the application version is missing from the task logs: application version --- Everything else looks OK. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"This bug is not bound to BOINC 7.x!" This computer is using 7.0.28. Most of the WUs were completed successfully by the wingmen, except for those who had BOINC v7 as well. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
You are seeing only the latest iteration now, after the Windows reinstall. The 6.x version was installed over a month ago. Tasks performed on it have already been removed from the server. But it was exactly the same error on 6.12.34: no errors in the log (except for the missing app version) and 100% of tasks marked as invalid. |
Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,107,163 RAC: 354 |
Ok, I will just suspend the project then. What's funny is that my current version of BOINC is tied to my OS, which is a long-term support version, so I will not change my BOINC version anytime soon. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
This bug is not bound to BOINC 7.x! Another computer has been hit by this bug. https://boinc.bakerlab.org/rosetta/results.php?hostid=1555324 Resetting the project, removing BOINC version 7.0.28, and installing version 6.12.34 has not changed anything. This is the third computer with this bug in our team, and it cannot be solved by any means from the user side. In all cases, it does not depend on the version of BOINC, so I'm sure that the problem is not with the 7.x BOINC versions or the OS version, and is on the side of the project. P.S. Now, when people in your team have caught the bug, we will immediately recommend switching to a different DC project and not wasting time trying to resolve it. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"Another computer has been hit by this bug." So it's the third computer out of how many? Those computers must have something in common if the same error occurs on all of them. The BOINC version might indeed not be responsible this time; in the results of this host there are wingmen who completed the WUs successfully with v7. "Now, when people in your team have caught the bug, we will immediately recommend switching to a different DC project and not wasting time trying to resolve it." Well, that's for sure better than wasting days or weeks of CPU time on errors. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
BOINC 6.10.18 does not help either: https://boinc.bakerlab.org/rosetta/result.php?resultid=531273684 Even registering a new user account does not work: https://boinc.bakerlab.org/rosetta/results.php?hostid=1564473 "Those computers must have something in common." Something really must be common. But we cannot figure out what it might be. It is clear, though, that this is NOT the BOINC version (we tried a lot of them) and NOT the type/version of the operating system (this BUG has been seen on Windows 7, Windows XP and Linux). |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"Something really must be common. But we cannot figure out what it might be." Well, there might be something to try... not a real solution, but it would be good to know. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
Yeah, we will try suspending calculations for all other projects; it will be one of the next steps before detaching the computers from R@H. We will also try installing the 32-bit version of BOINC (so far the only common parameter we have found for all computers with the bug is a 64-bit version of the OS and an Intel processor). While I was writing this message, the list of computers with the bug has grown: there are now 4 of them (I count only our team): https://boinc.bakerlab.org/rosetta/results.php?hostid=1564583 And it was a newly bought computer attached to BOINC for the first time... |
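The elimination approach in this post (look for whatever every affected machine shares) amounts to a set intersection over host attributes. A minimal sketch follows; `common_attributes` is a hypothetical helper, and the host records are made-up illustrations that only mirror what the thread states (different OSes and BOINC versions, all 64-bit and Intel). The pairing of OS and BOINC version per record is an assumption, not real host data.

```python
def common_attributes(hosts):
    """Return the attribute/value pairs shared by every host record."""
    if not hosts:
        return {}
    shared = set(hosts[0].items())
    for host in hosts[1:]:
        # Keep only the pairs that also appear on this host.
        shared &= set(host.items())
    return dict(shared)


# Illustrative records only (assumed, not taken from the project's host pages).
affected_hosts = [
    {"os": "Windows 7", "boinc": "7.0.28", "arch": "64-bit", "cpu_vendor": "Intel"},
    {"os": "Windows XP", "boinc": "6.12.34", "arch": "64-bit", "cpu_vendor": "Intel"},
    {"os": "Linux", "boinc": "7.0.27", "arch": "64-bit", "cpu_vendor": "Intel"},
]

shared = common_attributes(affected_hosts)
print(shared)
```

Anything that survives the intersection is only a candidate cause; it would still need to be checked against hosts that share the same attributes but work fine.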
Mattmon Joined: 22 Mar 06 Posts: 1 Credit: 8,365,527 RAC: 0 |
I too am having this problem with getting client errors for all my tasks. https://boinc.bakerlab.org/results.php?hostid=1551216 What is most troubling about this problem is that on the CLIENT, it shows that it completed successfully with "Ready to report". It doesn't even show that it resulted in an error at all! It is only after checking my Tasks that I see that it was a Client error. For all I know, this problem could have been happening for MONTHS, and I only found out about it now because I just happened to check my tasks! I now have this project set for no new tasks. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,172,975 RAC: 3,008 |
"I too am having this problem with getting client errors for all my tasks." THAT is the symptom and what makes this so hard for us users to solve: the units finish up just fine but then crash and burn when they are sent to Rosetta! The Ralph units, the beta side of Rosetta, do not have this problem. Rosetta says they have no idea and that they are getting enough good units that the bad ones make no difference to them. They ARE looking into it, but have been for months now and haven't found a thing! Lots of us think it is the Server causing the problems but Rosetta keeps saying they have no idea! On the personal side I have removed all of my PCs that have GPUs in them for crunching, and have downgraded to the BOINC version 6 series for those that are still here, and I am fine again. At some point my PCs WILL be using their GPUs again and/or I will go back to projects that require version 7 of BOINC, and Rosetta will be cut off for me again. It is disappointing to me that Rosetta would do this but it IS their project, not mine! |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
One more computer with the BUG: https://boinc.bakerlab.org/rosetta/results.php?hostid=1565305 Now 5 computers in our team cannot crunch R@H at all due to this bug. To the devs: you are already losing the power of 32 CPU cores to this bug in our team alone! I am not sure whether this is representative of all crunchers, but if it is, then you are losing the power of roughly 800 CPU cores in total (an extrapolation based on our team's share of the project's total computing power, about 4%). P.S. Turning off other BOINC projects does not help either. |
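The extrapolation in this post is a simple proportion: if the team supplies about 4% of the project's computing power and has lost 32 cores, the project-wide loss is the team's loss divided by the team's share. A short sketch of that arithmetic, using only the figures from the post and the post's own assumption that the team is representative of all crunchers (`extrapolate_project_loss` is a name invented here for illustration):

```python
# Figures from the post above.
team_cores_lost = 32   # CPU cores the team can no longer use for R@H
team_share = 0.04      # team's share of total project computing power (~4%)


def extrapolate_project_loss(cores_lost, share):
    """Scale one team's loss to the whole project.

    Assumes the team is representative of all crunchers, which the
    original post itself flags as uncertain.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction between 0 and 1")
    return cores_lost / share


project_cores_lost = extrapolate_project_loss(team_cores_lost, team_share)
print(round(project_cores_lost))
```

The result is sensitive to the share estimate: the same 32 cores would extrapolate to about 640 cores at a 5% share and about 1,067 at 3%.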
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Bjarke Joined: 14 Feb 06 Posts: 5 Credit: 1,634,479 RAC: 0 |
Two of my PCs seem to have suddenly gotten this error. Host 1569102 Host 1569084 Both PCs have worked fine with Rosetta before. I will just switch to another project where the programmers know what they are doing and where they are actually putting effort into solving such problems... |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
So sorry for the long hiatus. Been busy. I'll try to figure out what is going on. Just downloaded the client on a new computer and will see what happens. Thank you Mod.Sense for alerting us. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
My new client (7.0.31 on Mac) finished a task successfully. I'll try different platforms. Are people getting these errors on the most current BOINC client version? |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,172,975 RAC: 3,008 |
"My new client (7.0.31 on Mac) finished a task successfully. I'll try different platforms. Are people getting these errors on the most current BOINC client version?" I am a Windows guy, not a Linux guy, but in Windows, yes, even the 7.0.27 version of BOINC had problems. It works fine under the 6.?.? versions of BOINC, but even then ONLY if you are NOT using a GPU for another project. There are too many GPU projects out there now to ignore them and solely crunch for Rosetta, especially for those of us with multiple PCs. We were told that Rosetta is not interested in supporting the version 7.?.? series of BOINC as they were happy with the number of crunchers they had. These errors started a LONG time ago and even the people at the Beta project worked on it but couldn't find any problems. In fact the Beta had NO problems, but here at Rosie there are nothing but problems. I also crunch for the Project Albert; it REQUIRES BOINC version 7.0.27 or above, and that makes crunching for Rosie impossible! I HOPE you can fix this problem. LOTS of people are dropping Rosie from their list of BOINC projects, and if you can't get new people coming in, eventually Rosie WILL CLOSE DOWN. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,060,287 RAC: 16,833 |
"My new client (7.0.31 on Mac) finished a task successfully. I'll try different platforms. Are people getting these errors on the most current BOINC client version?" Read this entire thread first (if you have not done so already). Summary: the bug has been seen with different versions of BOINC (6.12.34, 7.0.25, 7.0.27, 7.0.28) on different platforms (Windows XP, Windows 7, Windows 8, Linux). Resetting the project or reinstalling BOINC does not help. On 2 computers with the BUG, even reinstalling the OS and turning off other BOINC projects did not help. We could not figure out what all the computers affected by this error have in common. Very strange bug... P.S. The links to task examples no longer work because the computers were switched to other projects long ago. |
©2024 University of Washington
https://www.bakerlab.org