Message boards : Number crunching : I'm getting lots of errors with Rosetta v4.07
Author | Message |
---|---|
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
I think I will abort all v4.07 tasks from now on. This is how the stderr log looks. Stderr log <core_client_version>7.10.2</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe @T1000.flags -in:file:boinc_wu_zip T1000.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1202697 Starting watchdog... Watchdog active. BOINC:: CPU time: 22193.3s, 14400s + 7200s[2018- 7-12 18:11:32:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 22194.4 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== 18:11:32 (10080): called boinc_finish(0) </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5924_0_r2041150700_0</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> ]]> It seems to be only this type of workunit: Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5951_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5959_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5971_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5972_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5991_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_4411_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_4297_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_4122_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_4081_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_3741_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_3374_0 Name T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_2579_0 |
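For anyone who wants to check several result pages for the same symptoms, here is a minimal Python sketch that looks for the two markers visible in the log above: the `default.out.gz` warning and the `<error_code>` of the upload failure. The function name `summarize_stderr` and the returned dictionary keys are made up for illustration; only the matched strings come from the posted log.

```python
import re

# Substring and tag layout copied from the stderr log quoted in the post above.
WARNING_TEXT = "cannot get file size for default.out.gz"
# Lazy match so that nested parentheses such as "stat() failed" are captured whole.
ERROR_CODE_RE = re.compile(r"<error_code>(-?\d+) \((.*?)\)</error_code>")


def summarize_stderr(text):
    """Return a dict describing the failure markers found in a stderr dump.

    Hypothetical helper: the keys are illustrative, not part of BOINC.
    """
    match = ERROR_CODE_RE.search(text)
    return {
        "output_warning": WARNING_TEXT in text,
        "upload_error_code": int(match.group(1)) if match else None,
        "upload_error_text": match.group(2) if match else None,
    }


sample = (
    "BOINC WARNING! cannot get file size for default.out.gz: could not open file. "
    "<message> upload failure: <file_xfer_error> "
    "<error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message>"
)
print(summarize_stderr(sample))
```

A result whose stderr shows both markers matches the pattern reported here. The sketch says nothing about the cause of the failure.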
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
Once again I have aivan tasks that have been running for 2.5 hours and are estimated to run for 3-4 hours more, even though my Rosetta setting for Target CPU run time is 2 hours. Should I abort them? I have aborted all other aivan tasks, as my experience is that they run for a long time and all end with an error. |
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
Yup! Same error as always: 4 work units and 20 hours of wasted computing in total. Luckily I aborted all the other aivan workunits before they started running and wasted even more resources. Stderr log <core_client_version>7.10.2</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe @T1000.3.flags -in:file:boinc_wu_zip T1000.3.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3683498 Starting watchdog... Watchdog active. BOINC:: CPU time: 22129s, 14400s + 7200s[2018- 7-14 18:20:52:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 22129 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== 18:20:52 (10344): called boinc_finish(0) </stderr_txt> |
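The quoted command line carries `-cpu_run_time 28800`, and the log reports `BOINC:: CPU time: 22129s`. A small Python sketch for pulling both numbers out of a stderr dump so they can be compared side by side follows. The function name `parse_run_times` is hypothetical, and how Rosetta actually uses these values is not stated in this thread, so the snippet only extracts and converts them.

```python
import re


def parse_run_times(stderr_text):
    """Extract the -cpu_run_time flag and the reported CPU time, both in seconds.

    Hypothetical helper; returns None for a value that is not present.
    """
    flag = re.search(r"-cpu_run_time (\d+)", stderr_text)
    used = re.search(r"BOINC:: CPU time: ([\d.]+)s", stderr_text)
    return (
        int(flag.group(1)) if flag else None,
        float(used.group(1)) if used else None,
    )


# Fragments copied from the log in the post above.
sample = (
    "-nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 "
    "Watchdog active. BOINC:: CPU time: 22129s, 14400s + 7200s"
)
requested, used = parse_run_times(sample)
# Convert seconds to hours for easier reading.
print(requested / 3600, round(used / 3600, 2))
```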
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
"WARNING! cannot get file size for default.out.gz: could not open file." +1. Same error on all my T1000_aivan tasks. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I have no details on the specific WUs or issues they are having. But I wanted everyone to know that BOINC Manager's "estimated runtime" is really based on history, not the present. So, regardless of the name or likely success of current WUs, if BOINC Manager has a recent history with WUs taking 3 or 4 hours longer than the runtime preference, it will "estimate" future WUs will take 3 to 4 hours longer as well. The likelihood of the current WUs running long is not related to the estimated runtime of the BOINC Manager. If the name of the current tasks has the same prefix as those that you had trouble with, that would be a better indicator for you. Rosetta Moderator: Mod.Sense |
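To illustrate the point that an estimate built from history ignores the task that is about to run, here is a toy Python model. It is not BOINC Manager's actual algorithm, which this thread does not describe. The function `history_based_estimate` and the sample runtimes are assumptions made up for the example.

```python
from statistics import mean


def history_based_estimate(preference_hours, recent_runtimes_hours):
    """Estimate the next task's runtime from past overruns only.

    Hypothetical model: the runtime preference plus the average amount by
    which recent tasks exceeded it. Nothing about the next task is used.
    """
    if not recent_runtimes_hours:
        return preference_hours
    overruns = [max(0.0, r - preference_hours) for r in recent_runtimes_hours]
    return preference_hours + mean(overruns)


# Made-up history: two recent tasks ran 3 and 4 hours past a 2-hour preference.
print(history_based_estimate(2.0, [5.0, 6.0]))
```

In this model the estimate for any new task stays inflated until the history fills up with tasks that finished on time, whatever the new task's name or batch.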
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
Quoting Mod.Sense: "I have no details on the specific WUs or issues they are having. But I wanted everyone to know that BOINC Manager's 'estimated runtime' is really based on history, not the present. So, regardless of the name or likely success of current WUs, if BOINC Manager has a recent history with WUs taking 3 or 4 hours longer than the runtime preference, it will 'estimate' future WUs will take 3 to 4 hours longer as well." For me the problem is not the runtime of the WUs (I know about the decoy issue), but the validation error. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 8,196 |
Quoting Mod.Sense: "I have no details on the specific WUs or issues they are having. But I wanted everyone to know that BOINC Manager's 'estimated runtime' is really based on history, not the present. So, regardless of the name or likely success of current WUs, if BOINC Manager has a recent history with WUs taking 3 or 4 hours longer than the runtime preference, it will 'estimate' future WUs will take 3 to 4 hours longer as well." I found 1 "Invalid" run for which you were granted 587.11 credits. Were there others that were a problem for you? The 587 credits seem similar to those for the other valid jobs. name: T1000_full3_aivan_SAVE_ALL_OUT_03_09_677955_5874; application: Rosetta; created: 14 Jul 2018, 8:17:20 UTC; canonical result: 1015258491; granted credit: 587.11. https://boinc.bakerlab.org/workunit.php?wuid=914821393 |
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
Quoting Mod.Sense: "I have no details on the specific WUs or issues they are having. But I wanted everyone to know that BOINC Manager's 'estimated runtime' is really based on history, not the present. So, regardless of the name or likely success of current WUs, if BOINC Manager has a recent history with WUs taking 3 or 4 hours longer than the runtime preference, it will 'estimate' future WUs will take 3 to 4 hours longer as well." The work units that were marked as 'Invalid' are in my first post in this thread, and I had 5-6 more of the same later. Why you can't find them I have no idea; ask the staff, maybe they can help you. The latest work units of this kind are here: https://boinc.bakerlab.org/result.php?resultid=1015234868 https://boinc.bakerlab.org/result.php?resultid=1015234886 https://boinc.bakerlab.org/result.php?resultid=1015234755 https://boinc.bakerlab.org/result.php?resultid=1015234804 They were the first units I ran in both my first and second attempts to crunch a batch of maybe 100 work units, but because the first 4 I ran all ended up as 'Invalid', I aborted all the others. I have not received any more of these "aivan" workunits lately, and I hope I won't. The credit is totally irrelevant in this case; the problem, IMO, is that resources are wasted when hours of crunching end with a result that is Invalid. Luckily I spotted them early and wasted only 40 hours instead of 500 hours. |
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
Quoting Mod.Sense: "I have no details on the specific WUs or issues they are having. But I wanted everyone to know that BOINC Manager's 'estimated runtime' is really based on history, not the present. So, regardless of the name or likely success of current WUs, if BOINC Manager has a recent history with WUs taking 3 or 4 hours longer than the runtime preference, it will 'estimate' future WUs will take 3 to 4 hours longer as well. The likelihood of the current WUs running long is not related to the estimated runtime of the BOINC Manager. If the name of the current tasks has the same prefix as those that you had trouble with, that would be a better indicator for you." The main issue here is not the runtime or the credit; it is that a lot of your crunchers' resources and YOUR resources are wasted when many hours of crunching end with a result that is invalid. |
Simplex0 Joined: 13 Jun 18 Posts: 14 Credit: 1,714,717 RAC: 0 |
Quoting Mod.Sense: "I have no details on the specific WUs or issues they are having. But I wanted everyone to know that BOINC Manager's 'estimated runtime' is really based on history, not the present. So, regardless of the name or likely success of current WUs, if BOINC Manager has a recent history with WUs taking 3 or 4 hours longer than the runtime preference, it will 'estimate' future WUs will take 3 to 4 hours longer as well. The likelihood of the current WUs running long is not related to the estimated runtime of the BOINC Manager. If the name of the current tasks has the same prefix as those that you had trouble with, that would be a better indicator for you." I have now checked more than 1000 workunits that finished successfully, and only 1 of them took 4 hours, while ALL of the invalid aivan workunits took more than 6 hours to finish. Anyway, it seems that I am not getting any more workunits of this kind, so hopefully the problem has already been spotted and taken care of by the staff. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
Quoting rjs5: "name T1000_full3_aivan_SAVE_ALL_OUT_03_09_677955_5874" https://boinc.bakerlab.org/result.php?resultid=1015258491 I got 0 credits for this WU, but that is not a problem. I killed the other "_aivan_" tasks. |
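Picking out tasks by name, as described in this thread, can be sketched in a few lines of Python. The helper `is_suspect` is hypothetical; the `_aivan_` marker and the first two task names come from the posts above, and the third name is invented to show a non-matching case. This only builds a list of candidates and does not abort anything.

```python
def is_suspect(task_name, marker="_aivan_"):
    """True when the task name contains the marker seen in the failing tasks."""
    return marker in task_name


queue = [
    "T1000_full_aivan_SAVE_ALL_OUT_03_09_677708_5951_0",
    "T1000_full3_aivan_SAVE_ALL_OUT_03_09_677955_5874",
    "some_other_task_0001_0",  # hypothetical name, not from the thread
]
suspects = [name for name in queue if is_suspect(name)]
print(suspects)
```

Matching on the substring rather than a strict prefix also catches the `T1000_full3_` variant, which a prefix test on `T1000_full_aivan` would miss.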
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 8,196 |
"name T1000_full3_aivan_SAVE_ALL_OUT_03_09_677955_5874" I thought the "granted credits" were issued manually by the project staff for WUs that ran a long time, had a problem caused by Rosetta or the researcher, and were not awarded credit. Hmmm. I only do the work for the credits. I think I will have enough to retire soon. 8-) It is good to report bad WUs like the aivan jobs so Rosetta can clean out the pipeline, inform the researcher of the mistake AND then fix the bug in Rosetta. Researchers should not be able to set problematic controls. The problem should be filtered out by the software before the task starts work. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
Quoting rjs5: "Hmmm. I only do the work for the credits. I think I will have enough to retire soon. 8-)" I hope you will not stop crunching on R@H and writing on the forum. You would miss us. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
A program runs daily that grants credit to these tasks. The credit reflects the value to the Project Team of understanding what is not working, so things can improve. Rosetta Moderator: Mod.Sense |
©2024 University of Washington
https://www.bakerlab.org