Message boards : Number crunching : minirosetta 2.14
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
Exact same error for me in the following tasks: lrm_jorj_combined_torsion_it06_run01_A_rlbd_1h75__SAVE_ALL_OUT_IGNORE_THE_RESTlr13_DECOY_21224_147_1 lrm_jorj_combined_torsion_it06_run01_A_rlbd_1o73__SAVE_ALL_OUT_IGNORE_THE_RESTlr8_DECOY_21224_118_0 lrm_jorj_combined_torsion_it06_run01_A_rlbd_2uzr__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_120_0 lrm_jorj_combined_torsion_it06_run01_A_rlbd_1s12__SAVE_ALL_OUT_IGNORE_THE_RESTlr8_DECOY_21224_213_0 lrm_jorj_combined_torsion_it06_run01_A_rlbd_2i1u__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_169_1 Also, the following error in the following tasks: ERROR: rsd_type_list.size() td-only-2-Alg13_8-10_21413_155_1 td-only-2-RrR43_7-10_21413_131_1 td-only-2-DsbA_10-12_21413_137_1 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Another failed after 2sec, same problem as others. lrm_jorj_combined_torsion_it06_run01_A_rlbd_1xd6__SAVE_ALL_OUT_IGNORE_THE_RESTlr5_DECOY_21224_606 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331225615 <core_client_version>6.2.14</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev36507.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lrm_jorj_combined_torsion_it06_run01_A.zip Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr5_1xd6.fix.out.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one failed after 4sec, same as others. lrm_jorj_combined_torsion_it06_run01_A_rlbd_2iiy__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_814_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331916208 <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> ( left out bits in middle ) Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev36507.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lrm_jorj_combined_torsion_it06_run01_A.zip Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr10_2iiy.fix.out.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
And another one, took 12sec to die. lrm_jorj_combined_torsion_it06_run01_A_rlbd_1l6p__SAVE_ALL_OUT_IGNORE_THE_RESTlr13_DECOY_21224_797 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331893230 <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev36507.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lrm_jorj_combined_torsion_it06_run01_A.zip Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr13_1l6p.fix.out.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database/scoring/weights/dslf_weights.wts exist ERROR:: Exit from: src/core/scoring/ScoreFunctionFactory.cc line: 178 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I thought you guys had fixed the weights problem? lrm_jorj_combined_torsion_it06_run01_A_rlbd_2i1u__SAVE_ALL_OUT_IGNORE_THE_RESTlr10_DECOY_21224_584_0 ERROR: Unable to open weights. Neither ./dslf_weights.wts nor dslf_weights.wts nor minirosetta_database\scoring/weights/dslf_weights.wts exist ERROR:: Exit from: ..\..\src\core\scoring\ScoreFunctionFactory.cc line: 178 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. The first copy of this task errored. I can't see a problem with mine, so I don't know why it got a validate error. cs-td-2-LkR15_5-5_20162_213_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=331107668 Server state: Over. Outcome: Validate error. Client state: Done. Exit status: 0 (0x0). <core_client_version>6.2.14</core_client_version> <![CDATA[ <stderr_txt> Starting work on structure: _00023 ====================================================== DONE :: 1 starting structures 14182.3 cpu seconds This process generated 23 decoys from 23 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one has failed twice, mine after 14sec. SAXS-score-1egaB_SAVE_ALL_OUT_21827_871_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=332831626 <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> Starting work on structure: _00001 ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) <= pose.total_residue() ); ERROR:: Exit from: src/protocols/abinitio/FragmentMover.cc line: 250 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
A strange one: T0605_tjrs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21824_1219_1 ERROR: Error in traceback: pointer doesn't go anywhere! |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This ran for 3min. fix_disulf_v4_NMR_1j0t_DISULF__BOINC_abrelax.v1_SAVE_ALL_OUT_21861_87_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=333047271 <core_client_version>6.2.14</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> Starting work on structure: _00001 # cpu_run_time_pref: 14400 ERROR: rsd_type_list.size() ERROR:: Exit from: src/core/fragment/Frame.cc line: 62 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
Speedy Send message Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
P.P.L, your core client (BOINC Manager) is a little outdated. Try updating it to the recommended version; this may help. Have a crunching good day!! |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. As transient says, I think that "process exited with code 1 (0x1, -255)" is a generic error code that they/BOINC use. I'll stop posting that bit in future. As for BOINC versions, as they say: if it ain't broke, don't fix it! |
Levent TERLEMEZ Send message Joined: 7 Dec 05 Posts: 18 Credit: 121,492 RAC: 0 |
I bought a brand new AMD Phenom(tm) II X4 925 processor and returned to my BOINC projects. But something interesting began to happen. Whatever the project (SETI, Einstein, Rosetta, or any other), and whichever WU it is among those downloaded that day, there is always one calculation error per session. What might cause it? Machine specs: XP Pro SP3, AMD Phenom(tm) II X4 925 Processor, 2 GB DDR3 RAM, BOINC ver. 6.10.58. Thanks for any answers or tips from anyone who has observed the same or a similar error before. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Levent TERLEMEZ, it looks like the task reported back with this: ERROR: rsd_type_list.size() The task was restarted five times. If the task was unable to reach a checkpoint in that time, then the task is aborted for you. But I would expect a message about too many restarts with no progress rather than the one you got. Rosetta Moderator: Mod.Sense |
Levent TERLEMEZ Send message Joined: 7 Dec 05 Posts: 18 Credit: 121,492 RAC: 0 |
Levent TERLEMEZ Thanks for the reply. Sorry for taking the easy way and asking more, but is it possible for the files to be corrupted while downloading? Thanks again. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"...is it possible to be corrupted while downloading." It is possible for corruption to occur to any data that passes over a network. However, BOINC has signatures that double-check the integrity of the files you receive. When a signature mismatch is found, the error is reported differently and the task is not run. Generally the error about gzipping is due to the output file not being produced, so it isn't there to zip. And this is because the error occurred before any output was produced. Rosetta Moderator: Mod.Sense |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I've run a few of these already with no problem; this is a different error. Ran for 17sec. T0585_tj_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_21908_3066_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=333706692 <core_client_version>6.2.14</core_client_version> <![CDATA[ <message> process got signal 11 </message> |
Michael* Send message Joined: 20 Apr 10 Posts: 2 Credit: 1,334,106 RAC: 0 |
I have a recurring problem with Rosetta. One or more workunits get stuck at some random percentage of completion. Restarting BOINC seems to get the stuck WUs going again, but I hate to see so much of my processing potential wasted. With 8 threads, 4 are usually doing SIMAP with each using 12 or 13 percent processing power and 4 threads doing Rosetta with each using 12 or 13 percent. Right now one of the Rosetta threads is using 0 percent processing power. One of the workunits is stuck at 83.030% and has been running for 11 hours and 39 minutes. Rosetta WUs never take more than 3 hours. The only mention of this WU in the messages is the one where computation started. Just now a restart set that WU back to 34%, but at least it is moving again. I don't think it is a memory problem. I've checked the messages and there is no mention of memory running out or any other problems. BOINC uses less than half of my available RAM (6GB) and I have it set to use 70% max while the computer is active. Any solutions or ideas about this problem would be greatly appreciated. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,402 |
Does this look like a minirosetta 2.14 problem that triggered problems in several other workunits from other BOINC projects? 9/15/2010 10:13:46 PM rosetta@home Sending scheduler request: To fetch work. 9/15/2010 10:13:46 PM rosetta@home Requesting new tasks for CPU and GPU 9/15/2010 10:13:49 PM rosetta@home Scheduler request completed: got 1 new tasks 9/15/2010 10:13:51 PM rosetta@home Started download of old_targets_calbindin_pcs_files4.zip 9/15/2010 10:14:14 PM rosetta@home Finished download of old_targets_calbindin_pcs_files4.zip 9/15/2010 10:15:46 PM rosetta@home Starting calbindin_old_targets_PCS_SAVE_ALL_OUT_21968_479_0 9/15/2010 10:16:20 PM rosetta@home Starting task calbindin_old_targets_PCS_SAVE_ALL_OUT_21968_479_0 using minirosetta version 214 9/15/2010 10:16:20 PM QMC@HOME Task qasino_b3lyp-E26_iso34.896_0 exited with zero status but no 'finished' file 9/15/2010 10:16:20 PM QMC@HOME If this happens repeatedly you may need to reset the project. 9/15/2010 10:16:20 PM Docking Task 1g2k1ebw_mod0014crossdockinghiv1_7120_130310_0 exited with zero status but no 'finished' file 9/15/2010 10:16:20 PM Docking If this happens repeatedly you may need to reset the project. 9/15/2010 10:16:20 PM World Community Grid Task BETA_E200366_495_A.24.C19H12N2OS2.250.1.set1d06_0 exited with zero status but no 'finished' file 9/15/2010 10:16:20 PM World Community Grid If this happens repeatedly you may need to reset the project. 9/15/2010 10:16:20 PM malariacontrol.net Task wu_760_234_219331_0_1284580689_1 exited with zero status but no 'finished' file 9/15/2010 10:16:20 PM malariacontrol.net If this happens repeatedly you may need to reset the project. 9/15/2010 10:16:20 PM ibercivis Task 1bm7opt_fix_gridmaps.7z__ZINC06701282_1284586816_S08_E05_0 exited with zero status but no 'finished' file 9/15/2010 10:16:20 PM ibercivis If this happens repeatedly you may need to reset the project. 
9/15/2010 10:16:20 PM boincsimap Task 10090101.156326_1 exited with zero status but no 'finished' file 9/15/2010 10:16:20 PM boincsimap If this happens repeatedly you may need to reset the project. 9/15/2010 10:16:21 PM PrimeGrid Task pps_sr2sieve_1941162_0 exited with zero status but no 'finished' file 9/15/2010 10:16:21 PM PrimeGrid If this happens repeatedly you may need to reset the project. 9/15/2010 10:16:21 PM ibercivis Computation for task 1bm7opt_fix_gridmaps.7z__ZINC06722361_1284587828_S08_E05_0 finished Most of the other workunits recovered enough to finish apparently successfully. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
robertmiles That SHOULD not be possible. But you certainly have some highly suspicious circumstantial evidence to assert otherwise. The only thing the various projects have in common that should be capable of causing a cascading crash like that is... well, the BOINC client. My instinct is that BOINC had a problem at that time and took 'em all out. Michael* I can't offer any suggestions. As you pointed out, suspend and resume of the task doesn't even seem to kick-start it, at least when tasks are kept in memory, so a full restart of BOINC seems to be the only way to get CPU allocated to the task again. I can only confirm that others have observed this as well, and that it seems to be rather rare. I haven't seen what happens if BOINC reschedules that task on its own. I mean, if you suspend it, BOINC will begin another task. If you then release it, BOINC will eventually try to come back to it. At that time does it successfully get CPU time? Or does it get no CPU while BOINC still says it is running? Something to try anyway. Rosetta Moderator: Mod.Sense |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,402 |
Could be, although if so, it left no other evidence I can see on what went wrong. I do seem to have had problems with the SuperFetch feature of Windows Vista for some time, though - something not adequately documented so that I can see how to fix it. Need some information on how to control WHAT TYPE of information SuperFetch stores; I already have enough information on how to turn it off entirely. On Michael*'s problem: Could that indicate that restarting from what's left in the main memory does not work adequately for that problem, but restarting from the last checkpoint on the hard drive does? |
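
For anyone comparing the failures quoted in this thread, here is a minimal Python sketch that pulls the error message and source location out of a stderr excerpt and counts how often each location appears. It assumes the `ERROR: ...` followed by `ERROR:: Exit from: <file> line: <n>` layout seen in the posts above. The names `parse_failure` and `tally_failures` are hypothetical helpers written for this illustration; they are not part of BOINC or Rosetta.

```python
import re
from collections import Counter

# Layout assumed from the stderr excerpts quoted in this thread:
#   ERROR: <message> ERROR:: Exit from: <source file> line: <number>
FAILURE_RE = re.compile(
    r"ERROR: (?P<msg>.*?) ERROR:: Exit from: (?P<file>\S+) line: (?P<line>\d+)"
)


def parse_failure(stderr_text):
    """Return (message, source_file, line_number), or None if no match."""
    # Collapse newlines and runs of spaces so that multi-line stderr and
    # the flattened forum copies are handled the same way.
    flat = " ".join(stderr_text.split())
    match = FAILURE_RE.search(flat)
    if match is None:
        return None
    return match.group("msg"), match.group("file"), int(match.group("line"))


def tally_failures(excerpts):
    """Count failures per (source_file, line_number) across excerpts."""
    counts = Counter()
    for excerpt in excerpts:
        parsed = parse_failure(excerpt)
        if parsed is not None:
            _, source_file, line_number = parsed
            counts[(source_file, line_number)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Starting work on structure: _00001\n"
        "ERROR: rsd_type_list.size()\n"
        "ERROR:: Exit from: src/core/fragment/Frame.cc line: 62\n"
        "BOINC:: Error reading and gzipping output datafile: default.out\n"
    )
    print(parse_failure(sample))
```

Excerpts without an `Exit from:` line, such as the signal 11 report or the validate error, simply return `None` and are left out of the tally.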
©2024 University of Washington
https://www.bakerlab.org