Message boards : Number crunching : Minirosetta 3.46
Author | Message |
---|---|
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
Minirosetta has been updated to 3.46 to include recent developments in electron density and other scoring functions. This update also fixes a bug in the density gradient calculations that drove the reference frame apart and occasionally caused the program to crash during long simulations. Please post problems related to the update here. |
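The announcement doesn't show the gradient fix itself. As a generic illustration of what a reference frame "drifting apart" means, and of one standard guard against it, here is a minimal sketch assuming a frame stored as three row vectors (x, y, z axes). `reorthonormalize` is a hypothetical helper using Gram-Schmidt, not Rosetta's code.

```python
import math

Vec = list[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cross(a: Vec, b: Vec) -> Vec:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def reorthonormalize(frame: list[Vec]) -> list[Vec]:
    """Hypothetical helper: rebuild a drifted frame (rows = x, y, z axes)
    as an orthonormal, right-handed frame via Gram-Schmidt."""
    x = normalize(frame[0])
    # Remove the part of the y axis that leaks along x, then rescale it.
    p = dot(frame[1], x)
    y = normalize([yi - p * xi for yi, xi in zip(frame[1], x)])
    # z is rebuilt from x and y, so the stored z axis is discarded.
    z = cross(x, y)
    return [x, y, z]


# A frame whose axes have been stretched and skewed by small numerical errors.
drifted = [[1.001, 0.002, 0.0],
           [0.003, 0.998, 0.001],
           [0.0, 0.0, 1.0]]
fixed = reorthonormalize(drifted)
```

Re-orthonormalizing like this only masks drift after the fact; the 3.46 release instead fixes the gradient calculation that produced it.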
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Indeed, Yifan, all the cryos finish without error, so thanks for your effort. One thing I noticed, though, is that roughly 1/3 of these cryos run for a little more than 25000 seconds. The other 2/3 finish in 9000-10000 seconds, which seems normal for a Rosetta task. Greetings, TJ. |
fcbrants Joined: 25 Mar 13 Posts: 13 Credit: 3,933,177 RAC: 0 |
I restarted the project last night & ran through a few WUs (stopped taking WUs 4/24 & restarted 4/27), working only on Rosetta. I checked the task list today (I have not been watching it closely) & found an errored WU. It was sitting in the task list with "Computation Error", & I found its entry in the event log:
Computation for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 finished
Output file e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2_0 for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 absent
It soon spooled off the task list & was gone. Hope this helps; feel free to contact me if you need to do any troubleshooting.
My machine: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1606827
Results: https://boinc.bakerlab.org/rosetta/results.php?hostid=1606827
The offending WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=578113826
Franko |
JAMES DORISIO Joined: 25 Dec 05 Posts: 15 Credit: 201,201,447 RAC: 48,100 |
The applications page still shows Windows/x86 as 3.45, and I am still getting Rosetta Mini 3.45 on my Windows XP x86 32-bit computers. Are you planning to upgrade this version as well? Thanks, Jim. |
JKitterman Joined: 21 Oct 05 Posts: 11 Credit: 814,463 RAC: 0 |
Link to a cryo that ran for over 25000 seconds: https://boinc.bakerlab.org/rosetta/result.php?resultid=577970219
It pretty much repeats the sin_cos_range error the whole time, as below. I only checked three of them, but they all appeared to have the same issue.
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
ROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range |
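Note that the sin_cos_range message fires on NaN, not just on values like 1.5. A minimal sketch, assuming the check is a plain `-1 <= v <= 1` comparison (the Rosetta source isn't quoted in this thread, so `sin_cos_in_range` and `report` are hypothetical names): every comparison involving NaN is false, so a NaN always lands in the error branch. `1.#QNAN00` is how older Microsoft C runtimes print a quiet NaN; Python prints it as `nan`.

```python
import math


def sin_cos_in_range(value: float) -> bool:
    # Legal input range for asin/acos. Any comparison involving NaN is
    # False, so a NaN value fails this test just like 1.5 would.
    return -1.0 <= value <= 1.0


def report(value: float) -> str:
    # Hypothetical message modeled on the log line quoted above.
    if not sin_cos_in_range(value):
        return (f"sin_cos_range ERROR: {value} is outside of [-1,+1] "
                "sin and cos value legal range")
    return "ok"


nan = float("nan")
# NaN also propagates through ordinary arithmetic, so one bad value
# poisons everything computed from it.
poisoned = nan * 0.0 + 1.0
still_nan = math.isnan(poisoned)  # True
```

This is consistent with the log repeating the same message many times: once a coordinate or angle becomes NaN, every later check on values derived from it fails as well.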
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"I restarted the project last night & ran through a few WUs (stopped taking WUs 4/24 & restarted 4/27), working only on Rosetta. I checked the task list today (I have not been watching it closely) & found an errored WU. It was sitting in the task list with 'Computation Error', & I found its entry in the event log:"
Franko's task output shows 9 seconds of CPU time and the following:
ERROR: ERROR: FragmentIO: could not open file start.200.9mers
Rosetta Moderator: Mod.Sense |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
Thanks, guys! The cryo jobs use a different protocol, so they do run longer. Let me take a look at the sin_cos_range error; that was the error I eventually saw with the bug in the 3.45 version. I'll check whether anything else is still causing the problem. I think the Windows/x86 one is only for the graphical interface; the actual minirosetta program runs on the platform "Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU". I'll double-check with DEK to make sure. I'll also tell the user running the abinitio job to pay attention to their input files.
Yifan
PS: the cryo_bf... jobs are from earlier; I think their input files might have been screwed up already by earlier iterations using the old release. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one ran under the new app & had this error message 99 times by the look of it; I haven't counted them ;) you can if you like.
CASP9_fb_benchmark_hybridization_run54_T0613_0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_48029_1425_1
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524562110
ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93
======================================================
DONE :: 99 starting structures 1384.62 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish |
chip Joined: 2 Oct 07 Posts: 1 Credit: 100,133 RAC: 0 |
3 instructions per cycle and a 100% L2/L3 hit ratio on a Sandy Bridge CPU: nice performance for the 25000 s cryo tasks! P.S.: the xxx.A_ and CASP9_ tasks get 1 IPC and a 60%/40% L2/L3 hit ratio... |
Brian Priebe Joined: 27 Nov 09 Posts: 16 Credit: 33,020,247 RAC: 0 |
rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
"Link to Cryo that ran for over 25000 seconds"
A similar failed workunit for me: https://boinc.bakerlab.org/rosetta/result.php?resultid=577969315 |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi Yifan. I had this one finish OK but show this error message. I also have another 3 cryo tasks that are running overtime and have only checkpointed once; I'll let them go to see what happens.
cryo_bh__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79122_932_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887806
# cpu_run_time_pref: 21600
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0
ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780
======================================================
DONE :: 52 starting structures 21197.3 cpu seconds
This process generated 52 decoys from 52 attempts
======================================================
BOINC :: WS_max 0
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>
Validate state Valid |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
"This one ran under the new app & had this error message 99 times by the look of it, I haven't counted them ;) you can if you like."
CASP9_fb is a really old batch of jobs. The symmetry definition I/O has changed since then, so the new executable shouldn't work on them any more. "com" defines the center of mass, and I believe the naming was changed to avoid confusion.
y |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
"rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code."
I'm running a local test now; it has been running for 20 minutes and is still going. Maybe there were download errors that left the input files incomplete? |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
"Hi Yifan."
That comes from the same problem as the sin_cos_range error. I'm looking into it now. The bug in the gradient calculations makes the transformation matrix non-orthogonal, which is why arcsin(cos) gets an input greater than one, and then some angles become NaN. |
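To see why a non-orthogonal transformation matrix ends in NaN: for a pure rotation about z, element (0,0) equals cos(theta) and can never exceed 1, but if the matrix has also been scaled slightly (a hypothetical 0.1% drift here), reading the angle back feeds a value above 1 to the inverse cosine. C's `acos` returns NaN for such input; Python's `math.acos` raises `ValueError` instead. The sketch below is illustrative only, not Rosetta's code; clamping is shown as a common band-aid, and it silently returns the wrong angle.

```python
import math


def angle_from_cos(c: float) -> float:
    # Unclamped: raises ValueError for |c| > 1 (C's acos would give NaN).
    return math.acos(c)


def angle_from_cos_clamped(c: float) -> float:
    # Clamped to [-1, 1]: no error, but any overshoot is silently absorbed.
    return math.acos(max(-1.0, min(1.0, c)))


theta = 0.01   # true rotation angle about z, in radians
scale = 1.001  # hypothetical drift: the matrix is no longer orthogonal
r00 = scale * math.cos(theta)  # element (0,0) of scale * Rz(theta)

try:
    angle_from_cos(r00)
    unclamped_failed = False
except ValueError:
    unclamped_failed = True

clamped = angle_from_cos_clamped(r00)  # 0.0, not the true 0.01
```

Because clamping hides the symptom while returning a wrong angle, fixing the calculation that breaks orthogonality (as described above) is the more robust remedy.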
bfromcolo Joined: 25 Apr 13 Posts: 2 Credit: 1,294,095 RAC: 0 |
I am new at this and just got things running over the weekend, but I just aborted three of these that had made no progress on their ETA in hours; I assumed they were looping. Looking at my task log, I see another 3 that went over 25000 seconds instead of a more normal 10500 seconds. They had all rapidly gotten to 9-15 minutes remaining, and then the ETA just stopped updating or decreased very slowly, like 1 second every 5 minutes. Is there any point in letting these continue to run once they stop updating the ETA while still consuming CPU? When one finally does complete, is it returning anything useful? |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi Yifan. These finished late last night there; all 3 only checkpointed once, at around 1 hour. The first 2 are from my i7 2700K, and by the look of it the watchdog killed them at 10 hours; the other is from my X6 1055T, again at 10 hours.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887852
cryo_bh__chain_N_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79121_294_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887800
cryo_bg__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79130_932_0
# cpu_run_time_pref: 21600
BOINC:: CPU time: 36269.9s, 14400s + 21600s[2013- 4-28 19:36:38:] :: BOINC
InternalDecoyCount: 11
======================================================
DONE :: 2 starting structures 36269.9 cpu seconds
This process generated 11 decoys from 11 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
</stderr_txt>
]]>
Validate state Invalid
=============================================================
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524910872
cryo_bh__chain_e_l_subrun_003_SAVE_ALL_OUT_IGNORE_THE_REST_79150_107_0
# cpu_run_time_pref: 21600
BOINC:: CPU time: 36377.6s, 14400s + 21600s[2013- 4-28 21: 2:59:] :: BOINC
InternalDecoyCount: 8
======================================================
DONE :: 2 starting structures 36377.6 cpu seconds
This process generated 8 decoys from 8 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (21 frames): [0xb2aef87] [0xf77c0400] [0xa6ce54c] [0xa6e7659] [0xa1648c7] [0xa1f2dd2] [0xa1f4df1] [0x9d4d1a5] [0x9f10187] [0x9d56457] [0x9d4265a] [0x8925eca] [0x8681018] [0x992d14f] [0x9931429] [0x9aebcad] [0x9b4f815] [0x9b4d045] [0x8054950] [0xb33f328] [0x8048131]
Exiting...
</stderr_txt>
]]>
Validate state Valid |
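The log line `CPU time: 36269.9s, 14400s + 21600s` suggests where the 10-hour cutoff comes from: the run-time preference (21600 s, the `cpu_run_time_pref` value) plus a further 14400 s. A sketch of that arithmetic, assuming (this is not confirmed in the thread) that the watchdog limit is simply the preference plus a fixed 14400 s allowance; `watchdog_limit_s` is a hypothetical name.

```python
def watchdog_limit_s(cpu_run_time_pref_s: float, extra_s: float = 14400.0) -> float:
    # Assumed rule: run-time preference plus a fixed extra allowance.
    return cpu_run_time_pref_s + extra_s


limit = watchdog_limit_s(21600.0)  # 36000.0 s
limit_hours = limit / 3600.0       # 10.0 h

# CPU times reported for the two tasks whose stderr is quoted above.
reported = [36269.9, 36377.6]
overshoot = [t - limit for t in reported]  # roughly 270 s and 378 s past the limit
```

Both logged CPU times sit a few minutes past 36000 s, which matches the "killed at 10 hours" reading of these tasks.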
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
Yifan, here are two cryo workunits where most but not all of the decoys gave the sin and cos range error:
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
https://boinc.bakerlab.org/rosetta/result.php?resultid=578116849
https://boinc.bakerlab.org/rosetta/result.php?resultid=578212532
Another one where it has been nearly an hour since the last checkpoint. Not clear if this indicates a problem.
https://boinc.bakerlab.org/rosetta/result.php?resultid=578064578 |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I can only speak for myself, but I'm seeing more invalid tasks & little or no checkpointing since the update. |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
"rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code."
OK, found the problem with this one. It comes from our Robetta server using a parameter that randomly triggers a deprecated function. I just changed the server to disable that mechanism. |
©2024 University of Washington
https://www.bakerlab.org