Minirosetta 3.46

Message boards : Number crunching : Minirosetta 3.46

Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75493 - Posted: 26 Apr 2013, 22:59:12 UTC

Minirosetta has been updated to 3.46 to include recent developments in electron density and other scoring functions.
This update also fixes a bug in the density gradient calculations that drove the reference frame apart and occasionally caused the program to crash in long simulations.
Please post problems related to the update here.
TJ

Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 75501 - Posted: 27 Apr 2013, 12:08:45 UTC

Indeed Yifan, all the cryo tasks finish without error. So thanks for your effort.
One thing I noticed, though, is that roughly 1/3 of these cryo tasks run for a little more than 25000 seconds. The other 2/3 finish in 9000-10000 seconds, which seems normal for a Rosetta task.
Greetings,
TJ.
fcbrants
Joined: 25 Mar 13
Posts: 13
Credit: 3,933,177
RAC: 0
Message 75509 - Posted: 27 Apr 2013, 18:57:33 UTC

I restarted the project last night & ran through a few WUs (stopped taking WUs on 4/24 & restarted on 4/27), working only on Rosetta. I checked the task list today (I have not been watching it closely) & found an errored WU. It was sitting in the task list with "Computation Error", & I found its entry in the event log:

Computation for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 finished
Output file e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2_0 for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 absent

It soon spooled off of the task list & was gone.

Hope this helps, feel free to contact me if you need to do any troubleshooting.

My machine:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1606827

and results:

https://boinc.bakerlab.org/rosetta/results.php?hostid=1606827

and the offending WU:

https://boinc.bakerlab.org/rosetta/result.php?resultid=578113826

Franko
JAMES DORISIO
Joined: 25 Dec 05
Posts: 15
Credit: 201,214,286
RAC: 35,552
Message 75510 - Posted: 27 Apr 2013, 19:06:29 UTC

The applications page still shows Windows/x86 as 3.45, and I am getting Rosetta Mini 3.45 on Windows XP x86 32-bit computers.
Are you planning to upgrade this version also?
Thanks, Jim.


JKitterman

Joined: 21 Oct 05
Posts: 11
Credit: 814,463
RAC: 0
Message 75511 - Posted: 27 Apr 2013, 19:24:44 UTC - in response to Message 75493.  
Last modified: 27 Apr 2013, 19:25:40 UTC

Link to Cryo that ran for over 25000 seconds
https://boinc.bakerlab.org/rosetta/result.php?resultid=577970219
It pretty much repeats the sin_cos_range error the whole time, as below.
I only checked three of them, but they all appeared to have the same issue.

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
ROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
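
For readers unfamiliar with the message: `1.#QNAN00` appears to be how the Microsoft C runtime prints a NaN, so the log above suggests a NaN reached a sine/cosine range check. The sketch below shows why any such check rejects NaN. The function name `in_sin_cos_range` and the tolerance are hypothetical; this is not Rosetta's code.

```python
def in_sin_cos_range(x, tol=1e-6):
    """Return True if x is a legal sine/cosine value, allowing a small tolerance."""
    # Every ordered comparison involving NaN is False, so NaN can never pass.
    return -1.0 - tol <= x <= 1.0 + tol

# A normal value, a value just over 1 from rounding, a clearly bad value, and NaN.
values = [0.5, 1.0000001, 1.2, float("nan")]
results = [in_sin_cos_range(v) for v in values]
print(results)
```

Once one value is NaN, every quantity computed from it is NaN as well, which would be consistent with the same message repeating for the rest of the run.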
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75512 - Posted: 27 Apr 2013, 20:54:19 UTC - in response to Message 75509.  

Franko's task output shows 9 seconds of CPU time and the following:

ERROR: ERROR: FragmentIO: could not open file start.200.9mers
ERROR:: Exit from: ..\..\..\src\core\fragment\FragmentIO.cc line: 233
std::cerr: Exception was thrown:


[ERROR] EXCN_utility_exit has been thrown from: ..\..\..\src\core\fragment\FragmentIO.cc line: 233
ERROR: ERROR: FragmentIO: could not open file start.200.9mers

Rosetta Moderator: Mod.Sense
Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75513 - Posted: 27 Apr 2013, 21:56:22 UTC
Last modified: 28 Apr 2013, 0:26:04 UTC

Thanks guys! The cryo jobs use a different protocol, so they do run longer.
Let me take a look at the sin_cos_range error. That was the error I eventually saw with the bug in the 3.45 version. I'll check to see if there's anything else still causing the problem.
I think the Windows/x86 one is only for the graphical interface; the actual minirosetta program runs on the platform "Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU". I'll double-check with DEK to make sure.
I'll also tell the user running the abinitio job to pay attention to their input files.

Yifan

PS: the cryo_bf... jobs are from earlier. I think the input files might have already been screwed up by earlier iterations using the old release.
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75516 - Posted: 28 Apr 2013, 2:33:51 UTC

This one ran under the new app & had this error message 99 times by the look of it, I haven't counted them ;) you can if you like.

CASP9_fb_benchmark_hybridization_run54_T0613_0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_48029_1425_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524562110



ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93

======================================================
DONE :: 99 starting structures 1384.62 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

chip
Joined: 2 Oct 07
Posts: 1
Credit: 100,133
RAC: 0
Message 75518 - Posted: 28 Apr 2013, 4:28:03 UTC
Last modified: 28 Apr 2013, 5:01:12 UTC

3 instructions per cycle and a 100% L2/L3 hit ratio on a Sandy Bridge CPU - nice performance for the 25000 s cryo tasks!

P.S.: xxx.A_ and CASP9_ tasks have 1 IPC and 60%/40% L2/L3 Hit Ratio...
Brian Priebe

Joined: 27 Nov 09
Posts: 16
Credit: 33,020,247
RAC: 0
Message 75519 - Posted: 28 Apr 2013, 5:45:40 UTC

rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,402
Message 75520 - Posted: 28 Apr 2013, 6:01:07 UTC - in response to Message 75511.  

A similar failed workunit for me:

https://boinc.bakerlab.org/rosetta/result.php?resultid=577969315
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75521 - Posted: 28 Apr 2013, 8:02:44 UTC
Last modified: 28 Apr 2013, 8:04:51 UTC

Hi Yifan.

I had this one finish O.K. but showing this error message.

I also have another 3 cryo tasks that are running overtime and have only checkpointed once; I'll let them go to see what happens.


cryo_bh__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79122_932_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887806

# cpu_run_time_pref: 21600
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780
======================================================
DONE :: 52 starting structures 21197.3 cpu seconds
This process generated 52 decoys from 52 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
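
Since several posts in this thread compare run times, here is a rough sketch of pulling the CPU seconds and decoy count out of the summary lines in a log like the one above and dividing them. The helper name `seconds_per_decoy` is hypothetical, and the patterns assume the summary format shown in this post.

```python
import re

def seconds_per_decoy(stderr_text):
    """Extract CPU seconds and decoy count from a stderr summary and divide them."""
    cpu = re.search(
        r"DONE\s*::\s*\d+\s+starting structures\s+([\d.]+)\s+cpu seconds",
        stderr_text,
    )
    decoys = re.search(r"This process generated\s+(\d+)\s+decoys", stderr_text)
    # Return None when the summary is missing or no decoys were produced.
    if not cpu or not decoys or int(decoys.group(1)) == 0:
        return None
    return float(cpu.group(1)) / int(decoys.group(1))

# Summary lines copied from the task output above.
summary = """
DONE :: 52 starting structures 21197.3 cpu seconds
This process generated 52 decoys from 52 attempts
"""
per_decoy = seconds_per_decoy(summary)
print(round(per_decoy, 1))
```

A task that spends far more CPU time per decoy than its neighbours from the same batch may be worth reporting.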
Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75526 - Posted: 28 Apr 2013, 21:13:14 UTC - in response to Message 75516.  

CASP9_fb is a really old batch of jobs. The symmetry definition IO has changed since then, so the new executable shouldn't work on them any more. "com" defines the center of mass, and I believe the naming was changed to avoid confusion.
y

Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75527 - Posted: 28 Apr 2013, 21:15:31 UTC - in response to Message 75519.  

I'm running a local test now; it has been running for 20 min and is still going. Maybe some download errors left the input files incomplete?
Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75529 - Posted: 28 Apr 2013, 21:28:11 UTC - in response to Message 75521.  

That comes from the same problem as the sin_cos_range error. I'm looking into it now. The bug in the gradient calculations makes the transformation matrix non-orthogonal, which is why arcsin (or arccos) gets an input larger than one, and then some angles become NaN.
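
A minimal sketch of the mechanism described here, with made-up numbers: if a frame axis drifts away from unit length, a dot product that should be a cosine can exceed 1, and the inverse cosine of that value is undefined. The helper names and the vectors are illustrative assumptions, not Rosetta code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def angle_between(a, b):
    """Angle from the cosine a.b, which is only valid if a and b are unit vectors."""
    c = dot(a, b)
    try:
        return math.acos(c)
    except ValueError:
        # Python raises for |c| > 1; C and C++ acos() return NaN instead.
        return float("nan")

# A frame axis that has drifted away from unit length (illustrative numbers).
x_axis = [1.25, 0.0, 0.0]
v = [1.0, 0.0, 0.0]

bad = angle_between(x_axis, v)               # "cosine" is 1.25, so the angle is NaN
good = angle_between(normalize(x_axis), v)   # cosine is 1.0 after re-normalizing
print(bad, good)
```

Re-normalizing the frame, or clamping the cosine to [-1, 1] to absorb rounding error, are common guards against this; the thread does not say which approach, if either, the fix uses.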
bfromcolo

Joined: 25 Apr 13
Posts: 2
Credit: 1,294,095
RAC: 0
Message 75530 - Posted: 28 Apr 2013, 21:48:24 UTC

I am new at this and just got things running over the weekend. I just aborted three of these that had made no progress on their ETA in hours; I assumed they were looping. Looking at my task log, I see another 3 that went over 25000 seconds instead of a more normal 10500 seconds. They had all rapidly gotten to 9 - 15 minutes remaining, and then the ETA just stopped updating or was decreasing at a very slow rate, like 1 sec every 5 min. Is there any point in allowing these to continue to run once they stop updating the ETA while continuing to consume CPU? When one finally does complete, is it returning anything useful?
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75531 - Posted: 28 Apr 2013, 22:00:51 UTC
Last modified: 28 Apr 2013, 22:06:04 UTC

Hi Yifan.

These finished late last night there; all 3 only checkpointed once, at around 1 hour.

The first 2 are from my i7 2700K; the watchdog killed them at 10 hrs by the look of it. The other is from my X6 1055T, again at 10 hrs.


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887852

cryo_bh__chain_N_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79121_294_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887800

cryo_bg__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79130_932_0


# cpu_run_time_pref: 21600
BOINC:: CPU time: 36269.9s, 14400s + 21600s[2013- 4-28 19:36:38:] :: BOINC
InternalDecoyCount: 11
======================================================
DONE :: 2 starting structures 36269.9 cpu seconds
This process generated 11 decoys from 11 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation

</stderr_txt>
]]>

Validate state Invalid

=============================================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=524910872

cryo_bh__chain_e_l_subrun_003_SAVE_ALL_OUT_IGNORE_THE_REST_79150_107_0


# cpu_run_time_pref: 21600
BOINC:: CPU time: 36377.6s, 14400s + 21600s[2013- 4-28 21: 2:59:] :: BOINC
InternalDecoyCount: 8
======================================================
DONE :: 2 starting structures 36377.6 cpu seconds
This process generated 8 decoys from 8 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (21 frames):
[0xb2aef87]
[0xf77c0400]
[0xa6ce54c]
[0xa6e7659]
[0xa1648c7]
[0xa1f2dd2]
[0xa1f4df1]
[0x9d4d1a5]
[0x9f10187]
[0x9d56457]
[0x9d4265a]
[0x8925eca]
[0x8681018]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>

Validate state Valid
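
The 10-hour figure matches the arithmetic in the log lines above: `cpu_run_time_pref` is 21600 s, and the `14400s + 21600s` term adds up to 36000 s. The sketch below treats the 14400 s as a grace period added on top of the run-time preference; that reading is an assumption based only on these logs, and the names are hypothetical.

```python
CPU_RUN_TIME_PREF = 21600   # seconds, from "# cpu_run_time_pref: 21600"
WATCHDOG_GRACE = 14400      # seconds, the other term in "14400s + 21600s" (assumed grace period)

def over_limit(cpu_seconds, pref=CPU_RUN_TIME_PREF, grace=WATCHDOG_GRACE):
    """Return True if CPU time exceeds the preference plus the grace period."""
    return cpu_seconds > pref + grace

limit_hours = (CPU_RUN_TIME_PREF + WATCHDOG_GRACE) / 3600
# CPU times reported in this thread: two killed tasks and one that finished normally.
flags = [over_limit(t) for t in (36269.9, 36377.6, 21197.3)]
print(limit_hours, flags)
```

Under that assumption, both tasks in this post ended a few minutes past the 10-hour limit, while the 21197.3 s task reported earlier in the thread stayed within its 6-hour preference.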
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,402
Message 75532 - Posted: 29 Apr 2013, 1:17:44 UTC

Yifan,

Two cryo workunits where most but not all of the decoys gave the sin and cos range error:

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

https://boinc.bakerlab.org/rosetta/result.php?resultid=578116849

https://boinc.bakerlab.org/rosetta/result.php?resultid=578212532


Another one where it has been nearly an hour since the last checkpoint. Not clear if this indicates a problem.

https://boinc.bakerlab.org/rosetta/result.php?resultid=578064578
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75533 - Posted: 29 Apr 2013, 1:26:18 UTC

Hi.

I can only speak for myself, but I'm seeing more invalid tasks & little or no checkpointing since the update.



Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75535 - Posted: 29 Apr 2013, 2:02:59 UTC - in response to Message 75527.  

OK, found the problem with this one. It comes from our robetta server using a parameter to randomly trigger a deprecated function. I just changed the server to disable that mechanism.
©2024 University of Washington
https://www.bakerlab.org