Minirosetta 3.14

Message boards : Number crunching : Minirosetta 3.14

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71071 - Posted: 17 Aug 2011, 3:45:28 UTC

Robert, those are the odd ones. The watchdog can't get at them, because it doesn't get any CPU either. If you exit and restart BOINC, they will generally straighten themselves out. If 600MB was more than comfortable for your machine, then it does little harm to cancel one once in a while.
Rosetta Moderator: Mod.Sense

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,281,662
RAC: 943
Message 71074 - Posted: 17 Aug 2011, 13:56:53 UTC
Last modified: 17 Aug 2011, 14:03:24 UTC

Another 3.14 workunit that stopped using any CPU time.

https://boinc.bakerlab.org/rosetta/result.php?resultid=442045657

T0610_3ot2.pdb_boinc_lr_control_nativechainA_loopbuild_threading_cst_relax_wangyr_IGNORE_THE_REST_30423_909
Max RAM usage 95 MB
CPU time at last checkpoint 00:24:30
CPU time 00:24:33
Elapsed time 08:52:22
Estimated time remaining 26:52:02
Fraction done 3.325%
Virtual memory size 325.62 MB
Working set size 333.91 MB

Note the large difference between Max RAM usage and the Working set size.

Peak working set 341.920 MB
BOINC 6.12.33
64-bit Windows Vista Home Premium SP2
8 GB memory; BOINC allowed to use 40% of it
Leave applications in memory when suspended
Tthrottle64 V4.20 running, but only to display the temperatures

Already aborted, rather than wait for an answer.

Rosetta@Home is on No new tasks; probably will stay there until I see some signs on RALPH@Home that something is being done about this.

600 MB is reasonable on this computer; going many hours doing nothing useful is not.
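
For anyone who wants to spot this kind of stall from the reported figures, here is a minimal Python sketch that compares CPU time with elapsed time, using the numbers quoted above. The function names and the 10% threshold are assumptions made for illustration; nothing here is part of BOINC or Rosetta.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def cpu_efficiency(cpu_time, elapsed_time):
    """Fraction of the elapsed (wall-clock) time that was spent on the CPU."""
    return hms_to_seconds(cpu_time) / hms_to_seconds(elapsed_time)


def looks_stalled(cpu_time, elapsed_time, threshold=0.10):
    """Hypothetical rule of thumb: flag a task whose efficiency is below threshold."""
    return cpu_efficiency(cpu_time, elapsed_time) < threshold


# Figures from the workunit reported above.
ratio = cpu_efficiency("00:24:33", "08:52:22")
print(f"CPU efficiency: {ratio:.1%}")
print("Looks stalled:", looks_stalled("00:24:33", "08:52:22"))
```

With 24 minutes of CPU time over almost 9 hours of elapsed time, the ratio is under 5%, which is what makes this task stand out from a normally running one.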

Paul van Dijken
Joined: 21 Jun 10
Posts: 2
Credit: 1,123,717
RAC: 0
Message 71146 - Posted: 26 Aug 2011, 8:26:04 UTC

After running WU rb_08_23_25236_50085_rs_stg0_lrlxMultiCst_t000__casp9__aln2_SAVE_ALL_OUT_30565_13_0 for 13+ hours and no progress beyond 12.700%, I aborted it.
This was the 3rd time in a few days it happened.
I stopped downloading Rosetta.
Any estimate when this issue is going to be solved?

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 71173 - Posted: 1 Sep 2011, 3:32:35 UTC

I am having a different problem. My BOINC is set to run 65% Seti and 35% Rosetta, but it is constantly running Rosetta in priority mode. This has been going on for days. It is like every WU is coming down and immediately goes into priority.

I have had no SETI WUs for days, so Rosetta has been getting all the cycles. Now that SETI is sending out WUs again, I expected things to balance back out, but that is not happening. SETI is getting no CPU time at all.

I have suspended Rosetta to give SETI some run time.

This had better stop. Does anyone have any suggestions?

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,281,662
RAC: 943
Message 71176 - Posted: 1 Sep 2011, 5:07:49 UTC - in response to Message 71173.  
Last modified: 1 Sep 2011, 5:09:29 UTC

Do the Rosetta workunits happen to have due dates before those for SETI?

Is the total expected time to run all the Rosetta workunits greater than 35% of the time to their due dates?

Have you tried setting both Rosetta and SETI on No New Tasks until close to finishing all the downloaded workunits, then unsetting this for SETI first and getting a few SETI workunits, then unsetting it for Rosetta as well?
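
To make the second question concrete, here is a rough Python sketch of the check. It is a simplification under stated assumptions (a single core, one common deadline for all queued tasks, made-up numbers), not the BOINC client's actual scheduling logic.

```python
def cannot_finish_at_share(remaining_cpu_hours, hours_until_deadline, resource_share):
    """Return True if the queued work cannot finish by the deadline
    when the project only receives its resource share of the CPU.

    remaining_cpu_hours:  list of estimated CPU hours left per task
    hours_until_deadline: wall-clock hours until the (common) deadline
    resource_share:       fraction between 0 and 1, e.g. 0.35 for 35%
    """
    available_hours = hours_until_deadline * resource_share
    return sum(remaining_cpu_hours) > available_hours


# Hypothetical queue: ten tasks of 3 CPU hours each, all due in 72 hours.
queue = [3.0] * 10
print(cannot_finish_at_share(queue, 72.0, 0.35))  # share alone is not enough
print(cannot_finish_at_share(queue, 72.0, 1.00))  # fine with the whole CPU
```

When the first call returns True, the queued work needs more than its nominal share to meet the deadline, which is the situation in which tasks would be expected to go into high priority mode.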

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71180 - Posted: 1 Sep 2011, 23:15:44 UTC
Last modified: 1 Sep 2011, 23:17:32 UTC

The thing I would try, is leaving it alone. The lack of work from SETI is causing the BOINC Manager to get work from Rosetta, but then when it looks at the 35% resource share, and the debt to SETI, it starts to worry about completing the tasks on time, so it sets them to run first (which is all that "high priority" means after all).

You shouldn't have to suspend projects and micro-manage things to get the resource share you have selected... when work is available. When work is not available, it gets work from where it can... and makes it up to the other project when it starts producing work again.
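
As a rough illustration of the "debt" idea, here is a toy Python model. It is only a sketch: the function and variable names are invented for this example, and the real client's accounting is more involved than this.

```python
def update_debts(debts, shares, cpu_used):
    """One accounting step of a simplified debt model.

    debts:    dict project -> CPU seconds currently owed to that project
    shares:   dict project -> resource share (any positive weights)
    cpu_used: dict project -> CPU seconds the project actually received
    """
    total_share = sum(shares.values())
    total_cpu = sum(cpu_used.values())
    new_debts = {}
    for project, share in shares.items():
        entitled = total_cpu * share / total_share
        received = cpu_used.get(project, 0.0)
        # A project that gets less than its entitlement is owed the difference.
        new_debts[project] = debts.get(project, 0.0) + entitled - received
    return new_debts


shares = {"SETI": 65, "Rosetta": 35}
debts = {"SETI": 0.0, "Rosetta": 0.0}

# SETI has no work for one day, so Rosetta receives all 86400 CPU seconds.
debts = update_debts(debts, shares, {"SETI": 0.0, "Rosetta": 86400.0})
print(debts)
```

In this toy model SETI ends the day owed 56160 seconds and Rosetta is over by the same amount, so once SETI work returns it would be favoured until the two balance out again.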
Rosetta Moderator: Mod.Sense

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 71182 - Posted: 2 Sep 2011, 20:48:44 UTC

Thanks guys!

I am going to leave it alone, but I have set Rosetta to "no new WU". When it runs dry, SETI will get its time again.

It could be that, during the time when Rosetta had no work and SETI was getting all the time, a "debt" was built up and it is now being worked off.

Who knows, but I thank you all for your analysis and recommendations.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 71183 - Posted: 3 Sep 2011, 6:44:57 UTC - in response to Message 71182.  
Last modified: 3 Sep 2011, 6:50:21 UTC

By setting no new work you will create a debt again, and the next time you start the project you will get an overload of Rosetta work and SETI will shut down until the debt is settled.

The best thing to do is to set your percentage for Rosetta much lower than for SETI. Then the debt issue will not be a factor, and Rosetta work will take a back seat to SETI until SETI dries up again.

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,281,662
RAC: 943
Message 71184 - Posted: 3 Sep 2011, 9:58:33 UTC - in response to Message 71183.  
Last modified: 3 Sep 2011, 10:37:14 UTC

I've tried something similar, and found that if you set some BOINC project to such a low percentage that giving it only that percentage will not allow all the workunits you have already downloaded from that project to complete on time, at least one of those workunits will almost immediately go into high priority mode. Shortening the queue of already downloaded workunits, if appropriate, before making any such change, works better.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 71185 - Posted: 3 Sep 2011, 13:05:14 UTC - in response to Message 71184.  
Last modified: 3 Sep 2011, 13:05:53 UTC

So then what would happen if he set both projects to no new work, let the tasks clear out, redid his percentages and extra days to what he thinks will work, and then allowed new work to come in? This way he could start clean and let BOINC Manager figure out what to do based on the new parameters.

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,281,662
RAC: 943
Message 71187 - Posted: 3 Sep 2011, 23:23:38 UTC - in response to Message 71185.  

I've tried that also. It can start with an imbalance in the workunits, with the first project that asks for workunits getting more than its share. Generally not as bad an imbalance, though.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71189 - Posted: 4 Sep 2011, 20:09:37 UTC

The best way for debt to balance out, is to leave it alone. Now that SETI has work coming, the BOINC Manager will figure out what it needs to do to balance the debt and deliver the resource shares you have selected.

As you say, there may have been a debt owed to Rosetta. If the two started equal, and SETI work dried up, you get a larger than average pile of Rosetta work. Then SETI comes back with work, and BOINC will figure out that it needs to both complete the tasks it has from Rosetta, and begin getting more from SETI than Rosetta to achieve the desired resource share.

All of the adjusting of resource shares, flagging as no new work, etc. is simply making it impossible for BOINC to figure out what you want.
Rosetta Moderator: Mod.Sense

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 71203 - Posted: 6 Sep 2011, 15:59:13 UTC
Last modified: 6 Sep 2011, 15:59:34 UTC

Looks like BOINC has finally balanced out, as I am now getting a more normal distribution of processing time between the two projects.

Thanks for the recommendations.

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 71264 - Posted: 15 Sep 2011, 16:55:49 UTC

Task 448806358 (T590_cc_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_31304_125_0) failed on Mac

ERROR: seqpos >=1 && seqpos <= size()
ERROR:: Exit from: src/core/conformation/Conformation.hh line: 268
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 71306 - Posted: 21 Sep 2011, 22:32:22 UTC

A couple of tasks called test_needle* failed in the middle of computation under W7 with the same error message,

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: ..\..\..\src\core\pose\symmetry\util.cc line: 740
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

The tasks were 449325696 and 449325673
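
Going only by the text of the failed assertion, the check appears to require that the determinant of a rotation matrix be within 1e-6 of 1. Here is a small Python sketch of that kind of test. The helper names are made up for illustration, and this is not Rosetta's code.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows of three numbers."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def passes_det_check(m, tol=1e-6):
    """Mirror of the asserted condition: abs(det - 1.0) < tol."""
    return abs(det3(m) - 1.0) < tol


angle = math.radians(30)
rot_z = [
    [math.cos(angle), -math.sin(angle), 0.0],
    [math.sin(angle), math.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
]
reflection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0],
]

print(passes_det_check(rot_z))       # a proper rotation passes
print(passes_det_check(reflection))  # a reflection (det = -1) does not
```

A matrix that has picked up a reflection, a scaling, or enough accumulated rounding error would fail such a check, which may be what happened in these tasks.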

-------

Another test_needle* task, 449325408, ran for an excessive length of time (>7 hours on a 3 hour preference), generating 2 decoys. The result was valid, but there's a message in the log about an H-bond being tripped.

Hbond tripped: [2011- 9-20 10:24:55:]
BOINC:: CPU time: 25291.3s, 14400s + 10800s[2011- 9-20 15: 1:52:] :: BOINC
InternalDecoyCount: 2
======================================================
DONE :: 2 starting structures 25291.3 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
called boinc_finish
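
The "CPU time" line in that log can be picked apart with a short Python sketch. The interpretation in the comments is an assumption based only on the numbers: 10800s matches the 3 hour preference, and 14400s looks like a 4 hour allowance on top of it. The log itself does not say so.

```python
import re

LINE = "BOINC:: CPU time: 25291.3s, 14400s + 10800s[2011- 9-20 15: 1:52:] :: BOINC"

# Matches: CPU time: <cpu>s, <first>s + <second>s
PATTERN = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def parse_cpu_line(line):
    """Return (cpu_seconds, first_term, second_term), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


cpu, first, second = parse_cpu_line(LINE)
limit = first + second   # assumed: allowance plus runtime preference
overrun = cpu - limit    # CPU seconds beyond that assumed limit

print(f"CPU time {cpu:.1f}s, assumed limit {limit}s, overrun {overrun:.1f}s")
```

On this reading the task stopped roughly 90 seconds after passing 25200s (7 hours) of CPU time, which fits the ">7 hours on a 3 hour preference" observation.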

alpha
Joined: 4 Nov 06
Posts: 27
Credit: 1,550,107
RAC: 0
Message 71342 - Posted: 27 Sep 2011, 13:16:04 UTC

Compute error for work unit 408829805:

https://boinc.bakerlab.org/rosetta/result.php?resultid=450197233

The only problem I see is:

upload failure: <file_xfer_error>
<file_name>1AI8.ppk1.nobb_docking_benchmark_8Sep2011_30843_72_1_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>

[AF>france>pas-de-calais]symaski62
Joined: 19 Sep 05
Posts: 47
Credit: 33,871
RAC: 0
Message 71360 - Posted: 1 Oct 2011, 23:26:53 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=452608726

Task ID 452608726
Name Aug20_needle_13start_test_SAVE_ALL_OUT__31431_61348_0


<core_client_version>6.12.33</core_client_version>
<![CDATA[
<stderr_txt>
[2011-10- 1  0:22:16:] :: BOINC:: Initializing ... ok.
[2011-10- 1  0:22:16:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Aug20_13start_needle.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 86400
[2011-10- 1 10:18:19:] :: BOINC:: Initializing ... ok.
[2011-10- 1 10:18:19:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Aug20_13start_needle.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 86400
Continuing computation from checkpoint: chk_S_00008_FragmentSampler__stage1 ... success! 

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: ..\..\..\src\core\pose\symmetry\util.cc line: 740
called boinc_finish

</stderr_txt>
]]>

entigy
Joined: 2 Nov 05
Posts: 5
Credit: 990,830
RAC: 0
Message 71362 - Posted: 2 Oct 2011, 16:01:00 UTC

I've just reconnected to Rosetta after some time away, and the 2 units I've completed both have a 'validation error'.
Is this going to happen with all the remaining Mini 3.14 units I have ?
If so, I might as well detach again ......

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,281,662
RAC: 943
Message 71363 - Posted: 2 Oct 2011, 21:11:07 UTC - in response to Message 71362.  

My three computers have already been on No New Tasks for Rosetta for weeks, but due to a different 3.14 problem. On some computers, including those, 3.14 workunits tend to crash in a way that does not manage to tell BOINC that the workunit is no longer running and some other workunit can now be started.

I'm getting better 3.14 results on RALPH@Home, though, so the developers may be working out a way to change the workunit inputs in a way that gives better results without changing the 3.14 program yet.

Therefore, I'd suggest setting Rosetta on No New Tasks for now, but letting the remaining workunits run to see if they will all at least finish properly.

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 71390 - Posted: 7 Oct 2011, 2:27:27 UTC

Task Aug20_needle_9start_test_SAVE_ALL_OUT__31432_91316_0 (452661954) failed on W7 after taking 7 hours on a 3 hour preference.

Watchdog active.
Hbond tripped: [2011-10- 5 14: 9:21:]
BOINC:: CPU time: 25478.3s, 14400s + 10800s[2011-10- 5 20:46:31:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)


©2024 University of Washington
https://www.bakerlab.org