Strange WU behavior

Message boards : Number crunching : Strange WU behavior


Richard de Lhorbe
Joined: 17 Aug 09
Posts: 5
Credit: 3,013,955
RAC: 0
Message 73041 - Posted: 11 May 2012, 1:38:50 UTC

I run Rosetta on a dual-core laptop under Ubuntu Linux 12.04 LTS (the most recent upgrade). Yesterday Rosetta stopped crunching completely: WU 459324425 showed "Waiting to Run" at 0% complete, and no other workunits were running at all. I downloaded some Einstein@Home work, and it ran fine overnight. This morning I still could not get Rosetta to run, so I shut the computer down for the day, wondering whether overheating had anything to do with it. Tonight I turned it back on, but the problem was the same. Eventually I aborted WU 459324425, and BOINC 7.0.24 then started Rosetta up again on both cores. I just looked the WU up under Tasks: it used 1399 seconds of CPU time, yet BOINC Manager showed it at 0% complete. I will be interested to see whether my wingman gets it crunched.

Anyway, everything is running fine now, but if any experts want to look at this workunit to see whether there is anything weird about it, be my guest. Having one workunit stop all crunching on both cores is rather strange.

Regards
Richard
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 73042 - Posted: 11 May 2012, 3:22:17 UTC

Are your memory settings set correctly? Tools -> Computing preferences -> Disk and memory usage.

If the memory allowance is too low, Rosetta won't run (Rosetta is a big RAM user).
ChertseyAl
Joined: 23 Nov 05
Posts: 1
Credit: 100,315
RAC: 0
Message 73044 - Posted: 11 May 2012, 18:12:49 UTC - in response to Message 73042.  

FWIW, I've had big problems with the rb_05_06 series. They just freeze: the state says 'running', but they're not, and neither CPU time nor percentage done increases. RAM usage is pretty high. Restarting BOINC sometimes gets them going again; some complete, others get stuck again. I aborted a few and managed to nurse the rest to completion.

Sadly I lost a lot of time on these. Hopefully I won't get any more as I don't have time to micromanage my machines :(

Cheers,

Al.

Richard de Lhorbe
Joined: 17 Aug 09
Posts: 5
Credit: 3,013,955
RAC: 0
Message 73049 - Posted: 12 May 2012, 0:49:03 UTC - in response to Message 73042.  

Memory settings should be fine; Rosetta has run without problems on this machine for weeks. Most of the time Rosetta is the only thing running, so it has all the resources to itself.

My wingman has now completed the first badly behaved workunit successfully. But I have just had another one do the same thing: workunit 459481462 stalled both cores, and aborting it allowed both cores to start up again. This workunit is part of the same series as the first, starting with "secY_hybrid_03 ......". I think I will just abort these as soon as they download. No other workunits seem to be causing any issues.

Regards
Richard
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 73051 - Posted: 12 May 2012, 9:20:41 UTC - in response to Message 73049.  

I wouldn't dismiss memory issues so quickly. I have a "secY_hybrid_03" using over 1 GB right now, alongside an "rb_05_11" using nearly 700 MB. A similar combination of workunits on your machine could explain the symptoms you describe. Most workunits don't need this much memory, so it's not surprising that you haven't seen problems before.
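As a rough back-of-the-envelope illustration of this point, here is a simplified model in Python. It is not BOINC's actual scheduler: it only assumes that a task is held back when starting it would push total memory use past a cap of max_percent of physical RAM, or when no core is free. The 2 GB laptop and the 60% / 90% caps are hypothetical values; the task sizes are the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    working_set_mb: float  # memory the task needs while running


def schedule(tasks, ram_mb, max_percent, cores):
    """Simplified greedy model (NOT BOINC's real scheduler).

    Walks the queue in order and starts a task only if a core is free and
    the running total stays within max_percent of RAM. Returns the names of
    running and waiting tasks as two lists.
    """
    limit_mb = ram_mb * max_percent / 100.0
    used_mb = 0.0
    running, waiting = [], []
    for task in tasks:
        fits = used_mb + task.working_set_mb <= limit_mb
        if len(running) < cores and fits:
            running.append(task.name)
            used_mb += task.working_set_mb
        else:
            waiting.append(task.name)
    return running, waiting


if __name__ == "__main__":
    # Hypothetical 2-core laptop with 2 GB RAM; sizes approximate the post above.
    queue = [Task("secY_hybrid_03", 1100), Task("rb_05_11", 700)]
    print(schedule(queue, ram_mb=2048, max_percent=60, cores=2))
    # -> (['secY_hybrid_03'], ['rb_05_11'])
    print(schedule(queue, ram_mb=2048, max_percent=90, cores=2))
    # -> (['secY_hybrid_03', 'rb_05_11'], [])
```

Under the assumed 60% cap (about 1229 MB) the second task has to wait and one core sits idle even though work is queued; at 90% both fit. The real client's behaviour is more involved, so treat this only as a way to sanity-check whether a given mix of large workunits could exceed the memory allowance.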


Best,
Snags
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 73060 - Posted: 14 May 2012, 2:32:37 UTC - in response to Message 73044.  

I had the same problem a while ago (I even started a thread about it). I simply aborted a handful of workunits and the problem never came back. My wingmen finished those WUs without a problem, though, so it's odd.




©2024 University of Washington
https://www.bakerlab.org