Message boards : Number crunching : Strange WU behavior
Author | Message |
---|---|
Richard de Lhorbe Joined: 17 Aug 09 Posts: 5 Credit: 3,013,955 RAC: 0 |
I run Rosetta on a two-core laptop under Linux Ubuntu 12.01 LTS (I may have the exact version number wrong, but it is the most recent upgrade). Yesterday, Rosetta stopped crunching completely, with WU 459324425 indicating "Waiting to Run" with zero percent shown as completed, and no other workunits running at all. I tried downloading some Einstein@Home workunits, and they ran fine overnight. This morning, I still could not get Rosetta to run, so I shut the computer down for the day, wondering if overheating had anything to do with it. Tonight, I turned it back on, but had the same problem. Eventually I aborted WU 459324425, and then BOINC 7.0.24 started Rosetta up again on both cores. I just looked the WU up under Tasks, and it indicates it used 1399 sec of CPU time, yet it showed zero percent complete in BOINC Manager. I will be interested to see whether my wingman gets it crunched. Anyway, all is running fine now, but if any experts want to look at this workunit to see if there is anything weird about it, be my guest. Having one workunit completely stop crunching on two cores is rather strange. Regards, Richard |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
ChertseyAl Joined: 23 Nov 05 Posts: 1 Credit: 100,315 RAC: 0 |
Are your memory settings set correctly? Tools -> computing preferences -> disk and memory usage. FWIW, I've had big problems with the rb_05_06 series: they just freeze. The state is 'running' but they're not, with no increase in CPU time or percentage. RAM usage is pretty high. Restarting BOINC sometimes gets them going again. Some complete, others get stuck again. I aborted a few, and managed to nurse the rest to completion. Sadly I lost a lot of time on these. Hopefully I won't get any more as I don't have time to micromanage my machines :( Cheers, Al. |
Richard de Lhorbe Joined: 17 Aug 09 Posts: 5 Credit: 3,013,955 RAC: 0 |
"Are your memory settings set correctly? Tools -> computing preferences -> disk and memory usage." Memory settings should be fine; Rosetta has run with no problem for weeks now on this machine. Most of the time, Rosetta is the only thing running, so it has all the resources to itself. My wingman has now completed the first badly behaved workunit successfully. But I have just had another one do the same thing: workunit 459481462 shut down both cores, and aborting it allowed both cores to start up again. This workunit is part of the same series as the first, starting with "secY_hybrid_03 ......". I think I will just abort these on first download. No other workunits seem to be causing any particular issues. Regards, Richard |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
"Are your memory settings set correctly? Tools -> computing preferences -> disk and memory usage." I wouldn't dismiss memory issues so quickly. I have a "secY_hybrid_03" using over 1GB right now along with a "rb_05_11" using nearly 700MB. A similar combination of workunits on your machine could explain the symptoms you describe. Most workunits don't require this much memory, so it's not surprising that you haven't seen problems before. Best, Snags |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
"Are your memory settings set correctly? Tools -> computing preferences -> disk and memory usage." I had the same problem a while ago (I even made a thread about it). I simply aborted a handful of workunits and the problem never appeared again. My wingman finished those WUs w/o a problem tho... so it's odd. |
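
The memory explanation offered in the thread can be checked with some quick arithmetic. The snippet below is only a rough sketch, not how the BOINC client actually schedules work: the function name `fits_in_budget`, the 2 GB of installed RAM, and the 50% and 90% memory limits are all assumptions chosen for illustration. Only the two workunit sizes (a little over 1 GB and nearly 700 MB) come from the posts above.

```python
def fits_in_budget(task_mb, total_ram_mb, max_ram_fraction):
    """Compare the summed memory of the tasks with the allowed RAM budget.

    Hypothetical helper for illustration only; the real BOINC client uses
    its own working-set estimates and preference handling.
    Returns a tuple (fits, needed_mb, budget_mb).
    """
    budget_mb = total_ram_mb * max_ram_fraction
    needed_mb = sum(task_mb)
    return needed_mb <= budget_mb, needed_mb, budget_mb


# Sizes reported in the thread: one task over 1 GB, one nearly 700 MB.
tasks_mb = [1024, 700]

# Assumed machine: 2 GB of RAM, with BOINC allowed to use 50% of it.
fits, needed, budget = fits_in_budget(tasks_mb, total_ram_mb=2048, max_ram_fraction=0.5)
print(f"needed {needed} MB, budget {budget:.0f} MB, fits: {fits}")

# Same assumed machine, but with the limit raised to 90%.
fits_relaxed, _, budget_relaxed = fits_in_budget(tasks_mb, total_ram_mb=2048, max_ram_fraction=0.9)
print(f"needed {needed} MB, budget {budget_relaxed:.0f} MB, fits: {fits_relaxed}")
```

Under these assumed numbers the two tasks together need about 1.7 GB, which exceeds a 1 GB budget but fits once the limit is raised. That is consistent with the suggestion to review the memory settings under Tools -> computing preferences -> disk and memory usage, though it does not prove that memory was the cause here.
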