Message boards : Number crunching : Long run and failure.
Author | Message |
---|---|
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
This morning, I noticed this wu was acting a little strangely. First, it has been running for more than 24 hours now. I've been watching for a short while and it doesn't seem to be advancing, i.e. the % done is not moving. Also, when I try to fire up the graphics from that wu, it doesn't run: I get the initial black rectangular window, but nothing more. When I try to stop the graphics, I get Windows showing the "not running, abort or retry" type message. Digging a little deeper, I found another wu, this one, with an extended run time and an eventual error. I have suspended the wu, pending comments. This is my usual machine that I use almost all day (i.e. not a dedicated BOINC cruncher), and I have not had any other BOINC problems or issues with anything else. Fully patched, up to date Win XP machine. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,174,382 RAC: 3,121 |
This morning, I noticed this wu was acting a little strangely. First, it has been running for more than 24 hours now. I've been watching for a short while and it doesn't seem to be advancing, i.e. the % done is not moving. Also, when I try to fire up the graphics from that wu, it doesn't run: I get the initial black rectangular window, but nothing more. When I try to stop the graphics, I get Windows showing the "not running, abort or retry" type message. I don't remember the time frame, but there seems to be a problem with SOME of the units not recognizing that the time has elapsed and that they need to stop. Try exiting BOINC and then restarting it, and see if the unit picks back up at a checkpoint or if it is just time to cancel it and move on to the next one. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There were some issues with some BOINC versions (not sure specific versions were ever isolated) where the BOINC Manager shows the task as "running" but doesn't give it any CPU time, so no progress is made. A simple litmus test is to open Windows Task Manager and see whether you have an active thread per CPU or whether System Idle is getting the CPU. Exiting (not just closing) and restarting BOINC causes it to get its head straight, and the task will be given CPU to finish its work. When tasks actually get CPU as planned, I don't believe I've ever seen any that were not automatically ended by the watchdog once they exceeded the target runtime (see your Rosetta-specific preferences; the default runtime preference is 3hrs) by 4hrs. So if you run default 3hr tasks and one runs more than 7hrs, the watchdog will catch that and end it for you... so long as BOINC is giving it CPU time for it to do so. Rosetta Moderator: Mod.Sense |
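
For anyone wanting to check a long-running task against the rule of thumb above, here is a minimal sketch of the arithmetic only (target runtime plus a 4hr margin). It is not Rosetta's actual watchdog code; the function and constant names are made up for illustration, and the only numbers taken from the post are the 3hr default and the 4hr margin.

```python
# Illustrative only: names below are hypothetical, not part of BOINC or Rosetta.
WATCHDOG_MARGIN_HOURS = 4.0  # margin quoted in the post above


def watchdog_cutoff_hours(target_runtime_hours: float = 3.0) -> float:
    """Runtime beyond which the watchdog is expected to end a task."""
    return target_runtime_hours + WATCHDOG_MARGIN_HOURS


def past_watchdog_cutoff(elapsed_hours: float, target_runtime_hours: float = 3.0) -> bool:
    """True if a task has run longer than target runtime plus the margin."""
    return elapsed_hours > watchdog_cutoff_hours(target_runtime_hours)


if __name__ == "__main__":
    # Default 3hr preference gives a 7hr cutoff.
    print(watchdog_cutoff_hours())
    # A task showing 24+ hours is far past that cutoff.
    print(past_watchdog_cutoff(24.0))
```

A task that is well past the cutoff and still listed as "running", like the 24-hour one in the first post, fits the case where the task is not actually being given CPU time, so the watchdog never gets a chance to act.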
©2024 University of Washington
https://www.bakerlab.org