Unfinished work units

Message boards : Number crunching : Unfinished work units


Reaper13

Joined: 10 Nov 05
Posts: 3
Credit: 74,001
RAC: 0
Message 69889 - Posted: 24 Mar 2011, 19:29:02 UTC

Why would I have something on the order of 25 work units that are in some % of completion? I have 4 that are running as "high priority" but a ton of unfinished ones? Why does Rosetta not go back and finish the other ones that were started?
dcdc

Joined: 3 Nov 05
Posts: 1833
Credit: 120,343,184
RAC: 28,545
Message 69890 - Posted: 24 Mar 2011, 19:48:22 UTC - in response to Message 69889.  

Why would I have something on the order of 25 work units that are in some % of completion? I have 4 that are running as "high priority" but a ton of unfinished ones? Why does Rosetta not go back and finish the other ones that were started?


To be pedantic, it's BOINC that controls which task is worked on; Rosetta just does what it's told.

My initial suggestion was going to be that you were low on RAM, so tasks might be waiting for memory, but I see that's not the case: you have 8GB for 4 cores. Unless, that is, you're running something particularly memory-intensive or you've turned the BOINC memory allowance right down?

If it's not to do with memory, then I have no idea. Have you changed the BOINC 'store enough work for # days' setting recently? BOINC might have started downloading more urgent tasks part-way through existing ones, but getting 25 partially complete is impressive...
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 69891 - Posted: 24 Mar 2011, 20:12:03 UTC
Last modified: 24 Mar 2011, 20:16:09 UTC

Right, memory is my first thought as well. Specifically, the amount of memory BOINC is allowed to use. When R@h tasks run, their memory usage grows as a given model progresses. When the model completes, memory usage drops a bit, then builds again as the next model progresses. If BOINC detects that more memory is being used than your preferences allow, it suspends a task to reduce the memory BOINC is using. It then starts another task, hoping it will use less memory than the last one, and for a short time it does use less and runs. But often it grows to a similar amount of memory later in the model, and depending on where the other 3 active tasks are in their runs, that may again exceed the memory preference.

So why not go back to the first task now rather than starting a third? ...well, I think BOINC already knows how much memory the first task is going to need (the same amount it was using when it was suspended), and that amount would generally also exceed what BOINC is trying to live within. So it knows the first task won't fit, and the second task grew too large... "hey, what about this third task here?" ...and the cycle continues.
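To make the cycle described above easier to picture, here is a toy simulation. It is not BOINC's actual scheduler: the memory curve, step counts, budget, and the "suspend the largest task" rule are all hypothetical, chosen only to mirror the behaviour described here (suspend when over budget, resume a suspended task only if its last known memory fits, otherwise start a fresh task that begins small).

```python
from dataclasses import dataclass

# All numbers below are hypothetical, chosen only to illustrate the cycle.
STEPS_PER_MODEL = 10   # steps before a model completes and memory drops
STEPS_PER_TASK = 40    # steps before the whole work unit is finished


@dataclass
class Task:
    name: str
    progress: int = 0    # steps completed so far
    last_mem: int = 0    # memory (MB) recorded when the task was suspended


def mem_for(progress: int) -> int:
    """Memory grows as a model progresses, then drops when the next model starts."""
    return 200 + 100 * (progress % STEPS_PER_MODEL)


def simulate(n_tasks: int, n_cpus: int, budget_mb: int, ticks: int) -> dict:
    queue = [Task(f"wu{i}") for i in range(n_tasks)]
    running: list[Task] = []
    suspended: list[Task] = []
    finished = 0
    suspensions = 0
    most_waiting = 0

    for _ in range(ticks):
        used = sum(mem_for(t.progress) for t in running)
        # Fill idle CPUs: resume a suspended task only if its last known
        # memory use fits; otherwise try a fresh task, which starts small.
        while len(running) < n_cpus:
            fits = [t for t in suspended if used + t.last_mem <= budget_mb]
            if fits:
                task = fits[0]
                suspended.remove(task)
            elif queue and used + mem_for(0) <= budget_mb:
                task = queue.pop(0)
            else:
                break  # nothing fits: leave the CPU idle for now
            running.append(task)
            used += mem_for(task.progress)

        # Every running task advances one step; completed tasks leave.
        for t in running:
            t.progress += 1
        finished += sum(1 for t in running if t.progress >= STEPS_PER_TASK)
        running = [t for t in running if t.progress < STEPS_PER_TASK]

        # Over the memory budget: suspend the largest task until we fit.
        while sum(mem_for(t.progress) for t in running) > budget_mb:
            victim = max(running, key=lambda t: mem_for(t.progress))
            victim.last_mem = mem_for(victim.progress)
            running.remove(victim)
            suspended.append(victim)
            suspensions += 1
        most_waiting = max(most_waiting, len(suspended))

    return {"finished": finished, "suspensions": suspensions,
            "most_partially_done": most_waiting}
```

Comparing `simulate(30, 4, 3500, 200)` with `simulate(30, 4, 100000, 200)` shows the point: with a tight budget, tasks get suspended part-way and pile up as partially complete, while with a generous budget nothing is ever suspended.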

This can use a sizable amount of swap file space. Hopefully you are keeping suspended tasks "in memory" (virtual memory, i.e. the swap file); see your preferences.

But in the end, it should eventually figure things out and get those run. You may notice periods of time when only 3 CPUs are active so that it can live within the memory preference, and yet meet the deadlines of existing tasks before requesting more.

If the current behavior bothers you, or the large swap file is undesirable, I'd suggest either allowing BOINC a higher percentage of system memory or limiting BOINC to 3 CPUs.
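A rough way to weigh those two options is to compare the memory BOINC may use (a percentage of system RAM) with the peak memory of one task times the number of tasks running. The sketch below uses the 8GB figure mentioned earlier in the thread; the 1100 MB per-task peak is purely an assumption, and it conservatively treats every task as peaking at once, which in practice they rarely do.

```python
# Hypothetical figures: 8 GB of RAM (as on the machine discussed above) and an
# assumed peak of 1100 MB per R@h task. Real peaks vary by work unit.

def budget_mb(ram_gb: float, boinc_pct: float) -> float:
    """Memory BOINC may use, given its percentage-of-RAM preference."""
    return ram_gb * 1024 * boinc_pct / 100


def tasks_that_fit(ram_gb: float, boinc_pct: float, peak_mb: float) -> int:
    """How many tasks can sit at their peak memory at the same time."""
    return int(budget_mb(ram_gb, boinc_pct) // peak_mb)


def min_pct_for(n_tasks: int, ram_gb: float, peak_mb: float) -> float:
    """Smallest memory percentage that lets n_tasks all peak together."""
    return 100 * n_tasks * peak_mb / (ram_gb * 1024)


if __name__ == "__main__":
    for pct in (50, 60, 75):
        print(f"{pct}% of 8 GB -> {tasks_that_fit(8, pct, 1100)} tasks at peak")
    print(f"4 tasks need at least {min_pct_for(4, 8, 1100):.1f}%")
```

Under these assumed numbers, a 50% allowance only covers 3 tasks at peak, which is the situation where limiting BOINC to 3 CPUs or raising the percentage would help.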

Another idea would be to add another project, with a resource share above 25%, that uses significantly less memory to run. WCG often has projects with small memory footprints. That way one CPU will tend to be running the low-memory project while the other three still have enough memory to happily run R@h tasks.
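BOINC divides resources in proportion to each project's share of the total, so the "above 25%" target can be turned into a concrete share value. This sketch assumes only two projects are attached and that R@h stays at a share of 100 (an assumption; adjust if yours differs). The CPU figure is a long-term average tendency, not a guarantee that one CPU is dedicated to the other project.

```python
def share_fraction(other_share: float, rosetta_share: float = 100) -> float:
    """Long-term fraction of resources BOINC aims to give the other project."""
    return other_share / (other_share + rosetta_share)


def min_other_share(target_fraction: float, rosetta_share: float = 100) -> float:
    """Share the other project needs to reach target_fraction of resources."""
    return target_fraction * rosetta_share / (1 - target_fraction)


if __name__ == "__main__":
    n_cpus = 4
    for share in (25, 34, 50):
        frac = share_fraction(share)
        print(f"share {share}: {frac:.1%} of resources, "
              f"~{frac * n_cpus:.2f} of {n_cpus} CPUs on average")
    print(f"Above 25% needs a share above {min_other_share(0.25):.1f}")
```

With R@h at 100, the other project needs a share a little above 33 to get more than a quarter of the machine, i.e. roughly one of the four CPUs on average.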
Rosetta Moderator: Mod.Sense
dcdc

Joined: 3 Nov 05
Posts: 1833
Credit: 120,343,184
RAC: 28,545
Message 69892 - Posted: 24 Mar 2011, 20:41:59 UTC

Or add even more RAM! :D




©2025 University of Washington
https://www.bakerlab.org