Does boinc hold on to cores?

Message boards : Number crunching : Does boinc hold on to cores?

BKFC
Joined: 21 Apr 20
Posts: 34
Credit: 3,160,585
RAC: 0
Message 99881 - Posted: 3 Dec 2020, 15:05:35 UTC

I have a Ryzen 7 2700X with 8 cores, set up to run Rosetta@home at about 65% (to avoid overheating) when I'm not using the machine for other purposes. One of those purposes is a molecular dynamics simulator (LAMMPS) that uses MPI to distribute the work over the cores. After nothing but Rosetta@home had been running overnight, I launched a LAMMPS application this morning. The BOINC Manager said that Rosetta@home was suspended, but the LAMMPS application ran on only one core, even though the system status monitor showed all other cores (and threads) as idle. I tried this repeatedly, with the same result. Finally I restarted the machine, and then the LAMMPS application ran on all 8 cores (actually all 16 threads).

So my question is this: is there anything about BOINC that would cause it to hang on to cores (or set some parameter) so that other applications such as LAMMPS assume the full machine is not available? Or does anyone know of a setting that I can 'reset' regarding the status of the machine for OpenMPI?
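One way to test the 'held cores' idea on Linux is to compare the CPUs the launching shell is allowed to use with the number installed. This is only a sketch: it assumes a Linux `/proc` filesystem and GNU `nproc`, and `count_cpus` is a helper name made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the CPUs named in a Linux CPU list such as "0-3,8".
count_cpus() {
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        case $range in
            # A range "a-b" contributes b - a + 1 CPUs.
            *-*) total=$((total + ${range#*-} - ${range%-*} + 1)) ;;
            # A single CPU number contributes one.
            *)   total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

if [ -r /proc/self/status ]; then
    # The allowed-CPU list is inherited by child processes such as an MPI launcher.
    allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "allowed CPUs: $allowed ($(count_cpus "$allowed") total)"
fi
if command -v nproc >/dev/null 2>&1; then
    echo "installed CPUs: $(nproc --all)"
fi
```

If the allowed list covers fewer CPUs than are installed, anything started from that shell inherits the restriction, which would point away from BOINC itself and toward the environment LAMMPS is launched from.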

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,324,975
RAC: 3,637
Message 99882 - Posted: 3 Dec 2020, 15:49:17 UTC - in response to Message 99881.  

Have you tried shutting down BOINC, not just suspending it, rather than restarting the machine?

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99883 - Posted: 3 Dec 2020, 15:50:52 UTC - in response to Message 99881.  

Perhaps not so much hanging on to cores as hanging on to memory, which could be causing LAMMPS to think there is insufficient memory available for it to run on the cores that BOINC was previously using. In Computing preferences, in the Memory section, try deselecting "Leave non-GPU tasks in memory while suspended".

BKFC
Joined: 21 Apr 20
Posts: 34
Credit: 3,160,585
RAC: 0
Message 99886 - Posted: 3 Dec 2020, 16:19:52 UTC - in response to Message 99882.  

I tried shutting down the boinc manager, and checked the box not to run anything when the manager wasn't running. My LAMMPS application still only ran on 1 CPU. Is there a more drastic boinc shutdown option?

BKFC
Joined: 21 Apr 20
Posts: 34
Credit: 3,160,585
RAC: 0
Message 99887 - Posted: 3 Dec 2020, 17:12:33 UTC - in response to Message 99883.  

Good point, hadn't thought of that, but when I looked, that box was already unchecked.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 3,249
Message 99888 - Posted: 3 Dec 2020, 17:16:59 UTC - in response to Message 99886.  

In Linux, yes, there is a way to clear BOINC from memory without restarting the PC, but I don't remember what it is anymore, and I don't know of a way that avoids shutting down BOINC. I think the problem is the suspended tasks: they are locked into their 'slots', and until you get rid of the tasks or actually stop BOINC, they stay tied to those 'slots'.

BKFC
Joined: 21 Apr 20
Posts: 34
Credit: 3,160,585
RAC: 0
Message 99890 - Posted: 3 Dec 2020, 17:52:27 UTC - in response to Message 99888.  

Two items:

1. I checked the status at the Rosetta site, and apparently there are 29 (!) tasks listed as running, even though there are only 16 threads and BOINC shows only 16 tasks.
2. There was some discussion a few years ago about the role of NVIDIA drivers. I just realized that I recently updated my NVIDIA driver, though I fail to see how this would make a difference, since Rosetta@home doesn't use GPUs.

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99891 - Posted: 3 Dec 2020, 19:16:39 UTC - in response to Message 99890.  

Re 1 (phantom tasks): apparently this can happen when there are network problems, so that the server thinks it has sent tasks that the client never received. It came up here recently; the consensus was not to worry about it and to let the phantom tasks time out and be resent.

Re the other issues: I don’t know enough about running BOINC on Linux to be able to offer any more suggestions…

BKFC
Joined: 21 Apr 20
Posts: 34
Credit: 3,160,585
RAC: 0
Message 99892 - Posted: 3 Dec 2020, 20:22:07 UTC - in response to Message 99888.  

A third item:

I am now watching the system monitor. Rosetta is now using about 10 GB memory (out of 32 GB). I then start LAMMPS; Rosetta is 'suspended', but the memory usage is still around 10 GB (LAMMPS uses a few hundred MB).

I've now cycled this process several times, and memory usage is over 11 GB.
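To put numbers on this rather than reading them off the system monitor, the resident memory of the Rosetta processes can be summed from `ps`. A sketch only: it assumes the task processes have "rosetta" somewhere in their command name, which is a guess to check against the `ps` output on the machine, and `sum_rss_kib` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "RSS COMMAND" lines on stdin and print the total
# RSS (in KiB) of the commands whose name contains the given substring.
sum_rss_kib() {
    awk -v pat="$1" 'index($2, pat) > 0 { sum += $1 } END { print sum + 0 }'
}

# ps reports resident set size in KiB; "rosetta" is an assumed name fragment.
if command -v ps >/dev/null 2>&1; then
    ps -eo rss=,comm= | sum_rss_kib rosetta
fi
```

Running it before and after the tasks are suspended shows whether the suspended tasks were actually removed from memory.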

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,208,737
RAC: 3,249
Message 99899 - Posted: 4 Dec 2020, 0:44:08 UTC - in response to Message 99892.  

The OS needs memory to do things, including putting stuff on the screen and switching between tasks and the other things we are doing. If you suspend BOINC, I'm not sure it releases the memory of any tasks as long as the BOINC Manager is still running. That's a problem some of us run into when we suspend one project to run another for a short period of time, i.e. a couple of days, and then go back to the original project and resume the tasks.

MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 99906 - Posted: 4 Dec 2020, 11:18:18 UTC

To stop/start/restart BOINC on a recent Debian or Ubuntu:

sudo systemctl stop boinc-client
sudo systemctl start boinc-client
sudo systemctl restart boinc-client

MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 99907 - Posted: 4 Dec 2020, 11:26:21 UTC - in response to Message 99886.  

Shutting down the manager does nothing. The manager is just the GUI. You would want to stop the BOINC client (the service that does the scheduling and comms).

BOINC will keep tasks in memory if the option is selected, even if they get swapped out or suspended. Turning the “keep tasks in memory” option off should release the memory and CPU thread when tasks get suspended.
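A quick check of whether the client, rather than just the Manager, is still alive might look like the following. This is a sketch under assumptions: it assumes the client process is named `boinc`, as I believe it is in the Debian/Ubuntu packages, and that `pgrep` is installed; `is_running` and `client_status` are made-up helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a process with exactly this name exists.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Hypothetical helper: print "running" or "stopped" for a process name.
client_status() {
    if is_running "$1"; then
        echo running
    else
        echo stopped
    fi
}

# "boinc" is the assumed name of the client process, not the Manager GUI.
echo "BOINC client: $(client_status boinc)"
```

If the client is reported as running after the Manager has been closed, `sudo systemctl stop boinc-client` stops it on a systemd-based install, and `boinccmd --quit` asks a running client to exit.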

BKFC
Joined: 21 Apr 20
Posts: 34
Credit: 3,160,585
RAC: 0
Message 99915 - Posted: 4 Dec 2020, 19:01:25 UTC - in response to Message 99907.  

I had already selected that option, that is, unchecked the box.

I've run R@H for several months with no trouble. The only thing that has changed at my end is that I updated the NVIDIA driver to 4.50. There are some posts that suggest the problem is related to this, but I'm not in a position to downgrade my NVIDIA driver.

©2024 University of Washington
https://www.bakerlab.org