Message boards : Number crunching : Rosetta leaking memory (or even threads)?
Anders Sjöqvist (Joined: 23 Feb 09, Posts: 8, Credit: 541,727, RAC: 0)
I have a FreeBSD machine that performs occasional cron jobs, but they are fairly cheap. I also have some things like nginx and Squid running, but they are just rejecting connections. There's also no X Windows on the machine to eat any CPU cycles. Basically, Rosetta has access to the full capacity of the computer. The computer is not a very powerful one, with two cores and hyper-threading, 2 GB of RAM and 5 GB of swap, but I think it should be powerful enough to run Rosetta. However, apart from the four minirosetta threads running at 100%, there are 13 idle threads, most of which are nano-sleeping. One thread also seems to be starving (waiting on a futex). Last night, Rosetta consumed all of the RAM and all of the swap space, causing my logs to fill up with error messages.

Here's a list of all of the processes running on the computer, except for a few hidden system processes:

last pid: 20518; load averages: 4.61, 4.53, 4.44   up 62+22:22:33   09:49:29
45 processes: 5 running, 40 sleeping
CPU: 0.3% user, 96.5% nice, 2.2% system, 1.1% interrupt, 0.0% idle
Mem: 1268M Active, 242M Inact, 379M Wired, 54M Cache, 213M Buf, 25M Free
Swap: 5003M Total, 2429M Used, 2574M Free, 48% Inuse

PID   USERNAME THR PRI NICE SIZE   RES   STATE  C TIME   WCPU    COMMAND
20365 boinc    1   155 i31  819M   735M  CPU2   2 63:41  100.00% minirosetta_3.31_i6
20366 boinc    1   155 i31  819M   735M  nanslp 2 0:02   0.00%   minirosetta_3.31_i6
20367 boinc    1   155 i31  819M   735M  nanslp 0 0:00   0.00%   minirosetta_3.31_i6
19920 boinc    1   155 i31  791M   168M  CPU1   1 240:32 100.00% minirosetta_3.31_i6
19921 boinc    1   155 i31  791M   168M  nanslp 0 0:07   0.00%   minirosetta_3.31_i6
19922 boinc    1   155 i31  791M   168M  nanslp 1 0:00   0.00%   minirosetta_3.31_i6
15990 boinc    1   155 i31  697M   220K  nanslp 3 1:04   0.00%   minirosetta_3.30_i6
35537 boinc    1   155 i31  513M   1620K nanslp 2 0:05   0.00%   minirosetta_3.31_i6
28665 boinc    1   155 i31  508M   208K  nanslp 1 0:52   0.00%   minirosetta_3.30_i6
90066 boinc    1   155 i31  504M   1396K nanslp 1 0:34   0.00%   minirosetta_3.31_i6
90065 boinc    1   155 i31  504M   1396K futex  1 0:04   0.00%   minirosetta_3.31_i6
20245 boinc    1   155 i31  484M   242M  RUN    0 103:07 100.00% minirosetta_3.31_i6
20246 boinc    1   155 i31  484M   242M  nanslp 0 0:04   0.00%   minirosetta_3.31_i6
20247 boinc    1   155 i31  484M   242M  nanslp 2 0:00   0.00%   minirosetta_3.31_i6
20427 boinc    1   155 i31  313M   230M  CPU3   3 38:33  100.00% minirosetta_3.31_i6
20428 boinc    1   155 i31  313M   230M  nanslp 0 0:01   0.00%   minirosetta_3.31_i6
20429 boinc    1   155 i31  313M   230M  nanslp 1 0:00   0.00%   minirosetta_3.31_i6
77868 anders   2   20  0    85360K 9228K kqread 2 179:34 0.00%   rtorrent
20465 anders   1   20  0    51532K 4348K select 1 0:00   0.00%   sshd
20463 root     1   21  0    51532K 4316K sbwait 3 0:00   0.00%   sshd
1477  root     1   20  0    46872K 504K  select 1 0:00   0.00%   sshd
1453  boinc    1   155 i31  38576K 2864K select 3 295:09 0.00%   boinc_client
39018 www      1   20  0    36608K 0K    kqread 3 0:01   0.00%   <nginx>
39017 root     1   52  0    36608K 0K    pause  2 0:00   0.00%   <nginx>
77866 anders   1   20  0    23100K 312K  select 2 1:34   0.00%   screen
1389  root     1   20  0    22368K 796K  select 3 8:55   0.00%   ntpd
1490  smmsp    1   20  0    20420K 656K  pause  0 0:04   0.00%   sendmail
1484  root     1   20  0    20420K 648K  select 0 2:49   0.00%   sendmail
20466 anders   1   20  0    17624K 2532K wait   0 0:00   0.00%   bash
20471 anders   1   20  0    16740K 2084K CPU0   3 0:02   0.00%   top
1496  root     1   20  0    14296K 352K  nanslp 2 0:40   0.00%   cron
1434  root     1   21  0    12356K 208K  bpf    2 53.4H  0.00%   knockd
1215  root     1   20  0    12220K 520K  select 1 0:24   0.00%   syslogd
1396  root     1   20  0    12220K 168K  select 1 20:32  0.00%   powerd
1573  root     1   52  0    12220K 88K   ttyin  3 0:00   0.00%   getty
1566  root     1   52  0    12220K 88K   ttyin  1 0:00   0.00%   getty
1568  root     1   52  0    12220K 88K   ttyin  2 0:00   0.00%   getty
1569  root     1   52  0    12220K 88K   ttyin  0 0:00   0.00%   getty
1567  root     1   52  0    12220K 88K   ttyin  1 0:00   0.00%   getty
1570  root     1   52  0    12220K 88K   ttyin  2 0:00   0.00%   getty
1571  root     1   52  0    12220K 88K   ttyin  2 0:00   0.00%   getty
1572  root     1   52  0    12220K 88K   ttyin  3 0:00   0.00%   getty
1040  root     1   20  0    10372K 600K  select 2 0:00   0.00%   devd
1039  _dhcp    1   20  0    10092K 612K  select 0 0:06   0.00%   dhclient
1001  root     1   34  0    10092K 476K  select 0 0:05   0.00%   dhclient

Sometimes, things like this happen (look at the next listing). Note the memory consumption of the first process, and that there's only 8 MB of swap space left. (I guess this is what happened last night.)

last pid: 20559; load averages: 2.98, 3.08, 3.56   up 62+22:37:40   10:04:36
43 processes: 4 running, 39 sleeping
CPU: 0.4% user, 73.0% nice, 3.1% system, 1.0% interrupt, 22.5% idle
Mem: 1403M Active, 128M Inact, 378M Wired, 54M Cache, 213M Buf, 3688K Free
Swap: 5003M Total, 4995M Used, 8116K Free, 99% Inuse, 2312K In, 308K Out

PID   USERNAME THR PRI NICE SIZE   RES   STATE  C TIME   WCPU    COMMAND
20365 boinc    1   155 i31  3174M  1309M swread 0 69:30  0.29%   minirosetta_3.31_i6
19920 boinc    1   155 i31  798M   136M  CPU2   2 253:45 99.76%  minirosetta_3.31_i6
19921 boinc    1   155 i31  798M   136M  nanslp 1 0:07   0.00%   minirosetta_3.31_i6
19922 boinc    1   155 i31  798M   136M  nanslp 1 0:00   0.00%   minirosetta_3.31_i6
---------------------- cut ----------------------

Can anyone help me? Is Rosetta leaking memory, or could it even be leaking processes? Why should I have 17 Rosetta processes running, each locking up several hundred megabytes of memory? Is that normal? Shouldn't the preferences prevent BOINC from using that much memory? Is my computer simply not powerful enough? Should I remove Rosetta to ensure stability? The system has only been up for 62 days. Thanks!
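One way to tell whether individual tasks really are growing without bound, rather than suspended tasks merely piling up, is to log the size of each minirosetta process over time. A minimal sketch using FreeBSD's base ps(1); the one-minute interval and the log path are arbitrary choices:

    #!/bin/sh
    # Record the virtual size (vsz, KB) and resident size (rss, KB) of every
    # minirosetta process once a minute, so growth per PID can be tracked.
    while true; do
        date
        ps ax -o pid,vsz,rss,state,time,comm | grep '[m]inirosetta'
        sleep 60
    done >> /var/tmp/minirosetta-mem.log

Comparing successive snapshots for the same PID should show whether a single task keeps growing (a leak or an unusually large model) or whether the total is simply the sum of many suspended-but-resident tasks.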
mikey (Joined: 5 Jan 06, Posts: 1895, Credit: 9,186,455, RAC: 3,447)
> I have a FreeBSD machine that performs occasional cron jobs, but they are fairly cheap. I also have some things like nginx and Squid running, but they are just rejecting connections. There's also no X Windows on the machine to eat any CPU cycles. Basically, Rosetta has access to the full capacity of the computer.

I think the problem is your memory settings: you are starting a new process every time BOINC uses up all the available free memory, because new tasks use less memory. The key is to go into your account and change some lines in Computing Preferences. Specifically, these are my settings:

Use at most 75% of page file (swap space)
Use at most 85% of memory when computer is in use (enforced by version 5.8+)
Use at most 90% of memory when computer is not in use (enforced by version 5.8+)

If you look at yours, they are most likely nearer to 50%, and with only 2 GB of RAM and Rosetta's larger work units you will easily exceed that.
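The same limits can also be set locally, without going through the website, by dropping a global_prefs_override.xml into the BOINC data directory and telling the client to re-read it. A rough sketch mirroring the settings above; the data-directory path is a guess for the FreeBSD port and the tag names are the preferences-override tags as I recall them, so check both against your client's documentation:

    #!/bin/sh
    # Assumed data directory for the FreeBSD boinc-client port; adjust as needed.
    # Assumed override tags:
    #   vm_max_used_pct        - percent of swap BOINC may use
    #   ram_max_used_busy_pct  - percent of RAM while the computer is in use
    #   ram_max_used_idle_pct  - percent of RAM while the computer is idle
    cd /var/db/boinc || exit 1
    cat > global_prefs_override.xml <<'EOF'
    <global_preferences>
       <vm_max_used_pct>75</vm_max_used_pct>
       <ram_max_used_busy_pct>85</ram_max_used_busy_pct>
       <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
    </global_preferences>
    EOF
    # Ask the running client to re-read its preferences. boinccmd looks for
    # gui_rpc_auth.cfg in the current directory for the RPC password.
    boinccmd --read_global_prefs_override

A local override takes precedence over the web preferences, which can be handy when one machine in an account needs different limits than the rest.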
Mod.Sense, Volunteer moderator (Joined: 22 Aug 06, Posts: 4018, Credit: 0, RAC: 0)
Right, BOINC is trying to run within the memory constraints currently configured (perhaps the default settings), and it is normal for R@h tasks to take a lot of memory to run. BOINC apparently starts a task and, as the task runs, it accumulates more and more information to keep track of and grows its memory footprint. Eventually it crosses the threshold BOINC wants to enforce, so BOINC suspends that task (keeping it in memory, which is apparently your configured preference and also the setting I'd recommend) and begins another task. The new task at first runs in less memory, so it gets to run for a while, gradually grows its footprint, and the process repeats itself.

There are many ways to address the situation; it just depends on your goals:

Configure BOINC to run on fewer than all of your CPUs.
Allow BOINC to use more memory.
Add memory to the machine.
Attach to an additional project that uses less memory, so that some of the active tasks run in a smaller footprint and BOINC will intermix them with the tasks that require a larger one.

Rosetta Moderator: Mod.Sense
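To watch this suspend-and-restart cycle from BOINC's side rather than from top, the client's command-line tool can report what the client itself thinks each task's state and footprint are. A sketch, assuming boinccmd is on the path and is run from the data directory; the exact field names can differ between client versions:

    # Per-task view as reported by the client: task name, whether the scheduler
    # currently has it executing or preempted, and its memory use as BOINC sees it.
    boinccmd --get_tasks | grep -E 'name|scheduler state|working set|swap size'

Tasks sitting preempted with large working sets would match the behaviour described above: they stay resident (because of the "leave applications in memory while suspended" preference) while a fresh, smaller task runs.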
Anders Sjöqvist (Joined: 23 Feb 09, Posts: 8, Credit: 541,727, RAC: 0)
Thanks for the replies, both of you! However, I don't really get why allowing less memory would make BOINC consume more. Is the limit per process rather than for BOINC as a whole? I found a certain workunit that had previously caused an "Out of memory" exception on a 16 GB Windows machine. It seems to have been downloaded just before I first noticed the problems.

You offered some good advice on what I should do in the future. I don't think I'll add more RAM just to run BOINC; I bought a cheap computer with a low energy footprint specifically to do simple things, and I just wanted it to help humanity when it wasn't doing anything else. Limiting the number of cores or finding a second project might be good options, though, and I'll look into that soon.

For now, I first disallowed new downloads, and once everything had finished I shut it all off. Interestingly, different processes required different amounts of effort to shut down. A few of them, among them a couple of processes launched in early May, responded to neither HUP, TERM, INT nor QUIT, and I had to resort to KILL. Still, I decided that my system had become so unstable from running out of memory several times (for example, the port-knocking daemon didn't work anymore) that I'd better reboot. Strangely enough, the boinc_enable="YES" that I believe had always worked before didn't work anymore, and I had to change it to boinc_client_enable="YES". Weird... It's just too bad that my "% of time BOINC client is running" was at 99.9997% and is now at 99.774%. :(
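For what it's worth, the client can usually be drained and stopped without resorting to signals at all. Something along these lines should work; the project URL and the FreeBSD rc script name (boinc-client in the current port, I believe) are the parts worth double-checking on your system:

    # Stop fetching new Rosetta work, let the cached tasks finish, then ask
    # the client to shut its science applications down and exit cleanly.
    boinccmd --project http://boinc.bakerlab.org/rosetta/ nomorework
    boinccmd --quit

    # Newer FreeBSD ports use this rc.conf variable and service name:
    #   boinc_client_enable="YES"     (in /etc/rc.conf)
    service boinc-client start

Asking the client to quit, rather than signalling the minirosetta processes directly, lets BOINC handle the applications itself, which avoids the stragglers that ignore HUP/TERM.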
Mod.Sense, Volunteer moderator (Joined: 22 Aug 06, Posts: 4018, Credit: 0, RAC: 0)
Actually, Mikey was speculating that your memory setting was near 50% and suggesting you use something more like his, which is 85%. And he said "new tasks use less memory"... he probably should have added "...at first". As I said, it is normal for a task's footprint to grow as progress into a given model continues. ...however, yes, it sounds like you got a task that used an extraordinary amount of memory.

Rosetta Moderator: Mod.Sense
mikey (Joined: 5 Jan 06, Posts: 1895, Credit: 9,186,455, RAC: 3,447)
> Actually, Mikey was speculating that your memory setting was near 50% and suggesting you use something more like his, which is 85%. And he said "new tasks use less memory"... he probably should have added "...at first". As I said, it is normal for a task's footprint to grow as progress into a given model continues. ...however, yes, it sounds like you got a task that used an extraordinary amount of memory.

Mod.Sense is correct. What I MEANT to say, and SHOULD have said, is that new tasks start out using just a little bit of memory and then, as they progress, use more and more. At some point along the way they exceed the amount your settings allow, so BOINC stops them and starts another one; after all, the new one does not take as much memory! NO, the units are NOT designed to tell BOINC how big they will get, so BOINC sees the small amount at the beginning and assumes it will stay consistent throughout, which of course it doesn't!