Questions and Answers : Unix/Linux : boincmgr with rosetta downloaded lots of data and when I rebooted it seemed to start over
Author | Message |
---|---|
Macuilxochitl Send message Joined: 20 May 20 Posts: 6 Credit: 0 RAC: 0 |
I have a Ryzen 5 1600 AF and 'buntu 20.04 and want to contribute to fighting covid. I downloaded boincmgr and attached myself to https://boinc.bakerlab.org/rosetta/ It downloaded a ton of data, over a GB. I guess for most folks that isn't a big deal, but I have funky DSL far from the CO and only get ~200kbps, and the client kept interrupting itself, forcing me to manually restart the transfer many times. Finally, after hours, I started crunching data and using my 12 cores, so I figured I was good to go. But I'd installed an updated kernel, so I turned the machine off for the evening, and when I rebooted it started over, trying to download another 500MB of data in the Transfers tab of the client. In the Tasks tab it looks like it wants to download another 16 things. When I go to Properties in the Projects page it says I'm using 1.57 GB (which isn't that big a deal) and I've completed 0 tasks and have 27 failed tasks. In the Projects tab it claims my 'Avg. work done' is 0.09, which doesn't sound impressive. Is this expected behavior? I don't want to pointlessly squander my limited bandwidth, waste electricity and not get anything done. Now boincmgr's Transfers tab is stalled out at 84% but the Status column reads Download: active. Suspending the project does not allow me to Retry and start downloading again. It eventually times out and waits to retry. I can force it to Retry Now, but I'm still getting a speed of 0.00 KBps. But pinging Google shows I have 0% packet loss, and speedtest shows my connection is working: speedtest-cli --server 3864 --simple reports Ping: 52.206 ms, Download: 1.46 Mbit/s, Upload: 0.88 Mbit/s. I understand there has been a major uptick in folding usage, so maybe the problem is on your end. This is very frustrating. Is there some other project I can contribute to that works on the coronavirus and is a little less finicky and data-intensive? |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
Do you have another account here? Because there is no sign of any computers on the account you used to post here, let alone them having got any work, or produced any errors. When you first join a project (any project) there will be a lot of downloading, as not only do you have to get Tasks to process, you also need to get the applications to process them. Different types of Tasks will also require different support files. However, once all of these files have been downloaded, the actual data files downloaded to process are generally only a few hundred kB, although the result files being sent back can be as much as 30MB (sometimes more), usually a lot less. If file transfers tend to be sticky (it says it's uploading/downloading but nothing is actually happening), in the BOINC Manager (Advanced view), Activity, select "Suspend network activity", then re-select "Network activity based on preferences". It may also be necessary to use a proxy server to work around the problems with your net connection. Grant Darwin NT |
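For anyone who prefers the command line, the two menu steps above can be approximated with boinccmd. This is only a minimal sketch: it assumes boinccmd is installed and allowed to talk to the local client, and the helper name kick_transfers is made up for illustration.

```shell
#!/bin/sh
# Sketch: nudge stalled BOINC transfers from the command line.
# Assumes boinccmd can reach the local client (on some installs it has
# to be run from the BOINC data directory so it can read the RPC password).

# kick_transfers is a hypothetical helper name, not part of BOINC.
# Optional $1 = seconds to wait between the two mode changes (default 5).
kick_transfers() {
    # Same as Activity > "Suspend network activity"
    boinccmd --set_network_mode never || return 1
    sleep "${1:-5}"
    # Same as Activity > "Network activity based on preferences"
    boinccmd --set_network_mode auto || return 1
    # Ask the client to retry any deferred network communication now
    boinccmd --network_available
}
```

Calling `kick_transfers` with no argument waits five seconds between the two mode changes; whether that actually unsticks a transfer depends on why it stalled in the first place.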
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The BOINC Manager will take care of retrying downloads that get interrupted. BOINC also has settings where you can limit bandwidth usage if you like. "Avg. work done" is over the last 10 days, and during most of those, it sounds like you did no work because you were not attached to the project, so it is sort of a meaningless number hours after you sign up. Rosetta Moderator: Mod.Sense |
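The bandwidth limits mentioned above are normally set in the Manager's computing preferences, but the same limits can also live in a local override file. The sketch below only prints such a file; the helper name make_bw_override is hypothetical, and the data directory /var/lib/boinc-client mentioned afterwards is an assumption about Debian/Ubuntu packages, so check where your client actually keeps its files.

```shell
#!/bin/sh
# Sketch: print a global_prefs_override.xml that caps transfer rates.
# make_bw_override is a hypothetical helper; it writes nothing to disk.
# $1 = download cap in KB/s, $2 = upload cap in KB/s.
make_bw_override() {
    down_kb=$1
    up_kb=$2
    # BOINC expects these limits in bytes per second.
    cat <<EOF
<global_preferences>
   <max_bytes_sec_down>$((down_kb * 1024))</max_bytes_sec_down>
   <max_bytes_sec_up>$((up_kb * 1024))</max_bytes_sec_up>
</global_preferences>
EOF
}
```

One possible use is `make_bw_override 15 5 > global_prefs_override.xml` inside the client's data directory (assumed here to be /var/lib/boinc-client), followed by `boinccmd --read_global_prefs_override` so the running client picks it up. Note that this overwrites any override file already there, including settings saved from the Manager.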
Macuilxochitl Send message Joined: 20 May 20 Posts: 6 Credit: 0 RAC: 0 |
Here are the Properties from my Rosetta@home Project tab. General: URL: https://boinc.bakerlab.org/rosetta/; User name: Macuilxochitl; Team name: (blank); Resource share: 100; Scheduler RPC deferred for: 01:07:09; Disk usage: 1.82 GB; Computer ID: 4432863; Suspended via GUI: no; Don't request tasks: no; Host location: default; Tasks completed: 9; Tasks failed: 57. Credit: User: 20,818 total, 55.83 average; Host: 588 total, 55.83 average. Scheduling: Scheduling priority: -1.01; Don't request tasks for NVIDIA GPU: Project has no apps for NVIDIA GPU; Last scheduler reply: Wed 20 May 2020 12:27:11 PM PDT. As you can see I'm using 1.82 GB of disk space and it seemed like I downloaded something like that in data. Rosetta is the only project I have attached to boincmgr. It looks like I'm finally cranking through some work. My failed Tasks added another 31 to 57, I'm not sure what is going on there. My username 'Macuilxochitl' appears to be the same as my name on this board. My user ID appears to be 2157465. I've devoted 100% of my resources to https://boinc.bakerlab.org/rosetta. But it looks like my machine is now cranking out work units. I'd kind of wanted to work on COVID 19 for personal reasons, but I'm apparently working on 3-dimensional shapes of proteins, which I guess is OK. I'll let this unit finish in the next 2-3 hours and see if I have to download another big dollop of data. Also, if I don't, I'll try rebooting and see if I have to start over downloading large amounts. |
Macuilxochitl Send message Joined: 20 May 20 Posts: 6 Credit: 0 RAC: 0 |
Well, rosetta@home took up where it left off after a reboot, so I guess everything is ducky. But I've used various boinc clients intermittently over the last 15-20 years and never had this much of a PITA getting it up and running before; the downloads were brutal. "Do you have another account here? Because there is no sign of any computers on the account you used to post here, let alone them having got any work, or produced any errors." When I hit the 'Your Account' button in the advanced Projects tab it takes me to this web site, and my computer is obviously thrashing around. It seems like I'm using about 55-60% of my CPUs (according to mpstat) and BOINC seems to really like to gobble RAM. I think it's using about 5GB, and in my previous session I was up to using a system total of 10GB (a personal record!): $ ps aux | awk '{print $6/1024 " MB\t\t" $11}' | grep boinc | sort -n 19.6133 MB /usr/bin/boinc 104.738 MB /usr/bin/boincmgr 374.59 MB ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu 403.539 MB ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu 1016.75 MB ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu 1018.41 MB ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu 1019.94 MB ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu 1142.95 MB ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu boincmgr says I've done 22,899 work and my Avg. work done is 259, whatever that means, but according to the linked "Your account" page I don't seem to be doing much of anything: Total credit 0 Recent average credit 0.00 Computers on this account blank Tasks blank That seems a little weird. I don't care about credit, but I hope my computer time is not going to /dev/null. Should I be concerned? 
Anyway, I'm going to shop around for a couple of case fans and see if I can increase the 60% of my processors that are being devoted to folding, I seem to have hit a thermal wall (my CPU is at 80C). |
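The per-process listing in the post can also be collapsed into a single total. A minimal sketch, assuming the usual `ps aux` column layout (resident memory in KiB in column 6, command in column 11); the helper name boinc_rss_mb is invented for this example.

```shell
#!/bin/sh
# Sketch: total resident memory of BOINC-related processes, in MB.
# boinc_rss_mb is a hypothetical helper; it reads `ps aux` output on stdin.
boinc_rss_mb() {
    # Add up column 6 (RSS in KiB) for every command containing "boinc",
    # which also matches the rosetta binaries under projects/boinc.bakerlab.org_rosetta/.
    awk '$11 ~ /boinc/ { kb += $6 } END { printf "%.1f\n", kb / 1024 }'
}
```

Used as `ps aux | boinc_rss_mb`, this prints one number instead of a sorted list, which makes it easier to compare against installed RAM.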
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
"Should I be concerned?" Yes, you need to figure out how many accounts you have, and what they are. The account you are posting here with has no computers doing any work at all (as the linked to Account page shows). You need to log in to the project using the name & email address that you used to attach the computer to Rosetta that is presently processing work. Grant Darwin NT |
Macuilxochitl Send message Joined: 20 May 20 Posts: 6 Credit: 0 RAC: 0 |
As far as I can tell I registered with rosetta exactly once and the boincmgr links to this account that I'm posting from. This has been too much of a PITA. shutdown -h +200 That should shut my machine down after my box is finished with the task I'm working on, and will give it a reasonable time to upload the results. Then: wipe the rosetta project with boincmgr, then apt-get purge boinc* Then I will reinstall boinc and try starting over with a fresh account. If that doesn't work any better than this has, I've downloaded the folding@home deb file and can give it a throw, but I gather that folding is more GPU-centric than boinc and I have a fairly anemic GK208B video card. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
"Then I will reinstall boinc and try starting over with a fresh account." Or just attach to the project with the account you are using here. Grant Darwin NT |
Macuilxochitl Send message Joined: 20 May 20 Posts: 6 Credit: 0 RAC: 0 |
Grant, I hate to waste all the time and bandwidth I spent setting this up, it would be helpful if you could tell me exactly how to attach to the project with this account. It isn't obvious to me. This is what happened when I tried to attach using the key from https://boinc.bakerlab.org/rosetta/home.php: ~$ boinccmd --project_attach https://boinc.bakerlab.org/rosetta MYKEYFROMYOURACCONTPAGE 21-May-2020 16:35:24: GUI RPC error: Already attached to project Operation failed: already attached to project So I tried this command: $ boinccmd --lookup_account https://boinc.bakerlab.org/rosetta MYEMAIL@SOMEWHERE.com MYPASSWORD status: Success poll status: operation in progress account key: SOME_COMPLETELY_DIFFERENT_KEY $ boinccmd --project_attach https://boinc.bakerlab.org/rosetta SOME_COMPLETELY_DIFFERENT_KEY 21-May-2020 16:41:13: GUI RPC error: Already attached to project Operation failed: already attached to project Great. $ boinccmd --get_state ======== Projects ======== 1) ----------- name: Rosetta@home master URL: https://boinc.bakerlab.org/rosetta/ user_name: Macuilxochitl team_name: resource share: 100.000000 user_total_credit: 26253.309237 user_expavg_credit: 560.492141 host_total_credit: 6023.143093 host_expavg_credit: 560.492141 nrpc_failures: 0 master_fetch_failures: 0 master fetch pending: no scheduler RPC pending: no trickle upload pending: no attached via Account Manager: no ended: no suspended via GUI: no don't request more work: no disk usage: 0.000000 last RPC: Thu May 21 16:15:31 2020 project files downloaded: 0.000000 GUI URL: name: Message boards description: Correspond with other users on the Rosetta@home message boards URL: https://boinc.bakerlab.org/rosetta/forum_index.php GUI URL: name: Your account description: View your account information URL: https://boinc.bakerlab.org/rosetta/home.php GUI URL: name: Your tasks description: View the last week or so of computational work URL: https://boinc.bakerlab.org/rosetta/results.php?userid=283434 jobs 
succeeded: 17 jobs failed: 58 elapsed time: 585442.120870 cross-project ID: b234b0bee793944832bb02a56190d855 ======== Applications ======== 1) ----------- name: rosetta Project: Rosetta@home ======== Application versions ======== 1) ----------- project: Rosetta@home application: rosetta platform: x86_64-pc-linux-gnu version: 4.20 estimated GFLOPS: 2.78 filename: rosetta_4.20_x86_64-pc-linux-gnu ======== Workunits ======== 1) ----------- name: Junior_HalfRoid_design6_cart_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_3mi5lf1n_929381_18 FP estimate: 8.000000e+13 FP bound: 1.000000e+17 memory bound: 1716.61 MB disk bound: 1716.61 MB ---- ======== Tasks ======== 1) ----------- name: Junior_HalfRoid_design6_cart_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_3mi5lf1n_929381_18_1 WU name: Junior_HalfRoid_design6_cart_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_3mi5lf1n_929381_18 project URL: https://boinc.bakerlab.org/rosetta/ received: Wed May 20 16:46:38 2020 report deadline: Sat May 23 16:46:38 2020 ready to report: no state: downloaded scheduler state: scheduled active_task_state: EXECUTING app version num: 420 resources: 1 CPU estimated CPU time remaining: 16102.080000 slot: 4 PID: 3736 CPU time at last checkpoint: 15733.360000 current CPU time: 15874.380000 fraction done: 0.440900 swap size: 1350 MB working set size: 1208 MB ... ======== Time stats ======== now: 1590104715.493604 on_frac: 0.901893 connected_frac: -1.000000 cpu_and_network_available_frac: 0.987368 active_frac: 0.987368 gpu_active_frac: 0.829332 client_start_time: Thu May 21 16:14:51 2020 previous_uptime: 30987.510875 session_active_duration: 1787.619883 session_gpu_active_duration: 0.000000 total_start_time: Mon May 18 16:48:52 2020 total_duration: 161704.501342 total_active_duration: 150230.505758 total_gpu_active_duration: 20.217683 What am I doing wrong? At least the 'boinccmd --get_state' implies that I'm working on COVID, which was my main goal. 
But it would be nice to know if the work I am doing is actually being used. |
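The "Already attached to project" error is the client refusing a second attach to the same URL, whichever key is supplied. One way around it from the command line is to detach first and then attach with the wanted key. The sketch below is hedged: switch_account is a made-up helper name, and detaching throws away the project's local files and unfinished tasks, which on a slow link means repeating the large initial download. Simply logging in to the web site with the account the client already uses avoids all of that.

```shell
#!/bin/sh
# Sketch: move a host to a different account on the same project.
# switch_account is a hypothetical helper name, not part of BOINC.
# $1 = project URL, $2 = account key of the account the host should use.
switch_account() {
    url=$1
    key=$2
    # Detaching drops the project's local files and any unfinished tasks,
    # so let running work finish and upload before calling this.
    boinccmd --project "$url" detach || return 1
    # Attach again, this time with the wanted account key.
    boinccmd --project_attach "$url" "$key"
}
```

A call would look like `switch_account https://boinc.bakerlab.org/rosetta/ YOUR_ACCOUNT_KEY`, where YOUR_ACCOUNT_KEY is a placeholder for the key shown on the account page.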
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
"What am I doing wrong?" No idea. I have only ever used the graphical Manager. I left the command line behind a very long time ago. The other option would be, instead of posting here using this present account, to log off from the site and log back on using the account the computer is using (that actually makes more sense, as the other account will have all the history of the work the computer has done, whereas this account doesn't have any processing history *slaps self*). Grant Darwin NT |
Macuilxochitl Send message Joined: 20 May 20 Posts: 6 Credit: 0 RAC: 0 |
I'm probably capable of handling the GUI, what exactly are the instructions I'm supposed to follow? boincmgr currently tells me it has a dollop of data ready to report. https://ibb.co/3NMbpz8 But when I click on 'Your tasks' on the Tasks tab of boincmgr it takes me to my default browser with this helpful message: Unable to handle request No access https://ibb.co/f1LxZrf I still have no idea if my work is being processed or used or if I'm just contributing to making up the 17% deficit of greenhouse gases caused by Covid slowing down the economy. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
OK, from what you have posted previously, "$ boinccmd --get_state ======== Projects ======== 1) ----------- name: Rosetta@home master URL: https://boinc.bakerlab.org/rosetta/ user_name: Macuilxochitl team_name: resource share: 100.000000 user_total_credit: 26253.309237 user_expavg_credit: 560.492141 host_total_credit: 6023.143093 host_expavg_credit: 560.492141 GUI URL: name: Your tasks description: View the last week or so of computational work URL: https://boinc.bakerlab.org/rosetta/results.php?userid=283434 jobs succeeded: 17 jobs failed: 58 elapsed time: 585442.120870 cross-project ID: b234b0bee793944832bb02a56190d855" So work is being done, and it is earning Credit for that computer on that account. The user ID for that account is 283434. The user ID for the account you are posting with here is 2157465. So it looks like you've had an account for quite some time, and for some reason you then created a new account, but your computer is still on the old account. But since you have logged in here with the new account, you can't see the computer. And when you go to check out your account using the BOINC Manager on the computer, you can't, because you are not logged in on that account. If you click on "Log out" at the top right hand corner of this page, that will log you out from this web site using the new account. If you then follow Step 2 below, that should allow you to log back in using your original account, the one that has the computer on it. Forgot your account info? You might want to triple check everything before doing it (i just got up & i'm still not quite awake yet; it's been a looooong and tiring week), but it should get you logged in to this website using the account that your computer is on. Grant Darwin NT |
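The user ID comparison above can be repeated on any host by pulling the number out of the "Your tasks" URL that the client reports. A minimal sketch, assuming the `boinccmd --get_state` output contains a results.php?userid=... URL as it does in the listing quoted in this thread; state_userid is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the project user ID the client is attached with.
# state_userid is a hypothetical helper; it reads get_state output on stdin.
state_userid() {
    # Print only the digits following "results.php?userid=", first match only.
    sed -n 's/.*results\.php?userid=\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Running `boinccmd --get_state | state_userid` and comparing the result with the user ID shown on the web account page tells you whether the web login and the client are on the same account.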
Macuilxochitl Send message Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
Alright, thanks, that seemed to work. I'm not sure why I have 2 accounts. When I set up BOINC on this box I tried to log in with the username and password I had on record from 12 years ago, but it rejected my password, so I asked to reset my password, but for some reason it seemed to create a second account, I'm not sure what happened, but now everything seems ducky. At least I have some confidence that my computer work is being used. I just ordered 3 case fans, so we'll see if I can begin to really do some crunching. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
Glad you got that sorted. Now you can start hunting down what is going on with the system- it's putting out a lot of errors. Grant Darwin NT |
Macuilxochitl Send message Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
I see the errors in the manager, but I wouldn't know where to look for the source. Maybe it is because I am using the proprietary Nvidia (linux) graphics driver? I guess I could try some memtest. I'm not seeing anything in the GUI Event log that flags my attention. If I were going to guess, I'd say maybe the errors I was getting trying to transfer data to and fro, given my marginal internet connection, are to blame. I took a cursory look in my home directory but didn't see any log file to examine. Can you suggest where I might look? |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,854,150 RAC: 22,647 |
"Can you suggest where I might look?" Unfortunately Linux error messages seem to be on par with old DOS ones: next to useless. It's very unlikely to be related to the video driver (possible, but very unlikely). And the internet issues are also not likely to be the cause, since the files were downloaded OK. While several Tasks crashed & burned straight away, others started processing and then crashed. But it is a possibility. One or 2 of the errors appear to be related to the Tasks themselves (there are issues with some of the Work Units), but all of the others are dying only on your system. A quick search shows that "process got signal 11" errors are either a problem with the programme (yet others aren't having the issues you are), or it's a hardware problem. Since you've got your account sorted out, and hopefully the internet issues sorted out, the usual suggested fix is to Reset project (on the BOINC Manager Project tab). What it does is clear out all of your local files (data and application) for the project, then re-download new copies, then download new Tasks to process. Given you have had internet issues, it is possible a file or two is corrupted & is responsible for your high error count. This should eliminate the Rosetta software/libraries/databases being at fault. If after doing that the problems still occur, then it's a case of testing RAM, making sure the CPU isn't overheating, possibly even turning off hyperthreading & seeing how things go. With that number of cores & threads, your present system RAM will result in some memory issues, as some Tasks can require as much as 3GB of RAM. You generally need to allow for 1.3GB of RAM per core/thread in use to avoid running into memory limitation issues. But it shouldn't result in the errors that you are seeing. It's also worth checking the rails of your power supply: if the voltages are dropping under load, that can also result in "process got signal 11" errors. Grant Darwin NT |
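The 1.3GB-per-thread rule of thumb above turns into a quick calculation. The sketch below is only illustrative: max_tasks_for_ram is a made-up helper, the 16 GB figure is used as an example machine size, and the optional third argument (RAM held back for the desktop and everything else) is an assumption added here, not something from the rule itself.

```shell
#!/bin/sh
# Sketch: how many tasks fit in RAM at a given per-task allowance.
# max_tasks_for_ram is a hypothetical helper.
# $1 = installed RAM in GB
# $2 = allowance per task in GB (default 1.3, the rule of thumb above)
# $3 = GB held back for the rest of the system (default 0; an added assumption)
max_tasks_for_ram() {
    awk -v ram="$1" -v per="${2:-1.3}" -v keep="${3:-0}" \
        'BEGIN { print int((ram - keep) / per) }'
}
```

With 16 GB, `max_tasks_for_ram 16` gives 12, so twelve threads only just fit (12 x 1.3 = 15.6 GB) with almost nothing left over; holding back 2 GB for the rest of the system, `max_tasks_for_ram 16 1.3 2` gives 10. Tasks that need closer to 3GB each would push the count down further.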
Macuilxochitl Send message Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
Well, I let my work unit finish and then Reset the project, as suggested. It has been cranking for 3.5 hours and the 'tasks failed' has remained at 82, so maybe that did the trick. I'll keep an eye on it. It could be the rocky upload may have been to blame. I stuck a USB wifi dongle on the machine and used my neighbor's much faster internet connection to download my work, which went quickly without any interruptions, so maybe it was cleaner. I've got some fans on the way, maybe they will reduce my temps from ~70C, though that doesn't seem excessive. I don't think the RAM is a limiting factor, I've never gone past 10 GB out of 16, but I guess I could stick in another 8 GB at some point if it becomes an issue. I really hope the power supply is not an issue, those suckers are expensive at the moment. Also, maybe a source of error is that my Geforce 730 GT is a refurb I got for $20. But I get the impression that the GPU isn't that important for Rosetta. Still, Nvidia is a crummy choice for Linux, and I get screen weirdness way too often. I wish I could find a cheap AMD processor and use the open source driver, but unfortunately, desktop display adapters are largely a thing of the past. Folks that are not gaming use onboard graphics, which is actually faster than this damn card anyway, and it is hard to find a decent video card for < $80 or so. |
Macuilxochitl Send message Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
Man, this simultaneously sucks and blows, not unlike the two case fans I just installed to make boinc work better. Errors are ongoing. For a minute I thought I was through that, but I've gone from 82 to 115 since I reset the project without increasing my completed units from 26. And now it looks like my communication with rosetta is mucked up again: the Transfers tab shows my Download is pending. Specifically: "Download: pending (project backoff: 00:30...." This is using a wifi dongle and my neighbor's much faster wire, speedtest says: Download: 42.18 Mbit/s Oh well, at least my new blue LED fan is pretty. Guess I'll try some memtest before I give up on the project. Ah, the download just restarted and mostly went comfortably until it got to the last item in the Transfers tab, then it stopped again, but after a minute it retried and now I'm cranking again. My temps went right back up to 73C despite the new fans, but it looks like I'm using a greater percent of my CPU according to htop. Ah crap, while I was typing my CPU use dropped off again and now I'm getting: Status: Communication deferred... Oh well. I'll report back after running memtest. |
Macuilxochitl Send message Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0 |
https://imgur.com/ViVw0CA Oy. |
©2024 University of Washington
https://www.bakerlab.org