Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,915,065 RAC: 11,835 |
Looks like something is wrong with the rb_01_08_... series of WUs on minirosetta 3.78 (latest example: rb_01_08_77806_122534__t000__2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_541301_331_0). I have seen some of these tasks consume a huge amount of RAM: they start in the standard 200-400 MB range, but at some point can hoard up to 1400-1800 MB per task, maybe even more. It crashed after running out of RAM (8 GB RAM + 4 GB page/swap file on a 6-core CPU). |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I have seen some of these tasks consuming a huge amount of RAM: they start from the standard 200-400 MB range but at some point can hoard up to 1400-1800 MB per task." I have five on Windows 7 64-bit (i7-4771) and six on Ubuntu 16.04 (i7-3770), with run times so far ranging from 1 to 19 hours and no problems yet, but I will keep an eye on them. If they blow up, it must be late in the run. EDIT: By the way, I see you are using AMD CPUs. I got poor performance on my Ryzen 1700 on Rosetta, as I reported earlier. I wonder if they need to recompile it to fix this problem too? |
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,184,189 RAC: 10,001 |
BOINC 7.83, recent Mini-rosetta 3.78 error on nRoCM_01_P05055_group0_congq_SAVE_ALL_OUT_IGNORE_THE_REST_541727_1334_0: ERROR: ERROR: reading of AtomPair failed. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,915,065 RAC: 11,835 |
I have not seen such memory leaks lately either. About AMD CPU performance, I do not know. I do not have any of the latest AMD CPUs (from the Ryzen family) yet. I am still using older CPUs: one Phenom II X6 and two FX-8320 (Vishera/Piledriver), and I have not seen any performance issues with these older AMD CPUs in Rosetta: they are almost on par with Intel CPUs of the same generation/age and core count. |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
I've recently begun having this issue with my Windows XP host running a Pentium 4 CPU. Previously there were no problems, though of course it was slow and earned relatively low credits, as expected. Using app 3.78 windows_intelx86. Workunit 872559942 - Task 967645181: 01/20/2018 12:57:30 PM | Rosetta@home | Computation for task rb_01_17_79431_122764__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_542014_553_0 finished. Same errors for Workunit 872559856, Task 967645069. There is no point in continuing to run Rosetta on this host if this situation continues, since it runs SETI@home without issue. |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
I'm having this error on my Windows XP host running a Pentium 4 CPU. Using app 3.78 windows_intelx86. Task name: RhbaA_18619_a_trimmed_27_127len_cstwt_3.0_centerjumps_9mers_542830_9305_0. The following is repeated several times: Initializing random generators... ok |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Re: my Windows XP host with a Pentium 4 CPU (1 core, HT). Issue with Rosetta Mini v3.78 windows_intelx86. Task name: rb_02_04_80757_123466__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_544266_607_0 |
Aladar42 Joined: 14 Nov 17 Posts: 2 Credit: 67,864 RAC: 0 |
A couple of errors overnight: https://boinc.bakerlab.org/workunit.php?wuid=878661631 https://boinc.bakerlab.org/workunit.php?wuid=878661900 https://boinc.bakerlab.org/workunit.php?wuid=878661716 |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version Rosetta Mini v3.78 windows_x86_64. Device: 1759960, Task: 993065168, and WU 894585192. Status: Error while computing. Errors: Too many errors (may have bug). Too many total results. Exit status: -1 (0xFFFFFFFF) Unknown error code Options::initialize() |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,572,447 RAC: 7,196 |
All "nas_final" WUs end after a few seconds with the error: ERROR: unrecognized residue NAS |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version Rosetta Mini v3.78 windows_x86_64. Device: 1759960, Task: 993062525, and WU 894544928. Status: Error while computing. Errors: Too many errors (may have bug). Too many total results. Exit status: 1 (0x00000001) Unknown error code <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version Rosetta Mini v3.78 windows_x86_64. Device: 1759960, Task: 993062751, and WU 894756725. Status: Aborted. Exit status: 203 (0x000000CB) EXIT_ABORTED_VIA_GUI BOINC:: Worker startup. Because the WU kept starting over and resetting the elapsed time to zero, at least 6 times, I aborted it. I figured that if it was still on Structure 1 after 6 to 8 hours of number crunching, it wasn't going to finish well and was wasting CPU resources that could be used for other work. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,572,447 RAC: 7,196 |
Re my earlier post (all "nas_final" WUs end after a few seconds with an error): again, all "nas_final" WUs fail with the same error. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,572,447 RAC: 7,196 |
1003924662 ERROR: Unable to open atomset parameter file: minirosetta_database\chemical/atom_type_sets/fa_standard// |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,008,562 RAC: 5,976 |
Re 1003924662: It seems very strange that the path name has both forward and back slashes. |
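A quick way to see what such a mixed-separator path resolves to is to normalise it. Below is a minimal Python sketch using the standard-library `pathlib.PureWindowsPath`; it assumes the reported path contained a backslash after `minirosetta_database` (the forum display appears to have dropped it). Windows generally accepts both separators, so this only shows that the path, as printed, ends at the `fa_standard` directory with no file name after it; it does not establish what caused the error.

```python
from pathlib import PureWindowsPath

# Path from the error message; the backslash after "minirosetta_database"
# is an assumption (the forum appears to have stripped it).
reported = "minirosetta_database\\chemical/atom_type_sets/fa_standard//"

# PureWindowsPath treats "/" and "\" as equivalent separators and drops
# the empty components produced by the trailing "//".
normalized = PureWindowsPath(reported)

print(normalized)        # minirosetta_database\chemical\atom_type_sets\fa_standard
print(normalized.parts)  # ('minirosetta_database', 'chemical', 'atom_type_sets', 'fa_standard')
print(normalized.name)   # fa_standard (the last component is a directory, not a file)
```

`PureWindowsPath` is used rather than `Path` so the check behaves the same on Linux or macOS, without touching the file system.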
Viking69 Joined: 3 Oct 05 Posts: 20 Credit: 6,788,606 RAC: 1,744 |
I am seeing this, but the WUs seem to be getting credit: 7/20/2018 7:57:02 AM | Rosetta@home | Task PH18070961_fold_SAVE_ALL_OUT_677251_949_0 exited with zero status but no 'finished' file |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,008,562 RAC: 5,976 |
"I am seeing this, but the WUs seem to be getting credit."
It seems to be a relatively harmless BOINC timing problem. From Snags on the Ralph board: https://boinc.bakerlab.org/forum_thread.php?id=6376
The "exited with zero status but no 'finished' file" message occurs when some other task on your computer prevents the science app from communicating with BOINC. It is usually safe to ignore, because it has to happen 100 times to a task before the task gives up and errors out. On the BOINC forum, Jord (Ageless) makes the following suggestions. Possible causes of the "Task exited with zero status but no 'finished' file" syndrome:
1. Make sure you exclude the BOINC directory and all subdirectories (or the BOINC Data directory and all subdirectories in BOINC 6 and 7) from being actively scanned by anti-virus and anti-spyware software. Only scan when you have exited BOINC.
2. Don't defrag your disk with BOINC on.
3. Don't run Scandisk with BOINC on.
4. Disable Drive Indexing.
5. Update your motherboard chipset drivers, specifically those for your IDE or SATA controllers.
6. Disable time synchronization in Windows XP/Vista. It is normally found under the clock (double-click it in the system tray), third tab (Internet in English); uncheck the sync option.
7. When you use BOINC's CPU throttling function, you can run into the "too many exit(0)s" error. The advice here is to disable BOINC throttling (set "Use at most 100.0 percent of CPU time") and instead reduce the number of CPUs/cores BOINC uses. In BOINC 7.0, this is done through the option "On multiprocessors, use at most xxx% of the processors". |
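The behaviour Snags describes, where a task is restarted after each clean exit without a 'finished' file and only errors out once this has happened 100 times, can be modelled as a simple counter. This is a minimal, hypothetical Python sketch: the limit of 100 comes from the quoted post, while `TaskState`, `on_exit` and the returned strings are illustrative names, not the BOINC client's actual code.

```python
# Hypothetical model of the "exited with zero status but no 'finished' file"
# retry limit; names are illustrative, not taken from the BOINC client source.
MAX_ZERO_EXITS = 100  # limit quoted in the forum post


class TaskState:
    def __init__(self, name: str) -> None:
        self.name = name
        self.zero_exits = 0
        self.errored = False

    def on_exit(self, status: int, finished_file_present: bool) -> str:
        """Decide what happens when the science app process exits."""
        if status == 0 and finished_file_present:
            return "completed"
        if status == 0:
            # Clean exit but no 'finished' file: count it and restart the task.
            self.zero_exits += 1
            if self.zero_exits >= MAX_ZERO_EXITS:
                self.errored = True
                return "error: too many exit(0)s"
            return "restart"
        # Any non-zero exit status is treated as an immediate error here.
        self.errored = True
        return f"error: exit status {status}"
```

Under this model, a task that hits the condition a few times, as in Viking69's log, still runs to completion and earns credit; only a cause that keeps recurring, such as one of those in Jord's list, would push the counter to the limit.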
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version Rosetta Mini v3.78 windows_intelx86. Device: 1759960, Task: 1015834050, and WU 915361344. Status: Timed out - no response. Outcome: No reply. Because the WU kept starting over and resetting the elapsed time to zero numerous times with the notation "Task exited with zero status but no 'finished' file," for a change I reset the project rather than aborting the WU, as is always suggested. Apparently the WU went into ghost-land until the time-out date/time came along. I guess aborting would have been the "better" option, since it would have reassigned the WU to another host; or would it have been? It appears another task was created for this WU, but it has not been sent as of this date and time. I'm wondering why it has not been reassigned yet, or whether a problem was found with this WU type. Just an FYI at this point, since it errored out by timing out. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,572,447 RAC: 7,196 |
1027856016, 1027855995: ERROR: ERROR: FragmentIO: could not open file 00001.200.9mers. After 2 years of this app, there are errors again... Debug it, or abandon it and move everything to the 4.x version? That's the question. |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version Rosetta Mini v3.78 windows_intelx86. Device: 1759960, Task: 1026363230, and WU 924883597. Status: Validate error. Outcome: Validate error. Errors: Too many total results, which appeared before my task completed. I was 3 hours past the deadline. However, the replacement task was "abandoned" and received back less than 30 minutes after it was sent to its host. I found no errors in my computation when I reviewed it. The end of the Stderr output shows: ====================================================== I'm just wondering why the error occurred. Normally, WUs returned valid after the deadline still get credit if they are completed before a valid replacement task. |
©2024 University of Washington
https://www.bakerlab.org