Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
Tero Send message Joined: 22 Jul 17 Posts: 1 Credit: 939,545 RAC: 0 |
It seems that version 3.78 broke compatibility with the Linux client. After the Minirosetta 3.78 update, tasks started to fail with "computation error". The latest version of the "regular" Rosetta works fine. I run CentOS Linux 7.3 with client 7.6.22. It seems that the error is in how the new version handles files: ERROR: in::file::zip minirosetta_database.zip does not exist! ERROR:: Exit from: src/apps/public/boinc/minirosetta.cc line: 195 (Example workunit 853521226) There is a database zip file, but its name is minirosetta_database_d0bf94b.zip. If I make a copy of the zip file to minirosetta_database.zip, I get file errors like "ERROR: ERROR: Option file open failed for: 'flags_rb_10_11_78082_120670__t000__0_C1_robetta'" (workunit 854223185). That file was present in the project folder. |
planetclown Send message Joined: 27 Jan 12 Posts: 5 Credit: 13,101,172 RAC: 8,983 |
Hello, I'm occasionally seeing two different errors on the following apps:
Rosetta Mini v3.78 i686-pc-linux-gnu
BOINC:: Worker startup. Starting watchdog... Watchdog active. *** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu: free(): invalid pointer: 0x13867fb8 *** ======= Backtrace: ========= [0xdf36941] [0xdf3a45b] [0xede768c] [0xdeffb51] [0x81630ad] [0xd45eb92] [0xd45ebcb] [0xd465336] [0xd46ca67] [0xd46feef] [0xd474232] [0xd400a01] [0xd40c69a] [0xc9ac83d] [0xc9ad47f] [0xca8b53f] [0xb08de97] [0xb265920] [0xb2a83b6] [0xb29f4d2] [0x8aaae73] [0x8aae71d] [0x8ab361b] [0x8a925f9] [0x8a65a47] [0xb371855] [0xb3743be] [0xb434b13] [0xb43119d] [0x8a6fa23] [0x8056303] [0xdf0cfd8] [0x8048131] ======= Memory map: ======== 08048000-0ede4000 r-xp 00000000 08:05 1183736 /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu 0ede4000-0edec000 rw-p 06d9c000 08:05 1183736 /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu 0edec000-0f115000 rw-p 00000000 00:00 0 10d45000-17e18000 rw-p 00000000 00:00 0 [heap] ebd2d000-f2cd4000 rw-p 00000000 00:00 0 f305c000-f3d64000 rw-p 00000000 00:00 0 f4200000-f4221000 rw-p 00000000 00:00 0 f4221000-f4300000 ---p 00000000 00:00 0 f517e000-f517f000 ---p 00000000 00:00 0 f517f000-f5e8f000 rw-p 00000000 00:00 0 f5e8f000-f7667000 rw-s 00000000 08:05 1581177 /var/lib/boinc-client/slots/11/boinc_minirosetta_11 f7667000-f7668000 ---p 00000000 00:00 0 f7668000-f766b000 rw-p 00000000 00:00 0 f766b000-f766d000 rw-s 00000000 08:05 1581173 /var/lib/boinc-client/slots/11/boinc_mmap_file f766d000-f776a000 rw-p 00000000 00:00 0 f776a000-f776c000 r--p 00000000 00:00 0 [vvar] f776c000-f776e000 r-xp 00000000 00:00 0 [vdso] ffc6c000-ffc8e000 rw-p 00000000 00:00 0 [stack] </stderr_txt> ]]> The second error is SIGSEGV: segmentation violation BOINC:: Worker startup. Starting watchdog... Watchdog active. SIGSEGV: segmentation violation Stack trace (4 frames): [0xde75dcf] [0xf77ceca0] [0xdf36358] [0xeffb51ff] Exiting... 
</stderr_txt> ]]> I haven't seen any errors while running Rosetta v4.06 app or other BOINC projects. Any help would be appreciated. Thank you! |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
"Sorry to say that but your crappy Ryzen is the problem." Ryzen is crappy? Are you a troll? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
Either the application crashes outright with a segmentation fault, or the C library kills it because it detected an invalid pointer, this way preventing a possible segfault. If you think about it there must also be cases where an invalid pointer goes unnoticed but doesn't cause a segfault. If you have an invalid pointer in your software it's your problem, not a CPU problem. Search for "kill_ryzen" or "marginality error" and you'll find many reports on Ryzens segfaulting in a particular use case: massive parallel compiler runs on Linux. An extreme scenario, but not unrealistic, and there's no excuse for simply crashing. Problem solved months ago, with free replacements of early Ryzens and with a BIOS update (AGESA 1.0.0.6b). |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Problem solved months ago, with free replacements of early Ryzens and with a BIOS update (AGESA 1.0.0.6b)." I purchased a Ryzen 1700 made in week 33 of 2017, so it is a fixed version. It is on an ASRock Fatal1ty X370 Gaming X motherboard with the AGESA 1.0.0.6b BIOS, and with 32 GB of Patriot DDR4 memory (15-15-15-36). The CPU is not overclocked, and the machine runs Ubuntu 17.10. I just started running Rosetta on 15 cores, with the other core supporting a GTX 970 on Folding. Previously, it had been running WCG for about a month with no errors, but that is too easy. https://boinc.bakerlab.org/rosetta/results.php?hostid=3299745 In addition to errors, I am interested in the output. These are the 24-hour work units, and I was averaging about 800 points each on an i7-3770 (7 cores, with one reserved for a GPU, also on Ubuntu) for those that ran the full 24 hours. We will see how it goes. |
mmonnin Send message Joined: 2 Jun 16 Posts: 61 Credit: 25,390,629 RAC: 47,239 |
You can RMA segfault Zen chips. http://www.extremetech.com/computing/254750-amd-replaces-ryzen-cpus-users-affected-rare-linux-bug |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
That's not a solution, it's an emergency measure. And of course I expect it to be free. Good thing this option exists though. But in this RMA process they'll ask you to run tests and document them with photos. Believe it or not, I have no means to take photos, so no RMA for me. There is a radical solution: switch to Windows 10. Problem goes away :-P Or wait for 4.06 to become the default application. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
"Problem solved months ago" I'm not aware of an official statement saying the problem's been identified, let alone solved. Care to give me a pointer? *giggles* The new "RMA Ryzen" does not have this problem, so they found it and resolved it... |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
We will see how it goes. I have gotten rather poor performance with the Ryzen 1700, somewhat less output per core than an i7-3770, and three errors. But I have now disabled SMT in the BIOS. There were some problems with that early on with Ryzen, and maybe Rosetta does not work well with it on AMD. So I am now running Rosetta on 7 full cores, with one core reserved for the GPU. I will run it for about two or three more days to see. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,994 |
If you have an invalid pointer in your sw it's your problem, not a cpu problem. I'm missing the word "because" in that sentence. I just saw a similar problem but under Windows 10 and on an Intel CPU. 7H2LD3_51C703_fold_and_dock_SAVE_ALL_OUT_538615_1685 https://boinc.bakerlab.org/workunit.php?wuid=864346673 Rosetta Mini 3.78 64-bit Windows 10 Intel i7-5950X, 32 GB, SSD Perhaps someone could check if it's the same problem, but under conditions much less likely to have the problem become visible. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I have gotten rather poor performance with the Ryzen 1700, somewhat less output per core than an i7-3770, and three errors. But I have now disabled SMT in the BIOS. There were some problems with that early on with Ryzen, and maybe Rosetta does not work well with it on AMD. So I am now running Rosetta on 7 full cores, with one core reserved for the GPU. I will run it for about two or three more days to see. After disabling SMT in the BIOS on my Ryzen 1700 machine (Ubuntu 17.10), I have obtained the following results, which are slightly complicated: https://boinc.bakerlab.org/results.php?hostid=3299745 Good News: No more errors, with 31 work units being completed successfully. This compares with 3 errors out of 21 work units when SMT was enabled. Bad News: The output, as measured by the credits, is still quite low even on full Ryzen cores (running 7 cores, with the other one dedicated to a GPU) when you are running only Rosetta (but see below). And the credits are all over the place. Just considering the Rosetta mini 3.78 that ran the full 24 hours, they range from 178 to 815 (except for the last, at 1160 points), and averaged 337 points. That seems to be about the same (per core) as with SMT enabled and running Rosetta on 15 cores, so enabling SMT should at least increase the total output, even with errors. However, in neither case is the Ryzen as good as the i7-3770 (with hyperthreading). I get no errors on 3.78, and credits average around 800 points per work unit running with 7 cores. I see no advantage to Ryzen thus far as compared to Ivy Bridge if you run only Rosetta. But the Ryzen 1700 does much better on WCG (running mainly MCM and MIP, with a few of the others). There I get no errors, and twice the output of the i7-3770. So there is something wrong with how the Rosetta AMD app runs on Ryzen. I hope they can fix it, as I will probably be converting most of my machines to AMD eventually. 
And, in another twist, the last of the Rosettas did quite well at 1160 points. That was because as I was finishing the Rosettas, I allowed the WCG work units to run. Therefore, when most of the cores were running WCG, the last Rosetta got very good points (though the very last of the 3.78 got stuck and I had to abort it). Moral: Until they fix Rosetta to run properly on Ryzen, it would be best to mix Rosetta with something else on the majority of the cores (WCG works). You will probably need to experiment to find out what works best though.
=====================================================================================================
Work units that ran the full 24 hours (3.78 only) run with SMT disabled (running on 7 full cores):
Returned 9 Dec: 1160.19 815.46 187.98 186.55
Returned 8 Dec: 815.21 178.20 184.03 182.54 747.96 796.89
Returned 7 Dec: 184.87 187.17 186.49 182.87 183.08 183.50 181.75
Ave: 337 points (excluding the last work unit at 1160 points).
NOTE: very little difference in credits per core with SMT enabled (but twice the number of cores).
Addendum: I don't know how 4.06 Rosetta runs on Ryzen, except that the points are lower as compared to 3.78 Rosetta mini. But how it runs on an Intel chip is another matter. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
Error after 5 hours.... 958310977 -529697949 (0xE06D7363) Unknown error code |
planetclown Send message Joined: 27 Jan 12 Posts: 5 Credit: 13,101,172 RAC: 8,983 |
Ryzen is crappy? Are you a troll? Yes. No. You don't seem to own a Ryzen. I do. Just want to reply with an updated status on my SEGFAULT issues with the Ryzen 7 1700. I was able to reproduce the segmentation faults using the “kill ryzen” test. I also got a replacement Ryzen through AMD’s RMA process. It took about a week from when I mailed it back to when I received the replacement. My original CPU had a manufacture date in the 21st week of the year, the replacement in the 39th week (it’s believed that chips produced in the 25th week or earlier may have issues). I have now completed 97 Rosetta Mini v3.78 tasks on Linux without a single error. It appears RMAing the Ryzen was the solution. Thank you floyd for providing information on the issues that people have been having with the Ryzen chips! Results from my desktop with the latest Ryzen: https://boinc.bakerlab.org/results.php?hostid=3297625&offset=0&show_names=0&state=0&appid=4 |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
963219454 ERROR: get_jump_that_builds_residue: not build by a jump! |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
Sorry to say that but your crappy Ryzen is the problem. Oh, boys, it's not a bug, it's a feature. :-P Intel security patch |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,390,338 RAC: 19,707 |
Looks like something is wrong with the rb_01_08_.... series of WUs on minirosetta 3.78. (rb_01_08_77806_122534__t000__2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_541301_331_0 is the latest example.) I have seen some of these tasks consuming a huge amount of RAM - it starts in the standard 200-400 MB range but at some point can hoard up to 1400-1800 MB per task. Maybe even more - it crashed due to running out of RAM (8 GB RAM + 4 GB page/swap file on a 6-core CPU). |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I have seen some of these tasks consuming a huge amount of RAM - it starts in the standard 200-400 MB range but at some point can hoard up to 1400-1800 MB per task." I have five on Windows 7 64-bit (i7-4771), and six on Ubuntu 16.04 (i7-3770) ranging from 1 to 19 hours with no problems yet, but I will keep an eye on them. If they blow up, it must be late in the run. EDIT: By the way, I see you are using AMD CPUs. I got poor performance on my Ryzen 1700 on Rosetta, as I reported earlier. I wonder if they need to recompile it to fix this problem too? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2136 Credit: 41,518,559 RAC: 15,775 |
Boinc 7.83 recent Mini-rosetta 3.78 error nRoCM_01_P05055_group0_congq_SAVE_ALL_OUT_IGNORE_THE_REST_541727_1334_0 ERROR: ERROR: reading of AtomPair failed. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,390,338 RAC: 19,707 |
I have not seen such memory leaks lately either. About AMD CPU performance - I do not know. I do not have any of the latest AMD CPUs (from the Ryzen family) yet. I am still using older CPUs: one Phenom II X6 and two FX-8320 (Vishera/Piledriver). And I have not seen any performance issues with these older AMD CPUs in Rosetta: they are almost on par with the corresponding Intel CPUs (from the same generation/age and with the same core count). |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
I've recently begun having this issue with my XP host running a Pentium 4 CPU. Previously there were no problems, though of course it was slow with relatively low credits, as expected. Using app 3.78 windows_intelx86. Workunit 872559942 - Task 967645181 01/20/2018 12:57:30 PM | Rosetta@home | Computation for task rb_01_17_79431_122764__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_542014_553_0 finished Same errors for Workunit 872559856 Task 967645069. There is no point in my continuing to run Rosetta on this host if this situation continues, as I am able to run SETI@home on it without issue. |
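
For anyone who wants to re-check the credit figures Jim1348 posted for the Ryzen 1700 with SMT disabled, here is a minimal sketch. The numbers are copied from his post; the function name `summarize`, the variable names, and the 500-point split between "low" and "high" work units are assumptions made only for illustration, not anything the project defines.

```python
from statistics import mean, median

# Credits per 24-hour Rosetta Mini 3.78 work unit, copied from Jim1348's post
# (Ryzen 1700, SMT disabled, Rosetta on 7 cores).
credits_by_day = {
    "9 Dec": [1160.19, 815.46, 187.98, 186.55],
    "8 Dec": [815.21, 178.20, 184.03, 182.54, 747.96, 796.89],
    "7 Dec": [184.87, 187.17, 186.49, 182.87, 183.08, 183.50, 181.75],
}

# The post excludes the final work unit, which ran while WCG used most cores.
OUTLIER = 1160.19

# Hypothetical cut-off used here only to separate the two visible clusters.
SPLIT = 500.0


def summarize(by_day, exclude=()):
    """Flatten the per-day lists, drop excluded values, and return basic stats."""
    values = [c for day in by_day.values() for c in day if c not in exclude]
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "low_count": sum(1 for c in values if c < SPLIT),
        "high_count": sum(1 for c in values if c >= SPLIT),
    }


summary = summarize(credits_by_day, exclude={OUTLIER})
print(f"work units: {summary['count']}")
print(f"mean:       {summary['mean']:.0f}")
print(f"median:     {summary['median']:.2f}")
print(f"below {SPLIT:.0f}:  {summary['low_count']}")
print(f"at/above {SPLIT:.0f}: {summary['high_count']}")
```

With the 1160-point unit excluded, the mean rounds to the 337 points quoted in the post, but the median is much lower because most of the work units sit in the 178-188 range and only a few reached roughly 750-815. That split is why the post describes the credits as "all over the place".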
©2024 University of Washington
https://www.bakerlab.org