Message boards : Number crunching : R@H Scientists/Coders: An analysis of the Rosetta binaries...
Author | Message |
---|---|
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I added the Windows (SSE2) and Linux (64-bit SSE4) builds to Ralph@home, and the Linux app has a 28% failure rate. The Windows app looks good at a 97% success rate. The current 32-bit Linux app also has a 97% success rate, which is expected. I'll have to look into the 64-bit Linux app failures and see what's actually going on. For the Linux build I used the SSE4 optimization option, a more current GCC, and a 64-bit build, which together gave around a 13% improvement. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Looks like the errors are likely coming from older CPUs that don't support SSE4. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,013,988 RAC: 6,289 |
Quote: "I added the Windows (SSE2) and Linux (64-bit SSE4) builds to Ralph@home, and the Linux app has a 28% failure rate." It is unlikely to be an SSE4 issue. Since the failing CPU is an AMD CPU, it is more likely an SSE3 or SSSE3 problem. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,581,913 RAC: 7,911 |
Quote: "I added the Windows (SSE2) and Linux (64-bit SSE4) builds to Ralph@home, and the Linux app has a 28% failure rate." Some points: 1) Please add a brief description of each app to the Applications page, like this. 2) What is the Win32 improvement? 3) Most of the computational power comes from Win64 hosts, so we are waiting for that build. :-) |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Quote: "I added the Windows (SSE2) and Linux (64-bit SSE4) builds to Ralph@home, and the Linux app has a 28% failure rate." Will do. The Win32 build improved only minimally (less than 2%) with SSE2. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
What is the baseline or 'normal' failure rate on the different platforms? |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,013,988 RAC: 6,289 |
Quote: "What is the baseline or 'normal' failure rate on the different platforms?" The failure rate should be 0%. There was a 28% "failure" rate because 28% of the jobs went to AMD machines, which aborted immediately. SSE4 is not supported by the AMD CPU. There are other inconsistencies in the instruction sets. It is pretty easy to get to SSE2 because both AMD and Intel support SSE2 instructions similarly. Once you go beyond SSE2, optimization becomes a little more tricky. That is why the BOINC environment provides the application with information that tells it what features the CPU supports. It is displayed in the BOINC Manager Event Log when BOINC starts up. A number of things affect the measured performance, among them: 1. Windows kicks the app into turbo mode aggressively, and newer CPUs (Sandy Bridge and later) throttle their frequency back if the CPU heats up, to prevent damage. The old and new versions may not be running at the same frequency on the same machine. This is not likely to be affecting David's results. 2. 8 32-bit registers versus 16 64-bit registers. The Win 64 version uses the same 8 XMM registers in SCALAR mode (one compute at a time) that the 32-bit version does. The 64-bit performance comes from having 8 more registers to store temporary results and not having to STORE and LOAD them to/from memory. After getting to SSE2, the next performance barrier would likely be to figure out why the application is running in SCALAR mode rather than VECTOR mode. Fixing the code so the compiler will generate vector code will double the performance during the times when these computations are currently happening. Without vector code or looking at the algorithms, there is little more that David can do beyond SSE2. The Intel ICC "-ax" dispatcher will automatically build "fat" binaries with code optimized for CPU features, but I suspect that the result might still be scalar code. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,581,913 RAC: 7,911 |
?? AMD has supported SSE4.2 since 2011.. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,555,104 RAC: 7,481 |
I'm trying to follow - were the failures on an AMD Phenom machine? And is it worth sending some more out for a bigger sample of which machines they fail on? |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,013,988 RAC: 6,289 |
boboviz .... You are absolutely correct and my error. Thanks for pointing it out. Sorry. dcdc .... The only errors that were reported publicly were ones on an AMD Phenom running 64-bit Linux. I ran about 80 tasks successfully on my machines, so I made the leap-of-guess that anyone with an older AMD CPU would get tasks, they would quickly error out, and then the machine would ask for more. David E K would have to break down the exact machine types. The reported task output was: "Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. SIGILL: illegal instruction Stack trace (18 frames): [0x3c04cbd] [0x409f10] [0x35b0473] [0x376a461] [0x37343a8] [0x373498c] ... [0x1fc556b] [0x202cc6d] [0x202bbc0] [0x40de78] [0x3cbb84b] [0x4003e9] Exiting..." The reporter added: "As soon as the application has unpacked and begins to run via BOINC it fails. I am using AMD Phenom CPUs running 64-bit Fedora Linux." |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Quote: "?? AMD support SSE4.2 since 2011.." Applies to Phenom: Gotta admit, I'm a bit confused here. Wikipedia says the supported instruction sets are: MMX, Extended 3DNow!, SSE, SSE2, SSE3, SSE4a, AMD64, Cool'n'Quiet, NX bit, AMD-V. These instructions (SSE4a) are not available in Intel processors, and SSE4a is not the same as SSE4. Quote from the German Wikipedia: "Trotz des ähnlichen Namens hat SSE4a nichts mit Intels Befehlssatzerweiterung SSE4 zu tun. Die einzige Gemeinsamkeit besteht lediglich darin, dass beide auf SSE3 aufbauen." In English: "Despite the similar name, SSE4a has nothing to do with Intel's SSE4 instruction set extension. The only thing they have in common is that both build on SSE3." |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,555,104 RAC: 7,481 |
From CPU world (http://www.cpu-world.com/Glossary/S/SSE4.html): "SSE4 instruction set extension consists of 54 instructions that improve performance of media data manipulation and text processing. The first 47 instructions from the SSE4, called SSE4.1, were introduced in Intel Penryn core on January 7th, 2008. Support for remaining 7 instructions, or SSE4.2, was included into Nehalem core." So any Intel Core ix chip (or its Pentium/Celeron derivatives) supports full SSE4.2, and earlier chips from the 45nm shrink (so my E8400 but not my Q6600) support the majority of the instructions, under SSE4.1. From Wiki (https://en.wikipedia.org/wiki/SSE4): "The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture. These instructions are not available in Intel processors." |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,555,104 RAC: 7,481 |
Quote: "I'll also look into a VS upgrade." According to the link above, it's out tomorrow. The release candidate is available here: https://www.visualstudio.com/en-us/downloads/visual-studio-2015-downloads-vs.aspx And it says: "Free for open source projects, academic research, training, education and small professional teams." Would that allow AVX to be tested? |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Phenom = K10 = no SSE4.x. Problem solved? |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,555,104 RAC: 7,481 |
I would guess you're right. The K10 chips do have SSE4a which might cause confusion? Is it down to the compiler to point Windows/Linux to the correct parts of the binary to use, given the CPU? |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Quote: "Is it down to the compiler to point Windows/Linux to the correct parts of the binary to use, given the CPU?" I searched the Internet for an answer to that but only found information about the Intel compiler. I figure it will be similar with gcc. Intel Compiler Vectorization
Not sure though... |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,013,988 RAC: 6,289 |
Quote: "Is it down to the compiler to point Windows/Linux to the correct parts of the binary to use, given the CPU?" I think that gcc 4.8 and later versions try to do something similar, but it is not clear to me what it does. https://gcc.gnu.org/wiki/FunctionMultiVersioning ... and then a link from the above page .... https://gcc.gnu.org/wiki/FunctionSpecificOpt Getting to SSE2 is pretty easy and does not cause too much damage because SSE2 has been in CPUs for a decade. Going beyond SSE2 is tricky. The best "next step" beyond SSE2 is to make sure the source code will generate vector code. If you cannot arrange the source code to generate a vector binary, then it makes little sense to turn on higher options (AVX ....). You are still only going to crunch 1 value at a time. Vector code will allow you to crunch multiple values in parallel. The performance of all sections of code that convert to vector code will be multiplied by the vector size. That is why you try to use the DATA TYPE size that does the job and no larger. If you can get your answer using a FLOAT or INTEGER, then using a DOUBLE or a LONG cuts your performance in half. You crunch a lot of 0's to generate 0's. 8-) |
Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
Looks like the developers need to get their hands on a Linux version of the Intel compiler or accept SSE3 as the baseline or screw all AMD K10 users. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,555,104 RAC: 7,481 |
VS2015 says: Build for iOS, Android, Windows devices, Windows Server or Linux |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,013,988 RAC: 6,289 |
Quote: "VS2015 says: Build for iOS, Android, Windows devices, Windows Server or Linux" The Intel compiler plugs right into MS Visual Studio, and I think that is really the way they would prefer you use it. I don't know if it will install properly into VS2015, but there is an academic program that U. of Wash. developers likely qualify for. Intel was one of the early supporters of BOINC. Link .... https://software.intel.com/en-us/qualify-for-free-software VS2015 itself will not solve the problem of generating the best code for "a particular" CPU. Rosetta, however, knows what the requesting system looks like and what binary they would like to ship to the user for best/reasonable results. Look in your BOINC data directory for a file named "sched_request_boinc.bakerlab.org_rosetta.xml". The p_features section describes what capabilities your system CPU has. Rosetta can ask for an SSE4 application to be sent to my Haswell machine because it supports SSE4. <p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2</p_features> Rosetta must already parse this file to determine the PLATFORM (Windows or Linux) and what kind of jobs I want to accept. If they can handle TWO platforms, then this platform dispatcher could be expanded to include additional granularity for the FEATURES supported. |
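
The feature-based dispatch suggested in the last post can be sketched in a few lines. What follows is a minimal Python sketch and not the project's scheduler code: the build names (`linux64_sse4`, `linux64_sse2`, `linux32_generic`), the required-feature sets, and the K10-style feature string are assumptions made purely for illustration. Only the Haswell feature string is copied from the post. The sketch compares whole tokens, so `sse4a` on a K10 is never mistaken for `sse4_1` or `sse4_2`.

```python
import re

# Hypothetical build table, ordered from most to least demanding.
# Names and required-feature sets are illustrative assumptions.
BUILDS = [
    ("linux64_sse4", {"sse2", "pni", "ssse3", "sse4_1", "sse4_2"}),
    ("linux64_sse2", {"sse2"}),
]
FALLBACK = "linux32_generic"  # hypothetical name for a baseline build


def extract_p_features(xml_text):
    """Return the text inside <p_features>...</p_features>, or '' if absent."""
    match = re.search(r"<p_features>(.*?)</p_features>", xml_text, re.S)
    return match.group(1) if match else ""


def pick_build(p_features):
    """Pick the first build whose required features are all present."""
    have = set(p_features.split())  # whole-token comparison
    for name, needed in BUILDS:
        if needed <= have:
            return name
    return FALLBACK


# Feature string quoted in the post (Haswell host).
HASWELL = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 "
    "sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca "
    "pbe fsgsbase bmi1 smep bmi2"
)

# Assumed, shortened K10-style list: has sse4a but neither sse4_1 nor sse4_2.
K10_LIKE = "fpu mmx sse sse2 pni sse4a popcnt lm"

if __name__ == "__main__":
    request = "<host_info><p_features>" + HASWELL + "</p_features></host_info>"
    print(pick_build(extract_p_features(request)))
    print(pick_build(K10_LIKE))
```

With a table like this, a host that lacks SSE4.1/SSE4.2 would be handed the SSE2 build instead of one that aborts with SIGILL. A real scheduler would also need to key on platform and operating system, which this sketch leaves out.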
©2024 University of Washington
https://www.bakerlab.org