GPU WU's

Author	Message
Balmer Joined: 2 Dec 05 Posts: 2 Credit: 96,879 RAC: 0	Message 91487 - Posted: 30 Dec 2019, 0:47:19 UTC Develop WUs for GPU(s). GPU(s) are much faster!

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 9	Message 91489 - Posted: 30 Dec 2019, 16:14:43 UTC - in response to Message 91487. Oh, no, please. Not another useless thread about gpu

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 91490 - Posted: 30 Dec 2019, 18:03:38 UTC - in response to Message 91489. "Not another useless thread about gpu" Advice from the expert. But there is always hope for the future. With all the new protein design techniques (maybe including AI), they could be developing new stuff now. I do know that they collaborate with Folding@home, which does GPU work, so it might spread back.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 9	Message 91491 - Posted: 30 Dec 2019, 21:09:32 UTC - in response to Message 91490. Last modified: 30 Dec 2019, 21:10:00 UTC "But there is always hope for the future."
- We are using two-year-old apps (3.78 and 4.07) with no sign of a new version.
- We are using 32-bit Windows apps, not 64-bit.
- We are using apps without optimizations (SSE/AVX).
- This thread started 5 years ago, also this. This 6 years ago. This thread started 10 years ago.

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 91492 - Posted: 30 Dec 2019, 22:45:37 UTC - in response to Message 91491. That suggests that either the algorithms can't be adapted to the science they need to do, or else they have too many crunchers and no incentive to improve. If half the people quit, we can see if it has any effect.

kaancanbaz Joined: 28 Apr 06 Posts: 23 Credit: 3,045,052 RAC: 0	Message 91493 - Posted: 31 Dec 2019, 9:58:23 UTC Folding@home has many CPU-only work units too. When asked, FAH says the same thing: not all projects are suitable for GPU folding. Maybe it's too hard for them to code a GPU version. You can dedicate your GPU to Folding@home and your CPU to Rosetta. That's how I have used my resources for a long time.

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 91495 - Posted: 31 Dec 2019, 21:12:53 UTC - in response to Message 91493. Last modified: 31 Dec 2019, 21:15:46 UTC "You can dedicate your GPU to Folding@home and your CPU to Rosetta. That's how I have used my resources for a long time." Yes, I do BOINC on the CPU and Folding on the GPU too, on all my machines. They work well together. Some people unfortunately stick to one camp or another, but why not do both?

ProDigit Joined: 6 Dec 18 Posts: 27 Credit: 2,718,346 RAC: 0	Message 91497 - Posted: 1 Jan 2020, 4:24:52 UTC - in response to Message 91493. Last modified: 1 Jan 2020, 4:27:47 UTC "Folding@home has many CPU-only work units too. When asked, FAH says the same thing: not all projects are suitable for GPU folding. Maybe it's too hard for them to code a GPU version. You can dedicate your GPU to Folding@home and your CPU to Rosetta. That's how I have used my resources for a long time."
A lot of projects can now be run on a modern GPU. ATI GPUs support double precision, which is probably the only limitation that keeps some projects CPU-only. Most RTX and modern higher-end AMD GPUs support DPP.
You have to look at it this way: a CPU runs out-of-order instructions at under 5 GHz, typically with 8 cores and 16 threads. That's 16 threads at 5 GHz, with a 20-25% boost from out-of-order execution. Multiply these and a fictive number of 100 comes out (16 x 5 x 1.25).
A GPU runs mostly in-order cores at a much lower 1.5-1.8 GHz (let's say 1.5 GHz sustained for the sake of discussion). But a low-end GPU has a good 384 cores, and a high-end GPU has 4500 cores. The fictive number for low-end GPUs (like a GT 750) would be 576 (384 x 1.5). The fictive number for high-end GPUs (like an RTX 2080 Ti) would be 6750 (4500 x 1.5).
And while the CPU has many more optimizations, GPUs benefit from their direct access to VRAM (much faster than CPU RAM). GPU worst-case scenario vs. CPU best-case scenario: a budget GPU is more than 5x faster than a CPU. A high-end GPU is more than 66x faster. A fair comparison would say the average GPU is 100x faster than an average CPU, while using much less power. Heck, even the performance per watt (efficiency) of $20 cheap Chinese media players, or cellphones, is much higher than that of x86 CPUs.
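As a quick check of the arithmetic in the post above, here is a minimal Python sketch that reproduces the "fictive numbers". The function name `fictive_score` is made up for illustration, and the model (execution units times clock times a boost factor) is the post's own back-of-envelope estimate, not a benchmark.

```python
def fictive_score(units, clock_ghz, boost=1.0):
    """Back-of-envelope score from the post: units x clock (GHz) x boost."""
    return units * clock_ghz * boost

# CPU: 16 threads at 5 GHz with a 25% out-of-order boost (figures from the post).
cpu = fictive_score(16, 5.0, boost=1.25)

# GPUs: shader counts and the 1.5 GHz sustained clock assumed in the post.
low_gpu = fictive_score(384, 1.5)
high_gpu = fictive_score(4500, 1.5)

print("CPU:", cpu, "low-end GPU:", low_gpu, "high-end GPU:", high_gpu)
print("low-end GPU / CPU:", low_gpu / cpu)
print("high-end GPU / CPU:", high_gpu / cpu)
```

The ratios come out just above 5 and 66, which is where the "budget GPU" and "high-end GPU" multipliers in the post come from. The sketch says nothing about whether counting shaders as cores is a fair comparison.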

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 91500 - Posted: 1 Jan 2020, 14:51:18 UTC Some compute problems benefit from a high degree of parallelism. Other compute problems have to resolve a series of calculations where each answer is used as an input to the next calculation. Sequential problems like that benefit from high raw CPU (and memory, and L2 cache) performance. Fast memory and cache basically just eliminate things that can leave the CPU waiting rather than crunching. In other words, any performance metric published about GPU vs. CPU is only as good as the match between the benchmark used for the comparison and your actual requirements.
I should also point out that there are several labs working on similar protein structure research, but they use different search algorithms. Some of those labs have created GPU applications. But this doesn't mean a GPU application would show a great performance benefit for R@h. I don't actually know enough about the algorithms and GPU functionality to have an opinion on the practicality of it. I am just saying that it is not safe to presume that since project X has done some sort of protein structure prediction on a GPU, that R@h, and the algorithms it uses for the various sorts of predictions, would see a similar performance boost from GPU.
Rosetta Moderator: Mod.Sense
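The distinction the moderator draws can be sketched in a few lines of Python. This is only an illustration of the two problem shapes, not anything taken from Rosetta: the functions `score`, `independent` and `dependent_chain` are hypothetical, and the thread pool is there to show that the work can be farmed out, not to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def score(x):
    # Independent work: the result depends only on its own input.
    return x * x + 1

def independent(inputs, workers=4):
    # Evaluation order does not matter, so items can be handed to many workers
    # (or, on a GPU, to many threads) and collected afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, inputs))

def dependent_chain(x0, steps):
    # Each step needs the previous answer, so the steps must run one after
    # another; extra workers cannot shorten this loop.
    x = x0
    for _ in range(steps):
        x = (x * x + 1) % 1000003
    return x

print(independent(range(5)))
print(dependent_chain(2, 3))
```

A workload shaped like `independent` is the kind that tends to map well onto a GPU; one shaped like `dependent_chain` mostly rewards fast single-thread performance.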

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 9	Message 91504 - Posted: 2 Jan 2020, 11:59:45 UTC - in response to Message 91492. "That suggests that either the algorithms can't be adapted to the science they need to do, or else they have too many crunchers and no incentive to improve." The first... but also the second. Rsj5 showed that they are not interested in optimizing code!! "If half the people quit, we can see if it has any effect." I did it. I went from over 3k RAC to less than 1k. And I'm considering moving all my cores to other interesting projects like QchemPedia

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 9	Message 91505 - Posted: 2 Jan 2020, 12:08:36 UTC - in response to Message 91500. "I am just saying that it is not safe to presume that since project X has done some sort of protein structure prediction on a GPU, that R@h, and the algorithms it uses for the various sorts of predictions, would see a similar performance boost from GPU." Years ago they tried a GPU app for R@H (so I think it is possible to do, even if limited to some protocols), with little benefit. But a lot has changed over these years, both HW and SW, so I don't know if the benefits would be bigger now.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 9	Message 91538 - Posted: 10 Jan 2020, 16:20:01 UTC It seems that at IPD they are using some Rosetta protocols with GPUs. But I don't know if it is usable in R@H.

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 91539 - Posted: 10 Jan 2020, 16:38:54 UTC - in response to Message 91538. Very good. But in their sample, they train with three NVIDIA 1080 Ti GPUs. I wonder what that means for us. They could start doing more in-house, for example.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 9	Message 91540 - Posted: 10 Jan 2020, 19:47:40 UTC - in response to Message 91539. "Very good. But in their sample, they train with three NVIDIA 1080 Ti GPUs." And a Tesla P100. But in the paper (it's a preview) they do not specify which platform (I think CUDA, even if I prefer OpenCL). Is this a shy beginning, or do they use GPUs only internally?

ProDigit Joined: 6 Dec 18 Posts: 27 Credit: 2,718,346 RAC: 0	Message 91880 - Posted: 6 Mar 2020, 13:40:21 UTC Most of these programs actually use the GPU only partially. They're still crunching on the CPU, but offload whatever the GPU can do to the GPU. For instance, The Collatz Conjecture does very little CPU work (just feeding the GPU with data to crunch), and probably uses less than 5-10% of a CPU per high-power GPU. Other GPU projects (like FAH, and I believe GPUGrid or PrimeGrid) can use as much as 80% of a CPU core running at 3-4 GHz. In the latter, the CPU is doing what only the CPU can. Most GPUs are great at 16-bit FPP, which is quite precise (or precise enough to do a lot of calculations). For instance, their coronavirus folding project can very easily be done on standard GPUs, like Folding at home.
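How precise 16-bit floating point is can be checked directly. The sketch below is an illustration using only the Python standard library: the `struct` format code `"e"` is IEEE 754 half precision, and the helper name `to_half` is made up for this example. Half precision carries an 11-bit significand, roughly three decimal digits, so whether that is "precise enough" depends on the calculation.

```python
import struct

def to_half(x):
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

for value in (1.0, 0.1, 1234.5678, 2049.0):
    half = to_half(value)
    rel_err = abs(half - value) / value
    print(value, "->", half, "relative error:", rel_err)
```

Values such as 1.0 survive unchanged, while 0.1 and integers above 2048 are rounded. That rounding is why science codes that accumulate many small differences often ask for 32-bit or 64-bit arithmetic instead.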

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 91881 - Posted: 6 Mar 2020, 14:04:48 UTC - in response to Message 91880. "Most of these programs, actually use partial GPUs." I am hoping that new developments in CUDA (or OpenCL) make GPUs more usable and more like CPUs. Only then will new projects start to pay attention to them. Otherwise, it is only the few that have algorithms that can be run on GPUs. Of course, both Nvidia and AMD would like that too, to extend their sales. I just wonder if we will see more at 7 nm? I don't know enough about the software to keep track of it.

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 30,949,009 RAC: 16	Message 91921 - Posted: 10 Mar 2020, 19:55:53 UTC - in response to Message 91497. Last modified: 10 Mar 2020, 20:12:37 UTC "Folding@home has many CPU-only work units too. When asked, FAH says the same thing: not all projects are suitable for GPU folding. Maybe it's too hard for them to code a GPU version. You can dedicate your GPU to Folding@home and your CPU to Rosetta. That's how I have used my resources for a long time."
"A lot of projects can now be run on a modern GPU. ATI GPUs support double precision, which is probably the only limitation that keeps some projects CPU-only. Most RTX and modern higher-end AMD GPUs support DPP. You have to look at it this way: a CPU runs out-of-order instructions at under 5 GHz, typically with 8 cores and 16 threads. That's 16 threads at 5 GHz, with a 20-25% boost from out-of-order execution. Multiply these and a fictive number of 100 comes out. A GPU runs mostly in-order cores at a much lower 1.5-1.8 GHz (let's say 1.5 GHz sustained for the sake of discussion). But a low-end GPU has a good 384 cores, and a high-end GPU has 4500 cores. The fictive number for low-end GPUs (like a GT 750) would be 576. The fictive number for high-end GPUs (like an RTX 2080 Ti) would be 6750. And while the CPU has many more optimizations, GPUs benefit from their direct access to VRAM (much faster than CPU RAM). GPU worst-case scenario vs. CPU best-case scenario: a budget GPU is more than 5x faster than a CPU. A high-end GPU is more than 66x faster. A fair comparison would say the average GPU is 100x faster than an average CPU, while using much less power. Heck, even the performance per watt (efficiency) of $20 cheap Chinese media players, or cellphones, is much higher than that of x86 CPUs."
LOL, divide the GPU "cores" by a factor of 64 and you will get the REAL GPU core count. Real core = the smallest independent part of the chip that can run its own program/computing thread.
For example, a GT 750 has 8 GPU cores and an RTX 2080 Ti has 68 GPU cores. What you are referring to are not cores but "shaders", the elementary computation units inside the SIMD engine of a GPU core. There are usually 64 shaders per GPU core on AMD and NV GPUs. And most of them are 32-bit compute units; only a minority are capable of 64-bit. Calling shaders "cores" is just marketing bullshit. x86 cores also have multiple compute units inside each core, and all of them are 64-bit capable. Current standard desktop x86 CPUs from Intel/AMD have 8x64-bit FPUs plus 4 integer/logic compute units per core, and they run at 2-3 times higher frequency and with higher efficiency than GPU cores. Intel high-end server CPUs have 16x64-bit FPUs + 4x64-bit INT units per core. As a result, a modern GPU is only a few times faster than a modern CPU, and only on tasks well suited to highly parallel SIMD computation. On tasks not well suited to that way of computing, it can even be slower than a CPU.
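The "divide by 64" rule from this post is easy to try out. The following Python sketch is only an illustration: `real_cores` is a made-up helper, the 64-shaders-per-core figure is the post's rule of thumb, and 4352 is the shader count of the RTX 2080 Ti (the thread rounds it to 4500).

```python
SHADERS_PER_CORE = 64  # rule of thumb from the post; varies by GPU architecture

def real_cores(shader_count, shaders_per_core=SHADERS_PER_CORE):
    """Estimate independent GPU cores from the advertised shader count."""
    return shader_count // shaders_per_core

# RTX 2080 Ti: 4352 advertised shaders.
print("RTX 2080 Ti:", real_cores(4352))

# The 384-shader low-end figure used earlier in the thread.
print("384-shader GPU:", real_cores(384))
```

For the RTX 2080 Ti this gives the 68 cores quoted in the post. For the 384-shader figure the same rule gives 6 rather than 8, which suggests the shaders-per-core count is not 64 on every architecture, so the divisor is best treated as approximate.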

torma99 Joined: 16 Feb 20 Posts: 14 Credit: 298,345 RAC: 0	Message 91924 - Posted: 10 Mar 2020, 20:51:26 UTC - in response to Message 91921. Great summary!

Grant (SSSF) Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 92497 - Posted: 29 Mar 2020, 0:36:30 UTC - in response to Message 91921. "As a result, a modern GPU is only a few times faster than a modern CPU, and only on tasks well suited to highly parallel SIMD computation. On tasks not well suited to that way of computing, it can even be slower than a CPU."
Actually, the facts say otherwise. Seti@home data is broken up into WUs that are processed serially, perfect for CPUs. Over time they made use of volunteer developers, and applications that made use of SSEx.x and eventually AVX were developed, the AVX application being around 40% faster than the early stock application (depending on the type of Task being done).
Then they started making use of GPUs, and guess what? It is possible to break up some types of work that are normally processed serially, process the pieces in parallel, then recombine the results to give a final result that matches the one produced by the CPU application, and it does so in much less time.
For example, using the final Special Application for Linux for Nvidia GPUs (as it uses CUDA), a particular Task on a high-end CPU will take around 1 hr 30 min (all cores, all threads, and using the AVX application, which is 40% faster than the original stock application). The same Task on a high-end GPU is done in less than 50 seconds. And the GPU result matches that of the CPU: a Valid result.
Personally, I think going from 90 min to 50 seconds to process a Task is a significant improvement. Think of how much more work could be done each hour, day, week.
Grant Darwin NT
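To put the figures from this post side by side, here is a small Python calculation. It uses only the two run times quoted above (about 90 minutes on the CPU, under 50 seconds on the GPU); the variable names are illustrative, and since the GPU time is an upper bound, the results are lower bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60

cpu_seconds = 90 * 60   # about 1 hr 30 min per Task on the CPU
gpu_seconds = 50        # "less than 50 seconds" per Task on the GPU

speedup = cpu_seconds / gpu_seconds
cpu_tasks_per_day = SECONDS_PER_DAY // cpu_seconds
gpu_tasks_per_day = SECONDS_PER_DAY // gpu_seconds

print("speedup:", speedup)
print("Tasks per day, CPU:", cpu_tasks_per_day)
print("Tasks per day, GPU:", gpu_tasks_per_day)
```

With these numbers the GPU is at least 108 times faster on this Task, or 1728 Tasks per day against 16. This applies to that one Seti@home application and says nothing by itself about how Rosetta's algorithms would behave.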

Laurent Joined: 15 Mar 20 Posts: 14 Credit: 88,800 RAC: 0	Message 92511 - Posted: 29 Mar 2020, 7:59:53 UTC - in response to Message 92497. The short version: they are sitting on a huge code base, licensed in a way that limits open-source efforts and originally written to be highly modular. The single-threaded code used in BOINC can also run in MPI mode on clusters. Multiple universities have extended the high-level parts of the code, sometimes exploiting the inner mechanics of the low-level code. Porting that without breaking code will not be easy. I'm totally with you regarding runtime and it being worth the effort. But this one calls for full-time developers with formal training, not scientists doing development on the side. I'm not sure how much manpower Rosetta actually has to do this, and I'm also not sure whether the commercial side of Rosetta has an interest in doing this.