Message boards : Number crunching : Client Errors
Author | Message |
---|---|
wbblakemore Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Just out of curiosity, do we know for an affirmative fact that the same problem DOESN'T affect GPU users with ATI cards? |
Rayburner Joined: 4 Oct 05 Posts: 32 Credit: 16,518,823 RAC: 0 |
I can just say that I do have an i7 with an AMD RADEON 6970 with driver 12.1 running just fine on Rosetta (on Win7 x64). |
Sky King Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
Quote: "It would be useful to hear from William Blakemore, Alpha Laser, Sky King, and Digital Savior whether they are running EVGA cards, how many, what model, and whether they have been running Folding@Home." Interestingly, I do have an EVGA GTX 560 in my rig. I used to do GPU folding on other GPUs, but not on this GPU and this particular Windows 7 instance. On this machine I was doing only SMP folding, and I stopped all folding activities prior to switching over to BOINC. The EVGA driver package is 8.17.12.8562 and the NVIDIA control panel is 3.9.731.0. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Is it possible that the BOINC APIs, etc. are actually functioning differently in differing hardware environments? That would certainly make it difficult to track down, because the problem would not directly be in your code. Making the same set of API calls to assemble the results on two different machines, with one working and one not, is exactly the sort of thing one might see in such a case. Rosetta Moderator: Mod.Sense |
wbblakemore Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Quote: "Is it possible that the BOINC APIs, etc. are actually functioning differently in differing hardware environments? That would certainly make it difficult to track down, because the problem would not directly be in your code. Making the same set of API calls to assemble the results on two different machines, with one working and one not, is exactly the sort of thing one might see in such a case." The testing by In Memory of Kimsey M Fowler Sr conclusively proves otherwise. He was able to reproduce both successful and failing runs with an identical hardware configuration. It's distressing that you would make such a comment. When a user goes to significant effort to debug your software for you, you should at least do him the courtesy of paying attention to his results. |
Sky King Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
Well, for now, I have decided to just suspend Rosetta folding until this gets resolved. As a contributor of 10 million F@H points, I really wanted all my computing horsepower to go specifically to Rosetta, but I'm going to do World Community Grid for a while. I am curious: when the F@H -bigadv units came out, they ran much faster on the Linux cores than on the Windows cores, so much so that running a Linux virtual machine appliance, even with the VM overhead, was faster than running natively on Windows. Are there canned BOINC VM appliances available that could run under VMware Player and bypass this problem? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Quote: "It's distressing that you would make such a comment. When a user goes to significant effort to debug your software for you, you should at least do him the courtesy of paying attention to his results." My apologies. Please keep in mind that I am an at-home volunteer, and not a Rosetta developer. As such, I was not following all details of the described testing, because I know those on the Project Team are already doing so. I was simply trying to offer another point of view on the situation. Sometimes that sparks ideas that lead to solutions. Rosetta Moderator: Mod.Sense |
Sky King Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
Quote: "Please keep in mind that I am an at-home volunteer, and not a Rosetta developer." I was thinking the same thing when I read the earlier post... Most community discussion groups are moderated by volunteers who have no official connection to the underlying developers. |
wbblakemore Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Quote: "It's distressing that you would make such a comment. When a user goes to significant effort to debug your software for you, you should at least do him the courtesy of paying attention to his results." I'm sorry if you found my comment to be overly harsh; that certainly wasn't my intention. Having said that, though, I will also note that I am not a Rosetta developer, either, but an at-home volunteer such as yourself. I don't think it's asking too much of either of us to be aware of what's been previously posted before adding additional posts to this thread. The alternative is merely adding more FUD to what is turning out to be a difficult problem to solve. |
In Memory of Kimsey M Fowler Sr Joined: 10 Mar 12 Posts: 26 Credit: 39,033,222 RAC: 0 |
Hi everyone... I don't have any great progress to report, but as we are all anxious for a way to resolve this, I want to give you an update. Rocco Moretti from Rosetta requested some sets of data from my machine collected both while running properly and while running with the client error condition. The data was sent near the end of last week, and this morning he reported that nothing has been found. Analysis is continuing. I'm trying really hard to come up with ideas for getting it running properly: testing theories, software/hardware configurations, etc. The problem boils down to whether the NVIDIA GPU driver or the Windows default driver is installed. Understanding the problem is complicated by the fact that I have two machines running the same versions of software on the same motherboards and processors, but one machine has never experienced the problem and the other has. Did this summary of the problem help you think of anything? It did for me, so now I have a few simple little experiments to try this evening. |
Sky King Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
Let me know how I can help... I currently have both the EVGA NVIDIA 560 and an ATI 4850 in my rig... I am not too jazzed about too much tampering, but I can run off the 4850 and submit some before/after work as well. |
wbblakemore Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Quote: "Hi everyone... I don't have any great progress to report, but as we are all anxious for a way to resolve this, I want to give you an update. Rocco Moretti from Rosetta requested some sets of data from my machine collected both while running properly and while running with the client error condition. The data was sent near the end of last week, and this morning he reported that nothing has been found. Analysis is continuing." Umm ... check the software versions for the Rosetta app, as well as the CUDA apps (and .dll's) for other projects? There has to be SOMETHING different :) |
Sky King Joined: 28 Feb 12 Posts: 11 Credit: 15,912 RAC: 0 |
Quote: "Rocco Moretti from Rosetta requested some sets of data from my machine collected both while running properly and while running with the client error condition." I am wondering if it is possible for Rosetta to "manually" issue us work packets to be completed... The "gold standard" for analysis purposes would be for us to run several work packets with the NVIDIA driver, then run the IDENTICAL work packets WITHOUT the NVIDIA driver, and then diff the resulting work product. |
wbblakemore Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Quote: "Rocco Moretti from Rosetta requested some sets of data from my machine collected both while running properly and while running with the client error condition." As a follow-up question to this, would there be any benefit for those of us having this problem to head over to RALPH, to serve as testers for any fixes in the works? |
Rocco Moretti Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
Quote: "As a follow-up question to this, would there be any benefit for those of us having this problem to head over to RALPH, to serve as testers for any fixes in the works?" Interesting point. The version of the application is currently the same on both Ralph and Rosetta@home, and since we just released 3.26, it'd likely be a while before we test a new version. That said, I just double-checked Ralph, and it doesn't look like anyone there is experiencing the same sort of symptoms. I can't say whether that's just because no one on Ralph has the type of system that's experiencing problems, or because some difference between Ralph and Rosetta@home causes the problem to disappear, though I doubt it's the latter. If people are willing, signing up to Ralph@home with computers which are experiencing the problem would probably be worth trying. Certainly if we ever do figure out what the problem is, we would want to test the fix on Ralph first, prior to releasing it to Rosetta@home in general. |
AlphaLaser Joined: 19 Aug 06 Posts: 52 Credit: 3,327,939 RAC: 0 |
Quote: "As a follow-up question to this, would there be any benefit for those of us having this problem to head over to RALPH, to serve as testers for any fixes in the works?" I've attached my host to Ralph; unfortunately, I haven't received any Ralph tasks yet, but I'm still trying. Host page is here. |
wbblakemore Joined: 18 Dec 07 Posts: 33 Credit: 4,181 RAC: 0 |
Quote: "As a follow-up question to this, would there be any benefit for those of us having this problem to head over to RALPH, to serve as testers for any fixes in the works?" OK, I created an account on Ralph as well ... I turned on NNT (No New Tasks) until the developers can figure out whether my presence (with a 100% error rate on production software) is a help or a hindrance. |
In Memory of Kimsey M Fowler Sr Joined: 10 Mar 12 Posts: 26 Credit: 39,033,222 RAC: 0 |
This is an update on my troubleshooting activities since my last post five days ago. I've been experimenting under the assumption that there are no problems with the Rosetta/BOINC software; my activities have assumed that a driver or software configuration problem is causing the "client errors". I've learned from various websites that there can be a complex relationship among the motherboard chipset drivers, the GPU drivers (a.k.a. the "display adapter drivers" in Windows 7), and the video monitor drivers. All drivers involved in putting something on the screen for your eyes to see must work together as a team. Another issue is that, from a driver perspective, VGA drivers are more forgiving, while DVI monitor drivers are more specific and less interchangeable.
Here are some of the things I've tested:
1) Plugged the monitor into each of four different DVI ports. This may seem silly, but I read reports that the DVI connector closest to the motherboard is preferred. My GPUs don't have VGA ports, so I am using a VGA cable with an adapter. (Are any of you using a real DVI cable???)
2) I installed the DVI driver from the CD included with the HP S2031 20" flat panel monitor I bought last month at Fry's for only $89. Was HP dumping them at a low price for a reason, I wondered?
3) Next, I got rid of the flat panel and replaced it with a 20-year-old Dell VGA monitor retrieved from the attic.
4) I reinstalled the chipset, GPU, and monitor drivers in different orders.
These tests were time-consuming because I had to wait for tasks to complete to determine the results. None of the above solved the problem.
The next idea was to determine whether the problem could be pinned on hardware or software. As mentioned above, I have two nearly identical systems with the same motherboards, CPUs, and OS (Win 7 64-bit). The main difference is that the good computer has two unbridged EVGA 560 Ti's, and the problem computer has two unbridged EVGA 580's.
I wondered what would happen if a mirror image of the hard drive from the good computer was installed in the problem computer. I tested this using the following procedure:
1) Used the Win 7 OS to make an emergency boot recovery CD for the good machine.
2) Used the Win 7 OS to write a disk image of the entire hard drive (SSD) from the good machine onto an external hard drive plugged in via USB.
3) Used that emergency boot disk to boot the problem machine.
4) Went to Windows Recovery in the Control Panel of the problem machine and restored the image from the external hard drive. This process completely wiped all existing files from the problem machine's boot drive.
5) Disconnected the network cable from the wall so the problem computer could not communicate with BOINC, Rosetta, Folding@Home, or the Microsoft pirate hunter.
6) Using built-in Win 7 OS capabilities, I changed the computer name, the user name, and the Win 7 product key to those of the problem computer's prior installation. BOINC, Rosetta, and Folding@Home were uninstalled, and their data directories deleted.
7) The network cable was plugged back in, the machine rebooted, and Win 7 re-validated with Microsoft.
8) BOINC and Rosetta were reinstalled from a fresh copy.
At this point my machine was configured for testing the old hardware with a known good software configuration. The driver for the EVGA GeForce 560's and 580's is the same. I looked in Device Manager to check the display adapter driver and found that only the one EVGA 580 that the monitor was attached to was present, but it was using the correct NVIDIA driver. Without fiddling with anything else, I immediately launched Rosetta. The WUs returned from this configuration were reported correctly; sixteen WUs ran successfully before anything was changed. This might seem like a conclusive test, but to be fair, I must point out that all of the new WUs used Rosetta 3.26, whereas both the good and problem machines had been running 3.24.
This introduced an unwanted variable into the test.
The next step was to get the second EVGA 580 recognized by the computer. I found many, many reports on the web of a second video card not being recognized, and many of those reports mentioned an ASUS motherboard, the same as in my installation. Next, the NVIDIA drivers were reinstalled using a freshly downloaded file from EVGA. Upon rebooting the machine, Device Manager reported that both GPUs were now recognized and installed. Subsequent WUs completed by Rosetta reported client errors.... ouch! Device Manager was then used to uninstall one GPU, and again WUs completed successfully.
Some Notes of Interest:
a) I found that reinstalling the NVIDIA driver can knock out the HP S2031 flat panel monitor driver.
b) If a WU begins while the machine is in a good state but finishes while it is in a bad state, the WU fails with client errors. If a WU's progress reaches 100% while the machine is in a bad state, the WU fails with client errors. If a WU has finished in a bad state and is "Ready to Report", and the machine is changed into a good state before the report occurs, the WU is still reported with client errors. Therefore, I would conclude that a WU fails with client errors (the type addressed by this thread, anyway) only if the platform it is running on is in a bad state at the moment its progress reaches 100%, and that its success is independent of the good/bad state at any time other than the 100% mark.
I have more tests in progress now, and will try to make another report tonight. In the meantime, can you guys with the problem please respond to these questions:
A) Anyone using the HP S1931, 2031, 2231, 2331 monitor series?
B) Anyone have their monitor plugged in with a true DVI cable without a DVI-to-VGA adapter?
C) Is everyone running with two GPUs installed?
D) Has anyone else discovered that if the second GPU is "uninstalled" via Device Manager, the problem goes away???
E) Anyone using an SLI bridge between multiple GPUs?
Keep looking for the golden Easter egg! |
A.M. Joined: 13 Jun 06 Posts: 12 Credit: 954,586 RAC: 0 |
Quote: "If people are willing, signing up to Ralph@home with computers which are experiencing the problem would probably be worth trying." Can do. |
A.M. Joined: 13 Jun 06 Posts: 12 Credit: 954,586 RAC: 0 |
Quote: "I have more tests in progress now, and will try to make another report tonight. In the meantime, can you guys with the problem please respond to these questions:"
Negative. I'm using an ASUS monitor, which Windows reports as a 'Generic PnP Monitor'.
Quote: "B) Anyone have their monitor plugged in with a true DVI cable without a DVI-to-VGA adapter?"
Yes, I do.
Quote: "C) Is everyone running with two GPUs installed?"
I am not.
Quote: "D) Has anyone else discovered that if the second GPU is 'uninstalled' via Device Manager, the problem goes away???" |
©2024 University of Washington
https://www.bakerlab.org