Message boards : Number crunching : At a bit of a loss
Author | Message |
---|---|
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Yesterday I had a system start aborting all of its tasks. When new tasks were downloaded I was getting "file size or signature" errors on the minirosetta_2.17_x86_64-pc-linux-gnu file. However, I compared the size and checksum of this file to the same file on several "working" systems and everything matched. This system is normally rack mounted, running "headless" over a VNC connection, so the network is functioning OK. dmesg and the messages file show no errors reported at the system level. I deleted the minirosetta_2.17_x86_64-pc-linux-gnu file and reinstalled BOINC - when it was brought back up it tried to download the file and got the same error. I took the system out of the rack and put it on the bench with a real monitor and keyboard, ran fsck a number of times, and brought BOINC back up. Still getting the checksum / signature error. I once again cleaned up the project directory, deleting minirosetta*, and started over. I'm now getting the checksum / signature error on both the minirosetta_2.17_x86_64-pc-linux-gnu file and the minirosetta_database_rev39052.zip file. Other downloaded files, such as abrelax.default.v1 and 2RN2_pcs_cst_files.r2.pnoe.V1, download without this error. The error is the same with the image file verification feature turned on or turned off. The following types of messages are shown in the job output for all these jobs: <file_xfer_error> <file_name>minirosetta_graphics_1.92_x86_64-pc-linux-gnu</file_name> <error_code>-200</error_code> </file_xfer_error> <file_xfer_error> <file_name>minirosetta_2.17_x86_64-pc-linux-gnu</file_name> <error_code>-200</error_code> </file_xfer_error> Once again I stress, it appears that the files are making it into the ../projects/boinc.bakerlab.org_rosetta directory with the correct file size and checksum (as calculated using "sum"). Any ideas? |
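The size-and-checksum comparison described in this post can be sketched in Python as below. This is only an illustration: the function names `fingerprint` and `matches` are made up for the example, and MD5 is an assumed choice of digest (the post used the `sum` utility, which produces a much shorter checksum). The check covers file contents only; it says nothing about whether a signature over the file would validate.

```python
import hashlib


def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for the file at `path`.

    The file is read in chunks so large downloads need not fit in memory.
    """
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            size += len(chunk)
    return size, md5.hexdigest()


def matches(path, expected_size, expected_md5):
    """True if the file has the expected size and MD5 digest (hex, any case)."""
    size, digest = fingerprint(path)
    return size == expected_size and digest == expected_md5.lower()
```

A possible use is to run `fingerprint` on the same file on a known-good machine and on the failing one, then compare the two tuples.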
[AF>france>pas-de-calais]symaski62 Joined: 19 Sep 05 Posts: 47 Credit: 33,871 RAC: 0 |
http://boincfaq.mundayweb.com/index.php?view=80&sessionID=6beaee9a77e8cbf27114ae3a750d9877 ERR_WRONG_SIZE -200 Your files have the wrong file size. |
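For anyone sifting through many task reports, a small sketch like the following pulls the file names and error codes out of `<file_xfer_error>` fragments such as the ones quoted in the first post. The helper name `transfer_errors` and the `KNOWN_CODES` table are assumptions made for this example; the table holds only the one code named in this thread.

```python
import xml.etree.ElementTree as ET

# Only the code mentioned in this thread; not a complete list of BOINC errors.
KNOWN_CODES = {-200: "ERR_WRONG_SIZE"}


def transfer_errors(fragment):
    """Return a list of (file_name, error_code, label) tuples.

    `fragment` is a string holding one or more <file_xfer_error> elements.
    It is wrapped in a dummy root so the parser sees a single document.
    """
    root = ET.fromstring(f"<root>{fragment}</root>")
    results = []
    for err in root.iter("file_xfer_error"):
        name = err.findtext("file_name", default="").strip()
        code = int(err.findtext("error_code", default="0"))
        results.append((name, code, KNOWN_CODES.get(code, "unknown")))
    return results
```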
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
If the files are arriving uncorrupted, as you seem to have verified in as many ways as possible, and you are still getting the error, the only culprit left would seem to be BOINC. It would seem to be erroneously reporting the error. What BOINC version are you running on that machine? One other possibility would be a piece of the project configuration. Have you tried completely detaching and reattaching R@h? Rosetta Moderator: Mod.Sense |
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
"What BOINC version are you running on that machine?" It was running 6.10.56 at the time of the failure, but I upgraded to 6.10.58 as part of the diagnostic process. That did not help. I would snag a copy of the downloaded files which were marked as being in error before they were cleaned up and manually unzip them - assuming that I might see an indication of what was causing the length / signature errors. They unzipped clean. I brought up network tools and started to look to see if I had a ratty network connection - which I doubted, since after moving it to the bench this morning it was connected directly to the router with a new cable instead of being on the downstream switch. 0 Xmit errors, 0 Recv errors, 0 Collisions. Still getting length / signature errors. In desperation I tried your suggestion of completely disconnecting from the project and reconnecting - bang, everything started working. As Gomer Pyle would say - Shazam. Want to give me a clue as to what kind of magic is at work here? Could it have been a fat electron? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Am I correct to presume that R@h previously had been running fine on that machine? If so, then it almost sounds like there's a project key used to test the signature validity, and it somehow became corrupted on your hard drive. With a corrupted key you can't come up with a matching signature, hence they never match. Rosetta Moderator: Mod.Sense |
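The corrupted-key hypothesis can be illustrated with a toy textbook-RSA sketch. Everything here is an assumption made for the illustration: the tiny key built from two Mersenne primes, the use of SHA-256, and the names `sign` and `verify`. It is not BOINC's actual verification code and the key is far too small for real use. The point is only that a signature made with the real key stops validating when the stored copy of the public key has a single flipped bit, even though the file itself is byte-for-byte intact.

```python
import hashlib

# Toy key from two known Mersenne primes; for illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(data, modulus):
    """Hash `data` with SHA-256 and reduce it into the range of `modulus`."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def sign(data):
    """Project side: sign the digest with the private exponent."""
    return pow(digest_as_int(data, N), D, N)


def verify(data, signature, public_key):
    """Client side: check the signature against a (modulus, exponent) pair."""
    modulus, exponent = public_key
    return pow(signature, exponent, modulus) == digest_as_int(data, modulus)


payload = b"contents of a downloaded file"
signature = sign(payload)

good_key = (N, E)
bad_key = (N ^ (1 << 40), E)         # same key with one bit flipped on disk

print("intact key   :", verify(payload, signature, good_key))
print("corrupted key:", verify(payload, signature, bad_key))
```

Under this reading, detaching and reattaching would help because the client fetches a fresh copy of the project's key, although the thread does not confirm that this is what happened.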
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
"Am I correct to presume that R@h previously had been running fine on that machine?" Most certainly. This was one of my early machines and had been running fine until sometime yesterday while I was at work. It had garnered a cumulative credit of over 350K. Corruption of a file is always a possibility here - especially since when I got home from work yesterday it was clear that the local electric company had once again "showed its affection" for those of us living in the complex by "blipping" our electricity - the timer on the microwave always rats them out when it flashes "reset". Your idea of a corrupted key does make sense. |
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Chile Man - they say that everything is bigger up here in Texas - including the RAC. You know, I'm pretty good with Photoshop; do you want me to try fixing that Lone Star on your avatar? Something just seems slightly out of place. Have a great day, my friend. |
©2024 University of Washington
https://www.bakerlab.org