Message boards : Number crunching : validate errors
Author | Message |
---|---|
Robby1959 Send message Joined: 10 May 07 Posts: 38 Credit: 9,298,741 RAC: 0 |
one of my machines produces validate errors - what could be the problem ? the only other project is GPUgrid running |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
If you get any type of error on your task: Step 1: Add a quick note on the type of error and the task id number in the Minirosetta 3.62 thread. That will help the scientists to investigate the problem and hopefully speed up finding a solution. Step 2: Check the work unit details for the invalid task. If one user gets an error the task is sent out to another user to try again. If both users had an error then it is safe to say there is a problem with the work unit. If the other user was successful when you failed then that could suggest a problem with your machine (though I would only make that assumption after several errors on tasks that completed successfully on other machines - one error may just be an isolated glitch). One of your recent errors was on task 756849504 and checking the associated work unit, 685776063, shows that the second user also had an error. In this case it is safe to assume the problem is with the task and not your computer. One important point to remember with validate errors is that it is possible for 99% of your data to be usable by the scientists. The system will report a validate error if the final decoy information doesn't come back correctly. In this task you submitted 1,261 decoys so it is possible that 1,260 of them came out fine (unfortunately there is no way for us to tell on the user side whether the data we submit is usable based solely on the error message). |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
One of your recent errors was on task 756849504 and checking the associated work unit, 685776063, shows that the second user also had an error. In this case it is safe to assume the problem is with the task and not your computer. This is an "FFD*" task; they all end with validate errors. All his validate errors are from those work units, so no problem on his end. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
One of your recent errors was on task 756849504 and checking the associated work unit, 685776063, shows that the second user also had an error. In this case it is safe to assume the problem is with the task and not your computer. My advice explains what to do when validate errors occur in the future and I also pointed out the current issue is not with their machine. I don't see what was wrong with my reply... |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Nothing wrong with your reply, I just added a link to the thread about those tasks, mainly as a hint for the OP to check at least a few of the most recent threads for known problems; that way it's often possible to get an answer faster than by starting a new thread. |
Betting Slip Send message Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Not even got this small problem sorted yet? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692377625 |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
Shooooot! Thanks for the report, we're working on canceling the jobs and fixing the bug. Not even got this small problem sorted yet? |
Rymorea Send message Joined: 28 Sep 15 Posts: 3 Credit: 53,241 RAC: 0 |
Last night I got 3 validate errors https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692514402 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692515434 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692402873 I hope it helps to find a solution. Seti@home Classic account User ID 955 member since 8 Sep 1999 classic CPU time 539,770 hours |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Task 768264227 gave Validate Error. 6H_TUBE8_AB_6H_TUBE_07.pdb_15_10_12_32_13_globalDocking_1_SAVE_ALL_OUT_309429_46_0 |
Ananas Send message Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
rb_12_07_59935_105598_ab_stage0_t000___robetta_IGNORE_THE_REST_12_12_313657_76_0 99 decoys generated (I think that's where all WUs stop), first delivery, no restarts, outcome "invalid" :-( |
Timo Send message Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
rb_12_07_59935_105598_ab_stage0_t000___robetta_IGNORE_THE_REST_12_12_313657_76_0 Was still counted towards science and towards your points total, if you scroll to the bottom of the page you linked to you'll see this:
|
Ananas Send message Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
It has the WU state "cancelled" now, one more from the same series has the same outcome and it is cancelled too. The credits have been granted later on that first one, I guess someone checked the problem and granted manually (thanks for that!), as the new one still shows 0 granted. |
Luigi R. Send message Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
App version: Rosetta Mini 3.67 I'm still getting some validate errors from two different i7s. These 24 hours of computing have 0 granted credit... why??? https://boinc.bakerlab.org/rosetta/result.php?resultid=777361677 https://boinc.bakerlab.org/rosetta/result.php?resultid=777391503 |
Timo Send message Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
App version: Rosetta Mini 3.67 Both of those WUs came from jobs that got cancelled by the researcher (see here and here - note the 'errors' field shows the job was cancelled) after you received the WU in your queue. Generally if you wait a couple of days the claimed credit will be granted on a future sweep of the validator program. Situations like this are why David has been looking at tightening up the required 'average TAT' (turnaround time) for some jobs. It doesn't appear to me that the BOINC client/server supports any kind of 'remote kill switch' for a given WU once it's been downloaded, so there's no way to prevent said WUs from needlessly running once queued if the job they belong to gets cancelled. Personally, to improve the average turnaround time for my boxes and thus increase the chance that the crunching I'm doing is actually going to contribute to someone's query results, I've tightened my queue settings (I basically don't queue any tasks) and my target runtime is a more modest 8 hours. |
Luigi R. Send message Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
Both of those WUs came from jobs that got cancelled by the researcher (see here and here - note the 'errors' field shows the job was cancelled) after you received the WU in your queue. Generally if you wait a couple of days the claimed credit will be granted on a future sweep of the validator program. Ok! I thought granted credit was immediately equal to claimed credit when a WU failed and only the credit counter was updated after a couple of days. Glad to know I was wrong. Situations like this are why David has been looking at tightening up the required 'average TAT' for some jobs. It doesn't appear to me that the BOINC client/server supports any kind of 'remote kill switch' for a given WU once it's been downloaded, so there's no way to prevent said WUs from needlessly running once queued if the job they belong to gets cancelled. I've often seen workunits get "cancelled by server" when they are no longer needed or for other reasons. Personally, to improve the average turnaround time for my boxes and thus increase the chance that the crunching I'm doing is actually going to contribute to someone's query results, I've tightened my queue settings (I basically don't queue any tasks) and my target runtime is a more modest 8 hours. I usually run default WUs. I was doing some tests and my queue's settings were "min/max reserve of work" = "0.5/0.6 days". |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 10,982 |
Both of those WUs came from jobs that got cancelled by the researcher (see here and here - note the 'errors' field shows the job was cancelled) after you received the WU in your queue. Generally if you wait a couple of days the claimed credit will be granted on a future sweep of the validator program. And credited already. It seems to be an overnight job that mops up validate errors, so usually within 24 hrs. It is disconcerting, I agree. |
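
Murasaki's "Step 2" rule of thumb (compare your result with the other hosts that ran the same work unit) can be written down as a small decision function. This is only a sketch of the reasoning in the thread, not anything BOINC or Rosetta@home provides: the `Result` class, the `triage` function, and the outcome strings are hypothetical names made up for illustration, and you would fill in the data by hand from the work unit page.

```python
from dataclasses import dataclass

# Hypothetical verdict labels, chosen for this example only.
NO_ERROR = "no error on this host"
INCONCLUSIVE = "inconclusive: no other host has returned this work unit"
WORK_UNIT = "likely a work unit problem"
MAYBE_HOST = "possibly this host (only conclude this after several such cases)"


@dataclass
class Result:
    host: str     # label for the computer that returned the task
    outcome: str  # "success" or any other string for an error/invalid result


def triage(results, my_host):
    """Apply the thread's rule of thumb to one work unit's results."""
    mine = [r for r in results if r.host == my_host]
    others = [r for r in results if r.host != my_host]

    # Nothing to investigate if this host did not fail on the work unit.
    if not mine or all(r.outcome == "success" for r in mine):
        return NO_ERROR
    # Without a second result there is nothing to compare against.
    if not others:
        return INCONCLUSIVE
    # Every other host failed too: the work unit itself is the suspect.
    if all(r.outcome != "success" for r in others):
        return WORK_UNIT
    # Someone else succeeded where this host failed.
    return MAYBE_HOST


# The case from the thread: both users had an error on the same work unit.
print(triage([Result("mine", "validate_error"),
              Result("other", "validate_error")], "mine"))
```

A single "possibly this host" verdict is weak evidence on its own; as noted above, one error may just be an isolated glitch, so it only means something if it keeps coming up across several work units.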
©2024 University of Washington
https://www.bakerlab.org