Computation Error

Message boards : Number crunching : Computation Error

wk4536

Send message
Joined: 18 Mar 14
Posts: 5
Credit: 14,883
RAC: 0
Message 80222 - Posted: 22 Jun 2016, 20:32:31 UTC

I recently checked on the work units (WUs) I have completed, and a large share of them are never validated because of compute errors; my success rate averages about 10%. I'm not sure why this is the case. Could anyone give me some feedback? Otherwise I will probably just move my support to other projects like World Community Grid. I don't have the most powerful machines, but it's frustrating to see that most of my effort isn't actually resulting in progress for the project.
ID: 80222 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Send message
Joined: 11 Feb 08
Posts: 2122
Credit: 41,184,189
RAC: 10,001
Message 80223 - Posted: 23 Jun 2016, 3:21:51 UTC - in response to Message 80222.  
Last modified: 23 Jun 2016, 3:25:41 UTC

All your tasks seem to run OK, but those on your two Android devices seem to come up with file transfer (xfer) errors. Do you see errors reported at your end? Is there an antivirus, a firewall, or some restriction on mobile connectivity involved somewhere?

I mention this because I'm having file upload problems from my laptop at the moment, even though all my other devices are working fine (and downloads are OK too).
ID: 80223 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
wk4536

Send message
Joined: 18 Mar 14
Posts: 5
Credit: 14,883
RAC: 0
Message 80244 - Posted: 23 Jun 2016, 18:41:32 UTC

I don't see any error messages when I check my Android devices. One of them is an old Galaxy S3 that I leave plugged in and check every couple of days just to see if it's still crunching, and the other is my S6, which crunches when I charge it. I honestly did not think there were any issues until I thought to check my task list here on this site and saw that most of the tasks never actually earned any credit. I never installed any firewalls or anything on my phones like I do on my computer, and I don't think I know how to restrict the mobile connection. I live downtown in a big city on the East Coast, and one of the devices is completely stationary (it doesn't leave the room) in a building with Wi-Fi. It is possible that the Wi-Fi had occasional flickers, but I don't think that would explain the large number of errors seen.


ID: 80244 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
caffeineyellow5

Send message
Joined: 12 Jun 16
Posts: 3
Credit: 4,876
RAC: 0
Message 80288 - Posted: 25 Jun 2016, 6:31:23 UTC

I have 26 successful results out of 111 on my ZTE Maven. It runs continuously, plugged in and on Wi-Fi, and it gets continual upload errors after tasks that seem to finish correctly. It is a "naked" phone, since the only app I have installed after buying it is BOINC.

This is the phone:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=2737228

This is the task list:
https://boinc.bakerlab.org/rosetta/results.php?hostid=2737228

After getting a:
======================================================
DONE :: 2 starting structures 2279.05 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
BOINC :: WS_max 0
BOINC :: BOINC support services shutting down cleanly ...
01:54:19 (15023): called boinc_finish(0)

</stderr_txt>

I get a message:
<message>
upload failure: <file_xfer_error>
<file_name>db_set3_7res_android_d_c.20.10_0001_SAVE_ALL_OUT_344080_2094_1_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
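
For anyone comparing many failed tasks, the file name and error code can be pulled out of a message like the one above with a short script. This is only a sketch: the helper name `parse_xfer_errors` is made up for illustration, and it assumes the message always has the same shape as the text quoted in this post.

```python
import re

# Text copied from the task message quoted in this post.
MESSAGE = """<message>
upload failure: <file_xfer_error>
<file_name>db_set3_7res_android_d_c.20.10_0001_SAVE_ALL_OUT_344080_2094_1_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>"""

# Matches one <file_xfer_error> element and captures the file name,
# the numeric error code, and the short description in parentheses.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)


def parse_xfer_errors(text):
    """Hypothetical helper: return (file name, code, reason) tuples."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(text)
    ]


errors = parse_xfer_errors(MESSAGE)
for name, code, reason in errors:
    print(f"{name}: error {code} ({reason})")
```

Collecting these tuples over a whole task list would show whether every failure is the same "not found" upload error or a mix of different problems.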

Please help me fix this so that I can get credit for every successful task and further the work of Rosetta@Home.
Thank you - Mike
ID: 80288 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Luigi R.

Send message
Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80289 - Posted: 25 Jun 2016, 6:49:08 UTC

It looks like there is no db_set3_7res_android_d_c.20.10_0001_SAVE_ALL_OUT_344080_2094_1_0 file. Maybe no read permissions for some reason?
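
The two possibilities mentioned here (the file is missing, or it exists but cannot be read) can be told apart with a check like the following. It is a rough sketch, not a fix: `describe_output_file` is a hypothetical helper, the path passed to it is whatever location the output file is expected in, and on an Android phone the BOINC data directory may not be reachable without extra tools.

```python
import os


def describe_output_file(path):
    """Hypothetical helper: report 'missing', 'unreadable', or 'readable'."""
    if not os.path.exists(path):
        # Nothing exists at this path, which would fit a "not found" error.
        return "missing"
    if not os.access(path, os.R_OK):
        # The file exists, but the current user may not read it.
        return "unreadable"
    return "readable"


# The file name comes from the error message; its directory is an assumption.
print(describe_output_file(
    "db_set3_7res_android_d_c.20.10_0001_SAVE_ALL_OUT_344080_2094_1_0"
))
```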
ID: 80289 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
caffeineyellow5

Send message
Joined: 12 Jun 16
Posts: 3
Credit: 4,876
RAC: 0
Message 80294 - Posted: 25 Jun 2016, 22:54:54 UTC - in response to Message 80289.  

It looks like there is no db_set3_7res_android_d_c.20.10_0001_SAVE_ALL_OUT_344080_2094_1_0 file. Maybe no read permissions for some reason?

How can I fix this error then???
ID: 80294 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
wk4536

Send message
Joined: 18 Mar 14
Posts: 5
Credit: 14,883
RAC: 0
Message 80298 - Posted: 27 Jun 2016, 7:37:15 UTC - in response to Message 80294.  

It looks like there is no db_set3_7res_android_d_c.20.10_0001_SAVE_ALL_OUT_344080_2094_1_0 file. Maybe no read permissions for some reason?

How can I fix this error then???


I would also like to know how to fix this, as this is currently the project that gets me most excited about BOINC. I've suspended my work on it and shifted to other projects in the meantime.
ID: 80298 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Luigi R.

Send message
Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80299 - Posted: 27 Jun 2016, 7:42:23 UTC

Sorry, I don't know. It's weird some successes occurred too.
ID: 80299 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
wk4536

Send message
Joined: 18 Mar 14
Posts: 5
Credit: 14,883
RAC: 0
Message 80300 - Posted: 27 Jun 2016, 8:01:57 UTC - in response to Message 80299.  

Sorry, I don't know. It's weird some successes occurred too.



Hmm, maybe it would be better to shift support to other projects until it's sorted out. At least that way work units don't go to waste.
ID: 80300 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
caffeineyellow5

Send message
Joined: 12 Jun 16
Posts: 3
Credit: 4,876
RAC: 0
Message 80309 - Posted: 28 Jun 2016, 9:04:35 UTC - in response to Message 80300.  
Last modified: 28 Jun 2016, 9:06:19 UTC

Sorry, I don't know. It's weird some successes occurred too.



Hmm, maybe it would be better to shift support to other projects until it's sorted out. At least that way work units don't go to waste.

When I look at the work units and their history on other systems, I notice that my failures are preceded or followed by other people's failures on the same work units. Perhaps the work units that can run on ARM7 systems all have roughly a 25% success rate in general, and failure is part of the plan for the other 75%??? It seems that the ones that fail for me fail for many systems across different phones, OS versions, etc. Is it possible that Rosetta@Home has a built-in 75% failure rate for certain types of work units, and that the failures and the successes all contribute to the success of the project as a whole? For example, a failure might prove something about the work inside the work unit that leads to conclusions about the work and the direction of research. I'm just throwing it out there as an open question, since I don't know at all and the thought occurred to me. Because if that is true, then none of the work, failure or success, is wasted; it is just uncredited, and in the long term credit means nothing to the work being done, as long as it gets done.
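
The comparison described above (does a work unit fail only on one host, or on every host that tried it?) can be sketched in a few lines. The records below are invented purely for illustration; real data would have to be copied by hand from the work unit pages. For scale, the 26 successes out of 111 reported earlier in the thread work out to about 23%, close to the 25% guessed here.

```python
from collections import defaultdict

# Hypothetical records: (work unit, host, outcome). All values are made up.
RESULTS = [
    ("wu_a", "host_1", "Success"),
    ("wu_a", "host_2", "Success"),
    ("wu_b", "host_1", "Client error"),
    ("wu_b", "host_2", "Client error"),
    ("wu_b", "host_3", "Client error"),
    ("wu_c", "host_1", "Client error"),
    ("wu_c", "host_3", "Success"),
]


def failures_by_work_unit(results):
    """Map each work unit to (number of failed results, total results)."""
    counts = defaultdict(lambda: [0, 0])
    for work_unit, _host, outcome in results:
        counts[work_unit][1] += 1
        if outcome != "Success":
            counts[work_unit][0] += 1
    return {work_unit: tuple(pair) for work_unit, pair in counts.items()}


def classify(failed, total):
    """Label a work unit by how consistently it failed across hosts."""
    if failed == 0:
        return "ok everywhere"
    if failed == total:
        return "fails on every host"
    return "mixed"


summary = failures_by_work_unit(RESULTS)
for work_unit, (failed, total) in sorted(summary.items()):
    print(work_unit, f"{failed}/{total}", classify(failed, total))
```

Work units that fail on every host point at the task itself, while "mixed" ones point more toward a problem on particular devices or connections.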
ID: 80309 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
wk4536

Send message
Joined: 18 Mar 14
Posts: 5
Credit: 14,883
RAC: 0
Message 80316 - Posted: 29 Jun 2016, 20:12:33 UTC - in response to Message 80309.  


That would be true if that was indeed what was happening. I just think it actually means that the result never made it back. Based on the outcomes explanation screen:

Client error: The task was sent to a computer and an error occurred.
Success: A computer completed and reported the task successfully.

I also thought it was a mobile device issue, but the majority of the tasks sent to my Vaio also ended in a compute error. Oh well, it's not really like my individual contribution was going to make or break anything. If the work being done did not produce the results they want, I don't think it would come up as a compute error, since they would still have received data that they can use to modify their design or whatever. I don't necessarily care that my credit numbers aren't going up, since they don't mean anything outside of the app, but I do like seeing changes so that I know my computer and phone are making contributions, as small as they seem.


ID: 80316 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
catavalon21

Send message
Joined: 22 Oct 06
Posts: 1
Credit: 898,035
RAC: 0
Message 80328 - Posted: 4 Jul 2016, 13:26:09 UTC - in response to Message 80316.  





I don't think this is entirely the reason. I have a very large percentage of Android tasks ending in compute failure, and they can't have "not made it back", as the deadline hasn't occurred yet.

It's possible the results made it back with an error, but as another post said earlier, tasks that fail appear to fail repeatedly across several users consistently.

Hopefully it'll get sorted out, but I'll crunch something else in the interim. F@H has an Android client, though not on BOINC.

ID: 80328 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Dr. Merkwürdigliebe
Avatar

Send message
Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 80329 - Posted: 6 Jul 2016, 18:01:27 UTC
Last modified: 6 Jul 2016, 18:02:36 UTC

A whole bunch of compute errors on FFD_something jobs...

Link
ID: 80329 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80330 - Posted: 6 Jul 2016, 18:06:43 UTC - in response to Message 80329.  

A whole bunch of compute errors on FFD_something jobs...

Link


Same here. All of my FFD_ tasks failed rather quickly. I'll shoot an email to DK just in case they have not already noticed.
Rosetta Moderator: Mod.Sense
ID: 80330 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
krypton
Volunteer moderator
Project developer
Project scientist

Send message
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 80331 - Posted: 7 Jul 2016, 2:10:52 UTC

Thanks for the alert you guys! I've contacted the responsible scientist.

Hopefully we can avoid this in the future!
ID: 80331 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
jl91569

Send message
Joined: 5 Jul 16
Posts: 1
Credit: 9,049
RAC: 0
Message 80332 - Posted: 7 Jul 2016, 2:50:43 UTC

Hey there.

Just wanted to let you know that FFD tasks seem to be failing on my machine after a few seconds as well. Here's the error log:

ERROR: unrecognized residue SUG
ERROR:: Exit from: ..\..\..\src\core\io\pose_from_sfr\PoseFromSFRBuilder.cc line: 1030
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Thanks.
ID: 80332 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
rjs5

Send message
Joined: 22 Nov 10
Posts: 273
Credit: 23,013,736
RAC: 6,296
Message 80341 - Posted: 11 Jul 2016, 18:25:22 UTC - in response to Message 80331.  

Thanks for the alert you guys! I've contacted the responsible scientist.

Hopefully we can avoid this in the future!



Seems like someone could write a script that runs "hourly" to automatically detect errors and publish that to the list of scientists ... or is that what already happens? 8-)
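
A rough idea of what such a check could look like is sketched below. Everything in it is an assumption: the task names and outcomes are invented, the batch is guessed from the text before the first underscore (as in "FFD_"), and the threshold values are arbitrary. How the project actually stores or monitors results is not known here.

```python
from collections import Counter

# Hypothetical (task name, outcome) pairs; names and counts are invented.
TASKS = [
    ("FFD_example_001", "Client error"),
    ("FFD_example_002", "Client error"),
    ("FFD_example_003", "Client error"),
    ("FFD_example_004", "Client error"),
    ("other_example_001", "Success"),
    ("other_example_002", "Success"),
    ("other_example_003", "Client error"),
    ("other_example_004", "Success"),
]


def batch_of(task_name):
    """Assumption: the batch is the text before the first underscore."""
    return task_name.split("_", 1)[0]


def failing_batches(tasks, threshold=0.5, min_tasks=3):
    """Return (batch, failure rate) pairs at or above the threshold."""
    total = Counter()
    failed = Counter()
    for name, outcome in tasks:
        batch = batch_of(name)
        total[batch] += 1
        if outcome != "Success":
            failed[batch] += 1
    # Skip batches with too few results to say anything meaningful.
    rates = {b: failed[b] / total[b] for b in total if total[b] >= min_tasks}
    flagged = [(b, rate) for b, rate in rates.items() if rate >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


alerts = failing_batches(TASKS)
for batch, rate in alerts:
    print(f"batch {batch}: {rate:.0%} of tasks failed")
```

Run from a scheduler once an hour, the flagged list could be mailed to whoever submitted the batch; that delivery step is left out of the sketch.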
ID: 80341 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1994
Credit: 9,581,529
RAC: 7,889
Message 80342 - Posted: 11 Jul 2016, 19:58:48 UTC - in response to Message 80341.  

Seems like someone could write a script that runs "hourly" to automatically detect errors and publish that to the list of scientists ..


+1
Great idea!!
ID: 80342 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
sinspin

Send message
Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 80356 - Posted: 13 Jul 2016, 19:40:14 UTC - in response to Message 80341.  

Seems like someone could write a script that runs "hourly" to automatically detect errors and publish that to the list of scientists ... or is that what already happens? 8-)


+1
I hope it comes before hell freezes over.
ID: 80356 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
rjs5

Send message
Joined: 22 Nov 10
Posts: 273
Credit: 23,013,736
RAC: 6,296
Message 80369 - Posted: 15 Jul 2016, 0:32:00 UTC - in response to Message 80356.  
Last modified: 15 Jul 2016, 0:34:32 UTC

Seems like someone could write a script that runs "hourly" to automatically detect errors and publish that to the list of scientists ... or is that what already happens? 8-)


+1
I hope it comes before hell freezes over.



... and boboviz ...

The script is probably 5 lines long and the results could be sorted into a "scientist quality" shame ranking to improve the quality of submissions.

Scientist "rankings" could be a TASK "success percentage" or just ... my prank suggestion ...

1. Baker
2. Graduate
3. GED
4. WTF
ID: 80369 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org