Stuck on Uploading

Message boards : Number crunching : Stuck on Uploading

Bill Kozorra
Joined: 25 Jan 11
Posts: 5
Credit: 86,550,972
RAC: 0
Message 80230 - Posted: 23 Jun 2016, 9:55:45 UTC

All my computers, both Mac and Windows, have work units stuck on "Uploading". Is anyone else having this issue?
Luigi R.
Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80232 - Posted: 23 Jun 2016, 10:02:19 UTC

XAVER
Joined: 11 Jun 16
Posts: 1
Credit: 685,308
RAC: 163
Message 80233 - Posted: 23 Jun 2016, 12:32:32 UTC

Yes, same here since yesterday evening (CEST). Downloads too (they only succeed after several tries).
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 80235 - Posted: 23 Jun 2016, 13:33:09 UTC - in response to Message 80233.  

Same thing here. I was starting to get nervous that it was only me: I started clearing DNS caches, even reset my modem and router, etc. One thing I'm noticing now is that the Server Status page still isn't loading, so I suspect it's something on the server side, but we can't even check from our end. Maybe the servers are finally buckling under the load.


**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research
hjdghjdghjghjjggh
Joined: 10 May 16
Posts: 7
Credit: 9,749
RAC: 0
Message 80236 - Posted: 23 Jun 2016, 13:41:27 UTC

I can confirm as well.

hjdghjdghjghjjggh
Joined: 10 May 16
Posts: 7
Credit: 9,749
RAC: 0
Message 80247 - Posted: 23 Jun 2016, 19:20:37 UTC
Last modified: 23 Jun 2016, 19:23:18 UTC

Well, they fixed the status page, and it seems like someone broke something...

shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80250 - Posted: 23 Jun 2016, 23:00:23 UTC

I'll add a Linux box to the list of failing OSes, though I'm not absolutely certain yet... I can see a Mac and a Windows 10 box doing it right now, but the closest Linux box doesn't feel like showing the BOINC Manager at the moment...

Good to see they noticed the server status needed fixing. Small step, but something.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
hjdghjdghjghjjggh
Joined: 10 May 16
Posts: 7
Credit: 9,749
RAC: 0
Message 80251 - Posted: 23 Jun 2016, 23:42:54 UTC

I probably should have said this earlier, but that machine is a Linux box that I control remotely from a Mac on the local network. It can download and run tasks just fine, but every upload fails.
Murodoch
Joined: 10 Apr 07
Posts: 6
Credit: 21,941,417
RAC: 1,816
Message 80253 - Posted: 24 Jun 2016, 0:59:36 UTC

I've had the same problem since yesterday afternoon (UTC+8). Downloading works fine.
sinspin
Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 80255 - Posted: 24 Jun 2016, 7:28:30 UTC

Some of the WUs I finished yesterday are stuck in the upload queue. If I start the upload manually, I always get the same message:
24.06.2016 10:45:49 | rosetta@home | Started upload of rb_06_22_66335_110608_ab_stage0_h002___robetta_IGNORE_THE_REST_12_19_383620_11_0_0
24.06.2016 10:45:52 | | Project communication failed: attempting access to reference site
24.06.2016 10:45:52 | rosetta@home | Temporarily failed upload of rb_06_22_66335_110608_ab_stage0_h002___robetta_IGNORE_THE_REST_12_19_383620_11_0_0: connect() failed
24.06.2016 10:45:52 | rosetta@home | Backing off 04:23:37 on upload of rb_06_22_66335_110608_ab_stage0_h002___robetta_IGNORE_THE_REST_12_19_383620_11_0_0
24.06.2016 10:45:54 | | Internet access OK - project servers may be temporarily down.

However, all recently finished WUs are uploaded on the spot.
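The event-log lines above follow a regular pattern, so stuck uploads and their back-off delays can be pulled out without scrolling through the log by hand. Below is a minimal Python sketch, assuming the message wording shown in the quoted log (other client versions or locales may phrase it differently); `parse_stuck_uploads` and `StuckUpload` are hypothetical names, not part of BOINC.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Message patterns copied from the event-log lines quoted in this thread
# (assumed wording; other client versions or locales may differ).
FAILED_RE = re.compile(r"Temporarily failed upload of (?P<name>\S+): (?P<reason>.+?)\s*$")
BACKOFF_RE = re.compile(
    r"Backing off (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) on upload of (?P<name>\S+)\s*$"
)


@dataclass
class StuckUpload:
    name: str
    reason: str = "unknown"
    backoff_seconds: Optional[int] = None


def parse_stuck_uploads(log_text: str) -> Dict[str, StuckUpload]:
    """Map each failed upload's file name to its failure reason and back-off delay."""
    stuck: Dict[str, StuckUpload] = {}
    for line in log_text.splitlines():
        failed = FAILED_RE.search(line)
        if failed:
            entry = stuck.setdefault(failed["name"], StuckUpload(failed["name"]))
            entry.reason = failed["reason"]
            continue
        backoff = BACKOFF_RE.search(line)
        if backoff:
            # Convert the "HH:MM:SS" back-off into seconds.
            seconds = int(backoff["h"]) * 3600 + int(backoff["m"]) * 60 + int(backoff["s"])
            entry = stuck.setdefault(backoff["name"], StuckUpload(backoff["name"]))
            entry.backoff_seconds = seconds
    return stuck
```

Fed the five log lines quoted above, this should return a single entry for the task, with reason `connect() failed` and a back-off of 04:23:37, i.e. 15,817 seconds. Lines that match neither pattern (such as the "Internet access OK" message) are simply skipped.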

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 80259 - Posted: 24 Jun 2016, 14:10:34 UTC - in response to Message 80255.  

I enabled some of BOINC Manager's advanced logging options, and it looks like the issue is with srv1.bakerlab.org. My local DNS cache currently maps this server to 128.95.160.142, and BOINC does indeed try to connect to that IP:

6/24/2016 9:48:23 AM | rosetta@home | [http] [ID#9] Info: Trying 128.95.160.142...
6/24/2016 9:48:25 AM | rosetta@home | [http] [ID#9] Info: connect to 128.95.160.142 port 80 failed: Connection refused
6/24/2016 9:48:25 AM | rosetta@home | [http] [ID#9] Info: Failed to connect to srv1.bakerlab.org port 80: Connection refused
6/24/2016 9:48:25 AM | rosetta@home | [http] [ID#9] Info: Closing connection 15
6/24/2016 9:48:25 AM | rosetta@home | [http] HTTP error: Couldn't connect to server
6/24/2016 9:48:25 AM | rosetta@home | [file_xfer] http op done; retval -107 (connect() failed)
6/24/2016 9:48:25 AM | rosetta@home | [file_xfer] file transfer status -107 (connect() failed)
6/24/2016 9:48:25 AM | rosetta@home | Temporarily failed upload of rb_06_23_66630_110639__t000__ab_robetta_IGNORE_THE_REST_396472_2506_0_0: connect() failed


I can also confirm that other machines on the internet resolve srv1.bakerlab.org to 128.95.160.142. Maybe this DNS entry is invalid on some major DNS servers... or, more simply, that particular server is busy. I'll enable http_debug logging on my other hosts to see which server(s) are accepting work. Maybe there is more than one server, and it depends on which one the task is trying to get sent back to. All speculation at this point.
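The `[http]` and `[file_xfer]` lines in the log above come from BOINC's debug log flags. Here is a small Python sketch that builds a minimal `cc_config.xml` switching them on, assuming the flag names `http_debug` and `file_xfer_debug` (check the BOINC client-configuration documentation for your client version); `build_cc_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Flag names assumed from the [http] and [file_xfer] tags in the log above.
DEBUG_FLAGS = ("http_debug", "file_xfer_debug")


def build_cc_config(flags=DEBUG_FLAGS) -> str:
    """Return a minimal cc_config.xml body with the given log flags set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    return ET.tostring(root, encoding="unicode")
```

If a `cc_config.xml` already exists in the BOINC data directory, it is safer to merge these flags into its existing `<log_flags>` section than to overwrite the file. The client only picks up the change after it re-reads its config files or is restarted.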
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 80261 - Posted: 24 Jun 2016, 14:22:33 UTC
Last modified: 24 Jun 2016, 14:23:49 UTC

My suspicions are confirmed:

  • srv1.bakerlab.org at 128.95.160.142 is not working / not accepting uploads
  • srv4.bakerlab.org at 128.95.160.145 is accepting uploads



Every upload directed at the first server fails, while any task sent to srv4.bakerlab.org uploads successfully.
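Timo's check (where each name resolves, and whether port 80 there accepts a connection) can be reproduced outside BOINC. Below is a minimal Python sketch using only the standard library. The host list and port are taken from this thread, and `probe` is a hypothetical helper. It tests only the TCP connect step that failed in the logs above, not an actual upload.

```python
import socket

# Hosts and port taken from this thread.
UPLOAD_HOSTS = ("srv1.bakerlab.org", "srv4.bakerlab.org")
PORT = 80


def probe(host: str, port: int = PORT, timeout: float = 5.0) -> str:
    """Resolve host, then attempt a plain TCP connect and describe the outcome."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return f"{host}: DNS lookup failed ({exc})"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return f"{host} ({addr}) port {port}: accepting connections"
    except ConnectionRefusedError:
        # Same failure as "connect to ... port 80 failed: Connection refused".
        return f"{host} ({addr}) port {port}: connection refused"
    except OSError as exc:  # timeouts, unreachable networks, etc.
        return f"{host} ({addr}) port {port}: {exc}"
```

Running `for h in UPLOAD_HOSTS: print(probe(h))` on an affected machine should show srv1 as `connection refused` (matching the log above) and srv4 as accepting connections.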

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80263 - Posted: 24 Jun 2016, 14:45:47 UTC

Thanks Timo! I just tried it and it works for uploads.

Here is the line to add to the Windows hosts file to work around the server that is not responding. If you don't know what to do with this information, it would be best to just wait for the issue to be resolved on the server side.

128.95.160.145 srv1.bakerlab.org
Rosetta Moderator: Mod.Sense
Luigi R.
Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80265 - Posted: 24 Jun 2016, 14:48:51 UTC - in response to Message 80261.  
Last modified: 24 Jun 2016, 14:51:34 UTC

What about manually editing client_state.xml? I'm wondering whether something like this would be safe, or would have any effect...

***PLEASE DON'T DO THIS***
Replace
<upload_url>http://srv1.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>

with
<upload_url>http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
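Following the warning above, a read-only alternative is to count which upload host each finished task is waiting on. This also addresses Timo's question about whether it depends on the server a task is sent back to. The Python sketch below assumes that output-file entries in client_state.xml carry `<upload_url>` children like the one quoted above, whatever element wraps them. `upload_hosts` is a hypothetical name. It never writes to the file, but running it on a copy is still the cautious choice.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit


def upload_hosts(client_state_xml: str) -> Counter:
    """Count file entries per upload host in client_state.xml text (read-only)."""
    root = ET.fromstring(client_state_xml)
    counts: Counter = Counter()
    for elem in root.iter():
        # Only entries with a direct <upload_url> child are output files.
        urls = [u.text.strip() for u in elem.findall("upload_url") if u.text]
        if urls:
            # Count each file once, under the host of its first upload URL.
            counts[urlsplit(urls[0]).hostname] += 1
    return counts
```

A result such as `Counter({'srv1.bakerlab.org': 2, 'srv4.bakerlab.org': 1})` would show how many finished files are waiting on each server.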
hjdghjdghjghjjggh
Joined: 10 May 16
Posts: 7
Credit: 9,749
RAC: 0
Message 80266 - Posted: 24 Jun 2016, 15:06:37 UTC

Well, I see the status page says all the servers are online, but I still have the same tasks failing to upload. And this appears to be why:

Luigi R.
Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80267 - Posted: 24 Jun 2016, 15:07:30 UTC - in response to Message 80263.  

Thanks, it works on Linux (*buntu) too.

Put that line in /etc/hosts.
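The same hosts-file workaround applies on Windows and Linux. The minimal Python sketch below appends the override only if the file does not already map srv1.bakerlab.org. The path constants are the usual default locations, and `add_hosts_override` is a hypothetical helper. Writing to the real file needs administrator (Windows) or root (Linux) rights.

```python
from pathlib import Path

# Usual default hosts-file locations (editing them needs admin/root rights).
WINDOWS_HOSTS = Path(r"C:\Windows\System32\drivers\etc\hosts")
UNIX_HOSTS = Path("/etc/hosts")


def add_hosts_override(path: Path, ip: str, hostname: str) -> bool:
    """Append "ip hostname" unless an active entry for hostname exists.

    Returns True if a line was added. The override stays in effect
    until the line is removed again.
    """
    text = path.read_text() if path.exists() else ""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments, split on whitespace
        if len(fields) >= 2 and hostname.lower() in (f.lower() for f in fields[1:]):
            return False
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + f"{ip} {hostname}\n")
    return True
```

For example, `add_hosts_override(UNIX_HOSTS, "128.95.160.145", "srv1.bakerlab.org")` adds the line from Mod.Sense's post once. A second call finds the existing entry and returns `False`.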
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 80268 - Posted: 24 Jun 2016, 17:29:20 UTC - in response to Message 80263.  

Yep, this is a solution. I just hope it doesn't mess anything up on the server end if a result lands on a different server than the one the cluster was expecting to receive it on.


sinspin
Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 80269 - Posted: 24 Jun 2016, 17:35:19 UTC

Changing the hosts file is a stupid idea! There should be another way.
Replacing "srv1.bakerlab.org" in client_state.xml is useless. Whenever I restart the BOINC Manager, it's back again, and I can't find where BOINC restores it from.
I also replaced it in client_state_prev.xml.

Thanks to you lazy guys, I've now lost two WUs! 24 hours of work for nothing.
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 80270 - Posted: 24 Jun 2016, 17:40:09 UTC - in response to Message 80269.  

On the contrary, I think both ideas are resourceful and creative, and a testament to the better elements of our community working together. I also highly doubt anyone is being 'lazy'; most of the Baker Lab researchers and systems people are incredibly dedicated. Your frustration is understandable but misdirected; we're all on the same team here, man :)
sinspin
Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 80271 - Posted: 24 Jun 2016, 18:07:48 UTC - in response to Message 80270.  

Quote: "..., we're all on the same team here man :)"

No! That is wrong. They need us; we don't need them.
They get paid for the job. I spend money on hardware and power, and I spend my spare time maintaining my systems to keep them crunching all the time.
Later, in a hopefully not-too-distant future, if the first results reach the market, I'll pay again to get the medicine or whatever.

All I get is useless credits, and the hope that my money and time aren't going into weapons.

And all they do in this case is tell us "you should change your hosts file"!!
There is no notice on the website to let everyone know there is a problem. And there is no useful workaround that works for everyone.

And again: replacing srv1 in client_state.xml and client_state_prev.xml is no solution. Restart BOINC and srv1 is back.



©2024 University of Washington
https://www.bakerlab.org