Upload issues "again"

Message boards : Number crunching : Upload issues "again"

Vmo9

Joined: 12 Oct 05
Posts: 1
Credit: 1,726,538
RAC: 0
Message 77199 - Posted: 1 Aug 2014, 10:58:29 UTC

It becomes very frustrating to see days go by while my results are still not uploading. This is the reason why I set Rosetta to "won't get new tasks". The application takes close to four gigabytes of disk space, which, if it worked correctly, I wouldn't mind. Does anyone know why this constantly happens?
[FI] OIKARINEN
Joined: 16 Nov 13
Posts: 6
Credit: 131,483
RAC: 0
Message 77245 - Posted: 3 Aug 2014, 9:33:01 UTC

After some days of server issues, the problem is solved, at least for me. I have been able to upload a couple of tasks at full speed.
Life is too short to live concerned about its mysteries.
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 11,586
Message 77247 - Posted: 3 Aug 2014, 10:49:32 UTC - in response to Message 77199.  
Last modified: 3 Aug 2014, 10:49:52 UTC

Vmo9 wrote:
It becomes very frustrating to see days go by while my results are still not uploading. This is the reason why I set Rosetta to "won't get new tasks". The application takes close to four gigabytes of disk space, which, if it worked correctly, I wouldn't mind. Does anyone know why this constantly happens?

It is frustrating, but it doesn't happen constantly; it happens a couple of times a year at most. It's one of the best-administered projects out there, and this was fixed on a Sunday, so someone has obviously been putting the hours in.
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 77251 - Posted: 3 Aug 2014, 12:48:31 UTC

Problem now diagnosed:

krypton wrote:
Some good news and bad news:

Good news: The servers and UW network are working normally. We got a
HUGE spike in new users/computers connected to the R@H project.

Bad news: We didn't get a heads-up and so were not prepared to
handle so much traffic at once. These computers are still in the
process of downloading Rosetta/Database. As Ananas suggested, it's an
issue with the number of allowed concurrent connections per server.

Good news: Once all these new computers get a copy of rosetta/database
(which is a large single download), everything will go back to normal.
We will be getting more servers, to prevent this from happening in the
future.

Once we know who these new users are, we'll post something on the front page.

Once again, thank you for all the feedback; it was very helpful in
debugging the issue.

-Krypton.
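
As a rough illustration of the diagnosis above, here is a toy model of a server with a fixed cap on concurrent connections. It is only a sketch: the function name `wait_times`, the slot count, and the durations are hypothetical and do not reflect the project's actual server configuration. It shows how a short upload can sit in a queue while long database downloads hold every allowed connection.

```python
import heapq

def wait_times(requests, max_connections):
    """Toy FIFO model of a connection-limited server (hypothetical).

    requests: list of (arrival_time, duration) tuples, sorted by arrival.
    Returns how long each request waits before it gets a connection slot.
    """
    busy_until = []  # min-heap of times at which occupied slots free up
    waits = []
    for arrival, duration in requests:
        # Release every slot that has finished by the time this request arrives.
        while busy_until and busy_until[0] <= arrival:
            heapq.heappop(busy_until)
        if len(busy_until) < max_connections:
            start = arrival  # a slot is free right away
        else:
            # All slots are taken: wait for the earliest one to free up.
            start = heapq.heappop(busy_until)
        heapq.heappush(busy_until, start + duration)
        waits.append(start - arrival)
    return waits

if __name__ == "__main__":
    # Two long downloads (600 s each) arrive first, then one short upload (2 s).
    traffic = [(0, 600), (0, 600), (1, 2)]
    print(wait_times(traffic, max_connections=2))
    print(wait_times(traffic, max_connections=3))
```

In this model, raising the cap by a single slot is enough to let the short upload through immediately, which is consistent with the idea that the bottleneck was the connection limit rather than the network itself.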


David E K wrote:
Yep, I'm currently optimizing the number of connections on all our servers. Looks like they can keep up without too much load/memory usage so far. These servers are pretty old, and hopefully we'll upgrade soon.


Polian wrote:
Looks like a DoS attack to me, to be honest:

I just picked a random new user ID, 680000, and went up by one from there.

Example new users:

https://boinc.bakerlab.org/rosetta/show_user.php?userid=680000
https://boinc.bakerlab.org/rosetta/show_user.php?userid=680001
https://boinc.bakerlab.org/rosetta/show_user.php?userid=680002
https://boinc.bakerlab.org/rosetta/show_user.php?userid=680003
https://boinc.bakerlab.org/rosetta/show_user.php?userid=680004
https://boinc.bakerlab.org/rosetta/show_user.php?userid=680005
https://boinc.bakerlab.org/rosetta/show_user.php?userid=680006
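
The list above can be reproduced with a few lines of Python. This is a minimal sketch: the base URL and the starting ID 680000 come from the post, while the helper name `profile_urls` is made up for illustration. It only builds the URL strings and does not fetch anything.

```python
# Base URL and starting ID are taken from the post above.
BASE_URL = "https://boinc.bakerlab.org/rosetta/show_user.php"

def profile_urls(start_id, count):
    """Return profile URLs for `count` consecutive user IDs from `start_id`."""
    return [
        f"{BASE_URL}?userid={user_id}"
        for user_id in range(start_id, start_id + count)
    ]

if __name__ == "__main__":
    # Seven consecutive IDs, matching the list in the post.
    for url in profile_urls(680000, 7):
        print(url)
```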


David E K wrote:
Hmm, I also suspected this, but the IPs in the logs were coming from various places. Maybe I'll have to disable new users for now until we figure things out.


David E K wrote:
I was told by Matthew Blumberg at Gridrepublic that the new users are real crunchers and that they "started a new marketing campaign via charityengine.com." So I re-enabled account creation for these users. Our servers may get sluggish again, but hopefully things will settle down as the rate of new users decreases. And hopefully optimizing the connections on our servers will help. In the future, we hope to get more servers.

This issue coincided with the annual RosettaCon meeting (so most of the Baker lab members were out of town), a final ramp-up in CASP targets, and me going on a family camping vacation where I had no phone reception. Normally, I would have been able to react faster to debug and help defuse the situation.

I am sorry for any inconvenience and the fact that it took a few days to finally make some progress.
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77252 - Posted: 3 Aug 2014, 13:31:14 UTC

It would also have been resolved earlier, but Sergey, a graduate student in the lab who is very busy with CASP and therefore depends heavily on R@H, noticed the issue right away and tried to diagnose the problem, BUT he did not have access to our servers. If he had been given access, which I'm guessing our sys admins were reluctant to do for security reasons, I have absolutely no doubt that he would have fixed the issue even while pulling all-nighters working on CASP targets. The up-time for R@H had actually been really good before this incident. Thanks, everyone, for helping us diagnose the issue!



