Message boards : Number crunching : Credit granted pending stack
Author | Message |
---|---|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
DK, from what people have reported over time, it sounds like the server side must be hitting a section of special-case code for when the project is "out of work" (i.e. frantically creating work, but still unable to keep up with immediate demand). In that case, it seems the server sends the client a 24hr backoff. If you search your server source code for "86400" (the number of seconds in a day), you may find it. I would think this was server self-preservation early on in the R@h project. But now that new work is available seconds or minutes later, it has outlasted its initial purpose. Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
*DK, from what people have reported over time, it sounds like the server side must be hitting a section of special code for the case when the project is "out of work" (i.e. frantically creating work, but still unable to keep up with immediate demand). And in that case, it seems the server sends back to client a 24hr backoff. I'm thinking if you can search your server source code for "86400" (seconds in a day), you may find it.* The 24 hour backoff seems fine to me as a way to reduce the load on our servers in those special situations, which hopefully should not happen that often. I can look into the scheduler code though. I think we are low on jobs at the moment, but the demand fluctuates depending on what is necessary for the specific research projects. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Right, it would be OK if all that it affects is getting new work to the host, but it also affects reporting completed work, and I think it may even halt uploads once it hits that threshold. It only seems to take 2 or 3 "no work" indications in 10 minutes to receive the 24hr backoff. So it has actually gotten in the way of completing 48hr deadline work for CASP. Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
*Right, it would be OK if all that it affects is getting new work to the host, but it also affects reporting completed work, and I think it may even halt uploads once it hits that threshold. It only seems to take 2 or 3 "no work" indications in 10 minutes to receive the 24hr backoff.* That's not good. The tight deadline is tough, particularly when the system is having issues. Hopefully we can prevent issues from arising in the future, but there are a lot of moving parts. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,245,383 RAC: 9,571 |
*Right, it would be OK if all that it affects is getting new work to the host, but it also affects reporting completed work, and I think it may even halt uploads once it hits that threshold. It only seems to take 2 or 3 "no work" indications in 10 minutes to receive the 24hr backoff.* That's not quite right. It goes to a 24hr backoff on the very first occasion, not after 2 or 3. I have observed this many times. And it affects polling of any type, so uploads too, in a period when CASP tasks need to come back before a 2-day deadline (including processing time, which I accidentally discovered today has increased to a default of 8 hours rather than 6). Unless users are micro-managing to ensure everything's going OK, this won't be spotted. I suspect 'normal' people just let things run, so they won't know it's happened. As I mentioned earlier, I haven't noticed any 24hr backoffs for some weeks, so the urgency has gone out of this issue, but the principle stands. |
No.15 Send message Joined: 30 Dec 15 Posts: 7 Credit: 7,621,315 RAC: 0 |
Right, it is not 2 or 3 tries, it is 1. If there is one issue, you get a 24 hour backoff and none of your current work gets uploaded. You either have to constantly watch the project, be willing to lose credit, or move on. It might even be OK if Rosetta's work was 2 or 3 days out, but it seems their deadlines are always tight. Like I said, I want to crunch Rosetta but I am not willing to babysit it. I moved my stuff over to WCG, but I will be back if this ever gets fixed. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,245,383 RAC: 9,571 |
As a lot of tasks have probably been returned recently, validation is pending for something like 3+ hours. I also spotted a 24hr backoff for the first time in weeks. It was just once, but the issue hasn't gone away. |
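
Mod.Sense's suggestion earlier in the thread, searching the server source for the literal "86400", could be done with something like the sketch below. It is only an illustration: the function name `find_literal` and the list of file extensions are assumptions made for this example, not anything taken from the project's code. A one-day delay might also be written as an expression such as `24*60*60` or as a named constant, so a literal search can miss it.

```python
import os


def find_literal(root, needle="86400", extensions=(".cpp", ".h", ".inc", ".php")):
    """Return (path, line_number, line_text) for each line under root containing needle.

    The extensions tuple is an assumed guess at which files are worth scanning.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Skip files whose extension is not in the assumed list.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going if a file is not valid UTF-8.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.strip()))
    return hits
```

Calling `find_literal("path/to/server/source")` (a placeholder path) returns a list of matches that can then be read by hand to see which ones relate to the client backoff.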
©2024 University of Washington
https://www.bakerlab.org