Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

ukishun
Joined: 28 Apr 11
Posts: 4
Credit: 18,756
RAC: 0
Message 70308 - Posted: 10 May 2011, 9:50:31 UTC

I mentioned a task that was giving me trouble here:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5704

It eventually went away (I went to sleep, so I have no idea what happened to it).

Now a similarly named WU:

FOLD_N_DOCK_dagk_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26520_4912
(https://boinc.bakerlab.org/rosetta/workunit.php?wuid=383890060)

is giving me some trouble. When I suspend the task and restart the computer, the task restarts and goes back to 0%. Even after about 45 minutes of CPU runtime, when I check the properties, CPU time at last checkpoint is blank. I don't know if this is 'normal', since it's been mentioned that some WUs take a long time to complete, but it seems like a waste of computing time if it just restarts.

ID: 70308

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 70310 - Posted: 10 May 2011, 17:46:18 UTC - in response to Message 70308.  

...when I suspend the task and restart the computer, the task restarts and goes back to 0%. Even after about 45 minutes of CPU runtime, I check the properties and CPU time at last checkpoint is blank. I don't know if this is 'normal', since it's been mentioned that some WUs take a long time to complete, but it seems like a waste of computing time if it just restarts.


Yes, normal. Some tasks checkpoint more frequently than others. If your machine were in an environment where no progress is made after five attempts at restarting the task, Rosetta automatically marks it as completed and reports it back. So such tasks never just get stuck on a machine that can never complete them. In your case, it just needs to run longer before it is going to be able to checkpoint and preserve the work completed up to that point.
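
The restart rule described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not Rosetta's actual code: the `Task` class, the `run_session` function, and the fixed `checkpoint_interval` are hypothetical names invented here, and real tasks reach checkpoints at irregular points rather than on a timer. Only the limit of five restarts comes from the post.

```python
from dataclasses import dataclass

# "five attempts at restarting" is taken from the post; everything else is assumed.
MAX_RESTARTS_WITHOUT_PROGRESS = 5


@dataclass
class Task:
    checkpointed_cpu_seconds: float = 0.0  # work preserved at the last checkpoint
    restarts_without_progress: int = 0     # consecutive sessions that saved nothing
    finished_early: bool = False           # marked completed and reported back


def run_session(task: Task, cpu_seconds: float, checkpoint_interval: float) -> None:
    """Simulate one run that ends with a suspend/restart.

    Only whole checkpoint intervals survive the restart; any CPU time
    spent after the last checkpoint is lost.
    """
    if task.finished_early:
        return
    saved = (cpu_seconds // checkpoint_interval) * checkpoint_interval
    if saved > 0:
        # At least one checkpoint was written, so progress was made.
        task.checkpointed_cpu_seconds += saved
        task.restarts_without_progress = 0
    else:
        # Interrupted before the first checkpoint: back to where it started.
        task.restarts_without_progress += 1
        if task.restarts_without_progress >= MAX_RESTARTS_WITHOUT_PROGRESS:
            task.finished_early = True
```

In this model, a 45-minute session on a task that (hypothetically) needs an hour to reach its first checkpoint saves nothing and bumps the restart counter by one, while a longer uninterrupted run writes a checkpoint and resets the counter.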

Rosetta Moderator: Mod.Sense
ID: 70310

SafeAggie
Joined: 22 Oct 05
Posts: 3
Credit: 458,414
RAC: 0
Message 70313 - Posted: 10 May 2011, 22:49:23 UTC
Last modified: 10 May 2011, 22:50:39 UTC

Client Error/Compute Error: FOLD_N_DOCK_dagk_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26520_2442_0
resultid=420544634
wuid=383772100
CPU Time: 3,129.82 seconds

Validate Error: ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g038_003_25638_198_1
resultid=420536503
wuid=381625616
CPU Time: 2,695.49 seconds

Validate Error: ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g060_007_26528_2_0
resultid=420535967
wuid=383764150
CPU Time: 2,754.88 seconds
ID: 70313

Jesse Viviano
Joined: 14 Jan 10
Posts: 42
Credit: 2,700,472
RAC: 0
Message 70336 - Posted: 14 May 2011, 12:06:17 UTC
Last modified: 14 May 2011, 12:07:08 UTC

Please see my computer's list of work units. The two work units whose names start with "IF3_like_SAVE_ALL_OUT_relax_i091_26681_" get validate errors on my machine which is set to a 24 hour target work unit turnaround time, and one of the work units was resent to a machine which is set to a much shorter turnaround time. That machine's result validated fine. Could someone fix the validator or adjust the limit that finishes the work unit early to prevent validator problems for this series for those whose machines are set to a 24 hour turnaround time?
ID: 70336

googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,722,686
RAC: 3,784
Message 70406 - Posted: 27 May 2011, 15:53:03 UTC

Maybe somebody will pay attention to this thread. Hey, project people - there's been no work since yesterday afternoon.
ID: 70406

CBSX01
Joined: 17 Dec 07
Posts: 11
Credit: 5,387,356
RAC: 0
Message 70407 - Posted: 27 May 2011, 17:27:22 UTC - in response to Message 70406.  

Maybe somebody will pay attention to this thread. Hey, project people - there's been no work since yesterday afternoon.


I know! I've got 3 PCs on my desk here at work and only 1 network cable (sorry, no personal switches or routers allowed).

Got my preferences set to cache 3 days (for the long weekend) and hoping to fill them up with work to report on Tuesday.
ID: 70407

Plasmon_attack
Joined: 2 May 10
Posts: 13
Credit: 15,451,384
RAC: 0
Message 70408 - Posted: 27 May 2011, 17:45:19 UTC - in response to Message 70407.  

Yeah, this seems to happen every once in a while: the queue runs down. I'm tempted to ramp up my queues to 5 days or more, since it can take that long to get work filled back in. Given it's Friday, I suspect our computers are all getting a break over the weekend and our RACs will just have to eat it.
ID: 70408

edikl
Joined: 16 Jun 10
Posts: 10
Credit: 186,187
RAC: 0
Message 70409 - Posted: 27 May 2011, 18:33:31 UTC

It is normal that sometimes things go wrong. But it would be nice to hear a word from project administrators, that we have a problem (why and when it is predicted to be fixed). We, users, like to know that we are treated seriously :)
ID: 70409

Telescope Adrian
Joined: 14 Nov 06
Posts: 9
Credit: 1,906,378
RAC: 0
Message 70410 - Posted: 27 May 2011, 18:39:44 UTC

The answer is very simple: run work for other BOINC projects too, like SPINHENGE or WCG.
ID: 70410

CBSX01
Joined: 17 Dec 07
Posts: 11
Credit: 5,387,356
RAC: 0
Message 70411 - Posted: 27 May 2011, 19:03:58 UTC - in response to Message 70409.  

It is normal that sometimes things go wrong. But it would be nice to hear a word from project administrators, that we have a problem (why and when it is predicted to be fixed). We, users, like to know that we are treated seriously :)


Precisely. Even if it's "Just cracked the first cold one. Back on Tuesday".

And in reply to Mr. Telescope, yes, these things have happened in the past and we all know the routine. Going to be attaching to POEM on all PCs, but it would be nice to know if it's going to be 10 minutes, hours or days...
ID: 70411

Samson
Joined: 23 May 11
Posts: 8
Credit: 257,870
RAC: 0
Message 70414 - Posted: 27 May 2011, 20:47:29 UTC

I thought I made the right choice with Rosetta but now I'm not so sure.

I want a project that I can devote all my CPU time to. Rosetta seems to be the best as far as practical medical advances are concerned.

I went as far as to contact 2 mods/admins yesterday; I haven't heard a peep.

I'd like to get Rosetta running again.

An update or catastrophe notice would be nice.
ID: 70414

TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,085,161
RAC: 1,449
Message 70416 - Posted: 27 May 2011, 21:34:55 UTC - in response to Message 70414.  

I thought I made the right choice with Rosetta but now I'm not so sure.

I want a project that I can devote all my CPU time to. Rosetta seems to be the best as far as practical medical advances are concerned.

I went as far as to contact 2 mods/admins yesterday; I haven't heard a peep.

I'd like to get Rosetta running again.

An update or catastrophe notice would be nice.
Don't hold your breath on getting any official info on what's wrong any time soon. Especially with the long weekend coming up here in the U.S. of A...

I don't put all my eggs in one nest anymore since the "breakdown" over the change of year. I split my CPU time between Rosetta and World Community Grid; so far they haven't both been down at the same time, and WCG is at least as prolific with its research as R@H...

Ralf
ID: 70416

cnick6
Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 70417 - Posted: 27 May 2011, 21:45:14 UTC

I also do multiple. The funny thing is Seti@home is down too. Just can't win.
ID: 70417

Samson
Joined: 23 May 11
Posts: 8
Credit: 257,870
RAC: 0
Message 70421 - Posted: 28 May 2011, 1:33:20 UTC

Work is starting to trickle in.

Seems I got a few units about an hour ago.

I could check the log but....

Anyway, some work is coming down the pipe. For me at least.
ID: 70421

rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 70423 - Posted: 28 May 2011, 1:52:41 UTC - in response to Message 70421.  

Work is starting to trickle in.

Seems I got a few units about an hour ago.

I could check the log but....

Anyway, some work is coming down the pipe. For me at least.


yeah the queue increased by a few thousand
ID: 70423

Shawn
Volunteer moderator
Project developer
Project scientist
Joined: 22 Jan 10
Posts: 17
Credit: 53,741
RAC: 0
Message 70425 - Posted: 28 May 2011, 2:09:18 UTC

Hey guys, I submitted a new job for MVH, which you can read about in the protein-protein interface thread if you're interested. This job is slightly different from the previous ones (it includes more stubs), so I wanted to do some extra checking to make sure that it wouldn't break anything.

Hopefully, I'll get some jobs for Ebola targets later this week!
ID: 70425

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70435 - Posted: 29 May 2011, 12:29:20 UTC

your validator or credit counting server must have crashed.
i lost just over 100 pts average credit in 1 or 2 days!!!!!
the results show nothing wrong with credits.
so there must be something else.
how can you lose 100 pts average credit in a couple of days????????
from about 525 to 425 starting on the line of may 27. today is the 29th
ID: 70435

TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,085,161
RAC: 1,449
Message 70438 - Posted: 29 May 2011, 17:26:54 UTC - in response to Message 70435.  

your validator or credit counting server must have crashed.
i lost just over 100 pts average credit in 1 or 2 days!!!!!
the results show nothing wrong with credits.
so there must be something else.
how can you lose 100 pts average credit in a couple of days????????
from about 525 to 425 starting on the line of may 27. today is the 29th
Got a bunch of WUs stuck as "Pending credit" overnight now too.

Well, never a dull moment with R@H... :-(

Ralf
ID: 70438

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,245,383
RAC: 9,571
Message 70440 - Posted: 30 May 2011, 1:14:51 UTC - in response to Message 70435.  

your validator or credit counting server must have crashed.
i lost just over 100 pts average credit in 1 or 2 days!!!!!
the results show nothing wrong with credits.
so there must be something else.
how can you lose 100 pts average credit in a couple of days????????
from about 525 to 425 starting on the line of may 27. today is the 29th

They're not lost - just saved up. When they do get awarded you'll have an inflated RAC, then it'll drop again when that lump drops out. IYSWIM...
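
A rough way to see why RAC dips and then overshoots: RAC is an exponentially decaying average of granted credit. The sketch below assumes a one-week half-life and a single update per day; both are simplifications chosen for illustration, and `update_rac` and `simulate` are made-up helper names, not BOINC functions.

```python
import math

# Assumption: RAC halves after one week with no credit granted.
HALF_LIFE_DAYS = 7.0


def update_rac(rac: float, credit_granted_today: float) -> float:
    """Advance the decaying average by one day (simplified model)."""
    weight = math.exp(-math.log(2) / HALF_LIFE_DAYS)  # decay factor for one day
    return rac * weight + (1.0 - weight) * credit_granted_today


def simulate(start_rac: float, daily_grants: list[float]) -> list[float]:
    """Return the RAC after each day for a sequence of daily credit grants."""
    history = []
    rac = start_rac
    for granted in daily_grants:
        rac = update_rac(rac, granted)
        history.append(rac)
    return history


if __name__ == "__main__":
    steady = 525.0
    # Two days with everything stuck as pending, then the backlog is granted
    # in one lump (three days' worth), then back to normal.
    grants = [0.0, 0.0, 3 * steady] + [steady] * 14
    for day, rac in enumerate(simulate(steady, grants), start=1):
        print(f"day {day:2d}: RAC {rac:6.1f}")
```

Starting from a RAC of 525, two days with credit stuck as pending bring this model down to roughly 430. The day the backlog is granted pushes it above 525, after which it drifts back toward 525 as the lump decays out.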

But there is an intermittent problem with validation somewhere. Some of my team are getting credits straight away, some after a delay (not long but undetermined), one for much of the day. No biggie, but a kick may be in order...
ID: 70440

TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,085,161
RAC: 1,449
Message 70442 - Posted: 30 May 2011, 5:47:07 UTC - in response to Message 70440.  

But there is an intermittent problem with validation somewhere. Some of my team are getting credits straight away, some after a delay (not long but undetermined), one for much of the day. No biggie, but a kick may be in order...
I wouldn't call that an "intermittent" problem. A couple of WUs got validated during the day, but the list of "Pending credit" WU's just keeps getting longer...
Hope that isn't like a balloon that gets slowly blown up until you get a big bang (again).

Ralf
ID: 70442

©2024 University of Washington
https://www.bakerlab.org