Message boards : Number crunching : Minirosetta 3.46
Author | Message |
---|---|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
And another three of these. This one ran for 8 hrs; my run time is now 4 hrs, so why didn't it stop earlier? The other two I aborted at over 6 hrs because I couldn't see them finishing without an error and wasting more time anyway. BTW, they ran non-stop for all that time, so I don't know why it says there are 2 starting structures; I think it normally says 1. rb_05_10_38828_73745__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80811_594_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=527333216 # cpu_run_time_pref: 14400 BOINC:: CPU time: 29192.9s, 14400s + 14400s[2013- 5-12 18:22:12:] :: BOINC InternalDecoyCount: 12 ====================================================== DONE :: 2 starting structures 29192.9 cpu seconds This process generated 12 decoys from 12 attempts ====================================================== called boinc_finish SIGSEGV: segmentation violation </stderr_txt> ]]> Validate state Invalid Claimed credit 298.87771452369 Granted credit 0 application version 3.46 |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Still getting errors on these tasks, I'm aborting any of these I see. rb_05_12_38289_72979__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80960_200_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=527807465 # cpu_run_time_pref: 14400 dof_atom1 atomno= 3 rsd= 1 atom1 atomno= 1 rsd= 1 atom2 atomno= 2 rsd= 1 atom3 atomno= 5 rsd= 1 atom4 atomno= 6 rsd= 1 THETA1 nan THETA3 nan PHI2 0 ERROR: AtomTree::torsion_angle_dof_id: angle range error ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780 ERROR: unknown atom_name: PRO NV ERROR:: Exit from: src/core/chemical/ResidueType.cc line: 2016 SIGSEGV: segmentation violation Stack trace (17 frames): [0xb2aef87] [0xf7735400] [0xa166837] [0xa1f3edc] [0xa1f4e3c] [0x996c8d6] [0x996df60] [0x89561af] [0x867d35e] [0x992d14f] [0x9931429] [0x9aebcad] [0x9b4f815] [0x9b4d045] [0x8054950] [0xb33f328] [0x8048131] Exiting... </stderr_txt> ]]> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
These hyb_cb_bench_(etc) tasks are really poor credit tasks. They complete OK, but give you only 20 credits for your work. |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
I've noticed over the last couple of weeks that there have been several types of jobs I haven't seen before (some beginning with "hyb" or "hybred," "cyto," etc.). These jobs are not setting checkpoints, even after crunching up to 11 hours or so (with checkpoints limited to no more than one every 60 sec. in computer pref.). The jobs starting "rb_5_17" and other "dates" continue to have checkpoints as usual. The problem, as noted in another recent thread, is that if I must shut down my system or reboot (such as for doing Windows updates, updating applications, etc.), or if I must close BOINC, I lose all the work in these "new" type jobs without checkpoints. Were these new jobs set up this way, or was this an oversight? I had already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks. Target CPU run time: 1 hour..... :-) |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks. That should work if you're allowed to set it that low. If I remember correctly, though, the minimum is now 3 hours. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
That should work if you're allowed to set it that low. If I remember correctly, though, the minimum is now 3 hours. My run time is 2 hours.... |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
A lot of rb_number WUs cannot start graphics (and, I think, calculation). There is a simple green line and 0 steps. I kill these WUs. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,857 |
These hyb_cb_bench_(etc) tasks are really poor credit tasks. I am SOOOO glad I am almost outa here!! My cryo tasks are again taking 7 hours to finish, I just use the defaults, and I am getting 20 to 25 frickin credits for them. NOW the same thing is happening with the RB units, PATHETIC!!! My eb units are doing okay but it is a pain trying to keep 10 systems clear of all the bad units!! I AM trying to help but my rac is DECLINING and my work output is RISING, that is just NOT RIGHT!!! |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
A lot of rb_number WUs cannot start graphics (and, I think, calculation). I am running rb_06_21_39751_75850__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_87610_60 with the default run time (3 hours) and the graphics are working perfectly. Have a crunching good day!! |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,857 |
How is THIS my fault? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534659615 stderr out <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range etc, etc, ETC!!! The pc is: CPU type AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0] Number of CPUs 6 Operating System Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00) Memory 12283.63 MB Cache 512 KB Swap space 24565.44 MB That is SIX cpu's with ONLY five crunching, the other is being used to support the gpu, with some left over for whatever. The unit took "CPU time 14335.48", ie 4 HOURS, and then it just errored out!! What kind of project is this right now?!!! LOTS of problems with the different kinds of units yet they are being released like this is a BETA project or something!! Rosetta is SUPPOSED to be about the SCIENCE, not releasing units to 'see if they work or not'!! 
I thought that's what the Beta Project was all about, testing the units PRIOR to them being released here!!! |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
How is THIS my fault? Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Had a couple of these tasks error today, this message goes on for a few pages. CASP9_fb_benchmark_hybridization_run54_T0534_1_C2_SAVE_ALL_OUT_IGNORE_THE_REST_47953_1846_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534900423 ERROR: error in process_residue_request: 'com' ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93 # cpu_run_time_pref: 21600 ERROR: error in process_residue_request: 'com' ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93 ERROR: error in process_residue_request: 'com' ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93 ====================================================== DONE :: 99 starting structures 1201 cpu seconds This process generated 99 decoys from 99 attempts ====================================================== BOINC :: WS_max 0 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,623,704 RAC: 9,591 |
Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in. I participate in the RALPH@Home project, but sometimes I think that the opportunity to test new versions, new code, etc. at scale is VERY underestimated. And the admins do not participate in the test forum..... |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,857 |
Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in. To be honest until some of us screamed, yelled and started aborting all the cryo units recently the Admins aren't HERE either!! I guess they 'are too busy' to waste their time seeing if what they designed actually works in the REAL WORLD!! |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,857 |
Here is ANOTHER "hyb-ab-bench" unit that just cost me SEVEN HOURS of crunching time and THEN errored out: https://boinc.bakerlab.org/rosetta/result.php?resultid=588986073 The reason: </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hyb_ab_bench_4aimA_SAVE_ALL_OUT_IGNORE_THE_REST_53960_1303_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> UPLOAD FAILURE----WTF are you telling me? Is Rosetta telling me that AFTER SEVEN HOURS of crunching a unit fails to upload and I will get NO CREDITS for it???!!!!!!!! WHERE is the Scientist who designed these things? Why is SOMEONE not here explaining what the heck is going on?!!!! This is JUST ONE of my pc's here, I have NOT checked the others, but it is NOT the same one as the last problem I posted about having problems with!!! |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
These hyb_cb_bench_(etc) tasks are really poor credit tasks. It is indeed not right. The server code is obsolete and is a cause of the low credit for any WU. But they will not update it here unless it totally crashes and cannot be brought back to life... shame. They are lucky that their research is quite important, otherwise... there are many, many projects that need CPU time. Greetings, TJ. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks. I have seen this in the preferences but have no idea what it does or what it is for. Can someone please explain this? Greetings, TJ. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks. Rosetta@Home workunits are usually set up in 100 sections, called decoys. They try to run however many of these decoys they expect to finish in the target CPU run time, but can go over if the last one takes longer than expected. I'm not sure if the shutdown code runs properly if the last decoy that was finished reported an error instead of a good answer. |
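The behaviour described in the post above can be sketched roughly in Python. This is only an illustration, not Rosetta's actual code: the function name `run_workunit`, the "average of the decoys so far" estimate, and the input list of per-decoy times are all assumptions made for the example. The only parts taken from the post are the cap of about 100 decoys, the target CPU run time, and the fact that the last decoy can overshoot.

```python
def run_workunit(target_cpu_seconds, decoy_times, max_decoys=100):
    """Hypothetical sketch of a decoy loop with a target CPU run time.

    decoy_times: CPU seconds each decoy actually takes (made-up input).
    Returns (decoys_completed, cpu_seconds_used).
    """
    used = 0.0
    done = 0
    for actual in decoy_times:
        if done >= max_decoys:
            break
        # Assumption: the next decoy is expected to cost the average so far.
        expected = used / done if done else 0.0
        if used + expected > target_cpu_seconds:
            break  # the next decoy is not expected to fit, so stop here
        # Once started, a decoy runs to completion, even past the target.
        used += actual
        done += 1
    return done, used


# Six quick decoys, then one slow decoy that overshoots a 4-hour target.
print(run_workunit(14400, [2000] * 6 + [9000]))
```

In this sketch the seventh decoy is started because it is expected to fit, and the workunit then ends well past the 14400-second target, which matches the "can go over if the last one takes longer than expected" remark.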
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Looks like I'm going to have to take the big hammer to some of these tasks, I'm not amused at all. My 6hr runtime ended up over 10hrs. CASP9_bw_benchmark_hybridization_run49_T0606_1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_46414_1348_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534948991 Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_CASP9_bw_benchmark_hybridization_run49_T0606_1_C1_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 BOINC:: CPU time: 36341.9s, 14400s + 21600s[2013- 6-25 17:58:54:] :: BOINC InternalDecoyCount: 2 ====================================================== DONE :: 2 starting structures 36341.9 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== called boinc_finish SIGSEGV: segmentation violation Stack trace (21 frames): [0xb2aef87] [0xf777f400] [0xa6ce54c] [0xa6e7659] [0xa1648c7] [0xa1f2dd2] [0xa1f4df1] [0x9d4d1a5] [0x9f10187] [0x9d56457] [0x9d4265a] [0x8925eca] [0x8681018] [0x992d14f] [0x9931429] [0x9aebcad] [0x9b4f815] [0x9b4d045] [0x8054950] [0xb33f328] [0x8048131] Exiting... </stderr_txt> ]]> Validate state Valid Claimed credit 280.45 Granted credit 11.25 |
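Both long-running logs in this thread contain a line of the form `CPU time: <used>s, 14400s + <pref>s`, and in both cases the CPU time used is a few hundred seconds more than the sum of the two figures. One possible reading, which is an inference from these two logs and not something the project confirms here, is that the task is ended once it exceeds the run time preference plus a fixed 4 hours. The Python sketch below only does the arithmetic; the helper name `parse_cpu_time_line` and the field names are made up for the example.

```python
import re

# Matches e.g. "CPU time: 36341.9s, 14400s + 21600s"
LINE_RE = re.compile(r"CPU time:\s*([\d.]+)s,\s*([\d.]+)s\s*\+\s*([\d.]+)s")


def parse_cpu_time_line(line):
    """Return CPU seconds used, the summed limit, and the overshoot.

    Hypothetical helper: treating the sum as a limit is an assumption.
    Returns None if the line does not contain the expected pattern.
    """
    match = LINE_RE.search(line)
    if match is None:
        return None
    cpu, first, second = (float(g) for g in match.groups())
    limit = first + second
    return {"cpu_seconds": cpu, "limit_seconds": limit, "over_by": cpu - limit}


# The two lines quoted in this thread.
for line in (
    "BOINC:: CPU time: 29192.9s, 14400s + 14400s[2013- 5-12 18:22:12:]",
    "BOINC:: CPU time: 36341.9s, 14400s + 21600s[2013- 6-25 17:58:54:]",
):
    print(parse_cpu_time_line(line))
```

If that reading is right, a 4-hour preference can run to about 8 hours and a 6-hour preference to about 10 hours before the task is stopped, which is what both posters observed.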
©2024 University of Washington
https://www.bakerlab.org