Waiting for Memory

Questions and Answers : Unix/Linux : Waiting for Memory

hawkless1
Joined: 10 Mar 20
Posts: 7
Credit: 504,454
RAC: 0
Message 107276 - Posted: 12 Oct 2022, 14:04:38 UTC

Hello, I often see Rosetta jobs that never start (0.00% progress).
The status column shows: Waiting for memory
and the CPU is also underutilized (about 10% overall).

How is it possible for the manager to download jobs when there is not enough memory available to run them?

In the past I have aborted many jobs, including jobs from other projects such as SiDock, Milkyway, or LHC, because the Rosetta job sat at 0% and blocked all the others until they passed their deadlines.

I don't think it's right that one job blocks all the others while nothing is being calculated!


Operating System: Kubuntu 20.04
KDE Plasma Version: 5.18.8
KDE Frameworks Version: 5.68.0
Qt Version: 5.12.8
Kernel Version: 5.4.0-128-generic
OS Type: 64-bit
Processors: 4 × Intel® Core™ i3-6100 CPU @ 3.70GHz
Memory: 3.7 GiB

BOINC.Manager
Version 7.16.6 (x64)
wxWidgets version 3.0.4

Example:
Application: Rosetta 4.20
Name: 12mer_af_hallucinated_172_57_best_SAVE_ALL_OUT_2920568_412
Status: Waiting for memory
Received: Mon 10 Oct 2022 13:26:17 CEST
Report deadline: Thu 13 Oct 2022 13:26:16 CEST
Estimated computation size: 80,000 GFLOPs
CPU time: ---
CPU time since checkpoint: ---
Elapsed time: ---
Estimated time remaining: 07:59:39
Fraction done: 0.000%
Virtual memory size: 0 bytes
Working set size: 0 bytes
Directory: slots/2
Executable: rosetta_4.20_x86_64-pc-linux-gnu

hadron
Joined: 4 Sep 22
Posts: 68
Credit: 1,559,185
RAC: 321
Message 107303 - Posted: 13 Oct 2022, 2:08:54 UTC - in response to Message 107276.  

hawkless1 wrote:
Hello, I often see Rosetta jobs that never start (0.00% progress).
The status column shows: Waiting for memory
and the CPU is also underutilized (about 10% overall).

How is it possible for the manager to download jobs when there is not enough memory available to run them?


This has nothing to do with Rosetta.
How much memory are you letting BOINC use? That setting determines whether or not you get this message. Check Options/Computing Preferences/Disk and Memory to find out.
On my system, I let BOINC have up to 90% of system memory, regardless of what else the computer might be doing. Yes, that does mean that some things occasionally get pushed out to the swap partition, but it is necessary to make sure that BOINC tasks do not get stopped for lack of memory.

I don't have very many other things running, so I can set these values that high. If you do have other memory-hungry tasks running on your system, and must allot less to BOINC, then I think your only options are:
a) add more system memory if you can, or
b) set boinc to run fewer simultaneous tasks.
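
To see roughly how options a) and b) trade off, here is a small back-of-the-envelope sketch in Python. It only divides the RAM allowance by a per-task size; it is not how the BOINC client actually schedules work, and the 1000 MB per-task figure is a made-up placeholder, not a measured Rosetta value.

```python
import math


def max_concurrent_tasks(total_ram_mb: float, ram_percent: float, per_task_mb: float) -> int:
    """Return how many equally sized tasks fit inside the RAM allowance.

    total_ram_mb : installed system memory in MB
    ram_percent  : the "use at most N% of memory" preference
    per_task_mb  : assumed memory footprint of one task (hypothetical)
    """
    allowance_mb = total_ram_mb * ram_percent / 100.0
    return math.floor(allowance_mb / per_task_mb)


PER_TASK_MB = 1000  # hypothetical footprint, for illustration only

# Compare a 4 GB machine with an 8 GB machine at a 90% allowance.
for total_mb in (4096, 8192):
    count = max_concurrent_tasks(total_mb, 90, PER_TASK_MB)
    print(f"{total_mb} MB installed, 90% allowed: room for {count} task(s)")
```

With a fixed allowance, the only levers are more installed memory (a) or a lower task count (b), which is all the sketch is meant to show.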

hawkless1
Joined: 10 Mar 20
Posts: 7
Credit: 504,454
RAC: 0
Message 107307 - Posted: 13 Oct 2022, 9:56:41 UTC - in response to Message 107303.  

In my preferences I have:

Use RAM:

When the computer is in use: 80%
When the computer is not in use: 90% of memory

Installed RAM is 4 GB.
Actual usage: when I see that a Rosetta job is active, CPU usage is below 10% and less than 1 GB of RAM is in use.
The job is shown as active, but even after 60 minutes BOINC does not switch to another job; the system keeps waiting for this job and nothing gets done.


This is the next job with the problem:


Application: Rosetta 4.20
Name: 12mer_af_hallucinated_153_19_best_SAVE_ALL_OUT_2920368_554
Status: Waiting for memory
Received: Tue 11 Oct 2022 05:37:04 CEST
Report deadline: Fri 14 Oct 2022 05:37:04 CEST
Estimated computation size: 80,000 GFLOPs
CPU time: ---
CPU time since checkpoint: ---
Elapsed time: ---
Estimated time remaining: 07:59:40
Fraction done: 0.000%
Virtual memory size: 0 bytes
Working set size: 0 bytes
Directory: slots/0
Executable: rosetta_4.20_x86_64-pc-linux-gnu

hawkless1
Joined: 10 Mar 20
Posts: 7
Credit: 504,454
RAC: 0
Message 107308 - Posted: 13 Oct 2022, 10:04:38 UTC - in response to Message 107307.  

Starting the system:
Do 13 Okt 2022 04:02:30 CEST | | Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
Do 13 Okt 2022 04:02:30 CEST | | log flags: file_xfer, sched_ops, task, cpu_sched
Do 13 Okt 2022 04:02:30 CEST | | Libraries: libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Do 13 Okt 2022 04:02:30 CEST | | Data directory: /var/lib/boinc-client
Do 13 Okt 2022 04:02:31 CEST | | No usable GPUs found
Do 13 Okt 2022 04:02:35 CEST | | libc: Ubuntu GLIBC 2.31-0ubuntu9.9 version 2.31
Do 13 Okt 2022 04:02:35 CEST | | Host name: Spider
Do 13 Okt 2022 04:02:35 CEST | | Processor: 4 GenuineIntel Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz [Family 6 Model 94 Stepping 3]
Do 13 Okt 2022 04:02:35 CEST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Do 13 Okt 2022 04:02:35 CEST | | OS: Linux Ubuntu: Ubuntu 20.04.5 LTS [5.4.0-128-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.9)]
Do 13 Okt 2022 04:02:35 CEST | | Memory: 3.72 GB physical, 4.00 GB virtual
Do 13 Okt 2022 04:02:35 CEST | | Disk: 191.19 GB total, 53.62 GB free
Do 13 Okt 2022 04:02:35 CEST | | Local time is UTC +2 hours
Do 13 Okt 2022 04:02:35 CEST | | VirtualBox version: 6.1.38_Ubuntur153438
Do 13 Okt 2022 04:02:35 CEST | | Config: GUI RPCs allowed from:
Do 13 Okt 2022 04:02:35 CEST | | hawk-ntb.frotz.box
Do 13 Okt 2022 04:02:35 CEST | | hawk2000.frotz.box
Do 13 Okt 2022 04:02:35 CEST | | spider.frotz.box
Do 13 Okt 2022 04:02:36 CEST | Einstein@Home | General prefs: from Einstein@Home (last modified 14-Aug-2022 10:31:17)
Do 13 Okt 2022 04:02:36 CEST | Einstein@Home | Host location: none
Do 13 Okt 2022 04:02:36 CEST | Einstein@Home | General prefs: using your defaults
Do 13 Okt 2022 04:02:36 CEST | | Reading preferences override file
Do 13 Okt 2022 04:02:36 CEST | | Preferences:
Do 13 Okt 2022 04:02:36 CEST | | max memory usage when active: 3046.43 MB
Do 13 Okt 2022 04:02:36 CEST | | max memory usage when idle: 3427.24 MB
Do 13 Okt 2022 04:02:40 CEST | | max disk usage: 40.00 GB
Do 13 Okt 2022 04:02:40 CEST | | max CPUs used: 3
Do 13 Okt 2022 04:02:40 CEST | | don't compute while active
Do 13 Okt 2022 04:02:40 CEST | | don't use GPU while active
Do 13 Okt 2022 04:02:40 CEST | | suspend work if non-BOINC CPU load exceeds 25%
Do 13 Okt 2022 04:02:40 CEST | | (to change preferences, visit a project web site or select Preferences in the Manager)
Do 13 Okt 2022 04:02:40 CEST | | Setting up project and slot directories
Do 13 Okt 2022 04:02:40 CEST | | Checking active tasks
Do 13 Okt 2022 04:02:40 CEST | | Using account manager BOINCstatsBAM!
Do 13 Okt 2022 04:02:40 CEST | climateprediction.net | URL https://climateprediction.net/; Computer ID 149207x; resource share 2
Do 13 Okt 2022 04:02:40 CEST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 1278939x; resource share 1
Do 13 Okt 2022 04:02:40 CEST | GPUGRID | URL https://www.gpugrid.net/; Computer ID 53992x; resource share 100
Do 13 Okt 2022 04:02:40 CEST | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 1061173x; resource share 21
Do 13 Okt 2022 04:02:40 CEST | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 85251x; resource share 24
Do 13 Okt 2022 04:02:40 CEST | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 378798x; resource share 100
Do 13 Okt 2022 04:02:40 CEST | SiDock@home | URL https://www.sidock.si/sidock/; Computer ID 2592x; resource share 110
Do 13 Okt 2022 04:02:40 CEST | TN-Grid Platform | URL http://gene.disi.unitn.it/test/; Computer ID 5215x; resource share 22
Do 13 Okt 2022 04:02:40 CEST | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 584781x; resource share 50
Do 13 Okt 2022 04:02:40 CEST | yoyo@home | URL http://www.rechenkraft.net/yoyo/; Computer ID 46790x; resource share 20
Do 13 Okt 2022 04:02:40 CEST | | Setting up GUI RPC socket
Do 13 Okt 2022 04:02:40 CEST | | gui_rpc_auth.cfg is empty - no GUI RPC password protection
Do 13 Okt 2022 04:02:40 CEST | | Checking presence of 343 project files
Do 13 Okt 2022 04:07:53 CEST | | Contacting account manager at https://bam.boincstats.com/
Do 13 Okt 2022 04:07:55 CEST | | Account manager: BAM! User: 9744x, hawkless1
Do 13 Okt 2022 04:07:55 CEST | | Account manager: BAM! Host: 91285x
Do 13 Okt 2022 04:07:55 CEST | | Account manager: Number of BAM! connections for this host: 715
Do 13 Okt 2022 04:07:55 CEST | | Account manager: Dear founder of team Lippe.NRW.de, your team is invited to participate in the team challenges (http://boincstats.com/en/bam/teamChallenge/)
Do 13 Okt 2022 04:07:55 CEST | | Account manager contact succeeded
Do 13 Okt 2022 08:11:16 CEST | LHC@home | Sending scheduler request: Requested by project.
Do 13 Okt 2022 08:11:16 CEST | LHC@home | Not requesting tasks: don't need (job cache full)
Do 13 Okt 2022 08:11:17 CEST | LHC@home | Scheduler request completed
Do 13 Okt 2022 08:11:17 CEST | LHC@home | Project requested delay of 6 seconds
That is the last log entry,
but in the Manager the Rosetta job (see above) is shown as active!
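
For anyone reading a startup log like the one above, the sketch below pulls the preference lines out of it with Python. The sample lines are copied from the log in this post; the function names and dictionary keys are my own invention. It also checks that the two memory limits stand in an 8:9 ratio, which is what the 80% / 90% settings would give. Note that the same log section lists "don't compute while active" and "suspend work if non-BOINC CPU load exceeds 25%"; I can't tell from the thread whether those played any part here, but they are worth knowing about when tasks sit idle.

```python
import re

# A few lines copied from the startup log above.
LOG = """\
Do 13 Okt 2022 04:02:36 CEST | | max memory usage when active: 3046.43 MB
Do 13 Okt 2022 04:02:36 CEST | | max memory usage when idle: 3427.24 MB
Do 13 Okt 2022 04:02:40 CEST | | max CPUs used: 3
Do 13 Okt 2022 04:02:40 CEST | | don't compute while active
Do 13 Okt 2022 04:02:40 CEST | | suspend work if non-BOINC CPU load exceeds 25%
"""


def message_text(line: str) -> str:
    """Return the message part of a 'timestamp | project | message' line."""
    parts = line.split("|", 2)
    return parts[2].strip() if len(parts) == 3 else ""


def parse_prefs(log: str) -> dict:
    """Collect the preference values that the client echoes at startup."""
    prefs = {}
    for line in log.splitlines():
        msg = message_text(line)
        mem = re.match(r"max memory usage when (active|idle): ([\d.]+) MB", msg)
        cpus = re.match(r"max CPUs used: (\d+)", msg)
        load = re.match(r"suspend work if non-BOINC CPU load exceeds (\d+)%", msg)
        if mem:
            prefs[f"ram_{mem.group(1)}_mb"] = float(mem.group(2))
        elif cpus:
            prefs["max_cpus"] = int(cpus.group(1))
        elif load:
            prefs["suspend_cpu_load_pct"] = int(load.group(1))
        elif msg == "don't compute while active":
            prefs["compute_while_active"] = False
    return prefs


prefs = parse_prefs(LOG)
ratio = prefs["ram_active_mb"] / prefs["ram_idle_mb"]
print(prefs)
print(f"active/idle memory ratio: {ratio:.4f} (80%/90% would give {80 / 90:.4f})")
```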

hadron
Joined: 4 Sep 22
Posts: 68
Credit: 1,559,185
RAC: 321
Message 107321 - Posted: 14 Oct 2022, 19:34:53 UTC - in response to Message 107308.  

hawkless1 wrote:
In my preferences I have:

Use RAM:

When the computer is in use: 80%
When the computer is not in use: 90% of memory

Installed RAM is 4 GB.
Actual usage: when I see that a Rosetta job is active, CPU usage is below 10% and less than 1 GB of RAM is in use.
The job is shown as active, but even after 60 minutes BOINC does not switch to another job; the system keeps waiting for this job and nothing gets done.


You are running BOINC tasks on a system with only 4GB installed?? Amazing. If you can, you might try adding more RAM to the system. I'd suggest a minimum of 16GB, but on the computer you are talking about here, you might be able to make do with only 8.

You can (should?) set both of those RAM settings to 100%. BOINC runs at a very low priority, so if the system needs memory to do something else, it will swap out a BOINC task to do that. Adding more system memory will reduce the chances of this ever happening.
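
If you would rather make that change outside the Manager, the startup log earlier in the thread shows the client "Reading preferences override file" from its data directory (/var/lib/boinc-client). Below is a hedged Python sketch that only prints what such a file might contain. The file name global_prefs_override.xml and the two tag names are what I believe the client uses, so treat them as assumptions and compare with the file already on your system before replacing anything. The function name is my own.

```python
import xml.etree.ElementTree as ET


def build_override(busy_pct: float, idle_pct: float) -> str:
    """Build the text of a preferences override with the two RAM limits.

    busy_pct : percent of RAM allowed while the computer is in use
    idle_pct : percent of RAM allowed while the computer is not in use
    Tag names are assumed, not taken from this thread.
    """
    for value in (busy_pct, idle_pct):
        if not 0 < value <= 100:
            raise ValueError("percentages must be in the range (0, 100]")
    text = (
        "<global_preferences>\n"
        f"   <ram_max_used_busy_pct>{busy_pct:.6f}</ram_max_used_busy_pct>\n"
        f"   <ram_max_used_idle_pct>{idle_pct:.6f}</ram_max_used_idle_pct>\n"
        "</global_preferences>\n"
    )
    ET.fromstring(text)  # raises ParseError if the XML is not well formed
    return text


# Print the fragment instead of writing it, so nothing on disk is touched.
print(build_override(100, 100))
```

After editing the real file, the client has to re-read it (the Manager does this when you save preferences there; restarting the client should also work).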

hawkless1
Joined: 10 Mar 20
Posts: 7
Credit: 504,454
RAC: 0
Message 107513 - Posted: 20 Oct 2022, 15:43:39 UTC - in response to Message 107321.  

With other projects such as LHC, SiDock, Einstein, Climateprediction, Milkyway, yoyo, or TN-Grid I don't have this problem; it has happened only with Rosetta, since the beginning of this year.
And while a Rosetta job is waiting for memory, no other project gets crunched.

I will add more RAM, but the modules are sold out and the delivery time is currently 14 weeks :-( so I will not get any additional RAM before next year :-(

hawkless1
Joined: 10 Mar 20
Posts: 7
Credit: 504,454
RAC: 0
Message 107514 - Posted: 20 Oct 2022, 15:46:56 UTC - in response to Message 107321.  
Last modified: 20 Oct 2022, 16:00:06 UTC

OK, with the RAM settings at 100% it is now crunching.

And suddenly 3 Rosetta jobs are running; before, two jobs were "Waiting for memory" and one job was "displaced".

2066 / 3808 MiB of system RAM are now in use. Swap: 74 MiB.

:Thinking: 3 jobs are using about 2 GB of RAM,
but not a single job could start when 90% of 3.8 GB (3427 MiB) was available??
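
The numbers in that question can be laid out with a few lines of Python. This is only arithmetic on the figures quoted in this post; it does not explain why the client refused to start the jobs, and nothing in the thread settles that. One thing to keep in mind is that the measured usage of running jobs and whatever estimate the client uses before starting a job need not be the same number, but that is a guess on my part.

```python
TOTAL_MIB = 3808    # system RAM reported in this post
USED_MIB = 2066     # RAM in use with three Rosetta jobs running
OLD_PERCENT = 90    # the earlier "when not in use" setting

# Cap that the 90% setting put on BOINC's memory use.
old_limit_mib = TOTAL_MIB * OLD_PERCENT / 100

# Share of total RAM in use now, and the gap to the old cap.
used_percent = 100 * USED_MIB / TOTAL_MIB
headroom_mib = old_limit_mib - USED_MIB

print(f"old 90% cap:        {old_limit_mib:.0f} MiB")
print(f"in use right now:   {USED_MIB} MiB ({used_percent:.1f}% of RAM)")
print(f"gap to the old cap: {headroom_mib:.0f} MiB")
```

So the whole system, with three jobs running, sits well below the old 90% cap, which is exactly why the earlier "Waiting for memory" status looks puzzling.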

Thanks for the help



©2024 University of Washington
https://www.bakerlab.org