Internet traffic and necessary data

Message boards : Number crunching : Internet traffic and necessary data

Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 10738 - Posted: 13 Feb 2006, 21:15:23 UTC
Last modified: 13 Feb 2006, 21:22:18 UTC

Or do as SETI Beta is doing and rewrite the app so that it uses a basic method by default but switches to SSE when it detects that the processor supports that instruction set. That gives the best of both worlds, because I'm sure you'd get a lot of people complaining that Rosetta no longer runs on their older computers.
And besides, Rosetta was still seeking more processing power last time I checked, so excluding hosts is a bad thing, especially if the app is just going to error out on a non-compatible host.


Here is some simple code to add to Rosetta to check which instruction sets each CPU supports,
so you can be sure the app will not crash when executing SSE.

You will need to adapt it, e.g.: if the CPU is SSE-capable, go to the sse_crunch routine;
else go to the non_sse_crunch routine.
You may also need both an sse_crunch and an sse2_crunch routine...
*That way, no PC will get any app crash :-)

//
// chkcpu.c
//
// Check cpu extensions for Intel compatible cpu
// (32-bit x86, GCC inline asm; build with: gcc -m32 chkcpu.c)
//
// Tetsuji "Maverick" Rai

#include <stdio.h>

int main(void){
  unsigned int _ecx, _edx, _init_flags, _mod_flags;

  // Try to flip the ID bit (bit 21) of EFLAGS; cpuid exists only if it flips
  __asm__ volatile (
      "pushfl\n\t"
      "popl %%eax\n\t"
      "movl %%eax, %0\n\t"
      "xorl $0x200000, %%eax\n\t"
      "pushl %%eax\n\t"
      "popfl\n\t"
      "pushfl\n\t"
      "popl %%eax\n\t"
      "movl %%eax, %1"
           : "=r"(_init_flags), "=r"(_mod_flags)
           :
           : "eax", "cc"
           );

  printf("init flag = %08x   modified flags = %08x\n",
         _init_flags, _mod_flags);
  if (!((_init_flags ^ _mod_flags) & 0x200000)) {
    printf("cpuid isn't available\n");
    return 1;
  }

  printf("\nok cpuid is available\n");

  // cpuid leaf 1: the feature bits come back in ecx and edx
  __asm__ volatile (
      "xorl %%eax, %%eax\n\t"
      "incl %%eax\n\t"
      "cpuid"
           : "=c"(_ecx), "=d"(_edx)
           :
           : "eax", "ebx"
           );
  if (_edx & 0x8000){          // edx bit 15
    printf("cmov : Yes\n");
  }else{
    printf("cmov : No\n");
  }
  if (_edx & 0x02000000){      // edx bit 25
    printf("sse  : Yes\n");
  }else{
    printf("sse  : No\n");
  }
  if (_edx & 0x04000000){      // edx bit 26
    printf("sse2 : Yes\n");
  }else{
    printf("sse2 : No\n");
  }
  if (_ecx & 0x1){             // ecx bit 0
    printf("sse3 : Yes\n");
  }else{
    printf("sse3 : No\n");
  }

  return 0;
}
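
To adapt the check above into the dispatch described at the top of this post, one option is to pick a routine once at startup through a function pointer. The sketch below is only an illustration: sse2_crunch, sse_crunch, non_sse_crunch and select_crunch are hypothetical names (the first three borrowed from this post, not from the Rosetta source), their bodies are placeholders, and the masks are the same EDX bits tested in chkcpu.c.

```c
#include <stddef.h>

/* CPUID leaf 1 EDX feature bits, same masks as in chkcpu.c above. */
#define EDX_SSE  0x02000000u
#define EDX_SSE2 0x04000000u

/* All variants share one signature so they are interchangeable. */
typedef double (*crunch_fn)(const double *data, size_t n);

/* Placeholder body: a real build would put SSE/SSE2 code in the first
 * two routines below. Here all three simply sum the input. */
static double sum_all(const double *data, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += data[i];
    return total;
}
double sse2_crunch(const double *data, size_t n)    { return sum_all(data, n); }
double sse_crunch(const double *data, size_t n)     { return sum_all(data, n); }
double non_sse_crunch(const double *data, size_t n) { return sum_all(data, n); }

/* Pick the best routine the CPU supports. edx is the value cpuid leaf 1
 * returned in EDX (pass 0 if cpuid is not available at all). */
crunch_fn select_crunch(unsigned int edx) {
    if (edx & EDX_SSE2) return sse2_crunch;
    if (edx & EDX_SSE)  return sse_crunch;
    return non_sse_crunch;
}
```

The app would call select_crunch(_edx) once at startup and keep the returned pointer, so an older CPU never reaches an SSE instruction. In a real build the SSE variants would probably have to live in separate source files compiled with the matching flags, so that the compiler does not emit SSE code into the fallback routine.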

Click signature for global team stats
ID: 10738 · Rating: 0
Lee Carre
Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 10739 - Posted: 13 Feb 2006, 21:19:12 UTC - in response to Message 10708.  

Different compression methods will only be adopted if they work across all platforms; if they don't, then they're not appropriate for BOINC use.
Well, that's why I suggested bzip2, as it's an open-source, plug-in replacement for gzip. It works on all platforms.
Great stuff. Perhaps suggest it on the BOINC dev mailing list; if it achieves consistently greater compression ratios, it'll help everyone :)

Agreed, but as you correctly said, more processing power, NOT necessarily more HOSTS. Have a look at the CPU stats.
True, but if you compare processing rate (using something like TeraFLOPS) against number of hosts, you'll get a positive correlation (more hosts = more processing).

Personally, I'd be happy with offering a beta SSE-enabled Rosetta executable as an optional install, like the optimised BOINC apps many people install.
Now that's an idea, but obviously to get the most benefit for the cost you might as well deploy an app that does it automatically; that's the best cost:benefit ratio. As a half-way measure then yeah, a separate app would probably help, but you'd need quite a lot of people using the optimised version to notice an improvement.
ID: 10739 · Rating: 0
Lee Carre
Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 10740 - Posted: 13 Feb 2006, 21:25:46 UTC - in response to Message 10738.  

Here is some simple code to add to Rosetta to check which instruction sets each CPU supports,
so you can be sure the app will not crash when executing SSE.

You will need to adapt it, e.g.: if the CPU is SSE-capable, go to the sse_crunch routine;
else go to the non_sse_crunch routine.
You may also need both an sse_crunch and an sse2_crunch routine...
*That way, no PC will get any app crash :-)
I'm no programmer, so forgive the newbie question.

I understand the instruction set selection method, but how hard would it be to have different versions of the routines in the same app? Would there be a lot of work involved, or is it quite simple?
ID: 10740 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,675,695
RAC: 11,002
Message 10782 - Posted: 15 Feb 2006, 16:34:10 UTC

Can anyone tell me roughly how much bandwidth rosetta will use, post installation, on a Sempron 2600+ machine that is on ~3hrs a day, running XP? I can probably add this machine, but it's on capped broadband so the bandwidth is all important.

cheers
Danny
ID: 10782 · Rating: 0
SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 10783 - Posted: 15 Feb 2006, 16:37:57 UTC - in response to Message 10782.  

Can anyone tell me roughly how much bandwidth rosetta will use, post installation, on a Sempron 2600+ machine that is on ~3hrs a day, running XP? I can probably add this machine, but it's on capped broadband so the bandwidth is all important.

cheers
Danny

About 10Mb per day
ID: 10783 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,675,695
RAC: 11,002
Message 10787 - Posted: 15 Feb 2006, 21:28:23 UTC - in response to Message 10783.  

Can anyone tell me roughly how much bandwidth rosetta will use, post installation, on a Sempron 2600+ machine that is on ~3hrs a day, running XP? I can probably add this machine, but it's on capped broadband so the bandwidth is all important.

cheers
Danny

About 10Mb per day


Cheers. Unfortunately, I think it'd need to be less than 100MB/month to be viable on that machine.
ID: 10787 · Rating: 0
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 10790 - Posted: 15 Feb 2006, 21:57:33 UTC

Keep an eye on the boards, as later this week we'll be given a new app that lets us select how many hours of jobs to download for each project downloaded.
So, for a 24 hour run (on always-on systems), we can supposedly download 24 hours of work on one project, instead of eight 3-hour projects or forty-eight 30-minute projects. If this gives the option of up to a week's worth of work per project, then it'll be very easy to get the bandwidth usage below your target.

With better compression added to the client, it'll be possible to reduce the bandwidth usage to between 50% and 33% of the current project downloads. So there's still room for improvement.

Reducing bandwidth usage to 1/8th or 1/48th (depending on the type of projects being handed out at the time) just by switching to 24 hours of jobs per project (the default will be 8), for always-on 24/7 machines, will be a tremendous reduction for those with usage caps.
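
The arithmetic behind those fractions, as a rough sketch: if every downloaded job costs about the same amount of data no matter how long it runs (an assumption, and the two helper names below are made up for illustration), then daily downloads scale with the number of jobs fetched per day.

```c
/* Jobs needed to fill `hours_on` hours of crunching per day when each
 * job runs for `job_hours`. */
double jobs_per_day(double hours_on, double job_hours) {
    return hours_on / job_hours;
}

/* Fraction of the old download volume left after changing the run length,
 * ASSUMING each job costs the same download whatever its run time. */
double download_fraction(double old_job_hours, double new_job_hours) {
    return old_job_hours / new_job_hours;
}
```

For a 24/7 machine, jobs_per_day(24, 3) is 8 and jobs_per_day(24, 0.5) is 48, so moving to 24-hour jobs gives download_fraction(3, 24) = 1/8 and download_fraction(0.5, 24) = 1/48, matching the figures above.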
ID: 10790 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,675,695
RAC: 11,002
Message 10819 - Posted: 16 Feb 2006, 20:12:11 UTC - in response to Message 10790.  

Keep an eye on the boards, as later this week we'll be given a new app that lets us select how many hours of jobs to download for each project downloaded.
So, for a 24 hour run (on always-on systems), we can supposedly download 24 hours of work on one project, instead of eight 3-hour projects or forty-eight 30-minute projects. If this gives the option of up to a week's worth of work per project, then it'll be very easy to get the bandwidth usage below your target.

With better compression added to the client, it'll be possible to reduce the bandwidth usage to between 50% and 33% of the current project downloads. So there's still room for improvement.

Reducing bandwidth usage to 1/8th or 1/48th (depending on the type of projects being handed out at the time) just by switching to 24 hours of jobs per project (the default will be 8), for always-on 24/7 machines, will be a tremendous reduction for those with usage caps.


Yeah - we've ordered the parts for the machine so it'll be easiest to install it when i build it, but I can ask him to install at a later date if the bandwidth requirements can be controlled to a suitable level for him.

ID: 10819 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 11346 - Posted: 24 Feb 2006, 21:13:57 UTC

File compression may soon be offered through Boinc. See below email from Dr. Anderson:

Email from Dr. Anderson:

David Anderson to boinc_projects, boinc_dev
More options 4:01 pm (6 minutes ago)

Libcurl has the ability to handle HTTP replies
that are compressed using the 'deflate' and 'gzip' encoding types.
Previously the BOINC client didn't enable this feature,
but starting with the next version of the client (5.4) it does.
This means that BOINC projects will be able to reduce
network bandwidth to data servers
(and possibly server disk space) by using HTTP compression,
without mucking around with applications.

This is described here:
http://boinc.berkeley.edu/files.php#compression

-- David



Interesting.
ID: 11346 · Rating: 0
Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 11351 - Posted: 24 Feb 2006, 22:56:41 UTC - in response to Message 11346.  

Libcurl has the ability to handle HTTP replies
that are compressed using the 'deflate' and 'gzip' encoding types.
Previously the BOINC client didn't enable this feature,
but starting with the next version of the client (5.4) it does.
This means that BOINC projects will be able to reduce
network bandwidth to data servers
(and possibly server disk space) by using HTTP compression,
without mucking around with applications.

I am not familiar with 'deflate', but since the Rosetta files are already gzipped, gzipping them a second time wouldn't have any additional benefit. ;-)
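
One way a client or server could avoid compressing such a file a second time is to look for the gzip header first. A minimal sketch, assuming the check only needs the first bytes of the file: gzip streams start with the two ID bytes 0x1f 0x8b (RFC 1952). The function name looks_gzipped is made up for this example and is not part of BOINC or Rosetta.

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the gzip ID bytes 0x1f 0x8b
 * (RFC 1952), 0 otherwise. Hypothetical helper, not a BOINC API. */
int looks_gzipped(const unsigned char *buf, size_t len) {
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}
```

A transfer layer could skip its own compression step whenever this returns 1 for the start of the file.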
ID: 11351 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 11355 - Posted: 24 Feb 2006, 23:50:55 UTC

Uh oh, this just in from Bruce Allen at Einstein:

Bruce Allen <xxxxxx@gravity.phys.uwm.edu>to David, boinc_projects, boinc_dev
More options 5:17 pm (1 hour ago)

David, some projects (including E@H) are already sending/returning files
which are 'zipped'. We need to make sure that the cgi file_upload_handler
program does not automatically uncompress files unless this has been
requested specifically by the project.

Cheers,


Then later, Dr. A came out with:

[boinc_alpha] compression bug in 5.3.21 Inbox

David Anderson to boinc_alpha
More options 6:24 pm (25 minutes ago)

We quickly found that the support for gzip compression
breaks Einstein@home and CPDN, which do their own compression.
We're fixing this and it will be in 5.3.22.

-- David


The point here is that if Rosetta uses this compression, users shouldn't just jump to the latest dev client until testers have worked out the bugs. I am a BOINC alpha tester and will soon find out if this is a problem. LOL, I still have three 4.81s before I get on with the 4.82s.
ID: 11355 · Rating: 0
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 11363 - Posted: 25 Feb 2006, 2:50:02 UTC

Ask them what level of compression they're using for these transfers.. since Gzip allows you to specify a range of compression abilities ranging from Fast to Best. For large files, one hopes they're using the highest compression possible.

`--fast'
`--best'
`-n'
Regulate the speed of compression using the specified digit n, where `-1' or `--fast' indicates the fastest compression method (less compression) and `--best' or `-9' indicates the slowest compression method (optimal compression). The default compression level is `-6' (that is, biased towards high compression at expense of speed).


from http://www.math.utah.edu/docs/info/gzip_4.html#SEC7

For that matter, is Rosetta using -9/--best with the zlib compression it currently uses?
ID: 11363 · Rating: 0
alvin
Joined: 19 Jul 15
Posts: 5
Credit: 6,550,555
RAC: 0
Message 78722 - Posted: 8 Sep 2015, 2:22:28 UTC

I'm currently running this project and it's all fine except for one thing:
the amount of data downloaded.
Here is my monthly report:
address        download          upload            total
bakerlab.org   24.0 GB (5.4 %)   6.00 GB (6.7 %)   30.0 GB (5.6 %)

It's strange, as I have the opposite issue with other projects: they have a huge download:upload ratio of 1:5 or more.

The issue is the amount of traffic: could I ask you to pack results on the client side if possible?

Could compressing data be an option in settings?

I suppose that years ago no one cared about these amounts, but why is the imbalance between incoming and outgoing data so large?
Anyway, I think some action on either the project side or the BOINC side as a whole could be taken to restore the balance and minimise traffic.
ID: 78722 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 8,235
Message 78725 - Posted: 8 Sep 2015, 4:05:05 UTC - in response to Message 78722.  

I'm currently running this project and it's all fine except for one thing:
the amount of data downloaded.
Here is my monthly report:
address        download          upload            total
bakerlab.org   24.0 GB (5.4 %)   6.00 GB (6.7 %)   30.0 GB (5.6 %)

It's strange, as I have the opposite issue with other projects: they have a huge download:upload ratio of 1:5 or more.

The issue is the amount of traffic: could I ask you to pack results on the client side if possible?

Could compressing data be an option in settings?

I suppose that years ago no one cared about these amounts, but why is the imbalance between incoming and outgoing data so large?
Anyway, I think some action on either the project side or the BOINC side as a whole could be taken to restore the balance and minimise traffic.

I think all tasks are already packed.

I notice you keep a 7 day buffer, which is much bigger than necessary. I get away with just 2 days quite comfortably. It's rare to need anything more.

But your biggest problem is that you use only a 1 hour runtime, for each of your 32 PCs! 500 or more tasks each makes 16,000! There's your problem!

First, cut your buffer down to 2 days in BOINC under Computing Preferences - or whatever you're comfortable with. Leave it for 5 days to let your buffers run down, then go online and increase your run time for each machine. But do this slowly otherwise tasks will miss their deadline. So just increase from 1 hour to 2 hours at first and leave it a few days again before increasing to 4 hours.

Reducing your buffers will mean you'll only have uploads for 5 days, no downloads at all. Doubling your runtime to 2 hrs will halve your previous volume of downloads while (I think) not changing the upload size (or by very little if it does increase). Doubling to 4hrs will halve downloads again.

It's up to you if you decide to increase to 6hr runtimes, which is the default. If you do, your downloads will reduce again in proportion.
ID: 78725 · Rating: 0
alvin
Joined: 19 Jul 15
Posts: 5
Credit: 6,550,555
RAC: 0
Message 78726 - Posted: 8 Sep 2015, 4:14:41 UTC - in response to Message 78725.  
Last modified: 8 Sep 2015, 4:18:49 UTC

Thanks man.
So is the CPU run time equivalent to the portion of a task that gets executed?
Does it slice tasks into portions, so that when I have more run time it just crunches one task, or portions of it, for longer? Am I correct?

------
Those 7 or 10 days of tasks came about after my fight to get tasks for projects when they claim "not a priority project" etc.
Let's see.
ID: 78726 · Rating: 0
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 78728 - Posted: 8 Sep 2015, 5:28:17 UTC

Basically, to properly query the energy space of a structure, many decoys of said structure need to be simulated - a longer runtime means it will simulate more decoys before reporting work to/pestering for work from the server.

It's...
a) more efficient on your end as there's less time spent doing disk I/O switching between models
b) more efficient for the project servers as they can bulk load in results/create bigger bulk work faster
c) will use less bandwidth as more resources are shared between decoy runs as the target models don't change as frequently
ID: 78728 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 78736 - Posted: 8 Sep 2015, 15:50:22 UTC
Last modified: 8 Sep 2015, 15:59:00 UTC

The bandwidth will not vary one-to-one with runtime, because sometimes you get several tasks that use the same underlying database. In that sense, having a large number of tasks improves your odds of already having a similar task on deck somewhere. But I agree with the suggestions (and actually just gave the same advice in response to IMs from Costa).

With that many machines, a night-and-day difference in download bandwidth can be achieved by using a caching proxy server. This gives the same effect as described above, where already having another task from the same batch of work avoids a large DB download, but leverages it across all of the hosts using the proxy rather than just within a single host.

The project also changed application levels recently, so without a caching proxy each host had to download its own copy of the new executables and libraries.
Rosetta Moderator: Mod.Sense
ID: 78736 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 8,235
Message 78742 - Posted: 9 Sep 2015, 2:18:39 UTC - in response to Message 78726.  
Last modified: 9 Sep 2015, 2:21:34 UTC

Thanks man.
So is the CPU run time equivalent to the portion of a task that gets executed?
Does it slice tasks into portions, so that when I have more run time it just crunches one task, or portions of it, for longer? Am I correct?

What the others said is right. I looked at one of your tasks on one machine and it reported it completed 5 "decoys" in 1 hour. If you increased to 2 hours it would run 10 and you get double the credit. I think the maximum allowed in a task is 99. The number of "decoys" varies a lot, but obviously the default 6 hour runs manage it fine.
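
As a rough sketch of that scaling, assuming decoys complete at a steady rate and taking the 99-decoy cap as the guess it is (the function name and the cap constant are assumptions for illustration, not values taken from the Rosetta source):

```c
/* Upper limit on decoys per task; 99 is the figure guessed in this post,
 * not a confirmed project setting. */
#define MAX_DECOYS_PER_TASK 99

/* Estimated decoys in one task, assuming a steady decoy rate. */
int estimated_decoys(double decoys_per_hour, double runtime_hours) {
    int n = (int)(decoys_per_hour * runtime_hours);
    return n > MAX_DECOYS_PER_TASK ? MAX_DECOYS_PER_TASK : n;
}
```

With the 5 decoys per hour seen on that machine, estimated_decoys(5, 1) gives 5, estimated_decoys(5, 2) gives 10, and the default 6-hour run gives 30, well under the cap.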

Those 7 or 10 days of tasks came about after my fight to get tasks for projects when they claim "not a priority project" etc.
Let's see.

I think you run a lot of projects. When you do, BOINC goes a bit weird. Increasing the number of days makes things worse, so I read, so cutting down to 2 days (or less) will help. I think the default is actually 0.25 days so you could cut it down even more if you like. If ever tasks dry up on Rosetta (happens only once every 6 months or so) you have plenty of other project tasks to take up the slack.

It's all a learning curve. No harm done. Thanks for committing so many machines to Rosetta!
ID: 78742 · Rating: 0
alvin
Joined: 19 Jul 15
Posts: 5
Credit: 6,550,555
RAC: 0
Message 78743 - Posted: 9 Sep 2015, 2:53:09 UTC
Last modified: 9 Sep 2015, 2:55:15 UTC

I started with LHC, but when it ran out of tasks I had to get something else, and then it all built up.
Also, some GPU-based projects' tasks are longer than 5 days on their own, so it's a huge mess and mix.
Finally, BOINC really plays up with different combinations of projects, not getting tasks even when they are available, for whatever reason (Berkeley support wasn't helpful with that), so my goal is to have crunchers performing instead of idling. Let's see.
ID: 78743 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org