Q: How fast can files be distributed over the internet? ( Answered 5 out of 5 stars,   2 Comments )
Question  
Subject: How fast can files be distributed over the internet?
Category: Computers > Internet
Asked by: mjk234-ga
List Price: $200.00
Posted: 26 Sep 2005 13:29 PDT
Expires: 26 Oct 2005 13:29 PDT
Question ID: 572903
I was just curious, once I put up a file to be shared on the internet,
how fast can it get shared and on average how many people can access
it if the file is encouraged to be shared and doesn't break any
copyright law?  For instance, if I wanted to put an MP3 of my band up
on the web, how many people can I expect to download it within a
period of a day, or a week?

What about for larger files?  Like, say, I want to upload an hour long
video to be shared freely on the internet?  How many people would be
able to share it within the span of a day, week, and month if I was
using a program like Bittorrent?

I know some of these answers also depend on visibility, so what are
the answers for files with no marketing other than word of mouth and
files that are marketed using website ads, PPC advertising, etc.? 
Kind of like best- and worst-case scenarios in terms of visibility of
the file?

Also, could you list resources and statistics of file distribution and
person-to-person file sharing over the internet?  As in what files are
most popular (like music files, or program files, or video files), how
many people are sharing the files, how fast the files are shared, what
programs are used to share the files, etc?
Answer  
Subject: Re: How fast can files be distributed over the internet?
Answered By: welte-ga on 02 Oct 2005 10:58 PDT
Rated:5 out of 5 stars
 
Hi mjk234, and thanks for your interesting question!  As mentioned in
the Comments, your question spans the entire range of potential file
distribution speeds and sizes over multiple peer-to-peer (P2P)
networks with varying degrees of advertising and visibility.  The
answer to your question is unlikely to be a simple number or even
range of numbers.  To reasonably address your question, I turned to
scholarly research on the topic of P2P distribution networks (also
known as overlay networks among computer scientists).  Here's what I
found...

=================
BitTorrent Distribution
=================

One approach is to simply sample an entire P2P distribution network
and gather statistical information.  This will include a statistically
representative sample of file sizes, visibility, people sharing the
file, etc.  A group from the University of Oslo examined the use of
BitTorrent as a potential Content Distribution Network (CDN):

Karl-André Skevik, Vera Goebel, Thomas Plagemann.  Analysis of
BitTorrent and its use for the Design of a P2P based Streaming
Protocol for a Hybrid CDN.  Department of Informatics, University of
Oslo.
http://www.ifi.uio.no/english/research/groups/dmms/papers/129.pdf

In their research, this group used a modified version of the
BitTorrent client called TorrentSniff to make measurements of the
network.  Using this tool, the group was able to look at multiple
BitTorrent sessions involving multiple host operating systems across
the globe.  Figure 1 (on page 6) gives a plot of seedtime (hours) vs.
average client download speed (Mb/sec).  This is particularly relevant
to the portion of your question regarding the amount of time to
distribute video content (or any other content) via the BitTorrent
network.  The interpretation of the graph is as follows: The
BitTorrent network functions by a host uploading fragments of a file
(seeding) to other hosts.  If all of these clients have slow download
speeds (low Mb/Sec), then it will take a longer time to seed a file of
a given size, and vice versa.  As this graph was generated from real
data within the BitTorrent network, it inherently includes factors
such as when a client quickly downloads a file from a host (who's
seeding it) and then goes offline (and therefore fails to seed the
file to others) and so on.

Figure 2 of this article is also interesting, and gets at some of the
factors contributing to the variability in the rate of file
distribution. This measurement simply looked at the number of hosts on
the network, how many were seeding a given file, how many had
firewalls in place, etc., over a period of 35 days.  Interestingly,
you can see a peak (nearly a factor of 2 in total hosts) at day #7,
which corresponds to the Easter holiday when more people were
presumably at home using their computers.

The effect of host characteristics on download speed can be complex. 
For example, take a look at Figure 3.  There was approximately a
factor of 2 difference between open vs. firewalled hosts.  Also (see
Figure 5), open hosts received more data than those behind firewalls,
helping to increase the rate of file distribution.  These measurements
are in the aggregate - see Figure 6 for the per host speed
differences.  Interestingly, in spite of speed differences, there was
little difference in the percent of a given file that was completed at
the time of disconnection from the network (see Figure 7).

The bottom line...  The authors compile the data and give average download rates:

"The average download speed in the observed network lies at roughly
0.2 Mbitps, 0.3 Mbitps for open hosts. The average speed was seen to
vary between BitTorrent sessions measured. [21] found average speeds in
the range from 0.5 Mbitps to 0.6 Mbitps, for a similar session. Take
into consideration that the maximum download rate for Digital
Subscriber Line subscriptions are often between 0.5 Mbitps and
2 Mbitps, and the performance achieved with P2P based networking is
evidently promising with regards to streaming, even without any
special support for this."

Mbitps = Megabits/sec.  1 Megabit/sec = 125 Kilobytes/sec (KB/sec), or
128 KB/sec if a Megabit is counted as 1,024 Kilobits.
Here's a conversion tool:
http://www.matisse.net/bitcalc/
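
To put the quoted rates into more familiar units, here is a minimal
Python sketch of the megabit-to-kilobyte conversion.  The function name
and the `kilo` parameter are hypothetical, chosen only for illustration,
and the choice between 1,000 and 1,024 is a units convention rather
than anything taken from the paper.

```python
BITS_PER_BYTE = 8

def mbitps_to_kbytes_per_sec(mbitps, kilo=1000):
    """Convert megabits/sec to kilobytes/sec.

    kilo=1000 treats 1 Megabit as 1,000 Kilobits (decimal prefixes).
    kilo=1024 treats 1 Megabit as 1,024 Kilobits, which is the
    convention that yields 128 KB/sec for 1 Mbitps.
    """
    kbits_per_sec = mbitps * kilo
    return kbits_per_sec / BITS_PER_BYTE

# Rates quoted from the Oslo paper (0.2 to 0.6 Mbitps) plus the 2 Mbitps DSL ceiling.
for rate in (0.2, 0.3, 0.5, 0.6, 2.0):
    print(f"{rate} Mbitps = {mbitps_to_kbytes_per_sec(rate):.1f} KB/sec")
```

Under the decimal convention, the 0.2 Mbitps average works out to about
25 KB/sec.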

The remainder of this article discusses a proposed network system for
streaming content using a P2P paradigm, which is unlikely to be
immediately relevant to your question.

________________

A second study from The Netherlands looked at BitTorrent performance as well:

J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips.  A Measurement
Study of the BitTorrent Peer-to-Peer File-Sharing System.  Parallel and
Distributed Systems group, Delft University of Technology, The
Netherlands.  May 15, 2004.
http://www.isa.ewi.tudelft.nl/~pouwelse/bittorrent_measurements.pdf
http://iptps05.cs.cornell.edu/PDFs/CameraReady_202.pdf
http://www.theregister.co.uk/2004/12/18/bittorrent_measurements_analysis/

The authors conclude:

"BitTorrent is the indisputable leader in download performance. Its
lack of searching functionality is compensated by an advanced download
distribution protocol that leaves the competitors far behind."

The article goes on to discuss the means by which .torrent files are
seeded to the network and how Suprnova.org (NOT .com) functions.
Figure 6 (on page 18) shows the distribution of downloads vs. average
download bandwidth (kB/s) and is roughly in agreement with the results
of the paper cited above:

"It turns out that 90% of the peers have a download speed below
65KB/s; the average download speed of 30KB/s allows peers to fetch
even large files in a day."
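
To get a feel for what those speeds mean for your own files, here is a
rough Python sketch of the arithmetic.  The two file sizes (a 5 MB MP3
and a 700 MB hour-long video) are assumptions for illustration only,
not figures from the paper, and the calculation ignores protocol
overhead and speed fluctuations.

```python
def download_hours(size_mb, speed_kb_per_sec):
    """Hours needed to fetch size_mb megabytes at a steady speed (1 MB = 1,000 KB)."""
    seconds = size_mb * 1000 / speed_kb_per_sec
    return seconds / 3600

# Hypothetical file sizes, chosen only for illustration.
files = {"MP3 track (5 MB)": 5, "hour-long video (700 MB)": 700}

for name, size_mb in files.items():
    # 30 KB/s is the quoted average; 90% of peers were below 65 KB/s.
    for speed in (30, 65):
        hours = download_hours(size_mb, speed)
        print(f"{name} at {speed} KB/s: {hours:.2f} hours")
```

At the 30KB/s average, the assumed 700 MB video takes roughly six and a
half hours, which is consistent with the authors' remark that large
files can be fetched within a day.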

This article also looked at content lifetime (see Figures 7-8 on pages 19-20):

"Figure 8 shows that seeds with a high uptime are rare. Only 9,219 out
of 53,883 peers (17%) have an uptime longer than one hour after they
finished downloading."

This implies that people may download your file(s), but they are
unlikely to remain online long enough to share the file with others.
Don't be discouraged, though: the article also looked at the lifetime
of a file vs. the number of seeds and found that even files with a
single seed can have long lifetimes (see Fig. 7).
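
As a quick check on the uptime figures quoted above, and to scale them
to a file of your own, here is a small Python sketch.  The peer counts
come from the quote; the download count of 1,000 is purely an assumed
number for illustration, and there is no guarantee your downloaders
would behave like the peers in this sample.

```python
seeds_over_one_hour = 9_219  # peers with uptime > 1 hour after finishing (from the quote)
peers_observed = 53_883      # total peers in the sample (from the quote)

fraction = seeds_over_one_hour / peers_observed
print(f"{fraction:.1%} of peers kept seeding for more than an hour")

# Hypothetical extrapolation: assume 1,000 people download your file
# and behave like the peers measured in the study.
n_downloads = 1_000
expected_long_seeders = round(n_downloads * fraction)
print(f"about {expected_long_seeders} of {n_downloads} downloaders would keep seeding")
```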

This and other articles discuss pollution levels, which relate to how
many corrupt or damaged files are present in the system.  This is
unlikely to affect you (except as it affects the entire bandwidth of
the network), since your files are unlikely to be targets for those
posting corrupt files.  An exception would be the automatic scripts
running on the network that generate files with names corresponding to
the search string.  These are only a problem if a user has their
computer set to automatically download matches to their search string,
which will propagate these rogue files.

The group also mentions the issue of popularity as it affects file distribution:

"Our popularity measurements show that the number of downloads in
BitTorrent/Suprnova is strongly influenced by the availability of the
central components. We concluded that the lack of decentralization in
BitTorrent/Suprnova is the cause of the availability problems, and we
proposed an improvement to the system by completely decentralizing the
functionality of the central components across the peers in order to
solve the availability problems."

________________


In terms of types and numbers of files shared, Figure 1 gives a good
idea of the answer to this question and shows the variability of file
downloads (by type and for the total) on this network.

Cache Logic has also done research in this area, finding that 11.34%
of files are audio, 61.44% video, and 27.22% other, across 4 different
P2P networks (August, 2005):
http://www.cachelogic.com/research/filetypestudy.php

You can view the full slide show of this presentation here:
http://www.cachelogic.com/research/ft_slide1.php

You can also read the related press release:
http://www.cachelogic.com/news/pr090805.php


This group has also done analysis of the worldwide picture of P2P networks:

http://www.cachelogic.com/research/p2p2004.php
http://www.cachelogic.com/research/slide1.php
http://www.cachelogic.com/news/pr040715.php

Slide #13 gives the relative popularity of 4 different P2P networks
around the world:
http://www.cachelogic.com/research/slide13.php

Here is more, similar data from the same group:
http://www.cachelogic.com/p2p/p2ptraffic.php

The following is an additional batch of research giving a worldwide
overview of P2P networks in 2005.  It includes
breakdowns by country of the most popular networks, as well as the
distribution of P2P network traffic (Slides 8-9).  This may aid you in
determining which P2P networks to target with your content, depending
on which international regions you want to target.  There are also
multiple slides breaking down the file formats that predominate, which
should help you decide what audio or video encoding you would like to
employ.

http://www.cachelogic.com/research/2005_slide01.php
http://www.cachelogic.com/research/p2p2005.php


Another report relevant to the content of P2P networks comes from the
US Government General Accounting Office (GAO), investigating the
prevalence of child pornography on P2P networks.  In the course of
their investigation, they found that 34% of images were adult
pornography, 42% were child pornography, and 24% were nonpornographic.
 See Figure 1 on pg. 12.  Sad and disturbing.

http://www.gao.gov/new.items/d03351.pdf

================
Other P2P networks
================

An influential study (cited by 661 other papers according to Google
Scholar) was done by a group at the University of Washington
Department of Computer Science & Engineering:

Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble.  A Measurement
Study of Peer-to-Peer File Sharing Systems.  Department of Computer
Science & Engineering, University of Washington.
http://www.cs.ucsb.edu/~almeroth/classes/F02.276/papers/p2p-measure.pdf

This article describes the nature of hosts participating in the
Napster and Gnutella networks.  Obviously the data for the Napster
network is now irrelevant, since Napster is no longer a free, open
network.  The data for Gnutella, however, likely remains useful.

The right portion of Figure 4 (page 7) describes the distribution of
bandwidths among Gnutella clients.  30% of Gnutella clients had very
high bandwidth connections (at least 3Mbps).  The other factor that
plays a big role in the speed of file distribution is host latency,
which can be defined as the time it takes for a host to respond to a
request for data.  A host with a high bandwidth connection and low
latency (and long uptime, with many files to share) would be
considered "server-like," as opposed to one with high latency and low
bandwidth, which would be more like a client.  The more server-like
hosts, the faster the network will be.  Figures 5-7 describe the
distribution of these characteristics throughout the Gnutella network.

________________

A second recent study looked at the performance of P2P networks as
they might be used by mobile users (e.g. using GPRS or other mobile
wireless networks):

Tobias Hoßfeld, Kurt Tutschku, Frank-Uwe Andersen, Hermann deMeer,
Jens O. Oberender.  Simulative Performance Evaluation of a Mobile
Peer-to-Peer File-Sharing System.  Department of Distributed Systems,
University of Würzburg, Germany.
http://www.fmi.uni-passau.de/~oberende/publications/performanceevalmobile.pdf

Figure 3 shows the download volume for a sample 3MB file.  Their
overall conclusion is that GPRS mobile technology is not yet at a
state where it is realistic to use for P2P file sharing.

________________

The KaZaa Network is perhaps the most popular P2P network.  

Nathaniel Leibowitz, Matei Ripeanu, Adam Wierzbicki.  Deconstructing
the Kazaa Network. Expand Networks; Computer Science Department, The
University of Chicago; Polish-Japanese Institute for Information
Technology.
http://www.cse.iitb.ac.in/~joytechie/seminar/papers/casestudies/Leibowitz-DeconstructingKazaa.pdf

This study found that, perhaps not surprisingly, a large fraction of
the bandwidth was due to traffic in a relatively small number of
popular files:

"...it becomes obvious that about 30% of all download cycles go to the
1% most popular files..."

"The behavior we notice in the previous graph is much more pronounced:
we observe that as little as 2500 files (a mere 0.8% of all detected
files) account for as much as 80% of the traffic. "
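
To illustrate how this kind of concentration is measured, here is a
minimal Python sketch.  The popularity list is synthetic (a Zipf-like
curve I made up as a stand-in), NOT the Kazaa data, and the function
name is hypothetical.  The last lines simply work out the catalogue
size implied by the second quote.

```python
def top_share(download_counts, top_fraction):
    """Fraction of all downloads that go to the most popular top_fraction of files."""
    ranked = sorted(download_counts, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Synthetic stand-in: the file at rank r gets about 1/r of the top file's downloads.
counts = [1_000_000 // r for r in range(1, 10_001)]
print(f"top 1% of files: {top_share(counts, 0.01):.0%} of downloads")

# From the quote: 2,500 files are said to be 0.8% of all detected files.
implied_total_files = 2500 / 0.008
print(f"implied number of detected files: about {implied_total_files:,.0f}")
```

Since the 0.8% in the quote is a rounded figure, the implied total of
roughly 312,500 files is only approximate.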

In Figures 9 and 10, the authors show the number of new files that enter
the network (by hour and by day).  Their data does not go far enough
to establish a steady state value, but gives a rough idea of how many
new files your files would be up against:

"During the period of observation, the number of new unique files did
not decrease to zero and did not stabilize at a constant level.
However, it is reasonable to suppose that this value would stabilize
during a longer observation period. We suggest an interesting
explanation for the steady state value: it indicates the rate at which
new files enter the network, in other words the rate at which new
songs, movies games and the like are created."

Also relevant to your question - This paper found that there are two
populations of popular files:

"Summarizing these two experiments, we obtain that 15% of the highly
popular files, remain popular throughout the experiment, while the
rest are popular shorter time intervals. This indicates that the
popular files are composed of two sets: a set of persistently popular
files and a set of transiently popular files whose popularity is short
lived."

This means that if you can advertise your files to the point that they
become popular, you likely should not rest on your laurels, riding the
popularity wave, or you may end up in the "transiently popular" group
and fade to obscurity.

________________

You may have heard that eDonkey recently threw in the towel as a P2P
developer.  Here is a story with congressional testimony from Sam
Yagan, President of MetaMachine, Inc. in which he discusses his
experiences with independent bands using P2P networks for
distribution:

"As a graduate student at Stanford University, I conducted a study on
possible new business models for the music industry given the advent
of file sharing and in particular looked at independent bands and
their use of P2P as a distribution mechanism. Without exception, every
independent band I interviewed begged for increased distribution -
gladly willing to distribute promotional music for free in the hopes
of gaining fans and widespread popularity. In fact, after
file-sharing, the technique most cited by independent bands for
acquiring fans was taping signs to lampposts. I find it difficult to
refute the fact that many independent bands thrive on the distribution
offered by P2P application, even in its current 'open' form."

http://p2pnet.net/story/6408
http://judiciary.senate.gov/testimony.cfm?id=1624&wit_id=4689

==========
Popular files
==========

Finding the most popular files on a P2P network proved to be more
difficult than I anticipated.  Fortunately, a group at UMass studied
just this issue for files in 2002 on the Gnutella network.  If you're
interested, the tables in their report give information on the number
of users seen, number of files, number of transfers, etc., for this
period.  At the end of the article, Table 4 (pg. 10) gives a list of
the top 60 files (of any type) along with their filename extension and
number of users sharing each given file.  This table includes all
sorts of small files that are not typically sought out by users (e.g.,
"divider.giv", etc.).  Perhaps of more interest is Table 5 (pg. 11),
which shows the 60 most popular MP3 files, again along with the
numbers of users sharing them.  Table 6 (pg. 12) lists the most
popular video files.  Some of the more graphic file names have been
replaced by [pornographic name].  Fascinating stuff!

Jacky Chu, Kevin Labonte, Brian Neil Levine.  Availability and
Popularity Measurements of Peer-to-Peer File Systems.  Department of
Computer Science, University of Massachusetts.
http://prisms.cs.umass.edu/brian/pubs/chu.labonte.p2pjournal.pdf

====================================================

So, to summarize, there is a wide range of possible distribution speeds that
depend on many factors, including the visibility of your file(s),
choice of P2P network, host issues and bottlenecks, size of the files,
etc.  Details of how these factors affect the rate of distribution are
discussed above and further details are available in the studies cited
above.

Likely the largest factor initially will be visibility.  Band
promotion and advertising is probably a topic for another question,
but some basic options are listed in the "Other Resources" section
below.  I highly recommend Podcasting to spread the word of your band
and your music and video files.  Another way to more efficiently
promote the band via word-of-mouth is to chat with P2P users (most P2P
programs allow you to chat publicly or privately with other current
users).  Asking these individuals to try out your music and spread the
word if they like what they hear can help kick start the process. 
Most people would probably also really enjoy chatting online with
members of a new band, particularly if they enjoy your music.  Posting
your file information to Usenet newsgroups is also a means by which
you may attract new listeners without incurring advertising costs.

Another great outlet is CDBaby.com.  My wife actually uses this site
for distribution of independent CDs at minimal cost.
http://cdbaby.com/


I hope this information will be useful.  I wish you and your band all
the best in your work and promotion.  Feel free to request any
clarification.

Best,
            -welte-ga

====================================================


=============
Other Resources
=============

You may gain visibility by publishing a podcast:

http://www.apple.com/podcasting/
http://en.wikipedia.org/wiki/Podcasting

http://www.ipodder.org/
http://www.podcast.net/addpodcast

Here's an article from Business Week describing the current
competition between large media outlets and indies in the Podcast
world:

http://www.businessweek.com/magazine/content/05_33/b3947062_mz011.htm


Here's an article from The Economist discussing the present and future
of music distribution, comparing big music vs. indies with an overview
of P2P and podcasting:

http://msl1.mit.edu/furdlog/docs/2004-10-28_economist_music_industry.pdf

________________

This article describes the various technologies employed in the design
of P2P networks.  It may be interesting for you if you want to know
more about, say, how various networks such as KaZaa or Gnutella are
constructed and function.  It's intermediate in its technical level.

Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of
peer-to-peer content distribution technologies. ACM Computing Surveys,
36(4):335-371, December 2004. (doi:10.1145/1041680.1041681).

Draft version is freely available:
http://www.spinellis.gr/pubs/jrnl/2004-ACMCS-p2p/html/AS04.pdf

________________

For more information on the most popular files, programs used for
sharing files, etc., I recommend the following sites:

For general P2P news:

Slyck News
http://www.slyck.com/

One of the larger P2P portals:
http://search.suprnova.org/


________________

P2P clients:

CNet's Download.com recently instituted a policy of not allowing
software that contains spyware, so this is a good site to visit for
safer software:

http://www.download.com/
http://www.download.com/3120-20_4-0.html?tag=srch&qt=p2p&tg=dl-20&search.x=0&search.y=0&search=+Go%21



===========
Search strategies:

Using scholar.google.com and Google.com:

file distribution speeds
file distribution speeds (P2P OR "peer to peer")
file distribution speeds (P2P OR "peer to peer") "new file"
file distribution speeds (P2P OR "peer to peer") "new content"
"new content" (P2P OR "peer to peer") dissemination ~rate
"new (content OR ~file)" (P2P OR "peer to peer") dissemination ~rate
file ranking (p2p OR peer-to-peer) audio video ~porn
mjk234-ga rated this answer:5 out of 5 stars
Excellent Job.

Comments  
Subject: Re: How fast can files be distributed over the internet?
From: searchingforananswer-ga on 26 Sep 2005 15:13 PDT
 
In my opinion, your scenario has far too many variables to get an
accurate response.  I've got my own music on several websites and
have some success at it... but..  as to what you can expect from doing
this..  who knows.  Foremost your success is going to depend on the
amount of traffic to the website.  You can have the best site in the
world..  but people need to know it exists among the billion others. 
Choosing to build a site somewhere that may get you lateral traffic is
a good idea..  there are several musician sites that do that..  one I
use is www.soundclicks.com...  you can also build for free at
www.freewebs.com and get some lateral traffic there to your site. 
Both are easy to build on.  Second...  even if someone downloads it,
will they make an effort to share it with anyone once they do.  Most
people don't make the effort..  at least not immediately.  They may
tell a friend... but will that friend visit your site?  Third, the
speed at which a song can be downloaded is determined by two
factors... the amount of bandwidth provided to you by the host
server...  and the speed with which the person connecting to the site
is running as well.  A website can be a very effective tool for
distribution...  but your primary concern should be the advertising of
the website.  I wouldn't recommend using a peer server unless you have
a recognizable name.  Most peer groups are dependent upon someone
searching you or your song out.  They won't know to look for you
unless you're someone they've heard of.  Good luck to you.
Subject: Re: How fast can files be distributed over the internet?
From: shahzadafzal-ga on 01 Oct 2005 03:29 PDT
 
I think it mostly depends on which site you are going to upload your
music or any other file to. But if you want your own site, make your
own web site; there are a lot of sites offering a free home page or
even a small web site: Geocities.com, freeservers.com, bravenet.com
and many others. Then the question is how people will know about your
web site, meaning how you are going to promote it. A new site never
becomes popular instantly; it takes time. Most of all, what matters is
what you are going to offer on your web site, or on the web site where
you put your data. At the start you must offer something
extraordinary, some type of opportunity for the viewers of the web
site. Of course a web site is a very good idea to promote your work.
But if you don't want this, then you have to join some group; there
are very popular groups on yahoogroups.com and msngroups.com, and
there are also forum web sites. You must join and then simply mail the
members of the group or post to the forum. In just one day I think
more than 10000 users will get instant access to your music or video,
and I think this is the best option. Thank you and best of luck.
