Q: Gnutella data ( No Answer,   0 Comments )
Question  
Subject: Gnutella data
Category: Science > Technology
Asked by: gcpo-ga
List Price: $20.00
Posted: 21 May 2004 05:30 PDT
Expires: 20 Jun 2004 05:30 PDT
Question ID: 349894
I am a graduate student at the University of South Florida.  For my
research I need the distribution of files in a "typical" p2p file-sharing
network.  Specifically, I seek recent data that contains the node_id and
list of shared files (by name, date, and size) for each node of a typical
p2p network.  This would include freeloading nodes that copy files but
share none (so their directory list would be an empty set).  Such a
typical p2p network should probably contain thousands of nodes (certainly,
ten nodes is not sufficient).  Knowledge of the underlying topology
(connections between nodes) would be a bonus, but is not really needed.
It is OK if the node_id and even the file names are obscured by a
one-way hash (e.g., an MD5 hash).
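
To make the requested record layout concrete, here is a minimal Python
sketch of one node's record with the node_id and file names replaced by
MD5 digests. The field names, the sample node address, and the helper
names are assumptions for illustration only; they do not come from any
existing trace format.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string (one-way, deterministic)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def anonymize_node(node_id: str, files: list) -> dict:
    """Hash the node_id and each file name; keep date and size as-is.

    `files` is a list of dicts with hypothetical keys "name", "date", "size".
    A free-loading node is simply a node whose `files` list is empty.
    """
    return {
        "node_id": md5_hex(node_id),
        "files": [
            {"name": md5_hex(f["name"]), "date": f["date"], "size": f["size"]}
            for f in files
        ],
    }

# Hypothetical sample data: one sharing node and one free-loading node.
sharer = anonymize_node(
    "10.0.0.1:6346",
    [{"name": "song.mp3", "date": "2004-05-01", "size": 4200000}],
)
freeloader = anonymize_node("10.0.0.2:6346", [])
```

Because the same input always yields the same digest, identical file
names on different nodes still match after hashing, which is what a
per-file distribution needs.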

Since this data will be used for to-be-published research, it must be
"open" and reproducible.  Best might be existing trace data already
collected and archived by a credible source.  Also good would be a tool
whereby I could collect such data myself.  My end goal is a probability
function Pr[node i contains file x] for a P2P network.
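
One simple way such a probability could be estimated from a trace is
sketched below in Python. It assumes all nodes are treated as
interchangeable, so Pr[node i contains file x] is approximated by the
fraction of observed nodes holding x. That is an assumption made for
illustration, not necessarily the model intended here, and the function
name and sample data are hypothetical.

```python
from collections import Counter

def file_probabilities(shares: dict) -> dict:
    """Estimate Pr[a node contains file x] as (nodes holding x) / (all nodes).

    `shares` maps node_id -> set of file identifiers. Free-loading nodes
    appear with an empty set and still count in the denominator.
    """
    n = len(shares)
    if n == 0:
        return {}
    # Count each file at most once per node.
    counts = Counter(f for files in shares.values() for f in set(files))
    return {f: c / n for f, c in counts.items()}

# Hypothetical trace: two sharing nodes and two free-loading nodes.
shares = {"n1": {"a", "b"}, "n2": {"a"}, "n3": set(), "n4": set()}
probs = file_probabilities(shares)
# probs["a"] is 2/4 and probs["b"] is 1/4
```

Leaving the free-loading nodes out of the trace would shrink the
denominator and inflate every estimate, which is why the empty
directory lists matter.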

I am willing to pay $20.00 and provide an acknowledgement in any
published works that use the data.

Request for Question Clarification by philip_lynx-ga on 23 May 2004 19:58 PDT
Dear gcpo,

While I have no raw data for you, there are some previous works in
this area, where the authors may be willing to share their collected
information (or the tools used for the collection, if you want to
create your own datasets). Let me point them out to you.

S. Saroiu, P. Gummadi, S. Gribble, "A Measurement Study of P2P File
Sharing Systems", University of Washington Technical Report
UW-CSE-01-06-02, July 2001.
http://www.cs.washington.edu/homes/gribble/papers/mmcn.pdf

Matei Ripeanu and Ian Foster, "Mapping the Gnutella Network:
Macroscopic Properties of Large-Scale Peer-to-Peer Systems", IPTPS02,
http://www.cs.rice.edu/Conferences/IPTPS02/128.pdf
Here, section 3.3 and further may be of most interest to you.

Eytan Adar, Bernardo A. Huberman, "Free Riding on Gnutella", 2000.
http://citeseer.ist.psu.edu/316990.html

If this information satisfies at least some of your needs, I can post
it as an answer. However, feel free to reduce the list price first...

Friendly greetings,

     Philip Lynx

Clarification of Question by gcpo-ga on 24 May 2004 05:19 PDT
Hello Philip,

Thank you for your interest and prompt reply. Indeed, the references
you have listed are similar to what I am asking for. I have Gribble's
data, but it only has the size of the data shared by each node in the
P2P network.
I am looking for more details, like the names of the files shared by
each node in the network.

Cheers ;>
Graciela

Request for Question Clarification by philip_lynx-ga on 24 May 2004 06:06 PDT
Hello Graciela,

I am sorry, but I can't help you further with your specific request.
For obvious reasons, most protocols prevent listing the files that a
node holds unless the user explicitly enables it. See e.g.
the eMule docs, or Gnutella v0.6 (
http://rfc-gnutella.sourceforge.net/developer/testing/ ). Thus you can
gather statistics about the number of files and the size of the data a
node shares, but not about specific filenames / content descriptors
(e.g. SHA-1 values). That is one of the reasons why, as far as I know,
there is no current research data of the kind you are looking for.

I can think of three ways for you to gather the data you want:

1) run a Gnutella ultrapeer or an eMule/eDonkey server and collect
information about your leaves / clients. (this should be quite feasible)
2) ask a software provider to add specific monitoring features to a
release, so that you can gather feedback (very unlikely)
3) watch the traffic your node forwards for others and/or issue random
3-letter search queries (e.g. mp3, avi, mpg, zip, ...) and sample what
kind of results you get.
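
Whichever collection route is used, the observations still have to be
grouped by node. Here is a minimal Python sketch of that step, assuming
each observed result has already been reduced to a (node_id, file name,
size) tuple. That tuple layout, the function name, and the sample data
are made up for illustration; no real Gnutella or eDonkey client API is
used.

```python
from collections import defaultdict

def build_partial_listings(hits) -> dict:
    """Group observed results into a per-node set of (name, size) pairs.

    `hits` is an iterable of (node_id, name, size) tuples. The same file
    seen in several results for one node is stored only once.
    """
    listings = defaultdict(set)
    for node_id, name, size in hits:
        listings[node_id].add((name, size))
    return dict(listings)

# Hypothetical sample of results gathered from several searches.
hits = [
    ("nodeA", "song.mp3", 4200000),
    ("nodeB", "clip.avi", 7000000),
    ("nodeA", "song.mp3", 4200000),  # duplicate result, stored once
    ("nodeA", "notes.zip", 15000),
]
listings = build_partial_listings(hits)
```

Listings built from sampled search results are only partial: they
contain just the files that matched some query, and nodes that share
nothing never appear at all.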

Sorry for the bad news, and good luck!

    Philip
Answer  
There is no answer at this time.

Comments  
There are no comments at this time.
