I am a graduate student at the University of South Florida. For my
research I need the distribution of files in a "typical" p2p file-sharing
network. Specifically, I seek recent data that contains the node_id and
list of shared files (by name, date, and size) for each node of a typical
p2p network. This would include freeloading nodes that copy files but
share none (so their directory list would be an empty set). Such a
typical p2p network should probably contain thousands of nodes (certainly,
ten nodes is not sufficient). Having knowledge of the underlying
topology (connections between nodes) would be a bonus, but is not really
needed. It is OK if the node_id and even the file names are made secret
by a unique one-way hash (e.g., an MD5 hash).
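A minimal Python sketch of the kind of anonymization described above. The function names, the optional salt, and the record layout are assumptions made for illustration; they are not taken from any existing trace format.

```python
import hashlib


def md5_hex(value, salt=""):
    """Return the hex MD5 digest of salt + value (one-way, deterministic)."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest()


def anonymize_listing(node_id, files, salt=""):
    """Hash a node ID and its file names; keep date and size in the clear.

    `files` is a list of (name, date, size) tuples (hypothetical layout).
    A freeloading node simply has an empty list.
    """
    return {
        "node": md5_hex(node_id, salt),
        "files": [(md5_hex(name, salt), date, size)
                  for name, date, size in files],
    }
```

Because the hash is deterministic, the same file name shared by two nodes maps to the same digest, so per-file statistics survive the anonymization.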
Since this data will be used for to-be-published research, it must be
"open" and reproducible. Best might be existing trace data already
collected and archived by a credible source. Also good would be a tool
whereby I could collect such data myself. My end goal is a probability
function for a P2P network for Pr[node i contains file x].
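As a rough illustration of that end goal, the Python sketch below computes two simple empirical quantities from such a trace. It assumes the trace is already loaded as a mapping from node_id to a set of file identifiers (a hypothetical layout), and it estimates only the probability that a randomly chosen node holds file x, which is a simplification of a true per-node Pr[node i contains file x].

```python
from collections import Counter


def file_popularity(trace):
    """Estimate Pr[a randomly chosen node contains file x] for each file x.

    `trace` maps node_id -> set of file identifiers; a freeloading node
    maps to an empty set.
    """
    n_nodes = len(trace)
    if n_nodes == 0:
        return {}
    # Count each file at most once per node.
    counts = Counter(f for files in trace.values() for f in set(files))
    return {f: c / n_nodes for f, c in counts.items()}


def freeloader_fraction(trace):
    """Fraction of nodes that share no files at all."""
    if not trace:
        return 0.0
    return sum(1 for files in trace.values() if not files) / len(trace)
```

A per-node model would additionally need something like the number of files each node shares; that refinement is left out here.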
I am willing to pay $20.00 and provide an acknowledgement in any
published works that use the data.
Request for Question Clarification by philip_lynx-ga on 23 May 2004 19:58 PDT
Dear gcpo,
While I have no raw data for you, there are some previous works in
this area, where the authors may be willing to share their collected
information (or the tools used for the collection, if you want to
create your own datasets). Let me point them out to you.
S. Saroiu, P. Gummadi, S. Gribble, "A Measurement Study of P2P File
Sharing Systems", University of Washington Technical Report
UW-CSE-01-06-02, July 2001.
http://www.cs.washington.edu/homes/gribble/papers/mmcn.pdf
Matei Ripeanu and Ian Foster, "Mapping the Gnutella Network:
Macroscopic Properties of Large-Scale Peer-to-Peer Systems", IPTPS02,
http://www.cs.rice.edu/Conferences/IPTPS02/128.pdf
Here, section 3.3 and further may be of most interest to you.
Eytan Adar, Bernardo A. Huberman, "Free Riding on Gnutella", 2000.
http://citeseer.ist.psu.edu/316990.html
If this information satisfies at least some of your needs, I can post
it as an answer. However, feel free to reduce the list price first...
Friendly greetings,
Philip Lynx
Clarification of Question by gcpo-ga on 24 May 2004 05:19 PDT
Hello Philip,
Thank you for your interest and prompt reply. Indeed, the references you
have listed are similar to what I am asking for. I have Gribble's data, but
it only has the size of the data shared by each node in the P2P
network.
I am looking for more details like the file names that are shared by
each node in the network.
Cheers ;>
Graciela
Request for Question Clarification by philip_lynx-ga on 24 May 2004 06:06 PDT
Hello Graciela,
I am sorry, but I can't help you further with your specific request.
For obvious reasons, most protocols prevent listing the files that a node
holds unless the user explicitly enables it. See e.g.
the eMule docs, or Gnutella v0.6 (
http://rfc-gnutella.sourceforge.net/developer/testing/ ). Thus you can
gather statistics about the number of files and the size of a node's
shared data, but not about specific filenames or content descriptors (e.g.
SHA-1 values). That is one of the reasons why, as far as I know, no
research data of the kind you are looking for currently exists.
I can think of three ways for you to gather the data you want:
1) run a Gnutella ultrapeer or an eMule/eDonkey server and collect
information about your leaves / clients (should be quite doable)
2) ask a software provider to add specific monitoring features to a
release, so that you can gather feedback (very unlikely)
3) watch the traffic your node forwards for others and/or issue random
3-letter search queries (e.g. mp3, avi, mpg, zip, ...) and sample what
kind of results you get.
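For option 3, the sampled results would need to be merged into per-node file lists. A minimal Python sketch, assuming each query hit has already been reduced to a (node_id, file_name) pair (how the hits are captured depends on the client and is not shown):

```python
from collections import defaultdict


def merge_hits(hits):
    """Group (node_id, file_name) pairs from query results by node.

    Duplicate hits for the same file on the same node are collapsed.
    """
    per_node = defaultdict(set)
    for node_id, file_name in hits:
        per_node[node_id].add(file_name)
    return dict(per_node)
```

Note that this only yields a lower bound on what each node shares: files that never matched a query, and nodes that share nothing, do not show up at all.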
Sorry for the bad news, and good luck!
Philip