A copyrighted ebook (PDF) was purchased and was, without
authorization, made available on the Internet for public access and
free downloading. An expert opinion is needed on the most realistic
and appropriate method for determining the number of downloads from
the available detailed and not-detailed log files. The opinion should
include a brief justification for the recommended method.
Following is a short description of the case.
The e-book is a PDF file with a size of 155,780 bytes. Clicking on the
applicable link on a search engine page automatically starts a
download. A hit is counted as a download when its size is 155,780
bytes. Hits with a different size are counted as hits from search
engine spiders.
There are two different types of traffic log files: 1) Not-detailed
log files showing only the total number of all hits (without
distinguishing downloads from spider hits). 2) Detailed log files
showing the total number of downloads and the total number of hits
from search engine spiders.
Not-detailed logs are available for a time period of 21 weeks.
Detailed logs are available for a time period of 8 weeks.
Question:
What is the best approach for determining the number of downloads for
the 21 weeks covered by the not-detailed log files? Is there another,
more realistic approach than the two described below?
1) Based on data from the 8 weeks of detailed logs, calculate the
weekly average of hits from search engine spiders. Multiply this
weekly average by the number of weeks (21) to estimate the total hits
from spiders for the 21 weeks. Then, for the 21 weeks: total hits
minus estimated hits from spiders = number of downloads.
Note: This approach is based on the assumption that the frequency of
visits from search engine spiders is pre-scheduled by the search
engines, is roughly constant, and does not depend directly on
fluctuations in other traffic to the PDF file. Larger fluctuations in
the number of hits are caused by an increase in visits from the
public, and not by spiders.
2) Based on data from the 8 weeks of detailed logs, calculate the
percentage of all hits that are downloads. Apply this percentage to
the total number of hits during the 21 weeks in order to calculate the
number of downloads.
Note: This approach is based on the assumption that the frequency of
visits from search engine spiders depends directly on other traffic to
the PDF file. In other words, an increase or decrease in hits from the
public (downloads) would cause the same percentage increase or
decrease in hits from search engine spiders.
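
To make the difference between the two approaches concrete, here is a
minimal Python sketch of both calculations. The function names and all
hit counts in it are hypothetical placeholders chosen only for
illustration; they are not taken from the actual log files.

```python
def downloads_by_spider_average(total_hits, spider_hits_detailed,
                                detailed_weeks, total_weeks):
    """Approach 1: assume a constant weekly number of spider hits."""
    weekly_spider_avg = spider_hits_detailed / detailed_weeks
    estimated_spider_hits = weekly_spider_avg * total_weeks
    return total_hits - estimated_spider_hits


def downloads_by_share(total_hits, downloads_detailed, spider_hits_detailed):
    """Approach 2: assume downloads are a constant share of all hits."""
    detailed_total = downloads_detailed + spider_hits_detailed
    download_share = downloads_detailed / detailed_total
    return total_hits * download_share


# Hypothetical figures, NOT from the real logs.
downloads_8_weeks = 600      # downloads counted in the 8 weeks of detailed logs
spider_hits_8_weeks = 200    # spider hits counted in the same 8 weeks
total_hits_21_weeks = 2000   # all hits in the 21 weeks of not-detailed logs

method_1 = downloads_by_spider_average(
    total_hits_21_weeks, spider_hits_8_weeks, detailed_weeks=8, total_weeks=21)
method_2 = downloads_by_share(
    total_hits_21_weeks, downloads_8_weeks, spider_hits_8_weeks)

print(f"Approach 1 (constant spider rate): {method_1:.0f} downloads")
print(f"Approach 2 (constant download share): {method_2:.0f} downloads")
```

With these placeholder numbers the two approaches give 1475 and 1500
downloads respectively. The gap between them grows as the average
weekly traffic in the 21 weeks departs from the average weekly traffic
in the 8 detailed weeks.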
For more details, please see
http://www.rcglobal.com/ansto-copyrightinfringement.htm