Hi,
The answer to your first question is that most do roughly the same thing:
they group all requests from a 'user' made within a timeout period into
a single visit.
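As a rough illustration of that grouping, here is a minimal Python sketch. It assumes each request has already been reduced to a (user key, timestamp) pair, and it uses a 30-minute timeout only because that is Webalizer's documented default. The function name count_visits is my own, not taken from any of these packages.

```python
from datetime import datetime, timedelta

def count_visits(requests, timeout=timedelta(minutes=30)):
    """Count visits in an iterable of (user_key, timestamp) pairs.

    A request starts a new visit when that user has no earlier request,
    or when the gap since their previous request exceeds `timeout`.
    """
    last_seen = {}  # user_key -> timestamp of that user's previous request
    visits = 0
    for key, ts in sorted(requests, key=lambda r: r[1]):
        prev = last_seen.get(key)
        if prev is None or ts - prev > timeout:
            visits += 1
        last_seen[key] = ts
    return visits
```

For example, one user requesting at 12:00, 12:10 and 12:50 plus a second user at 12:05 gives 3 visits with the 30-minute timeout (the 40-minute gap splits the first user into two visits), but only 2 with a one-hour timeout.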
I only managed to find a few packages that go into much detail in
describing their methods; below are some of those descriptions.
Wusage:
A "visit" consists of one or more accesses made by the same visitor,
with no more than a certain time interval between accesses. The
maximum time interval is determined by the Max. Minutes Between Accesses
(trailtimeout) option.
The identity of the visitor is determined by combining the authorized
user name (when available), HTTP user-identifying "cookies", site (IP
address), operating system, and web browser identifying information in
order to produce the most unique "key" possible. Any such fields that
are not actually available are not used. In the simplest case, where
the log file does not contain any other user-identifying information,
only the site (IP address) of the visitor is used. When cookies are
present, they override all other factors.
When the maximum time interval has elapsed, the visit is considered to
be over, and the next access by that visitor begins a new visit.
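The Wusage key-building rule could be sketched like this. It is only my reading of the description above, not Wusage's actual code, and the function and parameter names are hypothetical.

```python
from typing import Optional

def visitor_key(ip: str,
                user: Optional[str] = None,
                cookie: Optional[str] = None,
                os_name: Optional[str] = None,
                browser: Optional[str] = None) -> tuple:
    """Build a visitor key in the spirit of the Wusage description."""
    # A cookie, when present, overrides all other factors.
    if cookie:
        return (("cookie", cookie),)
    # Otherwise start from the site (IP address), which is always available...
    parts = [("ip", ip)]
    # ...and add whichever other identifying fields the log actually contains.
    for name, value in (("user", user), ("os", os_name), ("browser", browser)):
        if value:  # fields missing from the log are simply not used
            parts.append((name, value))
    return tuple(parts)
```

In the simplest case, visitor_key("1.2.3.4") reduces to the IP address alone, matching the fallback the documentation describes.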
http://www.boutell.com/wusage/8.0/definitions.html
Webalizer:
Visits occur when some remote site makes a request for a page on
your server for the first time. As long as the same site keeps making
requests within a given timeout period, they will all be considered
part of the same Visit. If the site makes a request to your server,
and the length of time since the last request is greater than the
specified timeout period (default is 30 minutes), a new Visit is
started and counted, and the sequence repeats. Since only pages will
trigger a visit, remote sites that link to graphic and other non-page
URLs will not be counted in the visit totals, reducing the number
of false visits.
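The page-only rule can be sketched as follows. Two assumptions here: the set of extensions that count as "pages" is purely illustrative (Webalizer makes this configurable), and non-page hits are assumed neither to start nor to extend a visit. Visitors are keyed on IP address alone, following the "site" wording above; the names are hypothetical.

```python
import posixpath
from datetime import datetime, timedelta

# Illustrative assumption: which URL extensions count as "pages".
# An empty extension covers directory URLs such as "/".
PAGE_EXTENSIONS = {"", ".htm", ".html", ".cgi", ".php"}

def is_page(url: str) -> bool:
    path = url.split("?", 1)[0]  # ignore any query string
    ext = posixpath.splitext(path)[1].lower()
    return ext in PAGE_EXTENSIONS

def webalizer_style_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits in an iterable of (ip, timestamp, url) hits."""
    last_page = {}  # ip -> time of that site's most recent page request
    visits = 0
    for ip, ts, url in sorted(hits, key=lambda h: h[1]):
        if not is_page(url):
            continue  # images and other non-page URLs never trigger a visit
        prev = last_page.get(ip)
        if prev is None or ts - prev > timeout:
            visits += 1
        last_page[ip] = ts
    return visits
```

With this rule, a remote site that only ever hot-links an image contributes no visits at all, which is the "false visits" reduction the documentation mentions.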
http://www.webalizer.org/webalizer_help.html
NetTracker:
A visitor is a person viewing a Web site. If your Web site does not
use cookies or if the visitor does not have a cookie, a visitor is
defined as a unique combination of a user agent and a host name or IP
address. If your site uses cookies sent by the Sane Web Server
Plug-in, a visitor would be defined by the cookie transmitted by the
visitor's browser. NetTracker can also be configured to define
visitors based on their HTTP authenticated user name or a parsed
parameter.
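NetTracker's rule, as described, might look like the sketch below. The names are hypothetical, and I have assumed that when authenticated user names are enabled they take precedence over cookies, which the FAQ does not actually say. Parsed-parameter identification is left out.

```python
from typing import Optional

def nettracker_style_key(host_or_ip: str,
                         user_agent: str,
                         cookie: Optional[str] = None,
                         auth_user: Optional[str] = None,
                         use_auth_user: bool = False) -> tuple:
    """Identify a visitor roughly as the NetTracker FAQ describes."""
    # Optional configuration: identify visitors by HTTP authenticated user name
    # (assumed here to win over cookies when enabled).
    if use_auth_user and auth_user:
        return ("user", auth_user)
    # If the site sets tracking cookies and the browser sent one, use it.
    if cookie:
        return ("cookie", cookie)
    # Otherwise: unique combination of user agent and host name / IP address.
    return ("host+agent", host_or_ip, user_agent)
```

The host-plus-agent fallback is what separates two people behind the same proxy when they use different browsers; two people behind that proxy with identical browsers would still be merged into one visitor.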
http://www.sane.com/support/NetTracker/faq.html#visitview
Two of the three appear to build a unique user key from the available log
information (typically the IP address and user agent). I suspect that
others do the same but do not discuss it in much detail in their
documentation.
That also addresses your second question: a number of applications
generate a user key from the IP address and other header information.
This is a means of distinguishing individual users who visit through the
same proxy server, although it is obviously still not going to be perfect.
I suspect this approach is used in the majority of log analysis
software, although other programs don't document their methods as well.
Webalizer is a little unclear about its methods, but its use of the
term 'site' in the passage above suggests that its visitor counting
is based solely on the IP address.
As for your third question, I have not really found any analysis
software that claims to alter the actual log files in any way.
However, I did find this fairly basic Awk script, which generates a
simple breakdown of visitors, in a basic log form, from IIS logs. It
seems quite straightforward and may be a good starting point for a
simple script to do what you want:
http://alan-ng.net/scripts/visits.htm
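If what you want is a visit marker on each log line, one option that leaves the original logs untouched is to write a tagged copy. Below is a minimal Python sketch for IIS logs in W3C extended format. It is not a port of the Awk script above. It assumes the log has a #Fields: line including date, time and c-ip (cs(User-Agent) is used if present), that entries are in time order, and a 30-minute timeout. The x-visit-id column name and the tag_visits/tag_file function names are hypothetical.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)  # assumed timeout; adjust to taste

def tag_visits(lines, timeout=TIMEOUT):
    """Yield W3C extended log lines with a visit number appended to each entry.

    Visitors are keyed on (c-ip, user agent). IIS writes '+' for spaces in
    the user agent, so splitting entries on whitespace is safe. Assumes a
    '#Fields:' directive appears before the first entry.
    """
    fields = []
    last_seen = {}  # visitor key -> (visit id, time of previous request)
    next_id = 1
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
                line += " x-visit-id"  # hypothetical extra column name
            yield line  # other directives are copied unchanged
            continue
        if not line.strip():
            continue
        row = dict(zip(fields, line.split()))
        ts = datetime.strptime(row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S")
        key = (row["c-ip"], row.get("cs(User-Agent)", "-"))
        prev = last_seen.get(key)
        if prev is None or ts - prev[1] > timeout:
            visit_id = next_id  # new visitor, or gap longer than the timeout
            next_id += 1
        else:
            visit_id = prev[0]  # continue the visitor's current visit
        last_seen[key] = (visit_id, ts)
        yield f"{line} {visit_id}"

def tag_file(in_path, out_path):
    """Write a tagged copy of in_path to out_path; the original is not modified."""
    with open(in_path, encoding="latin-1") as src, \
            open(out_path, "w", encoding="latin-1") as dst:
        for out in tag_visits(src):
            dst.write(out + "\n")
```

Calling tag_file("ex030101.log", "ex030101.tagged.log") (file names illustrative) would produce the copy; any of the visitor-key rules described earlier could be swapped in for the (c-ip, user agent) key.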
I hope this helps.
Regards,
Sycophant-ga