Google Answers
 
Q: Newsgroup Usage ( Answered 5 out of 5 stars,   2 Comments )
Question  
Subject: Newsgroup Usage
Category: Computers > Internet
Asked by: rook-ga
List Price: $8.00
Posted: 02 Nov 2002 15:13 PST
Expires: 02 Dec 2002 15:13 PST
Question ID: 96806
Hi all!  I hope this question is easy and fun to answer... 
 
About how many people post non-binary questions or comments to
English-language NNTP newsgroups every day?
 
Your answer can be an approximation based on other statistics. 
Thanks!!!
 
-JB

Request for Question Clarification by ephraim-ga on 02 Nov 2002 19:05 PST
JB,

I've found a web site which provides raw data that can easily be used
to create the statistics that you're looking for, if you take the time
to go through it. Unfortunately, I do not currently have the tools to
do this myself.

Would you like me to answer the question by posting a link to this
site?

/ephraim

Clarification of Question by rook-ga on 03 Nov 2002 16:09 PST
No thank you, Ephraim, unless you could be more specific about the
"tools" that would be required.  I am just looking for a close
approximation based on published facts/figures, perhaps from stat
sites like the one you are thinking about.  I am impressed with
skbenja-ga's comment (thanks steve!).  But is 902,199 the number for
English language articles only?  I could not hit the site referenced
(.su = Soviet Union?).  Thanks in advance.

Request for Question Clarification by ephraim-ga on 03 Nov 2002 19:57 PST
I can provide you with statistics that include the number of posts and
number of people posting per newsgroup per day, for each specific
newsgroup in the available hierarchies. You would need to add up the
numbers by hand, or get software capable of doing so for you. (FYI, it
would probably be a rather simple piece of software that could do this
for you -- you just need something that can parse and add the
results.)

What I have in mind didn't specifically mention that it excluded
binary postings, but I would work under the assumption that the vast
majority of postings outside alt.binaries, etc. are probably plain
text.

Is this something that would interest you?

/ephraim

Clarification of Question by rook-ga on 03 Nov 2002 20:34 PST
What do you mean by "available hierarchies?"

Request for Question Clarification by ephraim-ga on 03 Nov 2002 20:59 PST
Rook,

NNTP-based newsgroups, commonly known as Usenet, are subdivided into
various hierarchies. The "Big-8" are comp.*, humanities.*, misc.*,
news.*, rec.*, sci.*, soc.*, and talk.*. There are other hierarchies as
well, the largest of which is alt.*, which has fewer "official" rules
than the Big-8, and smaller hierarchies intended for specific countries,
languages, or even companies.

I searched for alt.*, rec.*, humanities.*, and soc.* using the tool
which I have in mind, and managed to get statistics for each group in
those hierarchies. I did not search for any others, but I would
imagine that they exist as well.

/ephraim

Clarification of Question by rook-ga on 04 Nov 2002 13:39 PST
that sounds good, ephraim.  thank you for your patience, candor, and help. 
 
-jb
Answer  
Subject: Re: Newsgroup Usage
Answered By: ephraim-ga on 04 Nov 2002 15:24 PST
Rated:5 out of 5 stars
 
Rook,

The utility that you need is at

	http://netscan.research.microsoft.com/Static/Default.asp


You'll notice that the default page is for a listing of all
newsgroups in microsoft.* which contain the word "windowsxp". You'll
also notice that the default statistics are set to "monthly" for all
these groups. You have the option of choosing daily, weekly, or
monthly statistics for dates of 9/30/2002 or earlier. (I've no idea
what date they started posting these statistics.)

As I've already said, the main "official" Big-8 hierarchies are soc.*,
news.*, rec.*, humanities.*, sci.*, comp.*, talk.*, and misc.* . The
largest hierarchy, though, is alt.*, which isn't officially part of
the Big-8, but still contains the majority of traffic going through
Usenet. One of the reasons for this is that it's much easier to create
a group in alt.* than anywhere else. The other reason is that alt.* is
where most of Usenet's binary newsgroups exist. Since you're not
interested in binary data, I'll show you how to skip them later. If
you need assistance with the basics of what Usenet is and how it
works, I've included links further down in this answer.

If you want to see a traffic breakdown of "all" of Usenet in both its
official and unofficial meanings, click on the link labelled "Tree
Map" at the top of the Netscan default page. This lets you see a map
of the size of Usenet and should give you a visual picture of which
hierarchies you want to examine. Here's a direct link to this page:

	http://netscan.research.microsoft.com/Static/treemap/nsbox.asp


In any case, let's say you want to get a listing of all groups under
"rec.*" for 8/25/2002.

1) click on "Day" so that you set the listing type to daily.
2) click on the button to the left of "starts with" so that you've
selected the starts with method of searching.
3) type "rec" into the box to the right of "starts with"
4) change the date to 8/25/2002
5) click "Show Data"


You should now see a listing of 639 newsgroups. The first newsgroup in
the list should be "rec.autos.sport.nascar". Based on the data shown,
there were 995 posts in this newsgroup on 8/25/2002. A total of 180
different people/posters/e-mail addresses made all these postings. If
you've done it all right, you should get a page which looks like this:

http://netscan.research.microsoft.com/Static/default.asp?NGType=D&SearchType=0&NGSearch=rec&orand=or&NGSearch2=&SearchDate=8%2F25%2F2002&gd=Show+Data
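The five steps above map onto the query string of that URL, so the same page can be requested for other prefixes or dates without clicking through the form. Here is a minimal sketch that rebuilds the URL. The parameter names are copied from the link above; reading SearchType=0 as "starts with" and NGType=D as "Day" is an assumption based only on that one example, and the function name netscan_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base address taken from the example link above.
BASE = "http://netscan.research.microsoft.com/Static/default.asp"


def netscan_url(prefix, date):
    """Rebuild the daily 'starts with' listing URL for a given prefix and date."""
    params = {
        "NGType": "D",        # assumed: "D" selects the daily listing
        "SearchType": "0",    # assumed: 0 selects the "starts with" search
        "NGSearch": prefix,   # text typed into the "starts with" box
        "orand": "or",
        "NGSearch2": "",      # second search box left empty
        "SearchDate": date,   # m/d/yyyy, as typed into the form
        "gd": "Show Data",    # the button that was clicked
    }
    return BASE + "?" + urlencode(params)


print(netscan_url("rec", "8/25/2002"))
```

Calling it with "rec" and "8/25/2002" reproduces the link above character for character. Values for weekly or monthly listings are not shown anywhere in this answer, so they are left out rather than guessed.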


FYI, it seems that the data may be more accurate as you choose dates
closer to 9/30/2002. Your mileage may vary. Now, what does one need to
do in order to find out the exact number of posts made to all groups
in rec.*? Just add up all the numbers in the posts column. If you want
to find out the total number of posters, then add up the posters for
each group. Obviously, there may be people who post messages to more
than one group in one day, so that sum counts some people more than
once, but the number you get should be of the right order of
magnitude. (I'd guess that you'll probably want to estimate by dividing
the number by 2 -- or, just skim the PPRatio column and use that to
make up a good ratio that you can multiply your results by to get
something approximately correct.)
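The adding-up step is easy to automate once the listing has been copied into plain text. The sketch below assumes each data row has been reduced to three whitespace-separated fields (group name, posts, posters); the real Netscan page has more columns, so the field positions would need adjusting. Only the rec.autos.sport.nascar row comes from the listing described above. The two rec.example rows are invented placeholders so that there is something to add.

```python
def sum_columns(text):
    """Sum the posts and posters columns of 'group posts posters' rows."""
    total_posts = 0
    total_posters = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything not shaped like a data row.
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        total_posts += int(fields[1])
        total_posters += int(fields[2])
    return total_posts, total_posters


# First data row is from the answer; the other two are made-up placeholders.
sample = """\
Newsgroup Posts Posters
rec.autos.sport.nascar 995 180
rec.example.one 40 25
rec.example.two 5 5
"""

posts, posters = sum_columns(sample)
# Crude correction for people who post to more than one group in a day,
# using the divide-by-2 guess suggested above.
estimated_people = posters // 2
print(posts, posters, estimated_people)
```

The divide-by-2 step is only the rough guess from the paragraph above; a ratio read off the PPRatio column could be substituted for it.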

The one really big bugaboo you might have is doing this for alt.* as
when I tried it, I got more than 7000 newsgroups listed on one page.
This is a bad thing, as it may use up your computer's memory. If
you're searching for groups in alt.*, I suggest breaking them down by
letter of the alphabet, i.e. typing "alt.a", "alt.c", "alt.d" into the
"starts with" box. Just skip all of "alt.b" since it's probably 90%
binary groups.
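As a small sketch of that letter-by-letter approach, the list of prefixes to type into the "starts with" box can be generated rather than written out by hand. The helper name alt_prefixes is made up for illustration.

```python
import string


def alt_prefixes(skip=("b",)):
    """Return 'alt.a', 'alt.c', ... for every letter not listed in `skip`."""
    return [f"alt.{letter}" for letter in string.ascii_lowercase
            if letter not in skip]


prefixes = alt_prefixes()
print(len(prefixes), prefixes[:3])
```

This covers letters only, as described above. Any alt.* groups whose second component starts with a digit would need extra prefixes, and skipping "alt.b" wholesale also drops whatever text groups happen to start with that letter.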

Here's another statistics link which may be of use to you:

[Graph of daily bytes and articles received by newsfeed.mesh.ad.jp]
http://newsfeed.mesh.ad.jp/flow/index.html


Finally, here are some links that should provide you with some
information about how Usenet works. I hope that you find these useful:

[Usenet Access Guide]
http://www.geocities.com/ResearchTriangle/Lab/1131/ua.htm

[Usenet the global watering hole]
http://www.cs.indiana.edu/docproject/bdgtti/bdgtti_7.html

[The Big-8 hierarchies]
http://tgos.org/newbie/newsgroups2.html


The following search will point you at other statistics pages for
Usenet. I've browsed through a handful of these, but you may want to
examine these more closely to see which are most useful. Be aware that
some of these pages are outdated and they haven't been updated in
months/years.

://www.google.com/search?as_q=usenet+statistics&num=100&hl=en&ie=UTF-8&oe=UTF-8&btnG=Google+Search&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&safe=images


I hope this has given you the information you need. If any of this is
unclear, feel free to ask.

/ephraim

Clarification of Answer by ephraim-ga on 04 Nov 2002 15:45 PST
One other point I should make: the Netscan utility lists groups from
largest to smallest. It's safe to assume that the majority listed are
very low traffic. In the rec.* example I gave above, only groups #1 -
#101 had more than 100 postings on 8/25/2002. The other 439 groups had
fewer than 100 postings, and about 250 groups had fewer than 10
postings on that day. So, you can probably get a good estimate by just
looking at the very top of the lists provided.

/ephraim

Clarification of Answer by ephraim-ga on 07 Nov 2002 06:49 PST
Rook,

Thanks for the tip!

Yes, your logic seems correct.

If somebody (rudely) posts a uuencoded binary image outside of a
binary group, then I'm not sure how to force it to appear in the
statistics. It is certainly possible that large news sites might have
some type of filtering software to prevent large uuencoded binary
images from appearing in text-based discussion groups, but I do not
know the details.

Any group with binary images should have the word "binary" in its
name. In fact, you can use the Netscan utility to get a listing of all
groups with "binary" in the title by just checking the "contains" box
and searching for the word "binary". You'll notice from the results
that at least 95% of the binary groups are in alt.binaries* ! The few
others that appear are from comp.binaries* , or regional groups like
de.binaries*.
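If the group names have already been copied out of a listing, the same exclusion can be done locally. This is only a sketch: it matches on the fragment "binar" so that both "binary" and "binaries" are caught, which is my choice and not something the Netscan search is known to do, and the group names in the sample list are illustrative.

```python
def is_binary_group(name):
    """Heuristic: treat any group whose name contains 'binar' as binary."""
    return "binar" in name.lower()


# Illustrative names only, not taken from a real listing.
groups = [
    "alt.binaries.example",
    "comp.binaries.example",
    "de.binaries.example",
    "rec.autos.sport.nascar",
]

text_groups = [g for g in groups if not is_binary_group(g)]
print(text_groups)
```

A name-based filter like this cannot catch uuencoded posts made to groups without "binar" in the name, which is the case discussed above.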

Good luck!

/ephraim
rook-ga rated this answer:5 out of 5 stars and gave an additional tip of: $3.00
Thanks, Ephraim!  Your answer was fabulous.  Here's how I got my final
answer... I called NewsAdmin.com and spoke to a sales rep.  He said
that they accept about 1.5 million new messages per day (not including
rejected spam).  Then, I estimated the number of UUENCODED messages
per day by using Netscan.com (alt.bin* for the month of Sep.).  I was
able to add the fields by selecting all, copy/paste to notepad,
deleting extraneous data from top and bottom, opening with Excel
(delimited using spaces), and then summing the column (articles per
day).  I took that number (7,919,573), divided by 30 (263,985 per day
in Sep.), and subtracted that number from 1.5 million.  My estimated
answer was 1,236,015 (non-spam, non-binary) NNTP articles per day (or,
17.6% are UUENCODED).  Do you think this is a good guess?  Has my math
or logic failed anywhere that you can see?  Do you have any idea how
many UUENCODED messages per day exist outside of alt.bin*?  Thanks
again, Ephraim!
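For anyone who wants to recheck the arithmetic above, here it is as a short sketch. All three input numbers are the ones quoted in the message; the variable names are made up for illustration.

```python
accepted_per_day = 1_500_000      # figure quoted by the NewsAdmin.com sales rep
binary_in_september = 7_919_573   # summed alt.bin* articles for September
days_in_september = 30

# Whole articles per day, dropping the fractional remainder.
binary_per_day = binary_in_september // days_in_september
text_per_day = accepted_per_day - binary_per_day
binary_share = binary_per_day / accepted_per_day

print(binary_per_day, text_per_day, f"{binary_share:.1%}")
```

The numbers agree with the message: 263,985 binary articles per day, 1,236,015 remaining, and a 17.6% share. The subtraction is only meaningful if the sales rep's 1.5 million and the Netscan count describe roughly the same set of messages, which is an assumption this sketch cannot check.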

Comments  
Subject: Re: Newsgroup Usage
From: skbenja-ga on 02 Nov 2002 21:20 PST
 
There are well over 25,0000 newsgroups -- but one server may only
carry 12,000 of them and some may only carry a few hundred.  Some
newsgroups are privately hosted on one or two servers and not
propagated through the network.

Refer to: http://www.smr-usenet.com/tech/how.shtml to see how usenet
works!

Messages are passed from one server to another until they propagate
throughout the whole usenet network.  If you are posting to
groups.google.com, then your message will get propagated to the
majority of services since it is a very large, high traffic server.
According to http://news.demos.su/stats-today.html, the average number
of posts is 902,199 per day at a volume of around 110 gigabytes at
THAT specific server.  That is probably more or less the actual
mainstream usenet volume, give or take a few thousand messages.

As for determining which contain binary data there are three ways to
answer your question:

1. ZERO. No usenet message contains binary data.  They are all 7-bit
regular ASCII text.  However, binary data can be encoded into usenet
messages using UUENCODING (refer to
http://www.picksoft.demon.co.uk/pssl/uucode.htm).

2. Look at what's posted in groups like alt.binary, and take that
out of the total count.  I haven't researched this to the full
extent I could, but I'd say a huge majority of usenet DATA traffic is
in the alt.binary group.  But not all the messages contain UUENCODED
binary data.

3. Look through the 900,000-odd messages and count the ones that
contain UUENCODED data.  This could probably be automated, but it is
not very practical.
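As a rough sketch of how option 3 might be automated, a uuencoded attachment can usually be spotted by its "begin <mode> <filename>" line followed later by an "end" line. The check below is a heuristic under that assumption, not a full parser, and it would miss binaries that are split across several messages or encoded some other way. The demo message is built with Python's standard binascii module.

```python
import binascii
import re

# A uuencoded block starts with "begin <octal mode> <filename>" ...
UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S", re.MULTILINE)
# ... and finishes with a line containing only "end".
UU_END = re.compile(r"^end\s*$", re.MULTILINE)


def looks_uuencoded(body):
    """Heuristic: True if a 'begin' line is followed later by an 'end' line."""
    begin = UU_BEGIN.search(body)
    return bool(begin and UU_END.search(body, begin.end()))


# Build a tiny uuencoded message body from arbitrary bytes (demo data only).
payload = bytes([0, 1, 2, 250, 251, 252])
encoded = (
    "begin 644 demo.bin\n"
    + binascii.b2a_uu(payload).decode("ascii")  # one encoded line, ASCII only
    + "`\n"                                     # zero-length terminator line
    + "end\n"
)

print(looks_uuencoded(encoded))
print(looks_uuencoded("Please begin the race at noon.\nThe end is near.\n"))
```

The demo also illustrates point 1: the payload contains bytes outside the ASCII range, but the encoded message body is plain ASCII text.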

- Steve
Subject: Re: Newsgroup Usage
From: rook-ga on 04 Nov 2002 13:23 PST
 
that sounds good, ephraim.  thank you for your patience, candor, and help.

-jb
