I am looking for a complete list of ".org" domain names on the
internet. The list can be self-made (from DNS servers?) or generated
from another source.
The question is answered when a more or less complete list is
delivered in electronic format (I already have 6000 names to check
from).
Thank you in advance,
Kris

Request for Question Clarification by pafalafa-ga on 04 May 2003 10:43 PDT
Hello kris4884,
Nice to see you back again...you sure do have a hankering for large
databases, it seems (I answered your "hotels" question a while back).
Anyway, there are 2 million-plus ORG domain names -- I assume you're
looking for a l-o-o-o-n-g list that contains very simple entries like:
www.NAME.org
and nothing else.
Can you please confirm that that's what you're after? I'm not sure
how likely it is that a file of all 2 million names can be generated,
but I'm certainly willing to look into it (as are some other
researchers here, I'm sure...).
Thanks.
pafalafa-ga

Clarification of Question by kris4884-ga on 04 May 2003 11:02 PDT
Hello again!
I remember indeed...
About your clarification request, indeed, I am looking for the looooooong list.
Just make it the best possible...
Thanks in advance...
Kris

Request for Question Clarification by pafalafa-ga on 06 May 2003 13:38 PDT
Just to let you know, I saw your latest comment, and I'm working on
it...as I imagine others are as well. As soon as one of us can
extract the database that you desire, we'll be sure to let you know.

Clarification of Question by kris4884-ga on 06 May 2003 21:40 PDT
OK

Request for Question Clarification by pafalafa-ga on 12 May 2003 08:15 PDT
Yes...I'm working on it, and I'm sure other researchers are as well.
Problem is, getting access to the data itself involves a snail-mail
step and approval of an application by PIR (Public Interest Registry,
the .org registry operator). The timing is out of our hands. When I
hear back from PIR, I'll be able to tell you whether the task is
doable or not.
Patience....patience....

Request for Question Clarification by webadept-ga on 12 May 2003 17:44 PDT
Hi,
I'm building the list now. Rather than waiting for PIR like everyone
else, I'm building it from scratch. It is veeerrryyy large :-) It should
be done tonight, however. I'll post it as a link to a zipped file, with
one .org domain per line.
thanks,
webadept-ga

Request for Question Clarification by webadept-ga on 13 May 2003 07:33 PDT
Hi,
The current list has 112,738 .org domains in text format. The file
can be downloaded here:
http://www.lucidmatrix.com/uploads/orgtest.zip
It is not complete, or at least my program has not stopped running,
but I thought after your long wait, you would like something to get
started with, and have some idea of the size of the list you are going
to get.
I'm building this list with a Perl script, using a large list of names
to seek the domains out. The name list is a combination of two lists.
The first is a list of just about all the proper names in the world
(at least all of them that can be spelled using my keyboard): about
22,000 names. The second is a word list, a dictionary built from
popular novels and other writings. Combined with the name list, this
gives a list of 255,924 words.
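Merging the two lists into a single list of search terms can be sketched as follows. The actual Perl script is not shown in the thread, so this is a Python approximation under stated assumptions: each list is an iterable of strings with one entry per line, and entries are lowercased so that a proper name that is also a dictionary word is only searched once. The function name `build_search_terms` is hypothetical.

```python
from typing import Iterable


def build_search_terms(*sources: Iterable[str]) -> list[str]:
    """Merge several word lists into one deduplicated, sorted list of search terms."""
    terms: set[str] = set()
    for source in sources:
        for entry in source:
            word = entry.strip().lower()  # normalise case so "Smith" and "smith" count once
            if word:  # skip blank lines
                terms.add(word)
    return sorted(terms)


if __name__ == "__main__":
    # Tiny stand-ins for the ~22,000-name list and the dictionary word list.
    names = ["Smith", "Garcia", "smith"]
    words = ["yappiness", "garden", "Garcia"]
    print(build_search_terms(names, words))  # ['garcia', 'garden', 'smith', 'yappiness']
```

Lowercasing is safe here because search engines treat query terms case-insensitively; if the list were used for anything case-sensitive, that step would need to go.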
For each of these words, the script queries the Google API and a
couple of other search engines with a query like this:
site:.org +yappiness
You can see the results here:
://www.google.com/search?num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=active&q=site%3A.org+%2Byappiness&btnG=Google+Search
As you can see, only .org sites show up when using the site: operator,
which is rather nice for this type of search.
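Building the per-word query, and a results URL of the same shape as the example link, can be sketched like this. It only constructs the URL; it does not reproduce the Google API call itself (the 2003-era Google SOAP search API has long since been retired). The parameter set `num`, `hl`, `safe` and `q` is copied from the example link, `safe=active` being the adult-content filter discussed later in this post; the function names are hypothetical.

```python
from urllib.parse import urlencode


def site_query(word: str, tld: str = "org") -> str:
    """Query text restricting results to one top-level domain, e.g. 'site:.org +yappiness'."""
    return f"site:.{tld} +{word}"


def search_url(word: str, per_page: int = 100, safe: str = "active") -> str:
    """Build a results URL in the same shape as the example link (num, hl, safe, q)."""
    params = {"num": per_page, "hl": "en", "safe": safe, "q": site_query(word)}
    # urlencode percent-escapes ':' and '+' and turns the space into '+'.
    return "http://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("yappiness"))
    # http://www.google.com/search?num=100&hl=en&safe=active&q=site%3A.org+%2Byappiness
```

Passing `safe="off"` would correspond to dropping the adult-content filter.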
After the results are gathered, I deduplicate the list and sort it
alphabetically.
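The dedup-and-sort step can be sketched as below, assuming the search results arrive as full URLs. It reduces each host to its registered name by keeping the last two labels, on the assumption that .org names are registered directly at the second level (so `www.RedCross.org` and `redcross.org` collapse to one entry). Both function names are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def org_domain(url: str) -> Optional[str]:
    """Return the registered .org name for a result URL, or None if it isn't a .org host."""
    host = urlsplit(url).hostname  # lowercased, with port and user info stripped
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    if len(labels) < 2 or labels[-1] != "org":
        return None
    # Assumption: .org names are registered at the second level, so keep the last two labels.
    return ".".join(labels[-2:])


def dedup_sorted(urls: Iterable[str]) -> list[str]:
    """Collapse many result URLs to unique .org names in alphabetical order."""
    return sorted({d for d in map(org_domain, urls) if d is not None})


if __name__ == "__main__":
    hits = [
        "http://www.RedCross.org/about",
        "https://redcross.org/",
        "http://en.wikipedia.org/wiki/Example",
        "http://news.bbc.co.uk/",
    ]
    print(dedup_sorted(hits))  # ['redcross.org', 'wikipedia.org']
```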
The trouble is that it is taking quite a long time to get through
the whole word list, though I am certain that it will produce a more
or less complete list of .org sites. I expect the program to finish
its run sometime Friday. If you are okay waiting that long for the
rest of the names, just post that here and I'll let the program keep
running. If not, keep the list of names you have and we'll see whether
the other source will be any faster.
Another thing to note is that I'm using the "Adult Content" filters
on these searches as well, so the list as it stands will not be
"complete." If you don't want this filter, say so in your
clarification as well, and I'll turn it off. The reason I set it is
that Google and the other engines only return a maximum of 1,000
results for any given search. To get the maximum number of "real" .org
sites, I did not want the porn .org sites taking up those slots.
Thanks,
webadept-ga

Clarification of Question by kris4884-ga on 13 May 2003 10:40 PDT
OK,
you may post the answer when you have the complete result list.
Can you add "www" in front of the domain names where necessary? Then
the file will be ready for immediate use.
Thanks for the help.
For the other researchers: just post a comment when you have the list,
and I will post the same question again for the same amount (US$100),
as a double check (I really need to have a complete list).

Clarification of Question by kris4884-ga on 13 May 2003 21:56 PDT
Hello webadept-ga,
I had some problems working with the list.
Would it be possible to deliver multiple text files in the download
zip? Each file should have no more than 100,000 domain names. The zip
file can then be downloaded in one go.
Regarding the NAME.org format, could you deliver each line of the
text file as follows: http://www.Name.org/, where "www" is either
present or replaced by, for example, another host name within the
domain?
Thanks a lot for taking these comments into account.
Best regards,
Kris
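The requested line format can be sketched as a small formatting function. The rule it applies is an assumption read from the request above: a bare `name.org` gets `www.` in front, while an entry that already carries a host label (for example `en.wikipedia.org`) is kept as it is. It does not check whether the resulting host actually resolves. The name `to_url` is hypothetical.

```python
def to_url(domain: str) -> str:
    """Format one list entry as http://host/ in the requested layout."""
    name = domain.strip().rstrip(".")
    # Assumption: a bare "name.org" gets "www." in front; an entry that already
    # carries a host label (e.g. "en.wikipedia.org") is left as it is.
    host = name if name.count(".") >= 2 else "www." + name
    return f"http://{host}/"


if __name__ == "__main__":
    for entry in ["redcross.org", "en.wikipedia.org"]:
        print(to_url(entry))
    # http://www.redcross.org/
    # http://en.wikipedia.org/
```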

Request for Question Clarification by webadept-ga on 13 May 2003 23:22 PDT
Yes to both questions. I'll adjust all that when the run is done.
So you want the list broken into smaller files, each with 100,000 orgs
in it, zipped into one download, with "http://" added so they are
more or less full links.
No problem. I'll have this to you as soon as the run is done.
webadept-ga

Request for Question Clarification by pafalafa-ga on 23 May 2003 10:01 PDT
Hello Kris,
Well, after a lot of bureaucratic hassle with the PIR folks, I finally
got access to the ZONE file. I am now the proud possessor of every
single one of the approx. 3 million goldarn ORG names in existence.
Problem is, this is a huge file -- 130 megabytes -- and needs a lot of
clean-up. It's not an easy task (not for me, anyway) but I'm looking
into ways to make it happen.
But even after the data is massaged into the type of list you want, it
will still be way too large to use in a Windows-type desktop
environment. This is really material for a larger system than a PC,
probably operating in a UNIX or Linux environment. Do you have access
to anything like that?
If not, you might want to reconsider and just work from Webadept's
initial list of about 100,000 ORG names, since these seem to include
pretty much all the major ORG sites.
Anyway...I just wanted to give you an update here. I'll let you know
if I make any progress.
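Part of the zone-file clean-up, pulling the domain names out of the delegation records, can be sketched as a streaming filter. This is a rough sketch, not a full RFC 1035 master-file parser, and it rests on assumptions: the file is BIND-style text, each registered name appears as the owner of one or more NS records, owner names may be relative to `$ORIGIN`, indented lines reuse the previous owner, and TTLs are plain numbers. The function name `delegated_domains` is hypothetical.

```python
from typing import Iterable, Iterator, Optional

CLASSES = {"IN", "CH", "HS", "CS"}


def delegated_domains(lines: Iterable[str], origin: str = "org.") -> Iterator[str]:
    """Yield each distinct second-level name that has an NS record in a zone file."""
    origin = origin.lower()
    owner: Optional[str] = None
    seen: set[str] = set()
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop comments and trailing whitespace
        if not line.strip():
            continue
        if line.startswith("$ORIGIN"):
            origin = line.split()[1].lower()
            continue
        if line.startswith("$"):  # other directives, e.g. $TTL
            continue
        tokens = line.split()
        if not line[0].isspace():  # new owner name; indented lines reuse the previous one
            name = tokens.pop(0).lower()
            if name == "@":
                owner = origin
            elif name.endswith("."):
                owner = name
            else:
                owner = f"{name}.{origin}"
        # Record type = first token that is neither a numeric TTL nor a class.
        rtype = next((t.upper() for t in tokens
                      if not t.isdigit() and t.upper() not in CLASSES), None)
        if rtype != "NS" or owner is None or owner == origin:
            continue  # skip non-NS records and the zone apex itself
        if owner.endswith("." + origin):
            label = owner[: -len(origin) - 1]
            if "." not in label and owner not in seen:  # second-level names only
                seen.add(owner)
                yield owner.rstrip(".")


if __name__ == "__main__":
    sample = """$ORIGIN ORG.
$TTL 86400
EXAMPLE NS NS1.EXAMPLE.COM.
        NS NS2.EXAMPLE.COM.
redcross.org. 86400 IN NS ns1.redcross.org.
NS1.REDCROSS A 192.0.2.1
"""
    print(list(delegated_domains(sample.splitlines())))  # ['example.org', 'redcross.org']
```

Because it reads line by line, the 130 MB file never has to be loaded whole, although the `seen` set does keep every distinct name in memory.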

Clarification of Question by kris4884-ga on 24 May 2003 04:36 PDT
Congratulations pafalafa-ga!
I have access to a rather large Sun machine (university stuff) and
some pretty nice Win2K servers (quad-processor, 2 GB RAM), so I (or
some of my colleagues) should be able to handle the stuff. These
machines should also be able to clean up the list.
If you could split the big file into smaller files of 100K domains
each and put them all together in one .ZIP file for download, that
would be great.
Just send me a link where I can download the zip file, and of course
you may answer the question then.
Thank you for your help!
Kris
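Splitting the list into files of at most 100,000 lines and packing them into one archive can be sketched with Python's standard `zipfile` module. It streams the input, so the full list never has to sit in memory at once. The function name `write_chunked_zip` and the `org_domains_NNN.txt` file-naming scheme are hypothetical.

```python
import zipfile
from itertools import islice
from typing import Iterable


def write_chunked_zip(lines: Iterable[str], zip_path: str,
                      chunk_size: int = 100_000, prefix: str = "org_domains") -> int:
    """Write lines into one zip archive as numbered text files of at most
    chunk_size lines each. Returns the number of files written."""
    it = iter(lines)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        while True:
            # Take the next chunk, dropping any newline already on each line.
            chunk = [line.rstrip("\n") for line in islice(it, chunk_size)]
            if not chunk:
                break
            count += 1
            zf.writestr(f"{prefix}_{count:03d}.txt", "\n".join(chunk) + "\n")
    return count
```

With roughly 3 million formatted lines and the default `chunk_size`, this would yield about 30 text files inside a single download.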