Q: Hierarchical List of Geographical Name Places ( Answered 5 out of 5 stars,   8 Comments )
Question  
Subject: Hierarchical List of Geographical Name Places
Category: Reference, Education and News > General Reference
Asked by: tagger-ga
List Price: $100.00
Posted: 21 Sep 2002 17:47 PDT
Expires: 21 Oct 2002 17:47 PDT
Question ID: 67684
I am looking for a structured, hierarchical list, in electronic
format, of continents, countries, states (where applicable) and their
main cities.
Also, it should include aliases for these names.
This list should be publicly available for free download or access.

For example, parts of this list should show:
----------
North America
  United States [U.S., U.S.A., US, USA, United States of America]
    California
      Los Angeles
      San Francisco
      …

Middle East
  Israel [il]
    Jerusalem
    Tel Aviv [Tel-Aviv, TA, TLV]
    …
----------

This list would ideally be in XML format or some kind of CSV file
(although any other form that can be parsed to derive the above
information is fine too).

Best regards,
Uri.

Request for Question Clarification by rbnn-ga on 21 Sep 2002 20:00 PDT
Why is "Middle East" listed in your sample answer? It does not belong
to any of the categories listed in your question.

Clarification of Question by tagger-ga on 21 Sep 2002 21:45 PDT
It is one of the top level continents/regions. For example, if you
look at http://directory.google.com/Top/Regional/ you'll see it listed
just before "North America". In essence what I need is something like
the Regional hierarchy that I've just pointed to, or that of Yahoo.
However, I need it without all the extra categories there such as
"Business and Economy","A,B,C,...", etc. - Just the pure hierarchy of
geographical place names.
Thanks.
Answer  
Subject: Re: Hierarchical List of Geographical Name Places
Answered By: kyrie26-ga on 22 Sep 2002 00:57 PDT
Rated:5 out of 5 stars
 
Hello tagger-ga,

Thank you for your question. After searching the Web high and low on
various search terms without success (see search terms below), I found
the perfect site that has the data you are looking for :

World Gazetteer : population figures for cities, towns and places
http://www.gazetteer.de/

"This site provides statistics about current population of countries,
their administrative divisions, cities and towns as well as images of
the current national flags."

The data you seek can be found in this downloadable file :

Largest towns (all places with a population above 100 000, as a
tab-separated text file compressed to zip 241k)
http://www.gazetteer.de/st/cities.zip

The fields available include country, administrative district (state),
city, and other details like population, latitude, longitude. HOWEVER,
region was not included.

I proceeded to import this tab-separated file into Excel, trimmed out
unnecessary columns, and then inserted a new column (field) for region
(making sure to double-check for correctness and spelling). Where
there was any doubt as to which region a country belonged to, I
consulted the CIA World Factbook 2001 at [
http://www.odci.gov/cia/publications/factbook/index.html ].

I then rearranged the columns to match the relevance to your question.
There are 5,400+ records in this file, containing all the information
you asked for except for alternate names.
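The Excel steps above (trim columns, add a region column, reorder) could also be scripted. The following is only a sketch under assumptions: the header names, the sample rows, the placeholder country "Examplestan" and the REGION_OF lookup are all invented for illustration, and the real cities file may be laid out differently.

```python
import csv
import io

# Hypothetical sample; header names and all values are placeholders,
# not taken from the real World Gazetteer file.
SAMPLE = (
    "country\tdistrict\tcity\tpopulation\tlatitude\tlongitude\n"
    "Israel\tTel Aviv\tTel Aviv\t0\t0\t0\n"
    "United States\tCalifornia\tLos Angeles\t0\t0\t0\n"
    "Examplestan\tNowhere\tSample City\t0\t0\t0\n"
)

# Columns to keep, in output order (assumed names).
KEEP = ["country", "district", "city", "population"]

# Hand-maintained country -> region lookup (assumed, incomplete on purpose).
REGION_OF = {
    "Israel": "Middle East",
    "United States": "North America",
}

def add_region(src, dst):
    """Copy the kept columns to dst, prepending a region column.

    Returns the set of countries that had no region in REGION_OF,
    so they can be looked up by hand.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=["region"] + KEEP,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    unknown = set()
    for row in reader:
        region = REGION_OF.get(row["country"], "")
        if not region:
            unknown.add(row["country"])
        out_row = {name: row[name] for name in KEEP}
        out_row["region"] = region
        writer.writerow(out_row)
    return unknown

out = io.StringIO()
unknown = add_region(io.StringIO(SAMPLE), out)
lines = out.getvalue().splitlines()
```

Countries returned in `unknown` are the ones that would still need a manual check against a reference such as the Factbook.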

From my experience as a web developer and database
designer/programmer, the tab-separated format (TSV) is the best format
to use because of its portability. I presume you will be using this
data in a database-enabled application (not using a database is a bad
idea for this volume of data), so with a database you will
be able to generate the listing in any form you wish, including a
hierarchical listing where there are no repeated instances of names
(currently there are row repeats of regions and countries in the
file). Also, this current "redundant" format that includes repeats can
easily be "normalized" in a database table. For example, you will
convert all instances of "Africa" to "1". Another table will contain
the reference codes for all countries. This saves you table space and
allows you to perform relational actions.
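As a rough illustration of that normalization, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are assumptions, and the sample rows reuse the place names from the question rather than rows from the actual file.

```python
import sqlite3

# Hypothetical flat rows: (region, country, state, city).
# An empty string stands for "no state given".
rows = [
    ("North America", "United States", "California", "Los Angeles"),
    ("North America", "United States", "California", "San Francisco"),
    ("Middle East", "Israel", "", "Jerusalem"),
    ("Middle East", "Israel", "", "Tel Aviv"),
]

def normalize(flat_rows):
    """Load flat rows into region/country/city tables linked by integer keys."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE region  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                              region_id INTEGER REFERENCES region(id));
        CREATE TABLE city    (id INTEGER PRIMARY KEY, name TEXT, state TEXT,
                              country_id INTEGER REFERENCES country(id));
    """)
    for region, country, state, city in flat_rows:
        # Repeated region/country names are stored once and reused by id.
        con.execute("INSERT OR IGNORE INTO region (name) VALUES (?)", (region,))
        region_id = con.execute(
            "SELECT id FROM region WHERE name = ?", (region,)).fetchone()[0]
        con.execute(
            "INSERT OR IGNORE INTO country (name, region_id) VALUES (?, ?)",
            (country, region_id))
        country_id = con.execute(
            "SELECT id FROM country WHERE name = ?", (country,)).fetchone()[0]
        con.execute(
            "INSERT INTO city (name, state, country_id) VALUES (?, ?, ?)",
            (city, state, country_id))
    return con

con = normalize(rows)
region_count = con.execute("SELECT COUNT(*) FROM region").fetchone()[0]
first_region_id = con.execute(
    "SELECT id FROM region WHERE name = 'North America'").fetchone()[0]

# A join rebuilds the flat listing from the normalized tables.
listing = con.execute("""
    SELECT region.name, country.name, city.name
    FROM city
    JOIN country ON country.id = city.country_id
    JOIN region  ON region.id = country.region_id
    ORDER BY city.id
""").fetchall()
```

In this sketch every "North America" in the flat rows collapses to a single region row with id 1, which is the same idea as replacing each "Africa" with "1".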

If this list of 5400+ cities is too large for your needs, you can use
the Auto Filter function (in Excel) to limit the listing to cities
above a specified population. This will give you a smaller list with
larger cities. Feel free to experiment to find the group you are
looking for.


I have made this file available on my Web server at :

http://www.quikpublish.com/cities_x.zip   (195Kb)

This is a Zip file that contains an Excel file (.xls). You can export
a TSV (tab-separated values) or CSV (comma-separated values) file from
this, for importing into a database.
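Once exported as TSV, the rows could be folded into the kind of indented hierarchy shown in the question, optionally dropping cities below a population threshold. This is a sketch only: the header names and the population figures below are invented placeholders, not values from the file.

```python
import csv
import io

# Hypothetical TSV export; header names and population figures are invented.
SAMPLE_TSV = (
    "region\tcountry\tstate\tcity\tpopulation\n"
    "North America\tUnited States\tCalifornia\tLos Angeles\t900000\n"
    "North America\tUnited States\tCalifornia\tSan Francisco\t500000\n"
    "Middle East\tIsrael\tJerusalem\tJerusalem\t150000\n"
    "Middle East\tIsrael\tTel Aviv\tTel Aviv\t300000\n"
)

def build_hierarchy(tsv_file, min_population=0):
    """Return {region: {country: {state: [city, ...]}}} without repeats."""
    tree = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if int(row["population"]) < min_population:
            continue  # skip cities below the threshold
        cities = (tree.setdefault(row["region"], {})
                      .setdefault(row["country"], {})
                      .setdefault(row["state"], []))
        cities.append(row["city"])
    return tree

def render(tree):
    """Format the nested dict as a two-space indented outline."""
    lines = []
    for region, countries in tree.items():
        lines.append(region)
        for country, states in countries.items():
            lines.append("  " + country)
            for state, cities in states.items():
                lines.append("    " + state)
                lines.extend("      " + city for city in cities)
    return "\n".join(lines)

tree = build_hierarchy(io.StringIO(SAMPLE_TSV), min_population=200000)
outline = render(tree)
print(outline)
```

Raising or lowering `min_population` plays the same role as the Auto Filter step in Excel.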


Google Search Terms :

+-----------------------+

Unsuccessful :

geographical place name list
xml geography
gml data
gml cities data
gml hierarchy OR hierarchical data
xml city OR cities list
xml country OR countries city OR cities list OR data
hierarchical city OR cities
hierarchical region city OR cities list OR listing
hierarchical world region city OR cities list OR listing
major cities by country by region
major cities hierarchical list
geographic place names
world OR global geographic place names
download free world OR global geographic place names

+-----------------------+

Successful : 

political place name list
://www.google.com/search?q=political+place+name+list&hl=en&lr=&ie=ISO-8859-1&safe=off

+-----------------------+



I hope this is what you are looking for. If you need further
assistance, for instance with data format conversion, please do not
hesitate to request clarification and I will do my best to help
you. Thank you for using Google Answers!


Best regards,

kyrie26-ga

Request for Answer Clarification by tagger-ga on 22 Sep 2002 12:30 PDT
Thanks for the answer. 
Indeed I've also tried seeking high and low with no success (many of
my queries were similar to your unsuccessful ones...).
With regards to the development tips: I'm a web developer myself, and
seasoned with DBs, so I'm covered there :)

There are two caveats in your answer, however:
1) Some of the names are in their native language (phonetically). For
example, Jerusalem appears as Yerushalayim. This is not precisely what
I need, since I need to cross this with another DB, where the names
are in English. (Note that in the example I've provided, the name was
Jerusalem).
2) As you've pointed out, there is no list of alternate names which is
also crucial for me (and may even solve caveat number 1). While the
file you have supplied me with is indeed a long way from where I was,
it's still a short distance from where I want to be, as described in
my original question.

So, I'll be happy if you can try to find the additional list of
alternative names (it can be from another source, and I'll be happy to
cross-reference it with this one as long as there is a deterministic
way of doing that).

I'll also try to find this, and if I do find it before you I'll let
you know and I'll regard your answer as a complete one.

Best regards,
Uri.

Clarification of Answer by kyrie26-ga on 23 Sep 2002 09:28 PDT
Hello again tagger-ga,

I've taken a look at the file I downloaded from here :

ADL Gazetteer Development Page 
http://alexandria.sdc.ucsb.edu/~lhill/adlgaz/ 
[Scroll to the middle where it says "List of 5.9 million geographic
names available for download"]
 
Excerpt : "The ADL Project has created a list of all of the names,
both primary and alternative names, from its ADL Gazetteer and is
making it available for download and local use within the limits of
our copyright statement. We anticipate that the list will be useful
for geoparsing applications where geographic names need to be
identified in natural language text. Each entry in the list, one line
per entry, consists of (1) the ADL Gazetteer Identifier for the entry
associated with the name; (2) the name; (3) the date of entry into the
database."

This is a proprietary database that includes geographical feature
names such as "spring well" in addition to political place names.

I've taken a look at it, but it's too large for me to view in
Excel (fields are delimited by the "|" character) or any text viewer.
The first 65,000+ records or so look promising. It looks like there
are phonetic names (natural language?) in there as well.

As I mentioned in my earlier comment, it may be possible to match one
place name to its other variants through the proprietary primary key.
You would run a one-time job to build a similar "variant place names"
table in your own database using your own primary key (as the foreign
key), using a given place name for each record to drive the search.
The end result is a table that is a subset of the above file, relevant
to your records, and using your own ID key scheme. An application
could go looking for variant names in this new subset table using the
key from the given place name.
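The one-time subset job described above might look roughly like the sketch below. It assumes the "|"-delimited layout quoted from the ADL page (identifier, name, date) and reads the file line by line in two passes, so the whole file never has to fit in memory. The sample identifiers and dates are invented, and `open_names` and `own_places` are hypothetical stand-ins for the real file and your own table.

```python
import io

# Invented sample lines in the assumed "id|name|date" layout.
SAMPLE = (
    "id-001|Jerusalem|2000-01-01\n"
    "id-001|Yerushalayim|2000-01-01\n"
    "id-002|Tel Aviv|2000-01-01\n"
    "id-002|Tel-Aviv|2000-01-01\n"
    "id-003|Spring Well|2000-01-01\n"
)

def parse(line):
    """Return (gazetteer_id, name), or None for a malformed line."""
    parts = line.rstrip("\n").split("|")
    if len(parts) < 2 or not parts[0]:
        return None
    return parts[0], parts[1]

def build_variants(open_names, own_places):
    """own_places maps your own primary key -> a known place name.

    Returns {own key: set of all names sharing a gazetteer id with it}.
    """
    by_name = {name.casefold(): key for key, name in own_places.items()}

    # Pass 1: find the gazetteer ids whose names match one of our places.
    id_to_key = {}
    with open_names() as names:
        for line in names:
            parsed = parse(line)
            if parsed and parsed[1].casefold() in by_name:
                id_to_key[parsed[0]] = by_name[parsed[1].casefold()]

    # Pass 2: collect every name recorded under those ids.
    variants = {key: set() for key in own_places}
    with open_names() as names:
        for line in names:
            parsed = parse(line)
            if parsed and parsed[0] in id_to_key:
                variants[id_to_key[parsed[0]]].add(parsed[1])
    return variants

own_places = {1: "Yerushalayim", 2: "Tel Aviv"}
variants = build_variants(lambda: io.StringIO(SAMPLE), own_places)
```

For the real file, `open_names` would be something like `lambda: open(path, encoding=...)`, with the path and encoding still to be determined. Note that this sketch merges every gazetteer entry that shares a name with one of your places, so distinct places with the same name would need an extra tie-breaker (country, coordinates) before the match could be called deterministic.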

Possible problems at this stage :

1) The file may be too large for your computing resources to handle.

2) The phonetic scheme from our first World Gazetteer file may not be
consistent with what's in the file, resulting in very few matches. At
this point we don't know until we actually run the subset build job.
My hunch is that this is not a problem because it is such a huge file
and looks comprehensive.


At this point I would encourage you to have a look at this file, and
see if you can use it. Again, you may have further questions, so don't
hesitate to request clarification again. Let me know what you need.


Good luck,

kyrie26-ga

Clarification of Answer by kyrie26-ga on 09 Oct 2002 07:10 PDT
Here's the latest on my correspondence with World Gazetteer :

+--------------------------+

> Thanks for your reply. Would you be kind enough to point me to a version of
> that complete cities file that includes a field/column for the alternative
> place names? I would be very grateful if you have this. Thank you very much
> for your time.

> > for each country there is an entry "alternative place names" that also
> > contains these information.
> > It is only in html format for each country, but usually calculation
> > programs like excel can import html data.

e.g. for Israel it would be:
http://www.world-gazetteer.com/a/a_il.htm

There is currently no file that contains all names for all countries but I am
working on it.

Stefan

+--------------------------+

How're things on your end, tagger-ga?


kyrie26-ga

Clarification of Answer by kyrie26-ga on 10 Oct 2002 13:14 PDT
Good news tagger-ga!

New correspondence from Stefan, the World Gazetteer site owner :

+-----------------------------------------------------------+

I have uploaded a zipped file that contains all the alternative 
place and region names of my db:
http://www.world-gazetteer.com/st/altnames.zip
A first and second glance at the (program generated) list
did not reveal any errors. Before the next official update
of the World Gazetteer I will have a deeper look at this file.

For characters with diacritical marks I have used the
html-spelling, like &#999; where 999 is the unicode number
for the relative character (I have used this system in other
download files as well). The list also contains the place
names without diacritical characters.

+-----------------------------------------------------------+

I have taken a look at this file. You may have to strip some html tags
from the "Alternative Name" column for your use. There is a column
"Primary Name Basic" that holds the original place names without any
"html-spelling" (vs. Primary Name contains this notation). The "cc"
column is country code.
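Cleaning the "Alternative Name" column could be done along these lines. This is a small sketch using only Python's standard library; the sample strings are made up, and the tag-stripping pattern is a simple heuristic rather than a full HTML parser.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <i> or </b>.
TAG_RE = re.compile(r"<[^>]+>")

def clean_name(raw):
    """Strip HTML tags, then decode entities such as &#252; to characters."""
    without_tags = TAG_RE.sub("", raw)
    return html.unescape(without_tags).strip()

# Made-up inputs in the "html-spelling" style described above.
cleaned = clean_name("<b>M&#252;nchen</b>")
plain = clean_name("  Tel Aviv ")
```

Tags are removed before entities are decoded so that an encoded "&lt;" in a name is not mistaken for the start of a tag.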

Hope this helps.


kyrie26-ga
tagger-ga rated this answer:5 out of 5 stars
Excellent answer and dedication to the final goal.

Comments  
Subject: Re: Hierarchical List of Geographical Name Places
From: kyrie26-ga on 22 Sep 2002 14:25 PDT
 
Hi tagger-ga,

Just to let you know, I am working on the clarification. I have
contacted the owner of the World Gazetteer site and am awaiting his
reply. Will let you know as soon as I find something solid. Thanks for
your patience.


kyrie26-ga
Subject: Re: Hierarchical List of Geographical Name Places
From: kyrie26-ga on 22 Sep 2002 20:32 PDT
 
tagger-ga,

We may be in luck. Check out this link :

ADL Gazetteer Development Page
http://alexandria.sdc.ucsb.edu/~lhill/adlgaz/

Scroll to the middle where it says "List of 5.9 million geographic
names available for download".

Excerpt : "The ADL Project has created a list of all of the names,
both primary and alternative names, from its ADL Gazetteer and is
making it available for download and local use within the limits of
our copyright statement. We anticipate that the list will be useful
for geoparsing applications where geographic names need to be
identified in natural language text. Each entry in the list, one line
per entry, consists of (1) the ADL Gazetteer Identifier for the entry
associated with the name; (2) the name; (3) the date of entry into the
database."

What this tells me is that all variant names for a particular place
name will be identified by a unique ID. Which means that if you can
identify one of the names from the set (from the other database), you
can also identify the rest in the set (through the ID).

The only catch is that this downloadable file is 55.1Mb, and with my
modem connection I won't be able to do anything until I download it
(should take the whole night). Hopefully you have a faster connection
and can take a look. Well, no doubt as to the comprehensiveness of the
data at 5.9 million geographic names!

Have a look at it if you can, in any case I will inform you of my
findings ASAP. Thanks!


Cheers

kyrie26-ga
Subject: Re: Hierarchical List of Geographical Name Places
From: shemahansky-ga on 23 Sep 2002 00:11 PDT
 
Try www.cities.com - Choose the continent in the center of the page
(not from the bar on the right). Then country and then cities.
Successful query seems to be - cities country (I tried "cities by
country")
And let me know if you could use it.
Kira
Subject: Re: Hierarchical List of Geographical Name Places
From: shemahansky-ga on 23 Sep 2002 00:39 PDT
 
Also, please check this:
http://164.214.2.59/gns/html/
and then
http://164.214.2.59/gns/html/cntry_files.html 
(excluding USA and Antarctica, for USA see the link that they provide)
Kira
Subject: Re: Hierarchical List of Geographical Name Places
From: shemahansky-ga on 23 Sep 2002 06:45 PDT
 
http://pweb.jps.net/~davidwong/official.htm
Subject: Re: Hierarchical List of Geographical Name Places
From: tagger-ga on 23 Sep 2002 16:09 PDT
 
Hi kyrie-ga (and shemahansky-ga - thanks for the additional
resources!)
I'm sorry I didn't reply earlier. I have already downloaded and
started to work with the large file earlier today, and I've loaded it
into the database to try and match with the previous file. I think
that the links that shemahansky-ga has provided may also help and I've
loaded some of them into the DB as well. However, I then had to attend
to something else, so I couldn't get far yet. Tomorrow I'll try to
cross these resources and see if I get sufficient matches which are
also deterministic, and I'll update here.
Thanks again,
Uri.
Subject: Re: Hierarchical List of Geographical Name Places
From: kyrie26-ga on 23 Sep 2002 16:46 PDT
 
No problem, Uri. This has actually been one of the most fun and
challenging questions I've worked on, because I am interested in both
databases and geography, especially the concept of hierarchical
groupings. I'm keeping an eye out for other possible solutions to your
problem, and am keeping my fingers crossed that the World Gazetteer
administrator will reply to my email. Good luck, and we'll be in touch...


kyrie26-ga
Subject: Re: Hierarchical List of Geographical Name Places
From: tagger-ga on 10 Oct 2002 14:10 PDT
 
Thanks kyrie26-ga, I believe this answer solves it once and for all. I
was away for a while so I didn’t get around to trying to match the
previous tables with one another, yet just indexing the large table
took a good few days… So it seems I won’t have to do that now.
Uri.
