Q: Dow Jones Global Classification Standard (Industry and Symbol List) (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: Dow Jones Global Classification Standard (Industry and Symbol List)
Category: Business and Money > Finance
Asked by: mikeu-ga
List Price: $20.00
Posted: 26 Jun 2002 10:33 PDT
Expires: 26 Jul 2002 10:33 PDT
Question ID: 33603
I'm looking for a webpage or downloadable file that contains a list of
all of the Dow Jones-defined Industries/Sectors, and a list of the
stock symbols that comprise them.

If you go to the following page you can see a breakdown of the various
sectors/groups:

http://djindexes.com/jsp/giClassification.jsp


You can also view a hierarchy of the groups at:
http://cbs.marketwatch.com/tools/industry/default.asp?siteid=mktw&bcind_ind=bc_all&bcind_period=3mo

If on this page I were to click on, say, Furnishing and Appliances, I
could then select the link 'Show All stocks in this industry', which
would take me to:
http://cbs.marketwatch.com/tools/industry/stocklist.asp?siteid=mktw&bcind_ind=ftr&bcind_period=3mo

I need a file that gives me a list of all of the stock symbols
categorized by industry. Note that some industries have more than one
page, i.e. 1-50, 51-100, etc.

I can deal with any format: XML, text, etc. It's important that I can
get this information from ONE location and that it's organized in a
hierarchical format with all of the symbols that exist for that
industry/sector listed.
Answer  
Subject: Re: Dow Jones Global Classification Standard (Industry and Symbol List)
Answered By: rhansenne-ga on 27 Jun 2002 11:48 PDT
Rated: 5 out of 5 stars
 
Hi mikeu-ga,

Since I didn't find a central list containing all the Dow Jones stocks
on one page, I wrote a small program that retrieves the info from
the CBS site you listed. It was definitely a challenge to come up with
a program that parsed out all the required data correctly, and it took
quite some time. I won't provide the code behind it here, as it was
programmed in a quick-and-dirty fashion. It did the job, however, and
you can find an XML file containing all the Dow Jones stocks, in
hierarchical order, at these locations:

Zipped (129k):
http://users.pandora.be/rami/dow/dow_jones_stocks.zip

Unzipped (740k):
http://users.pandora.be/rami/dow/dow_jones_stocks.xml

The structure of the XML will be obvious to you as soon as you take a
look at the file, since it’s a very simple hierarchical structuring,
so I won’t bore you with a DTD or XML Schema.

If you know a little about parsing XML (and from reading your question
I believe you do), it’s quite simple to, for instance, store the data
in a database, or use it in an application, directly from within the
XML. Remember, however, that in conformance with the XML spec, the ‘&’
is replaced by ‘&amp;’ everywhere in the XML file.
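
As a rough illustration of reading such a file from Java, here is a
minimal sketch using the standard DOM parser. The element and attribute
names ("industry", "stock", "name", "symbol") and the sample data are
assumptions made up for the example, not the actual layout of the
downloadable file, so adjust them after looking at the real XML. Note
that the parser turns ‘&amp;’ back into ‘&’ by itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StockXmlReader {
    // ASSUMPTION: <industry name="..."> elements contain <stock symbol="..." name="..."/>
    // elements. These names are hypothetical; the real file may use different ones.
    static List<String> listStocks(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> rows = new ArrayList<>();
            NodeList stocks = doc.getElementsByTagName("stock");
            for (int i = 0; i < stocks.getLength(); i++) {
                Element stock = (Element) stocks.item(i);
                Node parent = stock.getParentNode();
                // The enclosing element is taken to be the stock's industry.
                String industry = parent instanceof Element
                        ? ((Element) parent).getAttribute("name") : "";
                // getAttribute returns the decoded value: "&amp;" comes back as "&".
                rows.add(industry + " | " + stock.getAttribute("symbol")
                        + " | " + stock.getAttribute("name"));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Each returned row could then be written to a database table or used
directly; for the real 740k file you would pass the file contents (or
use builder.parse on the file itself) instead of an inline string.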

If you have any questions about the files, please ask for a
clarification!

Hope this helps you along,

Kind regards,

Rhansenne-ga.

Request for Answer Clarification by mikeu-ga on 02 Jul 2002 05:50 PDT
Awesome job. I thought of taking the same approach, but I've never done
anything as 'recursive' as this required. I'd love to see your code
some time just to get some pointers. What did you write it in?

Clarification of Answer by rhansenne-ga on 02 Jul 2002 07:57 PDT
Hi again,

I wrote the program in Java, my programming language of choice.
Unfortunately I removed the code a day or two ago. Anyway, it was
coded quite 'dirty' in the sense that a change of layout to the site
might have rendered the parser useless. But since the program was only
meant to run a single time, I didn't want to spend too much time on
making it more generic.

The technique is quite similar to those bots that scan webpages to
extract e-mail addresses.

The parser starts at the CBS page with the hierarchy of groups. The
first task is to parse out all the industries and recognize their level
in the hierarchy. This can be achieved by recognizing the indentation
(for instance “WIDTH="20"”) and whether or not there are boldface tags
around the name. Once the names and codes are retrieved, the program
uses the codes as parameters to get the page with all the stocks in
the current industry (retrieving the page source of a URL as a String
is quite easy in Java). Here again the stock names and codes are
parsed out, based on the format of the surrounding tags.
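
Since the original code is gone, the following is only a reconstruction
of the idea, not the program that was actually run. It assumes one
industry per line of HTML, that each hierarchy level adds 20 to the
WIDTH of a spacer cell, and that the industry code is the bcind_ind
parameter of the link; the exact markup of the page is guessed. The
fetch method shows one way of getting a page source as a String (the
ISO-8859-1 charset is also an assumption).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HierarchyParser {
    // All three patterns describe a GUESSED page layout, not the real one.
    private static final Pattern WIDTH =
            Pattern.compile("WIDTH=\"(\\d+)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern LINK =
            Pattern.compile("bcind_ind=(\\w+)[^>]*>(?:\\s*<B>)?\\s*([^<]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern BOLD = Pattern.compile("<B>", Pattern.CASE_INSENSITIVE);

    static final class Entry {
        final int level; final String code; final String name; final boolean bold;
        Entry(int level, String code, String name, boolean bold) {
            this.level = level; this.code = code; this.name = name; this.bold = bold;
        }
    }

    // Reads the whole page behind a URL into one String.
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream();
             Scanner scanner = new Scanner(in, "ISO-8859-1").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Extracts one Entry per line that contains an industry link.
    static List<Entry> parse(String html) {
        List<Entry> entries = new ArrayList<>();
        for (String line : html.split("\\R")) {
            Matcher link = LINK.matcher(line);
            if (!link.find()) continue;                 // no industry link on this line
            Matcher width = WIDTH.matcher(line);
            // ASSUMPTION: no spacer means top level, every 20 units is one level deeper.
            int level = width.find() ? Integer.parseInt(width.group(1)) / 20 : 0;
            boolean bold = BOLD.matcher(line).find();   // bold names mark group headings
            entries.add(new Entry(level, link.group(1), link.group(2).trim(), bold));
        }
        return entries;
    }
}
```

Regular expressions like these are exactly what makes such a parser
'dirty': they break as soon as the site changes its layout.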

For each of these pages the parser then has to detect whether or not
there are links to follow-up pages present, and if so, recursively
follow them. The problem I faced here was avoiding an endless loop
(since previous pages shouldn't be retrieved again). Some extra checks
on the params corrected this.
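
To illustrate the loop problem, here is a small sketch of following
such page links recursively. It is not the original code: instead of
checks on the params it simply remembers every URL it has already
retrieved, which is a common way to get the same effect. The "start"
parameter in the link pattern is a made-up name for the paging
parameter, and the page fetching is passed in as a function so the
sketch can be tried without a network connection.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageCrawler {
    // ASSUMPTION: follow-up pages are linked with a hypothetical "start" parameter.
    private static final Pattern NEXT = Pattern.compile(
            "HREF=\"([^\"]*[?&]start=\\d+[^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the URLs of all pages reached from the first one, in visiting order.
    static List<String> crawl(String firstUrl, Function<String, String> fetch) {
        List<String> pages = new ArrayList<>();
        visit(firstUrl, fetch, new HashSet<>(), pages);
        return pages;
    }

    private static void visit(String url, Function<String, String> fetch,
                              Set<String> visited, List<String> pages) {
        // Set.add returns false for a URL seen before; stopping here is what
        // prevents the endless loop between page 1-50 and page 51-100.
        if (!visited.add(url)) return;
        String html = fetch.apply(url);
        pages.add(url);
        Matcher m = NEXT.matcher(html);
        while (m.find()) {
            visit(m.group(1), fetch, visited, pages);   // recurse into follow-up pages
        }
    }
}
```

In a real run the links would usually be relative and would have to be
resolved against the address of the current page before fetching, and
the stock names and codes would be parsed out of each page where this
sketch only records the URL.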

Anyway, I had a lot of fun developing the parser (I learned quite a
bit myself ;-) )

Kind regards,

rhansenne-ga.
mikeu-ga rated this answer: 5 out of 5 stars
Exactly what I was looking for.
