Q: Need script to run on Mac OSX to pull scores from a website (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: Need script to run on Mac OSX to pull scores from a website
Category: Computers > Programming
Asked by: srv007-ga
List Price: $150.00
Posted: 18 May 2006 22:27 PDT
Expires: 17 Jun 2006 22:27 PDT
Question ID: 730293
I need an executable script (PHP, AppleScript, etc.) that I can run on
OS X to pull scores from the Japan Golf Tour website and either return
a text file to BBEdit or just return a text file to the desktop and
save it to a certain folder.

The Japan Golf Tour website can be found at
http://www.jgto.org/jgto/WG01000000Init.do

Unfortunately the page for the scores changes each day; however, it is
always behind the same link on the front page, called "FULL
LEADERBOARD". The name of the link is always the same but the actual
URL changes. So to get to the scores page, any script will need to read
the above page, sift through the source code and go to the URL behind
that link.
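
As a rough illustration of that link-discovery step, here is a minimal
Python 3 sketch using only the standard library. It is not the script
delivered in the answer below. The names LinkFinder and find_link are
hypothetical, and it assumes the link is an ordinary <a> element whose
visible text is "FULL LEADERBOARD"; an image link would need different
handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collect the href of every <a> whose visible text matches a label."""

    def __init__(self, label):
        super().__init__()
        self.label = label.lower()
        self.href = None    # href of the <a> currently open, if any
        self.text = []      # text fragments seen inside that <a>
        self.matches = []   # hrefs whose link text equals the label

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Collapse whitespace so "Full  Leaderboard" still matches.
            seen = " ".join("".join(self.text).split()).lower()
            if seen == self.label:
                self.matches.append(self.href)
            self.href = None


def find_link(html, label, base_url):
    """Return the absolute URL behind the first link with this text, or None."""
    finder = LinkFinder(label)
    finder.feed(html)
    if not finder.matches:
        return None
    return urljoin(base_url, finder.matches[0])
```

The front page would be downloaded first and its HTML passed in as a
string; urljoin resolves a relative href against the front-page URL.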

Once on the scores page we need the scores pulled out in the following order:

Position Score Player Name (1-2-3-4)

And the rest of the columns are rejected.

However, on day one of the tournament there will only be a score in
the "1" column and there will be dashes in 2, 3 and 4, as those rounds
have yet to be played. On day two there will be a score under the "1"
and "2" columns but dashes under 3 and 4, and so on. The script will
need some conditional statements to test for this and use the
appropriate round numbers, e.g. if there is one score and three
dashes, it's obviously day one of the tournament.
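
One possible way to express that test, sketched in Python 3 with
hypothetical function names (rounds_played, format_rounds). It assumes
the four round columns have already been extracted as strings and that
an unplayed round shows up as a dash or an empty cell.

```python
def rounds_played(cells):
    """Return the posted round scores, in order, as integers.

    `cells` holds the text of the "1" to "4" columns. A dash or a
    blank is taken to mean the round has not been played yet.
    """
    scores = []
    for cell in cells:
        cell = cell.strip()
        if cell in ("", "-"):
            break  # later rounds cannot have been played either
        scores.append(int(cell))
    return scores


def format_rounds(scores):
    """Format round scores as (xx), (xx-xx), (xx-xx-xx) and so on."""
    return "(" + "-".join(str(s) for s in scores) + ")"
```

The length of the list returned by rounds_played is the day of the
tournament, so no separate branch per day is needed.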

We then need it to check that everyone has finished playing for the
day, i.e. that the word "FIN" is in the "Hole" column on every line. If
it is not, the script needs to alert the user that the tournament has
yet to finish and to try again later.
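
A sketch of that completeness check in Python 3. The row layout (a
dict with "name" and "hole" keys) and the function name are
assumptions made for illustration only.

```python
def unfinished_players(rows):
    """Return the names of players whose Hole column is not "FIN"."""
    return [row["name"] for row in rows
            if row["hole"].strip().upper() != "FIN"]


def check_finished(rows):
    """Return True if every player has finished; otherwise warn the user."""
    pending = unfinished_players(rows)
    if pending:
        print("The round has not finished yet (%d players still out). "
              "Try again later." % len(pending))
        return False
    return True
```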

If everything is right (i.e. all "FIN") then the data is pulled out in
the following way:

1 -8 Takao Nogami (64)

T2 -6 Akinori Tani (65)
T2 -6 David Smail (65)
T2 -6 Hideto Tanihara (65)
T2 -6 Shingo Katayama (65)

6 -6 Soushi Tajima (66)

T7 -5 Brendan Jones (68)
T7 -5 Eiji Mizoguchi (68)
T7 -5 Hidemasa Hoshino (68)
T7 -5 Hirofumi Miyase (68)
T7 -5 Prayad Marksaeng (68)
T7 -5 Takuya Taniguchi (68)
T7 -5 Tateo Jet Ozaki (68)

T14 -4 Hiroshi Makino (69)
T14 -4 Jong-Duck Kim (69)
T14 -4 Keiichiro Fukabori (69)
T14 -4 Mitsuo Harada (69)
T14 -4 Paul Sheehan (69)
T14 -4 Shigeru Nonaka (69)
T14 -4 Takeshi Kajikawa (69)
T14 -4 Thaworn Wiratchant (69)
T14 -4 Toshihiro Aizawa (69)
T14 -4 Toshinori Muto (69)
T14 -4 Wen-Chong Liang (69)
T14 -4 Yoichi Shimizu (69)

etc etc..............

The numbers in brackets at the end are the round scores. On day one
you get (xx), day two you get (xx-xx) or day three (xx-xx-xx) and so
on.

We need to fill in the missing position numbers from the table (i.e.
the first column), and put a "T" in front of them if more than one
player shares the position, but leave them as is if not, e.g. 1st place
or 6th place outright in the example above.

As well as all that each bunch of players on the same score is sorted
by first name ascending for easy reading.
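
The position and sorting rules could be sketched as follows in Python
3. The function name assign_positions and the (total, name) tuple
layout are hypothetical; the total is assumed to be an integer score
relative to par, with lower scores ranking higher.

```python
from itertools import groupby


def assign_positions(players):
    """Group players by total score and label each group's position.

    `players` is a list of (total, name) tuples. Returns a list of
    groups; each group is a list of (label, total, name) tuples with
    the names in ascending order, which puts first names in order.
    """
    ordered = sorted(players, key=lambda p: (p[0], p[1]))
    groups = []
    rank = 1
    for total, members in groupby(ordered, key=lambda p: p[0]):
        names = [name for _, name in members]
        # A shared position gets a "T" prefix; an outright one does not.
        label = "T%d" % rank if len(names) > 1 else str(rank)
        groups.append([(label, total, name) for name in names])
        rank += len(names)  # the next position skips past the tied players
    return groups
```

Because the rank advances by the size of each group, two players tied
for 2nd are followed by 4th place, matching the gaps in the table.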

The final thing is certain players get highlighted by wrapping them in
[b]Player Name[/b]. We would need a place in the script to be able to
add in an array of names...manually is fine.

So when the script is run and it has done everything and returned the
text, the final thing it does is run through the names and wrap the
whole line containing a matching player name in [b] [/b], as described
above.
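
A minimal Python 3 sketch of that final pass, assuming each output
line is a plain string and that a simple substring match on the
player's name is good enough. The function name highlight is
hypothetical.

```python
def highlight(line, names):
    """Wrap the whole line in [b]...[/b] if it mentions a listed player."""
    if any(name in line for name in names):
        return "[b]" + line + "[/b]"
    return line
```

For example, highlight("T2 -6 Shingo Katayama (65)", ["Shingo Katayama"])
wraps the full line, while lines for other players pass through
unchanged.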

Hope that all makes sense.
Answer  
Subject: Re: Need script to run on Mac OSX to pull scores from a website
Answered By: leapinglizard-ga on 28 May 2006 17:10 PDT
Rated: 5 out of 5 stars
 
Dear srv007,


I have written a Python script to your specifications. You may download
it from the following location.

plaintext file: japan_golf.py
http://plg.uwaterloo.ca/~mlaszlo/answers/japan_golf.py


The Python interpreter is usually preinstalled on OS X. If this is
not true in your case, or if you want the latest version of Python,
you can install it by following the instructions on the following page.

python.org: Python on the Mac
http://www.python.org/download/mac/


To run the script, execute

    python japan_golf.py outfile.txt

or just

    python japan_golf.py

from the command line. The parsing results are printed to standard output
and, if a file name is given as an argument, also written to that file.
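
For readers curious how such output handling might look, here is a
small Python 3 sketch. It is an illustration with a hypothetical helper
named emit, not an excerpt from japan_golf.py.

```python
import sys


def emit(lines, argv):
    """Print the result lines and optionally save them to a file.

    `argv` follows the sys.argv convention: argv[0] is the script
    name and argv[1], if present, is the output file name.
    """
    text = "\n".join(lines) + "\n"
    sys.stdout.write(text)
    if len(argv) > 1:
        with open(argv[1], "w") as out:
            out.write(text)
```

A script would call emit(result_lines, sys.argv) as its last step.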

This code should work as long as the Japan Golf Tour website continues to
display scores in the same format. Due to the idiosyncratic nature of
their HTML output, the code is likely to break if their format changes.

Within the first few lines of the script, you will find an array named
highlight_names, which you can fill with the players' names you wish to
see highlighted. An empty array is written

    []

and an array of one name is

    ['Tiger Woods']

while an array of two names looks like

    ['Tiger Woods', 'Ernie Els']

and so on.

It wasn't clear to me from your instructions whether you wanted only
the player's name highlighted, or the entire line that contains the
player's name.  At present, the entire line is enclosed in [b] [/b]
tags, but I can easily change it so that only the name is boldfaced.

If there are other small changes you would like me to make, or if
you have any trouble running the script, please let me know through
a Clarification Request and give me a chance to fully meet your needs
before you rate this answer.

Regards,

leapinglizard


Search strategy:

macintosh python
://www.google.com/search?q=macintosh+python

Request for Answer Clarification by srv007-ga on 02 Jun 2006 03:26 PDT
Hi LL,
Looks like it may work well. I didn't get notified of your answer so
only saw it now. I just ran it on the round 2 scores but got some
errors, as per below.

Thanks for your help.

downloading front page...
leaderboard URL = http://www.jgto.org/jgto/WG02020000Init.do?year=2006&tournaKbnCd=0&conferenceCd=12&round=4
downloading leaderboard...
parsing leaderboard...
Traceback (most recent call last):
  File "japan_golf.py", line 64, in ?
    record = [name] + [int(items[0])] + [items[1].lower()]
IndexError: list index out of range

Clarification of Answer by leapinglizard-ga on 02 Jun 2006 08:24 PDT
I've made a small change to the code. Please download the new version
and try it out.

japan_golf.py
http://plg.uwaterloo.ca/~mlaszlo/answers/japan_golf.py

leapinglizard

Request for Answer Clarification by srv007-ga on 02 Jun 2006 23:22 PDT
Thanks mate...we are getting there. You left in a doubled-up set of
lines at the end of the script, so I deleted that and then the script
went through but died again. They had already started the third round
and I think there were blank parts in some columns before the whole
field goes out. Probably not a problem in real life as we would only be
running it at the end of the day. Although if it can be fixed that
would be great.

Anyway, we will wait until the third round ends today and try it fully.

downloading front page...
leaderboard URL = http://www.jgto.org/jgto/WG02020000Init.do?year=2006&tournaKbnCd=0&conferenceCd=12&round=4
downloading leaderboard...
parsing leaderboard...
Traceback (most recent call last):
  File "japan_golf.py", line 64, in ?
    record = [name] + [int(items[0])] + [items[1].lower()]
IndexError: list index out of range
stevieray:~/Desktop charlie$ python japan_golf.py outfile.txt
  File "japan_golf.py", line 113
    else:
       ^
SyntaxError: invalid syntax

Request for Answer Clarification by srv007-ga on 03 Jun 2006 00:47 PDT
Waited for the third round to finish. All looks OK script-wise as it
works, but it's taking today's score rather than the "Total" score in
the second column. The second part of our scores is always the Total,
no matter what day.

So we need to change it from grabbing the 4th column to the 2nd column.

All "Total" scores of 0 need to be changed to "Ev" as in Even. So
coming down the list you would have a few players at +2 after three
rounds, then a few at +1 and then probably a few at 0...that needs to
say "Ev".
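
The "Ev" rule is small enough to sketch in a few lines of Python 3.
The function name format_total is hypothetical, and the total is
assumed to be an integer relative to par.

```python
def format_total(total):
    """Format a total score relative to par, showing 0 as "Ev"."""
    if total == 0:
        return "Ev"
    return "%+d" % total  # always show the sign, e.g. -14 or +2
```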

Thanks

Request for Answer Clarification by srv007-ga on 03 Jun 2006 00:49 PDT
Seems also that just the player name is highlighted - need that to be
the whole line pls.

Request for Answer Clarification by srv007-ga on 03 Jun 2006 00:54 PDT
Whoops also just noticed that most of the positions are incorrect too.
Most do not match what is on the scoreboard at all.

Yours
-----------------------------
1 -8 Hideto Tanihara (67-69-63)

2 -5 Shingo Katayama (68-67-66)

3 -2 Mitsuo Harada (71-66-69)

4 -3 Kiyoshi Murota (70-69-68)

T5 -2 Kaname Yokoo (70-70-69)
T5 -2 S K Ho (70-68-69)

7 0 Taichi Teshima (67-71-71)

8 -4 Wei-Tze Yeh (71-72-67)

9 -2 Azuma Yano (72-69-69)

10 0 Toshimitsu Izawa (71-68-71)

T11 -2 Shoichi Ideguchi (70-72-69)
T11 -2 Toshikazu Sugihara (72-70-69)

-------------------------------------
Proper Scores
-------------------------------------
1	-14	Hideto Tanihara	-8	FIN	67	69	63	-	199
2	-12	Shingo Katayama	-5	FIN	68	67	66	-	201
3	-7	Mitsuo Harada	-2	FIN	71	66	69	-	206
4	-6	Kiyoshi Murota	-3	FIN	70	69	68	-	207
	-6	S K Ho	-2	FIN	70	68	69	-	207
6	-4	Kaname Yokoo	-2	FIN	70	70	69	-	209
	-4	Taichi Teshima	0	FIN	67	71	71	-	209
8	-3	Wei-Tze Yeh	-4	FIN	71	72	67	-	210
	-3	Azuma Yano	-2	FIN	72	69	69	-	210
	-3	Toshimitsu Izawa	0	FIN	71	68	71	-	210
11	-2	Toshikazu Sugihara	-2	FIN	72	70	69	-	211
	-2	Shoichi Ideguchi	-2	FIN	70	72	69	-	211
	-2	Frankie Minoza	-1	FIN	71	70	70	-	211
	-2	Nozomi Kawahara	-1	FIN	73	68	70	-	211
	-2	Tatsuhiko Takahashi	+2	FIN	70	68	73	-	211
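
Judging by the rows above, the columns appear to be position, total,
name, today's score, hole, rounds 1 to 4 and the stroke sum. A Python 3
sketch of parsing one such row is shown below. It assumes the cells are
tab-separated as in the pasted table, which is not how the website's
HTML is laid out, and the function name parse_row is hypothetical. It
returns None for incomplete rows instead of raising an IndexError.

```python
def parse_row(line):
    """Parse one tab-separated leaderboard row into a dict, or None.

    Assumed column order: position, total, name, today, hole,
    rounds 1-4, stroke sum. The total is read from the 2nd column.
    """
    cells = line.rstrip("\n").split("\t")
    if len(cells) < 10:
        return None  # incomplete row, e.g. a round still in progress
    position, total, name, today, hole = cells[:5]
    try:
        total = int(total)  # int() accepts "-14", "0" and "+2"
    except ValueError:
        return None  # blank or non-numeric total
    return {
        "position": position.strip(),  # empty when the player is tied
        "total": total,
        "name": name.strip(),
        "today": today.strip(),
        "hole": hole.strip(),
        "rounds": [c.strip() for c in cells[5:9]],
    }
```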

Request for Answer Clarification by srv007-ga on 06 Jun 2006 17:32 PDT
Waiting on answers for the above? Script is unusable as is.

Clarification of Answer by leapinglizard-ga on 06 Jun 2006 22:49 PDT
> You left in a double up of lines at the end of the script so I deleted
> that

There was no line doubling in the script, I assure you. I tested the file
locally and uploaded it directly to my web server. Any transcription
errors must have occurred at your end. I recommend that you download
the script directly rather than copying and pasting.


> They had already started the third round and there were blank parts in
> some columns I think before the whole field goes out. Probably not a
> problem in real life as we would only be running it at the end of the
> day. Although if it can be fixed that would be great.

I can certainly work on this. It would be a great help if you could send
the full error output after a script crash.


> stevieray:~/Desktop charlie$ python japan_golf.py outfile.txt
> File "japan_golf.py", line 113
> else:
> ^
> SyntaxError: invalid syntax

There was no invalid syntax in the file as I uploaded it. Again, the
remedy is to directly download my script rather than attempting the
error-prone copy-and-paste approach.


> it's taking "Todays" score rather than the "Total" score in the second
> column.

I will make this change.


> All "Total" scores of 0 need to be changed to "Ev" as in Even.

Ditto.


> Seems also that just the player name is highlighted

Oh? The script does enclose the entire line in [b] [/b] tags. Is there
a problem with the boldface syntax? Perhaps your viewer is at fault?


> Whoops also just noticed that most of the positions are incorrect

Part of that was caused by the mistaken use of each player's daily score
rather than his total score. The remaining discrepancies are due to your
stipulation that the names of players with identical scores be sorted
alphabetically by first name.


Please download the updated script and give me your feedback.

http://plg.uwaterloo.ca/~mlaszlo/answers/japan_golf.py


leapinglizard

Clarification of Answer by leapinglizard-ga on 06 Jun 2006 22:56 PDT
There is no longer a link to the leaderboard on the front page of the
Japan Golf Tour website. You must therefore enter the leaderboard
address manually by changing the value of the variable board_url near
the beginning of the script. If you know of a reliable way to
automatically discover the location of the leaderboard, I can
implement that, too.

leapinglizard

Request for Answer Clarification by srv007-ga on 07 Jun 2006 06:56 PDT
Thanks, will give it a go. No tournament this week so no link on the
front. Definitely downloaded the file last week so who knows what
happened.

Will try when the next tournament is being played.

Request for Answer Clarification by srv007-ga on 16 Jun 2006 05:25 PDT
I tried your latest file - downloaded it directly and I get the
following error with and without outfile.txt

stevieray:~ charlie$ cd Desktop 
stevieray:~/Desktop charlie$ python japan_golf.py outfile.txt
Traceback (most recent call last):
  File "japan_golf.py", line 39, in ?
    board = open('board.html').read()
IOError: [Errno 2] No such file or directory: 'board.html'
stevieray:~/Desktop charlie$ python japan_golf.py outfile.txt
Traceback (most recent call last):
  File "japan_golf.py", line 39, in ?
    board = open('board.html').read()
IOError: [Errno 2] No such file or directory: 'board.html'
stevieray:~/Desktop charlie$ python japan_golf.py
Traceback (most recent call last):
  File "japan_golf.py", line 39, in ?
    board = open('board.html').read()
IOError: [Errno 2] No such file or directory: 'board.html'
stevieray:~/Desktop charlie$

Request for Answer Clarification by srv007-ga on 16 Jun 2006 05:28 PDT
I created an empty text file called board.html but then I get this error.

stevieray:~/Desktop charlie$ python japan_golf.py
Traceback (most recent call last):
  File "japan_golf.py", line 88, in ?
    tie_score = records[0][1]
IndexError: list index out of range
stevieray:~/Desktop charlie$

Clarification of Answer by leapinglizard-ga on 17 Jun 2006 13:05 PDT
Oops! I left the script in debugging mode. If you change the line

  if (0):

to read

  if (1):

it will download the board from the location indicated by board_url.

leapinglizard

Clarification of Answer by leapinglizard-ga on 17 Jun 2006 13:27 PDT
I've made a few small changes to the script. Please download the
latest version from the usual location.

http://plg.uwaterloo.ca/~mlaszlo/answers/japan_golf.py

There is now a constant called DOWNLOAD that should be set to a
non-zero value, such as 1, if you want to download the latest
leaderboard from the web. If you want to reuse the most recently
downloaded leaderboard, for instance to check formatting and other
script updates, then set DOWNLOAD to 0.
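
The general pattern behind such a switch can be sketched as follows in
Python 3. This is an illustration rather than the code in
japan_golf.py: the function name load_board and the fetch parameter are
hypothetical, and the cache file name board.html is taken from the
error output earlier in this thread.

```python
import os
import urllib.request

DOWNLOAD = 1               # non-zero: fetch from the web; 0: reuse the cache
CACHE_FILE = "board.html"  # local copy of the last downloaded leaderboard


def load_board(board_url, download=DOWNLOAD, cache_file=CACHE_FILE, fetch=None):
    """Return the leaderboard HTML as bytes, downloading or reusing a copy."""
    if download:
        if fetch is None:
            def fetch(url):
                with urllib.request.urlopen(url) as response:
                    return response.read()
        data = fetch(board_url)
        with open(cache_file, "wb") as out:
            out.write(data)  # keep a copy for later runs with DOWNLOAD = 0
        return data
    if not os.path.exists(cache_file):
        raise SystemExit("%s not found; set DOWNLOAD to 1 and run again"
                         % cache_file)
    with open(cache_file, "rb") as cached:
        return cached.read()
```

Failing with a clear message when the cached file is missing avoids
the bare IOError seen above when the script was left in debugging mode.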

Another option is to set the leaderboard URL directly instead of
trying to discover it on the Japan Golf Tour front page. This may come
in handy if you want to download and process the latest leaderboard
and the front page isn't displaying the leaderboard link for some
reason. In such circumstances, you can enter the leaderboard URL
directly by enclosing it in single quotes and setting the board_url
value to it near the beginning of the script.

leapinglizard

Request for Answer Clarification by srv007-ga on 18 Jun 2006 04:59 PDT
Perfect - thanks for all of your help. Works great and I was able to
grab all four days of data by adding them in manually too.

Thanks again
SRV

Clarification of Answer by leapinglizard-ga on 18 Jun 2006 09:09 PDT
You're welcome, and thank you for the five-star rating.

leapinglizard
srv007-ga rated this answer: 5 out of 5 stars
