Q: Scrape Daily Racing Form Charts using Ruby, Rubyful Soup and MySql ($200) ( No Answer,   0 Comments )
Question  
Subject: Scrape Daily Racing Form Charts using Ruby, Rubyful Soup and MySql ($200)
Category: Computers > Programming
Asked by: handicapper-ga
List Price: $200.00
Posted: 13 May 2006 17:23 PDT
Expires: 12 Jun 2006 17:23 PDT
Question ID: 728556
****Background****
I'm developing an application that uses Rubyful Soup to scrape
Daily Racing Form charts and insert the scraped data into MySQL
tables. Given a URL, the application will go to the specified race
card and scrape the related race charts. The program will start by
processing the first race and it will scrape race data, running line
data and race call data identified below and then insert the data into
corresponding tables. When the first race is completed, the program
will continue to the second race. The process will be repeated until
all races are processed. Typically there are 8 to 12 races per race
card.

*** Requirement ****
Develop a script/program using Ruby and Rubyful Soup that given a URL
for a race card, will scrape the race, running line and race call data
items identified below for each chart that comprises the race card and
insert scraped data items into the appropriate MySql table.


***Terminology *****

A chart describes a particular race, at a particular track on a particular day. 

A race card is the set of race charts for a particular day and track. 

****Resources*****

Rubyful Soup: http://www.crummy.com/software/RubyfulSoup/

Daily Racing Form Chart: http://www.drf.com/charts/13/cBEL13.html?rn=212054

Daily Racing Form Chart index: http://www.drf.com/charts/cindex.html


**** Chart Definition *******

Each race chart is comprised of the following elements:
1.  Track Name & Date   *
2.  Race Condition      *
3.  Running Lines[1..N] *
4.  Race Line           *
5.  Results and Payoffs
6.  Winner
7.  Trainer
8.  Scratched
9.  Claimed
10. Comments            *

Sample data and explanations for the pertinent race card elements (*)
are set forth below. The sample data was derived from the first race
at Belmont Park for May 13, 2006.
http://www.drf.com/charts/13/cBEL13.html?rn=212054

***Track Name****
Belmont Park
Saturday, May 13, 2006

Track:     Belmont Park
Date:      5/13/2006
DayofWeek: Saturday

- The track name, date and day of week appear once at the top of
each race card (set of charts).
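A minimal sketch of the header extraction, assuming the two header lines have already been pulled out of the page as plain strings. The helper name parse_card_header is hypothetical and not part of Rubyful Soup:

```ruby
require "date"

# Hypothetical helper: turns the two header lines of a race card into
# the Track, Date and DayofWeek values listed above.
def parse_card_header(track_line, date_line)
  # "%A, %B %d, %Y" matches text such as "Saturday, May 13, 2006".
  date = Date.strptime(date_line.strip, "%A, %B %d, %Y")
  {
    track: track_line.strip,
    date: date,                        # a Date; date.iso8601 gives a MySQL-friendly string
    day_of_week: date.strftime("%A")
  }
end

header = parse_card_header("Belmont Park", "Saturday, May 13, 2006")
puts header[:track]
puts header[:date].iso8601
puts header[:day_of_week]
```

Date.strptime raises an error when the text does not match the format, which may be a useful signal that the page layout has changed.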

***Race Number***
1st Race

RaceNumber:    1
- Each race chart starts with a race number.
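
One possible way to pull the integer out of a heading such as "1st Race" is a small regular expression. This is only a sketch, and parse_race_number is a hypothetical name:

```ruby
# Hypothetical helper: extracts the race number from a chart heading.
# Returns nil when the text does not look like "<ordinal> Race".
def parse_race_number(text)
  match = text.match(/\A\s*(\d+)(?:st|nd|rd|th)\s+Race/i)
  match && match[1].to_i
end

puts parse_race_number("1st Race")
puts parse_race_number("12th Race")
```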


***Race Condition***
1 Mile  Dirt  ALLOWANCE OPTIONAL CLAIMING PURSE  $46,000  Open  
Value of Race 46000  Value to Winner  27,600  2nd 9,200  3rd 4,600 
4th 2,300  5th 1,380  6th 920 Mutuel Pool $295,494.00   Exacta Pool
$309,392.00   Trifecta Pool $205,001.00

- The race condition includes several data elements. Each line below
shows the field name, the text as it appears in the chart, and the
value to extract:

Distance:  1 Mile | 8 (furlongs)
Surface:   Dirt | Dirt
Type:      ALLOWANCE OPTIONAL CLAIMING | Allowance Optional Claiming
Purse:     PURSE $46,000 | 46000
ToWinner:  27,600 | 27600
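
As an illustration of these extractions, here is a rough sketch that works on the condition text as a single string. It assumes the distance is always written as a whole or mixed number followed by "Mile(s)" or "Furlong(s)", and that the race type is everything between the surface and the word PURSE. Those are assumptions drawn from the one sample above, and the names CONDITION, mixed_number and parse_condition are hypothetical:

```ruby
# Assumed layout: "<distance> <unit> <surface> <race type> PURSE $<purse> ...
# Value to Winner <amount> ...". Only the sample above was used to derive it.
CONDITION = %r{
  \A\s*(?<distance>\d+(?:\s+\d+/\d+)?)\s+(?<unit>Miles?|Furlongs?)\s+
  (?<surface>\S+)\s+
  (?<type>.+?)\s+PURSE\s+\$(?<purse>[\d,]+)
  .*?Value\s+to\s+Winner\s+\$?(?<to_winner>[\d,]+)
}xim

# Converts "1", "1 1/16" or "5 1/2" to a Float.
def mixed_number(text)
  text.split.sum { |part| Rational(part).to_f }
end

def parse_condition(text)
  m = CONDITION.match(text)
  return nil unless m

  # One mile is 8 furlongs; furlong distances are stored unchanged.
  furlongs_per_unit = m[:unit].downcase.start_with?("mile") ? 8.0 : 1.0
  {
    distance: mixed_number(m[:distance]) * furlongs_per_unit,
    surface: m[:surface].capitalize,
    race_type: m[:type].split.map(&:capitalize).join(" "),
    purse: m[:purse].delete(",").to_f,
    to_winner: m[:to_winner].delete(",").to_f
  }
end

sample = "1 Mile  Dirt  ALLOWANCE OPTIONAL CLAIMING PURSE  $46,000  Open  " \
         "Value of Race 46000  Value to Winner  27,600  2nd 9,200"
p parse_condition(sample)
```

Returning nil for text that does not match keeps the caller in control of how to report a chart that deviates from the assumed layout.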

***Sample Running Line and explanation***
Each race will have 3 to 24 running lines, depending on the number of horses.

PN  Horse          M   Eq  Wt   PP  SP  1/4   1/2   3/4    Str   Fin    Jockey     Odds
6   Touchdown Kid  LB  b   124  4   4   2 1   1 ½   1 1½   1 5   1 4¼   Luzzi M J  .60

PN: The horse's program number = 6
Horse: The horse's name = Touchdown Kid
M: The horse was medicated with L(asix) and B(ute)
Eq: The horse races with equipment - b(linkers)
Wt: The horse was carrying 124 pounds
PP: The horse broke from post position 4
SP: The horse's starting position right after the break was 4th.
1/4: At the 1/4 pole, the horse was in 2nd, 1 length ahead of the 3rd horse
1/2: At the 1/2 pole, the horse was in 1st, 1/2 length ahead of the 2nd horse
3/4: At the 3/4 pole, the horse was in 1st, 1 1/2 lengths ahead of the 2nd horse
Str: In the stretch, the horse was in 1st, 5 lengths ahead of the 2nd horse.
Fin: At the finish, the horse was in 1st, 4 1/4 lengths ahead of the 2nd horse.
Jockey: The jockey's name was Luzzi M J
Odds: The horse's odds were .60
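
To make the running line breakdown concrete, here is a sketch that assumes each running line has already been split into one string per table cell (for example from the chart's HTML cells), in the column order shown above. The names FRACTIONS, parse_call and parse_running_line are hypothetical. Margins that are not numeric are left as nil rather than guessed:

```ruby
FRACTIONS = { "¼" => 0.25, "½" => 0.5, "¾" => 0.75 }.freeze

# Splits a call such as "1 4¼" into [position, lengths].
# Lengths is nil when the margin is missing or not a plain number/fraction.
def parse_call(cell)
  position, margin = cell.strip.split(/\s+/, 2)
  m = margin.to_s.match(/\A(\d*)([¼½¾]?)\z/)
  return [position.to_i, nil] if m.nil? || m[0].empty?

  [position.to_i, m[1].to_i + FRACTIONS.fetch(m[2], 0.0)]
end

# cells: one string per column, in the order
# PN, Horse, M, Eq, Wt, PP, SP, <calls...>, Jockey, Odds.
# The number of call columns is not fixed, so everything between SP and
# Jockey is treated as a call.
def parse_running_line(cells)
  pn, horse, med, eq, wt, pp, sp, *calls, jockey, odds = cells.map(&:strip)
  {
    prog_num: pn.to_i, horse: horse, med: med, equ: eq,
    wgt: wt.to_i, pp: pp.to_i, sp: sp.to_i,
    calls: calls.map { |call| parse_call(call) },
    jockey: jockey,
    odds: "0#{odds}".to_f   # odds below 1 are printed without a leading zero
  }
end

cells = ["6", "Touchdown Kid", "LB", "b", "124", "4", "4",
         "2 1", "1 ½", "1 1½", "1 5", "1 4¼", "Luzzi M J", ".60"]
p parse_running_line(cells)
```

Each [position, lengths] pair lines up with the CallPos and CallLen values described for the race calls.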

*** Sample Race Lines ********
Off at 1:00 Start Good for all . Won Driving. Time , :22 4/5,  :45
3/5,  1:10 2/5,  1:37, Clear63. Track: Fast.

Off at: 1:00
Start: Good for all
Won: Driving
Time1: :22 4/5
Time2: :45 3/5
Time3: 1:10 2/5
Time4: 1:37
Note: The number of times depends on the distance of the race
Weather1: Clear
Weather2: 63
Track: Fast
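
A sketch of how the race line might be split, based only on the one sample above. It assumes the labels "Off at", "Start", "Won", "Time" and "Track:" always appear in that order and that times are given in fifths of a second. The names chart_time_to_seconds, RACE_LINE and parse_race_line are hypothetical:

```ruby
# Converts ":22 4/5", "1:10 2/5" or "1:37" to seconds as a Float.
def chart_time_to_seconds(text)
  m = text.strip.match(%r{\A(\d+)?:(\d{2})(?:\s+(\d)/5)?\z})
  return nil unless m

  m[1].to_i * 60 + m[2].to_i + m[3].to_i / 5.0
end

# Assumed layout, taken from the single sample race line above.
RACE_LINE = %r{
  Off\s+at\s+(?<off_at>\d{1,2}:\d{2})\s+
  Start\s+(?<start>.+?)\s*\.\s*
  Won\s+(?<won>.+?)\s*\.\s*
  Time\s*,?\s*(?<times>.+?),\s*
  (?<weather>[A-Za-z]+(?:\s+[A-Za-z]+)*)\s*(?<temp>\d+)\s*\.\s*
  Track:\s*(?<track>[A-Za-z]+(?:\s+[A-Za-z]+)*)\s*\.
}xm

def parse_race_line(text)
  m = RACE_LINE.match(text)
  return nil unless m

  {
    off_at: m[:off_at], start: m[:start], won: m[:won],
    # One entry per fractional time, each converted to seconds.
    times: m[:times].split(",").map { |t| chart_time_to_seconds(t) },
    weather1: m[:weather], weather2: m[:temp].to_i,
    track_condition: m[:track]
  }
end

line = "Off at 1:00 Start Good for all . Won Driving. Time , :22 4/5,  :45 " \
       "3/5,  1:10 2/5,  1:37, Clear63. Track: Fast."
p parse_race_line(line)
```

The seconds values produced here correspond to the Time column of the Race_Calls table, for example 22.8 for ":22 4/5".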

***Sample Comments****
TOUCHDOWN KID quickly showed in front, set the pace while in hand,
drew away  when roused and was kept to a drive to the wire. SEEKING
THE MONEY raced close  up along the inside and rallied on the rail to
get the place spot. HEATHROW  chased the pace while three wide and was
outfinished for the place. DUKE'S  CROSSING was outrun early, came
wide for the drive and offered a mild rally  outside. CHAMPCHU raced
close up early and lacked a rally. HARD IRON was outrun  along the
inside.



*** Database Tables ***
The extracted data should be inserted into one of three MySQL tables:
Races, Running_Lines, and Race_Calls. There is a one-to-many
relationship between the Races table and the Running_Lines table and a
one-to-many relationship between the Running_Lines table and the
Race_Calls table.  The structure for each table and related sample
data values based on the sample data above follow:

Races:
Name      | Type     |Sample Data (Domain)
RaceId    | Int      |Auto Increment
RaceDate  | date     | 5/13/06 (all valid dates)
Dayofweek | char     | Saturday
Track     | char     | Belmont Park 
RaceNumber| Int      | 1 (Integer values from 1 to 20)
Distance  | float    | 8 (distance expressed in furlongs; 1 furlong = 1/8 mile)
Surface   | char     | Dirt 
Purse     | float    | 46000
ToWinner  | float    | 27600
RaceType  | char     | Allowance Optional Claiming
FieldSize | int      | Number of horses in a race.
Winner    | char     | The name of the winning horse
WinOdds   | float    | The odds for the winning horse
Offat     | time     | 1:00
Start     | char     | Good for all
Won       | char     | Driving
Weather1  | char     | clear
Weather2  | int      | 63
Track     | char     | Fast (track condition; this column needs a name distinct from the track-name column above)
Comments  | Longtext |TOUCHDOWN KID quickly showed in front, set the pace...

Running_Lines:
Name      | Type |Sample Data
RaceId    | Int  | From race table
ProgNum   | Int  | 6
Med       | char | LB
Equ       | char | b
Wgt       | int  | 124
PP        | int  | 4
SP        | int  | 4
Jockey    | char | Luzzi M J
Odds      | float| .60

Race_Calls:
Name      | Type |Sample Data (domain)
RaceId    | Int  |From race table
CallNum   | Int  | 1    (An integer value 1 thru 6)
CallCode  | Char | 1/4  (1/4, 1/2, 3/4, 1, Str, Fin)
CallPos   | Int  | 2
CallLen   | float| 1
Timevalue | Char | 22 4/5
Time      | Float| 22.80 (time expressed in seconds)   
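
For the insert step, one option is to build parameterized INSERT statements from a hash of column values, so that scraped text is never pasted directly into SQL. The sketch below only builds the statement and the value list. insert_statement is a hypothetical helper, the RaceId value is made up for illustration, and passing the result to a MySQL driver is left to whichever library is used:

```ruby
# Hypothetical helper: returns [sql, values] for a row given as a Hash of
# column name => value. Column and table names are backtick-quoted so that
# names such as Time are safe to use as identifiers.
def insert_statement(table, row)
  columns = row.keys.map { |name| "`#{name}`" }
  placeholders = (["?"] * columns.size).join(", ")
  sql = "INSERT INTO `#{table}` (#{columns.join(', ')}) VALUES (#{placeholders})"
  [sql, row.values]
end

# 1/4 call of the sample running line: 2nd by 1 length, time :22 4/5.
# RaceId 1 is an illustrative value; in practice it comes from the Races insert.
call_row = {
  RaceId: 1, CallNum: 1, CallCode: "1/4",
  CallPos: 2, CallLen: 1.0, Timevalue: "22 4/5", Time: 22.8
}

sql, values = insert_statement("Race_Calls", call_row)
puts sql
p values
```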

Thank you.

Clarification of Question by handicapper-ga on 17 May 2006 13:29 PDT
Significant Tip ($100) for an answer to this question.
Answer  
There is no answer at this time.

Comments  
There are no comments at this time.
