Q: Using PHP (preg_match_all ) and regex to suck player scores from website ( Answered 5 out of 5 stars,   0 Comments )
Question  
Subject: Using PHP (preg_match_all ) and regex to suck player scores from website
Category: Computers > Programming
Asked by: srv-ga
List Price: $20.00
Posted: 23 Jun 2002 07:10 PDT
Expires: 30 Jun 2002 07:10 PDT
Question ID: 31873
I need to pull Australian players' scores from the LPGA.com leaderboard
(http://www.lpga.com/statistics/leaderboard.cfm). My code gets the
scores onto the screen, but I can't get them into an array and print
them in a table. I have included my code below. To clarify: I need to
connect to the website, read the page into an array, and then pull out
only the Australian players and their scores. The Australian players'
names are held in an array at the end of my code below. It also needs
to pull out the Aussie scores from the list of players who missed the
cut at the bottom of the leaderboard. You will probably need to go to
the website to see what I mean :)) I have gotten a variation of the
code below to work on other sites, but the LPGA source code is a
little messier!

--------------------------
My Code
----------------------------------
<?php
 $page_array= file("http://www.lpga.com/statistics/leaderboard.cfm");
 //echo $page_array;
 foreach ($page_array as $page) {
 if (eregi("([\t]+)<td align=([a-z][A-Z]+)><font
face=\"Verdana,Arial,Helvetica,sans serif\" size=2><a
href=\"/players/playerpage.cfm\?player_id=([0-9]+)\">",$page)) {
  echo $page;
  preg_match_all ("|.cfm\?player_id=([0-9]+)\">(.*)</a>|", $page,
$out, PREG_PATTERN_ORDER);
  $player_name = $out[1][0];
  }
 if (eregi("([\t]+)<td align=center><font
face=\"Verdana,Arial,Helvetica,sans serif\" size=2>",$page)) {
  preg_match_all ("|(.*)</font></td>|", $page, $out,
PREG_PATTERN_ORDER);
  $current = $out[1][0];
  $to_par = $out[1][1];
  $total_score = $out[1][2];
  }
  preg_match_all ("|<td align=([a-z]+)><font
face=\"Verdana,Arial,Helvetica,sans serif\" size=2>(.*)</td>|", $page,
$out, PREG_PATTERN_ORDER);
  $start = $out[1][0];
  $today = $out[1][1];
  $thru = $out[1][2];
  $round_1 = $out[1][3];
  $round_2 = $out[1][4];
  $round_3 = $out[1][5];
  $round_4 = $out[1][6];
  $players_array[$player_name] = array("current" => $current,"to_par"
=> $to_par, "total_score" => $total_score, "start" => $start, "today"
=> $today, "thru" => $thru, "round_1" => $round_1, "round_2" =>
$round_2, "round_3" => $round_3, "round_4" => $round_4);
}

$aussies = array("Karrie Webb", "Rachel Teske", "Jane Crafter");
echo "<table border=\"1\">";
echo "<tr><td colspan=11>LPGA Tour Scoreboard</td></tr>";
echo "<tr>
		<td>Player Name</td>
		<td>Current</td>
		<td>Today</td>
		<td>Total Score</td>
		<td>Start</td>
		<td>Today</td>
		<td>Thru</td>
		<td>R1</td>
		<td>R2</td>
		<td>R3</td>
		<td>R4</td>
	</tr>";
foreach ($players_array as $key => $val) {
 if (in_array($key,$aussies)) {
	 echo "<tr>";
	 echo "<td>$key</td>";
	 foreach ($val as $v) {
		echo "<td>$v</td>";
	 }
	 echo "</tr>";
}
}
echo "</table>";

?>
Answer  
Subject: Re: Using PHP (preg_match_all ) and regex to suck player scores from website
Answered By: runix-ga on 23 Jun 2002 09:32 PDT
Rated:5 out of 5 stars
 
Hello! 

I've modified your script to work: you had some problems with the
regular expressions. I recommend reading 'Mastering Regular
expressions' ( http://www.amazon.com/exec/obidos/ASIN/0596002890/qid=1024849703/sr=8-1/ref=sr_8_1/102-4897136-7732161
)
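
One of the regular expression problems, as far as can be told from the two scripts, is which capture group is read: the original took `$out[1][0]` (the numeric id) as the player name, while the fixed version reads `$out[2][0]`. A minimal sketch of how `preg_match_all` fills `$out` with `PREG_PATTERN_ORDER`; the sample line and the id 123 are made up for illustration:

```php
<?php
// Sample line modelled on the pattern used in the script; the player id
// and the exact markup are assumptions, not copied from the real page.
$line = '<a href="/players/playerpage.cfm?player_id=123">Karrie Webb</a>';

preg_match_all('|\.cfm\?player_id=([0-9]+)">(.*?)</a>|', $line, $out, PREG_PATTERN_ORDER);

// With PREG_PATTERN_ORDER, $out[N] holds every match of capture group N.
$player_id   = $out[1][0]; // first group: the numeric id
$player_name = $out[2][0]; // second group: the link text

echo "$player_id => $player_name\n";
```

The non-greedy `(.*?)` stops at the first `</a>`, which matters if a line ever contains more than one link.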

In the table, I didn't know where to print 'to_par'. You'll have to
add the 'to_par' item to the foreach that builds the table.


Good luck!

------------------------

<?php
 $page_array= file("http://www.lpga.com/statistics/leaderboard.cfm");
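 $players=array();
 $data=array();      # fields collected for the current player
 $player_name='';    # set once the player's name link has been seen
 $started=0;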
 foreach ($page_array as $page) {
 if (preg_match('|<tr bgcolor="EEEEEE">|',$page)){
  if (count($data)){
  	$players[$player_name]=$data;
  }
  $player_name='';
  $data=array();
  $started=1;
 }
 if (empty($player_name) && $started && preg_match("|<td align=center><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2>(.*?)</font>|i",$page,$out)){
  if (!isset($data['start'])){$data['start']=$out[1]; }
  elseif (!isset($data['curr'])){$data['curr']=$out[1]; }
 }
 if (preg_match("|([\t]+)<td align=([a-z]+)><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2><a href=\"/players/playerpage.cfm\?player_id=([0-9]+)\">|i",$page)) {
  preg_match_all ("|.cfm\?player_id=([0-9]+)\">(.*?)</a>|", $page, $out, PREG_PATTERN_ORDER);
  $player_name = $out[2][0];
  }
 if (!empty($player_name) && !isset($data['today'])){
	preg_match('/<td align=center><font face="Verdana,Arial,Helvetica,sans serif" size=2>([0-9]+:[0-9]+ [AP]M|E|[\+-][0-9])/i',$page,$out);
	if (isset($out[1])){ $data['today']=$out[1]; }
 }elseif (!empty($player_name) and preg_match("|<td align=center><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2>|i",$page)) {
  preg_match_all ("|<td align=center><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2>(.*?)</font></td>|", $page, $out, PREG_PATTERN_ORDER);
  $fields=array('thru','to_par','1','2','3','4','total');
	foreach($fields as $field){
		if (!isset($data[$field])){
			$data[$field]=$out[1][0]." ";
			break;
		}
	}
  }
}
$aussies = array("Karrie Webb", "Rachel Teske", "Jane Crafter");
echo "<table border=\"1\">";
echo "<tr><td colspan=11>LPGA Tour Scoreboard</td></tr>";
echo "<tr> <td>Player Name</td> <td>Current</td> <td>Today</td>
<td>Total Score</td> <td>Start</td> <td>Today</td> <td>Thru</td>
<td>R1</td> <td>R2</td> <td>R3</td> <td>R4</td> </tr>";
foreach ($players as $name => $data) {
		if (in_array($name,$aussies)) {
			echo "<tr>";
			echo "<td>$name</td>";

			# one cell per column, in the same order as the header row
			foreach (array('curr','today','total', 'start', 'today', 'thru','1','2','3','4') as $k) {
					$v = isset($data[$k]) ? $data[$k] : '';
					echo "<td>$k : $v</td>";
			}
			echo "</tr>\n";
	}
}
echo "</table>"; ?>

----------------------
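
The note about 'to_par' could be handled along these lines. This is only a sketch: `render_row` is a hypothetical helper, the position of 'to_par' in the column list is an assumption, and the sample values are made up. The header row would need a matching "To Par" cell in the same position.

```php
<?php
// Hypothetical helper: builds one table row from a player's data array.
// Keys missing from $data are rendered as empty cells.
function render_row($name, array $data, array $columns) {
    $html = '<tr><td>' . htmlspecialchars($name) . '</td>';
    foreach ($columns as $key) {
        $value = isset($data[$key]) ? trim($data[$key]) : '';
        $html .= '<td>' . htmlspecialchars($value) . '</td>';
    }
    return $html . "</tr>\n";
}

// Column order is an assumption; 'to_par' is placed right after 'today'.
$columns = array('curr', 'today', 'to_par', 'total', 'start', 'thru', '1', '2', '3', '4');

// Made-up sample data using the same keys as the script above.
$sample = array('curr' => 'T3', 'today' => '-2', 'to_par' => '-5', 'total' => '139');
echo render_row('Karrie Webb', $sample, $columns);
```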

Request for Answer Clarification by srv-ga on 23 Jun 2002 14:42 PDT
Hi Runix,
Fantastic stuff! Works like a charm!! I thought it wasn't working, but
I just had to fix the line breaks introduced by the browser display
:)) Just one thing: it isn't picking up the players at the end who
missed the cut. There is a table at the bottom of the scoreboard that
needs to be pulled out, with just the name and the other two figures,
and appear at the bottom of the table. I think that was about it.

I should also ask whether it will still work when only one round has
been played. The site displays only round one on the first day, then
rounds one and two on the second day, and so on. I just don't want it
to break this week when the tournament starts and they are only
showing one round. Sometimes the Ladies tournaments are only three
rounds and sometimes four, like this week...soooo much to think about
:))

I should have asked you to comment the code so I could figure it all
out! ;) Thanks again and hope to hear from you soon....

Clarification of Answer by runix-ga on 23 Jun 2002 16:12 PDT
Hey :) I'm happy you liked the script!
I've commented it a bit; please ask me if you don't understand
something.

Here's the new version; it now reads the Missed Cut table.
Good luck!

----------------------
<?php
 $page_array= file("http://www.lpga.com/statistics/leaderboard.cfm");
# $page_array= file("leaderboard.cfm");
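 $players=$missed_cut=array();
 $data=array();      # fields collected for the current player
 $player_name='';    # set once the player's name link has been seen
 $started=0;
 $missed=0;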
 foreach ($page_array as $page) {

 if (!$missed){
 	 if (preg_match('/MISSED CUT/i',$page)){
	 	$missed=1;
		$started=0;
		$data=array();
	 	continue;
	 }

 	# new record starts...
	 if (preg_match('#<tr bgcolor="EEEEEE">|</table>#',$page)){
	  if (count($data)){
	  	$players[$player_name]=$data;
	  }
	  $player_name='';
	  $data=array();
	  $started=1;
	 }

	 # 'start' and 'current' 
	 if (empty($player_name) && $started && preg_match("|<td align=center><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2>(.*?)</font>|i",$page,$out)){
	  if (!isset($data['start'])){$data['start']=$out[1]; }
	  elseif (!isset($data['curr'])){$data['curr']=$out[1]; }
	 }
	
	 # player name
	 if (preg_match("|([\t]+)<td align=([a-z]+)><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2><a href=\"/players/playerpage.cfm\?player_id=([0-9]+)\">|i",$page)) {
	  preg_match_all ("|.cfm\?player_id=([0-9]+)\">(.*?)</a>|", $page, $out, PREG_PATTERN_ORDER);
	  $player_name = $out[2][0];
	  }
	  
	 # 'today' (it differs from the other fields 'thru', 'to_par', etc.
	 # because the line doesn't end with </font>; they close the tag on
	 # the next line)
	 if (!empty($player_name) && !isset($data['today'])){
		preg_match('/<td align=center><font face="Verdana,Arial,Helvetica,sans serif" size=2>([0-9]+:[0-9]+ [AP]M|E|[\+-][0-9])/i',$page,$out);
		if (isset($out[1])){ $data['today']=$out[1]; }
	 }
	 # thru, to_par, r1, r2, r3, etc
	 elseif (!empty($player_name) and preg_match("|<td align=center><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2>|i",$page)) {
	  preg_match_all ("|<td align=center><font face=\"Verdana,Arial,Helvetica,sans serif\" size=2>(.*?)</font></td>|", $page, $out, PREG_PATTERN_ORDER);
	  $fields=array('thru','to_par','1','2','3','4','total');
		foreach($fields as $field){
			if (!isset($data[$field])){
				$data[$field]=$out[1][0]." ";
				break;
			}
		}
	  }
 }else{
  if (preg_match('#<tr bgcolor="EEEEEE">|</table>#',$page)){
#  		print "new player, closing: '$player_name'\n";
	  if (count($data)){
	  	$missed_cut[$player_name]=$data;
	  }
	  $player_name='';
	  $data=array();
	  $started=1;
	  continue;
  }
  if ($started and preg_match('|<td align=left><a href="/players/playerpage.cfm\?player_id=[0-9]+"><font face="Verdana,Arial,Helvetica,sans serif" size=2>(.*?)</font></a></td>|i',$page,$out)){
  	$player_name=$out[1];
	continue;
  }
  if ($started and $player_name and preg_match('|<td align=center><font face="Verdana,Arial,Helvetica,sans serif" size=2>(.*?)</font>(&nbsp;)?</td>|i',$page,$out)){
  	foreach(array('r1','r2','total') as $k){
		if (!isset($data[$k])){
			$data[$k]=$out[1];
			break;
		}
	}
	continue;
  }
  
  

 }
}
$aussies = array("Karrie Webb", "Rachel Teske", "Jane Crafter");
echo "<table border=\"1\">";
echo "<tr><td colspan=11>LPGA Tour Scoreboard</td></tr>";
echo "<tr> <td>Player Name</td> <td>Current</td> <td>Today</td>
<td>Total Score</td> <td>Start</td> <td>Today</td> <td>Thru</td>
<td>R1</td> <td>R2</td> <td>R3</td> <td>R4</td> </tr>";
foreach ($players as $name => $data) {
		if (in_array($name,$aussies)) {
			echo "<tr>";
			echo "<td>$name</td>";

			# one cell per column, in the same order as the header row
			foreach (array('curr','today','total', 'start', 'today', 'thru','1','2','3','4') as $k) {
					$v = isset($data[$k]) ? $data[$k] : '';
					echo "<td>$k : $v</td>";
			}
			echo "</tr>\n";
	}
}
echo "</table>";

print "Missed cut:\n";
foreach($missed_cut as $k=>$p){
	print "$k:\n";
	foreach($p as $k=>$v){
		print "\t$k: $v\n";
	}
	print "<br />\n";
}
?>
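
On the question of a leaderboard that shows fewer than four rounds: the script above fills 'thru', 'to_par', '1' to '4' and 'total' in the order the cells appear, so with fewer round columns the total would probably land under a round key. A minimal sketch of one way around that, assuming the last centred cell of a row is always the total. `label_cells` is a hypothetical helper and the sample row is made up; neither has been checked against the live page.

```php
<?php
// Hypothetical helper: labels the centred cells of one leaderboard row.
// Assumed cell order: thru, to_par, one cell per round shown, then total.
function label_cells(array $cells) {
    $data = array('thru' => '', 'to_par' => '', '1' => '', '2' => '',
                  '3' => '', '4' => '', 'total' => '');
    if (count($cells) < 3) {
        return $data; // not enough cells to label anything reliably
    }
    $data['thru']   = array_shift($cells);
    $data['to_par'] = array_shift($cells);
    $data['total']  = array_pop($cells);
    // Whatever remains is one score per round shown so far (at most four).
    foreach (array_slice($cells, 0, 4) as $i => $score) {
        $data[(string)($i + 1)] = $score;
    }
    return $data;
}

// Made-up row after one round: thru, to_par, round 1, total.
$row = label_cells(array('F', '-3', '69', '69'));
echo $row['1'] . ' / ' . $row['2'] . ' / ' . $row['total'] . "\n";
```

Rounds that are not shown yet stay as empty strings, so the table keeps its R1 to R4 columns whether the event has three rounds or four.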

Request for Answer Clarification by srv-ga on 23 Jun 2002 19:15 PDT
Hi Runix,
Thanks again, and especially for the comments...that helps big time ;)
This is probably my fault for not making it clear in my rambling
question, but I need the missed cut table to show only the Aussie
players too.

So basically the script needs to pull out every Aussie player and put
them in one table with the ones who made the cut then a table row with
the title "Missed Cut" and then below that all the Aussie players that
missed the cut. So everything should be in one table with the "Missed
cuts" at the bottom. The missed cut section would show name/r1/r2 and
total under the same columns as used by the top players with the empty
ones just showing a blank eg r3 and r4.

Hope it all makes sense, as it is a little difficult to explain, but
you have been great so far!...thanks!!

Some Aussie test players for the "Missed Cut" section are "Wendy
Doolan" and  "Jan Stephenson". Thanks again mate....you have helped me
out more than you know! :))

Clarification of Answer by runix-ga on 23 Jun 2002 20:16 PDT
Just replace the last part (where I print the missed cut list) with this:

print "Missed cut:\n
<table>";
foreach($missed_cut as $k=>$p){
    if (in_array($k,$aussies)) {
        print "<tr><td>$k</td>\n";
        foreach($p as $v){
            print "<td>$v</td>\n";
        }
        print "</tr>\n";
    }
}
print "</table>";
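
The snippet above prints the missed-cut players in a table of their own. If everything should sit in one table, with a "Missed Cut" title row and blanks under R3 and R4, something along these lines might work. It is a sketch only: `render_scoreboard` is a hypothetical helper, the column order is an assumption, and the scores in the example are made up.

```php
<?php
// Hypothetical helper: one table with the made-cut players first, then a
// "Missed Cut" title row, then the missed-cut players.
function render_scoreboard(array $players, array $missed_cut, array $aussies) {
    $columns = array('curr', 'today', 'total', 'start', 'thru', '1', '2', '3', '4');
    // The missed-cut parser stores rounds as 'r1'/'r2'; map them onto
    // the shared round columns so they line up with the rows above.
    $aliases = array('r1' => '1', 'r2' => '2');
    $html = "<table border=\"1\">\n";
    foreach (array($players, $missed_cut) as $section => $rows) {
        if ($section === 1) {
            $span = count($columns) + 1; // +1 for the name column
            $html .= "<tr><td colspan=\"$span\">Missed Cut</td></tr>\n";
        }
        foreach ($rows as $name => $data) {
            if (!in_array($name, $aussies)) {
                continue; // only the listed players are shown
            }
            foreach ($aliases as $from => $to) {
                if (isset($data[$from])) {
                    $data[$to] = $data[$from];
                }
            }
            $html .= "<tr><td>$name</td>";
            foreach ($columns as $key) {
                $html .= '<td>' . (isset($data[$key]) ? trim($data[$key]) : '') . '</td>';
            }
            $html .= "</tr>\n";
        }
    }
    return $html . "</table>\n";
}

// Made-up example data in the shape the script above produces.
$players    = array('Karrie Webb'  => array('curr' => '1', 'total' => '139'),
                    'Someone Else' => array('curr' => '2', 'total' => '140'));
$missed_cut = array('Wendy Doolan' => array('r1' => '75', 'r2' => '76', 'total' => '151'));
$aussies    = array('Karrie Webb', 'Wendy Doolan');
echo render_scoreboard($players, $missed_cut, $aussies);
```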

Request for Answer Clarification by srv-ga on 23 Jun 2002 21:41 PDT
Thanks again for your help Runix...awesome stuff and works great! Now
for the rest of the ones I need done...will post those soon ;)

Clarification of Answer by runix-ga on 24 Jun 2002 11:02 PDT
ok!
(just to close the clarification request! :)
srv-ga rated this answer:5 out of 5 stars
Excellent answer and works perfectly! Just clarifying the bits and
pieces at the end of the question.
