Hello srv-ga. I have written the PHP script just as you described it,
but I would also like to give you some additional advice. I have done
things similar to this in the past and have run into two problems.
Problem number one is load time. When someone visits this page it
will have the load time of visiting three pages. To solve this
problem in the past I have used a UNIX cron job to periodically (maybe
once an hour) cache the pages to a local file. I'm not sure if you
are using UNIX, but it is very easy to call the command "wget" at
intervals to cache pages.
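If cron and wget are not available to you, the same caching idea can be sketched in PHP itself. This is only a rough illustration and is not part of the script below: the function name cached_fetch, the cache file path and the one-hour lifetime are all my own assumptions.

```php
<?php
// Hypothetical helper (not part of the original script): return the cached
// copy of a page if it is younger than $maxAge seconds, otherwise fetch it
// again with the $fetch callback and rewrite the cache file.
function cached_fetch($url, $cacheFile, $maxAge, $fetch) {
    clearstatcache(true, $cacheFile); // make sure filemtime() is not stale
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile); // fresh copy, no network trip
    }
    $data = call_user_func($fetch, $url);
    if ($data === false) {
        // fetch failed: fall back to an old copy if we have one
        return is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    }
    file_put_contents($cacheFile, $data);
    return $data;
}

// Possible usage, caching the home page for one hour (3600 seconds);
// the cache path is an assumed example:
// $html = cached_fetch('http://www.europeantour.com/home/',
//                      '/tmp/europeantour_home.html', 3600, 'file_get_contents');
```

Passing the fetch function in as a parameter keeps the helper easy to try out without touching the network.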
The second problem is the fragility of the script and its dependence
on its input not changing. I have tried to make it as flexible as
possible (using case-insensitive matches, etc.), but there is no telling
how they will change the page you are parsing. The ultimate way to do
this is to have an agreement with the other web site to put the
content in a certain format in a hidden location. For example, many
web sites use an XML format called RSS to transmit news headlines
between sites. RSS is a standard format you can rely on. I
understand this probably wasn't possible in your scenario or you
probably would have done it. So, keep your eye on the script and make
sure it doesn't break if they change the formatting of their HTML.
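One concrete way the script can break: its do/while loops keep calling array_shift until a marker line matches. If the marker ever disappears from the page, array_shift starts returning NULL and the loop never ends. A guarded version of that scan might look like the sketch below. The helper name skip_until and the sample lines are made up for illustration and are not part of the script.

```php
<?php
// Hypothetical helper: discard lines from the front of $lines until one
// matches $pattern. Returns the matching line, or false if the input runs
// out first (for example, because the site changed its HTML).
function skip_until(&$lines, $pattern) {
    while (count($lines) > 0) {
        $line = array_shift($lines);
        if (preg_match($pattern, $line)) {
            return $line;
        }
    }
    return false;
}

// Made-up sample input standing in for the result of file($url)
$sample = array("<html>\n", "<b>Live Scoring</b>\n", "<p>rest</p>\n");
$found = skip_until($sample, '/Live Scoring/i');
if ($found === false) {
    die("Marker not found; the page layout has probably changed.\n");
}
```

Failing loudly like this at least tells you the page changed, instead of leaving the script hanging.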
With those problems in mind, here is the script I came up with:
--Begin--
<?php
// get the home page content
$homecontent = file('http://www.europeantour.com/home/');
// Look for the section with the "Live Scoring" URL
do {
    $line = array_shift($homecontent);
} while (!preg_match('/Live Scoring/i', $line));
// Look for the actual URL
do {
    $line = array_shift($homecontent);
    if (preg_match('/http:\/\/scoring\.europeantour\.com\//i', $line)) {
        $url = preg_replace('/^.*<a href="(.*?)".*/', '$1', chop($line));
    }
} while (!preg_match('/http:\/\/scoring\.europeantour\.com\//i', $line));
unset($homecontent); // free up a little memory
// get the score content
$content = file($url);
// look for table header
do {
    $line = array_shift($content);
} while (!preg_match('/"smallgreen">Rank<\/font>/i', $line));
$alldone = false; // whether we are done parsing the score table
$allscores = array(); // initialize the scores array
do {
    // look for beginning of row
    do {
        $line = array_shift($content);
        if (preg_match('/<\/table/i', $line)) {
            $alldone = true;
            break;
        }
    } while (!preg_match('/<tr/i', $line));
    if ($alldone == false) {
        $row = array(); // initialize this row
        // collect the text of every cell in this row
        do {
            $line = array_shift($content);
            if (preg_match('/<td/i', $line)) {
                $data = rtrim(strip_tags($line));
                array_push($row, $data);
            }
        } while (!preg_match('/<\/tr/i', $line));
        // keep only complete rows for Australian and New Zealand players
        if (count($row) == 9 &&
            (preg_match('/AUS/i', $row[2]) || preg_match('/NZL/i', $row[2]))) {
            // names arrive as "LASTNAME, Firstname"; flip and fix the case
            list($lname, $fname) = preg_split('/,\s*/', $row[1]);
            $fname = preg_replace('/\s+/', '', $fname);
            $lname = preg_replace('/\s+/', '', $lname);
            $row[1] = ucfirst(strtolower($fname))." ".ucfirst(strtolower($lname));
            // missed cut
            $row[7] = preg_replace('/ /', '-', $row[7]);
            $row[8] = preg_replace('/ /', '-', $row[8]);
            array_push($allscores, $row);
        }
    }
} while ($alldone == false);
unset($content); // free up some more memory
?>
<html>
<body bgcolor="#ffffff">
<table border=1>
<tr>
<th>Rank</th>
<th>Name</th>
<th>Hole</th>
<th>Par</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>Total</th>
</tr>
<?php
foreach ($allscores as $score) {
?>
<tr>
<td><?= $score[0] ?></td>
<td><?= $score[1] ?></td>
<td><?= $score[3] ?></td>
<td><?= $score[4] ?></td>
<td><?= $score[5] ?></td>
<td><?= $score[6] ?></td>
<td><?= $score[7] ?></td>
<td><?= $score[8] ?></td>
<td><?= $score[4] ?></td>
</tr>
<?php
}
?>
</table>
</body>
</html>
--End--
I hope this works for you. If I missed anything you asked for just
let me know and I'll add it. Here are a couple sites that explain
some of the concepts in this script. I used mainly Perl style regular
expression functions in PHP because those are the ones I am used to.
The PHP Manual (source of infinite wisdom)
http://www.php.net/manual/en/
Perl Regular Expressions (the "preg" functions in PHP use the same
syntax)
http://www.perldoc.com/perl5.6.1/pod/perlre.html
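To give a feel for the two "preg" functions the script leans on most, here is a small standalone sketch. The sample line is made up for illustration, and the real page markup may well differ.

```php
<?php
// Sample line invented for this example; not taken from the real site.
$line = '<td><a href="http://scoring.europeantour.com/example.html">Live Scoring</a></td>' . "\n";

// preg_match returns 1 on a match and 0 otherwise; the /i flag ignores case.
$hasLink = preg_match('/http:\/\/scoring\.europeantour\.com\//i', $line);

// preg_replace with a capture group: the whole line is replaced by whatever
// the parentheses matched. (.*?) is non-greedy, so it stops at the first
// closing quote after href=".
$url = preg_replace('/^.*<a href="(.*?)".*/', '$1', chop($line));
```

After this runs, $hasLink is 1 and $url holds just the address from the href attribute.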
Clarification of Answer by e_murphy-ga on 06 Jul 2002 09:22 PDT
OK, sorry about not understanding the "missed the cut" part. I was
getting lost in the golf terminology. Like most programmers, I don't
know a thing about sports. In any event, here are the requested
changes:
--Begin--
<?php
// get the home page content
$homecontent = file('http://www.europeantour.com/home/');
// Look for the section with the "Live Scoring" URL
do {
    $line = array_shift($homecontent);
} while (!preg_match('/Live Scoring/i', $line));
// Look for the actual URL
do {
    $line = array_shift($homecontent);
    if (preg_match('/http:\/\/scoring\.europeantour\.com\//i', $line)) {
        $url = preg_replace('/^.*<a href="(.*?)".*/', '$1', chop($line));
    }
} while (!preg_match('/http:\/\/scoring\.europeantour\.com\//i', $line));
unset($homecontent); // free up a little memory
$allscores = array(); // initialize the scores array
for ($i = 0; $i < 2; $i++) { // doing it all twice for the "missed the cut" page
    if ($i == 1) { // make the "missed the cut" url
        $url = substr($url, 0, -8)."9_0.html";
    }
    // get the score content
    $content = file($url);
    // look for table header
    do {
        $line = array_shift($content);
    } while (!preg_match('/"smallgreen">Rank<\/font>/i', $line));
    $alldone = false; // whether we are done parsing the score table
    do {
        // look for beginning of row
        do {
            $line = array_shift($content);
            if (preg_match('/<\/table/i', $line)) {
                $alldone = true;
                break;
            }
        } while (!preg_match('/<tr/i', $line));
        if ($alldone == false) {
            $row = array(); // initialize this row
            // collect the text of every cell in this row
            do {
                $line = array_shift($content);
                if (preg_match('/<td/i', $line)) {
                    $data = rtrim(strip_tags($line));
                    array_push($row, $data);
                }
            } while (!preg_match('/<\/tr/i', $line));
            // keep only complete rows for Australian and New Zealand players
            if (count($row) == 9 &&
                (preg_match('/AUS/i', $row[2]) || preg_match('/NZL/i', $row[2]))) {
                // names arrive as "LASTNAME, Firstname"; flip and fix the case
                list($lname, $fname) = preg_split('/,\s*/', $row[1]);
                $fname = preg_replace('/\s+/', '', $fname);
                $lname = preg_replace('/\s+/', '', $lname);
                $row[1] = beautify_name($fname)." ".beautify_name($lname);
                // extra stuff in rank
                $row[0] = preg_replace('/ /', '', $row[0]);
                if ($i == 1) {
                    // missed cut
                    $row[7] = preg_replace('/ /', '-', $row[7]);
                    $row[8] = preg_replace('/ /', '-', $row[8]);
                }
                array_push($allscores, $row);
            }
        }
    } while ($alldone == false);
    unset($content); // free up some more memory
}
// Fix name capitalizations, you can put this somewhere else if you want
function beautify_name ($name) {
    $newname = ucfirst(strtolower($name));
    $apos = strpos($newname, "'"); // find apostrophe
    // strpos returns false when nothing is found, so compare strictly
    if ($apos !== false) {
        // capitalize the letter right after the apostrophe (O'hern -> O'Hern)
        $newname = substr($newname, 0, $apos)."'".
            ucfirst(substr($newname, $apos+1, 1)).
            substr($newname, $apos+2);
    }
    return $newname;
}
?>
<html>
<body bgcolor="#ffffff">
<table border=1>
<tr>
<th>Rank</th>
<th>Name</th>
<th>Hole</th>
<th>Par</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>Total</th>
</tr>
<?php
foreach ($allscores as $score) {
?>
<tr>
<td><?= $score[0] ?></td>
<td><?= $score[1] ?></td>
<td><?= $score[3] ?></td>
<td><?= $score[4] ?></td>
<td><?= $score[5] ?></td>
<td><?= $score[6] ?></td>
<td><?= $score[7] ?></td>
<td><?= $score[8] ?></td>
<td><?= $score[4] ?></td>
</tr>
<?php
}
?>
</table>
</body>
</html>
--End--
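Two pieces of this version are worth seeing in isolation: how the "missed the cut" URL is derived, and what beautify_name does to a name with an apostrophe. The sketch below repeats beautify_name so it runs on its own. The sample URL is invented, and the only thing assumed about the real one is that it ends in an eight-character file name, which is what substr($url, 0, -8) relies on.

```php
<?php
// Invented example URL; the last 8 characters ("1_0.html") get swapped out.
$url = 'http://scoring.europeantour.com/scores/1_0.html';
$missedCutUrl = substr($url, 0, -8)."9_0.html";

// Copy of beautify_name from the script, repeated so this snippet is standalone.
function beautify_name($name) {
    $newname = ucfirst(strtolower($name));
    $apos = strpos($newname, "'"); // find apostrophe
    if ($apos !== false) {
        $newname = substr($newname, 0, $apos)."'".
            ucfirst(substr($newname, $apos+1, 1)).
            substr($newname, $apos+2);
    }
    return $newname;
}

$plain = beautify_name('SMITH');   // no apostrophe: just first letter upper case
$irish = beautify_name("O'HERN");  // letter after the apostrophe is capitalized too
```

If the site ever changes the length of that file name, the substr trick will produce a wrong URL, so that is one more spot to keep an eye on.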