When I do:
<?php
$data=file_get_contents("http://snout.omroep.nl/tekst/501-01.html");
echo $data;
?>
I get:
Warning: file_get_contents(http://snout.omroep.nl/tekst/501-01.html):
failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in
/home/dev/google.php on line 2
My guess is that they are checking either the referrer (doubtful,
as a direct link works) or the browser type (the User-Agent sent in
the request headers).
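If you want to test that guess without leaving file_get_contents, a minimal sketch is to pass a stream context that sends a browser-like User-Agent. The helper names browser_context and fetch_as_browser are made up for illustration, and I am only assuming the User-Agent is what the site checks; it may want other headers too.

```php
<?php
// Hypothetical helper: builds a stream context that makes
// file_get_contents() send a browser-like User-Agent header.
function browser_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'header'     => "Accept-Language: en-gb,en;q=0.5\r\n",
        ),
    ));
}

// Hypothetical helper: fetches $url using the context above.
function fetch_as_browser($url)
{
    $ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8) Gecko/20051107 Firefox/1.5';
    return file_get_contents($url, false, browser_context($ua));
}
```

Call it as echo fetch_as_browser('http://snout.omroep.nl/tekst/501-01.html'); and see whether the 403 goes away.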
Have a read of http://uk.php.net/curl, the cURL library. It's a much
better way of getting files (it allows storing and setting cookies,
setting referrers, etc.).
This is working and tested on my server:
<?php
echo url_get('snout.omroep.nl', '/tekst/501-01.html');
function url_get($domain, $uri, $referer = '-')
{
$header = array();
// No request line here: cURL builds "GET <uri> HTTP/1.1" itself from CURLOPT_URL.
$header[] = 'Host: '.$domain;
$header[] = 'Connection: close';
$header[] = 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Accept-Language: en-gb,en;q=0.5';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8) Gecko/20051107 Firefox/1.5 Web-Sniffer/1.0.22';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $domain.$uri);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$result['exec'] = curl_exec ($ch);
$result['info'] = curl_getinfo($ch);
// close the handle before returning; a call placed after return never runs
curl_close($ch);
//use this line to get more info
//return $result;
return $result['exec'];
}
?>
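If you switch url_get to the commented-out return $result; line, you can check the status code instead of echoing whatever came back. A rough sketch, where describe_response is a name I made up and the messages are just examples:

```php
<?php
// Hypothetical helper: turns the array returned by curl_getinfo()
// into a short message based on its 'http_code' entry.
function describe_response($info)
{
    $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
    if ($code === 0) {
        return 'no HTTP response';
    }
    if ($code >= 200 && $code < 300) {
        return 'OK ('.$code.')';
    }
    if ($code === 403) {
        return 'forbidden (403)';
    }
    return 'unexpected status '.$code;
}
```

With the full array returned, usage would look like $r = url_get('snout.omroep.nl', '/tekst/501-01.html'); echo describe_response($r['info']);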
I believe they are checking the headers strictly, but the above works
:P. NB: cURL is also faster than the method you were using, and more
flexible and reliable. If it is not installed on your server (it's on
most), contact me for instructions on installation.
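To find out whether cURL is there before you try the script, something like this should do. curl_available is just a name I picked; the two checks are standard PHP functions.

```php
<?php
// Returns true when the cURL extension is loaded and curl_init() is callable.
function curl_available()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

echo curl_available() ? "cURL is available\n" : "cURL is not installed\n";
```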
Demo @ http://dev.isitaboat.co.uk/google.php