Subject: PHP/Regex for extracting Second-Level domains from URLs
Category: Computers > Programming
Asked by: fattymelt-ga
List Price: $100.00
Posted: 05 Mar 2005 08:32 PST
Expires: 06 Mar 2005 06:44 PST
Question ID: 485160
Given URLs of the form:

http://www.example.com
http://www.example.com/
http://www.example.net (any third-level domain, .net, .co.uk, etc.)
http://www.example.com/example.html?a=1&b=2
http://www2.example.com (any sub-domain)
http://example.com (no sub-domain)

I need PHP code that uses a Regular Expression to extract just the second-level domain "example.com" from all forms of the full URL. Code should take a URL as input and give second-level domain "example.com" as output.
There is no answer at this time.
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: eliteskillsdotcom-ga on 05 Mar 2005 13:32 PST |
Can't get it to work perfectly, but this is what I could come up with:

<?
$urls = 'bsldkfs http://www.jimmyr.com/index.php and https://eliteskill.com asdf asd http:/www.eliteskills.com/tacos/ sdf sd sd f http://www.eliteskills.com/';
preg_match_all('/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', $urls, $return);
echo '<pre>';
print_r($return[0]);
echo '</pre>';
?>

It doesn't grab directories. Maybe someone else can figure it out from there.
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: fattymelt-ga on 05 Mar 2005 15:04 PST |
eliteskills - I appreciate your time, but my question states "I need PHP code that uses a Regular Expression to extract just the second-level domain "example.com" from all forms of the full URL. Code should take a URL as input and give second-level domain "example.com" as output."

So, for example, your code should take in "http://www.jimmyr.com/index.php" and give back "jimmyr.com".

Keep in mind, it needs to handle all of the variations I listed.

thanks
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: eliteskillsdotcom-ga on 05 Mar 2005 16:44 PST |
I was not providing an answer. "Can't get it to work perfectly but this is what I could come up with." It's just what I could do so the next guy might be able to finish it up. That's basically the function, but it has to be modified to accept directories and to crop the domain part. If that can't all be done through preg_match_all, then loop over the array and use a preg_replace on each entry.
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: fattymelt-ga on 05 Mar 2005 17:28 PST |
Just to make sure there is no confusion here...

The code I am asking for takes a URL as input. I will be supplying that. I do not need any code that extracts URLs from a string. No one who tries to answer this should need any code that extracts a URL from a string. I already have the URLs. I need code that extracts the second-level domain from a URL.

thanks
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: eliteskillsdotcom-ga on 05 Mar 2005 19:31 PST |
I spent a bit of time on this. If you want to contribute anything go to http://www.eliteskills.com/donate.php . Let me know if it doesn't work or more information is needed.

Coding:
----------------------------------------------------------------
<?
$urls = 'http://www.jimmyr.com/index.php
https://eliteskill.com
http://www.eliteskills.com/tacos/
http://google.com/search?q=query%20string%20from%20hell%20here
ftp://www.ftp.com/
http://us.mail.yahoo.com/
://www.google.co.uk/
http://us.f526.mail.yahoo.com/
http://www.eliteskills.com/dmozsubmit/categ/Kids_and_Teens/Arts/';

// Grab the URL list and put it into the array $return
preg_match_all('/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', $urls, $return);
$numElements = count($return[0]); // Count how many elements are in the array
$foo=array();
$foo=$return[0];

for($counter=0; $counter < $numElements; $counter++) { // Loop through the array, outputting each cropped URL
    $url=$foo[$counter];
    echo "In: $url";

    // Strip file extensions so the end of the URL is not counted as part of
    // the domain. Add any other extensions; these were all I could think of.
    $url=ereg_replace("\.(php|asp|html|htm|cfm)", "", $url);

    $urlcount = explode(".",$url);
    $urlcount1 = count($urlcount);
    $urlcount1--;
    if (ereg("co.uk", $url)){ $urlcount1--; } // Accommodates the two-part co.uk ending

    // Crop the URL according to how many subdomains it has
    if ($urlcount1==1){
        $url=preg_replace("/(http(s)?|ftp):(\/\/)/i", "", $url);
        $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
    }
    if ($urlcount1==2){
        $url=preg_replace("/(http(s)?|ftp):(\/\/)[^\.]+\./i", "", $url);
        $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
    }
    if ($urlcount1==3){
        $url=preg_replace("/(http(s)?|ftp):(\/\/)[^\.]+\.[^\.]+\./i", "", $url);
        $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
    }
    if ($urlcount1==4){
        $url=preg_replace("/(http(s)?|ftp):(\/\/)[^\.]+\.[^\.]+\.[^\.]+\./i", "", $url);
        $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
    }
    echo "<br />Out: $url, $urlcount1<br /><br />";
}
?>

--Output--
------------------------------------------------------------------
In: http://www.jimmyr.com/index.php
Out: jimmyr.com, 2

In: https://eliteskill.com
Out: eliteskill.com, 1

In: http://www.eliteskills.com/tacos/
Out: eliteskills.com, 2

In: http://google.com/search?q=query%20string%20from%20hell%20here
Out: google.com, 1

In: ftp://www.ftp.com/
Out: ftp.com, 2

In: http://us.mail.yahoo.com/
Out: yahoo.com, 3

In: ://www.google.co.uk/
Out: google.co.uk, 2

In: http://us.f526.mail.yahoo.com/
Out: yahoo.com, 4

In: http://www.eliteskills.com/dmozsubmit/categ/Kids_and_Teens/Arts/
Out: eliteskills.com, 2
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: eliteskillsdotcom-ga on 05 Mar 2005 19:41 PST |
<?
$url="whatever url you want to enter";
// Or just $url=$_POST["whateveryounamedtheinputbox"]; if you're grabbing it from a form.
echo "In: $url";

// Strip file extensions so the end of the URL is not counted as part of
// the domain. Add any other extensions; these were all I could think of.
$url=ereg_replace("\.(php|asp|html|htm|cfm)", "", $url);

$urlcount = explode(".",$url);
$urlcount1 = count($urlcount);
$urlcount1--;
if (ereg("co.uk", $url)){ $urlcount1--; } // Accommodates the two-part co.uk ending

// Crop the URL according to how many subdomains it has
if ($urlcount1==1){
    $url=preg_replace("/(http(s)?|ftp):(\/\/)/i", "", $url);
    $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
}
if ($urlcount1==2){
    $url=preg_replace("/(http(s)?|ftp):(\/\/)[^\.]+\./i", "", $url);
    $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
}
if ($urlcount1==3){
    $url=preg_replace("/(http(s)?|ftp):(\/\/)[^\.]+\.[^\.]+\./i", "", $url);
    $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
}
if ($urlcount1==4){
    $url=preg_replace("/(http(s)?|ftp):(\/\/)[^\.]+\.[^\.]+\.[^\.]+\./i", "", $url);
    $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
}
echo "<br />Out: $url, $urlcount1<br /><br />";
?>
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: garyking-ga on 05 Mar 2005 20:23 PST |
No offense, but I don't think this question is worth $100. Try asking at a PHP forum; you will probably get a better response at one such as: http://www.phpbuilder.com/board/ Good luck! |
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: fattymelt-ga on 05 Mar 2005 21:39 PST |
That code almost does the trick, but:

1) hard-coding the "co.uk" doesn't take into account other third-level domains (e.g. "co.in", etc.)
2) because you split on ".", your code breaks if the querystring were to include a "." (e.g. .../index.asp?price=1.23)
3) I don't want to have to come up with an exhaustive list of file extensions (there are plenty more than php, asp, html, htm, cfm)
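One way to sidestep points 2 and 3 is to isolate the host before counting any dots, so the path and querystring never enter the picture. The sketch below is not from the thread: it swaps the regex for PHP's built-in parse_url() to get the host, and the function name second_level_domain and the short suffix list are hypothetical. The list is illustrative and incomplete, so point 1 is only partly addressed.

```php
<?php
// Hypothetical helper, not from the thread. It isolates the host first so
// that dots in the path or querystring are never counted as domain labels.
function second_level_domain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host found, e.g. the scheme is missing
    }
    $host = strtolower($host);

    // Assumption: a small, incomplete sample of two-label endings.
    $twoLabelSuffixes = ['co.uk', 'co.in', 'com.au', 'co.jp'];

    // Keep two labels by default, three when the host ends in a listed suffix.
    $keep = 2;
    foreach ($twoLabelSuffixes as $suffix) {
        if (substr($host, -strlen($suffix) - 1) === '.' . $suffix) {
            $keep = 3;
            break;
        }
    }

    $labels = explode('.', $host);
    return implode('.', array_slice($labels, -$keep));
}
```

Called with "http://www.example.com/index.asp?price=1.23", the host is "www.example.com" before any splitting happens, so neither the "1.23" nor the ".asp" matters. A URL without a scheme yields no host and the function returns null.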
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: eliteskillsdotcom-ga on 05 Mar 2005 22:05 PST |
You're right. This has been fun playing with regular expressions. I think I got it now... 0.o maybe this time. Much cleaner anyway.

<?
$urls = 'http://www.jimmyr.com/in.d.e.x.php
https://eliteskill.com
http://www.eliteskills.com/tacos/
http://google.com/search?q=query%20string%20from%20hell%20here
http://google.com/search?q=255.255.255.255
http://google.co.in
https://google.ru/*&!0@3#)($*)__Q)(E
ftp://www.ftp.com/
http://us.mail.yahoo.com/
://www.google.co.uk/
http://us.f526.mail.yahoo.com/
http://www.eliteskills.com/dmozsubmit/categ/Kids_and_Teens/Arts/';

// Grab the URL list and put it into the array $return
preg_match_all('/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', $urls, $return);
$numElements = count($return[0]);
$foo=array();
$foo=$return[0];

for($counter=0; $counter < $numElements; $counter++) {
    $url=$foo[$counter];
    echo "In: $url";
    $url=preg_replace("/((http(s)?|ftp):\/\/)/", "", $url);
    $url=preg_replace("/([^\/]+)(.*)/", "\\1", $url);
    $urlcount = explode(".",$url);
    $urlcount1 = count($urlcount);
    $urlcount1--;
    if (ereg("co\.", $url)){ $urlcount1--; }
    $url=preg_replace("/([^\.]+)\./i", "", $url,$urlcount1-1);
    echo "<br />Out: $url, $urlcount1<br /><br />";
}
?>
Subject:
Re: PHP/Regex for extracting Second-Level domains from URLs
From: fattymelt-ga on 06 Mar 2005 06:44 PST |
eliteskills - very nice. thanks. I'm cancelling the question and will be hitting up your "donate" button!

FYI.. this code:

if (ereg("co\.", $url)){ $urlcount1--; }

screws things up for a domain that ends in "co", such as www.AcmeCo.com. I changed the regex to "\.co\." to get around that problem.

Otherwise, this is some good code. Thanks, again.
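For a single URL as input, the final approach with the "\.co\." fix could be wrapped up roughly as follows. This is a sketch rather than the code either poster ran: the function name extract_domain is hypothetical, ereg() is swapped for preg_match() because the ereg family was removed in PHP 7, and the match is made case-insensitive on the assumption that host names should be compared without regard to case.

```php
<?php
// Hypothetical wrapper around the thread's final approach, with ereg()
// replaced by preg_match() and the "\.co\." fix applied.
function extract_domain(string $url): string
{
    // Strip the scheme, then everything from the first "/", "?" or "#" on.
    $host = preg_replace('/^(https?|ftp):\/\//i', '', $url);
    $host = preg_replace('/[\/?#].*$/', '', $host);

    // Count the dots in what is left, which should be just the host.
    $dots = substr_count($host, '.');

    // A ".co." label signals a two-part ending such as .co.uk or .co.in.
    if (preg_match('/\.co\./i', $host)) {
        $dots--;
    }

    // Drop leading labels until only the registered domain remains.
    if ($dots > 1) {
        $host = preg_replace('/[^.]+\./', '', $host, $dots - 1);
    }
    return $host;
}
```

With this version, "http://www.AcmeCo.com" comes back as "AcmeCo.com" and "http://www.google.co.uk/" as "google.co.uk". It is still a heuristic: two-part endings that do not use "co" (for example .com.au or .ac.uk) are not recognised, and a host with a subdomain literally named "co" would be cropped incorrectly.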