Subject: Regex
Category: Computers > Programming
Asked by: grabby-ga
List Price: $20.00
Posted: 28 Jul 2004 09:20 PDT
Expires: 27 Aug 2004 09:20 PDT
Question ID: 380299
I want to extract a URL from some HTML into a variable using a Perl regex. The URL *always* ends in page0001.html, e.g.:

http://some.long.protracted.url.domain/and/some/other/stuff/page0001.html
Subject: Re: Regex
Answered By: palitoy-ga on 28 Jul 2004 11:29 PDT
Hello grabby,

This is the regex you require. Assuming that the above text is in a variable called $html after you have scraped the page:

    $html =~ m/replace\(\'(.*)page0001\.html/ ;
    $url_matched = $1 ;

For your example above, $url_matched will now be equal to:

    http://foo.bar.com/something/here/blah/

If you need any more information on this, please ask for clarification and I will do my best to help. Similarly, if you would like this explained more fully, I would be glad to help.
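The snippet in the answer reads $1 without checking whether the match succeeded. A minimal sketch of a guarded version follows. The sample HTML and the helper name extract_base_url are assumptions made for illustration (the real page is not shown in the question); the pattern itself is the one from the answer.

```perl
use strict;
use warnings;

# Hypothetical sample input; the actual scraped page is not shown in the thread.
my $html = q{<a href="#" onclick="location.replace('http://foo.bar.com/something/here/blah/page0001.html')">next</a>};

# Hypothetical helper: returns the text between "replace('" and
# "page0001.html", or undef when the pattern does not match.
sub extract_base_url {
    my ($text) = @_;
    # Same pattern as in the answer; $1 is only used after a successful match.
    if ($text =~ m/replace\('(.*)page0001\.html/) {
        return $1;
    }
    return;
}

my $url_matched = extract_base_url($html);
print defined $url_matched ? "$url_matched\n" : "no match\n";
```

Note that the capture stops before page0001.html, so the file name is not part of $url_matched. Because (.*) is greedy, a page containing several such links on one line would capture up to the last page0001.html on that line.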
grabby-ga rated this answer and gave an additional tip of $25.00:
you beauty!
Subject: Re: Regex
From: greyknight-ga on 28 Jul 2004 10:41 PDT
\b
# Match the leading part (proto://hostname, or just hostname)
(
# http://, or https:// leading part
(https?)://[-\w]+(\.\w[-\w]*)+
|
# or, try to find a hostname with our more specific sub-expression
(?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
# Now ending .com, etc. For these, require lowercase
(?-i: com\b
| edu\b
| biz\b
| gov\b
| in(?:t|fo)\b # .int or .info
| mil\b
| net\b
| org\b
| [a-z][a-z]\b # two-letter country codes
)
)
# Allow an optional port number
( : \d+ )?
# We definitely need at least one /
(/)
# This part of the URL is optional
(
# The rest are heuristics for what seems to work well
[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
(?:
[.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
)*
)?
# It should end in page0001.html
( page0001\.html )
Most of this regular expression was borrowed from Jeffrey Friedl, who
wrote some excellent books on regular expressions (e.g. Mastering
Regular Expressions).
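The pattern above contains whitespace and # comments, so it only works under Perl's /x modifier. As a rough sketch of how it might be applied, here is a trimmed variant that keeps only the explicit http:// or https:// branch and drops the bare-hostname alternative. The names $url_re and find_page_url and the sample HTML are assumptions for illustration, not part of the original comment.

```perl
use strict;
use warnings;

# Trimmed variant of the commented pattern: only the http/https branch is kept.
# Brace delimiters are used so the literal slashes need no escaping.
my $url_re = qr{
    \b
    https?://[-\w]+(?:\.\w[-\w]*)+      # scheme and hostname
    (?: : \d+ )?                        # optional port number
    /                                   # at least one slash
    (?:                                 # optional path before the file name
        [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
        (?: [.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+ )*
    )?
    page0001\.html                      # required ending
}x;

# Hypothetical helper: returns the first whole URL ending in page0001.html,
# or undef when none is found.
sub find_page_url {
    my ($text) = @_;
    return $text =~ /($url_re)/ ? $1 : undef;
}

# Hypothetical input built around the URL from the question.
my $html = q{<a href="http://some.long.protracted.url.domain/and/some/other/stuff/page0001.html">first page</a>};
my $url  = find_page_url($html);
print defined $url ? "$url\n" : "no match\n";
```

Unlike the accepted answer, this version does not depend on the URL being wrapped in replace('...'), and it returns the whole URL including page0001.html. The trade-off is that it relies on backtracking through the path heuristics to find the required ending.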