Subject: Regex
Category: Computers > Programming
Asked by: grabby-ga
List Price: $20.00
Posted: 28 Jul 2004 09:20 PDT
Expires: 27 Aug 2004 09:20 PDT
Question ID: 380299
I want to extract a URL from some HTML into a variable using a Perl regex. It *always* ends in page0001.html, e.g.:

http://some.long.protracted.url.domain/and/some/other/stuff/page0001.html
Subject: Re: Regex
Answered By: palitoy-ga on 28 Jul 2004 11:29 PDT
Hello grabby,

This is the regex you require. Assuming that the above text is in a variable called $html after you have scraped the page:

    $html =~ m/replace\(\'(.*)page0001\.html/;
    $url_matched = $1;

For your example above, $url_matched will now be equal to:

    http://foo.bar.com/something/here/blah/

If you need any more information on this, please ask for clarification and I will do my best to help. Similarly, if you would like this explained more fully, I would be glad to help.
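As a minimal sketch of how the answer's regex might be used in a complete script: the HTML the asker scraped is not shown in the thread, so the $html value below (a JavaScript replace('...') call) is an assumption, as are the variable names $base and $url. The sketch swaps the greedy .* for [^']* so the capture cannot run past the closing quote, checks that the match succeeded before reading $1, and rebuilds the full URL by appending the file name.

```perl
use strict;
use warnings;

# Hypothetical HTML: the original page is not shown in the thread, so this
# snippet (a JavaScript replace('...') call) is an assumption.
my $html = q{<script>window.location.replace('http://some.long.protracted.url.domain/and/some/other/stuff/page0001.html');</script>};

my ($base, $url);

# Capture everything between replace(' and page0001.html, as in the answer.
# [^']* cannot run past the closing quote, unlike a greedy .*
if ($html =~ m/replace\('([^']*)page0001\.html/) {
    $base = $1;                       # URL up to, but not including, the file name
    $url  = $base . 'page0001.html';  # full URL, file name appended
    print "$url\n";
}
else {
    warn "No matching URL found\n";
}
```

Testing the match first matters because $1 keeps its value from an earlier successful match when the current one fails.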
grabby-ga rated this answer and gave an additional tip of: $25.00

you beauty!
Subject: Re: Regex
From: greyknight-ga on 28 Jul 2004 10:41 PDT
    \b
    # Match the leading part (proto://hostname, or just hostname)
    (
        # http://, or https:// leading part
        (https?)://[-\w]+(\.\w[-\w]*)+
      |
        # or, try to find a hostname with our more specific sub-expression
        (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
        # Now ending .com, etc. For these, require lowercase
        (?-i: com\b
            | edu\b
            | biz\b
            | gov\b
            | in(?:t|fo)\b # .int or .info
            | mil\b
            | net\b
            | org\b
            | [a-z][a-z]\b # two-letter country codes
        )
    )

    # Allow an optional port number
    ( : \d+ )?

    # We definitely need at least one /
    (/)

    # This part of the URL is optional
    (
        # The rest are heuristics for what seems to work well
        [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
        (?:
            [.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
        )*
    )?

    # It should end in page0001.html
    ( page0001\.html )

Most of this regular expression was borrowed from Jeffrey Friedl, who wrote some excellent books on regular expressions (e.g. Mastering Regular Expressions). Because it relies on embedded whitespace and # comments, it has to be applied with Perl's /x modifier.
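A commented pattern like the one above only works under Perl's /x modifier, which ignores whitespace and # comments outside character classes. The following is a rough sketch of that usage with a deliberately simplified pattern, not greyknight's full heuristic: it accepts only http:// or https:// URLs, and the names $url_re and $html, along with the sample anchor tag, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Simplified variant of the commented pattern (an assumption, not the full
# heuristic): only http:// or https:// URLs, no bare hostnames.
my $url_re = qr{
    \b
    ( https?://                  # scheme
      [-\w]+ (?: \. \w[-\w]* )+  # hostname with at least one dot
      (?: : \d+ )?               # optional port number
      /                          # at least one slash
      [^"'<>()\s]*?              # path, kept lazy so it stops at the file name
      page0001\.html             # required ending
    )
}x;

# Hypothetical input; the asker's real HTML is not shown in the thread.
my $html = q{<a href="http://some.long.protracted.url.domain/and/some/other/stuff/page0001.html">first page</a>};

# In list context a successful match returns the captured groups.
my ($url) = $html =~ $url_re;
print defined $url ? "$url\n" : "no match\n";
```

Compiling the pattern once with qr{...}x keeps the comments next to the parts they describe, and using braces as delimiters avoids having to escape every / in the URL.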