Hi! :)
I got the regex! :)
<?php
$words=array('WORD');
$data="WORD this is my WORD dont <p class='WORD'> or <!-- DONT WORD --> WORD";
foreach($words as $word){
$data=preg_replace("/(?!<!--)(?!<)(^|[\s\.,>])($word)($|[\s,\.])(?!>)(?!-->)/",
"$1<span>$2</span>$3",$data);
}
print $data;
print "\n\n\n";
?>
prints:
<span>WORD</span> this is my <span>WORD</span> dont <p class='WORD'>
or <!-- DONT WORD --> <span>WORD</span>
Put the words you want to replace in the $words array. I used WORD here.
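A note on the $words array: each entry is pasted straight into the pattern, so a word containing regex metacharacters (such as "C++" or "3.5") would change what the pattern means. A minimal sketch, assuming PHP 7 or later, that runs each word through preg_quote() first; wrap_words is a hypothetical name, and the lookaheads are left out to keep the example short:

```php
<?php
// Hypothetical helper (not from the answer above): same idea as the
// original pattern, but each word goes through preg_quote() first, so
// regex metacharacters such as "+", "." or "(" are matched literally.
function wrap_words(string $data, array $words): string
{
    foreach ($words as $word) {
        $quoted = preg_quote($word, '/'); // also escapes the "/" delimiter
        $data = preg_replace(
            "/(^|[\\s.,>])($quoted)($|[\\s.,])/",
            '$1<span>$2</span>$3',
            $data
        );
    }
    return $data;
}

echo wrap_words("I like C++ and CXX.", ['C++']), "\n";
// prints: I like <span>C++</span> and CXX.
```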
Additional Links
I recommend this book: 'Mastering Regular Expressions'
[ http://www.amazon.com/exec/obidos/ASIN/0596002890/qid=1024379027/sr=8-1/ref=sr_8_1/102-4897136-7732161 ]
Search Strategy
Personal Experience
Good luck! |
Request for Answer Clarification by endquote-ga on 18 Jun 2002 00:09 PDT
Almost! Except:
$sWord = "too";
$str = "<!-- bar too foo -->";
$str = preg_replace("/(?!<!--)(?!<)(^|[\s\.,>])($sWord)($|[\s,\.])(?!>)(?!-->)/i",
"\\1<b>\\2</b>\\3", $str);
returns <!-- bar <b>too</b> foo -->
Not sure why that is. That's not good though. This is going to be run
on HTML from a wide variety of users, so hopefully it can handle
non-standard weirdness, too.
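A likely explanation for the comment match: a lookahead such as (?!<!--) only checks the characters starting at the position where it sits in the pattern. Here that position is the space before "too", which is not the start of "<!--", so the test passes; nothing in the pattern asks whether an earlier "<!--" is still open. Likewise, (?!-->) only fails when "-->" comes immediately after the word and its trailing space, which would also explain why a word followed by more text inside a comment still gets wrapped. A small demonstration, assuming PHP 7 or later:

```php
<?php
$str = "<!-- bar too foo -->";

// (?!<!--) is tested at the space before "too". That space is not the
// start of "<!--", so the lookahead succeeds and the word is matched.
$matched = preg_match('/(?!<!--)(\s)(too)(\s)/i', $str, $m);
var_dump($matched); // int(1)
var_dump($m[2]);    // string(3) "too"

// (?!-->) only rejects a match when "-->" follows right away.
$matched2 = preg_match('/(\s)(WORD)(\s)(?!-->)/', '<!-- DONT WORD foo -->');
$matched3 = preg_match('/(\s)(WORD)(\s)(?!-->)/', '<!-- DONT WORD -->');
var_dump($matched2); // int(1): "foo" follows, so the lookahead passes
var_dump($matched3); // int(0): "-->" follows, so the lookahead fails
```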
Also, will this have problems with punctuation other than periods and
commas? Could a non-word character be used instead of \., to be more
generic?
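On the punctuation question: PCRE's \b assertion matches at any boundary between a word character (letter, digit, underscore) and a non-word character or the start/end of the string, so it covers periods, commas, question marks, quotes, parentheses and so on without listing them. Because \b is zero-width, the surrounding punctuation also no longer needs to be captured and put back with $1/$3. A sketch, assuming PHP 7 or later; bold_word is a hypothetical name, and on its own it does nothing about comments or tags:

```php
<?php
// Hypothetical helper: wrap whole-word, case-insensitive matches in <b>.
// \b is zero-width, so neighbouring punctuation is left untouched.
function bold_word(string $str, string $word): string
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    // $0 is the whole match, so the original capitalisation is kept.
    return preg_replace($pattern, '<b>$0</b>', $str);
}

echo bold_word("Too bad; it's too late! (too) tool", "too"), "\n";
// prints: <b>Too</b> bad; it's <b>too</b> late! (<b>too</b>) tool
```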
I actually have that book, but honestly don't have time to read it
before I need the answer to this, and find it to be pretty difficult
to read anyway. Thanks though, this is the best I've seen yet! If you
could just fix the comment thing...
|
Request for Answer Clarification by endquote-ga on 18 Jun 2002 00:31 PDT
Messing with it some more, here. Your example does work, unless you
include another word before the end of the comment. For example:
WORD this is my WORD dont <p class='WORD'> or <!-- DONT WORD foo -->
WORD
would give
<span>WORD</span> this is my <span>WORD</span> dont <p class='WORD'> or
<!-- DONT <span>WORD</span> foo --> <span>WORD</span>
Also I put an i modifier on there to make it case-insensitive.
This'll be very exciting for me if you can make it go. :) It's kind of
fundamental to a largish project.
|
Request for Answer Clarification by endquote-ga on 18 Jun 2002 01:09 PDT
Damn, okay, I just came up with another requirement for this, but
since it wasn't in the original I can make it a new question you can
answer for the same price. Words should also not be matched if they
are linked with <a> tags. So...
<!-- some WORD here -->
and
<a href="http://word.com/">some word here</a>
should both *not* match.
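One way to meet all three requirements (skip comments, skip tag attributes, skip link text) is to stop asking a single regex to work out the context. Instead, split the HTML into comments, tags and text runs, keep track of whether the current position is inside an <a> element, and run the word replacement only on text outside links. A minimal sketch, assuming PHP 7 or later and reasonably well-formed HTML; add_spans_safe is a hypothetical name, and it makes no attempt to handle <script>/<style> contents or a '>' inside attribute values:

```php
<?php
// Hypothetical helper: wrap whole words in <span> only in plain text,
// never inside comments, tag attributes, or <a>...</a> link text.
function add_spans_safe(string $html, array $words): string
{
    if (!$words) {
        return $html;
    }
    $alternatives = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words));
    $pattern = '/\b(' . $alternatives . ')\b/i';

    // Split into comments, tags and text; the capture group keeps the
    // comments and tags in $parts. /s lets .*? cross newlines.
    $parts = preg_split('/(<!--.*?-->|<[^>]*>)/s', $html, -1,
                        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $inLink = 0;
    $out = '';
    foreach ($parts as $part) {
        if ($part[0] === '<') {
            // Comment or tag: copy verbatim, but track <a> nesting depth.
            if (preg_match('/^<a\b/i', $part)) {
                $inLink++;
            } elseif ($inLink > 0 && preg_match('/^<\/a\s*>/i', $part)) {
                $inLink--;
            }
            $out .= $part;
        } elseif ($inLink > 0) {
            $out .= $part; // link text: leave untouched
        } else {
            $out .= preg_replace($pattern, '<span>$1</span>', $part);
        }
    }
    return $out;
}

echo add_spans_safe("WORD <p class='WORD'> <!-- some WORD here --> "
    . "<a href=\"http://word.com/\">some word here</a> word.", ['word']), "\n";
// prints: <span>WORD</span> <p class='WORD'> <!-- some WORD here --> <a href="http://word.com/">some word here</a> <span>word</span>.
```

Splitting first also means the word regex can stay simple (\b plus /i), since it never sees markup.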
|
Clarification of Answer by runix-ga on 18 Jun 2002 15:25 PDT
Ok!
Here's version 2.0 :)
Now it doesn't replace the words inside comments or HTML tags.
<?php
$data="this is my WORD <p class='biri WORD biri'> or <!-- DONT WORD CRI --> WORD";
$words=array('WORD');
$data=add_spans($data,$words);
print $data;
print "\n\n\n";
function add_spans($data,$words){
$data=">".$data."<";
foreach($words as $word){
$data=preg_replace("/(>[^<]*)($word)([^<>]*)/",
"$1<span>$2</span>$3",$data);
}
$data=substr($data,1,strlen($data)-2);
return $data;
}
?>
prints: this is my <span>WORD</span> <p class='biri WORD biri'> or
<!-- DONT WORD CRI --> <span>WORD</span>
The only problem I found is that if the word you're looking for
appears twice without an HTML tag in between, the first one doesn't
get wrapped in a <span>.
It's not possible to fix this problem using regular expressions (or,
at least, I don't know how to), so if you want we can think of another
way of doing it.
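A likely cause of the missed first occurrence: [^<]* in (>[^<]*) is greedy, so it runs to the end of the text run and then backtracks only as far as the last occurrence of the word; that match consumes the whole run, so preg_replace never gets a chance at the earlier one. One way around it that still uses regular expressions is to match each text run on its own and replace every occurrence inside it with preg_replace_callback(). A sketch, assuming PHP 7 or later; add_spans_all is a hypothetical name, and like the version above it would be confused by a '>' inside a comment or attribute value:

```php
<?php
// Hypothetical variant of add_spans(): match each text run that follows
// a '>' and replace *every* occurrence inside it, instead of asking one
// regex to capture the run and the word at the same time.
function add_spans_all(string $data, array $words): string
{
    if (!$words) {
        return $data;
    }
    $alternatives = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words));
    $wordPattern = '/\b(?:' . $alternatives . ')\b/i';

    $data = '>' . $data . '<'; // same sentinel trick as the original
    $data = preg_replace_callback('/(?<=>)[^<]+/', function ($m) use ($wordPattern) {
        // $0 keeps the original capitalisation of each match.
        return preg_replace($wordPattern, '<span>$0</span>', $m[0]);
    }, $data);
    return substr($data, 1, -1); // drop the two sentinels
}

echo add_spans_all("WORD and WORD <p class='biri WORD biri'> or <!-- DONT WORD CRI --> WORD",
                   ['WORD']), "\n";
// prints: <span>WORD</span> and <span>WORD</span> <p class='biri WORD biri'> or <!-- DONT WORD CRI --> <span>WORD</span>
```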
Good luck!
|
Clarification of Answer by runix-ga on 19 Jun 2002 11:34 PDT
ok! let's try this one!
And don't worry about the links issue: I've modified it so it doesn't
replace anything inside a tag (i.e. <span class='WORD'> won't be
replaced!)
<?php
$data=file("sampletext.txt");
$words=array('more');
$data=add_spans($data,$words);
print $data;
print "\n\n\n";
function add_spans($data,$words){
if (is_array($data)){$data=join('',$data);}
$data=">".$data."<";
foreach($words as $word){
$data=preg_replace("/((?=>)[^<]*?[^\w])($word)([^\w][^<>]*)/i",
"$1<span>$2</span>$3",$data);
}
$data=substr($data,1,strlen($data)-2);
return $data;
}
?>
|
Request for Answer Clarification by endquote-ga on 19 Jun 2002 18:43 PDT
It's still matching stuff in comments. Try it with:
$words=array('more', 'foxy', 'while');
and you'll see. Perhaps moshen's comment would be helpful? I'll try
and hack on it some, too.
|
Request for Answer Clarification by endquote-ga on 19 Jun 2002 19:56 PDT
I made some progress:
<?php
$data=file("sampletext.txt");
$words=array('more', 'foxy', 'while', 'phoenix');
$data=add_spans($data,$words);
print $data;
function add_spans($data,$words){
if(is_array($data)) { $data=join('',$data); }
foreach($words as $word){
$data = preg_replace("/($word)/i", "<b>\\1</b>", $data); // put tags around the word
$data = preg_replace("/(<[^<>]*)<b>($word)<\/b>([^<>]*>)/i",
"\\1\\2\\3", $data); // remove them if it was in an html tag
$data = preg_replace("/(<a [^<>]*>.*)<b>($word)<\/b>(.*<\/a>)/i",
"\\1\\2\\3", $data); // remove them if it was in a link
// $data = preg_replace("/(<\!--[.\r\n]*-->)/", "", $data); // remove them if it was in a comment
// $data = preg_replace("/(<\!--.*)<b>($word)<\/b>(.*-->)/i",
// "\\1\\2\\3", $data); // remove them if it was in a comment
}
return $data;
}
?>
Still can't seem to match a comment though!
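A probable reason the commented-out comment patterns don't work: inside a character class a dot is literal, so [.\r\n]* matches only runs of periods and line breaks, not arbitrary comment text; and outside a class, "." does not match a newline unless the /s modifier is set, so .* cannot cross the line breaks of a multi-line comment. A sketch of the comment step, assuming PHP 7 or later; unbold_comments is a hypothetical name, and it would run after the <b> tags have been added:

```php
<?php
// Sketch: remove the <b> wrapping again inside HTML comments, in the same
// "wrap everywhere, then unwrap where it shouldn't be" style as above.
// <!--.*?--> is lazy, so each match stops at the first "-->", and the /s
// modifier lets "." match newlines, so multi-line comments are covered.
function unbold_comments(string $data, string $word): string
{
    $quoted = preg_quote($word, '/');
    return preg_replace_callback('/<!--.*?-->/s', function ($m) use ($quoted) {
        return preg_replace('/<b>(' . $quoted . ')<\/b>/i', '$1', $m[0]);
    }, $data);
}

$html = "<b>foxy</b> <!-- a <b>foxy</b>\nline <b>FOXY</b> --> <b>foxy</b>";
echo unbold_comments($html, 'foxy'), "\n";
// the comment loses its <b> tags; the words outside it keep theirs
```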
|
Clarification of Answer by runix-ga on 21 Jun 2002 06:15 PDT
This is the correct version, without regexes:
<?php
$data=file("sampletext.txt");
$words=array('more', 'foxy', 'while', 'Phoenix', 'phoenixfest',
'real');
$data=add_spans($data,$words);
print $data;
function add_spans($data,$words){
$pre = '<b><a href="http://tangent.cx/r.php?url=dev.endquote.com%2Findex.php%3Fid%3D376"'
     . ' onmouseout="doTangent()"'
     . ' onmouseover="doTangent(\'dev.endquote.com/index.php?id=376\',\'Pre-Phoenix Festival.\',\'2001-07-04\','
     . '\'ly The Phoenix Festival is the day aft\',\'dev.endquote.com/index.php?id=396\',\'Money and DSL.\','
     . '\'2001-08-27\',\'CA World Sound Festival and maybe kick\')">';
$post = '</a></b>';
if (is_array($data)){$data=join('',$data);}
$forb=array();
$end=0;
do{
$start=strpos($data,"<!--",$end);
if ($start === false){
break;
}
$end=strpos($data,"-->",$start);
if ($end === false){ $end=strlen($data); } // unclosed comment: forbid the rest
$forb[]=array($start,$end);
}while($start<strlen($data));
$end=0;
do{
$start=strpos($data,"<",$end);
if ($start === false){
break;
}
$end=strpos($data,">",$start);
if ($end === false){ $end=strlen($data); } // unclosed tag: forbid the rest
$forb[]=array($start,$end);
}while($start<strlen($data));
$end=0;
do{
$start=strpos($data,"<a",$end);
if ($start === false){
break;
}
$end=strpos($data,"</a>",$start);
if ($end === false){ $end=strlen($data); } // unclosed link: forbid the rest
$forb[]=array($start,$end);
}while($start<strlen($data));
$dataL=strtolower($data);
foreach($words as $word){
$word=strtolower($word);
$pos=0;
do{
$pos=strpos($dataL,$word,$pos);
if ($pos===false){break;}
if (check($pos,$forb)){
if ($pos>=1){
$before=substr($data,$pos-1,1);
}else{$before='';}
if ($pos+strlen($word)<strlen($data)){
$after=substr($data,$pos+strlen($word),1);
}else{$after='';}
if (preg_match('/[a-z]/i',$before.$after)){ // a letter on either side: part of a longer word
$pos++;
continue;
}
$NEW=$pre.substr($data,$pos,strlen($word)).$post;
$data=substr($data,0,$pos).$NEW.substr($data,$pos+strlen($word));
$dataL=substr($dataL,0,$pos).strtolower($NEW).substr($dataL,$pos+strlen($word));
// shift the existing forbidden areas past the insertion first...
$forb=updateForb($pos,strlen($NEW)-strlen($word),$forb);
// ...then forbid the tags inside $NEW at their final positions
$end=0;
do{
$start=strpos($NEW,"<",$end);
if ($start === false){ break; }
$end=strpos($NEW,">",$start);
# print "new forbidden areas: ".($start+$pos)." , ".($end+$pos)."\n";
$forb[]=array($start+$pos,$end+$pos);
}while($start<strlen($NEW));
$pos=$pos+strlen($NEW); // skip past the inserted link
break; // only the first allowed occurrence of each word is linked
}else{
$pos++;
}
}while($pos!=false and $pos<strlen($data));
}
return $data;
}
function updateForb($pos,$sum,$forb){
$ret=array();
foreach($forb as $f){
list($start,$end)=$f;
if ($pos<=$start){
$start=$start+$sum;
$end=$end+$sum;
}
$ret[]=array($start,$end);
}
return $ret;
}
function check($pos,$forb){
foreach($forb as $f){
list($start,$end)=$f;
if ($pos>=$start and $pos<=$end){
return 0;
}
}
return 1;
}
?>
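For comparison, the forbidden-ranges idea above can be expressed more compactly. A sketch under stated assumptions: PHP 7.1 or later, link_words and its $pre/$post defaults are hypothetical, and unlike the code above (whose break stops after the first linked occurrence of each word) it wraps every allowed occurrence. It collects the ranges with preg_match_all() and PREG_OFFSET_CAPTURE, and applies the replacements from the end of the string backwards, so offsets that have not been processed yet never move and no updateForb() step is needed:

```php
<?php
// Hypothetical helper: wrap allowed whole-word matches in $pre/$post,
// skipping comments, whole <a>...</a> elements, and any other tag.
function link_words(string $data, array $words,
                    string $pre = '<b>', string $post = '</b>'): string
{
    if (!$words) {
        return $data;
    }
    // Forbidden byte ranges [start, end], inclusive.
    preg_match_all('/<!--.*?-->|<a\b.*?<\/a\s*>|<[^>]*>/is', $data, $m, PREG_OFFSET_CAPTURE);
    $forbidden = [];
    foreach ($m[0] as [$text, $start]) {
        $forbidden[] = [$start, $start + strlen($text) - 1];
    }

    $alternatives = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words));
    preg_match_all('/\b(?:' . $alternatives . ')\b/i', $data, $hits, PREG_OFFSET_CAPTURE);

    // Walk the hits right to left so splicing never moves an unprocessed offset.
    foreach (array_reverse($hits[0]) as [$word, $pos]) {
        foreach ($forbidden as [$start, $end]) {
            if ($pos >= $start && $pos <= $end) {
                continue 2; // inside a comment, tag or link: skip this hit
            }
        }
        $data = substr_replace($data, $pre . $word . $post, $pos, strlen($word));
    }
    return $data;
}

echo link_words('Phoenix <a href="x">phoenix</a> <!-- phoenix --> <i title="phoenix">phoenix</i>',
                ['phoenix']), "\n";
// prints: <b>Phoenix</b> <a href="x">phoenix</a> <!-- phoenix --> <i title="phoenix"><b>phoenix</b></i>
```

Working right to left is safe because a replacement only changes bytes at or after its own offset, and every hit still waiting to be handled lies before it.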
|