Google Answers
Q: Replacing upper-case words (No Answer, 14 Comments)
Question  
Subject: Replacing upper-case words
Category: Computers > Programming
Asked by: j_philipp-ga
List Price: $8.00
Posted: 09 Apr 2003 06:28 PDT
Expires: 11 Apr 2003 21:37 PDT
Question ID: 188219
I want to convert ...

"This is ONLY a small example. NOT the real text.
NOT AT ALL. I am sure."

... into ...

"This is <em>only</em> a small example. <em>Not</em> the real text.
<em>Not at all</em>. I am sure."


A PHP regular expression, a Word macro, or anything else that does the
job would answer the question. (It doesn't have to be fast, but note
that the texts are several hundred KB.) Thanks.
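A minimal sketch of the kind of PHP regular expression asked for here, assuming PHP 7.4 or later (preg_replace_callback() accepts a flags argument only from 7.4) and plain ASCII text; emphasizeCaps is a hypothetical name. A run of all-caps words of two or more letters, separated by single spaces, becomes one <em> element in lower case. The optional lead group matches either the start of the text or a preceding ".", "!" or "?" (plus any spaces, line breaks or quotes); when it takes part in the match, the first letter stays upper-case. The whole text is handled in a single regular-expression pass rather than as an array of words.

```php
<?php
// Sketch only: wraps runs of ALL-CAPS words (2+ ASCII letters each) in <em>,
// lower-cases them, and keeps a leading capital at the start of a sentence.
function emphasizeCaps(string $text): string
{
    $pattern = '/(?<lead>\A[\s"\']*|[.!?][\s"\']*)?' // text start, or end of the previous sentence
             . '(?<![A-Za-z])'                        // not in the middle of a longer word
             . '(?<run>[A-Z]{2,}(?: [A-Z]{2,})*)'     // one or more shouted words
             . '(?![A-Za-z])/';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // With PREG_UNMATCHED_AS_NULL, 'lead' is null when the optional
            // group did not take part, i.e. the run is mid-sentence.
            $lower = strtolower($m['run']);
            $word = ($m['lead'] !== null) ? ucfirst($lower) : $lower;
            return ($m['lead'] ?? '') . '<em>' . $word . '</em>';
        },
        $text,
        -1,
        $count,
        PREG_UNMATCHED_AS_NULL
    );

    return $result ?? $text;   // preg_replace_callback() returns null on error
}

echo emphasizeCaps("This is ONLY a small example. NOT the real text.\nNOT AT ALL. I am sure."), "\n";
```

On the question's example this gives the requested output. Acronyms such as "USA" are lower-cased too, and a run interrupted by a line break becomes two separate <em> elements.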

Request for Question Clarification by hammer-ga on 09 Apr 2003 06:38 PDT
What constitutes a replaceable chunk? More than one capital letter in
a row? All caps between spaces? Between punctuation? Could you
elaborate a bit on the rules?

- Hammer

Request for Question Clarification by hammer-ga on 09 Apr 2003 06:39 PDT
Also, what's the required input format and output format?

- Hammer

Request for Question Clarification by ragingacademic-ga on 09 Apr 2003 13:09 PDT
j_philipp -

Thanks for your question.

Unless this is something you have to do on a regular basis, here's a
simple, if inelegant, solution:

1) Dump the text into a .doc (Word) file.
2) Select all.
3) Hit <Shift-F3> twice.

Should work like a charm.

If this works for you, let me know, and I'll repost as a reply.

thanks,
ragingacademic

Clarification of Question by j_philipp-ga on 10 Apr 2003 00:54 PDT
Ragingacademic,

Could you please explain in the comments what your method achieves
and whether or not it works? (I do not have Word on this machine.)


Hammer,

What constitutes a replaceable chunk? I think it must be more than one
letter long, and it must be all upper-case. Exceptions, as Mathtalk
pointed out in the comments, are words like USA, but I could live with
those being (incorrectly) replaced with lower-case emphasis by the
algorithm.

As for the input and output format, it is plain Notepad-readable text
(ASCII, Unicode, or some ISO encoding; I don't know exactly). I don't
think that matters much for the algorithm, or does it?
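The encoding can matter for a regular-expression approach. A pattern written with [A-Z] only knows the 26 ASCII capitals, so in UTF-8 text a word such as "ÜBERBLICK" would be matched only partly (from the "B" on). The sketch below assumes the file is UTF-8 and that PHP's mbstring extension is installed; it uses Unicode properties instead (\p{Lu} is any uppercase letter, \p{L} any letter, and the /u modifier makes PCRE read the subject as UTF-8). The function name is hypothetical, and for brevity it lower-cases every run without the sentence-start capital. A Latin-1 (ISO-8859-1) file would need converting first, for example with mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1').

```php
<?php
// Sketch: Unicode-aware detection of shouted words (2+ uppercase letters).
// Assumes UTF-8 input and the mbstring extension (mb_strtolower).
function emphasizeCapsUtf8(string $text): string
{
    // (?<!\p{L}) and (?!\p{L}): the run must not touch another letter.
    $pattern = '/(?<!\p{L})\p{Lu}{2,}(?: \p{Lu}{2,})*(?!\p{L})/u';

    $result = preg_replace_callback(
        $pattern,
        fn (array $m): string => '<em>' . mb_strtolower($m[0], 'UTF-8') . '</em>',
        $text
    );

    if ($result === null) {
        // With /u, an invalid UTF-8 subject makes the call fail and return null.
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $result;
}

echo emphasizeCapsUtf8("Das ist NUR ein ÜBERBLICK."), "\n";
```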


Knowledge_seeker,

I'm trying out your suggestion and will let you know if it works for
me. The download is quite big though (it seems I need to download two
programs). Thanks for the info.


Mathtalk,

The algorithm to find out whether or not the word is at the beginning
of a sentence should be something like: is the previous character,
skipping spaces, quotation marks and line feeds, one of "!", "." or
"?"? I'm not sure that would do the complete job, though.

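The sentence-start rule described above can be written as a short backward scan. This is a sketch with a hypothetical function name; it takes the byte offset where a word starts. Two details go slightly beyond the rule as stated and are assumptions: tabs and carriage returns are skipped along with spaces, quotes and line feeds, and the very beginning of the text counts as a sentence start. Abbreviations ("e.g. NOT") and ellipses would still be read as sentence ends.

```php
<?php
// Sketch of the rule: a word starts a sentence if the previous character,
// skipping spaces, quotation marks and line breaks, is ".", "!" or "?".
// Treating the very start of the text as a sentence start is an assumption.
function startsSentence(string $text, int $pos): bool
{
    $skip = " \t\r\n\"'";   // characters to step over while scanning back
    $i = $pos - 1;
    while ($i >= 0 && strpos($skip, $text[$i]) !== false) {
        $i--;
    }
    if ($i < 0) {
        return true;        // only skippable characters before the word
    }
    return strpos('.!?', $text[$i]) !== false;
}

$text = "Really? \"NOT\" here.\nNOT AT ALL, but ONLY once";
var_dump(startsSentence($text, strpos($text, 'NOT')));   // bool(true)
var_dump(startsSentence($text, strpos($text, 'ONLY')));  // bool(false)
```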

Enzo,

I will give your code a try and let you know. I will extend the code
so that it covers "A" in addition to "I", which I think is crucial.


Thanks to everyone.

Clarification of Question by j_philipp-ga on 10 Apr 2003 01:45 PDT
Enzo,

The code works fine on the sample string provided; however, I get a
server error when I run it on a bigger file (about 250 KB). I do not
have PHP installed locally, and on the server the exact error is hidden
(I cannot fully control my server settings). Maybe it uses too much
memory? It didn't seem like a time-out, since the error message came
back after about 15 seconds.


Knowledge_seeker,

I installed the program but do not know how to achieve the desired
effect. Could you elaborate on that? Also, it must work with the trial
version.


Thanks again.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Replacing upper-case words
From: knowledge_seeker-ga on 09 Apr 2003 07:26 PDT
 
Hey j --

I'm no wizard at this stuff, but would this work for you? 

TEXTPIPE FOR ASKSAM
http://www.asksam.com/textpipe.asp

They offer a free trial version, and it works as a standalone program
(you don't need ASKSAM).

-K~
Subject: Re: Replacing upper-case words
From: mathtalk-ga on 09 Apr 2003 07:41 PDT
 
To judge by the example that j_philipp-ga gives, there is a tricky
detail: preserving the capitalization of words occurring at the
beginning of a sentence.

It is easy to give a regular expression that finds an isolated word
that consists of all caps (and then to make the appropriate
replacement).  What is difficult is to detect whether that word occurs
at the beginning of a sentence.

Perhaps it might be acceptable to <em>Always</em> preserve the
uppercase first letter of such words?

-- mt
Subject: Re: Replacing upper-case words
From: enzo-ga on 09 Apr 2003 09:32 PDT
 
How about something like this...

<?php

function emText($string=NULL) {

	if ($string == NULL) {
		$string = "This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.";
	}

	$new_string = str_replace("  "," ",$string);
	
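	$words = explode(" ",$new_string);
	$new_words = array();
	// foreach replaces the old while/each() loop (each() was removed in PHP 8).
	foreach ($words as $key => $value) {
		// All-caps words other than "I" get emphasized.
		if (($value == strtoupper($value)) and ($value != "I")) {
			// The first word has no predecessor; treat it as mid-sentence.
			$prev = ($key > 0) ? $words[$key-1] : "";
			// If the previous word ended a sentence, keep the first letter upper-case.
			if ((substr($prev,-1) == ".") or
				(substr($prev,-1) == "!") or 
				(substr($prev,-1) == "?")) {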
				$value = "<em>".ucfirst(strtolower($value))."</em>"; 
			} else {	
				if ((substr($value,-1) == ".") or
					(substr($value,-1) == "!") or
					(substr($value,-1) == "?")) {
					$mark = substr($value,-1);
					$value = substr($value,0,strlen($value)-1);
					$value = "<em>".strtolower($value)."</em>$mark";	
				} else {
					$value = "<em>".strtolower($value)."</em>";
				}	
			}
		}
		$new_words[] = $value;	
	}	
	
	$imp = implode(" ", $new_words);
	
	$new_string = $imp;
	$new_string = str_replace(". ",".  ",$new_string);
	$new_string = str_replace("! ","!  ",$new_string);
	$new_string = str_replace("? ","?  ",$new_string);
	$new_string = str_replace("</em> <em>"," ",$new_string);

//  For debugging	
/*	
	for ($i = 0; $i < strlen($new_string); $i ++) {
		echo "I: $i ->".substr($new_string,$i,1)."<-".ord(substr($new_string,$i,1))."<BR>";
	}
	}
*/
	return $new_string;
}

echo emText();

?>
Subject: Re: Replacing upper-case words
From: mathtalk-ga on 09 Apr 2003 12:36 PDT
 
Hi, enzo-ga:

Rather than hardcoding "I" as an exception, it might be better to use
the length of all-uppercase words as the criterion. philipp-ga would
have to weigh in on this, but which of the following deserve
"emphasis"?

I, A, US, USA, NASA, etc.

-- mathtalk
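A sketch of the length criterion suggested here: the minimum number of capitals is a parameter rather than a list of exceptions. The function name and the default of 2 (matching the "more than one letter" rule) are assumptions, and for simplicity each word is handled on its own, so adjacent shouted words get separate <em> elements.

```php
<?php
// Sketch: emphasize all-caps words only when they have at least $minLength
// letters, instead of hard-coding exceptions such as "I" or "A".
function emphasizeByLength(string $text, int $minLength = 2): string
{
    $n = max(1, $minLength);
    $pattern = '/(?<![A-Za-z])[A-Z]{' . $n . ',}(?![A-Za-z])/';
    return preg_replace_callback(
        $pattern,
        fn (array $m): string => '<em>' . strtolower($m[0]) . '</em>',
        $text
    ) ?? $text;
}

$sample = 'I, A, US, USA, NASA';
echo emphasizeByLength($sample), "\n";     // I, A, <em>us</em>, <em>usa</em>, <em>nasa</em>
echo emphasizeByLength($sample, 4), "\n";  // I, A, US, USA, <em>nasa</em>
```

A threshold of 2 already leaves "I" and "A" alone; raising it protects short acronyms at the cost of also skipping short shouted words such as "NOT".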
Subject: Re: Replacing upper-case words
From: ragingacademic-ga on 10 Apr 2003 03:01 PDT
 
j - 

My pleasure.

My method does exactly what you require; I tested it on your example
and it works perfectly.

However, note that it will also fail the "USA" test...

ragingacademic
Subject: Re: Replacing upper-case words
From: j_philipp-ga on 10 Apr 2003 03:19 PDT
 
That's good enough. Ragingacademic, I'll test the approach and let you
know if it works for me.
Subject: Re: Replacing upper-case words
From: j_philipp-ga on 10 Apr 2003 03:36 PDT
 
Ragingacademic, I just found out "my" Internet Cafe doesn't have
Microsoft Word. It might take some time before I can test the
approach. But in case I don't get another solution to work in the
meantime, I won't forget you! Sorry for the confusion.
Subject: Re: Replacing upper-case words
From: enzo-ga on 10 Apr 2003 07:18 PDT
 
Ragingacademic, 
The Microsoft Word solution does allow the user to toggle from ALL
uppercase, to ALL lowercase, to Sentence case, but it does not place
<em> tags around the words that were in ALL uppercase.

Enzo
Subject: Re: Replacing upper-case words
From: enzo-ga on 10 Apr 2003 08:03 PDT
 
Thanks for giving my code a try. I hope it works out for you; it has
been fun trying to solve your problem, and I enjoy the feedback. The
code can be extended to handle "A" and other characters/situations.
You can add a line to the code to adjust the script execution timeout,
as shown below: I have added a line before the function emText() at
the top of the code. You might have to increase the 120-second timeout
to let your server process more text. My run-time example is below.

set_time_limit ( 120 );   # will set the timeout to 120 seconds

function emText() 

My Pentium II with 256 MB RAM, running Mandrake 8.0 (398 BogoMIPS) and
Apache on Linux, ran this script on 787504 characters (about 552kb) of
text in 112 seconds. That is about 760 words evaluated per second with
debugging on, which slows the process down considerably.

If you still get timeouts, you can split the text in half and parse
one half at a time. Does this work for you?

Enzo
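The "split the text" suggestion can be automated. In this sketch (names hypothetical) the text is cut only at blank lines, so no paragraph, and therefore normally no sentence, is divided between two pieces; each piece of roughly $maxBytes bytes is passed to a converter such as emText (a single paragraph longer than that stays whole). Whether this avoids the server error depends on its actual cause, time or memory, which the thread never established.

```php
<?php
// Sketch: run a per-chunk converter over a large text, cutting only at
// blank lines so that no paragraph (and so no sentence) is split.
function processInChunks(string $text, callable $process, int $maxBytes = 50000): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the blank-line separators, so the
    // pieces can be glued back together unchanged.
    $parts = preg_split('/(\R\s*\R)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    $chunk = '';
    foreach ($parts as $part) {
        // Flush the current chunk before it would grow past $maxBytes.
        if ($chunk !== '' && strlen($chunk) + strlen($part) > $maxBytes) {
            $out .= $process($chunk);
            $chunk = '';
        }
        $chunk .= $part;
    }
    if ($chunk !== '') {
        $out .= $process($chunk);
    }
    return $out;
}

// Usage with a trivial converter; in the thread this would be emText.
$calls = 0;
$result = processInChunks("one\n\ntwo\n\nthree", function (string $s) use (&$calls): string {
    $calls++;
    return strtoupper($s);
}, 5);
echo $result, "\n", $calls, " calls\n";
```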
Subject: Re: Replacing upper-case words
From: enzo-ga on 10 Apr 2003 08:33 PDT
 
P.S. Your error could have something to do with the text you are
using. Does it contain double quotes (") that would end the $string
literal early? If so, we can work around that. How about this complete
code snippet?

<?php

set_time_limit ( 160 );

function emText($string=NULL) {

	if ($string == NULL) {
		$string = "This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.
This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure. This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.";
	} 

	$new_string = str_replace("  "," ",$string);
	$new_string = str_replace(chr(10),"<lf>",$new_string);	
	$new_string = str_replace(chr(13),"<cr>",$new_string);		
	$new_string = str_replace(chr(9),"<tab>",$new_string);		
	
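	$words = explode(" ",$new_string);
	$new_words = array();
	// foreach replaces the old while/each() loop (each() was removed in PHP 8).
	foreach ($words as $key => $value) {
		// All-caps words other than "I" and "A" get emphasized.
		if (($value == strtoupper($value)) and ($value != "I") and ($value != "A")) {
			// The first word has no predecessor; treat it as mid-sentence.
			$prev = ($key > 0) ? $words[$key-1] : "";
			if ((substr($prev,-1) == ".") or
				(substr($prev,-1) == "!") or 
				(substr($prev,-1) == "?")) {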
				$value = "<em>".ucfirst(strtolower($value))."</em>"; 
			} else {	
				if ((substr($value,-1) == ".") or
					(substr($value,-1) == "!") or
					(substr($value,-1) == "?")) {
					$mark = substr($value,-1);
					$value = substr($value,0,strlen($value)-1);
					$value = "<em>".strtolower($value)."</em>$mark";	
				} else {
					$value = "<em>".strtolower($value)."</em>";
				}	
			}
		}
		$new_words[] = $value;	
	}	
	
	$imp = implode(" ", $new_words);
	
	$new_string = $imp;
	$new_string = str_replace(". ",".  ",$new_string);
	$new_string = str_replace("! ","!  ",$new_string);
	$new_string = str_replace("? ","?  ",$new_string);
	$new_string = str_replace("</em> <em>"," ",$new_string);
	$new_string = str_replace("<lf>",chr(10),$new_string);
	$new_string = str_replace("<cr>",chr(13),$new_string);
	$new_string = str_replace("<tab>",chr(9),$new_string);
	
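	// Undo the backslashes magic_quotes_gpc added to form input (PHP < 5.4);
	// on newer PHP this also removes backslashes that are really in the text.
	$new_string = stripslashes($new_string);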

	for ($i = 0; $i <= strlen($new_string); $i ++) { 
  		echo "I: $i ->".substr($new_string,$i,1)."<-".ord(substr($new_string,$i,1))."<BR>";
 	}


	return $new_string;

}

// $_POST replaces $HTTP_POST_VARS, which was removed in PHP 5.4.
if (!empty($_POST["emTextString"])) {
	echo "<PRE>".emText($_POST["emTextString"])."</PRE>";
	echo "<BR><P><A HREF=\"?\">Try again</A>";
} else {
?>
<HTML>
<BODY>
<FORM action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" METHOD="POST" >
<textArea name="emTextString" COLS="60" ROWS="10">This is ONLY a small example! "NOT" the real text?  NOT AT ALL. I am sure.
This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure. This is ONLY "a small" example!</textArea><BR><P></P>
<input type=submit value=submit>
</FORM>
</BODY>
</HTML>
<?php

}
?>
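On the quoting problem raised in the P.S.: if the text is pasted into a double-quoted PHP string, any " in it ends the literal early (and $ or backslash sequences get interpreted). A nowdoc, available since PHP 5.3, takes the text verbatim; reading the text from a file with file_get_contents(), or from a form as the code above does, avoids the issue as well. The sample sentence is invented for illustration.

```php
<?php
// Sketch: a nowdoc (<<<'TXT' ... TXT;) is not parsed for quotes, $ or
// backslashes, so pasted text cannot terminate the string early.
$text = <<<'TXT'
He said "NOT NOW" and left. It's $5, NOT $50.
TXT;

echo $text, "\n";
```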
Subject: Re: Replacing upper-case words
From: enzo-ga on 10 Apr 2003 08:37 PDT
 
I forgot to comment out the debugging code... :( Sorry for making this
answer so lengthy. Comments welcome.

<?php

set_time_limit ( 160 );

function emText($string=NULL) {

	if ($string == NULL) {
		$string = "This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.
This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure. This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.";
	} 

	$new_string = str_replace("  "," ",$string);
	$new_string = str_replace(chr(10),"<lf>",$new_string);	
	$new_string = str_replace(chr(13),"<cr>",$new_string);		
	$new_string = str_replace(chr(9),"<tab>",$new_string);		
	
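	$words = explode(" ",$new_string);
	$new_words = array();
	// foreach replaces the old while/each() loop (each() was removed in PHP 8).
	foreach ($words as $key => $value) {
		// All-caps words other than "I" and "A" get emphasized.
		if (($value == strtoupper($value)) and ($value != "I") and ($value != "A")) {
			// The first word has no predecessor; treat it as mid-sentence.
			$prev = ($key > 0) ? $words[$key-1] : "";
			if ((substr($prev,-1) == ".") or
				(substr($prev,-1) == "!") or 
				(substr($prev,-1) == "?")) {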
				$value = "<em>".ucfirst(strtolower($value))."</em>"; 
			} else {	
				if ((substr($value,-1) == ".") or
					(substr($value,-1) == "!") or
					(substr($value,-1) == "?")) {
					$mark = substr($value,-1);
					$value = substr($value,0,strlen($value)-1);
					$value = "<em>".strtolower($value)."</em>$mark";	
				} else {
					$value = "<em>".strtolower($value)."</em>";
				}	
			}
		}
		$new_words[] = $value;	
	}	
	
	$imp = implode(" ", $new_words);
	
	$new_string = $imp;
	$new_string = str_replace(". ",".  ",$new_string);
	$new_string = str_replace("! ","!  ",$new_string);
	$new_string = str_replace("? ","?  ",$new_string);
	$new_string = str_replace("</em> <em>"," ",$new_string);
	$new_string = str_replace("<lf>",chr(10),$new_string);
	$new_string = str_replace("<cr>",chr(13),$new_string);
	$new_string = str_replace("<tab>",chr(9),$new_string);
	
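	// Undo the backslashes magic_quotes_gpc added to form input (PHP < 5.4);
	// on newer PHP this also removes backslashes that are really in the text.
	$new_string = stripslashes($new_string);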

	return $new_string;
}

// $_POST replaces $HTTP_POST_VARS, which was removed in PHP 5.4.
if (!empty($_POST["emTextString"])) {
	echo "<PRE>".emText($_POST["emTextString"])."</PRE>";
	echo "<BR><P><A HREF=\"?\">Try again</A>";
} else {
?>
<HTML>
<BODY>
<FORM action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" METHOD="POST" >
<textArea name="emTextString" COLS="60" ROWS="10">This is ONLY a small example! "NOT" the real text?  NOT AT ALL. I am sure.
This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure. This is ONLY "a small" example!</textArea><BR><P></P>
<input type=submit value=submit>
</FORM>
</BODY>
</HTML>
<?php

}
?>
Subject: Re: Replacing upper-case words
From: j_philipp-ga on 10 Apr 2003 23:51 PDT
 
Hello Enzo,

Thanks for all that great work you put into this question for free. It
works as soon as I split my files into smaller chunks. However, it
puts emphasis on numbers (e.g. "23" becomes "<em>23</em>"). Do you
know what's happening? If you feel like you've done enough already
(you have) just let me know and I will go through your code myself and
try to do the change.
Thanks a lot!
Subject: Re: Replacing upper-case words
From: enzo-ga on 11 Apr 2003 07:01 PDT
 
Not a problem: a simple addition of "and (!is_numeric($value))" keeps
numbers and numeric text from being <em>'ed :) Do you like the text
box submit? I am glad to help; it was fun, even for free ;)

complete code:

<?php

set_time_limit ( 160 );

function emText($string=NULL) {

	if ($string == NULL) {
		$string = "This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.
Now to include numeric text. 23 is not 2.3 nor -2.3";
	} 

	$new_string = str_replace("  "," ",$string);
	$new_string = str_replace(chr(10),"<lf>",$new_string);	
	$new_string = str_replace(chr(13),"<cr>",$new_string);		
	$new_string = str_replace(chr(9),"<tab>",$new_string);		
	
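	$words = explode(" ",$new_string);
	$new_words = array();
	// foreach replaces the old while/each() loop (each() was removed in PHP 8).
	foreach ($words as $key => $value) {
		// All-caps words other than "I", "A" and numbers get emphasized.
		if (($value == strtoupper($value)) and ($value != "I") and ($value != "A") and (!is_numeric($value))) {
			// The first word has no predecessor; treat it as mid-sentence.
			$prev = ($key > 0) ? $words[$key-1] : "";
			if ((substr($prev,-1) == ".") or
				(substr($prev,-1) == "!") or 
				(substr($prev,-1) == "?")) {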
				$value = "<em>".ucfirst(strtolower($value))."</em>"; 
			} else {	
				if ((substr($value,-1) == ".") or
					(substr($value,-1) == "!") or
					(substr($value,-1) == "?")) {
					$mark = substr($value,-1);
					$value = substr($value,0,strlen($value)-1);
					$value = "<em>".strtolower($value)."</em>$mark";	
				} else {
					$value = "<em>".strtolower($value)."</em>";
				}	
			}
		}
		$new_words[] = $value;	
	}	
	
	$imp = implode(" ", $new_words);
	
	$new_string = $imp;
	$new_string = str_replace(". ",".  ",$new_string);
	$new_string = str_replace("! ","!  ",$new_string);
	$new_string = str_replace("? ","?  ",$new_string);
	$new_string = str_replace("</em> <em>"," ",$new_string);
	$new_string = str_replace("<lf>",chr(10),$new_string);
	$new_string = str_replace("<cr>",chr(13),$new_string);
	$new_string = str_replace("<tab>",chr(9),$new_string);
	
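	// Undo the backslashes magic_quotes_gpc added to form input (PHP < 5.4);
	// on newer PHP this also removes backslashes that are really in the text.
	$new_string = stripslashes($new_string);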
/*
	for ($i = 0; $i <= strlen($new_string); $i ++) { 
  		echo "I: $i ->".substr($new_string,$i,1)."<-".ord(substr($new_string,$i,1))."<BR>";
 	}
*/
	return $new_string;
}

// $_POST replaces $HTTP_POST_VARS, which was removed in PHP 5.4.
if (!empty($_POST["emTextString"])) {
	echo "<PRE>".emText($_POST["emTextString"])."</PRE>";
	echo "<BR><P><A HREF=\"?\">Try again</A>";
} else {
?>
<HTML>
<BODY>
<FORM action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" METHOD="POST" >
<textArea name="emTextString" COLS="60" ROWS="10">This is ONLY a small example! NOT the real text?  NOT AT ALL. I am sure.
Now to include numeric text. 23 is not 2.3 nor -2.3</textArea><BR><P></P>
<input type=submit value=submit>
</FORM>
</BODY>
</HTML>
<?php
}
?>
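A caveat on the is_numeric() fix: is_numeric() only accepts a token that is a number as a whole, so a number with punctuation attached, such as "23," or "2.3." at the end of a sentence, still passes the all-caps test and gets emphasized, as do tokens with no letters at all such as "--". A sketch of a stricter test (hypothetical name, ASCII letters only) that requires at least one uppercase letter and no lowercase one; it could stand in for the ($value == strtoupper($value)) and (!is_numeric($value)) part of the condition. Single letters such as "I" or "A" still pass, so those exceptions remain necessary.

```php
<?php
// Sketch: a token counts as "shouted" only if it contains at least one
// uppercase ASCII letter and no lowercase letter. Digits and punctuation
// alone never qualify, whether or not is_numeric() accepts them.
function isShouted(string $token): bool
{
    return preg_match('/[A-Z]/', $token) === 1
        && preg_match('/[a-z]/', $token) === 0;
}

foreach (['NOT', 'ALL.', 'I', '23', '23,', '2.3.', '--', 'Not'] as $token) {
    echo str_pad($token, 6), isShouted($token) ? 'yes' : 'no', "\n";
}
```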
Subject: Re: Replacing upper-case words
From: j_philipp-ga on 11 Apr 2003 21:36 PDT
 
Enzo, a thousand thanks! If you ever become a Researcher, I'll make
sure to drop a question your way. Yes, I like the text-box submit. This
code should do the job perfectly now!
