Hello broker-ga,
I had to smile when I saw your question, because I've had a
long-standing interest in the very same topic -- analyzing phrase
frequencies in text -- and have been surprised over the years at how
frustratingly difficult it can be to find the right tools.
To cut to the chase, there is free software out there that will do the
job for you (and software will have to do...I don't think there's a
service in existence that will do the analysis for you at no charge).
However, be warned...the software is pretty cumbersome to get used to;
it's DOS based, doesn't come with much documentation, and tends to use
the very arcane lingo of linguistics, which is almost incomprehensible
to normal human beings.
Still, even though it's pretty dated at this point, I haven't found
anything better among the freeware offerings.
That said, some time ago I tinkered with the software for a while, and
then ran a phrase analysis on a downloadable text of the Bible. Here
are some of my results for the most frequently occurring phrases of
five words or more:
132 occurrences of the phrase: of the children of israel
95 and the lord said to
91 the lord spoke to moses
83 and the lord spoke to
81 and the lord spoke to moses
72 the children of israel and
70 to the children of israel
66 the lord spoke to moses saying
63 the tabernacle of the testimony
63 and the lord spoke to moses saying
59 and the children of israel
59 is the family of the
55 as the lord had commanded
54 out of the land of
52 the lord said to moses
The same could be done for any text document, and for any group of
n-word phrases, where n is 2,3,4 or any number of your choosing.
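
For readers who would rather script this than install anything, here is
a minimal sketch of the same kind of n-word phrase count in Python. It
is only an illustration of the idea, not how TACT itself works, and the
function name phrase_counts, the word-splitting rule, and the sample
text are all assumptions made up for the example:

```python
import re
from collections import Counter


def phrase_counts(text, n):
    """Count every run of n consecutive words in text (case-insensitive)."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the text and tally each phrase.
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


# Made-up sample text; substitute the contents of your own document.
sample = (
    "And the Lord spoke to Moses, saying: speak to the children of Israel. "
    "And the Lord spoke to Moses again."
)
counts = phrase_counts(sample, 5)

# Print the three most frequent five-word phrases with their counts.
for phrase, count in counts.most_common(3):
    print(count, phrase)
```

Changing the second argument gives two-word, three-word, or any other
length of phrase. Note that this sketch ignores sentence and verse
boundaries, so a phrase can straddle two sentences.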
The software itself is called TACT -- Text Analysis Computing Tools --
and can be found here:
http://www.chass.utoronto.ca/cch/tact.html
Click on the links "Disk 1" and "Disk 2" to download the software
(shows you how long ago it was created -- they were still using
floppies!). It might help to right-click and then choose "Save Target
As" on the pulldown menu.
There's a link on the page for "ordering Information", but this is not
for the software (which is free) but for documentation about the
program -- your call as to whether you want to spring for it or not.
Like I said, the software itself is very useful, but there *is* a
learning curve. There's no way to fully talk you through the initial
stages -- best thing I can suggest is to simply ask me any questions
here. Just post a "Request for Clarification" to let me know what
questions come up as you play with the software. I'll do my best to
respond promptly.
One more thing. Here's a link to other text analysis tools. Again,
the lingo is hard to wade through, but the tools themselves are very
interesting...some of them may be of use to you:
http://www.sil.org/linguistics/computing.html
Good luck in your ventures.
pafalafa-ga
search strategy: None -- used bookmarked sites and personal
knowledge.
Clarification of Answer by pafalafa-ga on 28 Aug 2003 07:21 PDT
Broker-ga,
The text analysis you want to do is fairly sophisticated and fairly
arcane -- a combination that doesn't lend itself to software solutions
that are both easy and free. As the comment from yosarian-ga notes,
you are really trying to do n-gram linguistic analysis, and this is a
tough field in which to find easy-to-use tools.
That said, I urge you to try out the TACT software I found for you.
Although it is DOS-based, it will run on your Windows system, so that
shouldn't be an obstacle (Windows itself, until recently, was
DOS-based as well).
However, if you want Windows alternatives, they are out there, but:
(1) there's no reason to think they're any easier to use and (2)
they're not free.
I've no direct familiarity with these other programs, but you might
want to explore them if you're looking for options:
Wordstat text analysis software at:
http://www.simstat.com/home.html
Hyperresearch at:
http://www.researchware.com/
These *probably* will do the trick, but since I haven't used them
myself, I'm afraid I can't make any promises.
I really do think that the TACT software is your absolute best bet.
Try it out, and if you run into any difficulties, just post a note
here to let me know, and I'll be glad to try to walk through the
set-up and use of the software.
Good luck.
pafalafa-ga