Q: Feature film & statistics ( Answered 5 out of 5 stars,   0 Comments )
Subject: Feature film & statistics
Category: Arts and Entertainment > Movies and Film
Asked by: tarre-ga
List Price: $7.00
Posted: 22 Jan 2003 01:14 PST
Expires: 21 Feb 2003 01:14 PST
Question ID: 146878
What is the average count of unique words (without repetition) in
American feature film?
Subject: Re: Feature film & statistics
Answered By: sycophant-ga on 22 Jan 2003 02:32 PST
Rated:5 out of 5 stars
This is a little difficult to answer accurately, as films vary
drastically. Also, information such as unique word counts is not
readily available. What is available, however, are scripts.

So, I have examined the following scripts from Drew's Script-o-Rama:

The Sixth Sense
Austin Powers
Chasing Amy
Interview With A Vampire
Thirteen Days
Traffic

Using a complex combination of GNU tools (detailed below) I derived
the following numbers:

The Sixth Sense: 1421 (from 22753 words)
Austin Powers: 2096 (from 19091 words)
Chasing Amy: 1889 (from 23232 words)
Interview With A Vampire: 1736 (from 22371 words)
Thirteen Days: 2457 (from 33311 words)
Traffic: 2436 (from 29872 words)
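
As a quick check on the figures above, the mean of the six unique-word counts can be computed directly. This is only a sketch: the function names are made up for illustration, and it assumes a POSIX shell with awk available.

```shell
# Unique-word counts as quoted in the list above, one per line.
# (unique_counts and mean_unique are hypothetical helper names.)
unique_counts() {
  printf '%s\n' 1421 2096 1889 1736 2457 2436
}

# Arithmetic mean of the counts, rounded to the nearest whole word.
mean_unique() {
  unique_counts | awk '{ sum += $1 } END { printf "%.0f\n", sum / NR }'
}

mean_unique
```

The six counts sum to 12035, so the mean comes out at roughly 2006.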

This is by no means a representative sample of all American movies, but
a reasonable estimate can be hazarded based on these numbers: an
average of around 2000 unique words. However, these scripts also
include shooting directions and some abbreviations. To allow for this,
based on reading through some of them, I suspect we can subtract around
20 from our estimate (most words seem to be used in dialogue too;
however, some, especially film terms, are not). This brings us to an average of

This number appears to be about 1/5th to 1/10th of the average
vocabulary of a native English speaker (estimates seem to vary from
10,000 to 20,000 words).

For your information, the 10 most common words in my sample are:
   8316 the
   3659 a
   3291 to
   3126 and
   2510 of
   2424 you
   1932 in
   1800 i
   1623 is
   1450 it
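
A frequency table like the one above can be produced with sort and uniq -c. The answer does not say which command was actually used for it, so the following is just one plausible sketch. It assumes the text has already been split into one lowercase word per line, and top_words is a hypothetical helper name.

```shell
# Hypothetical helper: print the N most frequent words (default 10).
# Expects one word per line on stdin.
top_words() {
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Small usage example with made-up input.
printf '%s\n' the cat the dog the cat | top_words 2
```

uniq -c prefixes each distinct line with its count, and sort -rn then orders those lines from most to least frequent.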

Script text was passed through the following command pipelines:
  tr '[A-Z]' '[a-z]' | tr -cd '[A-Za-z0-9_ \012]' | tr -s '[ ]' '\012' \
    | sort | uniq -u | wc -w
  (For unique count)
  tr '[A-Z]' '[a-z]' | tr -cd '[A-Za-z0-9_ \012]' | tr -s '[ ]' '\012' \
    | sort | wc -w
  (For total count)
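
One detail worth knowing when reusing these pipelines: uniq -u prints only the lines that are not repeated, so sort | uniq -u counts words that occur exactly once, while sort -u counts distinct words. The sketch below reports both alongside the total. The function names are hypothetical, and the normalisation is a simplified variant of the pipeline above, not a copy of it.

```shell
# Normalise text from stdin into one lowercase word per line.
# Punctuation is deleted, so "don't" becomes "dont".
words() {
  tr 'A-Z' 'a-z' | tr -cd 'a-z0-9_ \n' | tr -s ' ' '\n' | sed '/^$/d'
}

# Total number of words.
total_count()    { words | wc -l | tr -d ' '; }

# Number of distinct words, however often each occurs.
distinct_count() { words | sort -u | wc -l | tr -d ' '; }

# Number of words that occur exactly once.
once_count()     { words | sort | uniq -u | wc -l | tr -d ' '; }

# Small usage example with made-up input.
sample='The cat saw the dog. The dog ran!'
printf '%s\n' "$sample" | total_count
printf '%s\n' "$sample" | distinct_count
printf '%s\n' "$sample" | once_count
```

For a real script, the same functions could be fed from a file, for example total_count < script.txt, where script.txt is a placeholder name.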

If you feel my sample is really too small, let me know, and I can
double, or triple it, although I suspect the outcome will remain

tarre-ga rated this answer:5 out of 5 stars
I know that it is hard to say exactly: "And the average count is .. " 
 .. :))) .. but the answer satisfies me. Thanks a lot!

