Q: Linguistics: English language word frequency lists (Answered, 5 out of 5 stars, 1 Comment)
Question  
Subject: Linguistics: English language word frequency lists
Category: Reference, Education and News > Teaching and Research
Asked by: akirsten-ga
List Price: $100.00
Posted: 06 Feb 2005 16:26 PST
Expires: 08 Mar 2005 16:26 PST
Question ID: 470042
What are the 1000 most common words in the English language? (Drawn
from a "Very Large" corpus.) I need an ordered list that includes a
numerical measure of their relative frequency. For example, it is not
enough for me to know that "him" is more common than "dog." I also
need a quantifiable measure of how much more common.
Answer  
Subject: Re: Linguistics: English language word frequency lists
Answered By: scriptor-ga on 06 Feb 2005 17:46 PST
Rated: 5 out of 5 stars
 
Dear akirsten,

The British National Corpus (BNC) should qualify as a "very large"
corpus. It contains English texts from a wide range of sources,
totalling about 100 million words.

This is a frequency list for the complete BNC [1]:
ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5

The format for the list is: Frequency - Word - Part-of-speech-code
(POS) - Number of files the word occurs in.
For the definitions of the POS codes used, please have a look at this list [2]:
http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/poscodes.html
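
To get the "how much more common" figure from a list in this format, a short script is enough. The following is only a minimal sketch: it assumes each line holds the four whitespace-separated fields described above, it sums the counts of a word over all its POS codes, and it uses the round figure of 100 million words as the corpus size. The function names and the sample lines are hypothetical, and the counts in the sample are made up for illustration, not taken from the BNC.

```python
from collections import defaultdict

# Approximate corpus size quoted above (an assumption for normalising).
CORPUS_SIZE = 100_000_000


def parse_unlemmatised(lines):
    """Sum frequencies per word across all POS codes.

    Each line is assumed to be: frequency word pos_code file_count
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        freq, word, _pos, _files = parts
        totals[word] += int(freq)
    return dict(totals)


def per_million(count, corpus_size=CORPUS_SIZE):
    """Occurrences per million words of running text."""
    return count * 1_000_000 / corpus_size


def top_n(totals, n=1000):
    """The n most frequent words, most frequent first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample lines, NOT real BNC counts.
sample = [
    "5000 him pnp 900",
    "400 dog nn1 300",
    "100 dog vvb 50",
]

totals = parse_unlemmatised(sample)
ratio = totals["him"] / totals["dog"]
print(top_n(totals, 2))
print(f"'him' is {ratio:.1f} times as frequent as 'dog' in this sample")
```

With the real file, the same `parse_unlemmatised` call would take the lines of the downloaded list, and `top_n(totals, 1000)` would give the ordered top 1000 with their counts. Entries that are not ordinary words (for example a whole-corpus total line, if the file has one) would need to be filtered out first.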


There is also a lemmatised list of the 6318 words with more than 800
occurrences in the whole 100M-word BNC [3]. It might be interesting
to compare it with the unlemmatised "raw" frequency list:
ftp://ftp.itri.bton.ac.uk/bnc/lemma.num

The format of the list is: Sort-order - Frequency - Word - Word-Class
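
The lemmatised list can be read the same way. This is again just a sketch under the assumption that every line has the four whitespace-separated fields named above; the function names, the sample lines, their counts and their word-class labels are invented for illustration. It expresses each lemma's frequency as a fraction of the most frequent lemma's, which is one simple relative measure.

```python
def parse_lemma_list(lines):
    """Parse lines assumed to be: sort_order frequency word word_class."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        rank, freq, word, word_class = parts
        if not (rank.isdigit() and freq.isdigit()):
            continue
        entries.append((int(rank), int(freq), word, word_class))
    entries.sort(key=lambda e: e[0])  # keep the list in sort order
    return entries


def relative_to_top(entries):
    """Frequency of each lemma as a fraction of the top-ranked lemma's."""
    top_freq = entries[0][1]
    return [(word, freq / top_freq) for _rank, freq, word, _cls in entries]


# Made-up sample lines, NOT real BNC counts.
sample_lemmas = [
    "2 4000 be v",
    "1 8000 the det",
    "3 2000 of prep",
]

lemmas = parse_lemma_list(sample_lemmas)
for word, share in relative_to_top(lemmas):
    print(f"{word}: {share:.2f}")
```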


For comparison purposes, it might be interesting to also have a look
at the word frequency list for the Brown Corpus of general
non-academic English. This corpus contains only 1,015,945 words, but
it covers American English. Here is a frequency list for the 2000 most
frequent words from the Brown Corpus [4]:
http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm


I hope that this will be useful for you!
Best regards,
Scriptor




Sources:

British National Corpus
http://www.natcorp.ox.ac.uk/

[1] BNC database and word frequency lists, by Adam Kilgarriff,
University of Brighton: Unlemmatised frequency list
ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5

[2] BNC database and word frequency lists, by Adam Kilgarriff,
University of Brighton: BNC Part-of-speech codes
http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/poscodes.html

[3] BNC database and word frequency lists, by Adam Kilgarriff,
University of Brighton: Lemmatised frequency list
ftp://ftp.itri.bton.ac.uk/bnc/lemma.num

[4] Edict Virtual Language Center: Words listed by frequency - the
first 2000 most frequent words from the Brown Corpus
http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm

Edict Virtual Language Center: Word Frequency Text Profiler
http://www.edict.com.hk/textanalyser/

University of Essex: W3-Corpora - The Brown Corpus
http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html

University of Essex: W3-Corpora
http://clwww.essex.ac.uk/w3c/

University of Tübingen: LINGUIST List 9.1319
http://www.sfs.nphil.uni-tuebingen.de/linguist/issues/9/9-1319.html

Cunningham & Cunningham, Inc.: How we talk
http://c2.com/cgi/wiki?HowWeTalk


Search terms used:
"word frequency" english
://www.google.de/search?hl=de&newwindow=1&c2coff=1&q=%22word+frequency%22+english&btnG=Suche&meta=
"brown corpus" "word frequency"
://www.google.de/search?q=%22brown+corpus%22+%22word+frequency%22&hl=de&lr=&newwindow=1&c2coff=1&start=0&sa=N
"english word frequency list"
://www.google.de/search?hl=de&newwindow=1&c2coff=1&q=%22english+word+frequency+list%22&btnG=Suche&meta=
"British National Corpus" "frequency list"
://www.google.de/search?q=%22British+National+Corpus%22+%22frequency+list%22&hl=de&lr=&newwindow=1&c2coff=1&start=0&sa=N
"british national corpus"
://www.google.de/search?hl=de&newwindow=1&c2coff=1&q=%22british+national+corpus%22&btnG=Suche&meta=
akirsten-ga rated this answer: 5 out of 5 stars

Comments  
Subject: Re: Linguistics: English language word frequency lists
From: mononexo-ga on 06 Feb 2005 20:05 PST
 
You may find this site useful and fun:
http://www.wordcount.org/index2.html
