Dear akirsten,
The British National Corpus (BNC) should qualify as a "very large"
corpus. It contains English texts from a wide range of sources with a
total of 100 million words.
This is a frequency list for the complete BNC [1]:
ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5
The format for the list is: Frequency - Word - Part-of-speech-code
(POS) - Number of files the word occurs in.
For the definitions of the POS codes used, please have a look at this list [2]:
http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/poscodes.html
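If you want to process the list with a script, the four-field format described above can be read with a few lines of Python. This is only a minimal sketch: it assumes the fields are separated by whitespace, and the names BncEntry and parse_bnc_line, as well as the sample line, are made up for illustration and are not taken from the BNC files.

```python
from typing import NamedTuple


class BncEntry(NamedTuple):
    frequency: int  # occurrences in the whole corpus
    word: str
    pos: str        # part-of-speech code, see the POS code list [2]
    files: int      # number of files the word occurs in


def parse_bnc_line(line: str) -> BncEntry:
    """Parse one line, assuming four whitespace-separated fields."""
    frequency, word, pos, files = line.split()
    return BncEntry(int(frequency), word, pos, int(files))


# Made-up sample line for illustration only, not real BNC data.
entry = parse_bnc_line("1200 example nn1 300")
print(entry.word, entry.frequency, entry.files)
```

If a line in the actual file does not have exactly four fields, the unpacking raises a ValueError, which is a simple way to notice that the assumed format does not hold.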
There is also a lemmatised list of the 6318 words with more than 800
occurrences in the whole 100M-word BNC [3]. It might be interesting
because of how it differs from the unlemmatised "raw" frequency list:
ftp://ftp.itri.bton.ac.uk/bnc/lemma.num
The format of the list is: Sort-order - Frequency - Word - Word-Class
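To illustrate why the lemmatised and the raw list differ, here is a rough Python sketch. It reads a line in the four-field lemma format given above (again assuming whitespace-separated fields) and shows how the counts of several word forms add up under one lemma. The function names, the form-to-lemma mapping, and all numbers are invented for the example; they are not BNC data.

```python
from collections import Counter
from typing import NamedTuple


class LemmaEntry(NamedTuple):
    rank: int        # sort order in the list
    frequency: int
    lemma: str
    word_class: str


def parse_lemma_line(line: str) -> LemmaEntry:
    """Parse one line, assuming four whitespace-separated fields."""
    rank, frequency, lemma, word_class = line.split()
    return LemmaEntry(int(rank), int(frequency), lemma, word_class)


def lemmatise_counts(raw_counts, lemma_of):
    """Sum raw word-form counts under their lemma."""
    totals = Counter()
    for form, count in raw_counts.items():
        # Forms without a mapping are treated as their own lemma.
        totals[lemma_of.get(form, form)] += count
    return totals


# Made-up counts and mapping, for illustration only.
raw = {"walk": 50, "walks": 20, "walked": 30}
lemma_of = {"walks": "walk", "walked": "walk"}
print(lemmatise_counts(raw, lemma_of))  # Counter({'walk': 100})
```

In the raw list the three forms would appear as three separate entries, while the lemmatised list would show a single entry with the combined count.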
For comparison, it might also be interesting to look at the word
frequency list for the Brown Corpus of general non-academic English.
This corpus contains only 1,015,945 words, and it covers American
rather than British English. Here is a frequency list of the 2000 most
frequent words from the Brown Corpus [4]:
http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm
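Because the two corpora differ so much in size, raw counts from the BNC and the Brown Corpus cannot be compared directly. One common approach is to convert each count to occurrences per million words. The Python sketch below uses the corpus sizes mentioned above (100 million is the rounded figure for the BNC); the function name and the example counts are hypothetical.

```python
BNC_SIZE = 100_000_000  # rounded total given for the BNC
BROWN_SIZE = 1_015_945  # total given for the Brown Corpus


def per_million(count: int, corpus_size: int) -> float:
    """Convert a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


# Hypothetical counts for the same word in both corpora.
print(per_million(5000, BNC_SIZE))  # 50.0
print(round(per_million(60, BROWN_SIZE), 1))
```

With normalised values, a word that occurs 5000 times in the BNC and 60 times in the Brown Corpus can be compared on the same scale.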
I hope that this will be useful for you!
Best regards,
Scriptor
Sources:
British National Corpus
http://www.natcorp.ox.ac.uk/
[1] BNC database and word frequency lists, by Adam Kilgarriff,
University of Brighton: Unlemmatised frequency list
ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5
[2] BNC database and word frequency lists, by Adam Kilgarriff,
University of Brighton: BNC Part-of-speech codes
http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/poscodes.html
[3] BNC database and word frequency lists, by Adam Kilgarriff,
University of Brighton: Lemmatised frequency list
ftp://ftp.itri.bton.ac.uk/bnc/lemma.num
[4] Edict Virtual Language Center: Words listed by frequency - the
first 2000 most frequent words from the Brown Corpus
http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm
Edict Virtual Language Center: Word Frequency Text Profiler
http://www.edict.com.hk/textanalyser/
University of Essex: W3-Corpora - The Brown Corpus
http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html
University of Essex: W3-Corpora
http://clwww.essex.ac.uk/w3c/
University of Tübingen: LINGUIST List 9.1319
http://www.sfs.nphil.uni-tuebingen.de/linguist/issues/9/9-1319.html
Cunningham & Cunningham, Inc.: How we talk
http://c2.com/cgi/wiki?HowWeTalk
Search terms used:
"word frequency" english
://www.google.de/search?hl=de&newwindow=1&c2coff=1&q=%22word+frequency%22+english&btnG=Suche&meta=
"brown corpus" "word frequency"
://www.google.de/search?q=%22brown+corpus%22+%22word+frequency%22&hl=de&lr=&newwindow=1&c2coff=1&start=0&sa=N
"english word frequency list"
://www.google.de/search?hl=de&newwindow=1&c2coff=1&q=%22english+word+frequency+list%22&btnG=Suche&meta=
"British National Corpus" "frequency list"
://www.google.de/search?q=%22British+National+Corpus%22+%22frequency+list%22&hl=de&lr=&newwindow=1&c2coff=1&start=0&sa=N
"british national corpus"
://www.google.de/search?hl=de&newwindow=1&c2coff=1&q=%22british+national+corpus%22&btnG=Suche&meta=