Clarification of Answer by
webadept-ga
on
30 Aug 2002 20:05 PDT
Hi again,
If you need a 100k file, I can probably make one for you. I figured
that the site I linked for the wordlist project on SourceForge was far
more than you would ever need, and the 56k file was just something I
already had.
I made the 56k file with a personal project I'm working on, which
builds vocabulary lists and thesaurus lists from authors' works.
Shakespeare, Milton and Twain are in that file.
The tool I have right now downloads the complete works of those three
from an open source project and extracts the vocabulary of words with
3 characters or more. Building a bigger list shouldn't be much of a
problem: I would just download a few more books from different
authors and run the de-dup program on the vocabularies.
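To give a rough idea of what that involves, here is a minimal Python
sketch of the extract-and-de-dup step. It is not my actual tool: the
function names, the letters-only regular expression and the sample
sentences are all assumptions made up for illustration, and the
download step is left out, so the text is simply passed in as a string.

```python
import re


def extract_vocabulary(text, min_len=3):
    """Return the set of distinct lowercase words with at least min_len letters.

    Assumption: a "word" is any run of the letters a-z, so "don't" is
    split into "don" and "t". Names and places are not filtered out.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) >= min_len}


def merge_vocabularies(*vocabularies):
    """De-duplicate several vocabularies into one sorted word list."""
    merged = set()
    for vocab in vocabularies:
        merged |= vocab
    return sorted(merged)


# Hypothetical stand-ins for the downloaded works of two authors.
sample_a = "To be, or not to be: that is the question."
sample_b = "The question is not the answer."

word_list = merge_vocabularies(
    extract_vocabulary(sample_a),
    extract_vocabulary(sample_b),
)
print(word_list)  # ['answer', 'not', 'question', 'that', 'the']
```

Adding another author just means passing one more vocabulary to
merge_vocabularies; since sets hold each word once, the de-dup comes
for free.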
The thing that threw me most in your question was "common english
words" and the 45k expectation. "Common english words" only amount to
about 8-10k words, even for well-spoken people. Shakespeare used over
15k in his works (my own count came closer to 19k, but that includes
names and places), and that's really high.
If you want 100k, I can probably get really close to that, perhaps
even over, but they are not going to be "common english words".
Thanks,
webadept-ga