I need a large list of Spanish words suitable for validating entries
in a Scrabble-like word puzzle game (i.e. words that you'd find in a dictionary).
The word list must be large (>80,000 words), comprehensive, and
reasonably free of slang and proper names (which won't be valid in the
game). It should include plurals and the various conjugated forms of
verbs - i.e. any word you could legally play in Scrabble or a similar
style game.
Most importantly, the word list must be public domain, open source, or
available under some sort of license that makes it suitable for use in
my game (and there must be a clear license/statement to that effect,
on a reasonable, non-warez site).
The list can be in any reasonable text format - if I can cut and paste
it into Windows Notepad, that's all I need.
I do not speak Spanish, but I will have a Spanish-speaking
acquaintance verify that it looks reasonable.
(Note that I've posted this question 3 times - once each for Spanish,
German, and French - feel free to answer for each language for a
bigger payment) |
Request for Question Clarification by bobbie7-ga on 28 Jun 2005 14:38 PDT
Would you be willing to pay for the word list?
A Spanish Word List containing 288,000 words costs US$499.00.
Redistributing the word list requires an additional small fee for
each copy distributed.
Would something like this interest you?
Thanks, Bobbie7
|
Request for Question Clarification by bobbie7-ga on 28 Jun 2005 14:48 PDT
Would COES meet your needs?
http://www.datsi.fi.upm.es/~coes/espell_readme/espell_readme.html
|
Clarification of Question by psteinx-ga on 28 Jun 2005 15:36 PDT
COES appears to contain only roots, not conjugated verbs and the like.
(e.g. I eat cheese. I ate cheese. He eats cheese. = Como queso. Comí
queso. Él come queso. The different forms of 'eat' (comer) are not in
the list, and I'm not good enough at Spanish to figure out how to
generate them automatically.)
$500 for a list sounds rather pricey. What's the per-unit
distribution fee on top of that? I'm not totally opposed to paying a
license fee, but the Spanish market for my game is rather small, so
that's a lot to pay.
|
Hello again psteinx,
I found a Spanish wordlist at the Carnegie Mellon University (CMU)
Artificial Intelligence Repository.
This word list contains about 90,000 Spanish words.
File name: span_lex.zip
Download from here:
http://128.2.209.79/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/corpora/dicts/spanish/
Or you may also use this direct link
http://128.2.209.79/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/corpora/dicts/spanish/span_lex.zip
Spanish Lexicon
This directory contains a simple list of about 90,000 Spanish words.
Origin: /afs/umich.edu/group/itd/archive/linguistics/lexica
http://128.2.209.79/afs/cs/project/ai-repository/ai/areas/nlp/corpora/dicts/spanish/0.html
Readme file:
A simple list of about 90,000 Spanish words.
Courtesy of Dave Eddington <deddington@acad1.mtsu.edu>,
of Middle Tennessee State University.
The ASCII codes used are as follows:
* = beginning of a word
# = end of a word
V\ = a vowel with an acute accent
n~ = n with a tilde over it
u$ = u with umlaut
http://128.2.209.79/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/corpora/dicts/spanish/readme.txt
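Since the readme spells out the encoding, turning the entries back into ordinary accented Spanish can be automated. Below is a minimal Python sketch. It assumes each entry appears as `*word#`, that the backslash follows the vowel it accents, and that uppercase letters use the same scheme. The actual layout of span_lex.zip has not been checked here. `decode_entry` and `decode_text` are hypothetical names.

```python
import re

# Assumed mapping for "V\" (vowel + backslash = vowel with acute accent).
ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
         "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú"}


def decode_entry(raw: str) -> str:
    """Convert one ASCII-encoded entry (e.g. '*rompi\\#') to 'rompí'."""
    # Drop surrounding whitespace and the * (start) / # (end) markers.
    word = raw.strip().strip("*#")
    # A backslash after a vowel marks an acute accent.
    word = re.sub(r"([aeiouAEIOU])\\", lambda m: ACUTE[m.group(1)], word)
    # n~ is ñ, u$ is ü (uppercase variants are an assumption).
    word = word.replace("n~", "ñ").replace("N~", "Ñ")
    word = word.replace("u$", "ü").replace("U$", "Ü")
    return word


def decode_text(text: str) -> list[str]:
    """Decode every *word# entry found in a block of text."""
    return [decode_entry(entry) for entry in re.findall(r"\*([^*#]*)#", text)]
```

Writing the decoded words one per line gives the plain-text format the question asks for.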
FREELY DISTRIBUTABLE and FREE USAGE
Use and copying of the software and the preparation of derivative
works based on this software are permitted, subject to the author's
terms and conditions. Public domain software and software covered by
the GNU General Public License automatically meet this definition.
(To save space, a single copy of each version of the GNU GPL has
been placed in the directory copying/gpl/.)
http://128.2.209.79/afs/cs.cmu.edu/project/ai-repository/ai/readme.txt
Search criteria:
Spanish words
Spanish words +zip
Spanish wordlist
Spanish word list
Best regards,
Bobbie7 |
Clarification of Answer by bobbie7-ga on 28 Jun 2005 16:30 PDT
The University of Michigan Linguistics Archive provides the Spanish
word list for download:
span-lex.zip (08-Nov-1993 22:04, 260k)
http://www.umich.edu/~archive/linguistics/texts/lexica/
It appears that this list is in the public domain.
"Welcome to the University of Michigan Linguistics Archive."
"These archives are at archive.umich.edu in the /linguistics directory.
There is a collection of public domain, freeware, shareware, and other
files that may be useful to linguists."
http://www.umich.edu/~archive/linguistics/00readme.txt
|
Request for Answer Clarification by psteinx-ga on 28 Jun 2005 18:35 PDT
Hmmm - closer, but I still don't think this does it. It does not
appear to have different verb forms/conjugations. Again, for example:
I broke it.
I break it.
He broke it.
translates to
Yo lo rompí.
Yo lo rompo.
Él lo rompió.
(I'm using http://ets.freetranslation.com/ to get these translations.)
The different forms of 'break' (rompí, rompo, rompió) are not in the
dictionary, and I don't know enough about Spanish to know how to form
them automatically (or even if that's possible). Plus, the list
doesn't distinguish verbs from nouns, which, though not needed for my
final purpose, might make automatic conjugation easier.
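Regular Spanish verbs can be conjugated mechanically by replacing the -ar, -er or -ir ending of the infinitive with a fixed set of endings. Below is a minimal Python sketch that covers only the present indicative and the preterite of regular verbs. Irregular verbs (ser, ir, tener, ...) and spelling changes such as buscar to busqué are not handled, so the output would still need a native speaker's review. `conjugate_regular` and `all_forms` are hypothetical names.

```python
# Endings for regular verbs, in the order yo, tú, él, nosotros, vosotros, ellos.
ENDINGS = {
    "ar": {"present": ["o", "as", "a", "amos", "áis", "an"],
           "preterite": ["é", "aste", "ó", "amos", "asteis", "aron"]},
    "er": {"present": ["o", "es", "e", "emos", "éis", "en"],
           "preterite": ["í", "iste", "ió", "imos", "isteis", "ieron"]},
    "ir": {"present": ["o", "es", "e", "imos", "ís", "en"],
           "preterite": ["í", "iste", "ió", "imos", "isteis", "ieron"]},
}


def conjugate_regular(infinitive: str) -> dict[str, list[str]]:
    """Return present and preterite forms, assuming the verb is regular."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    if ending not in ENDINGS:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive}")
    return {tense: [stem + e for e in forms]
            for tense, forms in ENDINGS[ending].items()}


def all_forms(infinitives) -> list[str]:
    """Collect infinitives plus their generated forms, without duplicates."""
    words = set()
    for inf in infinitives:
        words.add(inf)
        for forms in conjugate_regular(inf).values():
            words.update(forms)
    return sorted(words)
```

For example, `conjugate_regular("cortar")["preterite"]` starts with "corté", and `all_forms(["comer", "romper"])` includes como, comí, come, rompo, rompí and rompió.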
|
Clarification of Answer by bobbie7-ga on 28 Jun 2005 18:36 PDT
Thank you for the clarification. I'll see what else I can find.
Bobbie7
|
Clarification of Answer by bobbie7-ga on 28 Jun 2005 20:08 PDT
Psteinx,
I found a list of 10,000 conjugated verbs.
Compjugador is a Spanish verb conjugator. It can conjugate all the
verbs in the official Spanish language and contains close to 10,000
verbs.
Download here:
http://sourceforge.net/projects/compjugador/
Compjugador is distributed under the GNU General Public License.
http://compjugador.sourceforge.net/
Adding this file to the 90,000-word list I provided previously would
give you a combined list of roughly 100,000 words.
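Combining the two lists is simple once both are plain text. Below is a minimal Python sketch. It assumes both files have already been reduced to one word per line; the actual layout of verbs.txt is not described here. `merge_word_lists` and `write_word_list` are hypothetical helpers.

```python
from pathlib import Path


def merge_word_lists(*lists) -> list[str]:
    """Union several word lists, lowercasing and dropping blanks and duplicates."""
    merged = set()
    for words in lists:
        for word in words:
            word = word.strip().lower()
            if word:
                merged.add(word)
    return sorted(merged)


def write_word_list(words, path) -> None:
    """Write one word per line for pasting into a text editor."""
    # utf-8-sig writes a byte-order mark so Windows Notepad recognizes the
    # file as UTF-8 and displays accented letters correctly.
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8-sig")
```

A usage sketch, with placeholder file names: read each file with `Path(name).read_text(encoding="utf-8").splitlines()`, pass the resulting lists to `merge_word_lists`, and save the result with `write_word_list`.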
Would this solution work for you?
Thanks, Bobbie7
|
Clarification of Answer by bobbie7-ga on 28 Jun 2005 20:11 PDT
Just open the downloaded data file and you will find "verbs.txt", which
contains the list of 10,000 conjugated verbs.
|
Clarification of Answer by bobbie7-ga on 28 Jun 2005 20:17 PDT
The GNU General Public License can be found in the file named "copying".
|
Request for Answer Clarification by psteinx-ga on 29 Jun 2005 07:34 PDT
I see the verbs.txt file, and it appears to have a lot of verbs in it.
But it is far from comprehensive. Again, examples like:
I eat cheese.
I ate cheese.
He eats cheese.
I slice cheese.
I sliced cheese.
He slices cheese.
I break it.
I broke it.
He broke it.
>>
Como queso. Comí queso. Él come queso. Corto queso. Corté queso. Él
corta queso. Yo lo rompo. Yo lo rompí. Él lo rompió.
>>
The verbs in these examples are all common verbs, conjugated in
common ways, but I can't find any of the Spanish forms in verbs.txt.
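This kind of spot check can be repeated on any candidate list by testing a handful of known forms against it. Below is a minimal Python sketch. `missing_forms` is a hypothetical helper, and `SAMPLE_FORMS` simply collects the Spanish forms from the examples above. Normalizing to Unicode NFC avoids false misses when a file stores an accent as a separate combining character.

```python
import unicodedata

# Forms taken from the examples in this thread.
SAMPLE_FORMS = ("como", "comí", "come", "corto", "corté", "corta",
                "rompo", "rompí", "rompió")


def normalize(word: str) -> str:
    # NFC makes "í" stored as "i" + combining acute compare equal to "í".
    return unicodedata.normalize("NFC", word.strip().lower())


def missing_forms(word_list, required=SAMPLE_FORMS) -> list[str]:
    """Return the required forms that do not appear in word_list."""
    known = {normalize(w) for w in word_list}
    return [w for w in required if normalize(w) not in known]
```

An empty result means every sample form is present. Any forms it returns point to gaps like the ones described here.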
|
Clarification of Answer by bobbie7-ga on 29 Jun 2005 07:55 PDT
I'll continue my search.
Thank you for your patience.
--Bobbie7
|
Clarification of Answer by bobbie7-ga on 29 Jun 2005 10:35 PDT
Enter the verb to conjugate and click on the word "conjuga". The
program will fully conjugate any of its 10,000 Spanish verbs in all tenses:
http://turingmachine.org/compjugador/
I checked, and conjugations for the verbs romper (to break) and comer
(to eat) are provided.
Contact information
Daniel M. Germán
email: dmg@csg.uwaterloo.ca
Would this be of any use for you?
|
Clarification of Answer by bobbie7-ga on 29 Jun 2005 10:39 PDT
A short biography and alternate contact information for Daniel M.
Germán are available here:
http://sern.ucalgary.ca/courses/SENG/693/F00/readings/German/
|