Google Answers
Q: Strange list of words (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: Strange list of words
Category: Computers > Internet
Asked by: apteryx-ga
List Price: $3.09
Posted: 03 Jan 2004 13:54 PST
Expires: 02 Feb 2004 13:54 PST
Question ID: 292826
What is the significance or purpose of this list of words?  This is a
TinyURL version of the link:

http://tinyurl.com/33c8m

I'd also like to know *how* you know and how sure you are of your conclusion.

The main site is in German (with English translation); I think a
knowledge of German might be helpful in divining its intent.

Thank you,
Apteryx
Answer  
Subject: Re: Strange list of words
Answered By: aceresearcher-ga on 03 Jan 2004 15:11 PST
Rated: 5 out of 5 stars
 
Welcome back, apteryx!

This TinyURL resolves to the actual URL
http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gentoo-deutsch/sonstiges/ghs/templates/Attic/english.0?rev=1.2

Open Source computer code is software that has been freely made
available to any developer who wishes to experiment with, tweak,
and/or enhance that software.
http://dictionary.reference.com/search?q=open+source&r=67

BerliOS is an Open Source Mediator for computer program developers:
"The goal of BerliOS is to provide support for different interest
groups in the area of Open Source Software (OSS). Our aim is to fulfil
a neutral mediator function. The target groups of BerliOS are on one
hand the developers and users of Open Source Software and on the other
hand commercial manufacturers of OSS operating systems and
applications as well as support companies."
http://www.berlios.de/about/index.php.en

CVS is a software file Version Control System that helps prevent a
developer from uploading a file on top of the file uploaded by another
developer working on the same thing simultaneously, thus destroying
the other's changes. It also allows you to compare two different
versions of a file to see only the differences between them, which can
be very helpful in debugging software.
http://www.loria.fr/~molli/cvs/doc/cvs_1.html#SEC2

This file is one of several contained in the "Templates" folder for
GHS, which stands for "Gentoo-Hilfe-System", which is German for
"Gentoo Help System".
http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gentoo-deutsch/sonstiges/ghs/templates/?hideattic=0#dirlist

The files in this folder are used to "parse", or process, search terms
in the Gentoo-Hilfe-System. One of them, the "Affix" file,
http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gentoo-deutsch/sonstiges/ghs/templates/Attic/english.aff?rev=1.2
defines common prefixes and suffixes for English words,
such as re-, in-, un-, -ed, -ly, -est, etc.

The file about which you are asking is a file of English words
appearing in the GHS. When used in conjunction with the Affix file
listed above, different forms of these words which appear in the Help
system can be created and matched to search terms entered by a user,
thus enabling the Help system to find more results by using the root
words of the search terms the user enters.
http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gentoo-deutsch/sonstiges/ghs/templates/Attic/english.0?rev=1.2
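
As a rough illustration of how a word file and an Affix file can work
together, here is a minimal Python sketch. It is not the actual GHS
code: the word list, the rule table, and the function names are all
made-up stand-ins. Each rule says "if the root matches this pattern,
strip these letters and add this suffix", and the result is a table
that maps every generated form back to its root word.

```python
import re

# Hypothetical stand-in for the word file (a real one is far longer).
ROOT_WORDS = ["tergiversate", "stipitate", "search"]

# Hypothetical stand-in for the Affix file.
# Each rule: (pattern the root must match, letters to strip, suffix to add).
SUFFIX_RULES = [
    (r"e$", "e", "ing"),     # tergiversate -> tergiversating
    (r"[^e]$", "", "ing"),   # search -> searching
    (r"e$", "", "d"),        # tergiversate -> tergiversated
    (r"[^e]$", "", "ed"),    # search -> searched
]

def expand(root):
    """Return the root plus every variant the rules can build from it."""
    forms = {root}
    for pattern, strip, suffix in SUFFIX_RULES:
        if re.search(pattern, root):
            stem = root[: len(root) - len(strip)]
            forms.add(stem + suffix)
    return forms

def build_index(roots):
    """Map every generated form back to the root word it came from."""
    index = {}
    for root in roots:
        for form in expand(root):
            index.setdefault(form, root)
    return index

INDEX = build_index(ROOT_WORDS)
```

With such a table, a search term like "searched" can be looked up
directly and traced back to the root word "search".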

Search Strategy

I used various forms of the URL you provided to get a feel for what
the site contained:

http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gentoo-deutsch/sonstiges/ghs/templates/Attic
http://www.berlios.de (click on "English", then on "About Us")
http://cvs.berlios.de/cgi-bin/viewcvs.cgi (click on "CVS Help")

Open Source computer code
http://www.google.com/search?q=Open+Source+computer+code

Google Language Translation
http://translate.google.com/translate_t
Translate text: Gento Hilfe System   from German to English


Before Rating my Answer, if you have any questions about the
information which I have presented, please post a Request for
Clarification.

I hope that this Answer provides exactly the information which you were seeking!

Best wishes,

aceresearcher

Request for Answer Clarification by apteryx-ga on 03 Jan 2004 16:06 PST
So far, so good, aceresearcher.  I haven't been away, though--I've
posted more questions in the past week than I usually do in a month! 
Guess I must be running a high level of QSH, the curiosity hormone,
this season.

Before posting my question, I did poke around the BerliOS site and
look at the description and mission statement and so on, but I wasn't
able to piece it all together well enough to figure out what this list
is for.  (I made the tiny URL from the expanded one.)  I was also
looking for any evidence that the site might be a cover for or aid to
some kind of marketing operation, maybe serving spammers, but I am not
experienced enough to know what signs to look for.

I'm still not sure I really understand your explanation.  I know a
little bit about search algorithms and information retrieval, and I
understand parsing and affixes and other concepts having to do with
language.  What I would still like to know is *how* the list is
used--meaning what kinds of operations would be performed on or with
it--and how the result could aid in a search.  Can you give an example
that doesn't involve too many leaps or guesses?  Maybe you could
illustrate with some terms from the list:  how about 'Agamemnon',
'tergiversator', 'stipitate', and 'collywobbles'?  (Bonus points if
you can use them all in one sentence.)

One other thing, and a guess would be just fine here:  can you imagine
someone's being able to use this or a similar list to generate spam
messages, and if so, how would that work?  No need to research
this--just your top-of-the-head reaction is all I'm after.

Thank you,
Apteryx

Clarification of Answer by aceresearcher-ga on 03 Jan 2004 22:08 PST
Hi, apteryx!

I'm sorry if you felt that my explanation was not very clear. Let me
try it this way:

I'm searching a dictionary (or glossary, or help system, or website,
etc) for the word "tergiversating". The search software checks the
built-in list of available keywords in the database against the search
term(s) I entered. Well, that file contains "tergiversator" and
"tergiversate", which are close, but not quite a cigar. **However**,
because the programmer was so clever, instead of just telling me that
it can't find the word in its files and that sorry, I'm just out of
luck, the software goes on to check, using the Affix file, if
"tergiversating" might be a variant of the word "tergiversate".

Since "-ing" is listed in the Affix file as a legitimate variant of
words ending in "-e", lo and behold, the software, instead of
disappointing me and traumatizing me forever, can return a list of
search results to me which include the word "tergiversate", a word
that *is* contained in the help database.
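
The lookup just described can be sketched in a few lines of Python.
This is only an illustration under assumed names: KEYWORDS stands in
for the word file, STRIP_RULES for the relevant part of the Affix
file, and find_keyword is a hypothetical helper, not anything taken
from the GHS source.

```python
# Hypothetical stand-in for the list of keywords in the help database.
KEYWORDS = {"tergiversate", "tergiversator"}

# Hypothetical stand-in for the Affix file, read "backwards":
# (suffix found on the search term, letters to put back in its place).
STRIP_RULES = [
    ("ing", "e"),   # tergiversating -> tergiversate
    ("ing", ""),    # searching -> search
    ("ed", "e"),
    ("ed", ""),
    ("s", ""),
]

def find_keyword(term, keywords=KEYWORDS):
    """Return the keyword that `term` is a variant of, or None."""
    term = term.lower()
    if term in keywords:
        return term                      # exact match, nothing to do
    for suffix, restore in STRIP_RULES:
        if term.endswith(suffix):
            candidate = term[: -len(suffix)] + restore
            if candidate in keywords:
                return candidate         # variant of a known keyword
    return None                          # truly not in the database
```

Calling find_keyword("tergiversating") strips "-ing", restores the
"-e", and finds "tergiversate" in the keyword list; a word such as
"collywobbles" that has no matching root simply comes back as None.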


Your hunch that this kind of "dictionary" list can be used to generate
random "Subject" lines and "From" addresses for spam mailings is a
correct one. With software that randomly selects words from a
dictionary file, then randomly creates variants of those words using
an Affix file to append prefixes or suffixes to those words, a pool of
potential words *much* larger than the original dictionary can be
created.
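
To show why the pool grows so quickly, here is a small Python sketch.
Everything in it is assumed for illustration (three words borrowed
from your list, four prefixes and four suffixes, with the empty string
meaning "no affix"); it blindly glues affixes onto words, so most of
the results are not real English words.

```python
import random

# Hypothetical inputs; "" means "no prefix" or "no suffix".
WORDS = ["tergiversate", "stipitate", "collywobbles"]
PREFIXES = ["", "re", "un", "in"]
SUFFIXES = ["", "ed", "ly", "est"]

def variant_pool(words, prefixes, suffixes):
    """Build every prefix + word + suffix combination, without duplicates."""
    return sorted({p + w + s for w in words for p in prefixes for s in suffixes})

POOL = variant_pool(WORDS, PREFIXES, SUFFIXES)

def random_variant(rng):
    """Pick one variant at random using the supplied random.Random."""
    return rng.choice(POOL)
```

Three words with four prefixes and four suffixes already give
3 x 4 x 4 = 48 distinct strings, so a dictionary of many thousands of
words yields a far larger pool.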

If you are interested in a detailed, step-by-step technical
explanation of how such a program would work -- or in obtaining
program code that actually performs this function -- there are a
number of Researchers who are programming gurus and who might be able
to do this for you; however, you would likely need to post that
Question with a considerably higher fee attached to justify the work
that it would involve.


As for your last additional Question:

Although he was an accomplished tergiversator, Agamemnon got a severe
case of the collywobbles every time he told the outright lie that
palms were trees, rather than stipitate cycads.

tergiversator
http://dictionary.reference.com/search?q=tergiversator

Agamemnon
http://dictionary.reference.com/search?q=Agamemnon

collywobbles
http://dictionary.reference.com/search?q=collywobbles

stipitate
http://dictionary.reference.com/search?q=stipitate

Regards,

ace

Clarification of Answer by aceresearcher-ga on 01 Jan 2007 05:41 PST
Apteryx,

The process of using root words, prefixes, and suffixes in computer
algorithms is known as stemming or conflation. You can read more about
it in Wikipedia:

http://en.wikipedia.org/wiki/Stemming

Regards,
aceresearcher
apteryx-ga rated this answer: 5 out of 5 stars and gave an additional tip of: $6.17
Oh, bravo, aceresearcher!  Totally satisfactory, and I thank you. 
Here's a little something extra for your thorough explanation, which I
fully understand (having spent some years as a programmer myself in a
former life), and a whopping 4.88 superbonus, which I compute to be
the appropriate compensation for your sentence, at the rate of 1.22
per challenge word.  Nicely done.

Apteryx

Comments  
There are no comments at this time.
