Q: What English letter do most commonly searched-for keywords begin with? (No Answer, 7 Comments)
Question  
Subject: What English letter do most commonly searched-for keywords begin with?
Category: Computers > Algorithms
Asked by: gomvents-ga
List Price: $10.00
Posted: 07 Oct 2004 17:28 PDT
Expires: 06 Nov 2004 16:28 PST
Question ID: 411782
What English letter do most commonly searched-for keywords begin with?
Not just the top 50, 100, 10,000,000, etc., but enough keywords to fill
about 40 TB of storage. I'm looking to create a search engine database
system that groups clusters by letter; i.e., a search for "cars" would
pull from the C server. If, for example, words that begin with X, Y,
and Z are not common, I may group them on one server. The idea behind
the system is to distribute load. Thanks!

Request for Question Clarification by googleexpert-ga on 09 Oct 2004 19:21 PDT
A good place to start is the Google Zeitgeist page:
://www.google.com/press/zeitgeist.html

Also, Ask Jeeves and Webcrawler have similar pages where you can see live searches:
http://sp.ask.com/docs/about/jeevesiq.html?o=0&q=yu%20&qsrc=0
http://msxml.webcrawler.com/info.wbcrwl/searchspy/results.htm?fci=1?filter=0&qcat=web

I think O and I are among the most common first letters of keyword
searches because of "Online" and "Internet".

Also, because you are trying to distribute the workload for a search
engine database, I suggest you read "The Google File System"
paper: http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf

Please let me know if this helps you.

Thanks.

-googleexpert

Request for Question Clarification by pafalafa-ga on 09 Oct 2004 19:32 PDT
You never know WHAT you can find on the internet!

=====

Frequencies of initial letters
 

t, a, s, o, i, c, w, p, b, f, h, m, r, d, e, n, l, g, u, y, v, j, k, q, x, z

=====

While this isn't a list derived from search terms, I wouldn't be
surprised if searches closely match it, with perhaps a few key
changes:

--the popularity of "a" probably stems from the words "a," "an," and
"and," none of which find much currency in searches

--ditto with "t" ("the," "these," "those"), also not used much by
searchers, although "t" is very common for other terms as well, and is
probably high on the list anyway.

--"e" might rank higher than it does here, due to the popularity of the "e"
prefix these days: e-commerce, eBay, etc.
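Because the list above is not derived from search terms, a more direct route is to count initial letters in an actual query log (for example, the live-search pages mentioned earlier, or your own logs once the engine is running). Below is a minimal Python sketch; the sample queries are made up, and skipping a leading "a", "an", "and", "the", "these", or "those" is an assumption based on the points above.

```python
import string
from collections import Counter

# Leading words skipped when finding a query's "real" first word
# (assumption based on the notes about "a" and "t" above).
STOPWORDS = frozenset({"a", "an", "and", "the", "these", "those"})

def initial_letter_ranking(queries, stopwords=STOPWORDS):
    """Count the first letter of each query's first non-stopword and
    return (letter, count) pairs, most common first."""
    counts = Counter()
    for query in queries:
        first_word = next(
            (w for w in query.lower().split() if w not in stopwords), None
        )
        # Only a-z are counted; queries starting with digits etc. are skipped.
        if first_word and first_word[0] in string.ascii_lowercase:
            counts[first_word[0]] += 1
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample; a real run would read millions of logged queries.
sample = ["cheap cars", "the beatles", "ebay", "e-commerce software",
          "cars for sale", "a", "1984"]
print(initial_letter_ranking(sample))  # [('c', 2), ('e', 2), ('b', 1)]
```

Run over a large enough log, the resulting counts are the per-letter weights that any server layout would be built from.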



Does that help at all...?

Clarification of Question by gomvents-ga on 09 Oct 2004 20:36 PDT
pafalafa-ga, where did you get that information from for initial
letters? W seems to be very high up on that list... this is turning
out to be very tough...
Answer  
There is no answer at this time.

Comments  
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: silver777-ga on 07 Oct 2004 22:41 PDT
 
Hi Gomvents,

Interesting. Sounds like a G**gle trade secret to me. Data on the most
commonly searched words could only be collated from the experience of
an existing search engine like the aforementioned. That ain't something
I would willingly share with another, considering the data gathered.

It might sound simple, but how about starting with the most
frequently used letters to estimate your own number of common
searches beginning with a given letter?

That is, in order, ETOANIRSHDLCWUMFYGPBVKXQJZ. You might find that "cars"
gets fewer hits than "elephants." Unless, of course, the elephants are
hitting the cars.

Phil
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: frde-ga on 08 Oct 2004 01:00 PDT
 
Although that initially sounds like a good idea, it has drawbacks.

Most searches will probably consist of two or more words, which means
that one query will need to hit two or more servers.
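To make this concern concrete: if each word is indexed on the server that owns its first letter, a query has to reach one server per distinct initial letter, and the partial results then have to be merged. A minimal sketch, assuming a hypothetical one-server-per-letter layout with made-up server names:

```python
import string

# Hypothetical layout: one server per letter, named "server-a" ... "server-z".
LETTER_TO_SERVER = {c: f"server-{c}" for c in string.ascii_lowercase}

def servers_for_query(query, letter_to_server=LETTER_TO_SERVER):
    """Return the set of servers a query must contact when every word
    is indexed on the server that owns its first letter."""
    servers = set()
    for word in query.lower().split():
        server = letter_to_server.get(word[0])
        if server is not None:  # words starting with digits etc. are ignored here
            servers.add(server)
    return servers

print(sorted(servers_for_query("dog food")))  # ['server-d', 'server-f']
```

A two-word query like "dog food" already touches two machines, and every extra distinct initial letter adds another.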

You would probably be better off replicating your database across a
number of identical machines.

Your idea of doing a 'transformation' is quite a good one, and it is
certainly viable for the first two letters ... at least.
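A two-letter "transformation" can be sketched as a partition key built from a term's first two letters. That gives 26 * 26 = 676 possible letter pairs instead of 26 single letters, so busy letters can be split more finely. A minimal sketch; skipping non-letters and padding one-letter terms with "_" are assumptions, not part of the comment:

```python
import string

def prefix_key(term, length=2):
    """Build a partition key from the first `length` letters of a term.

    Non-letters (hyphens, digits, spaces) are skipped, and terms shorter
    than `length` are padded with "_" so every key has the same length.
    """
    letters = [c for c in term.lower() if c in string.ascii_lowercase]
    return "".join(letters[:length]).ljust(length, "_")

print(prefix_key("cars"), prefix_key("E-Commerce"), prefix_key("x"))  # ca ec x_
```

Each key (or range of keys) can then be assigned to a server, just as single letters would be.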
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: gomvents-ga on 08 Oct 2004 06:27 PDT
 
I have two comments/questions for both of you. silver777-ga, where
did you find the information about ETOANIRSHDLCWUMFYGPBVKXQJZ? frde-ga:
"You would probably be better off replicating your database across a
number of identical machines." We are talking about 40 TB! That's too
expensive a storage setup, plus I'm not sure Linux can handle that
much on one disk (it's RAID 5 and will look like one physical disk).
"Most searches will probably consist of two or more words": a search
for "Dog food" in my model would just go to the D server, whereas
"Food for dogs" would go to the F server. I don't need perfect
distribution, but I would like a good rough idea; for example, if words
and phrases that begin with X, Y, or Z are very rare, I'd like to group
them on one server, etc. Thanks!
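The model described here routes a whole phrase by its first letter. A minimal sketch, assuming hypothetical server names, one shared server for the rare letters X, Y, and Z, and a hypothetical catch-all server for queries that start with a digit or symbol:

```python
import string

# Hypothetical layout: one server per letter, except X, Y and Z,
# which are rare enough to share one server.
LETTER_TO_SERVER = {c: f"server-{c}" for c in string.ascii_lowercase}
for rare in "xyz":
    LETTER_TO_SERVER[rare] = "server-xyz"

FALLBACK_SERVER = "server-misc"  # hypothetical home for "1984", "#tag", "" ...

def route(query):
    """Send a whole query to the server that owns its first character."""
    first = query.strip().lower()[:1]
    return LETTER_TO_SERVER.get(first, FALLBACK_SERVER)

print(route("Dog food"))       # server-d
print(route("Food for dogs"))  # server-f
print(route("Yellow pages"))   # server-xyz
```

The two phrasings of the same request land on different servers, so each server only ever answers queries whose first word starts with its letters.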
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: frde-ga on 08 Oct 2004 07:47 PDT
 
Like most people specifying a system,
you lied.

The underlying problem is distributing 40 TB of data over a number of servers,
i.e., splitting a large amount of data across separate machines.
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: gomvents-ga on 08 Oct 2004 10:11 PDT
 
There is no lie... picture something like 16-20 servers. Some will hold
keywords beginning with one letter, some will hold keywords beginning
with several letters

for example, a, b, or c vs. just words that begin with s

This should make sense now...
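One way to turn measured letter weights into a layout like this is to walk the alphabet and start a new server whenever the current one would exceed its fair share of the load. This is a simple greedy heuristic, not an optimal partition, and the weights in the example are made up:

```python
def group_letters(weights, num_servers):
    """Greedily split letters (in alphabetical order) into at most
    num_servers contiguous groups of roughly equal total weight.

    weights maps each letter to its share of queries (any units).
    """
    target = sum(weights.values()) / num_servers
    groups, current, load = [], "", 0
    for letter in sorted(weights):
        servers_left = num_servers - len(groups)  # includes the current group
        # Close the current group if this letter would push it past the
        # target, as long as at least one server remains for the rest.
        if current and load + weights[letter] > target and servers_left > 1:
            groups.append(current)
            current, load = "", 0
        current += letter
        load += weights[letter]
    groups.append(current)
    return groups

# Made-up weights: S is very busy, X/Y/Z are rare.
example = {"a": 2, "b": 2, "c": 2, "s": 12, "x": 1, "y": 1, "z": 1}
print(group_letters(example, 3))  # ['abc', 's', 'xyz']
```

Because the split depends only on the weights, it can simply be recomputed when measured query volumes change.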
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: silver777-ga on 09 Oct 2004 07:26 PDT
 
Hi again Gomvents,

I memorised the list from when I was in high school. I'm unsure of
its origin, but I believe it to be the correct ordering of the
English alphabet by frequency of letter use. I just thought it might
help. That differs from what you probably need, though: it is not a
list of the letters that most commonly begin a word.

Beyond that, the rest is rocket science to me. So, at the simpler end
of things, what if you just use a word count? If there are, say, 3
times as many words starting with "C" as with "D", could you allocate
your search engines accordingly?
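Allocating servers by word count amounts to apportioning them in proportion to the counts. A minimal sketch using the largest-remainder (Hamilton) method, with made-up counts; a letter that receives several servers would still need its own sub-split (for example by second letter), and a letter that receives none would have to share a server:

```python
import math

def apportion_servers(word_counts, num_servers):
    """Split num_servers among letters in proportion to word_counts
    using the largest-remainder (Hamilton) method."""
    total = sum(word_counts.values())
    quotas = {k: num_servers * v / total for k, v in word_counts.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = num_servers - sum(alloc.values())
    # Hand the remaining servers to the largest fractional parts
    # (ties broken alphabetically).
    by_remainder = sorted(quotas, key=lambda k: (-(quotas[k] - alloc[k]), k))
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

print(apportion_servers({"c": 3000, "d": 1000}, 4))  # {'c': 3, 'd': 1}
```

With the made-up counts above, C has three times as many words as D and gets three times as many servers.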
Subject: Re: What English letter do most commonly searched-for keywords begin with?
From: curious_-ga on 14 Oct 2004 21:38 PDT
 
Why don't you just make a reasonable estimate of the distribution of
letters to start off, and simply rearrange your partitioning of
letters to servers as time goes on if your initial assumptions turn
out to be false?

From a design standpoint, it would make a lot more sense to design the
system flexibly in the first place rather than putting a lot of stock
into the notion that you will be able to obtain an accurate first-letter
distribution and that such a distribution will remain constant
over time (considering that a large percentage of web searches are for
Britney Spears, that biases B heavily... but what will be the name
of the next pop star? You get the point)...

If you insist on dividing the work up without any experimental
evidence, my guess is that you would get a more even distribution by
hashing the entire search term using a hash function that takes into
account more than just the first letter. Look into Zipf's law: Zipf
did groundbreaking work on word frequency, which you may find
interesting. However, I think a flexible design with the ability
to rearrange the data storage over time will be the most efficient
and accurate approach to solving the problem you describe.
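Hashing the whole term and keeping the layout flexible can be combined by hashing each normalized search term into a fixed number of buckets (many more than there are servers) and keeping a separate bucket-to-server table; rearranging storage later then means editing the table and moving only the affected buckets. This bucket-table layout and the bucket count are assumptions, not something described in the comment. MD5 is used only because it is stable across processes (Python's built-in `hash()` of a string is randomized per process), not for security:

```python
import hashlib

NUM_BUCKETS = 1024  # hypothetical; should be much larger than the server count

def bucket_for(term):
    """Map a whole search term (not just its first letter) to a stable bucket."""
    normalized = " ".join(term.lower().split())  # case and spacing don't matter
    digest = hashlib.md5(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def initial_bucket_map(num_servers):
    """Start by dealing buckets out round-robin across the servers."""
    return {bucket: bucket % num_servers for bucket in range(NUM_BUCKETS)}

def server_for(term, bucket_map):
    """Look up which server currently holds a term's bucket."""
    return bucket_map[bucket_for(term)]

bucket_map = initial_bucket_map(16)  # servers numbered 0..15
print(server_for("Britney Spears", bucket_map))

# Later, if that bucket turns out to be hot, move just that bucket,
# e.g. to a newly added (hypothetical) server 16:
hot = bucket_for("Britney Spears")
bucket_map[hot] = 16
```

Hashing spreads distinct terms evenly, but one very popular term still lands in a single bucket, so a hot bucket may need a server of its own.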
