Google Answers Logo
View Question
 
Q: Assistance with formulating a MySQL query for minimal processor overhead ( No Answer,   8 Comments )
Question  
Subject: Assistance with formulating a MySQL query for minimal processor overhead
Category: Computers
Asked by: theenglishgentleman-ga
List Price: $50.00
Posted: 08 Sep 2005 02:22 PDT
Expires: 08 Oct 2005 02:22 PDT
Question ID: 565547
Assume we have a database table that stores the results of a photo
rating website. People rate the photos of others on a 1 to 10 scale.

The table where we store the results has the following fields:

id (INT) e.g. 17426
viewer_id (INT) e.g. 1023
subject_id (INT) e.g. 4721
rating (TINYINT) e.g. 7

The query we would like is as follows - it's in three parts, but this
is just to make it easier to understand. The MySQL query should be a
single SQL statement:

1. For a given user X, we are only interested in the last 200 photos
that X has rated.

2. We now want to find the 100 other people, who have seen and rated
the most number of same people as X (i.e. people who've seen some, or
all of the same 200 people).

3. We want to sort those 100 people in descending order, with the
person who has seen the most out of those 200 people at the top.

In other words...

John has seen and rated 500 people on the system. We are only
interested in the last 200 people he has rated. Taking those 200
people, we find the 100 people who have seen and rated the most of
those 200 people. E.G.Another user Bill has viewed 163 of the same 200
people. Another user Calvin has viewed 127 of those 200 people.
Another user Todd has viewed 56 of those 200 people. We want to find
the other 97 people who have seen large numbers of those 200 people,
and put them in descending order of priority. Using our example Bill
is at the top of the list with 163/200, a guy by the name of Fred is
at the very bottom with 17/100.

The result should be the 100 user ID's sorted as described. It should
be a single SQL query that is the most efficient as possible. We
expect a high use of this query. You can assume we have choices in
terms of when we create the table, whether to create indexes or not,
and what type of table type we use (MyISAM,InnoDB etc.)

The answer we are seeking is...

1. What the MySQL query should be.
2. Any requirements for indexes or table type.
2. The actual number of queries per second we could achieve for a
given processor speed and RAM configuration (your machine), and a
table size of 10,000,000 ratings, spread evenly among 50,000 users.
3. The actual number of queries per second we could achieve for the
same processor speed and RAM configuration, and a table size of
100,000,000 ratings, spread evenly among 500,000 users.
4. Any general suggestions for optimizing this process.

We do expect someone the answer to include the results of a real
physical test using sample data, which you would generate.

Clarification of Question by theenglishgentleman-ga on 09 Sep 2005 04:48 PDT
Sorry, the ID column is auto increment which should cover the issue of
'last 200'. As for the simulation, generating some sample data and
running a test should not really be that much work.

Clarification of Question by theenglishgentleman-ga on 10 Sep 2005 15:36 PDT
I wanted to thank ccahd for his or her detailed answer, but I don't
believe MySQL supports the TOP command.

Clarification of Question by theenglishgentleman-ga on 13 Sep 2005 04:24 PDT
Thanks for your help ccahd. We are going to proceed with our own test
now. Many thanks!

Clarification of Question by theenglishgentleman-ga on 13 Sep 2005 09:33 PDT
Hi there ccahd. That sounds good. I understand from the TOS that
google may remove any 'personally identifiable' information. Without
identifying myself personally therefore, you may make contact with our
company using the info at millenniumphoto dot com.

Clarification of Question by theenglishgentleman-ga on 13 Sep 2005 09:34 PDT
incidentally. that should read 'using the information at the website
millenniumphoto dot com'
Answer  
There is no answer at this time.

Comments  
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: nelson-ga on 08 Sep 2005 03:55 PDT
 
You need a date column to determine the last 200.  You should not use
an ID column, if that was your plan.
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: feldersoft-ga on 08 Sep 2005 20:58 PDT
 
Wow only $50...and you want someone to do an actual simulation?  

Good luck with that.  I second the comment about the date column...you'll need one.
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: bozo99-ga on 09 Sep 2005 04:13 PDT
 
Isn't a queue of 200 INTs a beter way to track the last 200 items ?

I'm also unclear what is the value of this query - isn't it going to
return the people who've spent loads of time rating everybody ?
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: ccahd-ga on 10 Sep 2005 02:29 PDT
 
You can try this qury to get results required 

SELECT     TOP 100 viewer_id, COUNT(*) AS Expr1
FROM         photo_table
WHERE     (viewer_id IN (SELECT viewer_id
                            FROM photo_table WHERE viewer_id <> 1 AND
subject_id IN (SELECT TOP 200 subject_id                              
                         FROM photo_table WHERE viewer_id = 1         
                                               ORDER BY id DESC)))
GROUP BY viewer_id
ORDER BY COUNT(*) DESC

Also create following index :

create index idx_viewer_id on photo_table (viewer_id)
create index idx_subject_id on photo_table (subject_id)

note :

I believe that subject_id is photo id.
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: nelson-ga on 10 Sep 2005 08:55 PDT
 
I stand by my comment.  Autoicrement/identity columns should not be
used for temporal analysis.
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: ccahd-ga on 12 Sep 2005 01:32 PDT
 
in my sql you can use limit rather then top. 

SELECT  viewer_id, COUNT(*) AS Expr1
FROM         photo_table
WHERE     (viewer_id IN (SELECT viewer_id
                            FROM photo_table WHERE viewer_id <> 1 AND
subject_id IN (SELECT subject_id                              
                         FROM photo_table WHERE viewer_id = 1         
                                               ORDER BY id DESC limit 1,200)))
GROUP BY viewer_id
ORDER BY COUNT(*) DESC limit 1,100
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: ccahd-ga on 13 Sep 2005 07:22 PDT
 
if possible aware me about whatever solutions you are doing. this can
help me to add some value to my knowledge.

thank
Subject: Re: Assistance with formulating a MySQL query for minimal processor overhead
From: ccahd-ga on 14 Sep 2005 00:31 PDT
 
no email id in website.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy