I have a table with 20 dimensions (columns), up to 12,000,000 rows and
no indexes. The 20 columns contain things like date, department,
store, article, etc. The last column contains a count. Users query the
data ad hoc, so indexes don't help: it is not known in advance which
columns a user will include in the WHERE clause.
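For example (the literal filter values here are made up), one user
might run

SELECT SUM(total) FROM orig WHERE date = '2006-01-01' AND store = 17;

while the next filters on department and article instead, so no fixed
set of indexes covers the workload.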
Queries over this table are slow - as expected - so I want to build an
alternative table with one dimension fewer and the count aggregated.
Something like:
CREATE TABLE alt AS
SELECT a, b, c, d, e, f, g, h, i, j, k, l, SUM(total) AS total
FROM orig
GROUP BY a, b, c, d, e, f, g, h, i, j, k, l;
Now, whenever I would normally access the 'orig' table, I can use the
'alt' table instead, provided that the left-out column does not appear
anywhere in the WHERE clause. Because the 'alt' table has about 50%
fewer rows than the 'orig' table, this gives me roughly a 50%
performance gain.
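To illustrate with the column names from the statement above (the
filter value is made up): a query such as

SELECT a, b, SUM(total) FROM orig WHERE c = 17 GROUP BY a, b;

can be answered from the rollup instead, because summing the partial
sums in 'alt' gives the same result as summing the original rows:

SELECT a, b, SUM(total) FROM alt WHERE c = 17 GROUP BY a, b;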
The above concept works like a charm in Oracle8, but MySQL chokes on
that query when the number of rows in the 'orig' table exceeds
100,000 or so. I have increased MySQL's key buffer, sort buffer and
temporary table size to 200M each, but still I am left with a 10
minute query to reduce 2,600,000 rows in the 'orig' table to 1,500,000
rows in the 'alt' table.
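For reference, here are those settings as SET statements, assuming
key_buffer_size, sort_buffer_size and tmp_table_size are the variables
in question:

-- session-level equivalents of the my.cnf changes described above
SET SESSION sort_buffer_size = 200*1024*1024;
SET SESSION tmp_table_size = 200*1024*1024;
-- the key buffer is a server-wide setting
SET GLOBAL key_buffer_size = 200*1024*1024;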
I haven't tried the 12,000,000-row case, but ultimately this should
run in (far) under a minute.
If all else fails I can read the entire data set into memory, perform
the reduction myself (in the application), and write the result back
to the 'alt' table. But I'd rather not.
Any help is of course appreciated.
Clarification of Question by raymond99-ga on 03 Apr 2006 08:04 PDT
Alas, this is not possible. The 20 (or so) dimensions are not
attributes that can be normalized away; they are all independent
dimensions. Imagine, if you wish, a list of 12,000,000 persons where
for each person you have the name, date of birth, place of birth,
height, weight, eye color, education, employer, and a dozen other
*independent* attributes.