I have a table with 20 dimensions (columns), up to 12,000,000 rows and
no indexes. The 20 columns contain things like date, department,
store, article, etc. The last column contains a count. Users query the
data ad hoc, so indexes don't help: it is not known in advance which
columns a user will include in the WHERE clause.
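For example (the literal filter values here are made up), one user
might run

SELECT SUM(total) FROM orig WHERE date = '2006-01-01' AND store = 17;

while the next filters on department and article instead, so no fixed
set of indexes covers the workload.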
Queries over this table are slow - as expected - so I want to build an
alternative table with one dimension fewer and the count aggregated.
Something like:
CREATE TABLE alt AS
SELECT a, b, c, d, e, f, g, h, i, j, k, l, SUM(total) AS total
FROM orig
GROUP BY a, b, c, d, e, f, g, h, i, j, k, l;
Now, whenever I would normally access the 'orig' table, I can use the
'alt' table instead, provided that the left-out column does not appear
anywhere in the WHERE clause. Because the 'alt' table has about 50%
fewer rows than the 'orig' table, this gives me roughly a 50%
performance gain.
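To illustrate with the column names from the statement above (the
filter value is made up): a query such as

SELECT a, b, SUM(total) FROM orig WHERE c = 17 GROUP BY a, b;

can be answered from the rollup instead, because summing the partial
sums in 'alt' gives the same result as summing the original rows:

SELECT a, b, SUM(total) FROM alt WHERE c = 17 GROUP BY a, b;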
The above concept works like a charm in Oracle8, but MySQL chokes on
that query when the number of rows in the 'orig' table exceeds
100,000 or so. I have increased MySQL's key buffer, sort buffer and
temporary table size to 200M each, but still I am left with a 10
minute query to reduce 2,600,000 rows in the 'orig' table to 1,500,000
rows in the 'alt' table.
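For reference, here are those settings as SET statements, assuming
key_buffer_size, sort_buffer_size and tmp_table_size are the variables
in question:

-- session-level equivalents of the my.cnf changes described above
SET SESSION sort_buffer_size = 200*1024*1024;
SET SESSION tmp_table_size = 200*1024*1024;
-- the key buffer is a server-wide setting
SET GLOBAL key_buffer_size = 200*1024*1024;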
I haven't tried the 12,000,000-row case, but ultimately this should
run in (far) under a minute.
If all else fails I can read the entire data set into memory, perform
the reduction myself (in the application), and write the result back
to the 'alt' table. But I'd rather not.
Any help is of course appreciated.
Clarification of Question by raymond99-ga on 03 Apr 2006 08:04 PDT
Alas, this is not possible. The 20 (or so) dimensions are not
attributes that can be normalized away; they are all independent
dimensions. Imagine, if you wish, a list of 12,000,000 persons where
for each person you have the name, date of birth, place of birth,
height, weight, eye color, education, employer, and a dozen other
*independent* attributes.