Hello,
I have a MySql query which is stumping my abilitites to optimize it.
Given a set of rows marked "dirty", and a column containing
heirarchial data in LDAP DN format, I want to mark all "peers" of the
dirty rows for updating, which I'll do in another query. From all I
can tell, this query has proper indexes, and is not doing something
stupid like an accidntal cross join. So assistance in getting this to
run in reasonable time over a decent sized data set is worth money to
me.
Here is my query:
UPDATE derived d1, derived d2 SET
d1.runid='01964652a821d326741c67cbbe943684' WHERE
d2.axis_ReceiptDate_parent = 'year=2005' AND
d2.parent_dirty_ReceiptDate = 1 AND d2.type=1 AND d1.type=1 AND
d1.hash != '' AND d1.runid IS NULL AND d1.axis_ReceiptDate_parent =
d2.axis_ReceiptDate_parent AND d1.axis_SalesPersonGeography =
d2.axis_SalesPersonGeography AND d1.axis_Departments =
d2.axis_Departments
NOTE: my original version of this query used a subquery, but MySql
doesn't allow a select and update on the same table in the same query,
so I had to rewrite using this JOIN.
Here is the table definition:
+-----------------------------------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key |
Default | Extra |
+-----------------------------------+--------------+------+-----+-------------------+-------+
| alter_date | timestamp | YES | |
CURRENT_TIMESTAMP | |
| type | int(1) | | PRI | 0
| |
| hash | varchar(32) | YES | MUL | NULL
| |
| user_data_Subtotal | double(18,8) | YES | |
0.00000000 | |
| goal_Subtotal | double(18,8) | YES | |
0.00000000 | |
| avg_Subtotal | double(18,8) | YES | | NULL
| |
| user_data_TotalCogs | double(18,8) | YES | |
0.00000000 | |
| goal_TotalCogs | double(18,8) | YES | |
0.00000000 | |
| avg_TotalCogs | double(18,8) | YES | | NULL
| |
| user_data_TotalNumSold | double(18,8) | YES | |
0.00000000 | |
| goal_TotalNumSold | double(18,8) | YES | |
0.00000000 | |
| avg_TotalNumSold | double(18,8) | YES | | NULL
| |
| user_data_AvgTicket | double(18,8) | YES | |
0.00000000 | |
| goal_AvgTicket | double(18,8) | YES | |
0.00000000 | |
| avg_AvgTicket | double(18,8) | YES | | NULL
| |
| user_data_AvgTicket_sum | double(18,8) | YES | |
0.00000000 | |
| user_data_AvgTicket_count | double(18,8) | YES | |
0.00000000 | |
| user_data_AvgMargin | double(18,8) | YES | |
0.00000000 | |
| goal_AvgMargin | double(18,8) | YES | |
0.00000000 | |
| avg_AvgMargin | double(18,8) | YES | | NULL
| |
| user_data_AvgMargin_sum | double(18,8) | YES | |
0.00000000 | |
| user_data_AvgMargin_count | double(18,8) | YES | |
0.00000000 | |
| axis_ReceiptDate | varchar(200) | | PRI |
| |
| axis_ReceiptDate_parent | varchar(200) | YES | MUL | NULL
| |
| axis_ReceiptDate_sort | varchar(200) | YES | | NULL
| |
| parent_dirty_ReceiptDate | int(1) | YES | | NULL
| |
| axis_SalesPersonGeography | varchar(200) | | PRI |
| |
| axis_SalesPersonGeography_parent | varchar(200) | YES | MUL | NULL
| |
| axis_SalesPersonGeography_sort | varchar(200) | YES | | NULL
| |
| parent_dirty_SalesPersonGeography | int(1) | YES | | NULL
| |
| axis_Departments | varchar(200) | | PRI |
| |
| axis_Departments_parent | varchar(200) | YES | MUL | NULL
| |
| axis_Departments_sort | varchar(200) | YES | | NULL
| |
| parent_dirty_Departments | int(1) | YES | | NULL
| |
| runid | varchar(32) | YES | MUL | NULL
| |
| axis_receiptdate_depth | int(2) | YES | | NULL
| |
| axis_departments_depth | int(2) | YES | | NULL
| |
| axis_salespersongeography_depth | int(2) | YES | | NULL
| |
+-----------------------------------+--------------+------+-----+-------------------+-------+
Here are my indexes:
mysql> show indexes from derived;
+---------+------------+--------------------------------+--------------+-----------------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index
| Column_name | Collation | Cardinality |
Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+--------------------------------+--------------+-----------------------------------+-----------+-------------+----------+--------+------+------------+---------+
| derived | 0 | PRIMARY | 1
| type | A | 1 |
NULL | NULL | | BTREE | |
| derived | 0 | PRIMARY | 2
| axis_ReceiptDate | A | 808 |
NULL | NULL | | BTREE | |
| derived | 0 | PRIMARY | 3
| axis_SalesPersonGeography | A | 10277 |
NULL | NULL | | BTREE | |
| derived | 0 | PRIMARY | 4
| axis_Departments | A | 71945 |
NULL | NULL | | BTREE | |
| derived | 1 | index_d_ReceiptDate | 1
| axis_ReceiptDate | A | 808 |
NULL | NULL | | BTREE | |
| derived | 1 | index_d_ReceiptDate | 2
| parent_dirty_ReceiptDate | A | 808 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_ReceiptDate_p | 1
| axis_ReceiptDate_parent | A | 31 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_SalesPersonGeography | 1
| axis_SalesPersonGeography | A | 59 |
NULL | NULL | | BTREE | |
| derived | 1 | index_d_SalesPersonGeography | 2
| parent_dirty_SalesPersonGeography | A | 59 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_SalesPersonGeography_p | 1
| axis_SalesPersonGeography_parent | A | 3 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_Departments | 1
| axis_Departments | A | 19 |
NULL | NULL | | BTREE | |
| derived | 1 | index_d_Departments | 2
| parent_dirty_Departments | A | 19 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_Departments_p | 1
| axis_Departments_parent | A | 2 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_hash | 1
| hash | A | 71945 |
NULL | NULL | YES | BTREE | |
| derived | 1 | index_d_runid | 1
| runid | A | 3 |
NULL | NULL | YES | BTREE | |
+---------+------------+--------------------------------+--------------+-----------------------------------+-----------+-------------+----------+--------+------+------------+---------+
MySql won't allow an "explain" on an update, so here's an explain on a
similar select:
mysql> explain select d1.runid from derived d1, derived d2 WHERE
d2.axis_ReceiptDate_parent = 'year=2005' AND
d2.parent_dirty_ReceiptDate = 1 AND d2.type=1 AND d1.type=1 AND
d1.hash != '' AND d1.runid IS NULL AND d1.axis_ReceiptDate_parent =
d2.axis_ReceiptDate_parent AND d1.axis_SalesPersonGeography =
d2.axis_SalesPersonGeography AND d1.axis_Departments =
d2.axis_Departments;
+----+-------------+-------+------+-----------------------------------------------------------------------------------------------------------+------------------------------+---------+--------------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys
| key | key_len | ref
| rows | Extra |
+----+-------------+-------+------+-----------------------------------------------------------------------------------------------------------+------------------------------+---------+--------------------------------------------------+------+-------------+
| 1 | SIMPLE | d1 | ref |
PRIMARY,index_d_ReceiptDate_p,index_d_SalesPersonGeography,index_d_Departments,index_d_hash,index_d_runid
| index_d_ReceiptDate_p | 201 | const
| 6117 | Using where |
| 1 | SIMPLE | d2 | ref |
PRIMARY,index_d_ReceiptDate_p,index_d_SalesPersonGeography,index_d_Departments
| index_d_SalesPersonGeography | 200 |
warehouse_deptsales.d1.axis_SalesPersonGeography | 1219 | Using where
|
+----+-------------+-------+------+-----------------------------------------------------------------------------------------------------------+------------------------------+---------+--------------------------------------------------+------+-------------+
The derived table has roughly 72,000 rows:
mysql> select count(*) from derived;
+----------+
| count(*) |
+----------+
| 71945 |
+----------+
and the update query above generally takes between 50 and 250 seconds
to complete. Given the indexes, it seems like it should take less
time, like an order of magnitude. |