We have a table in a web app we created that has grown pretty big:
3.2 million rows with 50 columns, taking up about 4.5 GB. It
is actually performing quite well because we have indexed it pretty
efficiently. However, we are running into problems whenever we need to
change its structure.
Every time we want to add a new index or a new column, it takes about
an hour to complete the operation. During this time the table is
locked, and all incoming queries have to wait up to 60 minutes to be
serviced. This is not acceptable since we have very active servers and
no allowed period for downtime. I want to know how to avoid this
problem.
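To be concrete, the kind of statement I mean is something like this
(the table and column names are just made up for illustration):

    ALTER TABLE contacts ADD COLUMN last_verified DATETIME;
    ALTER TABLE contacts ADD INDEX idx_last_verified (last_verified);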
Here's what I have come up with. Can you comment on these or suggest better ones?
A) Split the table up into smaller tables horizontally (split rows
across many tables) using a MERGE table. Then we can alter each table
one at a time, which would take less time altogether (I'm assuming)
and not lock all data at once.
B) Split the table vertically (split columns across many tables) by
pulling out portions of the table which are logically different. Same
theory as above.
C) Use some sort of replication to copy the table, perform the change
on the non-live copy, while logging all new changes to the live copy.
When the non-live copy is done with the alteration, use the log to
bring it up to date, then switch non-live to live, and scrap the old
live copy (a rough sketch of the final swap follows this list). This
sounds good, but complicated - are there any tools which would make
this easy?
D) I'm spent...any suggestions?
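For (C), here is a rough sketch of how I picture doing the copy and
swap by hand in MySQL, using a small made-up table so the statements
are complete. The hard part - re-applying the writes logged while the
copy ran - is only hinted at in the comments:

    -- Stand-in for the real 50-column table (names made up).
    CREATE TABLE contacts (
        id    INT NOT NULL PRIMARY KEY,
        name  VARCHAR(100),
        phone VARCHAR(20)
    );

    -- 1. Build a copy that already has the new column and index.
    CREATE TABLE contacts_new (
        id    INT NOT NULL PRIMARY KEY,
        name  VARCHAR(100),
        phone VARCHAR(20),
        email VARCHAR(100),           -- the structural change
        INDEX idx_name (name)         -- the new index
    );

    -- 2. Copy the existing rows while the live table keeps serving traffic.
    INSERT INTO contacts_new (id, name, phone)
        SELECT id, name, phone FROM contacts;
    -- ...writes made to `contacts` during the copy still have to be
    -- re-applied to contacts_new from whatever log we keep...

    -- 3. Swap the names in one atomic step, then drop the old copy.
    RENAME TABLE contacts TO contacts_old, contacts_new TO contacts;
    DROP TABLE contacts_old;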
Again, we cannot spend more than one minute of downtime when updating
the table, as the server is in constant use.
Using MySQL 3.23 on Linux.
Thanks!
Request for Question Clarification by mathtalk-ga on 08 Dec 2004 12:52 PST
Hi, shane43-ga:
I'm wondering what operations have to be performed routinely to update
the table, besides the exceptional cases in which you add a new index
or a new column.
If the table were "read only" outside of some defined weekly schedule
in which updates are applied to the table, then "replicating" the
table shouldn't pose a big headache. You just take a dump of the
table anytime after it has been updated for the week and use that to
populate a second copy of the database. Make your changes there, and
then repoint your web-app to the second copy of the database.
In fact if the updates to the database were really as simple as
suggested there, then the weekly (or overnight) update could be done
to a parallel copy of the database.
So I'm guessing that your application is really much more
"transactional" and the table in question is actually being constantly
updated by the Web traffic. But if this is so, I'd ask how you
currently backup the database in case of a crash or other unplanned
corruption of data.
From the sound of things splitting the table up is probably a good
thing in and of itself. 50 columns and 3.2 million rows, plus the
fact that you only mention one table, strongly suggests that the
design is not "normalized". While this can be a practical approach
for some performance goals, it often creates risks of inconsistency in
the underlying data. This is all very general, of course, but bears
mentioning in light of your proposal to possibly split the table
"vertically".
As far as splitting a table "horizontally" goes, this is managed
somewhat slickly in commercial database managers like MS SQL Server
with what are called partitioned tables. In effect one subdivides a
table into identically structured member tables, each holding some
well-defined subset of the key range.
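To make the idea concrete, here is roughly what it looks like in SQL
Server 2000, where it takes the form of a "partitioned view" over
member tables whose CHECK constraints carve up the key range (the
names below are invented for illustration):

    -- Each member table holds one key range, enforced by a CHECK constraint.
    CREATE TABLE Contacts_Low (
        ContactID INT NOT NULL PRIMARY KEY
            CHECK (ContactID BETWEEN 1 AND 1999999),
        Name      VARCHAR(100)
    );
    CREATE TABLE Contacts_High (
        ContactID INT NOT NULL PRIMARY KEY
            CHECK (ContactID BETWEEN 2000000 AND 3999999),
        Name      VARCHAR(100)
    );
    GO
    -- The view presents the member tables as a single logical table.
    CREATE VIEW Contacts AS
        SELECT ContactID, Name FROM Contacts_Low
        UNION ALL
        SELECT ContactID, Name FROM Contacts_High;
    GO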
Of course MS SQL Server and Linux won't mix, but the general point is
that if performance and uptime were really the most important design
goal, I'd rethink the dependence on MySQL.
I'm looking forward to hearing a bit more about your environment & goals.
regards, mathtalk-ga
Clarification of Question by shane43-ga on 09 Dec 2004 12:16 PST
Hi Mathtalk -
>So I'm guessing that your application is really much more
"transactional"
You're right - The table is not read-only, there are many requests
coming in constantly that are inserting records or updating rows.
>But if this is so, I'd ask how you currently backup the database in
case of a crash or other unplanned corruption of data.
We just do a mysql dump of the structure and data every so often and
save it to a file.
>As far as splitting a table "horizontally" goes, this is managed
somewhat slickly in commercial database managers like MS SQL Server
Well, MySQL is what we are on now, and there this is done with MERGE
tables, but I just learned that if one of the underlying tables
changes structure so that it no longer matches the MERGE table's
definition, errors arise. So it appears this is not a good solution.
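For reference, the kind of setup I was looking at is roughly this
(names made up); the catch is that the MERGE definition only works
while every underlying table is identical, so ALTERing them one at a
time breaks it:

    -- Two identical MyISAM tables, each holding part of the rows.
    CREATE TABLE contacts_1 (
        id   INT NOT NULL,
        name VARCHAR(100),
        KEY (id)
    ) TYPE=MyISAM;

    CREATE TABLE contacts_2 (
        id   INT NOT NULL,
        name VARCHAR(100),
        KEY (id)
    ) TYPE=MyISAM;

    -- The MERGE table exposes them as one; it is only usable while
    -- the underlying tables all match this definition exactly.
    CREATE TABLE contacts_all (
        id   INT NOT NULL,
        name VARCHAR(100),
        KEY (id)
    ) TYPE=MERGE UNION=(contacts_1, contacts_2);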
>if performance and uptime were really the most important design
goal, I'd rethink the dependence on MySQL.
Really? I thought MySQL was known to serve high traffic databases pretty well.
>50 columns and 3.2 million rows, plus the fact that you only mention
one table, strongly suggests that the design is not
"normalized"....possibly split the table "vertically".
Here's the problem I have with that - the main "beef" of the table is
5-10 fields, and then we have another 40 which are flags, timestamps,
and such, which are set by using the beef data in a specific
application. For example, say we have a contact as the beef (name,
address, phone, email). One application of this data is to call the
contact and confirm their info. So for this application we use
another 5 fields like (time called, who called, who answered the
phone, is the info correct...). So it's nice to be able to sort and
query the data of confirmed contacts by filtering on those flags in
the same table. However, I can see how this can be bad because not all
of these contacts will ever be confirmed. So should I store this
confirmation data in a new table? I guess that makes sense, it's just
convenient to have it the current way.
Anyway, there are about 5 of these 'applications' that we perform on
the beef data. However, some of these applications will always be
done, so it's almost necessary to have them stored in the same table.
So should we just create 5 more tables and then link them with an ID
if there is a 1-1 relationship between the beef data and the
application results?
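In other words, something roughly like this (names made up), with the
confirmation fields pulled out into their own table keyed by the
contact's id:

    -- The "beef": one row per contact.
    CREATE TABLE contacts (
        id      INT NOT NULL PRIMARY KEY,
        name    VARCHAR(100),
        address VARCHAR(200),
        phone   VARCHAR(20),
        email   VARCHAR(100)
    );

    -- One of the ~5 "applications": confirmation data, 1-1 with a
    -- contact, and simply absent for contacts never confirmed.
    CREATE TABLE contact_confirmations (
        contact_id   INT NOT NULL PRIMARY KEY,  -- same value as contacts.id
        time_called  DATETIME,
        who_called   VARCHAR(50),
        who_answered VARCHAR(50),
        info_correct TINYINT
    );

    -- Confirmed contacts can still be queried by joining on the id.
    SELECT c.name, cc.time_called
    FROM contacts c, contact_confirmations cc
    WHERE cc.contact_id = c.id
      AND cc.info_correct = 1;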
Thanks!
Request for Question Clarification by mathtalk-ga on 10 Dec 2004 08:58 PST
Does the table have a primary key, i.e. a field or combination of
fields that uniquely identifies a row in the table? Is this key a
"synthetic" one, e.g. an arbitrary value created just for the sake of
having a primary key, or is it a "natural" one that reflects a
"business rule" of the application?
regards, mathtalk-ga
Clarification of Question by shane43-ga on 10 Dec 2004 12:02 PST
It has a synthetic primary key.