We have a table in a web app we created that has grown pretty big:
3.2 million rows with 50 columns, taking up about 4.5 GB. It
is actually performing quite well because we have indexed it pretty
efficiently. However, we are running into problems whenever we need to
change its structure.
Every time we want to add a new index or a new column, it takes about
an hour to complete the operation. During this time the table is
locked, and all incoming queries have to wait up to 60 minutes to be
serviced. This is not acceptable since we have very active servers and
no allowed period for downtime. I want to know how to avoid this
problem.
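To be concrete, the kind of statement I mean is something like this
(the table and column names are just made up for illustration):

    ALTER TABLE contacts ADD COLUMN last_verified DATETIME;
    ALTER TABLE contacts ADD INDEX idx_last_verified (last_verified);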
Here's what I have come up with. Can you comment on these or suggest better ones?
A) Split the table up into smaller tables horizontally (split rows
across many tables) using a MERGE table. Then we can alter each table
one at a time, which would take less time altogether (I'm assuming)
and not lock all data at once.
B) Split the table vertically (split columns across many tables) by
pulling out portions of the table which are logically different. Same
theory as above.
C) Use some sort of replication to copy the table, perform the change
on the non-live copy, while logging all new changes to the live copy.
When the non-live copy is done with the alteration, use the log to
bring it up to date, then switch non-live to live, and scrap the old
live copy (a rough sketch of the final swap follows this list). This
sounds good, but complicated - are there any tools which would make
this easy?
D) I'm spent...any suggestions?
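For (C), here is a rough sketch of how I picture doing the copy and
swap by hand in MySQL, using a small made-up table so the statements
are complete. The hard part - re-applying the writes logged while the
copy ran - is only hinted at in the comments:

    -- Stand-in for the real 50-column table (names made up).
    CREATE TABLE contacts (
        id    INT NOT NULL PRIMARY KEY,
        name  VARCHAR(100),
        phone VARCHAR(20)
    );

    -- 1. Build a copy that already has the new column and index.
    CREATE TABLE contacts_new (
        id    INT NOT NULL PRIMARY KEY,
        name  VARCHAR(100),
        phone VARCHAR(20),
        email VARCHAR(100),           -- the structural change
        INDEX idx_name (name)         -- the new index
    );

    -- 2. Copy the existing rows while the live table keeps serving traffic.
    INSERT INTO contacts_new (id, name, phone)
        SELECT id, name, phone FROM contacts;
    -- ...writes made to `contacts` during the copy still have to be
    -- re-applied to contacts_new from whatever log we keep...

    -- 3. Swap the names in one atomic step, then drop the old copy.
    RENAME TABLE contacts TO contacts_old, contacts_new TO contacts;
    DROP TABLE contacts_old;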
Again, we cannot spend more than one minute of downtime when updating
the table, as the server is in constant use.
Using MySQL 3.23 on Linux.
Thanks!
Request for Question Clarification by mathtalk-ga on 08 Dec 2004 12:52 PST
Hi, shane43-ga:
I'm wondering what operations have to be performed routinely to update
the table, besides the exceptional cases in which you add a new index
or a new column.
If the table were "read only" outside of some defined weekly schedule
in which updates are applied to the table, then "replicating" the
table shouldn't pose a big headache. You just take a dump of the
table anytime after it has been updated for the week and use that to
populate a second copy of the database. Make your changes there, and
then repoint your web-app to the second copy of the database.
In fact if the updates to the database were really as simple as
suggested there, then the weekly (or overnight) update could be done
to a parallel copy of the database.
So I'm guessing that your application is really much more
"transactional" and the table in question is actually being constantly
updated by the Web traffic. But if this is so, I'd ask how you
currently backup the database in case of a crash or other unplanned
corruption of data.
From the sound of things splitting the table up is probably a good
thing in and of itself. 50 columns and 3.2 million rows, plus the
fact that you only mention one table, strongly suggests that the
design is not "normalized". While this can be a practical approach
for some performance goals, it often creates risks of inconsistency in
the underlying data. This is all very general, of course, but bears
mentioning in light of your proposal to possibly split the table
"vertically".
As far as splitting a table "horizontally" goes, this is managed
somewhat slickly in commercial database managers like MS SQL Server
with what are called partitioned tables. In effect one subdivides a
table into identically structured member tables, each holding some
well-defined subset of the key range.
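To make the idea concrete, here is roughly what it looks like in SQL
Server 2000, where it takes the form of a "partitioned view" over
member tables whose CHECK constraints carve up the key range (the
names below are invented for illustration):

    -- Each member table holds one key range, enforced by a CHECK constraint.
    CREATE TABLE Contacts_Low (
        ContactID INT NOT NULL PRIMARY KEY
            CHECK (ContactID BETWEEN 1 AND 1999999),
        Name      VARCHAR(100)
    );
    CREATE TABLE Contacts_High (
        ContactID INT NOT NULL PRIMARY KEY
            CHECK (ContactID BETWEEN 2000000 AND 3999999),
        Name      VARCHAR(100)
    );
    GO
    -- The view presents the member tables as a single logical table.
    CREATE VIEW Contacts AS
        SELECT ContactID, Name FROM Contacts_Low
        UNION ALL
        SELECT ContactID, Name FROM Contacts_High;
    GO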
Of course MS SQL Server and Linux won't mix, but the general point is
that if performance and uptime were really the most important design
goal, I'd rethink the dependence on MySQL.
I'm looking forward to hearing a bit more about your environment & goals.
regards, mathtalk-ga
Clarification of Question by shane43-ga on 09 Dec 2004 12:16 PST
Hi Mathtalk -
>So I'm guessing that your application is really much more
"transactional"
You're right - The table is not read-only, there are many requests
coming in constantly that are inserting records or updating rows.
>But if this is so, I'd ask how you currently backup the database in
case of a crash or other unplanned corruption of data.
We just do a mysql dump of the structure and data every so often and
save it to a file.
>As far as splitting a table "horizontally" goes, this is managed
somewhat slickly in commercial database managers like MS SQL Server
Well, MySQL is what we are on now, and there this is done with MERGE
tables, but I just learned that if one of the underlying tables
changes structure so that it no longer matches the MERGE table's
definition, errors arise. So it appears this is not a good solution.
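For reference, the kind of setup I was looking at is roughly this
(names made up); the catch is that the MERGE definition only works
while every underlying table is identical, so ALTERing them one at a
time breaks it:

    -- Two identical MyISAM tables, each holding part of the rows.
    CREATE TABLE contacts_1 (
        id   INT NOT NULL,
        name VARCHAR(100),
        KEY (id)
    ) TYPE=MyISAM;

    CREATE TABLE contacts_2 (
        id   INT NOT NULL,
        name VARCHAR(100),
        KEY (id)
    ) TYPE=MyISAM;

    -- The MERGE table exposes them as one; it is only usable while
    -- the underlying tables all match this definition exactly.
    CREATE TABLE contacts_all (
        id   INT NOT NULL,
        name VARCHAR(100),
        KEY (id)
    ) TYPE=MERGE UNION=(contacts_1, contacts_2);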
>if performance and uptime were really the most important design
goal, I'd rethink the dependence on MySQL.
Really? I thought MySQL was known to serve high traffic databases pretty well.
>50 columns and 3.2 million rows, plus the fact that you only mention
one table, strongly suggests that the design is not
"normalized"....possibly split the table "vertically".
Here's the problem I have with that - the main "beef" of the table is
5-10 fields, and then we have another 40 which are flags, timestamps,
and such, which are set by using the beef data in a specific
application. For example, say we have a contact as the beef (name,
address, phone, email). One application of this data is to call the
contact and confirm their info. So for this application we use
another 5 fields like (time called, who called, who answered the
phone, is the info correct...). So it's nice to be able to sort and
query the data of confirmed contacts by filtering on those flags in
the same table. However, I can see how this can be bad because not all
of these contacts will ever be confirmed. So should I store this
confirmation data in a new table? I guess that makes sense, it's just
convenient to have it the current way.
Anyway, there are about 5 of these 'applications' that we perform on
the beef data. However, some of these applications will always be
done, so it's almost necessary to have them stored in the same table.
So should we just create 5 more tables and then link them with an ID
if there is a 1-1 relationship between the beef data and the
application results?
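In other words, something roughly like this (names made up), with the
confirmation fields pulled out into their own table keyed by the
contact's id:

    -- The "beef": one row per contact.
    CREATE TABLE contacts (
        id      INT NOT NULL PRIMARY KEY,
        name    VARCHAR(100),
        address VARCHAR(200),
        phone   VARCHAR(20),
        email   VARCHAR(100)
    );

    -- One of the ~5 "applications": confirmation data, 1-1 with a
    -- contact, and simply absent for contacts never confirmed.
    CREATE TABLE contact_confirmations (
        contact_id   INT NOT NULL PRIMARY KEY,  -- same value as contacts.id
        time_called  DATETIME,
        who_called   VARCHAR(50),
        who_answered VARCHAR(50),
        info_correct TINYINT
    );

    -- Confirmed contacts can still be queried by joining on the id.
    SELECT c.name, cc.time_called
    FROM contacts c, contact_confirmations cc
    WHERE cc.contact_id = c.id
      AND cc.info_correct = 1;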
Thanks!
Request for Question Clarification by mathtalk-ga on 10 Dec 2004 08:58 PST
Does the table have a primary key, i.e. a field or combination of
fields that uniquely identifies a row in the table? Is this key a
"synthetic" one, e.g. an arbitrary value created just for the sake of
having a primary key, or is it a "natural" one that reflects a
"business rule" of the application?
regards, mathtalk-ga
Clarification of Question by shane43-ga on 10 Dec 2004 12:02 PST
It has a synthetic primary key.