Subject: Imbalanced M:N relations in PostgreSQL
Category: Computers > Programming
Asked by: ph317-ga
List Price: $75.00
Posted: 17 Aug 2002 15:45 PDT
Expires: 16 Sep 2002 15:45 PDT
Question ID: 55687
In the case of a classic m:n relation between two tables, which would traditionally be done like so:

    create table users (
        userid integer primary key,
        .....
    );
    create table orgs (
        orgid integer primary key,
        .....
    );
    create table usr_org (
        userid integer references users,
        orgid integer references orgs,
        primary key (userid, orgid)
    );

Suppose the relation is "imbalanced": while each org may relate to 100+ users, each user generally relates to only 1-3 orgs. Is it "better" and/or more efficient in PostgreSQL to use an array on the "users" side, like so:

    create table users (
        id integer primary key,
        orgs integer[],
        .....
    );
    create table orgs (
        id integer primary key,
        .....
    );

Or are such relations best left with a separate m:n mapping table, as in the classic example at the top?

As far as I can tell, one big difference is that I can't enforce referential integrity with the array relation ("orgs integer[] references orgs" bombs out), but that's not much of a price to pay, all things considered. The upside I can see from here is that it would reduce the complexity of some of my application code and my SQL statements - including turning some multi-statement update/insert transactions into single-statement transactions. The input I'm looking for here is mainly about the performance impact of this change, but any other insight is welcome.
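[Editor's note: for concreteness, here is a sketch of how the same lookup - "which orgs does user 42 belong to?" - would read under each design. The table and column names follow the schemas in the question; the user id 42 is a made-up example value.]

    -- Classic mapping-table design: a join through usr_org.
    select o.*
    from orgs o
    join usr_org uo on uo.orgid = o.orgid
    where uo.userid = 42;

    -- Array design: match against the elements of users.orgs.
    select o.*
    from orgs o
    join users u on o.id = any (u.orgs)
    where u.id = 42;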
Subject: Re: Imbalanced M:N relations in PostgreSQL
Answered By: answerguru-ga on 17 Aug 2002 16:21 PDT
Hello ph317-ga,

Thanks for your interesting question... it's always a bit of a trade-off when dealing with these types of optimization techniques. I have found from experience that the "classic" method, as you call it, is the most effective way of actually storing the data. There are a few reasons for this:

1. The relational database model was built to support exactly the kind of relation you are designing in this example. By creating a table for the relation, you avoid potential headaches down the line when you may want to manipulate the data differently than was originally intended.

2. As a rule of thumb, you should NEVER stray from the conventional representation just to make application integration easier. Not only does this lead to the same problem as in #1, but you are also hampering the internal optimization algorithms used to speed up each type of operation in the first place.

3. Although your data currently has only 1-3 orgs per user, are you certain that the patterns currently exhibited in the data will persist over time? When you are designing any database implementation, assumptions about future data will restrict how the data can be manipulated later. I know it seems this won't change in your case, but it is a general point worth keeping in mind. Remember that it only takes one exceptional user who belongs to a huge number of organizations to throw the array solution out of its optimal state.

4. Statement execution in a higher-level programming language (such as Java or C++) is generally quite fast. I would give you an order of magnitude on this, but since I am speaking in general terms, the variation on that magnitude would be quite large.
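[Editor's note: to illustrate the integrity argument in #1, here is a sketch of the constraints the mapping table makes possible. The ON DELETE CASCADE behavior shown is one plausible choice, not something prescribed by the original question.]

    -- Classic design with explicit foreign keys: the database itself
    -- guarantees that every mapping row points at an existing user
    -- and org, and (with CASCADE) removes mappings automatically
    -- when either side is deleted.
    create table usr_org (
        userid integer not null references users on delete cascade,
        orgid  integer not null references orgs  on delete cascade,
        primary key (userid, orgid)
    );

    -- With the array design, by contrast, nothing stops users.orgs
    -- from containing an orgid that no longer exists in orgs.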
The implication of this fact is that several transaction statements that follow the intended database conventions in the host program will tend to operate much more efficiently than a single transaction statement that results in an inefficient query to the database. When source code is converted to machine code it is optimized by the compiler, but the same cannot always be said for database queries.

I think the reason you are leaning toward this trade-off in the first place is that it saves you having to code all of these "multi-statement" queries and transactions, but ultimately the decision you need to make is how critical speed is for your database-integrated application. If you don't mind sacrificing some speed in order to ease the coding process, then your array solution is "legal" in database terms and it may make sense for you to use it. However, your statement that it lacks referential integrity worries me personally, and after analyzing the array design I don't see a way to achieve this integrity without going back to the classic design.

The performance of the array design is better IF (and these are three big IFs):

1. Data patterns don't change
2. The way the data is being used doesn't change
3. Referential integrity is not considered important now or in the future

From my viewpoint, I've found that the habits acquired when working with databases often have a serious impact on the performance and maintainability of both the database and the application. The best database schemas (designs) are those that remain flexible to changes in actual data patterns as well as in the ways the data is used and manipulated.

I know that was a lot to swallow, so if you have trouble understanding any of the above information, please feel free to post a clarification.

Cheers!
answerguru-ga
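[Editor's note: one middle ground worth mentioning, as a sketch rather than part of the original answer: the classic mapping table does not force multi-row handling on the application at read time. PostgreSQL's array_agg aggregate (added in version 8.4, well after this exchange) can present each user's orgs as a single array value while the underlying storage keeps full referential integrity.]

    -- Keep the classic usr_org table for storage and integrity,
    -- but collapse each user's orgs into one array value on read.
    select u.userid, array_agg(uo.orgid) as orgs
    from users u
    left join usr_org uo on uo.userid = u.userid
    group by u.userid;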
ph317-ga rated this answer:
The researcher is obviously knowledgeable about database design.