Google Answers Logo
View Question
 
Q: Imbalanced M:N relations in PostgreSQL ( Answered 5 out of 5 stars,   0 Comments )
Question  
Subject: Imbalanced M:N relations in PostgreSQL
Category: Computers > Programming
Asked by: ph317-ga
List Price: $75.00
Posted: 17 Aug 2002 15:45 PDT
Expires: 16 Sep 2002 15:45 PDT
Question ID: 55687
In the case of a classic m:n relation between two tables, which would
traditionally be done as so:

create table users ( userid integer primary key, ..... );
create table orgs ( orgid integer primary key, ..... );
create table usr_org ( userid integer references users, orgid integer
references orgs, primary key (userid,orgid) );

... supposing the relation is "imbalanced", meaning that while each
org may relate to 100+ users, each user generally relates to only 1-3
orgs, is it "better" and/or more efficient in PostgreSQL to use an
array in the "user" side, as so:

create table users ( id integer primary key, orgs integer[], ..... );
create table orgs ( id integer primary key, ..... );

.. or are such relations best left with a seperate m:n mapping table
as in the classic example at the top?

As far as I can tell, one big difference is I can't force referential
integrity with the array relation ("orgs integer[] references orgs"
bombs out), but it's not much price to pay in light of all things. 
The upside I can see from here is that it would reduce the complexity
of some of my application code and my SQL statements - including
turning some multi-statement update/insert transactions into single
statement transactions.  I guess the input I'm looking for here is
more along the lines of the performance impact of this change, but any
other insight is welcome.
Answer  
Subject: Re: Imbalanced M:N relations in PostgreSQL
Answered By: answerguru-ga on 17 Aug 2002 16:21 PDT
Rated:5 out of 5 stars
 
Hello ph317-ga,

Thanks for your interesting question...its always a bit of a trade-off
situation when dealing with these types of optimization techniques. I
have found from experience that the "classic" method as you call it is
the most effective method of actually storing the data. There are a
couple reasons for this:

1. The relational database model was built to support the types of
relations that you are attempting to design in this example. By
creating a table for the relation you are preventing yourself from
potential headaches down the line when you may want to manipulate the
data differently than was originally intended.

2. As a rule of thumb, you should NEVER model such that it strays from
the conventional representation in order to make application
integration easier. Not only are you leading to the same problem as in
#1, but you are also hampering the internal optimization algorithms
used to speed up each type of operation in the first place.

3. Although your data currently has only 1-3 orgs per user, are you
certain that the patterns currently exhibited in the data will persist
with time? When you are designing any database implementation, making
assumptions about the data in the future is going to be restrictive on
its potential manipulability in the future. I know it seems that this
won't change in your case but I'm just stating a general fact that is
worth keeping in mind. Remember that it only takes one exceptional
user that belongs to a huge amount of organizations to throw this
whole model out of optimal state using your array solution.

4. Statement execution in a higher-level programming language (such as
Java/C++ or others) is generally quite fast. I would give you an order
of magnitude on this but i am speaking in general terms so the
variation on this magnitude would be quite large. The implication of
this fact is that several transaction statements that follow the
intended database conventions in the host program will tend to operate
a lot more efficiently than a single transaction statement which
result in an inefficient query to the database. When source code is
converted to machine code, it is optimized by the compiler but the
same cannot always be said for database queries.

I think the reason you are leaning towards this tradeoff in the first
place is that it saves you having to code all of these
"multi-statement" queries and transactions, but ultimately the
decision that you need to make is how critical speed is on your
database-integrated application. If you don't mind making a sacrifice
in speed in order to ease the coding process then your array solution
is "legal" in database terms and it may make sense for you to use it.
However, your statement that it doesn't have referetial integrity
worries me personally and after analyzing this array design I don't
see a way to acheive this integrity without going back to the classic
design.

The performance of the array design is better IF (and these are two
big IFs):
1. Data patterns don't change
2. The way the data is being used doesn't change
3. Referential integrity is not considered important now or in the
future

From my viewpoint, I've found that the habits that are acquired when
working with databases often have a serious impact on the performance
and maintainability of database and the application. The best database
schemas (designs) are those that are flexible to change in actual data
patterns as well as the ways that the data is used and manipulated.

I know that was a lot to swallow, so if you have trouble understanding
any of the above information please feel free to post a clarification.

Cheers!

answerguru-ga
ph317-ga rated this answer:5 out of 5 stars
The researcher is obviously knowledge about database design.

Comments  
There are no comments at this time.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy