Google Answers Logo
View Question
 
Q: how do I limit left joins in mysql? ( No Answer,   4 Comments )
Question  
Subject: how do I limit left joins in mysql?
Category: Computers > Programming
Asked by: hish-ga
List Price: $50.00
Posted: 01 Nov 2004 04:50 PST
Expires: 15 Nov 2004 01:54 PST
Question ID: 422861
I try to limit left joins or subselects on mysql 4.1.

Example:
table t1
  a1  |   b1   
--------------
  1   |   11  
  2   |   21  
  3   |   31  
  4   |   41  

table t2
  a2  |   b2  
--------------
  2   |   51  
  2   |   61  
  2   |   71  
  4   |   81  

table t3
  a3  |   b3  
--------------
  1   |   151  
  1   |   161  
  3   |   171  
  3   |   181  

when I want to retrieve the first 3 records of t1:
select * from t1 limit 0,3
result:
  a1  |   b1   
--------------
  1   |   11  
  2   |   21  
  3   |   31  

Example A:
but now I want to retrieve the first 3 records of t1 and t2.b2 and
when there is a match, also the first of t1:
expected result:
  a1  |   b1  |  b2
-------------------
  1   |   11  |  null
  2   |   21  |  51
  3   |   31  |  null

Example B:
or with 3 tables, I want t1, t2.b2 and t3.b3
expected result:
  a1  |   b1  |  b2   |  b3
----------------------------
  1   |   11  |  null | 151
  2   |   21  |  51   | null
  3   |   31  |  null | 171


I have no idea how to do a query for examples A and B. Also be aware
that the tables might have thousands of similiar records. So, when for
example t2.a2 and t3.a3 has 10000 entries with the value "1" and we
would do a distinct left join, then we would crash the db because it
would try to filter 10000x10000 rows, even when we only need the first
match.

Clarification of Question by hish-ga on 01 Nov 2004 05:33 PST
A short clarification:
all "distinct" and "group by" didn't work for me because of the memory
and cpu usage. It seems that these parameters only filter a huge
result set of billions of rows down to 3.

Request for Question Clarification by mathtalk-ga on 03 Nov 2004 18:28 PST
Hi, hish-ga:

Any interest in the approach I outlined in my Comment below (using a
UNION to simulate an outer join)?  I believe this approach gives the
best chance of tuning the performance of each "subselect".

regards, mathtalk-ga

Clarification of Question by hish-ga on 04 Nov 2004 00:14 PST
Hi mathtalk-ga,
please clarify your approach by a sample query. I can't estimate the
performance right now.
Btw: Do you know if UNION is supported by other RDBMs?

Request for Question Clarification by mathtalk-ga on 04 Nov 2004 04:59 PST
Hi, hish-ga:

Yes, UNION and the variant UNION ALL (when one doesn't need to avoid
duplicate rows in the results) are widely available in SQL
implementations.

I'll post a sample query below as a Comment corresponding to your Example A.

The idea is to combine multiple SELECT statements, in the simplest
case one for when a matching row exists and one for when no match
exists:

SELECT ... -- when matches exist
UNION
SELECT ... -- when no match exists

regards, mathtalk-ga
Answer  
There is no answer at this time.

Comments  
Subject: Re: how do I limit left joins in mysql?
From: mathtalk-ga on 01 Nov 2004 07:27 PST
 
I suspect that outer joins characteristically do not allow a (further)
restriction on the table/field which is "optional".

One approach to try:  construct the "outer join" instead as a UNION:
those records from the required table for which no match exists (a not
exists clause) in the optional table, padded with Nulls for fields
from the latter, combined with a query that does an inner join (and
restricts the match to only the "first" row as you define it in the
"optional" table).

UNION was implemented in MySQL starting with version 4.0.

regards, mathtalk-ga
Subject: Re: how do I limit left joins in mysql?
From: mathtalk-ga on 04 Nov 2004 11:11 PST
 
Let's consider the Example A above, in which there are two tables, t1 and t2.

The more complex of the two queries is where a match exists in t2.a2
to t1.a1, because we need to pick out the least/first t2.b2 that
works.  There are several approaches for doing this, and no one
approach is always the best performer.  But one that often works
nicely for me is this:

SELECT t1.a1, t1.b1, t2.b2 from t1, t2
where t1.a1 = t2.a2
and   not exists ( select * from t2 x2
                   where a2 = t2.a2
                   and   b2 < t2.b2
                 )


The easier of the two queries is the one in which a t1 row has no
matching t2 row.  We might write (anticipating the "missing" field):

SELECT t1.a1, t1.b1, Null from t1
where not exists ( select * from t2
                   where a2 = t1.a1
                 )

Finally, put both queries together:

SELECT t1.a1, t1.b1, t2.b2 from t1, t2
where t1.a1 = t2.a2
and   not exists ( select * from t2 x2
                   where a2 = t2.a2
                   and   b2 < t2.b2
                 )
UNION ALL
SELECT t1.a1, t1.b1, Null from t1
where not exists ( select * from t2
                   where a2 = t1.a1
                 )

I'm using UNION ALL here because we know a priori that the rows from
these two queries are disjoint; it seems a waste of server time to
look for duplicates and weed them out.

regards, mathtalk-ga
Subject: Re: how do I limit left joins in mysql?
From: hish-ga on 05 Nov 2004 00:33 PST
 
Hi mathtalk-ga,
thanks for your comment. I tried it with 10000 records in t1 and 100
in t2. I have two problems with your approach:
a) In every case your query should return exactly 10000 rows
regardless what values are in in the fields but I always get >10000
b) doing a select * on t1 takes about 0.5 secs. doing a left join or a
t1.a1=t2.a2 takes about 0.8 secs. Your query takes about 3.8 secs.
c) "... not exists ( select * from t2 x2" what does "x2" mean here?
Thanks for your help
Subject: Re: how do I limit left joins in mysql?
From: mathtalk-ga on 05 Nov 2004 05:38 PST
 
hish-ga wrote:

a) In every case your query should return exactly 10000 rows
regardless what values are in in the fields but I always get >10000

You are guaranteed of exactly 10000 rows if (as was the case in
Example A) the composite key a2,b2 is unique on table t2.  When two
rows have the same "least" value b2 for a shared a2, the query I wrote
will return both.  This is my best guess as to why there are more than
10000 rows returned.  Let me know if that's the "real world"
situation.

b) doing a select * on t1 takes about 0.5 secs. doing a left join or a
t1.a1=t2.a2 takes about 0.8 secs. Your query takes about 3.8 secs.

The sort of query I wrote does more work than a simple join (inner or
outer) because it locates the minimum b2 for given a2.  I'd think in
terms of getting a query that returns the correct results and then
tuning performance.

c) "... not exists ( select * from t2 x2"
   what does "x2" mean here?

Here x2 is an "alias" for table t2 in the dependent subquery. 
Although I allowed the fields belonging to the "inner" copy of t2 to
be so identified by leaving off the table qualification, the alias x2
would be used explicitly in this manner:

SELECT t1.a1, t1.b1, t2.b2 from t1, t2
where t1.a1 = t2.a2
and   not exists ( select * from t2 x2
                   where x2.a2 = t2.a2
                   and   x2.b2 < t2.b2
                 )

regards, mathtalk-ga

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy