Google Answers: how do I limit left joins in mysql?

View Question

Q: how do I limit left joins in mysql? ( No Answer, 4 Comments )

Question

Subject: how do I limit left joins in mysql?
Category: Computers > Programming
Asked by: hish-ga
List Price: $50.00

Posted: 01 Nov 2004 04:50 PST
Expires: 15 Nov 2004 01:54 PST
Question ID: 422861

I try to limit left joins or subselects on mysql 4.1. Example: table t1 a1 \| b1 -------------- 1 \| 11 2 \| 21 3 \| 31 4 \| 41 table t2 a2 \| b2 -------------- 2 \| 51 2 \| 61 2 \| 71 4 \| 81 table t3 a3 \| b3 -------------- 1 \| 151 1 \| 161 3 \| 171 3 \| 181 when I want to retrieve the first 3 records of t1: select * from t1 limit 0,3 result: a1 \| b1 -------------- 1 \| 11 2 \| 21 3 \| 31 Example A: but now I want to retrieve the first 3 records of t1 and t2.b2 and when there is a match, also the first of t1: expected result: a1 \| b1 \| b2 ------------------- 1 \| 11 \| null 2 \| 21 \| 51 3 \| 31 \| null Example B: or with 3 tables, I want t1, t2.b2 and t3.b3 expected result: a1 \| b1 \| b2 \| b3 ---------------------------- 1 \| 11 \| null \| 151 2 \| 21 \| 51 \| null 3 \| 31 \| null \| 171 I have no idea how to do a query for examples A and B. Also be aware that the tables might have thousands of similiar records. So, when for example t2.a2 and t3.a3 has 10000 entries with the value "1" and we would do a distinct left join, then we would crash the db because it would try to filter 10000x10000 rows, even when we only need the first match.
Clarification of Question by hish-ga on 01 Nov 2004 05:33 PST A short clarification: all "distinct" and "group by" didn't work for me because of the memory and cpu usage. It seems that these parameters only filter a huge result set of billions of rows down to 3.
Request for Question Clarification by mathtalk-ga on 03 Nov 2004 18:28 PST Hi, hish-ga: Any interest in the approach I outlined in my Comment below (using a UNION to simulate an outer join)? I believe this approach gives the best chance of tuning the performance of each "subselect". regards, mathtalk-ga
Clarification of Question by hish-ga on 04 Nov 2004 00:14 PST Hi mathtalk-ga, please clarify your approach by a sample query. I can't estimate the performance right now. Btw: Do you know if UNION is supported by other RDBMs?
Request for Question Clarification by mathtalk-ga on 04 Nov 2004 04:59 PST Hi, hish-ga: Yes, UNION and the variant UNION ALL (when one doesn't need to avoid duplicate rows in the results) are widely available in SQL implementations. I'll post a sample query below as a Comment corresponding to your Example A. The idea is to combine multiple SELECT statements, in the simplest case one for when a matching row exists and one for when no match exists: SELECT ... -- when matches exist UNION SELECT ... -- when no match exists regards, mathtalk-ga

Answer

There is no answer at this time.

Comments

Subject: Re: how do I limit left joins in mysql?
From: mathtalk-ga on 01 Nov 2004 07:27 PST

I suspect that outer joins characteristically do not allow a (further)
restriction on the table/field which is "optional".

One approach to try:  construct the "outer join" instead as a UNION:
those records from the required table for which no match exists (a not
exists clause) in the optional table, padded with Nulls for fields
from the latter, combined with a query that does an inner join (and
restricts the match to only the "first" row as you define it in the
"optional" table).

UNION was implemented in MySQL starting with version 4.0.

regards, mathtalk-ga

Subject: Re: how do I limit left joins in mysql?
From: mathtalk-ga on 04 Nov 2004 11:11 PST

Let's consider the Example A above, in which there are two tables, t1 and t2.

The more complex of the two queries is where a match exists in t2.a2
to t1.a1, because we need to pick out the least/first t2.b2 that
works.  There are several approaches for doing this, and no one
approach is always the best performer.  But one that often works
nicely for me is this:

SELECT t1.a1, t1.b1, t2.b2 from t1, t2
where t1.a1 = t2.a2
and   not exists ( select * from t2 x2
                   where a2 = t2.a2
                   and   b2 < t2.b2
                 )


The easier of the two queries is the one in which a t1 row has no
matching t2 row.  We might write (anticipating the "missing" field):

SELECT t1.a1, t1.b1, Null from t1
where not exists ( select * from t2
                   where a2 = t1.a1
                 )

Finally, put both queries together:

SELECT t1.a1, t1.b1, t2.b2 from t1, t2
where t1.a1 = t2.a2
and   not exists ( select * from t2 x2
                   where a2 = t2.a2
                   and   b2 < t2.b2
                 )
UNION ALL
SELECT t1.a1, t1.b1, Null from t1
where not exists ( select * from t2
                   where a2 = t1.a1
                 )

I'm using UNION ALL here because we know a priori that the rows from
these two queries are disjoint; it seems a waste of server time to
look for duplicates and weed them out.

regards, mathtalk-ga

Subject: Re: how do I limit left joins in mysql?
From: hish-ga on 05 Nov 2004 00:33 PST

Hi mathtalk-ga,
thanks for your comment. I tried it with 10000 records in t1 and 100
in t2. I have two problems with your approach:
a) In every case your query should return exactly 10000 rows
regardless what values are in in the fields but I always get >10000
b) doing a select * on t1 takes about 0.5 secs. doing a left join or a
t1.a1=t2.a2 takes about 0.8 secs. Your query takes about 3.8 secs.
c) "... not exists ( select * from t2 x2" what does "x2" mean here?
Thanks for your help

Subject: Re: how do I limit left joins in mysql?
From: mathtalk-ga on 05 Nov 2004 05:38 PST

hish-ga wrote:

a) In every case your query should return exactly 10000 rows
regardless what values are in in the fields but I always get >10000

You are guaranteed of exactly 10000 rows if (as was the case in
Example A) the composite key a2,b2 is unique on table t2.  When two
rows have the same "least" value b2 for a shared a2, the query I wrote
will return both.  This is my best guess as to why there are more than
10000 rows returned.  Let me know if that's the "real world"
situation.

b) doing a select * on t1 takes about 0.5 secs. doing a left join or a
t1.a1=t2.a2 takes about 0.8 secs. Your query takes about 3.8 secs.

The sort of query I wrote does more work than a simple join (inner or
outer) because it locates the minimum b2 for given a2.  I'd think in
terms of getting a query that returns the correct results and then
tuning performance.

c) "... not exists ( select * from t2 x2"
   what does "x2" mean here?

Here x2 is an "alias" for table t2 in the dependent subquery. 
Although I allowed the fields belonging to the "inner" copy of t2 to
be so identified by leaving off the table qualification, the alias x2
would be used explicitly in this manner:

SELECT t1.a1, t1.b1, t2.b2 from t1, t2
where t1.a1 = t2.a2
and   not exists ( select * from t2 x2
                   where x2.a2 = t2.a2
                   and   x2.b2 < t2.b2
                 )

regards, mathtalk-ga

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy