SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Wednesday, April 11, 2007

Convert *= / =* to outer joins (ANSI compliant)

SQL Apprentice Question
I'm working on converting *= and =* to 'left outer join' and 'right outer
join'.


I noticed the difference in behavior between the =* and the phrase "right
outer join" and between *= and 'left outer join'. The result set is
different. Here is an example:


select a.au_id, b.title, c.qty


from titleauthor a, titles b, sales c


where (a.title_id =* b.title_id)


and (a.title_id =* c.title_id)


I try to conver the above to:


select a.au_id, b.title, c.qty


from titleauthor a


right outer join titles b


on (a.title_id = b.title_id )


right outer join sales c


on (a.title_id = c.title_id )


The first results into 391 rows in pubs database of sql-server 2000 and the
second produces 34 rows. It seems not that straight forward to convert.


The question is: If I want to get 391 row, what should I change/add in the
second sql statement?

Thank you. Appreciate your help.



Celko Answers
There is no simple answer because the old proprietary *=, +=, OUTER 
<tbl name>, (+) syntaxes all behaved a little differently. There were
a lot of weird tricks to work around the fixed orders of execution and
so forth.

I am also not sure how to handle a "HAVING x *= y" clause, which old
Sybase people used to write. And then nesting subqueries could mess up
things. But if you understand how the ANSI syntax works, then you can
take the **intent** of the old code and re-write the statements pretty
quickly.


Here is how OUTER JOINs work in SQL-92. Assume you are given:


Table1 Table2
a b a c
====== ======
1 w 1 r
2 x 2 s
3 y 3 t
4 z


and the outer join expression:


Table1
LEFT OUTER JOIN
Table2
ON Table1.a = Table2.a <== join condition
AND Table2.c = 't'; <== single table condition


We call Table1 the "preserved table" and Table2 the "unpreserved
table" in the query. What I am going to give you is a little
different, but equivalent to the ANSI/ISO standards.


1) We build the CROSS JOIN of the two tables. Scan each row in the
result set.


2) If the predicate tests TRUE for that row, then you keep it. You
also remove all rows derived from it from the CROSS JOIN


3) If the predicate tests FALSE or UNKNOWN for that row, then keep the
columns from the preserved table, convert all the columns from the
unpreserved table to NULLs and remove the duplicates.


So let us execute this by hand:


Let @ = passed the first predicate
Let * = passed the second predicate


Table1 CROSS JOIN Table2
a b a c
=========================
1 w 1 r @
1 w 2 s
1 w 3 t *
2 x 1 r
2 x 2 s @
2 x 3 t *
3 y 1 r
3 y 2 s
3 y 3 t @* <== the TRUE set
4 z 1 r
4 z 2 s
4 z 3 t *


Table1 LEFT OUTER JOIN Table2
a b a c
=========================
3 y 3 t <= only TRUE row
-----------------------
1 w NULL NULL Sets of duplicates
1 w NULL NULL
1 w NULL NULL
-----------------------
2 x NULL NULL
2 x NULL NULL
2 x NULL NULL
3 y NULL NULL <== derived from the TRUE set - Remove
3 y NULL NULL
-----------------------
4 z NULL NULL
4 z NULL NULL
4 z NULL NULL


the final results:


Table1 LEFT OUTER JOIN Table2
a b a c
=========================
1 w NULL NULL
2 x NULL NULL
3 y 3 t
4 z NULL NULL


The basic rule is that every row in the preserved table is represented
in the results in at least one result row.


There are limitations and very serious problems with the extended
equality version of an outer join used in some diseased mutant
products. Consider the two Chris Date tables


Suppliers SupParts
supno supno partno qty
========= ==============
S1 S1 P1 100
S2 S1 P2 250
S3 S2 P1 100
S2 P2 250


and let's do an extended equality outer join like this:


SELECT *
FROM Supplier, SupParts
WHERE Supplier.supno *= SupParts.supno
AND qty < 200;


If I do the outer first, I get:


Suppliers LOJ SupParts
supno supno partno qty
=======================
S1 S1 P1 100
S1 S1 P2 250
S2 S2 P1 100
S2 S2 P2 250
S3 NULL NULL NULL


Then I apply the (qty < 200) predicate and get


Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100


Doing it in the opposite order


Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100
S3 NULL NULL NULL


Sybase does it one way, Oracle does it the other and Centura (nee
Gupta) lets you pick which one -- the worst of both non-standard
worlds! In SQL-92, you have a choice and can force the order of
execution. Either do the predicates after the join ...


SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.supno = SupParts.supno
WHERE qty < 200;


... or do it in the joining:


SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.supno = SupParts.supno
AND qty < 200;


Another problem is that you cannot show the same table as preserved
and unpreserved in the extended equality version, but it is easy in
SQL-92. For example to find the students who have taken Math 101 and
might have taken Math 102:


SELECT C1.student, C1.math, C2.math
FROM (SELECT * FROM Courses WHERE math = 101) AS C1
LEFT OUTER JOIN
(SELECT * FROM Courses WHERE math = 102) AS C2
ON C1.student = C2.student;

No comments: