Joe Celko The SQL Apprentice: Table Joins on more than one field

SQL Apprentice Question
Can someone help me with this SQL?

1) EMPLOYEE table has columns 'employee_id' and 'emp_sid' and other
columns

2) EMPLOYEE_BENEFITS table has a column called employee_entity, this
column can be joined to either 'employee_id' OR 'emp_sid' but not both
in the EMPLOYEE table.

3) EMPLOYEE_TRACK table has column called employee_track_entity, this
column can be joined to the employee_benefits_id (PK) of the
EMPLOYEE_BENEFITS table.

I am listing the sql for the tables (the tables shows only the columns
in question)

CREATE TABLE [dbo].[EMPLOYEE] (
[employee_id] [int] IDENTITY (1, 1) NOT NULL ,
[empsid_id] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[EMPLOYEE_BENEFITS] (
[employee_benefits_id] [int] IDENTITY (1, 1) NOT NULL ,
[employee_entity] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[EMPLOYEE_TRACK ] (
[employee_track_id] [int] IDENTITY (1, 1) NOT NULL ,
[employee_track_entity] [int] NOT NULL
) ON [PRIMARY]
GO

SELECT * FROM EMPLOYEE e
INNER JOIN
EMPLOYEE_BENEFITS eb ON (e.employee_id = eb.employee_entity OR
e.empsid_id = eb.employee_entity)
INNER JOIN
EMPLOYEE_TRACK et ON eb.employee_benefits_id = et.employee_track_entity

The above SQL I wrote is this: the second inner join uses a OR to join
either of the columns in the first table EMPLOYEE. There is performance
degradation with this SQL. With huge data It takes about 30 seconds to
execute. I know this is not the perfect way to do it, can anyone of the
SQL Gurus please enlighten me to a faster approach?

If I dont use the OR I can try left join on the same table
EMPLOYEE_BENEFITS twice by changing the join types, but If I did this
what table alias can I use to join to the 3rd table?

SELECT * FROM EMPLOYEE e
LEFT JOIN
EMPLOYEE_BENEFITS eb1 ON e.employee_id = eb.employee_entity
LEFT JOIN
EMPLOYEE_BENEFITS eb2 ON e.empsid_id = eb.employee_entity
INNER JOIN
EMPLOYEE_TRACK et ON [???].employee_benefits_id =
et.employee_track_entity

Celko Answers
>> Can someone help me with this SQL? <<

Not really, because you do not have an RDBMS. You have a bunch of
poorly designed non-tables written in SQL.

>> 1) EMPLOYEE table has columns 'employee_id' and 'emp_sid' and other

columns <<

The table name EMPLOYEE (all uppercase so it ihard to read or you are
using punchcards for the one employee you have -- singular names mean
one entity). If this table is for personnel data (a set), not for each
employee as if they records in a sequential file, why did you give it
such a bad name. Which one of these two columns is the PRIMARY KEY?
Oh, you have no keys at all!!

>> 2) EMPLOYEE_BENEFITS table has a column called employee_entity, this column can be joined to either 'employee_id' OR 'emp_sid' but not both in the EMPLOYEE table. <<

You actually used a postfix of entity! So much for data modeling and
ISO-11179 specs! You also missed the whole idea of PK-FK constraints --
There is no OR option in the concept. I think that some early network
DBMS system had "variant pointers" that could work liekthat, but I
owuld have to research it

>> 3) EMPLOYEE_TRACK table has column called employee_track_entity, this column can be joined to the employee_benefits_id (PK) of the EMPLOYEE_BENEFITS table.<<

Again, a singular name so we have only one track. Since IDENTITY can
never be a key by definition, EMPLOYEE_BENEFITS has no key to
reference. Don't your benefit programs have names, tax ids, or
something you can validate and verify?

Again, you are creating a pointer chain DBMS system in SQL, but do not
have the background to realize that you are re-inventing a square
wheel.

CREATE TABLE Personnel
(employee_id CHAR(9) NOT NULL PRIMARY KEY, -- use legally required id
...);

CREATE TABLE EmployeeBenefits
(employee_benefits_id INTEGER NOT NULL PRIMARY KEY,
employee_id CHAR(9) NOT NULL
REFERENCES Personnel(employee_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE EmployeeTracks
(employee_track_id INTEGER NOT NULL PRIMARY KEY,
.. );

>> The above SQL I wrote is this: the second inner join uses a OR to join either of the columns in the first table EMPLOYEE. There is performance degradation with this SQL. With huge data It takes about 30 seconds to execute. <<

You are worried about performance degradation?? You forgot the lack of
data integrity caused by two keys.

>> I know this is not the perfect way to do it, can anyone of the SQL Gurus please enlighten me to a faster approach? <<

Clean up the DDL. Get real keys instead of that IDENTITY crap. Learn
basic data modeling. Fix the multiple key problem. And stop putting
SELECT * in production code.

I saw a problem like this once a few decades ago. Two companies had
merged because they wer tired of competing in the same market (welding
supplies, same products). One company coded its inventory by the
location in the warehouse (makes picking orders very easy) and the
other coded by the type of welding done (aluminium, brass, underwater,
etc.). The warehouses were arranged very differently becuase of the
encoding. Are you familar with the Japanese housing numbering system
versus the United States?

They wanted a combined inventory and catalog, but their customers and
personnel were too used to one system or the other and the politics
were awful.

What they needed as a kludge was one and only one SKU code and a
conversion table in the computer and a pair of codes on the labels.
Until they could design a good SKU code.

Did this cost a lot of time and money? You bet! In fact, it killed the
merger. Each warehouse was an island of data, so there was no timely
way to move inventory across the two SKU codes to fill orders.
Someone asks for 5 Widgets and 2 are under code #A and 3 are under code
#B in another warehouse, but nobody knew!

Have you been to a Barnes & Noble lately? Look at the company sticky
label that goes over the pre-printed ISBN code. Same expensive, stupid
design flaw that you and the welding supplies companies had.

Joe Celko The SQL Apprentice

Wednesday, May 10, 2006

Table Joins on more than one field

No comments:

Blog Archive

Labels