SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Thursday, June 29, 2006

Ambiguous Column Names in Multi-Table Join

SQL Apprentice Question
A (possibly dumb) question, but I've had no luck finding a definitive
answer to it. Suppose I have two tables, Employees and Employers, which
both have a column named "Id":


Employees
-Id
-FirstName
-LastName
-SSN
etc.


Employers
-Id
-Name
-Address
etc.


and now I perform the following join:


SELECT Employees.*, Employers.*
FROM Employees LEFT JOIN Employers ON (Employees.Id=Employers.Id)


The result-set will contain two "Id" columns, so SQL Server will
disambiguate them; one column will still be called "Id", while the
other will be called "Id1." My question is, how are you supposed to
know which "Id" column belongs to which table? My intuition tells me,
and limited testing seems to indicate, that it depends on the order in
which the table names show up in the query, so that in the above
example, "Id" would refer to Employees.Id, while "Id1" would refer to
Employers.Id. Is this order guaranteed?


Also, why does SQL Server use such a IMO brain-damaged technique to
handle column name conflicts? In MS Access, it's much more
straightforward; after executing the above query, you can use
"Employees.Id" and "Employers.Id" (and more generally,
"TableNameOrTableAlias.ColumnName") to refer to the specific "Id"
column you want, instead of "Id" and "Id1" -- the
"just-tack-on-a-number" strategy is slightly annoying when dealing with
complex queries.


Celko Answers

>> A (possibly dumb) question, but I've had no luck finding a definitive answer to it. Suppose I have two tables, Employees and Employers, which both have a column named "Id" <<


Well, first of all, kill the stupid bastard that used "id" as a column
name, since he never read ISO-11179 or any book on BASIC data modeling.
This is not a data element name; it is too vague (identifier of
what??) and it appears EVERYWHERE, so it is a meaningless exposed
physical locator.

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. If you were polite and had a valid schema, would it look like
this?


CREATE TABLE Personnel -- note the use of a collective name for a set
(ssn CHAR(9) NOT NULL PRIMARY KEY, -- legal requirement!
last_name VARCHAR(20) NOT NULL,
first_name VARCHAR(20) NOT NULL,
.. );


CREATE TABLE Employers
(duns_nbr CHAR(9) NOT NULL PRIMARY KEY, -- industry standards!!
employer_name VARCHAR(20) NOT NULL,
..);



>> and now I perform the following join: <<


I hope not! it makes no sense. What is the relationship between two
values in totally different domains?? Gee, we need a table for that
..


>> The result-set will contain two "Id" columns, so SQL Server will disambiguate them; one column will still be called "Id", while the other will be called "Id1." My question is, how are you supposed to know which "Id" column belongs to which table? <<


By having a proper data model in which different data elements have
different names. What you have here is a "Vague, Magical, Universal
one-size-fits-all Kabalah Number" on tables, when you need a third
table called "Employment" with the employees and employers identifiers
in its columns. Basically the engine is tryitn to do the best it can
with your crappy design.


>> My intuition tells me, and limited testing seems to indicate, that it depends on the order in which the table names show up in the query, so that in the above example, "Id" would refer to Employees.Id, while "Id1" would refer to Employers.Id. Is this order guaranteed? <<


This is one of MANY reasons good programmers do not do this kind of
crappy design. The vendor is free to do anything they wish with the
display of such data. Nobody agrees. Nobody does it the same in
different releases. The best you can do is alias one of the columns.

No comments: