SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, July 17, 2006

weird sql query problem

SQL Apprentice Question
I dont have an DDL for this so I am just going to write it out by hand

CREATE TABLEA
(
AddressID BIGINT,
Address1 nvarchar(100),
Address2 nvarchar(200),
zipcode nvarchar(5)
)
Primary key is AddressID


CREATE Table JunctionA
(
AddressID BIGINT,
PersonID BIGINT,
Description nvarchar(300)
)


and say I have this same data


TableA===========


1 123 test st. '' 12345
2 123 test st '' 12345
3 230 1st st '' 12345
4 345 2nd st '' 90122


Junction==========
1 2 desc1
1 4 desc3
1 3 desc3
2 2 desc2
3 2 desc34
3 1 desc9


as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
same, and in the junction table, person 2 references both of them! how can I
fix this with a query so every item in tableA is unique while making all the
items in the junction change their references from the ones I want to delete
in tablea to the one that stays?



Celko Answers
This was pretty sloppy even for an example -- DDL is easy to write; do
you really have a need for BIGINT? Usually huge and proprietary are
bad things. Why did you invent variable length zip coes and address
lines that exceed the USPS CHAR(35)?

CREATE TABLE Addresses
(address_id INTEGER NOT NULL PRIMARY KEY,
address1 CHAR(35) NOT NULL,
address2 CHAR(35) NOT NULL,
zip_code CHAR (5) NOT NULL
CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'),
UNIQUE (address1, address2, zip_code) -- prevents your problem!
);


The correct terms are "referencing" and "reference" tables. I have no
idea who invented "Junction" as a term for a table that holds a
relationship.


CREATE TABLE Mailings
(address_id INTEGER NOT NULL
REFERENCES Addresses(address_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
person_id INTEGER NOT NULL
REFERENCES Personnel(person_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
mailing_description VARCHAR(300) NOT NULL, -- 300? guess or research?
PRIMARY KEY (address_id, person_id)
);


Look at how the DRI actions and the constraints are doing most of your
work for you. The UNIQUE() constraint can be expensive and hard to
write because you are dealing with English ("Lane" = "Ln", "Street" =
"St" = "Str"). You probably should look at Melisa Data or SSA software
for cleaning address data -- SSA has a great little booklet on the
problems that will scare you to death.

No comments: