SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Wednesday, May 17, 2006

Junction table data integrity approaches

SQL Apprentice Question
A database that I'm supporting has a 'super' many to many table (DDL
below) that joins any table to any other table. The business uses this
to create links between 'Vehicles', 'People', 'Groups of people',
'Places' and about ten other entities/tables.


Now I understand that this is not good DB design because the integrity
cannot be enforced and this is indeed where problems are surfacing. I
want to try and ensure that orphaned records never occur in this super
sized junction table. On the other hand, if this were done with a
many-to-many table for each entity then the integrity would be
maintained by the DB, but the design would not be as manageable due to
the number of tables.


The orphaned records are causing code crashes because the developers
expect there to be a record on the other end of every join.


I have two questions then;
1) Is there a way to keep the current structure and enforce integrity?
2) Is there another approach that you'd recommend?


Triggers are an option although they are generally 'hidden' from
developers who will forget to add a trigger for new tables.


A practical option is to clean the table of orphaned records once a day
and possibly do a quick check for orphan records when a user logs in
(as this only occurrs infrequently, the delay of a few seconds should
not be too much of an issue). This sounds like sticking plaster though
as I'm just covering up the problem, not solving it.


I appreciate that this has been asked before but I've Googled for a
good hour and I can't quite find anything that quite answers it.


DDL below, thanks in advance for your opinions,


Ryan


CREATE TABLE [dbo].[Link](
[FromPrimaryKey] [int] NOT NULL,
[FromTargetTableID] [int] NOT NULL,
[ToPrimaryKey] [int] NOT NULL,
[ToTargetTableID] [int] NOT NULL,
[Notes] [nvarchar](3000) COLLATE Latin1_General_CI_AS NULL,
[IsGenerated] [bit] NOT NULL,
CONSTRAINT [PK_Links] PRIMARY KEY CLUSTERED
(
[FromPrimaryKey] ASC,
[FromTargetTableID] ASC,
[ToPrimaryKey] ASC,
[ToTargetTableID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]


Celko Answers
Not only is this a bad idea, but it says that **all** primary keys are
integers! That is not very likely in the real world.

You are also committing the sin of mixing data and metadata in a
schema.


Do not use the proprietary BIT data type; this is SQL and not a punch
card system which depended on such low level flags.



>> A practical option is to clean the table of orphaned records [sic] once a day and possibly do a quick check for orphan records [sic] when a user logs in (as this only occurrs infrequently, the delay of a few seconds should not be too much of an issue). <<


This sort of "monster table" is not easier to maintain. You make an
error in one place and destroy integrity in a totally unrelated place.
The fact that you are creating orpahans during a session is the
problem. Cleaning them up is a kludge.

What you want is a relationship table for each relationship. It should
have a meaningful name like "Vehicle Assignments" and contain other
information about the relationship, say from and to dates on the (VIN,
employee_id) that make up the natural key.


While this is not a OTLT or MUCK design, it is as bad or worse. You
have missed the most basic ideas of RDBMS in this design.


Rows are not records and "link" refers to pointer chains; SQL has
REFERENCES which are abstract and may or may not be implemented with
pointers. You even think about adding procedural code (triggers)
instead of declarative constraints. You are still thinking about a
non-RDBMS model of data.

No comments: