SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, July 03, 2006

Identity keys in a One-to-Zero-or-Many relationship

SQL Apprentice Question
Considering the following:

CREATE TABLE CustomerTypes
(
PKCustomerType INT IDENTITY (1,1) NOT NULL,
CustomerTypeDesc VARCHAR(30)
)
GO


INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail')
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale')
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate')


CREATE TABLE Customers
(
PKCustomer INT IDENTITY (1,1) NOT NULL,
CustomerID VARCHAR(20),
FKCustomerType INT
)
GO


INSERT Customers (CustomerID, FKCustomerType) Values ('CUST001', 1)
INSERT Customers Values ('CUST002', 3)
INSERT Customers Values ('CUST003', 2)


How should I handle a case where I don't want to specify a customer
type for whatever reason? Perhaps it is unknown or not applicable in
some cases. I could add a 'None or N/A' record in the CustomerTypes
table for that purpose, but when validating the data in an INSERT
stored procedure, for example, I would like to have a value to use as
default in the case of a missing of invalid field value being passed
for FKCustomerType. If it weren't an identity field in the
CustomerTypes table, I could just create a CustomerType record with a
PKCustomerType value of 0 and always use that value in such cases. With
it being an identity field, however, I can't count on a specific value
for that purpose. It seems I have two options:


1) Create the CustomerTypes table without PKCustomerType being an
identity field...insert the 'None or N/A' record, assigning it a
PKCustomerType value of, say, 0...alter the table to make
PKCustomerType and identity field (1,1)


2) Merely use NULL in Customers where CustomerType isn't specified.


Is there a better way? If not, any insight on why one approach would be
better than the other? I'm leaning toward option 2 at this point.


Celko Answers

>> Considering the following: <<


Let's fix what you posted so that it follows the most basic RDBMS
design principles. For example, why do you have no keys? Why did you
think that IDENTITY is every used? Why did you use PK- prefixes in
violation of ISO-11179 rules?

CREATE TABLE CustomerTypes
(customer_type INTEGER NOT NULL PRIMARY KEY,
customer_type_desc VARCHAR(30) NOT NULL);


I see from the use of IDENTITY that you did not bother with designing
an encoding scheme. You might want to learn how to do that


CREATE TABLE Customers
(customer_id VARCHAR(20) NOT NULL PRIMARY KEY,
(customer_type INTEGER NOT NULL
REFERENCES CustomerTypes (customer_type)
ON UPDATE CASCADE,
.. );


VARCHAR(20) is a bit long, but by the definition of an identifier, this
has to be your key. Frankly, I would look for a DUNS number or some
indusrty standard code.


Stop putting prefixes that tell you **how** a data element is used in a
table. The name of a data element is supposed to tell you **what** it
is.



>> How should I handle a case where I don't want to specify a customer type for whatever reason? Perhaps it is unknown or not applicable in some cases. <<


Then you need codes for those situations if they are logically
different. For example, the ICD codes for disease have '000.000' which
means "undiagnosed" and '999.999' which means "we did all the test and
still don't know!" -- very different kinds of missing data! VERY
IMPORTANT to distinguish them!! Matter of life and death, in fact.


>> I could add a 'None or N/A' record [sic] in the CustomerTypes table for that purpose, but when validating the data in an INSERT stored procedure, for example, I would like to have a value to use as default in the case of a missing of invalid field [sic] value being passed for FKCustomerType [sic]. <<


Now we are getting to your REAL problem. Let's get back to the basics
of an RDBMS. Rows are not records; fields are not columns; tables are
not files; there is no sequential access or ordering in an RDBMS, so
faking a record number or the lines on a piece of paper with a
proprietary IDENTITY property is dead wrong. You do not have the right
mindset, and all you are going to get on a Newsgroup is a few kludges
to help you fake it for a long period of time before the collapse.


>> If it weren't an identity field [sic] in the CustomerTypes table, I could just create a CustomerType record [sic] with a PKCustomerType value of 0 and always use that value in such cases. With it being an identity field [sic], however, I can't count on a specific value for that purpose. <<


Surprise! Surprise! Surprise! See how non-relational, non-verifiable,
non-portable proprietary extensions screw up things?


>> It seems I have two options: <<


Yes; do it right or kludge it :)


>> 1) Create the CustomerTypes table without PKCustomerType [sic] being an identity field [sic] ...insert the 'None or N/A' record, assigning it a PKCustomerType [sic] value of, say, 0...alter the table to makePKCustomerType and identity field [sic] (1,1) <<


Almost! Stop using IDENTITY. This is an RDBMS and not a sequential
file system.


>> 2) Merely use NULL in Customers where CustomerType isn't specified. <<


Maybe; do you need to know anything about the missing values? Or just
that it is missing? Get a copy of SQL PROGRAMMING STYLE, SQL FOR
SMARTIES
or DATA & DATABASES and then read the chapters on scales &
measurements, and how to design encoding schemes. The research pattern
is simple:

1) Look for industry standards (Google it!)


2) Look for company standards (see the accounting department for help)


3) In the remaining 5% of the cases where you have to invent something,
pick a type of encoding and follow the rules for good design. My guess
in this example is a hierachy or vector code because customers break
down into tax/tax exempt, domestic/foreign and so forth within
retailers and wholesalers.

No comments: