SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Tuesday, April 11, 2006

Normalizing help and naming conventions

SQL Apprentice Question
I have an Access Database that I am working on normalizing to SQL Server.
This Access database has been revised multiple times, each time the
programmer (not me!) didn't work to get the existing data to the same level
as the revised program. For instance, the HairColor column has the
following DISTINCT values,

Auburn
Black
Black/Gray
Blonde
Brown
Brown/Gray
Dark Brown
Gray
Gray/Black
Gray/Brown
Lt. Brown
Red
Red-blonde
Red/Blonde
Red/Brown

Now the current revision of the Access database limits the values for the
HairColor column to,

Black
Blonde
Brown
Gray
Red

So I need to make the current values reflect the current dropdown. I know
how to go about this, but have a question about normalizing this database.
Should I create a seperate table, HairColor and then assign ID's to the
HairColor column? What is the benefit of doing it this way, instead of just
leaving the text in there?

Also, if I am to create a new table, how does the naming convention work
with that? Should I prefix my table name with fk_HairColor, or something
like that?

Any advice is appreciated!

Thanks!

Celko Answers
>> Should I create a seperate table, HairColor and then assign ID's to the
HairColor column? What is the benefit of doing it this way, instead of
just
leaving the text in there? <<

Let's start with basics:

1) Do your research! Is there an industry standard code for hair
colors? Use it

2) If not, then design an encoding that has a missing and/or misc code
value -- Hey, I am bald!

3) If you expect the values to change, the list to be long, or that you
willneed to add other information later, then put the encoding into a
separate table.

4) If you expect the values to be static, the list to be short or that
you will not need to add other information later, then put the encoding
into a CHECK() constraint. For example, the ISO sex codes are going to
stay the same for awhile, as are the two-letter state codes.

Get a copy of SQL PROGRAMMING STYLE. You never, never prefix a data
element name with thigns to tell you (1) Where it is used -- so no
table names (2) how it is used -- so no pk- or fk- crap (3) how it is
store, so no data types. Follow ISO-11179 rules and name a thing for
what it is, independent of its location or usage.

Do not blindly design an encoding as a numbered list. Think about
hierarchical ("Dewey Decimal") codes, abbreviation codes, etc. That
is also in SQL PROGRAMMING STYLE.

No comments: