SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, April 17, 2006

representing months with columns

SQL Apprentice Question
keep running into people that represent each month, January through
December, as a column in a table. Considering the nature of some of the
reports generated from this data, storing it in this way can certainly be
convenient; however I personally find the idea repugnant. I've been doing
this a while and think I have a pretty good intuitive grasp on what is good
database design and what isn't... but in this case the other designer could
come back and argue that since the number of months never changes,
representing each of them with a column is perfectly valid. Then I could
argue that that greatly complicates retrieving aggregate data, but then he
could say that my approach greatly complicates presenting the data with a
column per month...

In the beginning of my career as a database designer I never had issues like
this, I didn't need to work with anyone. Now it seems at every turn I end up
arguning with someone about what is good design and what isn't. Usually it's
regrettably obvious that the other person doesn't know what he's doing
("MUST you use nested cursors when all you need is a JOIN!?"), but in this
case I'm not so sure. Before I start another argument, I would like to hear
other people's thoughts in the matter.


Comments?

Celko Answers
This denormalized design is fine in a Data Warehouse or OLAP schema,
where the data is static, the range of the data is static and the data
validated before it goes into the tables.

It is a nightmare for OLTP. I would construct a Calendar table with
ranges for fiscal periods, holidays, promotional periods, etc. set it
up for 10-20 years and join to it to get your "annual window" for
reports.


As an aside, years ago there was an article in DB2 magazine comparing
performance and storage usage between the two designs. I vaguely
remember that the normalized design performed better but took up more
space after April in the sample data. Disk is cheap; time and
integrity are not.

No comments: