SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Thursday, June 14, 2007

Question regarding multiple data sources and coalesce

SQL Apprentice Question
I inherited a db that is pulling data from 5 different sources and
storing them in 5 different tables. The tables all have the same
basic data, minor differences based on source. The company currently
creates a "summary" table by joining the 5 source tables and using
coalesce( ) to display data from preferred data sources if it is
available from there.

I've been asked to present an alternate schema. My first thought was
to normalize the 5 tables into 1 with a source andI could just include
the source in the ORDER BY of queries to get the preferred source.
But then I thought since each source could have 30Million rows maybe I
would loose performance over the existing schema. So, could somebody
point me towards a reference source that may cover this type of
topic? Of course, any opinions (on this issue) would be appreciated
as well.


Celko Answers

>> I've been asked to present an alternate schema. My first thought was to normalize the 5 tables into 1 with a source and I could just include the source in the ORDER BY of queries to get the preferred source. <<


Something ike this? Cram all the data that you have into staging
table

CREATE TABLE Foobar
(foo_key INTEGER NOT NULL,
source_nbr INTEGER NOT NULL
CHECK (source_nbr BETWEEN 1 AND 5),
PRIMARY KEY (foo_key, source_nbr),
etc.);


Assuming sources are ranks from 1 to 5, pull out the most trusted for
each key


SELECT foo_key, etc.
FROM Foobar AS F1
WHERE source_nbr =
(SELECT MIN(source_nbr)
FROM Foobar AS F2
WHERE F1.foo_key = F2.foo_key);


This is the best we can do without more specs.




Original Source

No comments: