SQL Apprentice Question
I inherited a db that is pulling data from 5 different sources and
storing them in 5 different tables. The tables all have the same
basic data, minor differences based on source. The company currently
creates a "summary" table by joining the 5 source tables and using
coalesce( ) to display data from preferred data sources if it is
available from there.
I've been asked to present an alternate schema. My first thought was
to normalize the 5 tables into 1 with a source andI could just include
the source in the ORDER BY of queries to get the preferred source.
But then I thought since each source could have 30Million rows maybe I
would loose performance over the existing schema. So, could somebody
point me towards a reference source that may cover this type of
topic? Of course, any opinions (on this issue) would be appreciated
as well.
Celko Answers
>> I've been asked to present an alternate schema. My first thought was to normalize the 5 tables into 1 with a source and I could just include the source in the ORDER BY of queries to get the preferred source. <<
Something ike this? Cram all the data that you have into staging
table
CREATE TABLE Foobar
(foo_key INTEGER NOT NULL,
source_nbr INTEGER NOT NULL
CHECK (source_nbr BETWEEN 1 AND 5),
PRIMARY KEY (foo_key, source_nbr),
etc.);
Assuming sources are ranks from 1 to 5, pull out the most trusted for
each key
SELECT foo_key, etc.
FROM Foobar AS F1
WHERE source_nbr =
(SELECT MIN(source_nbr)
FROM Foobar AS F2
WHERE F1.foo_key = F2.foo_key);
This is the best we can do without more specs.
Original Source
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment