SQL Apprentice Question
I have the following tables: 
NumberKey 
ID 
Name 
NumberValue 
ID 
NumberKey_ID 
Value 
The values stored in the NumberValue table could be currency, 
percentage, or number that require 7 digits to the right of the decimal 
place, etc., etc. (This table is not very big, so I'm storing many 
different types of numbers here, and not worrying about the most 
efficient data types).  I am toying with the idea of adding a format 
field to the NumberKey table to use later in the UI to format the Value 
field in NumberValue.  I am envisioning using percentage, currency, 
etc. as string so the UI knows what format function to use to display 
the value in a text box, etc. etc. 
Does anyone have any experience with this?  Is this even a good idea? 
Should I let business logic handle this? 
Celko Answers
>> The values stored in the NumberValue table could be currency, percentage, or number that require 7 digits to the right of the decimal  place, etc., etc. << 
Then **by definition** this is NOT a table.  The mi9stake you are 
making is so bad and so popular among non_SQL programemrs it has a name 
--  EAV (Entity-Attribute-Value) modeling. 
A table **by definition** is a set of things of the same kind.  As 
Aristlotle put it, "To be is to be something in particular; to be 
nothing in particular or to be everythig gin general is to be nothing." 
>>  (This table is not very big, so I'm storing many different types of numbers here, and not worrying about the most efficient data types). << 
You have no idea what data integrity is. 
>> I am toying with the idea of adding a format field [sic] to the NumberKey table to use later in the UI to format the Value field [sic] in NumberValue.  << 
You do not know that a field and column are different concepts.  Fields 
can change type and format; columns do not.  COBOL programmers often 
want to do this because that is how their language works. 
Why are you formatting data in the back end?  The basic principle of a 
tiered architecture is that display is done in the front end and never 
in the back end.  This a more basic programming principle than just SQL 
and RDBMS. 
>>  Does anyone have any experience with this? << 
Yes.  I bill $1000 per day plus expenses to repair these systems.  I 
have found that they get corrupted in about one year in production, and 
corrupted beyond repair in less than two years. 
I found an old "cut & paste".  Someone like you posted this: 
CREATE TABLE EAV -- no key declared 
(key_col VARCHAR (10) NULL, 
 attrib_value VARCHAR (50) NULL); 
INSERT INTO EAV VALUES ('LOCATION','Bedroom'); 
INSERT INTO EAV VALUES ('LOCATION','Dining Room'); 
INSERT INTO EAV VALUES ('LOCATION','Bathroom'); 
INSERT INTO EAV VALUES ('LOCATION','courtyard'); 
INSERT INTO EAV VALUES ('EVENT','verbal aggression'); 
INSERT INTO EAV VALUES ('EVENT','peer'); 
INSERT INTO EAV VALUES ('EVENT','bad behavior'); 
INSERT INTO EAV VALUES ('EVENT','other'); 
CREATE TABLE EAV_DATA  -note lack of constraints, defaults, DRI 
(id INTEGER IDENTITY (1,1) NOT NULL, 
 bts_id INTEGER NULL, 
 key_col VARCHAR (10) NULL, 
 attrib_value VARCHAR (50) NULL ); 
INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom'); 
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other'); 
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior'); 
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom'); 
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other'); 
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression'); 
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard'); 
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other'); 
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer'); 
Ideally, the result set of the query would be Location Event count 
(headings if possible) 
Bedroom verbal aggression 1 
Bedroom peer 0 
Bedroom bad behavior 0 
Bedroom other 2 
Dining Room verbal aggression 0 
Dining Room peer 0 
Dining Room bad behavior 0 
Dining Room other 0 
Bathroom verbal aggression 0 
Bathroom peer 0 
Bathroom bad behavior 0 
Bathroom other 0 
courtyard verbal aggression 0 
courtyard peer 1 
courtyard bad behavior 0 
courtyard other 1 
Also, if possible, another query would return this result set. (I think 
I know how to do this one.) 
Location Event count 
Bedroom verbal aggression 1 
Bedroom other 2 
courtyard peer 1 
courtyard other 1 
Here is an answer From: Thomas Coleman 
SELECT Locations.locationvalue, Events.eventvalue, 
       (SELECT COUNT(*) 
          FROM (SELECT LocationData.locationvalue, EventData.eventvalue 
                  FROM (SELECT TD1.bts_id, TD1.value AS locationvalue 
                          FROM eav_data AS TD1 
                         WHERE TD1.key = 'location') AS LocationData 
               INNER JOIN 
               (SELECT TD2.bts_id, TD2.value AS eventvalue 
              FROM eav_data AS TD2 
              WHERE TD2.key = 'event' 
            ) AS EventData 
            ON LocationData.bts_id = EventData.bts_id 
      ) AS CollatedEventData 
    WHERE CollatedEventData.locationvalue = Locations.locationvalue 
      AND CollatedEventData.eventvalue = Events.eventvalue 
FROM (SELECT T1.value AS locationvalue 
      FROM EAV AS T1 
      WHERE T1.key = 'location') AS Locations, 
   (SELECT T2.value AS eventvalue 
      FROM EAV AS T2 
     WHERE T2.key = 'event') AS Events 
ORDER BY Locations.locationvalue, Events.eventvalue , 
SELECT Locations.locationvalue, Events.eventvalue 
       (SELECT COUNT(*) 
          FROM (SELECT LocationData.locationvalue, EventData.eventvalue 
                  FROM (SELECT TD1.bts_id, TD1.value AS locationvalue 
                          FROM eav_data AS TD1 
                         WHERE TD1.key = 'location') AS LocationData 
              INNER JOIN 
              (SELECT TD2.bts_id, TD2.value AS eventvalue 
                 FROM eav_data AS TD2 
                WHERE TD2.key = 'event') AS EventData 
              ON LocationData.bts_id = EventData.bts_id) 
              AS CollatedEventData 
    WHERE CollatedEventData.locationvalue = Locations.locationvalue 
      AND CollatedEventData.eventvalue = Events.eventvalue) 
FROM (SELECT T1.value AS locationvalue 
        FROM EAV AS T1 
      WHERE T1.key = 'location') AS Locations, 
    (SELECT T2.value AS eventvalue 
       FROM EAV AS T2 
       WHERE T2.key = 'event') AS Events; 
Is the same thing in a proper schema as: 
SELECT L.locationvalue, E.eventvalue, COUNT(*) 
  FROM Locations AS L, Events AS E 
 WHERE L.btd_id = E.btd_id 
 GROUP BY L.locationvalue, E.eventvalue; 
The reason that I had to use so many subqueries is that those entities 
are all lopped into the same table. There should be separate tables for 
Locations and Events. 
The column names are seriously painful. Beyond the fact that I 
personally hate underscores in column names, using underscores at the 
end of the column name is really non-intuitive. I removed them for my 
example and came across the next column name faux pas. Don't use "key" 
and "value" for column names. It means that the developer *has* 
surround the column name with square brackets for everything which is a 
serious pain. 
There is such a thing as "too" generic. There has to be some structure 
or everything becomes nothing more than a couple of tables called 
"things". The real key (no pun intended) is commonality. Is there a 
pattern to the data that they want to store? It may not be possible to 
create one structure to rule them all and in the darkness bind them. 
"To be is to be something in particular; to be nothing in particular is 
to be nothing." --Aristole 
All data integrity is destroyed. Any typo becomes a new attribute or 
entity. Entities are found missing attributes, so all the reports are 
wrong. 
ry to write a single CHECK() constraint that works for all the 
attributes of those 30+ entities your users created because you were 
too dumb or too lazy to do your job. It can be done! You need a case 
expression almost 70 WHEN clauses for a simple invoice and order system 
when I tried it as an exercise. 
try to write a single DEFAULT clause for 30+ entities crammed into one 
column. Impossible! 
Try to set up DRI actions among the entities. If you thought the WHEN 
clauses in the single CASE expression were unmaintainable, wait until 
you see the "TRIGGERs from Hell" -- Too bad that they might not fit 
into older SQL Server which had some size limits. Now maintain it.
Monday, June 19, 2006
Subscribe to:
Post Comments (Atom)

No comments:
Post a Comment