Joe Celko The SQL Apprentice: April 2006

Thursday, April 27, 2006

what is another word for database cell

SQL Apprentice Question
What is another word for database cell?
ie the single atomic units of data that a rdbms stores.

Celko Answers
>> the single atomic units of data that a rdbms stores. <<

Cell??? You are thinking about spreadsheets. A spreadsheet is made up
rows and columns which define a cell. A cell is either a value or a
declarative program element. they are both required and are of equal
status to get to the basic unit of work -- the cell.

A table is made up of ROWS, then rows are made up of columns. The unit
of work is the row, not the column within a row. You insert a whole
row; you delete a whole row and (beleive it or not) you update a whole
row. THIS IS A TOTALLY DIFFERENT MODEL!

The ANSI model of an UPDATE is that it acts as if

1) You go to the base table. It cannot have an alias because an alias
would create a working table that woudl be updates and then disappear
sfter the statement.

2) You go to the WHERE clause. All rows (if any!) that test TRUE are
marked as a subset. If there no WHERE clause, then the entire table is
marked. The name of this set/pseudo-table is OLD in Standard SQL.

3) You go to the SET clause and construct a set/pseudo-table called
NEW. The rows in this table are build by copying values from the
columns are not mentioned from the original row to the NEW row. The
columns that are assigned all at once. That is, the unit of work is a
row, not one column at a time.

4) The OLD subset is deleted and the NEW set is inserted. This is why

UPDATE Foobar
SET a = b, b = a;

Swaps the values in the columns a and b. The engine checks constraint
and does a ROLLBACK if there are violations.

In full SQL-92, you can use row constructors to say things like:

UPDATE Foobar
SET (a, b)
= (SELECT x, y
FROM Floob AS F1
WHERE F1.keycol= Foobar.keycol);

What is the right term for the legs of an automobile? Answer: the
concept of a cell does not exist in RDBMS.

We have rows, which are made up of (unordered) columns. A column can
only hold a scalar value. A column can exist only within a row; a row
exists only within a table.

A spreadsheet cell is fixed in its location by its (row, column, sheet)
co-ordindates. It can hold a scalar value or declarative code.

I have run into this problem before when teaching accounting people.
There are always a few who have trouble making the leap. On the good
side, they are used to declarative programmng and do not have to learn
the "There are no FOR-loops in SQL!" stuff.

Wednesday, April 26, 2006

column "generated always as" generated from multiple columns

SQL Apprentice Question
I have a table, something similar to:

create table my_table (
id char(32) not null primary key,
num integer not null,
code varchar(2) not null,
name varchar(60) not null,
[...]
)

Now I need to add another column 'calc_field' of type varchar that is
always automatically
derived from columns: id, num, code (concatenated)
I was trying to do something with concatenation and CAST but it always
fails.
Even if I try to test my expression with simple select to see generated
column,
the same expression fails in the 'generated always as' statement.

Any hint how to do it?

Celko Answers
Put this in a VIEW. The code will be Standard, portable SQL and always
up to date. Do not make it hard and proprietary.

>> I still don't understand why you're referring to a deprecated standard.

SQL92 is superseeded by SQL:1999, which in turn is out-of-date since
December 2003 wher SQL:2003 was published by ISO. <<

But then you get into the problems of porting code to products that are
just getting to SQL-92 and the way the US Government (largest user of
computers on Earth) only wants SQL-92 code. Then local syntax for this
feature varies between SQL:2003 and SQL Server, etc.

I'd go with the VIEW for practical reasons right now and then switch
over when more products have it.

Tuesday, April 25, 2006

Calling a stored procedure through an UPDATE statement

SQL Apprentice Question
I'm trying to create several insert statements in a stored procedures
as in below, but it will insert the reocrds sequentially and i want the
stored procedures to populate the fields that are parrallel. I also tried
set rowcount 0 - it didn't accept that and I don't know how to use the
UPDATE for
this like "set Count_H = exec LOE_H_Counts" and it didn't work.

Does anybody know how I can call a stored procedure in an UPDATE clause or
if there is another way to get around this?

Thank you so much for your time!

CREATE Procedure LOE_Counts
as
create table #tempLOE( Status varchar(50) ,Type varchar (50) ,
Count_H int, Count_M int, Count_L int, Count_Total int)
insert #tempLOE(id, Status, TYPE, Count_Total)
exec LOE_GrandTotal
insert #tempLOE(Count_H)
exec LOE_H_Count
insert #tempLOE(Count_M)
exec LOE_M_Count
select * from #tempLOE

Celko Answers
>>I'm trying to create several insert statements in a stored procedures

as in below, but it will insert the records [sic] sequentially and i
want the
stored procedures to populate the fields [sic] that are parrallel
[unh?]. <<

Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; there is no sequential access or
ordering in an RDBMS, so "first", "next" and "last" are totally
meaningless. If you want an ordering, then you need to have a column
that defines that ordering. You must use an ORDER BY clause on a
cursor or in an OVER() clause.

In SQL, the INSERT, UPDATE and DELETE statements work on rows as a
unit. This is not a file system, wher you can move a read/write head
to a field and do your work. This is what happens when you use the
wrong words.

I will not even get into the vagueness of "id", "type" and "status" as
attempts at data element names.

Next, good SQL programmers avoid UDF and procedural code. That is 3GL
programming and not declarative coding.

GROUP BY Question

SQL Apprentice Question
I need to generate daily aggregates (sums) from some hourly data and although
I have query that works, I think I've gone overboard on the GROUP BY
statement. Can anyone give me some feedback on this? I'm sure there is a
better way to do this, but I'm at a loss.

SELECT
max(LP.NO_FINISH_LINE) NO_FINISH_LINE,
CONVERT(CHAR(10),max(LP.DT_FINISHED),126) DT_FINISHED,
CAST(sum(LP.NO_WEIGHT_PROD) as DECIMAL(8,2)) NO_WEIGHT_PROD,
CAST(sum(LP.NO_WEIGHT_REJECT) as DECIMAL(8,2)) NO_WEIGHT_REJECT,
FROM LOT_PRODUCTION LP
WHERE (LP.DT_FINISHED BETWEEN CONVERT(CHAR(10), DATEADD(mm, - 1,
(DATEADD(dd, 1 - DAY(GETDATE()), GETDATE()))), 126) AND GETDATE()) AND
LP.NO_FINISH_LINE = 'L2'
GROUP BY DAY(DT_FINISHED),MONTH(DT_FINISHED),YEAR(DT_FINISHED)
ORDER BY DT_FINISHED ASC;

Celko Answers
Why are you formatting the data in the query? That is always done in
the front end. Why are you computing all those silly things in SQL,
which is not a computational language?

Because you are thinking like a COBOL programmer whose data is stored
in strings; think like an SQL programmer who works with abstract data
types.

First build an auxiliary table with your reporting periods. Remember,
tables and not computations:

CREATE TABLE ReportPeriods
(report_period_name CHAR(10) NOT NULL PRIMARY KEY,
rpt_start_date DATETIME NOT NULL,
rpt_end_date DATETIME NOT NULL,
CHECK (rpt_start_date < rpt_end_date));

Next, do a simple join to get your reports:

SELECT R.report_period_name,
MAX(LP.finish_line_nbr)AS finish_line_max,
SUM(LP.prod_wgt) AS prod_wgt_tot,
SUM(LP.reject_wgt) AS reject_wgt_tot,
FROM LotProduction AS LP,
ReportPeriods AS R
WHERE LP.finish_date
BETWEEN R.rpt_start_date AND R.rpt_end_date
GROUP BY R.report_period_name;

Since your data element did not follow ISO-11179 rules, I tried to
guess at corrections. You might also want to start writing Standard SQL
instead of dialect CURRENT_TIMESTAMP instead of the old getdate(),
CAST() instead of CONVERT(), etc.

Using one table vs. many

SQL Apprentice Question
I am designing a table that will store large amounts of rows.
(>100,000). The first idea is to create a generic-like table:

id | type | data

Where data field is dependant on type field. That is, if type == 1 data
is INT, if type == 2 i might interpret data as VARCHAR... Is that a
good thing to do? I belive i could use BLOB as data field type. This
way i could store variant types, right? Also note, that i do not plan
to run SQL queries that depend or in other ways manipulate data field.

Or should i split one huge table so they look like this:

table_type1
id | data (INT)

table_type2
id | data (VARCHAR)

Celko Answers
Before I cut & paste a detailed rant about EAV, you might want to learn
all the HUGE differences between fields and columns. Given the proper
definitions, you will not ask questions like this again. The idea of a
variant column is like "changable constant" in RDBMS.

I found an old "cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);

INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');

CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I think
I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is an answer From: Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.

The column names are seriously painful. Beyond the fact that I
personally hate underscores in column names, using underscores at the
end of the column name is really non-intuitive. I removed them for my
example and came across the next column name faux pas. Don't use "key"
and "value" for column names. It means that the developer *has*
surround the column name with square brackets for everything which is a
serious pain.

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.

ry to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.

ry to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

Like most new ideas, the hard part of understanding what the relational
model is comes in un-learning what you know about file systems. As
Artemus Ward (William Graham Sumner, 1840-1910) put it, "It ain't so
much the things we don't know that get us into trouble. It's the things
we know that just ain't so."

If you already have a background in data processing with traditional
file systems, the first things to un-learn are:

(0) Databases are not file sets.
(1) Tables are not files.
(2) Rows are not records.
(3) Columns are not fields.

Modern data processing began with punch cards. The influence of the
punch card lingered on long after the invention of magnetic tapes and
disk for data storage. This is why early video display terminals were
80 columns across. Even today, files which were migrated from cards to
magnetic tape files or disk storage still use 80 column records.

But the influence was not just on the physical side of data processing.
The methods for handling data from the prior media were imitated in
the new media.

Data processing first consisted of sorting and merging decks of punch
cards (later, sequential magnetic tape files) in a series of distinct
steps. The result of each step feed into the next step in the process.
This leads to temp table and other tricks to mimic that kind of
processing.

Relational databases do not work that way. Each user connects to the
entire database all at once, not to one file at time in a sequence of
steps. The users might not all have the same database access rights
once they are connected, however. Magnetic tapes could not be shared
among users at the same time, but shared data is the point of a
database.

Tables versus Files

A file is closely related to its physical storage media. A table may
or may not be a physical file. DB2 from IBM uses one file per table,
while Sybase puts several entire databases inside one file. A table is
a set of rows of the same kind of thing. A set has no ordering
and it makes no sense to ask for the first or last row.

A deck of punch cards is sequential, and so are magnetic tape files.
Therefore, a physical file of ordered sequential records also
became the mental model for data processing and it is still hard
to shake. Anytime you look at data, it is in some physical ordering.

The various access methods for disk storage system came later, but even
these access methods could not shake the mental model.

Another conceptual difference is that a file is usually data that deals
with a whole business process. A file has to have enough data in
itself to support applications for that business process. Files tend
to be "mixed" data which can be described by the name of the business
process, such as "The Payroll file" or something like that.

Tables can be either entities or relationships within a business
process. This means that the data which was held in one file is often
put into several tables. Tables tend to be "pure" data which can be
described by single words. The payroll would now have separate tables
for timecards, employees, projects and so forth.

Tables as Entities

An entity is physical or conceptual "thing" which has meaning be
itself. A person, a sale or a product would be an example. In a
relational database, an entity is defined by its attributes, which are
shown as values in columns in rows in a table.

To remind users that tables are sets of entities, I like to use
collective or plural nouns that describe the function of the entities
within the system for the names of tables. Thus "Employee" is a bad
name because it is singular; "Employees" is a better name because it is
plural; "Personnel" is best because it is collective and does not
summon up a mental picture of individual persons.

If you have tables with exactly the same structure, then they are sets
of the same kind of elements. But you should have only one set for
each kind of data element! Files, on the other hand, were PHYSICALLY
separate units of storage which could be alike -- each tape or disk
file represents a step in the PROCEDURE , such as moving from raw data,
to edited data, and finally to archived data. In SQL, this should be a
status flag in a table.

Tables as Relationships

A relationship is shown in a table by columns which reference one or
more entity tables. Without the entities, the relationship has no
meaning, but the relationship can have attributes of its own. For
example, a show business contract might have an agent, an employer and
a talent. The method of payment is an attribute of the contract
itself, and not of any of the three parties.

Rows versus Records

Rows are not records. A record is defined in the application program
which reads it; a row is defined in the database schema and not by a
program at all. The name of the field in the READ or INPUT statements
of the application; a row is named in the database schema. Likewise,
the PHYSICAL order of the field names in the READ statement is vital
(READ a,b,c is not the same as READ c, a, b; but SELECT a,b,c is the
same data as SELECT c, a, b.

All empty files look alike; they are a directory entry in the operating
system with a name and a length of zero bytes of storage. Empty tables
still have columns, constraints, security privileges and other
structures, even tho they have no rows.

This is in keeping with the set theoretical model, in which the empty
set is a perfectly good set. The difference between SQL's set model
and standard mathematical set theory is that set theory has only one
empty set, but in SQL each table has a different structure, so they
cannot be used in places where non-empty versions of themselves could
not be used.

Another characteristic of rows in a table is that they are all alike in
structure and they are all the "same kind of thing" in the model. In a
file system, records can vary in size, datatypes and structure by
having flags in the data stream that tell the program reading the data
how to interpret it. The most common examples are Pascal's variant
record, C's struct syntax and Cobol's OCCURS clause.

The OCCURS keyword in Cobol and the Variant records in Pascal have a
number which tells the program how many time a record structure is to
be repeated in the current record.

Unions in 'C' are not variant records, but variant mappings for the
same physical memory. For example:

union x {int ival; char j[4];} myStuff;

defines myStuff to be either an integer (which are 4 bytes on most
modern C compilers, but this code is non-portable) or an array of 4
bytes, depending on whether you say myStuff.ival or myStuff.j[0];

But even more than that, files often contained records which were
summaries of subsets of the other records -- so called control break
reports. There is no requirement that the records in a file be related
in any way -- they are literally a stream of binary data whose meaning
is assigned by the program reading them.

Columns versus Fields

A field within a record is defined by the application program that
reads it. A column in a row in a table is defined by the database
schema. The datatypes in a column are always scalar.

The order of the application program variables in the READ or INPUT
statements is important because the values are read into the program
variables in that order. In SQL, columns are referenced only by their
names. Yes, there are shorthands like the SELECT * clause and INSERT
INTO [table name] statements which expand into a list of column names
in the physical order in which the column names appear within their
table declaration, but these are shorthands which resolve to named
lists.

The use of NULLs in SQL is also unique to the language. Fields do not
support a missing data marker as part of the field, record or file
itself. Nor do fields have constraints which can be added to them in
the record, like the DEFAULT and CHECK() clauses in SQL.

Relationships among tables within a database

Files are pretty passive creatures and will take whatever an
application program throws at them without much objection. Files are
also independent of each other simply because they are connected to one
application program at a time and therefore have no idea what other
files looks like.

A database actively seeks to maintain the correctness of all its data.
The methods used are triggers, constraints and declarative referential
integrity.

Declarative referential integrity (DRI) says, in effect, that data in
one table has a particular relationship with data in a second (possibly
the same) table. It is also possible to have the database change
itself via referential actions associated with the DRI.

For example, a business rule might be that we do not sell products
which are not in inventory. This rule would be enforce by a REFERENCES
clause on the Orders table which references the Inventory table and a
referential action of ON DELETE CASCADE

Triggers are a more general way of doing much the same thing as DRI. A
trigger is a block of procedural code which is executed before, after
or instead of an INSERT INTO or UPDATE statement. You can do anything
with a trigger that you can do with DRI and more.

However, there are problems with TRIGGERs. While there is a standard
syntax for them in the SQL-92 standard, most vendors have not
implemented it. What they have is very proprietary syntax instead.
Secondly, a trigger cannot pass information to the optimizer like DRI.
In the example in this section, I know that for every product number in
the Orders table, I have that same product number in the Inventory
table. The optimizer can use that information in setting up EXISTS()
predicates and JOINs in the queries. There is no reasonable way to
parse procedural trigger code to determine this relationship.

The CREATE ASSERTION statement in SQL-92 will allow the database to
enforce conditions on the entire database as a whole. An ASSERTION is
not like a CHECK() clause, but the difference is subtle. A CHECK()
clause is executed when there are rows in the table to which it is
attached. If the table is empty then all CHECK() clauses are
effectively TRUE. Thus, if we wanted to be sure that the Inventory
table is never empty, and we wrote:

CREATE TABLE Inventory
( ...
CONSTRAINT inventory_not_empty
CHECK ((SELECT COUNT(*) FROM Inventory) > 0), ... );

it would not work. However, we could write:

CREATE ASSERTION Inventory_not_empty
CHECK ((SELECT COUNT(*) FROM Inventory) > 0);

and we would get the desired results. The assertion is checked at the
schema level and not at the table level.

No distinct in a select into statement

SQL Apprentice Question
use MSSQL 2000 and encountered a strange problem wqhile I tried to
use a select into statement .

If I perform the command command below I get only one dataset which has
the described properties.
If I use the same statement in a select into statement (see the second
select) I get several datasets with the described properties like I
didn't use distinct
Is there any posiibility to use destinct in a select into statement

select distinct IDENTITY (int) as ID, Title1 as Title1, Title2 as
Title2, Title3 as Title3,
AggregationTitle1 as AggregationTitle1, AggregationTitle2 as
AggregationTitle2,
AggregationTitle3 as AggregationTitle3, AggregationTitle4 as
AggregationTitle4
from Variables where Title1 is not NULL or Title2 is not NULL or
Title3 is not NULL or
AggregationTitle1 is not NULL or AggregationTitle2 is not NULL or
AggregationTitle3 is not NULL or AggregationTitle4 is not NULL;

This is the same with select into :

select distinct IDENTITY (int) as ID, Title1 as Title1, Title2 as
Title2, Title3 as Title3,
AggregationTitle1 as AggregationTitle1, AggregationTitle2 as
AggregationTitle2,
AggregationTitle3 as AggregationTitle3, AggregationTitle4 as
AggregationTitle4
into VarTitles from Variables where Title1 is not NULL or Title2 is
not NULL or Title3 is not NULL or
AggregationTitle1 is not NULL or AggregationTitle2 is not NULL or
AggregationTitle3 is not NULL or
AggregationTitle4 is not NULL;

Hope anyone can help.

Celko Answers
Minor trick to make the code easier to read and maintain:

WHERE COALESCE (title1, title2, title3) IS NOT NULL
OR COALESCE (ggregation_title1, aggregation_title2,
aggregation_title3, aggregation_title4) IS NOT NULL

Unfortunately these columns look like repeated and a really bad 1NF
problem. I have the feeling that you might have wanted to use
COALESCE() in the SELECT list to get a non-null title and non-null
aggregation_title instead of this convoluted query.

stored procedure with table as a variable

SQL Apprentice Question
I have multiple tables which all have similar structures. A 1 to 2
character CODE and a 10 to 75 character DETAIL. I want a stored
procedure that will take the code and table to find it in as input and
return the detail. Here is my code so far.

CREATE PROCEDURE [dbo].[lookup_code] @mCODE char(2), @mTABLE char(75) as
set nocount on
SELECT detail FROM @mTABLE WITH (READPAST, XLOCK)
where code = @mCODE
GO

Check Syntax returns: Error 156: Incorrect syntax near the keyword 'WITH'.

If found that it doesn't like a variable being used as the table name.
If I change @mTABLE to mytable it passes syntax checking. Can anyone
tell me how to get the table name to be a variable?

Celko Answers
>> I have multiple tables which all have similar structures. A 1 to 2

character CODE and a 10 to 75 character DETAIL. <<

Code? What kind of code? likewise detail is too vague to be a valid
data element name.

>> Can anyone tell me how to get the table name to be a variable? <<

The short answer is use slow, proprietrary dynamic SQL to kludge a
query together on the fly with your table name in the FROM clause.

The right answer is never pass a table name as a parameter. You need
to understand the basic idea of a data model and what a table means in
implementing a data model. Go back to basics. What is a table? A
model of a set of entities or relationships. EACH TABLE SHOULD BE A
DIFFERENT KIND OF ENTITY. When you have many tables that model the same
entity, then you have a magnetic tape file system written in SQL, and
not an RDBMS at all.

If the tables are different, then having a generic procedure which
works equally on automobiles, octopi or Britney Spear's discology is
saying that your application is a disaster of design.

1) This is dangerous because some user can insert pretty much whatever
they wish -- consider the string 'Foobar; DELETE FROM Foobar; SELECT *
FROM Floob' in your statement string.

2) It says that you have no idea what you are doing, so you are giving
control of the application to any user, present or future. Remember
the basics of Software Engineering? Modules need weak coupling and
strong cohesion, etc. This is far more fundamental than just SQL; it
has to do with learning to programming at all.

3) If you have tables with the same structure which represent the same
kind of entities, then your schema is not orthogonal. Look up what
Chris Date has to say about this design flaw. Look up the term
attribute splitting.

4) You might have failed to tell the difference between data and
meta-data. The SQL engine has routines for that stuff and applications
do not work at that level, if you want to have any data integrity.

Table look-ups are done with a join, not a procedure in SQL. You are
still thinking and writing procedural code in whatever your first
programming language is.

>> FieldName, FieldValue.. <<

We are dealing with a farm? Again, back to basics; fields and columns
are totally different concepts.

>> What are your thoughts on these sorts of approaches, <<

Same as everyone else thinks; it sucks. That design flaw is called
either OTLT (One True Look-up Table) or a MUCK (Massively Unified Code
Key). Google it for all the articles denouncing it.

>> what alternative would you suggest? <<

Since each encoding is independent of all the others and is a different
of kind, then it goes in its own table. That is simple, RDBMS basics.

You do not create a "ZipCode_ICD-9_race_sex_ ..insert 100's of other
encoding names" table. If for no other reason, the data-element name
of the nightmare when you try be accurate and follow ISO-11179 rules.
The best you coudl do is a meta-data name to spotlight the lack of
proper schema design.

Newbies who confuse fields and columns can get caught in the MUCK.
Why? A column has meaning in and of itself while field gets its
meaning from the application program reading it.

Tables do not "overlap"; the whole point of normalization and data
modeling is to avoid redundancy. We seek "one fact, one place, one
time, one way" in the schema.

Throw out your non-design and start over; this is sooooo flawed that
you are going to be writing SP kludges for everything.

How to effectively create dynamic queries?

SQL Apprentice Question
Let's say I have a search screen in my application that allows users to
do various AND OR conditions to about 14 pieces of criteria. That is a
complicated query to build dynamically and will be super slow because
it is dynamic.

Is there an efficient way to do this type of quering?

Celko Answers
I think what you want is the ability to load tables with criteria and
not have to use dynamic SQL. Let's say you want to search for job
candidates based on their skills.

skill = Java AND (skill = Perl OR skill = PHP)

becomes the disjunctive canonical form:

(Java AND Perl) OR (Java AND PHP)

which we load into this table:

CREATE TABLE Query
(and_grp INTEGER NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (and_grp, skill));

INSERT INTO Query VALUES (1, 'Java');
INSERT INTO Query VALUES (1, 'Perl');
INSERT INTO Query VALUES (2, 'Java');
INSERT INTO Query VALUES (2, 'PHP');

Assume we have a table of job candidates:

CREATE TABLE Candidates
(candidate_name CHAR(15) NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (candidate_name, skill));

INSERT INTO Candidates VALUES ('John', 'Java'); --winner
INSERT INTO Candidates VALUES ('John', 'Perl');
INSERT INTO Candidates VALUES ('Mary', 'Java'); --winner
INSERT INTO Candidates VALUES ('Mary', 'PHP');
INSERT INTO Candidates VALUES ('Larry', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Larry', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Moe', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Java');
INSERT INTO Candidates VALUES ('Celko', 'Java'); -- loser
INSERT INTO Candidates VALUES ('Celko', 'Algol');
INSERT INTO Candidates VALUES ('Smith', 'APL'); -- loser
INSERT INTO Candidates VALUES ('Smith', 'Algol');

The query is simple now:

SELECT DISTINCT C1.candidate_name
FROM Candidates AS C1, Query AS Q1
WHERE C1.skill = Q1.skill
GROUP BY Q1.and_grp, C1.candidate_name
HAVING COUNT(C1.skill)
= (SELECT COUNT(*)
FROM Query AS Q2
WHERE Q1.and_grp = Q2.and_grp);

You can retain the COUNT() information to rank candidates. For example
Moe meets both qualifications, while other candidates meet only one of
the two. You can Google "canonical disjunctive form" for more details.
This is a form of relational division.

Counting by Calendar and Fiscal periods in same query

SQL Apprentice Question
I have a query that returns results based on a count of tests done by period
and year. The period can either be a calendar month or Fiscal Calendar month.

As always thanks in advance!

Select
count( modelDesc)as CompCnt,
TestYear as CalYear,
TestMonth as CalMonth,
FiscYear as FiscYear,
FiscMonth as FiscMonth
From CompFails
group by ModelDesc,FiscYear, FiscMonth, testYear, testMonth

I also attempted to set the count as

select COUNT(ModelDesc)as ModelCnt FROM (SELECT
DISTINCT f.modelDesc from compfails f
group by FiscYear, FiscMonth) as FiscCompCnt

And got the "subquery returns more than one row error"

Currently the query returns the results as so:

Current Results

COMPcnt CalYear CalMonth FiscYear FiscMonth
------- ----------- ----------- ----------- -----------
26 2005 10 2006 1
1 2005 10 2006 2
17 2005 11 2006 2
1 2005 11 2006 3
10 2005 12 2006 3
19 2006 1 2006 4
1 2006 1 2006 5
16 2006 2 2006 5

I would like the results as follows but I am having a heck of a time
figuring the syntax.

CalCompCnt CalYear CalMonth FiscCompCnt FiscYear FiscMonth
----------- ----------- ---------------------- ----------- -----------
20 2006 1 26 2006 1
16 2006 2 18 2006 2
27 2005 10 11 2006 3
18 2005 11 19 2006 4
10 2005 12 17 2006 5

DDL Follows:

CREATE TABLE CompFails (
ModelDesc Varchar(40) NULL,
CalorimeterTestDate DATETIME NULL,
TestYear INT NULL,
TestMonth INT NULL,
FiscYear INT NULL,
FiscMonth INT NULL
)

INSERT INTO CompFails
(ModelDesc,
CalorimeterTestDate,
TestYear,
TestMonth,
FiscYear,
FiscMonth)

VALUES('tstModelDesc',2005-10-06 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-03 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-06 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-06 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-06 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-03 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-10 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-11 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-11 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-14 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-14 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-12 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-12 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-18 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-19 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-20 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-28 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-21 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-18 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-20 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-21 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-27 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-11-02 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-10-24 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-21 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-11-08 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-10-25 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-10-25 00:00:00.000,2005,10,2006,1)
VALUES('tstModelDesc',2005-11-04 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-07 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-08 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-04 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-10-31 00:00:00.000,2005,10,2006,2)
VALUES('tstModelDesc',2005-12-05 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-11-15 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-10 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-10 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-12-02 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-11-15 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-18 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-21 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-29 00:00:00.000,2005,11,2006,3)
VALUES('tstModelDesc',2005-11-16 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-18 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-21 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-21 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-11-18 00:00:00.000,2005,11,2006,2)
VALUES('tstModelDesc',2005-12-12 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-12-07 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-12-09 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-12-16 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-12-12 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2005-12-17 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2006-01-04 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-02 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2005-12-19 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2006-01-09 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-13 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-02-07 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2005-12-12 00:00:00.000,2005,12,2006,3)
VALUES('tstModelDesc',2006-01-13 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-16 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-24 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-18 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-19 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-24 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-28 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-22 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-02-01 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-01-24 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-25 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-20 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-21 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-02-01 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-01-30 00:00:00.000,2006,1, 2006,5)
VALUES('tstModelDesc',2006-01-27 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-01-28 00:00:00.000,2006,1, 2006,4)
VALUES('tstModelDesc',2006-02-02 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-08 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-22 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-16 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-16 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-17 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-15 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-14 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-15 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-21 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-18 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-19 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-02-21 00:00:00.000,2006,2, 2006,5)
VALUES('tstModelDesc',2006-01-02 00:00:00.000,2006,1, 2006,4)

Celko Answers
First construct a general enterprise calendar table; you should talk to
the trolls in accounting so you get the fiscal part right.

CREATE TABLE Calendar
(cal_date DATETIME NOT NULL PRIMARY KEY,
cal_year INTEGER NOT NULL,
cal_month INTEGER NOT NULL,
fiscal_year INTEGER NOT NULL,
fiscal_month INTEGER NOT NULL,
etc.);

Now get your tests table right. For example, why are things you must
know set to NULLs? Why did you have no key?

CREATE TABLE CompFails
(model_desc VARCHAR(40) NOT NULL,
test_date DATETIME NOT NULL,
PRIMARY KEY (model_desc, test_date)
);

Then your view or query will look like this skeleton:

SELECT ..
FROM CompFails AS T, Calendar AS C
WHERE T.fail_date = C.cal_date
AND ..;

He had computable columns in your original non-table, in addition to
the other design flaws.

Wednesday, April 19, 2006

Structure M:M

SQL Apprentice Question
What is the best way to create a M:M relationship between a column and
itself.

For example I have a table of suburbs and I would like associate with each
suburb it's surrounding suburbs which effectively will be a M:M relationship
with itself ?

CREATE TABLE Suburbs
(
SuburbID INT IDENTITY PRIMARY KEY,
Suburb VARCHAT(50),
State VARCHAR(3)
)

CREATE TABLE Surrounds
(
SuburbID INT,
NeighbouringSuburbID INT
)

This is what I have come up with however I am sure there is a better way.

Celko Answers
>> the M:M table toaccurately record this:

(Suburb1, Suburb2)
( Suburb2, Suburb1)

Surely this relationship should be able to be recorded using a single
entry
without duplication ? <<

Never use IDENTITY and always look for a standard code -- USPS or
Census would be good for this. Let's get the DDL right first

CREATE TABLE Suburbs
(merto_code INTEGER NOT NULL PRIMARY KEY,
suburb_name VARCHAR(30) NOT NULL,
state_code CHAR(2) NOT NULL);

CREATE TABLE Neighbors
(metro_code_1 INTEGER NOT NULL,
metro_code_2 INTEGER NOT NULL,
CHECK (metro_code_1 < metro_code_2 ),
PRIMARY KEY (metro_code_1, metro_code_2));

Now for the answer, use a view.

CREATE VIEW Surrounds (metro_code_1, metro_code_2)
AS
SELECT metro_code_1, metro_code_2
FROM Neightbors
UNION ALL
SELECT metro_code_2, metro_code_1
FROM Neightbors;

You will need an INSTEAD OF trigger for updates, inserts and deletes.
This will prevent one-way relationships, hide teh details fromt he
users, etc.

How can I swap rows to columns in a SELECT statement

SQL Apprentice Question
I've got dynamically changing columns in a table. One way is to define many
parameter columns in this table with the datatype nvarchar and a second
table that is defining the datatype, max and min values of these parameter
fields.

Because I want to have more flexibility I think to save the dynamically
changing columns as rows in another table.
Is there a way to retrieve the columns I saved as rows in another table as
columns?

Celko Answers
Columns do not change; SQL has strong typing. Your real problem is
that you do not have a data model or RDBMS design yet. I found an old
"cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);

INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');

CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I think
I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is an answer from Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.

The column names are seriously painful; using underscores at the end of
the column name is really non-intuitive. I removed them for my example
and came across the next column name faux pas. Don't use "key" and
"value" for column names. It means that the developer *has* surround
the column name with square brackets for everything which is a serious
pain.

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.

Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.

Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

Finally, write a simple relational division query in EAV.

Auto-Increment

SQL Apprentice Question
I'm creating a new SQL 2005 Express database and want to have a Customers
table. How do I auto-increment the CustomerID field? Programming in VB.Net
2005.

Celko Answers
Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; there is no sequential access or
ordering in an RDBMS.

What you want is a relational key that you can verify and validate.
Auto-incrementing has to do with the internal state of the hardware and
not the data model. This is why there are industry standard codes, why
websites use email addreses, etc.

Select from Two Tables each Column that is Different

SQL Apprentice Question
I have two tables, Table A and Table B. For the sake of this, we will
say that their DDL is identical. They have an associated key, ID.

I need to return for each ID only those columns that are different.

A plain English statement would look like this:
select ID, Column1 from Table A if A.Column1 <> B.Column1.
select ID, Column2 from TableB if A.Column2 <> B.Column2.
. . .

Of course, I want only a single return set with all of the columns that
do not pass the comparison.

What methods of accomplishing this do you recommend?

Celko Answers
CREATE TABLE Alpha
(vague_id INTEGER NOT NULL PRIMARY KEY,
foobar CHAR(5) NOT NULL);

CREATE TABLE Beta
(vague_id INTEGER NOT NULL PRIMARY KEY,
foobar CHAR(5) NOT NULL);

SELECT Alpha.vague_id, Alpha.foobar, Beta.foobar
FROM Alpha
FULL OUTER JOIN
Beta
ON Alpha.vague_id = Beta.vague_id
AND Alpha.foobar <> Beta.foobar;

Tuesday, April 18, 2006

Query Help

SQL Apprentice Question
I have the following table and data. What is the best way to get the
following results via query?

Brian 19991002 20000201
Brian 20000301 20000501
Charles 19961031 19990501

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[quiz1]([name] [varchar](50) NOT NULL, [start_dt]
[char](8) NOT NULL,
[end_dt] [char](8) NOT NULL, [rownum] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_quiz1] PRIMARY KEY CLUSTERED
( [name] ASC, [start_dt] ASC, [end_dt] ASC )
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
go

set nocount on
insert into quiz1(name, start_dt, end_dt) values('Brian', '19961001',
'19981230')
insert into quiz1(name, start_dt, end_dt) values('Brian', '19981230',
'19990605')
insert into quiz1(name, start_dt, end_dt) values('Brian', '19990605',
'19991002')
insert into quiz1(name, start_dt, end_dt) values('Brian', '20000201',
'20000301')
insert into quiz1(name, start_dt, end_dt) values('Brian', '20000501',
'99991231')
insert into quiz1(name, start_dt, end_dt) values('Charles', '19910106',
'19910731')
insert into quiz1(name, start_dt, end_dt) values('Charles', '19910731',
'19940201')
insert into quiz1(name, start_dt, end_dt) values('Charles', '19940201',
'19941021')
insert into quiz1(name, start_dt, end_dt) values('Charles', '19941021',
'19961031')
insert into quiz1(name, start_dt, end_dt) values('Charles', '19990501',
'20000331')
insert into quiz1(name, start_dt, end_dt) values('Charles', '20000331',
'99991231')
insert into quiz1(name, start_dt, end_dt) values('John', '19980103',
'19980727')
insert into quiz1(name, start_dt, end_dt) values('John', '19980727',
'20000103')
insert into quiz1(name, start_dt, end_dt) values('John', '20000103',
'20000601')
insert into quiz1(name, start_dt, end_dt) values('John', '20000601',
'99991231')
set nocount off
go

Celko Answers
First. we need to fix that REALLLLLLLLY bad schema:

IDENTITY cannot ever, ever, ever be a relational key!! You missed the
first day of RDBMS class. You never read the book! Duh!

CREATE TABLE Quiz1 -- an industry standard name?
(vague_over_sized_name VARCHAR(50) NOT NULL PRIMARY KEY, --lots of
Poles and Greeks?
start_dt DATETIME NOT NULL,
end_dt DATETIME NOT NULL,
CHECK (start_dt < end_dt)); -- true? You never stopped it!!

>> What is the best way to get the following results via query? <<

Do you work from specs this vague? If so, how????? Do you speak English
or a language that uses Latin alphabet? Posting DLL is good but not
enough; we need clear specs!!

My first guess, based on your vague posted, would be:

SELECT vague_over_sized_name, (MIN(start_dt) + INTERVAL 1 DAY) AS
start_dt, MAX(end_dt)
FROM Quiz1
GROUP BY vague_over_sized_name;

But we have no such specs. After a decade or so of doing
homework or someone's job for free,I would like the morons to learn to
write clear specs. That will get them into management or analysis.

If this is what she meant then we want to model the events , then we
need go back to SQL FOR SMARTIES and Snodgrass's work for more help.

how many joins is too many

SQL Apprentice Question
Does anyone have any good sources of information that talk about how many
joins is too many? I've heard the number 4 or 5 thrown around but haven't
found any 'evidence' to back them up.

Celko Answers

>> I've heard the number 4 or 5 thrown around but haven't found any 'evidence' to back them up. <<

The "Rule of Five" first came from a psychologist named Brown. The
idea is that huymans have a hard time with more than five distinctions
in an input (i.e. if I ask you to rank a movie, food, etc. on a 1-5
scale, you will tend to produce the same results next week. But if I
give you a 1-10 scale, you will not repeat the same answers next week.
This shows up in matching colors, taste, tones, phrases, grid patterns
and a ton of other things.

Fewer than five distinctions is better; more than seven is nearly
impossible. just for fun, look at the combinations 1 items =1, 2
items = 2!, .but .5! = 120 possible ways to arrange the items.

This was quoted in a Yourdon book back in the early days of Software
Engineering.

Now, we have a wonderful mental tool called "Chunking"; in SQL it is
views, CTEs, and derived tables. We take several things and aggregate
them into a new "single unit" until you get to five or fewer chunks.

Trigger on main table to update itself with detail totals

SQL Apprentice Question

I have a main table (Invoice) with a field (InvoiceTotal) that gets computed
from the sum of related detail records in another table (InvoiceDetail). If
a certain field in the Invoice table gets changed (Invoice.Version) then I
need it to re-calculate the summary field (Invoice.InvoiceTotal).

I would like to create a trigger on the Invoice table that fires only when
the Invoice.Version field is updated. However, I can't seem to get the
syntax correct to have table Invoice reference/update itself.

Here is what I am trying as a trigger on table Invoice. Any ideas where I
am going wrong or if there is another way to do this? I keep getting several
errors about " Invalid column name 'InvoiceID' "

IF UPDATE (Version)
BEGIN

UPDATE Invoice
SET Invoice.InvoiceTotal = (SELECT SUM(InvoiceDetail.TotalPrice) FROM
InvoiceDetail WHERE InvoiceDetail.InvoiceID = Invoice.InvoiceID AND
InvoiceDetail.Version = Invoice.Version)
FROM Invoice
INNER JOIN inserted on Invoice.InvoiceID = inserted.InvoiceID

END

Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

>> I have a main [sic] table (Invoice) with a field [sic] (InvoiceTotal) that gets computed from the sum of related detail records [sic] in another table (InvoiceDetail). <<

Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; the goal is to remove all
redudancy from the data. That means we do not want computed columns in
the schema.

>> I would like to create a trigger on the Invoice table that fires only when

the Invoice.Version field [sic] is updated. <<

And now, procedural code in a declarative language, subverting the most
basic ideas and all the advantages of SQL.

>> where I am going wrong or if there is another way to do this? <<

1) Everywhere -- you are a textbook disaster!! Tom Johnston used your
schema in a great series of articles a few years ago that went into
details that you can Google.

2) With an RDBMS schema? Have you ever read anything? Had a course? A
mentor?

Here is a skeleton schema:

CREATE TABLE Invoices -- plural , unless there is just one!!
(invoice_nbr INTEGER NOT NULL PRIMARY KEY,
customer_id INTEGER NOT NULL,
purchase_date DATETIME NOT NULL,
..);

CREATE TABLE InvoicesDetails
(invoice_nbr INTEGER NOT NULL,
sku INTEGER NOT NULL
REFERENCES Inventory (sku),
PRIMARY KEY (invoice_nbr, sku),
item_qty INTEGER NOT NULL,
..);

CREATE TABLE Inventory
(sku INTEGER NOT NULL PRIMARY KEY,
item_description CHAR(25) NOT NULL,
unit_cost DECIMAL (12, 2) NOT NULL,
..);

Since we have an RDBMS and not a ^%$*@ punch card or mag tape file
system, we can use a VIEW to get totals! They will always be right,
do not have to be materialized until needed and will not be redundant.

Monday, April 17, 2006

Bday dates revisited

SQL Apprentice Question
Hi,

I'm trying to create a query that will tell me when someone's birth day is
approaching from my employee table. I tried Case When
Datediff(dd,b_day,getdate()) < 4 Then 'Remind' Else '' End As B_dayCk

Datepart(dd,b_day) - Datepart(dd,b_day) < 4 (This checks for the day but not
for the month). If I And or Or with Datepart(mm,b_day) - Datepart(mm,b_day)
< 0 doesn't work either.

I'm sure there's a simple way out of this. Can someone help pls.

TIA

Celko Answers
You already have a Calendar table, right? Of course you do; it is one
of the first things you put in a schema.

So add a column for "(n) business days in the future". This kind of
column is used to get due dates for business purposes, where n = {30,
60, 90, 120}; you just want to have (n=4) from your example.

representing months with columns

SQL Apprentice Question
keep running into people that represent each month, January through
December, as a column in a table. Considering the nature of some of the
reports generated from this data, storing it in this way can certainly be
convenient; however I personally find the idea repugnant. I've been doing
this a while and think I have a pretty good intuitive grasp on what is good
database design and what isn't... but in this case the other designer could
come back and argue that since the number of months never changes,
representing each of them with a column is perfectly valid. Then I could
argue that that greatly complicates retrieving aggregate data, but then he
could say that my approach greatly complicates presenting the data with a
column per month...

In the beginning of my career as a database designer I never had issues like
this, I didn't need to work with anyone. Now it seems at every turn I end up
arguning with someone about what is good design and what isn't. Usually it's
regrettably obvious that the other person doesn't know what he's doing
("MUST you use nested cursors when all you need is a JOIN!?"), but in this
case I'm not so sure. Before I start another argument, I would like to hear
other people's thoughts in the matter.

Comments?

Celko Answers
This denormalized design is fine in a Data Warehouse or OLAP schema,
where the data is static, the range of the data is static and the data
validated before it goes into the tables.

It is a nightmare for OLTP. I would construct a Calendar table with
ranges for fiscal periods, holidays, promotional periods, etc. set it
up for 10-20 years and join to it to get your "annual window" for
reports.

As an aside, years ago there was an article in DB2 magazine comparing
performance and storage usage between the two designs. I vaguely
remember that the normalized design performed better but took up more
space after April in the sample data. Disk is cheap; time and
integrity are not.

CASE in where clause

SQL Apprentice Question
want to create SQL (maybe stored proc) using a SELECT statement that can
add WHERE conditions if (and only if) values are sent to it. For example:

SELECT ID, Person
FROM People
WHERE LastName LIKE '%@var1%' +
CASE WHEN @var2 IS NOT NULL THEN ' AND TypeCode = ' + @var2
ELSE ''
END

....etc

Is this possible? Thanks

Celko Answers
Fundamental mistake! There is no CASE **statement** in SQL. There is
a CASE **expression**; remember programming 101? Expressions return
scalar values, not control of execution flow.

SELECT person_id, person_name
FROM People
WHERE last_name LIKE '%' + @var1 + '%'
AND foobar_code = COALESCE(@my_code, foobar_code) ;

Even for an example, you had some pretty awful data element names.
Something can be a type of something or a code. It cannot be both.
There is no such thing as just an "id" -- it has to identify something
in particular.

You might want to get a book on SQL and data modeling.

The problem is that SQL is set-oriented and not sequential. The THEN
clauses in a CASE expression (which includes COALESCE()) all have to be
evaluated to determine the data type of the whole expression. It does
not matter if some of them are unreachable.

COALESCE correctly promotes its arguments to the highest data type in
the expression:

13 / COALESCE(CAST(NULL AS INTEGER), 2.00) = 6.5

The proprietary ISNULL() uses the first data type and gets things wrong

13 / ISNULL(CAST(NULL AS INTEGER), 2.00) = 6

You would need to write:

13 / ISNULL(CAST(NULL AS DECIMAL(4,2)), 2.00)

Thursday, April 13, 2006

Passing a parameter to IN predicate of DB2 udf

SQL Apprentice Question
I have encountered a problem in passing a VARCHAR parameter (which is a
list of accepted values) to the IN predicate of a DB2 user-defined
function.

For example, I have a table (mytable) storing ppl names & their
corresponding groups. I would like to create a UDF (myfunc) that select
a list of ppl names from (mytable) who belong to some dynamic choices
of groups. I have tried the following codings but it doesn't work if I
try to provide a list of designated choices of groups as a single
parameter.

Can anyone help? Thanks a lot!

////Reference SQLs:

CREATE TABLE mytable (
name VARCHAR(10),
group CHAR(1)
);

INSERT into mytable values ('Peter','A');
INSERT into mytable values ('John','A');
INSERT into mytable values ('Mary','B');
INSERT into mytable values ('Susan','C');

CREATE FUNCTION myfunc (cgroup VARCHAR(20))
RETURNS TABLE
(
name VARCHAR(10)
)

LANGUAGE SQL
READS SQL DATA
NO EXTERNAL ACTION
DETERMINISTIC
RETURN

SELECT
name
FROM
mytable
where
group in (cgroup)
;

> db2 "select * from mytable"

NAME GROUP
---------- -----
Peter A
John A
Mary B
Susan C

4 record(s) selected.

>db2 "select * from table(myfunc(char('A'))) as t1"

NAME
----------
Peter
John

2 record(s) selected.

>db2 "select * from table(myfunc(char('A,B'))) as t1"

NAME
----------

0 record(s) selected.

>db2 "select * from table(myfunc(char('''A'',''B'''))) as t1"

NAME
----------

0 record(s) selected.

Celko Answers
This is a common Newbie design error, usually comitted by people who
learned to program in BASIC instead of a compiled language. How many
parameters can a proceure accept? I don't the limit in DB2, but SQL
Server can have 1024 of them.

CREATE FUNCTION myfunc (cgroup VARCHAR(20))
RETURNS TABLE
(IN foobar_ name VARCHAR(10),
IN p1 INTEGER, IN p2 INTEGER, ..IN pn INTEGER)
LANGUAGE SQL
READS SQL DATA
NO EXTERNAL ACTION
DETERMINISTIC
RETURN
SELECTmfoobar_name
FROM Foobar
WHERE grp_nbr IN (p1, p2, .., pn);

This will avoid dynamic SQL and all the horrible problems you have with
it. You can do this parameter list with a simple text editor.

Literal value with "IN" clause

SQL Apprentice Question
Is it okay to use a literal value with the IN clause. E.g.

SELECT somefield, anotherfield
....
WHERE ...etc.
AND 1234 IN (SELECT userid FROM tblUsers)

I was told it wasn't valid, but I'm pretty sure it worked for me. Just
seeking clarification.

Celko Answers
It is valid, Standard SQL and can be a useful trick to avoid OR-ed
predicates. The IN() list just has to be expressions that will cast
to the proper data type.

Wednesday, April 12, 2006

Creating random numbers

SQL Apprentice Question
wanted to create random numbers for some strange purpose using SQL
Server 2000. Even if the numbers are going to repeat its fine. [Pls note I
don't want to use newid() ].

For ex: I want to pass two integer values say, 10, 30 ... now i expect
randoms to be generated between these two numbers. How do to it in SQL
Server. Can any one help me in this?

Celko Answers
The first problem is that there are two kinds of random selection from
a set:

1) With replacement = you can get multiple copies of the same value.
This is shooting dice.

This one is easy if you have a random function in your SQL product.
Most of the pseudo-random generators return a floating point fraction
value between 0.00 and 0.9999.... at whatever precision your SQL engine
has. The choice of a seed to start the generator can be the system
clock or some other constantly changing value.

SELECT S1.key_col
FROM SomeTable AS S1, SomeTable AS S2
WHERE S1.key_col <= S2.key_col
GROUP BY S1.key_col
HAVING COUNT(S2.key_col)
= (SELECT COUNT(*)
FROM SomeTable AS S3) * RANDOM(seed) + 1.0;

Or you can add a column for this.

CREATE TABLE RandNbrs2
(seq_nbr INTEGER PRIMARY KEY,
randomizer FLOAT -- warning !! not standard SQL
DEFAULT (
(CASE (CAST(RAND() + 0.5 AS INTEGER) * -1)
WHEN 0.0 THEN 1.0 ELSE -1.0 END)
* (CAST(RAND() * 100000 AS INTEGER) % 10000)
* RAND())
NOT NULL);

INSERT INTO RandNbrs2 VALUES (1, DEFAULT);
INSERT INTO RandNbrs2 VALUES (2, DEFAULT);
INSERT INTO RandNbrs2 VALUES (3, DEFAULT);
INSERT INTO RandNbrs2 VALUES (4, DEFAULT);
INSERT INTO RandNbrs2 VALUES (5, DEFAULT);
INSERT INTO RandNbrs2 VALUES (6, DEFAULT);
INSERT INTO RandNbrs2 VALUES (7, DEFAULT);
INSERT INTO RandNbrs2 VALUES (8, DEFAULT);
INSERT INTO RandNbrs2 VALUES (9, DEFAULT);
INSERT INTO RandNbrs2 VALUES (10, DEFAULT);

SELECT * FROM RandNbrs2;

2) Without replacement = you can each value only once. This is dealing
playing cards.

This is trickier. I would start with a table that has the keys and a
sequentially numbered column in it:

CREATE TABLE CardDeck
(keycol NOT NULL PRIMARY KEY,
seq INTEGER NOT NULL);

INSERT INTO CardDeck (keycol, seq)
SELECT S1.keycol, COUNT(S2.keycol)
FROM SomeTable AS S1, Sometable AS S2
WHERE S1.key_col <= S2.key_col
GROUP BY S1.key_col;

Now shuffle the deck by determing a random swap pair for all the rows:

BEGIN
DECLARE i INTEGER, j INTEGER;
SET i = (SELECT COUNT(*) FROM CardDeck);
WHILE i < 0
LOOP
SET j = (SELECT COUNT(*) FROM CardDeck) * RANDOM(seed) + 1.0;
UPDATE CardDeck
SET seq = CASE WHEN seq = i THEN j
WHEN seq = j THEN i
ELSE seq END;
WHERE seq IN (i, j);
SET i = i - 1;
LOOP END;
END;

You don't really need j, but it makes the code easier to read.

Biography:

Marsaglia, G and Zaman, A. 1990. Toward a Univesal Random Number
Generator. Statistics & Probability Letters 8 (1990) 35-39.

Marsaglia, G, B. Narasimhan, and A. Zaman. 1990. A Random Number
Generator for PC's. Computer Physics Communications 60 (1990) 345-349.

Leva, Joseph L. 1992. A Fast Normal Random Number Generator. ACM
Transactions on Mathematical Software. Dec 01 1992 v 18 n 4. p 449

Leva, Joseph L. 1992. Algorithm 712: A Normal Random Number Generator.
ACM Transactions on Mathematical Software. Dec 01 1992 v 18 n 4. p 454

Bays, Carter and W.E. Sharp. 1992. Improved Random Numbers for Your
Personal
Computer or Workstation. Geobyte. Apr 01 1992 v7 n2. p 25

Hulquist, Paul F. 1991. A Good Random Number Generator for
Microcomputers.Simulation. Oct 01 1991 v57 n 4. p 258

Komo, John J. 1991. Decimal Pseudo-random Number Generator. Simulation.
Oct 01 1991 v57 n4. p 228

Chambers, W.G. and Z.D. Dai. 1991. Simple but Effective Modification to
a Multiplicative Congruential Random-number Generator. IEEE
Proceedings.Computers and Digital Technology. May 01 1991 v 138 n3. p
121

Maier, W.L. 1991.. A Fast Pseudo Random Number Generator. Dr. Dobb's
Journal.May 01 1991 v17 n 5. p 152

Sezgin, Fatin. 1990. On a Fast and Portable Uniform Quasi-random Number
Generator. Simulation Digest. Wint 1990 v 21 n 2. p 30

Macomber, James H. and Charles S. White. 1990. An n-Dimensional Uniform
Random Number Generator Suitible for IBM-Compatible Microcomputers.
Interfaces. May 01 1990 v 20 n 3. p 49

Carta, David G. 1990. Two Fast Implementations of the "Minimal
Standard" Random Number Generator. Communications of the ACM. Jan 01
1990 v 33 n 1. p 87

Elkins, T.A. 1989. A Highly Random-number Generator. Computer
Language. Dec 01 1989 v 6 n 12 p 59

Kao, Chiang. A Random Number Generator for Microcomputers. OR: The
Journal of the Operational Research Society. Jul 01 1989 v 40 n 7. p
687

Chassing, P. 1989. An Optimal Random Number Generator Zp. Statistics &
Probability Letters. Feb 01 1989 v 7 n 4. p 307

Also, you can contact Kenneth G. Hamilton 72727,177 who has done some
work with RNG's. He has implemented one (at least one) of the best.

"A Digital Dissolve for Bit-Mapped Graphics Screens" by Mike Morton in
Dr.Dobb's Journal, November 1986, page 48.

CMOS Cookbook by Don Lancaster; Sams 1977, page 318.

Art of Computer Programming, Volume 2: Seminumeral Algorithms, 2nd
edition by Donald Knuth; Addison-Wesley 1981; page 29.

Numerical Recipes in Pascal: The Art of Scientific Computing by Press
et al.; Cambridge 1989; page 233.

Row numbering unpredictable

SQL Apprentice Question
need to create a stored procedure that returns the row number (for
paging) AFTER the data has been sorted with an order by. The source is
a view. The code I have is:

SELECT rownum = IDENTITY(1,1,bigint), *
INTO #tmp
FROM viewName
ORDER BY CustomerName -- field name I'm ordering by

When I recieve the results back, the rownum column is not the same
order as the customername (it jumps half way to a high number?!?),
which means I can't page it based on rownum without jumping all over
the dataset.

Anyone got any ideas on how to solve that other than client side paging
(in ADO :-P)
This is SQL 2000 SP3 (pah!)

Celko Answers
The basic principle of a tiered architecture is that display is done in
the front end adn NEVER in the database. Why are you seeking violating
40 years of Software Engineering?

Query help...

SQL Apprentice Question
have a table with two columns...

Inital Number, End Number

(sample data)
1, 1
1, 3
2, 4
1, 5
3, 2
etc

I want to make another column that provides the following information..
1-1
1-3
2-4
1-5
3-2

I figured I could make hundreds of case statements... eg:
case when initialnumber=1 and endnumber=1 then '1-1' end as Column3
case when initialnumber=1 and endnumber=2 then '1-2' end as Column3

But is there a better way?
Thanks

Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

CREATE TABLE Foobar
(start_nbr INTEGER NOT NULL,
end_nbr INTEGER NOT NULL,
CHECK (start_nbr <= end_nbr),
PRIMARY KEY (start_nbr, end_nbr));

>> I want to make another column that provides the following information.<<

What you showed seemed to be display and formatting. Repeat after me
the fundamental rule of a tiered architecture is that dispaly is NEVER
-- repeat NEVER -- done in the data base.

Tuesday, April 11, 2006

Help with Stored Procedure

SQL Apprentice Question
Hi, I'm so new to programming in MS SQL that the paint is still wet behind my
ears...

Using sql 2000 and vb.net (I normally develop in MS Access)

I have a search form set up so that users can enter any combination of the
following search criteria:
User types all or part business name,
User types all or part account name,
User selects from a ComboBox a business type
User types all or part contact name,
User selects from a ComboBox a contact type

I want to have a stored procedure that can handle any combinations of data
entry. The following is my guess for structure and it does not compile...

CREATE PROCEDURE GetSearchResult
@Account nvarchar(10),
@Business nvarchar(100),
@BusinessTypeID Int,
@ContactID Int,
@Contact nvarchar(100),
@ContactTypeID Int
DECLARE @SearchOptions Int
DECLARE @ErrorStatus Int OUTPUT
AS
IF LEN(@Business)>0
SET @SearchOptions = 1
IF LEN(@Account) > 0
SET @SearchOptions = @SearchOptions + 2
IF @BusinessTypeID > 0
SET @SearchOptions = @SearchOptions + 4
IF LEN(@Contact) > 0
SET @SearchOptions = 10
IF @ContactTypeID > 0
SET @SearchOptions = @SearchOptions + 11
CASE @SearchOptions
WHEN 1 THEN
-- Only search for Business name
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bBusiness LIKE '%' + @Business + '%'
END
WHEN 2 THEN
-- Only search for Account name
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bAccount LIKE '%' + @Account + '%'
END
WHEN 3 THEN
-- Search for Business name and Account
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bBusiness LIKE '%' + @Business + '%'
AND bAccount LIKE '%' + @Account + '%'
END
WHEN 4 THEN
-- Only search for Business type id
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bID Exists In(SELECT bbtBusinessID
FROM tblBusinessBusinessType
WHERE bbtBusinessTypeID = @BusinessTypeID)
END
WHEN 5 THEN
-- Search for Business name and Business type id
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bBusiness LIKE '%' + @Business + '%'
AND bID Exists In(SELECT bbtBusinessID
FROM tblBusinessBusinessType
WHERE bbtBusinessTypeID = @BusinessTypeID)
END
WHEN 6 THEN
-- Search for Account and Business type id
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bAccount LIKE '%' + @Account + '%'
AND bID Exists In(SELECT bbtBusinessID
FROM tblBusinessBusinessType
WHERE bbtBusinessTypeID = @BusinessTypeID)
END
WHEN 7 THEN
-- Search for Business name, Account and Business type id
BEGIN
SELECT * FROM vwBusinessSearchResult
WHERE bBusiness LIKE '%' + @Business + '%'
AND bAccount LIKE '%' + @Account + '%'
AND bID Exists In(SELECT bbtBusinessID
FROM tblBusinessBusinessType
WHERE bbtBusinessTypeID = @BusinessTypeID)
END
WHEN 10 THEN
-- Only Search for Contact name
BEGIN
SELECT * FROM vwBusinessContactSearchResult
WHERE cFullName LIKE '%' + @Contact + '%'
END
WHEN 11 THEN
-- Only search for Contact type id
BEGIN
SELECT * FROM vwBusinessContactSearchResult
WHERE cID Exists In(SELECT cctContactID
FROM tblContactContactType
WHERE cctContactTypeID = @ContactTypeID)
END
WHEN 21 THEN
-- Search for Contact name name Contact type id
BEGIN
SELECT * FROM vwBusinessContactSearchResult
WHERE cFullName LIKE '%' + @Contact + '%'
AND cID Exists In(SELECT cctContactID
FROM tblContactContactType
WHERE cctContactTypeID = @ContactTypeID)
END
END
GO
GRANT EXECUTE ON GetSearchResult TO PUBLIC
GO

Can you please give me advice regarding my approach to facilitating a search
form and corrections for more stored procedure

Celko Answers
>> Is there any difference performance wise between using coalesce() in
this fashion vs isnull() ? <<

COALESCE () is Standard SQL and takes a list of parameters. It is a
member fo the CASE expression family.

COALESCE correctly promotes its arguments to the highest data type in
the expression:

13 / COALESCE(CAST(NULL AS INTEGER), 2.00) = 6.5

The proprietary ISNULL() uses the first data type and gets things wrong

13 / ISNULL(CAST(NULL AS INTEGER), 2.00) = 6

You would need to write:

13 / ISNULL(CAST(NULL AS DECIMAL(4,2)), 2.00)

Normalizing help and naming conventions

SQL Apprentice Question
I have an Access Database that I am working on normalizing to SQL Server.
This Access database has been revised multiple times, each time the
programmer (not me!) didn't work to get the existing data to the same level
as the revised program. For instance, the HairColor column has the
following DISTINCT values,

Auburn
Black
Black/Gray
Blonde
Brown
Brown/Gray
Dark Brown
Gray
Gray/Black
Gray/Brown
Lt. Brown
Red
Red-blonde
Red/Blonde
Red/Brown

Now the current revision of the Access database limits the values for the
HairColor column to,

Black
Blonde
Brown
Gray
Red

So I need to make the current values reflect the current dropdown. I know
how to go about this, but have a question about normalizing this database.
Should I create a seperate table, HairColor and then assign ID's to the
HairColor column? What is the benefit of doing it this way, instead of just
leaving the text in there?

Also, if I am to create a new table, how does the naming convention work
with that? Should I prefix my table name with fk_HairColor, or something
like that?

Any advice is appreciated!

Thanks!

Celko Answers
>> Should I create a seperate table, HairColor and then assign ID's to the
HairColor column? What is the benefit of doing it this way, instead of
just
leaving the text in there? <<

Let's start with basics:

1) Do your research! Is there an industry standard code for hair
colors? Use it

2) If not, then design an encoding that has a missing and/or misc code
value -- Hey, I am bald!

3) If you expect the values to change, the list to be long, or that you
willneed to add other information later, then put the encoding into a
separate table.

4) If you expect the values to be static, the list to be short or that
you will not need to add other information later, then put the encoding
into a CHECK() constraint. For example, the ISO sex codes are going to
stay the same for awhile, as are the two-letter state codes.

Get a copy of SQL PROGRAMMING STYLE. You never, never prefix a data
element name with thigns to tell you (1) Where it is used -- so no
table names (2) how it is used -- so no pk- or fk- crap (3) how it is
store, so no data types. Follow ISO-11179 rules and name a thing for
what it is, independent of its location or usage.

Do not blindly design an encoding as a numbered list. Think about
hierarchical ("Dewey Decimal") codes, abbreviation codes, etc. That
is also in SQL PROGRAMMING STYLE.

Complicated Update query based on existing data

SQL Apprentice Question
OK folks, may have a tough one or perhaps just not thinking it through
well. I need to create a SQL query or queries that updates two columns
based on some business rules. Here's an example of the data:

GroupID complete_num first_num second_num InUse biggest
965423 1.0 1 0
965423 2.0 2 0
965423 3.0 3 0 X
965423 3.1 3 1
965423 3.2 3 2 X
324554 1.0 1 0
324554 2.0 2 0 X X
123456 0.1 0 1
123456 0.2 0 2 X X

Hopefully the above even vaguely lines up for you. The last two
columns are currently blank. The representation above is how I would
like them to look after the queries run. So for the above data, you
have a group id that links all records for one set together. I need to
have the "InUse" box updated with a value of "X" for the highest number
that has zero in the second_num column for a group. I also need the
biggest column set to "X" for the largest number in a particular group,
so 3.1 is larger than 3.0. Keep in mind, I have separated the
complete_num into two columns as there could be a "decimal" value of
"10" which is higher than "1". They are not the same as complete_num
is not really a decimal numeric representation. It's used
programatically for other things. Seperating into two separate columns
allows better sorting of data as complete_num is varchar and first and
second num are integer. You will also see the case, 123456,where there
is no zero in the second_num column. In this case, the highest
second_num value will have both set to "X". Any help would be
appreciated and SQL queries are prefered over any procedures/functions
as this is a one time query I need to run against the database. If you
need further clarification or more examples, please let me know.

As requested, here is a table creation script and some minimal data to
limit the length of the message. Also Hugo's view is great however
since this data is for one time use, would imagine updates to the
existing data would be preferable over introducing a new view into the
mix. If not updating the data directly based on the view would now be
a simple matter when you introduce the Id from the original table into
the view.

CREATE TABLE [abcd].[dbo].[TBL1
(Id,GroupId,complete_num,first_num,second_num)] (
[Id] int NOT NULL,
[GroupID] nvarchar (60) NULL,
[complete_num] varchar (255) NULL,
[first_num] integer,
[second_num] integer,
[InUse] varchar (2) NULL,
[biggest] varchar (2) NULL,
)

INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(91510,ABC1235,2.0,2,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(89377,ABC1235,2.1,2,1);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(89371,ABC1235,2.2,2,2);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1310,M123456,1.0,1,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1309,M123456,2.0,2,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1311,M123456,3.0,3,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1312,M123456,4.0,4,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1315,M123456,5.0,5,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1318,M123456,6.0,6,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1319,M123456,7.0,7,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1317,M123456,8.0,8,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(5342,M123456,9.0,9,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(5346,M123456,10.0,10,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(5756,M123456,11.0,11,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(6315,M123456,12.0,12,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(6604,M123456,13.0,13,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(6920,M123456,14.0,14,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1002,M123456,15.0,15,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(4023,ARDFO32,0.1,0,1);

Celko Answers
Your programs will be total nightmares and crap until you learn how to
design a schema.

Look at the DDL; do you really have a NCHAR(60) grpoup identifier? In
Chinese?? Wel;l;, since you allowed it, you will get one! Why do you
have redundant split attributes (i.e. first_num || second_num =
complete_num)? Let's spit on normalization!! I also love the clear,
meaningful names of the data elements. Tell us what a thing is, not
its sequential order inside another column. Logical not physical
descriptions.

There are no keys, no constraints. This is not a table at all! And
you invented your own syntax for CREATE TABLE.

Your idea of updatind computed columns is a way to mimic punch cards.
Back in the 1950-60's we had to store those things in the physical
card, like you are doing now.

Making a guess, if you normalized your schema, had a key and followed
the baisc data modeling rules, would this nightmare look more like
this? Better names that show subordination (well, Foobar is a dummy
name, but that is all you gave us)

CREATE TABLE Foobar
(group_id CHAR(6) NOT NULL
CHECK (group_id LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]')
section_nbr INTEGER DEFAULT 1 NOT NULL
CHECK (section_num >= 0),
subsection_nbr INTEGER DEFAULT 0 NOT NULL
CHECK (subsection_num >= 0),
PRIMARY KEY (group_id, section_nbr, subsection_nbr));

Do you need a constaint to assure that the subsections are in sequence?
Is there a check digit rule in the group_id? 90% of the work in RDBMS
is done in the DDL!!

>> So for the above data, you have a group id that links [sic] all records [sic] for one set together. <<

SQL does not have links; it has REFERENCES and grouping. Totally
different concepts, based on sets and not pre-RDBMS file and pointer
systems. Rows are nothing whatsoever like records.

>> I need to have the "InUse" box [sic, tha tis display, not SQL!] updated with a value of "X" for the highest number that has zero in the second_num [subsection] column for a group. <<

>> I also need the biggest column set to "X" for the largest number in a particular group,so 3.1 is larger than 3.0.<<

Now, the answer to your question is a VIEW, not a "punch cards and bit
flags" solution via updates.

CREATE VIEW InUseFoobar (group_id, section_nbr, subsection_nbr)
AS
SELECT group_id, MAX(section_nbr), 0
FROM Foobar
WHERE subsection = 0
GROUP BY group_id) ;

>> I also need the biggest column set to "X" for the largest number in a particular group,so 3.1 is larger than 3.0.<<

CREATE VIEW MaxFoobar (group_id, section_nbr, subsection_nbr)
AS
SELECT group_id, section_nbr, MAX(subsection_nbr)
FROM Foobar AS F1, InUseFoobar AS U1
WHERE F1.group_id = .U1.group_id
AND F1.section_nbr = .U1.section_nbr
GROUP BY group_id, section_nbr ;

<>

Joe Celko The SQL Apprentice