Joe Celko The SQL Apprentice: November 2006

Friday, November 24, 2006

While loop?

SQL Apprentice Question
So I am just starting to learn how to use Database. I need the help of a SQL
guru! I have these three tables:
CREATE TABLE SB3_ScheduleChange
(
emp_num EmpNumType:nvarchar(7) NOT NULL PRIMARY KEY,
Monday char(35) NULL,
Tuesday char(35) NULL,
Wednesday char(35) NULL,
Thursday char(35) NULL,
Friday char(35) NULL,
Saturday char(35) NULL,
Sunday char(35) NULL,
Name char(35) NOT NULL;
)

CREATE TABLE SB3_ScheduleChange
(
emp_num EmpNumType:nvarchar(7) NOT NULL PRIMARY KEY,
sched_type_id nt, NOT NULL,
start_date DateType:datetime NOT NULL,
end_date DateType:datetime NOT NULL,
comments varchar(8000) NULL,
)

CREATE TABLE SB3_ScheduleType
(
sched_type_id int, NOT NULL PRIMARY KEY,
reason char(20) NULL,

)

I am trying to come up with a query that will iterate through each employee
with the emp_num primary key in the emp_shcedule table. I would like the
iteration to check if the employee has an exception or "change" from their
regular default schedule in the table emp_schedule checking it against the
"SB3_ScheduleChange" table. And if the query finds a difference in schedule
to update or insert it into that employees default schedule until the change
is over on a certain date. My problem is I am new to SQL and I dont know how
to do this. Do I need a Calendar table? Use a while loop? Any examples,
help, or suggestions would be very much appreciated. Not that this will help
but here are some queries I have come up with. Just dont know how to tie it
all together.

This would list name, emp_num, sched_type_id, for each employee who has
A schedule type difference. Query shows every employee with a schedule type
difference.

SELECT es.Name, es.emp_num, sc.sched_type_id
FROM emp_schedule AS es JOIN SB3_ScheduleChange AS sc
ON(es.emp_num = sc.emp_num)

List name, emp_num of employees that are not assigned to any schedule
change. Query shows every employee without a schedule type difference.

SELECT name, emp_num
FROM emp_schedule AS es
WHERE NOT EXISTS
(SELECT emp_num
FROM SB3_ScheduleChange AS sc
WHERE es.emp_num = sc.emp_num);

And if necessary a Calendar CREATE TABLE. But once again I am not sure.
CREATE TABLE dbo.SB3_Calender
(
ActualDate DATETIME NOT NULL PRIMARY KEY,
MonthName CHAR(15) NULL,
DayNumber INT NULL,
YearNumber INT NULL,
DayOfWeek CHAR(15) NULL
CHECK (DayOfWeek IN ('Sunday', 'Monday', 'Tuesday',
'Wednesday','Thursday','Friday','Saturday')),
DayType CHAR(15) NULL
CHECK ( DayType IN ('Business','Weekend','Holiday')),
)

GO

SET NOCOUNT ON
DECLARE @Counter INT
DECLARE @ActualDate DATETIME
DECLARE @FirstDate DATETIME
SET @Counter = 1
SET @FirstDate = '1/1/2006'
SET @ActualDate = @FirstDate
WHILE @Counter < 1096
BEGIN
INSERT INTO Calender (ActualDate)
values(@ActualDate)
SET @ActualDate = DATEADD(day, @Counter, @FirstDate)
SET @Coutner = @Counter + 1
END

GO

UPDATE Calender
SET DayOfWeek = DateName(DW, ActualDate)

GO

UPDATE Calender
SET DayNumber = DateName(DD,ActualDate)

GO

UPDATE Calender
SET MonthName = DateName(MM,ActualDate)

GO

UPDATE Calender
SET YearNumber = DateName(YY,ActualDate)

GO

UPDATE Calender
SET DayType = 'Business'
WHERE DayOfWeek <> 'Saturday' AND DayOfWeek <> 'Sunday'

GO

UPDATE Calender
SET DayType = 'Weekend'
WHERE DayOfWeek = 'Saturday' OR DayOfWeek = 'Sunday'

GO

UPDATE Calender
SET DayType = 'Holiday'
WHERE (MonthName ='January' AND DayNumber = 1 ) OR
(MonthName ='July' AND DayNumber = 4) OR
(MonthName ='December' AND DayNumber = 25)

GO

Celko Answers

>> So I am just starting to learn how to use Database. <<

1) What you are doing is mimicing a paper schedule form, not designing
a normalized schema.

2) data element names like the days of the week are values and not
attributes. Tha tis the output of a cross tabs report, not a table.

3) Column names like "sched_type_id" are absurd; an attribute can a key
(identifier, points to a unique *entity* in the data model) or a type
(non-key *attribute*, holds a value) but NEVER both.

4) You are still thinking about procedural code instead of declarative
programming and data. YOu have designed a magentic tape file merge,
complete with record at a time processing. Welcom to 1957!

Why not something like this?

CREATE TABLE PersonnelSchedules
(emp_num CHAR(7) NOT NULL
REFERENCES Personnel(emp_num)
ON UPDATE CASCADE
ON DELETE CASCADE,
activity_date DATETIME NOT NULL,
PRIMARY KEY (emp_num, activity_date),
activity_code INTEGER NOT NULL
CHECK (activity_code IN (...)),
..); ,

I am assuming one activity per day; if not this can be modified by
adding an activity number. You might need other attributes, but I
don't know you business rules. A single fact should be modeled witha
single row in a single table, not split all over the schema.

>> I am trying to come up with a query that will iterate through each employee with the emp_num primary key in the emp_shcedule table. I would like the iteration to check if the employee has an exception or "change" from their regular default schedule in the table emp_schedule checking it against the "SB3_ScheduleChange" table. <<

Why not go to the PersonnelSchedules and make this change, without
storing it in multiple locations? If you need to mark something as a
schedule change, then add a column for that attribute to the table.

When you get a new guy, fill out his schedule for 5-10 years in
advance. It will use less disk space than a hi-res employee badge
photograph. Now you can use a VIEW to get the weekly schedules, you
can do manpower projections, etc.

Most queries are easy if you have the right DDL. You should never use
a cursor in your SQL; we had to in the old days because we did not have
CTEs, CASE expressions, etc.

complex(?) query

SQL Apprentice Question
I'm a novice sql writer and need some help in writing a query to
extract applicable data from the following table (titled EMPLOYEE):

--
ID_NUMBER CODE DATE
------------------ --------- --------
12 VO 20060914
12 XD 20060913
12 AD 20060912
12 WR 20060911
12 AT 20060910
45 VO 20060914
45 XR 20060913
45 AT 20060912
45 AD 20060911
45 AT 20060910
78 AD 20060914
78 AT 20060913
78 VO 20060912
78 AD 20060911
78 AT 20060910

I need to select ID_NUMBER
from EMPLOYEE
where CODE = 'VO'

caveat: I only want the ID_NUMBER(s) where the CODE = 'VO'
and the previous CODE (by DATE) = 'AD'
or the previous CODE (by DATE) = 'AD' with any CODE in between
except 'AT';

E.g., in the above example, the appropriate code should select
ID_NUMBER(s) 12 and 78 because
1. a VO code exists
2. an AD code (by DATE) precedes it
3. although 'AD' does not come immediately before 'VO' (in the
case of ID_NUMBER 12) 'AT' cannot be found in between

I hope I haven't confused anyone. Any help would be appreciated.

Celko Answers

>>I'm a novice sql writer and need some help in writing a query to extract applicable data from the following table (titled EMPLOYEE): <<

First, let's clean up your missing DDL. The table name should tell us
what set of entities is modeled in the table; do you really have one
employee? Small firm! Try Personnel -- the collective name of the set
or something that tells us what the set is. Code is too vague --
postal code? Date is both too vague *and* a reserved word. A name
like "id_number" is also uselessly general; emp_id would be a better
choice. Since you did not post DDL, we have to guess at constaints and
keys. A skeleton of what you need is something like this:

CREATE TABLE PersonnelActions
(emp_id INTEGER NOT NULL,
action_date action_dateTIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (emp_id, foobar_date),
action_code CHAR(2) NOT NULL
CHECK (foobar_code IN ('VO', 'XD'))
);

You need to read a book on data modeling and ISO-11179 rules for names.
I would also look up the use of UPPERCASE for names -- it is the worst
way to code, being about 8-12% harder to detect misspellings. That is
why books and newspapers use lowercase.

>> I only want the emp_id(s) where the action_code = 'VO'

and the previous action_code (by action_date) = 'AD'
or the previous action_code (by action_date) = 'AD' with any
action_code in between
except 'AT'; <<

SELECT DISTINCT emp_id
FROM PersonnelAction AS PVO,
PersonnelAction AS PAD
WHERE PVO.emp_id = PAD.emp_id
AND PVO.action_code = 'VO'
AND PAD.action_code = 'AD'
AND PAD.action_date < PVO.action_date
AND NOT EXISTS
(SELECT *
FROM PersonnelAction AS PAT
WHERE PAT.action_code = 'AT'
AND PAT.emp_id = PVO.emp_id
AND PAT_action_date BETWEEN PAD.action_date AND
PVO.action_date);

one to one random mapping between int

SQL Apprentice Question
I need to one-to-one map a range of integer (A), say 1 to 10000, randomly to
1 to 10000 (B). And I need to get A if a B is given. Any existed good
algorithm in T-SQL?

Celko Answers
Here is an implementation of the additive congruential method of
generating values in pseudo-random order and is due to Roy Hann of
Rational Commerce Limited, a CA-Ingres consulting firm. It is based on
a shift-register and an XOR-gate, and it has its origins in
cryptography. While there are other ways to do this, this code is nice
because:

1) The algorithm can be written in C or another low level language for
speed. But math is fairly simple even in base ten.

2) The algorithm tends to generate successive values that are (usually)
"far apart", which is handy for improving the performance of tree
indexes. You will tend to put data on separate physical data pages in
storage.

3) The algorithm does not cycle until it has generated every possible
value, so we don't have to worry about duplicates. Just count how many
calls have been made to the generator.

4) The algorithm produces uniformly distributed values, which is a nice
mathematical property to have. It also does not include zero.

Generalizing the algorithm to arbitrary binary word sizes, and
therefore longer number sequences, is not as easy as you might think.
Finding the "tap" positions where bits are extracted for feedback
varies according to the word-size in an extremely non-obvious way.
Choosing incorrect tap positions results in an incomplete and usually
very short cycle, which is unusable. If you want the details and tap
positions for words of one to 100 bits, see E. J. Watson, "Primitive
Polynomials (Mod 2)", Mathematics of Computation, v.16, 1962,
p.368-369. Here is code for a 31-bit integer, which you can use:

see the details at:

http://www.rationalcommerce.com/resources/surrogates.htm

UPDATE generator31
SET keyval
= keyval/2 + MOD(MOD(keyval, 2) + MOD(keyval/2, 2), 2) * 8;

T-SQL version:

CREATE TABLE Generator31 (keyval INTEGER NOT NULL);
INSERT INTO Generator31 VALUES (1); -- any start value

UPDATE Generator31
SET keyval =
keyval/2 + ((keyval % 2) + (keyval/8 % 2) % 2)* 8
SELECT * FROM Generator31;

Or if you prefer, the algorithm in C:

int Generator31 ()
{static int n = 1;
n = n >> 1 ((n^n >> 3) & 1) << 30;
return n;

oining on "between x and y"

SQL Apprentice Question
I have these tables:

Sales (SaleId, ProductId, QTY)
Bandwidth (ProductId, RangeFrom, RangeTo, Price)

A RangeTo field is included in the Bandwidth table cause an "open end"
is possible; 1-10, 11-20, 21>

Now i would like to get a table like:

SalesView(SaleId, ProductId, Qty, Price)

Where price has to be NULL if it cannot be matched in the bandwith
table; i always need all the records from the Sales table...

I tried something like this

SELECT s.SaleId, s.ProductId, s.QTY, s.QTY*b.Price AS Price
FROM tblSales s
INNER JOIN tblBandwidth b
ON b.ProductId = s.ProductID
AND
(
s.QTY BETWEEN b.RangeFrom AND b.RangeTo
OR
(s.QTY >= b.RangeFrom AND b.RangeTo IS NULL)
)

With inner join i don't get all sales records (there out of range),
with outer join i get them doubled or more (and i don't want to use a
DISTINCT)

Any great ideas?

Celko Answers

>> A RangeTo field [sic] is included in the Bandwidth table cause an "open end" is possible; 1-10, 11-20, 21 <<

That usually means that the price for that range holds for all
quantities over the from range value. In your example, any ordr over
21 units gets that same unit price. But then you say:

>> price has to be NULL if it cannot be matched in the bandwith table; i always need all the records [sic] from the Sales table... <<

You can use DRI to force all sales to be of actual products you stock,
and for every bandwidth to reference a real product. You should not
have any missing data if you design the schema properly. Here is a
guess at what you want:

CREATE TABLE Sales
(sale_id INTEGER NOT NULL PRIMARY KEY,
product_id INTEGER NOT NULL
REFERENCES Products(product_id),
sold_qty INTEGER NOT NULL
CHECK(sold_qty > 0));

CREATE TABLE Bandwidth -- weird name, why not Discounts?
(product_id INTEGER NOT NULL
REFERENCES Products(product_id),
from_range INTEGER DEFAULT 1 NOT NULL
CHECK (from_range > 0),
to_range INTEGER,
CHECK(from_range <= to_range),
unit_price DECIMAL (10,4) NOT NULL,
PRIMARY KEY (product_id, from_range) );

If you have a single price for an item, then the DEFAULT values will
give you a (1, NULL) row in the Bandwidth table. Hide the work in a
VIEW that will always be current:

CREATE VIEW SalesRevenue(sale_id, product_id, sale_qty, sale_total)
AS
SELECT S.sale_id, S.product_id, S.sale_qty,
S.sale_qty*B.unit_price
FROM Sales AS S, Bandwidth AS B
WHERE B.product_id = S.product_id
AND S.sale_qty BETWEEN
B.from_range AND COALESCE (B.to_range, S.sale_qty);

The COALESCE() will handle any quantity not explicitly in the Bandwidth
table.

SQL Apprentice Question
guess i didn't explain myself in such a good way. I might also used
some strange descriptions like bandwidth, probably cause english is not
my native language. But, let's not blame it on that; in the situation i
have now, referential integrity by the database is not possible, for as
far as i know, mainly because of:

* the actual quantity for 1 line item (i will use this term instead of
sale) is determined by a group of line items (if 1 is updated the whole
lot can switch into another range)
* the bandwidth/ranges/discounts may have a closed range: "1-10, 11-20,
21-30" (dont ask me why, commercial reasons probably)
* the line items are gathered over a period of time; at time of closing
the period the definete prices/tariff should be set; the system should
indicate where bandwidth/ranges do not fit the quantities
* different 'pricelists' can be applied to a set of 'line items'/sales

this is the reason why i stated "Where price has to be NULL if it
cannot be matched in the bandwith
table; i always need all the records from the Sales table... ", but
maybe i should have used the word tariff, because i don't mean the
qty*tariff result here, just the tariff.

Celko Answers
>> * the actual quantity for 1 line item (i will use this term instead of sale) is determined by a group of line items (if 1 is updated the whole lot can switch into another range)<<

I would use a VIEW with a GROUP BY to build these groups.

>> * the bandwidth/ranges/discounts may have a closed range: "1-10, 11-20, 21-30" (dont ask me why, commercial reasons probably) <<

That is still strange to me, but okay ...

>> * the line items are gathered over a period of time; at time of closing the period the definite prices/tariff should be set; the system should indicate where bandwidth/ranges do not fit the quantities <<

And having a VIEW would let you keep the correct totals

>> * different 'pricelists' can be applied to a set of 'line items'/sales <<

I would put all the price lists into one table, with the ranges and
price list name ("A list" customers, "B list" customers, etc.)

Here is a slight re-write; see if it helps

CREATE TABLE Sales
(sale_id INTEGER NOT NULL PRIMARY KEY,
product_id INTEGER NOT NULL
REFERENCES Products(product_id),
sold_qty INTEGER NOT NULL
CHECK(sold_qty > 0));

CREATE TABLE Bandwidth -- weird name, why not Discounts?
(product_id INTEGER NOT NULL
REFERENCES Products(product_id),
from_range INTEGER DEFAULT 1 NOT NULL
CHECK (from_range > 0),
to_range INTEGER NOT NULL, -- fix this constraint
CHECK(from_range <= to_range),
unit_price DECIMAL (10,4) NOT NULL,
PRIMARY KEY (product_id, from_range) );

Use an OUTER JOIN to presrve the sales data.

CREATE VIEW SalesRevenue(sale_id, product_id, sale_qty, sale_total)
AS
SELECT S.sale_id, S.product_id, S.sale_qty,
S.sale_qty*B.unit_price
FROM Sales AS S
LEFT OUTER JOIN
Bandwidth AS B
ON B.product_id = S.product_id
AND S.sale_qty BETWEEN
B.from_range AND B.to_range;

query question...

SQL Apprentice Question
hi, not sure if db2 has a aggregate function to 'sum (concat)' a string
column in a group by?
i have a table like this:
myCategory, mySubCategory, myGroup, mySubGroup, myString
CatKey1, SubCatKey1, GrpKey1, SubGrpKey1, 'string1'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey2, 'string2'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey2, 'string3'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey1, 'string4'
......

i want to group CatKye, SubCatKey, GrpKey, SubGrpKey
so, the result set to be like this (in sorted order):
myCategory, mySubCategory, myGroup, mySubGroup, myString
CatKey1, SubCatKey1, GrpKey1, SubGrpKey1, 'string1,string4'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey2, 'string2,string3'

any idea? i heard there is something like ROWCONCAT... didn't find it
in db2 reference though.
thanks a lot.

Celko Answers
What you are looking for exists in Sybase as LIST() and one of the open
source products (Posttgres? I cannot remember). But that is not the
real question. Why do you wish to destroy First Normal Form (1NF) with
a concatendated list structure? It is the foundation of RDBMS, after
all.

Why are you formatting data in the back end? The basic principle of a
tiered architecture is that display is done in the front end and never
in the back end. This a more basic programming principle than just SQL
and RDBMS.

Yes, there are kludges in SQL to do this. You can also still write
procedural code with GOTOs and get "spaghetti code", but it does not
mean you should.

dynamic cursor - sorting in declaration

SQL Apprentice Question
I have a small table "ABC" like this:

id_position | value
---------------------------
1 | 11
2 | 22
3 | 33

I try to use a dynamic cursor as below.
When the statement "order by id_position" in declare part of the cursor_abc
is omitted - cursor work as it should.
But when the statement "order by id_position" is used, cursor behave as
static one.
What's the matter, does anybody know?

Code:

declare @id_position as int, @value as int

DECLARE cursor_abc CURSOR
FOR
select id_position, value from abc
order by id_position

set nocount on
open cursor_abc
FETCH NEXT FROM cursor_abc
INTO @id_position, @value

WHILE @@FETCH_STATUS = 0
BEGIN

print @id_position
print @value
print '----------------------------'

update abc set value=666 --next reading should give value=666

FETCH NEXT FROM cursor_abc
INTO @id_position, @value

END

CLOSE cursor_abc
DEALLOCATE cursor_abc
GO

Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

In many years of writing SQL, I have seldom found a need for a cursor.
They usually run 1-2 orders of magnitude slower than a relational
solution.

When someone uses one, it is generally becasue they are mimicing a
magnetic tape file system, and probably violating the basic principle
of a tiered architecture that display is done in the front end and
never in the back end. This a more basic programming principle than
just SQL and RDBMS.

Finally, id_position is not an ISO-11179 data element name and it makes
no sense. Identifier of what? Position of what? You have two
adjectives without a noun. But I bet you mant it to be PHYSICAL
location because you are mimicing a magnetic tape file system, instead
of using SQL for an RDBMS.

What is your real problem? Show us and perhaps we can help you.

Thursday, November 16, 2006

Totals Help

SQL Apprentice Question
Based on this query..... how can I total just the starred SYS_OP's produced
column Compute by will sum them all.
I want daily totals of just the starred operations but still list them all.

DECLARE @sDate SmallDateTime
DECLARE @sEnd SmallDateTime
DECLARE @sPart Char(15)
SET @sDate = '20061002'
SET @sEnd = '20061005'
set @sPart = '24277'

SET NOCOUNT ON

SELECT CONVERT(CHAR(10),C_DATE,101) as JT_DATE,
JT.DEPT, JT.PART, OP AS JT_OP,
CASE
WHEN LOD.LAST_OP IS NULL THEN ''
ELSE
'***'
END
AS SYS_OP,
SHIFT, C_COUNT AS PRODUCED
FROM Label_Audit..JOB_TICKETS_RAW JT
LEFT JOIN LABEL_AUDIT..V_LAST_OP_DEPT LOD
ON LOD.PART = JT.PART
AND JT.OP = LOD.LAST_OP
WHERE C_DATE BETWEEN @sDate AND @sEnd
AND JT.PART = @sPart
ORDER BY C_DATE, OP, SHIFT

JT_DATE DEPT PART OP SYS_OP SHIFT PRODUCED
---------- ---- --------------- ----- ------ ----- -----------
10/02/2006 8800 24277 020 1 134
10/03/2006 8800 24277 020 1 201
10/03/2006 8800 24277 020 2 405
10/03/2006 8800 24277 020 3 500
10/04/2006 8800 24277 020 1 625
10/04/2006 8800 24277 020 2 620
10/04/2006 8800 24277 020 3 100
10/04/2006 8800 24277 025 2 634
10/04/2006 8800 24277 025 3 950
10/04/2006 8800 24277 030 *** 1 891
10/04/2006 8800 24277 030 *** 2 634
10/04/2006 8800 24277 030 *** 3 950
10/04/2006 8800 24277 030A 1 891
10/05/2006 8800 24277 020 3 401
10/05/2006 8800 24277 025 3 821
10/05/2006 8800 24277 030 *** 3 821

Celko Answers
1) start using ISO-8601 formats for dates, not local dialect

2) start using valid data element names (part?? Unh?! part_name?
part_nbr? part_wgt?) Is c_date "completion_date?" or what? Write code
as if another human being has to maintain it after you win the lottery.

3) Real SQL programmers do not write proprietary, deprecated COMPUTE BY
clause. This is a decades old legacy kludge from the Sybase days when
there were no report writers. Welcome to the year 2006 (actually, this
was bad code in 1996). You ought to be doing this in a proper tool and
not trying to fake a 1950's break and total file system report.

4) Your spec does not tell us what to sum! Production, maybe? And
what do we group by? Here is a skeleton query:

SELECT ..
SUM (CASE WHEN LOD.last_op IS NULL THEN c_count ELSE 0 END) AS
star_stuff
SUM (CASE WHEN LOD.last_op IS NOT NULL THEN c_count ELSE 0 END) AS
other_stuff
FROM ..
WHERE ..
GROUP BY .. ;

sql statement

SQL Apprentice Question
What is wrong int this SQL statement?

select top 10 DOCInt.*, DOCDet.* , Cate.*, Arti.*, [Ar An].*
from
DOCInt INNER JOIN DOCDet ON DOCInt.CodDoc=DOCDet.CodDoc
LEFT JOIN Cate ON DOCDet.IDCategory=Cate.[ID Category]
LEFT JOIN Arti ON DOCDet.IDArti=Arti.[ID Arti]
INNER JOIN [Ar An] ON DOCInt.IDAnag=[Ar An].[ID An]
GROUP BY DOCInt.IDDoc

Celko Answers

>> What is wrong in this SQL statement? <<

Well, your names are a nightmare that violate common sense and
ISO-11179 rules, but for now, you do not understand how a SELECT (GROUP
BY in particular) work:

Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things, but the code has to produce the same
results.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The [table expression] AS [correlation name] option allows you
give a name to this working table which you then have to use for the
rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.

c) Go to the optional GROUP BY clause, partiton the original table
into groups and reduce each grouping to a *single* row, replacing the
original working table with the new grouped table. The rows of a
grouped table must be only group characteristics: (1) a grouping column
(2) a statistic about the group (i.e. aggregate functions) (3) a
function or constant(4) an expression made up of only those three
items. The original table no longer exists and you cannot reference
anything in it (this was an error in early Sybase products).

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can also give names to expressions in the SELECT list. These
new names come into existence all at once, but after the WHERE clause,
GROUP BY clause and HAVING clause have been executed; you cannot use
them in the SELECT list or the WHERE clause for that reason.

If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expression in it, or references to other columns because the
result set has been converted into a sequential file structure and that
is what is being sorted.

As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;

while these two statements return the same data:

SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;

Think about what a confused mess this statement is in the SQL model.

SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;

That is why such nonsense is illegal syntax.

Help me convince them this is bad!!!

SQL Apprentice Question
I have an IT manager that thinks he knows everything when it comes to
database design. Unfortunately, he's not familiar with relational
databases but thinks he knows it all. He's come up with a really crazy
design, and I've got to come up with all the reasons it's not right. I
want to have as much ammo as possible so we're not stuck supporting
something that will never perform well.

We're in the process of moving some functionality from a Z/OS DB2
database to a RS/6000 DB2. Currently the Z/OS database has a table
that has a four column primary key. We've determined that one of these
keys is not needed on the new system, which would leave us with a three
column key. However, the manager wants to make it a two column key, be
concatenating two of the source columns (separated by double commas)
into just one column on the new system. Searches can be done on either
of the source columns or both.

I know searching this will be horrific, with tons of LIKE statements,
and indexing will be next to useless. But I need as many reasons as
possible to not take this approach. I don't want to end up being
blamed for poor performance down the road.

Celko Answers
>> He wants to make this a generic table, that can be used for different types of data in the future. <<

Pretty clearly the concatenated column is a total violation of 1NF.
Likewise, a "generic table" a violation of logic and RDBMS. I call
such things "Britney Spears, Automobiles and Squid" tables to show that
they cannot even have a basic ISO-11179 data element name. "To be is
to be something in particular; to be nothing in particular is to be
nothing at all" -- Aristotle

Here is a little "cut *& paste" I give people who want ot do a EAV
design. Someone like your boss posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);

INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');

CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I think
I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is an answer From: Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue

FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all plopped into the same table. There should be separate tables
for Locations and Events.

The column names are seriously painful Don't use "key" and "value" for
column names. It means that the developer *has* surround the column
name with double quotes for everything which is a serious pain.

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.

Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.

Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

Best way to force a varchar column to have no whitespace

SQL Apprentice Question
I have a column that I do not want any whitespace in whatsoever. I'm
wondering how do enforce this a DDL level instead of in 40 million
seat-of-the-pants after-the-fact computer programs accessing the
database.

Celko Answers
CONSTRAINT no_white_space
CHECK ( LEW(foo) = LEN (REPLACE (foo, ' ', '')))

You can then nest calls to REPLACE() for tabs,newlines, etc. easily.

Joe Celko The SQL Apprentice