SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Friday, November 24, 2006

While loop?

SQL Apprentice Question
So I am just starting to learn how to use Database. I need the help of a SQL
guru! I have these three tables:
CREATE TABLE SB3_ScheduleChange
(
emp_num EmpNumType:nvarchar(7) NOT NULL PRIMARY KEY,
Monday char(35) NULL,
Tuesday char(35) NULL,
Wednesday char(35) NULL,
Thursday char(35) NULL,
Friday char(35) NULL,
Saturday char(35) NULL,
Sunday char(35) NULL,
Name char(35) NOT NULL;
)

CREATE TABLE SB3_ScheduleChange
(
emp_num EmpNumType:nvarchar(7) NOT NULL PRIMARY KEY,
sched_type_id nt, NOT NULL,
start_date DateType:datetime NOT NULL,
end_date DateType:datetime NOT NULL,
comments varchar(8000) NULL,
)


CREATE TABLE SB3_ScheduleType
(
sched_type_id int, NOT NULL PRIMARY KEY,
reason char(20) NULL,


)


I am trying to come up with a query that will iterate through each employee
with the emp_num primary key in the emp_shcedule table. I would like the
iteration to check if the employee has an exception or "change" from their
regular default schedule in the table emp_schedule checking it against the
"SB3_ScheduleChange" table. And if the query finds a difference in schedule
to update or insert it into that employees default schedule until the change
is over on a certain date. My problem is I am new to SQL and I dont know how
to do this. Do I need a Calendar table? Use a while loop? Any examples,
help, or suggestions would be very much appreciated. Not that this will help
but here are some queries I have come up with. Just dont know how to tie it
all together.


This would list name, emp_num, sched_type_id, for each employee who has
A schedule type difference. Query shows every employee with a schedule type
difference.


SELECT es.Name, es.emp_num, sc.sched_type_id
FROM emp_schedule AS es JOIN SB3_ScheduleChange AS sc
ON(es.emp_num = sc.emp_num)


List name, emp_num of employees that are not assigned to any schedule
change. Query shows every employee without a schedule type difference.


SELECT name, emp_num
FROM emp_schedule AS es
WHERE NOT EXISTS
(SELECT emp_num
FROM SB3_ScheduleChange AS sc
WHERE es.emp_num = sc.emp_num);


And if necessary a Calendar CREATE TABLE. But once again I am not sure.
CREATE TABLE dbo.SB3_Calender
(
ActualDate DATETIME NOT NULL PRIMARY KEY,
MonthName CHAR(15) NULL,
DayNumber INT NULL,
YearNumber INT NULL,
DayOfWeek CHAR(15) NULL
CHECK (DayOfWeek IN ('Sunday', 'Monday', 'Tuesday',
'Wednesday','Thursday','Friday','Saturday')),
DayType CHAR(15) NULL
CHECK ( DayType IN ('Business','Weekend','Holiday')),
)


GO


SET NOCOUNT ON
DECLARE @Counter INT
DECLARE @ActualDate DATETIME
DECLARE @FirstDate DATETIME
SET @Counter = 1
SET @FirstDate = '1/1/2006'
SET @ActualDate = @FirstDate
WHILE @Counter < 1096
BEGIN
INSERT INTO Calender (ActualDate)
values(@ActualDate)
SET @ActualDate = DATEADD(day, @Counter, @FirstDate)
SET @Coutner = @Counter + 1
END


GO


UPDATE Calender
SET DayOfWeek = DateName(DW, ActualDate)


GO


UPDATE Calender
SET DayNumber = DateName(DD,ActualDate)


GO


UPDATE Calender
SET MonthName = DateName(MM,ActualDate)


GO


UPDATE Calender
SET YearNumber = DateName(YY,ActualDate)


GO


UPDATE Calender
SET DayType = 'Business'
WHERE DayOfWeek <> 'Saturday' AND DayOfWeek <> 'Sunday'


GO


UPDATE Calender
SET DayType = 'Weekend'
WHERE DayOfWeek = 'Saturday' OR DayOfWeek = 'Sunday'


GO


UPDATE Calender
SET DayType = 'Holiday'
WHERE (MonthName ='January' AND DayNumber = 1 ) OR
(MonthName ='July' AND DayNumber = 4) OR
(MonthName ='December' AND DayNumber = 25)


GO



Celko Answers

>> So I am just starting to learn how to use Database. <<


1) What you are doing is mimicing a paper schedule form, not designing
a normalized schema.

2) data element names like the days of the week are values and not
attributes. Tha tis the output of a cross tabs report, not a table.


3) Column names like "sched_type_id" are absurd; an attribute can a key
(identifier, points to a unique *entity* in the data model) or a type
(non-key *attribute*, holds a value) but NEVER both.


4) You are still thinking about procedural code instead of declarative
programming and data. YOu have designed a magentic tape file merge,
complete with record at a time processing. Welcom to 1957!


Why not something like this?


CREATE TABLE PersonnelSchedules
(emp_num CHAR(7) NOT NULL
REFERENCES Personnel(emp_num)
ON UPDATE CASCADE
ON DELETE CASCADE,
activity_date DATETIME NOT NULL,
PRIMARY KEY (emp_num, activity_date),
activity_code INTEGER NOT NULL
CHECK (activity_code IN (...)),
..); ,


I am assuming one activity per day; if not this can be modified by
adding an activity number. You might need other attributes, but I
don't know you business rules. A single fact should be modeled witha
single row in a single table, not split all over the schema.



>> I am trying to come up with a query that will iterate through each employee with the emp_num primary key in the emp_shcedule table. I would like the iteration to check if the employee has an exception or "change" from their regular default schedule in the table emp_schedule checking it against the "SB3_ScheduleChange" table. <<


Why not go to the PersonnelSchedules and make this change, without
storing it in multiple locations? If you need to mark something as a
schedule change, then add a column for that attribute to the table.

When you get a new guy, fill out his schedule for 5-10 years in
advance. It will use less disk space than a hi-res employee badge
photograph. Now you can use a VIEW to get the weekly schedules, you
can do manpower projections, etc.


Most queries are easy if you have the right DDL. You should never use
a cursor in your SQL; we had to in the old days because we did not have
CTEs, CASE expressions, etc.

complex(?) query

SQL Apprentice Question
I'm a novice sql writer and need some help in writing a query to
extract applicable data from the following table (titled EMPLOYEE):


--
ID_NUMBER CODE DATE
------------------ --------- --------
12 VO 20060914
12 XD 20060913
12 AD 20060912
12 WR 20060911
12 AT 20060910
45 VO 20060914
45 XR 20060913
45 AT 20060912
45 AD 20060911
45 AT 20060910
78 AD 20060914
78 AT 20060913
78 VO 20060912
78 AD 20060911
78 AT 20060910


I need to select ID_NUMBER
from EMPLOYEE
where CODE = 'VO'


caveat: I only want the ID_NUMBER(s) where the CODE = 'VO'
and the previous CODE (by DATE) = 'AD'
or the previous CODE (by DATE) = 'AD' with any CODE in between
except 'AT';


E.g., in the above example, the appropriate code should select
ID_NUMBER(s) 12 and 78 because
1. a VO code exists
2. an AD code (by DATE) precedes it
3. although 'AD' does not come immediately before 'VO' (in the
case of ID_NUMBER 12) 'AT' cannot be found in between


I hope I haven't confused anyone. Any help would be appreciated.



Celko Answers

>>I'm a novice sql writer and need some help in writing a query to extract applicable data from the following table (titled EMPLOYEE): <<


First, let's clean up your missing DDL. The table name should tell us
what set of entities is modeled in the table; do you really have one
employee? Small firm! Try Personnel -- the collective name of the set
or something that tells us what the set is. Code is too vague --
postal code? Date is both too vague *and* a reserved word. A name
like "id_number" is also uselessly general; emp_id would be a better
choice. Since you did not post DDL, we have to guess at constaints and
keys. A skeleton of what you need is something like this:

CREATE TABLE PersonnelActions
(emp_id INTEGER NOT NULL,
action_date action_dateTIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (emp_id, foobar_date),
action_code CHAR(2) NOT NULL
CHECK (foobar_code IN ('VO', 'XD'))
);


You need to read a book on data modeling and ISO-11179 rules for names.
I would also look up the use of UPPERCASE for names -- it is the worst
way to code, being about 8-12% harder to detect misspellings. That is
why books and newspapers use lowercase.



>> I only want the emp_id(s) where the action_code = 'VO'


and the previous action_code (by action_date) = 'AD'
or the previous action_code (by action_date) = 'AD' with any
action_code in between
except 'AT'; <<

SELECT DISTINCT emp_id
FROM PersonnelAction AS PVO,
PersonnelAction AS PAD
WHERE PVO.emp_id = PAD.emp_id
AND PVO.action_code = 'VO'
AND PAD.action_code = 'AD'
AND PAD.action_date < PVO.action_date
AND NOT EXISTS
(SELECT *
FROM PersonnelAction AS PAT
WHERE PAT.action_code = 'AT'
AND PAT.emp_id = PVO.emp_id
AND PAT_action_date BETWEEN PAD.action_date AND
PVO.action_date);

one to one random mapping between int

SQL Apprentice Question
I need to one-to-one map a range of integer (A), say 1 to 10000, randomly to
1 to 10000 (B). And I need to get A if a B is given. Any existed good
algorithm in T-SQL?

Celko Answers
Here is an implementation of the additive congruential method of
generating values in pseudo-random order and is due to Roy Hann of
Rational Commerce Limited, a CA-Ingres consulting firm. It is based on
a shift-register and an XOR-gate, and it has its origins in
cryptography. While there are other ways to do this, this code is nice
because:

1) The algorithm can be written in C or another low level language for
speed. But math is fairly simple even in base ten.


2) The algorithm tends to generate successive values that are (usually)
"far apart", which is handy for improving the performance of tree
indexes. You will tend to put data on separate physical data pages in
storage.


3) The algorithm does not cycle until it has generated every possible
value, so we don't have to worry about duplicates. Just count how many
calls have been made to the generator.


4) The algorithm produces uniformly distributed values, which is a nice
mathematical property to have. It also does not include zero.


Generalizing the algorithm to arbitrary binary word sizes, and
therefore longer number sequences, is not as easy as you might think.
Finding the "tap" positions where bits are extracted for feedback
varies according to the word-size in an extremely non-obvious way.
Choosing incorrect tap positions results in an incomplete and usually
very short cycle, which is unusable. If you want the details and tap
positions for words of one to 100 bits, see E. J. Watson, "Primitive
Polynomials (Mod 2)", Mathematics of Computation, v.16, 1962,
p.368-369. Here is code for a 31-bit integer, which you can use:


see the details at:


http://www.rationalcommerce.com/resources/surrogates.htm

UPDATE generator31
SET keyval
= keyval/2 + MOD(MOD(keyval, 2) + MOD(keyval/2, 2), 2) * 8;


T-SQL version:


CREATE TABLE Generator31 (keyval INTEGER NOT NULL);
INSERT INTO Generator31 VALUES (1); -- any start value


UPDATE Generator31
SET keyval =
keyval/2 + ((keyval % 2) + (keyval/8 % 2) % 2)* 8
SELECT * FROM Generator31;


Or if you prefer, the algorithm in C:


int Generator31 ()
{static int n = 1;
n = n >> 1 ((n^n >> 3) & 1) << 30;
return n;

oining on "between x and y"

SQL Apprentice Question
I have these tables:


Sales (SaleId, ProductId, QTY)
Bandwidth (ProductId, RangeFrom, RangeTo, Price)


A RangeTo field is included in the Bandwidth table cause an "open end"
is possible; 1-10, 11-20, 21>


Now i would like to get a table like:


SalesView(SaleId, ProductId, Qty, Price)


Where price has to be NULL if it cannot be matched in the bandwith
table; i always need all the records from the Sales table...


I tried something like this


SELECT s.SaleId, s.ProductId, s.QTY, s.QTY*b.Price AS Price
FROM tblSales s
INNER JOIN tblBandwidth b
ON b.ProductId = s.ProductID
AND
(
s.QTY BETWEEN b.RangeFrom AND b.RangeTo
OR
(s.QTY >= b.RangeFrom AND b.RangeTo IS NULL)
)


With inner join i don't get all sales records (there out of range),
with outer join i get them doubled or more (and i don't want to use a
DISTINCT)


Any great ideas?



Celko Answers

>> A RangeTo field [sic] is included in the Bandwidth table cause an "open end" is possible; 1-10, 11-20, 21 <<


That usually means that the price for that range holds for all
quantities over the from range value. In your example, any ordr over
21 units gets that same unit price. But then you say:


>> price has to be NULL if it cannot be matched in the bandwith table; i always need all the records [sic] from the Sales table... <<


You can use DRI to force all sales to be of actual products you stock,
and for every bandwidth to reference a real product. You should not
have any missing data if you design the schema properly. Here is a
guess at what you want:

CREATE TABLE Sales
(sale_id INTEGER NOT NULL PRIMARY KEY,
product_id INTEGER NOT NULL
REFERENCES Products(product_id),
sold_qty INTEGER NOT NULL
CHECK(sold_qty > 0));


CREATE TABLE Bandwidth -- weird name, why not Discounts?
(product_id INTEGER NOT NULL
REFERENCES Products(product_id),
from_range INTEGER DEFAULT 1 NOT NULL
CHECK (from_range > 0),
to_range INTEGER,
CHECK(from_range <= to_range),
unit_price DECIMAL (10,4) NOT NULL,
PRIMARY KEY (product_id, from_range) );


If you have a single price for an item, then the DEFAULT values will
give you a (1, NULL) row in the Bandwidth table. Hide the work in a
VIEW that will always be current:


CREATE VIEW SalesRevenue(sale_id, product_id, sale_qty, sale_total)
AS
SELECT S.sale_id, S.product_id, S.sale_qty,
S.sale_qty*B.unit_price
FROM Sales AS S, Bandwidth AS B
WHERE B.product_id = S.product_id
AND S.sale_qty BETWEEN
B.from_range AND COALESCE (B.to_range, S.sale_qty);


The COALESCE() will handle any quantity not explicitly in the Bandwidth
table.


SQL Apprentice Question
guess i didn't explain myself in such a good way. I might also used
some strange descriptions like bandwidth, probably cause english is not
my native language. But, let's not blame it on that; in the situation i
have now, referential integrity by the database is not possible, for as
far as i know, mainly because of:


* the actual quantity for 1 line item (i will use this term instead of
sale) is determined by a group of line items (if 1 is updated the whole
lot can switch into another range)
* the bandwidth/ranges/discounts may have a closed range: "1-10, 11-20,
21-30" (dont ask me why, commercial reasons probably)
* the line items are gathered over a period of time; at time of closing
the period the definete prices/tariff should be set; the system should
indicate where bandwidth/ranges do not fit the quantities
* different 'pricelists' can be applied to a set of 'line items'/sales


this is the reason why i stated "Where price has to be NULL if it
cannot be matched in the bandwith
table; i always need all the records from the Sales table... ", but
maybe i should have used the word tariff, because i don't mean the
qty*tariff result here, just the tariff.

Celko Answers
>> * the actual quantity for 1 line item (i will use this term instead of sale) is determined by a group of line items (if 1 is updated the whole lot can switch into another range)<<


I would use a VIEW with a GROUP BY to build these groups.


>> * the bandwidth/ranges/discounts may have a closed range: "1-10, 11-20, 21-30" (dont ask me why, commercial reasons probably) <<


That is still strange to me, but okay ...


>> * the line items are gathered over a period of time; at time of closing the period the definite prices/tariff should be set; the system should indicate where bandwidth/ranges do not fit the quantities <<


And having a VIEW would let you keep the correct totals


>> * different 'pricelists' can be applied to a set of 'line items'/sales <<


I would put all the price lists into one table, with the ranges and
price list name ("A list" customers, "B list" customers, etc.)

Here is a slight re-write; see if it helps


CREATE TABLE Sales
(sale_id INTEGER NOT NULL PRIMARY KEY,
product_id INTEGER NOT NULL
REFERENCES Products(product_id),
sold_qty INTEGER NOT NULL
CHECK(sold_qty > 0));


CREATE TABLE Bandwidth -- weird name, why not Discounts?
(product_id INTEGER NOT NULL
REFERENCES Products(product_id),
from_range INTEGER DEFAULT 1 NOT NULL
CHECK (from_range > 0),
to_range INTEGER NOT NULL, -- fix this constraint
CHECK(from_range <= to_range),
unit_price DECIMAL (10,4) NOT NULL,
PRIMARY KEY (product_id, from_range) );


Use an OUTER JOIN to presrve the sales data.


CREATE VIEW SalesRevenue(sale_id, product_id, sale_qty, sale_total)
AS
SELECT S.sale_id, S.product_id, S.sale_qty,
S.sale_qty*B.unit_price
FROM Sales AS S
LEFT OUTER JOIN
Bandwidth AS B
ON B.product_id = S.product_id
AND S.sale_qty BETWEEN
B.from_range AND B.to_range;

query question...

SQL Apprentice Question
hi, not sure if db2 has a aggregate function to 'sum (concat)' a string
column in a group by?
i have a table like this:
myCategory, mySubCategory, myGroup, mySubGroup, myString
CatKey1, SubCatKey1, GrpKey1, SubGrpKey1, 'string1'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey2, 'string2'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey2, 'string3'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey1, 'string4'
......

i want to group CatKye, SubCatKey, GrpKey, SubGrpKey
so, the result set to be like this (in sorted order):
myCategory, mySubCategory, myGroup, mySubGroup, myString
CatKey1, SubCatKey1, GrpKey1, SubGrpKey1, 'string1,string4'
CatKey1, SubCatKey1, GrpKey1, SubGrpKey2, 'string2,string3'


any idea? i heard there is something like ROWCONCAT... didn't find it
in db2 reference though.
thanks a lot.



Celko Answers
What you are looking for exists in Sybase as LIST() and one of the open
source products (Posttgres? I cannot remember). But that is not the
real question. Why do you wish to destroy First Normal Form (1NF) with
a concatendated list structure? It is the foundation of RDBMS, after
all.

Why are you formatting data in the back end? The basic principle of a
tiered architecture is that display is done in the front end and never
in the back end. This a more basic programming principle than just SQL
and RDBMS.


Yes, there are kludges in SQL to do this. You can also still write
procedural code with GOTOs and get "spaghetti code", but it does not
mean you should.

dynamic cursor - sorting in declaration

SQL Apprentice Question
I have a small table "ABC" like this:


id_position | value
---------------------------
1 | 11
2 | 22
3 | 33


I try to use a dynamic cursor as below.
When the statement "order by id_position" in declare part of the cursor_abc
is omitted - cursor work as it should.
But when the statement "order by id_position" is used, cursor behave as
static one.
What's the matter, does anybody know?


Code:


declare @id_position as int, @value as int


DECLARE cursor_abc CURSOR
FOR
select id_position, value from abc
order by id_position


set nocount on
open cursor_abc
FETCH NEXT FROM cursor_abc
INTO @id_position, @value


WHILE @@FETCH_STATUS = 0
BEGIN


print @id_position
print @value
print '----------------------------'


update abc set value=666 --next reading should give value=666


FETCH NEXT FROM cursor_abc
INTO @id_position, @value


END


CLOSE cursor_abc
DEALLOCATE cursor_abc
GO



Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

In many years of writing SQL, I have seldom found a need for a cursor.
They usually run 1-2 orders of magnitude slower than a relational
solution.


When someone uses one, it is generally becasue they are mimicing a
magnetic tape file system, and probably violating the basic principle
of a tiered architecture that display is done in the front end and
never in the back end. This a more basic programming principle than
just SQL and RDBMS.


Finally, id_position is not an ISO-11179 data element name and it makes
no sense. Identifier of what? Position of what? You have two
adjectives without a noun. But I bet you mant it to be PHYSICAL
location because you are mimicing a magnetic tape file system, instead
of using SQL for an RDBMS.


What is your real problem? Show us and perhaps we can help you.

Thursday, November 16, 2006

Totals Help

SQL Apprentice Question
Based on this query..... how can I total just the starred SYS_OP's produced
column Compute by will sum them all.
I want daily totals of just the starred operations but still list them all.

DECLARE @sDate SmallDateTime
DECLARE @sEnd SmallDateTime
DECLARE @sPart Char(15)
SET @sDate = '20061002'
SET @sEnd = '20061005'
set @sPart = '24277'


SET NOCOUNT ON


SELECT CONVERT(CHAR(10),C_DATE,101) as JT_DATE,
JT.DEPT, JT.PART, OP AS JT_OP,
CASE
WHEN LOD.LAST_OP IS NULL THEN ''
ELSE
'***'
END
AS SYS_OP,
SHIFT, C_COUNT AS PRODUCED
FROM Label_Audit..JOB_TICKETS_RAW JT
LEFT JOIN LABEL_AUDIT..V_LAST_OP_DEPT LOD
ON LOD.PART = JT.PART
AND JT.OP = LOD.LAST_OP
WHERE C_DATE BETWEEN @sDate AND @sEnd
AND JT.PART = @sPart
ORDER BY C_DATE, OP, SHIFT


JT_DATE DEPT PART OP SYS_OP SHIFT PRODUCED
---------- ---- --------------- ----- ------ ----- -----------
10/02/2006 8800 24277 020 1 134
10/03/2006 8800 24277 020 1 201
10/03/2006 8800 24277 020 2 405
10/03/2006 8800 24277 020 3 500
10/04/2006 8800 24277 020 1 625
10/04/2006 8800 24277 020 2 620
10/04/2006 8800 24277 020 3 100
10/04/2006 8800 24277 025 2 634
10/04/2006 8800 24277 025 3 950
10/04/2006 8800 24277 030 *** 1 891
10/04/2006 8800 24277 030 *** 2 634
10/04/2006 8800 24277 030 *** 3 950
10/04/2006 8800 24277 030A 1 891
10/05/2006 8800 24277 020 3 401
10/05/2006 8800 24277 025 3 821
10/05/2006 8800 24277 030 *** 3 821


Celko Answers
1) start using ISO-8601 formats for dates, not local dialect

2) start using valid data element names (part?? Unh?! part_name?
part_nbr? part_wgt?) Is c_date "completion_date?" or what? Write code
as if another human being has to maintain it after you win the lottery.


3) Real SQL programmers do not write proprietary, deprecated COMPUTE BY
clause. This is a decades old legacy kludge from the Sybase days when
there were no report writers. Welcome to the year 2006 (actually, this
was bad code in 1996). You ought to be doing this in a proper tool and
not trying to fake a 1950's break and total file system report.


4) Your spec does not tell us what to sum! Production, maybe? And
what do we group by? Here is a skeleton query:


SELECT ..
SUM (CASE WHEN LOD.last_op IS NULL THEN c_count ELSE 0 END) AS
star_stuff
SUM (CASE WHEN LOD.last_op IS NOT NULL THEN c_count ELSE 0 END) AS
other_stuff
FROM ..
WHERE ..
GROUP BY .. ;

sql statement

SQL Apprentice Question
What is wrong int this SQL statement?

select top 10 DOCInt.*, DOCDet.* , Cate.*, Arti.*, [Ar An].*
from
DOCInt INNER JOIN DOCDet ON DOCInt.CodDoc=DOCDet.CodDoc
LEFT JOIN Cate ON DOCDet.IDCategory=Cate.[ID Category]
LEFT JOIN Arti ON DOCDet.IDArti=Arti.[ID Arti]
INNER JOIN [Ar An] ON DOCInt.IDAnag=[Ar An].[ID An]
GROUP BY DOCInt.IDDoc


Celko Answers


>> What is wrong in this SQL statement? <<


Well, your names are a nightmare that violate common sense and
ISO-11179 rules, but for now, you do not understand how a SELECT (GROUP
BY in particular) work:

Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things, but the code has to produce the same
results.


a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The [table expression] AS [correlation name] option allows you
give a name to this working table which you then have to use for the
rest of the containing query.


b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.


c) Go to the optional GROUP BY clause, partiton the original table
into groups and reduce each grouping to a *single* row, replacing the
original working table with the new grouped table. The rows of a
grouped table must be only group characteristics: (1) a grouping column
(2) a statistic about the group (i.e. aggregate functions) (3) a
function or constant(4) an expression made up of only those three
items. The original table no longer exists and you cannot reference
anything in it (this was an error in early Sybase products).


d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.


e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can also give names to expressions in the SELECT list. These
new names come into existence all at once, but after the WHERE clause,
GROUP BY clause and HAVING clause have been executed; you cannot use
them in the SELECT list or the WHERE clause for that reason.


If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).


f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.


g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expression in it, or references to other columns because the
result set has been converted into a sequential file structure and that
is what is being sorted.


As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;


while these two statements return the same data:


SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;


Think about what a confused mess this statement is in the SQL model.


SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;


That is why such nonsense is illegal syntax.

Help me convince them this is bad!!!

SQL Apprentice Question
I have an IT manager that thinks he knows everything when it comes to
database design. Unfortunately, he's not familiar with relational
databases but thinks he knows it all. He's come up with a really crazy
design, and I've got to come up with all the reasons it's not right. I
want to have as much ammo as possible so we're not stuck supporting
something that will never perform well.

We're in the process of moving some functionality from a Z/OS DB2
database to a RS/6000 DB2. Currently the Z/OS database has a table
that has a four column primary key. We've determined that one of these
keys is not needed on the new system, which would leave us with a three
column key. However, the manager wants to make it a two column key, be
concatenating two of the source columns (separated by double commas)
into just one column on the new system. Searches can be done on either
of the source columns or both.


I know searching this will be horrific, with tons of LIKE statements,
and indexing will be next to useless. But I need as many reasons as
possible to not take this approach. I don't want to end up being
blamed for poor performance down the road.



Celko Answers
>> He wants to make this a generic table, that can be used for different types of data in the future. <<


Pretty clearly the concatenated column is a total violation of 1NF.
Likewise, a "generic table" a violation of logic and RDBMS. I call
such things "Britney Spears, Automobiles and Squid" tables to show that
they cannot even have a basic ISO-11179 data element name. "To be is
to be something in particular; to be nothing in particular is to be
nothing at all" -- Aristotle

Here is a little "cut *& paste" I give people who want ot do a EAV
design. Someone like your boss posted this:


CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);


INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');


CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );


INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');


Ideally, the result set of the query would be Location Event count
(headings if possible)


Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1


Also, if possible, another query would return this result set. (I think
I know how to do this one.)


Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1


Here is an answer From: Thomas Coleman


SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue


FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue


FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;


Is the same thing in a proper schema as:


SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;


The reason that I had to use so many subqueries is that those entities
are all plopped into the same table. There should be separate tables
for Locations and Events.


The column names are seriously painful Don't use "key" and "value" for
column names. It means that the developer *has* surround the column
name with double quotes for everything which is a serious pain.


There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.


"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole


All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.


Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.


Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!


Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

Best way to force a varchar column to have no whitespace

SQL Apprentice Question
I have a column that I do not want any whitespace in whatsoever. I'm
wondering how do enforce this a DDL level instead of in 40 million
seat-of-the-pants after-the-fact computer programs accessing the
database.

Celko Answers
CONSTRAINT no_white_space
CHECK ( LEW(foo) = LEN (REPLACE (foo, ' ', '')))

You can then nest calls to REPLACE() for tabs,newlines, etc. easily.

Monday, October 30, 2006

Algorithm Question

SQL Apprentice Question
I have user table named TBL_USER which has userid and username fields.

I have another table named TBL_PRODUCT which need user info in it


Which way you offer me to follow .


1. Having Userid Field in TBL_Product
or
2. Having Username field in TBL_PRODUCT


to have relationship between 2 tables.


Could you explain which way i should follow and wht i should follow ?
You might give a web url ..



Celko Answers
First, stop using uppercase names that violate ISO-11179 rules. Those
prefixes are redundant, mess up the data dictionary.


>> I have user table named TBL_USER which has userid and username field [sic] <<


Columns are not fields. The source of your design problem is that you
do not know what a table, coluymn or row are. Let's try guess at the
proper DDL, since you did not bother to post anything:

CREATE TABLE Users
(user_id INTEGER NOT NULL PRIMARY KEY
CHECK (<>,
user_name VARCHAR(35) NOT NULL,
etc);
.



>> I have another table named TBL_PRODUCT which need user info in it <<


NO! Why is a user attribute part of a set of products??

CREATE TABLE Products
(product_id INTEGER NOT NULL PRIMARY KEY
CHECK (<>,
product_name VARCHAR(35) NOT NULL,
etc.);
.



>> Which way you offer me to follow .. to have relationship between 2 tables. <<


Of course.

CREATE TABLE Purchases
(product_id INTEGER NOT NULL
REFERENCES Products(product_id)
ON UPDATE CASCADE
ON DELETE CASCADE, -- guess at the rules
user_id INTEGER NOT NULL
REFERENCES Users(user_id)
ON UPDATE CASCADE
ON DELETE CASCADE, -- guess at the rules
PRIMARY KEY (user_id, product_id),
etc.);



>> Could you explain which way I should follow and why i.. <<


Any book on basic data modeling will tell you. A table is not a file;
it represents a set of entities of the same kind and their attributes
or it models a relationship -- never both in one table.

SQL search returning duplicate values

SQL Apprentice Question
I am having some trouble. I am doing some db search tests before
moving from access to mysql/sql server. Before I get into the Full
Text Search battle.....

If I try to do the following and there are multiple stores who have the
same book, all values are returned, which make duplicate entries in the
results. How do I accomplish the following but get a single price
value? How do I get the lowest, highest, middle, n'th etc value?


SELECT DISTINCT tblBook.*, tblStore.Price
FROM tblBook INNER JOIN tblStore ON tblBook.ID = tblStore.BookID
WHERE (((tblBook.Description) Like '%value%'));


tblBook tblStore
======== ========
ID------------| ID
Value |------BookID
Description Price



Celko Answers

>> Any help would be greatly appreciated. <<


Why did you make the book price an attribute of a store instead of the
book??? Think about it.

Also, read something about ISO-11179 metadata rules, so you will stop
using things like "tblBook" (a scalar value because it is singular and
it belongs to a piece of furniture -- I would have modeled a set of
book titles in my schema). Why does the book_id attribute change names
from table to table? Did you know that the ISBN-13 is the standard
identifier for books?


In the real world, you would have several prices for a book (wholesale,
retail, discounted, etc.) and they would be with the book. The min and
max prices would be known and then you can use a weighted average to
get a "middle" value. That means you need a quantity at each price
point.


Your current design is too simple for your query. Perhaps you use a
discount scale based on volume, or classes like this:


CREATE TABLE BookSales -- plural name for a set!! no redundant prefixes
(isbn CHAR(13) NOT NULL PRIMARY KEY, -- industry standards
wholesale_price DECIMAL (8,4) NOT NULL,
wholesale_qty INTEGER DEFAULT 0 NOT NULL,
retail_price DECIMAL (8,4) NOT NULL,
retail_qty INTEGER DEFAULT 0 NOT NULL,
etc.);


Start over and get some help from someone who knows how to do the model.

Wednesday, October 18, 2006

Default IN

SQL Apprentice Question
I want to pass parameters to a sql query. For example:

WHERE ANS IN (" & MyString & ")


So if MyString = 'This', 'That', 'The Other'


It will populate the Where clause.


But I also want to be able to pass an empty string and have it return
all values for ANS.


Any ideas?


Celko Answers

>> I want to pass parameters to a sql query. For example: <<


This comes up all the time from Newbies who do not work with a compiled
language and have no idea what a parameter is. You will get a FAQ to
the standard kludges for this -- parsing routines without safety
checks, dynamic SQL that can crash, etc.

One answer is that T-SQL can handle up to 1024 parameters (REAL
parameters, not things parsed off a single string in procedural code)
and you will not need more than 100 in the real world. Suddenly, you
have all the power of the compiler to check data, to optimize, etc.


Another answer is to load a working table with a single column. The
bad news is that in the year 2006, SQL is missing basic parts of the
VALUES() clause that would make this very, very easy.

Using one stored procedure to update table created in another?

SQL Apprentice Question
I have a stored procedures which creates a new table and then runs a
specific query. The results of said query are then inserted into the
newly created table. Now, I have another stored procedure which
should, in theory, update that new table with extra data.


I thought I could just do an update in the second sp but I'm hitting
some problems. Mainly, it's telling me that there's an invalid column
name, error 207, although I copied and pasted the name so the spelling
should match. It tells me that "observation_report_id" is the invalid
column name.


If it;s not the spelling of the name that's wrong, then what else could
be it be? Is there something else I'm missing that could bring up the
207 error?


If I can't get this working, could I do it with parameters? How would
I go about passing the unique key for each row from sp1 to sp2?


Thanks for any help,
-M-


Code example:


CREATE PROCEDURE frm_CNE_BasicInfo_pt2
AS


/** run query and insert results **/
update tbl_CNE_BasicInfo
set mainname=[mainname],
heading=[heading],
title=[title],
where obsRepID=observation_report_id


Select obstable.n_number, obstable.taskname as taskname,
obstable.observation_report_id, obstable.stream_id, mainname.mainname,
heading.heading, title.title
from (select p.n_number, tt.name as taskname,
orr.observation_report_id, pp.stream_id
from
parent p, parentpast pp, pastor pr, observationreport orr, trequest tr,
ttype tt
where p.parent_id = pp.parent_id
and pp.pastor_id = pr.pastor_id
and p.record_status = 'A'
and p.parent_id = orr.parent_id
and p.parent_id = tr.parent_id
and orr.trequest_id = tr.trequest_id
and tr.ttype_id = tt.ttype_id
and pp.stream_id = tr.stream_id
)as obstable
left outer join
(select ao.observation_report_id, dt.name, ao.value as mainname
from
atomicobservation ao
inner join dtype dt
on dt.dtype_id=ao.dtype_id
where dt.name = 'mainname')
as mainname
on mainname.observation_report_id=obstable.observation_report_id


etc etc



Celko Answers

>> I have a stored procedure which creates a new table and then runs a specific query. The results of said query are then inserted into the newly created table. Now, I have another stored procedure which should, in theory, update that new table with extra data. <<


This is not how to write an RDBMS. The schema is supposed to be a data
model of some real world situation. Creating tables on the fly is like
elephants appearing out of the sky.

What you seem to be doing is mimicking a file system. Hang a scratch
tape, dump some data to it. In the next *procedural* step, update the
scratch tape. In declarative programming, we try to get the desired
results in one step.


Given column names like "heading" and "title", it looks like you are
formatting a report on the database instead of in your application.
File systems and 3GL programming languages blend data and application
code into the same module, but RDBMS is supposed to be a tiered
architecture.


You can probably kludge your way thru this, but I would re-think how
you code in SQL.

Multiple bit data type in a table

SQL Apprentice Question
We are going to have a table that store about 300 bit fields (answers to
about 300 yes/no question for a client). I am curious as to whether I need
to split them based on the size of each row. How are mulitiple bit fields
in a table stored in SQL server 2000? Also if each field allows null (or
does not allow), how would that change the size of each row?

Any insight, recommendation, advice would be appreciated.
Thanks



Celko Answers

>> a table that store about 300 bit fields [sic] (answers to about 300 yes/no question for a client). <<


Bits are a low-level assembly language things that are a bitch to do
anything high-level with -- liker stat analysis. Columns are nothing
like fields. You are still thinking in assembly language, not SQL.


>> Any insight, recommendation, advice would be appreciated. <<


Do not design questionaires this way. First of all, there are very few
independent yes/no questions. You are looking for dependencies in the
data; that is the whole point of gathering data.

There are unanswered questions -- people do that! There are
interdependent questions -- "Are you female? No. Have you had cancer of
the uterus? Yes" is clearly wrong and needs a N/A answer. This means a
decision table for the validation rules (you do have those, don't you?)
and flow of questions chart.


Frankly, you probably want to get a questionaire package like RaoSoft
and use it. But if you have to use SQL, then get a normalized schema
and use an encoding for the answers that can be expanded to several
values as needed. Consider multiple choice questions -- nobody like to
do 300 questions.

Friday, October 13, 2006

need help with insert into table from two other tables

SQL Apprentice Question
have a table with 3000 Client_id's.

I have another table with 56 different order types. The order types
primary key is two fields (search_id, item_id)


I need to insert into another table each client_id, along with each
unique instance (56 instances) of search_id, item_id.


So the new table would have 3000 (client_id's) x 56 (distinct order
types) = 168000 rows.


What would be the easiest to go about this?



Celko Answers
>> I have another table with 56 different order types. The order types primary key is two fields [sic] (search_id, item_id) I need to insert into another table each client_id, along with each unique instance (56 instances) of search_id, item_id. <<


The quick answer is to look up a CROSS JOIN. You can materialize it or
put it in a VIEW.

But a better question is why are you doing this? Does each row (NOT
field) actually represent a fact in the reality of the data model?
That is, does everyone really have all the order types? My guess would
be that you are printing out a form of some kind and want to show all
the options to the users to get them to check off what they really use.
Cross joins do not happen very often in the real world.

Question About Revenue

SQL Apprentice Question
I have a reports table in my database where i am storing revenue
collected for each particular month.

Table structure
ID Int(4)
Hotel Number char(6)
Month char(1)
Year char(4)


each row can be something like 1001, 1894, 1,2006


I need to write a query which should return text 'N/A' for months that dont
contain data. For example if march 2006 has no revenue then it should return
n/a.


Can someone please assist me?


Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.


>> I have a reports table in my database where i am storing revenue collected for each particular month. <<


Your narrative did not have a revenue column :) We will ignore the
fact that computing values and storing them in a table is a bad
programming practice for OLTP apps and should only be done with a data
warehouse app.

CREATE TABLE ComputableSummary
(hotel_nbr CHAR(6) NOT NULL
CHECK (??),
year_month CHAR(7) NOT NULL
CHECK (year_month LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9]'),
hotel_revenue DECIMAL(12,2) NOT NULL);


This next one can be part of a general Calendar table


CREATE TABLE ReportPeriods
( cal_date DATETIME NOT NULL PRIMARY KEY,
year_month CHAR(7) NOT NULL
CHECK (year_month LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9]'),
etc.);



>> I need to write a query which should return text 'N/A' for months that don't contain data. <<


SELECT S.hotel_nbr, R.year_month, S.hotel_revenue
FROM ReportPeriods AS R
LEFT OUTER JOIN
ComputableSummary AS S
ON R.year_month = S.year_month
WHERE .. ;

And then following the basic rules of a tiered architecture, you would
do the display in the front end for NULL revenues. However, if you
want to be a bad programmer, kludge it with:


COALESCE (CAST(S.hotel_revenue AS CHAR(12)), 'N/A') AS revenus

Update Priority Field

SQL Apprentice Question
I am trying to create a priority field to organize a list of tasks.
The following 3 fields are being used: EventID (PK, int, not null),
Name (nvarchar(50), not null), pri_ss (int, null).
It would be nice if the code did several things at once:
1. Automitically add items to the list with the max pri_ss number +
1.
(i.e. if you have 50 prioritized items in the list and you add
another one, it adds it as pri_ss = 51)
2. If I change the priority of an item I would like the priority of
all the numbers below it have 1 added to their initial number. (i.e.
if i change item 25 to item 5, i want item 5 to become item 6, and item
6 to become item 7, and so on)
My environment is SQL Server 2000, and i am somewhat fimiliar with.
Any code from SQL experts would be a great help!

Celko Answers

Here is an old posting about manipulating such numberings:

Given a motorpool with numbered parking spaces, you want to move the
automobiles around.


CREATE TABLE Motorpool
(parking_space INTEGER NOT NULL PRIMARY KEY
CHECK (parking_space > 0),
vin CHAR(17) NOT NULL);


Re-arrange the display order based on the parking_space column:


CREATE PROCEDURE SwapVehicles (@old_parking_space INTEGER,
@new_parking_space INTEGER)
AS
UPDATE Motorpool
SET parking_space
= CASE parking_space
WHEN @old_parking_space
THEN @new_parking_space
ELSE parking_space + SIGN(@old_parking_space - @new_pos)
END
WHERE parking_space BETWEEN @old_parking_space AND @new_parking_space
OR parking_space BETWEEN @new_parking_space AND @old_parking_space;


When you want to drop a few rows, remember to close the gaps with this:


CREATE PROCEDURE CloseMotorpoolGaps()
AS
UPDATE Motorpool
SET parking_space
= (SELECT COUNT (M1.parking_space)
FROM Motorpool AS M1
WHERE M1.parking_space <= Motorpool.parking_space);

Friday, October 06, 2006

append results of a simple select to table without a cursor?

SQL Apprentice Question
I supect it can be done but I don't see it

scenario:


User has Items and the first user is the Default user


I want all new Users to have the same list of items as the default user


Select * from Users
right join Items
on User.UserID = Items.UserID
where User.LocalID = 0 -- This is the default user


the above gets me the list of x items for the default Customer


Is it possible to append this list to Items for the new User, without using
a cursor?


Thanks



Celko Answers


>> I want all new users to have the same list of items as the default user <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. ISO-11179 prefers that you use collective or plural names for
tables because they are sets, unless they really do model single
entites. Here is a skeleton of what you need:

CREATE TABLE Users
(user_id INTEGER DEFAULT 0 NOT NULL PRIMARY KEY
CHECK (user_id >= 0),
user_name VARCHAR(35) DEFAULT '{{DEFAULT}}' NOT NULL, --usps
standard length
etc.)


--set up the default user with little known trick
INSERT INTO Users(user_id, user_name, ..)
DEFAULT VALUES;


--table of items needed
CREATE TABLE Items
(item_nbr INTEGER NOT NULL PRIMARY KEY, -- needs standard code
item_name VARCHAR(35) NOT NULL,
etc.);


-- ownership is a relation so it has its own table!
CREATE TABLE Ownership
(user_id INTEGER NOT NULL
REFERENCES Users (user_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
item_nbr INTEGER NOT NULL
REFERENCES Items(item_nbr)
ON DELETE CASCADE
ON UPDATE CASCADE,
PRIMARY KEY (user_id, item_name),
etc.);


-- Now you need to proc to add new users and get them a default package
to start
CREATE PROCEDURE AddNewGuy
(@new_user_id INTEGER, @new_user_name VARCHAR(35), etc.)
AS
BEGIN
INSERT INTO Users (user_id, user_name, etc.)
VALUES (@new_user_id, @new_user_name, etc.);


INSERT INTO Ownership
SELECT @new_user_id, O.item_nbr
FROM Ownership AS O
WHERE O.user_id = 0;
END;



>> Is it possible to append this list to Items for the new User, without using a cursor? <<


Of course. Why did you even think of using a cursor?

3 value logic. Why is SQL so special?

SQL Apprentice Question
3 value logic. Why is SQL so special?

what would be the consequences of NULL=NULL being true?



Celko Answers

>>, what would be the consequences of NULL=NULL being true?


NULL=NULL should not be true.
NULL=NULL should not be false.
NULL=NULL should not be UNKNOWN.
NULL=NULL should be NULL. <<

NULL is a missing *attribute* value; UNKNOWN is a *logical* value. The
first rule of NULLs is that they propagate. You can easily set up
contradictions that depend on the order evaluation when you have a
BOOLEAN data type. All SQLK data types must allow NULLs by definition.


NULL OR TRUE = NULL -- by definition
UNKNOWN OR TRUE = TRUE -- by definition


NULL AND TRUE = NULL -- by definition
UNKNOWN AND TRUE = UNKNOWN -- by definition


This is why we have the IS [NOT] [TRUE | FALSE |UNKNOWN]
predicate in SQL-92

Cursor Performance

SQL Apprentice Question
Cursor performance for a real world application.

I have a table that holds records for a stops along a bus route. Each
records has a number of people getting on, number of people getting
off, a spot check, and a current load column.
The spot check column is the to verify that the sensors on the bus are
working correctly; the most reliable number to be use in the is
SPOT_CHECK.


Table Structure as follows:


ID (identity column)
ROUTE (Description of the bus route)
ONS (# of people getting on)
OFFS (# of people getting off)
SPOT_CHECK (visual spot count of people on the bus)
LOAD (Calculated load on the bus)


ID ROUTE ONS OFFS SPOT_CHECK LOAD
1 AAAA 5 0 null
2 AAAA 0 0 null
3 AAAA 2 1 null
4 AAAA 0 2 5
5 AAAA 6 3 8
6 AAAA 0 5 null
7 AAAA 1 2 null
8 AAAA 0 1 null
9 AAAA 0 1 null
10 AAAA 0 0 null


Now, I want to calculate the load at each stop along the way.


Load = Previous stops load + current stop ONS - Current stop's OFFS if
SPOT_CHECK is null, otherwise LOAD = SPOT_CHECK


So the results of the above table will be as follows:


ID ROUTE ONS OFFS SPOT_CHECK LOAD
1 AAAA 5 0 null 5
2 AAAA 0 0 null 5
3 AAAA 2 1 null 6
4 AAAA 0 2 5 5
5 AAAA 6 3 10 10
6 AAAA 0 5 null 5
7 AAAA 1 2 4 4
8 AAAA 0 1 null 3
9 AAAA 0 1 2 2
10 AAAA 0 2 null 0


Current programming, we using a cursor and seeing much much slower
performance than in our Oracle app. Does anyone see a way of doing
this load calculation without using a cursor and improving performance?


I can't do simple sums of ons and offs to calculate load, because of
the SPOT CHECK column.


Celko Answers
>> I have a table that holds records [sic] for a stops along a bus route. <<


Please post DDL and do not confuse records and rows.


>> records [sic] has a number of people getting on, number of people getting off, a spot check, and a current load column. <<


The current load is a computed column and we do not store such
computations in a good schema design.


>> The spot check column is the to verify that the sensors on the bus are working correctly; the most reliable number to be use in the is spot_check<<


Why did you use a vague, magical, non-relational, universal "id" column
instead of a relational key? The stops are a known attribute of a
route, but you left them out of the data model in spite of talking
about them.

CREATE TABLE BusTraffic
(route_nbr INTEGER NOT NULL ,
stop_nbr INTEGER NOT NULL,
bus_nbr INTEGER NOT NULL
REFERENCES Buses(bus_nbr),
PRIMARY KEY (route_nbr, stop_nbr),
boarding_cnt INTEGER DEFAULT 0 NOT NULL,
disembark_cnt INTEGER DEFAULT 0 NOT NULL,
spotcheck_cnt INTEGER -- null means not done
);


other stuff not shown might be ..


CREATE TABLE Buses
(bus_nbr INTEGER NOT NULL PRIMARY KEY,
bus_capacity INTEGER NOT NULL,
etc.);



>> Now, I want to calculate the load at each stop along the way... I can't do simple sums of ons and offs to calculate load, because of the SPOT CHECK column. <<


Probably something like this

SELECT T2.route_nbr, T2.stop_nbr,
SUM (CASE WHEN spotcheck_cnt IS NULL
THEN (boarding_cnt - disembark_cnt)
ELSE spotcheck_cnt END) AS bus_load,
(CASE WHEN spotcheck_cnt IS NULL
THEN 'spot checked'
ELSE 'computed' END) AS verification
FROM BusTraffic AS T1, BusTraffic AS T2
WHERE T1.route_nbr = T2.route_nbr
AND T1.stop_nbr <= T2.stop_nbr;


If you can use the OLAP SUM() OVER() functions in SQL, this will be
easier and faster than a self-join.

scramble ssn with sql server

SQL Apprentice Question
I'm trying to scramble the ssn#s within our database. I would need a 9
digit number to be converted into another 9 digit number in our dev
database.


Example #1:
ssn: 123456789 converts to 987654321


Also there is a catch, there is a possibility that there could be
duplicate ssn within a table due to bad data. I was the 2 records with
the same actual ssn# to be converted into the same scrambled ssn# using
sql server (so that the scrambled ssn#s match) for this issue.


Is there a way to do this?



Celko Answers
>> create table SSN_MASK


(EW_SSN integer identity primary key not null,
OLD_SSN varchar(9)); -- not sure how you defined it <<

SSN is always CHAR(9). Simply numbering it with a proprietary feature
is not that good; you have destroyed the data type. Here is one we
used in procedural code with arrays.


CREATE TABLE SSN_masks
(shift_id INTEGER NOT NULL PRIMARY KEY -- cols 8 & 9
CHECK (shift_id BETWEEN 00 AND 99),
col1 INTEGER NOT NULL,
col2 INTEGER NOT NULL,
col3 INTEGER NOT NULL,
col4 INTEGER NOT NULL,
col5 INTEGER NOT NULL,
col6 INTEGER NOT NULL,
col7 INTEGER NOT NULL);


You take the last two digits of the SSN (fastest changing values) and
look up a vector that tells you how to shift the remaining seven
digits.


shift(x) = ABS((x + col_n) % 11-10)


Since you have 100 different masks, the data gets scrambled pretty good
and it destroys the area digits which would otherwise repeat and give
information about the population geographic distribution. This is also
reversible because we preserve the shift identifier in the output;
shift all of them and it is a bitch to unscramble without the shift
array. Then you change the array when you get the next sample.

Thursday, October 05, 2006

Table with Foreign and running total

SQL Apprentice Question
sql server noob here. class Relational DB question - I wanted to see
if my thinking is right and bounce this very simple design question off
you guys.. In particular using ASP.NET as the client and wanted to know
if I should address this on the db side or the client/business layer
side.

3 tables:


Master:
ssn varchar (10) (PK)
phone nchar(10)


Detail
ssn varchar (10) FK ?
datetime datetime ?
amt numeric


Totals:
ssn varchar (10) FK
amt numeric (should hold a running total by ssn of detail.amt)


Whenever a new detail record is entered, of course the master key must
be there, I want to keep Totals.amt current by ssn. Three questions:


1. What's the best way to keep Totals current? A trigger? Call it
automatically using db dependendancy setup? would dotNet or ADO.net
need to know or do anything.


2. I have not tried, but I can set up a compound unique key for detail
that covers ssn and datetime right? and can ssn still be a foreign key
to Master? That did not seem trivial in sql server 2000.. at a high
level how?


3. I've know some look down on the idea of running totals living inside
the db, afterall Totals.amt can be calculated at any given time. I've
seen in other designs like this, Is this terribly unacceptable? Even if
a poor design, I'm still looking for how to maintain integrity for this
design. Also wonder, would it be terrible to maintain integrity with
client code?


If you have any code or samples that would be greatly appreciated.



Celko Answers
CREATE TABLE Callers
(ssn CHAR(9) NOT NULL PRIMARY KEY, --you got the size wrong!
phone_nbr CHAR(10) NOT NULL -- us only, fixed length?
);


CREATE TABLE CallLog
(ssn CHAR(9) NOT NULL
REFERENCES Callers(ssn)
ON UPDATE CASCADE,
call_time DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
call_amt DEICMAL (8,2) NOT NULL
PRIMARY KEY (ssn, call_time));


CREATE VIEW CallerAcctTotals (ssn, caller_amt_tot)
AS
SELECT C.ssn, COALESCE (SUM(L.caller_amt), 0.00)
FROM Callers AS C
LEFT OUTER JOIN
CallLog AS L
ON C.ssn = L.ssn
GROUP BY C.ssn;



>> Whenever a new detail record [sic] is entered, of course the master key must


be there, I want to keep Totals.amt current by ssn. <<

Rows are not records-- nothing like them at all. Youa re still
thinking in terms of files -- names like Master and Details really show
that mindset -- tape file terms!



>> 1. What's the best way to keep Totals current? A trigger? <<


Triggers? You mean proceudral file system code! No, use a VIEW that
will always be up to date when it is invoked, not an other file thinly
disguised as a table.


>> 2. I have not tried, but I can set up a compound unique key for detail that covers ssn and datetime right? <<


DATETIME is a reserved word and tooooo vague to be a data element name.
And, yes these two columns should be the key for the log of calls
made.


>> can ssn still be a foreign key to Callers? <<


See DDL for use of UNIQUE()


>> 3. I've know some look down on the idea of running totals living inside the db, afterall Totals.amt can be calculated at any given time. I've seen in other designs like this, Is this terribly unacceptable? <<


Might okay for a data warehouse, but not for a production DB.


>> Even if a poor design, I'm still looking for how to maintain integrity for this design. <<


Constantly firing triggers that will drag performance into the ground
and kill it when it gets to production size


>> Also wonder, would it be terrible to maintain integrity with client code? <<


YES. That defeats the whole purpose of RDBMS as the one central tool
for data integrity. The minute someone uses QA or another tool to get
around you app code, the game is over. Oh, and how did you plan on
being sure that ALL zillion app programs do integrity checks the same
way? or at all?

SQL is declarative and not procedural. Your whole approach is wrong.

Creating a view do display normalised data flattened

SQL Apprentice Question
I have created the following schema to illustrate my question:


CREATE TABLE [dbo].[tblCustomerProperty] (
[cpcusID] [int] NOT NULL ,
[cpproID] [int] NOT NULL ,
[cpValue] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[tblCustomers] (
[cusID] [int] IDENTITY (1, 1) NOT NULL ,
[cusName] [varchar] (100) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[tblProperties] (
[proID] [int] NOT NULL ,
[proName] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO


ALTER TABLE [dbo].[tblCustomers] WITH NOCHECK ADD
CONSTRAINT [PK_tblCustomers] PRIMARY KEY CLUSTERED ([cusID]) ON
[PRIMARY]
GO


ALTER TABLE [dbo].[tblProperties] WITH NOCHECK ADD
CONSTRAINT [PK_tblProperties] PRIMARY KEY CLUSTERED ([proID]) ON
[PRIMARY]
GO


ALTER TABLE [dbo].[tblCustomerProperty] ADD
CONSTRAINT [FK_tblCustomerProperty_tblCustomers] FOREIGN KEY
([cpcusID]) REFERENCES [dbo].[tblCustomers] ([cusID]) ON DELETE CASCADE
,
CONSTRAINT [FK_tblCustomerProperty_tblProperties] FOREIGN KEY
([cpproID]) REFERENCES [dbo].[tblProperties] ([proID]) ON DELETE
CASCADE
GO


This is essentially a very normalized customer database. A number of
properties are defined in the tblProperties table. One record per
customer exists in the tblCustomer table, and one record per
customer/property combination exists in the tblCustomerProperty table.


I imagine this is fairly common in systems that need to be highly
configurable. Obviously it comes with it's performance overheads but
it's very flexible.


Assume the following data
INSERT INTO tblProperties(proID, proName) VALUES(1, 'Occupation')
INSERT INTO tblProperties(proID, proName) VALUES(2, 'Email')
INSERT INTO tblCustomers(cusName) VALUES('Fred Bloggs')
DECLARE @ID int
SELECT @ID = @@IDENTITY
INSERT INTO tblCustomerProperty(cpcusID, cpproID, cpValue) VALUES(@ID,
1, 'Computer Engineer')
INSERT INTO tblCustomerProperty(cpcusID, cpproID, cpValue) VALUES(@ID,
2, '...@bloggs.com')


My question is how can I create a view that will return a flat view of
customers? The view needs to dynamically include 'columns' specified
in the tblProperties table without having to be changed.


eg.


cusID cusName Occupation Email
1 Fred Bloggs Computer Engineer f...@bloggs.com


Many thanks!



Celko Answers

>> This is essentially a very normalized customer database. <<


Not only is it un-normalized (it never made it to normalized to be
denormalized!), it is full of ISO-11179 violations, lack of relational
keys, data and meta-data are mixed, etc.

What you have is called a EAV (entity-attribute-value) design flaw. It
is a very common design error among people who were asleep in their
database classes :) Google it; it is a disaster that will fall apart
from the lack of data integrity in about one year of production work.


Also, get rid of that silly "tbl-" prefix and camel case -- it makes
you look like an OO programmer who does not know that even MS gave up
on camel case and Hungarian notation.


Seriously, you need to start over from scratch with a real data model
instead a vague "thingies have properties" view of the world. "To be
is to be something in particular; to be nothign in particular or
everything in general is to be Nothing" -- Aristotle

Using GO in stored proc

SQL Apprentice Question
I need to alter a temp table to add column, and then access that column
in next line.As it cant be possible in a same batch , I need to put a
GO between the lines.But the problem is, these are the lines of a
stored proc,and as soos as query analyzer(sqlserver2000) finds a GO,it
takes it as a last line of the stored proc.


Can't I add column and access the same in a same sp?Is it technically
possible?


Should you have any comments or any other work around , pls share with
me.


Apologies for any inconvenience in understnding the above points.


thanks,

Celko Answers
>> Should you have any comments or any other work around , pls share with me. <<


Do not program this way at all. This is like wanting to modify an
automobile in the middle of a race. You should start the race with a
complete car and not create it on the fly. Temp tables are also a
symptom of a procedural approach to a problem rather than a relational
approach.

Until we have actual specs (i.e. What you want to do) instead of a
kludge request (I.e. how you have decided to do it already), we cannot
help you.

Thursday, September 28, 2006

Using Variables & Updating tbl: Update STATUS field from CASE exp

SQL Apprentice Question
I need help.
I am fairly new in declaring variables & can be put into one SP.

--> Update the [Status] aliased as OLD to the [NEW] value pulled from a CASE
statement.


[MyTable] table to update
[NEW] (PULLS NEW VALUES BY USING CASE EXPRESSIONS)
[OLD] (VALUE IN MYTABLE NEEDING UPDATING)
[Expire Date] Column in MyTable


-- This is what I have so far.. :
SELECT Status AS OLD, (CASE WHEN [Expire Date] < GETDATE() THEN
'Expired' ELSE CASE WHEN [Expire Date] >= getdate()
THEN 'Active' ELSE 'undefined' END END) AS NEW
FROM MyTable
WHERE (Status <> (CASE WHEN [Expire Date] < GETDATE() THEN 'Expired'
ELSE CASE WHEN [Expire Date] >= getdate()
THEN 'Active' ELSE 'UNK' END END))


I want to
-have one SP to declare these as variables
-Then update the table to something like this:


UPDATE MyTable
Set @Old = @New


---
CREATE TABLE MyTable(
[ID] [int] IDENTITY(1,1)
[Status] [char](10),
[Expire Date] [datetime])


Celko Answers


is this what you meant to post so as to have a valid name, with proper
data element names?

CREATE TABLE Items
(item_id INTEGER NOT NULL PRIMARY KEY,
expiry_date DATETIME NOT NULL,
etc.);


Unlike a deck of punch cards, we can use a VIEW (or a computed column
if you really want to be proprietary instead of portable and
maintainable):


CREATE VIEW ItemStatus(item_id, .., item_status)
AS
SELECT item_id, ..,
CASE WHEN expiry_date < CURRENT_TIMESTAMP
THEN 'expired' ELSE 'active' END
FROM Items;


Your syntax for the CASE expression was wrong. The 'unk' path will not
be executed unless you have NULLs, which I assume that you do not.


You are approaching this problem as if you were using punch cards that
have to PHYSICALLY store the data. Your "old" and "new" (reserved
words in SQL!) look like tapes in a 1950's file merge program.


Start thinking LOGICALLY, with predicates and not procedures. Start
using Standard SQL -- SQL Server has most of SQL-92 these days;
CURRENT_TIMESTAMP and not the old UNIX-based getdate(). Read ISO-11179
and learn how to name data elements.


Reply

For SQL Buffs

SQL Apprentice Question
I have MyTable (ID, BUSINESS_UNIT_ID, CUSTOMER_NUMBER). I need a list
of customer numbers that exist in more than one BUSINESS_UNIT_ID.

Celko Answers
>> I have MyTable (ID, BUSINESS_UNIT_ID, CUSTOMER_NUMBER). <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

My guess is that "id" is a totally redundant, non-relational rowing
numbers that newbies who never read anything on RDBMS use. Did you
mean


CREATE TABLE CustomerAssignments
(business_unit INTEGER NOT NULL
REFERENCES OrgChart(business_unit_id)
ON UPDATE CASCADE,
customer_id NTEGER NOT NULL
REFERENCES Customers (customer_id)
ON UPDATE CASCADE
ON DELETE CASCADE ,
PRIMARY KEY (business_unit_id, customer_id));



>> I need a list of customer numbers that exist in more than one BUSINESS_UNIT_ID.<<


SELECT customer_id
FROM CustomerAssignments
GROUP BY customer_id
HAVING COUNT(*) > 1;

Without the key, you would have to use COUNT(DISTINCT business_unit_id)

Good Technique For Removing Dupes?

SQL Apprentice Question
I have a table full of records, some of which are duplicates
and I need to go through the table and remove the duplicate records.
My technique works, but it is extremely slow, and I was wondering if
anyone can tell me a faster method.


Here is a description of my current technique:


1. Declare a cursor and fetch all fields from first record from
TABLE1.
2. SELECT COUNT(*)
FROM TABLE1
WHERE Field1 = @Value1, Field2 = @Value2
3. If COUNT <= 1 THEN
This is not a duplicate record, go to the next record
ELSE
BEGIN
Must be a duplicate record.
DELETE FROM TABLE 1 where Field3 <> @Value 3
Go to the next record.
END


I am wondering if attempting to delete a record from the table
while it is being looped through and SELECTED from is causing it to be
unusually slow.


Does anyone have a different and possibly faster de-duping
algorithm?



Celko Answers
>> I have a table full of records [sic], some of which are duplicates and I need to go through the table and remove the duplicate records [sic].


1) Rows are not records; tables are not files, nor are columns
anything like fields. You are in trouble now because you did not
bother with the constraints that you needed. As soon as you scrub this
data, add those constraints and quit implementing files in SQL.

2) If you want to use cursors, then look up the "DELETE FROM [table
name] .. WHERE CURRENT OF [cursor] ;" syntax.


3) Your pseudo_code was a bit vague as to what makes a duplicate so I
would guess something like this will work:


DELETE FROM Table1
WHERE field3 <
(SELECT MAX(field3) FROM Table_1 AS T2
WHERE Table_1.field1 = T2.field1
AND Table_1.field2 = T2.field2 ;


This says find groupings based on (field1, field2) and keep the single
rows that have the max value in field3. Pretty common problem when
clenang out a history and fiedl3 is a datetime.

Wednesday, September 20, 2006

Time recording query

SQL Apprentice Question
Here is an interesting problem:
Lets say I own some imaginary company. Like for all companies, my
employees report their time to some time reporting application.
Following table stores these time recording activities:

EmpId Date Project No_of_hours
1 08/01 P1 4
1 08/01 P2 3
1 08/01 P3 1
1 08/02 P3 7
1 08/02 P1 1
1 08/03 P1 4
1 08/03 P1 4


- In given month, employees spend hours on multiple projects


I want to write a query which entering the begin and end dates and the
emp id will return the projects the employee worked on during that
period and the no_of_hours he worked on each
Output I am looking should be something similar to this:


EmpId Project No_of_hours
1 P1 30
1 P2 15
1 P3 20


Any help is appreciated ...


Celko Answers
The start and stop times are what you should have been catching in the
first place and not the computed hours. Think raw data and single
facts when designing a table. Let me use a history table for price
changes. The fact to store is that a price had a duration:

CREATE TABLE PriceHistory
(upc CHAR(13) NOT NULL
REFERENCES Inventory(upc),
start_date DATE NOT NULL,
end_date DATE, -- null means current
CHECK(start_date < end_date),
PRIMARY KEY (upc, start_date),
item_price DECIMAL (12,4) NOT NULL
CHECK (item_price > 0.0000),
etc.);


You actually needs more checks to assure that the start date is at
00:00 and the end dates is at 23:59:59.999 Hrs. You then use a BETWEEN
predicate to get the appropriate price.


SELECT ..
FROM PriceHistory AS H, Orders AS O
WHERE O.sales_date BETWEEN H.start_date
AND COALESCE (end_date, CURRENT_TIMESTAMP);


It is also a good idea to have a VIEW with the current data:


CREATE VIEW CurrentPrices (..)
AS
SELECT ..
FROM PriceHistory
WHERE end_date IS NULL;

SQL Query - A better way?

SQL Apprentice Question
I have a query that I have written and it works, but it seems a bit
redundant and I am wondering if there is a better way to write it.

My basic problem is I want to pull only 1 record, and that record is
the newest one based on the LASTUPDATE date field.


Here is my attempt at the query


SELECT * FROM ACTION_HISTORY
WHERE CASE_ID = '534623'
AND EVENTNAME='AssignedChanged'
AND LASTUPDATE = ( SELECT MAX(LASTUPDATE) FROM ACTION_HISTORY WHERE
CASE_ID = '534623' AND EVENTNAME='AssignedChanged')
ORDER BY LASTUPDATE DESC;



Celko Answers
>> My basic problem is I want to pull only 1 record [sic], and that record [sic] is the newest one based on the LASTUPDATE date field [sic]. <<


You are confusing rows and records, fields and columns, so yoiu wind up
with a data model that shows events and not facts. Each row should be
complete fact, and temporal facts have durations.

Let me use a history table for price changes. The fact to store is
that a price had a duration:


CREATE TABLE PriceHistory
(upc CHAR(13) NOT NULL
REFERENCES Inventory(upc),
start_date DATE NOT NULL,
end_date DATE, -- null means current
CHECK(start_date < end_date), -- actaualloy needs more checks
PRIMARY KEY (upc, start_date),
item_price DECIMAL (12,4) NOT NULL
CHECK (item_price > 0.0000),
etc.);


You then use a BETWEEN predicate to get the appropriate price.


SELECT ..
FROM PriceHistory AS H, Orders AS O
WHERE O.sales_date BETWEEN H.start_date
AND COALESCE (end_date, CURRENT_TIMESTAMP);


It is also a good idea to have a VIEW with the current data:


CREATE VIEW CurrentPrices (..)
AS
SELECT ..
FROM PriceHistory
WHERE end_date IS NULL;

SORT THE PHYSICAL ORDER ON A TABLE'S COLUMNS

SQL Apprentice Question
I'm working in SQL2000 . how can I move the physical position of a column
in a table by script?
for example:


select * from table1


f1 f2 f3
---- ----- -----


desired result:
f3 f2 f1
---- ----- -----


any help would be greatly appreciated.


Celko Answers
>> how can I move the physical position of a column in a table by script? <<


You have the wrong mental model of how RDBMS and SQL work. Think
logical and not physical models.

One of the many reasons that columns are not like fields is that they
have no physical location and are not required to be contigous. You
use a column name to get their data. Likewise, the rows in a table have
no ordering, unlike the records in a file. You locate them by a key.


In a file system, 3GL commands like "READ (a,b,c) FROM FileA" give you
different results than "READ (c,a,b) FROM FileA" while "SELECT a,b,c
FROM TableA" has the same data as "SELECT c,a,b FROM TableA".

What am I missing with this simple query?

SQL Apprentice Question
I am trying to add the results of this query to a new table using the
procedure below:

INSERT into db.TempCheckHistGN (EmpID,GrossPay,NetPay)
SELECT EmployeeID, SUM(GrossPay), as Expr1, Sum(NetPay) as Expr2 FROM
dbo.tblPACheckHist
WHERE (CheckDate Between '11/1/2005' AND '12/31/2010') AND (EmployeeID >=''
AND EmployeeID <='zzzzzz')
GROUP BY EmployeeID


I get a duplicate key error (EmpID is the key) but I shouldn't since the
records are being summed by EmployeeID


Celko Answers
Are the two data elements emp_id and employee_id actually the same data
element? If so, then use the same name for them. If not, check that
they converted properly on insertion. Let's clean up the code a bit
- ISO-8601 dates, consistent use of BETWEEN and removal of silly
prefixes, etc:

INSERT INTO db.TempCheckHistgn (emp_id, gross_pay, net_pay)
SELECT emp_id, SUM(gross_pay), SUM(net_pay)
FROM dbo.PayCheckHistory
WHERE check_date BETWEEN '2005-11-01' AND '2010-31-2010'
AND emp_id BETWEEN '' AND 'ZZZZZZ' -- is this redundant?
GROUP BY emp_id;



>> I get a duplicate key error (emp_id is the key) but I shouldn't since the records [sic] are being summed by emp_id. <<


Making a guess at the DDL that you did not post, I would assume that
the INSERT INTO has been run twice. I would also guess that emp_id
BETWEEN '' AND 'ZZZZZZ' covers all the personnel identifiers, but
there might ones with numerics and perhaps the all-blank one is
special. It looks weird, tho. That predicate looks more like a
CHECK() than a search condition.

You talked about "records"; you materialize and store computed
data; this is a symptom of still having a file system mindset. This
problem used to happen in file systems when the same transaction tape
was hung by the next shift.


In the relational world, we avoid temp tables of all kinds by using
virtual tables, such as a VIEW, which is guaranteed to be up-to-date
when it is invoked:


CREATE VIEW PaycheckHistorySummary (emp_id, gross_pay_tot, net_pay_tot)


AS
SELECT emp_id, SUM(gross_pay), SUM(net_pay)
FROM dbo.PaycheckHistory
WHERE check_date BETWEEN '2005-11-01' AND '2010-31-2010'
AND emp_id BETWEEN '' AND 'ZZZZZZ' -- is this redundant?
GROUP BY emp_id;

Sunday, September 10, 2006

Delete with inner join

SQL Apprentice Question
Is this the proper way of doing a join in a delete or update?
Is the a better way?
delete useraccount where useraccountid in (
select useraccountid
from UserAccount with (nolock )
inner join employee with (nolock) on employee.personId = UserAccount.PersonId
where EmployerId =66 )

Celko Answers

>> Is this the proper way of doing a join in a delete or update? <<


There is no proper way of oing a join in a delete or update; that
syntax is proprietary and subject to errors.


>> Is the a better way? <<


Assuming that you actually have more than one user account and that you
are concerned with the set of Personnel and not a single employee,
let's use names that follow ISO-11179 rules. You might be able to do
this:

DELETE FROM UserAccounts
WHERE EXISTS
(SELECT *
FROM Personnel AS P
WHERE UserAccounts.person_id = P.person_id
AND P.employer_id = 66);


This can use indexes instead of being forced to materialize a JOIN
result. You can drop the WITH (NOLOCK) options since a statement is
treated as a transaction. Think about predicates and not about
creating table. That is how we did data processing with scratch tapes
in the old days.

Please explain Ternary relationship to me.

SQL Apprentice Question
There is a very basic, simple ternary relationship.
http://explorer.x-y.net/data/f1.gif


The question is to get choose a possible combination of each A,B,C
entity count, and the number of triples in R.


I thought that since there should be 1 B for each A, max number of A->B
is the number of entities in A. And since there is a C for each A->B,
(A->B)->C is the number of A->B


But I was wrong. But the correct answer is not explained, and I
couldn't find any information about these kind of problem on the text
book. The text book just gives very simple concept explanation nothing
more. Would you please tell me more about ternary relationship and how
to count the number of triples? Thank you in advance.



Celko Answers

>> I couldn't find any information about these kind of problem on the text book. The text book just gives very simple concept explanation nothing more. Would you please tell me more about ternary relationship and how to count the number of triples? <<


You cannot break it into binary relations like you are trying to do.

Fifth Normal Form (5NF), also called the Join-Projection Normal Form or
the Projection-Join Normal Form, is based on the idea of a lossless
JOIN or the lack of a join-projection anomaly. This problem occurs
when you have an n-way relationship, where n > 2. A quick check for
5NF is to see if the table is in 3NF and all the candidate keys are
single columns.


As an example of the problems solved by 5NF, consider a table of house
notes that records the buyer, the seller, and the lender:


HouseNotes
buyer seller lender
==================================
'Smith' 'Jones' 'National Bank'
'Smith' 'Wilson' 'Home Bank'
'Nelson' 'Jones' 'Home Bank'


This table is a three-way relationship, but because many CASE tools
allow only binary relationships it might have to be expressed in an E-R
diagram as three binary relationships, which would generate CREATE
TABLE statements leading to these tables:


BuyerLender
buyer lender
=============================
'Smith' 'National Bank'
'Smith' 'Home Bank'
'Nelson' 'Home Bank'


SellerLender
seller lender
=======================
'Jones' 'National Bank'
'Wilson' 'Home Bank'
'Jones' 'Home Bank'


BuyerSeller
buyer seller
================
'Smith' 'Jones'
'Smith' 'Wilson'
'Nelson' 'Jones'


The trouble is that when you try to assemble the original information
by joining pairs of these three tables together, thus:


SELECT BS.buyer, SL.seller, BL.lender
FROM BuyerLender AS BL,
SellerLender AS SL,
BuyerSeller AS BS
WHERE BL.buyer = BS.buyer
AND BL.lender = SL.lender
AND SL.seller = BS.seller;


you will recreate all the valid rows in the original table, such as
('Smith', 'Jones', 'National Bank'), but there will also be false rows,
such as ('Smith', 'Jones', 'Home Bank'), which were not part of the
original table. This is called a join-projection anomaly.


There are also strong JPNF and overstrong JPNF, which make use of JOIN
dependencies (JD for short). Unfortunately, there is no systematic way
to find a JPNF or 4NF schema, because the problem is known to be NP
complete. This is a mathematical term that means as the number of
elements in a problem increase, the effort to solve it increases so
fast and requires so many resources that you cannot find a general
answer.