SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Thursday, September 28, 2006

Using Variables & Updating tbl: Update STATUS field from CASE exp

SQL Apprentice Question
I need help.
I am fairly new in declaring variables & can be put into one SP.

--> Update the [Status] aliased as OLD to the [NEW] value pulled from a CASE
statement.


[MyTable] table to update
[NEW] (PULLS NEW VALUES BY USING CASE EXPRESSIONS)
[OLD] (VALUE IN MYTABLE NEEDING UPDATING)
[Expire Date] Column in MyTable


-- This is what I have so far.. :
SELECT Status AS OLD, (CASE WHEN [Expire Date] < GETDATE() THEN
'Expired' ELSE CASE WHEN [Expire Date] >= getdate()
THEN 'Active' ELSE 'undefined' END END) AS NEW
FROM MyTable
WHERE (Status <> (CASE WHEN [Expire Date] < GETDATE() THEN 'Expired'
ELSE CASE WHEN [Expire Date] >= getdate()
THEN 'Active' ELSE 'UNK' END END))


I want to
-have one SP to declare these as variables
-Then update the table to something like this:


UPDATE MyTable
Set @Old = @New


---
CREATE TABLE MyTable(
[ID] [int] IDENTITY(1,1)
[Status] [char](10),
[Expire Date] [datetime])


Celko Answers


is this what you meant to post so as to have a valid name, with proper
data element names?

CREATE TABLE Items
(item_id INTEGER NOT NULL PRIMARY KEY,
expiry_date DATETIME NOT NULL,
etc.);


Unlike a deck of punch cards, we can use a VIEW (or a computed column
if you really want to be proprietary instead of portable and
maintainable):


CREATE VIEW ItemStatus(item_id, .., item_status)
AS
SELECT item_id, ..,
CASE WHEN expiry_date < CURRENT_TIMESTAMP
THEN 'expired' ELSE 'active' END
FROM Items;


Your syntax for the CASE expression was wrong. The 'unk' path will not
be executed unless you have NULLs, which I assume that you do not.


You are approaching this problem as if you were using punch cards that
have to PHYSICALLY store the data. Your "old" and "new" (reserved
words in SQL!) look like tapes in a 1950's file merge program.


Start thinking LOGICALLY, with predicates and not procedures. Start
using Standard SQL -- SQL Server has most of SQL-92 these days;
CURRENT_TIMESTAMP and not the old UNIX-based getdate(). Read ISO-11179
and learn how to name data elements.


Reply

For SQL Buffs

SQL Apprentice Question
I have MyTable (ID, BUSINESS_UNIT_ID, CUSTOMER_NUMBER). I need a list
of customer numbers that exist in more than one BUSINESS_UNIT_ID.

Celko Answers
>> I have MyTable (ID, BUSINESS_UNIT_ID, CUSTOMER_NUMBER). <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

My guess is that "id" is a totally redundant, non-relational rowing
numbers that newbies who never read anything on RDBMS use. Did you
mean


CREATE TABLE CustomerAssignments
(business_unit INTEGER NOT NULL
REFERENCES OrgChart(business_unit_id)
ON UPDATE CASCADE,
customer_id NTEGER NOT NULL
REFERENCES Customers (customer_id)
ON UPDATE CASCADE
ON DELETE CASCADE ,
PRIMARY KEY (business_unit_id, customer_id));



>> I need a list of customer numbers that exist in more than one BUSINESS_UNIT_ID.<<


SELECT customer_id
FROM CustomerAssignments
GROUP BY customer_id
HAVING COUNT(*) > 1;

Without the key, you would have to use COUNT(DISTINCT business_unit_id)

Good Technique For Removing Dupes?

SQL Apprentice Question
I have a table full of records, some of which are duplicates
and I need to go through the table and remove the duplicate records.
My technique works, but it is extremely slow, and I was wondering if
anyone can tell me a faster method.


Here is a description of my current technique:


1. Declare a cursor and fetch all fields from first record from
TABLE1.
2. SELECT COUNT(*)
FROM TABLE1
WHERE Field1 = @Value1, Field2 = @Value2
3. If COUNT <= 1 THEN
This is not a duplicate record, go to the next record
ELSE
BEGIN
Must be a duplicate record.
DELETE FROM TABLE 1 where Field3 <> @Value 3
Go to the next record.
END


I am wondering if attempting to delete a record from the table
while it is being looped through and SELECTED from is causing it to be
unusually slow.


Does anyone have a different and possibly faster de-duping
algorithm?



Celko Answers
>> I have a table full of records [sic], some of which are duplicates and I need to go through the table and remove the duplicate records [sic].


1) Rows are not records; tables are not files, nor are columns
anything like fields. You are in trouble now because you did not
bother with the constraints that you needed. As soon as you scrub this
data, add those constraints and quit implementing files in SQL.

2) If you want to use cursors, then look up the "DELETE FROM [table
name] .. WHERE CURRENT OF [cursor] ;" syntax.


3) Your pseudo_code was a bit vague as to what makes a duplicate so I
would guess something like this will work:


DELETE FROM Table1
WHERE field3 <
(SELECT MAX(field3) FROM Table_1 AS T2
WHERE Table_1.field1 = T2.field1
AND Table_1.field2 = T2.field2 ;


This says find groupings based on (field1, field2) and keep the single
rows that have the max value in field3. Pretty common problem when
clenang out a history and fiedl3 is a datetime.

Wednesday, September 20, 2006

Time recording query

SQL Apprentice Question
Here is an interesting problem:
Lets say I own some imaginary company. Like for all companies, my
employees report their time to some time reporting application.
Following table stores these time recording activities:

EmpId Date Project No_of_hours
1 08/01 P1 4
1 08/01 P2 3
1 08/01 P3 1
1 08/02 P3 7
1 08/02 P1 1
1 08/03 P1 4
1 08/03 P1 4


- In given month, employees spend hours on multiple projects


I want to write a query which entering the begin and end dates and the
emp id will return the projects the employee worked on during that
period and the no_of_hours he worked on each
Output I am looking should be something similar to this:


EmpId Project No_of_hours
1 P1 30
1 P2 15
1 P3 20


Any help is appreciated ...


Celko Answers
The start and stop times are what you should have been catching in the
first place and not the computed hours. Think raw data and single
facts when designing a table. Let me use a history table for price
changes. The fact to store is that a price had a duration:

CREATE TABLE PriceHistory
(upc CHAR(13) NOT NULL
REFERENCES Inventory(upc),
start_date DATE NOT NULL,
end_date DATE, -- null means current
CHECK(start_date < end_date),
PRIMARY KEY (upc, start_date),
item_price DECIMAL (12,4) NOT NULL
CHECK (item_price > 0.0000),
etc.);


You actually needs more checks to assure that the start date is at
00:00 and the end dates is at 23:59:59.999 Hrs. You then use a BETWEEN
predicate to get the appropriate price.


SELECT ..
FROM PriceHistory AS H, Orders AS O
WHERE O.sales_date BETWEEN H.start_date
AND COALESCE (end_date, CURRENT_TIMESTAMP);


It is also a good idea to have a VIEW with the current data:


CREATE VIEW CurrentPrices (..)
AS
SELECT ..
FROM PriceHistory
WHERE end_date IS NULL;

SQL Query - A better way?

SQL Apprentice Question
I have a query that I have written and it works, but it seems a bit
redundant and I am wondering if there is a better way to write it.

My basic problem is I want to pull only 1 record, and that record is
the newest one based on the LASTUPDATE date field.


Here is my attempt at the query


SELECT * FROM ACTION_HISTORY
WHERE CASE_ID = '534623'
AND EVENTNAME='AssignedChanged'
AND LASTUPDATE = ( SELECT MAX(LASTUPDATE) FROM ACTION_HISTORY WHERE
CASE_ID = '534623' AND EVENTNAME='AssignedChanged')
ORDER BY LASTUPDATE DESC;



Celko Answers
>> My basic problem is I want to pull only 1 record [sic], and that record [sic] is the newest one based on the LASTUPDATE date field [sic]. <<


You are confusing rows and records, fields and columns, so yoiu wind up
with a data model that shows events and not facts. Each row should be
complete fact, and temporal facts have durations.

Let me use a history table for price changes. The fact to store is
that a price had a duration:


CREATE TABLE PriceHistory
(upc CHAR(13) NOT NULL
REFERENCES Inventory(upc),
start_date DATE NOT NULL,
end_date DATE, -- null means current
CHECK(start_date < end_date), -- actaualloy needs more checks
PRIMARY KEY (upc, start_date),
item_price DECIMAL (12,4) NOT NULL
CHECK (item_price > 0.0000),
etc.);


You then use a BETWEEN predicate to get the appropriate price.


SELECT ..
FROM PriceHistory AS H, Orders AS O
WHERE O.sales_date BETWEEN H.start_date
AND COALESCE (end_date, CURRENT_TIMESTAMP);


It is also a good idea to have a VIEW with the current data:


CREATE VIEW CurrentPrices (..)
AS
SELECT ..
FROM PriceHistory
WHERE end_date IS NULL;

SORT THE PHYSICAL ORDER ON A TABLE'S COLUMNS

SQL Apprentice Question
I'm working in SQL2000 . how can I move the physical position of a column
in a table by script?
for example:


select * from table1


f1 f2 f3
---- ----- -----


desired result:
f3 f2 f1
---- ----- -----


any help would be greatly appreciated.


Celko Answers
>> how can I move the physical position of a column in a table by script? <<


You have the wrong mental model of how RDBMS and SQL work. Think
logical and not physical models.

One of the many reasons that columns are not like fields is that they
have no physical location and are not required to be contigous. You
use a column name to get their data. Likewise, the rows in a table have
no ordering, unlike the records in a file. You locate them by a key.


In a file system, 3GL commands like "READ (a,b,c) FROM FileA" give you
different results than "READ (c,a,b) FROM FileA" while "SELECT a,b,c
FROM TableA" has the same data as "SELECT c,a,b FROM TableA".

What am I missing with this simple query?

SQL Apprentice Question
I am trying to add the results of this query to a new table using the
procedure below:

INSERT into db.TempCheckHistGN (EmpID,GrossPay,NetPay)
SELECT EmployeeID, SUM(GrossPay), as Expr1, Sum(NetPay) as Expr2 FROM
dbo.tblPACheckHist
WHERE (CheckDate Between '11/1/2005' AND '12/31/2010') AND (EmployeeID >=''
AND EmployeeID <='zzzzzz')
GROUP BY EmployeeID


I get a duplicate key error (EmpID is the key) but I shouldn't since the
records are being summed by EmployeeID


Celko Answers
Are the two data elements emp_id and employee_id actually the same data
element? If so, then use the same name for them. If not, check that
they converted properly on insertion. Let's clean up the code a bit
- ISO-8601 dates, consistent use of BETWEEN and removal of silly
prefixes, etc:

INSERT INTO db.TempCheckHistgn (emp_id, gross_pay, net_pay)
SELECT emp_id, SUM(gross_pay), SUM(net_pay)
FROM dbo.PayCheckHistory
WHERE check_date BETWEEN '2005-11-01' AND '2010-31-2010'
AND emp_id BETWEEN '' AND 'ZZZZZZ' -- is this redundant?
GROUP BY emp_id;



>> I get a duplicate key error (emp_id is the key) but I shouldn't since the records [sic] are being summed by emp_id. <<


Making a guess at the DDL that you did not post, I would assume that
the INSERT INTO has been run twice. I would also guess that emp_id
BETWEEN '' AND 'ZZZZZZ' covers all the personnel identifiers, but
there might ones with numerics and perhaps the all-blank one is
special. It looks weird, tho. That predicate looks more like a
CHECK() than a search condition.

You talked about "records"; you materialize and store computed
data; this is a symptom of still having a file system mindset. This
problem used to happen in file systems when the same transaction tape
was hung by the next shift.


In the relational world, we avoid temp tables of all kinds by using
virtual tables, such as a VIEW, which is guaranteed to be up-to-date
when it is invoked:


CREATE VIEW PaycheckHistorySummary (emp_id, gross_pay_tot, net_pay_tot)


AS
SELECT emp_id, SUM(gross_pay), SUM(net_pay)
FROM dbo.PaycheckHistory
WHERE check_date BETWEEN '2005-11-01' AND '2010-31-2010'
AND emp_id BETWEEN '' AND 'ZZZZZZ' -- is this redundant?
GROUP BY emp_id;

Sunday, September 10, 2006

Delete with inner join

SQL Apprentice Question
Is this the proper way of doing a join in a delete or update?
Is the a better way?
delete useraccount where useraccountid in (
select useraccountid
from UserAccount with (nolock )
inner join employee with (nolock) on employee.personId = UserAccount.PersonId
where EmployerId =66 )

Celko Answers

>> Is this the proper way of doing a join in a delete or update? <<


There is no proper way of oing a join in a delete or update; that
syntax is proprietary and subject to errors.


>> Is the a better way? <<


Assuming that you actually have more than one user account and that you
are concerned with the set of Personnel and not a single employee,
let's use names that follow ISO-11179 rules. You might be able to do
this:

DELETE FROM UserAccounts
WHERE EXISTS
(SELECT *
FROM Personnel AS P
WHERE UserAccounts.person_id = P.person_id
AND P.employer_id = 66);


This can use indexes instead of being forced to materialize a JOIN
result. You can drop the WITH (NOLOCK) options since a statement is
treated as a transaction. Think about predicates and not about
creating table. That is how we did data processing with scratch tapes
in the old days.

Please explain Ternary relationship to me.

SQL Apprentice Question
There is a very basic, simple ternary relationship.
http://explorer.x-y.net/data/f1.gif


The question is to get choose a possible combination of each A,B,C
entity count, and the number of triples in R.


I thought that since there should be 1 B for each A, max number of A->B
is the number of entities in A. And since there is a C for each A->B,
(A->B)->C is the number of A->B


But I was wrong. But the correct answer is not explained, and I
couldn't find any information about these kind of problem on the text
book. The text book just gives very simple concept explanation nothing
more. Would you please tell me more about ternary relationship and how
to count the number of triples? Thank you in advance.



Celko Answers

>> I couldn't find any information about these kind of problem on the text book. The text book just gives very simple concept explanation nothing more. Would you please tell me more about ternary relationship and how to count the number of triples? <<


You cannot break it into binary relations like you are trying to do.

Fifth Normal Form (5NF), also called the Join-Projection Normal Form or
the Projection-Join Normal Form, is based on the idea of a lossless
JOIN or the lack of a join-projection anomaly. This problem occurs
when you have an n-way relationship, where n > 2. A quick check for
5NF is to see if the table is in 3NF and all the candidate keys are
single columns.


As an example of the problems solved by 5NF, consider a table of house
notes that records the buyer, the seller, and the lender:


HouseNotes
buyer seller lender
==================================
'Smith' 'Jones' 'National Bank'
'Smith' 'Wilson' 'Home Bank'
'Nelson' 'Jones' 'Home Bank'


This table is a three-way relationship, but because many CASE tools
allow only binary relationships it might have to be expressed in an E-R
diagram as three binary relationships, which would generate CREATE
TABLE statements leading to these tables:


BuyerLender
buyer lender
=============================
'Smith' 'National Bank'
'Smith' 'Home Bank'
'Nelson' 'Home Bank'


SellerLender
seller lender
=======================
'Jones' 'National Bank'
'Wilson' 'Home Bank'
'Jones' 'Home Bank'


BuyerSeller
buyer seller
================
'Smith' 'Jones'
'Smith' 'Wilson'
'Nelson' 'Jones'


The trouble is that when you try to assemble the original information
by joining pairs of these three tables together, thus:


SELECT BS.buyer, SL.seller, BL.lender
FROM BuyerLender AS BL,
SellerLender AS SL,
BuyerSeller AS BS
WHERE BL.buyer = BS.buyer
AND BL.lender = SL.lender
AND SL.seller = BS.seller;


you will recreate all the valid rows in the original table, such as
('Smith', 'Jones', 'National Bank'), but there will also be false rows,
such as ('Smith', 'Jones', 'Home Bank'), which were not part of the
original table. This is called a join-projection anomaly.


There are also strong JPNF and overstrong JPNF, which make use of JOIN
dependencies (JD for short). Unfortunately, there is no systematic way
to find a JPNF or 4NF schema, because the problem is known to be NP
complete. This is a mathematical term that means as the number of
elements in a problem increase, the effort to solve it increases so
fast and requires so many resources that you cannot find a general
answer.

working with gaps in date series

SQL Apprentice Question
I've seen some pretty creative SQL statements that locate
first/last/missing elements in a series but I haven't been able to
adapt any of them to work speedily with my data set.

Here's the problem:


We have a table of services for our clients (about 2 million rows).
The rows are simply the client's ID and the date of service. Rows may
dupe as clients may have more than one service per date.


We need to be able to define a case start and end date for each client
ID. The end date is defined as the last service date with no further
activity within some number (usually 180) of days. Furthermore, I'd
need to enumerate the cases per client.


So using the example below, I'd be looking for results such as:


Client Start End Case
55577 2/01/2004 5/11/2004 1
55577 1/09/2005 1/09/2005 2
55577 3/04/2006 OPEN 3
72395 4/04/2006 OPEN 1


In these cases, the OPEN dates indicate that there has not been a 180
day period of inactivity since the most recent date.


I currently do this in VB, looping through the ServList dataset and
populating a CaseList recordset. It takes about 30mins to run the job.
However, I'd prefer doing it all in a stored procedure and I'd prefer
to do it without use of cursors. Possible?


Thanks for any help/thoughts,
Steve


CREATE TABLE #ServList (
ClientID int
, ServDate smalldatetime)


INSERT INTO #Servlist VALUES (55577, '2/01/2004')
INSERT INTO #Servlist VALUES (55577, '2/01/2004')
INSERT INTO #Servlist VALUES (55577, '5/11/2004')
INSERT INTO #Servlist VALUES (55577, '1/09/2005')
INSERT INTO #Servlist VALUES (55577, '3/04/2006')
INSERT INTO #Servlist VALUES (55577, '5/17/2006')
INSERT INTO #Servlist VALUES (72395, '4/04/2006')
INSERT INTO #Servlist VALUES (72395, '4/05/2006')
INSERT INTO #Servlist VALUES (72395, '4/06/2006')


Celko Answers
>> We have a table of services for our clients (about 2 million rows). The rows are simply the client's ID and the date of service. Rows may dupe as clients may have more than one service per date. <<


Clear specs, thank you!! But weak DDL and you do not seem to know that
SQL uses ISO-8601 date formats, like all other ISO standards do, in the
sample data. Here is my guess at the real DDL:

CREATE TABLE ServiceTickets
(client_id CHAR(5) NOT NULL
REFERENCES Clients(client_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
service_code CHAR(5) NOT NULL
REFERENCES Services(service_code)
ON UPDATE CASCADE,
service_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
service_period INTEGER DEFAULT 0 NOT NULL,
PRIMARY KEY (client_id, service_code, service_date));



>> We need to be able to define a case start and end date for each client ID. The end date is defined as the last service date with no further activity within some number (usually 180) of days. Furthermore, I'd need to enumerate the cases per client. <<


Change the way you think for a minute. Data and declarations, not
procedures and computations. SQL, not VB.


>> I currently do this in VB, looping through the ServList dataset and populating a CaseList recordset. It takes about 30 mins to run the job. <<


Build a table with (n) years of these reporting periods of 180 (or
whatever) days. A spreadsheet is great for this kind of thing.

CREATE TABLE CasePeriods -- needs better name
(service_period INTEGER NOT NULL PRIMARY KEY
CHECK (case_period_nbr > 0),
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL);


Now we can find the first service_period that a client has.


SELECT T.client_id, MIN(C.service_period)
FROM ServiceTickets AS T, CasePeriods AS C
WHERE T.service_date BETWEEN C.start_date AND C.end_date
AND end_date < CURRENT_TIMESTAMP -- completed periods only
AND service_period = 0 -- unassigned periods
AND ??
GROUP BY T.client_id
HAVING COUNT(*) > 0;


I have a question about the rules. If the guy comes in on day 1, then
comes back on day 180 and 181, are we in the same case or not? If the
guy comes in on day 1, then comes back on day 180 and 182, are we in
the same case or not?


Test this, then use it in an UPDATE statement to put a value in the
service_period column for those rows that are within your (n) day
range. We could hide this in a VIEW, but it looks complex enough to
materialize the service_period values.



>> In these cases, the OPEN dates indicate that there has not been a 180 day period of inactivity since the most recent date. <<


Ther ain't no such date as "Open"; this is why I used a zero reporting
period for the things in process.

An interesting Lastname, Firstname, Middlename challenge

SQL Apprentice Question
I'm sure you all have seen situations where a field has a combined name
in the format of "Lastname, Firstname Middlename" and I've seen
examples of parsing that type of field. But what happens when the data
within this type of field is inconsistent? Here are some examples

Apple, John A.
Berry John B.
CherryJohn C
Donald John D


How does one parse the data when the data isn't consistent?


Celko Answers
>> How does one parse the data when the data isn't consistent? <<


I get a mailing package and save myself a lot of problems -- why
re-invent the wheel?

Group 1 Software , Melissa Data Corporation and SAA are such
companies. They can scrub mailing lists MUCH better than you can --
unless you want to make a major project of it for a few years.

Change Query Table

SQL Apprentice Question
have a series of tables in the database that hold orders for of the
months
i.e.
Tbl: June06
OrderNo, Amount
0001 100
102 150

Tbl: July06


What I want to do is, when a user passes in the month say July06, I
should query from table call June06.
Instead of putting the entire query in a string, is there a way to
switch the table names?


My query is long and I have tables for couple years.


Thanks



Celko Answers

>> I have a series of tables in the database that hold orders for of the months <<


"series of tables"??? No such animal in RDBMS. A table models a set
of entities or relationship of the same -- the whole damn set, not
parts of it. This is just basic math and elementary school set theory,
not advanced stuff.

This total screw up has a name -- Attribute Splitting. You take the
values of an attribute and make them into columns or tables in the
schema. Do you also have separate table for employees based on gender
or religion? Same stupid error!



>> What I want to do is, when a user passes in the month say July06, I should query from table call June06. <<


That is one of the MANNNNNY reasons this is screwed up, non-relational
design. You are mimicing a 1950's magentic tape system! LITERALLY!!
The tape labels had "yyddd" so you could keep track of them.

Put everything in one table, create a reporting period table:


CREATE TABLE ReportPeriods
(period_name CHAR(10) NOT NULL PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
etc.);


Use a BETWEEN predicate JOIN to classify your data.


Also, name the periods in alphabetic order for sorting. That means
"2006-06" and not "Jun06"; this is a basic programming trick and not
just SQL.

Best way to insert data into tables without primary keys

SQL Apprentice Question
I am working on a SQL Server database in which there are no primary
keys set on the tables. I can tell what they are using for a key. It
is usually named ID, has a data type of int and does not allow nulls.
However, since it is not set as a primary key you can create a
duplicate key.

This whole thing was created by someone who is long gone. I don't
know how long I will be here and I don't want to break anything. I
just want to work with things the way they are.


So if I want to insert a new record, and I want the key, which is
named ID, to be the next number in the sequence, is there something I
can do in an insert sql statement to do this?



Celko Answers
>> I am working on a SQL Server database in which there are no primary keys set on the tables. <<


By definition, it is not a table at all, but a simple file written with
SQL ..


>> I can tell what they are using for a key. It is usually named ID, has a data type of int and does not allow nulls.<<


Ah yes, the Magical, Universal "id" that God put on all things in
creation. To hell with ISO-11179 and metadata, to hell with Aristotle
and the law of identity!


>> However, since it is not set as a primary key you can create a duplicate key. <<


Duplicate key is an oxymoron


>> This whole thing was created by someone who is long gone. I don't know how long I will be here and I don't want to break anything. <<


A better question; how long can an enterprise with a DB like this
survive? I'd be updating the resume and stealing office supplies.


>> > So if I want to insert a new record [sic], and I want the key, which is named ID, to be the next number in the sequence, is there something I can do in an insert statement to do this? <<


Rows are not anything like records; the failure of the first guy to
understand this is why he mimiced a magnetic tape file system's record
numbers instad of providing a relational key.

The stinking dirty kludge is to use "SELECT MAX(id)+1 FROM Foobar" in
the INSERT INTO statements. Oh, you also need to check for dups and add
a uniqueness constraint (mop the floor and fix the leak).


The right answer is to re-design this system properly.

Update Query in SQL 2005 with inner join

SQL Apprentice Question
I have the following update query

UPDATE Employee
SET Deactivated = 1
FROM Employee AS Employee_1 INNER JOIN
Assignment ON Employee_1.SSN = Assignment.SSN CROSS
JOIN
Employee
WHERE (Assignment.SCHOOLID = '0') AND (Assignment.TERM IS NULL)


and it updates, but the problem is it is updating the complete table and not
filtering with the where statement.


Any ideas?


Celko Answers
Do you really have just one Employee, as you said with your data
element name? Are you really using assembly language bit flags in SQL?
SQL programmers do not set flags; they use predicates and VIEWs to
find the state of their data. Programmers who work with punch cards
set flags.

SQL programmers also know not to use the proprietary UPDATE.. FROM..
syntax. Here is what I think you were trying to do in Standard,
portable, predictable SQL. I also cleaned up your data element names
to look more like ISO-11179:


UPDATE Personnel
SET deactivated_flag = 1
WHERE EXISTS
(SELECT *
FROM Personnel AS P, Assignments AS A
WHERE P.ssn = Personnel.ssn
AND A.ssn = Personnel.ssn
AND A.school_id = '0'
AND A.school_term IS NULL);


One of the MANY reasons that we do not use bit flags or even have
Booleans in SQL is that when someone modifes Assignments.school_term
your deactivated_flag is wrong. Now you need procedural code in a
trigger to fix this, or to run a stored procedure whenever there is any
doubt.


If you use a VIEW and quit thinking like a punch card programmer, then
the data is *always* correct:


CREATE VIEW ActivePersonnel (..)
AS
SELECT ..
FROM Personnel AS P
WHERE WHERE EXISTS
(SELECT *
FROM Assignments AS A
WHERE A.ssn = P.ssn
AND A.school_id = '0'
AND A.school_term IS NULL);

Identity column as a foreign key - help needed in logic

SQL Apprentice Question
have these tables as shown below. Say I want to duplicate a condition
group with ID = 10. Notice that there are identity columns in the Conditions
and Values tables. I can use a INSERT INTO ... SELECT FROM to insert new
rows into the ConditionGroups table. But when I get to Conditions and Values
subsequently I will need to get the generated identity value first before I
insert values.


What's the best way to do this? I want to do this in the database itself.
Are cursors avoidable?

--------------


CREATE TABLE [dbo].[ConditionGroups] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[Conditions] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[ConditionGroupID] [int] NULL ,
[Lhs] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Operator] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[Values] (
[ID] [int] NOT NULL ,
[ConditionID] [int] NOT NULL ,
[RhsValue] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO



Celko Answers


>> Can it be better modeled? <<


SQL is for data and **not** for rules. You ought to be using Prolog or
LISP. You are trying to drive nails with a pumpkin. Having said all
that, people will call me mean if I do not give you this kludge:

I think what you want is the ability to load tables with criteria and
not have to use dynamic SQL:


skill = Java AND (skill = Perl OR skill = PHP)


becomes the disjunctive canonical form:


(Java AND Perl) OR (Java AND PHP)


which we load into this table:


CREATE TABLE Query
(and_grp INTEGER NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (and_grp, skill));


INSERT INTO Query VALUES (1, 'Java');
INSERT INTO Query VALUES (1, 'Perl');
INSERT INTO Query VALUES (2, 'Java');
INSERT INTO Query VALUES (2, 'PHP');


Assume we have a table of job candidates:


CREATE TABLE Candidates
(candidate_name CHAR(15) NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (candidate_name, skill));


INSERT INTO Candidates VALUES ('John', 'Java'); --winner
INSERT INTO Candidates VALUES ('John', 'Perl');
INSERT INTO Candidates VALUES ('Mary', 'Java'); --winner
INSERT INTO Candidates VALUES ('Mary', 'PHP');
INSERT INTO Candidates VALUES ('Larry', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Larry', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Moe', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Java');
INSERT INTO Candidates VALUES ('Celko', 'Java'); -- loser
INSERT INTO Candidates VALUES ('Celko', 'Algol');
INSERT INTO Candidates VALUES ('Smith', 'APL'); -- loser
INSERT INTO Candidates VALUES ('Smith', 'Algol');


The query is simple now:


SELECT DISTINCT C1.candidate_name
FROM Candidates AS C1, Query AS Q1
WHERE C1.skill = Q1.skill
GROUP BY Q1.and_grp, C1.candidate_name
HAVING COUNT(C1.skill)
= (SELECT COUNT(*)
FROM Query AS Q2
WHERE Q1.and_grp = Q2.and_grp);


You can retain the COUNT() information to rank candidates. For example
Moe meets both qualifications, while other candidates meet only one of
the two. You can Google "canonical disjunctive form" for more details.
This is a form of relational division.

Query Problem

SQL Apprentice Question
I have an orders table and an order_items table. Simply, they look like
this:


Orders:
ID | Status
-------
0 | New
1 | InProgress
2 | InProgress


Order_Items:
ID | Ord_ID | Supplier | Status
-------------
0 | 0 | Fred | New
1 | 1 | Fred | New
2 | 1 | Fred | Complete
3 | 2 | Fred | New
4 | 2 | Joe | Complete


When Joe wants to view his 'Complete' Orders, he should see order 2,
because all his items for order 2 are complete (even though its
orderstatus is inprogress)


When Fred wants to view his new orders, he should see order 0 and 2
(because all his items for 2 are new), and order 1 should be seen as
inprogress.


How can i write a query which given a supplier and status (either new,
inprogress or complete) will return all the relevant orders?


Thanks



Celko Answers
Please post DDL, so that people do not have to guess what the keys,

constraints, Declarative Referential Integrity, data types, etc. in


your schema are.


Look at that Orders table -- vague "id" that might or might not be a
key (you are not using IDENTITY for order numbers, are you!!??); vague
"status" attribute (what kind of status??). Cann't this be done in a
view off of the OrderItems table instead of mimicing a file? Look up
how to name a data element.


Here is my guess and corrections to your "pseudo-code" non-table


CREATE TABLE Order_Items
(order_nbr INTEGER NOT NULL
CHECK (<< validation rule here>>),
item_nbr INTEGER NOT NULL
CHECK (item_nbr > 0),
supplier_name CHAR(10) NOT NULL
REFERENCES Suppliers (supplier_name)
ON UPDATE CASCADE,
item_status CHAR(1) DEFAULT 'N' NOT NULL
CHECK (item_status IN ('N', 'C'), -- new, completed
PRIMARY KEY (order_nbr, item_nbr));


Notice the use of a relational key, instead of mimicing a tape file
record number? The use of IDENTITY for items in a bill of materials or
order problem screw up things. Use an item number within the order
number.


INSERT INTO Order_Items VALUES (0, 1, 'Fred', 'N');
INSERT INTO Order_Items VALUES (1, 1, 'Fred', 'N');
INSERT INTO Order_Items VALUES (1, 2, 'Fred', 'C');
INSERT INTO Order_Items VALUES (2, 1, 'Fred', 'N');
INSERT INTO Order_Items VALUES (2, 2, 'Joe', 'C');


That is, in Order #2, Fred supplied item #1 and Joe supplied item #2.
Lot easier to track things with a proper design. Now throw out your
redundant table:


CREATE VIEW OrderStatus (order_nbr, supplier, order_status)
AS
SELECT order_nbr, supplier,
CASE WHEN MIN(item_status) = 'New'
THEN 'New'
WHEN MAX(item_status) = 'Complete'
THEN 'Complete'
ELSE 'In Progress' END;
FROM Order_Items
GROUP BY order_nbr, supplier;


The VIEW is always current and you do not have to keep writing to disk
to mimic a physical file.

Why do I get different results with this?

SQL Apprentice Question
This is taken from a large but not very complex SQL statement.


,SUM (CASE ELUBECOUPONS.TICKETID WHEN NULL THEN 0 ELSE 1 END)


when I execute the statement with the one above the answer is 0 rows.


If I change that statement to:


,SUM (CASE WHEN ELUBECOUPONS.TICKETID IS NULL THEN 0 ELSE 1 END)


the answer is 624,510 rows


Same database, I only changed this one result column.


I clearly don't understand something about the case statement.

Celko Answers
What is the basic rule about NULLs? They cannot be compared to
anything, even eaqch other! You mean to use the other form of CASE
expression:

SUM (CASE WHEN ElubeCoupons.ticket_nbr IS NULL THEN 0 ELSE 1 END)


Since SUM() drops out NULLs, you could use this with numeric ticket
numbers.


SUM (SIGN (ElubeCoupons.ticket_nbr))


Little data modeling thing; a "_nbr" implies a sequence or other
generating rule for the numeric or pseudo-numeric value. "_id" just
says that the value is unique. That is why we talk about ticket
numbers and not ticket identifiers.

Saturday, September 09, 2006

Graph Representations

SQL Apprentice Question
have a problem that's twisting my mind up. The summary of the problem is
that I have table of organizations, each of which can function in one of two
roles at any given time - call them Role A and role B. These organizations
will have relationships between them (I imagine it programatically as a
directed graph or linked list)...possibly to an infinite degree. For
example - representing the organizations by numerals (maybe their primary
key in the table) and the roles as defined above - we might have the
following:


(org 1 in role A -> org 2 in role B -> org 3 in role A...)


1A->2B->3A->4B->5A
|->6A


(org 2 in role A -> org 3 in role B...)


2A->3B->5A
|->2B->4A
|->6A
|->7A


As you can see, each organization will be a "root" node, but then the path
can take nearly progression to and from the other organizations, having an
infinite number of traditional "edges" in a graph. The graph would
ultimately end at one or more "leaf" nodes (as represented above). This
graph represents the relationships between the associated organizations.
There is no mutual exclusion between the paths: in other words, multiple
organizations may have a relationship with 5A (org 5 in role A) - as shown
above.


How in the world do I represent these relationships in a database structure?
Please help.




Celko Answers
Try a modifed nested sets model. Nodes have a compound key:

CREATE TABLE Nodes
(node_id INTEGER NOT NULL
CHECK(node_id > 0),
node_type CHAR(1) NOT NULL
CHECK(node_type IN ('A', 'B'),
PRIMARY KEY (node_id, node_type),
etc.);


The forest of various arrangements of nodes has to identify each tree
in that forest:


CREATE TABLE Forest
(tree_id INTEGER NOT NULL,
node_id INTEGER NOT NULL,
node_type CHAR(1) NOT NULL,
REFERENCES Nodes (node_id, node_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
PRIMARY KEY (tree_id, node_id)
lft INTEGER NOT NULL,
rgt INTEGER NOT NULL,
UNIQUE (tree_id, lft),
etc.
);

hierarchical query 2005

SQL Apprentice Question
I am working with visual studio 2005 and sql server 2005 workgroup edition.
I have three tables where each row has an ID and a PID.
What I want to do is create either a hierarchical query to fill a data set
or create the dataset itself as an hierarchial dataset.
Is this possible with the tools I`m using and if so, how?


Celko Answers
>> I have three tables where each row has an ID and a PID. <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

Also, why did you think that "id" and "pid" have any meaning to other
people? "id" is a postfix in ISO-11179 which is too vague to stand by
itself -- it begs the question "identiifer of what?" by its very
nature.



>> What I want to do is create either a hierarchical query to fill a data set or create the dataset itself as an hierarchial dataset. <<


What is a " hierarchical query"? That term was never used in SQL while
I was on the Committee. You might want to get a copy of TREES &
HIERARCHIES IN SQL
for *several* different ways to model a hierarchy or
a tree. But first, you might want to learn some basics.


SQL Apprentice Question
I`m sorry for being a bit unclear.
the tables are:

category:
cat_id int (PK)
cat_name nvarchar


subcategory:
subcat_id int (PK)
parent_id int
subcat_name nvarchar


items:
item_id int (PK)
parent_id int (can be category or sub-category)
item_name nvarchar
item_desc nvarchar


I want to display the data (in some control, tree or other, I`m open to
suggestions)
so that each category contains the items and sub-categories where the parent
id is the id of the category and so on going into the hierarchy.


Celko Answers

>> I`m sorry for being a bit unclear. <<>> );


Google "nested sets model" explain the following


CREATE TABLE FishTaxonomy
(fish_id INTEGER NOT NULL
REFERENCES Fishes (fish_id)
ON UPDATE CASCADE,
lft INTEGER NOT NULL UNIQUE,
rgt INTEGER NOT NULL UNIQUE,
CHECK(lft < rgt) );


A given fish_id and all their superiorss, no matter how deep the tree.


SELECT F2.*
FROM FishTaxonomy AS F1, FishTaxonomy AS F2
WHERE F1.lft BETWEEN F2.lft AND F2.rgt
AND F1.fish_id = :my_fish_id;


2. The fish_id and all their subordinates. There is a nice symmetry
here.


SELECT F1.*
FROM FishTaxonomy AS F1, FishTaxonomy AS F2
WHERE F1.lft BETWEEN F2.lft AND F2.rgt
AND F2.fish_id = :my_fish_id;

Passing parameter into SP for permissions?

SQL Apprentice Question
How can I pass a user name into a stored procedure I've created that
assigns certain table and SP permissions? The EXECUTE doesn't seem to
allow variables when permissions are involved. It wants literals, i.e.
'John' instead of @username.

Celko Answers
>> How can I pass a user name into a stored procedure I've created that assigns certain table and SP permissions? <<


You can probably do it with dynamic SQL, but why not use the DCL and
keep your system secure?

You do have a security officer who creates and monitors the user
accounts, don't you? Or do you really do it at the application level
on the fly? If so, have you told the security officer and the auditors
about this "feature" to subvert authority?

How to erase hundreds of DEFAULT values?

SQL Apprentice Question
I'm newbie (yet) in MS SQL2k and I have a big problem, so I would like
to please for help:


There are 9 big databases with a _lot_ of user tables.


I had to insert 2 new fields into _every_ user tables (at the end). It
succeeded, but noticed that I made a mistake: set a Default value for
them. But they should had been empty :-/


These are the new fileds:


Modify_vC varchar (50) NULL Default: suser_sname()
Modify_Dt datetime NULL Default: getdate()


So I wrote (mainly copy-pasted from this newsgroup Thanks for it! :) a
script, but it does not change DEFAULT's value.


(
I tried also to attach "DEFAULT NULL" at the end of this line, but it
throws an error.
exec ('ALTER TABLE ' + @table_name + ' ALTER COLUMN Modosito_vC
varchar (50) NULL DEFAULT NULL')
)


How can I erase DEFAULT's value for these two fields? By hand, it would
take for a year...
(to run 9x (9 DB's) is OK, but to table to table would be horrible)


Thanks for your help.
Bálint


The script:
------------------------- script --------------------------
USE database1


declare @table_name sysname
declare tables_cursor cursor local fast_forward
for
select
quotename(table_schema) + '.' + quotename(table_name)


from
information_schema.tables
where
table_type = 'base table'
and objectproperty(object_id(quotename(table_schema) + '.' +
quotename(table_name)), 'IsMSShipped') = 0


open tables_cursor


while 1 = 1
begin
fetch next from tables_cursor into @table_name


if @@error != 0 or @@fetch_status != 0 break


exec ('ALTER TABLE ' + @table_name + ' ALTER COLUMN Modosito_vC
varchar (50) NULL')
exec ('ALTER TABLE ' + @table_name + ' ALTER COLUMN Modositas_Dt
datetime NULL')


end


close tables_cursor
deallocate tables_cursor


go


Celko Answers
>> had to insert 2 new fields [sic] into _every_ user tables (at the end). It succeeded, but noticed that I made a mistake: set a Default value for them. But they should had been empty :-/ <<


No, they should not have been added to the schema at all. First of
all, they are not attributes in a proper data model. They have nothing
to do wiht the entities to which they are attached.

Secondly, their names include their data type in violation of ISO-11179
conventions and good programming. Thaty is a pure newbie thing where
you carry over old programming habits to the new language. Also, not
knowing the columns and fields are totally different concepts.


Third, it is illegal under SOX and several other laws have audit
information in the same schema as the data. The audit trail has to be
external to the data and requires at least two independent
confirmations. Any single user with full rights on your tables can
change or destroy the audit trail.