SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Friday, December 28, 2007

A Strange Use of UDFs?

SQL Apprentice Question
started a new project recently and the .Net/SQL Server 2000
application I was hired to help with was developed by a large
consulting firm. All the data access for the application is done via
stored procedures, but most stored procedure data access is done
via functions. So, you might have a SELECT inside a stored
procedure that looks similar to this:

SELECT e.LastName, et.TerritoryDescription, en.NADescription
FROM Employees e
LEFT JOIN fxEmpNat() en ON e.EmployeeID = en.EmployeeID
LEFT JOIN fxEmpTer() et ON e.EmployeeID = et.EmployeeID
WHERE e.LastName = 'Fuller'


And the function definitions are *similar* to those created in the
script listed at the end of this post (using the Northwind database).
I would solve the same problem using this approach:


SELECT e.LastName, t.TerritoryDescription, n.NADescription
FROM dbo.Employees e
LEFT JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID
JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID
LEFT JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID
JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID
WHERE e.LastName = 'Fuller'


The original developers are long gone and no one currently working
on the project knows why the function-centric approach was used.
There are no security restrictions that would merit such an approach,
and even if there were I would think a VIEW solution would be the
more traditional approach. And there are only a couple of complicated
relationships that one might want to "hide" from a less experienced
developer that did not know the intricacies of the data. Anybody have
any ideas when this approach would be justified?


-- Addl. Table and Function Definitions
CREATE TABLE Nationality
(
NationalityID int PRIMARY KEY,
NADescription varchar(20) NOT NULL
)
go
CREATE INDEX IX_Nationality_NADescription ON Nationality(NADescription)
go
INSERT Nationality VALUES (1,'America')
INSERT Nationality VALUES (2,'Canada')
INSERT Nationality VALUES (3,'Angola')
go


CREATE TABLE EmployeeNationality
(
EmployeeID int,
NationalityID int,
CreateDate datetime
CONSTRAINT PK_EmployeeNationality
PRIMARY KEY NONCLUSTERED
(EmployeeID,NationalityID,CreateDate)
)
go
INSERT EmployeeNationality values(1,1,'01/01/80')
INSERT EmployeeNationality values(1,1,'01/01/90')
INSERT EmployeeNationality values(2,1,'01/01/90')
INSERT EmployeeNationality values(3,2,'01/01/90')
INSERT EmployeeNationality values(4,3,'01/01/90')
go


CREATE FUNCTION fxEmpNat()


RETURNS TABLE


AS


RETURN (
SELECT e.EmployeeID, n.NADescription
FROM dbo.Employees e
JOIN dbo.EmployeeNationality en
ON e.EmployeeID = en.EmployeeID
AND en.CreateDate = (SELECT MAX(en2.CreateDate)
FROM EmployeeNationality en2
WHERE en.EmployeeID = en2.EmployeeID
AND en.NationalityID = en2.NationalityID )
JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID
)
go


CREATE FUNCTION fxEmpTer()


RETURNS TABLE


AS


RETURN (
SELECT e.EmployeeID, t.TerritoryDescription
FROM dbo.Employees e
JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID
JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID
)
go



Celko Answers
One reason I can think of is job security. This code will never port
and can be read only by dialect speakers. A function call cannot be
optimized like a VIEW, so you are at risk for poor performance, as
well as the maintenance problems, as time goes on.

Another reason is that they are not SQL programmers and do not think
in terms of declarations. They want to see the familiar function call
they know from procedural languages.



Original source

Using WHERE clause parameter

SQL Apprentice Question
When I try to use an SP with parameter that is the WHERE clause it generates
an error.

E.g,

@myWhere = varchar(200)

AS

SELECT x, y FROM skwi WHERE @myWhere

myWhere = status = 7 AND LastName = 'Smith'


The problem is that the where clause is built conditionally in the program.
Any advise and examples on how to accomplish would be appreciated.



Celko Answers

>> When I try to use an SP with parameter that is the WHERE clause it generates an error. <<


The short, dangerous kludge is to use Dynamic SQL.

The right answer is to get out that old text book on Software
Engineering and the chapters on coupling and cohesion of code
modules. Those rules still apply in SQL.


You are writing a "Britney Spears, Automobiles and Squids" module --
you have no idea what it will do at run time, so it has absolutely no
cohesion. Instead of depending on every random future user to write
proper SQL, you need to earn your salary and proper them with a well-
defined module with a meaningful name and a known parameter list.


If you want a general query tool, then use QA. Application users
should be kept away form it.





Original source

Thursday, December 27, 2007

How do I group in a union?

SQL Apprentice Question
I have a select statement that uses union to pull data from multiple
databases and return them in a single recordset. I want to group these
results using group by. How do I do that?


Here's what I have:


SELECT EmployeeID, ProjectID
FROM DB1.table1
UNION
SELECT EmployeeID, ProjectID
FROM DB2.table1


This works fine but I want to group all projects by EmployeeID. I tried the
following but it didn't work


SELECT EmployeeID, ProjectID
FROM DB1.table1
UNION
SELECT EmployeeID, ProjectID
FROM DB2.table1
GROUP BY EmployeeID


I'd appreciate some help here.


Celko Answers
The results of a UNION do not have column names

SELECT X.emp_id, COUNT(project_id) AS project_tot
FROM (SELECT emp_id, project_id
FROM DB1.Table1
UNION
SELECT emp_id, project_id
FROM DB2.Table1)
AS X(emp_id, project_id)
GROUP BY X.emp_id;


UNION ALL will be faster, if it is possible.




Original source

when calling UPDATE from the result set of a SELECT statement, is the order in which rows from the SELECT statement 100% geronteed?

SQL Apprentice Question
when calling UPDATE from the result set of a SELECT statement, is the order
in which rows from the SELECT statement 100% geronteed?

tableA


myid
-----------
0
1
2
3
4
5


UPDATE tableB u
SET myid = i.myid
FROM tableA i
ORDER BY myid


Will this always update tableB with 5 since it is the last one? is this 100%
garonteed to follow the order of the source result set?


Celko Answers
No. There is no ordering in a table by defintion. Since this strictly
proprietary syntax it an do anything MS feels like next week. You are
just looking for the comfort of a 1960's sequential file system
instead of learning to think in RDBMS.


Original source

Which values do NOT appear within a tolerance value Options

SQL Apprentice Question
I have a table of servers that receive updates on a regular basis

CREATE TABLE [dbo].[tbl_ServerUpdate](
[ServerUpdateID] [int] NOT NULL,
[Server] [nvarchar](256) NOT NULL,
[UpdateStamp] [datetime] NOT NULL,
)


I then have tolerance values in another table which I can get like this
(they are in terms of minutes):


DECLARE @Interval INT


SELECT @Interval = IntervalValue
FROM tbl_Interval (NOLOCK)
WHERE IntervalID = 1


I want to know which servers have NOT received an update within the
tolerance value. For example, give me the set of Server that have not
received an update in the last 5 (@Interval) minutes.


Celko Answers
What is an update_id? What would it mean in a logical data model?
Surely, you did NOT just physically number the rows in a table!

This table also had no key, so I made an assumption that you want to
use the (originally very long AND possibly in Chinese!) server names.
But isn't the update interval logically an attribute of each server?
Shouldn't it be in the Servers table? I cannot see an interval
floating around as an entity in itself.


CREATE TABLE Servers
(server_name VARCHAR(25) NOT NULL PRIMARY KEY,
update_stamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
update_interval INTEGER NOT NULL
CHECK(update_interval > 0),
etc.
);


If you need this a lot, put it in a VIEW.


CREATE VIEW ExpiredServers (server_name)
AS
SELECT server_name
FROM Servers
WHERE update_stamp
< DATEADD(MI, -update_interval, CURRENT_TIMESTAMP);




SQL Apprentice Question
In the real world, everything is not a beautifully static picture of
relational data. I didn't include the key because I don't think its relevant
to the solution. But since you brought it up: is it better to have a natural
key on a varchar or an identity key as an integer that doesn't really have
any intrinsic business meaning

? In theory, it should be the varchar, but in practice its more performant
to index an int rather than a varchar(25). Is that not true?


The interval is not an attribute of the server, but rather an attribute of
the update type. In my case, there are seven different minute intervals. So,
in the end, I need to know - for each update type - which server did not
receive it within the last [interval] (in terms of minutes). The interval is
configurable by the operations group - sometimes they may want to be alerted
when a server didn't receive an update in the last 3 minutes, sometimes in
the last 10 minutes. They can change this real-time in the database.


I am not a TSQL guru - thats why I use this newsgroup from time to time. I
also have every one of your books on my shelf. However, the corporate world
is NOT academic and there is a tradeoff between having an understandable
data model that is easy to work with and having a data model that can serve
as a model in CS405 (or whatever). Sometimes we have to denormalize or
flatten data to be able to run reports in an acceptable time; sometimes we
have to add bit columns that could easily be derived from another to improve
query performance.


The balance you give is valuable, so I hope you continue to berate us when
our designs violate, but I'll bet - despite all the companies you have
worked with - you can't name one that had a data model that gave you a
boner.



Celko Answers
>> is it better to have a natural key on a VARCHAR(n) or an IDENTITY key [SIC] as an integer that doesn't really have any intrinsic business meaning? <<


How important is data integrity? Why not use a pointer chain DB
instead of mimicking it in SQL? At least they have features to do
garbage collection, restore chains, etc. Eventually, denormalization
and short-cuts will come back and bite you.


>> In theory, it should be the VARCHAR, but in practice its more performant to index an INTEGER rather than a VARCHAR(25). Is that not true? <<


It depends on the product. You automatically assumed indexing; look
at hashing in Teradata. Longer strings for keys lead to easy perfect
hashing, which is always one probe as opposed to multiple probes with
an index when the database gets large. If you are using hidden
pointer chains like SQL Anywhere to implement PK-FK, there is no
difference. SQL Server does have a lot of "code museum" problems and
this is one of them; one type of simple B+ Tree index is used for all
data types and distributions. However, until you get to large DBs, it
works fine and it will get a shot in the arm from 64-bit hardware,
too.


>> The interval is not an attribute of the server, but rather an attribute of the update type. In my case, there are seven different minute intervals. So, in the end, I need to know - for each update type - which server did not receive it within the last [interval] (in terms of minutes). The interval is configurable by the operations group - sometimes they may want to be alerted when a server didn't receive an update in the last 3 minutes, sometimes in the last 10 minutes. They can change this real-time in the database. <<


A type is an attribute by definition, so what entity does the
update_type belong to, if not a server? I would think from this
description, you would have seven columns for the logically different
updates, and seven columns for their corresponding intervals. Or is
this a repeating group where not all update types apply to all servers
and the update types can be changed?


>> I also have every one of your books on my shelf. <<


Neat! Number 7 is out in 2008 February.


>> .. I'll bet - despite all the companies you have worked with - you can't name one that had a data model that gave you a boner. <<


LOL! Remember what I do for a living -- this is like asking a doctor
why all his patients are sick!

The best one I can remember just off hand was for a credit management
company. They had just designed it and wanted a two day review from
me. The only real problem I found was that a bunch of the columns
were stubbed in with a magical CHAR(1) NOT NULL data type; they were
still working on the encodings and waiting for advise from legal.


Another one was a software company with a portal product that manages
corporate software access. We replaced ~60 tables with a nested set
model that lead to ~6 tables for the core processes. In fairness, the
original model had grown over time from a denormalized model on a
small platform to a mainframe tool. People kept adding tables to it
as a work-around and it had become a jungle.



Original source

QUOTED_IDENTIFIER & ANSI_NULLS Options

SQL Apprentice Question
does anyone know how to keep QA from adding the lines setting these
two options on and off along with blank lines at the beginning and end
of every object you edit? i have searched quite a bit on this but
haven't been able to come up with anything.


Celko Answers
>> is there a reason I wouldn't want to do this? <<


Conformance to ANSI/ISO Standards should be a goal in any shop, so you
would not turn off options that bring you to that goal. Why would you
want to write your own database language?



Original source

update table dateCol3 to the later of dateCol1 or dateCol2 Options

SQL Apprentice Question
CREATE TABLE #tmp1(rowID int identity(1,1), dateCol1 datetime, dateCol2
datetime, dateCol3 datetime)

INSERT INTO #tmp1(dateCol1, dateCol2)
SELECT '1/1/05', '2/1/05'
UNION ALL SELECT '3/1/05', '3/7/05'
UNION ALL SELECT '4/1/05', '3/20/05'
UNION ALL SELECT '5/1/05', '5/13/05'


UPDATE #tmp1 SET dateCol3 = ?


If dateCol1 > dateCol2 then update dateCol3 to dateCol1
else
update #tmp1 Set dateCol3 to dateCol2


I am sure this is not the most normalized example, but what would be the
tSql to update my table with the latest date in a row without having to do it
in 2 queries?


I realize I could say

update #tmp1 set datecol3 = datecol1 where datecol1 > datecol2

and then


update #tmp1 set datecol3 = datecol2 where datecol2 > datecol1


Is there a way to do this in one query statement? What would that look like?


Thanks,




Celko Answers
If you followed ISO-11179 data element naming rules, avoided
needlessly proprietary code, and put your date into the proper format
for Standard SQL, would your posting look like this?

CREATE TABLE Foobar
(foobar_id INTEGER NOT NULL PRIMARY KEY,
col1_date DATETIME NOT NULL,
col2_date DATETIME NOT NULL,
col3_date DATETIME);


INSERT INTO Foobar(foobar_id, col1_date, col2_date)
VALUES (42, '2005-01-01', '2005-02-01');
Etc.


IF col1_date > col2_date
THEN update col3_date to col1_date
ELSE update col3_date to col2_date
<< col3_date =" CASE"> col2_date
THEN col1_date
ELSE col2_date END;


You have a few choices here.
1) You can make col3_date into a computed column in proprietary syntax
-- look it up.
2) You can use the CASE expression to update col3_date; "UPDATE Foobar
SET col3_date = CASE.. END;" Of course this still means that you have
materialized computed data in violate of good design.
3) You can put it in a VIEW, and drop col3_date from the base table.
The code will always be right and portable.


Original source

Thursday, October 04, 2007

How to retrieve all records from 30th record to 50th record of a table?

SQL Apprentice Question
I have a table named myTable. It is sorted by some column and there is
no primery key or unique column.I want to retrieve all records from
30th record to 50th record. How can I do this by a sql statement?



Celko Answers
>> I have a table named myTable. It is sorted by some column and there is no primary key or unique column. <<


BY DEFINITION:
1) Tables have at least one key
2) Tables have no ordering


>> I want to retrieve all records from 30th record to 50th record. <<


1) Rows are not anything like records
2) Tables have no ordering

Please read a book --ANY book -- are RDBMS. You have gotten every
basic concept wrong.




Original Source

No idea where to start with Query

SQL Apprentice Question
I need some help with a query using SQL 2005 and I really don't know
where to start.
I have 3 columns: Date(dateTime), Name(varchar), Value(decimal)
I need to retireive 3 values, one query or all, or one query for
each. It makes no difference to me.

1)For each Name I need to get the difference of current day's value
and previous day's value
(TodayValue- PreviousDayValue)


2)For each Name I need to get the sum of difference of yesterday's
value and today's value for the current month
So something like (Day1Value - Day2Value) + (Day2Value - Day3Value)
+...+ (Day29Value - Day30Value)


3)Same as #2 but just for the current year.


Any help would be greatly appreciated.



Celko Answers
The best place to start is with DDL, so that people do not have to
guess what the keys, constraints, Declarative Referential Integrity,
data types, etc. in your schema are. If you know how, follow ISO-11179
data element naming conventions and formatting rules. Sample data is
also a good idea, along with clear specifications. It is very hard to
debug code when you do not let us see it. What you did post was a pile
of vague names and/or reserved words. Let's make it real and sensible:

CREATE TABLE DogSchedule
(walk_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
dog_name CHAR(15) NOT NULL,
walk_kilometers DECIMAL (5,2) NOT NULL,
PRIMARY KEY (walk_date, dog_name));


1)For each dog_name I need to get the difference of current day's
value and previous day's value <<


You did not tell us if you are sure that all days are represented in
the table? Only one walk per day? What constraint enforces that
business rule? My point is that SQL is an integrated whole -- you
cannot separate DDL and DML; they must work together!


Next, you are going to be using the delta from day to day, so let's
put that in a VIEW.


CREATE VIEW DeltaDogWalks (walk_date, dog_name, delta_kilometers)
AS
SELECT D2.walk_date, D2.dog_name,
(D2.walk_kilometers - D1.walk_kilometers)
FROM DogSchedule AS D1, DogSchedule AS D2
WHERE D1.dog_name = D2.dog_name
AND D2.walk_date = DATEADD(DD, 1, D1.walk_date);


To sum the deltas, set up a report periods table that you can adjust:


CREATE TABLE ReportPeriods
(period_name CHAR(15) NOT NULL PRIMARY KEY,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK(start_date < end_date));


Now use a "walk_date BETWEEN start_date AND end_date" to group on the
names of the reporting periods.


Original source

Thursday, August 09, 2007

A Strange Query Options

SQL Apprentice Question
Hi guys,

I'm trying to work this query out in my head see if you can help me with it.
I'll give you a little background information so you have the big picture.
This is a real estate database, every home has an APN number it's a unique
number that always means the same property. When a company wants to list a
home on the MLS to sell the home it gets an MLS number which is only unique
to the instance the entity wants to sell that property, so in a span of a
few years the same APN could go for sale several times and have several MLS
Numbers, but maintain the same APN. This database is setup so each row
is an MLS Number.


I need to make a list of all entries that qualify:


1. Find APN's with multiple entries


2. Narrow it down to only solds


3. Only solds that have been modified in the last 3 years


and it needs to generate a list of just DISTINCT APN's that qualify.


Thanks,




Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. We don't even know your data types, column or table
names much less your codes! Sample data is also a good idea, along
with clear specifications. It is very hard to debug code when you do
not let us see it.


>> This is a real estate database, every home has an APN number it's a unique number that always means the same property. <<


CREATE TABLE Properties
(apn INTEGER NOT NULL PRIMARY KEY,
etc);


>> When a company wants to list a home on the MLS to sell the home it gets an MLS number which is only unique to the instance the entity wants to sell that property, so in a span of a few years the same APN could go for sale several times and have several MLS Numbers, but maintain the same APN. <<


CREATE TABLE Listings
(apn INTEGER NOT NULL
REFERENCES Properties(apn)
ON DELETE CASCADE
ON UPDATE CASCADE,
start_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
end_date DATETIME, -- null means current
CHECK (start_date < end_date),
PRIMARY KEY (apn, start_date),
mls INTEGER NOT NULL,
listing_status CHAR(10) DEFAULT 'listed' NOT NULL
CHECK (listing_status IN ('listed', 'sold', 'reduced', etc)),
asking_price DECIMAL(12,2) NOT NULL,
etc);

You also need a Calendar table, since this is a history schema. Google
that.



>> I need to make a list of all entries that qualify:


1. Find APN's with multiple entries
2. Narrow it down to only solds
3. Only solds that have been modified in the last 3 years and it
needs to generate a list of just DISTINCT APN's that qualify. <<

Did you notice that you gave a step by step **procedure**, rather than
a **declarative statement**? You do not thinking SQL yet. This is
steps 1 and 2 as a query:


SELECT apn, COUNT(*) AS listed_cnt
FROM Listings
WHERE listing_status = 'sold'
GROUP BY apn
HAVING COUNT(*) > 1;


Step 3 is impossible with what you posted. What does "modified" mean
and where is it in the tables? I assume with the Properties, which
means I need a history schema on it, too. My mental image is that
"modified" is going to involve the bedroom counts, kitchen appliances,
roofing, etc. and NOT one column with a simple code in it.

Friday, July 13, 2007

Boolean computed column

SQL Apprentice Question
I have a column in my database I want to be a computed flag based on an
active date.. the Flag column is called active..

basicall I want if the ActiveDate <= getdate() (as in today is after the
item was active) then the active flag is a bit 1, else its a bit 0... I
tried this as a flat out <= statement, got an error by SQL Managment
studio... so how would this be done? thanks!


Celko Answers
>> I have a column in my database I want to be a computed flag based on an active date.. the Flag column is called active.. basically I want if the ActiveDate <= getdate() (as in today is after the item was active) then the active flag is a bit 1, else its a bit 0...
I tried this as a flat out <= statement, got an error by SQL
Management studio... so how would this be done? <<

This would be done by forgetting all the basic rules for programming
in SQL :)


1) We do not use the proprietary BIT data type that does not exist in
Standard SQL


2) Flags are for assembly language programming and have no place in
RDBMS; that is why SQL has no BOOLEAN data types


3) We do not store computed columns in a table. Write a VIEW and
learn to use CURRENT_TIMESTAMP instead of the old proprietary
getdate() function call to get today's active data.


Original Source

SQL Views - embedded view work-a-rounds

SQL Apprentice Question
I've been asked to re-write a sql view. The view itself contains
several calls to other views (embedded). Is there a way to get around
using embedded views. I've written the same query up using temp.
tables but obviously temp. tables can't be used in views?


Is there any special things I should be looking for?


Celko Answers
>> I've been asked to re-write a sql view. The view itself contains several calls[sic: invocations?] to other views (embedded). Is there a way to get around using embedded views. I've written the same query up using temp tables but obviously temp. tables can't be used in views? <<


Nesting VIEWs is a good progrmming practice when it is done right. It
can assure that nobody invents their own definition of something, like
how we compute a tricky formula that can send us all to prison.

Temp tables are a baaaaad idea. The SQL Server model is in violation
of ANSI/ISO and most everyone else's model of them. They are usually
a way to fake a "scratch tape" in a procedural solution, where each
step passes the tape to the next step in a process; SQL is declarative
and we want to write that way.


But the real point is that you never said *why* you want to re-write
this unnamed VIEW. Damn to give advise about anything without any
kind of spec at all ..

real world advise on temp tables please

SQL Apprentice Question
a system with around 1500 users. a "bad" stored procedure that writes 3
#temp_tables. I notice system slow downs when more than 5 - 10 people run the
SP at the same time. Is this to be expected. what are the real world
expectancies of the temdb?

Celko Answers
>> a "bad" stored procedure that writes 3 #temp_tables. I notice system slow downs when more than 5 - 10 people run the SP at the same time. Is this to be expected. what are the real world expectancies of the temdb? <<


Yes, it is expected. The best solution is to re-write the procedure
to use derived tables and subqueries.

Besides being proprietary in both syntax and implementation the # temp
tables are usually a sign of bad programming. They are used as
"scratch tapes" in a routine structured as if it were a 1950's mag
tape batch; each step of a sequential process is written to a scratch
tape (aka # temp table) to be passed to the following step.


Remember coupling and cohesion from that freshman S.E. course?
Temporal coupling?

Saturday, June 23, 2007

I have problem with union

SQL Apprentice Question
use Sql Server 2000. I've structure like it:

Create table T1(
RID int primary key,
A varchar(8),
...
...
)


Create table T2(
RID int primary key,
A varchar(8),
...
...
)


To make easy to view, so I make a view like it


Create view VUni as
Select RID, A From T1
union


Select RID, A From T2


Now, when I execute


Select * From VUNI Where RID between 15 and 25


So the process needs a lot of times alias Very Slow. How to make it best
performance?


Oh yeach, it's other question, I want to select data from table which its
name is name dimanically (variable)


IF @YEAR=2000 Select * FROM T2000
ELSE
IF @YEAR=2001 Select * FROM T2001
ELSE
IF @YEAR=2002 Select * FROM T2002
ELSE
...
...


To make it compact, So how to the code?


Next question, How to avoid error? I execute like it


Select a/b from T2


seldom the statement is error, because value of filed "a" is 0. I want if
a/b is error calculation, so it give value 0 automatically. Can I do it?
Without I ve to make a function? Or any function (built in) to handle it?


Celko Answers
>> I want to select data from table whose name is dynamic (variable) <<


IF @year=2000 SELECT * FROM T2000
ELSE
IF @year=2001 SELECT * FROM T2001
ELSE
IF @year=2002 SELECT * FROM T2002
ELSE
...
<<

This design flaw is so bad it has a name, like a disease: "Attribute
Splitting". Instead of separate tables, you need one table with a
"_year" column in it. You then use VIEWs or queries.


You probably also split an attribute in the first part of this
posting, and are trying to fix it with a UNION [ALL] construct.



Original Source

Working days calendar in T-SQL

SQL Apprentice Question
My company working days is from Monday to Friday.
I would like to generate a result set which show all
consecutive working days (Monday to Friday excluding the
weekends) that is, for June 2007, it show me 1st, 4th, 5th,
6th, 8th, 11th etc... until 29th (last working day of June). Is
it possible to do this with T-SQL?My company working days is from Monday to Friday.
I would like to generate a result set which show all
consecutive working days (Monday to Friday excluding the
weekends) that is, for June 2007, it show me 1st, 4th, 5th,
6th, 8th, 11th etc... until 29th (last working day of June). Is
it possible to do this with T-SQL?

Celko Answers
Build a calendar table with one column for the calendar data and other
columns to show whatever your business needs in the way of temporal
information. Do not try to calculate holidays in SQL -- Easter alone
requires too much math.

CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
fiscal_year INTEGER NOT NULL,
fiscal_month INTEGER NOT NULL,
week_in_year INTEGER NOT NULL, -- SQL server is not ISO standard
holiday_type INTEGER NOT NULL
CHECK(holiday_type IN ( ..), --
day_in_year INTEGER NOT NULL,
julian_business_day INTEGER NOT NULL,
...);


The Julian business day is a good trick. Number the days from
whenever your calendar starts and repeat a number for a weekend or
company holiday.


A calendar table for US Secular holidays can be built from the data at
this website, so you will get the three-day weekends:

http://www.smart.net/~mmontes/ushols.html

Time zones with fractional hour displacements http://www.timeanddate.com/worldclock/city.html?n=246 http://www.timeanddate.com/worldclock/city.html?n=176 http://www.timeanddate.com/worldclock/city.html?n=5 http://www.timeanddate.com/worldclock/city.html?n=54

But the STRANGE ones are:
http://www.timeanddate.com/worldclock/city.html?n=63 http://www.timeanddate.com/worldclock/city.html?n=5


Original Source

Saturday, June 16, 2007

group by datetime

SQL Apprentice Question
I am baffled by this query and can use a little help pls!


query count for each Monday of the week in the last few months between 9 pm
and 1 am. I know I can use datepart() but can't figure out how to query
between 9pm and 1am. Thanks.



Celko Answers
>> I am baffled by this query and can use a little help pls! <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. Why are you so rude?


>> query count for each Monday of the week in the last few month between 9 pm [sic: 11:00:00 Hrs] and 1 am [sic: 01:00:00 the next day??]. I know I can use DATEPART() but can't figure out how to query between 9 pm [sic] and 1 am [sic]. <<


You really have no idea how time works!! What the hell is that AM and
PM crap? You never heard of ISO-8601 Standards and UTC???

What you do is set up a table of temporal ranges with upper and lower
limits with TIMESTAMP limits and join to it. Hey, you spit on us by
not posting DDL, why should we post DDL and data for you?


Your unit of measurement is wrong and you are getting screwed up. But
your invisible DDL tells us nothing!!

Original Source

Updating based on values of two records

SQL Apprentice Question
I'm sure this is obvious, but for some reason, I can't get my hands
around the solution. I have a table like this:

Account CalYear CalMonth Amount


Comm 2006 12 80
Comm 2007 01 100
Comm 2007 02 125
Incr 2007 01 21
Incr 2007 02 28


I want to update the incr account so it shows the correct difference
between the previous month, the change basically. Corrected it would
look like this:


Account CalYear CalMonth Amount


Comm 2006 12 80
Comm 2007 01 100
Comm 2007 02 125
Incr 2007 01 20
Incr 2007 02 25



Celko Answers

>> I have a table like this: <<


Why do you spit on us and want us to do your homework/job for you?

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.


Why did you invent your own two column data types that avoid the
temporal data types? Have you ever read the ISO-8601 rules for
temporal data?


CREATE TABLE StinkingFoobar -- until we get a real name
(acct_type CHAR(4) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date),
squid_amt INTEGER NOT NULL, -- read ISO-11179 before you program
again!!
PRIMARY KEY (acct_type, start_date)
);


Did I guess right?? Gee, wish you had helped!!



>> I want to update the incr account so it shows the correct difference between the previous month <<


No. That is soooooo non-RDBMS!! This is a computed value and needs
to be done in a VIEW using the OLAP functions. RTFM.


Original Source

newbie question on update

SQL Apprentice Question
I have a clients table which consists of, among other things, a column
named source.

I have a second table, sources, with two columns - client_id and source.


The are fewer rows in sources than in clients. I am trying to update
clients.source with sources.source without affecting the clients that
do not have a corresponding row in sources.


My first try was update clients set clients.source = sources.source
where clients.client_id = sources.client_id


But I ended up with nulls in the clients.source column for rows in
which there was no row in sources.


I am guessing that I need to involve a join somehow, but I am having a
very hard time not thinking procedurally, and just don't "get" it.


I've been looking at Join examples, but anything non-trivial just
baffles me. Would someone be so kind as to explain how to do what I'm
trying to do?


Thanks.



Celko Answers
>> I am finding it difficult to think in terms of sets and 'simultaneous' actions. I keep wanting to solve these things procedurally. <<


It takes about a year to have the revelation :) My next book will
deal with thinking in sets. Hey, remember how weird recursion was?


Original Source

Thursday, June 14, 2007

Question regarding multiple data sources and coalesce

SQL Apprentice Question
I inherited a db that is pulling data from 5 different sources and
storing them in 5 different tables. The tables all have the same
basic data, minor differences based on source. The company currently
creates a "summary" table by joining the 5 source tables and using
coalesce( ) to display data from preferred data sources if it is
available from there.

I've been asked to present an alternate schema. My first thought was
to normalize the 5 tables into 1 with a source andI could just include
the source in the ORDER BY of queries to get the preferred source.
But then I thought since each source could have 30Million rows maybe I
would loose performance over the existing schema. So, could somebody
point me towards a reference source that may cover this type of
topic? Of course, any opinions (on this issue) would be appreciated
as well.


Celko Answers

>> I've been asked to present an alternate schema. My first thought was to normalize the 5 tables into 1 with a source and I could just include the source in the ORDER BY of queries to get the preferred source. <<


Something ike this? Cram all the data that you have into staging
table

CREATE TABLE Foobar
(foo_key INTEGER NOT NULL,
source_nbr INTEGER NOT NULL
CHECK (source_nbr BETWEEN 1 AND 5),
PRIMARY KEY (foo_key, source_nbr),
etc.);


Assuming sources are ranks from 1 to 5, pull out the most trusted for
each key


SELECT foo_key, etc.
FROM Foobar AS F1
WHERE source_nbr =
(SELECT MIN(source_nbr)
FROM Foobar AS F2
WHERE F1.foo_key = F2.foo_key);


This is the best we can do without more specs.




Original Source

Need urgent help on a QUERY

SQL Apprentice Question
Given:

SELECT STATE_ID
FROM GROUP_STATE


STATE_ID
----------------
10
15
16
17
18
19
20


SELECT GROUP_ID
FROM GROUP
WHERE GROUP_ID NOT IN (SELECT DISTINCT GROUP_ID FROM GROUP_STATE)


GROUP_ID
---------------
1
2
16
5


I need a query that would look like:
GROUP_ID STATE_ID
--------------- ----------------
1 10
1 15
1 16
1 17
1 18
1 19
1 20


2 10
2 15
2 16
2 17
2 18
2 19
2 20


ETC....



Celko Answers
SELECT GS.state_id, G.group_id
FROM ( SELECT state_id FROM GroupStates AS GS
CROSS JOIN
SELECT group_id FROM Groups
WHERE group_id
NOT IN (SELECT DISTINCT group_id FROM GroupStates)
AS G);

Do not use reserved words for data element names.


Original Source

changing order of records

SQL Apprentice Question
hi, i've got a problem. i need to change order of records in my table . here
is the table

id forumname
----------------
1 test1
2 test2
3 test3
4 test4


i need to write code that can move nay record up or down without changing
id of forumname. i thought about another column called order


id forumname order
---------------------
1 test1 1
2 test2 2
3 test3 3
4 test4 4


but couldnt figureout how to change that order column. any one got an idea?



Celko Answers
Here is an old "cut & paste" for this problem: Given a motorpool with
numbered parking spaces, you want to move the automobiles around.
CREATE TABLE Motorpool
(parking_space INTEGER NOT NULL PRIMARY KEY
CHECK (parking_space > 0),
vin CHAR(17) NOT NULL);

Re-arrange the display order based on the parking_space column:


CREATE PROCEDURE SwapVehicles (@old_parking_space INTEGER,
@new_parking_space INTEGER)
AS
UPDATE Motorpool
SET parking_space
= CASE parking_space
WHEN @old_parking_space
THEN @new_parking_space
ELSE parking_space + SIGN(@old_parking_space - @new_pos)
END
WHERE parking_space BETWEEN @old_parking_space AND
@new_parking_space
OR parking_space BETWEEN @new_parking_space AND
@old_parking_space;


When you want to drop a few rows, remember to close the gaps with
this:


CREATE PROCEDURE CloseMotorpoolGaps()
AS
UPDATE Motorpool
SET parking_space
= (SELECT COUNT (M1.parking_space)
FROM Motorpool AS M1
WHERE M1.parking_space <= Motorpool.parking_space);




Original Source

Wednesday, June 13, 2007

self-referencing constraint with identity column?

SQL Apprentice Question
have a table that requires to have a self-referencing constraint to enforce
a parent-child type of relationship:

ID (identity column)
ParentID (INT column that references the ID column).


To determine if I'm at the top-level of of the relationship, I was going to
leave the ParentID null, otherwise it must be a value of another ID column in
the table which indicates its child of another.


However, can you have it so you don't use NULL to indicate this and instead
set ID and ParentID equal to the same value and still use the IDENTITY column?


So on a INSERT (i.e the IDENTITY would generate 25, so I'd like to set the
ParentID to 25 as well). This would was causing FK violation and I was
wondering if there is a way around it using an insert trigger?



Celko Answers

>> I have a table that requires to have a self-referencing constraint to enforce a parent-child type of relationship: <<


No it is not required at all. Look up the Nested Sets model for trees
and hierarchies. You will find the constraitns are much easier and the
code will run1-2 orders of magnitude faster than recursive path
traversals.

Also, never use IDENTITY in an RDBMS; find a valid relational key
instead.



Original Source

Tuesday, June 12, 2007

Database Design Question

SQL Apprentice Question
I'm fairly new to database design, having only really worked with
existing tables etc in the past.


Simple question this really...........


In a users table, is it wise to have a ManagerID column that is
effectively pointing to another user in the same table, with the
theory being that if that person is top dog, they will just have a
null entry in ManagerID.


I will check for circular references before entering the data.


Is there a better way of storing the data, or is this the only way?




Celko Answers
>> Is there a better way of storing the data, or is this the only way? <<


Get a copy of TREES & HIERARCHIES IN SQL for several different ways to
do an organizational chart. What you have is thd adjacency list
model; look up the nested sets model.



Original Source

Using Soundex to identify possible duplicates

SQL Apprentice Question
In a table that has person forename and surname, is it possible to use
soundex to identify for every row in the table what similar matches
there are in the same table?



Celko Answers

>> is it possible to use Soundex to identify for every row in the table what similar matches there are in the same table? <<


Don't do it. Name handling is an ugly problem and if you have to do
this on a regular basis get a package designed for this kind of work.
Some companies are Group 1 Software, SSA (used to have a great booklet
on this topic), Melissa Data Corporation and Promark Software Inc.

Their software handles mailing lists and you can get a review copy
from Melissa Data. They do not choke on names like "John Paul van der
Poon" and worse.


Original Source

giving one union preference

SQL Apprentice Question
Let say I have a query with the following structure:

Select name, mdate, kdate
from table1 a
inner join (
select name, mdate
from table2 b
where a.id = b.id
and mdate >= kdate
union
select name, mdate
from table3 c
where a.id = c.id
and mdate >= kdate
) mindate


When the select mdate from the inner join union (mindate) returns the
same mdate for table b and c, I want to give preference to the c.mdate
and so c.name.


Is there a way to do this?




Celko Answers
>> When the select mdate from the inner join union (mindate) returns the same mdate for table b and c, I want to give preference to the c.mdate and so c.name. <<


This question makes no sense. The poorly named derived table,
Mindate, acts as if it is materialized and you no longer have access
to tables B and C. It is like asking to see the eggs after the cake
has been baked.

Original Source

Help writing query to return data if there are 4 consecutive holidays

SQL Apprentice Question
I need to write a query that returns data if an employee has taken four or
more consecutive holidays. I need to get the startdate and the employee name.
The employee works either a 4X10..or an 8X5 shift...
Also, the employee need not have an off only on Sat and Sun...Some of them
work on the weekends and have an off during the weekdays.


Also, I need to take into account the days employee is off (eg.Sat and Sun).


So if an employee is out Mon, Tues, Wed, Thu it shows that employee or if
the employee is out on Thu, Fri, Mon, Tues it also shows that employee...


Can anyone assist!




Celko Answers
Use a Calendar table which has the work days Julianized (consecutively
numbered). Google this newsgroup for examples.

Original Source

add blank row between groups of rows to separate them?

SQL Apprentice Question
I want to add a blank row between groups of rows to separate them.


select * from tbl1 where fname in ('Bill', 'tom', 'sam')


returns


Bill, 1, 2, 3
bill 4, 5, 6
tom, 1, 2, 3
tom, 4, 5, 6
sam, 1, 2, 3
sam, 4, 5, 6


I want to return this:


Bill, 1, 2, 3
bill 4, 5, 6


tom, 1, 2, 3
tom, 4, 5, 6


sam, 1, 2, 3
sam, 4, 5, 6


How can I perform this type of operation with tsql?



Celko Answers
>> How can I perform this type of operation with tsql? <<


Why are you formatting data in the back end? The basic principle of a
tiered architecture is that display is done in the front end and never
in the back end. This a more basic programming principle than just
SQL and RDBMS.

Original Source

Monday, June 11, 2007

Calc minutes between days

SQL Apprentice Question
I am using DATEDIFF('n',EndTime, StartTime) to return minutes between 2
datetime columns. The problem is that when the StartTime is 23:00:00 PM and
EndTime is 07:00:00 AM I get a negative # of minutes. How can get 480
minutes from this calculation. Note the 23:00:00 is yesterday and the
07:00:00 is today. Thanks


Celko Answers

>> OK, but I'm not storing the date portion so it thinks both dates are 12/30/1899. Do I need a CASE statement to check for 1st time 2nd time? <<


No, you need to learn to do it right and not look for kludges in
newsgroups. Oh, you will get the kludges, as you have seen, but then
your code will become a convoluted mess over time, queries cannot use
indexes, portability is destroyed, etc.

It would also help if you had learned to use ISO-8601 temporal
formats, too (i.e '1899-12-30'). That would have lead you to study
the ISO temporal model. A proper data model will use DATETIME data
types and model events as having durations. Get a free copy of Rick
Snodgrass book in PDF from the University of Arizona website and spent
a week with. There is also a ton of free code you can download.

Get info from multiple tables. Join problem? Options

SQL Apprentice Question
The Stored Procedure below returns article attributes for my articles
(tblArticleAttribute). This works fine IF NOT tblArticleAttribute.Content is
empty, then it returns all templatedefinitions with the Content. But, if the
Content is empty, it also has to return all template definitions, but with
the content fields empty (now it returns nothing). This is because I bind
this information to a dynamic form, and if (example) the title or ingress is
empty the textfields (definitions) should be shown with no text. Then the
user can fill inn text an save it.

----------------------------------
Here is the SQL:
----------------------------------


SELECT tblTemplateDefinition.[Name] as TemplateDefinitionName,
tblTemplateDefinition.HelpText as TemplateDefinitionHelpText,
tblArticleAttribute.[Content], tblArticle.Id AS ArticleId,
tblTemplateDefinition.Id AS TemplateDefinitionId
FROM tblTemplateDefinition
LEFT OUTER JOIN tblArticleAttribute ON tblTemplateDefinition.Id =
tblArticleAttribute.TemplateDefinitionId
INNER JOIN tblArticle ON tblArticleAttribute.ArticleId= tblArticle.Id
WHERE tblArticle.Id = @ArticleId


----------------------------------
Here is the tables:
----------------------------------


tblArticle:
id | Name


tblArticleAttribute
TemplateDefinitionId | ArticleId | Content


tblTemplate
Id | Name


tblTemplateDefinition
Id | TemplateId | TemplateDefinitionId |Name | HelpText



Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

You might want to read any basic book on data modeling. You seem to
have multiple names for the same data element, based on where it
appears. And you have a universal, magical "id" column in all the
pseudo-code you did give us. You also violate ISO-11179 naming rules.
My guess as to what you meant to post if you knew how would be
something like this:


CREATE TABLE Articles:
(article_id INTEGER NOT NULL PRIMARY KEY, -- needs work
article_title VARCHAR(150) NOT NULL);


I have no idea which carefully researched industry standard you are
using the article identifier. But at least I know it is not an
IDENTITY column -- that would be soooo non-relational.


Why did you use a singular name for the table? Do you really have
only one article. Why did you put that silly "tbl-" prefix on the
table names? Just to mess up the data dictionary, or because you are
an OO programmer who has not learned RDBMS yet?


CREATE TABLE ArticleAttributes
(template_id INTEGER NOT NULL PRIMARY KEY,
article_id INTEGER NOT NULL
REFERENCES Articles(article_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
article_content VARCHAR(255) NOT NULL);


We would never store attributes in a schema; that would mix data and
metadata. What about the templates? Why is the definition of a
template not an attribute of a template? Why it is an entity with its
own table?


Also note that you need DRI actions, constraints, etc. to make this
work.


I can make a lot of assumptions about the magical universal names and
how they are related (is the magical "id" actually article_id,
template_id, template_definition_id, template_id, or a hundred other
possible things?) and the cardinality of the relationships, etc., but
based on decades of cleaning up bad schemas, you need more than a
sample query with an OUTER JOIN.


Can you explain your problem better?

Thursday, June 07, 2007

Maximum number of tables (260) exceeded

SQL Apprentice Question
Hi! I receive the referenced message when i run a view I created in SQL2005
in a SQL2000 install of the same database. Basically it is a union query with
about 9 individual select queries. I am sure this has something to do with a
lack of efficiency in my query structure but I am confounded that it runs in
SQL2005 and not in SQL2000. Any suggestions? Thanks in avance for any help
you can offer...



Celko Answers
>> I am confounded that it runs in SQL2005 and not in SQL2000. Any suggestions?

Why not just write better code? Seriously, after all my decades (NOT
years) with SQL, I cannot imagine anything properly designed that
would break the limit.

Remember the human interface part of the first SE course you had
before you were allowed to production data? 5 to 7 levels is all a
human being can maintain. Duh.


Find the moron who nested things this deep and kill him. Then re-
write EVERYTHING he wrote and re-write it. Do a Wikipeida on
"spaghetti code" versus "lasagna code" for some of the issues.




Original Source

Monday, June 04, 2007

db table design question

SQL Apprentice Question
We have an application which has its settings stored in a database
(SQL Server 2000)

We have (strong) disagrements on the best way to store these values,
and I am looking for advice on doing this.
Most of the values are not used by the stored procedures, just the
applications running on the clients PC. The data rarely changes.


Developer #1 wants many tables with many columns. This gives the
ability to strongly type each variable. Each table would have only
one row. He believes this to be faster in retrieving the data.


Developer #2 wants one table with a Key column and a Value column.
This table would have many rows. The application would be
responsible
for detecting invalid datatype in the value. He believes this to be
a
more efficent use of the database, and easier to maintain.


So - what do you think? Are there any advantages or problems with
either approach?



Celko Answers
Use #1. The other way is called a OTLT ("One True Lookup Table") or
MUCK ("Massively Unified Code Keys") in the literature. It is
incredibly bad design. Here is a cut and paste from one of my books
on Constants tables

04.02. Constants Table


When you configure a system, you might want to have a way to set and
keep constants in the schema. One method for doing this is to have a
one-row table that can be set with default values at the start, and
then updated only by someone with administrative privileges.


CREATE TABLE Constants
(lock CHAR(1) DEFAULT 'X'
NOT NULL PRIMARY KEY
CHECK (lock = 'X'),
pi FLOAT DEFAULT 3.142592653 NOT NULL,
e FLOAT DEFAULT 2.71828182 NOT NULL,
phi FLOAT DEFAULT 1.6180339887 NOT NULL,
..);


To initialize the row, execute this statement.


INSERT INTO Constants VALUES DEFAULTS;


Most SQL programmers do not know about the VALUES DEFAULTS option in
the INSERT INTO statement. The lock column assures there is only one
row and the DEFAULT values load the initial values. These defaults
can include the current user and current timestamp, as well as numeric
and character constant values.


Another version of this idea that does not allow for any updates is a
VIEW defined with a table constructor. [[not in SQL Server yet!]]


CREATE VIEW Constants (pi, e, phi, ..)
AS VALUES (CAST 3.142592653 AS FLOAT),
(CAST 2.71828182 AS FLOAT),
(CAST 1.6180339887 AS FLOAT),
..;


Please notice that you have to use a CAST() operators to assure that
the data types are correct. This is not a problem with INTEGER
values, but can be if you wanted DOUBLE PRECISION and got a default of
DECIMAL(s, p) or FLOAT.


[[ a little bit later in the chapter..]]


OTLT or MUCK Table Problems


I think that Paul Keister was the first person to coin the phrase
"OTLT" (One True Look-up Table) for a common SQL programming technique
that is popular with Newbies. Don Peterson (www.SQLServerCentral.com)
gave the same technique the name "Massively Unified Code-Key" or MUCK
tables in one of his articles.


The technique crops up time and time again, but I'll give him credit
as the first writer to give it a name. Simply put, the idea is to
have one table to do all of the code look-ups in the schema. It
usually looks like this:


CREATE TABLE Look-ups
(code_type CHAR(10) NOT NULL,
code_value VARCHAR(255) NOT NULL, -- notice size!
code_description VARCHAR(255) NOT NULL, -- notice size!
PRIMARY KEY (code_value, code_type));


So if we have Dewey Decimal Classification (library codes), ICD
(International Classification of Diseases), and two-letter ISO-3166
country codes in the schema, we have them all in one, honking big
table.


Let's start with the problems in the DDL and then look at the awful
queries you have to write (or hide in VIEWs). So we need to go back
to the original DDL and add a CHECK() constraint on the code_type
column. Otherwise, we might "invent" a new encoding system by
typographical error.


The Dewey Decimal and ICD codes are digits and have the same format --
three digits, a decimal point and more digits (usually three); the
ISO-3166 is alphabetic. Oops, need another CHECK constraint that will
look at the code_type and make sure that the string is in the right
format. Now the table looks something like this, if anyone attempted
to do it right, which is not usually the case:


CREATE TABLE OTLT
(code_type CHAR(10) NOT NULL
CHECK(code_type IN ('DDC','ICD','ISO3166', ..),
code_value VARCHAR(255) NOT NULL,
CHECK
(CASE
WHEN code_type = 'DDC'
AND code_value
SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'
THEN 1
WHEN code_type = 'ICD'
AND code_value
SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'
THEN 1
WHEN code_type = 'ISO3166'
AND code_value SIMILAR TO '[A-Z][A-Z]'
THEN 1 ELSE 0 END = 1),
code_description VARCHAR(255) NOT NULL,
PRIMARY KEY (code_value, code_type));


The "SIMILAR TO" predicate is the SQL-92 version of a regular
expression parser based on the POSIX Standards, if you are not
familiar with it. Since the typical application database can have
dozens and dozens of codes in it, just keep extending this pattern for
as long as required. Not very pretty is it? In fact, there is a good
chance that you will exceed the number of WHEN clauses allowed in a
CASE expression. That's why most OTLT programmers don't bother with
it.


Now let us consider adding new rows to the OTLT.


INSERT INTO OTLT (code_type, code_value, code_description)
VALUES
('ICD', 259.0, 'Inadequate Genitalia after Puberty');


and also


INSERT INTO OTLT (code_type, code_value, code_description)
VALUES
('DDC', 259.0, 'Christian Pastoral Practices & Religious Orders');


If you make an error in the code_type during insert, update or delete,
you have screwed up a totally unrelated value. If you make an error
in the code_type during a query, the results could be interesting.
This can really hard to find when one of the similarly structured
schemes had unused codes in it.


The next thing you notice about this table is that the columns are
pretty wide VARCHAR(n), or even worse, that they are NVARCHAR(n) which
can store characters from a strange language. The value of (n) is
most often the largest one allowed in that particular SQL product.


Since you have no idea what is going to be shoved into the table,
there is no way to predict and design with a safe, reasonable maximum
size. The size constraint has to be put into the WHEN clause of that
second CHECK() constraint between code_type and code_value. Or you
can live with fixed length codes that are longer or shorter than what
they should be.


These large sizes tend to invite bad data. You give someone a
VARCHAR(n) column, and you eventually get a string with a lot of white
space and a small odd character sitting at the end of it. You give
someone an NVARCHAR(255) column and eventually it will get a Buddhist
sutra in Chinese Unicode.


Now let's consider the problems with actually using the OTLT in a
query. It is always necessary to add the code_type as well as the
value which you are trying to look-up.


SELECT P1.ssn, P1.lastname, .., L1.code_description
FROM OTLT AS L1, Personnel AS P1
WHERE L1.code_type = 'ICD'
AND L1.code_value = P1.disease_code
AND ..;


In this sample query, you need to know the code_type of the Personnel
table disease_code column and of every other encoded column in the
table. If you got a code_type wrong, you can still get a result.


You also need to allow for some overhead for data type conversions.
It might be more natural to use numeric values instead of VARCHAR(n)
for some encodings to ensure a proper sorting order. Padding a string
of digits with leading zeros adds overhead and can be risky if
programmers do not agree on how many zeros to use.


When you execute a query, the SQL engine has to pull in the entire
look-up table, even if it only uses a few codes. If one code is at
the start of the physical storage, and another is at the end of
physical storage, I can do a lot of caching and paging. When I update
the OTLT table, I have to lock out everyone until I am finished. It
is like having to carry an encyclopedia set with you when all you
needed was a magazine article.


Now consider the overhead with a two-part FOREIGN KEY in a table:


CREATE TABLE EmployeeAbsences
(..
code_type CHAR(3) -- min length needed
DEFAULT 'ICD' NOT NULL
CHECK (code_type = 'ICD'),


code_value CHAR(7) NOT NULL, -- min length needed
FOREIGN KEY (code_type, code_value)
REFERENCES OTLT (code_type, code_value),
..);


Now I have to convert the character types for more overhead. Even
worse, ICD has a natural DEFAULT value (000.000 means "undiagnosed"),
while Dewey Decimal does not. Older encoding schemes often used all
9's for "miscellaneous" so they would sort to the end of the reports
in COBOL programs. Just as there is no Magical Universal "id", there
is no Magical Universal DEFAULT value. I just lost one of the most
important features of SQL!


I am going to venture a guess that this idea came from OO programmers
who think of it as some kind of polymorphism done in SQL. They say to
themselves that a table is a class, which it isn't, and therefore it
ought to have polymorphic behaviors, which it doesn't.


Maybe there are good reasons for the data modeling principle that a
well-designed table is a set of things of the same kind instead of a
pile of unrelated items.




Original Source

Find first available block that does not intersect a range

SQL Apprentice Question
Given a number of numeric ranges, I need to find the first available row
(range) that can accomodate a requested block of contiguous values. For
example, given the following table definition:

CREATE TABLE [dbo].[tbl_AllocatedRange](
[RangeID] [int] IDENTITY(1,1) NOT NULL,
[RangeStart] [bigint] NOT NULL,
[RangeEnd] [bigint] NOT NULL,
CONSTRAINT [PK_tbl_AllocatedRange] PRIMARY KEY CLUSTERED
(
[RangeID] ASC
)


And the following DML:


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 1, 100, 200 );


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 2, 500, 650 );


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 3, 2000, 2200 );


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 4, 4000, 8000 );


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 5, 16000, 25000 );


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 6, 25001, 50000 );


INSERT INTO [dbo].[tbl_AllocatedRange] ([RangeID], [RangeStart], [RangeEnd])
VALUES ( 7, 100000, 200000 );


A request for a block of 10000, should return the row where RangeID=6
because that is the first available position where I could allocate a
contiguous block of 10000 without intersecting any of the values in the
existing ranges.


Celko Answers
Let's add some constraints and reduce those BIGINTs to just INTEGER
unless you really need it.


CREATE TABLE AllocatedRanges
(range_id INTEGER NOT NULL PRIMARY KEY,
range_start INTEGER NOT NULL
CHECK (range_start >= 0),
range_end INTEGER NOT NULL
CHECK (range_end >= 0),
CHECK (range_start <= range_end));


This will get you all of the candidates, but in RDBMS, there is no
concept of an ordering so talking about the "first avaialble" range
needs more definition.


SELECT range_id, (range_end - range_start)+1 AS range_size, @my_size
AS my_size
FROM AllocatedRanges
WHERE (range_end - range_start)+1 >= @my_size;


We can do it with the lowest range id:


SELECT MIN(range_id)
FROM AllocatedRanges
WHERE (range_end - range_start)+1 >= @my_size;


But the usual requirement is to find the best fit, not the first
one.


SELECT range_id
FROM AllocatedRanges
WHERE (range_end - range_start)+1 =
(SELECT MIN((range_end - range_start)+1 - @my_size)
FROM AllocatedRanges AS A1
WHERE ((range_end - range_start)+1 - @my_size) >= 0);


Original Source

mutliple refential integrity

SQL Apprentice Question
Suppose I want a column in my table to reference two other columns of
other two tables (of course datatypes are same, and the other two
columns are primary key in their respective relations). How can I
enforce this using SQL?


Celko Answers
>> Suppose I want a column in my table to reference two other columns of other two tables (of course data types are same, and the other two columns are primary key in their respective relations). How can I enforce this using SQL? <<-


CREATE TABLE Foo
(foo_id INTEGER NOT NULL PRIMARY KEY,
..);

CREATE TABLE Bar
(_id INTEGER NOT NULL PRIMARY KEY,
..);


CREATE TABLE Foobar
(foo_id INTEGER NOT NULL
REFERENCES Foo(foo_id),
bar_id INTEGER NOT NULL
REFERENCES Bar(bar_id),


Original Source

Wednesday, May 23, 2007

SQL Query Help

SQL Apprentice Question
Need to learn SQL queries again (SQL 2005). Need help with the
following situation:


I have two tables, CUSTOMERS and ORDERS. One each column in these
tables should have identical data (except the name of the column is
different) which is customer's name. Because each order creates an
entry in ORDER table, each customer's name may be appearing more than
once in that table. There could be also instances where a particular
customer might not have placed an order at all and will not have any
entry in the ORDERS table, but do have one entry in CUSTOMERS table
(no duplicates in CUSTOMERS table).


I need help with a query which shows a list of customers who have
never placed any orders. This essentially means that they do not have
any entry in the ORDERS table. Here is the Pseudo Code:


select customername from CUSTOMERS table
who are NOT IN
ORDERS table


Will appreciate any quick thoughts. Thanks in advance!



Celko Answers

>> Need to learn SQL queries again (SQL 2005). <<>> I have two tables, CUSTOMERS and ORDERS. One each column in these tables should have identical data (except the name of the column is different) which is customer's name. <<>> I need help with a query which shows a list of customers who have never placed any orders. <<


Wild guess, done without DDL

SELECT C.customer_name
FROM Customers AS C
WHERE NOT EXISTS
(SELECT *
FROM Orders AS O
WHERE O.customer_name = C.customer_name);

Totals

SQL Apprentice Question
I have a table that is populated everyday with daily totals, I'm
trying to teach myself SQL using this table.


Someone kindly gave me a query that gave the weekly totals based on
the table:
SELECT week_start, week_start + 6 AS week_end, SUM(Total3) AS
production
FROM (SELECT Date - DATEPART(weekday, Date + @@DATEFIRST -
1) + 1 AS week_start, Total3
FROM dbo.A_Totals) AS d
GROUP BY week_start
ORDER BY week_start


How can I adapt this to divide the weekly totals into years so I'd
have: week number, year, year-1,year-2 etc.


Thanks in advance


Celko Answers

>> I have a table that is populated everyday with daily totals, I'm trying to teach myself SQL using this table. <<


Bad idea; get a good book and a tutorial, then work on a problem.

Your approach to SQL is typical of someone coming in via procedural
code and looking for functions to solve it. SQL is a data retrieval
language that likes to use sets (tables). This is a WHOLE DIFFERENT
PROGRAMMING PARADIGM. After you have a few basics, then look up the
use of calendars -- temporal stuff is one of the hardest parts of
RDBMS, so it is a horrible starting point for a self-education. Try
something like this:


CREATE TABLE ReportPeriods
(period_name VARCHAR(20) NOT NULL PRIMARY KEY,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date));


Create and load a table of reporting periods of whatever range you
want. They can overlap, have gaps, etc. They can be crazy, one shot
things, etc. If you are a Don Martin fan:


INSERT INTO ReportPeriods
VALUES ('National Gorilla Suit Week', 2007-05-21', 2007-05-27');
etc.


You also have to learn to design data which should take about a year,
if you really work at it. So if your weeks are encoding the ISO-8601
year-week number format, then you can simply write:


SELECT period_name, SUM(some_col) AS week_total
FROM DailyReports AS D, ReportPeriods AS R
WHERE D.report_date BETWEEN start_date AND end_date
AND period_name LIKE 'WEEK-2007__;


Change the wild cards as needed.

Friday, May 11, 2007

Correct Way to Insert into Multiple Tables

SQL Apprentice Question
I am just wondering if what I am doing would be considered the correct
way to insert data into multiple tables when a forigen key is in place
between the tables primary keys.


Here is a simple DB structure
Table 1
ID int (auto incriment) Primary Key
FirstName varchar(100)
LastName varchar(200)


Table 2
ID (Primary Key)
Age int
BirthPlace varchar(100)


I have created the following sotred proceedure, it works from the test I
have done, but I am wondering if it is the *correct* way to do it.


PROCEDURE [dbo].[InsertPerson]
@ID int output,
@FirstName varchar(100),
@LastName varchar(200),
@Age int,
@BirthPlace varchar(100)
AS
Insert into dbo.UserInfo(FirstName, LastName)
Values(@FirstName, @LastName)
SET @ID = SCOPE_IDENTITY()


Insert into dbo.UserDetails(ID, Age, BirthPlace)
Values (@ID, @Age, @BirthPlace)



Celko Answers

>> I am just wondering if what I am doing would be considered the correct way to insert data into multiple tables when a forigen key is in place between the tables primary keys. <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

And what you have done is wrong on several points.


1) an auto-increment cannot ever be used as a key in an RDBMS. A key
is a subset of attributes in the data model. An auto-increment is
inside the hardware and has nothing to do with the data model,


2) Do you really have names that long? Well, you will if you allow
it,


3) Why did you split information of what-ever-the-heck you are
modeling across two tables?


4) "age" of what? Ands we never store age; we store a birthdate so we
can always correctly compute the age. Is this what you wanted?


CREATE TABLE Customers
(cust_id CHAR(9) NOT NULL PRIMARY KEY
CHECK (<< validation constraint>>),
last_name VARCHAR(20) NOT NULL,
first_name VARCHAR(20) NOT NULL,
birth_date DATETIME NOT NULL,
birth_location VARCHAR(20) NOT NULL);


The problem goes away with a valid design. I think what you might
have wanted to know is that INSERT INTO statements work on one and
only one base table. Put both inserts into a single transaction and
add a PRIMARY KEY and FOREIGN KEY constraints.

Last Day Of Previous Month...with a twist

SQL Apprentice Question
have a requirement to design a query that identifies items sold
between two dates. There is a 'SoldDate' datetime field used to
register what date the item was sold.


The query needs to identify all sales between the last day of the
previous month and going back one year.


What I would like to do is to design a query / stored procedure that
will dynamically create the criteria to allow the client to simply run
the query or stored proc.


I know how to establish the last day of the previous month part, I'm
just not sure of how best to design the remainder of the query.


Celko Answers
>> I know how to establish the last day of the previous month part, I'm just not sure of how best to design the remainder of the query. <<


Instead of using procedural coding, why not use a table of the
reporting periods for a decade or two? A simple BETWEEN predicate
will classify each sale quickly and give you extra control over non-
operating days, etc.

difficulty with SQL to get view

SQL Apprentice Question
am having difficulty in designing my SQL. :(


In this setup, an Invoice can have multiple Bills (installment
payments). I would like a query that returns invoices with overdue
bills


a bill is overdue if :: NOT B_Paid and Now() > B_DueDate --- how to
put this in the following '???'


SELECT B_Invoice, ??? As Overdue
FROM T_Bill
GROUP BY B_Invoice


CREATE TABLE T_Bill (
RefNo INTEGER IDENTITY PRIMARY KEY,
B_Invoice INTEGER NOT NULL REFERENCES
T_Invoice,
B_BillDate SMALLDATETIME DEFAULT CURRENT_TIMESTAMP
NOT NULL,
B_DueDate SMALLDATETIME DEFAULT (CURRENT_TIMESTAMP
+14) NOT NULL,
B_Paid BIT DEFAULT 0 NOT NULL,
B_PaidDate SMALLDATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
B_PaymentInfo VARCHAR(24) DEFAULT '???' NOT NULL
);



Celko Answers
Your table is not designed properly. Your assembly-style bit flag is
redundant when you have a payment date. IDENTITY cannot be a
relational key by definition and it is redundant given the invoice
number. You never put the table as a prefix on a data element names
because that would change the names from table to table. If the due
date is always 14 days after the billing date, then put compute it in
a VIEW or a query instead of wasting space. This also gives you more
control without having to alter the tables.

CREATE TABLE Billings
(invoice_nbr INTEGER NOT NULL
REFERENCES Invoices (invoice_nbr),
billing_date DATETIME NOT NULL,
PRIMARY KEY (invoice_nbr, billing_date),
paid_date DATETIME, -- null is unpaid
payment_note VARCHAR(24) DEFAULT '???' NOT NULL);



>> I would like a query that returns invoices with overdue bills <<


SELECT invoice_nbr, CURRENT_TIMESTAMP
FROM Billings
WHERE paid_date IS NULL
AND DATEADD (D, billing_date, 14) >= CURRENT_TIMESTAMP;

You need to get a book on basic data modeling. And one on RDBMS.
What you have is a badly designed punch card file system, with flag,
sequential numbering, etc. .