SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Thursday, August 31, 2006

problem with select

SQL Apprentice Question
I'm trying to do a select and I'm having a problem with it (code below)


declare @teste_varchar2 as varchar(20)
declare @teste_varchar as varchar(500)


set @teste_varchar2 = "valor_fact"
exec ('select ' +@teste_varchar2+ ' from ##CONTENC where contracto = ' +
@cont_descCursor)


What is odd with the above code is that if I use a similar code but not
dynamic sql it works.


select valor_fact from ##CONTENC where contracto = @cont_descCursor


Celko Answers
>> I'm trying to do a select and I'm having a problem with it (code below) <<


Oh yes, the old "Britney Spears, Automobile and Squid" code Module!!

The short answer is use slow, proprietrary dynamic SQL to kludge a
query together on the fly with your table name in the FROM clause.


The right answer is never pass a table name as a parameter. You need to
understand the basic idea of a data model and what a table means in
implementing a data model. Go back to basics. What is a table? A model
of a set of entities or relationships. EACH TABLE SHOULD BE A DIFFERENT
KIND OF ENTITY. When you have many tables that model the same entity,
then you have a magnetic tape file system written in SQL, and not an
RDBMS at all.


If the tables are different, then having a generic procedure which
works equally on automobiles, octopi or Britney Spear's discology is
saying that your application is a disaster of design.


1) This is dangerous because some user can insert pretty much whatever
they wish -- consider the string 'Foobar; DELETE FROM Foobar; SELECT *
FROM Floob' in your statement string.


2) It says that you have no idea what you are doing, so you are giving
control of the application to any user, present or future. Remember the
basics of Software Engineering? Modules need weak coupling and strong
cohesion, etc. This is far more fundamental than just SQL; it has to
do with learning to programming at all.


3) If you have tables with the same structure which represent the same
kind of entities, then your schema is not orthogonal. Look up what
Chris Date has to say about this design flaw. Look up the term
attribute splitting.


4) You might have failed to tell the difference between data and
meta-data. The SQL engine has routines for that stuff and applications
do not work at that level, if you want to have any data integrity.

View limitations?

SQL Apprentice Question
I was trying to do a complex view with declares, sets that use selects, and
the main query which uses case statements. I've tried to create this view
using Enterprise manager and VS, but the result is always the same.

Once I have the whole shebang down, I run it to make sure it works, which it
does. However, when I try to save the view, I get two different errors:


VS: Incorrect syntax near the keyword declare
the view starts with the following lines:
DECLARE @roomId int, @cabinetSort tinyint
SET @roomId = ...


EM: Does not execute; I get no results, just a message box telling me one
row was affected by the query. Attempting to save, I get the error: View
definition includes no output columns or includes no items in the from clause


What am I missing here?



Celko Answers

>> What am I missing here? <<


The correct definition of a VIEW.

It is not a procedure with parameters or a code module with local
variables It is a virtual table that is defined by a single SELECT
statement, and it also allows the WITH CHECK OPTION clause.


You are still thinking that it is an executable procedural code module.
It is declarative, not procedural.

Indexing question

SQL Apprentice Question
I have a database table that stores the history of a data readings
taken from hardware devices. The hardware device is queried by software
once a minute (or more) and the value stored in the database for
trending an analysis.

The table structure is:
CREATE TABLE DeviceData (
[DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
[DeviceID] [int] NOT NULL ,
[DataValue] [decimal](18, 4) NOT NULL ,
[DataTimestamp] [datetime] NOT NULL
)


After running for a few weeks, the number of rows in the table is
getting large as expected. I started to notice the queries against this
table are taken longer to run. There is currently a primary key
(DataID) and an index on DeviceID. I later realized that since most of
the queries search for a particular date range, an index on
DataTimestamp would definitely help.


Once I added the index, query times for newly added data greatly
improved. The older data, however, still takes longer then I would
like. My question is, once I added the index, is only new data that
gets added to the table indexed? Is there any way to optimize the
queries for the older data.


Being a software developer, not a DBA, any other adivce is greatly
appeciated.


Celko Answers
What have seen, but apparently do not realize is that your IDENTITY
column is redundant and not a key at all. It is an attribute of the
hardware and has nothing to do with the data model. The data_timestamp
is the natural key and should be so delcared:

CREATE TABLE DeviceData
(reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
NOT NULL PRIMARY KEY,
device_id INTEGER NOT NULL
REFERENCES Devices(device_id),
device_reading DECIMAL (18,4) NOT NULL );


Names like :"data_value" are vague; isn't what you really have the
device readings in that column?



>> Once I added the index, query times for newly added data greatly improved. The older data, however, still takes longer then I would like. My question is, once I added the index, is only new data that gets added to the table indexed? Is there any way to optimize the queries for the older data.<<


If all the data are in the same table, they all are indexed. First
check the data to be sure that you really have unique timestamps. Drop
the redundant, exposed locator IDENTITY column. Drop your indexes.
Add a primary key constraint.

I think that you have the wrong model of the world. In SQL, there are
DRI constraints which you **must have** if you want to have data
integrity. In SQL Server the uniqueness constraints are enforced by
indexes. Other products do it in other ways (hashing, bit vectors,
etc.) These are called primary indexes. But implement a LOGICAL
concept -- a relational key.


To improve performance, you can add optional indexes (secondary
indexes) to a table. This is vendor dependent and not part of the SQL
Standard at all.


You are still thinking about a file system which does not have the
concept of relational keys and whose indexes are all of the same kind.
In a sequentail file, you locate records (which are not rows) by a
physical position number (which newbies fake with IDENTITY).


It takes at least a year of full time SQL programming, a lot of reading
and a good teacher to get the hang of it. Could be worse; could be APL
or LISP :)

Help with CASE syntax

SQL Apprentice Question
Can anyone tell me what is wrong with the following syntax?

CASE {Invoicing Transaction Amounts History:Item Number}
WHEN like 'GP%'
THEN {Invoicing Transaction Amounts History:Extended Price}
ELSE 0
END


Celko Answers
>> Can anyone tell me what is wrong with the following syntax? <<


well, you made it up :)

The CASE expression is an *expression* and not a control statement;
that is, it returns a value of one datatype. SQL-92 stole the idea and
the syntax from the ADA programming language. Here is the BNF for a
:


::= |


::=
CASE
...
[]
END


::=
CASE
...
[]
END


::= WHEN THEN


::= WHEN THEN


::= ELSE


::=


::=


::= | NULL


::=


The searched CASE expression is probably the most used version of the
expression. The WHEN ... THEN ... clauses are executed in left to
right order. The first WHEN clause that tests TRUE returns the value
given in its THEN clause. And, yes, you can nest CASE expressions
inside each other. If no explicit ELSE clause is given for the CASE
expression, then the database will insert a default ELSE NULL clause.
If you want to return a NULL in a THEN clause, then you must use a CAST
(NULL AS ) expression. I recommend always giving the ELSE
clause, so that you can change it later when you find something
explicit to return.


The is defined as a searched CASE expression
in which all the WHEN clauses are made into equality comparisons
against the . For example


CASE iso_sex_code
WHEN 0 THEN 'Unknown'
WHEN 1 THEN 'Male'
WHEN 2 THEN 'Female'
WHEN 9 THEN 'N/A'
ELSE NULL END


could also be written as:


CASE
WHEN iso_sex_code = 0 THEN 'Unknown'
WHEN iso_sex_code = 1 THEN 'Male'
WHEN iso_sex_code = 2 THEN 'Female'
WHEN iso_sex_code = 9 THEN 'N/A'
ELSE NULL END


There is a gimmick in this definition, however. The expression


CASE foo
WHEN 1 THEN 'bar'
WHEN NULL THEN 'no bar'
END


becomes


CASE WHEN foo = 1 THEN 'bar'
WHEN foo = NULL THEN 'no_bar' -- error!
ELSE NULL END


The second WHEN clause is always UNKNOWN.


The SQL-92 Standard defines other functions in terms of the CASE
expression, which makes the language a bit more compact and easier to
implement. For example, the COALESCE () function can be defined for
one or two expressions by


1) COALESCE () is equivalent to ()


2) COALESCE (, ) is equivalent to


CASE WHEN IS NOT NULL
THEN
ELSE END


then we can recursively define it for (n) expressions, where (n >= 3),
in the list by


COALESCE (, , . . ., n), as equivalent to:


CASE WHEN IS NOT NULL
THEN
ELSE COALESCE (, . . ., n)
END


Likewise, NULLIF (, ) is equivalent to:


CASE WHEN =
THEN NULL
ELSE END


It is important to be sure that you have a THEN or ELSE clause with a
datatype that the compiler can find to determine the highest datatype
for the expression.


A trick in the WHERE clause is use it for a complex predicate with
material implications.


WHERE CASE
WHEN
THEN 1
WHEN
THEN 1
...
ELSE 0 END = 1

SP for poll table

SQL Apprentice Question
This is my poll table:
CREATE TABLE [dbo].[Poll](
[Id] [int] NOT NULL,
[Statement] [nvarchar](500) COLLATE Latin1_General_CI_AI NULL,
[Answer1] [nvarchar](500) COLLATE Latin1_General_CI_AI NULL,
[Score1] [int] NULL,
[Answer2] [nvarchar](500) COLLATE Latin1_General_CI_AI NULL,
[Score2] [int] NULL
)


I have a statement with two answers. If users select answer1 the score1
value has to be updated with + 1.


The stored procedure to update the score will get a '1' (for answer1) or a
'2' (for answer2) (and Id).
The stored procedure should update the Score1 or Score2 column. But how do I
know the correct column name?
And second, I first have to select the current score of that column.


Maybe, I'm thinking the wrong way. But I don't know how to do this.


Thanks for your help,


Celko Answers
Let's fix the DDL first. All those nulls made no sense, you had no
key, etc.

CREATE TABLE Poll
(question_nbr INTEGER NOT NULL PRIMARY KEY,
question_txt NVARCHAR(500) NOT NULL,
answer1 NVARCHAR(500) NOT NULL,
score1 INTEGER DEFAULT 0 NOT NULL
CHECK (score1 >= 0),
answer2 NVARCHAR(500) NOT NULL,
score2 INTEGER DEFAULT 0 NOT NULL
CHECK (score2 >= 0));


CREATE PROCEDURE UpdatePollScores
(@my_question_nbr INTEGER, @my_answer_nbr INTEGER)
AS
UPDATE Poll
SET score1
= score1 + CASE @my_answer_nbr WHEN 1 THEN 1 ELSE 0 END,
score2
= score2 + CASE @my_answer_nbr WHEN 2 THEN 2 ELSE 0 END
WHERE question_nbr = @my_question_nbr;

Why does this query work but that one does not

SQL Apprentice Question
Why does this query work;

SELECT e1.emp_no, e1.emp_lname, e1.domicile, d1.location
FROM employee_enh e1 JOIN employee_enh e2
ON e1.domicile = e2.domicile
JOIN department d1 JOIN department d2
ON d1.location = d2.location
ON e1.dept_no = d1.dept_no
WHERE e1.emp_no <> e2.emp_no


but this one does not?;


SELECT e1.emp_no, e1.emp_lname, e1.domicile, d1.location
FROM employee_enh e1 JOIN employee_enh e2
ON e1.domicile = e2.domicile
JOIN department d1 JOIN department d2
ON e1.dept_no = d1.dept_no
ON d1.location = d2.location
WHERE e1.emp_no <> e2.emp_no


All statements contained in both queries are identical. The change
between the two is that the order of the last two ON statements are
switched.


Query Analyzer gives me the following error;
The column prefix 'e1' does not match with a table name or alias name
used in the query.


What is the rule?


Celko Answers
>> What is the rule? <<


Infixed joins are executed from left to right. The ON clause (if any)
is associated with the most recent JOIN clause. Parentheses are
executed in their order of nesting. No great surprises here -- pretty
much like other scoping rules in 3GLs

A derived table can be constructured with parens and an AS clause.
Only the alias is available to containing queries, not the contained
table names. This catches people.

Easy way to compare the contents of 2 tables ?

SQL Apprentice Question
I've got a couple tables with identical structure...

I would like to create an exception report indicating any differences in the
content of any of the columns (even in the case where one value may be null
and its counterpart a space)... Any ideas on how to build such a query ?

Celko Answers
>> I've got a couple tables with identical structure...<<

That should not happen. A table should contain all the entities of the
same kind in one and only one table.


>> I would like to create an exception report indicating any differences in the content of any of the columns (even in the case where one value may be null and its counterpart a space)... Any ideas on how to build such a query ? <<


If you have SQL-2005

(SELECT * FROM Foo
EXCEPT
SELECT * FROM Bar)
UNION
(SELECT * FROM Bar
EXCEPT
SELECT * FROM Foo)


This does not require that you know the structure of the tables, just
that they are alike.
This is called a OUTER UNION (not the same as an OUTER JOIN!) and is
defined in the SQL-92 Standards. Nobody implements it.

Wednesday, August 30, 2006

Filtering a query by date threshold

SQL Apprentice Question
The locations of vehicles are received and stored in a database as a lat/lon
value, and accompanied by the datetime timestamp of when the position was
taken. Because of the technology, sometimes vehicles will submit their
position multiple times within a minute, sometimes they are unable to report
(because of visibility) for a few minutes.

I need to write a query that only contains records of vehicle location that
are more than a minute older than the previous record. To clarify, heres
the DDL for my example:


CREATE TABLE [Locations] (
[location_id] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[vehicle_id] [int] NOT NULL ,
[timestamp] [datetime] NOT NULL ,
[latitude] [numeric](12, 9) NOT NULL ,
[longitude] [numeric](12, 9) NOT NULL ,
CONSTRAINT [PK_Locations] PRIMARY KEY CLUSTERED
(
[location_id]
) ON [PRIMARY]
) ON [PRIMARY]
GO


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:21:00', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:21:15', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:21:19', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:24:00', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:24:49', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:27:11', 1.1, 2.3)
go


My intended query for this sample data would only return the records 1, 4
and 6, because the others would be within a minute of the previous record
used. Is this possible within a query or will I need to make use of a
stored procedure for such filtering?



Celko Answers
Time is best modeled as durations, not chronons. IDENTITY cannot ever
be a relational key by definition. TIMESTAMP is a reserved word in SQL
as well as too vague. This is one of the few times I would use FLOAT
over NUMERIC(s,p) because the trig libraries are all in floating point.
I hope the vehicle id is really the VIN, so you can verify and
validate it that will be CHAR(17) with a fancy constraint.


>> I need to write a query that only contains records [sic] of vehicle location that are more than a minute older than the previous record [sic]. <<


Rows are not records and when you use the wrong mental model, you are
going to have problems. The column pairs (arrive_time, depart_time) and
(latitude, longitude) are atomic, but not scalar -- that, they make
sense only as pairs. Some products would let you create or use such
built-in data types; we have to fake it in SQL server.

You are mimicing a log in a procedural system, not the fact you want to
capture. Try this schema, with a more accurate table name:


CREATE TABLE LocationHistory
(vehicle_id INTEGER NOT NULL, -- the VIN, I hope
arrive_time DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
depart_time DATETIME,
CHECK (arrive_time <= DATEADD (mm, -5, depart_time)),
vehicle_latitude FLOAT NOT NULL,
vehicle_longitude FLOAT NOT NULL,
PRIMARY KEY (arrive_time, depart_time, latitude, longitude)
);


Now you need a procedure that will close out the prior vehicle location
(i.e. the row with the (depart_time IS NULL; use a VIEW to display
these rows as "last known location") and create a new row for the now
current vehicle location. A simply UPDATE and INSERT in one
transaction -- no fancy self-joins at all.


A constraint simply prevents you from storing data that you did not
want to have anyway. Think in non-procedural terms, not in
step-by-step "capture data, filter data" procedures.


Do not whine about the looooong primary key; without it, you would have
no data integrity at all. If it does not have to be right, the answer
is always 42 :)

Stumped on some code...

SQL Apprentice Question
ok, so I'm workin on a customer & leads part of this application. I
need to setup a way to set a territory location status.

We have two territories - LA and NY...
LA Territory - California Customers & Leads.
NY Territory - Customers & Leads in Connecticut, Maine, Massachusetts,
New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island,
Vermont.


I've got tables that store State, Area Codes and Zipcodes. I've got a
bit field setup to flag the 9 states, 33 Area Codes, and 7,564 Zipcodes
for these states.


We don't always get complete data, so we have to check all of these
variables to see what area the customer or lead belongs to.


I've tried using a UDF & passing in the State, Area, Zip of the
customer/lead and returning what territory the cust/lead belongs to.


I've tried having a case statement.


When I setup the UDF in the search query, our sql server came to a
screamin halt after the memery used by sql maxed out our server... it
jumped from only usin a few megs to just over two gigs, then our server
crashed... so I figured that UDF's are good but not for bulk-results...


I've tried usin the case statement, but it increases the query time
from 4 seconds to 30 seconds.


I've tried checking if the area/zip codes matchup using "IN" and
"EXISTS" tags...


I'm at a loss... seems like whatever I do, the queries go from quick to
crawl.
Any fresh idea's on how to grab this sorta data quickly would be much
appreciated.



Celko Answers
>> I've got tables that store State, Area Codes and Zipcodes. I've got a bit field [sic] setup to flag the 9 states, 33 Area Codes, and 7,564 Zipcodes for these states. <<


Do not program SQL as if it were a low-level language. Fields and
columns are totally different concepts. It sounds like your tables
should look more like this:

CREATE TABLE Area_Codes
(state_code CHAR(2)NOT NULL
CHECK (state_code IN ('CA', 'NY', ..),
area_code CHAR(3) NOT NULL
CHECK (area_code IN ('212', ..),
PRIMARY KEY (state_code, area_code)
);


-- Using ranges instead of bits will make the data much smaller.


CREATE TABLE ZipCodes
(state_code CHAR(2)NOT NULL
CHECK (territory_code IN ('CA', 'NY', ..),
high_zipcode CHAR(5) NOT NULL UNIQUE
CHECK (high_zipcode LIKE '[0-9][0-9][0-9][0-9][0-9]'),
low_zipcode CHAR(5) NOT NULL UNIQUE
CHECK (low_zipcode LIKE '[0-9][0-9][0-9][0-9][0-9]'),
CHECK (low_zipcode <= high_zipcode)
);


I am guessing that you do not need to match area codes to ZIP codes,
since they can overlap. There are tools for that from mailing list
software companies.


You can now compute the Territory code with a CASE expression in a VIEW
or query from the State codes.


Another point is that SQ: is meant to do JOINs and not functions and
procedures. Start thinking that way instead of in proceudral code.

LEFT JOIN... WHERE vs AND ?

SQL Apprentice Question
Ok... I give up... how do these 2 queries differ?


SELECT * FROM Table1 LEFT JOIN Table2 ON (Table1Field = Table2Field)
WHERE (OtherField<>6)
SELECT * FROM Table1 LEFT JOIN Table2 ON (Table1Field = Table2Field)
AND (OtherField<>6)


(The "WHERE" is replaced with "AND")

Celko Answers
>> Ok... I give up... how do these 2 queries differ? <<


Here is how OUTER JOINs work in SQL-92. Assume you are given:

Table1 Table2
a b a c
====== ======
1 w 1 r
2 x 2 s
3 y 3 t
4 z


and the outer join expression:


Table1
LEFT OUTER JOIN
Table2
ON Table1.a = Table2.a <== join condition
AND Table2.c = 't'; <== single table condition


We call Table1 the "preserved table" and Table2 the "unpreserved table"
in the query. What I am going to give you is a little different, but
equivalent to the ANSI/ISO standards.


1) We build the CROSS JOIN of the two tables. Scan each row in the
result set.


2) If the predicate tests TRUE for that row, then you keep it. You also
remove all rows derived from it from the CROSS JOIN


3) If the predicate tests FALSE or UNKNOWN for that row, then keep the
columns from the preserved table, convert all the columns from the
unpreserved table to NULLs and remove the duplicates.


So let us execute this by hand:


Let @ = passed the first predicate
Let * = passed the second predicate


Table1 CROSS JOIN Table2
a b a c
=========================
1 w 1 r @
1 w 2 s
1 w 3 t *
2 x 1 r
2 x 2 s @
2 x 3 t *
3 y 1 r
3 y 2 s
3 y 3 t @* <== the TRUE set
4 z 1 r
4 z 2 s
4 z 3 t *


Table1 LEFT OUTER JOIN Table2
a b a c
=========================
3 y 3 t <= only TRUE row
-----------------------
1 w NULL NULL Sets of duplicates
1 w NULL NULL
1 w NULL NULL
-----------------------
2 x NULL NULL
2 x NULL NULL
2 x NULL NULL
3 y NULL NULL <== derived from the TRUE set - Remove
3 y NULL NULL
-----------------------
4 z NULL NULL
4 z NULL NULL
4 z NULL NULL


the final results:


Table1 LEFT OUTER JOIN Table2
a b a c
=========================
1 w NULL NULL
2 x NULL NULL
3 y 3 t
4 z NULL NULL


The basic rule is that every row in the preserved table is represented
in the results in at least one result row.


There are limitations and very serious problems with the extended
equality version of an outer join used in some diseased mutant
products. Consider the two Chris Date tables


Suppliers SupParts
supno supno partno qty
========= ==============
S1 S1 P1 100
S2 S1 P2 250
S3 S2 P1 100
S2 P2 250


and let's do an extended equality outer join like this:


SELECT *
FROM Supplier, SupParts
WHERE Supplier.supno *= SupParts.supno
AND qty < 200;


If I do the outer first, I get:


Suppliers LOJ SupParts
supno supno partno qty
=======================
S1 S1 P1 100
S1 S1 P2 250
S2 S2 P1 100
S2 S2 P2 250
S3 NULL NULL NULL


Then I apply the (qty < 200) predicate and get


Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100


Doing it in the opposite order


Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100
S3 NULL NULL NULL


Sybase does it one way, Oracle does it the other and Centura (nee
Gupta) lets you pick which one -- the worst of both non-standard
worlds! In SQL-92, you have a choice and can force the order of
execution. Either do the predicates after the join ...


SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.supno = SupParts.supno
WHERE qty < 200;


... or do it in the joining:


SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.supno = SupParts.supno
AND qty < 200;


Another problem is that you cannot show the same table as preserved and
unpreserved in the extended equality version, but it is easy in SQL-92.
For example to find the students who have taken Math 101 and might
have taken Math 102:


SELECT C1.student, C1.math, C2.math
FROM (SELECT * FROM Courses WHERE math = 101) AS C1
LEFT OUTER JOIN
(SELECT * FROM Courses WHERE math = 102) AS C2
ON C1.student = C2.student;

Tricky Stored Procedure and UDF for timesheet app

SQL Apprentice Question
I am trying to develop a stored procedure and UDF which will create a
holding table to allow
the users of my application to store their timesheets for the current
week and was looking
for some ideas as to how to best put it together.

The stored procedure will need to:


1. Take as input the end day for the current week
2. Generate a new table from the cartesian
product of an employees table and a user
defined function or other means of generating
the dates encompassing the given date, going back 7 days


The table will look like this when first generated


ID Date Employee Name Job No Division
Hours Category
PRIMARY INDEX Generated from udf From employees Blank Blank
Blank Blank


The thing is that I am really struggling with the definition of the UDF
and stored procedure to make this happen.


Can anyone point me in the right direction here?



Celko Answers

>> I am trying to develop a stored procedure and UDF which will create a holding table to allow the users of my application to store their timesheets for the current week <<


Why mimic a paper form in the database? Instead of thinking in
procedural terms, think of using a calendar table (delcarative
relational programming) with a week number column in it. What is that
vague magical universal id column for? You should have a natural key
that needs enforcement. Likewise, your other data element names are
eirther reserved words or too vague.

CREATE TABLE Timesheets
(work_date DATETIME DEFAULT CURRENT_TIMESTAMP
emp_name VARCHAR(35) NOT NULL
REFERENCES Personnel (emp_name),
job_nbr INTEGER NOT NULL
REFERENCES Jobs (job_nbr),
company_division INTEGER NOT NULL,
work_hrs DECIMAL (5,2) DEFAULT 0.00 NOT NULL
CHECK (work_hrs >= 0.00),
job_category INTEGER NOT NULL, -- or is it empl_category? or what?
PRIMARY KEY (work_date, emp_name, ,job_nbr));


The job category should be in the Jobs table; personnel category should
be in the Personnel table. What is this vague category?



>> 2. Generate a new table from the cartesian product [CROSS JOIN] of Personnel table and a user defined function or other means of generating the dates encompassing the given date, going back 7 days <<


You do not go around creating tables on the fly. Especially not tables
with the same structure. That is a non-normal form redundancy and
attribute splitting. You are thinking in terms of scratch tapes,
procedural code and paper forms, and not relational tables.

Just insert the data from the paper forms into the Timesheet table.
You can then do a LEFT OUTER JOIN on the calendar table to fill in the
missing days with with zero hours and a job of "not working" in a
stored procedure or a VIEW.

Semicolon - Why?

SQL Apprentice Question
Why does a semicolon appear at the end of some T-SQL statements in 2005 BOL?

It doesn't seem necessary.


Celko Answers
The semi-colon is part of Standard SQL and always has been. In the
early days of Sybase/SQL Server T-SQL, you did not need it because of
the syntax of the language. Back then, it was just good programming
practice to make the code portable and readable. Today, it is required
in some parts of the language.

It is a statement terminator, not a separator or an introducer -- very
important, if you ever worked with HP's MPE operating system which had
sub-systems that used the semi-colon all three ways.

How to detect orphaned and correct records?

SQL Apprentice Question
I got a Parent table and one that should be a child table of it but found
the relationship missing. The fields to create an maintain it are there but
the relationship itself is missing. Since this is on a production
environment I want to make sure that the records in te table are OK and if
not will have to bring corrections. What query can I use to find the
possible orphaned records in the child table?


Celko Answers

>> I got a Parent [sic] table and one that should be a child [sic] table of it but found the relationship missing. The fields [sic] to create and maintain it are there but the relationship itself is missing. Since this is on a production environment I want to make sure that the records [sic] in the table are OK and if not will have to bring corrections. What query can I use to find the possible orphaned records [sic] in the child [sic] table? <<


The correct terms are "referenced" and "referencing" tables; you are
using pre-RDBMS terms that apply to pointer chains. Rows are not
records; fields are not columns; tables are not files. The reason you
have orphans is that some moron before you missed these basic facts
(i.e. columns can have REFERENCES and other constraints while fields do
not; tables are part of a whole schema while files are unrelated; etc.)

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. My guess, based on a total lack of any DDL, is:


--orphan killer
DELETE FROM Children
WHERE NOT EXISTS
(SELECT *
FROM Parents AS P
WHERE P.unknown_key = Children.unknown_key);


Now that the data is cleaned, *immediately* add a PK-FK constraint to
the Children.unknown_key column(s). Something like this:


ALTER TABLE Children
ADD CONSTRAINT foobar
FOREIGN KEY (unknown_key)
REFERENCES Parent( unknown_key)
ON UPDATE CASCADE
ON DELETE CASCADE


Mop the floor, but FIX THE LEAK!!

inner join using SubString

SQL Apprentice Question
On the below, I'm trying to inner join on 2 fields.

example data:
stcdl.od = 'AZDB1234'
itk.od = '1234'


UPDATE itk
SET itk.ta = stcdl.ta
FROM itk
INNER JOIN stcdl ON SubString(stcdl.od,5,4) = itk.od


Obviously, what I'm trying to do does not work.
What will work?


Celko Answers
Let's try the Standard SQL syntax instead of the unpredictable
proprietary T-SQL syntax:

UPDATE itk
SET ta
= (SELECT stcdl.ta
FROM stcdl
WHERE SUBSTRING (stcdl.od, 5, 4) = itk.od);


If the scalar subquery returns nothing, you will get a NULL assigned.
If the scalar subquery returns more than one row , you will get a
cardinality violation.

T-SQL Syntax in Stored Procedure Question

SQL Apprentice Question
I haven't used MS-SQL for many years and I'm now trying to port a
Firebird database to MS-SQL 2000. Of course the syntax for store
procedures and triggers is significantly different. I don't want to
get into a here's_the_way_we_used_to_do_it_back_home pissing match,
but here's an example of an very simple SP in Firebird/Interbase:


create procedure spAintThisCool(iCoolID BigInt)
AS
declare variable sField1 VarChar(50);
declare variable sField2 VarChar(50);
declare variable sField2 Text;
Begin
for Select field1, field2, field3
from tblCool
where (CoolID = :iCoolID)
into :sField1, :sField2, sField2
do
Begin
// do some cool stuff with the variables
End
End


The "for" key word would allow us to iterate through the table taking
the values of each row and storing them into the variables declared
using the "into" key word. We could then use the variables in what
ever calculations were needed.


How would this be done in T-SQL?


Also, one of the fields I need to read is a BLOB field with text
content which I believe is called a TEXT field in MS-SQL. Is there
anything special I need to do the read/write fields of type TEXT or do
I just treat it like a VarChar() field?


Finally, the SP examples in the Northwind database are quite simple.
Is there a better source of examples with perhaps some more complex
T-SQL SPs?



Celko Answers
A few syntax things for later:

1) T-SQL declares local variables with a prefix @ at the start of a
BEGIN-END block. No VARIABLE keyword is used.


2) T-SQL uses SELECT ..INTO to load a table, not as a singleton SELECT
as per Standard SQL.


3) Quit putting datatype prefixes on data element names and "sp-" on
procedures. Nothing to do with T-SQL versus some other 4GL; this is
straight ISO-11179 standards and good programming. People will think
you are an OO hillbilly :)


4) There is no FOR loop. You can do it with **uggggh** if you want to
slow things down by at least an order of magnitude.


5) Comments are C-style /* */ pairs or ANSI -- at the start of a line.


6) TEXT is a common name, but everyone has a different version in their
proprietary code; avoid it unless you **really* need it.


7) T-SQL is a very simple one-pass compiler that does not optimize like
other 4GLs. Keep your code as short and simple as you can. The old
"rule of thumb" is no more than one page (50 lines) per procedure.


8) The first execution determines the plan used. Recompiles can be
important and so can modularization. Kimberly Tripp has a good
presentation on SP in T-SQL; see if you can find it.


9) BIGINT is not a cool identifier. No validation. No verification.
Surely you look for industry standard first and when you have to design
an identifier yourself (< 15% of the time), you use a check digit, an
appropriate encoding, etc. Yeah, yeah, I know it is pseudo-code, but
it is (unfortunately) worth mentioning.


10) If you showed us the "cool stuff" someone here can probably write
it with a single SQL statement instead of a cursor.


10) Trapping errors in T-SQL is done with a global flag that has to be
caught then processed. Look up some examples. It is not the SQL/PSM
or the try-catch model.


Good luck on your conversion!

Drop Table using a variable

SQL Apprentice Question
Is there a function that can take an @Variable that will drop a table? I
can't seem to be able to get Drop Table @Variable to work because Drop Table
doesn't want a string, it wants the plain table name.

Celko Answers
>> Is there a function that can take an @Variable that will drop a table? I can't seem to be able to get Drop Table @Variable to work because Drop Table doesn't want a string, it wants the plain table name. <<


Don't write code like this, nor use that mental model for RDBMS. A
table models a set of entities or relationships. Having tables appear
and disappear in an application is like a magic world in which
elephants drop from the sky or mountains dissolve in an instant.

Only the DBA should be creating or dropping tables and then only
**rarely** and with lots of documentation. The documentation is
important because the entire data model has just changed.


I'll bet what you have done is write a non-RDBMS in SQL that mimics a
magnetic tape file system, so you have a "create table on the fly"
routine in dynamic SQL (= "hang scratch tape") and now you want to drop
them (= "dismount scratch tape").


Wrong mental model of SQL programming!


The ANSI model for temporary table is that they are persisted like any
other base table, but they have two extra options: (1) They can clean
themselves at certain points in the session (2) they are either LOCAL
to a session or GLOBAL to everyone. This is very different from the
Sybase/SQL Server model, which is based on scratch tapes. YOu can
mimic this behavior easily in SQL Server.

Views within views

SQL Apprentice Question
We have a large database that gets queried often for reporting purposes. For
data access, we are limiting the users to views only, no direct table access.
The view definitions can be quite complex, incorporating as many as 20
tables or more. We have been limiting the view definitions to contain table
joins only, no views, so that the indexes are being utilized.

Some of the developers would like to create views of the common
multiple-table joins to then be re-used within other views.


I was under the impression that using views that bring back thousands of
rows of data within other views leads to performance problems due to the
"table scan" of the subview for the join of that information to the table in
the outer view.


Does anyone have any thoughts?


By the way, yes, we are looking at creating tables with this information for
reporting use in the long-term but we need to know the thoughts on the above
issue in the short-term.


Celko Answers
>> I was under the impression that using views that bring back thousands of rows of data within other views leads to performance problems due to the "table scan" of the subview for the join of that information to the table in the outer view. <<


Not true. The text of the VIEWs are often copied into the parse tree
and then the whole thing is compiled in the usual manner. The
optimizer can re-arrange the query and is not obligated to execute it
in a particular order. That is what happens with procedural language
and not with declarative ones like SQL

In some products, common VIEWs will be materialized and shared among
sessions, increasing overall thruput. This can be at the expense of
one session, but usually the extra main storage makes up for it and all
sessions run faster.

User-defined Select list

SQL Apprentice Question
I have a stored procedure that selects from a table, the details of the
query or table aren't that important (it's quite simple). Right now,
the columns it selects are specified explicitly. I'd like for the user
to be able to specify which columns they want to select (There are 2 or
3 columns that must always be selected, but the user won't have the
option to deselect them).

What I'd like to do is something like this:


"CREATE PROCEDURE testProcedure
@selectList
AS
SELECT @selectList FROM SomeTable....
...
GO


--calling it
EXEC testProcedure @selectList='SomeTableID, Column1, Column2'"


...Where @selectList is a list of whatever columns the user wants to
see, and passed to the stored procedure from the program. I don't know
if this is possible, or how to even start searching for a solution.



Celko Answers
>> I'd like for the user to be able to specify which columns they want to select (There are 2 or 3 columns that must always be selected, but the user won't have the option to deselect them). <<


Why not filter it in the front end, where display work is supposed to
be done?



SQL Apprentice Question
> >> I'd like for the user to be able to specify which columns they want to select (There are 2 or 3 columns that must always be selected, but the user won't have the option to deselect them). <<

> Why not filter it in the front end, where display work is supposed to
> be done?



I started off with something like that

The table in question has just over 80 columns (depending on how the
next meeting with the client goes, this could increase, on their whim).
The way they want the interface to work, it could retreive between a
couple of hundred and a couple of thousand rows from this table,
depending on what options they select. I found that when they ran their
largest searches (got back a few thousand rows, and by they way I'm
working with a small test set of data, not the FULL set), things just
took too long because it was getting back data for all 80 columns
(well, about 60 at the time, it's grown).


Earlier, I tried to create a number of views: A view for the front
desk, a view for the accountant, a view for the manager, a view for the
recruite, etc... so that someone logs in as accountant and only sees
the data that they NEED to see. The client was insistent that all users
be able to choose from any to all (though I think "all" would be pretty
rare) columns (I think part of the problem is the way the run their
business - everyone can do everyone else's job and often DOES - but
there's not much I can do about that ;) ).


Someone at work suggested a FlexGrid control. For the time to implement
it and get it working, it didn't seem worth the effort.


Basically, any user must be able to select any columns from the large
table. I decided to go with this dynamic SQL because it's simple and it
seems to run nice and quickly.
I am by no measure an SQL expert, so if you have another suggestion for
this sort of problem, I'm open to it.



Celko Answers

>> The table in question has just over 80 columns (depending on how the next meeting with the client goes, this could increase, on their whim). <<


The only tables I have seen with that kind of width are for medical
research where they conduct 100's of tests on each subject. And whim
is a poor way to design a system.


>> Earlier, I tried to create a number of views: A view for the front desk, a view for the accountant, a view for the manager, a view for the recruiter, etc... so that someone logs in as accountant and only sees the data that they NEED to see. <<


Without any further information, I would guess that these should be
tables in their own right. People with a spreadsheet background tend
to think this way.


>> The client was insistent that all users be able to choose from any to all (though I think "all" would be pretty rare) columns (I think part of the problem is the way the run their business - everyone can do everyone else's job and often DOES - but there's not much I can do about that ;) ). <<


Does his auditor know about this practice? What you can do is try to
get some legal protection on yourself when the audits come. Get them
to sign off on the design and note that you gave them constructive
knowledge.

Decades ago, I was volunteered by my wife to work on a UNIX/SQL box at
a New Age/Hippie Co-op. The guy that set up their inventory system was
the worst programmer I had ever met. He was so proud that "It's an SQL
database and anyone can change anything any way they want!" So anyone
DID change things; actuially, it was more like EVERYONE changed things.
When they went out of business in a few months, the inventory was a
total mess. They showed 5000 cases of hot sauce from a commune in
California that had never made that much product, etc.

User defined functions and booleans?

SQL Apprentice Question
For some strange reason, I can't find any information on this... Can a UDF
return a boolean? I know there isn't a boolean type, but there sorta is...
The reason I ask is because it would make my UDFs more graceful, as in:

UDF GetSecurity:
Declare @SecID int
Set @SecID = Select SecId from Security where UserID = @user and blocked = 1
return @SecID


Use:
Select SecureStuff from SecuredTable where not exists (GetSecurity(@userId))


It would be much clearer to do:


UDF Blocked:
RETURN EXISTS(select SecId from Security where UserID= @user and blocked = 1)


Use:
Select SecureStuff from SecuredTable where not Blocked(@userId)


If this is possible, then what do I set the return type in the UDF as?
Boolean obviously isn't a valid choice, nor is bit...



Celko Answers
Now, we are getting to the real problem. You do not want to give up
programming in a procedural language style, with flags, subroutines,
loops, materialized local variables, temp tables to mimic scratch tapes
and all that jazz.


SELECT securestuff
FROM SecuredTable
WHERE EXISTS
(SELECT sec_id
FROM Security
WHERE user_id = @my_user
AND blocked <> 1);


SQL is declarative and has totally different style. Spreadsheets are
the only other declarative language most people use. You might was
well be speaking English with Japanese grammar.


Since you did not bother to post DDL, I am going to guess that
"blocked" is a bit flag with an obvious meaning. This is just like
we wrote in assembly language and very low level procedural code. You
will see a lot of newbies still using the "is_" naming
conventions from C.


In SQL the data would provide us with that information. For example,
we might record the date that a user_id was or will be retired and
compare it to the CURRENT_TIMESTAMP. When you use Boolean flags in a
schema, you must have CHECK() constraints to assure that the flag is
set properly - for example, that users have their blocked_flag = 1
when (expire_date < CURRENT_TIMESTAMP).


But wait a minute! The blocked_flag is redundant and the whole goal of
RDBMS is to get rid of data redundancies. This is one reason we do not
have Booleans; the other reason have to do with NULLs and 3VL.


I tell people it takes at least a year of full-time SQL coding to have
an epiphany and to throw out the old thought patterns.

Monday, August 21, 2006

Dynamic "ORDER BY" based on data in database

SQL Apprentice Question
I'm dealing with a legacy system (yecch) that I'm trying to search. The data
I'm searching is a flattened representation of a tree. My problem is that my
search results aren't being displayed in the correct order.
Sorting order is controlled by settings stored in the database in two
different locations for each level of the tree. If a particular level's
search value is 0, sorting is controlled by a value in the root. Otherwise,
the value in the node is used. The search value is an integer from 1 to 12,
each number identifying the column or columns that the data will be sorted
against.

In a perfect world, I could have a query that looked something like:


declare @bSortBy varchar(20) -- holds the text to ORDER BY
switch (bOrder == 0 ? aOrder : bOrder) -- use b's sorting value only if not
zero
case 1:
select @bSortBy = "b.name" -- sort by node name
break
case 2:
select @bSortBy = "b.dateCreated" -- sort by node creation date
break
-- do the same for tables c and d
SELECT aId, bId, cId, dId WHERE foo = bar ORDER BY @bSortBy, @cSortBy,
@dSortBy


I guess the big problem I have is ordering by different fields depending on
the data I get from my search hits. Is there any way to do this in Sql???
Any pointers would be appreciated.



Celko Answers
>> The data I'm searching is a flattened representation of a tree. <<


Get a copy of TREES & HIERARCHIES IN SQL for some other ways to model
this kind of data. That is a better long-term answer.

Standard SQL-92 does not allow you to use a function or expression in
an ORDER BY clause. The ORDER BY clause is part of a cursor and it can
only see the column names that appear in the SELECT clause list that
was used to build the result set. BP will now chime in that SQL-99
(officially called "a standard in progress" and not recognized by the
U.S. Government for actual use) does allow this.


But aside from this, there is the good programming practice of showing
the fields that are used for the sort to the user, usually on the left
side of each line since we read left to right.


The standard trick for picking a sorting order at run time is to use a
flag in CASE expression. If you want to sort on more than one column
and allow all possible combinations of sorting use one CASE per column:


SELECT
CASE @flag_1
WHEN 'a' THEN CAST (a AS CHAR(n))
WHEN 'b' THEN CAST (b AS CHAR(n))
WHEN 'c' THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1,
CASE @flag_2
WHEN 'x' THEN CAST (x AS CHAR(n))
WHEN 'y' THEN CAST (y AS CHAR(n))
WHEN 'z' THEN CAST (z AS CHAR(n))
ELSE NULL END AS sort_2,
...
CASE @flag_n
WHEN 'n1' THEN CAST (n1 AS CHAR(n))
WHEN 'n2' THEN CAST (n2 AS CHAR(n))
WHEN 'n3' THEN CAST (n3 AS CHAR(n))
ELSE NULL END AS sort_2,


FROM Foobar
WHERE ...
ORDER BY sort_1, sort_2, ...


More than one sort column and only a limited set of combinations then
use concatenation.


CASE @flag_1
WHEN 'ab'
THEN CAST(a AS CHAR(n)) ||' ' || CAST(b AS CHAR(n))
WHEN 'ba'
THEN CAST(b AS CHAR(n)) ||' ' || CAST(a AS CHAR(n))
ELSE NULL END AS sort_1,


If you need ASC and DESC options, then use a combination of CASE and
ORDER BY


CASE @flag_1
WHEN @flag_1 = 'a' AND @flag_1_ad = 'ASC'
THEN CAST (a AS CHAR(n))
WHEN @flag_1 = 'b' AND @flag_1_ad = 'ASC'
THEN CAST (b AS CHAR(n))
WHEN @flag_1 = 'c' AND @flag_1_ad = 'ASC'
THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1_a,
CASE @flag_1
WHEN @flag_1 = 'a' AND @flag_1_ad = 'DESC'
THEN CAST (a AS CHAR(n))
WHEN @flag_1 = 'b' AND @flag_1_ad = 'DESC'
THEN CAST (b AS CHAR(n))
WHEN @flag_1 = 'c' AND @flag_1_ad = 'DESC'
THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1_d


.. ORDER BY sort_1_a ASC, sort_1_d DESC


I have shown explicit CAST( AS CHAR(n)), but if the datatypes of
the THEN clause expressions were already the same, there would be no
reason to force the conversions.


You change the ELSE NULL clause to any constant of the appropriate
datatype, but it should be something useful to the reader.


A neater way of doing this is to use one column for each sorting option
so you do not have worry about CAST() operations.


SELECT ...
CASE WHEN @flag = 'a' THEN a ELSE NULL END AS sort1,
CASE WHEN @flag = 'b' THEN b ELSE NULL END AS sort2,
CASE WHEN @flag = 'c' THEN c ELSE NULL END AS sort3
FROM Foobar
WHERE ...
ORDER BY sort1, sort2, sort3;

Help with reconciling data in two tables

SQL Apprentice Question
I have two tables - PartsShipped and PartsConsumed. Each table has
three columns - Processdate (a date/time stamp), PartNumber, and
SerialNumber.

I need a simple query that will show me the 3 bits of info for all
serial numbers that exist in the PartsShipped table, but not in the
PartsConsumed table.


I know that this shouldn't be too hard, but I cannot for the life of me
get it to work.

Celko Answers

>> Each table has three columns - Processdate (a date/time stamp), PartNumber, and SerialNumber. <<


Without a quantity? A little strange. Please post DDL, so that people
do not have to guess what the keys, constraints, Declarative
Referential Integrity, data types, etc. in your schema are. Sample data
is also a good idea, along with clear specifications.


>> I need a simple query that will show me the 3 bits of info for all serial numbers that exist in the PartsShipped table, but not in the PartsConsumed table. <<


SELECT process_date, part_nbr, serial_nbr
FROM PartsShipped AS S
WHERE NOT EXISTS
(SELECT *
FROM PartsConsumed AS C
WHERE C.serial_nbr = S.serial_nbr);

Ifyou have SQL-2005, you can also use the EXCEPT operator.


SELECT * FROM PartsShipped
EXCEPT
SELECT * FROM PartsConsumed;

Wednesday, August 16, 2006

Stored procedure returning twice the result

SQL Apprentice Question
I have the following procedure returning twice the result, doubling it,
whatever. I think it's the way I'm using the join, but I have no idea
how to solve my problem. I commented the last select/join as that was
doubling the result too.


Thanks for your patience.


BTW, this is a repost, I changed the title as the other one was not
meaningful.


Here's the code :


=============================
CREATE procedure ent_tasks_per_user_company (
@companyName as varchar(50),
@resourceName as varchar(50)
)
AS


SELECT
tasks.WPROJ_ID as WPROJ_ID, tasks.ENT_ProjectUniqueID as
ProjectUniqueID, tasks.ENT_TaskUniqueID as TaskUniqueID,
tasks.TaskEnterpriseOutlineCode1ID as TaskEnterpriseOutlineCode1ID,
codes.OC_NAME as OC_NAME, codes.OC_DESCRIPTION as OC_DESCRIPTION,
codes.OC_CACHED_FULL_NAME as OC_CACHED_FULL_NAME,
taskStd.TaskName as TaskName, taskStd.TaskResourceNames as
TaskResourceNames, taskStd.TaskPercentComplete as TaskPercentComplete


INTO #myTemp


FROM MSP_VIEW_PROJ_TASKS_ENT as tasks


INNER JOIN MSP_OUTLINE_CODES as codes
ON
(
codes.CODE_UID = tasks.TaskEnterpriseOutlineCode1ID
AND
codes.OC_CACHED_FULL_NAME LIKE @companyName + '.%'
)


INNER JOIN MSP_VIEW_PROJ_TASKS_STD as taskStd
ON
(
taskStd.WPROJ_ID = tasks.WPROJ_ID
AND
taskStd.TaskUniqueID = tasks.ENT_TaskUniqueID
--AND
--taskStd.TaskResourceNames LIKE '%' + @resourceName + '%'
)


WHERE (tasks.TaskEnterpriseOutlineCode1ID <> -1)


/*SELECT #myTemp.*, taskCode.OC_NAME as Department FROM #myTemp


INNER JOIN MSP_OUTLINE_CODES taskCode
ON
(
taskCode.CODE_UID = #myTemp.TaskEnterpriseOutlineCode1ID
)*/


SELECT * FROM #myTemp WHERE #myTemp.TaskResourceNames LIKE '%' +
@resourceName + '%'


==============


For a particular employee, I should have only 1 result, but I got 2...
and its the exact same thing.



Celko Answers
Why did you use VARCHAR(50)? Can you give an example you found in your
research of a company name that long? Why did you use text at all
instead of a DUNS or other industry standard?


Can you give an example of a non-unique id, since by definition an
identifier has to be unique? Why did you alias columns back to their
original names? All I can think of is that you like redundancy.


Why do you have double underscores in a name? That is a horrible
practice in any language. It invites typos, it is hard to see on a
laser printer or screen, etc.


Why did you load a temp table instead of using a query expression as
the body of the procedure? I know that this is to mimic a scratch tape
in a 1950's file system in which you had to materialize all the data.
Welcome to SQL and virtual data.


Why did you violate ISO-11179 and use VIEW in data element names?
Actually, you have some other violations, too.


There is no such thing as a code_id; a data element can be one or the
other, but never both by definition. Can I assume that "_code1" is
a meaningful attribute qualifier in your industry? Or is it a bad name
from a repeating group in the table?


I also see that you have a data element called "taskresourcenames",
but since it is a column, it cannot be plural. They model scalar
values, etc.


Surely you did not violate 1NF and cram a list into a column!! But who
knows, since you never posted DDL or sample data. That could be the
cause of duplication.


Finally, the last LIKE predicate is going to be a bit slow. Here is a
clean up on the code, taking it from OO and file system thinking into
SQL style.


CREATE PROCEDURE Ent_T_Per_User_Company
(@my_company_name AS VARCHAR(50), -- careful research??
@my_resource_name AS VARCHAR(50))
AS
SELECT T.wproj_id, T.ent_project_id,
T.ent_task_id,
T.task_enterprise_outline_code1,
C.oc_name, C.oc_description,
C.oc_cached_full_name,
TS.task_name, TS.task_resource_name,
TS.task_percent_complete
FROM MSP_Proj_T_Ent AS T,
MSP_Outline_Codes AS C,
MSP_Proj_T_Std AS TS
WHERE C.code = T.task_enterprise_outline_code1
AND C.oc_cached_full_name
LIKE @my_company_name + '.%'
AND TS.wproj_id = T.wproj_id
AND TS.task_id = T.ent_task_id
AND T.task_enterprise_outline_code1 <> -1)
AND TS.task_resource_name
LIKE '%' +@my_resource_name +'%';


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

Monday, August 14, 2006

SQL Challenge Urgent, Please help

SQL Apprentice Question
Hello Expert,
Here I am asking your help.
I have a table with following data:
TaskID ParentTaskID TaskName ProjectName
1 1 BA Rail
2 22 FA Financial
3 1 BA.1 Rail
4 1 BA.2 Rail
5 22 FA.1 Financial
6 22 FA.2 Financial

Now I want the following format:


ID ParentID Name
1 1 Rail
2 1 BA
3 2 BA.1
4 2 BA.2
5 22 Financial
6 5 FA.1
7 5 FA.2


I need to create following hierarchy if I could tranform the data the
above way in the Project Dimension:
Rail
--BA
----BA.1
----BA.2
Financial
--FA
----FA.1
----FA.2
Please help and thanks in advance


Celko Answers
Get a copy of TREES & HIERARCHIES IN SQL for several different ways to
model this kind of problem.

Why are parentheses () so important?

SQL Apprentice Question
Having finally getting a query to run, I discovered it was down to some ()
around an AND.


This is the particular line:


([ExpireDate] is null or [ExpireDate] > @TheDateToday)


Without the (), the query will not run. Why is that please?


Please not, that the above is 1 of about 8 AND's in a query that joins about
6 tables.


Thanks,


Celko Answers
>> Without the (), the query will not run. Why is that please? <<


Basic Boolean algebra. The order of execution is NOT, AND then OR.
The parens force the OR to execute within them and produce a result
that is passed to the rest of the search condition. If your parameter
name was meaningful, then you might want to use this code instead:

(expire_date IS NULL OR expire_date > CURRENT_TIMESTAMP)

Thursday, August 03, 2006

I am new, trying to create a complex view.

SQL Apprentice Question
Here is the current code. It is actually currently working:
SELECT TOP 100 PERCENT dbo.Placements.JobId, dbo.Placements.Branch,
dbo.Placements.PeopleName, dbo.Placements.Status,
CASE WHEN dbo.Place_FallOff.date IS NOT NULL THEN
dbo.Place_FallOff.date ELSE dbo.Placements.PlacementDate END AS 'Placement
Date',
CASE WHEN dbo.Place_FallOff.date IS NOT NULL THEN
YEAR(dbo.Place_FallOff.date) ELSE YEAR(dbo.Placements.PlacementDate) END AS
[Year],
CASE WHEN dbo.Place_FallOff.date IS NOT NULL THEN
MONTH(dbo.Place_FallOff.date) ELSE MONTH(dbo.Placements.PlacementDate)
END AS [Month], dbo.Placements.Reqtype,
dbo.Placements.FeePercentage, dbo.Placements.CandidatePC,
dbo.Placements.SCommission2,
dbo.Placements.jobreqpc, dbo.Placements.SCommission1,
dbo.Placements.JobFillPC, dbo.Placements.Commission1, dbo.Placements.Other,
dbo.Placements.Commission2, dbo.Invoice.Amount,
dbo.Placements.Division, dbo.Placements.CompanyName, CASE WHEN feeinfo =
'flat' AND
dbo.Place_FallOff.date IS NULL THEN feepercentage WHEN
feeinfo = 'percentage' AND dbo.Place_FallOff.date IS NULL
THEN feepercentage / 100 * Salary WHEN feeinfo =
'flat' AND dbo.Place_FallOff.date IS NOT NULL
THEN - 1 * feepercentage WHEN feeinfo = 'percentage'
AND dbo.Place_FallOff.date IS NOT NULL
THEN - 1 * feepercentage / 100 * Salary END AS Fee
FROM dbo.Placements INNER JOIN
dbo.Invoice ON dbo.Placements.PlacementCode =
dbo.Invoice.placementcode FULL OUTER JOIN
dbo.Place_FallOff ON dbo.Placements.PlacementCode =
dbo.Place_FallOff.Placementcode
WHERE (MONTH(dbo.Placements.PlacementDate) = MONTH(GETDATE()) OR
MONTH(dbo.Place_FallOff.[date]) = MONTH(GETDATE()))
AND (YEAR(dbo.Placements.PlacementDate) = YEAR(GETDATE()) OR
YEAR(dbo.Place_FallOff.[date]) = YEAR(GETDATE())) OR
(MONTH(dbo.Placements.PlacementDate) =
MONTH(GETDATE()) - 1) OR
(MONTH(dbo.Placements.PlacementDate) =
MONTH(GETDATE()) - 2)
ORDER BY dbo.Placements.Branch

We have a fall off table that captures the date of the fall off as well as
the placement code and a note as to why the candidate fell off.
I refer to the fall off in the statement above, but get slightly skewed data
(curently reflect the fall off date as the date and the positive fee in one
column and a negative fee in another column). I am looking for a way to
show the original placement (with orignal fee and orginal date) in one row
and then a second row that shows the fall off date and a negative fee
(fee*-1). Any ideas as how to insert this infomation into the view?


Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

Just a few guesses at some problems


DATE is a reserved word, as well as useless name that begs the question
"of what?"


SQL can handle unary plus and minus, so that "-1 *" is not needed.


Look up the COALESCE() function to replace your CASE expressions, such
as:
COALESCE (F.falloff_date, P.placement_date) placement_date,


Get rid of the SELECT 100 PERCENT .. think about it for two seconds.
Likewise, putting an ORDER BY on a VIEW also useless. The VIEW is a
table and tables have no ordering by definition.


Learn to use temporal data types and function instead of splitting up
dates into parts, COBOL style.


Why are you doing a FULL OUTER JOIN between placements and fall-offs?
How can someone fall off without being a placement? Also isn't falling
off a status change (i.e attribute) and not a different kind of entity
which requires a new table?

Towel tossed in on use of BETWEEN

SQL Apprentice Question
After extensive testing, with various sizes of tables (the report below concerns a table of 10 million rows), I have concluded that my earlier assertion that using [BETWEEN] seemed to be more efficient than using [ >= AND <= ] was not correct -as many of you gently countered. There were other confounding factors that I had not properly considered.


In fact, using [BETWEEN] incurs a slight performance penalty, most likely in the creation of the execution plan, where [BETWEEN] is converted into [ >= AND <= ].


It is worthy to note that there was no difference found between a properly constructed [ >= AND < ] and [ >= AND <= ].


Since the query was covered with the index, there was no difference between the use of an 'regular' Index or a Clustered Index. (I will redo that test because I want to know the effect of a non-covering index vs. a Clustered Index.


The data below is a representative indication of my results. (The code is also included if someone has use of it for any similar testing.)


The query was crafted so that it would return approximately 2% of the table.
Each Query was verified to return the same number of rows -obstensibly, the same data.
The datetime field was designed to be 'scattered' time values aver a date range of approximately 6.5 years.
The 'unordered' table was loaded to cause significant 'disruption' to table scans.


Table: 10 M Rows
Criteria [ BETWEEN ] [ >= AND < ] [ >= AND <= ]
Query Rows Returned 218,898 218,898 218,898


No Primary key, No Index, Disordered table
Execution Time (ms) 1360 1326 1330


Primary key on [ID], Index on [Testdate]
Execution Time (ms) 60 46 46


Clustered Index on [Testdate]
Execution Time (ms) 60 46 46


/*
Demonstration of Query speed effect from
Using [BETWEEN] vs. [>= AND <] vs. [>= AND <=]


Kudos to the many who challenge and inspire.


The use of @RangeLimit in the WHILE block below serves to limit the
Range of created dates to approx. 6.5 years.
Alter the Primary key and Indexing as desired.


*/


CREATE TABLE #Test
( ID int --PRIMARY KEY
, TestDate datetime
)


--CREATE INDEX Test_datetime_idx ON #Test ( TestDate )
--CREATE clustered INDEX dt_index ON #Test ( TestDate )


DECLARE
@iCounter int
, @RangeLimit int
, @StartDataLoad datetime
, @StopDataLoad datetime
, @TestStart1 datetime
, @TestStop1 datetime
, @TestStart2 datetime
, @TestStop2 datetime
, @TestStart3 datetime
, @TestStop3 datetime
, @DataRowsLoaded int
, @RowsTest1 int
, @RowsTest2 int
, @RowsTest3 int


SELECT
@iCounter = 0
, @RangeLimit = 1


SET NOCOUNT ON


SET @StartDataLoad = getdate()


WHILE @iCounter < 10000000


BEGIN


INSERT INTO #Test ( ID, TestDate )
VALUES ( @iCounter, '01/01/2000'+ cast( rand( @iCounter ) * @RangeLimit AS datetime ))


SELECT
@iCounter = ( @iCounter + 1 )
, @RangeLimit = ( @RangeLimit + 1 )


If ( @iCounter % 2400 ) = 0
SET @RangeLimit = 1


END


SELECT @DataRowsLoaded = count(1)
FROM #Test


SET @StopDataLoad = getdate()


DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE


SET @TestStart1 = getdate()


SELECT @RowsTest1 = count(1)
FROM #Test
WHERE TestDate BETWEEN '03/01/2001' AND '03/31/2001'


SET @TestStop1 = getdate()


DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE


SET @TestStart2 = getdate()


SELECT @RowsTest2 = count(1)
FROM #Test
WHERE TestDate >= '03/01/2001' AND TestDate < '04/01/2001'


SET @TestStop2 = getdate()


DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE


SET @TestStart3 = getdate()


SELECT @RowsTest3 = count(1)
FROM #Test
WHERE TestDate >= '03/01/2001' AND TestDate <= '03/31/2001 23:59:59.998'


SET @TestStop3 = getdate()


DROP TABLE #Test


PRINT ( '' )
PRINT ( 'Loading Data...' )
PRINT ( ' Start at: ' + convert( varchar(30), @StartDataLoad, 113 ))
PRINT ( ' Stop at: ' + convert( varchar(30), @StopDataLoad, 113 ))
PRINT ( ' Load Time (ms): ' + cast( datediff( ms, @StartDataLoad, @StopDataLoad ) AS varchar(10) ))
PRINT ( ' Rows in Table: ' + cast( @DataRowsLoaded as varchar(10) ))
PRINT ( '' )
PRINT ( 'INSERT [ BETWEEN ]' )
PRINT ( ' Start at: ' + convert( varchar(30), @TestStart1, 113 ))
PRINT ( ' Stop at: ' + convert( varchar(30), @TestStop1, 113 ))
PRINT ( ' Difference (ms): ' + cast( datediff( ms, @TestStart1, @TestStop1 ) AS varchar(10) ))
PRINT ( ' Rows in Query: ' + cast( @RowsTest2 as varchar(10) ))
PRINT ( '' )
PRINT ( 'INSERT [ >= AND < ]' )
PRINT ( ' Start at: ' + convert( varchar(30), @TestStart2, 113 ))
PRINT ( ' Stop at: ' + convert( varchar(30), @TestStop2, 113 ))
PRINT ( ' Difference (ms): ' + cast( datediff( ms, @TestStart2, @TestStop2 ) AS varchar(10) ))
PRINT ( ' Rows in Query: ' + cast( @RowsTest2 as varchar(10) ))
PRINT ( '' )
PRINT ( 'INSERT [ >= AND <= ]' )
PRINT ( ' Start at: ' + convert( varchar(30), @TestStart3, 113 ))
PRINT ( ' Stop at: ' + convert( varchar(30), @TestStop3, 113 ))
PRINT ( ' Difference (ms): ' + cast( datediff( ms, @TestStart3, @TestStop3 ) AS varchar(10) ))
PRINT ( ' Rows in Query: ' + cast( @RowsTest3 as varchar(10) ))
PRINT ( '' )


Celko Answers
>> After extensive testing, with various sizes of tables (the report below concerns a table of 10 million rows), I have concluded that my earlier assertion that using [BETWEEN] seemed to be more efficient than using [ >= AND <= ] was not correct -as many of you gently countered. <<


In theory it should be the same. However, the BETWEEN is easier to
maintain and read. Since 80% of the total cost is in maintaining code,
I would not give it up for a small gain (which I doubt is really
there).

It also had a mild advantage in products with better optimizers.
Remember B-Tree indexes? They use a BETWEEN logic to traverse the tree
structure, so they try to group (>= and <=) pairs into a jump to a node
in the index tree. if the BETWEEN is already there, so much the
better. The optimizer will assume that the programmer knows something
and will favor BETWEEN when it has a choice.