Joe Celko The SQL Apprentice: April 2007

Friday, April 27, 2007

joining on same table multiple times on different values

SQL Apprentice Question
Having perfomance problems with this query (returning to many rows and
takening to long), related I think
to joining on the DomainDetail table 3 times.
Is there a better way to do this?

SELECT SO.SystemOrderID AS Order_ID,
SO.SystemOrderTypeID AS Order_TypeID,
SO.OrderStatusID AS Order_StatusID,
SO.CreateDate AS Order_CreateDate,
DD.[Name] AS Detail_Status
DD1.Name AS Detail_Type
DD2.Name AS Order_Status
FROM MyDatabase3.dbo..SystemOrder SO
INNER JOIN MyDatabase2.dbo..SystemOrderDetail SOD
ON SOD.SystemOrderID = SO.SystemOrderID
INNER JOIN MyDatabase1.dbo.DomainDetail DD
ON SOD.DetailStatusID = DD.DomainValue
INNER JOIN MyDatabase1.dbo.DomainDetail DD1
ON SOD.DetailTypeID = DD1.DomainValue
INNER JOIN MyDatabase1.dbo.DomainDetail DD2
ON SO.OrderStatusID = DD2.DomainValue
WHERE SO.CreateDate > dateadd(month,-13,getdate())AND
SOD.CreateDate > dateadd(month,-13,getdate())AND
DD.DomainID IN(125,126,127)

thanks

Celko Answers
I first re-wrote your query to bring the names closer to ISO-11179,
remove the display headers in the SELECT list and get rid of redundant
predicates like "SOD.system_order_id = SO.system_order_id" (that
happens a lot with the infixed syntax ). Untested I got this.

SELECT SO.system_order_id, SO.system_order_type,
SO.order_status, SO.create_date,
DD.name AS detail_status
DD1.name AS detail_type
DD2.name AS order_status
FROM Mydatabase3.DBO..System_Order AS SO,
Mydatabase2.DBO..System_Orderdetail AS SOD,
Mydatabase1.DBO.Domaindetail AS DD,
Mydatabase1.DBO.Domaindetail AS DD1,
Mydatabase1.DBO.Domaindetail AS DD2
WHERE SOD.detail_status = DD.domain_value
AND SOD.detail_type = DD1.domain_value
AND SO.order_status = DD2.domain_value
AND SOD.system_order_id = SO.system_order_id
AND SO.create_date > DATEADD(MONTH, -13, CURRENT_TIMESTAMP)
AND SOD.create_date > DATEADD(MONTH, -13, CURRENT_TIMESTAMP)
AND DD.domain_id IN (125, 126, 127);

What you posted scares me. First of all, there is no such thing as a
"type_id" in a valid data model-- an attribute can be an identifier or
a type, but NEVER both at once. Ditto for "status_id" data element.

Is your order information spread across three databases (mydatabase3,
mydatabase2, mydatabase1)? That just does not work; a data model
needs to be in one place. I am still trying to make sense of the
"system-" prefix -- what is a "System_orders" entity and why is it
logically different from a plain "Orders" table?

There is a magical column with the uselessly vague name "name" that is
equivalent to "detail_status", "detail_type" and "order_status". But
a type and a status are very different kinds of attributes, just as
orders and details are very different kinds of entities. Ditto for
the equally magical "domain_value" data element. We need to know
"name of WHAT!?" to make sense of this.

I am going to make a guess that this schema:

1) Mixes data and metadata. The choice of names implies this strongly

2) Has non-scalar columns. We know that already from your magical
changing columns. Do you find the meaning of the magical domain_value
from the domain_id?

3) Has split entities across tables, which is probably why you have
multiple databases.

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

Sunday, April 22, 2007

dynamically change column name when displaying

SQL Apprentice Question
have a table, after i query the table for displaying, several column names
will be changed based on the condition, but I don't want change the acutal
table column names, it will only be changed when displaying.
I tried CTE and virtual table with column aliases, but they doesn't work,
they require you know the column name first, i CANNOT pass @columnName to
them.

I can certainly create a temp table and copy all the data from the orginal
table, and then based on the condition to rename the column name of the temp
table, then display it, but that is not efficient.

Celko Answers
You have missed the ENTIRE CONCEPT OF A TIERED ARCHITECTURE!!

You NEVER query a table for display; you query a table for **data**
and the front end does their display. You are writing 1950's file
systems in SQL!!

Outline number sorting

SQL Apprentice Question
Is there a way to in SQL to sort outline numbers to retain the
"natural order" state?

E.g.:

1.1
1.2
1.3
1.10

Shows up now as:
1.1
1.10
1.2
1.3

Is there any way to use recursive functionality to check for longer
outline numbers (i.e., like 1.1.1.1.1)?

Celko Answers
>> Is there a way to in SQL to sort outline numbers to retain the "natural order" state? <<

I use this system in my books and other things a lot. My solution is
to pad each section with leading zeros and hope I never have more than
99 headings.

00.00.
01.00.
01.01
01.01.02.
etc.

You enforce this with LIKE predicates and ORs in the DDL rather than
trying to do it in the DML.

Stored procedure returns duplicates

SQL Apprentice Question
I am trying to create a report in Crystal Reports (v 8.5). I have a
stored procedure to pull data from two databases and parameters.
There are multiple one-to-many relationships and the stored procedure
returns duplicates; e.g., one schedule may have multiple resources,
supplies, and/or orders (and one order may have multiple foods). Is
there a way to stop the duplication?

The stored procedure looks like this:

************************************************************************************
SET QUOTED_IDENTIFIER OFF
GO

SET ANSI_NULLS OFF
GO

CREATE PROCEDURE usp_rpt1 (
@start_date smalldatetime,
@end_date smalldatetime,
@rpt_type varchar(3),
@rpt_id int
)
AS

set nocount on

--Set up some string variables to build the selection query for the
parameters supplied

declare @fields varchar(255)
declare @tables varchar(255)
declare @where varchar(2000)

CREATE TABLE #tmp_sched(sched_id int, rpt_type_desc varchar(100),
rpt_id int)

set end_date = midnight of next day
SELECT @end_date = DATEADD(day,1,@end_date)
SELECT @end_date = CONVERT(smalldatetime,
CONVERT(varchar(4),YEAR(@end_date)) + '-'
+
CONVERT(varchar(2),MONTH(@end_date)) + '-'
+
CONVERT(varchar(2),DAY(@end_date))

IF @rpt_type = 'LOC'
INSERT INTO #tmp_sched
SELECT DISTINCT s.sched_id, l.loc_desc, l.loc_id
FROM tbl_sched s
LEFT JOIN tbl_sched_res_date srd ON s.sched_id = srd.sched_id
LEFT JOIN tbl_res r ON srd.res_id = r.res_id
LEFT JOIN tbl_grp g ON r.grp_id = g.grp_id
LEFT JOIN tbl_loc l ON g.loc_id = l.loc_id
WHERE l.loc_id = CONVERT(varchar(12),@rpt_id)
AND g.obsolete_flag = 0
AND r.obsolete_flag = 0
ANd l.obsolete_flag = 0
AND s.deleted_flag = 0
AND srd.mtg_start_date_local >=
CONVERT(varchar(20), @start_date, 1)
AND srd.mtg_start_date_local < rpt_type =" 'GRP'" sched_id =" srd.sched_id" res_id =" r.res_id" grp_id =" g.grp_id" grp_id =" CONVERT(varchar(12),@rpt_id)" parent_grp_id =" CONVERT(varchar(12),@rpt_id))" obsolete_flag =" 0" obsolete_flag =" 0" deleted_flag =" 0">=
CONVERT(varchar(20), @start_date, 1)
AND srd.mtg_start_date_local < rpt_type =" 'RES'" sched_id =" srd.sched_id" res_id =" r.res_id" res_id =" CONVERT(varchar(12),@rpt_id)" obsolete_flag =" 0" deleted_flag =" 0">=
CONVERT(varchar(20), @start_date, 1)
AND srd.mtg_start_date_local < rpt_type =" 'REG'" sched_id =" srd.sched_id" res_id =" r.res_id" grp_id =" g.grp_id" loc_id =" l.loc_id" loc_id =" reg.region_id" region_id =" CONVERT(varchar(12),@rpt_id)" obsolete_flag =" 0" obsolete_flag =" 0" obsolete_flag =" 0" obsolete_flag =" 0" deleted_flag =" 0">=
CONVERT(varchar(20), @start_date, 1)
AND srd.mtg_start_date_local < sched_id =" srd.sched_id" res_id =" r.res_id" grp_id =" g.grp_id" grp_id =" 0" parent_grp_id =" 0)" obsolete_flag =" 0" obsolete_flag =" 0" deleted_flag =" 0">=
CONVERT(varchar(20), @start_date, 1)
AND srd.mtg_start_date_local < description =" ts.rpt_type_desc," date =" CONVERT(varchar(12),srd.mtg_start_date_local,101)," starttime =" srd.mtg_start_date_local," endtime =" srd.mtg_end_date_local," schedid =" s.sched_id," meetingtitle =" s.sched_desc," resourceused =" r.res_desc," resourcesetup =" su.setup_desc" numberofattendees =" Attendees.string_value," orderid =" ord.order_id," foodqty =" CONVERT" fooddesc =" i.item_name," side =" sidei.item_name," meetingdesc =" ord.order_desc," supplies =" suppliesudf.udf_desc," suppliesval =" supplies.value," accountcode =" ord.order_user_acct_code," cateringnotes =" ord.order_notes," foodnotes =" oi.order_notes" sched_id =" s.sched_id" sched_id =" srd.sched_id" res_id =" r.res_id" sched_id =" srs.sched_id" res_id =" srs.res_id" setup_id =" rs.setup_id" res_id =" rs.res_id" setup_id =" su.setup_id" sched_id =" supplies.sched_id" request_tab_id =" (SELECT" request_tab_hdr =" 'A)" request_tab_id =" (SELECT" request_tab_hdr =" 'Mtg" udf_id =" suppliesudf.udf_id" sched_id =" s.sched_id" udf_id =" (SELECT" udf_desc =" 'Number" sched_id =" s.sched_id" udf_id =" (SELECT" udf_desc =" 'Meeting" order_sched_id =" s.sched_id" order_id =" oi.order_id" menu_item_id =" mi.menu_item_id" item_id =" i.item_id" order_item_id =" side.order_item_id" item_id =" sidei.item_id" deleted_flag =" 0" deleted_flag =" 0">

Celko Answers
>> Any suggestion is greatly appreciated. <<

EVERYTHING you are doing is TOTALLY wrong. You have just been cussed
out by one of the people who wrote this language. If you have brain
instead of an ego, you might want to listen.

This is a (bad) COBOL program written in SQL! There is so much
formatting done in SQL code! The bad news -- for me-- is that this
code is so awful I cannot use it in my next book as a bad example
because it is too proprietary! You could be famous!

Your code is so awful, you even use the "tbl-" prefixes to tell us you
have no idea about RDBMS! You keep converting dates to strings because
you are writing COBOL in SQL and want strings!

Why do your have "CREATE TABLE #tmp_sched" when view would work?
Answer: because magnetic tape files have to be materialized

Why do you spit on ISO-11179 rules and use a "tbl-" prefix? Because
you know only BASIC programming, which needs the prefixes for the one
pass compiler.

You write SQL with flags like it was 1950's Assembly language! Flags
in SQL!! Ghod Damn!! Varying length identifiers!? And I loved the way
spit on ANSI/ISO Standards with "SET QUOTED_IDENTIFIER OFF", etc.?

You need help you cannot get on a newsgroup.

Thursday, April 12, 2007

Month Year Date Comparison

SQL Apprentice Question
am running into a slight issue with a month year comparison. I
wrote q SP thats user-input driven. User selects a Start Month, Start
Year and an End Month and End Year. The database that houses the
information has month and year columns.

Example:
Name Month Year

Me 1 2006
You 1 2006
Us 3 2005
He 5 2007
She 6 2005
It 7 2006
We 12 2005
They 11 2005

The query looks like this:

select cust.name, CASE rev.[month]
WHEN '01' THEN 'January'
WHEN '02' THEN 'February'
WHEN '03' THEN 'March'
WHEN '04' THEN 'April'
WHEN '05' THEN 'May'
WHEN '06' THEN 'June'
WHEN '07' THEN 'July'
WHEN '08' THEN 'August'
WHEN '09' THEN 'September'
WHEN '10' THEN 'October'
WHEN '11' THEN 'November'
WHEN '12' THEN 'December'
END AS [Month], rev.year
from rev
inner join cust ON rev.name = cust.name
where (rev.name = @ter) AND (rev.month >= @start_month) AND (rev.year

>= @start_year) AND (rev.month <= @end_month) AND (rev.year <=

@end_year)
group by blah blah blah
order by blay blah blah

This works great until they span years.
Example:
Start Month = 11
Start Year = 2005
End Month = 2
End Year = 2006

Any ideas on how to make this work?

Celko Answers
This design flaw is called attribute splitting. A date is a single
unit of information, but you have put it into multiple columns. We can
fake it for your current problem, but you need to learn to use
temproal data types in SQL -- this is not 1950's COBOL any more, which
is how you are writing code.

Wednesday, April 11, 2007

Convert = / = to outer joins (ANSI compliant)

SQL Apprentice Question
I'm working on converting *= and =* to 'left outer join' and 'right outer
join'.

I noticed the difference in behavior between the =* and the phrase "right
outer join" and between *= and 'left outer join'. The result set is
different. Here is an example:

select a.au_id, b.title, c.qty

from titleauthor a, titles b, sales c

where (a.title_id =* b.title_id)

and (a.title_id =* c.title_id)

I try to conver the above to:

select a.au_id, b.title, c.qty

from titleauthor a

right outer join titles b

on (a.title_id = b.title_id )

right outer join sales c

on (a.title_id = c.title_id )

The first results into 391 rows in pubs database of sql-server 2000 and the
second produces 34 rows. It seems not that straight forward to convert.

The question is: If I want to get 391 row, what should I change/add in the
second sql statement?

Thank you. Appreciate your help.

Celko Answers

There is no simple answer because the old proprietary *=, +=, OUTER 
<tbl name>, (+) syntaxes all behaved a little differently.  There were 
a lot of weird tricks to work around the fixed orders of execution and 
so forth. 

I am also not sure how to handle a "HAVING  x *= y" clause, which old 
Sybase people used to write. And then nesting subqueries could mess up 
things.  But if you understand how the ANSI syntax works, then you can 
take the **intent** of the old code and re-write the statements pretty 
quickly. 


Here is how OUTER JOINs work in SQL-92.  Assume you are given: 


Table1       Table2 
 a   b        a   c 
 ======       ====== 
 1   w        1   r 
 2   x        2   s 
 3   y        3   t 
 4   z 


and the outer join expression: 


 Table1 
 LEFT OUTER JOIN 
 Table2 
 ON Table1.a = Table2.a      <== join condition 
    AND Table2.c = 't';      <== single table condition 


We call Table1 the "preserved table" and Table2 the "unpreserved 
table" in the query.  What I am going to give you is a little 
different, but equivalent to the ANSI/ISO standards. 


1) We build the CROSS JOIN of the two tables.  Scan each row in the 
result set. 


2) If the predicate tests TRUE for that row, then you keep it. You 
also remove all rows derived from it from the CROSS JOIN 


3) If the predicate tests FALSE or UNKNOWN for that row, then keep the 
columns from the preserved table, convert all the columns from the 
unpreserved table to NULLs and remove the duplicates. 


So let us execute this by hand: 


 Let @ = passed the first predicate 
 Let * = passed the second predicate 


 Table1 CROSS JOIN Table2 
 a   b        a   c 
 ========================= 
 1   w       1   r @ 
 1   w       2   s 
 1   w       3   t * 
 2   x       1   r 
 2   x       2   s @ 
 2   x       3   t * 
 3   y       1   r 
 3   y       2   s 
 3   y       3   t @* <== the TRUE set 
 4   z       1   r 
 4   z       2   s 
 4   z       3   t * 


 Table1 LEFT OUTER JOIN Table2 
 a   b        a   c 
 ========================= 
 3   y     3      t      <= only TRUE row 
 ----------------------- 
 1   w     NULL   NULL   Sets of duplicates 
 1   w     NULL   NULL 
 1   w     NULL   NULL 
 ----------------------- 
 2   x     NULL   NULL 
 2   x     NULL   NULL 
 2   x     NULL   NULL 
 3   y     NULL   NULL  <== derived from the TRUE set - Remove 
 3   y     NULL   NULL 
 ----------------------- 
 4   z     NULL   NULL 
 4   z     NULL   NULL 
 4   z     NULL   NULL 


the final results: 


 Table1 LEFT OUTER JOIN Table2 
 a   b        a   c 
 ========================= 
 1   w     NULL   NULL 
 2   x     NULL   NULL 
 3   y     3      t 
 4   z     NULL   NULL 


The basic rule is that every row in the preserved table is represented 
in the results in at least one result row. 


There are limitations and very serious problems with the extended 
equality version of an outer join used in some diseased mutant 
products.  Consider the two Chris Date tables 


 Suppliers        SupParts 
 supno             supno partno qty 
 =========        ============== 
 S1               S1   P1    100 
 S2               S1   P2    250 
 S3               S2   P1    100 
                  S2   P2    250 


and let's do an extended equality outer join like this: 


 SELECT * 
  FROM Supplier, SupParts 
 WHERE Supplier.supno *= SupParts.supno 
   AND qty < 200; 


If I do the outer first, I get: 


 Suppliers LOJ SupParts 
 supno supno partno qty 
 ======================= 
 S1     S1   P1    100 
 S1     S1   P2    250 
 S2     S2   P1    100 
 S2     S2   P2    250 
 S3   NULL  NULL   NULL 


Then I apply the (qty < 200) predicate and get 


 Suppliers LOJ SupParts 
 supno supno partno qty 
 =================== 
 S1   S1   P1    100 
 S2   S2   P1    100 


Doing it in the opposite order 


 Suppliers LOJ SupParts 
 supno supno partno qty 
 =================== 
 S1   S1   P1    100 
 S2   S2   P1    100 
 S3   NULL NULL  NULL 


Sybase does it one way, Oracle does it the other and Centura (nee 
Gupta) lets you pick which one -- the worst of both non-standard 
worlds!  In SQL-92, you have a choice and can force the order of 
execution. Either do the predicates after the join ... 


 SELECT * 
   FROM Supplier 
        LEFT OUTER JOIN 
        SupParts 
        ON Supplier.supno = SupParts.supno 
 WHERE qty < 200; 


 ... or do it in the joining: 


 SELECT * 
  FROM Supplier 
       LEFT OUTER JOIN 
       SupParts 
       ON Supplier.supno = SupParts.supno 
          AND qty < 200; 


Another problem is that you cannot show the same table as preserved 
and unpreserved in the extended equality version, but it is easy in 
SQL-92.  For example to find the students who have taken Math 101 and 
might have taken Math 102: 


 SELECT C1.student, C1.math, C2.math 
  FROM (SELECT * FROM Courses WHERE math = 101) AS C1 
       LEFT OUTER JOIN 
       (SELECT * FROM Courses WHERE math = 102) AS C2 
       ON C1.student = C2.student;

Tuesday, April 10, 2007

need help switching a boolean value

SQL Apprentice Question
what i want to acheieve is the following, in my table i have an "isactive"
field, i want to create a sql line something like

update tblProducts set isactive = NOT isactive where productID=123....but it
doesnt work.. is there any syntax that handles this in sql?

Celko Answers
>> in my table i have an "isactive" field [sic: fields and columns are TOTALLY different concepts!!!], I want to create a SQL line [sic: statement? that makes no sense!!] something like this: <<

Do you every read ISO Standards and know not to use "tbl" affixes?

UPDATE Products
SET isactive -- assembly lamnguage flag in SQL !!!
= NOT isactive
WHERE product_id = 123;

....but it doesn't work <<

DUH!! If you had bothered to read ANY SQL book before posting, you
would know we have no Boolean data types in SQL. If you had a better
education in math or RDBMS, you would know why.

Newsgroups are a bad place to get a basic education. We will give you
kludges to get rid of you because of your stupid questions, but they
will not make you a good programmer.

SQL: how to convert time from GMT to EST

SQL Apprentice Question
In my Database, all date and time are stored in GMT, however, when it
is displayed, I need them in EST. How can this be done with Daylight
saving time taken into account as well?

Celko Answers

>> In my Database, all date and time are stored in GMT, .. <<

Interesting. GMT has not existed for a VERY LONG TIME! Google iot, so
you will not look so far behind. Did you mean UTC and just do not
bother to learn this decade's technology?

>> How can this be done with Daylight Saving Time taken into account as well? <<

This is called "local lawful time" and it varies quite a bit. Set up
a copy of the Standard SQL temporal information tables that MS has not
bothered with yet. Then set up VIEWs and use them. These auxiliary
tables will have (start_time, end_time, displacement off of UTC)
columns for the DST.

Monday, April 09, 2007

Data type in audit record

SQL Apprentice Question
I want my application to audit any data changes (update, insert,
delete) made by the users. Rather than have an audit table mirroring
each user table, I'd prefer to have a generic structure which can log
anything. This is what I've come up with:

TABLE: audit_record
*audit_record_id (uniqueidentifier, auto-assign, PK) - unique
idenfiier of the audit record
table_name (varchar) - name of the table where the action (insert/
update/delete) was made
pk_value (varchar) - primary key of the changed record. If the PK
itself has changed, this will store the old value.
user_id (varchar) - user who changed the record
date (datetime) - date/time at which the change was made
action (int) - 0, 1 or 2 (insert, update, delete)

TABLE: audit_column
*audit_record_id (uniqueidentifier, composite PK) - FK to
cdb_audit_record table
*column_name (varchar, composite PK) - name of the column with changed
data
new_value (text?) - value after the change

So every column which changes has its new value logged individually in
the audit_column table. However, I'm not sure what data type the
new_value column should have. The obvious answer (to me) is text, as
that can handle any necessary data type with the appropriate
conversion (we don't store any binary data). However, this table is
going to grow to millions of records and I'm not sure what the
performance implications of a text column will be, particularly given
that the actual data stored in it will almost always be tiny.

Any thoughts/recommendations/criticism would be greatly appreciated.

Thanks

Celko Answers

>> I want my application to audit any data changes (update, insert, delete) made by the users. Rather than have an audit table mirroring each user table, I'd prefer to have a generic structure which can log anything. <<

Any chance you might post DDL instead of your personal pseudo-code?
And I hope you know that auto-numbering is not a relational key.
Finally, Google "EAV design flaw" for tens of thousands of words on
why this approach stinks. There is no such magical shape shifting
table in RDBMS. Data Versus metadata, etc.? Freshman database
course, 3rd week of the quarter?

While you might like this kludge your accountants and auditors will
not. NEVER keep audit trails on the same database or even the same
hardware as the database.

>> Any thoughts/recommendations/criticism would be greatly appreciated. <<

Look at third party tools that follow the law and get a basic dat
modeling book.

Newbie: a query to get back the closest number to user input number.

SQL Apprentice Question
have a situation where I need to return the name and salary of an
employee closest to the salary entered by the user.

Is there a "Nearest" or "Closest" type of operator in sql that will
help me instantly get the row that is closest to an arbitrary number?
If not, can anyone help me with an approach to this type of single
table problem?

Much appreciated.

Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. Is this what you meant to post?

CREATE TABLE Personnel
(emp_name CHAR(25) NOT NULL PRIMARY KEY,
salary_amt DECIMAL (10,2) NOT NULL
CHECK (salary_amt > 0.00));

Try something like this:

SELECT emp_name, salary_amt, @my_salary AS search_salary
FROM Personnel AS P
WHERE ABS(P.salary_amt - @my_salary)
= (SELECT MIN (ABS(P2.salary_amt - @my_salary))
FROM Personnel AS P2);

The closest salary might be higher or lower than the search value, so
you need ABS(). In the real world, there will be a lot of ties, since
most companies have salary schedules rather than odd.

Joe Celko The SQL Apprentice