SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Wednesday, May 31, 2006

Using Aggregate functions in joins

SQL Apprentice Question
SELECT
amount,
currencycode
FROM
Expenses (NOLOCK)


We have to select the latest exchnage rate from other table
EXCHANGE_RATE.. this table will contain exchange rate of every date. We
have to take the exchage rate for the latest day and use for
calculating amount in the expenses table.
i tried this but this is not working.


SELECT EX.usdamount *ER.Exchangerate
from EXPENSES EX INNER JOIN EXCHANGE_RATE ER
on EX.currencycode = ER.Currenycode
and ER.Date = max(Date) from Exchange_rate where currencycode =
ER.CurrencyCode


Can anyone help me with this


Celko Answers
>> We have to select the latest exchnage rate from other table EXCHANGE_RATE. this table will contain exchange rate of every date. <<


I would suggest that you create a table more like this:

CREATE TABLE ExchangeRates
(currency_code CHAR(3) NOT NULL,
rate DECIMAL(8,4) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME, -- current rate
CHECK (start_date < end_date),
PRIMARY KEY (currency_code, start_date));


Now use a VIEW to get the current rate. and BETWEEN predicates to match
a rate to a past date. The table will be faster and more useful than
doing a lot of subqueries and aggregates in your code.

Google API to make text search into database

SQL Apprentice Question
I'd like to know if it's possible to use Google APIs to make a text
search into DBs such Oracle or Sql Server.



Celko Answers
Do not use SQL for text and document searching; get a tool built for
this job. If you are doing this only once, then you can use whatever
your RDBMS product has for strings.

Need help design the database for the scheduling task application

SQL Apprentice Question
I need some help to design the database or table for my
application on scheduling task. My application is to do some scheduling
task based on daily, weekly and monthly. On the daily the choice are
everyday, weekdays, or every what ever value of the days. On the daily
the choice are every what ever value of the week on the selected days.
On the monthly the choice are on what date, on what day of what week,
and on selected month. My scheduling task choice is the same as
Microsoft Windows XP Professional Version 2002 SP 2 scheduling task
choice. I want to store these choices on database so I can check the
value using timer


Celko Answers
>> On the daily the choice are everyday, weekdays, or every what ever value of the days. On the daily the choice are every what ever value of the week on the selected days. On the monthly the choice are on what date, on what day of what week, and on selected month. <<


Do you have a calendar table which identifiies all of these groupings?
Do not try to compute them.

Monday, May 29, 2006

average question

SQL Apprentice Question
SELECT AVG(SCORE) AS AVERAGE_SCORE
FROM GRADES

SCORE is an integar column with values from 0 - 5

Now this only returns integer values such as 3, 4...

How can I make AVERAGE_SCORE to be a decimal?

ie 3.452


Celko Answers
select avg(1.0*SCORE) AS AVERAGE_SCORE FROM GRADES


Just put the query in a VIEW and it will be re-calculated each time.
You are still thinking like a COBOL programmer who wants to write all
his data to a file, not a like an SQL programmer who knows that a VIEW
is also a TABLE and does not have to havea physical existence.

use column value subquery

SQL Apprentice Question
Is it possible to use a columnvalue in a subquery? Like this:


SELECT id, tablename, columnname, description,
(SELECT Count(columnname) FROM tablename) as recordcount
FROM tableobjects


Obviously where "columnname" and "tablename" in the subquery should be the
values from columns in "tableobjects".



Celko Answers
>> Is it possible to use a columnvalue in a subquery? Like this: <<


Perhaps you ought to look up ther concepts of coupling an cohesion in a
basic text on software engineering, so you will know better than to
write code like this. This kidn of code is unpredictable until run
time, has no cohesion, etc. This is not just bad SQL; it is bad
programming in any language.

Get last inserted PK

SQL Apprentice Question
I have a table that has a computed value in the primary key (int column).
The computed expression is below:

(convert(int,(rand() * power(2,30))))


Does anyone know how I can get the last inserted value on this table since I
can't use @@IDENTITY. Thanks.

Celko Answers
What you have will fail because of duplication. Use additive congrunce instead.

http://www.rationalcommerce.com/resources/keys.htm http://www.rationalcommerce.com/resources/surrogates.htm http://www.rationalcommerce.com/resources/lfsr.gif

The formula is easy for a 31-bit number, always gives a unique answer and you will have the last number in a one-row table that will stay in main storage.

Need help with poorly performing SPROC

SQL Apprentice Question
I have a SPROC that gradually slows to a crawl. It processes roughly
38,000 records from a staging table, makes some alterations to data,
and subsequently calls other SPROCS to either insert or update
production data based on what is in the staging table.

When I initially created it, I opened a cursor for the staging table
and went through all of the records. Memory use would gradually
increase to debilitating levels. In an effort to improve performance, I
used a table variable instead of the cursor, and the performance has
improved only slightly. The memory use appears to be stable, but it
looks as though processor usage increases over time (which doesn't
really make sense to me).


When it first starts to run, it is processing about 130 records per
minute. 20 minutes in, it is processing less than 40 records a minute.


For clarity, I am including the SQL here:


CREATE PROCEDURE moveIndividualDataToProduction
@iReturnCode int OUTPUT
AS


--Gett all records in staging, calling wbospsiIndividual to insert all
new records, and/or wbospsuIndividual to update records
DECLARE --vars for the records from IndividualStaging
@iError int,
@iIndividualId int,
@iSiteId int,
@chLanguageCode nchar (4),



--vars to hold existing data from Individual


@oldiIndividualId int,
@oldiSiteId int,
@oldchLanguageCode nchar (4),
@oldvchAssignedId nvarchar (255),
@oldvchSalutation nvarchar (255),
@oldvchFirstName nvarchar (255),



--Need this one to determine the useraccesslevel for ocpuser
@onyxsource nvarchar(15),


--Var for nssg id
@nssgid int,


--vars for xp_smtpmail
@mailmessage varchar(255),
@mailsubject varchar(255),


--vars for OCPUser Insert
@OchOCPUserId nchar(255),
@OiSiteId int,
@OiSystemId int,
@OiTableId int,
@OvchUserAccessLevel nvarchar(25),
@OvchPassword nvarchar(255),



--Need this to store what the next ID will be
@NextID int,


--Store the new individual ID
@newid int,


--Local vars for table variable
@count int,
@iRow int


Select @NextID = iLastId + 1 from LastId where chIdType = 'ocpuser'


DECLARE @IndividualTbl TABLE(
RowID INT IDENTITY(1, 1),
TViIndividualId int,
TViSiteId int,
TVchLanguageCode nchar (4),
TVvchAssignedId nvarchar (255),
TVvchSalutation nvarchar (255),
TVvchFirstName nvarchar (255),
)


--Fill the table variable
INSERT @IndividualTbl
SELECT *
FROM IndividualStaging


SET @count = @@ROWCOUNT
SET @iRow = 1
WHILE @iRow <= @count
BEGIN
SELECT @iIndividualId = TViIndividualId,
@iSiteId = TViSiteId,
@chLanguageCode = TVchLanguageCode,
@vchAssignedId = TVvchAssignedId,
@vchSalutation = TVvchSalutation,
@vchFirstName = TVvchFirstName,
@vchMiddleName = TVvchMiddleName,
@vchLastName = TVvchLastName,

FROM @IndividualTbl
WHERE RowID = @iRow


--Check to see if this is new
IF NOT EXISTS (select vchUser10 from Individual where vchUser10 =
@vchUser10)
BEGIN
--First, we need to add an ID if there is none, and email the person
with the ID
If Len(@vchUser7) < 1 or @vchUser7 IS NULL
Begin
Exec getNewID @newid OUT
Set @vchUser7 = @newid
--We need to email the user with the new ID
Select @mailmessage = N'Blah''
Select @mailsubject = N'Blah'


--exec master..xp_sendmail @recipients='m...@mail.com,@message =
@mailmessage ,@subject = mailsubject
exec master.dbo.xp_smtp_sendmail
@FROM = N'm...@mail.com',
@FROM_NAME = N'SQL Automation Email',


@replyto = N'm...@mail.com',
@TO = N'm...@mail.com',
--@TO = @vchEmailAddress,
@CC = N'',
@BCC = N'',
@priority = N'NORMAL',
@subject = @mailsubject,


@type = N'text/plain',
@message = @mailmessage,


@messagefile= N'',
@attachment = N'',


@attachments= N'',
@codepage = 0,


@timeout = 10000,
@server = N'1.1.1.1'
End


--Call the insert SPROC
exec wbospsiIndividual
@iSiteId,
@iIndividualId OUT,
@chLanguageCode,
@vchAssignedId,
@vchSalutation,
@vchFirstName,
@vchMiddleName,



--Get everything ready to insert new OCPUser record
Select @onyxsource = onyxsource from ldapusers where entryuuid =
@vchUser10
Set @OchOCPUserId = N'U-D' + REPLICATE('0',6 - LEN(CAST(@NextID as
nvarchar(6)))) + CAST(@NextID as nvarchar(6))
Set @OiSiteId = 1
Set @OiSystemId = @iIndividualId
Set @OiTableId = 1
If @onyxsource = 'kbt_usa'
BEGIN
Set @OvchUserAccessLevel = 'OCPLead'
END
Else
BEGIN
Set @OvchUserAccessLevel = 'OCPCust'
END
select @OvchPassword = dbo.cfncEncryptPassword(@OchOCPUserId,'junk')
Set @OchLanguageCode = N'ENG '
Set @OvchUser1 = N'BUS'
Set @OvchUser2 = LEFT(@vchFirstName + N' ' + @vchLastName, 255)
Set @OvchUser3 = @vchEmailAddress
Set @OvchUser4 = NULL
Set @OvchUser5 = NULL
Set @OvchUser6 = NULL
Set @OvchUser7 = NULL
Set @OvchUser8 = NULL
Set @OvchUser9 = NULL
Set @OvchUser10 = NULL
Set @OchInsertBy = @chInsertBy
Set @OdtInsertDate = @dtInsertDate
Set @OchUpdateBy = @chUpdateBy
Set @OdtUpdateDate = @dtUpdateDate
Set @OtiRecordStatus = @tiRecordStatus


Set @NextID = @NextID + 1


Insert into OCPUser values(
@OchOCPUserId,
@OiSiteId,
@OiSystemId,
@OiTableId,
@OvchUserAccessLevel,
@OvchPassword,
@OchLanguageCode,
@OvchUser1,
@OvchUser2,
@OvchUser3,
@OvchUser4,
)
END


--Otherwise, this is an update
ELSE
BEGIN
--Grab the existing record values that won't change in LDAP


select @oldiIndividualId = iIndividualId ,
@oldiSiteId = iSiteId ,
@oldchLanguageCode = chLanguageCode,
@oldvchAssignedId = vchAssignedId,
@oldvchSalutation = vchSalutation,
@oldvchSuffix = vchSuffix,
@oldvchURL = vchURL,
@oldchGender = chGender,
@oldiUserTypeId = iUserTypeId,
@oldiUserSubTypeId = iUserSubTypeId,
@oldchTitleCode = chTitleCode,
@oldvchTitleDesc = vchTitleDesc,
@oldvchDepartmentDesc = vchDepartmentDesc,
@oldiPhoneTypeId = iPhoneTypeId,

from individual where vchUser10 = @vchUser10


--Call the update SPROC
exec wbospsiIndividual
--Keep values from old record - these don't change in LDAP
@oldiSiteId,
@oldiIndividualId,
@oldchLanguageCode,
@oldvchAssignedId,

--Get these values from LDAP, they could have changed
@vchUser4,
--Keep values from old record - these don't change in LDAP
@oldvchUser5,
--Get these values from LDAP, they could have changed
@vchUser6,
--Keep values from old record - these don't change in LDAP
@oldvchUser7,
--Get these values from LDAP, they could have changed
@vchUser8,
@vchUser9,
@vchUser10,
@chUpdateBy,
@dtUpdateDate,
0,
@tiRecordStatus


--Need to update the email address in the OCPUser record - vchUser3
update ocpuser set vchUser3 = @vchEmailAddress where iSystemId =
@oldiIndividualId


END
SET @iRow = @iRow + 1
END


Update LastId Set iLastId = @NextID - 1 where chIdType = 'ocpuser'


--CLOSE Individual_Cursor
--DEALLOCATE Individual_Cursor


SELECT @iError = @@ERROR
IF @iError <> 0
BEGIN
SET @iReturnCode = 1
RETURN @iReturnCode
END
GO



Celko Answers
I am trying to think of a nice way to say this, but you did not get
anything right.

You do not know that rows are not anything like records, that SQL is a
set-oriented language, how to name data elements with ISO-11179 rules
(you actually put data types in names and used camelCase in SQL!), you
used proprietary features to assure this is hard to maintain and port,
etc.


What you have done is mimic a 1950's magnetic tape file system, right
down to the terminology, in SQL destroying all the advantages of RDBMS.


This procedure should be an UPDATE statement and an INSERT INTO
statement and that is about all. Some day SQL Server might have the
SQL-99 MERGE statement and you can do in one statement.


No temp tables, no loops, no cursors, etc.


Since you did not bother to post DDL, it is hard to guess what is
supposed to happen.

Friday, May 26, 2006

CASE AROUND ORDER BY

SQL Apprentice Question
I'm trying to have some control on how my data is ordered
depending on an input parameter

my question is


in a stored procedure how can I do something like this at the end of my
statement.
pOrder as input value where pOrder can be 1 or 0


CASE WHEN pOrder = 1
THEN
ORDER BY STREET
ELSE
ORDER BY CITY
END



Celko Answers
You missed the point that CASE is an expression and NOT a procedural
control statement.

Standard SQL-92 does not allow you to use a function or expression in
an ORDER BY clause. The ORDER BY clause is part of a cursor and it can
only see the column names that appear in the SELECT clause list that
was used to build the result set. Someone will now chime in that
SQL-99 (officially called "a standard in progress" and not recognized
by the U.S. Government for actual use) does allow this.


But aside from this, there is the good programming practice of showing
the fields that are used for the sort to the user, usually on the left
side of each line since we read left to right.


The standard trick for picking a sorting order at run time is to use a
flag in CASE expression. If you want to sort on more than one column
and allow all possible combinations of sorting use one CASE per column:


SELECT
CASE @flag_1
WHEN 'a' THEN CAST (a AS CHAR(n))
WHEN 'b' THEN CAST (b AS CHAR(n))
WHEN 'c' THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1,
CASE @flag_2
WHEN 'x' THEN CAST (x AS CHAR(n))
WHEN 'y' THEN CAST (y AS CHAR(n))
WHEN 'z' THEN CAST (z AS CHAR(n))
ELSE NULL END AS sort_2,
...
CASE @flag_n
WHEN 'n1' THEN CAST (n1 AS CHAR(n))
WHEN 'n2' THEN CAST (n2 AS CHAR(n))
WHEN 'n3' THEN CAST (n3 AS CHAR(n))
ELSE NULL END AS sort_2,


FROM Foobar
WHERE ...
ORDER BY sort_1, sort_2, ...


More than one sort column and only a limited set of combinations then
use concatenation.


CASE @flag_1
WHEN 'ab'
THEN CAST(a AS CHAR(n)) ||' ' || CAST(b AS CHAR(n))
WHEN 'ba'
THEN CAST(b AS CHAR(n)) ||' ' || CAST(a AS CHAR(n))
ELSE NULL END AS sort_1,


If you need ASC and DESC options, then use a combination of CASE and
ORDER BY


CASE @flag_1
WHEN @flag_1 = 'a' AND @flag_1_ad = 'ASC'
THEN CAST (a AS CHAR(n))
WHEN @flag_1 = 'b' AND @flag_1_ad = 'ASC'
THEN CAST (b AS CHAR(n))
WHEN @flag_1 = 'c' AND @flag_1_ad = 'ASC'
THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1_a,
CASE @flag_1
WHEN @flag_1 = 'a' AND @flag_1_ad = 'DESC'
THEN CAST (a AS CHAR(n))
WHEN @flag_1 = 'b' AND @flag_1_ad = 'DESC'
THEN CAST (b AS CHAR(n))
WHEN @flag_1 = 'c' AND @flag_1_ad = 'DESC'
THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1_d


.. ORDER BY sort_1_a ASC, sort_1_d DESC


I have shown explicit CAST( AS CHAR(n)), but if the datatypes of
the THEN clause expressions were already the same, there would be no
reason to force the conversions.


You change the ELSE NULL clause to any constant of the appropriate
datatype, but it should be something useful to the reader.


A neater way of doing this is to use one column for each sorting option
so you do not have worry about CAST() operations.


SELECT ...
CASE WHEN @flag = 'a' THEN a ELSE NULL END AS sort1,
CASE WHEN @flag = 'b' THEN b ELSE NULL END AS sort2,
CASE WHEN @flag = 'c' THEN c ELSE NULL END AS sort3
FROM Foobar
WHERE ...
ORDER BY sort1, sort2, sort3;

Using Materialized Path to create a paths table

SQL Apprentice Question
I wonder if anyone has any thoughts on using the materialized path as
an intermediate step to create a kind of paths table to represent a
tree hierarchy

Here are some records


EmployeeId ParentEmployeeId MaterializedPath
1 1.
2 1 1.1.
3 1 1.2.
4 2 1.1.1


etc.


To find the descendants of a node using the materialized path is quite
low-cost: an index on MaterializedPath would be usable with the
following query


select
Descendant.EmployeeId
from Employee as Ancestor
inner join Employee as Descendant on Descendant.MaterializedPath like
Ancestor.MaterializedPath + '%'
where Ancestor.EmployeeId = 1


To find the ancestors however with the following query


select
Ancestor.EmployeeId
from Employee as Descendant
inner join Employee as Ancestor on Descendant.MaterializedPath like
Ancestor.MaterializedPath + '%'
where Descendant.EmployeeId = 4


requires a table scan.


This is very expensive for any non-trivial number of nodes where you
need to find a node's ancestors.


But if you treat the MaterializedPath as an intermediate result, it is
a trival to use it to build a denormalized EmployeeAncestor table,
which for the set of data above would be as follows


EmployeeId AncestorEmployeeId
1 1
2 1
2 2
3 1
3 2
4 1
4 2
4 4


I.e. there is one record for each of each node's ancestors (including
the node itself)


This table is easy to maintain using the same trigger tha maintains the
MaterializedPath (although not so easy if you did not maintain the
MaterializedPath itself)


The advantage is that it becomes very much cheaper to calculate the
ancestors of a node:


select
Ancestor.EmployeeId
from
EmployeeAncestor Descendant
inner join Employee Ancestor on Ancestor.EmployeeId =
Descendant.AncestorEmployeeId
where Descendant.EmployeeId = 4


The query to find descendant is similar:


select
Descendant.EmployeeId
from
Employee Ancestor
inner join EmployeeAncestor Descendant on Descendant.AncestorEmployeeId
= Ancestor.EmployeeId
where Ancestor.EmployeeId = 1


So it seems to me that the materialized path is not actually a good
thing to use to find ancestor/descendant relationships of itself, but
by using it to create a simple ancestor node-type table, you can get
slightly better performance for finding descendants, and vastly better
performance results for ancestors.


Celko Answers
Get a copy of TREES & HIERARCHIES IN SQL and look up the Nested Sets
model. Much faster, easier to use in calculations, etc. Avoid path
enumeration and adjacency list models.

Previous row calculations using SQL

SQL Apprentice Question
I need help in writing a SQL, where in a calculated column depends on
previous's row column, for example,


Table: test
Time Pkts Seq_no
---------------------------------
10:00 20 25
10:01 15 40
10:02 17 57
10:03 10 60
10:04 12 72


Query Result: The output of the query should be
Time Pkts Seq_no drops
-----------------------------------------
10:00 20 25 NULL
10:01 15 40 0
10:02 17 57 0
10:03 10 60 7 (=57+10-60)
10:04 12 72 0


Is this kind of calculation possible using SQL. Any help in this
regards will be highly appreciated.

Celko Answers
Get a copy of SQL-2005 and look at SUM() OVER() function that is new to
Standard SQL. It can get complicated but it is used for running
totals, etc.

Boolean bit field Which is True False

SQL Apprentice Question
I have a field that I need to query for true and false.
The field is a bit.

Is True 0 or 1.


I can't open Books Online and the download instructions based on my version
SQL 2000, are not clear and I don't know what detailed version I have nor
where to find it.


Celko Answers
>> I have a field [sic] that I need to query for true and false.<<


Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files. Totally different concepts!


>> The field [sic] is a bit. <<


You are missing another basic point here. There is no BOOLEAN data
type in SQL and it is a sign of bad programming to write with lower
level bit flags in a high-level language.

Get a book on RDBMS and read about three-valued logic and NULLs to
learn why this cannot be implemented in the SQL model.


Also, look at C# and VB -- do you use -1, +1, or 0 for BOOLEAN?

Tuesday, May 23, 2006

Does an AGE dimension make sense

SQL Apprentice Question
I am new in designing a data warehouse. Does it makes sense to have
a dimension table called AGE where maybe numbers from 10 - 100 are in
it? Maybe I should callculate the age in my query ?! I think it is OK
to have a TIME dimension wich is responisble for the birthday, and
other dates in the fact table. Otherwise I have to update the AGE
relationship daily.

Celko Answers
Do you need the age at the time of the event you are modeling or do you
the birthdate to calculate the current age? How accurate do you need
to be? Days, months or years?

If you have the time of the event, then either birthdate or age can be
calculated from the other. For example, "On 2005-10-31 Billy Jones
was 10 years old and bought candy.." versus "On 2005-10-31 Billy
Jones who was born in 1995 bought candy.."

Dimensional modeling problem: need advice from expirienced engeners

SQL Apprentice Question
Let me describe the problem for which I'need your advice.


1. I have large number (50000000 items) of objects with their
propreties
for example these object are some coins.


2. I also have large number (30000000 items) of object which are
related with previous
defined objects(coins). For example "coin collectors"


------------------------------------------------------
SO TILL NOW HE INTRODUCED 2 OBJECT TYPES LETS CONTINUE
------------------------------------------------------


3. I have listing of coins and it's owners every day. So it can be
said that I have snaoshoot of situation for EACH day.


Example:


DAY 1:
______
mike - coin#1235
mike - coin#156
.
.
David - coin#456
.
Peter - coin#78
.
.
.


DAY 2:
______
mike - coin#156
.
.
.
David - coin#456
David - coin#1235


Peter - nocoins


Samuel - coin#4


So we see that:
- some coins changed their owners
- some coins are added
- some new coins are introduced
- some coins are destroyed
- some new owners are on the sceen
- some owners retired


-------------------------------------------------------------------------
SO TILL NOW WE HAVE REGULAR TRADING SITUATION HERE WITH ALL POSIBLE
CASES
-------------------------------------------------------------------------


MY PROBLEM IS HOW TO MODEL MY SITATION USING DIMENSIONAL MODEL IN ORDER
TO
GET MODEL THAT WOULD BE BEST FOR DATA WAREHOUSING AND OLAP ANALYSIS.


EVERY DAY I NEED STATISTICAL INFORMATION LIKE:


1. total number of coins owned by mike on some specific day
2. total number of coins owned by mike i some time interval
3. total number of coins that mike lost in some period of time
4. total number of coins that mike obtained in some period of time


ALSO I NEED STATISTICAL INFORMATION OF TYPE:
5. How many coins are owned by coin owners from JAPAN in some period of
time



Celko Answers
Time is made of durations, so you need to model peridos of ownership.
I also assume that a coin is unique and has only one owner. Try
something like:

CREATE TABLE Ownership
(owner_id INTEGER NOT NULL
REFERENCES Owners(owner_id)
ON UPDATE CASCADE,
coin_id INTEGER NOT NULL
REFERENCES Coins(coin_id)
ON UPDATE CASCADE,
start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
end_date TIMESTAMP, -- null means current
CHECK (start_date < end_date),
PRIMARY KEY (coin_id, start_date));


Now use " my_date BETWEEN start_date AND COALESCE (end_date,
CURRENT_TIMESTAMP) " in your queries.

PLEASE HELP ME WITH DM PROBLEM

SQL Apprentice Question
I am working
on a project that is related to the field of agriculture and that has
as an objective to find the "optimal values" of the operating
conditions that affect the outcome (the amount of meat produced i.e.
the weight) of an animal production (chicken broilers in my case). To
do so, I have to use historical data of previous productions as my
training dataset. The length a production cycle is typically around 44
days. For each production, a data acquisition system stores the
real-time and historical data of hundreds of parameters. These
parameters represent sensor measurements of all the operating
conditions (current temperature, set point temperature, humidity,
static pressure, etc...) and these are what I refer to as the inputs.
The operating costs and the production outcome are what I refer to as
outputs. The operating cost is indirectly computed from parameters
like water consumption, feed consumption, heater/cooling runtimes, and
lighting runtime; and the outcome of a production is defined by
parameters like animal mortality and conversion factor (amount of feed
in Lbs to produce 1Lb of meat). So the main objective of this project
is to find the set of "optimal daily values" (1value/day) for the
inputs that would minimize the operating costs and conversion ratio
outputs.
The biggest problem I am facing right now is the following: The
historical data that I have in the DB are time series for each measured
parameter. Some of these time series follow some kind of cyclic
pattern (e.g. daily water/feed consumption ...) while others follow an
increasing/decreasing trend (animal weight, total heater run time,
total water/feed consumption.....). My goal is to be able to come up
with a model that suggests a set of curves for the optimal daily values
throughout the length of the production cycle, one curve for each
measured input/output parameter. This model would allow the farmer to
closely monitor his production on a daily basis to make sure his
production parameters follow the "optimal curves" suggested by my
model. I have looked at ANN and I think it might be the solution to my
problem since it allows to model multiple input/outputs problems (Am I
wrong?), but I could not figure out a way to model the inputs/outputs
as time series (an array of values for each parameter). As far as I
know, all kinds of classifiers accept only single valued samples.
One approach would be to create one classifier/day (e.g. for day1:
extract a single value for each parameter and use these values as a
training sample and repeat this for all previous production to
construct the training set). The problem with this approach is that 44
or so classifiers will be constructed (hard to manage all of this) and
each of these resulting ANN will be some kind of "typical average"
of the training data but not necessarily the "optimal values"
leading to the best production outcome, if I am not mistaken.
Another approach would be to find a way to feed in the inputs and
outputs as time series (an array of 44 daily values for each
input/output parameter). In this case, there would be only one
resulting ANN and the training samples, would be a set of arrays for
each parameter, as opposed to single daily parameter values in the
first case. The problem is, I could not find any classifier that would
allow me to do that.


Another issue that I have is the amount of data. While a single
production cycle could represent 1-2GB of data, the length of the
production cycle (44 days) makes it difficult to have 100's of
production cycle historical data, as I could gather data for no more
than 7 full cycles/year. Fortunately, a farm can have many production
units (5-10 barns/site in big sites), so this makes it possible to have
40-70 cycles/yr. My question is: would this be enough to come up with
an acceptably accurate model or is it necessary to have hundreds of
samples?



Celko Answers
I seem to remember "Evolutionary Operation" -- EvOp -- from chemical
manufacturing. The basic idea is small adjustments in multiple factors
to get an optimal setting for a process. There was an assumption of a
local optimal point among the parameters, but a relatively small sample
is needed to adjust things.

Monday, May 22, 2006

Infinity and indefinite extensibility

SQL Apprentice Question
There is probably some standard terminolgy for what you are about to read.
I don't know what that standard terminology is. Sorry about that.

I think it's worthwhile, in the discussion of infinite domains and finite
state implementations, to distinguish between "infinite" and "indefinitely
extensible".


For example, the familiar decimal place value notation system for natural
numbers is indefinitely extensible. That is, there is no such thing as the
largest natural number that the scheme can represent. If a number can be
represented in decimal, its successor can be represented as well. And so
on.


However, every number that can be written in decimal is finite. And every
set of natural that has been formed by listing the elements is a finite
set. And this will remain true until the twelth of never. (yuk, yuk).


You can't represent "infinity" with natural numbers, but there is no upper
bound to the range of natural numbers.


It seems to me that this distinction would be revelant over in the
"impossible database design" topic.



Celko Answers

>> There is probably some standard terminolgy for what you are about to read.<<


Most of it comes from Cantor and set theory. Before Cantor, infinite
was a process and hence the "tipped over" 8 notation. It is the
programmer's endless loop written with a symbol to show there is no
upper bound. This is why you see it in limits and summation of a
series.

After Cantor, we got the aleph notation. The numbers are put into a
set and the set is treated as a completed whole thing and not a process
-- the SQL model. It gets a bit tricky when you do summation over a
set instead of looping over a sequence.


If you have a test for set membership that meets certain conditions,
then you do not need to mateiralize the whole set.

Referential Integrity - Countries & States/Provinces

SQL Apprentice Question
I have a Country table with (surprise!) countries, and a Locale table
with states, territories, provinces, etc. I also have another table,
Port, that usually contains Country information but only occasionally
contains Locale information.

My question is, how do I set up the foreign keys maintain referential
integrity for Locale data but still allow for NULLs?


The following seems to fail whenever I have NULLs for the Country
and/or Locale of a Port:


CREATE TABLE Country
(
CountryID nchar(2) NOT NULL PRIMARY KEY,
Country nvarchar(50) NOT NULL
)


CREATE TABLE Locale
(
CountryID nchar(2) NOT NULL REFERENCES Country (CountryID),
LocaleID nvarchar(3) NOT NULL,
Locale nvarchar(100) NOT NULL,
PRIMARY KEY (CountryID, LocaleID)
)


CREATE TABLE Port
(
PortID smallint NOT NULL PRIMARY KEY,
Port nvarchar(100) NOT NULL,
SeaID tinyint NOT NULL REFERENCES Sea (SeaID),
CountryID nchar(2) NULL, --sometimes NULL!!!
LocaleID nvarchar(3) NULL, --often NULL!!!
FOREIGN KEY (CountryID, LocaleID) REFERENCES Locale (CountryID,
LocaleID)
)



Celko Answers
Look up the nested sets model and buidl a hierarchy of geography.
Also, ISO country codes are CHAR(3). The trick will be in having a
node type in the locations table and so constraints on the leaf nodes.

Sets and Lists, again

SQL Apprentice Question
Recently, in a thread on implementing both threads and lists in a
programming language, the example of lists or sets of Presidents arose. I
mentioned that in a list of presidents, Grover Cleveland would appear once,
but in a list of presidencies, he would appear twice.

Bob Badour asked what purppose would be served by a list of presidents, or
words to that effect. I'm interested.


If one could have a set of presidents, why would one ever want a list? In
general, if a language implements sets, why would the same language need
to also implement lists? What does it buy you?


I'm thinking of Lisp, which implemented lists, but not sets. MDL (aka
Muddle) implemented arrays, and that's one step closer to implementing sets,
but not all the way.


SQL implemented sets, but not lists. Although local extensions of SQL do
implement lists, e.g. "Segmented Strings" in DEC Rdb (aka Oracle/Rdb),
it's not really part of the language as such.


I'm also thinking of Pascal, which implemented sets, (as bitmaps), and also
lists, albeit implicitly. What I mean is that you can combine the concepts
of "record" and "pointer" in Pascal to construct dynamic linked lists of
whatevers. But Pascal was primarily for teaching and learning programming,
and may have implemented both for precisely that purpose.


So, if you have sets, why do you need lists?


Celko Answers
>> I mentioned that in a list of presidents, Grover Cleveland would appear once, but in a list of presidencies, he would appear twice. <<


We should not model presidents, but "The Presidency" by perople and
terms, so the n-th president of US has the value of Mr. X. And the
order is important -- so you can blame the previous guy for a disaster
that happened in your term.


>> So, if you have sets, why do you need lists? <<


Ever try to write a parser in SQL instead of LISP? The LISP/list model
is the best way to lingustics, semantic networks, etc. Those tools do
STRUCTURE and not DATA.

One of my Pascal programming exervcises was based ont he combinartory
operators given in Raymond Smulliyan's book "To Mock a Mocking Bird"
and part of the problem was to pick a method. Pascal list processing
was 1000 times better than try to process the expression as strings !

Numbering in SQL

SQL Apprentice Question
I have table with 10-20 rows with field P6 which is empty. I want to
update numbers to P6 starting 1 and increasing by 1. I suppose it is
done by triggers but I don't know how to do that

Celko Answers
>> have table with 10-20 rows with field [sic] P6 which is empty [sic]. <<


You have NEVER written SQL before, have you? Columsn are not fields
and we have NULLs, not the empty spreadsheet cells you assume. TOTALLY
WRONG MINDSET!


>> I want to update numbers to P6 starting 1 and increasing by 1. <<


That is a SEQUENTAL MAGNETIC TAPE FILE and you are tryign to write
1950's code in SQL! It has northing whatsoever to do with RDBMS.
Tables have no ordering by definition. This is soooooo wrong

Geeesh!!! You can insert a set at a time, so adding one to a previous
value makes no sense. Doesn't anyone go to RDBMS classes any more?

need help with a query

SQL Apprentice Question
Create table test
(col1 varchar(50),col2 int)

insert test values ('ABC',1)
insert test values ('ABC',2)
insert test values ('XYZ',1)
insert test values ('XYZ',1)


I would like the output to report those unique values in Col1 that have
different values in Col2.. So the output based upon the input in table test
would return
'ABC'



Celko Answers
SELECT col1
FROM CrappyNonTableWithoutKey
GROUP BY col1
HAVING MIN(col2) < MAX(col2) ;

SQL Apprentice Question
Should that not be

HAVING MIN(col2) <> MAX(col2) ;

They are looking for values where col2 are different.


Celko Answers
Think about it for a second:

HAVING MIN(col2) <> MAX(col2) ;


is defined as:


HAVING (MIN(col2) < MAX(col2)) OR (MIN(col2) > MAX(col2))


becomes


HAVING (MIN(col2) < MAX(col2)) OR FALSE


becomes


HAVING (MIN(col2) < MAX(col2))

Friday, May 19, 2006

how to replace NULL values in a query

SQL Apprentice Question
I'm using MS SQL Server 2000. I have a simple table (Table1):

Table1:
ID Name PostCode
1 James 12345
2 Mandy 99100
3 John NULL
4 Alex NULL


how to build a query that list all Table1 data replace those with PostCode
NULL with Postcode '00000' ?


Result expected:
ID Name PostCode
1 James 12345
2 Mandy 99100
3 John 00000
4 Alex 00000

Celko Answers
Use ISNULL in an UPDATE, then go back and change the DEFAULT to
'00000'', if that is your convention. Remember, mop the floor but then
fix the leak.

Wednesday, May 17, 2006

the datatype

SQL Apprentice Question
I got some problems here. I have an attribute that determines the unit
of something,e.g. the size of using "mm" , the length of using "seconds"
and something others may using "n-page", so ,which data type should I
use. the "nvarchar" or others?


thank you very much!

Celko Answers
>> I have an attribute that determines the unit of something,e.g. the size of using "mm" , the length of using "seconds" and something others may using "n-page", so ,which data type should I use. the "nvarchar" or others? <<


You need to learn how a relational design works. What meaningful name
could you give such a column? "some_kind_of_unit" or "don't_know_unit"
or "could_be_anything" is a bit vague. Since seconds cannot be
transformed into volts, etc. this column is in violation of !NF -- it
holds several totally different things, like field in a file system.

Units of measure are part of an attribute's value, not an attribute.
You are confusing data and meta-data. Get of a copy of
SQL PROGRAMMING STYLE or SQL FOR SMARTIES and read the chapters on scales and
measurements, and the design of encoding schemes.

Impossible Database Design

SQL Apprentice Question
I was just wondering if it is possible to create a database design
where you can have


1. Infinite repeating events like "every year, every last monday,..."


2. The possibility to test whether events overlap other events


I would say it is not possible since I would need to run an infinite
query to catch a singular event..
Any ideas?
What would be the 'nearest' approach?

Celko Answers
>> you can have 1. Infinite repeating events like "every year, every last monday,..."
<<

Create a view which uses the CURRENT_DATE to get the year and to form
a window of (-x, +x) years around it. I use this trick for a table of
5 second intervals in a day.



>> 2. The possibility to test whether events overlap other events <<


The usual way to do that is see if a calendar date falls between the
start and end dates of more than one event. But you already know that
a Monday event will not overlap a Tuesday event, July event will not
overlap a June event, etc.

>> My conclusion would be that probably the best way would be to simply not support queries for predicting overlaps for dates without ending point.. <<


You can use a NULL end_date for an "eternity marker" then "COALESCE
(end_date, CURRENT_DATE)" is a VIEW or queries for the current state of
affairs. For longer term, use something like "COALESCE (end_date,
(SELECT MAX(end_date FROM Calendar))".

You do not have to generate all the dates (which can only go to
9999-12-31 in ISO Standards), but just a subset of them as needed. The
days-of-the week cycle is every 19 years (i Think) so you can use that
fact to set your window.


>> Can you think of any better evidence that Joe is a self-aggrandizing ignorant who wouldn't recognize logic if it bit him on the ass? <<


I am self-aggrandizing when I mention someone else's book? Wow!
Great logic.

Check the literature; Chronons are out of fashion and have been for
some time. For example, TSQL projects have had about 18-20 people
working on them, all with a decade or two in temporal data and they
reject that model. Jan Hidders is more to date of this than I am, if
you want current info.


I do not need to materialize Aleph null or Aleph one rows in a table
if I can test any requested value to see if it is an element of the
set. Remember basic set theory? I can either enumerate a (finite) set
or I can give a membership rule (infinite sets). Then we argue about
what rules the rules must follow. Dedekind cuts at pi and all that
jazz.


>> Java's Date class (irony?) uses a 64 bit long as milliseconds, giving it millisecond resolution, and the ability to represent dates crazy-far into the future <<


The old FIPS-127 specs said that SQL had to go out to at least 5
decimal seconds, but önly" to 9999-12-31; do java programmers think
their code will still be in use the year "crazy-far"? :)

Yes, there is some complexity in doing it in SQL, but I can copy the
code and feel pretty sure about it. Your code will always look better
in a make-believe language that you do ot have to implement. But I
find some of Date's code to be problematic. Various arrangements of
PACK() and UNPACK() produce different representations of the same
facts, which he considers a problem in SQL, but not in his language.

I can live with the assumption that time has a starting point -- some
temporal logic uses that model because time moves in one direction.
But I cannot see time with a pre-determined end point.


The one that really got me, however, was some charts in the back of the
book where one axis is parts (P1, P2, etc.). He puts them into
intervals, just the way he does Chronons. Parts are discrete and
keyed by a nominal scale, and are not continous or dense. If you stop
and look at it, this is a multi-value DBMS model, not a relational one.


And time is a bitch to work with by its nature-- remember the old kids
math puzzle Ïf a hen and half can lay an egg and half in a day and a
half, then how long does it take for ..?"



>> Date's approach seems completely reasonable to me. After all, we are dealing with computers, right? The best we can hope for are acceptable


representations of continuous systems. I mean, is there really a need
to handle a time interval as an infinite number of instants? <<

No, not as an "infinite number of instants", but as a continuum, which
is very different. A continuum has no points, so everything is an
interval. This is what explains Zeno's paradoxes. This is a model,
not an implementation. What we should have done in SQL was require the
(start,end) model instead of points in time either directly or by
implication

NULLs in Ingres

SQL Apprentice Question
I'm seeing SQL like this work fine:

UPDATE topics SET bonus_time=NULL, last_read=NULL WHERE id=3

While SQL like this fails:

SELECT * FROM topics WHERE (topics.parent_id = NULL) AND ( (topics.type
= 'Reply' OR topics.type = 'SillyReply' ) )

With this error: 'line 1, Column 'null' not found in any specified
table.'


So, I can set a value equal to NULL but I can't seach on NULL?


I'm starting to read about the ifnull ingres SQL command but thought
I'd ping the group about the "proper" way to use a NULL in a WHERE
statement. Is it simply not possible or is there a special way to
escape it?


Celko Answers
"SET x = NULL" is an assignment clause.
"WHERE x = NULL" is a search condition

The nature of NULLs is to propagate and they do not compare to anything
because they are ,arkers and have no values or data type. To help the
SQL engine you can CAST (NULL AS ), however.


I have a good section on them in SQL FOR SMARTIES.

create table with recursive relationship

SQL Apprentice Question
I am fairly new to SQL and I am currently trying to create
a SQL table (using Microsoft SQL) that has a recursive
relationship, let me try to explain:

I have a piece of Data let's call it "Item" wich may again contain one
more "Items". Now how would I design a set of SQL Tables that are
capable of storing this information?


I tried the following two approaches:


1.) create a Table "Item" with Column "ItemID" as primary key, some
colums for the Data an Item can store and a Column "ParentItemID". I
set a foreign key for ParentItemID wich links to the primarykey
"ItemID" of the same table.


2.) create separate Table "Item_ParentItem" that stores
ItemID-ParentItemID-pairs. Each column has a foreign key linked to
primary key of the "Item" Column "ItemID".


In both approaches when I try to delete an Item I get an Exception
saying that the DELETE command could not be executed because it
violates a COLUMN REFERENCE constraint. The goal behind these FK_PK
relations is is that when an Item gets deleted, all childItems should
automatically be deleted recursively.


How is this "standard-problem" usually solved in sql? Or do I inned to
implement the recursive deletion myself using stored
procedures or something ?



Celko Answers

>> How is this "standard-problem" usually solved in sql? <<


Get a copy of TREES & HIERARCHIES IN SQL for several ways to model this
kind of problem.


>> Or do I inned to implement the recursive deletion myself using stored


procedures or something ? <<

No need for recursive procedural code if you use the nested sets model.
Younger programmers who learned HTML, XML, etc. find it to be
intuitive. Older programmers who grew up with pointer chains need to
adjust their mind-set.

DB2: restart sequence number

SQL Apprentice Question
How to restart sequence number once the data is imported/loaded into
the tables?


Celko Answers

>> How to restart sequence number once the data is imported/loaded into the tables? <<


The kludge is ALTER SEQUENCE.

The right answer is that you should not have to do a re-sequence. This
will change the values in that column, and thus the meaning of the
rows.


Surely you did not use this as an exposed physical locator to mimic a
magnetic tape file! It must have meaning in your data model and not in
the physical storage for this to be a relational design.

View from a merge of two tables

SQL Apprentice Question
I have two tables.


CREATE TABLE tblEmployees
(
EmployeeID int identity NOT NULL,
LastName varchar(50) NULL,
FirstName varchar(50) NULL,
);


CREATE TABLE tlkpDept
(
DeptID char(5) NULL,
Name char(10) NULL,
LongName char(50) NULL
);


Now I want to create a view called AssignedTo. [The application I'm
doing, will track the status of our customer requests. Each request
can be assigned to either an individual employee or an entire
department]


I want the view to have two columns, ID and LongName. ID should be
either the DeptID or the EmployeeID. The LongName column should be
either the LongName or the FirstName + LastName.


I'm not even sure how to begin to write such a complex SQL.
EnterpriseManager is being less than helpful too.


Celko Answers

>> I have two tables. <<


Actually, you have no tables; they lack a primary key. Did you really
hire people whose names you do not know and sequentially number them?
Do you have departments without names?

You are also not asking for a proper query - you are violating 1NF by
trying to force one column to hold two different domain values.
Formatting is done in the front end in a tiered architecture, a
principle more basic than SQL programming.


Can I assume that you assign personnel to departments? I also assume
that there tables for departments, customers, etc. in the schema.
Let's try this:


CREATE TABLE Personnel
(emp_id INTEGER NOT NULL PRIMARY KEY,
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
dept_id CHAR(5) NOT NULL
REFERENCES Departments(dept_id),
..
);



>> The application I'm doing will track the status of our customer requests <<


Then we need a table for those trouble tickets:

CREATE TABLE CustomerRequests
(ticket_nbr INTEGER NOT NULL PRIMARY KEY,
ticket_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
cust_id INTEGER NOT NULL
REFERENCES Customers(cust_id),
dept_id CHAR(5) NOT NULL
REFERENCES Departments(dept_id),
emp_id INTEGER - null means department level
REFERENCES Personnel(emp_id)
ON UPDATE CASCADE
ON DELETE SET NULL,
request_status INTEGER NOT NULL,
..);


Notice that I kick the request to the department level with DRI
actions. The query would be something like this.


SELECT R.ticket_nbr, R.cust_id, R.dept_id,
D.dept_name, D.long_dept_name,
P.emp_id, P.last_name, P.first_name,
FROM (CustomerRequests AS R
INNER JOIN
Department AS D
ON D.dept_id = R.dept_id)
LEFT OUTER JOIN
Personnel AS P
ON P.emp_id = R.emp_id;


Then you do the formatting in the front end.

Junction table data integrity approaches

SQL Apprentice Question
A database that I'm supporting has a 'super' many to many table (DDL
below) that joins any table to any other table. The business uses this
to create links between 'Vehicles', 'People', 'Groups of people',
'Places' and about ten other entities/tables.


Now I understand that this is not good DB design because the integrity
cannot be enforced and this is indeed where problems are surfacing. I
want to try and ensure that orphaned records never occur in this super
sized junction table. On the other hand, if this were done with a
many-to-many table for each entity then the integrity would be
maintained by the DB, but the design would not be as manageable due to
the number of tables.


The orphaned records are causing code crashes because the developers
expect there to be a record on the other end of every join.


I have two questions then;
1) Is there a way to keep the current structure and enforce integrity?
2) Is there another approach that you'd recommend?


Triggers are an option although they are generally 'hidden' from
developers who will forget to add a trigger for new tables.


A practical option is to clean the table of orphaned records once a day
and possibly do a quick check for orphan records when a user logs in
(as this only occurrs infrequently, the delay of a few seconds should
not be too much of an issue). This sounds like sticking plaster though
as I'm just covering up the problem, not solving it.


I appreciate that this has been asked before but I've Googled for a
good hour and I can't quite find anything that quite answers it.


DDL below, thanks in advance for your opinions,


Ryan


CREATE TABLE [dbo].[Link](
[FromPrimaryKey] [int] NOT NULL,
[FromTargetTableID] [int] NOT NULL,
[ToPrimaryKey] [int] NOT NULL,
[ToTargetTableID] [int] NOT NULL,
[Notes] [nvarchar](3000) COLLATE Latin1_General_CI_AS NULL,
[IsGenerated] [bit] NOT NULL,
CONSTRAINT [PK_Links] PRIMARY KEY CLUSTERED
(
[FromPrimaryKey] ASC,
[FromTargetTableID] ASC,
[ToPrimaryKey] ASC,
[ToTargetTableID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]


Celko Answers
Not only is this a bad idea, but it says that **all** primary keys are
integers! That is not very likely in the real world.

You are also committing the sin of mixing data and metadata in a
schema.


Do not use the proprietary BIT data type; this is SQL and not a punch
card system which depended on such low level flags.



>> A practical option is to clean the table of orphaned records [sic] once a day and possibly do a quick check for orphan records [sic] when a user logs in (as this only occurrs infrequently, the delay of a few seconds should not be too much of an issue). <<


This sort of "monster table" is not easier to maintain. You make an
error in one place and destroy integrity in a totally unrelated place.
The fact that you are creating orpahans during a session is the
problem. Cleaning them up is a kludge.

What you want is a relationship table for each relationship. It should
have a meaningful name like "Vehicle Assignments" and contain other
information about the relationship, say from and to dates on the (VIN,
employee_id) that make up the natural key.


While this is not a OTLT or MUCK design, it is as bad or worse. You
have missed the most basic ideas of RDBMS in this design.


Rows are not records and "link" refers to pointer chains; SQL has
REFERENCES which are abstract and may or may not be implemented with
pointers. You even think about adding procedural code (triggers)
instead of declarative constraints. You are still thinking about a
non-RDBMS model of data.

Multiple foreign keys

SQL Apprentice Question
I wonder if it is possible to have multiple foreign keys between to tables
and delete data automatically. We have on table with a number of locations
and another table with a number of routes between them. The foreign keys go
from location to the beginning and the end of the route. How can I secure
that (1) only data from location is used in route and (2) if a dataset in
location is deleted both routes to and from are deleted. I can't have a
cascading key on both relations and i couldn't make a trigger work either.
Any suggestions?


Celko Answers
Technically, this is okay in Standard SQL because of the constraints
that force one and only one execution path.

CREATE TABLE Locations
(location_id INTEGER NOT NULL PRIMARY KEY,
location_name VARCHAR(30) NOT NULL);


CREATE TABLE Routes
(route_name VARCHAR(30) NOT NULL PRIMARY KEY,
start_location_id INTEGER NOT NULL
CONSTRAINT starting
REFERENCES Locations (location_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
final_location_id INTEGER NOT NULL
CONSTRAINT ending
REFERENCES Locations (location_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
UNIQUE(start_location_id, final_location_id),
CHECK (start_location_id <> final_location_id));


A smart SQL engine will detect that {ending, starting} and {starting,
ending} will give the same results. This would be legal and deleting a
location removing a node from a graph -- the edges would also
disappear.

Multiple Cascade Paths - Fails on SQL Server 2000

SQL Apprentice Question
This is an old issue, which I chose to ignore, since it was for a temporary
app with a handful of users (famous last words, I know).

The "multiple foreign keys" post today got me thinking about it, and I
figured it is worth looking for a solution.


I will post DDL at the end, but the theory is the main point here, so I hope
my description will suffice.


Please forgive the camel case naming convention, and the silly tbl prefixes.
I do not like it either, but it is a standard in our department.


tblParameter has two FK constraints.
One references tblReport.strReportID
One references tblLookupProcedures.strLookupProcedure
Both tblReport and tblLookupProcedures have a FK reference to
tblDatabaseConnections.strConnectionName


Ideally, all of these constraints have ON UPDATE CASCADE, but the second
constraint on tblParameter will error out with this message:


****
Introducing FOREIGN KEY constraint 'FK_tblParameter_tblLookupProcedures' on
table 'tblParameter' may cause cycles or multiple cascade paths. Specify ON
DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY
constraints.
****


I believe the error is caused because updates to the PK in
tblDatabaseConnections triggers updates to the FK in two tables, each of
which are referenced by tblParameter. Now the references are not on the
same columns, but SQL Server seems to think this could lead to circular
references.


The questions are:
1. Is this against RD theory, or is it just a quirk of SQL Server?
2. Is there a setting that will allow this?
3. What is the usual workaround? A trigger is the only thing that I could
think of.


Here is the DDL, scaled down to demonstrate the issue.


CREATE TABLE [dbo].[tblDatabaseConnections] (
[strConnectionName] [varchar] (10) PRIMARY KEY NOT NULL ,
[strDatabase] [varchar] (50) NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[tblLookupProcedures] (
[strLookupProcedure] [varchar] (50) PRIMARY KEY NOT NULL ,
[strProcedureDescription] [varchar] (100) NOT NULL ,
[strConnectionName] [varchar] (10) NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[tblReport] (
[strReportID] [varchar] (10) PRIMARY KEY NOT NULL ,
[strReportDescr] [varchar] (50) NULL ,
[strConnectionName] [varchar] (10) NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[tblParameter] (
[strReportID] [varchar] (10) NOT NULL ,
[strParamID] [varchar] (20) NOT NULL ,
[strLookupProcedure] [varchar] (50) NOT NULL
PRIMARY KEY ([strReportID],[strParamID])
) ON [PRIMARY]
GO


ALTER TABLE [dbo].[tblLookupProcedures] ADD
CONSTRAINT [FK_tblLookupProcedures_tblDatabaseConnections] FOREIGN KEY
([strConnectionName]
) REFERENCES [dbo].[tblDatabaseConnections] ([strConnectionName]
) ON UPDATE CASCADE
GO


ALTER TABLE [dbo].[tblReport] ADD
CONSTRAINT [FK_tblReport_tblDatabaseConnections] FOREIGN KEY
([strConnectionName]
) REFERENCES [dbo].[tblDatabaseConnections] ([strConnectionName]
) ON UPDATE CASCADE
GO


ALTER TABLE [dbo].[tblParameter] ADD
CONSTRAINT [FK_tblParameter_tblReport] FOREIGN KEY
([strReportID]
) REFERENCES [dbo].[tblReport] ([strReportID]
) ON UPDATE CASCADE
GO


--This constraint fails due to a perceived cyclical reference
ALTER TABLE [dbo].[tblParameter] ADD CONSTRAINT
[FK_tblParameter_tblLookupProcedures] FOREIGN KEY
([strLookupProcedure]
) REFERENCES [dbo].[tblLookupProcedures]([strLookupProcedure]
) on update cascade
GO


-- if we remove on update cascade, the constraint gets created
ALTER TABLE [dbo].[tblParameter] ADD CONSTRAINT
[FK_tblParameter_tblLookupProcedures_NOCASCADE] FOREIGN KEY
([strLookupProcedure]
) REFERENCES [dbo].[tblLookupProcedures]([strLookupProcedure]
)
GO


drop table [dbo].[tblParameter];
drop table [dbo].[tblReport];
drop table [dbo].[tblLookupProcedures];
drop table [dbo].[tblDatabaseConnections];



Celko Answers
DB2 had this problem in a very early version. If you had cascade
chains like A-->B, B-->C and A-->C, the final values in C would be
whoever got there last to overwrite the value. Today DB2 has a pretty
good cycle detector and allows some things that it did not before.

The relational rule is that all possible cascade paths must leave the
DB in the same state when they finish and that state has to be validate
under all the constraints.


This is hard to implement in theory -- remember graph theory? So real
products give up at some point, using a combination of a graph and
constraints. SQL Server happens to quit very early :)

Using inner and outer joins in the same query

SQL Apprentice Question
Can anyone tell me if it is bad practice to use both an INNER JOIN and
an LEFT OUTER JOIN in the same query on SQL Server 2000?

Celko Answers
Not a problem, but remember that they are executed in left to right
order, so you may want to use parens and/or make derived tables out of
some of the results. Look up the scoping rules for the FROM clause.

Thursday, May 11, 2006

Exotic SQL Question

SQL Apprentice Question
Table 1:
id1 name
10 - Monday
20 - Tuesday
30 - Wed..
40 - etc

Table2:
id2 - name
5 - blue
25 - red
33 - yellow
77- gree


How do I use SQL to join the tables so that the "id1" columns join on
the largest "id2" value smaller than "id1"


Result:
Monday - blue
Tuesday - blue
Wed - red
etc


Thanks


Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are.

CREATE TABLE Colors
(color_nbr INTEGER NOT NULL PRIMARY KEY,
color_name CHAR(10) NOT NULL);


CREATE TABLE Weekdays
(zeller_nbr INTEGER NOT NULL PRIMARY KEY,
weekday_name CHAR(10) NOT NULL);



>> How do I use SQL to join the tables so that the color_nbr columns join on the largest zeller_nbr value smaller than color_nbr <<


SELECT C1.color_nbr, W1.zeller_nbr, C1.color_name, W1.weekday_name
FROM Colors AS C1, Weekdays AS W1
WHERE zeller_number
= (SELECT MAX(zeller_nbr)
FROM Weekdays AS W2
WHERE W2.zeller_nbr < C1.color_nbr
AND W2.zeller_nbr = W1.zeller_nbr)

This sounds like the old joke about the teacher and the students:


teacher: "Mary, what is 6 times 7?"
student: "Red?"
teacher: "Billy, what is 6 times 7?"
student: "Thursday!"
teacher: "Johnny, what is 6 times 7?"
student: "42!"
teacher: "Johnny, tell the class how you go that answer."
student: "I divided Red by Thursday!"


untested.

nested tree, how to

SQL Apprentice Question
I've problems representing nested tree in sql server strucutre ...
my scenario is :
several process (p1,p2,p3,etc..) makes several operations
(op1,op2,op3,...)
i need to store the combination process,operation,time and this ismy
table
structure (processid,opid,dateop)
now i need to show a tree hystory of the operations in a definited date
range....
here's a sample:
op1
50% makes op2
20% makes op1
60% makes op3
100% makes op4
20% makes op5

50% makes op3
100% makes op6
100% makes op1


the real problem is the amount of data ., about 20 000 000 records ...
how can represent this tree in an efficient way ?!??!!?


Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. What you posted makes no sense.


>> Several process (p1,p2,p3,etc..) makes several operations (op1,op2,op3,...) I need to store the combination (process, operation, time) <<


You have described a matrix (ops by processes), not a tree. In a tree,
one process might have one or more operations as subordinates.

You show a cycle where OP1 is one of its own subordinates. Trees do
not cycles.


Have you gotten a copy of TREES & HIERARCHIES IN SQL?

Orders in last 24 months

SQL Apprentice Question
I would like to count the number of orders for each month for each one of my
customers over the last 24 months. The information will then be used into a
Crystal Report Bar Chart.

The problem I'm having is that Crystal will only display information that
exists. Some of my customers do not have orders in every months of the last 2
years.


I would like to create a view that will return 0 for each month that each
customer does not have any orders.


I have founs a few articles on creating a Calendar Table to help the process
but I cannot get the result I'm looking for.


Celko Answers
Fill this table with the last 24 months you want to look at:

CRERATE TABLE ReportPeriods
(rpt_period_name CHAR(10) NOT NULL PRIMARY KEY,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date));


Then this is your query:


SELECT R.rpt_period_name, SUM(sales_amt) AS sales_month_tot
FROM ReportPeriods AS R
LEFT OUTER JOIN
Sales AS S
ON S.sale_date BETWEEN start_date AND end_date
WHERE S.cust_id = @my_guy
GROUP BY R.rpt_period_name;

Sql statement Help

SQL Apprentice Question
Hi I'm trying to create a isBooked statement

What I have is a db with:


Date, StartTime,EndTime


I currently run two SQL statements:
1) "SELECT Count(PupilID) FROM TblLessons WHERE " _
& "Date =@GetDate and Time between @GetTime and @GetTimeEnd"
2) Dim StrSqlEndTime As String = "SELECT Count(PupilID) FROM TblLessons
WHERE " _
& "Date= @GetDate and TimeEnd between @GetTime and @GetTimeEnd "


I may be going it the wrong direction so please help out.
If I have already booked a appointment at say 10:00 to 12.30
Using the above statements misses a appointment if I try to added a new for
10.15 to 11.15, but catches 09:15 to 10:15 I understand why but don't know
how to correct.
I guess I'm going about this all wrong so any help would be great.


Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.

What you did post was wrong; there is no TIME data type in SQL Server.


CREATE TABLE AppointmentSchedule
(pupil_id INTEGER NOT NULL,
start_time DATETIME NOT NULL,
end_time DATETIME NOT NULL,
CHECK (start_time < end_time),
PRIMARY KEY (pupil_id, start_time),
..);


We never put "tbl-" as a prefix on a data element name in SQL. What
you posted is not a SQL statement -- it is in some host language for
dynamci execution. Use a stored procedure in the database instead.



>> I guess I'm going about this all wrong so any help would be great. <<


What are you trying to do. You never told us. Do you need to delete
appointsments? Do you need to add appointments? Can they overlap?

Wednesday, May 10, 2006

Check Contraint question

SQL Apprentice Question
have the following check constraint


(isnull(patindex(('%[' + ' ' + char(9) + char(10) + char(13) + ']%'),[LicensePlateNumber]),0) = 0)


which works fine, throwing an error if those characters are entered. Is there a way to have it not throw an error, but rather just remove the offending characters if entered?

Celko Answers
No. Constraints are declarative and do not perform actions. I would
do this kind of thing inthe front end or in the inpout procedure.
Triggers will fire any time the table is touched and work on all rows,
so they can be a bit costly.

NewBee Trigger Question

SQL Apprentice Question
I want to create a basic insert trigger. In T1 I add a row, which creates a
new ID. The trigger fires after the insert, how do I get the new ID to add
to the child table?

I know this is as simple as it gets but I've read about 10 posts and dont'
see it?


Thanks in advance.

Celko Answers
>> want to create a basic insert trigger. In T1 I add a row, which creates a new ID [sic]. The trigger fires after the insert, how do I get the new ID to add to the child [sic] table? <<


Stop using SQL and go back to a network database. You have described
how they work as they build pointer chains as the data is inserted. I
am not kidding -- read a DB history book. You even used the term
"child" instead of "referenced" table!! Pure network/pointer chain
database concepts and terms, not anything like RDBMS.


>> I've read about 10 posts and dont' see it? <<


Perhaps you should have read one book on RDBMS instead?

You do not create a relational key. It already exists in the real
world and you discover it.


Triggers are a kludge for putting procedural code into a declarative
language.


You need to start over; you do not know what you are doing. People
here will give you kludges to get rid of you quickly because we cannot
give you a 1-2 year course in RDBMS. Telling someone to "smash rats
with a rock when they get near your baby" is easier than "improve the
sewer system by learning civil engineering so rats are not a problem"


Look up this article: http://www.apa.org/journals/psp/psp7761121.html


Journal of Personality and Social Psychology


Unskilled and Unaware of It: How Difficulties in Recognizing One's Own
Incompetence Lead to Inflated Self-Assessments


Remember it takes SIX years to become a Union Journeyman Carpenter in
New York State. How many years to be an SQL programmer? A few weeks
in a ceritificate training class!

Table Joins on more than one field

SQL Apprentice Question
Can someone help me with this SQL?

1) EMPLOYEE table has columns 'employee_id' and 'emp_sid' and other
columns


2) EMPLOYEE_BENEFITS table has a column called employee_entity, this
column can be joined to either 'employee_id' OR 'emp_sid' but not both
in the EMPLOYEE table.


3) EMPLOYEE_TRACK table has column called employee_track_entity, this
column can be joined to the employee_benefits_id (PK) of the
EMPLOYEE_BENEFITS table.


I am listing the sql for the tables (the tables shows only the columns
in question)


CREATE TABLE [dbo].[EMPLOYEE] (
[employee_id] [int] IDENTITY (1, 1) NOT NULL ,
[empsid_id] [int] NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[EMPLOYEE_BENEFITS] (
[employee_benefits_id] [int] IDENTITY (1, 1) NOT NULL ,
[employee_entity] [int] NOT NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[EMPLOYEE_TRACK ] (
[employee_track_id] [int] IDENTITY (1, 1) NOT NULL ,
[employee_track_entity] [int] NOT NULL
) ON [PRIMARY]
GO


SELECT * FROM EMPLOYEE e
INNER JOIN
EMPLOYEE_BENEFITS eb ON (e.employee_id = eb.employee_entity OR
e.empsid_id = eb.employee_entity)
INNER JOIN
EMPLOYEE_TRACK et ON eb.employee_benefits_id = et.employee_track_entity


The above SQL I wrote is this: the second inner join uses a OR to join
either of the columns in the first table EMPLOYEE. There is performance
degradation with this SQL. With huge data It takes about 30 seconds to
execute. I know this is not the perfect way to do it, can anyone of the
SQL Gurus please enlighten me to a faster approach?


If I dont use the OR I can try left join on the same table
EMPLOYEE_BENEFITS twice by changing the join types, but If I did this
what table alias can I use to join to the 3rd table?


SELECT * FROM EMPLOYEE e
LEFT JOIN
EMPLOYEE_BENEFITS eb1 ON e.employee_id = eb.employee_entity
LEFT JOIN
EMPLOYEE_BENEFITS eb2 ON e.empsid_id = eb.employee_entity
INNER JOIN
EMPLOYEE_TRACK et ON [???].employee_benefits_id =
et.employee_track_entity


Celko Answers
>> Can someone help me with this SQL? <<


Not really, because you do not have an RDBMS. You have a bunch of
poorly designed non-tables written in SQL.


>> 1) EMPLOYEE table has columns 'employee_id' and 'emp_sid' and other


columns <<

The table name EMPLOYEE (all uppercase so it ihard to read or you are
using punchcards for the one employee you have -- singular names mean
one entity). If this table is for personnel data (a set), not for each
employee as if they records in a sequential file, why did you give it
such a bad name. Which one of these two columns is the PRIMARY KEY?
Oh, you have no keys at all!!



>> 2) EMPLOYEE_BENEFITS table has a column called employee_entity, this column can be joined to either 'employee_id' OR 'emp_sid' but not both in the EMPLOYEE table. <<


You actually used a postfix of entity! So much for data modeling and
ISO-11179 specs! You also missed the whole idea of PK-FK constraints --
There is no OR option in the concept. I think that some early network
DBMS system had "variant pointers" that could work liekthat, but I
owuld have to research it


>> 3) EMPLOYEE_TRACK table has column called employee_track_entity, this column can be joined to the employee_benefits_id (PK) of the EMPLOYEE_BENEFITS table.<<


Again, a singular name so we have only one track. Since IDENTITY can
never be a key by definition, EMPLOYEE_BENEFITS has no key to
reference. Don't your benefit programs have names, tax ids, or
something you can validate and verify?

Again, you are creating a pointer chain DBMS system in SQL, but do not
have the background to realize that you are re-inventing a square
wheel.


CREATE TABLE Personnel
(employee_id CHAR(9) NOT NULL PRIMARY KEY, -- use legally required id
...);


CREATE TABLE EmployeeBenefits
(employee_benefits_id INTEGER NOT NULL PRIMARY KEY,
employee_id CHAR(9) NOT NULL
REFERENCES Personnel(employee_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);


CREATE TABLE EmployeeTracks
(employee_track_id INTEGER NOT NULL PRIMARY KEY,
.. );



>> The above SQL I wrote is this: the second inner join uses a OR to join either of the columns in the first table EMPLOYEE. There is performance degradation with this SQL. With huge data It takes about 30 seconds to execute. <<


You are worried about performance degradation?? You forgot the lack of
data integrity caused by two keys.


>> I know this is not the perfect way to do it, can anyone of the SQL Gurus please enlighten me to a faster approach? <<


Clean up the DDL. Get real keys instead of that IDENTITY crap. Learn
basic data modeling. Fix the multiple key problem. And stop putting
SELECT * in production code.

I saw a problem like this once a few decades ago. Two companies had
merged because they wer tired of competing in the same market (welding
supplies, same products). One company coded its inventory by the
location in the warehouse (makes picking orders very easy) and the
other coded by the type of welding done (aluminium, brass, underwater,
etc.). The warehouses were arranged very differently becuase of the
encoding. Are you familar with the Japanese housing numbering system
versus the United States?


They wanted a combined inventory and catalog, but their customers and
personnel were too used to one system or the other and the politics
were awful.


What they needed as a kludge was one and only one SKU code and a
conversion table in the computer and a pair of codes on the labels.
Until they could design a good SKU code.


Did this cost a lot of time and money? You bet! In fact, it killed the
merger. Each warehouse was an island of data, so there was no timely
way to move inventory across the two SKU codes to fill orders.
Someone asks for 5 Widgets and 2 are under code #A and 3 are under code
#B in another warehouse, but nobody knew!


Have you been to a Barnes & Noble lately? Look at the company sticky
label that goes over the pre-printed ISBN code. Same expensive, stupid
design flaw that you and the welding supplies companies had.

Tuesday, May 09, 2006

SQL 2005 Server Boolean value

SQL Apprentice Question
I am trying to implement a boolean field in my SQL 2005 database. I've read
some newsgroups saying that there is no boolean type but instead a bit typeis
used which can be 0 or 1.


Does this mean on my .NET side i shouldn't use the boolean type but another
data type which I set to 0 or 1, maybe a byte?


I've also noticed people using char(1) and using Y/N or T/F.


Can anyone give me advice on which is the best method?

Celko Answers

>> am trying to implement a boolean field in my SQL 2005 database. I've read some newsgroups saying that there is no boolean type but instead a bit type is used which can be 0 or 1. <<


That is a kludge. SQL does not have a BOOLEAM data type on purpose and
BIT is a proprietary Numeric data type.


>> Does this mean on my .NET side i shouldn't use the boolean type but another data type which I set to 0 or 1, maybe a byte? <<


But did you want to use 1, 0 , or -1 on the NET, depending on whether
you are using VB or C#. There is no agreement about the bit level
mapping to BOOLEANs.


>> I've also noticed people using char(1) and using Y/N or T/F.<<


This is the best of a bad design situation. What you will find is that
you wanted a status code of some kind and that you will keep adding
values to it -- a questionaire with "yes" and "no" then needs to add
"Not Applicable", "Not Answered" and so forth.

Newbie - Query with loop

SQL Apprentice Question
I am relatively new to Transact SQL and need some help on what should be a
simple query...


I have a table with
* Item
* Cost
* MonthEndingDate (e.g. 1/31/2006, 2/28/2006)


I need a query to return the Cost for a given month, but if the cost is null
or there is no record, I need the previous month's Cost. If the previous
month's Cost is null or there is no record, I need the month before that
(going back 3 months). If all three months are null then I need a null.


Example


Item Cost MonthEnding
Book 10.25 1/31/2006
Book 10.50 2/28/2006
Pen 1.07 2/28/2006
Pen 1.08 3/31/2006
Pen 1.10 4/30/2006
Pencil .10 12/31/2006
Pencil .15 1/31/2006


I would need the query to return:


Item Cost
Book 10.50
Pen 1.10
Pencil null


Any help would be greatly appreciated!! Thanks in advance!


Celko Answers
It really helsp if you post DDL; her is my guess about keys and
constaints... and the table name!

CREATE TABLE ItemSummary
(month_end DATETIME NOT NULL,
item_name VARCHAR(20) NOT NULL,
item_cost DECIMAL(8,2) NOT NULL,
PRIMARY KEY (month_end, item_name));


INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Book', 10.25, '2006-01-31');
INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Book', 10.50, '2006-02-28');
INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Pen', 1.07, '2006-02-28');
INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Pen', 1.08, '2006-03-31');
INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Pen', 1.10, '2006-04-30');
INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Pencil', 0.10, '2006-12-31');
INSERT INTO ItemSummary (item_name, item_cost, month_end) VALUES
('Pencil', 0.15, '2006-01-31');


This answer is more of a demonstration of soem coding tricks with
COALESCE(), but it might run better than you think, if the optimizer
does early evaluation on the COALESCE() list.


SELECT DISTINCT item_name,
COALESCE (
(SELECT MAX (item_cost)
FROM ItemSummary AS I2
WHERE month_end = '2006-04-30'
AND I2.item_name = I1.item_name),
(SELECT MAX(item_cost)
FROM ItemSummary AS I2
WHERE month_end = '2006-03-31'
AND I2.item_name = I1.item_name),
(SELECT MAX(item_cost)
FROM ItemSummary AS I2
WHERE month_end = '2006-02-28'
AND I2.item_name = I1.item_name)
) AS recent_cost


FROM ItemSummary AS I1;


item_name recent_cost
==================
Book 10.50
Pen 1.10
Pencil NULL


I hard-wired the month ends, but you can use a DATEDIFF function and a
parameter to make this more general. A better way would be with a
report period calendar table.

Query Problem

SQL Apprentice Question
I am having one problem with the query. Please see below the query.


Declare @p_code_type_cd varchar (200),---3599, 6023
@p_lang_cd varchar (10),--39
@p_code_val varchar (2000), --67330000, 67000000


Select @p_code_type_cd = '3599,6023', @p_lang_cd = '39', @p_code_val =
'67330000, 67000000'
select code_type_cd, code_val, desc_text
from code_tbl where code_type_cd in (@p_code_type_cd)
and code_val in (@p_code_val)and lang_cd = @p_lang_cd


When this query is executed it produces no result. I analyzed and found
that the query which is being executed is converted as mentioned
below:-


select code_type_cd, code_val, desc_text
from code_tbl where code_type_cd in (3599,6023)
and code_val in (67330000, 67000000) and lang_cd = 39


i think it is missing few single quotes value. I am just confused how
to add these quotes in the query so that it becomes like..


select code_type_cd, code_val, desc_text
from code_tbl where code_type_cd in ('3599','6023')
and code_val in ('67330000','67000000')and lang_cd = '39'

Celko Answers
Actually, you are missing a proper data model. Data element names like
"code_type_cd" are absurd. A code and a type are different kinds of
attributes, so you should have names like "postal_code" or
"blood_type" instead of a list of adjectives looking for a noun. Same
problem with "code_val"

What kind of code is in "code_tbl"? This name implies that you are
dealing with furniture. It must be ONE AND ONE KIND of code to be a
valid table. There is no such thing as a "Magical, Universal Does
Everything" code table in an RDBMS. Surely you have not mixed data and
metadata in a schema to build a OTLT or MUCK? Google those words and
start your research.


You also do not seem to know that SQL is compiled, so passing a string
is not like writing code on the fly in an interpreter. Without DDL and
sensible data element names, nobody can really help you. But based on
past experience, when the schema is bad, the kludge is usually dynamic
SQL.


The reason people give you that kludge is that it gets rid of you
faster than actually soving the root problems. That could take more
time and effort than we want to give away for free in a newsgroup.
Please get some real help somewhere else.

how to find conditions across rows (attendance)

SQL Apprentice Question
I need to find students that have 4 consecutive absences. When a
student is absent 4 times in a row, they can be dropped from the class.

My class attendance file contains each attendance by date and whether
they were present or not. When the student has 4 consecutive value 1
(absent) for a given session and a given class the are considered to be
dropped.
If I needed to know the total number of absences, I know I could group
and summarize, but this one has the consecutive twist.
Table:


CREATE TABLE "dbo"."clsatt"
("FULL_CLASS_ID" CHAR(15) NOT NULL,
"STUDENT_ID" CHAR(20) NULL,
"SESSION_ID" CHAR(10) NULL,
"MEETING" SMALLINT NOT NULL,
"PRESENT" CHAR(2) NOT NULL)
;
Present value of 1 is absent, value of 2 is present (3 means holiday)
Classes typically meet 12 times.
I would want something like
FULL_CLASS_ID, STUDENT_ID, SESSION_ID, 'Dropped'
as the output.


Notice in the example the first student was absent the last 4 meetings
The second student 5 absenses
and the third student was totally absent
In these three examples, they are flagged as dropped.
TIA
Rob
Inserts:


---------------------------------------------------------------------------­------
insert into clsatt values ('BUS100','1675812194','200203',1,'2')
insert into clsatt values ('BUS100','1675812194','200203',2,'2')
insert into clsatt values ('BUS100','1675812194','200203',3,'2')
insert into clsatt values ('BUS100','1675812194','200203',4,'2')
insert into clsatt values ('BUS100','1675812194','200203',5,'2')
insert into clsatt values ('BUS100','1675812194','200203',6,'2')
insert into clsatt values ('BUS100','1675812194','200203',7,'2')
insert into clsatt values ('BUS100','1675812194','200203',8,'2')
insert into clsatt values ('BUS100','1675812194','200203',9,'1')
insert into clsatt values ('BUS100','1675812194','200203',10,'1')
insert into clsatt values ('BUS100','1675812194','200203',11,'1')
insert into clsatt values ('BUS100','1675812194','200203',12,'1')
insert into clsatt values ('BUS100','1712400537','200203',1,'2')
insert into clsatt values ('BUS100','1712400537','200203',2,'2')
insert into clsatt values ('BUS100','1712400537','200203',3,'2')
insert into clsatt values ('BUS100','1712400537','200203',4,'2')
insert into clsatt values ('BUS100','1712400537','200203',5,'2')
insert into clsatt values ('BUS100','1712400537','200203',6,'2')
insert into clsatt values ('BUS100','1712400537','200203',7,'2')
insert into clsatt values ('BUS100','1712400537','200203',8,'1')
insert into clsatt values ('BUS100','1712400537','200203',9,'1')
insert into clsatt values ('BUS100','1712400537','200203',10,'1')
insert into clsatt values ('BUS100','1712400537','200203',11,'1')
insert into clsatt values ('BUS100','1712400537','200203',12,'1')
insert into clsatt values ('BUS100','1801704805','200203',1,'1')
insert into clsatt values ('BUS100','1801704805','200203',2,'1')
insert into clsatt values ('BUS100','1801704805','200203',3,'1')
insert into clsatt values ('BUS100','1801704805','200203',4,'1')
insert into clsatt values ('BUS100','1801704805','200203',5,'1')
insert into clsatt values ('BUS100','1801704805','200203',6,'1')
insert into clsatt values ('BUS100','1801704805','200203',7,'1')
insert into clsatt values ('BUS100','1801704805','200203',8,'1')
insert into clsatt values ('BUS100','1801704805','200203',9,'1')
insert into clsatt values ('BUS100','1801704805','200203',10,'1')
insert into clsatt values ('BUS100','1801704805','200203',11,'1')
insert into clsatt values ('BUS100','1801704805','200203',12,'1')
insert into clsatt values ('BUS100','1922287588','200203',1,'1')
insert into clsatt values ('BUS100','1922287588','200203',2,'1')
insert into clsatt values ('BUS100','1922287588','200203',3,'2')
insert into clsatt values ('BUS100','1922287588','200203',4,'2')
insert into clsatt values ('BUS100','1922287588','200203',5,'2')
insert into clsatt values ('BUS100','1922287588','200203',6,'2')
insert into clsatt values ('BUS100','1922287588','200203',7,'2')
insert into clsatt values ('BUS100','1922287588','200203',8,'2')
insert into clsatt values ('BUS100','1922287588','200203',9,'2')
insert into clsatt values ('BUS100','1922287588','200203',10,'2')
insert into clsatt values ('BUS100','1922287588','200203',11,'1')
insert into clsatt values ('BUS100','1922287588','200203',12,'2')
insert into clsatt values ('BUS100','2188469657','200203',1,'1')
insert into clsatt values ('BUS100','2188469657','200203',2,'1')
insert into clsatt values ('BUS100','2188469657','200203',3,'2')
insert into clsatt values ('BUS100','2188469657','200203',4,'2')
insert into clsatt values ('BUS100','2188469657','200203',5,'2')
insert into clsatt values ('BUS100','2188469657','200203',6,'2')
insert into clsatt values ('BUS100','2188469657','200203',7,'2')
insert into clsatt values ('BUS100','2188469657','200203',8,'2')
insert into clsatt values ('BUS100','2188469657','200203',9,'1')
insert into clsatt values ('BUS100','2188469657','200203',10,'1')
insert into clsatt values ('BUS100','2188469657','200203',11,'1')
insert into clsatt values ('BUS100','2188469657','200203',12,'2')
insert into clsatt values ('BUS100','2515197431','200203',1,'1')
insert into clsatt values ('BUS100','2515197431','200203',2,'1')
insert into clsatt values ('BUS100','2515197431','200203',3,'2')
insert into clsatt values ('BUS100','2515197431','200203',4,'2')
insert into clsatt values ('BUS100','2515197431','200203',5,'1')
insert into clsatt values ('BUS100','2515197431','200203',6,'2')
insert into clsatt values ('BUS100','2515197431','200203',7,'2')
insert into clsatt values ('BUS100','2515197431','200203',8,'1')
insert into clsatt values ('BUS100','2515197431','200203',9,'2')
insert into clsatt values ('BUS100','2515197431','200203',10,'2')
insert into clsatt values ('BUS100','2515197431','200203',11,'1')
insert into clsatt values ('BUS100','2515197431','200203',12,'2')


Celko Answers
need to find students that have 4 consecutive absences. When a
student is absent 4 times in a row, they can be dropped from the class.

My class attendance file contains each attendance by date and whether
they were present or not. When the student has 4 consecutive value 1
(absent) for a given session and a given class the are considered to be
dropped.
If I needed to know the total number of absences, I know I could group
and summarize, but this one has the consecutive twist.
Table:


The tabel made no sense. NULL student ids of 20 characters in length?
Numeric attendance codes kept in CHAR(), you have no key? etc.


CREATE TABLE ClassAttendance
(class_name CHAR(15) NOT NULL,
student_id CHAR(20) NOT NULL,
session_id CHAR(10) NOT NULL,
meeting_nbr INTEGER NOT NULL,
attend_code INTEGER DEFAULT 1 NOT NULL
CHECK (attend_code IN (1,2,3)),
PRIMARY KEY (class_name, student_id, session_id, meeting_nbr));


Here is a shot using the new OLAP functions:


SELECT DISTINCT student_id, class_name, session_id
FROM (SELECT student_id, class_name, session_id,
SUM(attend_code)
OVER(PARTITION BY class_name, session_id
ORDER BY student_id, class_name, session_id
ROWS 4 PRECEDING)
FROM ClassAttendance
GROUP BY student_id, class_name, session_id)
AS X (student_id, class_name, session_id, last_four)
WHERE last_four = 4;

Anohter answer: Since there are only a few ranges, we can build an
auxiliary table and use it:

CREATE TABLE FourRanges
(start_session_nbr INTEGER NOT NULL,
end_session_nbr INTEGER NOT NULL,
CHECK (start_session_nbr < end_session_nbr));


INSERT INTO FourRanges VALUES (1, 4);
INSERT INTO FourRanges VALUES (2, 5);
INSERT INTO FourRanges VALUES (3, 6);
INSERT INTO FourRanges VALUES (4, 7);
..
INSERT INTO FourRanges VALUES (9, 12);


SELECT A1.course_name, A1.session_id, A1.student_id,
SUM(attend_code)
FROM ClassAttendance AS A1, FourRanges AS F
WHERE A1.session_nbr BETWEEN F.start_session_nbr
AND F.end_session_nbr
GROUP BY course_name, session_id, student_id
HAVING SUM(attend_code) = 4;