SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, July 24, 2006

Having Clause Question

SQL Apprentice Question
Sorry if this a long winded question with a simple answer


i have the following script that builds 2 tables including data


Create Table tblProducts
(
intProdID int IDENTITY(1,1),
strName varchar(20) UNIQUE
)
go
insert into tblProducts (strName) values('A')
insert into tblProducts (strName) values('B')
insert into tblProducts (strName) values('C')
insert into tblProducts (strName) values('D')


go
Create UNIQUE CLUSTERED index ProdIndex on tblProducts(intProdID)
go
--Prices
drop table tblPrices
go
Create table tblPrices
(
intPriceID int IDENTITY(1,1),
intProdID int,
intPrice int,
dtInputDate dateTime
)
go
insert into tblPrices(intProdID,intPrice,dtInputDate) values (1,1.5,'21
july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(1,12.5,'20 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(1,13.5,'22 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(1,14.5,'23 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(2,15.5,'24 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(3,17.5,'25 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(2,81.5,'26 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(2,16.5,'27 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(3,18.5,'28 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(3,14.5,'29 july 2006')
insert into tblPrices(intProdID,intPrice,dtInputDate) values
(4,12.5,'30 july 2006')
go
create unique clustered index PriceIndex on tblPrices(intPriceID)
go
create nonclustered index PriceProdIndex on tblPrices(intProdID)


Im trying to get the following data
the Product name, Price for that product at the latest date. Trying to
do this without using a sub query


So in my Sybase days i would have done the following


select tblProducts.strName, tblPrices.intPrice, tblPrices.dtInputDate
from tblProducts join tblPrices on tblProducts.intProdID =
tblPrices.intProdID
group by
strName
having dtInputDate = max(dtInputDate)


which my mate confirm works on Sysbase 12.5 now on SQL server i do the
previous and i get the following error


Msg 8121, Level 16, State 1, Line 2
Column 'tblPrices.dtInputDate' is invalid in the HAVING clause because
it is not contained in either an aggregate function or the GROUP BY
clause.


Is there anyway of doing the above with out using a sub query?



Celko Answers
Thanks for the DDL. It shows us immediately that your problem is a
lack of a proper design. Get rid of the silly prefixes that violate
ISO-11179 and the rules of data modeling. Stop using IDENTITY as a
key; I am goignto assume that you have a UPC or SKU or something that
your inventory uses. Stop leaving everything NULL-able. Start using
DRI actions. Learn how to do temporal models in SQL (i.e. read
Snodgrass or the current working drafts at NCITS H2)


CREATE TABLE Products
(sku INTEGER NOT NULL PRIMARY KEY, -- need industry std
product_name VARCHAR(20) NOT NULL);


CREATE TABLE PriceList
(sku INTEGER NOT NULL,
REFERENCES Products (sku)
ON UPDATE CASCADE
ON DELETE CASCADE,
item_price DECIMAL (12,2) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME, - null means current
PRIMARY KEY (sku, start_date));



>> I'm trying to get the following data: the product name, price for that product at the latest date. <<


SELECT P.product_name, L.item_price, L.start_date
FROM Products AS P
LEFT OUTER JOIN
PriceList AS L
ON P.sku = L.sku
WHERE L.end_date IS NULL;

There is no problem at all, if you start with the right design.


The other problem you have is not knowing SQL, and speaking in Sybase
dialect. Here is how a SELECT works in SQL ... at least in theory.
Real products will optimize things, but the code has to produce the
same results.


a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The [table expression] AS [correlation name] option allows you
give a name to this working table which you then have to use for the
rest of the containing query.


b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.


c) Go to the optional GROUP BY clause, partiton the original table
into groups and reduce each grouping to a *single* row, replacing the
original working table with the new grouped table. The rows of a
grouped table must be only group characteristics: (1) a grouping column
(2) a statistic about the group (i.e. aggregate functions) (3) a
function or constant(4) an expression made up of only those three
items. The original table no longer exists and you cannot reference
anything in it (this was an error in early Sybase products).


d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.


e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can also give names to expressions in the SELECT list. These
new names come into existence all at once, but after the WHERE clause,
GROUP BY clause and HAVING clause have been executed; you cannot use
them in the SELECT list or the WHERE clause for that reason.


If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).


f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.


g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expression in it, or references to other columns because the
result set has been converted into a sequential file structure and that
is what is being sorted.


As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;


while these two statements return the same data:


SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;


Think about what a confused mess this statement is in the SQL model.


SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;


That is why such nonsense is illegal syntax.

No comments: