SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Wednesday, May 23, 2007

SQL Query Help

SQL Apprentice Question
Need to learn SQL queries again (SQL 2005). Need help with the
following situation:


I have two tables, CUSTOMERS and ORDERS. One each column in these
tables should have identical data (except the name of the column is
different) which is customer's name. Because each order creates an
entry in ORDER table, each customer's name may be appearing more than
once in that table. There could be also instances where a particular
customer might not have placed an order at all and will not have any
entry in the ORDERS table, but do have one entry in CUSTOMERS table
(no duplicates in CUSTOMERS table).


I need help with a query which shows a list of customers who have
never placed any orders. This essentially means that they do not have
any entry in the ORDERS table. Here is the Pseudo Code:


select customername from CUSTOMERS table
who are NOT IN
ORDERS table


Will appreciate any quick thoughts. Thanks in advance!



Celko Answers

>> Need to learn SQL queries again (SQL 2005). <<>> I have two tables, CUSTOMERS and ORDERS. One each column in these tables should have identical data (except the name of the column is different) which is customer's name. <<>> I need help with a query which shows a list of customers who have never placed any orders. <<


Wild guess, done without DDL

SELECT C.customer_name
FROM Customers AS C
WHERE NOT EXISTS
(SELECT *
FROM Orders AS O
WHERE O.customer_name = C.customer_name);

Totals

SQL Apprentice Question
I have a table that is populated everyday with daily totals, I'm
trying to teach myself SQL using this table.


Someone kindly gave me a query that gave the weekly totals based on
the table:
SELECT week_start, week_start + 6 AS week_end, SUM(Total3) AS
production
FROM (SELECT Date - DATEPART(weekday, Date + @@DATEFIRST -
1) + 1 AS week_start, Total3
FROM dbo.A_Totals) AS d
GROUP BY week_start
ORDER BY week_start


How can I adapt this to divide the weekly totals into years so I'd
have: week number, year, year-1,year-2 etc.


Thanks in advance


Celko Answers

>> I have a table that is populated everyday with daily totals, I'm trying to teach myself SQL using this table. <<


Bad idea; get a good book and a tutorial, then work on a problem.

Your approach to SQL is typical of someone coming in via procedural
code and looking for functions to solve it. SQL is a data retrieval
language that likes to use sets (tables). This is a WHOLE DIFFERENT
PROGRAMMING PARADIGM. After you have a few basics, then look up the
use of calendars -- temporal stuff is one of the hardest parts of
RDBMS, so it is a horrible starting point for a self-education. Try
something like this:


CREATE TABLE ReportPeriods
(period_name VARCHAR(20) NOT NULL PRIMARY KEY,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date));


Create and load a table of reporting periods of whatever range you
want. They can overlap, have gaps, etc. They can be crazy, one shot
things, etc. If you are a Don Martin fan:


INSERT INTO ReportPeriods
VALUES ('National Gorilla Suit Week', 2007-05-21', 2007-05-27');
etc.


You also have to learn to design data which should take about a year,
if you really work at it. So if your weeks are encoding the ISO-8601
year-week number format, then you can simply write:


SELECT period_name, SUM(some_col) AS week_total
FROM DailyReports AS D, ReportPeriods AS R
WHERE D.report_date BETWEEN start_date AND end_date
AND period_name LIKE 'WEEK-2007__;


Change the wild cards as needed.

Friday, May 11, 2007

Correct Way to Insert into Multiple Tables

SQL Apprentice Question
I am just wondering if what I am doing would be considered the correct
way to insert data into multiple tables when a forigen key is in place
between the tables primary keys.


Here is a simple DB structure
Table 1
ID int (auto incriment) Primary Key
FirstName varchar(100)
LastName varchar(200)


Table 2
ID (Primary Key)
Age int
BirthPlace varchar(100)


I have created the following sotred proceedure, it works from the test I
have done, but I am wondering if it is the *correct* way to do it.


PROCEDURE [dbo].[InsertPerson]
@ID int output,
@FirstName varchar(100),
@LastName varchar(200),
@Age int,
@BirthPlace varchar(100)
AS
Insert into dbo.UserInfo(FirstName, LastName)
Values(@FirstName, @LastName)
SET @ID = SCOPE_IDENTITY()


Insert into dbo.UserDetails(ID, Age, BirthPlace)
Values (@ID, @Age, @BirthPlace)



Celko Answers

>> I am just wondering if what I am doing would be considered the correct way to insert data into multiple tables when a forigen key is in place between the tables primary keys. <<


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

And what you have done is wrong on several points.


1) an auto-increment cannot ever be used as a key in an RDBMS. A key
is a subset of attributes in the data model. An auto-increment is
inside the hardware and has nothing to do with the data model,


2) Do you really have names that long? Well, you will if you allow
it,


3) Why did you split information of what-ever-the-heck you are
modeling across two tables?


4) "age" of what? Ands we never store age; we store a birthdate so we
can always correctly compute the age. Is this what you wanted?


CREATE TABLE Customers
(cust_id CHAR(9) NOT NULL PRIMARY KEY
CHECK (<< validation constraint>>),
last_name VARCHAR(20) NOT NULL,
first_name VARCHAR(20) NOT NULL,
birth_date DATETIME NOT NULL,
birth_location VARCHAR(20) NOT NULL);


The problem goes away with a valid design. I think what you might
have wanted to know is that INSERT INTO statements work on one and
only one base table. Put both inserts into a single transaction and
add a PRIMARY KEY and FOREIGN KEY constraints.

Last Day Of Previous Month...with a twist

SQL Apprentice Question
have a requirement to design a query that identifies items sold
between two dates. There is a 'SoldDate' datetime field used to
register what date the item was sold.


The query needs to identify all sales between the last day of the
previous month and going back one year.


What I would like to do is to design a query / stored procedure that
will dynamically create the criteria to allow the client to simply run
the query or stored proc.


I know how to establish the last day of the previous month part, I'm
just not sure of how best to design the remainder of the query.


Celko Answers
>> I know how to establish the last day of the previous month part, I'm just not sure of how best to design the remainder of the query. <<


Instead of using procedural coding, why not use a table of the
reporting periods for a decade or two? A simple BETWEEN predicate
will classify each sale quickly and give you extra control over non-
operating days, etc.

difficulty with SQL to get view

SQL Apprentice Question
am having difficulty in designing my SQL. :(


In this setup, an Invoice can have multiple Bills (installment
payments). I would like a query that returns invoices with overdue
bills


a bill is overdue if :: NOT B_Paid and Now() > B_DueDate --- how to
put this in the following '???'


SELECT B_Invoice, ??? As Overdue
FROM T_Bill
GROUP BY B_Invoice


CREATE TABLE T_Bill (
RefNo INTEGER IDENTITY PRIMARY KEY,
B_Invoice INTEGER NOT NULL REFERENCES
T_Invoice,
B_BillDate SMALLDATETIME DEFAULT CURRENT_TIMESTAMP
NOT NULL,
B_DueDate SMALLDATETIME DEFAULT (CURRENT_TIMESTAMP
+14) NOT NULL,
B_Paid BIT DEFAULT 0 NOT NULL,
B_PaidDate SMALLDATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
B_PaymentInfo VARCHAR(24) DEFAULT '???' NOT NULL
);



Celko Answers
Your table is not designed properly. Your assembly-style bit flag is
redundant when you have a payment date. IDENTITY cannot be a
relational key by definition and it is redundant given the invoice
number. You never put the table as a prefix on a data element names
because that would change the names from table to table. If the due
date is always 14 days after the billing date, then put compute it in
a VIEW or a query instead of wasting space. This also gives you more
control without having to alter the tables.

CREATE TABLE Billings
(invoice_nbr INTEGER NOT NULL
REFERENCES Invoices (invoice_nbr),
billing_date DATETIME NOT NULL,
PRIMARY KEY (invoice_nbr, billing_date),
paid_date DATETIME, -- null is unpaid
payment_note VARCHAR(24) DEFAULT '???' NOT NULL);



>> I would like a query that returns invoices with overdue bills <<


SELECT invoice_nbr, CURRENT_TIMESTAMP
FROM Billings
WHERE paid_date IS NULL
AND DATEADD (D, billing_date, 14) >= CURRENT_TIMESTAMP;

You need to get a book on basic data modeling. And one on RDBMS.
What you have is a badly designed punch card file system, with flag,
sequential numbering, etc. .

Tuesday, May 08, 2007

Simple update question

SQL Apprentice Question
Greetings,
I have a simple update query in Sql 2k and want to update only
one field instead of all 3 fields.

Update customer set
Name = @inName,
City = @inCity
State = Case when @instate is not null then @instate else
State end
Where Customer = @inCust


I would like to know what's the best way to prevent not to update all
3 field when updating only one. If I use the case statement then check
for @instate if not null then itself - would this prevent
from updating the third field. Is there a better way to do this
update. Thanks in advance.



Celko Answers

>> The question is by updating all fields [sic] isn't too much overhead for SQL Server 2k to deal with? <<


You missed one of the major points of SQL. The unit of work is the
entire row and not the columns within a row. This is one of the many,
many ways that newbies confuse rows and records, fields and columns.
This is why we have OLD and NEW (called INSERTED and DELETED in local
dialect) in the SQL update model. Records are updated a field at a
time in file systems.

Friday, May 04, 2007

Stored Procedures---Stacking IFs

SQL Apprentice Question
What is the proper syntax for stacking these IFs in a Stored Procedure???

Thanks in advance...


IF @Loan_Nbr IS NOT NULL
IF @MTM_Losses_Accrued IS NOT NULL
IF EXISTS ( SELECT 1 FROM [DMD_Data]..[UDF_WARRANTY] WARRANTY

Celko Answers
>> What is the proper syntax for stacking these IFs in a Stored Procedure <<


What the hell are you talking about?? IF-THEN-ELSE-ENDIF constructs in
any procedural programming language are nested or sequential. This
has nothing to why SQL. Don't you know how to program in ANY
language??!!

Please something that makes sense when you are sober

Naming Conventions

SQL Apprentice Question
I'm coming from a MS Access background and so I'm very used to and
comfortable with the hungarian (Leszynski et al) naming conventions.
However, I'm getting started into my first SQL Server Database and
really want to use the appropriate up to date standard naming
convention (ISO compliant).


I think I have the general idea from raking though countless
conflicting sites and posts, but I'm a bit stuck on what to do
regarding pk / fk naming


For example, in my MS Access world I would have two tables:


tblOrders
=======
strOrderID
intCustomerID
dtOrderDate
...


tblCustomers
==========
intCustomerID
strCustomerName
...


So what would the appropriate and most up-to-date and standard naming
be for SQL Server? My Guess:


Orders
=====
Ord_Order_ID_Pk
Ord_Customer_ID_Fk
Ord_Order_Date
...


Customers
========
Cus_Customer_ID_Pk
Cus_Customer_Name
...


How close (or far) am I from "Celko Proof" naming here?
All help gratefully accepted!


Celko Answers

>> I'm getting started into my first SQL Server Database and really want to use the appropriate up to date standard naming convention (ISO compliant). <<


Get a copy of SQL PROGRAMMING STYLE. Back in the early 1980's I
worked for
AIRMICS (Army Institute for Research in Management Information &
Computer Sciences) and researched code formatting. I based the book
on the ISO-11179 rules and a set of postfixes from Teradata, the other
data warehouse vendor and other meta data projects.


>> I think I have the general idea from raking though countless conflicting sites and posts, <<


Get the book -- I give the DoD, ISO, etc. Standards.


>> .. but I'm a bit stuck on what to do regarding pk / fk naming <<


Same as any other data element; the principle to name it for what it
is. Not for how is implemented. Not for how it is used in one place.

>> For example, in my MS Access world I would have two tables:


tblOrders
=======
strOrderID
intCustomerID
dtOrderDate
...
<<


In a valid data model this would be:


CREATE TABLE Orders -- "tbl-" is silly and redundant
(order_id INTEGER NOT NULL PRIMARY KEY,
customer_id INTEGER NOT NULL
REFERENCES Customers (customer_id),
order_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
..);


Your prefixes deal with data types (implementations) and not the
nature of the data element. They are also redundant in many cases:
SQL has only one data structure, the Table (derived, CTE, virtual,
base, but still a table) and an order_date does not need a prefix to
repeat the postfix. All those prefixes have done is screw the hell
out of your data dictionary.


CREATE TABLE Customers
( customer_id INTEGER NOT NULL PRIMARY KEY,
customer_name VARCHAR(35) NOT NULL,
..);



>> Ord_Order_ID_Pk


Ord_Customer_ID_Fk
Ord_Order_Date <<

NO! NO! NO! The " Ord_Customer_ID_Fk" is the same data element as
the "Cus_Customer_ID_Pk" and should have the same data element name in
the entire schema. Do you change your fingerprints when you walk from
room to room in your house? Same thing here. Different names would
mean they are logically different data elements and they are not. The
PK- FK- crap is how they are used locally and not what they are by
their nature. You qualify the location with "<table name>.<attribute
name>" when it is not clear.



>> How close (or far) am I from "Celko Proof" naming here? <<


You did get the idea that a table is a collective or plural noun
because it is a set of more than one entity (exception -- if it really
is just one entity, use a singular name, but you do not see many of
those).

And you did not do something really stupid like have a "type_id"
affix. How can an attribute be both an identifier and a type at the
same time? Hey, lets go all out for a "type_id_value_name" postfix!!


And you did not have a magical universal "id" on every table to mimic
a sequential file record number for a physical locater. Such people
are called "ID-iots" in my book :)


I'd give you a solid C+ , but not a C++ :)

Proper Case

SQL Apprentice Question
I have a ProperCase function that works well. However, I'm constantly
adding "exceptions" into the funciton. For example, I want IPA uppercase,
but not when it's in constIPAtion. This got me to thinking that it might be
easier to add all my cases into a table, rather than constantly editing my
function. But it's not working quite the way I'd expect it to. So rather
than reinvent the wheel, I thought I'd come here and see if anyone has taken
a table approach to ProperCase. If so, can you please share how you did
this?

Celko Answers

>> So rather than reinvent the wheel, I thought I'd come here and see if anyone has taken a table approach to ProperCase. <<


Get a copy of SQL PROGRAMMING STYLE for more than you probably want to
know.. Formatting code was one of the big issues of the day when I
was a researcher at AIRMICS (Army Institute for Research in Management
Information & Computer Sciences) in the early 1980's. A summary is:

1) For people who read languages in a Latin alphabet, the eye tends to
jump to the uppercase letters -- they start sentences and proper
nouns, so they mark "special things" in the language. This is one
reason even MS dropped "camelCase" -- it jerks the eye around,
mentally and physically.


2) Uppercase words are read as Boumas (single units recognized by
shape). That is why the keywords in a program should be uppercase;
you wan to read them as a structure in which user words are held
(compilers will not let you misspell them, unlike written English
where all uppercase is horrible)


3) lowercase letters are actually read (as opposed to eating them as a
shape) and misspelling can be quickly seen. Newspapers and books knew
this centuries ago. That is why you use them for column names,
variables and so forth. But Capitalize (or ProperCase) schema
objects, like table and view names because they are proper nouns.


The underscore is much better than either camelCase or ProperCase.
The eye can see the two parts of the the ISO-11179 names easily and
read the postfix to learn the nature of the attribute without being
jerked to the uppercase letters.


These tested rules are much easier to implement in a language with
string functions than using a big look-up table in SQL.


Good naming conventions and formatting can reduce maintenance time by
8-12% and since that is where the real cost is, it is worth the
effort.