SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Wednesday, September 20, 2006

What am I missing with this simple query?

SQL Apprentice Question
I am trying to add the results of this query to a new table using the
procedure below:

INSERT into db.TempCheckHistGN (EmpID,GrossPay,NetPay)
SELECT EmployeeID, SUM(GrossPay), as Expr1, Sum(NetPay) as Expr2 FROM
dbo.tblPACheckHist
WHERE (CheckDate Between '11/1/2005' AND '12/31/2010') AND (EmployeeID >=''
AND EmployeeID <='zzzzzz')
GROUP BY EmployeeID


I get a duplicate key error (EmpID is the key) but I shouldn't since the
records are being summed by EmployeeID


Celko Answers
Are the two data elements emp_id and employee_id actually the same data
element? If so, then use the same name for them. If not, check that
they converted properly on insertion. Let's clean up the code a bit
- ISO-8601 dates, consistent use of BETWEEN and removal of silly
prefixes, etc:

INSERT INTO db.TempCheckHistgn (emp_id, gross_pay, net_pay)
SELECT emp_id, SUM(gross_pay), SUM(net_pay)
FROM dbo.PayCheckHistory
WHERE check_date BETWEEN '2005-11-01' AND '2010-31-2010'
AND emp_id BETWEEN '' AND 'ZZZZZZ' -- is this redundant?
GROUP BY emp_id;



>> I get a duplicate key error (emp_id is the key) but I shouldn't since the records [sic] are being summed by emp_id. <<


Making a guess at the DDL that you did not post, I would assume that
the INSERT INTO has been run twice. I would also guess that emp_id
BETWEEN '' AND 'ZZZZZZ' covers all the personnel identifiers, but
there might ones with numerics and perhaps the all-blank one is
special. It looks weird, tho. That predicate looks more like a
CHECK() than a search condition.

You talked about "records"; you materialize and store computed
data; this is a symptom of still having a file system mindset. This
problem used to happen in file systems when the same transaction tape
was hung by the next shift.


In the relational world, we avoid temp tables of all kinds by using
virtual tables, such as a VIEW, which is guaranteed to be up-to-date
when it is invoked:


CREATE VIEW PaycheckHistorySummary (emp_id, gross_pay_tot, net_pay_tot)


AS
SELECT emp_id, SUM(gross_pay), SUM(net_pay)
FROM dbo.PaycheckHistory
WHERE check_date BETWEEN '2005-11-01' AND '2010-31-2010'
AND emp_id BETWEEN '' AND 'ZZZZZZ' -- is this redundant?
GROUP BY emp_id;

No comments: