SQL Apprentice Question
I have a project that I have been working on and I need to insert a
record into multiple tables and I if any one of the inserts fails I
need to rollback all of the previous inserts that were done. To
illustrate, I have ten tables that need to have a record inserted into
them and if it errors out on table six, then I want to rollback the
previous five inserts. I'd appreciate any advice I can get. Thanks.
Celko Answers
>> I have a project that I have been working on and I need to insert a record [sic] into multiple tables and I if any one of the inserts fails Ineed to rollback all of the previous inserts that were done. <<
Easy enough; set up a series of INSERT INTO's in a single transaction,
trap each insertion's error and do a ROLLBACK and return if you have a
failure. Do not commit until the end of the whole thing.
The scope of transactions in T-SQL is independent of the block
structure of the language. Think of a "transaction guy" with a bucket
of data looking at a house. The house pumps data into his bucket. He
does not care what is happening inside; he is waiting to see a COMMIT
or ROLLBACK flag come out of the window of the house. At that point,
he either throws the data out or throws it in the database.
But a better question why do you want to store the same data in
multiple tables? The major reason we moved from files to RDBMS was to
get rid of redundancy -- the mantra is "one fact, one time, one way,
one place!" and not "Let's make ten copies and try to keep them all the
same!" Instead of making ten copies of a mag tape with the same data,
we use VIEWs, CTE, and derived tables in SQL.
You did know that a row is not a record in your posting or understand
transactions, makes me wonder if your schema is messed up because you
are mimicing files.
Thursday, June 29, 2006
Ambiguous Column Names in Multi-Table Join
SQL Apprentice Question
A (possibly dumb) question, but I've had no luck finding a definitive
answer to it. Suppose I have two tables, Employees and Employers, which
both have a column named "Id":
Employees
-Id
-FirstName
-LastName
-SSN
etc.
Employers
-Id
-Name
-Address
etc.
and now I perform the following join:
SELECT Employees.*, Employers.*
FROM Employees LEFT JOIN Employers ON (Employees.Id=Employers.Id)
The result-set will contain two "Id" columns, so SQL Server will
disambiguate them; one column will still be called "Id", while the
other will be called "Id1." My question is, how are you supposed to
know which "Id" column belongs to which table? My intuition tells me,
and limited testing seems to indicate, that it depends on the order in
which the table names show up in the query, so that in the above
example, "Id" would refer to Employees.Id, while "Id1" would refer to
Employers.Id. Is this order guaranteed?
Also, why does SQL Server use such a IMO brain-damaged technique to
handle column name conflicts? In MS Access, it's much more
straightforward; after executing the above query, you can use
"Employees.Id" and "Employers.Id" (and more generally,
"TableNameOrTableAlias.ColumnName") to refer to the specific "Id"
column you want, instead of "Id" and "Id1" -- the
"just-tack-on-a-number" strategy is slightly annoying when dealing with
complex queries.
Celko Answers
>> A (possibly dumb) question, but I've had no luck finding a definitive answer to it. Suppose I have two tables, Employees and Employers, which both have a column named "Id" <<
Well, first of all, kill the stupid bastard that used "id" as a column
name, since he never read ISO-11179 or any book on BASIC data modeling.
This is not a data element name; it is too vague (identifier of
what??) and it appears EVERYWHERE, so it is a meaningless exposed
physical locator.
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. If you were polite and had a valid schema, would it look like
this?
CREATE TABLE Personnel -- note the use of a collective name for a set
(ssn CHAR(9) NOT NULL PRIMARY KEY, -- legal requirement!
last_name VARCHAR(20) NOT NULL,
first_name VARCHAR(20) NOT NULL,
.. );
CREATE TABLE Employers
(duns_nbr CHAR(9) NOT NULL PRIMARY KEY, -- industry standards!!
employer_name VARCHAR(20) NOT NULL,
..);
>> and now I perform the following join: <<
I hope not! it makes no sense. What is the relationship between two
values in totally different domains?? Gee, we need a table for that
..
>> The result-set will contain two "Id" columns, so SQL Server will disambiguate them; one column will still be called "Id", while the other will be called "Id1." My question is, how are you supposed to know which "Id" column belongs to which table? <<
By having a proper data model in which different data elements have
different names. What you have here is a "Vague, Magical, Universal
one-size-fits-all Kabalah Number" on tables, when you need a third
table called "Employment" with the employees and employers identifiers
in its columns. Basically the engine is tryitn to do the best it can
with your crappy design.
>> My intuition tells me, and limited testing seems to indicate, that it depends on the order in which the table names show up in the query, so that in the above example, "Id" would refer to Employees.Id, while "Id1" would refer to Employers.Id. Is this order guaranteed? <<
This is one of MANY reasons good programmers do not do this kind of
crappy design. The vendor is free to do anything they wish with the
display of such data. Nobody agrees. Nobody does it the same in
different releases. The best you can do is alias one of the columns.
A (possibly dumb) question, but I've had no luck finding a definitive
answer to it. Suppose I have two tables, Employees and Employers, which
both have a column named "Id":
Employees
-Id
-FirstName
-LastName
-SSN
etc.
Employers
-Id
-Name
-Address
etc.
and now I perform the following join:
SELECT Employees.*, Employers.*
FROM Employees LEFT JOIN Employers ON (Employees.Id=Employers.Id)
The result-set will contain two "Id" columns, so SQL Server will
disambiguate them; one column will still be called "Id", while the
other will be called "Id1." My question is, how are you supposed to
know which "Id" column belongs to which table? My intuition tells me,
and limited testing seems to indicate, that it depends on the order in
which the table names show up in the query, so that in the above
example, "Id" would refer to Employees.Id, while "Id1" would refer to
Employers.Id. Is this order guaranteed?
Also, why does SQL Server use such a IMO brain-damaged technique to
handle column name conflicts? In MS Access, it's much more
straightforward; after executing the above query, you can use
"Employees.Id" and "Employers.Id" (and more generally,
"TableNameOrTableAlias.ColumnName") to refer to the specific "Id"
column you want, instead of "Id" and "Id1" -- the
"just-tack-on-a-number" strategy is slightly annoying when dealing with
complex queries.
Celko Answers
>> A (possibly dumb) question, but I've had no luck finding a definitive answer to it. Suppose I have two tables, Employees and Employers, which both have a column named "Id" <<
Well, first of all, kill the stupid bastard that used "id" as a column
name, since he never read ISO-11179 or any book on BASIC data modeling.
This is not a data element name; it is too vague (identifier of
what??) and it appears EVERYWHERE, so it is a meaningless exposed
physical locator.
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it. If you were polite and had a valid schema, would it look like
this?
CREATE TABLE Personnel -- note the use of a collective name for a set
(ssn CHAR(9) NOT NULL PRIMARY KEY, -- legal requirement!
last_name VARCHAR(20) NOT NULL,
first_name VARCHAR(20) NOT NULL,
.. );
CREATE TABLE Employers
(duns_nbr CHAR(9) NOT NULL PRIMARY KEY, -- industry standards!!
employer_name VARCHAR(20) NOT NULL,
..);
>> and now I perform the following join: <<
I hope not! it makes no sense. What is the relationship between two
values in totally different domains?? Gee, we need a table for that
..
>> The result-set will contain two "Id" columns, so SQL Server will disambiguate them; one column will still be called "Id", while the other will be called "Id1." My question is, how are you supposed to know which "Id" column belongs to which table? <<
By having a proper data model in which different data elements have
different names. What you have here is a "Vague, Magical, Universal
one-size-fits-all Kabalah Number" on tables, when you need a third
table called "Employment" with the employees and employers identifiers
in its columns. Basically the engine is tryitn to do the best it can
with your crappy design.
>> My intuition tells me, and limited testing seems to indicate, that it depends on the order in which the table names show up in the query, so that in the above example, "Id" would refer to Employees.Id, while "Id1" would refer to Employers.Id. Is this order guaranteed? <<
This is one of MANY reasons good programmers do not do this kind of
crappy design. The vendor is free to do anything they wish with the
display of such data. Nobody agrees. Nobody does it the same in
different releases. The best you can do is alias one of the columns.
SELECT...CASE in stored procedures
SQL Apprentice Question
I m writing some stored procedures.
Pls tell me, how can we use IF..ELSE, nested IF..ELSE in stored
procedure. means i want to pass a int parameter to stored procedure &
according to this value, we will execute my queries.
eg if i=1 then insert, if i=2 then update or if i=3 then update. Also i
can choose different SELECT statement according to i.
Now how can i use the SELECT....CASE in stored procedure. means same
above thing i want to do with SELECT....CASE. For example, for case 1,
i want to write insert statement, for case 2 update statement & for
case 3 delete statement.
pls give details
Celko Answers
>> if i=1 then insert, if i=2 then update or if i=3 then update. Also I can choose different SELECT statement according to i. <<
Have you ever had a course in basic software engineering? Look up
coupling and cohesion. THIS IS BASIC NOBODY SHOULD EVER POST THIS KIND
OF THING!! It screams that you have no business programming! What are
they teaching in colleges these days?
Cohesion:
This is how well a module does one and only one thing; that it is
logically coherent. The modules should have strong cohesion. You
ought to name the module in the format "[verb][object]", where the
"[object]" is a specific logical unit in the data model. There are
several types of cohesion. We rank them going from the worst form of
cohesion to the best
1) Coincidental
2) Logical
3) Temporal
4) Procedural
5) Communicational
6) Informational
7) Functional
Coupling:
If modules have to be used in a certain order, then they are strongly
coupled. If they can be executed independently of each other and put
together like Lego blocks, then they are loosely or weakly coupled.
There are several kinds of coupling, which are ranked from worse to
best:
1) Content
2) Common
3) Control
4) Stamp
5) Data
This is covered briefly in a chapter on writing stored procedures in my
book on SQL PROGRAMMING STYLE. In the meantime, you can read DeMarco,
Yourdon, Constantine, Myers or several other of the pioneers.
This is FAR more basic than SQL programming. This is what you are
supposed to know before you write any code in any language. Please go
back to school and get some foundations.
I m writing some stored procedures.
Pls tell me, how can we use IF..ELSE, nested IF..ELSE in stored
procedure. means i want to pass a int parameter to stored procedure &
according to this value, we will execute my queries.
eg if i=1 then insert, if i=2 then update or if i=3 then update. Also i
can choose different SELECT statement according to i.
Now how can i use the SELECT....CASE in stored procedure. means same
above thing i want to do with SELECT....CASE. For example, for case 1,
i want to write insert statement, for case 2 update statement & for
case 3 delete statement.
pls give details
Celko Answers
>> if i=1 then insert, if i=2 then update or if i=3 then update. Also I can choose different SELECT statement according to i. <<
Have you ever had a course in basic software engineering? Look up
coupling and cohesion. THIS IS BASIC NOBODY SHOULD EVER POST THIS KIND
OF THING!! It screams that you have no business programming! What are
they teaching in colleges these days?
Cohesion:
This is how well a module does one and only one thing; that it is
logically coherent. The modules should have strong cohesion. You
ought to name the module in the format "[verb][object]", where the
"[object]" is a specific logical unit in the data model. There are
several types of cohesion. We rank them going from the worst form of
cohesion to the best
1) Coincidental
2) Logical
3) Temporal
4) Procedural
5) Communicational
6) Informational
7) Functional
Coupling:
If modules have to be used in a certain order, then they are strongly
coupled. If they can be executed independently of each other and put
together like Lego blocks, then they are loosely or weakly coupled.
There are several kinds of coupling, which are ranked from worse to
best:
1) Content
2) Common
3) Control
4) Stamp
5) Data
This is covered briefly in a chapter on writing stored procedures in my
book on SQL PROGRAMMING STYLE. In the meantime, you can read DeMarco,
Yourdon, Constantine, Myers or several other of the pioneers.
This is FAR more basic than SQL programming. This is what you are
supposed to know before you write any code in any language. Please go
back to school and get some foundations.
Sunday, June 25, 2006
Distinct and Unique
SQL Apprentice Question
Distinct(in oracle and SQL server) and Unique(in oracle) are they both same
Can we use them interchangeably?
Celko Answers
>> Distinct(in oracle and SQL server) and Unique(in oracle) are they both same Can we use them interchangeably? <<
No. That is one of MANY Oracle dialect screw-ups to prevent Standard,
portable code. The UNIQUE() constraint is part of the DDL; the
DISTINCT option is part of the DML.
Distinct(in oracle and SQL server) and Unique(in oracle) are they both same
Can we use them interchangeably?
Celko Answers
>> Distinct(in oracle and SQL server) and Unique(in oracle) are they both same Can we use them interchangeably? <<
No. That is one of MANY Oracle dialect screw-ups to prevent Standard,
portable code. The UNIQUE() constraint is part of the DDL; the
DISTINCT option is part of the DML.
Quick Method to delete from Two Tables
SQL Apprentice Question
I have two tables ( a & b ) Both are linked by a ledgerref field. table
what would be the quickest and easiest way to delete records from both when
a.textStatus = 1
Celko Answers
Getting it out of dialect, and correcting the "textStatus" data
element name (test and status are both suffixes to an attribute in
ISO-11179). I will not comment on the practice of using flags in SQL
to mimic an assembly language programming, or redundant tables to mimic
scratch tapes.
DELETE FROM Beta
WHERE EXISTS
(SELECT *
FROM Alpha
WHERE Beta.ledger_ref = Alpha.ledger_ref
AND Alpha.foobar_status = 1);
DRI action would be better. The best solution would be a proper
relational design.
I have two tables ( a & b ) Both are linked by a ledgerref field. table
what would be the quickest and easiest way to delete records from both when
a.textStatus = 1
Celko Answers
Getting it out of dialect, and correcting the "textStatus" data
element name (test and status are both suffixes to an attribute in
ISO-11179). I will not comment on the practice of using flags in SQL
to mimic an assembly language programming, or redundant tables to mimic
scratch tapes.
DELETE FROM Beta
WHERE EXISTS
(SELECT *
FROM Alpha
WHERE Beta.ledger_ref = Alpha.ledger_ref
AND Alpha.foobar_status = 1);
DRI action would be better. The best solution would be a proper
relational design.
trigger/(exists) problem
SQL Apprentice Question
I created an insert trigger on a table that uses two columns as a key.
The trigger checks for an existing record with the same values for
those two columns and throws a user friendly error message. The
problem comes when I determine if a record with those values already
exist (e.g. -
if exists (Select * from a, inserted b where a. =
b. and a. = b.)
begin
'some code here
end
)
I truncated and the 'some code here still executes. I even
tried "select * from" after a truncate and the 'some code
here still executes..
Is there something I'm not doing correctly in my ALTER TRIGGER
statement? I've also noticed there is some type of unexplained delay
in that I can drop the trigger and inserts still throw the old error
message in query analyzer. After I can recreate the trigger and the
standard primary key violation error will be thrown.
oh..btw I'm using SQL2K
Celko Answers
>> I created an insert trigger on a table that uses two columns as a key. The trigger checks for an existing record [sic] with the same values for those two columns and throws a user friendly error message. <<
Why did you think about procedural trigger code instead of a
declarative constraint? Let the front end deal with the user input and
the back end deal with the data. This is the whole idea of a tiered
architecture. For example, all your TRIM() operations ought to be
inside a CHECK() constraint to prevent bad data in the table, not a
after-thought in a trigger or block of procedural code.
I created an insert trigger on a table that uses two columns as a key.
The trigger checks for an existing record with the same values for
those two columns and throws a user friendly error message. The
problem comes when I determine if a record with those values already
exist (e.g. -
if exists (Select * from
b.
begin
'some code here
end
)
I truncated
tried "select * from
here still executes..
Is there something I'm not doing correctly in my ALTER TRIGGER
statement? I've also noticed there is some type of unexplained delay
in that I can drop the trigger and inserts still throw the old error
message in query analyzer. After I can recreate the trigger and the
standard primary key violation error will be thrown.
oh..btw I'm using SQL2K
Celko Answers
>> I created an insert trigger on a table that uses two columns as a key. The trigger checks for an existing record [sic] with the same values for those two columns and throws a user friendly error message. <<
Why did you think about procedural trigger code instead of a
declarative constraint? Let the front end deal with the user input and
the back end deal with the data. This is the whole idea of a tiered
architecture. For example, all your TRIM() operations ought to be
inside a CHECK() constraint to prevent bad data in the table, not a
after-thought in a trigger or block of procedural code.
many to many query with 1 row per result?
SQL Apprentice Question
I have three tables
Student
-ID
-FirstName
Test
-ID
-Date
StudentTests
- TestID
- ChildID
- Grade
What I want to do is have a query that returns a student's information,
with a list of tests they've had. So, one row per student would be
ideal. I thought about changing table layout to have a fixed number of
tests, but I want to be able to change the number of tests pretty much
dynamically. Then I thought I could change table layout dynamically
(adding / removing columns as needed, and using dynamic SQL) but then I
thought that ... might not be the best idea :)
Right now I have this query
SELECT S.ID, S.FirstName, S.LastName, T.ID, T.Date, ST.Grade
FROM dbo.Student S
LEFT JOIN dbo.StudentTests ST ON ST.StudentID = S.ID
LEFT JOIN dbo.Test T ON T.ID = ST.TestID
ORDER BY S.LastName
Let's say there are 5 tests, I get 5 rows, I'm not super excited by
that, but I'm also not coming up with a way to change it.
So is there a way to do this?
Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
You might also want to learn ISO-11179 data element naming rules so
that when you do write DDL, it will be usable. Only one student,
magical "id" in the tables, reserved words, compound table names (do
you really say "mid-term exams" or "mid-term studenttest"?), etc. This
is sloppy even for a personal pseudo-code. Is this what you meant?
CREATE TABLE Students
(student_id INTEGER NOT NULL PRIMARY KEY,
..);
CREATE TABLE TestForms
(test_id INTEGER NOT NULL PRIMARY KEY,
test_date DATETIME NOT NULL,
..);
CREATE TABLE Exams
(student_id INTEGER NOT NULL
REFERENCES Students(student_id)
ON UPDATE CASCADE,
test_id INTEGER NOT NULL
REFERENCES TestForms(test_id)
ON UPDATE CASCADE,
test_grade CHAR(1) DEFAULT 'I' NOT NULL
CHECK ( test_grade IN ('I', 'W', 'A', 'B', 'C', 'D', 'F'),
PRIMARY KEY (student_id, test_id));
>> What I want to do is have a query that returns a student's information, with a list of tests they've had. So, one row per student would be ideal. <<
Did you know that you are supposed to do reports in application and not
the database in a tiered architecture?
>> I thought about changing table layout to have a fixed number of tests, but I want to be able to change the number of tests pretty much dynamically. <<
Did you know that a table has a fixed number of columns by definition?
You are describing a report, which should be done in the front end.
>> Then I thought I could change table layout dynamically (adding / removing columns as needed, and using dynamic SQL) but then I thought that ... might not be the best idea :) <<
Finally, you got something right! Dynamic SQL is a way of saying that
you have no idea what to do, so you will let someone else decide at run
time.
>> Let's say there are 5 tests, I get 5 rows, I'm not super excited by that, but I'm also not coming up with a way to change it. <<
If the tests are attributes of an exam schedule, then each one gets a
column. just like height, weight and eye color would in a table that
models a person. If the tests are separate entities related to a
student, then each one gets a row in a gradebook or exams table.
This report is called a cross tabs and has been for the last 250+
years. So of course Microsoft calls it a PIVOT to be different. Her
is a quick way to write it in portable, standard SQL:
SELECT S1.student_id,
MAX(CASE WHEN T1.test_id = 1 THEN T1.test_grade ELSE '' END)
AS exam_1,
MAX(CASE WHEN T1.test_id = 2 THEN T1.test_grade ELSE '' END)
AS exam_2,
MAX(CASE WHEN T1.test_id = 3 THEN T1.test_grade ELSE '' END)
AS exam_3,
MAX(CASE WHEN T1.test_id = 4 THEN T1.test_grade ELSE '' END)
AS exam_4,
MAX(CASE WHEN T1.test_id = 5 THEN T1.test_grade ELSE '' END)
AS exam_5
FROM Students AS S1, Exams AS T1
WHERE S1.student_id = T1.student_id
GROUP BY S1.student_id;
I have three tables
Student
-ID
-FirstName
Test
-ID
-Date
StudentTests
- TestID
- ChildID
- Grade
What I want to do is have a query that returns a student's information,
with a list of tests they've had. So, one row per student would be
ideal. I thought about changing table layout to have a fixed number of
tests, but I want to be able to change the number of tests pretty much
dynamically. Then I thought I could change table layout dynamically
(adding / removing columns as needed, and using dynamic SQL) but then I
thought that ... might not be the best idea :)
Right now I have this query
SELECT S.ID, S.FirstName, S.LastName, T.ID, T.Date, ST.Grade
FROM dbo.Student S
LEFT JOIN dbo.StudentTests ST ON ST.StudentID = S.ID
LEFT JOIN dbo.Test T ON T.ID = ST.TestID
ORDER BY S.LastName
Let's say there are 5 tests, I get 5 rows, I'm not super excited by
that, but I'm also not coming up with a way to change it.
So is there a way to do this?
Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
You might also want to learn ISO-11179 data element naming rules so
that when you do write DDL, it will be usable. Only one student,
magical "id" in the tables, reserved words, compound table names (do
you really say "mid-term exams" or "mid-term studenttest"?), etc. This
is sloppy even for a personal pseudo-code. Is this what you meant?
CREATE TABLE Students
(student_id INTEGER NOT NULL PRIMARY KEY,
..);
CREATE TABLE TestForms
(test_id INTEGER NOT NULL PRIMARY KEY,
test_date DATETIME NOT NULL,
..);
CREATE TABLE Exams
(student_id INTEGER NOT NULL
REFERENCES Students(student_id)
ON UPDATE CASCADE,
test_id INTEGER NOT NULL
REFERENCES TestForms(test_id)
ON UPDATE CASCADE,
test_grade CHAR(1) DEFAULT 'I' NOT NULL
CHECK ( test_grade IN ('I', 'W', 'A', 'B', 'C', 'D', 'F'),
PRIMARY KEY (student_id, test_id));
>> What I want to do is have a query that returns a student's information, with a list of tests they've had. So, one row per student would be ideal. <<
Did you know that you are supposed to do reports in application and not
the database in a tiered architecture?
>> I thought about changing table layout to have a fixed number of tests, but I want to be able to change the number of tests pretty much dynamically. <<
Did you know that a table has a fixed number of columns by definition?
You are describing a report, which should be done in the front end.
>> Then I thought I could change table layout dynamically (adding / removing columns as needed, and using dynamic SQL) but then I thought that ... might not be the best idea :) <<
Finally, you got something right! Dynamic SQL is a way of saying that
you have no idea what to do, so you will let someone else decide at run
time.
>> Let's say there are 5 tests, I get 5 rows, I'm not super excited by that, but I'm also not coming up with a way to change it. <<
If the tests are attributes of an exam schedule, then each one gets a
column. just like height, weight and eye color would in a table that
models a person. If the tests are separate entities related to a
student, then each one gets a row in a gradebook or exams table.
This report is called a cross tabs and has been for the last 250+
years. So of course Microsoft calls it a PIVOT to be different. Her
is a quick way to write it in portable, standard SQL:
SELECT S1.student_id,
MAX(CASE WHEN T1.test_id = 1 THEN T1.test_grade ELSE '' END)
AS exam_1,
MAX(CASE WHEN T1.test_id = 2 THEN T1.test_grade ELSE '' END)
AS exam_2,
MAX(CASE WHEN T1.test_id = 3 THEN T1.test_grade ELSE '' END)
AS exam_3,
MAX(CASE WHEN T1.test_id = 4 THEN T1.test_grade ELSE '' END)
AS exam_4,
MAX(CASE WHEN T1.test_id = 5 THEN T1.test_grade ELSE '' END)
AS exam_5
FROM Students AS S1, Exams AS T1
WHERE S1.student_id = T1.student_id
GROUP BY S1.student_id;
Traversing a tree hierachy backwards
SQL Apprentice Question
need to be able to traverse or find the all relevant parent nodes of a
tree given a particular node. Im using the adjacency model with stored path.
So if I know a certain employees number, I need to list all their superiors
back to the top of the tree.
I have a table structure similar to the following
CREATE TABLE Tree (
Node int NOT NULL IDENTITY(100, 1),
ParentNode int,
EmployeeID int NOT NULL,
Depth tinyint,
Lineage varchar(255) )
The linieage is represented by a the path to the node seperated by a '/'
Node ParentNode EmployeeID Depth Lineage
100 NULL 1001 0 /
101 100 1002 1 /100/
102 101 1003 2 /100/101/
103 102 1004 3 /100/101/102/
So if I have selected employeeid=1004, I want a quick and easy selected to
show all the superiors.
Celko Answers
>> I'm using the adjacency model with stored path. TREES & HIERARCHIES IN SQL? you can get some ideas from it.
need to be able to traverse or find the all relevant parent nodes of a
tree given a particular node. Im using the adjacency model with stored path.
So if I know a certain employees number, I need to list all their superiors
back to the top of the tree.
I have a table structure similar to the following
CREATE TABLE Tree (
Node int NOT NULL IDENTITY(100, 1),
ParentNode int,
EmployeeID int NOT NULL,
Depth tinyint,
Lineage varchar(255) )
The linieage is represented by a the path to the node seperated by a '/'
Node ParentNode EmployeeID Depth Lineage
100 NULL 1001 0 /
101 100 1002 1 /100/
102 101 1003 2 /100/101/
103 102 1004 3 /100/101/102/
So if I have selected employeeid=1004, I want a quick and easy selected to
show all the superiors.
Celko Answers
>> I'm using the adjacency model with stored path. TREES & HIERARCHIES IN SQL? you can get some ideas from it.
A tricky query
SQL Apprentice Question
I have to write a query to generate a report over some interesting
data. It's basically scheduling which days people are working. The
data looks like this:
Employee StartDate EndDate Roster
---------------- ---------------- ---------------- ------------
Bob 12-Jun-06 24-Jun-06 _*___**
Mary 12-Jun-06 24-Jun-06 *_*__*_
The trick is, the roster field contains a string with a _ or *
depending on wether the person is scheduled to work that day or not,
but the first character always starts on the sunday. The startdate and
enddate can be any day of the week.
In the example above, the 12-jun is a monday, so monday corresponds to
the second character in the roster string, so Bob's working and Mary's
not. The roster string wraps around, so the first character of the
roster string actually corresponds with the enddate here! Now, this
roster string could be 7, 10, 14 days long. The startDate -> endDate
could be the length of the roster string or less (only show a subset of
the roster data).
So! I need to write a query to feed a report to format this into
something like:
Monday 12-Jun Tuesday 13-Jun Wednesday 14-Jun Thursday 15-Jun
Friday 16-Jun
----------------------- -----------------------
----------------------------- -------------------------
---------------------
Bob Mary
Bob
Mary
I could get the report out if I can write a query to get it to this:
Employee DateWorking
---------------- -------------------
Bob 12-Jun
Bob 16-Jun
Mary 13-Jun
Mary 16-Jun
Any ideas?
Celko Answers
>> I have to write a query to generate a report over some interesting data. <<
First of all, you are not using ISO-8601 format dates. You might want
to do that, since what you did post was ambigous as well as
non-standard, I hope you do not think that ORACLE is a standard.
Next, you might want to read a book on programming principles. We do
not do reports in the database in a tiered architecture. This is more
fundamental than SQL.
>> It's basically scheduling which days people are working. The data looks like this: <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. I
would love to see the LOGICAL definition of that silly bar chart you
labeled "roster" in your narrative since it is pure display.
CREATE TABLE Roster
(employee_name VARCHAR(20) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date),
PRIMARY KEY (employee_name, start_date));
>> I could get the report out if I can write a query to get it to this: ..<<
This is usually done with a Calendar table:
SELECT R.employee_name, C.cal_date
FROM Calendar AS C
LEFT OUTER JOIN
Roster AS R
ON C.cal_date BETWEEN R.start_date AND R.end_date
WHERE C.cal_date BETWEEN @my_start_date AND @my_end_date;
Google for other uses of the Calendar and auxiliary tables.
I have to write a query to generate a report over some interesting
data. It's basically scheduling which days people are working. The
data looks like this:
Employee StartDate EndDate Roster
---------------- ---------------- ---------------- ------------
Bob 12-Jun-06 24-Jun-06 _*___**
Mary 12-Jun-06 24-Jun-06 *_*__*_
The trick is, the roster field contains a string with a _ or *
depending on wether the person is scheduled to work that day or not,
but the first character always starts on the sunday. The startdate and
enddate can be any day of the week.
In the example above, the 12-jun is a monday, so monday corresponds to
the second character in the roster string, so Bob's working and Mary's
not. The roster string wraps around, so the first character of the
roster string actually corresponds with the enddate here! Now, this
roster string could be 7, 10, 14 days long. The startDate -> endDate
could be the length of the roster string or less (only show a subset of
the roster data).
So! I need to write a query to feed a report to format this into
something like:
Monday 12-Jun Tuesday 13-Jun Wednesday 14-Jun Thursday 15-Jun
Friday 16-Jun
----------------------- -----------------------
----------------------------- -------------------------
---------------------
Bob Mary
Bob
Mary
I could get the report out if I can write a query to get it to this:
Employee DateWorking
---------------- -------------------
Bob 12-Jun
Bob 16-Jun
Mary 13-Jun
Mary 16-Jun
Any ideas?
Celko Answers
>> I have to write a query to generate a report over some interesting data. <<
First of all, you are not using ISO-8601 format dates. You might want
to do that, since what you did post was ambigous as well as
non-standard, I hope you do not think that ORACLE is a standard.
Next, you might want to read a book on programming principles. We do
not do reports in the database in a tiered architecture. This is more
fundamental than SQL.
>> It's basically scheduling which days people are working. The data looks like this: <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. I
would love to see the LOGICAL definition of that silly bar chart you
labeled "roster" in your narrative since it is pure display.
CREATE TABLE Roster
(employee_name VARCHAR(20) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
CHECK (start_date < end_date),
PRIMARY KEY (employee_name, start_date));
>> I could get the report out if I can write a query to get it to this: ..<<
This is usually done with a Calendar table:
SELECT R.employee_name, C.cal_date
FROM Calendar AS C
LEFT OUTER JOIN
Roster AS R
ON C.cal_date BETWEEN R.start_date AND R.end_date
WHERE C.cal_date BETWEEN @my_start_date AND @my_end_date;
Google for other uses of the Calendar and auxiliary tables.
Don't know where to start
SQL Apprentice Question
I'm still a novice in Integration Services, and i need to calculate the
time that our tickets are in a Pending Status. The problem that i'm
having is that the Pending Start time, and the Pending End time, are
not in the same record. The table that need to query is like the
following
Ticket__ DateTime Summary
cs00001 1/1/2006 13:00 Ticket entered Pending status
cs00002 1/1/2006 13:03 Some other ticket activity
cs00001 1/1/2006 13:20 Some other activity
cs00001 1/1/2006 13:30 Ticket exited Pending status
I think i need to do this as a SSIS job as we already have one that
imports all the data from our oracle database to SQL (for reporting),
and i would like to run this as a 2nd step in that existing job - but i
just have no idea where to start.
Any help would be appreciated.
Celko Answers
>> I need to calculate the time that our tickets are in a Pending Status. The problem that I'm having is that the Pending Start time, and the Pending End time, are not in the same record [sic] <<
Your error is bedrock, fundamental and deadly. Rows are not records.
Nothing alike. NOTHING. NADA!! A record is a unit of contigous
physical storage in a file system. A row (no matter how it is
physically implemented; which does not have to be contigous storage) is
a single **fact** about a member of a set.
What is the fact in your example; the whole fact and nothing but the
fact?? The fact is that "ticket # XXXX was pending from start time
tttt until finish time ffff" with the constraints that (start time <
finish time), etc. from the DDL (aka "Universe of Discourse").
You mimicked a paper sign-in-sign-out sheet and not a whole fact. Let
(finish-time IS NULL) mean that the process is on-going and not yet
complete.
>> The table that need to query is like the following <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
if you had proper netiquette, you might have posted (with the ISO-8601
temporal formats, please!!)
CREATE TABLE TicketHistory
(ticket_nbr CHAR (7) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME, -- null is current
status_code CHAR(3) DEFAULT 'new' NOT NULL
CHECK (status_code IN (..) ),
PRIMARY KEY ()ticket_nbr, start_date));
I'm still a novice in Integration Services, and i need to calculate the
time that our tickets are in a Pending Status. The problem that i'm
having is that the Pending Start time, and the Pending End time, are
not in the same record. The table that need to query is like the
following
Ticket__ DateTime Summary
cs00001 1/1/2006 13:00 Ticket entered Pending status
cs00002 1/1/2006 13:03 Some other ticket activity
cs00001 1/1/2006 13:20 Some other activity
cs00001 1/1/2006 13:30 Ticket exited Pending status
I think i need to do this as a SSIS job as we already have one that
imports all the data from our oracle database to SQL (for reporting),
and i would like to run this as a 2nd step in that existing job - but i
just have no idea where to start.
Any help would be appreciated.
Celko Answers
>> I need to calculate the time that our tickets are in a Pending Status. The problem that I'm having is that the Pending Start time, and the Pending End time, are not in the same record [sic] <<
Your error is bedrock, fundamental and deadly. Rows are not records.
Nothing alike. NOTHING. NADA!! A record is a unit of contigous
physical storage in a file system. A row (no matter how it is
physically implemented; which does not have to be contigous storage) is
a single **fact** about a member of a set.
What is the fact in your example; the whole fact and nothing but the
fact?? The fact is that "ticket # XXXX was pending from start time
tttt until finish time ffff" with the constraints that (start time <
finish time), etc. from the DDL (aka "Universe of Discourse").
You mimicked a paper sign-in-sign-out sheet and not a whole fact. Let
(finish-time IS NULL) mean that the process is on-going and not yet
complete.
>> The table that need to query is like the following <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
if you had proper netiquette, you might have posted (with the ISO-8601
temporal formats, please!!)
CREATE TABLE TicketHistory
(ticket_nbr CHAR (7) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME, -- null is current
status_code CHAR(3) DEFAULT 'new' NOT NULL
CHECK (status_code IN (..) ),
PRIMARY KEY ()ticket_nbr, start_date));
Storing SQL in a table
SQL Apprentice Question
I have a project where I'm trying to "patch" something that probably
needs a rewrite. It currently has some code set in a dll that gets
called in order to give the app calling it the appropriate code to pull
info from a database.
Rewriting it is not an option at this point.
I had the idea of putting code into a SQL table, which would be
associated with a username, that I would then read & put into the sql
that is called within the dotnet application.
Currently, I have to make a change to the dll in order to make a change
at all to the app, which is something I would like to get around having
to do.
I already know what I need to put into the table in order to 'store" the
sql code in it, but I can't get it to take over a few hundred
characters, no matter what type of field I change it to.
Is there a way to store sql into a field in a table? If so, how? I
know the code is right, it has the ' ' where the single quotes need to
be, but it still gets truncated.
Is there a better way to do this, that (a) won't necessitate a program
change each time the data potentially changes? and (b) will make it able
to be called in a manner that would give me the entire piece of code needed?
Celko Answers
Blasting Cap wrote:
> I have a project where I'm trying to "patch" something that probably [needs] rewriting it is not an option at this point. <<
Based on a mere 30+ years in the trade, I think this phrase means "It
is time to update the resume and look for a new job; this company is
dead in a year."
Kludging things only makes them worse. Any kludge that last longer
than 3 weeks is permanent. To paraphrase Mills, "How does a system
fail? One kludge at a time."
I have a project where I'm trying to "patch" something that probably
needs a rewrite. It currently has some code set in a dll that gets
called in order to give the app calling it the appropriate code to pull
info from a database.
Rewriting it is not an option at this point.
I had the idea of putting code into a SQL table, which would be
associated with a username, that I would then read & put into the sql
that is called within the dotnet application.
Currently, I have to make a change to the dll in order to make a change
at all to the app, which is something I would like to get around having
to do.
I already know what I need to put into the table in order to 'store" the
sql code in it, but I can't get it to take over a few hundred
characters, no matter what type of field I change it to.
Is there a way to store sql into a field in a table? If so, how? I
know the code is right, it has the ' ' where the single quotes need to
be, but it still gets truncated.
Is there a better way to do this, that (a) won't necessitate a program
change each time the data potentially changes? and (b) will make it able
to be called in a manner that would give me the entire piece of code needed?
Celko Answers
Blasting Cap wrote:
> I have a project where I'm trying to "patch" something that probably [needs] rewriting it is not an option at this point. <<
Based on a mere 30+ years in the trade, I think this phrase means "It
is time to update the resume and look for a new job; this company is
dead in a year."
Kludging things only makes them worse. Any kludge that last longer
than 3 weeks is permanent. To paraphrase Mills, "How does a system
fail? One kludge at a time."
mssql: case statements with multiple columns
SQL Apprentice Question
A standard case statement in MSSQL server looks like:
...WHERE s.idMarket =
CASE @Mode
WHEN 'Sales' THEN 1
ELSE 2
END
but I want a similar statement to use multiple values of 'idMarket' for
each value of '@Mode'
e.g.
...WHERE s.idMarket =
CASE @Mode
WHEN 'Sales' THEN 1, 3, 5, 6
ELSE 2, 4, 7, 8, 9
END
is this possible through an mssql query?
Celko Answers
>> A standard case statement in MSSQL server looks like: <<
There is no CASE statement in SQL; there is a CASE expression. An
expression returns a scalar value of a known data type.
>> I want a similar statement [sic] to use multiple values [sic] of 'idMarket' for each value of '@Mode' <<
Again, you missed the concept "expression" and "scalar" so the
question is wrong. Then even if it would have worked, you would have
to use IN() and not = in the predicate. Besides not bothering to learn
SQL, why did you violate ISO-11179 by putting "id-" as a prefix? Why
did you use a vague nmae like "mode" --mode of what??
Try something like this:
...WHERE CASE
WHEN S.market_id IN (1, 3, 5, 6)
AND @mode = 'sales'
THEN 'T'
WHEN S.market_id IN (2, 4, 7, 8, 9)
AND @mode <> 'sales'--mode of what??
THEN 'T' ELSE 'F' END = 'T';
A standard case statement in MSSQL server looks like:
...WHERE s.idMarket =
CASE @Mode
WHEN 'Sales' THEN 1
ELSE 2
END
but I want a similar statement to use multiple values of 'idMarket' for
each value of '@Mode'
e.g.
...WHERE s.idMarket =
CASE @Mode
WHEN 'Sales' THEN 1, 3, 5, 6
ELSE 2, 4, 7, 8, 9
END
is this possible through an mssql query?
Celko Answers
>> A standard case statement in MSSQL server looks like: <<
There is no CASE statement in SQL; there is a CASE expression. An
expression returns a scalar value of a known data type.
>> I want a similar statement [sic] to use multiple values [sic] of 'idMarket' for each value of '@Mode' <<
Again, you missed the concept "expression" and "scalar" so the
question is wrong. Then even if it would have worked, you would have
to use IN() and not = in the predicate. Besides not bothering to learn
SQL, why did you violate ISO-11179 by putting "id-" as a prefix? Why
did you use a vague nmae like "mode" --mode of what??
Try something like this:
...WHERE CASE
WHEN S.market_id IN (1, 3, 5, 6)
AND @mode = 'sales'
THEN 'T'
WHEN S.market_id IN (2, 4, 7, 8, 9)
AND @mode <> 'sales'--mode of what??
THEN 'T' ELSE 'F' END = 'T';
Tuesday, June 20, 2006
group data by time interval
SQL Apprentice Question
What SQL will group timed events, so I can see how many records were recorded
in each 10 minute interval. There may be no data for some 10 minute
intervals, in which case I must report zero.
for example...with raw data in a table as
2006/06/19 12:00:00 record 1
2006/06/19 12:05:00 record 2
2006/06/19 12:21:00 record 3
2006/06/19 12:22:00 record 4
2006/06/19 12:23:00 record 5
2006/06/19 12:24:00 record 6
result should be
2 records for 12:00-12:10
0 records for 12:10-12:20
4 records for 12:20-12:30
Celko Answers
>> What SQL will group timed events, so I can see how many records [sic] were recorded in each 10 minute interval. <<
Populate a table with the desired time slots, thus:
CREATE TABLE ReportTimeslots
(range_name CHAR(18) NOT NULL PRIMARY KEY,
start_time DATETIME NOT NULL,
end_time DATETIME NOT NULL,
CHECK (start_time < end_time));
Then do your query.
SELECT R.range_name, COUNT(*) AS event_cnt
FROM ReportTimeslots AS R
LEFT OUTER JOIN
Events AS E
ON E.event_time BETWEEN R.start_time AND R.end_time
GROUP BY R.range_name;
Another trick is a VIEW that sets itself each day by using the
CURRENT_TIMESTAMP for the range_name and does a little temporal math to
get the proper start and end times.
What SQL will group timed events, so I can see how many records were recorded
in each 10 minute interval. There may be no data for some 10 minute
intervals, in which case I must report zero.
for example...with raw data in a table as
2006/06/19 12:00:00 record 1
2006/06/19 12:05:00 record 2
2006/06/19 12:21:00 record 3
2006/06/19 12:22:00 record 4
2006/06/19 12:23:00 record 5
2006/06/19 12:24:00 record 6
result should be
2 records for 12:00-12:10
0 records for 12:10-12:20
4 records for 12:20-12:30
Celko Answers
>> What SQL will group timed events, so I can see how many records [sic] were recorded in each 10 minute interval. <<
Populate a table with the desired time slots, thus:
CREATE TABLE ReportTimeslots
(range_name CHAR(18) NOT NULL PRIMARY KEY,
start_time DATETIME NOT NULL,
end_time DATETIME NOT NULL,
CHECK (start_time < end_time));
Then do your query.
SELECT R.range_name, COUNT(*) AS event_cnt
FROM ReportTimeslots AS R
LEFT OUTER JOIN
Events AS E
ON E.event_time BETWEEN R.start_time AND R.end_time
GROUP BY R.range_name;
Another trick is a VIEW that sets itself each day by using the
CURRENT_TIMESTAMP for the range_name and does a little temporal math to
get the proper start and end times.
Monday, June 19, 2006
How to name Tables & Fields (pk - fk) ?
SQL Apprentice Question
Do you have any suggest or guide lines to help me to understand the better
way for naming tables & fields ?
Normally, I name the pk of all tables with : ID and the fk with :
ID+TableName
Example:
Table: Authors
id (pk)
AuthorName
Table: Book
id (pk)
idAuthor (fk)
Title
It's correct ?
Celko Answers
>> Do you have any suggest or guide lines to help me to understand the better way for naming tables & fields [sic] ? <<
Columns are not fields -- major difference!.
Look at the ISO-11179 Standards for metadata and data element names.
The basic priniciple is to name a thing for what it is, not how it is
used, where is located or how it is stored.
I have an entire book, SQL PROGRAMMING STYLE on this topic, with rules
and guide lines.
>> Is it correct ? <<
Completely wrong
1) you have TWO names for the same data element
2) the word "id" is a modifier for an attribute. Identifer of what??
3) you avoid any industry standards -- use ISBN and not "book_id"
Do you have any suggest or guide lines to help me to understand the better
way for naming tables & fields ?
Normally, I name the pk of all tables with : ID and the fk with :
ID+TableName
Example:
Table: Authors
id (pk)
AuthorName
Table: Book
id (pk)
idAuthor (fk)
Title
It's correct ?
Celko Answers
>> Do you have any suggest or guide lines to help me to understand the better way for naming tables & fields [sic] ? <<
Columns are not fields -- major difference!.
Look at the ISO-11179 Standards for metadata and data element names.
The basic priniciple is to name a thing for what it is, not how it is
used, where is located or how it is stored.
I have an entire book, SQL PROGRAMMING STYLE on this topic, with rules
and guide lines.
>> Is it correct ? <<
Completely wrong
1) you have TWO names for the same data element
2) the word "id" is a modifier for an attribute. Identifer of what??
3) you avoid any industry standards -- use ISBN and not "book_id"
Reference alias field name
SQL Apprentice Question
Is there a way to Reference an alias field name in an SQL Statement?
Example:
Select
1 + 1 AS F1,
F1 + 1 AS F2
Celko Answers
>> Is there a way to Reference an alias field name in an SQL Statement? <<
You are still thinking of a procedural "left-to-right" programming
language based on files and not tables.
Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things, but the code has to produce the same
results.
a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The [table expression] AS [correlation name] option allows you
give a name to this working table which you then have to use for the
rest of the containing query.
b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.
c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
those three items. The original table no longer exists.
d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.
e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can also give names to expressions in the SELECT list. These
new names come into existence all at once, but after the WHERE clause,
GROUP BY clause and HAVING clause have been executed; you cannot use
them in the SELECT list or the WHERE clause for that reason.
If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).
f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.
g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expression in it, or references to other columns because the
result set has been converted into a sequential file structure and that
is what is being sorted.
As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;
while these two statements return the same data:
SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;
Think about what a confused mess this statement is in the SQL model.
SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;
That is why such nonsense is illegal syntax.
Is there a way to Reference an alias field name in an SQL Statement?
Example:
Select
1 + 1 AS F1,
F1 + 1 AS F2
Celko Answers
>> Is there a way to Reference an alias field name in an SQL Statement? <<
You are still thinking of a procedural "left-to-right" programming
language based on files and not tables.
Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things, but the code has to produce the same
results.
a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The [table expression] AS [correlation name] option allows you
give a name to this working table which you then have to use for the
rest of the containing query.
b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.
c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
those three items. The original table no longer exists.
d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.
e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can also give names to expressions in the SELECT list. These
new names come into existence all at once, but after the WHERE clause,
GROUP BY clause and HAVING clause have been executed; you cannot use
them in the SELECT list or the WHERE clause for that reason.
If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).
f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.
g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expression in it, or references to other columns because the
result set has been converted into a sequential file structure and that
is what is being sorted.
As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;
while these two statements return the same data:
SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;
Think about what a confused mess this statement is in the SQL model.
SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;
That is why such nonsense is illegal syntax.
1NF Violation?
SQL Apprentice Question
I'm evaluating an existing database for a medical clinic. It has many tables
with *identical columns*. Apparently the existing dba thought that doctors
with different specialties were different entities: like "Cardiologists"
"Hepatologists" "Etc". These tables store contact information, including:
Name (first and last all in one column), Address, Location (whatever the
heck that is), PhoneNumber, PhoneExtension, FaxNumber. The DDL of these
tables is *identical* (with the only difference being the table name).
Does this violate the "no repeating groups" rule of 1NF?
I've generally thought that rule was for columns like Phone1, Phone2, Phone3
(within the same table). But these are repeating groups *among* tables. Yes?
Does the "no repeating groups" rule mean both *within* and *among* tables?
Celko Answers
>> Does this violate the "no repeating groups" rule of 1NF? <<
No, this is called "attribute splitting" and it is one of the
"non-normal form" redundancies. In this example, you have converted
the values of an attribute into multiple tables. Another way to do
this screw up is to split the values into columns within the same table
(usually bit flags that should be modeled as a single code, but there
are other ways).
Tom Johnston did a series of articles years ago on other redundancies
that are not taught in begining classses.
I'm evaluating an existing database for a medical clinic. It has many tables
with *identical columns*. Apparently the existing dba thought that doctors
with different specialties were different entities: like "Cardiologists"
"Hepatologists" "Etc". These tables store contact information, including:
Name (first and last all in one column), Address, Location (whatever the
heck that is), PhoneNumber, PhoneExtension, FaxNumber. The DDL of these
tables is *identical* (with the only difference being the table name).
Does this violate the "no repeating groups" rule of 1NF?
I've generally thought that rule was for columns like Phone1, Phone2, Phone3
(within the same table). But these are repeating groups *among* tables. Yes?
Does the "no repeating groups" rule mean both *within* and *among* tables?
Celko Answers
>> Does this violate the "no repeating groups" rule of 1NF? <<
No, this is called "attribute splitting" and it is one of the
"non-normal form" redundancies. In this example, you have converted
the values of an attribute into multiple tables. Another way to do
this screw up is to split the values into columns within the same table
(usually bit flags that should be modeled as a single code, but there
are other ways).
Tom Johnston did a series of articles years ago on other redundancies
that are not taught in begining classses.
SQL Join
SQL Apprentice Question
I'm trying to do a join based on the following tables:
Person(person_id,person_name)
Grade(grade_id,grade_person_id,grade_score)
The data looks like this:
Person:
1,John
2,Dave
Grade:
1,1,80
2,1,90
3,2,60
4,2,70
I'd like a query that returns the each persons name along with their
highest grade.
What would the query be?
Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
If you had followed minimal Netiquette, would you have posted something
that shows that you are dealing with sets (collective or plural names)
that have particular roles in the data model:
CREATE TABLE Students -- we doi not grade people unless we are
cannibals
(student_id INTEGER NOT NULL PRIMARY KEY,
student_name VARCHAR(35) NOT NULL)';
CREATE TABLE Grades -- in what?
(student_id INTEGER NOT NULL
REFERENCES Students (student_id)
grade_score DECIMAL (5,2) NOT NULL,
PRIMARY KEY (student_id, grade_score);
>> The data looks like this: .. <<
I HOPE NOT!! That would a disaster.
>> I'd like a query that returns the each person's [student] name along with their
highest grade. <<
Why does this smell of homework assignment. So first thing is to
answer your quesiton then track you down and report you for cheating
to your school if needed.
SELECT S.student_id, S.student_name, MAX(G.grade_score)
FROM Grades AS G, Students AS S
WHERE S.student_id = G.student_id
GROUP BY S.student_id, S.student_name;
You have to have the student_id in case of duplicate names.
I'm trying to do a join based on the following tables:
Person(person_id,person_name)
Grade(grade_id,grade_person_id,grade_score)
The data looks like this:
Person:
1,John
2,Dave
Grade:
1,1,80
2,1,90
3,2,60
4,2,70
I'd like a query that returns the each persons name along with their
highest grade.
What would the query be?
Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
If you had followed minimal Netiquette, would you have posted something
that shows that you are dealing with sets (collective or plural names)
that have particular roles in the data model:
CREATE TABLE Students -- we doi not grade people unless we are
cannibals
(student_id INTEGER NOT NULL PRIMARY KEY,
student_name VARCHAR(35) NOT NULL)';
CREATE TABLE Grades -- in what?
(student_id INTEGER NOT NULL
REFERENCES Students (student_id)
grade_score DECIMAL (5,2) NOT NULL,
PRIMARY KEY (student_id, grade_score);
>> The data looks like this: .. <<
I HOPE NOT!! That would a disaster.
>> I'd like a query that returns the each person's [student] name along with their
highest grade. <<
Why does this smell of homework assignment. So first thing is to
answer your quesiton then track you down and report you for cheating
to your school if needed.
SELECT S.student_id, S.student_name, MAX(G.grade_score)
FROM Grades AS G, Students AS S
WHERE S.student_id = G.student_id
GROUP BY S.student_id, S.student_name;
You have to have the student_id in case of duplicate names.
trouble with table-valued function
SQL Apprentice Question
Weird problem. I have a table-valued function (not inline) called
SalesByKeyword which returns a table of sales stats for products
containing a keyword/phrase in their descriptions. The function works
fine when I pass a constant into it as the keyword, but I'm trying to
have it evaluate for each row of the join. ST04.SimpleText is the
keyword for each row.
I get the error 'Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "ST04.SimpleText" could not be bound.'
SELECT TOP 50 LTRIM(RTRIM(LOWER(ST04.SimpleText))) AS [Text],
COUNT(*) AS [Count],
CONVERT(smallint, CONVERT(decimal(7, 0), SUM(ResultCount)) /
CONVERT(decimal(7, 0), COUNT(*))) AS AvgResults,
SBK.OrderCount,
SBK.LineCount,
SBK.TotalValue
FROM wsite.dbo.ST04_Search ST04
LEFT OUTER JOIN wsite.dbo.SalesByKeyword(ST04.SimpleText,
'05/16/2006', '05/17/2006') AS SBK
ON (SBK.Keyword = ST04.SimpleText)
WHERE (
(ST04.SimpleText IS NOT NULL) AND
(ST04.Created >= '05/16/2006') AND
(ST04.Created < '06/17/2006') AND
(ST04.StartRec = 1)
)
GROUP BY LTRIM(RTRIM(LOWER(ST04.SimpleText))), SBK.OrderCount,
SBK.LineCount, SBK.TotalValue
ORDER BY [Count] DESC, [Text]
Why, why!!!
Celko Answers
>> I have a table-valued function (not inline) called SalesByKeyword which returns a table of sales stats for products containing a keyword/phrase in their descriptions. The function works fine when I pass a constant into it as the keyword, but I'm trying to have it evaluate for each row of the join. ST04.SimpleText is the keyword for each row. <<
I would dump the proprietary table-valued function in favor of real
SQL. All you are going to do is sink further into a proprietary hole
that cannot be ported, maintained or optimized
You might want to use ISO-8601 format dates, CAST() instead of
CONVERT(), etc. so later programmers can read and maintain your code.
Question: why are you trimming and CASTING () data from a table?
Answer: the tables have no constraints or proper data types. You need
to fix that as soon as possible, or you will spend your life writing
these convolved queries.
And finally, what is this supposed to do? The average of something
related to a keyword set is weird. I woudl have expected to see
something like "WHERE O.product_descrip LIKE '%' + K.keyword +'%'"
from your narrative.
Weird problem. I have a table-valued function (not inline) called
SalesByKeyword which returns a table of sales stats for products
containing a keyword/phrase in their descriptions. The function works
fine when I pass a constant into it as the keyword, but I'm trying to
have it evaluate for each row of the join. ST04.SimpleText is the
keyword for each row.
I get the error 'Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "ST04.SimpleText" could not be bound.'
SELECT TOP 50 LTRIM(RTRIM(LOWER(ST04.SimpleText))) AS [Text],
COUNT(*) AS [Count],
CONVERT(smallint, CONVERT(decimal(7, 0), SUM(ResultCount)) /
CONVERT(decimal(7, 0), COUNT(*))) AS AvgResults,
SBK.OrderCount,
SBK.LineCount,
SBK.TotalValue
FROM wsite.dbo.ST04_Search ST04
LEFT OUTER JOIN wsite.dbo.SalesByKeyword(ST04.SimpleText,
'05/16/2006', '05/17/2006') AS SBK
ON (SBK.Keyword = ST04.SimpleText)
WHERE (
(ST04.SimpleText IS NOT NULL) AND
(ST04.Created >= '05/16/2006') AND
(ST04.Created < '06/17/2006') AND
(ST04.StartRec = 1)
)
GROUP BY LTRIM(RTRIM(LOWER(ST04.SimpleText))), SBK.OrderCount,
SBK.LineCount, SBK.TotalValue
ORDER BY [Count] DESC, [Text]
Why, why!!!
Celko Answers
>> I have a table-valued function (not inline) called SalesByKeyword which returns a table of sales stats for products containing a keyword/phrase in their descriptions. The function works fine when I pass a constant into it as the keyword, but I'm trying to have it evaluate for each row of the join. ST04.SimpleText is the keyword for each row. <<
I would dump the proprietary table-valued function in favor of real
SQL. All you are going to do is sink further into a proprietary hole
that cannot be ported, maintained or optimized
You might want to use ISO-8601 format dates, CAST() instead of
CONVERT(), etc. so later programmers can read and maintain your code.
Question: why are you trimming and CASTING () data from a table?
Answer: the tables have no constraints or proper data types. You need
to fix that as soon as possible, or you will spend your life writing
these convolved queries.
And finally, what is this supposed to do? The average of something
related to a keyword set is weird. I woudl have expected to see
something like "WHERE O.product_descrip LIKE '%' + K.keyword +'%'"
from your narrative.
Give me your Thoughts-Storing format in database
SQL Apprentice Question
I have the following tables:
NumberKey
ID
Name
NumberValue
ID
NumberKey_ID
Value
The values stored in the NumberValue table could be currency,
percentage, or number that require 7 digits to the right of the decimal
place, etc., etc. (This table is not very big, so I'm storing many
different types of numbers here, and not worrying about the most
efficient data types). I am toying with the idea of adding a format
field to the NumberKey table to use later in the UI to format the Value
field in NumberValue. I am envisioning using percentage, currency,
etc. as string so the UI knows what format function to use to display
the value in a text box, etc. etc.
Does anyone have any experience with this? Is this even a good idea?
Should I let business logic handle this?
Celko Answers
>> The values stored in the NumberValue table could be currency, percentage, or number that require 7 digits to the right of the decimal place, etc., etc. <<
Then **by definition** this is NOT a table. The mi9stake you are
making is so bad and so popular among non_SQL programemrs it has a name
-- EAV (Entity-Attribute-Value) modeling.
A table **by definition** is a set of things of the same kind. As
Aristlotle put it, "To be is to be something in particular; to be
nothing in particular or to be everythig gin general is to be nothing."
>> (This table is not very big, so I'm storing many different types of numbers here, and not worrying about the most efficient data types). <<
You have no idea what data integrity is.
>> I am toying with the idea of adding a format field [sic] to the NumberKey table to use later in the UI to format the Value field [sic] in NumberValue. <<
You do not know that a field and column are different concepts. Fields
can change type and format; columns do not. COBOL programmers often
want to do this because that is how their language works.
Why are you formatting data in the back end? The basic principle of a
tiered architecture is that display is done in the front end and never
in the back end. This a more basic programming principle than just SQL
and RDBMS.
>> Does anyone have any experience with this? <<
Yes. I bill $1000 per day plus expenses to repair these systems. I
have found that they get corrupted in about one year in production, and
corrupted beyond repair in less than two years.
I found an old "cut & paste". Someone like you posted this:
CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);
INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');
CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );
INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');
Ideally, the result set of the query would be Location Event count
(headings if possible)
Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1
Also, if possible, another query would return this result set. (I think
I know how to do this one.)
Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1
Here is an answer From: Thomas Coleman
SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;
Is the same thing in a proper schema as:
SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;
The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.
The column names are seriously painful. Beyond the fact that I
personally hate underscores in column names, using underscores at the
end of the column name is really non-intuitive. I removed them for my
example and came across the next column name faux pas. Don't use "key"
and "value" for column names. It means that the developer *has*
surround the column name with square brackets for everything which is a
serious pain.
There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.
"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole
All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.
ry to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.
try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!
Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.
I have the following tables:
NumberKey
ID
Name
NumberValue
ID
NumberKey_ID
Value
The values stored in the NumberValue table could be currency,
percentage, or number that require 7 digits to the right of the decimal
place, etc., etc. (This table is not very big, so I'm storing many
different types of numbers here, and not worrying about the most
efficient data types). I am toying with the idea of adding a format
field to the NumberKey table to use later in the UI to format the Value
field in NumberValue. I am envisioning using percentage, currency,
etc. as string so the UI knows what format function to use to display
the value in a text box, etc. etc.
Does anyone have any experience with this? Is this even a good idea?
Should I let business logic handle this?
Celko Answers
>> The values stored in the NumberValue table could be currency, percentage, or number that require 7 digits to the right of the decimal place, etc., etc. <<
Then **by definition** this is NOT a table. The mi9stake you are
making is so bad and so popular among non_SQL programemrs it has a name
-- EAV (Entity-Attribute-Value) modeling.
A table **by definition** is a set of things of the same kind. As
Aristlotle put it, "To be is to be something in particular; to be
nothing in particular or to be everythig gin general is to be nothing."
>> (This table is not very big, so I'm storing many different types of numbers here, and not worrying about the most efficient data types). <<
You have no idea what data integrity is.
>> I am toying with the idea of adding a format field [sic] to the NumberKey table to use later in the UI to format the Value field [sic] in NumberValue. <<
You do not know that a field and column are different concepts. Fields
can change type and format; columns do not. COBOL programmers often
want to do this because that is how their language works.
Why are you formatting data in the back end? The basic principle of a
tiered architecture is that display is done in the front end and never
in the back end. This a more basic programming principle than just SQL
and RDBMS.
>> Does anyone have any experience with this? <<
Yes. I bill $1000 per day plus expenses to repair these systems. I
have found that they get corrupted in about one year in production, and
corrupted beyond repair in less than two years.
I found an old "cut & paste". Someone like you posted this:
CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);
INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');
CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL,
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );
INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');
Ideally, the result set of the query would be Location Event count
(headings if possible)
Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1
Also, if possible, another query would return this result set. (I think
I know how to do this one.)
Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1
Here is an answer From: Thomas Coleman
SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;
Is the same thing in a proper schema as:
SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;
The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.
The column names are seriously painful. Beyond the fact that I
personally hate underscores in column names, using underscores at the
end of the column name is really non-intuitive. I removed them for my
example and came across the next column name faux pas. Don't use "key"
and "value" for column names. It means that the developer *has*
surround the column name with square brackets for everything which is a
serious pain.
There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.
"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole
All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.
ry to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.
try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!
Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.
Help with query
SQL Apprentice Question
Thanks in advance for your help.
Given the following table:
CREATE TABLE T1 (
seq INTEGER NOT NULL,
value CHAR(1) NOT NULL)
INSERT INTO TABLE T1 VALUES (1, 'A')
INSERT INTO TABLE T1 VALUES (2, 'A')
INSERT INTO TABLE T1 VALUES (3, 'A')
INSERT INTO TABLE T1 VALUES (5, 'B')
INSERT INTO TABLE T1 VALUES (6, 'B')
INSERT INTO TABLE T1 VALUES (7, 'A')
I would like to write a query that would return the beginning and
ending seq for each value. The desired results for the above data
would be as follows:
Beg End Value
1 3 A
5 6 B
7 7 A
I'm hesitant to post the queries I have been trying, because I don't
want to taint someone's thinking, but I have been approaching this with
the idea that a self join will be part of the solution. Simply
grouping by value using max and min functions will not return the
desired results.
Celko Answers
Another version with the outer limits of each sequential run. I assume
a table of sequential numbers called Sequence and the little-used ALL()
predicate.
SELECT val, MIN(lft), rgt
FROM (
SELECT F1.val, S1.seq, MAX(S2.seq)
FROM Foobar AS F1, Sequence AS S1, Sequence AS S2
WHERE S2.seq BETWEEN S1.seq AND (SELECT MAX(seq) FROM Foobar)
AND F1.val
= ALL(SELECT val
FROM Foobar AS F2
WHERE F2.seq BETWEEN S1.seq AND S2.seq
AND S1.seq <= S2.seq)
GROUP BY F1.val, S1.seq) AS X (val, lft, rgt)
GROUP BY X.val, X.rgt;
Thanks in advance for your help.
Given the following table:
CREATE TABLE T1 (
seq INTEGER NOT NULL,
value CHAR(1) NOT NULL)
INSERT INTO TABLE T1 VALUES (1, 'A')
INSERT INTO TABLE T1 VALUES (2, 'A')
INSERT INTO TABLE T1 VALUES (3, 'A')
INSERT INTO TABLE T1 VALUES (5, 'B')
INSERT INTO TABLE T1 VALUES (6, 'B')
INSERT INTO TABLE T1 VALUES (7, 'A')
I would like to write a query that would return the beginning and
ending seq for each value. The desired results for the above data
would be as follows:
Beg End Value
1 3 A
5 6 B
7 7 A
I'm hesitant to post the queries I have been trying, because I don't
want to taint someone's thinking, but I have been approaching this with
the idea that a self join will be part of the solution. Simply
grouping by value using max and min functions will not return the
desired results.
Celko Answers
Another version with the outer limits of each sequential run. I assume
a table of sequential numbers called Sequence and the little-used ALL()
predicate.
SELECT val, MIN(lft), rgt
FROM (
SELECT F1.val, S1.seq, MAX(S2.seq)
FROM Foobar AS F1, Sequence AS S1, Sequence AS S2
WHERE S2.seq BETWEEN S1.seq AND (SELECT MAX(seq) FROM Foobar)
AND F1.val
= ALL(SELECT val
FROM Foobar AS F2
WHERE F2.seq BETWEEN S1.seq AND S2.seq
AND S1.seq <= S2.seq)
GROUP BY F1.val, S1.seq) AS X (val, lft, rgt)
GROUP BY X.val, X.rgt;
The Temp Table blues OR The Dynamic SQL blues
SQL Apprentice Question
I need some suggestions (if any) for improvement. If you think
I'm on the right track, let me know as well. :) Basically I'm a web dev
creating a page that will search through almost 5 million records, so
speed is important. The users will be able to search via 15 different
criteria. Due to this potential, dynamically building the query string
is the only sensible solution to me. I built my page with a d-sql SP in
the backend and all is well. Then my boss throws me a curve ball. He
wants users to be able to dump in multiple "equipment" numbers into one
of the text boxes, separated by commas, and have the page search via
that as well.
My solution (with the other 14 search criteria's screened out) is
below. Is there any way to improve this method? I can't use a table
variable because it's inaccessible when the d-sql is executed. Thanks.
CREATE PROCEDURE itv_history
@equipment nvarchar(250)
AS
SET DATEFORMAT mdy
CREATE TABLE #ContTable
(
ContValue nvarchar(15)
)
DECLARE @sql nvarchar(2000)
SET @sql = N'SELECT * FROM dbo.ITVD '
if @equipment <> ''
BEGIN
declare @separator nvarchar(1)
set @separator = ','
declare @separator_position int
declare @array_value nvarchar(15)
set @equipment = @equipment + ','
while patindex('%,%' , @equipment) <> 0
begin
select @separator_position = patindex('%,%' , @equipment)
select @array_value = left(@equipment, @separator_position - 1)
Insert #ContTable
Values (ltrim(rtrim(@array_value)))
select @equipment = stuff(@equipment, 1, @separator_position, '')
end
SET @sql = @sql + ' fvd INNER JOIN #ContTable a ON fvd.equipment LIKE
'+'''%'''+'+a.ContValue+'+'''%''')
END
execute (@sql)
Celko Answers
>> My solution (with the other 14 search criteria's screened out) is below. Is there any way to improve this method? <<
Another option is to use the fact that T-SQL can handle over 1000
parameters. This is almost certainly large enough -- heck, 100 is
probalby more typing than anyone wants to do! The code is easy to
write with "cut & paste" editing. Now, if you are lucky, parameter
sniffing might help execution.
SELECT .. FROM .. WHERE parm IN (@p001, @p002,.. , @p100)
Trying to write even a simple parser in SQL is not a good idea-- that
is a good job for 'C' or assembly languages. Is a pair of commas a
zero or a NULL? What do you do with alphas in the string? What about
the stray single quote? Minus signs in the front or the back of a
substring? Decimals -- error, rounded or accepted? Ditto FLOAT and
REAL? Did we squeeze out spaces or not?
I need some suggestions (if any) for improvement. If you think
I'm on the right track, let me know as well. :) Basically I'm a web dev
creating a page that will search through almost 5 million records, so
speed is important. The users will be able to search via 15 different
criteria. Due to this potential, dynamically building the query string
is the only sensible solution to me. I built my page with a d-sql SP in
the backend and all is well. Then my boss throws me a curve ball. He
wants users to be able to dump in multiple "equipment" numbers into one
of the text boxes, separated by commas, and have the page search via
that as well.
My solution (with the other 14 search criteria's screened out) is
below. Is there any way to improve this method? I can't use a table
variable because it's inaccessible when the d-sql is executed. Thanks.
CREATE PROCEDURE itv_history
@equipment nvarchar(250)
AS
SET DATEFORMAT mdy
CREATE TABLE #ContTable
(
ContValue nvarchar(15)
)
DECLARE @sql nvarchar(2000)
SET @sql = N'SELECT * FROM dbo.ITVD '
if @equipment <> ''
BEGIN
declare @separator nvarchar(1)
set @separator = ','
declare @separator_position int
declare @array_value nvarchar(15)
set @equipment = @equipment + ','
while patindex('%,%' , @equipment) <> 0
begin
select @separator_position = patindex('%,%' , @equipment)
select @array_value = left(@equipment, @separator_position - 1)
Insert #ContTable
Values (ltrim(rtrim(@array_value)))
select @equipment = stuff(@equipment, 1, @separator_position, '')
end
SET @sql = @sql + ' fvd INNER JOIN #ContTable a ON fvd.equipment LIKE
'+'''%'''+'+a.ContValue+'+'''%''')
END
execute (@sql)
Celko Answers
>> My solution (with the other 14 search criteria's screened out) is below. Is there any way to improve this method? <<
Another option is to use the fact that T-SQL can handle over 1000
parameters. This is almost certainly large enough -- heck, 100 is
probalby more typing than anyone wants to do! The code is easy to
write with "cut & paste" editing. Now, if you are lucky, parameter
sniffing might help execution.
SELECT .. FROM .. WHERE parm IN (@p001, @p002,.. , @p100)
Trying to write even a simple parser in SQL is not a good idea-- that
is a good job for 'C' or assembly languages. Is a pair of commas a
zero or a NULL? What do you do with alphas in the string? What about
the stray single quote? Minus signs in the front or the back of a
substring? Decimals -- error, rounded or accepted? Ditto FLOAT and
REAL? Did we squeeze out spaces or not?
ORDER BY clause with unknown column name
SQL Apprentice Question
can i use an unknown column in an ORDER BY clause with t-sql?
i know it will always be an identity field and it is in the first column.
it is also the primary key.
can i depend on a recordset always being in this order without the the
clause?
Celko Answers
>> can i use an unknown column in an ORDER BY clause with t-sql? <<
No, you have to sort by something. When do not put the ORDER BY in a
cursor (it is not part of a SELECT, another common newbie assumption),
then the engine can out the rows into a sequence in any order. Every
SQL product will be a bit different, depending on physical storage,
parallelism in the hardware, etc.
>> I know it will always be an identity field and it is in the first column. It is also the primary key. <<
You might want to read a book and find out why IDENTITY can *never* be
a key. By definition. What you are doing is mimicing a 1950's magnetic
tape file in SQL. The IDENTITY is an exposed physical locator you are
using, the same way we used record positions on a mag tape.
>> can I depend on a recordset always being in this order without the [ORDER BY] clause? <<
No. This is the definition of a table -- it is a set without any
physical ordering. When you finally read a book on RDBMS, pay
attention to "The Information Prinicple" and some of the other rules
that Dr. Codd set up.
There are some proprietary kludges you can use to destroy portability
and data integrity. For example, there is a ordinal position number
that was removed from Standard SQL a few years ago, but exists in some
products.
All you will get in Newsgroups are the kludges; you need to get an
education. And it will take you at least a year to do that. Your
whole mindset is wrong and you have to unlearn a lot.
can i use an unknown column in an ORDER BY clause with t-sql?
i know it will always be an identity field and it is in the first column.
it is also the primary key.
can i depend on a recordset always being in this order without the the
clause?
Celko Answers
>> can i use an unknown column in an ORDER BY clause with t-sql? <<
No, you have to sort by something. When do not put the ORDER BY in a
cursor (it is not part of a SELECT, another common newbie assumption),
then the engine can out the rows into a sequence in any order. Every
SQL product will be a bit different, depending on physical storage,
parallelism in the hardware, etc.
>> I know it will always be an identity field and it is in the first column. It is also the primary key. <<
You might want to read a book and find out why IDENTITY can *never* be
a key. By definition. What you are doing is mimicing a 1950's magnetic
tape file in SQL. The IDENTITY is an exposed physical locator you are
using, the same way we used record positions on a mag tape.
>> can I depend on a recordset always being in this order without the [ORDER BY] clause? <<
No. This is the definition of a table -- it is a set without any
physical ordering. When you finally read a book on RDBMS, pay
attention to "The Information Prinicple" and some of the other rules
that Dr. Codd set up.
There are some proprietary kludges you can use to destroy portability
and data integrity. For example, there is a ordinal position number
that was removed from Standard SQL a few years ago, but exists in some
products.
All you will get in Newsgroups are the kludges; you need to get an
education. And it will take you at least a year to do that. Your
whole mindset is wrong and you have to unlearn a lot.
Cursor not completing when stored procedure runs within it
SQL Apprentice Question
I am having an interesting problem I haven't seen.
First, here's the code that sets up the cursor, with a select statement
where the exec should be, and the results:
DECLARE @order_id int,
@row_id int,
@qty_rtn int,
@invoice_id int,
@date_shipped datetime
DECLARE order_return CURSOR FOR
select r.order_id_display, r.row_id -1, r.quantity, s.line_id,
getdate() from batch..temp_response r, shipment s,
receipt_item i
where isnull(r.status, 0) >= 0 and new_status in ('R', 'U')
and i.i_order_id_display = r.order_id_display
and i.order_id = s.order_id
and i.row_id = r.row_id - 1
and i.upc=r.upc and amount = 1
and i.order_id in ('0FD94RQXB4JL9J8V4R3G5B8CC5') --for
testing purposes I selected one order only
OPEN order_return
FETCH NEXT FROM order_return INTO @order_id, @row_id, @qty_rtn,
@invoice_id, @date_shipped
WHILE @@FETCH_STATUS = 0
BEGIN
select 'exec process_line_item_shipping', @order_id, @row_id, 0,
@qty_rtn, @date_shipped, @invoice_id
-- exec process_line_item_shipping @order_id, @row_id, 0, @qty_rtn,
@date_shipped, @invoice_id
FETCH NEXT FROM order_return INTO @order_id, @row_id, @qty_rtn,
@invoice_id, @date_shipped
END
CLOSE order_return
DEALLOCATE order_return
This returns
exec process_line_item_shipping 491232 0 0 1
2006-06-16 12:46:19.330 534386
exec process_line_item_shipping 491232 1 0 1
2006-06-16 12:46:19.330 534386
Which is exactly what I'd expect.
HOWEVER.... when I remove the comment tag off the actual SP exec
command, then I ONLY get
exec process_line_item_shipping 491232 0 0 1
2006-06-16 12:46:19.330 534386
and only the first exec statement runs.
I've done a select @@fetch_status before and after the exec statement,
and it's 0 each time.
The stored procedure run has no cursors within it, just several
calculations, inserts and update statements.
Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
What you did post was awful. You are using SQL cursors, which are the
worst way to use SQL -- orders of magnitude poorer performance, lack of
portability, etc. Read some of the postings here and *any* other SQL
Newsgroup. My rule of thumb is that you should not write more than
five of them in 25 years in IT.
Looking at what you did post, it looks like you missed most of the
basic ideas of RDBMS and building a procedural routine that mimics a
file system. .
1) Why would anyone put the display order into a table? All display
work is done in the front end and not the database.
2) Ignoring design flaw #1, why did you use two different names for the
same data element (I.i_order_id_display = R.order_id_display)? Surely
nobody would put the data type or table on a data element.
3) What is a row_id? If it refers to the physical rows in a table,
then it is wrong. If it refers to the position on the input screen or
original paper form, then it is wrong. You woudl be mimicing a paper
form instead of building a relational model.
4) You use vague data element names Amount of what? It does not seem
to be money. Quantity of what? Ordered or returned or on-hand, or what?
That is like an adjective without a noun.
5) Why don't you follow ISO-11179 naming rules or at least be
consistent? Look at @date_shipped is "" while @invoice_id
is "" instead.
6) When I see procedure named "Process_Line_Item_Shipping' I worry
that you are going thru each item in an order, one at a time. SQL is a
set-oriented language and you should be working with a sub-set of
items. No loops. No Cursors.
My guess, based on no DDL, is that you need a table for the Orders, for
the Order Details, Shipments and working table of returns. The
returns will be used to update the Order Details with return
quantities and shipping info (perhaps the Orders will need changes).
I have done this in one UPDATE statement for some fairly simple
business rules. The trick was a detail table keyed on (order_nbr, sku,
ship_status, ship_date). Reports are done off of VIEWs (what
percentage of Lawn Gnomes are returned? in how many days? ) and you
needed to watch constraints (you cannot return more than you bought).
I am having an interesting problem I haven't seen.
First, here's the code that sets up the cursor, with a select statement
where the exec should be, and the results:
DECLARE @order_id int,
@row_id int,
@qty_rtn int,
@invoice_id int,
@date_shipped datetime
DECLARE order_return CURSOR FOR
select r.order_id_display, r.row_id -1, r.quantity, s.line_id,
getdate() from batch..temp_response r, shipment s,
receipt_item i
where isnull(r.status, 0) >= 0 and new_status in ('R', 'U')
and i.i_order_id_display = r.order_id_display
and i.order_id = s.order_id
and i.row_id = r.row_id - 1
and i.upc=r.upc and amount = 1
and i.order_id in ('0FD94RQXB4JL9J8V4R3G5B8CC5') --for
testing purposes I selected one order only
OPEN order_return
FETCH NEXT FROM order_return INTO @order_id, @row_id, @qty_rtn,
@invoice_id, @date_shipped
WHILE @@FETCH_STATUS = 0
BEGIN
select 'exec process_line_item_shipping', @order_id, @row_id, 0,
@qty_rtn, @date_shipped, @invoice_id
-- exec process_line_item_shipping @order_id, @row_id, 0, @qty_rtn,
@date_shipped, @invoice_id
FETCH NEXT FROM order_return INTO @order_id, @row_id, @qty_rtn,
@invoice_id, @date_shipped
END
CLOSE order_return
DEALLOCATE order_return
This returns
exec process_line_item_shipping 491232 0 0 1
2006-06-16 12:46:19.330 534386
exec process_line_item_shipping 491232 1 0 1
2006-06-16 12:46:19.330 534386
Which is exactly what I'd expect.
HOWEVER.... when I remove the comment tag off the actual SP exec
command, then I ONLY get
exec process_line_item_shipping 491232 0 0 1
2006-06-16 12:46:19.330 534386
and only the first exec statement runs.
I've done a select @@fetch_status before and after the exec statement,
and it's 0 each time.
The stored procedure run has no cursors within it, just several
calculations, inserts and update statements.
Celko Answers
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
What you did post was awful. You are using SQL cursors, which are the
worst way to use SQL -- orders of magnitude poorer performance, lack of
portability, etc. Read some of the postings here and *any* other SQL
Newsgroup. My rule of thumb is that you should not write more than
five of them in 25 years in IT.
Looking at what you did post, it looks like you missed most of the
basic ideas of RDBMS and building a procedural routine that mimics a
file system. .
1) Why would anyone put the display order into a table? All display
work is done in the front end and not the database.
2) Ignoring design flaw #1, why did you use two different names for the
same data element (I.i_order_id_display = R.order_id_display)? Surely
nobody would put the data type or table on a data element.
3) What is a row_id? If it refers to the physical rows in a table,
then it is wrong. If it refers to the position on the input screen or
original paper form, then it is wrong. You woudl be mimicing a paper
form instead of building a relational model.
4) You use vague data element names Amount of what? It does not seem
to be money. Quantity of what? Ordered or returned or on-hand, or what?
That is like an adjective without a noun.
5) Why don't you follow ISO-11179 naming rules or at least be
consistent? Look at @date_shipped is "
is "
6) When I see procedure named "Process_Line_Item_Shipping' I worry
that you are going thru each item in an order, one at a time. SQL is a
set-oriented language and you should be working with a sub-set of
items. No loops. No Cursors.
My guess, based on no DDL, is that you need a table for the Orders, for
the Order Details, Shipments and working table of returns. The
returns will be used to update the Order Details with return
quantities and shipping info (perhaps the Orders will need changes).
I have done this in one UPDATE statement for some fairly simple
business rules. The trick was a detail table keyed on (order_nbr, sku,
ship_status, ship_date). Reports are done off of VIEWs (what
percentage of Lawn Gnomes are returned? in how many days? ) and you
needed to watch constraints (you cannot return more than you bought).
Thursday, June 15, 2006
Aggregate Case Conditions
SQL Apprentice Question
I have a table AAA which contains two columns fileid and typecd
for single fileid there can be either 1, 2 or 3 typecds home, host and
payroll
for e.g.
Fileid Typecd
100 Home
100 host
105 home
106 host
106 payroll
107 home
107 host
107 payroll
Now there is second table BBB which contains fileid(primary key) and
itemflag
Now i have to set itemflag for a fileid based on the above table.. like
if fileid contains only home then itemflag should be 1
only host then itemflag should be 2
only payroll then itemflag should be 3
home and host then itemflag should be 4
home and payroll then itemflag should be 5
host and payroll then itemflag should be 6
home host and payroll then itemflag should be 7
I tried this but this is not working giving error.
Insert into BBB
Select AAA.fileid,
(select
case sum(case AAA.typeCd when 'Home' then 1
when 'Host' then 2
when 'Payroll' then 4 end)
when 1 then 1
when 2 then 2
when 3 then 4
when 4 then 3
when 5 then 5
when 6 then 6
when 7 then 7 end
) as ItemFlg,
FROM AAA (nolock)
This is not working because white fetching first row from AAA, the
typecd can contain any of the three value,it is
not having all the three values at a single time.
Can any one help me with this.
Celko Answers
Please post DDL, as you have been asked to do before.
CREATE TABLE AAA
(file_id INTEGER NOT NULL,
type_cd CHGAR(7) NOT NULL
CHECK (type_cd IN ('home', 'host', 'payroll')),
PRIMARY KEY (file_id, type_cd));
>> Now there is second table BBB which contains file_id(primary key) and
item_flag <<
This is a bad design. The "item_flag" is computed and should not be
presisted in a base table. Use a VIEW instead
CREATE VIEW BBB (file_id, item_flag)
AS
SELECT file_id,
CASE WHEN COUNT(*) = 3 THEN 7 -- all 3 type codes
WHEN MIN(type_cd) = 'home' AND MAX(type_cd) = 'home' THEN 1
WHEN MIN(type_cd) = 'host' AND MAX(type_cd) = 'host' THEN
2
WHEN MIN(type_cd) = 'payroll' AND MAX(type_cd) = 'payroll'
THEN 3
WHEN MIN(type_cd) = 'home' AND MAX(type_cd) = 'host' THEN 4
WHEN MIN(type_cd) = 'home' AND MAX(type_cd) = 'payroll'
THEN 5
WHEN MIN(type_cd) = 'host' AND MAX(type_cd) = 'payroll'
THEN 6
ELSE NULL END
) AS item_flag
FROM AAA
GROUP BY file_id;
The VIEW will always be current and does not require constant updating.
I have a table AAA which contains two columns fileid and typecd
for single fileid there can be either 1, 2 or 3 typecds home, host and
payroll
for e.g.
Fileid Typecd
100 Home
100 host
105 home
106 host
106 payroll
107 home
107 host
107 payroll
Now there is second table BBB which contains fileid(primary key) and
itemflag
Now i have to set itemflag for a fileid based on the above table.. like
if fileid contains only home then itemflag should be 1
only host then itemflag should be 2
only payroll then itemflag should be 3
home and host then itemflag should be 4
home and payroll then itemflag should be 5
host and payroll then itemflag should be 6
home host and payroll then itemflag should be 7
I tried this but this is not working giving error.
Insert into BBB
Select AAA.fileid,
(select
case sum(case AAA.typeCd when 'Home' then 1
when 'Host' then 2
when 'Payroll' then 4 end)
when 1 then 1
when 2 then 2
when 3 then 4
when 4 then 3
when 5 then 5
when 6 then 6
when 7 then 7 end
) as ItemFlg,
FROM AAA (nolock)
This is not working because white fetching first row from AAA, the
typecd can contain any of the three value,it is
not having all the three values at a single time.
Can any one help me with this.
Celko Answers
Please post DDL, as you have been asked to do before.
CREATE TABLE AAA
(file_id INTEGER NOT NULL,
type_cd CHGAR(7) NOT NULL
CHECK (type_cd IN ('home', 'host', 'payroll')),
PRIMARY KEY (file_id, type_cd));
>> Now there is second table BBB which contains file_id(primary key) and
item_flag <<
This is a bad design. The "item_flag" is computed and should not be
presisted in a base table. Use a VIEW instead
CREATE VIEW BBB (file_id, item_flag)
AS
SELECT file_id,
CASE WHEN COUNT(*) = 3 THEN 7 -- all 3 type codes
WHEN MIN(type_cd) = 'home' AND MAX(type_cd) = 'home' THEN 1
WHEN MIN(type_cd) = 'host' AND MAX(type_cd) = 'host' THEN
2
WHEN MIN(type_cd) = 'payroll' AND MAX(type_cd) = 'payroll'
THEN 3
WHEN MIN(type_cd) = 'home' AND MAX(type_cd) = 'host' THEN 4
WHEN MIN(type_cd) = 'home' AND MAX(type_cd) = 'payroll'
THEN 5
WHEN MIN(type_cd) = 'host' AND MAX(type_cd) = 'payroll'
THEN 6
ELSE NULL END
) AS item_flag
FROM AAA
GROUP BY file_id;
The VIEW will always be current and does not require constant updating.
Results in Parallel columns
SQL Apprentice Question
I need to place the results of two different queries in the same result
table parallel to each other.
So if the result of the first query is
1 12
2 34
3 45
and the second query is
1 34
2 44
3 98
the results should be displayed as
1 12 34
2 34 44
3 45 98
If a union is done for both the queries , we get the results in rows.
How can the above be done.
Celko Answers
>>I need to place the results of two different queries in the same result table parallel to each other. <<
This is not a table; the rows of a table model elements of a set of the
same kind of thing. What you want is a display kludge to show, say,
automobiles and squid as if they were the same kind of things.
Here is your kludge, since people often gripe that I do not post
the bad code the OP wants:
SELECT *, ROW_NUMBER() OVER(ORDER BY duh) AS lft_nbr
FROM Foo
FULL OUTER JOIN
SELECT *, ROW_NUMBER() OVER(ORDER BY doh) AS rgt_nbr
FROM Bar
ON Foo.lft_nbr = Bar.rgt_nbr;
The right way is handle display issues in the applications and front
ends, not the RDBMS.
I need to place the results of two different queries in the same result
table parallel to each other.
So if the result of the first query is
1 12
2 34
3 45
and the second query is
1 34
2 44
3 98
the results should be displayed as
1 12 34
2 34 44
3 45 98
If a union is done for both the queries , we get the results in rows.
How can the above be done.
Celko Answers
>>I need to place the results of two different queries in the same result table parallel to each other. <<
This is not a table; the rows of a table model elements of a set of the
same kind of thing. What you want is a display kludge to show, say,
automobiles and squid as if they were the same kind of things.
Here is your kludge, since people often gripe that I do not post
the bad code the OP wants:
SELECT *, ROW_NUMBER() OVER(ORDER BY duh) AS lft_nbr
FROM Foo
FULL OUTER JOIN
SELECT *, ROW_NUMBER() OVER(ORDER BY doh) AS rgt_nbr
FROM Bar
ON Foo.lft_nbr = Bar.rgt_nbr;
The right way is handle display issues in the applications and front
ends, not the RDBMS.
Data Modeling Question: One Entity or Two
SQL Apprentice Question
am evaluating an existing MS Access database for a new client - with the
intent to migrate it to SQL Server and possibly change/improve the
relational database design in the process.
The client is a medical specialty office in which patients show up with lab
results taken by their primary care physician (PCP) prior to the patient
showing up at my client's specialty office. My client's office then proceeds
to take additional lab measurements over time. The data collected by the PCP
and my client's office is practically the same.
My client's existing MS Access database stores this information in two
tables - one for labs taken at the PCP office (and apparently only the most
recent set of results prior to the patient showing up at my client's
office); and another table for labs taken at my client's office (and
measured over time). In the existing database these two tables have similar
(almost identical) columns.
The client's in-house DBA sees these tables as representing two entities,
not one (one entity is "last set of labs measured by the PCP" and the other
entity is "labs measured in-house; over time"). I understand all these lab
results as one entity ("lab results"); and therefore we can/should move all
this data into one table when we migrate the data to the new SQL Server
database.
What do you think? Do these lab results represent one entity or two?
FWIW: The business managers do not differentiate between the two types of
labs (measured by PCP vs in-house).
Celko Answers
>> What do you think? Do these lab results represent one entity or two? .. FWIW: The business managers do not differentiate between the two types of labs (measured by PCP vs in-house). <<
What is the LOGICAL difference between them? Apparently, none. Ergo,
you use one table and column for the lab_type. Leave this encoding
open enough that you can extend it lately, when add other sources.
am evaluating an existing MS Access database for a new client - with the
intent to migrate it to SQL Server and possibly change/improve the
relational database design in the process.
The client is a medical specialty office in which patients show up with lab
results taken by their primary care physician (PCP) prior to the patient
showing up at my client's specialty office. My client's office then proceeds
to take additional lab measurements over time. The data collected by the PCP
and my client's office is practically the same.
My client's existing MS Access database stores this information in two
tables - one for labs taken at the PCP office (and apparently only the most
recent set of results prior to the patient showing up at my client's
office); and another table for labs taken at my client's office (and
measured over time). In the existing database these two tables have similar
(almost identical) columns.
The client's in-house DBA sees these tables as representing two entities,
not one (one entity is "last set of labs measured by the PCP" and the other
entity is "labs measured in-house; over time"). I understand all these lab
results as one entity ("lab results"); and therefore we can/should move all
this data into one table when we migrate the data to the new SQL Server
database.
What do you think? Do these lab results represent one entity or two?
FWIW: The business managers do not differentiate between the two types of
labs (measured by PCP vs in-house).
Celko Answers
>> What do you think? Do these lab results represent one entity or two? .. FWIW: The business managers do not differentiate between the two types of labs (measured by PCP vs in-house). <<
What is the LOGICAL difference between them? Apparently, none. Ergo,
you use one table and column for the lab_type. Leave this encoding
open enough that you can extend it lately, when add other sources.
Numeric or Int to store prices
SQL Apprentice Question
am upgrading tables on a legacy SQL Server 2000 database
Some price data is stored in real datatypes, which is not correct.
Prices are in Euros and I just need 2 decimals.
I could convert them to numeric or use integer multiplying values per
100 then dividing on them on the presentation layer.
I think it could be faster, take less space and also more cross
compatible.
Is this a correct practice ?
Why should one avoid the MONEY data type for currency?
Celko Answers
>> Prices are in Euros and I just need 2 decimals. <<
There are websites with the rules for Euro conversions (in particular,
look at triangulation). I would use NUMERIC (n, 4) to be safe. And
you already know to avoid FLOAT, REAL, MONEY and SMALLMONEY data types.
>> Why should one avoid the MONEY data type for currency? <<
1) They are proprietary
2) They are proprietary
3) The MONEY datatype has rounding errors. Using more than one
operation (multiplication or division) on money columns will produce
severe rounding errors. A simple way to visualize money arithmetic is
to place a ROUND() function calls after every operation. For example,
Amount = (Portion / total_amt) * gross_amt
can be rewritten using money arithmetic as:
Amount = ROUND(ROUND(Portion/total_amt, 4) * gross_amt, 4)
Rounding to four decimal places might not seem an issue, until the
numbers you are using are greater than 10,000.
BEGIN
DECLARE @gross_amt MONEY,
@total_amt MONEY,
@my_part MONEY,
@money_result MONEY,
@float_result FLOAT,
@all_floats FLOAT;
SET @gross_amt = 55294.72;
SET @total_amt = 7328.75;
SET @my_part = 1793.33;
SET @money_result = (@my_part / @total_amt) * @gross_amt;
SET @float_result = (@my_part / @total_amt) * @gross_amt;
SET @Retult3 = (CAST(@my_part AS FLOAT)
/ CAST( @total_amt AS FLOAT))
* CAST(FLOAT, @gross_amtAS FLOAT);
SELECT @money_result, @float_result, @all_floats;
END;
@money_result = 13525.09 -- incorrect
@float_result = 13525.0885 -- incorrect
@all_floats = 13530.5038673171 -- correct, with a -5.42 error
am upgrading tables on a legacy SQL Server 2000 database
Some price data is stored in real datatypes, which is not correct.
Prices are in Euros and I just need 2 decimals.
I could convert them to numeric or use integer multiplying values per
100 then dividing on them on the presentation layer.
I think it could be faster, take less space and also more cross
compatible.
Is this a correct practice ?
Why should one avoid the MONEY data type for currency?
Celko Answers
>> Prices are in Euros and I just need 2 decimals. <<
There are websites with the rules for Euro conversions (in particular,
look at triangulation). I would use NUMERIC (n, 4) to be safe. And
you already know to avoid FLOAT, REAL, MONEY and SMALLMONEY data types.
>> Why should one avoid the MONEY data type for currency? <<
1) They are proprietary
2) They are proprietary
3) The MONEY datatype has rounding errors. Using more than one
operation (multiplication or division) on money columns will produce
severe rounding errors. A simple way to visualize money arithmetic is
to place a ROUND() function calls after every operation. For example,
Amount = (Portion / total_amt) * gross_amt
can be rewritten using money arithmetic as:
Amount = ROUND(ROUND(Portion/total_amt, 4) * gross_amt, 4)
Rounding to four decimal places might not seem an issue, until the
numbers you are using are greater than 10,000.
BEGIN
DECLARE @gross_amt MONEY,
@total_amt MONEY,
@my_part MONEY,
@money_result MONEY,
@float_result FLOAT,
@all_floats FLOAT;
SET @gross_amt = 55294.72;
SET @total_amt = 7328.75;
SET @my_part = 1793.33;
SET @money_result = (@my_part / @total_amt) * @gross_amt;
SET @float_result = (@my_part / @total_amt) * @gross_amt;
SET @Retult3 = (CAST(@my_part AS FLOAT)
/ CAST( @total_amt AS FLOAT))
* CAST(FLOAT, @gross_amtAS FLOAT);
SELECT @money_result, @float_result, @all_floats;
END;
@money_result = 13525.09 -- incorrect
@float_result = 13525.0885 -- incorrect
@all_floats = 13530.5038673171 -- correct, with a -5.42 error
sort
SQL Apprentice Question
I Have Table Num As Int , Name As NvarChar(20) I Need Trigger to Resort The
Field Num When I Change The num
Num Name
1 aaaaa
2 bbbbb
3 cccccc
4 ddddd
I Want when I Change the Num 1 To 2 Came Like
Num Name
1 bbbbb
2 aaaaaa
3 cccccc
4 ddddd
Celko Answers
Let's make this problem more concrete. You want to put automobiles
into numbered parking spaces and move them around.
CREATE TABLE Motorpool
(parking_space INTEGER NOT NULL PRIMARY KEY
CHECK (parking_space > 0),
vin CHAR(170) NOT NULL);
Re-arrange the display order based on the parking_space column:
CREATE PROCEDURE SwapVehicles (@old_parking_space INTEGER,
@new_parking_space INTEGER)
AS
UPDATE Motorpool
SET parking_space
= CASE parking_space
WHEN @old_parking_space
THEN @new_parking_space
ELSE parking_space + SIGN(@old_parking_space - @new_pos)
END
WHERE parking_space BETWEEN @old_parking_space AND @new_parking_space
OR parking_space BETWEEN @new_parking_space AND @old_parking_space;
When you want to drop a few rows, remember to close the gaps with this:
CREATE PROCEDURE CloseMotorpoolGaps()
AS
UPDATE Motorpool
SET parking_space
= (SELECT COUNT (M1.parking_space)
FROM Motorpool AS M1
WHERE M1.parking_space <= Motorpool.parking_space);
I Have Table Num As Int , Name As NvarChar(20) I Need Trigger to Resort The
Field Num When I Change The num
Num Name
1 aaaaa
2 bbbbb
3 cccccc
4 ddddd
I Want when I Change the Num 1 To 2 Came Like
Num Name
1 bbbbb
2 aaaaaa
3 cccccc
4 ddddd
Celko Answers
Let's make this problem more concrete. You want to put automobiles
into numbered parking spaces and move them around.
CREATE TABLE Motorpool
(parking_space INTEGER NOT NULL PRIMARY KEY
CHECK (parking_space > 0),
vin CHAR(170) NOT NULL);
Re-arrange the display order based on the parking_space column:
CREATE PROCEDURE SwapVehicles (@old_parking_space INTEGER,
@new_parking_space INTEGER)
AS
UPDATE Motorpool
SET parking_space
= CASE parking_space
WHEN @old_parking_space
THEN @new_parking_space
ELSE parking_space + SIGN(@old_parking_space - @new_pos)
END
WHERE parking_space BETWEEN @old_parking_space AND @new_parking_space
OR parking_space BETWEEN @new_parking_space AND @old_parking_space;
When you want to drop a few rows, remember to close the gaps with this:
CREATE PROCEDURE CloseMotorpoolGaps()
AS
UPDATE Motorpool
SET parking_space
= (SELECT COUNT (M1.parking_space)
FROM Motorpool AS M1
WHERE M1.parking_space <= Motorpool.parking_space);
Monday, June 12, 2006
Help required to update SQL
SQL Apprentice Question
have a table named Car with the field name as Position. I want to
update the Position field. If i enter a new value as 4 for position
which already exist, then the existing value 4 and all the below items
like 5,6 and 7 must be incremented by 1
Position
1
2
3
4
5
6
7
How can this be achieved
Celko Answers
>> I have a table named Car with the field [sic] name as position. I want to update the position field [sic]. <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
Your vague narrative is also wrong. Columns are not fields. POSITION()
s a reserved word in Standard SQL. Since you used a singular name, you
must have one car; otherwise you would have used a collective or plural
noun.
>> If I enter a new value as 4 for space_nbr which already exist, then the existing value 4 and all the below items like 5, 6 and 7 must be incremented by 1 <<
Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; there is no sequential access or
ordering in an RDBMS, so "first", "next" , "below" and "last" are
totally meaningless.
I am going to make a guess at what you meant. Do you have a motorpool
in which you assign parking spaces?
CREATE TABLE Motorpool
(space_nbr INTEGER NOT NULL PRIMARY KEY
CHECK (space_nbr > 0),
vin CHAR(17) NOT NULL);
Re-arrange the display order based on the space_nbr column:
CREATE PROCEDURE SwapParkingSpacees (@old_space_nbr INTEGER,
@new_space_nbr INTEGER)
AS
UPDATE Motorpool
SET space_nbr
= CASE space_nbr
WHEN @old_space_nbr
THEN @new_space_nbr
ELSE space_nbr + SIGN(@old_space_nbr - @new_pos)
END
WHERE space_nbr BETWEEN @old_space_nbr AND @new_space_nbr
OR space_nbr BETWEEN @new_space_nbr AND @old_space_nbr;
When you want to drop a few rows, remember to close the gaps with this:
CREATE PROCEDURE CloseParkingSpaceGaps()
AS
UPDATE Motorpool
SET space_nbr
= (SELECT COUNT (F1.space_nbr)
FROM Motorpool AS F1
WHERE F1.space_nbr <= Motorpool.space_nbr);
To insert a new car into the motorpool, add the new vehicle to the "end
of the line" and then swap it with the target parking space.
have a table named Car with the field name as Position. I want to
update the Position field. If i enter a new value as 4 for position
which already exist, then the existing value 4 and all the below items
like 5,6 and 7 must be incremented by 1
Position
1
2
3
4
5
6
7
How can this be achieved
Celko Answers
>> I have a table named Car with the field [sic] name as position. I want to update the position field [sic]. <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.
Your vague narrative is also wrong. Columns are not fields. POSITION()
s a reserved word in Standard SQL. Since you used a singular name, you
must have one car; otherwise you would have used a collective or plural
noun.
>> If I enter a new value as 4 for space_nbr which already exist, then the existing value 4 and all the below items like 5, 6 and 7 must be incremented by 1 <<
Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; there is no sequential access or
ordering in an RDBMS, so "first", "next" , "below" and "last" are
totally meaningless.
I am going to make a guess at what you meant. Do you have a motorpool
in which you assign parking spaces?
CREATE TABLE Motorpool
(space_nbr INTEGER NOT NULL PRIMARY KEY
CHECK (space_nbr > 0),
vin CHAR(17) NOT NULL);
Re-arrange the display order based on the space_nbr column:
CREATE PROCEDURE SwapParkingSpacees (@old_space_nbr INTEGER,
@new_space_nbr INTEGER)
AS
UPDATE Motorpool
SET space_nbr
= CASE space_nbr
WHEN @old_space_nbr
THEN @new_space_nbr
ELSE space_nbr + SIGN(@old_space_nbr - @new_pos)
END
WHERE space_nbr BETWEEN @old_space_nbr AND @new_space_nbr
OR space_nbr BETWEEN @new_space_nbr AND @old_space_nbr;
When you want to drop a few rows, remember to close the gaps with this:
CREATE PROCEDURE CloseParkingSpaceGaps()
AS
UPDATE Motorpool
SET space_nbr
= (SELECT COUNT (F1.space_nbr)
FROM Motorpool AS F1
WHERE F1.space_nbr <= Motorpool.space_nbr);
To insert a new car into the motorpool, add the new vehicle to the "end
of the line" and then swap it with the target parking space.
Concat Null Yields Null Woes..
SQL Apprentice Question
I have a bunch of views that concat fileds, some of which allow nulls
so I am using the isNull function when joining them. I dont want to
have to do this, I want to set the database so that concat null does
not yield null for all views and procedures.
How do I do this this in EM ?
I read and tried a bunch of stuff on this but nothings worked so far.
Please dont reply telling me to use the concat null yields null
setting - I think I know what needs doing - but I dont know "How" to do
it!
Celko Answers
>> I have a bunch of views that concat fields [sic], some of which allow NULLs so I am using the ISNULL() function when joining them. I dont want to have to do this, I want to set the database so that concat NULL does not yield NULL for all views and procedures. <<
Oh, you want to write your own language and not bother with SQL! This
behavior is one of many reasons that columns are not anything like
fields and why I jump on newbies to actually read a book about the
language before they start coding.
>> Please dont reply telling me to use the concat null yields null setting - I think I know what needs doing - but I dont know "How" to do <<
Nobody should be so irresponsible as to give you that advice. Your
code would not port, would not work properly, etc. what needs doing
is a bit more education on your part instead of looking for kludges to
save yourself some typing. Also, why aren't you using QA or a code
editor instead of EM?
SQL programmers think of the schema as a whole. The first place to
look is the DDL and the Data Dictionary (which you probably do not
have, if you have that many NULLs). Which of these columns really
should be blanks, empty strings or other defaults and not NULL-able at
all? I will bet most of them, based on two decades of cleaning up
SQL.
You will find that most bad DML are kludges made in response to bad
DDL.
I have a bunch of views that concat fileds, some of which allow nulls
so I am using the isNull function when joining them. I dont want to
have to do this, I want to set the database so that concat null does
not yield null for all views and procedures.
How do I do this this in EM ?
I read and tried a bunch of stuff on this but nothings worked so far.
Please dont reply telling me to use the concat null yields null
setting - I think I know what needs doing - but I dont know "How" to do
it!
Celko Answers
>> I have a bunch of views that concat fields [sic], some of which allow NULLs so I am using the ISNULL() function when joining them. I dont want to have to do this, I want to set the database so that concat NULL does not yield NULL for all views and procedures. <<
Oh, you want to write your own language and not bother with SQL! This
behavior is one of many reasons that columns are not anything like
fields and why I jump on newbies to actually read a book about the
language before they start coding.
>> Please dont reply telling me to use the concat null yields null setting - I think I know what needs doing - but I dont know "How" to do <<
Nobody should be so irresponsible as to give you that advice. Your
code would not port, would not work properly, etc. what needs doing
is a bit more education on your part instead of looking for kludges to
save yourself some typing. Also, why aren't you using QA or a code
editor instead of EM?
SQL programmers think of the schema as a whole. The first place to
look is the DDL and the Data Dictionary (which you probably do not
have, if you have that many NULLs). Which of these columns really
should be blanks, empty strings or other defaults and not NULL-able at
all? I will bet most of them, based on two decades of cleaning up
SQL.
You will find that most bad DML are kludges made in response to bad
DDL.
INvariant part inside SELECT
SQL Apprentice Question
Can we do this trick and if yes then how? Just schematically: the SP should
return the number of records if the parameter @Count=1, if not, then the
records themselves. The problem is that there is some complicated JOIN and
the whole set of WHERE clauses that I wouldn't like to repeat in two
different queries looking almost identically excluding the main SELECT part.
The idea described below doesn't work.
--Parameter
Declare @Count bit
SET @Count = 1
SELECT
CASE
WHEN @Count = 1
THEN pe.*, pn.*
ELSE COUNT(*)
END
...
FROM ...
INNER JOIN ... ON ...
WHERE ...
Any ideas?
Celko Answers
>> Just schematically: the SP should return the number of records [sic] if the parameter @Count=1, if not, then the records [sic] themselves. <<
Did you ever have a software engineering course? Remember cohesion?
The idea that a properly designed code module will perform one
well-defined task. Good programmers do not write things that return
the square root of a number or translate Flemish depending on a
parameter.
>> The idea described below doesn't work... <<
Why did you make your low-level BIT flag a reserved word? Why are you
thinking in terms of assembly language style flags and variant records
instead of rows?
>> --Parameter
DECLARE @Count bit
SET @Count = 1
SELECT
CASE
WHEN @Count = 1
THEN pe.*, pn.*
ELSE COUNT(*)
END
...
FROM ...
INNER JOIN ... ON ...
WHERE ... ; <<
CASE is an expression and not a control flow device. You can use an
IF-THEN-ELSE construct in T-SQL to mimic procedural coding with variant
records instead of using declarative coding.
Did you also notice that you want to return one column and then want to
return two columns? Arow in a relational table always has a fixed
number of columns, unlike records in a file. Basically, you are still
writing COBOL or some other procedural file-oriented language, but you
are doing it in SQL.
>> The problem is that there is some complicated JOIN and the whole set of WHERE clauses that I wouldn't like to repeat in two different queries looking almost identically excluding the main SELECT part. <<
This is a simple matter of cut & paste, not the end of the world.
However, if you are just looking for a newsgroup kludge instead of a
real answer in one query, try:
SELECT
CASE WHEN @assembly_language_flag = 1
THEN 'violated cohesion'
ELSE COUNT(*) END AS foobar,
CASE WHEN @assembly_language_flag = 1
THEN PA.x
ELSE 'violated cohesion' END AS x,
etc.
FROM ..
Boy that is awful, isn't it?
Can we do this trick and if yes then how? Just schematically: the SP should
return the number of records if the parameter @Count=1, if not, then the
records themselves. The problem is that there is some complicated JOIN and
the whole set of WHERE clauses that I wouldn't like to repeat in two
different queries looking almost identically excluding the main SELECT part.
The idea described below doesn't work.
--Parameter
Declare @Count bit
SET @Count = 1
SELECT
CASE
WHEN @Count = 1
THEN pe.*, pn.*
ELSE COUNT(*)
END
...
FROM ...
INNER JOIN ... ON ...
WHERE ...
Any ideas?
Celko Answers
>> Just schematically: the SP should return the number of records [sic] if the parameter @Count=1, if not, then the records [sic] themselves. <<
Did you ever have a software engineering course? Remember cohesion?
The idea that a properly designed code module will perform one
well-defined task. Good programmers do not write things that return
the square root of a number or translate Flemish depending on a
parameter.
>> The idea described below doesn't work... <<
Why did you make your low-level BIT flag a reserved word? Why are you
thinking in terms of assembly language style flags and variant records
instead of rows?
>> --Parameter
DECLARE @Count bit
SET @Count = 1
SELECT
CASE
WHEN @Count = 1
THEN pe.*, pn.*
ELSE COUNT(*)
END
...
FROM ...
INNER JOIN ... ON ...
WHERE ... ; <<
CASE is an expression and not a control flow device. You can use an
IF-THEN-ELSE construct in T-SQL to mimic procedural coding with variant
records instead of using declarative coding.
Did you also notice that you want to return one column and then want to
return two columns? Arow in a relational table always has a fixed
number of columns, unlike records in a file. Basically, you are still
writing COBOL or some other procedural file-oriented language, but you
are doing it in SQL.
>> The problem is that there is some complicated JOIN and the whole set of WHERE clauses that I wouldn't like to repeat in two different queries looking almost identically excluding the main SELECT part. <<
This is a simple matter of cut & paste, not the end of the world.
However, if you are just looking for a newsgroup kludge instead of a
real answer in one query, try:
SELECT
CASE WHEN @assembly_language_flag = 1
THEN 'violated cohesion'
ELSE COUNT(*) END AS foobar,
CASE WHEN @assembly_language_flag = 1
THEN PA.x
ELSE 'violated cohesion' END AS x,
etc.
FROM ..
Boy that is awful, isn't it?
stop using dynamic sql
SQL Apprentice Question
The following stored procedure (SP) is using a dynamic sql to build the query.
How can this be written using a standard sql. i.e. NOT dynamically being
built.
Initially I thought I can have something like the following query but it
does not seem to be working for when the where caluse parameters are not
passed.
So I ended up using the dynamic sql as it returns the correct data.
Can you see how the SP can be altered please?
Thanks
------------------------this query does not return the correct data where
the where parameters are not passed.------------
select
*
from
tbl_Management
where
([Year] is null or [Year] = @Year)
AND
(YearPeriod is null or YearPeriod = @YearPeriod)
AND
(AreaCode is null or AreaCode = @AreaCode)
---------------------------------------------------------------
create procedure usp_PatientManagementAdminGet
@Year int = null,
@YearPeriod int = null,
@AreaCode varchar(3) = null
as
declare @sql varchar(1000)
set @sql = 'select'
set @sql = @sql + ' ID'
set @sql = @sql + ' ,AreaCode'
set @sql = @sql + ' ,Year'
set @sql = @sql + ' ,YearPeriod'
set @sql = @sql + ' ,A1=A2+A3'
set @sql = @sql + ' ,A2,A3'
set @sql = @sql + ' ,B1=B2+B3'
set @sql = @sql + ' ,X1=convert(int, ((B2+B3)*1.0/(A2+A3))*100)'
set @sql = @sql + ' from'
set @sql = @sql + ' tbl_Management'
set @sql = @sql + ' where'
if (@Year > 0)
begin
set @sql = @sql + ' [Year] = ' + convert(varchar(4), @Year)
set @sql = @sql + ' AND'
end
if (@YearPeriod > 0)
begin
set @sql = @sql + ' YearPeriod = ' + convert(varchar(2), @YearPeriod)
set @sql = @sql + ' AND'
end
if (@ProgrammeAreaCode is not null)
begin
set @sql = @sql + ' AreaCode = ''' + convert(varchar(3), @AreaCode) + ''''
set @sql = @sql + ' AND'
end
--trim off the last AND...
set @sql = left(@sql, len(@sql) - 3)
exec sp_sqlexec @sql
Celko Answers
Your real problem is that you still think in procedural terms and not
in declarative set-oriented programming.
Why did you think of dynamic SQL at all? (answer: you never un-learned
low level scripting or worked with a compiled language).
Why did you put redundant prefixes on data element names (answer: you
never un-learned BASIC, or other weakly typed languages or read
ISO-11179 naming rules).
Why did you avoid temporal data types? Answer: they do not exist in the
procedural language you mimic in SQL.
You do not understand data types and other constraints. Is there an
area code that varies in length? There will be in your data model!
Poor design invites garbage.
Why did you build the dynamic string in steps? Answer: procedural code
is done in steps; declarative code is only a few statements and they
execute all at once. Your bad code shows characteristic errors.
You have a magical, universal, vague and non-relational "id" in a
table because you do not know what a relational key is. It begs the
questions "id of what? How do I validate it? How do I verify it?"
Surely, you did not use IDENTITY as a key!! That would mimic a
sequential file and not anythign like an RDBMS.
Did you know that year is a reserved word in SQL? You would know that
if you had bothered to run a code check. But I will bet that you have
no such tool and that you do not keep a data dictionary either.
Why did you CAST() things so often in a strongly typed language?
Because you have never written in one before. Why do you use the
proprietary CONVERT to keep the code non-portable?
Why do you think that a1, b1, etc. are good data element names? Are
they standard in your trade? I doubt it. But you would know that if
you had a data dictionary.
Just cleaning up the code, we get some thing like this:
CREATE PROCEDURE GetPatientManagementAdmin
(@my_something_year INTEGER = NULL,
@my_year_period INTEGER = NULL,
@my_area_code CHAR(3) = NULL)
AS
SELECT patient_id, -- wild guess that this exists
area_code, something_year, year_period,
(a2+a3) AS a1, a2, a3,
(b2+b3) AS b1,
CAST(((b2+b3)*1.0/(a2+a3))*100)AS INTEGER)
AS x1 -- better name for all of this??
FROM Patient_Management
WHERE COALESCE (@my_ something_year, something_year) = something_year
AND COALESCE (@my_year_period, year_period) = year_period
AND = COALESCE (area_code, @my_area_code) = area_code;
But since you did not post DDL, we cannot solve the real problem; you
have a bad schema design. You are using integers for years!! You
should use (begin_date, end_date) pairs for temporal ranges. That would
probably mean an auxiliary calendar year That requires that you think
in terms of data solutions instead of procedural code.
Your year parameter probably ought to default to CURRENT_TIMESTAMP.
>> Also, if there is no practical useable natural key - like the Members table of a user group then using a IDENTITY for that is fine so long as you don't mind gaps. <<
Let's do the whole list of assumptions
1) do not mind gaps
2) do not need validation
3) do not need verification
4) will never scale up (Tony's no growth business model)
5) will never port your code- - no upgrades either! Daniel Wetzler
found out the hard way that IDENTITY changed behavior in SQL 2000 and
SQL 2005. If you perform the statement below you get only one dataset
which has the described properties.
SELECT DISTINCT IDENTITY (INTEGER) AS fake_id, title1, ..
FROM Foobar
WHERE title1 IS NOT NULL
AND ..
The IDENTITY function makes each row unique so DISTINCT doesn't
eliminate the duplicates in this case. Interestingly, this behavior
seems to have changed in SQL Server 2005. If I run this as a SELECT
INTO on 2005, the execution plan computes the IDENTITY value after
DISTINCT.
For 2000 the kludge is a bit hard to see. The following should insert
just one row into the target table.
CREATE TABLE Foobar (title1 VARCHAR(10), ..);
INSERT INTO Foobar VALUES ('1', ..);
INSERT INTO Foobar VALUES ('1', ..);
SELECT IDENTITY (INTEGER) AS fake_id, title1, ..
INTO Foobar2
FROM (SELECT DISTINCT title1, ..
FROM Foobar
WHERE ..);
Since we are dealling with a proprietary feature, this is subject to
change without noti ce again.
Of course, asking the members for their email address as their
identifier is never, ever done in the real world. Oh, wait -- it is!!
Do you suppose that there are other industry standard, well-known
identifiers in the world? Gasp!
Why, that would get around the problems mentioned above and be damn
near universal! But that is not as fast to code (e.g, requires a
CHECK() for a valid email address) as Tony's magical sequential number.
All you seem to care about is how fast you can code something, not
about the lifetime costs of maintaining the system.
The following stored procedure (SP) is using a dynamic sql to build the query.
How can this be written using a standard sql. i.e. NOT dynamically being
built.
Initially I thought I can have something like the following query but it
does not seem to be working for when the where caluse parameters are not
passed.
So I ended up using the dynamic sql as it returns the correct data.
Can you see how the SP can be altered please?
Thanks
------------------------this query does not return the correct data where
the where parameters are not passed.------------
select
*
from
tbl_Management
where
([Year] is null or [Year] = @Year)
AND
(YearPeriod is null or YearPeriod = @YearPeriod)
AND
(AreaCode is null or AreaCode = @AreaCode)
---------------------------------------------------------------
create procedure usp_PatientManagementAdminGet
@Year int = null,
@YearPeriod int = null,
@AreaCode varchar(3) = null
as
declare @sql varchar(1000)
set @sql = 'select'
set @sql = @sql + ' ID'
set @sql = @sql + ' ,AreaCode'
set @sql = @sql + ' ,Year'
set @sql = @sql + ' ,YearPeriod'
set @sql = @sql + ' ,A1=A2+A3'
set @sql = @sql + ' ,A2,A3'
set @sql = @sql + ' ,B1=B2+B3'
set @sql = @sql + ' ,X1=convert(int, ((B2+B3)*1.0/(A2+A3))*100)'
set @sql = @sql + ' from'
set @sql = @sql + ' tbl_Management'
set @sql = @sql + ' where'
if (@Year > 0)
begin
set @sql = @sql + ' [Year] = ' + convert(varchar(4), @Year)
set @sql = @sql + ' AND'
end
if (@YearPeriod > 0)
begin
set @sql = @sql + ' YearPeriod = ' + convert(varchar(2), @YearPeriod)
set @sql = @sql + ' AND'
end
if (@ProgrammeAreaCode is not null)
begin
set @sql = @sql + ' AreaCode = ''' + convert(varchar(3), @AreaCode) + ''''
set @sql = @sql + ' AND'
end
--trim off the last AND...
set @sql = left(@sql, len(@sql) - 3)
exec sp_sqlexec @sql
Celko Answers
Your real problem is that you still think in procedural terms and not
in declarative set-oriented programming.
Why did you think of dynamic SQL at all? (answer: you never un-learned
low level scripting or worked with a compiled language).
Why did you put redundant prefixes on data element names (answer: you
never un-learned BASIC, or other weakly typed languages or read
ISO-11179 naming rules).
Why did you avoid temporal data types? Answer: they do not exist in the
procedural language you mimic in SQL.
You do not understand data types and other constraints. Is there an
area code that varies in length? There will be in your data model!
Poor design invites garbage.
Why did you build the dynamic string in steps? Answer: procedural code
is done in steps; declarative code is only a few statements and they
execute all at once. Your bad code shows characteristic errors.
You have a magical, universal, vague and non-relational "id" in a
table because you do not know what a relational key is. It begs the
questions "id of what? How do I validate it? How do I verify it?"
Surely, you did not use IDENTITY as a key!! That would mimic a
sequential file and not anythign like an RDBMS.
Did you know that year is a reserved word in SQL? You would know that
if you had bothered to run a code check. But I will bet that you have
no such tool and that you do not keep a data dictionary either.
Why did you CAST() things so often in a strongly typed language?
Because you have never written in one before. Why do you use the
proprietary CONVERT to keep the code non-portable?
Why do you think that a1, b1, etc. are good data element names? Are
they standard in your trade? I doubt it. But you would know that if
you had a data dictionary.
Just cleaning up the code, we get some thing like this:
CREATE PROCEDURE GetPatientManagementAdmin
(@my_something_year INTEGER = NULL,
@my_year_period INTEGER = NULL,
@my_area_code CHAR(3) = NULL)
AS
SELECT patient_id, -- wild guess that this exists
area_code, something_year, year_period,
(a2+a3) AS a1, a2, a3,
(b2+b3) AS b1,
CAST(((b2+b3)*1.0/(a2+a3))*100)AS INTEGER)
AS x1 -- better name for all of this??
FROM Patient_Management
WHERE COALESCE (@my_ something_year, something_year) = something_year
AND COALESCE (@my_year_period, year_period) = year_period
AND = COALESCE (area_code, @my_area_code) = area_code;
But since you did not post DDL, we cannot solve the real problem; you
have a bad schema design. You are using integers for years!! You
should use (begin_date, end_date) pairs for temporal ranges. That would
probably mean an auxiliary calendar year That requires that you think
in terms of data solutions instead of procedural code.
Your year parameter probably ought to default to CURRENT_TIMESTAMP.
>> Also, if there is no practical useable natural key - like the Members table of a user group then using a IDENTITY for that is fine so long as you don't mind gaps. <<
Let's do the whole list of assumptions
1) do not mind gaps
2) do not need validation
3) do not need verification
4) will never scale up (Tony's no growth business model)
5) will never port your code- - no upgrades either! Daniel Wetzler
found out the hard way that IDENTITY changed behavior in SQL 2000 and
SQL 2005. If you perform the statement below you get only one dataset
which has the described properties.
SELECT DISTINCT IDENTITY (INTEGER) AS fake_id, title1, ..
FROM Foobar
WHERE title1 IS NOT NULL
AND ..
The IDENTITY function makes each row unique so DISTINCT doesn't
eliminate the duplicates in this case. Interestingly, this behavior
seems to have changed in SQL Server 2005. If I run this as a SELECT
INTO on 2005, the execution plan computes the IDENTITY value after
DISTINCT.
For 2000 the kludge is a bit hard to see. The following should insert
just one row into the target table.
CREATE TABLE Foobar (title1 VARCHAR(10), ..);
INSERT INTO Foobar VALUES ('1', ..);
INSERT INTO Foobar VALUES ('1', ..);
SELECT IDENTITY (INTEGER) AS fake_id, title1, ..
INTO Foobar2
FROM (SELECT DISTINCT title1, ..
FROM Foobar
WHERE ..);
Since we are dealling with a proprietary feature, this is subject to
change without noti ce again.
Of course, asking the members for their email address as their
identifier is never, ever done in the real world. Oh, wait -- it is!!
Do you suppose that there are other industry standard, well-known
identifiers in the world? Gasp!
Why, that would get around the problems mentioned above and be damn
near universal! But that is not as fast to code (e.g, requires a
CHECK() for a valid email address) as Tony's magical sequential number.
All you seem to care about is how fast you can code something, not
about the lifetime costs of maintaining the system.
SQL Statement question, eliminating a field in Group by Clause
SQL Apprentice Question
Is there a way not to use Group By on some fields when that field is only in
the CASE. For example, in the statement below I do not want to use
InvoiceType in the CASE because I do not want it to be grouped by
InvoiceType. But I want a Total of all invoices, as Subtotal, if the type if
1 it is positive if it is 2 it is negative.
SELECT Products.ProductName,
SubTotal = CASE WHEN InvoiceType=1
THEN
SUM(ISNULL(CustomerInvoiceDetails.UnitPrice,0)*ISNULL(CustomerInvoices.CurrencyRate,1)*ISNULL(CustomerInvoiceDetails.Quantity,0))
WHEN InvoiceType=2 THEN -SUM(ISNULL(CustomerInvoiceDetails.UnitPrice,0)*
ISNULL(CustomerInvoices.CurrencyRate,1)*ISNULL(CustomerInvoiceDetails.Quantity,0))
ELSE
SUM(ISNULL(CustomerInvoiceDetails.UnitPrice,0)*ISNULL(CustomerInvoices.CurrencyRate,1)*ISNULL(CustomerInvoiceDetails.Quantity,0))
END
FROM [Customer Invoices] CustomerInvoices LEFT JOIN [Customer Invoice
Details] CustomerInvoiceDetails
ON CustomerInvoices.InvoiceID = CustomerInvoiceDetails.InvoiceID
GROUP BY Products.ProductName,InvoiceType
Celko Answers
>> Is there a way not to use GROUP BY on some fields [sic] when that field [sic] is only in the CASE [expression]. <<
You might want to read a book on how a GROUP BY works, on why columns
are not anything like fields and how a CASE expression works. I also
hope that you stop using the proprietary ISNULL, equal sign, etc. and
write standard SQL some day.
Looking at the code you posted, I have to ask why the heck did you
allow NULLs in the tables?? You are cleaning up more NULLs in one
query than I have seen in entire databases for major automobile
companies. Why do you have order without details, as shown by the LEFT
OUTER JOIN?
If your schema is that screwed up, you have real data integrity
problems.
SELECT P.product_name,
SUM (CASE I.invoice_type
WHEN 2
THEN -D.unit_price * I.currency_rate * D.quantity
ELSE D.unit_price * I.currency_rate * D.quantity
END) AS product_total
FROM CustomerInvoices AS I,
CustomerInvoiceDetails AS D,
Products AS P
WHERE I.invoice_id = D.invoice_id
AND D.product_id = P.product_id
GROUP BY P.product_name, I.invoice_type;
Of course you did not bother to post DDL, so this is a guess based on
that missing DDL.
Is there a way not to use Group By on some fields when that field is only in
the CASE. For example, in the statement below I do not want to use
InvoiceType in the CASE because I do not want it to be grouped by
InvoiceType. But I want a Total of all invoices, as Subtotal, if the type if
1 it is positive if it is 2 it is negative.
SELECT Products.ProductName,
SubTotal = CASE WHEN InvoiceType=1
THEN
SUM(ISNULL(CustomerInvoiceDetails.UnitPrice,0)*ISNULL(CustomerInvoices.CurrencyRate,1)*ISNULL(CustomerInvoiceDetails.Quantity,0))
WHEN InvoiceType=2 THEN -SUM(ISNULL(CustomerInvoiceDetails.UnitPrice,0)*
ISNULL(CustomerInvoices.CurrencyRate,1)*ISNULL(CustomerInvoiceDetails.Quantity,0))
ELSE
SUM(ISNULL(CustomerInvoiceDetails.UnitPrice,0)*ISNULL(CustomerInvoices.CurrencyRate,1)*ISNULL(CustomerInvoiceDetails.Quantity,0))
END
FROM [Customer Invoices] CustomerInvoices LEFT JOIN [Customer Invoice
Details] CustomerInvoiceDetails
ON CustomerInvoices.InvoiceID = CustomerInvoiceDetails.InvoiceID
GROUP BY Products.ProductName,InvoiceType
Celko Answers
>> Is there a way not to use GROUP BY on some fields [sic] when that field [sic] is only in the CASE [expression]. <<
You might want to read a book on how a GROUP BY works, on why columns
are not anything like fields and how a CASE expression works. I also
hope that you stop using the proprietary ISNULL, equal sign, etc. and
write standard SQL some day.
Looking at the code you posted, I have to ask why the heck did you
allow NULLs in the tables?? You are cleaning up more NULLs in one
query than I have seen in entire databases for major automobile
companies. Why do you have order without details, as shown by the LEFT
OUTER JOIN?
If your schema is that screwed up, you have real data integrity
problems.
SELECT P.product_name,
SUM (CASE I.invoice_type
WHEN 2
THEN -D.unit_price * I.currency_rate * D.quantity
ELSE D.unit_price * I.currency_rate * D.quantity
END) AS product_total
FROM CustomerInvoices AS I,
CustomerInvoiceDetails AS D,
Products AS P
WHERE I.invoice_id = D.invoice_id
AND D.product_id = P.product_id
GROUP BY P.product_name, I.invoice_type;
Of course you did not bother to post DDL, so this is a guess based on
that missing DDL.
Most efficient way to run update query
SQL Apprentice Question
Hi all,
Any thoughts on the best way to run an update query to update a specific
list of records where all records get updated to same thing. I would think
a temp table to hold the list would be best but am also looking at the
easiest for an end user to run. The list of items is over 7000
Example:
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-LBK'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-LYE'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-XLBK'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-XLYE'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '002-LGR'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '002-LRE'
All records get set to same. I tried using an IN list but this was
significantly slower:
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS'
where item_no in
('001-LBK',
'001-LYE',
'001-XLBK',
'001-XLYE',
'002-LGR',
'002-LRE')
Celko Answers
Put the working data into a working table with an index on it and try
this:
UPDATE Imitmidx
SET activity_cd = 'O',
activity_dt = '2006-06-01',
prod_cat = 'OBS'
WHERE item_no
IN (SELECT item_no FROM WorkingData);
You can also add constraints for data scrubbing to WorkingData.
Hi all,
Any thoughts on the best way to run an update query to update a specific
list of records where all records get updated to same thing. I would think
a temp table to hold the list would be best but am also looking at the
easiest for an end user to run. The list of items is over 7000
Example:
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-LBK'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-LYE'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-XLBK'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '001-XLYE'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '002-LGR'
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS' where item_no = '002-LRE'
All records get set to same. I tried using an IN list but this was
significantly slower:
update imitmidx_sql set activity_cd = 'O', activity_dt = '20060601',
prod_cat = 'OBS'
where item_no in
('001-LBK',
'001-LYE',
'001-XLBK',
'001-XLYE',
'002-LGR',
'002-LRE')
Celko Answers
Put the working data into a working table with an index on it and try
this:
UPDATE Imitmidx
SET activity_cd = 'O',
activity_dt = '2006-06-01',
prod_cat = 'OBS'
WHERE item_no
IN (SELECT item_no FROM WorkingData);
You can also add constraints for data scrubbing to WorkingData.
Google API to make text search into database
SQL Apprentice QuestionI'd like to know if it's possible to use Google APIs to make a text
search into DBs such Oracle or Sql Server.
The trend, certainly within the Microsoft space is to use SQL to query
non-structred data that offers a standard language people can use, most
developers know SQL which is why they have taken that direction, you may
also want to google the LINQ project which is even more exciting.
SQL is quite definitely not and should not imho be limited to query
relational data as definied by a 'rdbms'.
As somebody who supports standards I would have thought you'd appreciate the
'standard' interface to 'data' be it relational or non-structured.
Celko Answers
Do not use SQL for text and document searching; get a tool built for
this job. If you are doing this only once, then yhou can use whatever
your RDBMS product has for strings.
>> The trend, certainly within the Microsoft space is to use SQL to query non-structured data that offers a standard language people can use, most developers know SQL which is why they have taken that direction ..<<
I would say that Microsoft is trying to lock people into their products
and their proprietary tools, not a love of standards.
The developers who work with text are not SQL programmers; they have a
whole different mindset. Our mindset is strongly valued logic and
syntax rules; they have fuzzy logics and semantics. Knowledge
management journals are nothings like TODS and SigMod journals. Etc.
>> SQL is quite definitely not and should not imho be limited to query relational data as defined by a 'rdbms'. <<
What else should a *RELATIONAL DATA base system" do besides , well,
relational data? List processing? Symbolic Algebra? Semantic info
processing? Graph structures? Chop tomatoes? The Swiss Army Knife
school of software!
>> As somebody who supports standards I would have thought you'd appreciate the 'standard' interface to 'data' be it relational or non-structured. <<
I do. The de jure standards for text searching were set up by NISO,
and the de facto ones by Lexus, Nexis, WestLaw and Google.
search into DBs such Oracle or Sql Server.
The trend, certainly within the Microsoft space is to use SQL to query
non-structred data that offers a standard language people can use, most
developers know SQL which is why they have taken that direction, you may
also want to google the LINQ project which is even more exciting.
SQL is quite definitely not and should not imho be limited to query
relational data as definied by a 'rdbms'.
As somebody who supports standards I would have thought you'd appreciate the
'standard' interface to 'data' be it relational or non-structured.
Celko Answers
Do not use SQL for text and document searching; get a tool built for
this job. If you are doing this only once, then yhou can use whatever
your RDBMS product has for strings.
>> The trend, certainly within the Microsoft space is to use SQL to query non-structured data that offers a standard language people can use, most developers know SQL which is why they have taken that direction ..<<
I would say that Microsoft is trying to lock people into their products
and their proprietary tools, not a love of standards.
The developers who work with text are not SQL programmers; they have a
whole different mindset. Our mindset is strongly valued logic and
syntax rules; they have fuzzy logics and semantics. Knowledge
management journals are nothings like TODS and SigMod journals. Etc.
>> SQL is quite definitely not and should not imho be limited to query relational data as defined by a 'rdbms'. <<
What else should a *RELATIONAL DATA base system" do besides , well,
relational data? List processing? Symbolic Algebra? Semantic info
processing? Graph structures? Chop tomatoes? The Swiss Army Knife
school of software!
>> As somebody who supports standards I would have thought you'd appreciate the 'standard' interface to 'data' be it relational or non-structured. <<
I do. The de jure standards for text searching were set up by NISO,
and the de facto ones by Lexus, Nexis, WestLaw and Google.
Subscribe to:
Posts (Atom)