SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, June 12, 2006

Why use a composite PK ever?

SQL Apprentice Question
In a fabulous book -- "Practical Issues in Database Management" -- the
fabulous author, Pascal Fabian, lists three reasons for choosing one
candidate PK over another: familiarity, stability, and simplicity. Just
started skimming through the book.

He notes further that those influenced by OO db design tend to use
simple, surrogate keys for PKs in all tables; that this is not
*precluded* by relational theory, but that it's somehow illicit.


I personally think it's a good rule of thumb to create surrogate keys
for almost all tables. Stability seems to be the single-most important
factor to consider. If the database can't uniquely identify a row,
what's the point? Choosing a surrogate key guarantees stability.


Why chance it on a composite key when you can be assured of uniqueness
with a sequence-generated surrogate key?


"Familiarity" seems like a spurious concern, and a poor tradeoff
against both stability(guaranteeing you are uniquely identifying rows)
and simplicity (with queries, and others intuiting your design).


Am I missing something? Why use a composite key *ever* (and by ever, I
mean for most purposes) aside from "familiarity?"


Could someone give a real-world example where "familiarity" is a
compelling reason to choose a composite PK, trumping both stability and
simplicity?


Finally, if there *are* any compelling reasons to prefer composite keys
in some situations, after how many attributes required to create a
composite key should one give up and create a surrogate key? I've seen
composite keys of up to 5 attributes and often wondered why that
particular decision was made. Seems inelegant, but then perhaps I'm
missing some other fundamental here.


It's a great book by the way. Should be required reading for most
database designers.



Celko Answers
>> [simple, surrogate keys for PKs in all tables] I personally think it's a good rule of thumb to create surrogate keys for almost all tables. <<


No, absolutley no. First of all, they do not create surrogate keys;
they create exposed physical locators, like IDENTITY or other
auto-numbering schemes. Let me maker an appeal to authority, with a
quote from Dr. Codd: "..Database users may cause the system to generate
or delete a surrogate, but they have no control over its value, nor is
its value ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410)
and Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp.
397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means never
used in queries, DRI or anything else that a user does.


Codd also wrote the following:


"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.


(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result
that some or all of the serial numbers might be changed.).


(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.


(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it has
ceased to have one (e.g. and applicant for a job and a retiree).


These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] -
is to introduce entity domains which contain system-assigned
surrogates. Database users may cause the system to generate or delete
a surrogate, but they have no control over its value, nor is its value
ever displayed to them....." (Codd in ACM TODS, pp 409-410).


References


Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434


A real surrogate key maintained by the RDBMS itself, like an index and
the user never sees it, much less uses it in his code. Ever use
hashing in Teradata?



>> Stability seems to be the single-most important factor to consider. If the database can't uniquely identify a row, what's the point? Choosing a surrogate key guarantees stability. <<


No, it makes the data unstable. You cannot validate or verify an
auto-numbering value. Think about it. I can look at an ISBN,
validate it with the check digit, then verify it with a search to
Amazon.com, B&N, Bowker, etc. in *any* database, on *any* SQL (or
non-SQL) engine.

By definition, a relational key is a subset of attributes of the
entity. You or your machinery do not invent it so much as discover it.



>> Why chance it on a composite key when you can be assured of uniqueness with a sequence-generated surrogate key? <<


Why is something you can verify more risk than something you cannot
verify? You use relational keys for data integrity, data portability,
code portability and the fact that if the composite key exists in the
real world, then you **must** model it in your schema. What is the
universal, verifiable single integer for a given (longitude, latitude)
pair? If you are a Douglas Adams fan the answer is 42! Always 42?

All you have done is waste storage and create weak redundancy.



>> "Familiarity" seems like a spurious concern, and a poor tradeoff against both stability (guaranteeing you are uniquely identifying rows) and simplicity (with queries, and others intuiting your design). <<


Wrong again. Familiarity, in the sense of codes that are universally
underestood by people in a given industry, make data exchange possible.
It also means that users can do "reasonableness" checks on data by
eye, with regular expressions, check digits and other programming
tools.


>> Could someone give a real-world example where "familiarity" is a compelling reason to choose a composite PK, trumping both stability and simplicity? <<


(longitude, latitude). Geographical hierarchies. Any co-ordinate
system ((x,y), (x,y,z), log-spiral, circular, etc.)


>> Finally, if there *are* any compelling reasons to prefer composite keys in some situations, after how many attributes required to create a composite key should one give up and create a surrogate key? <<


How big should a vehicle be? The real point is that if you have an
n-part composite key in the data model, you cannot make it go away with
a magic number (ever read the Khabal stuff that is a fad among movie
stars right now? Thinking there is a magic 17-digit Hebrew number that
God puts on everything is not science or logic).


>> 've seen composite keys of up to 5 attributes and often wondered why that particular decision was made. Seems inelegant, but then perhaps I'm missing some other fundamental here. <<


Data integrity and a correct data model. But people do not handle more
than five thigns very well, so that has been considered an upper limit
on human data processing ever since a guy named Brown wrote a paper on
it in the 1950's.

Let's look at the logical problems. First try to create a table with
two columns and try to make them both IDENTITY. If you cannot declare
more than one column to be of a certain data type, then that thing is
not a data type at all, by definition. It is a property which belongs
to the PHYSICAL table, not the LOGICAL data in the table. It also has
to be NULL-able to be a data type in SQL.


Next, create a table with one column and make it an IDENTITY. Now try
to insert, update and delete different numbers from it. If you cannot
insert, update and delete rows from a table, then it is not a table by
definition.


Finally create a simple table with one IDENTITY and a few other
columns. Use a few statements like


INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');


To put a few rows into the table and notice that the IDENTITY
sequentially numbered them in the order they were presented. If you
delete a row, the gap in the sequence is not filled in and the sequence
continues from the highest number that has ever been used in that
column in that particular table. This is how we did record numbers in
pre-allocated sequential files in the 1950's, by the way. A utility
program would then "pack" or "compress" the records that were flagged
as deleted or unused to move the empty space to the physical end of the
physical file. IDENTITY leaves the gaps.


But now use a statement with a query expression in it, like this:


INSERT INTO Foobar (a, b, c)
SELECT x, y, z
FROM Floob;


Since a query result is a table, and a table is a set which has no
ordering, what should the IDENTITY numbers be? The entire, whole,
completed set is presented to Foobar all at once, not a row at a time.
There are (n!) ways to number (n) rows, so which one do you pick? The
answer has been to use whatever the *physical* order of the result set
happened to be. That non-relational phrase "physical order" again!


But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different physical order. Indexes and statistics are not part of the
logical model.


Can you explain from a logical model viewpoint why the same rows in the
second query get different IDENTITY numbers? In the relational model,
they should be treated the same if all the values of all the attributes
are identical.


Using IDENTITY as a primary key is a sign that there is no data model,
only an imitation of a sequential file system. Since this "magic,
all-purpose, one-size-fits-all" pseudo-identifier exists only as a
result of the physical state of a particular piece of hardware at a
particular time as read by the current release of a particular database
product, how do you verify that an entity has such a number in the
reality you are modeling?


Another cute way to destroy data integrity:


BEGIN ATOMIC
DELETE FROM Foobar
WHERE id = <>;
INSERT INTO Foobar
VALUES ( <>)
END;


Logically this should do nothing, but since IDENTITY has gaps, it
trashes the data.


Now you are REALLY thinking about relations and keys instead of 1950's
sequential record numbering. Adding an IDENTITY column to either of
these tables as a candidate key would be dangerously redundant; one
query uses the IDENTITY and another uses the real key, and like a man
with two watches, you are never sure what time it is.


Daniel Wetzler found out the hard way that IDENTITY changed behavior in
SQL 2000 and SQL 2005. If you perform the statement below you get only
one dataset which has the described properties.


SELECT DISTINCT IDENTITY (INTEGER) AS fake_id, title1, ..
FROM Foobar
WHERE title1 IS NOT NULL
AND ..


The IDENTITY function makes each row unique so DISTINCT doesn't
eliminate the duplicates in this case. Interestingly, this behavior
seems to have changed in SQL Server 2005. If Iupoi run this as a SELECT
INTO on 2005, the execution plan computes the IDENTITY value after
DISTINCT.


For 2000 the kludge is a bit hard to see. The following should insert
just one row into the target table.


CREATE TABLE Foobar (title1 VARCHAR(10), ..);


INSERT INTO Foobar VALUES ('1', ..);
INSERT INTO Foobar VALUES ('1', ..);


SELECT IDENTITY (INTEGER) AS fake_id, title1, ..
INTO Foobar2
FROM (SELECT DISTINCT title1, ..
FROM Foobar
WHERE ..);


Since we are dealling with a proprietary ...

Tough SQL problem, need expert advice!!!

SQL Apprentice Question
I think I am having a very tough problem, I need some expert advice
here.
Please bear with me since it will takes me a while to explain the
situation.

(Using SQL2005) I need to design the generic search utility on the
database server (run as Web Service), client supply the search criteria
(where clause)


I need return only one field:id back to user,


the data source the user need to search is coming from different
relational database tables , there is a main table and a few sub
tables, the main table has a primary key (id) and all the sub-table
have the main tables primary key as foreign key., but there are not
relationship among the sub tables, and client needs to able to search
the all the fields in all the tables.


Also, we have different database, each has its own main table and sub
tables, and the structure of each table is different. for each set, I
have those information stored in my own meta table. For each set I
could create view, dynamically generate sql statement on the fly from
the metadata table I have, but the code that generate the dynamic SQL
has to be the same


I have two ways to solve this problem but neither is satisfactory


1) Create a view or query that joins all the tables, because the
sub-tables only related to each other with the main table's primary
key, if I do a join on the main table and all sub- tables, I would
create a Cartesian products.


Select distinct id from cartesian_product_view where
maintable.field1=field1 and subtable1.field2=fied2 and
subtable3.field3=field3


Here on the main table with 40,000 rows, the cartesian_product_view has
100 million rows!!!!!


2) Substitute the where clause with subquery . for example, if the
where clause is following:


where subtable1.name ='john' or subtable2.product_code ='xyz'
or subtable1.name ='tom


I will replace it with the following : (please trust me I have the code
to do the Substitution but it will be too hard to explain here)


Select distinct maintable.id where
exists (select * from subttable1 where name = 'john' and
maintable.id=subtable1.id)
or
exists (select * from subttable2 where product_code = 'xyz' and
maintable.id=subtable2.id)
or
exists (select * from subttable1 where name = 'tom' and
maintable.id=subtable1.id)
order by ...


the problem with that is the query is getting slower & slower when user
specify more search criteria, it spend more time parsing the dynamic
query than querying the database, when you specify enough criteria
(which translated into subqueries) SQL Server just gave up and throw
the following exception.


The query processor ran out of internal resources and could not produce
a query plan. This is a rare event and only expected for extremely
complex queries or queries that reference a very large number of tables
or partitions. Please simplify the query. If you believe you have
received this message in error, contact Customer Support Services for
more information.


So I am totally stuck.


Please advice.


Celko Answers
I think what you want is the ability to load tables with criteria and
not have to use dynamic SQL:

skill = Java AND (skill = Perl OR skill = PHP)


becomes the disjunctive canonical form:


(Java AND Perl) OR (Java AND PHP)


which we load into this table:


CREATE TABLE Query
(and_grp INTEGER NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (and_grp, skill));


INSERT INTO Query VALUES (1, 'Java');
INSERT INTO Query VALUES (1, 'Perl');
INSERT INTO Query VALUES (2, 'Java');
INSERT INTO Query VALUES (2, 'PHP');


Assume we have a table of job candidates:


CREATE TABLE Candidates
(candidate_name CHAR(15) NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (candidate_name, skill));


INSERT INTO Candidates VALUES ('John', 'Java'); --winner
INSERT INTO Candidates VALUES ('John', 'Perl');
INSERT INTO Candidates VALUES ('Mary', 'Java'); --winner
INSERT INTO Candidates VALUES ('Mary', 'PHP');
INSERT INTO Candidates VALUES ('Larry', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Larry', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Moe', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Java');
INSERT INTO Candidates VALUES ('Celko', 'Java'); -- loser
INSERT INTO Candidates VALUES ('Celko', 'Algol');
INSERT INTO Candidates VALUES ('Smith', 'APL'); -- loser
INSERT INTO Candidates VALUES ('Smith', 'Algol');


The query is simple now:


SELECT DISTINCT C1.candidate_name
FROM Candidates AS C1, Query AS Q1
WHERE C1.skill = Q1.skill
GROUP BY Q1.and_grp, C1.candidate_name
HAVING COUNT(C1.skill)
= (SELECT COUNT(*)
FROM Query AS Q2
WHERE Q1.and_grp = Q2.and_grp);


You can retain the COUNT() information to rank candidates. For example
Moe meets both qualifications, while other candidates meet only one of
the two. You can Google "canonical disjunctive form" for more details.
This is a form of relational division.

Named Mistakes and Questionable Practices

SQL Apprentice Question
Has anybody here made a list of canonical mistakes[1] that DB
practitioners make?


One example mistake is the One True Lookup Table aka the
Entity-Attribute-Value table, as in this article:
<http://www.dbazine.com/ofinterest/oi-articles/celko22>. Doubtless
people have run across others. Ideally, such references would:


* Name the mistake,
* Describe the mistake's implementation, and
* Show how and under what circumstances it's a mistake, as the above
article does.


Thanks in advance for any hints, tips, or pointers :)


Cheers,
D


[1] Let's exclude, for the purposes of this discussion, the idea that
SQL itself is a giant mistake and that we should now be using
VaporWareTransRelationalDBMS™--vague, questionable patents
filed--which will eventually be a product some day. We can say that
that horse has already been beaten well enough and is already
deceased, if it makes people happier.


Celko Answers
>> Has anybody here made a list of canonical mistakes that DB practitioners make? <<


Not really, but we do have a few:

1) EAV = Entity-attribute-vaue
2) OTLT = One True Lookup Table aka MUCK (Massively Unified Code Keys)
3) Integrity in the Apps
4) Denormalized tables
5) Auto-increment methods for keys
6) No keys at all


>> I presume this is considered a mistake only because of SQL allowing duplicate "rows". In the real RM, no such mistake is possible. <<


No, we would have Dr. Codd's "Degree of Duplication" operator that
nobody mentions. This is how he handled dups in RM II. At least you
can avoid dups in base tables in SQL via constraints and in results via
SELECT DISTINCT.

Teradata used to disallow duplicate rows in its early versions, but
lost out to Standard SQL. My wish is that we had put in a [CREATE |
DECLARE] FILE () construct for the non-tables. But
SQL was built on the existing indexed file systems of the day, and we
are stuck with the family legacy (aka "family curse").

Monday, June 05, 2006

Referential Integrity problem

SQL Apprentice Question
I'm using Microsoft SQL Server Management Studio Express 9.00.2047.00
and expriencing problems with setting referential integrity on a link
table. The tables' schema is as follows:
-------------------------------------------------------------------
CREATE TABLE competencies (
CID bigint identity(1,1) CONSTRAINT pk_CID PRIMARY KEY,
LockedBy bigint DEFAULT 0 NOT NULL
CONSTRAINT fk_UserID
REFERENCES usr_info(userID)
ON DELETE SET DEFAULT
ON UPDATE CASCADE
)
---------------------------------------------------------
CREATE TABLE usr_info (
userID bigint IDENTITY(0,1) CONSTRAINT pk_UID PRIMARY KEY,
ActiveFlag bit default 0 NOT NULL, --(1='Yes', 0='No')
FirstName varchar(100) default '' NOT NULL,
LastName varchar(100) default '' NOT NULL
)
-------------------------------------------------------
CREATE TABLE competency_hdr (
fkCID bigint default 0 NOT NULL
CONSTRAINT fkCID_ch
REFERENCES competencies(CID)
ON DELETE CASCADE
ON UPDATE CASCADE,
ApprovedBy bigint default 0 NOT NULL
CONSTRAINT fkUserID_ch
REFERENCES usr_info(userID)
ON DELETE SET DEFAULT -- NO delete if user is deleted
ON UPDATE CASCADE
)
--------------------------------------------------------
When I execute the above I get the following error message.

Msg 1785, Level 16, State 0, Line 1
Introducing FOREIGN KEY constraint 'fkUserID_ch' on table
'competency_hdr' may cause cycles or multiple cascade paths. Specify
ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN
KEY constraints.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.


Now, if i swap the fields around then the error message changes to
that of the fkCID field.


Basically what I want is:
when I delete a competency record I need all references to this record
to be deleted.
when I delete a user I want to set the foreign key to zero (the record
must remain on the database).


Obviously there is something I'm missing here. Any advice

Celko Answers
Have you considered a relational design? It would have actual keys of
a proper data type and some research about standards? Do you really
know anyone with a first_name that is VARCHAR(100)? Why did you think
that IDENTITY can ever, ever be a key? And a key with a DEFAULT?
Defaults are for attributes which can have multiple occurrences of
values. And why did think that you need a BIGINT so you can have more
users than the entire population of Earth?

Can you use the "Dictionary of Occupational Titles" for
Competencies? What research did you do?


Why is a user an attribute of a Competency? Only Fred can be a
carpenter? Why do you use bit flags in SQL?


CREATE TABLE Competencies
(dot_code CHAR(6) NOT NULL PRIMARY KEY,
dot_description VARCHAR(25) NOT NULL);


Shouldn't there be a user attribute for approval powers? You did not
show one.


CREATE TABLE Users
(user_id INTEGER NOT NULL PRIMARY KEY, -- needs research!
first_name VARCHAR(25) NOT NULL,
last_name VARCHAR(25) NOT NULL);



>> When I delete a competency record [sic] I need all references to this record [sic] to be deleted. When I delete a user I want to set the foreign key to zero (the record [sic] must remain on the database). <<


One of the first steps to learning to think in SQL is that a row and a
record are totally different. Until then, you will keep setting flags
(like your "foreign keys are set to zero" violation of 1NF).

You are trying to keep a history, so you need durations in the data
model.


CREATE TABLE Assignments
(user_id INTEGER NOT NULL
REFERENCES Users(user_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
dot_code CHAR(6) NOT NULL
REFERENCES Competencies(dot_code)
ON UPDATE CASCADE,
assignment_date DATETIME
DEFAULT CURRENT_TIMESTAMP NOT NULL,
completion_date DATETIME
DEFAULT CURRENT_TIMESTAMP
CHECK (assignment_date < completion_date),
PRIMARY KEY (user_id, dot_code, assignment_date),
approving_user_id INTEGER NOT NULL
REFERENCES Users(user_id)
ON DELETE CASCADE
ON UPDATE CASCADE);


This is proper SQL, but SQL Server will not allow it. The product
worries about cycles in DRI.


One kludge might be to split out the approvers


CREATE TABLE Competencies
(dot_code CHAR(6) NOT NULL PRIMARY KEY,
dot_description VARCHAR(25) NOT NULL);


CREATE TABLE Users
(user_id INTEGER NOT NULL,
user_type CHAR(1) DEFAULT 'R' NOT NULL
CHECK (user_type IN ('R', 'A')), -- r= regular, a = approver
PRIMARY KEY (user_id, user_type),
first_name VARCHAR(25) NOT NULL,
last_name VARCHAR(25) NOT NULL);


CREATE TABLE Approvers
(user_id INTEGER NOT NULL,
user_type CHAR(1) NOT NULL
CHECK (user_type = 'A'),
PRIMARY KEY (user_id, user_type),
FOREIGN KEY (user_id, user_type)
REFERENCES Users (user_id, user_type)
ON DELETE CASCADE
ON UPDATE CASCADE);


CREATE TABLE RegularUsers
(user_id INTEGER NOT NULL,
user_type CHAR(1) NOT NULL
CHECK (user_type = 'R'),
PRIMARY KEY (user_id, user_type),
FOREIGN KEY (user_id, user_type)
REFERENCES Users (user_id, user_type)
ON DELETE CASCADE
ON UPDATE CASCADE);


CREATE TABLE Assignments
(user_id INTEGER NOT NULL,
user_type CHAR(1) NOT NULL CHECK (user_type = 'R')
FOREIGN KEY (iser_id, user_type)
REFERENCES RegularUsers(user_id, user_type)
ON DELETE CASCADE
ON UPDATE CASCADE,
dot_code CHAR(6) NOT NULL
REFERENCES Competencies(dot_code)
ON UPDATE CASCADE,
assignment_date DATETIME
DEFAULT CURRENT_TIMESTAMP NOT NULL,
completion_date DATETIME
DEFAULT CURRENT_TIMESTAMP
CHECK (assignment_date < completion_date),
PRIMARY KEY (user_id, dot_code, assignment_date),
approving_user_id INTEGER NOT NULL,
user_type CHAR(1) NOT NULL CHECK (user_type = 'A')
FOREIGN KEY (user_id, user_type)
REFERENCES ApprovingUsers(user_id, user_type)
ON DELETE CASCADE
ON UPDATE CASCADE);

Best method for maintaining list

SQL Apprentice Question
I am wondering what, if any, method would best suit the following request.

I want to maintain a list of items in SQL Server.
I'm thinking a Stored Procedure.


The list is of two columns: (both VarChar)
colName, colTag


data:
Name1, Tag1
Name2, Tag2
Name3, Tag3
Name5, Tag4


Now, before anyone says, why not a table, I would answer that I want to be
able to add/move line items around easily. And I don't want to have to
renumber an ID column of any type.
So If I wanted to quickly add a new row to this data between Name2 and
Name3, I don't want to have to open up Enterprise explorer to do it.
This list is for pulling into a .NET application.


Any ideas would be great.



Celko Answers
>> I want to maintain a list of items in SQL Server. <<


SQL has no list structure. It also has no pointers with which to
create such a thing.


>> The list is of two columns: (both VarChar) colName, colTag <<


Why did you put that silly, redundant prefix on the column names?
People who know RDBMS, data modeling and ISO-11179 Standards do not do
this!


>> Now, before anyone says, why not a table, I would answer that I want to be able to add/move line items around easily. <<


Since the table is the ONLY data structure in SQL, you either need to
model your problem in SQL or you need to use LISP or some other
language that has list structures.


>> And I don't want to have to renumber an ID column of any type. <<


But you do not mind the prospect of the pointer chains in list
manipulations? Which also leads to the quesitions, where the DDL? And
what is the key?


>> So If I wanted to quickly add a new row to this data between Name2 and Name3, I don't want to have to open up Enterprise explorer to do it. <<


You might want to look up the "Information Principle" when you finally
study RDBMS. All information is represented as values in columns --
NOT by something physical in the storage.

Rows are not records; fields are not columns; tables are not files;
there is no sequential access or ordering in an RDBMS, so "first",
"next" and "last" are totally meaningless. If you want an ordering,
then you need to have a column that defines that ordering.



>> This list is for pulling into a .NET application. <<


Why are trying to design the database to fit an application? That is
what we did 40+ yers ago when we had file systems and procedural
languages, not RDBMS and tiered architectures.

Given a table with an ordering column, position, like this:


CREATE TABLE Foobar
(display_position INTEGER NOT NULL PRIMARY KEY
CHECK (display_position > 0),
foo_value CHAR(10) NOT NULL);


Re-arrange the display order of foo_value based on the position column:


CREATE PROCEDURE SwapFoobars (@old_position INTEGER, @new_position
INTEGER)
AS
UPDATE Foobar
SET display_position
= CASE display_position
WHEN @old_position
THEN @new_position
ELSE display_position + SIGN(@old_position - @new_pos)
END
WHERE display_position BETWEEN @old_position AND @new_position
OR display_position BETWEEN @new_position AND @old_position;


When you want to drop a few rows, remember to close the gaps with this:


CREATE PROCEDURE CloseFoobarGaps ()
AS
UPDATE Foobar
SET display_position
= (SELECT COUNT (F1.position)
FROM Foobar AS F1
WHERE F1.display_position <= Foobar.display_position);


Is this so hard?

Complex SQL Query

SQL Apprentice Question
have three tables in my database, part of a tennis league results and

fixtures website I am putting together.


Results Table
id fixture_id home_team_rubbers away_team_rubbers
1 1 2 2
2 2 1 3
3 3 3 1
4 4 1 3
5 5 0 4
6 6 3 1
7 7 4 0
8 8 4 0
9 9 1 3
10 10 2 2
11 11 2 2
12 12 0 4
13 13 1 3
14 14 2 2
15 15 3 1
16 16 4 0
17 17 4 0
18 18 3 1
19 19 1 3
20 20 2 2
21 21 0 4
22 22 2 2
23 23 3 1
24 24 3 1


Fixtures Table
id home_team_id away_team_id
1 1 2
2 1 3
3 2 1
4 2 3
5 3 1
6 3 2
7 4 5
8 4 6
9 4 7
10 5 4
11 5 6
12 5 7
13 6 4
14 6 5
15 6 7
16 7 4
17 7 5
18 7 6
19 8 9
20 8 10
21 9 8
22 9 10
23 10 8
24 10 9


Team Table
id division_id club_id
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 2
7 2 3
8 3 1
9 3 2
10 3 2


What I want to do it list the league table positions for all the teams
in all the divisions with the same club_id as myclub_id through an SQL
query.


The league table positions are determined by the total number of
rubbers acheived (total_rubbers_acheived) which is the sum of
home_team_rubbers and away_team_rubbers and then sorted so that the
highest is in position 1.


For example the league table for division_id = 1
team_id total_rubbers_acheived league_table_position
3 9 1
1 8 2
2 7 3


For example the league table for division_id = 2
team_id total_rubbers_acheived league_table_position
7 19 1
4 14 2
6 9 3
5 6 4


For example the league table for division_id = 3
team_id total_rubbers_acheived league_table_position
10 10 1
8 8 2
9 6 3


The resulting output from the query that I desire is the following.


Query Output - when myclub_id = 1
team_id division_id league_table_position
1 1 2
4 2 2
8 3 2


Query Output - when myclub_id = 2
team_id division_id league_table_position
2 1 3
5 2 4
6 2 3
9 3 3
10 3 1


Query Output - when myclub_id = 3
team_id division_id league_table_position
3 1 1
7 2 1


Any ideas how I can do this?


Celko Answers
>> Any ideas how I can do this? <<


Have you considered using a relational design instead mimicking a
magnetic tape file? There is no such thing as a vague, universal id
that you can use to mark all the things in creation. Auto-increment is
a way of saying that you have no idea what a key is - and it ain't
a physical locator generated by the hardware!

I see that teams have no names, that you have no referencing among the
tables, so they are totally unrelated. Instead of computing standing
and results, you seem to want to write them to physical storage, thus
missing the basic point that tables - unlike files - can be virtual
tables.


I also find it strange that you have results, but nobody plays a game
in your model. But then you have a model where everything is a
SMALLINT.


CREATE TABLE Teams
(team_id SMALLINT NOT NULL PRIMARY KEY,
team_name CHAR(15) NOT NULL,
team_div CHAR(15) NOT NULL,
etc.);


When you say "rubbers" it means you are playing Bridge to me (or
engaged in another of my favorite sports with proper protection).


I vaguely remember that you score 0 (love), 15, 30, and 40 points
which leads to four points to win a game, six games to win a set and
two (or three?) sets to win a match. I am not sure if you want to keep
each set or just match points. Either way, you need better CHECK()
constraints than I am showing here to enforce valid scoring.


CREATE TABLE Games
(home_team_id SMALLINT NOT NULL
REFERENCES Teams(team_id),
away_team_id SMALLINT NOT NULL
REFERENCES Teams(team_id),
CHECK (away_team_id <> home_team_id),
game_date DATE NOT NULL,
home_team_score SMALLINT NOT NULL
CHECK (home_team_score >= 0),
away_team_score SMALLINT NOT NULL
CHECK (away_team_score >= 0),
PRIMARY KEY (away_team_id, home_team_id, game_date)
);


Do you need to be sure that teams are in the same division? Etc. You
did not post a good spec and assumed that everyone plays competition
Tennis, so they know the terms.

Views VS Temp Tables...which to use...or is there another solution?

SQL Apprentice Question
I have to retreive data that is somewhat complicated. This data in the
past has required two views and a final ASP SQL Query. I am in the
process of updating everything to ASP.Net 2.0 (VB) and would love some
advice as to making this more efficient.

What I am doing is collecting all of the data from a table and grouping
it. I am then taking this grouped data and counting the number of
times a true bit comes up within each group. Finaly the asp code
breaks this data down for a specific person's ID. Here is the code
somewhat changed.


(View One)
TestsCompletedCount


SELECT ConsumerID, SessionCode, TypeOfTest, NumberOfQuestionsChosen,
StartDateTime, COUNT(*) AS NumberOfQuestionsAnswered, CategoryOfTest,
Deleted, LearningMethod, ExamGeneralMathOrDefinition, RandomOrExam,
ExamNumber, MaterialChosen
FROM QuestionsAnswered
WHERE (IsPreviouslyDoneTest <> 1)
GROUP BY ConsumerID, SessionCode, TypeOfTest, NumberOfQuestionsChosen,
StartDateTime, CategoryOfTest, Deleted, LearningMethod,
ExamGeneralMathOrDefinition, RandomOrExam, ExamNumber, MaterialChosen


(View Two)
TestsCompletedCountWithCountOfAnswersWrong


SELECT TestsCompletedCount.ConsumerID, TestsCompletedCount.SessionCode,
TestsCompletedCount.TypeOfTest,
TestsCompletedCount.NumberOfQuestionsChosen,
TestsCompletedCount.StartDateTime,
TestsCompletedCount.NumberOfQuestionsAnswered,
COUNT(QuestionsAnswered.Correct) AS NumberOfQuestionsAnsweredWrong,
TestsCompletedCount.CategoryOfTest, TestsCompletedCount.Deleted,
TestsCompletedCount.LearningMethod,
TestsCompletedCount.ExamGeneralMathOrDefinition,
TestsCompletedCount.RandomOrExam, TestsCompletedCount.ExamNumber,
QuestionsAnswered.MaterialChosen
FROM TestsCompletedCount INNER JOIN QuestionsAnswered ON
TestsCompletedCount.StartDateTime = QuestionsAnswered.StartDateTime
WHERE (QuestionsAnswered.Correct = 0)
GROUP BY TestsCompletedCount.ConsumerID,
TestsCompletedCount.SessionCode, TestsCompletedCount.TypeOfTest,
TestsCompletedCount.NumberOfQuestionsChosen,
TestsCompletedCount.StartDateTime,
TestsCompletedCount.NumberOfQuestionsAnswered,
TestsCompletedCount.CategoryOfTest, TestsCompletedCount.Deleted,
TestsCompletedCount.LearningMethod,
TestsCompletedCount.ExamGeneralMathOrDefinition,
TestsCompletedCount.RandomOrExam, TestsCompletedCount.ExamNumber,
QuestionsAnswered.MaterialChosen


And the current asp query goes like this


SELECT * FROM TestsCompletedCountWithCountOfAnswersWrong WHERE
(NumberOfQuestionsAnswered >= NumberOfQuestionsChosen) and ConsumerID
= " & ConsumerID & " AND (CategoryOfTest ='XXX') And (Deleted = 0) and
MaterialChosen = '" & MaterialChosen & "' ORDER BY StartDateTime


Any suggestions as to how to redo this would be greatly appreciated.
Would a temp table be a better choice? Can I use a stored procedure?


Celko Answers

>> Any suggestions as to how to redo this would be greatly appreciated. <<


Just from the column names and vague narrative description, it sounds
like you need an "exam header " table and a questions table for the
details of each questions from which a "question status" (ansered
right , wrong not answered, etc.) on each answer gives you all of the
data you need.

I hope you really do not use an assembly language style
"IsPreviouslyDoneTest" flag when you should have a test date, a
"NumberOfQuestionsAnswered" which should be computed from a COUNT( CASE
WHEN question_status = 100 THEN 1 ELSE 0 END) kind of thing, etc.


Nobody should write with BITs and those silly 1950's flags in SQL! And
you should ALWAYS avoid temp tables. VIEWs are standard SQL; the SQL
Server model of temp tables is proprietary, so you r code will not port
easily. Temp tables can be replced by derived tables or CTEs which cna
be used by the optimize s well as ported.


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

Stored Procedure question

SQL Apprentice Question
Hi
I am looking to write a stored procedure that uses a string inside the
IN statement but am having trouble with the syntax.

ex.
Create Procedure spMyProcedure
@bookID varchar(80)
AS
SELECT * FROM books WHERE bookID IN(@bookID)


I've tried this and it does not work. How would I pass in a comma
separated string so that the IN clause works?



Celko Answers

>> I am looking to write a stored procedure that uses a string inside the IN statement but am having trouble with the syntax. <<


No, you are having trouble with the most basic programing concepts --
not just SQL, but the most basic programing concepts. You have no
idea;

1) What a parameter is
2) How a compiled language works
3) What First Normal Form is and why it is important (okay, this is SQL
and not MVDBMS)


Do not feel like the Lone Ranger. This error is posted a few times a
month along with EAV, OTLT, MUCK and RTFM. Those misconceptions about
the RDBMS have names, which makes them easier to handle and to Google.


I propose a contest for a name for this piece of non-RDBMS disaster. I
will start with "NFNF parameters" of NFNFP for short. but it does not
flow off the tongue; can someone do better?

Get unique sequential number- best practice

SQL Apprentice Question
I need a number generator. (e.g. for Receipt number, or transaction number,
etc.) in a multiuser high volume envoironment. What is the best way to get
one from SQLserver2005? (no duplicates allowed)
1) I've seen a StoredProc that will get value, value++, then save back,
enclosed in a Transaction. This will work, but locks the table. A little
concerned about the blocking here.
2) Should I do the same without the Transaction and check for changed value
(optimistic lock?)
3) better way ?

Celko Answers
>> I need a number generator. (e.g. for Receipt number, or transaction number, etc.) in a multiuser high volume envoironment. <<


What kind of check digit and validation are you using? Is this number
exposed in such a way that your need a SOX audit trail? People think
this can be done on one machine with IDENTITY and it really is not that
esy, if you give a damn about doing it right. What IDENTITY says is
that you are planning on never being a large company with many stores
on purpose! The gps will not matter because nobody will ever invest in
the company so there is no need for good auditing and SOX compliance!
Not a great business plan.


>> 1) I've seen a StoredProc that will get value, value++, then save back, enclosed in a Transaction. This will work, but locks the table. A little concerned about the blocking here.<<


Not a problem, really. You can issue blocks of invoice numbers to
stores/salesmen or you can have a generator rule that adds the store,
cash register, timestamp and a sequence number to the sales ticket
(works for Home Depot, et al).


>> 2) Should I do the same without the Transaction and check for changed value


(optimistic lock?) <<

With a computed key like the Home Depot (they are on my mind today --
I just bought some keys), optimistic concurrency control (it is not
really locking) works great. But SQL Server is a pessimistic system by
nature. Want to use Firebird or Innerbase instead?



>> 3) better way ? <<


Look up additive congruence generators if you need a random number that
will not repeat. There are some games you can play with those that are
fun.

Again, there is no "Magic, Universal one-size-fits-all" answer. Ever
wonder why each industry has different standards? Different problems!

Dynamic WHERE Clause

SQL Apprentice Question
have a query which has a few different Time Period columns:
Half_Year (H1,H2)
Quarters (Q1,Q2,Q3,Q4)
Months (M1,M2,M3,.... M12)


These periods are held in three difference columns.


I need to run this query with 2 params. One will be the year and other will
be one of the above three:


i.e
sp_Rating 2005, 'H1'


This is all transactions in months 1=6 for the year 2005.


or sp_Rating 2005, 'Q3' or sp_Rating 2005, 'M7'


How can I dynamically interrogate the correct column, based on the param
supplied (H, Q, M) ?


Celko Answers
First of all, don't call your procedures sp_something. That prefix is
reserved for system procedures, and SQL Server first looks for a
procedure with such a name in the master database.


>> I have a query which has a few different Time Period columns: <<


Really? Mind showing us? Please post DDL, so that people do not have
to guess what the keys, constraints, Declarative Referential Integrity,
data types, etc. in your schema are. Sample data is also a good idea,
along with clear specifications. It is very hard to debug code when
you do not let us see it.


>> These periods are held in three difference columns. <<


Why? Are they LOGICALLY DIFFERENT? There is usually only one kind of
time in the universe. Try a proper design:

CREATE TABLE PeriodCalendar
(period_name CHAR(15) NOT NULL PRIMARY KEY
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL);


INSERT INTO PeriodCalendar ('Q1-2006', '2006-01-01', '2006-03-341');
etc. for all the possible fiscal, marketing and calendaral periods you
use.



>> I need to run this query with 2 params. One will be the year and other will be one of the above three: sp_Rating 2005, 'H1' <<


Again, not a good design; you seem to think that a year should be an
integer, while the ANSI standard say it is a CHAR(4); get a copy of the
8601 standards, too. Now life a JOIN and a BETWEEN predicate.

And you have already been told about not using "sp_" prefixes because
they refer to where something is phycially located as well as having a
special meaning in SQL Server.

covering Primary Keys.. What's too much

SQL Apprentice Question
I have a table which I'd like to create a Primary Key that would have to
cover 6 fields to enforce uniqueness.
My question is, is that too much. I know it's common to have tables with
PK's that cover 2-3 fields, but 6 seems like a little much. [3 int fields, 2
varchar(5) fields, 1 tinyint]
I know I can use a surrogate "ID" identity field as the PK, but I prefer to
use PK's that are more natural if possible.

Any thoughts on this?



Celko Answers

>> I have a table which I'd like to create a Primary Key that would have to cover 6 fields [sic] to enforce uniqueness. My question is, is that too much. <<


This is rare but possible. Since we do not have any specs or knwo
anything about the problem, all we can do is generalize. I have seen
some people create super-keys (sertial number + manufacturer, not
knowning that serial number has manufacturer in it).


>> I know I can use a surrogate "ID" identity field as the PK, but I prefer to use PK's that are more natural if possible. <<


1) it is not a surrogate key because it is exposed to the user; that is
the definition from Dr. Codd.

2) you will need to use a UNIQUE (c1,.. ,c6) constraint anyway, so you
now have two indexes


3) You will need code to assure that the IDENTITY is always validate
and points to the right target.

SELECT INTO with a table variable

SQL Apprentice Question
Is is possible to do something like this:

SELECT A,B,C INTO @TableVariable FROM @AnotherTableVariable?


I try it and I get syntax errors and I am not sure if it is just something I
am doing wrong or if it is something that is not supported. I know when I use
JOIN I need to specify an alias inseated of the full table variable name.
Maybe there is something similar with SELECT * INTO.


Thank you.


Kevin


Celko Answers
>> Is is possible to do something like this: <<


You tried it and got an error, so the answer is No :).

But a better question is why do you want to have two identical tables
in a schema? Why are you seeking to write code with poor cohesion, via
the proprietary table variables (I.e. your statement has no idea what
it is operating on until an unknown future user makes this design
decision for you on the fly -- could squids or automobiles, depending
on @TableVariable!)


Kevin, the questions you are posting show that you have not done your
homework and the best you can hope for is a pile of awful kludges from
people in the newsgroup who do not have the time to educated you
properly. By analogy, if this were a furniture makers newsgroup, you
are asking "what is the best kind of rock to smash screws into wood?"
The newsgroup answer will "Granite!", but the right answer is "you need
to learn about screwdrivers and the right kinds of screws for
particular job".

Changing the columns of a SELECT statement

SQL Apprentice Question
I have been "normalizing" some of the tables that we have been using and I
ran into a problem in one of our stored procedures.

It used to be (with the old table) that you could do something like:


SELECT A, B FROM OldTable


The stored procedure essentially does this.


Now with the new table(s) it is something like:


SELECT *
FROM Table
LEFT OUTER JOIN Attributes ON Attributes.AttributesId = Table.AttributesId


There is in the Attributes table an AttibutesTypeId and an AttributesValue.
So I get 'A' only when AttributesTypeId = 1 and 'B' when AttributesTypeId =
2. The question is how do I return A and B from the stored procedure given
this new table structure? I am tempted to create a temporary table and use a
cursor to move row by row through the table, but there must be a more
efficient way.



Celko Answers
Find the moron that did this to you and kill him. This is called a EAV
design and it is totally wrong in an RDBMS. It confuses data and meta
data and makes your most basic queries run several orders of magnitude
slower. But the lack of speed does not matter; you will have no data
integrity so you cannot trust the answers anyway.

You might also want to get any book on data modeling. Then you will
know that there cannot be such a creature is a "type_id" -- the data
element is either a type with a known code or it is an identifier of
some kind of entity.

You can Google old postings on EAV, OTLT and MUCK as
bad design decisions frequently made by non-SQL programmers.

Thursday, June 01, 2006

Very Challenging Question

SQL Apprentice Question
Here is a table representing the problem

A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3


Tha duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user
goal: to find all duplicated rows & to delete them all accept one
instance to leave.


Note:
Finding that row 1similar to 2 in A & deleting it will loose data
because we won't know that row 1 is ALSO similar to 3 on C & later on
finding that 3 is similar to 4 & 6 on D & so on


The simple time consuming (about 2 weaks) query to acomplish the task
is:
SELECT count(*),A.B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1


Celko Answers
A | B | C | D
-----------------
a1 b1 c1 d1
a1 b2 c2 d2
a3 b3 c1 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3

I am going to guess at the DDL and add another column


CREATE TABLE Foobar
(a CHAR(2) NOT NULL,
b CHAR(2) NOT NULL,
c CHAR(2) NOT NULL,
d CHAR(2) NOT NULL,
dups INTEGER DEFAULT 0 NOT NULL
CHECK(dups >= 0),
PRIMARY KEY(a, b, c, d));


INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b1', 'c1', 'd1');
INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b2', 'c2', 'd2');
INSERT INTO Foobar(a, b, c, d) VALUES ('a3', 'b3', 'c1', 'd3');
INSERT INTO Foobar(a, b, c, d) VALUES ('a4', 'b4', 'c4', 'd3');
INSERT INTO Foobar(a, b, c, d) VALUES ('a5', 'b5', 'c5', 'd5');
INSERT INTO Foobar(a, b, c, d) VALUES ('a6', 'b6', 'c6', 'd3');


Since you seem to want to preserve some of the information about
duplications, you can keep a tally


UPDATE Foobar
SET dups
= dups
+ COALESCE((SELECT 1
FROM Foobar AS F1
WHERE F1.a = Foobar.a
HAVING COUNT(*) > 1) ,0)
+ COALESCE((SELECT 1
FROM Foobar AS F1
WHERE F1.b = Foobar.b
HAVING COUNT(*) > 1),0)
+ COALESCE((SELECT 1
FROM Foobar AS F1
WHERE F1.c = Foobar.c
HAVING COUNT(*) > 1) ,0)
+ COALESCE((SELECT 1
FROM Foobar AS F1
WHERE F1.d = Foobar.d
HAVING COUNT(*) > 1),0);


The duplications are:
row 1+2 in param A
row 1+3 in param C
row 3+4+6 in param D
only row 5 is unique in all parameters.
conclusion: row 1+2+3+4+6 are the same user

Wednesday, May 31, 2006

Using Aggregate functions in joins

SQL Apprentice Question
SELECT
amount,
currencycode
FROM
Expenses (NOLOCK)


We have to select the latest exchnage rate from other table
EXCHANGE_RATE.. this table will contain exchange rate of every date. We
have to take the exchage rate for the latest day and use for
calculating amount in the expenses table.
i tried this but this is not working.


SELECT EX.usdamount *ER.Exchangerate
from EXPENSES EX INNER JOIN EXCHANGE_RATE ER
on EX.currencycode = ER.Currenycode
and ER.Date = max(Date) from Exchange_rate where currencycode =
ER.CurrencyCode


Can anyone help me with this


Celko Answers
>> We have to select the latest exchnage rate from other table EXCHANGE_RATE. this table will contain exchange rate of every date. <<


I would suggest that you create a table more like this:

CREATE TABLE ExchangeRates
(currency_code CHAR(3) NOT NULL,
rate DECIMAL(8,4) NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME, -- current rate
CHECK (start_date < end_date),
PRIMARY KEY (currency_code, start_date));


Now use a VIEW to get the current rate. and BETWEEN predicates to match
a rate to a past date. The table will be faster and more useful than
doing a lot of subqueries and aggregates in your code.

Google API to make text search into database

SQL Apprentice Question
I'd like to know if it's possible to use Google APIs to make a text
search into DBs such Oracle or Sql Server.



Celko Answers
Do not use SQL for text and document searching; get a tool built for
this job. If you are doing this only once, then you can use whatever
your RDBMS product has for strings.

Need help design the database for the scheduling task application

SQL Apprentice Question
I need some help to design the database or table for my
application on scheduling task. My application is to do some scheduling
task based on daily, weekly and monthly. On the daily the choice are
everyday, weekdays, or every what ever value of the days. On the daily
the choice are every what ever value of the week on the selected days.
On the monthly the choice are on what date, on what day of what week,
and on selected month. My scheduling task choice is the same as
Microsoft Windows XP Professional Version 2002 SP 2 scheduling task
choice. I want to store these choices on database so I can check the
value using timer


Celko Answers
>> On the daily the choice are everyday, weekdays, or every what ever value of the days. On the daily the choice are every what ever value of the week on the selected days. On the monthly the choice are on what date, on what day of what week, and on selected month. <<


Do you have a calendar table which identifiies all of these groupings?
Do not try to compute them.

Monday, May 29, 2006

average question

SQL Apprentice Question
SELECT AVG(SCORE) AS AVERAGE_SCORE
FROM GRADES

SCORE is an integar column with values from 0 - 5

Now this only returns integer values such as 3, 4...

How can I make AVERAGE_SCORE to be a decimal?

ie 3.452


Celko Answers
select avg(1.0*SCORE) AS AVERAGE_SCORE FROM GRADES


Just put the query in a VIEW and it will be re-calculated each time.
You are still thinking like a COBOL programmer who wants to write all
his data to a file, not a like an SQL programmer who knows that a VIEW
is also a TABLE and does not have to havea physical existence.

use column value subquery

SQL Apprentice Question
Is it possible to use a columnvalue in a subquery? Like this:


SELECT id, tablename, columnname, description,
(SELECT Count(columnname) FROM tablename) as recordcount
FROM tableobjects


Obviously where "columnname" and "tablename" in the subquery should be the
values from columns in "tableobjects".



Celko Answers
>> Is it possible to use a columnvalue in a subquery? Like this: <<


Perhaps you ought to look up ther concepts of coupling an cohesion in a
basic text on software engineering, so you will know better than to
write code like this. This kidn of code is unpredictable until run
time, has no cohesion, etc. This is not just bad SQL; it is bad
programming in any language.

Get last inserted PK

SQL Apprentice Question
I have a table that has a computed value in the primary key (int column).
The computed expression is below:

(convert(int,(rand() * power(2,30))))


Does anyone know how I can get the last inserted value on this table since I
can't use @@IDENTITY. Thanks.

Celko Answers
What you have will fail because of duplication. Use additive congrunce instead.

http://www.rationalcommerce.com/resources/keys.htm http://www.rationalcommerce.com/resources/surrogates.htm http://www.rationalcommerce.com/resources/lfsr.gif

The formula is easy for a 31-bit number, always gives a unique answer and you will have the last number in a one-row table that will stay in main storage.