SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Tuesday, April 11, 2006

Complicated Update query based on existing data

SQL Apprentice Question
OK folks, may have a tough one or perhaps just not thinking it through
well. I need to create a SQL query or queries that updates two columns
based on some business rules. Here's an example of the data:

GroupID complete_num first_num second_num InUse biggest
965423 1.0 1 0
965423 2.0 2 0
965423 3.0 3 0 X
965423 3.1 3 1
965423 3.2 3 2 X
324554 1.0 1 0
324554 2.0 2 0 X X
123456 0.1 0 1
123456 0.2 0 2 X X



Hopefully the above even vaguely lines up for you. The last two
columns are currently blank. The representation above is how I would
like them to look after the queries run. So for the above data, you
have a group id that links all records for one set together. I need to
have the "InUse" box updated with a value of "X" for the highest number
that has zero in the second_num column for a group. I also need the
biggest column set to "X" for the largest number in a particular group,
so 3.1 is larger than 3.0. Keep in mind, I have separated the
complete_num into two columns as there could be a "decimal" value of
"10" which is higher than "1". They are not the same as complete_num
is not really a decimal numeric representation. It's used
programatically for other things. Seperating into two separate columns
allows better sorting of data as complete_num is varchar and first and
second num are integer. You will also see the case, 123456,where there
is no zero in the second_num column. In this case, the highest
second_num value will have both set to "X". Any help would be
appreciated and SQL queries are prefered over any procedures/functions
as this is a one time query I need to run against the database. If you
need further clarification or more examples, please let me know.

As requested, here is a table creation script and some minimal data to
limit the length of the message. Also Hugo's view is great however
since this data is for one time use, would imagine updates to the
existing data would be preferable over introducing a new view into the
mix. If not updating the data directly based on the view would now be
a simple matter when you introduce the Id from the original table into
the view.

CREATE TABLE [abcd].[dbo].[TBL1
(Id,GroupId,complete_num,first_num,second_num)] (
[Id] int NOT NULL,
[GroupID] nvarchar (60) NULL,
[complete_num] varchar (255) NULL,
[first_num] integer,
[second_num] integer,
[InUse] varchar (2) NULL,
[biggest] varchar (2) NULL,
)

INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(91510,ABC1235,2.0,2,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(89377,ABC1235,2.1,2,1);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(89371,ABC1235,2.2,2,2);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1310,M123456,1.0,1,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1309,M123456,2.0,2,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1311,M123456,3.0,3,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1312,M123456,4.0,4,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1315,M123456,5.0,5,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1318,M123456,6.0,6,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1319,M123456,7.0,7,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1317,M123456,8.0,8,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(5342,M123456,9.0,9,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(5346,M123456,10.0,10,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(5756,M123456,11.0,11,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(6315,M123456,12.0,12,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(6604,M123456,13.0,13,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(6920,M123456,14.0,14,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(1002,M123456,15.0,15,0);
INSERT INTO TBL1 (Id,GroupId,complete_num,first_num,second_num) VALUES
(4023,ARDFO32,0.1,0,1);



Celko Answers
Your programs will be total nightmares and crap until you learn how to
design a schema.

Look at the DDL; do you really have a NCHAR(60) grpoup identifier? In
Chinese?? Wel;l;, since you allowed it, you will get one! Why do you
have redundant split attributes (i.e. first_num || second_num =
complete_num)? Let's spit on normalization!! I also love the clear,
meaningful names of the data elements. Tell us what a thing is, not
its sequential order inside another column. Logical not physical
descriptions.

There are no keys, no constraints. This is not a table at all! And
you invented your own syntax for CREATE TABLE.

Your idea of updatind computed columns is a way to mimic punch cards.
Back in the 1950-60's we had to store those things in the physical
card, like you are doing now.

Making a guess, if you normalized your schema, had a key and followed
the baisc data modeling rules, would this nightmare look more like
this? Better names that show subordination (well, Foobar is a dummy
name, but that is all you gave us)

CREATE TABLE Foobar
(group_id CHAR(6) NOT NULL
CHECK (group_id LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]')
section_nbr INTEGER DEFAULT 1 NOT NULL
CHECK (section_num >= 0),
subsection_nbr INTEGER DEFAULT 0 NOT NULL
CHECK (subsection_num >= 0),
PRIMARY KEY (group_id, section_nbr, subsection_nbr));

Do you need a constaint to assure that the subsections are in sequence?
Is there a check digit rule in the group_id? 90% of the work in RDBMS
is done in the DDL!!

>> So for the above data, you have a group id that links [sic] all records [sic] for one set together. <<

SQL does not have links; it has REFERENCES and grouping. Totally
different concepts, based on sets and not pre-RDBMS file and pointer
systems. Rows are nothing whatsoever like records.

>> I need to have the "InUse" box [sic, tha tis display, not SQL!] updated with a value of "X" for the highest number that has zero in the second_num [subsection] column for a group. <<

>> I also need the biggest column set to "X" for the largest number in a particular group,so 3.1 is larger than 3.0.<<

Now, the answer to your question is a VIEW, not a "punch cards and bit
flags" solution via updates.

CREATE VIEW InUseFoobar (group_id, section_nbr, subsection_nbr)
AS
SELECT group_id, MAX(section_nbr), 0
FROM Foobar
WHERE subsection = 0
GROUP BY group_id) ;

>> I also need the biggest column set to "X" for the largest number in a particular group,so 3.1 is larger than 3.0.<<

CREATE VIEW MaxFoobar (group_id, section_nbr, subsection_nbr)
AS
SELECT group_id, section_nbr, MAX(subsection_nbr)
FROM Foobar AS F1, InUseFoobar AS U1
WHERE F1.group_id = .U1.group_id
AND F1.section_nbr = .U1.section_nbr
GROUP BY group_id, section_nbr ;

<>

No comments: