SQL Apprentice Question
In a DB2 UDB LUW table, I have a table with pairs of equivalent ID's.
What I want to do is assign all equivalent IDs to the same group
number, including those that are transitively related, i.e., if A = B
and B = C then A = C, so I'd group all three together.
Although they're not related in a composition fashion per se, it seems
like the way to go conceptually is to consider the ID relationships as
a reporting tree (ID_A would be the manager and ID_B would be the
employee) and assign all IDs that share the same root to the same
group.
For instance, let's say I have the following pairs
ID_A ID_B
---- ----
1800 1804
1800 1808
1806 1809
1808 1810
1808 1812
1809 1815
1810 1811
I'd have two trees (sideways):
1800 1804
1808
1810
1811
1812
and
1806
1809
1815
I'm struggling with the following:
1. How to group based on a shared *root* (I'd hate to have to build a
chain column, e.g., for 1811: "1800-->1808-->1810" and do something
like DENSE_RANK() OVER(ORDER BY SUBSTR(CHAIN,1,4)--that seems
unreliable, and the ID is not always the same length)
2. How to write a recursive CTE that accomodates multiple, independent,
trees.
What I'd like to end up with is this:
ID GRP
---- ---
1800 1
1804 1
1808 1
1810 1
1811 1
1812 1
1806 2
1809 2
1815 2
I feel like I'm close--I've read Serge's "CONNECT BY" article and
Molinaro's chapter "Hierarchical Queries" in his _SQL Cookbook_, but
I'm just not able to stitch it all together.
Would anyone care to lend a hand?
Thanks
Celko Answers
>> I feel like I'm close--I've read Serge's "CONNECT BY" article and Molinaro's chapter "Hierarchical Queries" in his _SQL Cookbook_, but I'm just not able to stitch it all together. <<
You should have been reading Celko's TREES & HIERARCHIES IN SQL instead
:)!
What you have is a forest, not a tree and you can do this with a nested
sets model
CREATE TABLE FoobarForest
(foobar_id INTEGER NOT NULL PRIMARY KEY, --assumption
tree_nbr INTEGER NOT NULL,
lft INTEGER NOT NULL CHECK (lft > 0),
rgt INTEGER NOT NULL,
CHECK (lft < rgt),
UNIQUE (tree_nbr, lft));
Instead of a recursive CTE, pick the roots and use a simple push-down
stack for tree traversal. Here is a SQL/PSM version for one tree:
- Tree holds the adjacency model
CREATE TABLE Tree
(node CHAR(10) NOT NULL,
parent CHAR(10));
-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
node CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);
CREATE PROCEDURE TreeTraversal ()
LANGUAGE SQL
DETERMINISTIC
BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;
SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;
--clear the stack
DELETE FROM Stack;
-- push the root
INSERT INTO Stack
SELECT 1, node, 1, max_counter
FROM Tree
WHERE parent IS NULL;
-- delete rows from tree as they are used
DELETE FROM Tree WHERE parent IS NULL;
WHILE counter <= max_counter- 1
DO IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top)
THEN BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.node), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top;
-- delete rows from tree as they are used
DELETE FROM Tree
WHERE node = (SELECT node
FROM Stack
WHERE stack_top = current_top + 1);
-- housekeeping of stack pointers and counter
SET counter = counter + 1;
SET current_top = current_top + 1;
END;
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top;
SET counter = counter + 1;
SET current_top = current_top - 1;
END;
END IF;
END WHILE;
-- SELECT node, lft, rgt FROM Stack;
-- the top column is not needed in the final answer
-- move stack contents to new tree table
END;
Monday, July 10, 2006
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment