Joe Celko The SQL Apprentice: Trees, recursion, and grouping

SQL Apprentice Question
In a DB2 UDB LUW table, I have a table with pairs of equivalent ID's.
What I want to do is assign all equivalent IDs to the same group
number, including those that are transitively related, i.e., if A = B
and B = C then A = C, so I'd group all three together.

Although they're not related in a composition fashion per se, it seems
like the way to go conceptually is to consider the ID relationships as
a reporting tree (ID_A would be the manager and ID_B would be the
employee) and assign all IDs that share the same root to the same
group.

For instance, let's say I have the following pairs

ID_A ID_B
---- ----
1800 1804
1800 1808
1806 1809
1808 1810
1808 1812
1809 1815
1810 1811

I'd have two trees (sideways):

1800 1804
1808
1810
1811
1812
and

1806
1809
1815

I'm struggling with the following:

1. How to group based on a shared *root* (I'd hate to have to build a
chain column, e.g., for 1811: "1800-->1808-->1810" and do something
like DENSE_RANK() OVER(ORDER BY SUBSTR(CHAIN,1,4)--that seems
unreliable, and the ID is not always the same length)

2. How to write a recursive CTE that accomodates multiple, independent,
trees.

What I'd like to end up with is this:

ID GRP
---- ---
1800 1
1804 1
1808 1
1810 1
1811 1
1812 1
1806 2
1809 2
1815 2

I feel like I'm close--I've read Serge's "CONNECT BY" article and
Molinaro's chapter "Hierarchical Queries" in his _SQL Cookbook_, but
I'm just not able to stitch it all together.

Would anyone care to lend a hand?

Thanks

Celko Answers
>> I feel like I'm close--I've read Serge's "CONNECT BY" article and Molinaro's chapter "Hierarchical Queries" in his _SQL Cookbook_, but I'm just not able to stitch it all together. <<

You should have been reading Celko's TREES & HIERARCHIES IN SQL instead
:)!

What you have is a forest, not a tree and you can do this with a nested
sets model

CREATE TABLE FoobarForest
(foobar_id INTEGER NOT NULL PRIMARY KEY, --assumption
tree_nbr INTEGER NOT NULL,
lft INTEGER NOT NULL CHECK (lft > 0),
rgt INTEGER NOT NULL,
CHECK (lft < rgt),
UNIQUE (tree_nbr, lft));

Instead of a recursive CTE, pick the roots and use a simple push-down
stack for tree traversal. Here is a SQL/PSM version for one tree:

- Tree holds the adjacency model
CREATE TABLE Tree
(node CHAR(10) NOT NULL,
parent CHAR(10));

-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
node CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);

CREATE PROCEDURE TreeTraversal ()
LANGUAGE SQL
DETERMINISTIC
BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;

SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;

--clear the stack
DELETE FROM Stack;

-- push the root
INSERT INTO Stack
SELECT 1, node, 1, max_counter
FROM Tree
WHERE parent IS NULL;

-- delete rows from tree as they are used
DELETE FROM Tree WHERE parent IS NULL;

WHILE counter <= max_counter- 1
DO IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top)
THEN BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.node), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top;

-- delete rows from tree as they are used
DELETE FROM Tree
WHERE node = (SELECT node
FROM Stack
WHERE stack_top = current_top + 1);
-- housekeeping of stack pointers and counter
SET counter = counter + 1;
SET current_top = current_top + 1;
END;
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top;
SET counter = counter + 1;
SET current_top = current_top - 1;
END;
END IF;
END WHILE;
-- SELECT node, lft, rgt FROM Stack;
-- the top column is not needed in the final answer
-- move stack contents to new tree table
END;

Joe Celko The SQL Apprentice

Monday, July 10, 2006

Trees, recursion, and grouping

No comments:

Blog Archive

Labels