SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Friday, May 26, 2006

Using Materialized Path to create a paths table

SQL Apprentice Question
I wonder if anyone has any thoughts on using the materialized path as
an intermediate step to create a kind of paths table to represent a
tree hierarchy

Here are some records


EmployeeId ParentEmployeeId MaterializedPath
1 1.
2 1 1.1.
3 1 1.2.
4 2 1.1.1


etc.


To find the descendants of a node using the materialized path is quite
low-cost: an index on MaterializedPath would be usable with the
following query


select
Descendant.EmployeeId
from Employee as Ancestor
inner join Employee as Descendant on Descendant.MaterializedPath like
Ancestor.MaterializedPath + '%'
where Ancestor.EmployeeId = 1


To find the ancestors however with the following query


select
Ancestor.EmployeeId
from Employee as Descendant
inner join Employee as Ancestor on Descendant.MaterializedPath like
Ancestor.MaterializedPath + '%'
where Descendant.EmployeeId = 4


requires a table scan.


This is very expensive for any non-trivial number of nodes where you
need to find a node's ancestors.


But if you treat the MaterializedPath as an intermediate result, it is
a trival to use it to build a denormalized EmployeeAncestor table,
which for the set of data above would be as follows


EmployeeId AncestorEmployeeId
1 1
2 1
2 2
3 1
3 2
4 1
4 2
4 4


I.e. there is one record for each of each node's ancestors (including
the node itself)


This table is easy to maintain using the same trigger tha maintains the
MaterializedPath (although not so easy if you did not maintain the
MaterializedPath itself)


The advantage is that it becomes very much cheaper to calculate the
ancestors of a node:


select
Ancestor.EmployeeId
from
EmployeeAncestor Descendant
inner join Employee Ancestor on Ancestor.EmployeeId =
Descendant.AncestorEmployeeId
where Descendant.EmployeeId = 4


The query to find descendant is similar:


select
Descendant.EmployeeId
from
Employee Ancestor
inner join EmployeeAncestor Descendant on Descendant.AncestorEmployeeId
= Ancestor.EmployeeId
where Ancestor.EmployeeId = 1


So it seems to me that the materialized path is not actually a good
thing to use to find ancestor/descendant relationships of itself, but
by using it to create a simple ancestor node-type table, you can get
slightly better performance for finding descendants, and vastly better
performance results for ancestors.


Celko Answers
Get a copy of TREES & HIERARCHIES IN SQL and look up the Nested Sets
model. Much faster, easier to use in calculations, etc. Avoid path
enumeration and adjacency list models.

No comments: