Curiosity is bliss    Archive    Feed    About    Search

Julien Couvreur's programming blog and more

Trees in SQL

 

Here is a cool DB trick that a friend of mine showed me a while back. The problem is basically how to efficiently access a tree stored in a relational/SQL database.
The solution where you store the parent of a node in the node/row itself works, but can't be retrieved in-depth without using some kind of a loop or reccursion.

Joe Celko seems to one of the most mentioned DB experts on this matter. Here is an intro he wrote on reprensenting trees as "nested sets".

Introduction
A double numbering or indexing is done globally on the tree, by giving each node/row a left index and right index. These indexes are set sequentially starting with left=1 on the root, and walking through the tree in-depth. The left index is the value of the sequence when you first reach a node (on the left side, if you do a drawing) and the right index is the value of the counter when you reach the node the second time.
Below is an illustration of this numbering:


(from http://i.cmpnet.com/intelligententerprise/gifs/Celko.10.19.Fig1s.gif)

Some of the nice properties of this schema are that non-reccursive queries can be used to retrieve some meaningful parts of the tree:
- a leaf has (right=left+1),
- a subbranch (under a node with leftbase and rightbase as indexes) verifies (leftbase < left AND right < rightbase),
- the path back to the root starting at a node (with leftnode, rightnode as indexes) can be found similarly with (left < leftnode AND rightnode < right).

One of the problem is that each time a new node is inserted, approximately half of the existing nodes/rows need to be updated, to maintain the consistency of this indexing. This could be a problem for trees that are updated frequently.

If you know of any large scale implementation of trees in SQL using this system or any other, please let me know.

Links
Here is another bunch of links on the topic:
First, second and third articles by Celko on dbmsmag.com.
Another tutorial.
A comparison with other methods.

Update:
Celko wrote an article for O'Reilly Networks: Hierarchical SQL.

______________________________________

Found another article that explains this technique, in PHP: http://www.sitepoint.com/article/1105

Posted by: Dumky at June 19, 2003 04:11 PM

http://www.codeproject.com/cs/database/Trees_in_SQL_databases.asp

Posted by: dev@ at October 26, 2004 02:47 PM

Also see Teodor Sigaev and Oleg Bartunov's use of GiST indexes at http://www.sai.msu.su/~megera/postgres/gist/

Posted by: PRinc at April 26, 2005 03:35 PM
comments powered by Disqus