Category tree with nested sets and Extbase (1)

In computer science a tree is a data structure representing a hierarchical structure. Within a tree you have nodes. Each node can have multiple children but a node only can have one parent node. There is one special node called root which is the starting point of the tree and there are leaves - nodes that have no further children.

Tree with 5 nodes. Root has two childern: an inner node (with two more children) and a leaf. Inner node has root as its parent.

So if you implement trees in an arbitrary programming language, you would have a node class similar to this one: 

class Node {

    String label;

    Node parent;

    Array<Node> children;

}
Pseudo implementation of a node

The problem arises if you try to save such objects into a relational database since recursive queries are required if you select rows for nodes referenced by parent IDs.

Naive database representation of above tree.

Nested sets

This is where nested sets get into the play. Nested sets allow you to store trees into a relational database avoiding recursive queries. For those of you who are familiar with tree traversals in depth-first order, this is nothing spectacular. Here is how it works. Assume you want to visit every node of our tree. Whenever you visit a node, you go to its leftmost child and do this until you come to a leaf. When doing this, you put a number on the node, when you first did visit it. You then use a second number marking the last visit of a node. On a leaf, a first visit is also a last visit so you have two numbers.

The above tree would have the following numbers after doing this:

Above tree marked with number of first visit in depth-first traversal (green) and last visit in depth-first traversal (red).

So we slightly change the implementation of our node class from above and add left (first time of visit) and right (last time of visit). Although still having parent and children in our implementation is a little redundancy, we leave those properties to make it easier to use our tree later on.

class Node {

    int left;

    int right;

    String label;

    Node parent;

    Array<Node> children;

}
Pseudo implementation of a nested set node

So you might ask yourself: what is all this for? You will get the benefits of this implementation in a second. Before let's look at how we store our nodes to the database now:

Database table for nested set node storage. Parent id is no longer required. We add a 'root' field to enable storage of multiple trees in a single table.

So here comes the benefit of this technique: We can select the records of our tree with a single SQL statement - even if we want to have the depth of the node within the tree. This query even preserves the order:

SELECT n.label, COUNT(*)-1 AS depth
FROM nested_set_nodes AS n,
         nested_set_nodes AS p
WHERE n.left BETWEEN p.left AND p.right
GROUP BY n.left
ORDER BY n.left;

Drawbacks on nested sets: Insertion and deletion of nodes

So as we have seen nested sets provide us a nice way to store tree data in a relational database. This holds for reading records from database. Adding new nodes and deleting nodes turns out to be a little more complicated.

Assume we want to delete node with ID 4 from our above tree. What we have to do is deleting this record and then updating the other nodes left and right number. This done with the following SQL statement:

DELETE FROM nested_set_nodes WHERE left=5;
UPDATE nested_set_nodes SET left=left-2 WHERE left>5;
UPDATE nested_set_nodes SET right=right-2 WHERE right>5;

After node is deleted, our tree looks like this:

Above tree after deleting node with left=5. As you can see, all numbers of nodes "after" this node are decreased by 2.

So after explaining how to delete a node, let's assume we want to add a new node in our first tree (before deleting a node). Let's further assume we want to add this node as a new child of node 3 (with left=3). Therefore we use the following SQL statement:

UPDATE nested_set_nodes SET right=right+2 WHERE right >= 4;
UPDATE nested_set_nodes SET left=left+2 WHERE left > 3;
INSERT INTO nested_set_nodes (label,left,right) VALUES ('new node', 4, 5);

If we have inserted this node, our tree looks like this:

For inserting a node in our tree, we have to increase all left values after the node we insert by 2 and increase the right values of the node where we insert and all following nodes by 2.

Conclusion and outlook

As we have seen, nested sets provide an elegant way to store trees in relational databases. In a second article I will show a possible implementation with PHP for TYPO3 and Extbase.

There is a good resource on nested sets (written in German) here: http://www.klempert.de/nested_sets/


 
Inhalt © Michael Knoll 2009-2017  •  Powered by TYPO3  •  TypoScript Blogging by Fabrizio Branca  •  TYPO3 Photo Gallery Management by yag  •  Impressum