Managing hierarchical data in MySQL - path enumeration

Previously we saw that they are already two methods, adjacency list and nested set model, to manage hierarchical data in RDBMS. But it's not all. A third method, called path enumeration, permits to handle trees on relational database too.

In this article, like in two previous, we will start by making a theoretical introduction to path enumeration concept. After we will pass to more pragmatic part. First, we will select the data and secondly, make some data manipulation.

What is path enumeration in MySQL ?

Path enumeration looks like a webpage breadcrumb. We start from a root point (most of the time "Home page") and we pass step by step to the current page. A little bit like that : Home page > News > IT > Programming.

In our case, the root is "Home page" (id in database : 1). After we have 2 categories, "News" (id 5) and "IT" (id 10). At the end we retrieve the current category (id 40). To write this relation as a path enumeration, we concatenate all ids in ascending order (from root to current page) and separate them by a character other than a number (for example: /). It should give the following result : /1/5/10/40/. This results is our path enumeration ("databases breadcrumb").

We can already pass to our example with family genealogical tree. First, make a new database structure, without parent, lft and rgt columns. Instead of them we add a "path" column to store the enumeration path :

CREATE TABLE IF NOT EXISTS `family` (
  `id` smallint(5) NOT NULL AUTO_INCREMENT,
  `role` varchar(30) NOT NULL,
  `path` varchar(30) NOT NULL,
  PRIMARY KEY (`id`)
);

-- add some family members
-- parents
INSERT INTO family (id, role, path) VALUES (NULL, 'parents', '/1/');
SET @PARENTS_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/') WHERE id = @PARENTS_ID;

-- daughter
INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @DAUGHTER1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/') WHERE id = @DAUGHTER1_ID;

-- son
INSERT INTO family (id, role, path) VALUES (NULL, 'son', '');
SET @SON1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @SON1_ID, '/') WHERE id = @SON1_ID;

-- grandchildren
INSERT INTO family (id, role, path) VALUES (NULL, 'son', '');
SET @D_SON1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @D_SON1_ID, '/') WHERE id = @D_SON1_ID;
INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @D_DAUGHTER1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @D_DAUGHTER1_ID, '/') WHERE id = @D_DAUGHTER1_ID;

INSERT INTO family (id, role, path) VALUES (NULL, 'son', '');
SET @S_SON1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @SON1_ID, '/', @S_SON1_ID, '/') WHERE id = @S_SON1_ID;
INSERT INTO family (id, role, path) VALUES (NULL, 'son', '');
SET @S_SON2_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @SON1_ID, '/', @S_SON2_ID, '/') WHERE id = @S_SON2_ID;

-- great grandchildren
INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @D_D1_DAUGHTER1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @D_DAUGHTER1_ID, '/', @D_D1_DAUGHTER1_ID, '/') WHERE id = @D_D1_DAUGHTER1_ID;
INSERT INTO family (id, role, path) VALUES (NULL, 'son', '');
SET @D_D1_SON1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @D_DAUGHTER1_ID, '/', @D_D1_SON1_ID, '/') WHERE id = @D_D1_SON1_ID;

INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @S_S1_DAUGHTER1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @S_SON1_ID, '/', @S_S1_DAUGHTER1_ID, '/') WHERE id = @S_S1_DAUGHTER1_ID;
INSERT INTO family (id, role, path) VALUES (NULL, 'son', '');
SET @S_S1_SON1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @S_SON1_ID, '/', @S_S1_SON1_ID, '/') WHERE id = @S_S1_SON1_ID;

INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @S_S2_DAUGHTER1_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @S_SON2_ID, '/', @S_S2_DAUGHTER1_ID, '/') WHERE id = @S_S2_DAUGHTER1_ID;
INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @S_S2_DAUGHTER2_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @S_SON2_ID, '/', @S_S2_DAUGHTER2_ID, '/') WHERE id = @S_S2_DAUGHTER2_ID;
INSERT INTO family (id, role, path) VALUES (NULL, 'daughter', '');
SET @S_S2_DAUGHTER3_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT('/', @PARENTS_ID, '/', @DAUGHTER1_ID, '/', @S_SON2_ID, '/', @S_S2_DAUGHTER3_ID, '/') WHERE id = @S_S2_DAUGHTER3_ID;

Note that (at least in MySQL) we can't set the path directly in insert statement. We need to make an update after the generation of inserted row's id.

Select trees with path enumeration in MySQL

As you can imagine, thanks to LIKE clause and standardized path format, the selects are very simple to realize. Let's try to get all family starting by parent's daughter (should have id 2). It's a query which we use :

SELECT * FROM family WHERE path LIKE '/1/2/%' ORDER BY path ASC;

With this query we received a list of family members of the parent's daughter, ordered from the most ancient to the most younger.

Note the presence of two / in the LIKE clause. Thanks to them, we avoid to get a inappropriate members. Imagine the situation when we store a member with a path /1/20/49. The clause LIKE '/1/2%' makes that we retrieve not only family members of parent's daughter (id 2), but also another parent's child (id 20).

Manipulating data with path enumeration in MySQL

Previously we saw the reading operation. Now, we approach the writing ones. First, the removing. To delete a hierarchy of family, we can use the same condition as in reading exemple. This is the query which we can use to delete all family of parent's daughter :

DELETE FROM family WHERE path LIKE '/1/2/%';

Not so difficult, isn't it ? More complex tasks are adding and editing of existing rows. Let's start by adding a new child into parent's daughter (id 2). To add a new row in path enumeration model, we have to find the max path value :

SELECT @maxPath := path FROM family WHERE path REGEXP '/1/2/([0-9]+)/$' LIMIT 1;

In this query, set the new variable, @maxPath, which contains the value of one of paths belonging to parent's daughter children. After, in WHERE part, we look for path corresponding to following RegEx : '/1/2/([0-9]+)/$' (starts by /1/2/, followed by one number and ending by a /).

Now we can split the max path and get the last entry casted to an integer :

-- 1st part
-- select last slash and remove it
SET @lastSlash = LENGTH(@maxPath) - LOCATE('/', REVERSE(@maxPath))+1;
SET @maxPath = SUBSTRING(@maxPath, 1, @lastSlash-1);
-- decomment to debug SELECT @maxPath;

-- 2nd part
-- select the path prefix
SET @lastSlash = LENGTH(@maxPath) - LOCATE('/', REVERSE(@maxPath))+1;
SET @pathPrefix = SUBSTRING(@maxPath, 1, @lastSlash);
-- decomment to debug SELECT @pathPrefix;

What we did previously ? First, we were looking for the index of the last slash. The last slash is always placed as the last character of the path. It's why we used the LENGTH() function. With this index we can normalize the @maxPath variable (/1/2/5/) by removing the final /. In the second part, we were looking for the last slash index again (SET @lastSlash). After we set the @pathPrefix (/1/2) and insert our row :

INSERT INTO family (role, path) VALUES ('son', '');
SET @NEW_SON_ID = LAST_INSERT_ID();
UPDATE family SET path = CONCAT(@pathPrefix, @NEW_SON_ID) WHERE id = @NEW_SON_ID;

Like previously, we insert a new row and after make an update to put the right path.

Editing rows in path enumeration model

Imagine now that we want to move @D_DAUGHTER1_ID (id 5) and all of his children to parent's son (@SON1_ID, id 3) node. Here we will use a little bit of client-side code :

$pdo = new PDO("mysql:host=localhost;dbname=tests", "root", "root");

// 1) get moved item, id = 5
$statement = $pdo->query('SELECT * FROM family WHERE id = 5');
$rowToMove = $statement->fetch(PDO::FETCH_ASSOC);
$likeQueryPath = $rowToMove['path'].'%';

// 2) get path of destination item
$statement = $pdo->query('SELECT * FROM family WHERE id = 3');
$rowDest = $statement->fetch(PDO::FETCH_ASSOC);

// 3) make new base path (/1/2/5/ replaced by /1/3/5
$newPath = $rowDest['path'].$rowToMove['id'].'/';

// 4) get all rows concerned by the moving and update theirs paths
$statement = $pdo->prepare('SELECT * FROM family WHERE path LIKE :path');
$statement->bindParam(':path', $likeQueryPath);
$statement->execute();
$rows = $statement->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $r => $row) {
  $path = explode($rowToMove['path'], $row['path']);
  $path[0] = $newPath;
  $updateChild = $pdo->prepare('UPDATE family SET path = :newPath WHERE id = :id');
  $updateChild->bindParam('newPath', implode('', $path));
  $updateChild->bindParam('id', $row['id']);
  $updateChild->execute();
}

The steps are explained in the code. First we get the row concerned by the update. In the second step we retrieve its new parent. In next stage, we make new base path. /1/2/5/ is replaced by /1/3/5/. The last step moves all rows (5 and its children) into new parent. In our situation, we move 3 rows with ids : 5, 8 and 9. Now, let's check if the operation was made correctly. We check it with this query :

SELECT * FROM family WHERE path LIKE '/1/3/5/%' ORDER BY path ASC;

If the returned rows match these ones, it means that the update was made successfully :

+----+----------+-----------+
| id | role     | path      |
+----+----------+-----------+
|  5 | daughter | /1/3/5/   |
|  8 | daughter | /1/3/5/8/ |
|  9 | son      | /1/3/5/9/ |
+----+----------+-----------+

The path enumeration model is another way to manage hierarchical data in SQL. Like two previous models, adjacency and nested set, it's simple to create and maintain. But in some situations, it can be slower than two previous models. It's because the LIKE clause on the query which most the time will take more the time to execute than the search done by indexed columns (adjacency) or exact numbers (lft and rgt columns on nested set).


If you liked it, you should read:

📚 Newsletter Get new posts, recommended reading and other exclusive information every week. SPAM free - no 3rd party ads, only the information about waitingforcode!