Until then we approached 3 ways to manage hierarchical data in MySQL : adjacency, nested set and path enumeration. There remains one method which will be covered in this article, closure table, called adjacency relation too.
A virtual conference at the intersection of Data and AI. This is not a conference for the hype. Its real users talking about real experiences.
- 40+ speakers with the likes of Hannes from Duck DB, Sol Rashidi, Joe Reis, Sadie St. Lawrence, Ryan Wolf from nvidia, Rebecca from lidl
- 12th September 2024
- Three simultaneous tracks
- Panels, Lighting Talks, Keynotes, Booth crawls, Roundtables and Entertainment.
- Topics include (ingestion, finops for data, data for inference (feature platforms), data for ML observability
- 100% virtual and 100% free
👉 Register here
Like in previous articles, we'll start by explain the concept and the table structure on which we'll work. Next part will treated the data reading. At the end we'll present how to manipulate data on closure table.
What is closure table ?
Legally, if you want to marry, you and your partner can't have any family tie. It means that you can't have the same ancestors, for example grandparents. The closure table technique uses this dependency of family tie to link one or more rows between them. The idea behind this concept is simple. In our family genealogical tree, the parents have the relation with theirs children, grandchildren and great grandchildren. But theirs children have relation only with parents grandchildren and parents great grandchildren. The grandchildren have only one relation, with the parents great grandchildren. Closure table groups all these relations in a separate table by specifying the ancestor (for example : parents son), the descendant (for example : son's grandchildren) and the depth of relation.
Let's see that in more pragmatic sample, a SQL code which creates family and family_relation tables :
CREATE TABLE family ( id SMALLINT(5) NOT NULL AUTO_INCREMENT, role VARCHAR(50) NOT NULL, PRIMARY KEY(id) ); CREATE TABLE family_relation ( ancestorId SMALLINT(5) NOT NULL, descendantId SMALLINT(5) NOT NULL, depth SMALLINT(5) NOT NULL, PRIMARY KEY(ancestorId, descendantId) ); -- add some family members -- parents INSERT INTO family VALUES (NULL, 'parents'); SET @PARENTS = LAST_INSERT_ID(); -- daughter INSERT INTO family VALUES (NULL, 'parents daughter'); SET @PAR_DAUGHTER = LAST_INSERT_ID(); -- son INSERT INTO family VALUES (NULL, 'parents son'); SET @PAR_SON = LAST_INSERT_ID(); -- grandchildren INSERT INTO family VALUES (NULL, 'parents 1st grandchild (daughter\'s side)'); SET @PAR_DAU_GRAND1 = LAST_INSERT_ID(); INSERT INTO family VALUES (NULL, 'parents 2nd grandchild (daughter\'s side)'); SET @PAR_DAU_GRAND2 = LAST_INSERT_ID(); INSERT INTO family VALUES (NULL, 'parents 1st grandchild (son\'s side)'); SET @PAR_SON_GRAND1 = LAST_INSERT_ID(); INSERT INTO family VALUES (NULL, 'parents 2nd grandchild (son\'s side)'); SET @PAR_SON_GRAND2 = LAST_INSERT_ID(); -- great grandchildren INSERT INTO family VALUES (NULL, 'parents 1st great grandchildren (son\'s side) '); SET @PAR_SON_GRAND1_GREAT1 = LAST_INSERT_ID(); INSERT INTO family VALUES (NULL, 'parents 2nd great grandchildren (son\'s side) '); SET @PAR_SON_GRAND1_GREAT2 = LAST_INSERT_ID(); -- insert some relations now INSERT INTO family_relation VALUES (@PARENTS, @PARENTS, 0); INSERT INTO family_relation VALUES (@PARENTS, @PAR_DAUGHTER, 1); INSERT INTO family_relation VALUES (@PARENTS, @PAR_SON, 1); INSERT INTO family_relation VALUES (@PARENTS, @PAR_DAU_GRAND1, 2); INSERT INTO family_relation VALUES (@PARENTS, @PAR_DAU_GRAND2, 2); INSERT INTO family_relation VALUES (@PARENTS, @PAR_SON_GRAND1, 2); INSERT INTO family_relation VALUES (@PARENTS, @PAR_SON_GRAND2, 2); INSERT INTO family_relation VALUES (@PARENTS, @PAR_SON_GRAND1_GREAT1, 3); INSERT INTO family_relation VALUES (@PARENTS, @PAR_SON_GRAND1_GREAT2, 3); INSERT INTO family_relation VALUES (@PAR_DAUGHTER, @PAR_DAUGHTER, 0); INSERT INTO family_relation VALUES (@PAR_DAUGHTER, @PAR_DAU_GRAND1, 1); INSERT INTO family_relation VALUES (@PAR_DAUGHTER, @PAR_DAU_GRAND2, 1); INSERT INTO family_relation VALUES (@PAR_SON, @PAR_SON, 0); INSERT INTO family_relation VALUES (@PAR_SON, @PAR_SON_GRAND1, 1); INSERT INTO family_relation VALUES (@PAR_SON, @PAR_SON_GRAND2, 1); INSERT INTO family_relation VALUES (@PAR_SON, @PAR_SON_GRAND1_GREAT1, 2); INSERT INTO family_relation VALUES (@PAR_SON, @PAR_SON_GRAND1_GREAT2, 2); INSERT INTO family_relation VALUES (@PAR_DAU_GRAND1, @PAR_DAU_GRAND1, 0); INSERT INTO family_relation VALUES (@PAR_DAU_GRAND2, @PAR_DAU_GRAND2, 0); INSERT INTO family_relation VALUES (@PAR_SON_GRAND1, @PAR_SON_GRAND1, 0); INSERT INTO family_relation VALUES (@PAR_SON_GRAND1, @PAR_SON_GRAND1_GREAT1, 1); INSERT INTO family_relation VALUES (@PAR_SON_GRAND1, @PAR_SON_GRAND1_GREAT2, 1); INSERT INTO family_relation VALUES (@PAR_SON_GRAND2, @PAR_SON_GRAND2, 0); INSERT INTO family_relation VALUES (@PAR_SON_GRAND1_GREAT1, @PAR_SON_GRAND1_GREAT1, 0); INSERT INTO family_relation VALUES (@PAR_SON_GRAND1_GREAT2, @PAR_SON_GRAND1_GREAT2, 0);
Have you seen something strange ? Maybe the relation between two the same rows (@PARENTS - @PARENTS) with depth equals to 0 ? It can seem strange, but in next chapter we'll see that it's not.
Reading data in closure table
Reading data in closure table is based on joining original table with relations one. For example, this simple query will return us all family members for parents son side (son's id is 3) :
SELECT f.* FROM family f JOIN family_relation fr ON f.id = fr.descendantId WHERE fr.ancestorId = 3
The returned rows are following :
+----+------------------------------------------------+ | id | role | +----+------------------------------------------------+ | 3 | parent's son | | 6 | parent's 1st grandchild (son's side) | | 7 | parent's 2nd grandchil (son's side) | | 8 | parent's 1st great grandchildren (son's side) | | 9 | parent's 2nd great grandchildren (son's side) | +----+------------------------------------------------+
As you can see, we got all son's descendants (ids 6, 7, 8, 9) with their ancestor (id 3). Now you understand the presence of relation for the same row with depth 0 ? Thanks to it we can get not only the descendants, but ancestor too. In the case when we wouldn't to get the ancestor, we can simply add a supplementary condition to WHERE clause :
SELECT f.* FROM family f JOIN family_relation fr ON f.id = fr.descendantId WHERE fr.ancestorId = 3 AND fr.depth > 0
And the query result :
+----+------------------------------------------------+ | id | role | +----+------------------------------------------------+ | 6 | parent's 1st grandchild (son's side) | | 7 | parent's 2nd grandchild (son's side) | | 8 | parent's 1st great grandchildren (son's side) | | 9 | parent's 2nd great grandchildren (son's side) | +----+------------------------------------------------+
Deleting data in closure table
Manipulating data in closure table is based only on operations done on relational table. Let's start by deleting one row. Imagine that we delete the parent's 1st grandchild (son's side) with id 6. This row is particular because it's descendant in relations with parents (ids 3 and 1) and ascendant in relations with its children (ids 8 and 9). To delete it completely, we have to operate on descendantId and ancestorId columns. These 3 queries remove the 6th row and their children completely from the database :
DELETE FROM family WHERE id = 6 OR id IN (SELECT descendantId FROM family_relation WHERE ancestorId = 6); DELETE FROM family_relation WHERE descendantId = 6 OR ancestorId = 6;
The first query removes row with id 6 from family table. It removes all of its children too (id IN (...) clause). The last query removes all entries from relation table. Now, execute these queries and check if the delete was made correctly. You can use this query to check it :
SELECT f.* FROM family f JOIN family_relation fr ON f.id = fr.descendantId WHERE fr.ancestorId = 6
You should see a proud SQL message : "Empty set (0.01 sec)".
Adding new row in closure table
To add new data in closure table we'll use a simple PHP script. It will generate all possible paths to inserted row. In our example we want to add a new grandchild from the son's side :
$pdo = new PDO("mysql:host=localhost;dbname=tests", "root", "root"); $insertQueries = array(); // insert new row $role = "parent's 3nd grandchild (son's side)"; $insertSt = $pdo->prepare('INSERT INTO family (role) VALUES (:role)'); $insertSt->bindParam('role', $role); $insertSt->execute(); $lastInsertedId = $pdo->lastInsertId(); $relQuery = $pdo->prepare('INSERT INTO family_relation (ancestorId, descendantId, depth) VALUES (:ancestor, :descendant, 0)'); $relQuery->bindParam('ancestor', $lastInsertedId); $relQuery->bindParam('descendant', $lastInsertedId); $relQuery->execute(); // get ancestors $statement = $pdo->query('SELECT * FROM family_relation WHERE descendantId = 3'); $ancestors = $statement->fetchAll(PDO::FETCH_ASSOC); foreach ($ancestors as $a => $ancestor) { $insertQueries[] = array('ancestor' => $ancestor['ancestorId'], 'depth' => $ancestor['depth']+1); } foreach ($insertQueries as $query) { $relQuery = $pdo->prepare('INSERT INTO family_relation (ancestorId, descendantId, depth) VALUES (:ancestor, :descendant, :depth)'); $relQuery->bindParam('ancestor', $query['ancestor']); $relQuery->bindParam('descendant', $lastInsertedId); $relQuery->bindParam('depth', $query['depth']); $relQuery->execute(); }
First, we insert a new row and get its id ($lastInsertedId, probably equals to 10). After we make a relationship with newly inserted row (separate $relQuery statement). After we look for all ancestors of inserted row parent (WHERE descendantId = 3). By reading all of them, we take the ancestor and increment the depth by one. The incrementation by 1 is done because the ancestor has the relation with descendant's children and not with the descendant himself. At the end we execute all of created queries.
Editing a row in closure table
Modifying a row is a kind of mix of adding and deleting. First thing to do is deleting of all old relations in family_relation table. After we have to find all of new ancestors and insert the relations between editing row and them. In our example we want to move row deleted previously (id 6, to do that, reimport the database structure) into daughter's grandchild. Here we'll use some of client-side code too :
$pdo = new PDO("mysql:host=localhost;dbname=tests", "root", "root"); $inIds = array(); $insertQueries = array(); // get all ancestors of new parent (id 2) and increment the depth by 1 $depths = array(); $statement = $pdo->query('SELECT * FROM family_relation WHERE descendantId = 2'); $ancestors = $statement->fetchAll(PDO::FETCH_ASSOC); foreach ($ancestors as $a => $ancestor) { $depth = $ancestor['depth']+1; $insertQueries[] = array('ancestor' => $ancestor['ancestorId'], 'depth' => $depth, 'descendant' => 6); $depths[$ancestor['ancestorId']] = $depth; } // get all descendants of 6 based on previously retreived relations ($depths array) $statement = $pdo->query('SELECT * FROM family_relation WHERE ancestorId = 6 AND descendantId != 6'); $descendants = $statement->fetchAll(PDO::FETCH_ASSOC); foreach ($descendants as $d => $descendant) { $inIds[] = (int)$descendant['descendantId']; foreach ($depths as $ancestorId => $depth) { // previously, we generated the depths between 6 and its ancestors; it's why we use these values to generate the depths of 6 descendants // for exemple, if we have a generation between 1 and 6 with the depth 2, the depth of 6's first children, will be equal to 3 // this dependency will be valid to 6's grandchildren (depth 2 from 6) too : 2+2 = 4 $insertQueries[] = array('ancestor' => $ancestorId, 'depth' => $depth+$descendant['depth'], 'descendant' => $descendant['descendantId']); } } // delete all rows where $deleteQuery = $pdo->prepare('DELETE FROM family_relation WHERE (descendantId IN ('.implode(',', $inIds).') AND ancestorId != 6 AND depth > 0) OR (descendantId = 6 AND ancestorId != 6)'); $deleteQuery->execute(); foreach ($insertQueries as $query) { $relQuery = $pdo->prepare('INSERT INTO family_relation (ancestorId, descendantId, depth) VALUES (:ancestor, :descendant, :depth)'); $relQuery->bindParam('ancestor', $query['ancestor']); $relQuery->bindParam('descendant', $query['descendant']); $relQuery->bindParam('depth', $query['depth']); $relQuery->execute(); }
The first operation is the retrieving of all ancestors for row with id 6 and all of its children. In the second query we get all of 6's descendants (8 and 9) and prepare new insert queries with them. Next, we remove all old relation between them and other rows than 6 (their parent). We remove the relations between 6 and its ancestors too. At the end we execute insert all of new relations. To check if the edition was made correctly, please execute following statement :
SELECT f.*, fr.* FROM family f JOIN family_relation fr ON f.id = fr.descendantId WHERE fr.descendantId IN(6, 8, 9) OR fr.ancestorId IN (6, 8, 9) ORDER BY fr.depth ASC
If you receive the results like that, the edition was successfully made :
+----+-----------------------------------------------+------------+--------------+-------+ | id | role | ancestorId | descendantId | depth | +----+-----------------------------------------------+------------+--------------+-------+ | 9 | parents 2nd great grandchildren (son's side) | 9 | 9 | 0 | | 8 | parents 1st great grandchildren (son's side) | 8 | 8 | 0 | | 6 | parents 1st grandchild (son's side) | 6 | 6 | 0 | | 9 | parents 2nd great grandchildren (son's side) | 6 | 9 | 1 | | 6 | parents 1st grandchild (son's side) | 2 | 6 | 1 | | 8 | parents 1st great grandchildren (son's side) | 6 | 8 | 1 | | 9 | parents 2nd great grandchildren (son's side) | 2 | 9 | 2 | | 8 | parents 1st great grandchildren (son's side) | 2 | 8 | 2 | | 6 | parents 1st grandchild (son's side) | 1 | 6 | 2 | | 9 | parents 2nd great grandchildren (son's side) | 1 | 9 | 3 | | 8 | parents 1st great grandchildren (son's side) | 1 | 8 | 3 | +----+-----------------------------------------------+------------+--------------+-------+
Through this article we saw how to manage hierarchical data in SQL with closure table. The closure table which stores all of possible relation between the rows thanks to 3 columns : ancestor, descendant and depth of the relation. A main benefit of this solution is its clarity. Thanks to two tables we avoid to mix the relation with the rows information. But the main cost of this is the place that can be taken by the relations table.