Unlike XML files, RDBMS (relation database management system) are the flat and non hierarchical structures. So, the developer has to make a supplementary effort to transform database data into tree structures. With this article, we inaugurate the set of articles dedicated to managing hierarchical data in RDBMS. For our example, we will use the most famous system, MySQL.
A virtual conference at the intersection of Data and AI. This is not a conference for the hype. Its real users talking about real experiences.
- 40+ speakers with the likes of Hannes from Duck DB, Sol Rashidi, Joe Reis, Sadie St. Lawrence, Ryan Wolf from nvidia, Rebecca from lidl
- 12th September 2024
- Three simultaneous tracks
- Panels, Lighting Talks, Keynotes, Booth crawls, Roundtables and Entertainment.
- Topics include (ingestion, finops for data, data for inference (feature platforms), data for ML observability
- 100% virtual and 100% free
👉 Register here
In each article we will approach the presented hierarchical model in 3 situation : retrieving a data, adding new data and deleting an existent entry. In this first article we'll discover the model called adjacency list. The first part of this article will explain this concept. At the second part, we will see how to retrieve data and construct a tree.
What is adjacency list ?
We use adjacency list model when the value of one column corresponds to another row in the same table. When the correspondence is missing, it means that the row with the missing value should be considered as the root element. To illustrate this talking, let's make a sample table with a genealogical tree.
This is a table and data used to our tests :CREATE TABLE IF NOT EXISTS `family` ( `id` smallint(5) NOT NULL AUTO_INCREMENT, `parent` smallint(5) DEFAULT NULL, `role` varchar(30) NOT NULL, PRIMARY KEY (`id`) ); -- add some family members -- parents INSERT INTO family (id, parent, role) VALUES (NULL, NULL, 'parents'); SET @PARENTS_ID = LAST_INSERT_ID(); -- daughter INSERT INTO family (id, parent, role) VALUES (NULL, @PARENTS_ID, 'daughter'); SET @DAUGHTER_ID = LAST_INSERT_ID(); -- son INSERT INTO family (id, parent, role) VALUES (NULL, @PARENTS_ID, 'son'); SET @D_SON_ID = LAST_INSERT_ID(); -- grandchildren INSERT INTO family (id, parent, role) VALUES (NULL, @DAUGHTER_ID, 'son'); SET @D_SON1_ID = LAST_INSERT_ID(); INSERT INTO family (id, parent, role) VALUES (NULL, @DAUGHTER_ID, 'son'); SET @D_SON2_ID = LAST_INSERT_ID(); INSERT INTO family (id, parent, role) VALUES (NULL, @D_SON_ID, 'son'); SET @D_SON3_ID = LAST_INSERT_ID(); INSERT INTO family (id, parent, role) VALUES (NULL, @D_SON_ID, 'daughter'); SET @D_DAUGHTER_ID = LAST_INSERT_ID(); -- great grandchildren INSERT INTO family (id, parent, role) VALUES (NULL, @D_SON2_ID, 'daughter'); INSERT INTO family (id, parent, role) VALUES (NULL, @D_SON2_ID, 'son'); INSERT INTO family (id, parent, role) VALUES (NULL, @D_SON1_ID, 'daughter'); INSERT INTO family (id, parent, role) VALUES (NULL, @D_SON1_ID, 'son'); INSERT INTO family (id, parent, role) VALUES (NULL, @D_DAUGHTER_ID, 'daughter'); INSERT INTO family (id, parent, role) VALUES (NULL, @D_DAUGHTER_ID, 'daughter'); INSERT INTO family (id, parent, role) VALUES (NULL, @D_DAUGHTER_ID, 'daughter');
We see clearly that the relation between family members is done with the column called parent. The parents, treated here as the root node, have this column set to NULL.
Retreive data with adjacency list
Now we can try to construct the full tree. It's quite simple, event if requires a lot of client-side code. The client-side code means here the code written in a programming language as Java or PHP and not the browser. Let's start by retrieving all family members by making the SQL query :
SELECT * FROM family ORDER BY parent ASC;
Returned rows will be ordered by the parent column. At the first position we will see the parents and at the last the great grandchildren. To make a tree, we'll need a little bit o programming code. Beceause of consistency reasons we will use PHP :
$pdo = new PDO('mysql:host=localhost;dbname=tests', 'root', ''); $statement = $pdo->query('SELECT * FROM family ORDER BY parent ASC'); $family = array(); $rows = $statement->fetchAll(PDO::FETCH_ASSOC); // organize family tree foreach ($rows as $r => $row) { $parent = (int)$row['parent']; if (!isset($family[$parent])) { $family[$parent] = array(); } $family[$parent][] = $row; } // starts at the first generation foreach ($family[0] as $m => $members) { echo '<ul>'; echo '<li>'.$members['role'].', id : '.$members['id'].'</li>'; showChildren($members['id']); echo '</ul>'; } // shows children of $parent id function showChildren($parent) { global $family; echo '<ul>'; foreach ($family[$parent] as $m => $members) { echo '<li>'.$members['role'].', id : '.$members['id'].' for parent '.$members['parent']; if (isset($family[$members['id']])) { showChildren($members['id']); } echo '</li>'; } echo '</ul>'; }
As we can see, adjacency list model needs a lot of client-side code. If we have to deal with something more complicated as a simple ul-li tree output, this solution can be insufficient, difficult to maintain and to evolve. But those disadvantages point only to client-side code because adjacency list model at the database is, on the contrary, simple to maintain and evolve. Imagine that you have to add a new family member. Everything to do is to find his parent and put the right value for "parent" column. Even when the data amount is consequent, the data manipulation will be fast.
Manipulating data in adjacency list
Like we saw previously, the data manipulation in adjacency list is simple. Imagine that a new family member was born and we have to add him. To do it, we make a simple SQL query :
INSERT INTO family (id, parent, role) VALUES (NULL, 73, 'daughter');
This simple query will add a daughter to a family member with id 73. Deleting a family members is more complicated than adding but it stills simple. Because of relations between the columns, we delete not only an element but the others elements associated with it too. For example, if we want to delete a son who has two daughters and three grandchildren, we will delete the son with his children and his grandchildren. Firstly, let's make a client-side code to retrieve all rows concerned by the deleting.
$statement = $pdo->query('SELECT par.id AS parentId, chi.id AS childId, grChi.id as grChildId FROM family par LEFT JOIN family chi ON chi.parent = par.id LEFT JOIN family grChi ON grChi.parent = chi.id WHERE par.id = 3'); $family = array(); $rows = $statement->fetchAll(PDO::FETCH_ASSOC); foreach ($rows as $r => $row) { if ($row['parentId'] != '' && !in_array($row['parentId'], $family)) $family[] = $row['parentId']; if ($row['childId'] != '' && !in_array($row['childId'], $family)) $family[] = $row['childId']; if ($row['grChildId'] != '' && !in_array($row['grChildId'], $family)) $family[] = $row['grChildId']; } $placeHolders = implode(',', array_fill(0, count($family), '?')); $deleteStatement = $pdo->prepare('DELETE FROM family2 WHERE id IN('.$placeHolders.')'); $deleteStatement->execute($family);
As we can see, a lot of code were written. First, we have to get all fields concerned by the deleting (suppose that our RDBMS doesn't support cascading deletes). After we make an array with ids ($family) and a placeholder for parameters (puts ? into our IN() case). At the end, we execute the request. Now we can check if the select request returns something :
SELECT par.id AS parentId, chi.id AS childId, grChi.id as grChildId FROM family par LEFT JOIN family chi ON chi.parent = par.id LEFT JOIN family grChi ON grChi.parent = chi.id WHERE par.id = 3
It returns nothing and the rows were deleted correctly. But the price of this delete is a quantity of code to write and maintain after that.
Note that the query could be simplified if we're used InnoDB as MySQL engine with cascade deleting.
Retreiving one node of family tree
Previously we extracted a family node to delete whole hierarchy from the database. Note that the selecting query is fixed and difficult to evolve if you have deeper relations than only parent-children-grandchildren.
This request returns us all family members of the parent's son (id 3). But the problem is that the static character of this query. If in 30 years, the great grandchildren will have their own children (great great children!), the query will have to be changed too by appending a new LEFT JOIN sequence. Unfortunately (for the programmers, not for the family), the family can live 400 years. In case, the SQL query will look like a long snake with the "LEFT JOIN" pattern on its skin.
To resume, the adjacency list model is maybe a simple aproach, based on intern relationship between database's table rows. It's quite simple to use it tree constructions with programming languages. But when stored data is supposed to change frequently by making new relationships, it could be better to think about another solution.