Hierarchical/tree database for directories path in filesystem

I want to store the directories (present on the disk) into a database, maintaining their hierarchical/tree structure.

Here's a figure:

                         (ROOT)
                       /        \ 
                    Dir2        Dir3
                   /    \           \
                 Dir4   Dir5        Dir6
                 /          
               Dir7

I am using the SQLite database.

Please suggest me:

  1. the SQL query to store above structure in SQLite database, and

  2. a query to retrieve full path of the directory when I select one.

    i.e. suppose I select Dir7 then I should get the full path like ROOT/Dir2/Dir4/Dir7


Solution 1:

Here's a quick closure table example for SQLite. I've not included the statements for inserting items into an existing tree. Instead, I've just created the statements manually. You can find the insert and delete statements in the Models for hierarchical data slides.

For the sake of my sanity when inserting the IDs for the directories, I renamed the directories to match their IDs:

        (ROOT)
      /        \ 
    Dir2        Dir3
    /    \           \
  Dir4   Dir5        Dir6
  /          
Dir7

Create tables

CREATE TABLE `filesystem` (
  `id` INTEGER,
  `dirname` TEXT,
  PRIMARY KEY (`id`)
);

CREATE TABLE `tree_path` (
  `ancestor` INTEGER,
  `descendant` INTEGER,
  PRIMARY KEY (`ancestor`, `descendant`)
);

Insert directories into filesystem table

INSERT INTO filesystem (id, dirname) VALUES (1, 'ROOT');
INSERT INTO filesystem (id, dirname) VALUES (2, 'Dir2');
INSERT INTO filesystem (id, dirname) VALUES (3, 'Dir3');
INSERT INTO filesystem (id, dirname) VALUES (4, 'Dir4');
INSERT INTO filesystem (id, dirname) VALUES (5, 'Dir5');
INSERT INTO filesystem (id, dirname) VALUES (6, 'Dir6');
INSERT INTO filesystem (id, dirname) VALUES (7, 'Dir7');

Create the closure table paths

INSERT INTO tree_path (ancestor, descendant) VALUES (1, 1);
INSERT INTO tree_path (ancestor, descendant) VALUES (1, 2);
INSERT INTO tree_path (ancestor, descendant) VALUES (1, 3);
INSERT INTO tree_path (ancestor, descendant) VALUES (1, 4);
INSERT INTO tree_path (ancestor, descendant) VALUES (1, 5);
INSERT INTO tree_path (ancestor, descendant) VALUES (1, 6);
INSERT INTO tree_path (ancestor, descendant) VALUES (1, 7);
INSERT INTO tree_path (ancestor, descendant) VALUES (2, 2);
INSERT INTO tree_path (ancestor, descendant) VALUES (2, 4);
INSERT INTO tree_path (ancestor, descendant) VALUES (2, 5);
INSERT INTO tree_path (ancestor, descendant) VALUES (2, 7);
INSERT INTO tree_path (ancestor, descendant) VALUES (3, 3);
INSERT INTO tree_path (ancestor, descendant) VALUES (3, 6);
INSERT INTO tree_path (ancestor, descendant) VALUES (4, 4);
INSERT INTO tree_path (ancestor, descendant) VALUES (4, 7);
INSERT INTO tree_path (ancestor, descendant) VALUES (5, 5);
INSERT INTO tree_path (ancestor, descendant) VALUES (6, 6);
INSERT INTO tree_path (ancestor, descendant) VALUES (7, 7);

Run some queries

# (ROOT) and subdirectories
SELECT f.id, f.dirname FROM filesystem f
  JOIN tree_path t
    ON t.descendant = f.id
 WHERE t.ancestor = 1;

+----+---------+
| id | dirname |
+----+---------+
|  1 | ROOT    |
|  2 | Dir2    |
|  3 | Dir3    |
|  4 | Dir4    |
|  5 | Dir5    |
|  6 | Dir6    |
|  7 | Dir7    |
+----+---------+


# Dir3 and subdirectories
SELECT f.id, f.dirname
  FROM filesystem f
  JOIN tree_path t
    ON t.descendant = f.id
 WHERE t.ancestor = 3;

+----+---------+
| id | dirname |
+----+---------+
|  3 | Dir3    |
|  6 | Dir6    |
+----+---------+

# Dir5 and parent directories
SELECT f.id, f.dirname
  FROM filesystem f
  JOIN tree_path t
    ON t.ancestor = f.id
 WHERE t.descendant = 5;

+----+---------+
| id | dirname |
+----+---------+
|  1 | ROOT    |
|  2 | Dir2    |
|  5 | Dir5    |
+----+---------+

# Dir7 and parent directories
SELECT f.id, f.dirname
  FROM filesystem f
  JOIN tree_path t
    ON t.ancestor = f.id
 WHERE t.descendant = 7;

+----+---------+
| id | dirname |
+----+---------+
|  1 | ROOT    |
|  2 | Dir2    |
|  4 | Dir4    |
|  7 | Dir7    |
+----+---------+

SELECT f.id, f.dirname
  FROM filesystem f
  JOIN tree_path t
    ON t.ancestor = f.id
 WHERE t.descendant = (
SELECT id
  FROM filesystem
 WHERE dirname LIKE '%7%'
);

+----+---------+
| id | dirname |
+----+---------+
|  1 | ROOT    |
|  2 | Dir2    |
|  4 | Dir4    |
|  7 | Dir7    |
+----+---------+

Solution 2:

I think you should read about a method caled Modified Preorder Tree Traversal: http://www.sitepoint.com/hierarchical-data-database/

That link discusses about two methods to storing hierarchical data into relational databases: the adjacency list model and the modified preorder tree traversal algorithm.

The main idea of Modified Preorder Tree Traversal method is annotate all nodes with pointers to auxiliate the navigation and sub-tree selection: enter image description here