Achieve hierarchy, Parent/Child Relationship in an effective and easy way

I have a table like

create table site
(
site_Id int(5),
parent_Id int(5),
site_desc varchar2(100)
);

Significance of the fields:

  • site_Id : Id of the sites
  • parent_Id : Parent id of the site
  • site_desc : though not relevant to the question but it has the description of the site

The requirement is that if I have a site_id as an input, and I need all the ids tagged below the site. For Example:

                    A
                   / \
                  B   C
                / | \ /\
               D  E F G H
              /\
             I  J

All the nodes are the site_Id.

The table contains data like this:

Site_id  | Parent_ID  |  site_desc
_________|____________|___________
 A       |   -1       |   
 B       |    A       |
 C       |    A       |
 D       |    B       |
 E       |    B       |
 F       |    B       |
 I       |    D       |
 J       |    D       |

......

A is the parent of B and C and so on.

If B is the input given then the query need to fetch D, E, I, F, J

It is currently achieved through multiple queries in a loop, but I was thinking to achieve this in a minimum number of queries.

What I am currently doing is::

down vote

The algorithm goes like this :

Initially create a data set object which you will populate, by fetching data from the data base. 
Create a method which takes the parent id as parameter and returns its child nodes if present, and returns -1, if it doesnt have a child. 
Step1: Fetch all the rows, which doesn't have a parent(root) node. 
Step2: Iterate through this result. For example if prod1 and prod2 are the initial returned nodes, in the resultset. 
Iterating this RS we get prod1, and we insert a row in our DataSET obj. 
Then we send the id of prod1 to getCHILD method, to get its child, and then again we iterate the returned resultset, and again call the getCHILD method, till we dont get the lowest node.

I need the best optimized technique within my data model constraint. Feel free to answer if you have any suggestion.
Please suggest anything. Thanks in advance.


Solution 1:

Unfortunately, if you can't change the data model, and you're using MySQL, you're stuck in a situation where you need recursive queries and you're using a DBMS that doesn't support recursive queries.

Quassnoi wrote an interesting series of blog articles, showing techniques for querying hierarchical data. His solutions are quite clever, but very complex. http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/

PostgreSQL is another open-source RDBMS, which does support recursive queries, so you could fetch a whole tree stored in the way you show. But if you can't change the data model, I'd assume you can't switch to a different RDBMS.

There are several alternative data models that make it much easier to fetch arbitrarily-deep trees:

  • Closure Table
  • Nested Sets aka Modified Preorder Tree Traversal
  • Path Enumeration aka Materialized Path

I cover these in my presentation Models for Hierarchical Data with SQL and PHP, and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

Finally, there's another solution that I've seen used in the code for Slashdot, for their comments hierarchies: They store "parent_id" like in Adjacency List, but they also store a "root_id" column. Every member of a given tree has the same value for root_id, which is the highest ancestor node in its tree. Then it's easy to fetch a whole tree in one query:

SELECT * FROM site WHERE root_id = 123;

Then your application fetches all the nodes back from the database into an array, and you have to write the code to loop over this array, inserting the nodes into a tree data structure in memory. This is a good solution if you have many separate trees, and each tree has relatively few entries. It's good for Slashdot's case.

Solution 2:

Yesterday, I had answered this question which is exactly related to your problem you've described: Out of a given adjacency list, you want to get all child nodes of a particular parent - and perhaps in a one-dimensional array that you can easily iterate over.

You can do this using only one call to the database, but there's sort of a catch: you have to return all rows from the table. MySQL doesn't support recursive queries, so instead, you essentially have to do the SELECTing in the application code.

I'm simply going to reiterate my answer that I linked to above, but basically if you return a result-set (perhaps from PDOStatement->fetchAll(PDO::FETCH_ASSOC) or other methods) in the format of something like:

Array
(
    [0] => Array
    (
        [site_id] => A
        [parent_id] => -1
        [site_desc] => testtext
    )
    [1] => Array
    (
        [site_id] => B
        [parent_id] => A
        [site_desc] => testtext
    )
    [2] => Array
    (
        [site_id] => C
        [parent_id] => A
        [site_desc] => testtext
    )
    [3] => Array
    (
        [site_id] => D
        [parent_id] => B
        [site_desc] => testtext
    )
    [4] => Array
    (
        [site_id] => E
        [parent_id] => B
        [site_desc] => testtext
    )
    [5] => Array
    (
        [site_id] => F
        [parent_id] => B
        [site_desc] => testtext
    )
    [6] => Array
    (
        [site_id] => I
        [parent_id] => D
        [site_desc] => testtext
    )
    [7] => Array
    (
        [site_id] => J
        [parent_id] => D
        [site_desc] => testtext
    )
)

You can retrieve all children/grandchildren/greatgrandchildren/so-on of any site_id (provided you know the id) using this recursive function:

function fetch_recursive($src_arr, $id, $parentfound = false, $cats = array())
{
    foreach($src_arr as $row)
    {
        if((!$parentfound && $row['site_id'] == $id) || $row['parent_id'] == $id)
        {
            $rowdata = array();
            foreach($row as $k => $v)
                $rowdata[$k] = $v;
            $cats[] = $rowdata;
            if($row['parent_id'] == $id)
                $cats = array_merge($cats, fetch_recursive($src_arr, $row['site_id'], true));
        }
    }
    return $cats;
}

For example, say you wanted to retrieve all children of site_id D, you would use the function like so:

$nodelist = fetch_recursive($pdostmt->fetchAll(PDO::FETCH_ASSOC), 'D');
print_r($nodelist);

Would output:

[0] => Array
(
    [site_id] => D
    [parent_id] => B
    [site_desc] => testtext
)
[1] => Array
(
    [site_id] => I
    [parent_id] => D
    [site_desc] => testtext
)
[2] => Array
(
    [site_id] => J
    [parent_id] => D
    [site_desc] => testtext
)

Notice we retain the information of the parent, along with its children, grandchildren, and so on (however deep the nesting goes).

Solution 3:

Check out the nested set model, if you want to be able to do this in single queries: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

Another alternative is to include all relationships in a linking table. So every site would have a link to its parent, its grandparent, etc. Every relationship is explicit. Then you just query that linkage table to get all descendants.

Solution 4:

To begin with, i would recommend a bit different method for storing a tree: Closure Table. If you want to know more about it, you could find SQL Antipatterns book quite interesting.

That said. The easiest way, in my opinion, to generate such structure is this: http://jsbin.com/omexix/3/edit#javascript

I hope you have no problems reading JavaScript code. I used it because creation of unclassified objects in JavaScript doesn't look so hack-ish. It is possible to implement same without relaying on object (or references) by using multidimensional array, but it kinda looks confusing.

Here is what algorithm does:

  • we loop through the list of nodes, once
  • if node's parent does not exits, placeholder is created in the array
  • if node has no parent, it is place in list of root nodes
  • if node has no placeholder in array, the placeholder is created
  • values from node are assigned to placeholder
  • node is registered to the parent, if it has a parent

This is about it. Basically you generate two lists: with all the nodes, and with only root node.