Turn database result into array
Okay, I've written PHP classes that extend the Zend Framework DB table, row, and rowset classes. I've been developing this anyway because I'm speaking at PHP Tek-X in a couple of weeks about hierarchical data models.
I don't want to post all my code to Stack Overflow because they implicitly get licensed under Creative Commons if I do that. update: I committed my code to the Zend Framework extras incubator and my presentation is Models for Hierarchical Data with SQL and PHP at slideshare.
I'll describe the solution in pseudocode. I'm using zoological taxonomy as test data, downloaded from ITIS.gov. The table is longnames
:
CREATE TABLE `longnames` (
`tsn` int(11) NOT NULL,
`completename` varchar(164) NOT NULL,
PRIMARY KEY (`tsn`),
KEY `tsn` (`tsn`,`completename`)
)
I've created a closure table for the paths in the hierarchy of taxonomy:
CREATE TABLE `closure` (
`a` int(11) NOT NULL DEFAULT '0', -- ancestor
`d` int(11) NOT NULL DEFAULT '0', -- descendant
`l` tinyint(3) unsigned NOT NULL, -- levels between a and d
PRIMARY KEY (`a`,`d`),
CONSTRAINT `closure_ibfk_1` FOREIGN KEY (`a`) REFERENCES `longnames` (`tsn`),
CONSTRAINT `closure_ibfk_2` FOREIGN KEY (`d`) REFERENCES `longnames` (`tsn`)
)
Given the primary key of one node, you can get all its descendants this way:
SELECT d.*, p.a AS `_parent`
FROM longnames AS a
JOIN closure AS c ON (c.a = a.tsn)
JOIN longnames AS d ON (c.d = d.tsn)
LEFT OUTER JOIN closure AS p ON (p.d = d.tsn AND p.l = 1)
WHERE a.tsn = ? AND c.l <= ?
ORDER BY c.l;
The join to closure AS p
is to include each node's parent id.
The query makes pretty good use of indexes:
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
| 1 | SIMPLE | a | const | PRIMARY,tsn | PRIMARY | 4 | const | 1 | Using index; Using filesort |
| 1 | SIMPLE | c | ref | PRIMARY,d | PRIMARY | 4 | const | 5346 | Using where |
| 1 | SIMPLE | d | eq_ref | PRIMARY,tsn | PRIMARY | 4 | itis.c.d | 1 | |
| 1 | SIMPLE | p | ref | d | d | 4 | itis.c.d | 3 | |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
And given that I have 490,032 rows in longnames
and 4,299,883 rows in closure
, it runs in pretty good time:
+--------------------+----------+
| Status | Duration |
+--------------------+----------+
| starting | 0.000257 |
| Opening tables | 0.000028 |
| System lock | 0.000009 |
| Table lock | 0.000013 |
| init | 0.000048 |
| optimizing | 0.000032 |
| statistics | 0.000142 |
| preparing | 0.000048 |
| executing | 0.000008 |
| Sorting result | 0.034102 |
| Sending data | 0.001300 |
| end | 0.000018 |
| query end | 0.000005 |
| freeing items | 0.012191 |
| logging slow query | 0.000008 |
| cleaning up | 0.000007 |
+--------------------+----------+
Now I post-process the result of the SQL query above, sorting the rows into subsets according to the hierarchy (pseudocode):
while ($rowData = fetch()) {
$row = new RowObject($rowData);
$nodes[$row["tsn"]] = $row;
if (array_key_exists($row["_parent"], $nodes)) {
$nodes[$row["_parent"]]->addChildRow($row);
} else {
$top = $row;
}
}
return $top;
I also define classes for Rows and Rowsets. A Rowset is basically an array of rows. A Row contains an associative array of row data, and also contains a Rowset for its children. The children Rowset for a leaf node is empty.
Rows and Rowsets also define methods called toArrayDeep()
which dump their data content recursively as a plain array.
Then I can use the whole system together like this:
// Get an instance of the taxonomy table data gateway
$tax = new Taxonomy();
// query tree starting at Rodentia (id 180130), to a depth of 2
$tree = $tax->fetchTree(180130, 2);
// dump out the array
var_export($tree->toArrayDeep());
The output is as follows:
array (
'tsn' => '180130',
'completename' => 'Rodentia',
'_parent' => '179925',
'_children' =>
array (
0 =>
array (
'tsn' => '584569',
'completename' => 'Hystricognatha',
'_parent' => '180130',
'_children' =>
array (
0 =>
array (
'tsn' => '552299',
'completename' => 'Hystricognathi',
'_parent' => '584569',
),
),
),
1 =>
array (
'tsn' => '180134',
'completename' => 'Sciuromorpha',
'_parent' => '180130',
'_children' =>
array (
0 =>
array (
'tsn' => '180210',
'completename' => 'Castoridae',
'_parent' => '180134',
),
1 =>
array (
'tsn' => '180135',
'completename' => 'Sciuridae',
'_parent' => '180134',
),
2 =>
array (
'tsn' => '180131',
'completename' => 'Aplodontiidae',
'_parent' => '180134',
),
),
),
2 =>
array (
'tsn' => '573166',
'completename' => 'Anomaluromorpha',
'_parent' => '180130',
'_children' =>
array (
0 =>
array (
'tsn' => '573168',
'completename' => 'Anomaluridae',
'_parent' => '573166',
),
1 =>
array (
'tsn' => '573169',
'completename' => 'Pedetidae',
'_parent' => '573166',
),
),
),
3 =>
array (
'tsn' => '180273',
'completename' => 'Myomorpha',
'_parent' => '180130',
'_children' =>
array (
0 =>
array (
'tsn' => '180399',
'completename' => 'Dipodidae',
'_parent' => '180273',
),
1 =>
array (
'tsn' => '180360',
'completename' => 'Muridae',
'_parent' => '180273',
),
2 =>
array (
'tsn' => '180231',
'completename' => 'Heteromyidae',
'_parent' => '180273',
),
3 =>
array (
'tsn' => '180213',
'completename' => 'Geomyidae',
'_parent' => '180273',
),
4 =>
array (
'tsn' => '584940',
'completename' => 'Myoxidae',
'_parent' => '180273',
),
),
),
4 =>
array (
'tsn' => '573167',
'completename' => 'Sciuravida',
'_parent' => '180130',
'_children' =>
array (
0 =>
array (
'tsn' => '573170',
'completename' => 'Ctenodactylidae',
'_parent' => '573167',
),
),
),
),
)
Re your comment about calculating depth -- or really length of each path.
Assuming you've just inserted a new node to your table that holds the actual nodes (longnames
in the example above), the id of the new node is returned by LAST_INSERT_ID()
in MySQL or else you can get it somehow.
INSERT INTO Closure (a, d, l)
SELECT a, LAST_INSERT_ID(), l+1 FROM Closure
WHERE d = 5 -- the intended parent of your new node
UNION ALL SELECT LAST_INSERT_ID(), LAST_INSERT_ID(), 0;
Proposed Solution
This following example gives a little more than you ask for, but it's a really nice way of doing it and still demonstrates where the information comes from at each stage.
It uses the following table structure:
+--------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| parent | int(10) unsigned | NO | | NULL | |
| name | varchar(45) | NO | | NULL | |
+--------+------------------+------+-----+---------+----------------+
Here it is:
<?php
// Connect to the database
mysql_connect('localhost', 'root', '');
mysql_select_db('test');
echo '<pre>';
$categories = Category::getTopCategories();
print_r($categories);
echo '</pre>';
class Category
{
/**
* The information stored in the database for each category
*/
public $id;
public $parent;
public $name;
// The child categories
public $children;
public function __construct()
{
// Get the child categories when we get this category
$this->getChildCategories();
}
/**
* Get the child categories
* @return array
*/
public function getChildCategories()
{
if ($this->children) {
return $this->children;
}
return $this->children = self::getCategories("parent = {$this->id}");
}
////////////////////////////////////////////////////////////////////////////
/**
* The top-level categories (i.e. no parent)
* @return array
*/
public static function getTopCategories()
{
return self::getCategories('parent = 0');
}
/**
* Get categories from the database.
* @param string $where Conditions for the returned rows to meet
* @return array
*/
public static function getCategories($where = '')
{
if ($where) $where = " WHERE $where";
$result = mysql_query("SELECT * FROM categories$where");
$categories = array();
while ($category = mysql_fetch_object($result, 'Category'))
$categories[] = $category;
mysql_free_result($result);
return $categories;
}
}
Test Case
In my database I have the following rows:
+----+--------+-----------------+
| id | parent | name |
+----+--------+-----------------+
| 1 | 0 | First Top |
| 2 | 0 | Second Top |
| 3 | 0 | Third Top |
| 4 | 1 | First Child |
| 5 | 1 | Second Child |
| 6 | 2 | Third Child |
| 7 | 2 | Fourth Child |
| 8 | 4 | First Subchild |
| 9 | 4 | Second Subchild |
+----+--------+-----------------+
And thus the script outputs the following (lengthy) information:
Array
(
[0] => Category Object
(
[id] => 1
[parent] => 0
[name] => First Top
[children] => Array
(
[0] => Category Object
(
[id] => 4
[parent] => 1
[name] => First Child
[children] => Array
(
[0] => Category Object
(
[id] => 8
[parent] => 4
[name] => First Subchild
[children] => Array
(
)
)
[1] => Category Object
(
[id] => 9
[parent] => 4
[name] => Second Subchild
[children] => Array
(
)
)
)
)
[1] => Category Object
(
[id] => 5
[parent] => 1
[name] => Second Child
[children] => Array
(
)
)
)
)
[1] => Category Object
(
[id] => 2
[parent] => 0
[name] => Second Top
[children] => Array
(
[0] => Category Object
(
[id] => 6
[parent] => 2
[name] => Third Child
[children] => Array
(
)
)
[1] => Category Object
(
[id] => 7
[parent] => 2
[name] => Fourth Child
[children] => Array
(
)
)
)
)
[2] => Category Object
(
[id] => 3
[parent] => 0
[name] => Third Top
[children] => Array
(
)
)
)
Example Usage
I'd suggest creating some kind of recursive function if you're going to create menus from the data:
function outputCategories($categories, $startingLevel = 0)
{
$indent = str_repeat(" ", $startingLevel);
foreach ($categories as $category)
{
echo "$indent{$category->name}\n";
if (count($category->children) > 0)
outputCategories($category->children, $startingLevel+1);
}
}
$categories = Category::getTopCategories();
outputCategories($categories);
which would output the following:
First Top
First Child
First Subchild
Second Subchild
Second Child
Second Top
Third Child
Fourth Child
Third Top
Enjoy
I loved the answer from icio, but I prefer to have arrays of arrays, rather than arrays of objects. Here is his script modified to work without making objects:
<?php
require_once('mysql.php');
echo '<pre>';
$categories = Taxonomy::getTopCategories();
print_r($categories);
echo '</pre>';
class Taxonomy
{
public static function getTopCategories()
{
return self::getCategories('parent_taxonomycode_id = 0');
}
public static function getCategories($where = '')
{
if ($where) $where = " WHERE $where";
$result = mysql_query("SELECT * FROM taxonomycode $where");
$categories = array();
// while ($category = mysql_fetch_object($result, 'Category'))
while ($category = mysql_fetch_array($result)){
$my_id = $category['id'];
$category['children'] = Taxonomy::getCategories("parent_taxonomycode_id = $my_id");
$categories[] = $category;
}
mysql_free_result($result);
return $categories;
}
}
I think it fair to note that both my answer, and icios do not address your question directly. They both rely on having a parent id link in the main table, and make no use of the closure table. However, recursively querying the database is definitely the way to do, but instead of recursively passing the parent id, you have to pass in the parent id AND the level of the depth (which should increase by one on each recursion) so that the queries at each level can use parent + depth to get the direct parent information from the closure table rather than having it in the main table.
HTH, -FT