Most efficient method of self referencing tree using Entity Framework

Solution 1:

I have successfully mapped hierarchical data using EF.

Take for example an Establishment entity. This can represent a company, university, or some other unit within a larger organizational structure:

public class Establishment : Entity
{
    public string Name { get; set; }
    public virtual Establishment Parent { get; set; }
    public virtual ICollection<Establishment> Children { get; set; }
    ...
}

Here is how the Parent / Children properties are mapped. This way, when you set the Parent of 1 entity, the Parent entity's Children collection is automatically updated:

// ParentEstablishment 0..1 <---> * ChildEstablishment
HasOptional(d => d.Parent)
    .WithMany(p => p.Children)
    .Map(d => d.MapKey("ParentId"))
    .WillCascadeOnDelete(false); // do not delete children when parent is deleted

Note that so far I haven't included your Lineage or Depth properties. You are right, EF doesn't work well for generating nested hierarchical queries with the above relationships. What I finally settled on was the addition of a new gerund entity, along with 2 new entity properties:

public class EstablishmentNode : Entity
{
    public int AncestorId { get; set; }
    public virtual Establishment Ancestor { get; set; }

    public int OffspringId { get; set; }
    public virtual Establishment Offspring { get; set; }

    public int Separation { get; set; }
}

public class Establishment : Entity
{
    ...
    public virtual ICollection<EstablishmentNode> Ancestors { get; set; }
    public virtual ICollection<EstablishmentNode> Offspring { get; set; }

}

While writing this up, hazzik posted an answer that is very similar to this approach. I'll continue writing up though, to provide a slightly different alternative. I like to make my Ancestor and Offspring gerund types actual entity types because it helps me get the Separation between the Ancestor and Offspring (what you referred to as Depth). Here is how I mapped these:

private class EstablishmentNodeOrm : EntityTypeConfiguration<EstablishmentNode>
{
    internal EstablishmentNodeOrm()
    {
        ToTable(typeof(EstablishmentNode).Name);
        HasKey(p => new { p.AncestorId, p.OffspringId });
    }
}

... and finally, the identifying relationships in the Establishment entity:

// has many ancestors
HasMany(p => p.Ancestors)
    .WithRequired(d => d.Offspring)
    .HasForeignKey(d => d.OffspringId)
    .WillCascadeOnDelete(false);

// has many offspring
HasMany(p => p.Offspring)
    .WithRequired(d => d.Ancestor)
    .HasForeignKey(d => d.AncestorId)
    .WillCascadeOnDelete(false);

Also, I did not use a sproc to update the node mappings. Instead we have a set of internal commands that will derive / compute the Ancestors and Offspring properties based on the Parent & Children properties. However ultimately, you end up being able to do some very similar querying as in hazzik's answer:

// load the entity along with all of its offspring
var establishment = dbContext.Establishments
    .Include(x => x.Offspring.Select(y => e.Offspring))
    .SingleOrDefault(x => x.Id == id);

The reason for the bridge entity between the main entity and its Ancestors / Offspring is again because this entity lets you get the Separation. Also, by declaring it as an identifying relationship, you can remove nodes from the collection without having to explicitly call DbContext.Delete() on them.

// load all entities that are more than 3 levels deep
var establishments = dbContext.Establishments
    .Where(x => x.Ancestors.Any(y => y.Separation > 3));

Solution 2:

You could use supporting hierarchy table to do eager loading of unlimited levels of tree.

So, you need to add two collections Ancestors and Descendants, both collection should be mapped as many-to-many to supporting table.

public class Tree 
{
    public virtual Tree Parent { get; set; }
    public virtual ICollection<Tree> Children { get; set; }
    public virtual ICollection<Tree> Ancestors { get; set; }
    public virtual ICollection<Tree> Descendants { get; set; }
}

Ancestors will contain all ancestors (parent, grand-parent, grand-grand-parent, etc.) of the entity and Descendants will contain all the descendants (children, grand-children, grand-grand-children, etc) of the entity.

Now you have to map it with EF Code First:

public class TreeConfiguration : EntityTypeConfiguration<Tree>
{
    public TreeConfiguration()
    {
        HasOptional(x => x.Parent)
            .WithMany(x => x.Children)
            .Map(m => m.MapKey("PARENT_ID"));

        HasMany(x => x.Children)
            .WithOptional(x => x.Parent);

        HasMany(x => x.Ancestors)
            .WithMany(x => x.Descendants)
            .Map(m => m.ToTable("Tree_Hierarchy").MapLeftKey("PARENT_ID").MapRightKey("CHILD_ID"));

        HasMany(x => x.Descendants)
            .WithMany(x => x.Ancestors)
            .Map(m => m.ToTable("Tree_Hierarchy").MapLeftKey("CHILD_ID").MapRightKey("PARENT_ID"));
    }    
}

Now with this structure you could do eager fetch like following

context.Trees.Include(x => x.Descendants).Where(x => x.Id == id).SingleOrDefault()

This query will load entity with id and all of it descenadnts.

You could populate the supporting table with following stored procedure:

CREATE PROCEDURE [dbo].[FillHierarchy] (@table_name nvarchar(MAX), @hierarchy_name nvarchar(MAX))
AS
BEGIN
    DECLARE @sql nvarchar(MAX), @id_column_name nvarchar(MAX)
    SET @id_column_name = '[' + @table_name + '_ID]'
    SET @table_name = '[' + @table_name + ']'
    SET @hierarchy_name = '[' + @hierarchy_name + ']'

    SET @sql = ''
    SET @sql = @sql + 'WITH Hierachy(CHILD_ID, PARENT_ID) AS ( '
    SET @sql = @sql + 'SELECT ' + @id_column_name + ', [PARENT_ID] FROM ' + @table_name + ' e '
    SET @sql = @sql + 'UNION ALL '
    SET @sql = @sql + 'SELECT e.' + @id_column_name + ', e.[PARENT_ID] FROM ' + @table_name + ' e '
    SET @sql = @sql + 'INNER JOIN Hierachy eh ON e.' + @id_column_name + ' = eh.[PARENT_ID]) '
    SET @sql = @sql + 'INSERT INTO ' + @hierarchy_name + ' ([CHILD_ID], [PARENT_ID]) ( '
    SET @sql = @sql + 'SELECT [CHILD_ID], [PARENT_ID] FROM Hierachy WHERE [PARENT_ID] IS NOT NULL '
    SET @sql = @sql + ') '

    EXECUTE (@sql)
END
GO

Or even you could map supporting table to a view:

CREATE VIEW [Tree_Hierarchy]
AS
    WITH Hierachy (CHILD_ID, PARENT_ID) 
    AS 
    (
        SELECT [MySuperTree_ID], [PARENT_ID] FROM [MySuperTree] AS e
        UNION ALL
        SELECT e.[MySuperTree_ID], e.[PARENT_ID] FROM [MySuperTree] AS e 
            INNER JOIN Hierachy AS eh ON e.[MySuperTree_ID] = eh.[PARENT_ID]
    )

    SELECT [CHILD_ID], [PARENT_ID] FROM Hierachy WHERE [PARENT_ID] IS NOT NULL
GO

Solution 3:

I've already spent a while trying to fix a bug in your solution. The stored procedure really don't generate children, grandchildren, etc. Below you will find fixed stored procedure:

CREATE PROCEDURE dbo.UpdateHierarchy AS
BEGIN
  DECLARE @sql nvarchar(MAX)

  SET @sql = ''
  SET @sql = @sql + 'WITH Hierachy(ChildId, ParentId) AS ( '
  SET @sql = @sql + 'SELECT t.Id, t.ParentId FROM dbo.Tree t '
  SET @sql = @sql + 'UNION ALL '
  SET @sql = @sql + 'SELECT h.ChildId, t.ParentId FROM dbo.Tree t '
  SET @sql = @sql + 'INNER JOIN Hierachy h ON t.Id = h.ParentId) '
  SET @sql = @sql + 'INSERT INTO dbo.TreeHierarchy (ChildId, ParentId) ( '
  SET @sql = @sql + 'SELECT DISTINCT ChildId, ParentId FROM Hierachy WHERE ParentId IS NOT NULL '
  SET @sql = @sql + 'EXCEPT SELECT t.ChildId, t.ParentId FROM dbo.TreeHierarchy t '
  SET @sql = @sql + ') '

  EXECUTE (@sql)
END

Mistake: wrong reference. Translating @hazzik code it was:

  SET @sql = @sql + 'SELECT t.ChildId, t.ParentId FROM dbo.Tree t '

but should be

  SET @sql = @sql + 'SELECT h.ChildId, t.ParentId FROM dbo.Tree t '

also I've added code that allows you to update TreeHierarchy table not only when you will populate it.

  SET @sql = @sql + 'EXCEPT SELECT t.ChildId, t.ParentId FROM dbo.TreeHierarchy t '

And the magic. This procedure or rather TreeHierarchy allows you to load Children just by including Ancestors (not Children and not Descendants).

 using (var context = new YourDbContext())
 {
      rootNode = context.Tree
           .Include(x => x.Ancestors)
           .SingleOrDefault(x => x.Id == id);
 } 

Now the YourDbContext will return a rootNode with loaded children, children of rootName's children (grandchildren), and so on.