Summarize hierarchical data with Linq to DB

I have a database that is represented (roughly) by the following structure. Because Entity Framework generates the classes, GamePlayer holds a reference to its parent Game instance and GameAction holds a reference to its parent GamePlayer instance, along with the corresponding Ids.

    public class Player
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
    public class Game
    {
        public int Id { get; set; }
        public IEnumerable<GamePlayer> Players { get; private set; }

    }
    public class GamePlayer
    {
        public int Id { get; set; }
        public int PlayerId { get; set; }
        public int GameId { get; set; }
        public Game Game { get; set; }
        public IEnumerable<GameAction> Actions { get; private set; }
    }
    public class GameAction
    {
        public int Id { get; set; }
        public ActionType ActionType { get; set; }
        public int? Amount { get; set; }
        public int GamePlayerId { get; set; }
        public GamePlayer GamePlayer { get; set; }
    }
    public enum ActionType
    {
        BET,
        FOLD,
        RAISE
    }

A single player can play multiple games and perform various actions.

I want to summarize the data to get the following statistics for each player:

  • Number of games played
  • Number of bets made
  • Sum of all bets made

For example:

    Player Name   Games   Bets   Amount
    John            120     20      980
    Paul             90     10        5
    Garry           200    100     1500

The database contains thousands of players, millions of games, and even more actions. Ideally, therefore, I would like to write a LINQ query that runs on the server side, so that the records are not all loaded onto the client.

Is this feasible? Or should I resort to stored procedures and pure SQL queries at the DB level?


Solution 1:

The desired LINQ query can be written in two ways: top-down without GroupBy, simply following the naturally grouped data Player -> GamePlayer -> GameAction, or bottom-up, GameAction -> GamePlayer with GroupBy (the traditional SQL way).

The first approach would generate many correlated subqueries (they run server side, but still), so it is probably better to follow the SQL way. Moreover, EF Core 6.0 supports translating a distinct count, which is needed here because the join to actions repeats each game once per action.

Here is the bottom-up query, which returns the data you want keyed by PlayerId. You can always join it to Player if you need other player information, such as the name.

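    var query =
        from gp in db.Set<GamePlayer>()
        // One row per (game player, action) pair; translated as an INNER JOIN.
        from ga in gp.Actions
        group new { gp, ga } by new { gp.PlayerId } into g
        select new
        {
            PlayerId = g.Key.PlayerId,
            // The join repeats each game once per action, so count distinct game ids.
            Games = g.Select(e => e.gp.GameId).Distinct().Count(),
            Bets = g.Count(e => e.ga.ActionType == ActionType.BET),
            // Non-bet rows contribute null, which Sum ignores.
            Amount = g.Sum(e => e.ga.ActionType == ActionType.BET ? e.ga.Amount : null),
        };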

which for SQL Server is translated to

    SELECT [g].[PlayerId], COUNT(DISTINCT ([g].[GameId])) AS [Games], COUNT(CASE
        WHEN [g0].[ActionType] = 0 THEN 1
    END) AS [Bets], COALESCE(SUM(CASE
        WHEN [g0].[ActionType] = 0 THEN [g0].[Amount]
        ELSE NULL
    END), 0) AS [Amount]
    FROM [GamePlayer] AS [g]
    INNER JOIN [GameAction] AS [g0] ON [g].[Id] = [g0].[GamePlayerId]
    GROUP BY [g].[PlayerId]

which is very close to what one would write by hand in SQL.
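
If the player name is needed in the result, one option is to include Player in the join and group by its Id and Name. The sketch below is a minimal, self-contained illustration rather than the exact query above: `PlayerSummary` and `PlayerStats.Summarize` are hypothetical names, the entity classes are trimmed stand-ins for the ones in the question, and explicit joins replace the `gp.Actions` navigation so that the same method also runs against in-memory data. It has not been run against a real database here, so treat the EF Core translation as an assumption to verify.

```csharp
using System.Linq;

// Trimmed stand-ins for the question's entities (only the members the query uses).
public enum ActionType { BET, FOLD, RAISE }

public class Player
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class GamePlayer
{
    public int Id { get; set; }
    public int PlayerId { get; set; }
    public int GameId { get; set; }
}

public class GameAction
{
    public int Id { get; set; }
    public ActionType ActionType { get; set; }
    public int? Amount { get; set; }
    public int GamePlayerId { get; set; }
}

// Hypothetical result shape; not part of the original model.
public class PlayerSummary
{
    public string Name { get; set; } = "";
    public int Games { get; set; }
    public int Bets { get; set; }
    public int Amount { get; set; }
}

public static class PlayerStats
{
    // Hypothetical helper: the same three aggregates, grouped by player Id and Name.
    public static IQueryable<PlayerSummary> Summarize(
        IQueryable<Player> players,
        IQueryable<GamePlayer> gamePlayers,
        IQueryable<GameAction> actions)
    {
        return
            from p in players
            join gp in gamePlayers on p.Id equals gp.PlayerId
            join ga in actions on gp.Id equals ga.GamePlayerId
            group new { gp, ga } by new { p.Id, p.Name } into g
            select new PlayerSummary
            {
                Name = g.Key.Name,
                // Each game appears once per action, so count distinct game ids.
                Games = g.Select(e => e.gp.GameId).Distinct().Count(),
                Bets = g.Count(e => e.ga.ActionType == ActionType.BET),
                // Non-bet rows contribute null, which Sum ignores.
                Amount = g.Sum(e => e.ga.ActionType == ActionType.BET ? e.ga.Amount : null) ?? 0,
            };
    }
}
```

With EF Core, this could be called as `PlayerStats.Summarize(db.Set<Player>(), db.Set<GamePlayer>(), db.Set<GameAction>())`. As with the query above, the inner joins mean that players with no recorded actions do not appear in the result, and games in which a player took no action are not counted.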