Select rows with Max(Column Value) for each unique combination of two other columns

I'm working with a sample table like below. A Dataset has multiple groups, and each time a write to the table occurs, the RunNumber increments for the dataset, along with data for each group and the total. Each Dataset/Group combo will usually have multiple rows, example below:

RunNumber	Group	Dataset	Total
1	Group1	Dataset A	10
1	Group1	Dataset A	20
2	Group1	Dataset A	30
2	Group2	Dataset A	15
1	Group1	Dataset B	5
1	Group2	Dataset B	10
1	Group3	Dataset A	30
2	Group3	Dataset A	30
1	Group1	Dataset C	15
1	Group2	Dataset C	50
2	Group2	Dataset C	70
2	Group2	Dataset C	90

What I want to do is essential for each combination of Dataset and Group, return all data for rows that have the max(RunNumber) for the given Dataset/Group combination. So for example, the above sample would return this:

RunNumber	Group	Dataset	Total
2	Group1	Dataset A	30
2	Group2	Dataset A	15
1	Group1	Dataset B	5
1	Group2	Dataset B	10
2	Group3	Dataset A	30
1	Group1	Dataset C	15
2	Group2	Dataset C	70
2	Group2	Dataset C	90

Where the Dataset/Groups match, all rows are kept with the max RunNumber for that given combo. For now, I've split this into 2 separate queries, where i first query for the max(RunNumber) for all distinct Dataset/Group combos, then do a select * for all matches. Any help would be appreciated, thanks in advance!

In MySQL 5.x you can use a sub-query.

SELECT * 
FROM your_table
WHERE (`Group`, Dataset, RunNumber) IN (
    SELECT `Group`, Dataset, MAX(RunNumber) AS MaxRunNumber
    FROM your_table
    GROUP BY `Group`, Dataset
  );

Test on db<>fiddle here

Alternatives

--
-- LEFT JOIN on bigger
--
SELECT t.* 
FROM your_table t
LEFT JOIN your_table t2
  ON t2.`Group` = t.`Group`
 AND t2.Dataset = t.Dataset
 AND t2.RunNumber > t.RunNumber
WHERE t2.RunNumber IS NULL
ORDER BY t.`Group`, t.Dataset;

--
-- where NOT EXISTS on bigger
--
SELECT * 
FROM your_table t
WHERE NOT EXISTS (
  SELECT 1
  FROM your_table t2
  WHERE t2.`Group` = t.`Group`
    AND t2.Dataset = t.Dataset
    AND t2.RunNumber > t.RunNumber
)
ORDER BY `Group`, Dataset;

--
-- Emulating DENSE_RANK = 1 with variables
-- Works also in 5.x
--
SELECT RunNumber, `Group`, Dataset, Total
FROM
(
  SELECT 
    @rnk:=IF(@ds=Dataset AND @grp=`Group`, IF(@run=RunNumber, @rnk, @rnk+1), 1) AS Rnk
  , @grp := `Group` as `Group`
  , @ds := Dataset as Dataset
  , @run := RunNumber as RunNumber
  , Total
  FROM your_table t
  CROSS JOIN (SELECT @grp:=null, @ds:=null, @run:=null, @rnk := 0) var
  ORDER BY `Group`, Dataset, RunNumber DESC
) q
WHERE Rnk = 1
ORDER BY `Group`, Dataset;

--
-- DENSE_RANK = 1
-- MySql 8 and beyond. 
--
SELECT *
FROM
(
  SELECT *
  , DENSE_RANK() OVER (PARTITION BY `Group`, Dataset ORDER BY RunNumber DESC) AS rnk
  FROM your_table
) q
WHERE rnk = 1
ORDER BY `Group`, Dataset;

Select rows with Max(Column Value) for each unique combination of two other columns

Related

Recent Posts