Using LIMIT within GROUP BY to get N results per group?
You could use GROUP_CONCAT aggregated function to get all years into a single column, grouped by id
and ordered by rate
:
SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM yourtable
GROUP BY id
Result:
-----------------------------------------------------------
| ID | GROUPED_YEAR |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007 |
-----------------------------------------------------------
And then you could use FIND_IN_SET, that returns the position of the first argument inside the second one, eg.
SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1
SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
Using a combination of GROUP_CONCAT
and FIND_IN_SET
, and filtering by the position returned by find_in_set, you could then use this query that returns only the first 5 years for every id:
SELECT
yourtable.*
FROM
yourtable INNER JOIN (
SELECT
id,
GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM
yourtable
GROUP BY id) group_max
ON yourtable.id = group_max.id
AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
yourtable.id, yourtable.year DESC;
Please see fiddle here.
Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.
The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you need to select a few records for every group.
The original query used user variables and ORDER BY
on derived tables; the behavior of both quirks is not guaranteed. Revised answer as follows.
In MySQL 5.x you can use poor man's rank over partition to achieve desired result. Just outer join the table with itself and for each row, count the number of rows lesser than it. In the above case, lesser row is the one with higher rate:
SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year
Demo and Result:
| id | rate | year | rank |
|-----|------|------|------|
| p01 | 8.0 | 2006 | 0 |
| p01 | 7.4 | 2003 | 1 |
| p01 | 6.8 | 2008 | 2 |
| p01 | 5.9 | 2001 | 3 |
| p01 | 5.3 | 2007 | 4 |
| p02 | 12.5 | 2001 | 0 |
| p02 | 12.4 | 2004 | 1 |
| p02 | 12.2 | 2002 | 2 |
| p02 | 10.3 | 2003 | 3 |
| p02 | 8.7 | 2000 | 4 |
Note that if the rates had ties, for example:
100, 90, 90, 80, 80, 80, 70, 60, 50, 40, ...
The above query will return 6 rows:
100, 90, 90, 80, 80, 80
Change to HAVING COUNT(DISTINCT l.rate) < 5
to get 8 rows:
100, 90, 90, 80, 80, 80, 70, 60
Or change to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key))
to get 5 rows:
100, 90, 90, 80, 80
In MySQL 8 or later just use the RANK
, DENSE_RANK
or ROW_NUMBER
functions:
SELECT *
FROM (
SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
FROM t
) AS x
WHERE rnk <= 5
For me something like
SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N)
works perfectly. No complicated query.
for example: get top 1 for each group
SELECT
*
FROM
yourtable
WHERE
id IN (SELECT
SUBSTRING_INDEX(GROUP_CONCAT(id
ORDER BY rate DESC),
',',
1) id
FROM
yourtable
GROUP BY year)
ORDER BY rate DESC;
No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).
This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle that which can be more efficient for some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.
As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate
value, so you may still need a quantity of post-processing to check for that.
This requires a series of subqueries to rank the values, limit them, then perform the sum while grouping
@Rnk:=0;
@N:=2;
select
c.id,
sum(c.val)
from (
select
b.id,
b.bal
from (
select
if(@last_id=id,@Rnk+1,1) as Rnk,
a.id,
a.val,
@last_id=id,
from (
select
id,
val
from list
order by id,val desc) as a) as b
where b.rnk < @N) as c
group by c.id;