How to limit contained associations per record/group?
I have a Model, Articles, which hasMany Abstracts. I want to load the 10 latest Articles, and for each Article, the Abstract with the highest number of points. My function looks like this:
public function getArticles($category, $viewName) {
    $subArticles = $this->Articles->findByCategory($category)->contain([
        'Abstracts' => function ($q) {
            return $q
                ->select(['body', 'points', 'article_id'])
                ->where(['Abstracts.approved' => true])
                ->limit(10)
                ->order(['Abstracts.points' => 'DESC']);
        }
    ])
    ->limit(10)
    ->order(['Articles.created' => 'DESC']);
    $this->set($viewName . 'Articles', $subArticles);
}
The result is not what I intended, though. Looking at the SQL, CakePHP first fetches the articles.id of everything in the category (fine). It then queries the Abstracts table using the 10 articles.id values it just found, and asks for the 10 Abstracts with the most points among all Abstracts that belong to those Articles.
The problem is that I want 1 Abstract for each Article, not the 10 Abstracts belonging to any Article in that category. How can I fix this? Thanks!
EDIT
ndm suggested that this was a duplicate of Using limit() on contained model so I attempted the solution there. Namely, I added this to my Model:
$this->hasOne('TopAbstract', [
    'className' => 'Abstracts',
    'foreignKey' => 'abstract_id',
    'strategy' => 'select',
    'sort' => ['TopAbstract.points' => 'DESC'],
    'conditions' => function ($e, $query) {
        $query->limit(1);
        return $e;
    }
]);
And then I try to find the Articles byCategory, with contain(['TopAbstract']), only this kills my SQL. It dies a horrible death:
Error: SQLSTATE[HY000]: General error: 1 near ")": syntax error
Debug doesn't even show the query that killed it, so I'm not sure how to debug this one?
EDIT
Talking to myself a bit, but the error is definitely in the 'conditions' part of the hasOne. I take that out, and it works fine. Can't find an example of how this is supposed to look on the interwebs.. anyone have any idea?
Solution 1:
What you are looking for is a solution to the greatest-n-per-group problem. You didn't mention any specific RDBMS, but nonetheless see also http://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html
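To make the problem concrete, here is a minimal plain-PHP sketch with no CakePHP involved. The sample rows and the helper names `sortByPoints()`, `topOverall()` and `topPerGroup()` are made up for illustration. It contrasts one limit for the whole result set, which is what the question describes happening with `limit()` inside `contain()`, with the per-group limit that is actually wanted:

```php
<?php
// Hypothetical sample rows standing in for the abstracts table.
$abstracts = [
    ['id' => 1, 'article_id' => 1, 'points' => 50],
    ['id' => 2, 'article_id' => 1, 'points' => 40],
    ['id' => 3, 'article_id' => 2, 'points' => 10],
    ['id' => 4, 'article_id' => 2, 'points' => 5],
];

// Sort by points descending, ties broken by id ascending.
function sortByPoints(array $rows): array
{
    usort($rows, function (array $a, array $b): int {
        return [$b['points'], $a['id']] <=> [$a['points'], $b['id']];
    });
    return $rows;
}

// One limit for the whole result set: the top N rows overall.
function topOverall(array $rows, int $limit): array
{
    return array_slice(sortByPoints($rows), 0, $limit);
}

// One limit per article: the top N rows of every group.
function topPerGroup(array $rows, int $limit): array
{
    $groups = [];
    foreach (sortByPoints($rows) as $row) {
        $key = $row['article_id'];
        if (count($groups[$key] ?? []) < $limit) {
            $groups[$key][] = $row;
        }
    }
    return $groups;
}

// Both rows come from article 1; article 2 gets nothing.
$overall = array_column(topOverall($abstracts, 2), 'id');

// Every article keeps its own best row.
$perGroup = array_map(function (array $rows): array {
    return array_column($rows, 'id');
}, topPerGroup($abstracts, 1));
```

With the global limit, a high-scoring article can use up all the slots. The per-group variant is what the options in this answer try to express in SQL.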
A library solution
For those who are a little bit adventurous, I've developed some custom associations that transparently integrate into the ORM layer and allow for a basic limit per group for hasMany and belongsToMany relations: https://github.com/icings/partitionable.
Using them, the solution for the question would be to set up an association like this:
$this
    ->partitionableHasMany('TopAbstracts')
    ->setClassName('Abstracts')
    ->setLimit(1)
    ->setSort([
        'Abstracts.points' => 'DESC',
        'Abstracts.id' => 'ASC',
    ]);
TopAbstracts could then be contained just like any other association.
Custom solutions on association level
So let's give this a try. Here are three options that can be applied on association level (defining the conditions could also be moved into custom finders), though you might not consider them very "straightforward".
Select strategy - Using a join on a grouping, max-value subquery
$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'strategy' => 'select',
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $query->innerJoin(
            [
                'AbstractsFilter' => $query
                    ->connection()
                    ->newQuery()
                    ->select(['article_id', 'points' => $query->func()->max('points')])
                    ->from('abstracts')
                    ->group('article_id')
            ],
            [
                'TopAbstracts.article_id = AbstractsFilter.article_id',
                'TopAbstracts.points = AbstractsFilter.points'
            ]
        );
        return [];
    }
]);
This will select the top abstracts via a join on a subquery that computes the maximum points per article. It will look something like
SELECT
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    abstracts TopAbstracts
INNER JOIN (
        SELECT
            article_id, (MAX(points)) AS `points`
        FROM
            abstracts
        GROUP BY
            article_id
    )
    AbstractsFilter ON (
        TopAbstracts.article_id = AbstractsFilter.article_id
        AND
        TopAbstracts.points = AbstractsFilter.points
    )
WHERE
    TopAbstracts.article_id in (1,2,3,4,5,6,7,8, ...)
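As a rough, framework-free check of what such a query returns, the following sketch runs the same join through PDO against an in-memory SQLite database. It assumes the `pdo_sqlite` extension is available, and the table contents are invented sample data. The join matches on the points value, so two abstracts that tie for the maximum both come back, even though a hasOne association can only hold one of them per article.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Invented sample data: article 2 has two abstracts tied at 10 points.
$pdo->exec('CREATE TABLE abstracts (id INTEGER PRIMARY KEY, article_id INTEGER, points INTEGER)');
$pdo->exec('INSERT INTO abstracts (id, article_id, points) VALUES
    (1, 1, 50), (2, 1, 40), (3, 2, 10), (4, 2, 10)');

$sql = 'SELECT TopAbstracts.id, TopAbstracts.article_id, TopAbstracts.points
    FROM abstracts TopAbstracts
    INNER JOIN (
        SELECT article_id, MAX(points) AS points
        FROM abstracts
        GROUP BY article_id
    ) AbstractsFilter ON (
        TopAbstracts.article_id = AbstractsFilter.article_id
        AND TopAbstracts.points = AbstractsFilter.points
    )
    WHERE TopAbstracts.article_id IN (1, 2)
    ORDER BY TopAbstracts.id';

$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

// Cast to int because older PDO versions return SQLite integers as strings.
$ids = array_map('intval', array_column($rows, 'id'));
// $ids holds 1 for article 1, and both 3 and 4 for article 2 (the tie).
```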
Select strategy - Using left self-join filtering
$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'strategy' => 'select',
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $query->leftJoin(
            ['AbstractsFilter' => 'abstracts'],
            [
                'TopAbstracts.article_id = AbstractsFilter.article_id',
                'TopAbstracts.points < AbstractsFilter.points'
            ]
        );
        return $exp->add(['AbstractsFilter.id IS NULL']);
    }
]);
This will use a self-join that keeps only the rows for which no other row of the same article satisfies a.points < b.points, that is, the rows with the most points. It will look something like
SELECT
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    abstracts TopAbstracts
LEFT JOIN
    abstracts AbstractsFilter ON (
        TopAbstracts.article_id = AbstractsFilter.article_id
        AND
        TopAbstracts.points < AbstractsFilter.points
    )
WHERE
    (AbstractsFilter.id IS NULL AND TopAbstracts.article_id in (1,2,3,4,5,6,7,8, ...))
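The filtering logic can be hard to read in SQL form. As an illustration only, using plain PHP arrays, invented sample data and a hypothetical helper `withoutBetterSibling()`, the anti-join keeps a row exactly when no other row of the same article has strictly more points:

```php
<?php
// Invented sample rows standing in for the abstracts table.
$abstracts = [
    ['id' => 1, 'article_id' => 1, 'points' => 50],
    ['id' => 2, 'article_id' => 1, 'points' => 40],
    ['id' => 3, 'article_id' => 2, 'points' => 10],
];

// Emulates: LEFT JOIN ... ON same article AND a.points < b.points
//           WHERE b.id IS NULL
function withoutBetterSibling(array $rows): array
{
    return array_values(array_filter($rows, function (array $a) use ($rows): bool {
        foreach ($rows as $b) {
            if ($a['article_id'] === $b['article_id'] && $a['points'] < $b['points']) {
                return false; // a join partner exists, so the row is filtered out
            }
        }
        return true; // no partner: the joined side would be NULL
    }));
}

$topIds = array_column(withoutBetterSibling($abstracts), 'id');
```

Because the comparison is strict, rows that tie for the highest points all survive the filter. Adding a tie-breaker such as the id to the join condition would be one way to narrow that down.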
Join strategy - Using a subquery for the join condition
$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'foreignKey' => false,
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $subquery = $query
            ->connection()
            ->newQuery()
            ->select(['SubTopAbstracts.id'])
            ->from(['SubTopAbstracts' => 'abstracts'])
            ->where(['Articles.id = SubTopAbstracts.article_id'])
            ->order(['SubTopAbstracts.points' => 'DESC'])
            ->limit(1);
        return $exp->add(['TopAbstracts.id' => $subquery]);
    }
]);
This will use a correlated subquery with a rather specific select that uses simple ordering and limiting to pick the top abstract. Note that the foreignKey option is set to false in order to avoid an additional Articles.id = TopAbstracts.article_id condition being compiled into the join conditions.
The query will look something like
SELECT
    Articles.id AS `Articles__id`, ... ,
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    articles Articles
LEFT JOIN
    abstracts TopAbstracts ON (
        TopAbstracts.id = (
            SELECT
                SubTopAbstracts.id
            FROM
                abstracts SubTopAbstracts
            WHERE
                Articles.id = SubTopAbstracts.article_id
            ORDER BY
                SubTopAbstracts.points DESC
            LIMIT
                1
        )
    )
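A reduced version of this kind of query can be tried outside the ORM. The sketch below assumes the `pdo_sqlite` extension and uses invented sample rows. It also adds `SubTopAbstracts.id ASC` as a secondary sort, which is not part of the association above, so that ties resolve deterministically:

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Invented sample data: article 2 has a tie, article 3 has no abstracts.
$pdo->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY)');
$pdo->exec('CREATE TABLE abstracts (id INTEGER PRIMARY KEY, article_id INTEGER, points INTEGER)');
$pdo->exec('INSERT INTO articles (id) VALUES (1), (2), (3)');
$pdo->exec('INSERT INTO abstracts (id, article_id, points) VALUES
    (1, 1, 50), (2, 1, 40), (3, 2, 10), (4, 2, 10)');

$sql = 'SELECT Articles.id AS article_id, TopAbstracts.id AS abstract_id
    FROM articles Articles
    LEFT JOIN abstracts TopAbstracts ON (
        TopAbstracts.id = (
            SELECT SubTopAbstracts.id
            FROM abstracts SubTopAbstracts
            WHERE Articles.id = SubTopAbstracts.article_id
            ORDER BY SubTopAbstracts.points DESC, SubTopAbstracts.id ASC
            LIMIT 1
        )
    )
    ORDER BY Articles.id';

$pairs = [];
foreach ($pdo->query($sql) as $row) {
    // Articles without any abstract yield NULL on the joined side.
    $pairs[(int)$row['article_id']] = $row['abstract_id'] === null
        ? null
        : (int)$row['abstract_id'];
}
```

Unlike the two join-based filters, the `LIMIT 1` guarantees at most one abstract per article, and the left join keeps articles that have no abstracts at all.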
All three of these options will query and inject the records without any hackery; they are just not very "straightforward".
A manual approach
For the sake of completeness, it is of course always possible to manually load the associated records and format the results appropriately, for example using result formatters; see for example CakePHP Entity contain without foreign key.
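As a rough idea of what such a formatting step does, independent of the ORM: the helper `attachTopAbstract()` and the `top_abstract` key below are hypothetical names, and plain arrays stand in for entities. In CakePHP, the same logic would presumably live in a result formatter callback.

```php
<?php
// Attach the highest-scoring abstract to each article; null if there is none.
function attachTopAbstract(array $articles, array $abstracts): array
{
    // Keep the best abstract seen so far for every article_id.
    $best = [];
    foreach ($abstracts as $abstract) {
        $key = $abstract['article_id'];
        if (!isset($best[$key]) || $abstract['points'] > $best[$key]['points']) {
            $best[$key] = $abstract;
        }
    }

    return array_map(function (array $article) use ($best): array {
        $article['top_abstract'] = $best[$article['id']] ?? null;
        return $article;
    }, $articles);
}

// Invented sample data: article 2 has no abstracts.
$articles = [['id' => 1], ['id' => 2]];
$abstracts = [
    ['id' => 1, 'article_id' => 1, 'points' => 40],
    ['id' => 2, 'article_id' => 1, 'points' => 50],
];

$result = attachTopAbstract($articles, $abstracts);
```

This still requires loading the candidate abstracts first, so it trades query complexity for extra rows transferred.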
Select strategy and reversed ordering
Just for reference, one of the weird solutions I stumbled over initially. This one really shouldn't be used!
This will select all associated abstracts, and then the ORM will iterate over them and for each article pick the first one with a matching article_id value. So in theory, when ordered descending on points, the ORM should pick the one with the most points.
While I would have expected this to work out of the box, it seems that the ORM iterates over the results in reversed order, which will cause the wrong rows to be picked. In order to get this working, the query needs to use the opposite of the order that would normally be used, i.e. ASC instead of DESC.
$this->hasOne('TopAbstracts', [
    'className' => 'Abstracts',
    'foreignKey' => 'article_id',
    'strategy' => 'select',
    'conditions' => function (\Cake\Database\Expression\QueryExpression $exp, \Cake\ORM\Query $query) {
        $query->order(['TopAbstracts.points' => 'ASC']);
        return [];
    }
]);
Also, the function needs to return an empty array instead of the expression as shown in the linked answer, since returning the expression will cause invalid SQL to be compiled. Both of these behaviours, the reversed-order iteration and the invalid SQL, might be bugs.
While this will work, it will always select all associated abstracts, not only the top ones, which might be considered rather inefficient. The queries will look something like
SELECT
    Articles.id AS `Articles__id`, ...
FROM
    articles Articles

SELECT
    TopAbstracts.id AS `TopAbstracts__id`, ...
FROM
    abstracts TopAbstracts
WHERE
    TopAbstracts.article_id in (1,2,3,4,5,6,7,8, ...)
ORDER BY
    TopAbstracts.points ASC
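One plausible explanation for the apparently reversed iteration, stated here as an assumption rather than something verified against the CakePHP source: if the loader indexes the fetched rows by foreign key and lets later rows overwrite earlier ones, the last row per article wins. The helper `buildResultMap()` below is a hypothetical stand-in for that step, using invented sample rows:

```php
<?php
// Index rows by article_id; a later row replaces an earlier one with the same key.
function buildResultMap(array $rows): array
{
    $map = [];
    foreach ($rows as $row) {
        $map[$row['article_id']] = $row;
    }
    return $map;
}

// The same two abstracts of article 1, in both sort orders.
$desc = [
    ['id' => 1, 'article_id' => 1, 'points' => 50],
    ['id' => 2, 'article_id' => 1, 'points' => 40],
];
$asc = array_reverse($desc);

$pickedDesc = buildResultMap($desc)[1]['points']; // 40: the last row wins
$pickedAsc = buildResultMap($asc)[1]['points'];   // 50: the last row wins
```

Under that assumption, ASC ordering leaves the row with the most points in the map, which matches the behaviour described above.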