Is MySQL breaking the standard by allowing the selection of columns that are not part of the GROUP BY clause?
Solution 1:
"Standard SQL would reject your query because you cannot SELECT non-aggregate fields that are not part of the GROUP BY clause in an aggregate query."
This was correct up to 1992.
But it is plainly wrong from 2003 onwards.
From the SQL-2003 standard, 6IWD6-02-Foundation-2011-01.pdf, from http://www.wiscorp.com/, section 7.12 (query specification), page 398:
- If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list>, each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.
Now, MySQL has implemented this feature by allowing not only columns that are functionally dependent on the grouping columns, but all columns. This causes problems for users who do not understand how grouping works and who get indeterminate results where they don't expect them.
But you are right to say that MySQL has added a feature that conflicts with the SQL standards (although you seem to think so for the wrong reason). Even that is not entirely accurate: they added an SQL-standard feature, but not in the best way (more like the easy way), so it does conflict with the latest standards.
To answer your question, the reason for this MySQL feature (extension) is, I suppose, to be in accordance with the latest SQL standards (2003+). Why they chose to implement it this way (not fully compliant), we can only speculate.
As @Quassnoi and @Johan answered with examples, it's mainly a performance and maintainability issue. But one can't easily change the RDBMS to be clever enough (Skynet excluded) to recognize functionally dependent columns, so MySQL developers made a choice:
We (MySQL) give you (MySQL users) this feature, which is in the SQL-2003 standard. It improves speed in certain GROUP BY queries, but there's a catch: you (not the SQL engine) have to make sure that the columns in the SELECT and HAVING lists are functionally dependent on the GROUP BY columns. If they are not, you may get indeterminate results.
If you want to disable it, you can set sql_mode to ONLY_FULL_GROUP_BY.
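A minimal sketch of the indeterminate case, assuming a hypothetical `sales` table that is not part of the question: `product` is not functionally dependent on `customer_id`, so with ONLY_FULL_GROUP_BY off MySQL may return either 'pen' or 'book' for customer 10. The `SET` statement enables the check for the current session only; assigning sql_mode like this replaces whatever other modes the session already had.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE sales (
  id          INT PRIMARY KEY,
  customer_id INT NOT NULL,
  product     VARCHAR(20) NOT NULL
);

INSERT INTO sales VALUES
  (1, 10, 'pen'),
  (2, 10, 'book'),
  (3, 20, 'lamp');

-- product is NOT functionally dependent on customer_id.
-- With ONLY_FULL_GROUP_BY off, MySQL accepts this and returns an
-- arbitrary product ('pen' or 'book') for customer_id = 10.
SELECT customer_id, product, COUNT(*) AS order_count
FROM sales
GROUP BY customer_id;

-- Enable the check for this session only. This overwrites the other
-- sql_mode flags of the session; append to @@SESSION.sql_mode instead
-- if you need to keep them.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- The same query is now rejected (error 1055) instead of guessing.
SELECT customer_id, product, COUNT(*) AS order_count
FROM sales
GROUP BY customer_id;
```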
It's all in the MySQL docs: Extensions to GROUP BY (5.5), although not in the wording above but as in your quote (they even forgot to mention that it's a deviation from standard SQL-2003, not from standard SQL-92). Choices of this kind are common, I think, in all software, other RDBMSs included. They are made for performance, backward compatibility and many other reasons. Oracle, for example, has the famous '' is the same as NULL, and SQL Server probably has some, too.
There is also this blog post by Roland Bouman, in which the MySQL developers' choice is defended: Debunking GROUP BY myths.
In 2011, as @Mark Byers informed us in a comment (in a related question at DBA.SE), PostgreSQL 9.1 added a new feature (release date: September 2011) designed for this purpose. It is more restrictive than MySQL's implementation and closer to the standard.
Later, in 2015, MySQL announced that in version 5.7 the behaviour was improved to conform to the standard and actually recognize functional dependencies (even better than the Postgres implementation). See the documentation, MySQL Handling of GROUP BY (5.7), and another blog post by Roland Bouman: MySQL 5.7.5: GROUP BY respects functional dependencies!
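As a sketch of what the 5.7.5+ behaviour means in practice, assuming hypothetical `customers` and `orders` tables (not from the question) and ONLY_FULL_GROUP_BY enabled, which is the default from 5.7.5: a column that depends on the grouped primary key is accepted, a column that does not is rejected, and ANY_VALUE() (available since MySQL 5.7) lets you ask for an arbitrary value explicitly.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE customers (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
);
CREATE TABLE orders (
  id          INT PRIMARY KEY,
  customer_id INT NOT NULL,
  product     VARCHAR(20) NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers (id)
);

-- Accepted: c.name is functionally dependent on the primary key c.id.
SELECT c.id, c.name, COUNT(*) AS order_count
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id;

-- Rejected (error 1055): o.product does not depend on c.id.
SELECT c.id, o.product, COUNT(*) AS order_count
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id;

-- Accepted: ANY_VALUE() states that any product from the group is fine.
SELECT c.id, ANY_VALUE(o.product) AS some_product, COUNT(*) AS order_count
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id;
```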
Solution 2:
Is MySQL breaking the standard by allowing this? How?
It lets you write a query like this:
SELECT a.*, COUNT(*)
FROM a
JOIN b
ON b.a = a.id
GROUP BY
a.id
Other systems would require you to add all columns from a into the GROUP BY list, which makes the query larger, less maintainable and less efficient.
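On systems that do not recognise functional dependencies, one common way to avoid listing every column of a is to aggregate b first in a derived table and then join. This is a standard technique swapped in here, not part of the original answer; a and b are the tables from the query above, while the `a_id` and `cnt` aliases are made up for illustration.

```sql
-- Count the rows of b per a.id first, then attach the counts to a.
-- The outer query has no GROUP BY, so a.* is valid in any SQL dialect.
-- The inner join keeps the original semantics: rows of a without any
-- matching b are not returned.
SELECT a.*, c.cnt
FROM a
JOIN (
  SELECT b.a AS a_id, COUNT(*) AS cnt
  FROM b
  GROUP BY b.a
) AS c
  ON c.a_id = a.id;
```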
In this form (with grouping by the PK), this does not contradict the standard, since every column in a is functionally dependent on its primary key.
However, MySQL does not really check the functional dependency and lets you select columns that are not functionally dependent on the grouping set. This can yield indeterminate results and should not be relied upon. The only thing guaranteed is that each column value belongs to some record sharing the grouping expression (and the values do not even necessarily come from the same record!).
This behavior can be disabled by setting sql_mode to ONLY_FULL_GROUP_BY.