MySQL - Selecting a Column not in Group By

I'm trying to add features to a preexisting application and I came across a MySQL view something like this:

SELECT
     AVG(table_name.col1),
     AVG(table_name.col2),
     AVG(table_name.col3),
     table_name.personID,
     table_name.col4
FROM table_name
GROUP BY table_name.personID;

OK, so there are a few aggregate functions. You can select personID because you're grouping by it. But it is also selecting a column that is not in an aggregate function and is not part of the GROUP BY clause. How is this possible? Does it just pick an arbitrary value, since the values definitely aren't unique per group?

Where I come from (MSSQL Server), that's an error. Can someone explain this behavior to me and why it's allowed in MySQL?


Solution 1:

It's true that this feature permits some ambiguous queries, and silently returns a result set with an arbitrary value picked from that column. In practice, it tends to be the value from the row within the group that is physically stored first.

These queries aren't ambiguous if you only choose columns that are functionally dependent on the column(s) in the GROUP BY criteria. In other words, if there can be only one distinct value of the "ambiguous" column per value that defines the group, there's no problem. The following query would be illegal in Microsoft SQL Server (and in ANSI SQL-92), even though it cannot logically result in ambiguity, provided persons.id is unique:

SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;
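
For comparison, here is a sketch of the same query in a form that stricter engines accept. It assumes persons.id is the primary key of persons, so persons.col4 has exactly one value per person and adding it to the GROUP BY does not change the groups:

```sql
-- Every non-aggregated column in the SELECT list also appears in GROUP BY.
-- Assumption: persons.id is unique, so (personID, col4) defines the same
-- groups as personID alone.
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID, persons.col4;
```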

Also, MySQL has an SQL mode that makes it behave per the standard: ONLY_FULL_GROUP_BY.
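
A minimal sketch of trying the mode out in a single session, using the table from the question. Note that assigning sql_mode this way replaces whatever other modes the session had, so treat it as an experiment rather than a recommended server setting:

```sql
-- Turn on strict GROUP BY checking for the current session only.
-- This overwrites any other modes that were set for the session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Confirm which modes are now in effect.
SELECT @@SESSION.sql_mode;

-- With the mode enabled, this should be rejected with an error, because
-- col4 is neither aggregated nor listed in the GROUP BY clause.
SELECT AVG(col1), personID, col4
FROM table_name
GROUP BY personID;
```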

FWIW, SQLite also permits these ambiguous GROUP BY clauses, but it chooses the value from the last row in the group, at least in the version I tested.

What it means to be arbitrary is that either MySQL or SQLite could change its implementation in the future and behave differently. You should therefore not rely on the behavior staying the way it is currently in ambiguous cases like this. It's better to rewrite your queries to be deterministic and not ambiguous. That's why MySQL 5.7 now enables ONLY_FULL_GROUP_BY by default.
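
As a sketch of what a deterministic rewrite of the query in the question could look like: wrap col4 in an aggregate that states which value you want. MIN() is only an example here; whether the smallest value is the right one depends on what col4 means in your data.

```sql
-- Explicitly choose one col4 per person instead of leaving it to the engine.
SELECT
     AVG(table_name.col1),
     AVG(table_name.col2),
     AVG(table_name.col3),
     table_name.personID,
     MIN(table_name.col4) AS col4
FROM table_name
GROUP BY table_name.personID;

-- If you truly don't care which value you get, MySQL 5.7 and later provide
-- ANY_VALUE(), which documents that intent and satisfies ONLY_FULL_GROUP_BY.
SELECT
     AVG(table_name.col1),
     table_name.personID,
     ANY_VALUE(table_name.col4) AS col4
FROM table_name
GROUP BY table_name.personID;
```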

Solution 2:

I should have Googled for just a bit longer... It seems I found my answer in the MySQL documentation:

MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you do not need to group on customer.name in the following query

In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.
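
As a sketch of the kind of query that passage describes (only customer.name comes from the quote; the orders table and the custid and payments columns are names assumed for illustration):

```sql
-- Hypothetical schema: orders(custid, payments), customer(custid, name),
-- where customer.custid is unique, so each custid has exactly one name.
SELECT orders.custid, customer.name, MAX(orders.payments)
FROM orders JOIN customer ON orders.custid = customer.custid
GROUP BY orders.custid;
```

Here customer.name is selected without being grouped or aggregated, which is harmless only because each custid maps to a single name.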

Still, that just seems... wrong.