Do all columns in a SELECT list have to appear in a GROUP BY clause

Solution 1:

Imagine the following:

    A    B    C
   Cat   10   False
   Dog   25   True
   Dog   20   False
   Cat   5    False

If you select A, B and Group By Only A - what would your output be? You'd only have two rows (or tuples) because you have two values for A - but how does it display B?

If you group by A, B, you'd get four rows, no problems there. If you group by A and perform a function on B - like SUM(B) then you get two rows again:

    Cat    15
    Dog    45

But if you select A, B and only group by A - it doesn't know what to do. Truthfully, I believe there are some databases out there that will select a random value for B in that case and I believe there are some that will give you an error message.

Solution 2:

That's historically true. Omitting unaggregated columns leads to indeterminate behavior. SQL aims at fully determinate behavior.

But SQL standards have recently changed to let you omit from the GROUP BY clause columns that are functionally dependent on columns that are in the GROUP BY. PostgreSQL follows the more recent SQL standards. (It's not the only one.) Behavior is still fully determinate.

create table a (
  a_id integer primary key,
  xfr_date date not null
);

create table b (
  a_id integer not null references a (a_id),
  recd_date date not null,
  units_recd integer not null 
    check (units_recd >= 0),
  primary key (a_id, recd_date)
);

select a.a_id, a.xfr_date, sum(b.units_recd)
from a
inner join b on a.a_id = b.a_id
group by a.a_id; -- The column a.xfr_date is functionally dependent 
                 -- on a.a_id; it doesn't have to appear in the 
                 -- GROUP BY clause.

The notable deviation from from SQL standards is MySQL. It lets you omit just about everything from the GROUP BY. But that design choice makes its behavior indeterminate when you omit columns that are in the SELECT list.

Solution 3:

Actually, in MySQL you don't have to group by all columns. You can just group by whatever columns you want. The problem is, it will just pull a random value (from the set of available rows in the group) for the fields which aren't in the group by. If you know that you are grouping by something that is a unique key, there's no point in grouping by the rest of the fields, as they will already all have the same value anyway. It can actually speed it up to not have to group by every field when it is completely unnecessary.