Find overlapping date ranges in PostgreSQL
Solution 1:
The currently accepted answer does not answer the question. And it is wrong in principle. a BETWEEN x AND y
translates to:
a >= x AND a <= y
Including the upper bound, while people typically need to exclude it:
a >= x AND a < y
With dates you can easily adjust. For the year 2009 use '2009-12-31' as upper bound.
But it's not as simple with timestamps which allow fractional digits. Modern Postgres versions use an 8-byte integer internally to store up to 6 fractional seconds (µs resolution). Knowing this we could still make it work, but that's not intuitive and depends on an implementation detail. Bad idea.
Moreover, a BETWEEN x AND y
does not find overlapping ranges. We need:
b >= x AND a < y
And players that never left are not considered, yet.
Proper answer
Assuming the year 2009
, I'll rephrase the question without changing its meaning:
"Find all players of a given team who joined before 2010 and did not leave before 2009."
Basic query:
SELECT p.*
FROM team t
JOIN contract c USING (name_team)
JOIN player p USING (name_player)
WHERE t.name_team = ?
AND c.date_join < date '2010-01-01'
AND c.date_leave >= date '2009-01-01';
But there is more:
If referential integrity is enforced with FK constraints, the table team
itself is just noise in the query and can be removed.
While the same player can leave and rejoin the same team, we also need to fold possible duplicates, for instance with DISTINCT
.
And we may need to provide for a special case: players that never left. Assuming those players have NULL in date_leave
.
"A player that is not known to have left is assumed to be playing for the team to this day."
Refined query:
SELECT DISTINCT p.*
FROM contract c
JOIN player p USING (name_player)
WHERE c.name_team = ?
AND c.date_join < date '2010-01-01'
AND (c.date_leave >= date '2009-01-01' OR c.date_leave IS NULL);
Operator precedence works against us, AND
binds before OR
. We need parentheses.
Related answer with optimized DISTINCT
(if duplicates are common):
- Many to Many Table - Performance is bad
Typically, names of natural persons aren't unique and a surrogate primary key is used. But, obviously, name_player
is the primary key of player
. If all you need is player names we don't need the table player
in the query, either:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND date_join < date '2010-01-01'
AND (date_leave >= date '2009-01-01' OR date_leave IS NULL);
SQL OVERLAPS
operator
The manual:
OVERLAPS
automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open intervalstart <= time < end
, unlessstart
andend
are equal in which case it represents that single time instant.
To take care of potential NULL
values, COALESCE
seems easiest:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND (date_join, COALESCE(date_leave, CURRENT_DATE)) OVERLAPS
(date '2009-01-01', date '2010-01-01'); -- upper bound excluded
Range type with index support
In Postgres 9.2 or later you can also operate with actual range types:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND daterange(date_join, date_leave) &&
daterange '[2009-01-01,2010-01-01)'; -- upper bound excluded
Range types add some overhead and occupy more space. 2 x date
= 8 bytes; 1 x daterange
= 14 bytes on disk or 17 bytes in RAM. But in combination with the overlap operator &&
the query can be supported with a GiST index.
Also, no need to special-case NULL values. NULL means "open range" in a range type - exactly what we need. The table definition doesn't even have to change: we can create the range type on the fly - and support the query with a matching expression index:
CREATE INDEX mv_stock_dr_idx ON mv_stock USING gist (daterange(date_join, date_leave));
Related:
- Average stock history table
Solution 2:
Why not use between without the date part thing:
WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'
or something like that?