Is there a standard for inclusive/exclusive ends of time intervals?
I'm wondering if there is a standard or "normal" means of interpreting time interval data end points with respect to inclusiveness/exclusiveness of the value defining the end point. Note however that I am asking what the standard (or most common) convention is (if there is one), not for a dissertation on your personal preference. If you really want to provide a dissertation, please attach it to a reference to someone's published standard or a standard text on the matter. Open standards (that I don't have to pay to read) are greatly preferred unless they are fundamentally flawed :).
Of course there are 4 possibilities for a time interval from A to B:
- (A, B) - Both ends are exclusive.
- [A, B] - Both ends are inclusive.
- [A, B) - Start is inclusive and end is exclusive
- (A, B] - Start is exclusive and end is inclusive
Each of these has different characteristics (as I see it, feel free to point out more)
The [A, B] convention has the seemingly inconvenient property that B is contained within both the interval [A, B] and the interval [B, C]. This is particularly inconvenient if B is meant to represent the midnight boundary and you are trying to determine which day it falls on, for example. It also means the duration of the interval is slightly irritating to calculate, since [A, B] where A = B should have a length of 1, and therefore the duration of [A, B] is (B - A) + 1.
Similarly, the (A, B) convention has the difficulty that B falls within neither (A, B) nor (B, C). Continuing the analogy with day boundaries, midnight would be part of neither day. This is also logically inconvenient because (A, B) where A = B is a nonsense interval with duration less than zero, but reversing A and B does not make it a valid interval.
So I think I want either [A, B), or (A, B] and I can't figure out how to decide between them.
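To make the comparison concrete, here is a minimal sketch of the four conventions over integer ticks (for example milliseconds). The `Interval` class and the `owners_of_boundary` helper are hypothetical names invented for this illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An interval over integer ticks; the two flags select the convention."""
    start: int
    end: int
    start_inclusive: bool
    end_inclusive: bool

    def contains(self, t: int) -> bool:
        after_start = t >= self.start if self.start_inclusive else t > self.start
        before_end = t <= self.end if self.end_inclusive else t < self.end
        return after_start and before_end

    def length(self) -> int:
        # Number of ticks covered: (end - start) - 1 for a fully open interval,
        # plus one tick for each inclusive end. Negative means nonsense bounds.
        return (self.end - self.start) - 1 + self.start_inclusive + self.end_inclusive


def owners_of_boundary(a, b, c, start_inclusive, end_inclusive):
    """How many of the adjacent intervals A..B and B..C contain the point B."""
    first = Interval(a, b, start_inclusive, end_inclusive)
    second = Interval(b, c, start_inclusive, end_inclusive)
    return int(first.contains(b)) + int(second.contains(b))
```

Under these assumptions, B belongs to two intervals with [A, B], to none with (A, B), and to exactly one with either half-open form. Only the half-open forms give a length of exactly B - A.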
So if someone has a link to a standards document, a reference to a standard text, or similar that clarifies the convention, that would be great. Alternatively, if you can link a variety of standards documents and/or references that more or less completely fail to agree, then I can just pick one that seems to have sufficient authority to CMA and be done with it :).
Finally, I will be working in Java, so I am particularly susceptible to answers that work well in Java.
In the general case, [A, B) (inclusive start, exclusive end) has a lot going for it and I don't see any reason why the same wouldn't be true for time intervals.
Dijkstra wrote a nice article about it, "Why numbering should start at zero", which, despite the name, deals mostly with exactly this.
Short summary of the advantages:
- end - start equals the number of items in the list
- the upper bound of the preceding interval is the lower bound of the next
- it allows indexing an interval starting from 0 with unsigned numbers [1]
Personally, I find the second point extremely useful for lots of problems; consider a pretty standard recursive function (in Python):
def foo(start, end):
    # Works on the half-open range [start, end).
    if end - start <= 1:
        return  # base case: at most one element left
    middle = start + (end - start) // 2
    foo(start, middle)  # left half: [start, middle)
    foo(middle, end)    # right half: [middle, end)
Writing the same with an inclusive upper bound invites off-by-one errors.
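As a rough illustration of that claim, here is the same divide-and-conquer shape written twice, once per convention. Both functions are hypothetical examples that sum a slice of a list. The closed version needs an extra + 1 at the split and a slightly odd way to spell an empty range.

```python
def total_half_open(xs, start, end):
    """Sum xs over the half-open index range [start, end)."""
    if end - start == 0:
        return 0
    if end - start == 1:
        return xs[start]
    middle = start + (end - start) // 2
    # The two halves share `middle` as a bound, with no adjustment needed.
    return total_half_open(xs, start, middle) + total_half_open(xs, middle, end)


def total_closed(xs, first, last):
    """Sum xs over the closed index range [first, last]."""
    if last < first:  # an empty range has to be written as last == first - 1
        return 0
    if last == first:
        return xs[first]
    middle = first + (last - first) // 2
    # The right half must start at middle + 1, or xs[middle] is counted twice.
    return total_closed(xs, first, middle) + total_closed(xs, middle + 1, last)
```

A caller covers a whole list with `total_half_open(xs, 0, len(xs))` but has to remember `total_closed(xs, 0, len(xs) - 1)` for the closed variant.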
[1] That's the advantage compared to (A, B]: an interval starting from 0 is MUCH more common than an interval ending in MAX_VAL. Note that this also relates to one additional problem: using two inclusive bounds means we can denote a sequence whose length cannot be expressed in an integer of the same size.
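The footnote can be checked with a little arithmetic. The sketch below assumes an 8-bit unsigned integer purely for illustration; `BITS`, `fits_unsigned`, `bounds_for` and `closed_length` are made-up names.

```python
BITS = 8
MAX_VAL = 2**BITS - 1  # largest value an unsigned 8-bit integer can hold (255)


def fits_unsigned(value: int) -> bool:
    """True if value is representable as an unsigned BITS-bit integer."""
    return 0 <= value <= MAX_VAL


def bounds_for(first: int, last: int, convention: str):
    """Bounds (A, B) needed to cover the indices first..last under a convention."""
    if convention == "[A, B)":
        return first, last + 1
    if convention == "(A, B]":
        return first - 1, last
    if convention == "[A, B]":
        return first, last
    raise ValueError(convention)


def closed_length(a: int, b: int) -> int:
    """Number of elements in the closed interval [A, B]."""
    return b - a + 1
```

Covering indices 0..9 with (A, B] needs A = -1, which is not an unsigned value, while [A, B) only runs into trouble when the range ends at MAX_VAL. With two inclusive bounds, [0, MAX_VAL] is expressible but its length, 256, is not.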
java.time & Half-Open
The java.time classes that supplant the troublesome legacy date-time classes as well as the Joda-Time project define a span-of-time using the Half-Open approach [) where the beginning is inclusive while the ending is exclusive.
For date-time values with a fractional second, this eliminates the problem of trying to capture the last moment. The last second is infinitely divisible in principle, and different systems resolve it to different granularities such as milliseconds, microseconds, nanoseconds, or something else. With Half-Open, a day, for example, starts at the first moment of the day and runs up to, but does not include, the first moment of the following day. Problem solved, with no need to wrestle with the last moment of the day and its fractional second.
I have come to see the benefits of using this approach consistently throughout all my date-time handling code. A week for example starting on a Monday runs up to, but does not include, the following Monday. A month starts on the 1st and runs up to, but does not include, the first of the following month thereby ignoring the challenge of determining the number of the last day of the month including Feb 28/29 Leap Year.
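The week and month cases above can be sketched in a few lines. This uses Python's standard `datetime` module as a stand-in, not the java.time API; `week_span`, `month_span` and `in_span` are hypothetical helper names.

```python
from datetime import date, timedelta


def week_span(d: date):
    """Half-open [Monday, following Monday) span of the week containing d."""
    monday = d - timedelta(days=d.weekday())  # weekday(): Monday == 0
    return monday, monday + timedelta(days=7)


def month_span(year: int, month: int):
    """Half-open [first of month, first of next month) span."""
    start = date(year, month, 1)
    # No need to know whether the month has 28, 29, 30 or 31 days.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def in_span(d: date, span) -> bool:
    """Membership test: start is inclusive, end is exclusive."""
    start, end = span
    return start <= d < end
```

For February 2012, `month_span(2012, 2)` yields 2012-02-01 up to, but not including, 2012-03-01, so the leap day is covered without ever being mentioned.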
Another benefit of consistent use of Half-Open [) is easing the cognitive load every time I have to detect, decipher, and verify a piece of code’s span-of-time approach. In my own programming, I simply glance for a mention of Half-Open in a comment at the top and I instantly know how to read that code.
A result of consistent use of Half-Open is reducing the chance of bugs in my code as my thinking and writing style are uniform with no chance of getting confused over inclusive-exclusive.
By the way, note that Half-Open [) means avoiding the SQL BETWEEN conjunction as that is always fully-closed [].
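A small demonstration of the difference, using SQLite through Python's built-in `sqlite3` module as a convenient stand-in for whatever database you actually use. The `event` table, its `ts` column, and the sample rows are invented for this sketch, and timestamps are stored as ISO 8601 text so that they compare in chronological order.

```python
import sqlite3

# Three hypothetical rows: start of day, last second of day, start of next day.
ROWS = ["2012-03-17T00:00:00", "2012-03-17T23:59:59", "2012-03-18T00:00:00"]
DAY = ("2012-03-17T00:00:00", "2012-03-18T00:00:00")


def count_events(where_clause: str, params) -> int:
    """Count rows of the sample table that match the given WHERE clause."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE event (ts TEXT)")
        conn.executemany("INSERT INTO event (ts) VALUES (?)", [(t,) for t in ROWS])
        query = f"SELECT COUNT(*) FROM event WHERE {where_clause}"
        (n,) = conn.execute(query, params).fetchone()
        return n
    finally:
        conn.close()


# Fully closed: also picks up midnight of the following day.
closed_count = count_events("ts BETWEEN ? AND ?", DAY)
# Half-Open: start inclusive, end exclusive.
half_open_count = count_events("ts >= ? AND ts < ?", DAY)
```

Here `closed_count` is 3 and `half_open_count` is 2: BETWEEN counts the first moment of 2012-03-18 as part of 2012-03-17.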
As for the business thinking of the customers I serve, where appropriate I try to convince them to use Half-Open consistently as well. I've seen many situations where various business people were making incorrect assumptions about the periods of time covered in reports. Consistent use of Half-Open avoids these unfortunate ambiguities. But if the customer insists, I note this in my code and adjust inputs/outputs so as to use Half-Open within my own logic. For example, my logic uses a week of Monday-Monday, but on a report I subtract a day to show Sunday.
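The "subtract a day for the report" adjustment might look like the following sketch. It assumes date-only (whole-day) spans, and `report_label` is a hypothetical helper, shown in Python rather than Java.

```python
from datetime import date, timedelta


def report_label(start: date, end_exclusive: date) -> str:
    """Render a half-open [start, end) date span the way a business reader expects."""
    # Internally the span stops before end_exclusive; on paper we show the last
    # day that is actually covered and say explicitly that it is inclusive.
    last_day = end_exclusive - timedelta(days=1)
    return f"{start.isoformat()} to {last_day.isoformat()} (inclusive)"
```

A Monday-to-Monday week of 2012-03-12 up to 2012-03-19 is then printed as "2012-03-12 to 2012-03-18 (inclusive)", ending on the Sunday.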
For even more classes representing spans of time with the Half-Open approach [), see the ThreeTen-Extra project for its Interval class (a pair of Instant objects) and the LocalDateRange class (a pair of LocalDate objects).
Tip: When printing/displaying reports for business, include a footer that describes the query logic, including whether the beginning and ending are inclusive or exclusive. I have seen way too much confusion on this in the workplace, with readers making incorrect assumptions about the date ranges (and other criteria).
About java.time
The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.
To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.
The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.
You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes. Hibernate 5 & JPA 2.2 support java.time.
Where to obtain the java.time classes?
- Java SE 8, Java SE 9, Java SE 10, Java SE 11, and later - Part of the standard Java API with a bundled implementation.
  - Java 9 brought some minor features and fixes.
- Java SE 6 and Java SE 7
  - Most of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
- Android
  - Later versions of Android (26+) bundle implementations of the java.time classes.
  - For earlier Android (<26), the process of API desugaring brings a subset of the java.time functionality not originally built into Android.
  - If the desugaring does not offer what you need, the ThreeTenABP project adapts ThreeTen-Backport (mentioned above) to Android. See How to use ThreeTenABP….
I'll provide what I wrote for our team as an answer using Voo's link until such time as Voo adds an answer, then I'll give him credit instead. Here's what I decided for our case:
Time intervals in our applications will be represented as a pair of instantaneous times with the convention that the start time is inclusive and the end time is exclusive. This convention is mathematically convenient in that the difference of the bounds is equal to the length of the interval, and is also numerically consistent with the way arrays and lists are subscripted in Java programs (see http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF). The practical upshot of this is that the interval 2012-03-17T00:00:00.000Z – 2012-03-18T00:00:00.000Z denotes the entirety of St. Patrick’s Day: every timestamp beginning with 2012-03-17 will be identified as included in St. Patrick’s Day, but 2012-03-18T00:00:00.000Z will not be included, and St. Patrick’s Day will include exactly 24*60*60*1000 milliseconds.
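The St. Patrick’s Day example can be checked with a short sketch. This is Python's `datetime` standing in for whatever instant type the application uses; `START`, `END`, `on_st_patricks_day` and `length_ms` are names invented here.

```python
from datetime import datetime, timedelta, timezone

START = datetime(2012, 3, 17, tzinfo=timezone.utc)  # inclusive
END = datetime(2012, 3, 18, tzinfo=timezone.utc)    # exclusive


def on_st_patricks_day(moment: datetime) -> bool:
    """Half-open membership test: START <= moment < END."""
    return START <= moment < END


def length_ms() -> int:
    """Length of the interval in whole milliseconds: simply END - START."""
    return (END - START) // timedelta(milliseconds=1)
```

The first instant of the 17th is included, the last representable instant before midnight is included, and midnight of the 18th is not. `length_ms()` comes out at 24*60*60*1000 with no + 1 correction.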
I can't say for certain, but I doubt a standard or convention exists. Whether or not you include the start or end instant would depend on your use case, so consider whether they are important to you. If the decision is arbitrary, pick one, note that the choice is arbitrary and move on.
As for what is supported in Java, the Joda-Time library implements Intervals that include the start time but not the end time.
Despite this thread focusing more on Java, I thought it'd be quite interesting to see other adopted conventions, especially given that the pandas Python library is ubiquitous for data analysis these days, and the fact that this StackOverflow page is one of the top search results when looking for conventions on the inclusivity/exclusivity of time ranges.
Quoting this page:
The start and end dates are strictly inclusive. So it will not generate any dates outside of those dates if specified.
Also, this is not only about generating date ranges. The same convention is adopted when indexing into time-series data. Here's a simple test on a data frame with a DatetimeIndex:
>>> import pandas as pd
>>> pd.__version__
'0.20.2'
>>> df = pd.DataFrame(list(range(20)))
>>> df.index = pd.date_range(start="2017-07-01", periods=20)
>>> df["2017-07-01":"2017-07-05"]
            0
2017-07-01  0
2017-07-02  1
2017-07-03  2
2017-07-04  3
2017-07-05  4