Floating point keys in std:map
Solution 1:
So there are a few issues with using doubles as keys in a std::map
.
First, NaN
, which compares less than itself is a problem. If there is any chance of NaN
being inserted, use this:
struct safe_double_less {
bool operator()(double left, double right) const {
bool leftNaN = std::isnan(left);
bool rightNaN = std::isnan(right);
if (leftNaN != rightNaN)
return leftNaN<rightNaN;
return left<right;
}
};
but that may be overly paranoid. Do not, I repeat do not, include an epsilon threshold in your comparison operator you pass to a std::set
or the like: this will violate the ordering requirements of the container, and result in unpredictable undefined behavior.
(I placed NaN
as greater than all double
s, including +inf
, in my ordering, for no good reason. Less than all double
s would also work).
So either use the default operator<
, or the above safe_double_less
, or something similar.
Next, I would advise using a std::multimap
or std::multiset
, because you should be expecting multiple values for each lookup. You might as well make content management an everyday thing, instead of a corner case, to increase the test coverage of your code. (I would rarely recommend these containers) Plus this blocks operator[]
, which is not advised to be used when you are using floating point keys.
The point where you want to use an epsilon is when you query the container. Instead of using the direct interface, create a helper function like this:
// works on both `const` and non-`const` associative containers:
template<class Container>
auto my_equal_range( Container&& container, double target, double epsilon = 0.00001 )
-> decltype( container.equal_range(target) )
{
auto lower = container.lower_bound( target-epsilon );
auto upper = container.upper_bound( target+epsilon );
return std::make_pair(lower, upper);
}
which works on both std::map
and std::set
(and multi
versions).
(In a more modern code base, I'd expect a range<?>
object that is a better thing to return from an equal_range
function. But for now, I'll make it compatible with equal_range
).
This finds a range of things whose keys are "sufficiently close" to the one you are asking for, while the container maintains its ordering guarantees internally and doesn't execute undefined behavior.
To test for existence of a key, do this:
template<typename Container>
bool key_exists( Container const& container, double target, double epsilon = 0.00001 ) {
auto range = my_equal_range(container, target, epsilon);
return range.first != range.second;
}
and if you want to delete/replace entries, you should deal with the possibility that there might be more than one entry hit.
The shorter answer is "don't use floating point values as keys for std::set
and std::map
", because it is a bit of a hassle.
If you do use floating point keys for std::set
or std::map
, almost certainly never do a .find
or a []
on them, as that is highly highly likely to be a source of bugs. You can use it for an automatically sorted collection of stuff, so long as exact order doesn't matter (ie, that one particular 1.0 is ahead or behind or exactly on the same spot as another 1.0). Even then, I'd go with a multimap/multiset, as relying on collisions or lack thereof is not something I'd rely upon.
Reasoning about the exact value of IEEE floating point values is difficult, and fragility of code relying on it is common.
Solution 2:
Here's a simplified example of how using soft-compare (aka epsilon or almost equal) can lead to problems.
Let epsilon = 2
for simplicity. Put 1
and 4
into your map
. It now might look like this:
1
\
4
So 1
is the tree root.
Now put in the numbers 2
, 3
, 4
in that order. Each will replace the root, because it compares equal to it. So then you have
4
\
4
which is already broken. (Assume no attempt to rebalance the tree is made.) We can keep going with 5
, 6
, 7
:
7
\
4
and this is even more broken, because now if we ask whether 4
is in there, it will say "no", and if we ask for an iterator for values less than 7
, it won't include 4
.
Though I must say that I've used map
s based on this flawed fuzzy compare operator numerous times in the past, and whenever I digged up a bug, it was never due to this. This is because datasets in my application areas never actually amount to stress-testing this problem.
Solution 3:
As Naszta says, you can implement your own comparison function. What he leaves out is the key to making it work - you must make sure that the function always returns false
for any values that are within your tolerance for equivalence.
return (abs(left - right) > epsilon) && (left < right);
Edit: as pointed out in many comments to this answer and others, there is a possibility for this to turn out badly if the values you feed it are arbitrarily distributed, because you can't guarantee that !(a<b)
and !(b<c)
results in !(a<c)
. This would not be a problem in the question as asked, because the numbers in question are clustered around 0.1 increments; as long as your epsilon is large enough to account for all possible rounding errors but is less than 0.05, it will be reliable. It is vitally important that the keys to the map are never closer than 2*epsilon apart.