Why does C++ output negative numbers when using modulo?
Math:
If you have an equation like this:
x = 3 mod 7
x could be ... -4, 3, 10, 17, ..., or more generally:
x = 3 + k * 7
where k can be any integer. I don't know whether a modulo operation is defined in mathematics, but the factor ring certainly is.
Python:
In Python, you will always get non-negative values when you use % with a positive m:
#!/usr/bin/python
# -*- coding: utf-8 -*-
m = 7
for i in xrange(-8, 10 + 1):
    print(i % m)
Results in:
6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3
C++:
#include <iostream>
using namespace std;

int main() {
    int m = 7;
    for (int i = -8; i <= 10; i++) {
        cout << (i % m) << endl;
    }
    return 0;
}
Will output:
-1 0 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 0 1 2 3
ISO/IEC 14882:2003(E) - 5.6 Multiplicative operators:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined 74).
and
74) According to work underway toward the revision of ISO C, the preferred algorithm for integer division follows the rules defined in the ISO Fortran standard, ISO/IEC 1539:1991, in which the quotient is always rounded toward zero.
Source: ISO/IEC 14882:2003(E)
(I couldn't find a free version of ISO/IEC 1539:1991. Does anybody know where to get it from?)
The operation seems to be defined so that the quotient is rounded towards zero, and the remainder therefore takes the sign of the dividend.
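A quick check of the quoted guarantee (a minimal sketch; the test values are mine, and the asserts simply exercise (a/b)*b + a%b == a for all four sign combinations):

#include <cassert>

int main() {
    // The standard guarantees (a/b)*b + a%b == a whenever b != 0,
    // regardless of how the sign of the remainder is chosen.
    const int as[] = {8, -8};
    const int bs[] = {7, -7};
    for (int a : as) {
        for (int b : bs) {
            assert((a / b) * b + a % b == a);
        }
    }
    return 0;
}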
Question:
Does it make sense to define it like that?
What are the arguments for this specification? Is there a place where the people who create such standards discuss it? Where can I read something about the reasons why they decided to make it this way?
Most of the time when I use modulo, I want to access elements of a data structure. In this case, I have to make sure that mod returns a non-negative value, so it would be good if mod always returned a non-negative value. (Another use is the Euclidean algorithm. Since you could make both numbers positive before using this algorithm, the sign of modulo wouldn't matter.)
Additional material:
See Wikipedia for a long list of what modulo does in different languages.
On x86 (and other processor architectures), integer division and modulo are carried out by a single instruction, idiv (div for unsigned values), which produces both quotient and remainder (for word-sized arguments, in AX and DX respectively). This is exposed through the C library function div, which can be optimised by the compiler to a single instruction!
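For illustration (a small sketch; std::div from <cstdlib> is the C++ spelling of that library function, and it truncates towards zero):

#include <cstdlib>
#include <iostream>

int main() {
    // std::div returns quotient and remainder together, which a
    // compiler can lower to a single idiv instruction.
    std::div_t d = std::div(-8, 7);
    std::cout << d.quot << " " << d.rem << std::endl;  // prints: -1 -1
    return 0;
}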
Integer division respects two rules:
- Non-integer quotients are rounded towards zero; and
- the equation dividend = quotient*divisor + remainder is satisfied by the results.
Accordingly, when dividing a negative number by a positive number, the quotient will be negative (or zero).
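These rules can be checked at compile time (a minimal sketch; integer arithmetic on literals is a constant expression):

// Rounding towards zero: -8/7 is -1, not the -2 that floor division would give,
// and the division equation then forces the remainder to be -1 rather than 6.
static_assert(-8 / 7 == -1, "quotient is rounded towards zero");
static_assert(-8 % 7 == -1, "remainder takes the sign of the dividend");
static_assert((-8 / 7) * 7 + (-8 % 7) == -8, "dividend = quotient*divisor + remainder");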
So this behaviour can be seen as the result of a chain of local decisions:
- Processor instruction set design optimises for the common case (division) over the less common case (modulo);
- Consistency (rounding towards zero, and respecting the division equation) is preferred over mathematical correctness;
- C prefers efficiency and simplicity (especially given the tendency to view C as a "high level assembler"); and
- C++ prefers compatibility with C.
Back in the day, someone designing the x86 instruction set decided it was right and good to round integer division toward zero rather than round down. (May the fleas of a thousand camels nest in his mother's beard.) To keep some semblance of math-correctness, operator REM, which is pronounced "remainder", had to behave accordingly. DO NOT read this: https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/rzatk/REM.htm
I warned you. Later someone doing the C spec decided it would be conforming for a compiler to do it either the right way or the x86 way. Then a committee doing the C++ spec decided to do it the C way. Then later yet, after this question was posted, a C++ committee decided to standardize on the wrong way. Now we are stuck with it. Many a programmer has written the following function or something like it. I have probably done it at least a dozen times.
inline int mod(int a, int b) { int ret = a % b; return ret >= 0 ? ret : ret + b; }
There goes your efficiency.
These days I use essentially the following, with some type_traits stuff thrown in. (Thanks to Clearer for a comment that gave me an idea for an improvement using latter day C++. See below.)
The old version:

template<class T>
inline T mod(T a, T b) {
    assert(b > 0);
    T ret = a % b;
    return (ret >= 0) ? (ret) : (ret + b);
}

template<>
inline unsigned mod(unsigned a, unsigned b) {
    assert(b > 0);
    return a % b;
}
True fact: I lobbied the Pascal standards committee to do mod the right way until they relented. To my horror, they did integer division the wrong way. So they do not even match.
EDIT: Clearer gave me an idea. I am working on a new one.
#include <cassert>
#include <type_traits>

template<class T1, class T2>
inline T1 mod(T1 a, T2 b) {
    assert(b > 0);
    T1 ret = a % b;
    if constexpr (std::is_unsigned_v<T1>) {
        return ret;
    } else {
        return (ret >= 0) ? (ret) : (ret + b);
    }
}
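A quick sanity check of the template above (a hedged sketch; main and the test values are mine, not part of the answer):

#include <cassert>

int main() {
    assert(mod(-8, 7) == 6);    // negative dividend wraps into [0, b)
    assert(mod(10, 7) == 3);    // non-negative dividends behave like %
    assert(mod(8u, 7u) == 1u);  // unsigned path: plain % already suffices
    return 0;
}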
What are the arguments for this specification?
One of the design goals of C++ is to map efficiently to hardware. If the underlying hardware implements division in a way that produces negative remainders, then that's what you'll get if you use % in C++. That's all there is to it, really.
Is there a place where the people who create such standards discuss it?
You will find interesting discussions on comp.lang.c++.moderated and, to a lesser extent, on comp.lang.c++.