Absolute value abs(x) using bitwise operators and Boolean logic [duplicate]

Assuming 32-bit words, as stated in the question:

For negative x, x >> 31 is implementation-defined in the C and C++ standards. The author of the code expects two’s complement integers and an arithmetic right-shift, in which x >> 31 produces all zero bits if the sign bit of x is zero and all one bits if the sign bit is one.

Thus, if x is positive or zero, y is zero, and x + y is x, so (x + y) ^ y is x, which is the absolute value of x.

If x is negative, y is all ones, which represents −1 in two’s complement. Then x + y is x - 1. Then XORing with all ones inverts all the bits. Inverting all the bits is equivalent to taking the two’s complement and subtracting one, and two’s complement is the method used to negate integers in two’s complement format. In other words, XORing q with all ones gives -q - 1. So x - 1 XORed with all ones produces -(x - 1) - 1 = -x + 1 - 1 = -x, which is the absolute value of x except when x is the minimum possible value for the format (−2,147,483,648 for 32-bit two’s complement), in which case the absolute value (2,147,483,648) is too large to represent, and the resulting bit pattern is just the original x.


This approach relies on many implementation specific behavior:

  1. It assumes that x is 32 bits wide. Though, you could fix this by x >> (sizeof(x) * CHAR_BIT - 1)
  2. It assumes that the machine uses two's complement representation.
  3. the right-shift operator copies the sign bit from left to right.

Example with 3 bits:

101 -> x = -3
111 -> x >> 2

101 + 111 = 100 -> x + y

100 XOR 111 -> 011 -> 3

This is not portable.


This isn't portable, but I'll explain why it works anyway.

The first operation exploits a trait of 2's complement negative numbers, that the first bit if 1 if negative, and 0 if positive. This is because the numbers range from

The example below is for 8 bits, but can be extrapolated to any number of bits. In your case it's 32 bits (but 8 bits displays the ranges more easily)

10000000 (smallest negative number)
10000001 (next to smallest)
...
11111111 (negative one)
00000000 (zero)
00000001 (one)
...
01111110 (next to largest)
01111111 (largest)

Reasons for using 2's complement encoding of numbers come about by the property that adding any negative number to it's positive number yields zero.

Now, to create the negative of a 2's complement number, you would need to

  1. Take the inverse (bitwise not) of a the input number.
  2. Add one to it.

The reason the 1 is added to it is to force the feature of the addition zeroing the register. You see, if it was just x + ~(x), then you would get a register of all 1's. By adding one to it, you get a cascading carry which yields a register of zeros (with a 1 in the carry out of the register).

This understanding is important to know "why" the algorithm you provided (mostly) works.

y = x >> 31   // this line acts like an "if" statement.
              // Depending on if y is 32 signed or unsigned, when x is negative, 
              // it will fill y with 0xFFFFFFFF or 1.  The rest of the 
              // algorithm doesn't, care because it accommodates both inputs.
              // when x is positive, the result is zero.

We will explore (x is positive first)

(x + y) ^ y   // for positive x, first we substitute the y = 0
(x + 0) ^ 0   // reduce the addition
(x) ^ 0       // remove the parenthesis
x ^ 0         // which, by definition of xor, can only yield x
x

Now let's explore (x is negative, y is 0xFFFFFFFF (y was signed))

(x + y) ^ y   // first substitute the Y
(x + 0xFFFFFFFF) ^ 0xFFFFFFFF // note that 0xFFFFF is the same as 2's complement -1
(x - 1) ^ 0xFFFFFFFF // add in a new variable Z to hold the result
(x - 1) ^ 0xFFFFFFFF = Z  // take the ^ 0xFFFFFFFF of both sides
(x - 1) ^ 0xFFFFFFFF ^ 0xFFFFFFFF = Z ^ 0xFFFFFFFF // reduce the left side
(x - 1) = z ^ 0xFFFFFFFF // note that not is equivalent to ^ 0xFFFFFFFF
(x - 1) = ~(z) // add one to both sides
x - 1 + 1 = ~(z) + 1 //  reduce
x = ~(z) + 1  // by definition z is negative x (for 2's complement numbers)

Now let's explore (x is negative, y is 0x01 (y was unsigned))

(x + y) ^ y   // first substitute the Y
(x + 1) ^ 0x00000001 // note that x is a 2's complement negative, but is
                     // being treated as unsigned, so to make the unsigned
                     // context of x tracable, I'll add a -(x) around the X
(-(x) + 1) ^ 0x00000001 // which simplifies to
(-(x - 1)) ^ 0x00000001 // negative of a negative is positive
(-(x - 1)) ^ -(-(0x00000001)) // substituting 1 for bits of -1
(-(x - 1)) ^ -(0xFFFFFFFF) // pulling out the negative sign
-((x-1) ^ 0xFFFFFFFF) // recalling that while we added signs and negations to
                      // make the math sensible, there's actually no place to
                      // store them in an unsigned storage system, so dropping
                      // them is acceptable
x-1 ^ 0XFFFFFFFF = Z // introducing a new variable Z, take the ^ 0xFFFFFFF of both sides
x-1 ^ 0xFFFFFFFF ^ 0xFFFFFFFF = Z ^ 0xFFFFFFFF // reduce the left side
x-1 = z ^ 0xFFFFFFFF // note that not is equivalent to ^ 0xFFFFFFFF
x-1 = ~(z) // add one to both sides
x - 1 + 1 = ~(z) + 1 //  reduce
x = ~(z) + 1  // by definition z is negative x (for 2's complement numbers, even though we used only non-2's complement types)

Note that while the above proofs are passable for a general explanation, the reality is that these proofs don't cover important edge cases, like x = 0x80000000 , which represents a negative number greater in absolute value than any positive X which could be stored in the same number of bits.


I use this code, first the calculation of the two's complement (the guard just ensures with a compile time check, the template is an Integer)

/**
 * Zweierkomplement - Two's Complement
 */
template<typename T> constexpr auto ZQ(T const& _x) noexcept ->T{
  Compile::Guards::IsInteger<T>();
  return ((~(_x))+1);
}

and in a second step this is used to calculate the integer abs()

/**
 * if number is negative, get the same number with positiv sign
 */
template<typename T> auto INTABS(T const _x) -> typename std::make_unsigned<T>::type{
  Compile::Guards::IsInteger<T>();
  return static_cast<typename std::make_unsigned<T>::type>((_x<0)?(ZQ<T>(_x)):(_x));
}

why I use this kind of code:
* compile-time checks
* works with all Integer sizes
* portable from small µC to modern cores
* Its clear, that we need to consider the two's complement, so you need an unsigned return value, e.g for 8bit abs(-128)=128 can not be expressed in an signed integer