Is a logical right shift by a power of 2 faster in AVR?
Solution 1:
Let's look at the datasheet:
http://atmel.com/dyn/resources/prod_documents/8271S.pdf
As far as I can see, ASR (arithmetic shift right) always shifts by one bit and cannot take a shift count; it takes one cycle to execute. The same holds for LSR (logical shift right). Therefore, shifting right by n bits will take n cycles. Powers of two behave just the same as any other number.
Solution 2:
In the AVR instruction set, arithmetic shift right and left happen one bit at a time. So, for this particular microcontroller, shifting >> n means the compiler actually emits n individual asr ops, and I guess >>3 is one cycle faster than >>4.
This makes the AVR fairly unusual, by the way.
Solution 3:
You have to consult the documentation of your processor for this information. Even for a given instruction set, there may be different costs depending on the model. On a really small processor, shifting by one could conceivably be faster than by other values, for instance (it is the case for rotation instructions on some IA32 processors, but that's only because this instruction is so rarely produced by compilers).
According to http://atmel.com/dyn/resources/prod_documents/8271S.pdf all logical shifts are done in one cycle for the ATMega328. But of course, as pointed out in the comments, all logical shifts are by one bit. So the cost of a shift by n is n cycles in n instructions.
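The cost model described above can be sketched in C. This is only an illustration, not compiler output: `lsr_by` and `shift_result` are hypothetical names, and each loop iteration stands in for one single-bit LSR instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a shifter that moves one bit per instruction.
   Each loop iteration stands in for one LSR (one cycle on AVR). */
typedef struct {
    uint8_t value;   /* result of the shift */
    unsigned steps;  /* number of single-bit shifts performed */
} shift_result;

static shift_result lsr_by(uint8_t x, unsigned n)
{
    assert(n < 8);               /* stay within the width of the byte */
    shift_result r = { x, 0 };
    while (n--) {
        r.value >>= 1;           /* one single-bit logical shift */
        r.steps++;
    }
    return r;
}
```

In this model, `lsr_by(x, 3)` reports 3 steps and `lsr_by(x, 4)` reports 4, so a power-of-two distance gets no special treatment.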
Solution 4:
Indeed, ATMega doesn't have a barrel shifter, just like most (if not all) other 8-bit MCUs. Therefore it can only shift by 1 each time, rather than by an arbitrary amount like more powerful CPUs. As a result, shifting by 4 is theoretically slower than shifting by 3.
However, ATMega does have a swap nibble instruction, so in fact x >> 4 is faster than x >> 3. Assuming x is a uint8_t, x >>= 3 is implemented by 3 right shifts:
x >>= 1;
x >>= 1;
x >>= 1;
whereas x >>= 4 only needs a swap and a bit clear:
swap(x); // swap the top and bottom nibbles AB <-> BA
x &= 0x0f;
or
x &= 0xf0;
swap(x);
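To make the equivalence concrete, here is a small C sketch that checks both orderings against a plain shift for every byte value. `swap_nibbles` is a hypothetical stand-in for the AVR SWAP instruction, and the other function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the AVR SWAP instruction: exchange high and low nibbles. */
static uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* x >> 4 as "swap, then clear the high nibble". */
static uint8_t shr4_swap_then_mask(uint8_t x)
{
    return (uint8_t)(swap_nibbles(x) & 0x0f);
}

/* x >> 4 as "clear the low nibble, then swap". */
static uint8_t shr4_mask_then_swap(uint8_t x)
{
    return swap_nibbles((uint8_t)(x & 0xf0));
}

/* Compare both variants with the ordinary shift for all 256 inputs. */
static void check_shr4(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;
        assert(shr4_swap_then_mask(x) == (x >> 4));
        assert(shr4_mask_then_swap(x) == (x >> 4));
    }
}
```

Calling `check_shr4()` runs through every input and aborts if either variant disagrees with `x >> 4`.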
For bigger cross-register shifts there are also various ways to optimize. With a uint16_t variable y consisting of the low part y0 and high part y1, y >> 8 is simply
y0 = y1;
y1 = 0;
Similarly, y >> 9 can be optimized to
y0 = y1 >> 1;
y1 = 0;
and hence takes about as many instructions as a shift by 3 on a char, despite the much larger shift distance.
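The byte-move idea can be checked with a short C sketch. The `reg_pair` struct is a hypothetical model of the two 8-bit registers holding y; `split` and `join` exist only so the result can be compared with an ordinary 16-bit shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit value held in two 8-bit registers. */
typedef struct {
    uint8_t lo;  /* y0 */
    uint8_t hi;  /* y1 */
} reg_pair;

static reg_pair split(uint16_t v)
{
    reg_pair p = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    return p;
}

static uint16_t join(reg_pair p)
{
    return (uint16_t)(((uint16_t)p.hi << 8) | p.lo);
}

/* y >> 8: move the high byte down, clear the high byte. */
static reg_pair shr8(reg_pair y)
{
    y.lo = y.hi;
    y.hi = 0;
    return y;
}

/* y >> 9: move the high byte down with one extra single-bit shift. */
static reg_pair shr9(reg_pair y)
{
    y.lo = (uint8_t)(y.hi >> 1);
    y.hi = 0;
    return y;
}

/* Compare the byte-wise versions with plain shifts on a few samples. */
static void check_wide_shifts(void)
{
    static const uint16_t samples[] = { 0x0000, 0x00ff, 0x0100, 0xabcd, 0xffff };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint16_t y = samples[i];
        assert(join(shr8(split(y))) == (uint16_t)(y >> 8));
        assert(join(shr9(split(y))) == (uint16_t)(y >> 9));
    }
}
```

Neither function loops over the shift distance, which is why these wide shifts stay cheap on a core that only shifts one bit per instruction.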
In conclusion, the shift time varies depending on the shift distance, but it's not necessarily slower for longer or non-power-of-2 values. Generally it'll take at most 3 instructions to shift within an 8-bit char.
Here are some demos from Compiler Explorer:
- A right shift by 4 is achieved by a swap and an and, like above:
swap r24
andi r24,lo8(15)
- A right shift by 3 has to be done with 3 instructions:
lsr r24
lsr r24
lsr r24
Left shifts are also optimized in the same manner.
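For the left-shift case, a minimal sketch of the mirrored trick for an 8-bit value follows, assuming the compiler picks the same swap-and-mask pattern. `swap_nibbles` again stands in for the SWAP instruction, and `shl4_swap_then_mask` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the AVR SWAP instruction: exchange high and low nibbles. */
static uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* x << 4 (truncated to 8 bits) as "swap, then clear the low nibble". */
static uint8_t shl4_swap_then_mask(uint8_t x)
{
    return (uint8_t)(swap_nibbles(x) & 0xf0);
}

/* Compare with the ordinary left shift for all 256 inputs. */
static void check_shl4(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;
        assert(shl4_swap_then_mask(x) == (uint8_t)(x << 4));
    }
}
```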
See also Which is faster: x<<1 or x<<10?