Shift elements in a numpy array
Solution 1:
Not numpy but scipy provides exactly the shift functionality you want,
import numpy as np
from scipy.ndimage.interpolation import shift
xs = np.array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
shift(xs, 3, cval=np.NaN)
where default is to bring in a constant value from outside the array with value cval
, set here to nan
. This gives the desired output,
array([ nan, nan, nan, 0., 1., 2., 3., 4., 5., 6.])
and the negative shift works similarly,
shift(xs, -3, cval=np.NaN)
Provides output
array([ 3., 4., 5., 6., 7., 8., 9., nan, nan, nan])
Solution 2:
For those who want to just copy and paste the fastest implementation of shift, there is a benchmark and conclusion(see the end). In addition, I introduce fill_value parameter and fix some bugs.
Benchmark
import numpy as np
import timeit
# enhanced from IronManMark20 version
def shift1(arr, num, fill_value=np.nan):
arr = np.roll(arr,num)
if num < 0:
arr[num:] = fill_value
elif num > 0:
arr[:num] = fill_value
return arr
# use np.roll and np.put by IronManMark20
def shift2(arr,num):
arr=np.roll(arr,num)
if num<0:
np.put(arr,range(len(arr)+num,len(arr)),np.nan)
elif num > 0:
np.put(arr,range(num),np.nan)
return arr
# use np.pad and slice by me.
def shift3(arr, num, fill_value=np.nan):
l = len(arr)
if num < 0:
arr = np.pad(arr, (0, abs(num)), mode='constant', constant_values=(fill_value,))[:-num]
elif num > 0:
arr = np.pad(arr, (num, 0), mode='constant', constant_values=(fill_value,))[:-num]
return arr
# use np.concatenate and np.full by chrisaycock
def shift4(arr, num, fill_value=np.nan):
if num >= 0:
return np.concatenate((np.full(num, fill_value), arr[:-num]))
else:
return np.concatenate((arr[-num:], np.full(-num, fill_value)))
# preallocate empty array and assign slice by chrisaycock
def shift5(arr, num, fill_value=np.nan):
result = np.empty_like(arr)
if num > 0:
result[:num] = fill_value
result[num:] = arr[:-num]
elif num < 0:
result[num:] = fill_value
result[:num] = arr[-num:]
else:
result[:] = arr
return result
arr = np.arange(2000).astype(float)
def benchmark_shift1():
shift1(arr, 3)
def benchmark_shift2():
shift2(arr, 3)
def benchmark_shift3():
shift3(arr, 3)
def benchmark_shift4():
shift4(arr, 3)
def benchmark_shift5():
shift5(arr, 3)
benchmark_set = ['benchmark_shift1', 'benchmark_shift2', 'benchmark_shift3', 'benchmark_shift4', 'benchmark_shift5']
for x in benchmark_set:
number = 10000
t = timeit.timeit('%s()' % x, 'from __main__ import %s' % x, number=number)
print '%s time: %f' % (x, t)
benchmark result:
benchmark_shift1 time: 0.265238
benchmark_shift2 time: 0.285175
benchmark_shift3 time: 0.473890
benchmark_shift4 time: 0.099049
benchmark_shift5 time: 0.052836
Conclusion
shift5 is winner! It's OP's third solution.
Solution 3:
Benchmarks & introducing Numba
1. Summary
- The accepted answer (
scipy.ndimage.interpolation.shift
) is the slowest solution listed in this page. - Numba (@numba.njit) gives some performance boost when array size smaller than ~25.000
- "Any method" equally good when array size large (>250.000).
- The fastest option really depends on
(1) Length of your arrays
(2) Amount of shift you need to do. - Below is the picture of the timings of all different methods listed on this page (2020-07-11), using constant shift = 10. As one can see, with small array sizes some methods are use more than +2000% time than the best method.
2. Detailed benchmarks with the best options
- Choose
shift4_numba
(defined below) if you want good all-arounder
3. Code
3.1 shift4_numba
- Good all-arounder; max 20% wrt. to the best method with any array size
- Best method with medium array sizes: ~ 500 < N < 20.000.
- Caveat: Numba jit (just in time compiler) will give performance boost only if you are calling the decorated function more than once. The first call takes usually 3-4 times longer than the subsequent calls. You can get even more performance boost with ahead of time compiled numba.
import numba
@numba.njit
def shift4_numba(arr, num, fill_value=np.nan):
if num >= 0:
return np.concatenate((np.full(num, fill_value), arr[:-num]))
else:
return np.concatenate((arr[-num:], np.full(-num, fill_value)))
3.2. shift5_numba
- Best option with small (N <= 300.. 1500) array sizes. Treshold depends on needed amount of shift.
- Good performance on any array size; max + 50% compared to the fastest solution.
- Caveat: Numba jit (just in time compiler) will give performance boost only if you are calling the decorated function more than once. The first call takes usually 3-4 times longer than the subsequent calls. You can get even more performance boost with ahead of time compiled numba.
import numba
@numba.njit
def shift5_numba(arr, num, fill_value=np.nan):
result = np.empty_like(arr)
if num > 0:
result[:num] = fill_value
result[num:] = arr[:-num]
elif num < 0:
result[num:] = fill_value
result[:num] = arr[-num:]
else:
result[:] = arr
return result
3.3. shift5
- Best method with array sizes ~ 20.000 < N < 250.000
- Same as
shift5_numba
, just remove the @numba.njit decorator.
4 Appendix
4.1 Details about used methods
-
shift_scipy
:scipy.ndimage.interpolation.shift
(scipy 1.4.1) - The option from accepted answer, which is clearly the slowest alternative. -
shift1
:np.roll
andout[:num] xnp.nan
by IronManMark20 & gzc -
shift2
:np.roll
andnp.put
by IronManMark20 -
shift3
:np.pad
andslice
by gzc -
shift4
:np.concatenate
andnp.full
by chrisaycock -
shift5
: using two timesresult[slice] = x
by chrisaycock -
shift#_numba
: @numba.njit decorated versions of the previous.
The shift2
and shift3
contained functions that were not supported by the current numba (0.50.1).
4.2 Other test results
4.2.1 Relative timings, all methods
- Relative timings, 10% shift, all methods
- Relative timings, constant shift (10), all methods
4.2.2 Raw timings, all methods
- Raw timings, constant shift (10), all methods
- Raw timings, 10% shift, all methods
4.2.3 Raw timings, few best methods
- Raw timings with small arrays, constant shift (10), few best methods
- Raw timings with small arrays, 10% shift, few best methods
- Raw timings with large arrays, constant shift (10), few best methods
- Raw timings with large arrays, 10% shift, few best methods
Solution 4:
There is no single function that does what you want. Your definition of shift is slightly different than what most people are doing. The ways to shift an array are more commonly looped:
>>>xs=np.array([1,2,3,4,5])
>>>shift(xs,3)
array([3,4,5,1,2])
However, you can do what you want with two functions.
Consider a=np.array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
:
def shift2(arr,num):
arr=np.roll(arr,num)
if num<0:
np.put(arr,range(len(arr)+num,len(arr)),np.nan)
elif num > 0:
np.put(arr,range(num),np.nan)
return arr
>>>shift2(a,3)
[ nan nan nan 0. 1. 2. 3. 4. 5. 6.]
>>>shift2(a,-3)
[ 3. 4. 5. 6. 7. 8. 9. nan nan nan]
After running cProfile on your given function and the above code you provided, I found that the code you provided makes 42 function calls while shift2
made 14 calls when arr is positive and 16 when it is negative. I will be experimenting with timing to see how each performs with real data.
Solution 5:
You can convert ndarray
to Series
or DataFrame
with pandas
first, then you can use shift
method as you want.
Example:
In [1]: from pandas import Series
In [2]: data = np.arange(10)
In [3]: data
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [4]: data = Series(data)
In [5]: data
Out[5]:
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
dtype: int64
In [6]: data = data.shift(3)
In [7]: data
Out[7]:
0 NaN
1 NaN
2 NaN
3 0.0
4 1.0
5 2.0
6 3.0
7 4.0
8 5.0
9 6.0
dtype: float64
In [8]: data = data.values
In [9]: data
Out[9]: array([ nan, nan, nan, 0., 1., 2., 3., 4., 5., 6.])