How to normalize a NumPy array to within a certain range?
After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:
# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()
# Normalize image to between 0 and 255
image = image/(image.max()/255.0)
Is there a less verbose way to do this, such as a convenience function? matplotlib.colors.Normalize() doesn't seem to be related.
audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())
Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so
image *= 255.0/image.max() # Uses 1 division and image.size multiplications
is marginally faster than
image /= image.max()/255.0 # Uses 1+image.size divisions
Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.
In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio and image arrays need to have a floating-point dtype before the in-place operations are performed. If they are not already of floating-point dtype, you'll need to convert them using astype. For example,
image = image.astype('float64')
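To illustrate the dtype point, here is a minimal sketch that assumes a small made-up uint8 array standing in for image data. Under NumPy's default casting rules, in-place true division on an integer array raises an error, so the array is converted first.

```python
import numpy as np

# Hypothetical 8-bit image data, for illustration only
image = np.array([[0, 64], [128, 200]], dtype=np.uint8)

try:
    image /= 2.0  # the float result cannot be stored back into uint8
except TypeError as exc:
    print("in-place division failed:", type(exc).__name__)

# Convert to float first, then scale in place
image = image.astype('float64')
image *= 255.0 / image.max()
```

After the conversion the array keeps its float64 dtype through the in-place multiply, so the scaled values are not truncated.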
If the array contains both positive and negative data, I'd go with:
import numpy as np
a = np.random.rand(3,2)
# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)
# Normalised [0,255] as integer: don't forget the parentheses before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)
# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1
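The three cases above are the same linear map with different target bounds. A minimal sketch of a general helper follows; the name rescale and its parameters are my own and not part of NumPy, and raising on a constant array is just one possible choice.

```python
import numpy as np

def rescale(a, new_min=0.0, new_max=1.0):
    """Linearly map the values of `a` onto [new_min, new_max]."""
    a = np.asarray(a, dtype=float)
    span = np.ptp(a)  # max - min of the whole array
    if span == 0:
        # A constant array has no range to stretch; raising is one option
        raise ValueError("cannot rescale a constant array")
    return new_min + (a - a.min()) * (new_max - new_min) / span

a = np.array([[-2.0, 0.0], [1.0, 2.0]])
d = rescale(a, -1.0, 1.0)
```

Note that this maps min to new_min and max to new_max, so zero in the input is not guaranteed to stay at zero in the output unless the data is symmetric around it.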
If the array contains nan, one solution could be to just remove them as:
def nan_ptp(a):
    # Peak-to-peak range over the finite entries only (skips nan and inf)
    return np.ptp(a[np.isfinite(a)])
b = (a - np.nanmin(a))/nan_ptp(a)
However, depending on the context you might want to treat nan differently, e.g. interpolate the value, replace it with a constant such as 0, or raise an error.
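Two of these options can be sketched as follows, assuming a small made-up array with a single nan. The first keeps nan in place and ignores it when finding the range; the second substitutes a fill value (0 here, chosen arbitrarily) before scaling. This sketch handles nan only, not inf.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 5.0])

# Option 1: ignore nan when computing the range; nan entries stay nan
lo, hi = np.nanmin(a), np.nanmax(a)
b = (a - lo) / (hi - lo)

# Option 2: replace nan with a fill value first, then scale as usual
filled = np.where(np.isnan(a), 0.0, a)
c = (filled - filled.min()) / np.ptp(filled)
```

Be aware that the fill value in the second option takes part in the min and max, so it can change the scaling of every other element.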
Finally, worth mentioning even if it's not OP's question, standardization:
e = (a - np.mean(a)) / np.std(a)
You can also rescale using sklearn. The advantages are that you can normalize the standard deviation in addition to mean-centering the data, and that you can do this along either axis, by features or by records.
from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )
The keyword arguments axis, with_mean and with_std are self-explanatory, and are shown in their default state. The argument copy performs the operation in-place if it is set to False. See the documentation for sklearn.preprocessing.scale for details.
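For readers without sklearn installed, here is a NumPy-only sketch of what per-column standardization amounts to. It assumes the population standard deviation (ddof=0), which as far as I know matches what scale uses; the array X is made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# axis=0 standardizes each column (feature); axis=1 would do each row (record)
mean = X.mean(axis=0, keepdims=True)
std = X.std(axis=0, keepdims=True)  # population std, ddof=0
X_scaled = (X - mean) / std
```

Each column of X_scaled then has mean 0 and standard deviation 1. A column with zero variance would cause a division by zero here, so it needs special handling.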