Normalize rows of pandas data frame by their sums [duplicate]
I have a pandas dataframe containing spectral data and metadata. The columns are labeled with a multiindex so that df['wvl']
gives the spectra and df['meta']
gives the metadata. Within df['wvl']
the column labels are the wavelength values for the spectrometer channels.
What I want to do is normalize each row of df['wvl']
by the sum of that row so that adding up the values in the row gives a total of 1.0.
Here's what one row of the dataframe looks like:
df['wvl'].iloc[0]
246.050003 128.533035
246.102005 102.756321
246.156006 99.930775
...
848.697205 121.313347
848.896423 127.011662
849.095703 123.234168
Name: 0, dtype: float64
But when I do something like:
df['wvl'].iloc[0]=df['wvl'].iloc[0]/df['wvl'].iloc[0].sum()
Nothing happens! I get the exact same values:
df['wvl'].iloc[0]
246.050003 128.533035
246.102005 102.756321
246.156006 99.930775
...
848.697205 121.313347
848.896423 127.011662
849.095703 123.234168
Name: 0, dtype: float64
If I create a temporary variable to hold the row, I can do the normalization just fine:
temp=df['wvl'].iloc[0]
temp=temp/temp.sum()
temp
246.050003 0.000027
246.102005 0.000022
246.156006 0.000021
...
848.697205 0.000026
848.896423 0.000027
849.095703 0.000026
Name: 0, dtype: float64
But if I try to replace the dataframe row with the normalized temporary variable, nothing happens:
df['wvl'].iloc[0]=temp
df['wvl'].iloc[0]
246.050003 128.533035
246.102005 102.756321
246.156006 99.930775
...
848.697205 121.313347
848.896423 127.011662
849.095703 123.234168
Name: 0, dtype: float64
I'm obviously missing something here, but I can't figure out what and it's driving me insane. Help? Thanks in advance!
Solution 1:
You can use
df.div(df.sum(axis=1), axis=0)
df.sum(axis=1)
sums up each row; df.div(..., axis=0)
then divides.
Example:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.div(df.sum(axis=1), axis=0)
a b
0 0.250000 0.750000
1 0.333333 0.666667