Adding two Series with NaNs

I'm working through the "Python For Data Analysis" and I don't understand a particular functionality. Adding two pandas series objects will automatically align the indexed data but if one object does not contain that index it is returned as NaN. For example from book:

a = Series([35000,71000,16000,5000],index=['Ohio','Texas','Oregon','Utah'])
b = Series([NaN,71000,16000,35000],index=['California', 'Texas', 'Oregon', 'Ohio'])

Result:

    In [63]: a
    Out[63]: Ohio          35000
             Texas         71000
             Oregon        16000
             Utah           5000
    In [64]: b
    Out[64]: California      NaN
             Texas         71000
             Oregon        16000
             Ohio          35000

When I add them together I get this...

    In [65]: a+b
    Out[65]: California       NaN
             Ohio           70000
             Oregon         32000
             Texas         142000
             Utah             NaN

So why is the Utah value NaN and not 500? It seems that 500+NaN=500. What gives? I'm missing something, please explain.

Update:

    In [92]: # fill NaN with zero
             b = b.fillna(0)
             b
    Out[92]: California        0
             Texas         71000
             Oregon        16000
             Ohio          35000

    In [93]: a
    Out[93]: Ohio      35000
             Texas     71000
             Oregon    16000
             Utah       5000

    In [94]: # a is still good
             a+b
    Out[94]: California       NaN
             Ohio           70000
             Oregon         32000
             Texas         142000 
             Utah             NaN

Solution 1:

Pandas does not assume that 500+NaN=500, but it is easy to ask it to do that:

a.add(b, fill_value=0)

Solution 2:

The default approach is to assume that any computation involving NaN gives NaN as the result. Anything plus NaN is NaN, anything divided by NaN is NaN, etc. If you want to fill the NaN with some value, you have to do that explicitly (as Dan Allan showed in his answer).

Solution 3:

It makes more sense to use pd.concat() as it can accept more columns.

import pandas as pd
import numpy as np

a = pd.Series([35000,71000,16000,5000],index=['Ohio','Texas','Oregon','Utah'])
b = pd.Series([np.nan,71000,16000,35000],index=['California', 'Texas', 'Oregon', 'Ohio'])

pd.concat((a,b), axis=1).sum(1, min_count=1)

Output:

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah            5000.0
dtype: float64

Or with 3 series:

import pandas as pd
import numpy as np

a = pd.Series([1, np.NaN, 4, 5])
b = pd.Series([3, np.NaN, 5, np.NaN])
c = pd.Series([np.NaN,np.NaN,np.NaN,np.NaN])

print(pd.concat((a,b,c), axis=1).sum(1, min_count=1))

#0    4.0
#1    NaN
#2    9.0
#3    5.0
#dtype: float64

Adding two Series with NaNs

Solution 1:

Solution 2:

Solution 3:

Related

Recent Posts