How to round/remove trailing ".0" zeros in pandas column?

I'm trying to see if I can remove the trailing zeros from this phone number column.

Example:

0
1      8.00735e+09
2      4.35789e+09
3      6.10644e+09

The type in this column is an object, and I tried to round it but I am getting an error. I checked a couple of them I know they are in this format "8007354384.0", and want to get rid of the trailing zeros with the decimal point.

Sometimes I received in this format and sometimes I don't, they will be integer numbers. I would like to check if the phone column has a trailing zero, then remove it.

I have this code but I'm stuck on how to check for trailing zeros for each row.

data.ix[data.phone.str.contains('.0'), 'phone']

I get an error => *** ValueError: cannot index with vector containing NA / NaN values. I believe the issue is because some rows have empty data, which sometime I do receive. The code above should be able to skip an empty row.

Does anybody have any suggestions? I'm new to pandas but so far it's an useful library. Your help will be appreciated.

Note The provided example above, the first row has an empty data, which I do sometimes I get. Just to make sure this is not represented as 0 for phone number.

Also empty data is considered a string, so it's a mix of floats and string, if rows are empty.


Solution 1:

use astype(np.int64)

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
mask = pd.to_numeric(s).notnull()
s.loc[mask] = s.loc[mask].astype(np.int64)
s

0              
1    8007350000
2    4357890000
3    6106440000
dtype: object

Solution 2:

In Pandas/NumPy, integers are not allowed to take NaN values, and arrays/series (including dataframe columns) are homogeneous in their datatype --- so having a column of integers where some entries are None/np.nan is downright impossible.

EDIT:data.phone.astype('object') should do the trick; in this case, Pandas treats your column as a series of generic Python objects, rather than a specific datatype (e.g. str/float/int), at the cost of performance if you intend to run any heavy computations with this data (probably not in your case).

Assuming you want to keep those NaN entries, your approach of converting to strings is a valid possibility:

data.phone.astype(str).str.split('.', expand = True)[0]

should give you what you're looking for (there are alternative string methods you can use, such as .replace or .extract, but .split seems the most straightforward in this case).

Alternatively, if you are only interested in the display of floats (unlikely I'd suppose), you can do pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.