With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?

you should specifically set the labels argument

preparations:

lower, higher = df['value'].min(), df['value'].max()
n_bins = 7

build up the labels:

edges = range(lower, higher, (higher - lower)/n_bins) # the number of edges is 8
lbs = ['(%d, %d]'%(edges[i], edges[i+1]) for i in range(len(edges)-1)]

set labels:

df['binned_df_pd'] = pd.cut(df.value, bins=n_bins, labels=lbs, include_lowest=True)

None of the other answers (including OP's np.histogram workaround) seem to work anymore. They have upvotes, so I'm not sure if something has changed over the years.

IntervalIndex requires all intervals to be closed identically, so [0, 53] cannot coexist with (322, 376].

Here are two working solutions based on the relabeling approach:

Without numpy, reuse pd.cut edges as pd.cut labels

bins = 7

_, edges = pd.cut(df.value, bins=bins, retbins=True)
labels = [f'({abs(edges[i]):.0f}, {edges[i+1]:.0f}]' for i in range(bins)]

df['bin'] = pd.cut(df.value, bins=bins, labels=labels)

#     value         bin
# 1       8     (0, 53]
# 2      16     (0, 53]
# ..    ...         ...
# 45    360  (322, 376]
# 46    368  (322, 376]

With numpy, convert np.linspace edges into pd.cut labels

bins = 7

edges = np.linspace(df.value.min(), df.value.max(), bins+1).astype(int)
labels = [f'({edges[i]}, {edges[i+1]}]' for i in range(bins)]

df['bin'] = pd.cut(df.value, bins=bins, labels=labels)

#     value         bin
# 1       8     (0, 53]
# 2      16     (0, 53]
# ..    ...         ...
# 45    360  (322, 376]
# 46    368  (322, 376]

Note: Only the labels are changed, so the underlying binning will still occur with 0.1% margins.

pointplot() output (as of pandas 1.2.4):

sns.pointplot(x='bin', y='value', data=df)
plt.xticks(rotation=30, ha='right')

With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?

preparations:

build up the labels:

set labels:

Related

Recent Posts