How to plot two columns of a pandas data frame using points
I have a pandas dataframe and would like to plot values from one column versus the values from another column. Fortunately, there is plot
method associated with the data-frames that seems to do what I need:
df.plot(x='col_name_1', y='col_name_2')
Unfortunately, it looks like among the plot styles (listed here after the kind
parameter) there are not points. I can use lines or bars or even density but not points. Is there a work around that can help to solve this problem.
You can specify the style
of the plotted line when calling df.plot
:
df.plot(x='col_name_1', y='col_name_2', style='o')
The style
argument can also be a dict
or list
, e.g.:
import numpy as np
import pandas as pd
d = {'one' : np.random.rand(10),
'two' : np.random.rand(10)}
df = pd.DataFrame(d)
df.plot(style=['o','rx'])
All the accepted style formats are listed in the documentation of matplotlib.pyplot.plot
.
For this (and most plotting) I would not rely on the Pandas wrappers to matplotlib. Instead, just use matplotlib directly:
import matplotlib.pyplot as plt
plt.scatter(df['col_name_1'], df['col_name_2'])
plt.show() # Depending on whether you use IPython or interactive mode, etc.
and remember that you can access a NumPy array of the column's values with df.col_name_1.values
for example.
I ran into trouble using this with Pandas default plotting in the case of a column of Timestamp values with millisecond precision. In trying to convert the objects to datetime64
type, I also discovered a nasty issue: < Pandas gives incorrect result when asking if Timestamp column values have attr astype >.