pandas bar plot combined with line plot shows the time axis beginning at 1970
I am trying to draw a stock market graph with two series: closing price over time and volume over time.
Somehow the x-axis shows the dates in 1970.
Below are the graph and the code:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])
pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d')
pd_data.set_index('DOB')
print(pd_data)
print(pd_data.dtypes)
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
#ax.pd_data['volume'].plot(secondary_y=True, kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volume')
# Choose your xtick format string
date_fmt = '%d-%m-%y'
date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)
# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
I also tried plotting the two graphs independently, without ax=ax:
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')
Then the price graph shows the years properly, whereas the volume graph shows 1970.
And if I swap them:
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
Now the volume graph shows the years properly, whereas the price graph shows the years as 1970.
I tried removing secondary_y and also changing bar to line, but no luck.
Somehow, after the first graph is drawn, pandas seems to change the year shown on the axis.
Solution 1:
I could not find the reason for 1970, so instead I plotted with matplotlib.pyplot directly rather than going through pandas, and passed an array of datetime objects instead of the pandas column.
The following code worked:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
import numpy as np
pd_data = pd.read_csv("/home/stockdata.csv",sep='\t')
pd_data['DOB'] = pd.to_datetime(pd_data['datetime2']).dt.strftime('%Y-%m-%d')
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in pd_data['DOB']]
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=2))
plt.bar(dates,pd_data['close'],align='center')
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
plt.gcf().autofmt_xdate()
plt.show()
I created a dates array of datetime objects. If I make the graph using that array, the dates are no longer shown as 1970.
The input data (tab-separated) is:
open high low close volume datetime datetime2
35.12 35.68 34.79 35.58 1432995 1244385200000 2012-6-15 10:30:00
35.69 36.02 35.37 35.78 1754319 1244371600000 2012-6-16 10:30:00
35.69 36.23 35.59 36.23 3685845 1245330800000 2012-6-19 10:30:00
36.11 36.52 36.03 36.32 2635777 1245317200000 2012-6-20 10:30:00
36.54 36.6 35.8 35.9 2886412 1245303600000 2012-6-21 10:30:00
36.03 36.95 36.0 36.09 3696278 1245390000000 2012-6-22 10:30:00
36.5 37.27 36.18 37.11 2732645 1245376400000 2012-6-23 10:30:00
36.98 37.11 36.686 36.83 1948411 1245335600000 2012-6-26 10:30:00
36.67 37.06 36.465 37.05 2557172 1245322000000 2012-6-27 10:30:00
37.06 37.61 36.77 37.52 1780126 1246308400000 2012-6-28 10:30:00
37.47 37.77 37.28 37.7 1352267 1246394800000 2012-6-29 10:30:00
37.72 38.1 37.68 37.76 2194619 1246381200000 2012-6-30 10:30:00
The plot I get is:
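Building on the approach above, the price line and the volume bars from the question can be drawn on one shared date axis by calling matplotlib directly with real datetime values. A likely explanation for the 1970 labels (not confirmed in the original post) is that a pandas bar plot places its bars at integer positions 0, 1, 2, ..., and a DateFormatter then reads those integers as days since matplotlib's default epoch of 1970-01-01. The following is only a minimal sketch under assumptions: it uses three rows copied from the sample data, the names ax_price and ax_volume are illustrative, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Three rows copied from the sample data (close, volume, datetime2 columns).
pd_data = pd.DataFrame({
    "close": [35.58, 35.78, 36.23],
    "volume": [1432995, 1754319, 3685845],
    "datetime2": ["2012-6-15 10:30:00", "2012-6-16 10:30:00", "2012-6-19 10:30:00"],
})

# Keep real datetime objects instead of formatting the dates to strings.
dates = [ts.to_pydatetime() for ts in pd.to_datetime(pd_data["datetime2"])]

fig, ax_price = plt.subplots()
ax_volume = ax_price.twinx()  # second y-axis that shares the same date x-axis

# Both artists receive the same datetime x values, so neither falls back
# to integer bar positions.
ax_volume.bar(dates, pd_data["volume"], width=0.6, alpha=0.3, color="gray")
ax_price.plot(dates, pd_data["close"], color="tab:blue")

ax_price.set_ylabel("price")
ax_volume.set_ylabel("volume")

# Format the shared x-axis as dates and rotate the tick labels.
ax_price.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y"))
fig.autofmt_xdate()
fig.canvas.draw()  # render once; call plt.show() instead in an interactive session
```

Because the bars and the line are given the same datetime values, the x-axis limits stay in 2012 for both. The bar width is expressed in days, so 0.6 leaves a gap between consecutive daily bars.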