pandas bar plot combined with line plot shows the time axis beginning at 1970

I am trying to draw a stock market graph with two series: closing price over time and volume over time.

Somehow the x-axis shows dates in 1970 instead of the actual dates.

Below are the graph and the code.

[image: the resulting plot]

The code is:

import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.dates as mdates


pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])

pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d') 

pd_data.set_index('DOB')

print(pd_data)

print(pd_data.dtypes)

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

#ax.pd_data['volume'].plot(secondary_y=True,  kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volume')


# Choose your xtick format string
date_fmt = '%d-%m-%y'

date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)

# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))

# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()

plt.show()

I also tried drawing the two graphs independently, without ax=ax:

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')

Then the price graph shows the years properly, whereas the volume graph shows 1970.

And if I swap them:

ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

Now the volume graph shows years properly whereas the price graph shows the years as 1970

I tried removing secondary_y and also changing bar to line, but no luck.

Somehow, after the first graph is drawn, pandas seems to change the year.


Solution 1:

I could not find the reason for the 1970 dates. Instead, I plotted with matplotlib.pyplot directly rather than going through pandas, and passed an array of datetime objects instead of the pandas column.

So the following code worked:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
import numpy as np

pd_data = pd.read_csv("/home/stockdata.csv",sep='\t')

pd_data['DOB'] = pd.to_datetime(pd_data['datetime2']).dt.strftime('%Y-%m-%d')

dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in pd_data['DOB']]

plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=2))
plt.bar(dates,pd_data['close'],align='center')
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
plt.gcf().autofmt_xdate()
plt.show()

I created a dates array of datetime objects. If I plot using that array, the dates are no longer shown as 1970.
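A possible explanation, offered as an assumption rather than something confirmed in this answer: pandas draws bar plots at integer positions 0, 1, 2, ... and only uses the dates as tick labels, so a matplotlib date formatter applied to that axis reads the positions as days counted from the epoch. The sketch below checks this on a hypothetical three-row frame (the frame and variable names are made up for illustration), assuming a recent matplotlib whose default date epoch is 1970-01-01.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.dates as mdates
import pandas as pd

# Hypothetical three-row frame standing in for the real stock data.
df = pd.DataFrame(
    {"volume": [1432995, 1754319, 3685845]},
    index=pd.to_datetime(["2012-06-15", "2012-06-16", "2012-06-19"]),
)

ax = df.plot(y="volume", kind="bar")

# Tick locations that pandas set for the bar plot.
tick_positions = list(ax.get_xticks())

# What a date formatter would make of those locations.
as_dates = [mdates.num2date(x).date() for x in tick_positions]

# The same index expressed as real matplotlib date numbers, for comparison.
real_numbers = mdates.date2num(df.index.to_pydatetime())

print("bar tick positions:", tick_positions)
print("read as dates:     ", as_dates)
print("real date numbers: ", real_numbers)
```

If the tick positions print as small integers while the real date numbers are in the thousands, that would be consistent with the 1970 labels, and with the fix of handing matplotlib actual datetime objects.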

open    high    low close   volume  datetime    datetime2
35.12   35.68   34.79   35.58   1432995 1244385200000   2012-6-15 10:30:00
35.69   36.02   35.37   35.78   1754319 1244371600000   2012-6-16 10:30:00
35.69   36.23   35.59   36.23   3685845 1245330800000   2012-6-19 10:30:00
36.11   36.52   36.03   36.32   2635777 1245317200000   2012-6-20 10:30:00
36.54   36.6    35.8    35.9    2886412 1245303600000   2012-6-21 10:30:00
36.03   36.95   36.0    36.09   3696278 1245390000000   2012-6-22 10:30:00
36.5    37.27   36.18   37.11   2732645 1245376400000   2012-6-23 10:30:00
36.98   37.11   36.686  36.83   1948411 1245335600000   2012-6-26 10:30:00
36.67   37.06   36.465  37.05   2557172 1245322000000   2012-6-27 10:30:00
37.06   37.61   36.77   37.52   1780126 1246308400000   2012-6-28 10:30:00
37.47   37.77   37.28   37.7    1352267 1246394800000   2012-6-29 10:30:00
37.72   38.1    37.68   37.76   2194619 1246381200000   2012-6-30 10:30:00

The plot I get is:

[image: the resulting plot]
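For the original goal of price and volume on one chart, the same idea can probably be extended with a second y-axis. This is only a sketch under assumptions: it uses the first three sample rows above, parses datetime2 with datetime.strptime, and the names ax_price and ax_volume are mine, not from the code that was actually run.

```python
import datetime as dt
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the sample data, tab-separated as in the file.
sample = (
    "open\thigh\tlow\tclose\tvolume\tdatetime\tdatetime2\n"
    "35.12\t35.68\t34.79\t35.58\t1432995\t1244385200000\t2012-6-15 10:30:00\n"
    "35.69\t36.02\t35.37\t35.78\t1754319\t1244371600000\t2012-6-16 10:30:00\n"
    "35.69\t36.23\t35.59\t36.23\t3685845\t1245330800000\t2012-6-19 10:30:00\n"
)
pd_data = pd.read_csv(io.StringIO(sample), sep="\t")

# Real date objects, so matplotlib places every point on a true date axis.
dates = [
    dt.datetime.strptime(s, "%Y-%m-%d %H:%M:%S").date()
    for s in pd_data["datetime2"]
]

fig, ax_price = plt.subplots()
ax_price.plot(dates, pd_data["close"], marker="o")
ax_price.set_ylabel("price")

# Second y-axis sharing the same date x-axis, used for the volume bars.
ax_volume = ax_price.twinx()
ax_volume.bar(dates, pd_data["volume"], width=0.5, alpha=0.3, align="center")
ax_volume.set_ylabel("volume")

ax_price.xaxis.set_major_locator(mdates.DayLocator())
ax_price.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y"))
fig.autofmt_xdate()

fig.canvas.draw()  # render; call plt.show() or fig.savefig(...) to view it
```

Because both series are drawn by matplotlib from the same list of dates, neither one falls back to integer positions, so the formatter should see 2012 dates on the shared axis.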