Pandas join/merge/concat two dataframes
I am having issues with joins in pandas and I am trying to figure out what is wrong.
Say I have a dataframe
x:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1941 entries, 2004-10-19 00:00:00 to 2012-07-23 00:00:00
Data columns:
close 1941 non-null values
high 1941 non-null values
low 1941 non-null values
open 1941 non-null values
dtypes: float64(4)
should I be able to join it with y on index with a simple join command where y = x except colnames have +2.
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1941 entries, 2004-10-19 00:00:00 to 2012-07-23 00:00:00
Data columns:
close2 1941 non-null values
high2 1941 non-null values
low2 1941 non-null values
open2 1941 non-null values
dtypes: float64(4)
y.join(x) or pandas.DataFrame.join(y,x):
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 34879 entries, 2004-12-16 00:00:00 to 2012-07-12 00:00:00
Data columns:
close2 34879 non-null values
high2 34879 non-null values
low2 34879 non-null values
open2 34879 non-null values
close 34879 non-null values
high 34879 non-null values
low 34879 non-null values
open 34879 non-null values
dtypes: float64(8)
I expect the final to have 1941 non-values for both. I tried merge as well but I have the same issue.
I had thought the right answer was pandas.concat([x,y]), but this does not do what I intend either.
In [83]: pandas.concat([x,y])
Out[83]: <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3882 entries, 2004-10-19 00:00:00 to 2012-07-23 00:00:00
Data columns:
close2 3882 non-null values
high2 3882 non-null values
low2 3882 non-null values
open2 3882 non-null values
dtypes: float64(4)
edit: If you are having issues with join, read Wes's answer below. I had one time stamp that was duplicated.
Solution 1:
Does your index have duplicates x.index.is_unique
? If so would explain the behavior you're seeing:
In [16]: left
Out[16]:
a
2000-01-01 1
2000-01-01 1
2000-01-01 1
2000-01-02 2
2000-01-02 2
2000-01-02 2
In [17]: right
Out[17]:
b
2000-01-01 3
2000-01-01 3
2000-01-01 3
2000-01-02 4
2000-01-02 4
2000-01-02 4
In [18]: left.join(right)
Out[18]:
a b
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-01 1 3
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
2000-01-02 2 4
Solution 2:
It sounds like maybe you want pandas.concat
? merge
and join
do, well, joins, which means they will give you something based around the Cartesian product of the two inputs, but it sounds like you just want to paste them together into one big table.
Edit: did you try concat with axis=1
? It seems to do what you're asking for:
>>> print x
A B C
0 0.155614 -0.252148 0.861163
1 0.973517 1.156465 -0.458846
2 2.504356 -0.356371 -0.737842
3 0.012994 1.785123 0.161667
4 0.574578 0.123689 0.017598
>>> print y
A2 B2 C2
0 -0.280993 1.278750 -0.704449
1 0.140282 1.955322 -0.953826
2 0.581997 -0.239829 2.227069
3 -0.876146 -1.955199 -0.155030
4 -0.518593 -2.630978 0.333264
>>> print pandas.concat([x, y], axis=1)
A B C A2 B2 C2
0 0.155614 -0.252148 0.861163 -0.280993 1.278750 -0.704449
1 0.973517 1.156465 -0.458846 0.140282 1.955322 -0.953826
2 2.504356 -0.356371 -0.737842 0.581997 -0.239829 2.227069
3 0.012994 1.785123 0.161667 -0.876146 -1.955199 -0.155030
4 0.574578 0.123689 0.017598 -0.518593 -2.630978 0.333264