Is there a way to copy only the structure (not the data) of a Pandas DataFrame?
I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the original data frame was created as
import pandas as pd
df1 = pd.DataFrame([[11,12],[21,22]], columns=['c1','c2'], index=['i1','i2'])
I copied the structure by explicitly defining the columns and names:
df2 = pd.DataFrame(columns=df1.columns, index=df1.index)
I don't want to copy the data, otherwise I could just write df2 = df1.copy()
. In other words, after df2 being created it must contain only NaN elements:
In [1]: df1
Out[1]:
c1 c2
i1 11 12
i2 21 22
In [2]: df2
Out[2]:
c1 c2
i1 NaN NaN
i2 NaN NaN
Is there a more idiomatic way of doing it?
That's a job for reindex_like
. Start with the original:
df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])
Construct an empty DataFrame and reindex it like df1:
pd.DataFrame().reindex_like(df1)
Out:
c1 c2
i1 NaN NaN
i2 NaN NaN
In version 0.18 of pandas, the DataFrame constructor has no options for creating a dataframe like another dataframe with NaN instead of the values.
The code you use df2 = pd.DataFrame(columns=df1.columns, index=df1.index)
is the most logical way, the only way to improve on it is to spell out even more what you are doing is to add data=None
, so that other coders directly see that you intentionally leave out the data from this new DataFrame you are creating.
TLDR: So my suggestion is:
Explicit is better than implicit
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
Very much like yours, but more spelled out.
Let's start with some sample data
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
...: columns=['num', 'char'])
In [3]: df
Out[3]:
num char
0 1 a
1 2 b
2 3 c
In [4]: df.dtypes
Out[4]:
num int64
char object
dtype: object
Now let's use a simple DataFrame
initialization using the columns of the original DataFrame
but providing no data:
In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)
In [6]: empty_copy_1
Out[6]:
Empty DataFrame
Columns: [num, char]
Index: []
In [7]: empty_copy_1.dtypes
Out[7]:
num object
char object
dtype: object
As you can see, the column data types are not the same as in our original DataFrame
.
So, if you want to preserve the column dtype
...
If you want to preserve the column data types you need to construct the DataFrame
one Series
at a time
In [8]: empty_copy_2 = pd.DataFrame.from_items([
...: (name, pd.Series(data=None, dtype=series.dtype))
...: for name, series in df.iteritems()])
In [9]: empty_copy_2
Out[9]:
Empty DataFrame
Columns: [num, char]
Index: []
In [10]: empty_copy_2.dtypes
Out[10]:
num int64
char object
dtype: object