Create Empty Dataframe in Pandas specifying column types
Solution 1:
You can use the following:
import pandas as pd

df = pd.DataFrame({'a': pd.Series(dtype='int'),
                   'b': pd.Series(dtype='str'),
                   'c': pd.Series(dtype='float')})
Or, more generally, build the columns from a mapping of names to dtypes:
df = pd.DataFrame({c: pd.Series(dtype=t) for c, t in {'a': 'int', 'b': 'str', 'c': 'float'}.items()})
Then, if you display df, you get:
>>> df
Empty DataFrame
Columns: [a, b, c]
Index: []
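And if you check its dtypes:
>>> df.dtypes
a      int32
b     object
c    float64
dtype: object
Note that the width of 'int' depends on the platform and NumPy version; int32 appears here, but many systems report int64.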
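As a variation on the same idea, the columns can be declared first and then cast with astype. This is only a minimal sketch, not part of the answer above: the `schema` name is illustrative, and explicit widths such as 'int64' are used to sidestep the platform-dependent 'int'.

```python
import pandas as pd

# Hypothetical schema for illustration: column name -> dtype string
schema = {"a": "int64", "b": "object", "c": "float64"}

# Start with untyped empty columns, then cast each one to its target dtype
df = pd.DataFrame(columns=list(schema)).astype(schema)

print(df.dtypes)
```

Keeping the schema in a single dictionary makes it easy to reuse for validating data loaded later.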
Solution 2:
One way to do it:
import numpy
import pandas
dtypes = numpy.dtype(
    [
        ("a", str),
        ("b", int),
        ("c", float),
        ("d", numpy.datetime64),
    ]
)
df = pandas.DataFrame(numpy.empty(0, dtype=dtypes))
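To confirm what the structured-array approach produces, it may help to inspect the resulting dtypes. The sketch below is a reduced variant with assumptions of its own: it uses explicit dtype strings, gives the datetime column an explicit "datetime64[ns]" unit, and leaves out the string column, whose resulting dtype can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Structured dtype: one (name, type) pair per column.
# The explicit "datetime64[ns]" unit is an assumption made for this sketch.
dtypes = np.dtype([("b", "int64"), ("c", "float64"), ("d", "datetime64[ns]")])

# A zero-length structured array becomes an empty DataFrame with typed columns
df = pd.DataFrame(np.empty(0, dtype=dtypes))

print(df.dtypes)
```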
Solution 3:
This is an old question, but I don't see a solid answer (although @eric_g was super close).
You just need to create an empty DataFrame from a dictionary of key:value pairs, where each key is a column name and each value is an empty instance of the desired type (for example '', int(), or float()).
So for your example dataset, it would look as follows (pandas 0.25 and Python 3.7):
import pandas as pd

variables = {'contract': '',
             'state_and_county_code': '',
             'state': '',
             'county': '',
             'starting_membership': int(),
             'starting_raw_raf': float(),
             'enrollment_trend': float(),
             'projected_membership': int(),
             'projected_raf': float()}
df = pd.DataFrame(variables, index=[])
In old pandas versions, one may have to do:
df = pd.DataFrame(columns=[variables])
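To check that the dictionary-of-defaults approach yields the intended column types, a shortened sketch such as the following can be run. It uses only three of the columns for brevity; passing index=[] is what lets pandas accept scalar values, since an all-scalar dictionary without an index raises a ValueError.

```python
import pandas as pd

# Each value is a default ("empty") instance of the desired type
variables = {
    "state": "",
    "starting_membership": int(),
    "starting_raw_raf": float(),
}

# index=[] broadcasts the scalar values over zero rows
df = pd.DataFrame(variables, index=[])

print(df.dtypes)
```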
Solution 4:
This really smells like a bug.
Here's another (simpler) solution.
import pandas as pd
import numpy as np

def df_empty(columns, dtypes, index=None):
    assert len(columns) == len(dtypes)
    df = pd.DataFrame(index=index)
    for c, d in zip(columns, dtypes):
        df[c] = pd.Series(dtype=d)
    return df

df = df_empty(['a', 'b'], dtypes=[np.int64, np.int64])
print(list(df.dtypes))  # int64, int64
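One caveat worth checking when the optional index argument is used: if the index is non-empty, each empty Series is aligned to it and the missing entries are filled with NaN, so integer columns will likely be upcast to float64. The sketch below repeats the helper so that it runs on its own; the `no_rows` and `two_rows` names are purely illustrative.

```python
import numpy as np
import pandas as pd

def df_empty(columns, dtypes, index=None):
    # Same helper as above, repeated so this sketch is self-contained
    assert len(columns) == len(dtypes)
    df = pd.DataFrame(index=index)
    for c, d in zip(columns, dtypes):
        df[c] = pd.Series(dtype=d)
    return df

# No index: the column keeps the requested integer dtype
no_rows = df_empty(["a"], [np.int64])

# Non-empty index: the empty Series is aligned to it and padded with NaN
two_rows = df_empty(["a"], [np.int64], index=[0, 1])

print(no_rows.dtypes["a"])
print(two_rows.dtypes["a"])
```

If integer columns must survive a pre-populated index, a nullable dtype such as "Int64" may be a better fit, though that goes beyond what the answer above covers.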