Create Empty Dataframe in Pandas specifying column types

Solution 1:

You can use the following:

import pandas as pd

df = pd.DataFrame({'a': pd.Series(dtype='int'),
                   'b': pd.Series(dtype='str'),
                   'c': pd.Series(dtype='float')})

or more abstractly:

df = pd.DataFrame({c: pd.Series(dtype=t) for c, t in {'a': 'int', 'b': 'str', 'c': 'float'}.items()})

Displaying df then gives:

>>> df 
Empty DataFrame 
Columns: [a, b, c]
Index: []

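and if you check its dtypes (the width chosen for 'int' depends on the platform and NumPy version, so you may see int64 instead of int32):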

>>> df.dtypes
a      int32
b     object
c    float64
dtype: object
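Since the width of 'int' can vary, a minimal sketch that names explicit dtypes might look like the following. The schema dict and its column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical schema: column name -> explicit dtype string.
schema = {"a": "int64", "b": "object", "c": "float64"}

# One empty, typed Series per column, as in the dict comprehension above.
df = pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in schema.items()})

print(df.dtypes)
print(len(df))
```

Spelling out int64 and float64 should make the resulting dtypes the same regardless of where the code runs.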

Solution 2:

One way to do it is to build an empty NumPy structured array and pass it to the DataFrame constructor:

import numpy
import pandas

dtypes = numpy.dtype(
    [
        ("a", str),
        ("b", int),
        ("c", float),
        ("d", "datetime64[ns]"),  # explicit unit instead of a generic datetime64
    ]
)
df = pandas.DataFrame(numpy.empty(0, dtype=dtypes))
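To see what this approach produces, here is a small sketch that builds a zero-length structured array and inspects the resulting dtypes. The field list is a hypothetical schema, and using object for the text column is my assumption rather than something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical schema as (column name, NumPy dtype) pairs.
fields = [
    ("a", object),            # text column stored as Python objects
    ("b", "int64"),
    ("c", "float64"),
    ("d", "datetime64[ns]"),  # explicit nanosecond unit
]

# A zero-length structured array carries the schema but no rows.
records = np.empty(0, dtype=np.dtype(fields))
df = pd.DataFrame(records)

print(df.dtypes)
```

The column names and order come from the structured dtype's field names, so the schema lives in one place.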

Solution 3:

This is an old question, but I don't see a solid answer (although @eric_g was super close).

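You just need to create an empty dataframe from a dictionary of key: value pairs, where each key is a column name and each value is an empty (default) instance of the desired type, such as '', int(), or float(). Passing index=[] gives the frame zero rows, so the values only determine the column dtypes.

So for your example dataset, it would look as follows (pandas 0.25 and Python 3.7):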

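import pandas as pd

# Column name -> default value of the desired type.
variables = {'contract': '',
             'state_and_county_code': '',
             'state': '',
             'county': '',
             'starting_membership': int(),
             'starting_raw_raf': float(),
             'enrollment_trend': float(),
             'projected_membership': int(),
             'projected_raf': float()}

# An empty index yields zero rows while keeping the inferred dtypes.
df = pd.DataFrame(variables, index=[])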

In old pandas versions, one may instead have to pass only the column names, which leaves every column with the object dtype:

df = pd.DataFrame(columns=list(variables))
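As a quick check of the scalar-defaults idea, the sketch below uses a shortened, hypothetical set of columns and prints the dtypes pandas infers from each default value.

```python
import pandas as pd

# Hypothetical, shortened schema: each default value sets the column dtype.
defaults = {"state": "", "members": int(), "raf": float()}

# index=[] broadcasts every scalar to length zero.
df = pd.DataFrame(defaults, index=[])

print(df.dtypes)
print(df.shape)
```

The integer and float defaults should produce integer and floating-point columns; how the string column is reported may differ between pandas versions.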

Solution 4:

This really smells like a bug.

Here's another (simpler) solution.

import pandas as pd
import numpy as np

def df_empty(columns, dtypes, index=None):
    """Build an empty DataFrame whose columns have the given dtypes."""
    assert len(columns) == len(dtypes)
    df = pd.DataFrame(index=index)
    # Add one empty, typed Series per column.
    for c, d in zip(columns, dtypes):
        df[c] = pd.Series(dtype=d)
    return df

df = df_empty(['a', 'b'], dtypes=[np.int64, np.int64])
print(list(df.dtypes))  # [dtype('int64'), dtype('int64')]
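A different technique, not the one used in this answer, is to create untyped columns first and cast them with astype. The sketch below assumes a hypothetical helper name and schema.

```python
import pandas as pd


def empty_frame_astype(schema):
    """Hypothetical helper: schema maps column name -> dtype."""
    # Columns start out untyped (object); astype then casts each one.
    return pd.DataFrame(columns=list(schema)).astype(schema)


df = empty_frame_astype({"a": "int64", "b": "float64"})
print(df.dtypes)
```

This keeps names and dtypes together in one mapping, at the cost of an extra cast step.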