Python - Dimension of Data Frame
df.shape
, where df
is your DataFrame.
Summary of all ways to get info on dimensions of DataFrame or Series
There are a number of ways to get information on the attributes of your DataFrame or Series.
Create Sample DataFrame and Series
df = pd.DataFrame({'a':[5, 2, np.nan], 'b':[ 9, 2, 4]})
df
a b
0 5.0 9
1 2.0 2
2 NaN 4
s = df['a']
s
0 5.0
1 2.0
2 NaN
Name: a, dtype: float64
shape
Attribute
The shape
attribute returns a two-item tuple of the number of rows and the number of columns in the DataFrame. For a Series, it returns a one-item tuple.
df.shape
(3, 2)
s.shape
(3,)
len
function
To get the number of rows of a DataFrame or get the length of a Series, use the len
function. An integer will be returned.
len(df)
3
len(s)
3
size
attribute
To get the total number of elements in the DataFrame or Series, use the size
attribute. For DataFrames, this is the product of the number of rows and the number of columns. For a Series, this will be equivalent to the len
function:
df.size
6
s.size
3
ndim
attribute
The ndim
attribute returns the number of dimensions of your DataFrame or Series. It will always be 2 for DataFrames and 1 for Series:
df.ndim
2
s.ndim
1
The tricky count
method
The count
method can be used to return the number of non-missing values for each column/row of the DataFrame. This can be very confusing, because most people normally think of count as just the length of each row, which it is not. When called on a DataFrame, a Series is returned with the column names in the index and the number of non-missing values as the values.
df.count() # by default, get the count of each column
a 2
b 3
dtype: int64
df.count(axis='columns') # change direction to get count of each row
0 2
1 2
2 1
dtype: int64
For a Series, there is only one axis for computation and so it just returns a scalar:
s.count()
2
Use the info
method for retrieving metadata
The info
method returns the number of non-missing values and data types of each column
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a 2 non-null float64
b 3 non-null int64
dtypes: float64(1), int64(1)
memory usage: 128.0 bytes