how to read certain columns from Excel using Pandas - Python
Solution 1:
You can use column indices (letters) like this:
import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols="A,C:AA")
print(df)
Corresponding documentation:
usecols : int, str, list-like, or callable default None
If None, then parse all columns.
If str, then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.
If list of int, then indicates list of column numbers to be parsed.
If list of string, then indicates list of column names to be parsed.
New in version 0.24.0.
If callable, then evaluate each column name against it and parse the column if the callable returns True.
Returns a subset of the columns according to behavior above.
New in version 0.24.0.
Solution 2:
parse_cols
is deprecated, use usecols
instead
that is:
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")
Solution 3:
"usecols" should help, use range of columns (as per excel worksheet, A,B...etc.) below are the examples
1. Selected Columns
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A,C,F")
2. Range of Columns and selected column
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H")
3. Multiple Ranges
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H,J:N")
4. Range of columns
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:N")