Find column whose name contains a specific string

I have a dataframe, and I want to find the column whose name contains a certain string but does not exactly match it. I'm searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always contiguous).

I want the column name to be returned as a string or a variable, so that I can access the column later with df['name'] or df[name] as normal. I've tried to find ways to do this, to no avail. Any tips?

Solution 1:

Just iterate over DataFrame.columns. This example produces a list of the column names that match:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)

Output:

['spike-2', 'hey spke', 'spiked-in', 'no']
['spike-2', 'spiked-in']

Explanation:

df.columns returns an Index of the column names, which you can iterate over like a list.
[col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds col to the resulting list if it contains 'spike'. This syntax is called a list comprehension.
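
Since the question asks for a single column name as a string, here is a minimal sketch that takes the first match from the same kind of iteration. The helper name first_match is hypothetical (it is not part of pandas), and returning None when nothing matches is an assumption about what the caller wants.

```python
import pandas as pd

data = {'spike-2': [1, 2, 3], 'hey spke': [4, 5, 6], 'spiked-in': [7, 8, 9], 'no': [10, 11, 12]}
df = pd.DataFrame(data)


def first_match(frame, needle):
    """Hypothetical helper: return the first column name containing needle, or None."""
    # str(col) guards against non-string column labels such as integers
    return next((col for col in frame.columns if needle in str(col)), None)


name = first_match(df, 'spike')
print(name)               # spike-2
print(df[name].tolist())  # [1, 2, 3]
```

If several columns can match, the list comprehension above is the safer choice, because taking only the first match silently ignores the rest.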

If you only want a DataFrame restricted to the matching columns, you can do this:

df2 = df.filter(regex='spike')
print(df2)

Output:

   spike-2  spiked-in
0        1          7
1        2          8
2        3          9

Solution 2:

This answer uses the DataFrame.filter method to do this without list comprehension:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)

print(df.filter(like='spike').columns)

This outputs an Index containing only 'spike-2'. You can also use a regular expression:

print(df.filter(regex='spike|spke').columns)

This outputs an Index containing both columns: 'spike-2' and 'hey spke'.
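
Because the regex argument is treated as a regular expression, characters such as '.' have special meaning. The sketch below, using made-up column names chosen only for illustration, shows anchoring a pattern to the start of the name and escaping a literal search string with re.escape from the standard library.

```python
import re

import pandas as pd

# Column names here are invented for illustration
df = pd.DataFrame({'spike-2': [1], 'hey spike': [2], 'sp.ke': [3], 'spoke': [4]})

# '^' anchors the match to the start of the column name
starts_with = list(df.filter(regex=r'^spike').columns)

# Unescaped, '.' matches any character, so every column here matches
unescaped = list(df.filter(regex='sp.ke').columns)

# re.escape makes the search string match literally
literal = list(df.filter(regex=re.escape('sp.ke')).columns)

print(starts_with)  # ['spike-2']
print(unescaped)    # ['spike-2', 'hey spike', 'sp.ke', 'spoke']
print(literal)      # ['sp.ke']
```

For a plain substring with no special characters, like='spike' avoids the question of escaping altogether.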

Solution 3:

You can also use df.columns[df.columns.str.contains(pat='spike')]:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

# Boolean mask over the column names, used to index df.columns
colNames = df.columns[df.columns.str.contains(pat='spike')]

print(colNames)

This outputs an Index containing the matching column names: 'spike-2' and 'spiked-in'.

See the pandas.Series.str.contains documentation for the other options.
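
As a sketch of two of those options, case=False ignores letter case and regex=False treats the pattern as a literal substring. The column names below are invented for illustration.

```python
import pandas as pd

# Column names here are invented for illustration
df = pd.DataFrame({'Spike-2': [1], 'hey spike': [2], 'no': [3]})

# case=False ignores case; regex=False matches the pattern literally
mask = df.columns.str.contains('spike', case=False, regex=False)
names = df.columns[mask].tolist()

print(names)  # ['Spike-2', 'hey spike']
```

Calling .tolist() gives a plain Python list of strings, so a single name can be picked out with names[0] and used as df[names[0]].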

Solution 4:

# select columns containing 'spike'
df.filter(like='spike', axis=1)

You can also select by exact name (items=) or by regular expression (regex=). See the pandas.DataFrame.filter documentation.

Solution 5:

# select the matching columns with a boolean mask over the column names
df.loc[:, df.columns.str.contains("spike")]
