Removing Rows from Pandas DataFrame based on Multiple Column Values

I am trying to remove rows from a large data frame based on whether each row has certain values in either of two different columns.

I will have a Series called "finalists". It will be a series of names imported from a different part of the code, and it will change each time the code is run.

Example:

finalists = ["Company A", "Company F", "Product S"... etc]

The dataframe will be about 1,000 rows long and 200 columns wide.

Simplifying it, the dataframe would look something like this:

category	score	description	company_name	product_name	comments
"----"	2.8	"----"	Company A	Product A	"----"
"----"	1.2	"----"	Company B	Product B	"----"
"----"	2.4	"----"	Company C	Product C	"----"

I need to keep the rows where either the company_name column or product_name column is one of the values in the Finalists Series (or remove rows where it isn't).

I tried doing something like this:

results = finalists.isin(app_data["company_name"]) or finalists.isin(app_data["product_name"])

but got a ValueError saying that the truth value of a Series is ambiguous.

Solution 1:

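You want something like this, calling isin on the dataframe columns (rather than on finalists) and combining the two results with the element-wise | operator instead of or: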

mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)

filtered_app_data = app_data[mask]
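
As a minimal, self-contained sketch of the approach above: the sample rows and the contents of finalists below are made up for illustration (modeled on the table in the question), and the names kept, removed, cols and mask_any are hypothetical. It also shows the inverse selection with ~ and an equivalent form that checks both columns at once. Note that DataFrame.isin aligns on the index when given a Series, so the values are passed as a plain list there.

```python
import pandas as pd

# Hypothetical sample data modeled on the table in the question
app_data = pd.DataFrame({
    "score": [2.8, 1.2, 2.4],
    "company_name": ["Company A", "Company B", "Company C"],
    "product_name": ["Product A", "Product B", "Product C"],
})

# Hypothetical finalists; in the real code this comes from elsewhere
finalists = pd.Series(["Company A", "Company F", "Product C"])

# Element-wise OR of two boolean Series (Python's `or` would raise here)
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)

kept = app_data[mask]      # rows where either column is a finalist
removed = app_data[~mask]  # rows where neither column is a finalist

# Equivalent mask: test both columns at once, then reduce across each row.
# A list is passed because DataFrame.isin matches a Series by index.
cols = ["company_name", "product_name"]
mask_any = app_data[cols].isin(finalists.tolist()).any(axis=1)

print(kept)
```

With this sample data, the rows for Company A (matched by company name) and Company C (matched by product name) are kept, and the Company B row is dropped.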
