Removing Rows from Pandas DataFrame based on Multiple Column Values

I am trying to remove rows from a large data frame based on whether each row has certain values in either of two different columns.

I will have a Series called "finalists". It will be a Series of names imported from a different part of the code, and it will change each time the code is run.

ex)

finalists = ["Company A", "Company F", "Product S"... etc]

The dataframe will be about 1,000 rows long and 200 columns wide.

Simplifying it, the dataframe would look something like this:

category  score  description  company_name  product_name  comments
"----"    2.8    "----"       Company A     Product A     "----"
"----"    1.2    "----"       Company B     Product B     "----"
"----"    2.4    "----"       Company C     Product C     "----"

I need to keep the rows where either the company_name column or product_name column is one of the values in the Finalists Series (or remove rows where it isn't).

I tried doing something like this:

results = finalists.isin(app_data["company_name"]) or finalists.isin(app_data["product_name"])

but I got an error saying that the truth value of a Series is ambiguous.


Solution 1:

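Two things need to change. Call isin on the dataframe columns rather than on finalists, so that you get one True/False per row of app_data. Then combine the two boolean masks with the element-wise | operator instead of or. Python's or tries to reduce each whole Series to a single True/False, which is what raises the ambiguity error. You want something like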

mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)

filtered_app_data = app_data[mask]
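
For anyone who wants to try this end to end, here is a minimal runnable sketch. The sample rows and the finalists values are made up for illustration (only the column names come from the question), and the variable names alt_filtered and removed are my own. It also shows an alternative that checks both columns in one call; note that DataFrame.isin aligns on the index when given a Series, so the sketch converts finalists to a plain list first.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
app_data = pd.DataFrame({
    "score": [2.8, 1.2, 2.4],
    "company_name": ["Company A", "Company B", "Company C"],
    "product_name": ["Product A", "Product B", "Product S"],
})
finalists = pd.Series(["Company A", "Company F", "Product S"])

# One boolean per row: True if either column holds a finalist name.
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)
filtered_app_data = app_data[mask]

# Alternative: test both columns at once and keep rows where any column matches.
# DataFrame.isin matches a Series by index, so pass a list of values instead.
cols = ["company_name", "product_name"]
alt_filtered = app_data[app_data[cols].isin(finalists.tolist()).any(axis=1)]

# The rows that were dropped, if you need them: invert the mask with ~.
removed = app_data[~mask]

print(filtered_app_data)
```

With this sample data the rows for Company A (matched on company_name) and Product S (matched on product_name) are kept, and the Company B row is removed.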