Removing Rows from Pandas DataFrame based on Multiple Column Values
I am trying to remove rows from a large data frame based on whether each row has certain values in either of two different columns.
I will have a Series called "finalists". It will be a series of names imported from a different part of the code, and it will change each time the code is run.
Example:
finalists = ["Company A", "Company F", "Product S"... etc]
The dataframe will be about 1,000 rows long and 200 columns wide.
Simplifying it, the dataframe would look something like this:
category | score | description | company_name | product_name | comments |
---|---|---|---|---|---|
"----" | 2.8 | "----" | Company A | Product A | "----" |
"----" | 1.2 | "----" | Company B | Product B | "----" |
"----" | 2.4 | "----" | Company C | Product C | "----" |
I need to keep the rows where either the company_name column or the product_name column contains one of the values in the finalists Series (or, equivalently, remove the rows where neither does).
I tried doing something like this:
results = finalists.isin(app_data["company_name"]) or finalists.isin(app_data["product_name"])
but got an error saying that the truth value of a Series is ambiguous.
Solution 1:
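Call isin on the DataFrame columns rather than on finalists, so that you get one boolean per row, and combine the two masks with the element-wise | operator instead of Python's or (or tries to reduce a whole Series to a single True/False, which is what raises the "ambiguous" error). You want something like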
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)
filtered_app_data = app_data[mask]
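To see the mask in action, here is a minimal, self-contained sketch. The sample rows follow the simplified table in the question, and the two finalists values are made up for illustration; the real Series would come from elsewhere in your code.

```python
import pandas as pd

# Hypothetical sample data mirroring the simplified table in the question.
app_data = pd.DataFrame(
    {
        "score": [2.8, 1.2, 2.4],
        "company_name": ["Company A", "Company B", "Company C"],
        "product_name": ["Product A", "Product B", "Product C"],
    }
)

# Hypothetical finalists (assumed values, not from the original post).
finalists = pd.Series(["Company A", "Product C"])

# Each isin() call returns one boolean per row of app_data;
# | combines the two masks element-wise.
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)

filtered_app_data = app_data[mask]   # rows matching in either column
removed_app_data = app_data[~mask]   # ~ inverts the mask: rows matching in neither
```

With this sample data, the first row is kept because of its company name and the third because of its product name. If you ever need both columns to match, swap | for &.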