How can I filter data based on items in a list?
Solution 1:
First of all, if you want to match sentences that contain “Amoy Street”, you’ll need to remove the “, Singapore” from that first entry (and the others that contain it, and likewise with “(Singapore)”). You may want to do this with a formula in a helper column, so let’s assume that your minimized list of road names is in cells B1
through B42
.
And assume that your seven example sentences (or seventeen thousand, or however many you have) are in Column C
. Type
=AND(ISERROR(SEARCH(B$1:B$42, C1)))
into cell D1
and type Ctrl+Shift+Enter.
This enters the formula as an array formula, causing it to appear in curly braces:
{=AND(ISERROR(SEARCH(A$1:A$3, B1)))}
. Drag/fill down to cover Column C
.
This will evaluate to FALSE for every sentence (C
value) that contains a string from B1:B42
and TRUE for every one that doesn’t.
You can filter on that. If you don’t like the fact that these values are “backwards”, just wrap the expression in a NOT( )
:
=NOT(AND(ISERROR(SEARCH(B$1:B$42, C1))))
This is case insensitive.
If you want it to be case sensitive, replace SEARCH
with FIND
.
This solution finds substrings. Making it case sensitive reduces the likelihood
of false positives like “Banson Road” and “Uboat Quay”.
If the road name will never appear at the beginning of a sentence, you can eliminate these substring match false positives by prepending a space to every road name in Column B
.
If a road name can appear at the beginning of a sentence,
then do that and change the formula to
=AND(ISERROR(SEARCH(B$1:B$42,
" " &
C1)))
To handle “Mr. Connaught drives a Mercedes”, do the same thing but append the spaces. This still won’t handle “Does Mr. Connaught drive a Mercedes?”, so you might want to go the case sensitive route.
I have tested this solution for a small number of strings. It is possible that it will fail if the list of roads is very long.