A bug in Excel? Conditional formatting for marking duplicates also highlights unique value
If I let Excel highlight two duplicate values PT_INTERNAL2859736
, then also unique value *736
gets highlighted. Why this happens and how can I stop highlighting the unique value as duplicate?
I thought the Duplicate Values rule is reliable until I found this case.
The problem can be reproduced based on the image.
Now, if you reproduced the behavior, try to delete one of PT_
values. The other will lose the highlighting but the *736
will keep it!
Is this an expected behavior of the Duplicate Values functionality (usefulness of which I am overlooking)? Or is this rather a defect which has to be reported?
Update by iliansky (2021-01-25):
Similar undocumented behavior can also be observed with other symbols such as <
, >
, <=
, >=
when these symbols are placed in the beginning of the text in a cell and there are 2 or more non-empty cells in the column.
See an example here (each column is a separate example).
This might become an issue when processing xml files where the lines normally start with <
.
Solution 1:
It is indeed because *
is treated as a wildcard.
The way around it, is to use a formula to insert a tilde (~
) in order to escape the asterisk (*
) for your conditional formatting:
=COUNTIF(A:A,SUBSTITUTE(A1,"*","~*"))>1
Solution 2:
After further research of the behavior and documentation, I can answer the question from high-level perspective:
This is rather an undocumented behavior than a defect.
Other answers reminded us of use of wildcard characters (*
, ?
, ~
) in Excel formulas. The question unanswered before is if they are expected also in conditional formatting. Microsoft documents the Duplicate Values mode of conditional formatting in the following articles:
- Find and remove duplicates
- Filter for unique values or remove duplicate values
- Filter for or remove duplicate values
- Highlight patterns and trends with conditional formatting
Nowhere in these articles is mentioned that the internal algorithm searching for duplicates still respects wildcard characters (*
, ?
, ~
). On the contrary, support of wildcards is explicitly named in functionalities, where presence of wildcards is obvious and expected:
- standard search box
- functions like SEARCH(), SEARCHB(), COUNTIF(), ...
But back to duplicates: it can be discussed whether sample values PT_INTERNAL2859736
and *736
are duplicates by definition. By common sense, no. In Excel, obviously yes. It is possible that Excel simply uses its standard search algorithm (which honors wildcard characters) also for searching of cells with duplicate values... and the results of showing duplicate values can be quite unexpected as you can see in the question or here:
Based on the above, I would say this is not a defect, but an unexpected and undocumented behavior.
And this poses a problem in real-life scenarios. You were not warned, that you need a special formulas to discover real duplicates. You were just presented by Duplicate values conditional formatting type without further explanation. Today, I took a list of 2000 values to provide data for the customer and false positives were spotted only by coincidence and after double checking of results. I almost deleted unique values considering them to have duplicates somewhere in the list.
Current behavior is logical from Excel viewpoint but draws a huge exclamation mark for use by inexperienced users. If something should be fixed, it is at least the documentation.
Solution 3:
If you Google for excel asterisk wildcard conditional formatting
you'll find someone with the same issue.
There the proposed solution is to use a custom formula to check if the value is a duplicate.
The formula looks as follows:=SUMPRODUCT(--(("~"&A2)=("~"&$A$2:$A$4)))>1
And the result is: