How to choose AVX compare predicate variants
In the Advanced Vector Extensions (AVX) the compare instructions like _m256_cmp_ps, the last argument is a compare predicate. The choices for the predicate overwhelm me. They seem to be a tripple of type, ordering, signaling. E.g. _CMP_LE_OS is 'less than or equal, ordered, signaling.
For starters, is there a performance reason for selecting signaling or non signaling, and similarly, is ordered or unordered faster than the other?
And what does 'non signaling' even mean? I can't find this in the docs at all. Any rule of thumb on when to select what?
Here are the predicate choices from avxintrin.h:
/* Compare */
#define _CMP_EQ_OQ 0x00 /* Equal (ordered, non-signaling) */
#define _CMP_LT_OS 0x01 /* Less-than (ordered, signaling) */
#define _CMP_LE_OS 0x02 /* Less-than-or-equal (ordered, signaling) */
#define _CMP_UNORD_Q 0x03 /* Unordered (non-signaling) */
#define _CMP_NEQ_UQ 0x04 /* Not-equal (unordered, non-signaling) */
#define _CMP_NLT_US 0x05 /* Not-less-than (unordered, signaling) */
#define _CMP_NLE_US 0x06 /* Not-less-than-or-equal (unordered, signaling) */
#define _CMP_ORD_Q 0x07 /* Ordered (nonsignaling) */
#define _CMP_EQ_UQ 0x08 /* Equal (unordered, non-signaling) */
#define _CMP_NGE_US 0x09 /* Not-greater-than-or-equal (unord, signaling) */
#define _CMP_NGT_US 0x0a /* Not-greater-than (unordered, signaling) */
#define _CMP_FALSE_OQ 0x0b /* False (ordered, non-signaling) */
#define _CMP_NEQ_OQ 0x0c /* Not-equal (ordered, non-signaling) */
#define _CMP_GE_OS 0x0d /* Greater-than-or-equal (ordered, signaling) */
#define _CMP_GT_OS 0x0e /* Greater-than (ordered, signaling) */
#define _CMP_TRUE_UQ 0x0f /* True (unordered, non-signaling) */
#define _CMP_EQ_OS 0x10 /* Equal (ordered, signaling) */
#define _CMP_LT_OQ 0x11 /* Less-than (ordered, non-signaling) */
#define _CMP_LE_OQ 0x12 /* Less-than-or-equal (ordered, non-signaling) */
#define _CMP_UNORD_S 0x13 /* Unordered (signaling) */
#define _CMP_NEQ_US 0x14 /* Not-equal (unordered, signaling) */
#define _CMP_NLT_UQ 0x15 /* Not-less-than (unordered, non-signaling) */
#define _CMP_NLE_UQ 0x16 /* Not-less-than-or-equal (unord, non-signaling) */
#define _CMP_ORD_S 0x17 /* Ordered (signaling) */
#define _CMP_EQ_US 0x18 /* Equal (unordered, signaling) */
#define _CMP_NGE_UQ 0x19 /* Not-greater-than-or-equal (unord, non-sign) */
#define _CMP_NGT_UQ 0x1a /* Not-greater-than (unordered, non-signaling) */
#define _CMP_FALSE_OS 0x1b /* False (ordered, signaling) */
#define _CMP_NEQ_OS 0x1c /* Not-equal (ordered, signaling) */
#define _CMP_GE_OQ 0x1d /* Greater-than-or-equal (ordered, non-signaling) */
#define _CMP_GT_OQ 0x1e /* Greater-than (ordered, non-signaling) */
#define _CMP_TRUE_US 0x1f /* True (unordered, signaling) */
Ordered vs Unordered has to do with whether the comparison is true if one of the operands contains a NaN (see What does ordered / unordered comparison mean?). Signaling (S) vs non-signaling (Q for quiet?) will determine whether an exception is raised if an operand contains a NaN.
From a performance perspective, these should all be the same (assuming of course no exceptions are raised). If you want to be alerted when there's a NaN, then you want signaling. As for ordered vs unordered, it all depends on how you want to deal with NaNs.
When either operand is NaN, ordered vs unordered dictates the result value.
Ordered comparisons returns false
for NaN operands.
- _CMP_EQ_OQ of
1.0
and1.0
givestrue
(vanilla equality). - _CMP_EQ_OQ of
NaN
and1.0
givesfalse
. - _CMP_EQ_OQ of
1.0
andNaN
givesfalse
. - _CMP_EQ_OQ of
NaN
andNaN
givesfalse
.
Unordered comparison returns true
for NaN operands.
- _CMP_EQ_UQ of
1.0
and1.0
givestrue
(vanilla equality). - _CMP_EQ_UQ of
NaN
and1.0
givestrue
. - _CMP_EQ_UQ of
1.0
andNaN
givestrue
. - _CMP_EQ_UQ of
NaN
andNaN
givestrue
.
The difference between signalling vs non-signalling only impacts the value of the MXCSR. To observe the effect, you'd need to clear the MXCSR, perform one or more comparisons, then read from the MXCSR (thanks to Peter Cordes for clarifying this!).
The order of the enum values is pretty confusing. It helps to put them in a table...
comparison | ordered (non-signalling) | unordered (non-signalling) |
---|---|---|
a < b | _CMP_LT_OQ | _CMP_NGE_UQ |
a <= b | _CMP_LE_OQ | _CMP_NGT_UQ |
a == b | _CMP_EQ_OQ | _CMP_EQ_UQ |
a != b | _CMP_NEQ_OQ | _CMP_NEQ_UQ |
a >= b | _CMP_GE_OQ | _CMP_NLT_UQ |
a > b | _CMP_GT_OQ | _CMP_NLE_UQ |
true | _CMP_ORD_Q | _CMP_TRUE_UQ (useless) |
false | _CMP_FALSE_OQ (useless) | _CMP_UNORD_Q |
With MXCSR "signaling":
comparison | ordered (signalling) | unordered (signalling) |
---|---|---|
a < b | _CMP_LT_OS | _CMP_NGE_US |
a <= b | _CMP_LE_OS | _CMP_NGT_US |
a == b | _CMP_EQ_OS | _CMP_EQ_US |
a != b | _CMP_NEQ_OS | _CMP_NEQ_US |
a >= b | _CMP_GE_OS | _CMP_NLT_US |
a > b | _CMP_GT_OS | _CMP_NLE_US |
true | _CMP_ORD_S | _CMP_TRUE_US (useless) |
false | _CMP_FALSE_OS (useless) | _CMP_UNORD_S |
The order of the enum values can be explained by:
-
The first four ops are canonical (
EQ
,LT
,LE
,UNORD
). Note that if the0x00
and0x03
values wereLE
/UNORD
orUNORD
/LE
, the four canonical ops could be viewed as a composition of two separate bits, but that's not possible for their actual order. -
The remaining ops are transformations of the first four.
-
The
0x04
bit precisely inverts the result value, which also effectively also toggles ordered vs unordered. For example,LT_O
becomesNLT_U
, which is similar toGE
, but see the rule for unordered naming. -
The
0x08
bit toggles ordered vs unordered (without changing anything else). -
Setting both the
0x04
and0x08
bits negates the result for numerical operands, while retaining the same ordering behavior for NaN operands. For example,LT_O
becomesGE_O
. -
Note that when the comparison is unordered (ie, one of
0x04
or0x08
is set), a negated name is used:NGE
instead ofLT
,NGT
instead ofLE
,NLT
instead ofGE
, andNLE
instead ofGT
; however bothEQ
andNEQ
need to define both ordered and unordered variants, so those names only change under the0x04
negating transformation, not the0x08
orderedness-toggling transformation. -
FALSE
/TRUE
are mostly useless0x08
transformations ofUNORD
/ORD
, always returning the same value. For example,UNORD
(0x03
) returnsfalse
if both operands are numbers, ortrue
if either isNaN
; adding0x08
, we getFALSE
(0x0b
), which has toggled behavior forNaN
operands, causing it to returnfalse
for both cases.Fun fact: the
TRUE
op wasn't always completely useless. Prior to AVX2 it was the most compact mechanism for setting a YMM register to all 1's. See https://godbolt.org/z/Yb5TjP for details (Thanks Peter Cordes). -
The
0x10
bit toggles signaling vs not. Note that of the canonical ops,LE
andLT
are signaling, whileEQ
andUNORD
are not, so setting the0x10
bit removes signaling from theLE
/LT
ops and adds it to theEQ
/UNORD
ops. Because that's obviously sensible and not at all confusing.