Clang generates unusual code for pointer subtraction, is this a bug?

Solution 1:

I would say the problem lies here (line 2942 of https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86FastISel.cpp):

// FastISel doesn't have a pattern for all X86::MUL*r and X86::IMUL*r. Emit
// it manually.
if (BaseOpc == X86ISD::UMUL && !ResultReg) {
  static const uint16_t MULOpc[] =
    { X86::MUL8r, X86::MUL16r, X86::MUL32r, X86::MUL64r };
  static const MCPhysReg Reg[] = { X86::AL, X86::AX, X86::EAX, X86::RAX };
  // First copy the first operand into RAX, which is an implicit input to
  // the X86::MUL*r instruction.
  BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
          TII.get(TargetOpcode::COPY), Reg[VT.SimpleTy-MVT::i8])
    .addReg(LHSReg);
  ResultReg = fastEmitInst_r(MULOpc[VT.SimpleTy-MVT::i8],
                             TLI.getRegClassFor(VT), RHSReg);
} else if (BaseOpc == X86ISD::SMUL && !ResultReg) {
  static const uint16_t MULOpc[] =
    { X86::IMUL8r, X86::IMUL16rr, X86::IMUL32rr, X86::IMUL64rr };
  if (VT == MVT::i8) {
    // Copy the first operand into AL, which is an implicit input to the
    // X86::IMUL8r instruction.
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
           TII.get(TargetOpcode::COPY), X86::AL)
      .addReg(LHSReg);
    ResultReg = fastEmitInst_r(MULOpc[0], TLI.getRegClassFor(VT), RHSReg);
  } else
    ResultReg = fastEmitInst_rr(MULOpc[VT.SimpleTy-MVT::i8],
                                TLI.getRegClassFor(VT), LHSReg, RHSReg);
}

The first if block handles unsigned multiplication (X86ISD::UMUL), and the else if block handles signed multiplication (X86ISD::SMUL).

For unsigned multiplication, a copy into RAX (or one of its smaller variants) is emitted first, because that register is an implicit input to the one-operand MUL instruction; only then is the actual multiplication emitted.

For signed multiplication with an operand size greater than 1 byte, the small else branch at the bottom is executed, which emits the signed multiplication directly as a two-operand instruction, with no extra copy.

On x86 there is a two-operand form of imul (signed multiply, https://www.felixcloutier.com/x86/imul), but no such form of mul (unsigned multiply, https://www.felixcloutier.com/x86/mul).
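To illustrate the consequence of that asymmetry, here is a small stand-alone C++ snippet (my own example, not from the question): with optimizations enabled, a multiplication that only needs the low 64 bits of the result is typically compiled to the two-operand imul even for unsigned operands, while a full 128-bit unsigned product forces the one-operand mul, which reads RAX implicitly and writes RDX:RAX, hence the surrounding register copies.

#include <cstdint>

// Only the low 64 bits are needed: typically compiled (at -O2) to the
// two-operand form "imul rax, rsi", even though the operands are unsigned.
uint64_t mul_lo(uint64_t a, uint64_t b) {
    return a * b;
}

// The full 128-bit unsigned product needs the one-operand form "mul rsi",
// which takes RAX as an implicit input and writes the result to RDX:RAX,
// so the compiler has to copy operands and results around.
unsigned __int128 mul_full(uint64_t a, uint64_t b) {
    return static_cast<unsigned __int128>(a) * b;
}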

The optimizations that replace divisions with multiplications, and the distinction between the signed and unsigned cases, are done in the architecture-independent part of LLVM (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp, lines 5108, 5186, 5334).
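To make the transformation concrete, here is a plain C++ sketch (not LLVM code; the struct, the element size of 12 and the function names are made up for illustration) of what the exact-division expansion computes for a pointer subtraction. The byte difference is an exact multiple of the element size, so a division by 12 can be replaced by an arithmetic shift right by 2 (removing the factor 4) followed by a multiplication with the inverse of 3 modulo 2^64:

#include <cassert>
#include <cstddef>
#include <cstdint>

struct S { int32_t a, b, c; };   // sizeof(S) == 12 on typical 64-bit targets

// Pointer subtraction: the frontend emits the byte difference followed by
// an exact signed division by sizeof(S).
std::ptrdiff_t elem_distance(const S *p, const S *q) {
    return p - q;
}

// What the exact-division expansion computes instead of a real division:
// 12 == 4 * 3, so shift the byte difference right by 2 (SRA, exact) and
// multiply by the multiplicative inverse of 3 modulo 2^64.
int64_t elem_distance_expanded(int64_t byte_diff) {
    int64_t shifted = byte_diff >> 2;
    return (int64_t)((uint64_t)shifted * 0xAAAAAAAAAAAAAAABull);
}

int main() {
    S buf[100];
    assert(elem_distance(&buf[97], &buf[3]) == 94);
    assert(elem_distance_expanded((const char*)&buf[97] - (const char*)&buf[3]) == 94);
}

This only works because the division is known to be exact; the multiplication wraps modulo 2^64, and for a byte difference that is not a multiple of 12 the result would be meaningless.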

The x86 backend does not check whether it could use a signed multiply instruction for cases where the unsigned operand is below 0x8000000000000000.
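For operands below 0x8000000000000000 the signed and the unsigned reading denote the same number, so a signed multiply yields the same 64-bit result; more generally, the low 64 bits of the product are identical in both cases (what differs is the high half and the flags). A small self-check, again only for illustration:

#include <cassert>
#include <cstdint>

// Compare the low 64 bits of the unsigned and the signed 128-bit product.
bool low_halves_match(uint64_t a, uint64_t b) {
    uint64_t lo_unsigned = (uint64_t)((unsigned __int128)a * b);
    uint64_t lo_signed   = (uint64_t)((__int128)(int64_t)a * (int64_t)b);
    return lo_unsigned == lo_signed;
}

int main() {
    // Operands below 0x8000000000000000, as in the case described above.
    assert(low_halves_match(0x123456789ULL, 0x0FFFFFFFFFFFFFFFULL));
    // The low halves match for arbitrary 64-bit operands as well.
    assert(low_halves_match(0xDEADBEEFCAFEBABEULL, 0xFFFFFFFFFFFFFFFFULL));
}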

The additional shift-right instruction is a separate issue from the register copying; it also occurs on other architectures whose unsigned multiplication instruction is as expressive as the signed one (e.g. ARM).

As you suggested, it is related to BuildExactSDIV() (line 5108 of TargetLowering.cpp), which also contains:

  // Shift the value upfront if it is even, so the LSB is one.
  if (UseSRA) {
    // TODO: For UDIV use SRL instead of SRA.
    SDNodeFlags Flags;
    Flags.setExact(true);
    Res = DAG.getNode(ISD::SRA, dl, VT, Res, Shift, Flags);
    Created.push_back(Res.getNode());
  }

One should note the TODO comment about how this could be implemented for UDIV.
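Why SRL rather than SRA would be needed for UDIV is easy to see with a dividend whose top bit is set; a short illustration (plain C++, not LLVM code):

#include <cassert>
#include <cstdint>

int main() {
    // An unsigned value that is an exact multiple of 16 and has the top bit set.
    uint64_t x = 0x8000000000000010ULL;

    // Logical shift right (SRL) gives the correct exact division by 16.
    uint64_t srl = x >> 4;                       // 0x0800000000000001
    assert(srl == x / 16);

    // Arithmetic shift right (SRA) sign-extends and gives the wrong result
    // for an unsigned dividend with the top bit set. (Right-shifting a
    // negative value is arithmetic on the usual implementations and is
    // guaranteed to be so since C++20.)
    uint64_t sra = (uint64_t)((int64_t)x >> 4);  // 0xF800000000000001
    assert(sra != x / 16);
}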

The unsigned division in BuildUDIV() (line 5334) has no option similar to the one in BuildSDIV() (line 5190), which can call BuildExactSDIV() (line 5219). There is no BuildExactUDIV(), and the normal expansion always involves a shift.