Do ARM processors like the Cortex-A9 use microcode?
TL;DR: While ARM processors use concepts similar to those of microcoded CPUs (e.g. there is a hardware block that decodes instructions into one or more micro-operations), they are not microcoded in the traditional sense of using a ROM to store each micro-instruction, nor can these micro-instructions/operations be modified after the silicon has been produced. Instead, ARM processors use hardwired control in the instruction decoder to generate micro-operations.
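To make that distinction concrete, here is a minimal sketch of what hardwired decode looks like in an HDL: the mapping from instruction bits to micro-operation control signals is fixed combinational logic, determined at synthesis time. The module name, opcode encodings, and control signals below are invented purely for illustration and do not correspond to any real ARM core.

```verilog
// Hypothetical hardwired decoder: opcode bits map to micro-op control
// signals through fixed combinational logic. Changing the mapping means
// editing this source and re-synthesizing, not patching a ROM.
module hardwired_decoder (
    input  wire [3:0] opcode,     // invented 4-bit opcode field
    output reg        alu_en,     // enable the ALU for this micro-op
    output reg  [1:0] alu_op,     // 00=ADD, 01=SUB, 10=AND, 11=OR
    output reg        mem_read,   // issue a load micro-op
    output reg        reg_write   // write the result back to a register
);
    always @(*) begin
        // Default: all control signals inactive (behaves like a NOP).
        alu_en    = 1'b0;
        alu_op    = 2'b00;
        mem_read  = 1'b0;
        reg_write = 1'b0;
        case (opcode)
            4'b0000: begin alu_en = 1'b1; alu_op = 2'b00; reg_write = 1'b1; end // ADD-like
            4'b0001: begin alu_en = 1'b1; alu_op = 2'b01; reg_write = 1'b1; end // SUB-like
            4'b0100: begin mem_read = 1'b1; reg_write = 1'b1; end               // LDR-like
            default: ;                                                           // undefined -> NOP
        endcase
    end
endmodule
```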
In practice, however, modifying the instruction decoder can be similar to modifying a microcoded processor, because ARM licenses the hardware-description-language (HDL) source code of its CPU cores to individual manufacturers, which makes hardware-level modifications significantly easier to implement. See the Instruction Decoder section in the Microprocessor Design Wikibook for more on the differences between typical RISC and CISC instruction decoders.
While the ARM architecture itself is not microcoded in the traditional sense, individual instructions are decoded into smaller micro-operations. A modern ARM processor is far from "simple" - although the instruction set itself is very orthogonal, a modern A9 core contains a great deal of modern technology (e.g. pipelining, superscalar execution, out-of-order execution, caching, and extended instruction sets such as floating-point/VFP and NEON). In principle, any processor could be simple enough to execute without translation into micro-operations, but this is essentially putting "all your eggs in one basket" - you cannot correct any errata in the instruction set, nor expand or modify it, after production.
However, if we're only talking about the instruction decode stage, then indeed many ARM processors are not microcoded in a way that allows modification after the fact, although this may be because most manufacturers licensing ARM technology are given access to the actual hardware source code (written in an HDL). This reduces power consumption because a separate microcode stage isn't required; the individual instructions are instead "compiled" into actual hardware blocks. It also still allows each manufacturer to correct errata.
Indeed, even in a CISC-based CPU (e.g. x86), there is no requirement to use microcode. In practice, however, the complexity of the instruction set, combined with various differences in licensing, power consumption, and applications, makes microcode the ideal choice for x86. In the case of ARM, it is less useful, as changes to the instruction set (i.e. the decoder) are much easier to implement and control in the hardware itself, since the design can be customized by each manufacturer.
Although having microcode can actually simplify the design of the processor in some cases (since each instruction exists as a "micro-program" rather than dedicated hardware), what ARM uses is effectively just another instruction decoder (e.g. the Thumb-2 extension supports variable-length instructions by adding a separate instruction decoder in-line with the ARM instruction decoder). While functionally such units could be implemented using microcode, this would not be wise in terms of power consumption, because a microcode word has to define the value of every control signal in the CPU, even the signals a given instruction does not use. This has nothing to do with how "complex" the actual CPU is, however, as ARM cores have all the modern constructs one would expect (pipelining, instruction/data caches, micro-TLBs, branch prediction, virtual memory, etc.).
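For contrast, a microcoded version of the same invented decoder from above might look like the sketch below: the opcode indexes a control-store ROM, and every ROM word has to spell out all of the control bits whether the instruction uses them or not, which is where the extra area and power come from. (A real microcoded machine would also sequence several micro-words per instruction; this single-lookup version is deliberately simplified, and all encodings are made up.)

```verilog
// Hypothetical microcoded decoder: the opcode indexes a control-store ROM.
// Each ROM word defines *every* control bit, used or not; changing an
// instruction means rewriting the ROM contents rather than the logic.
module microcoded_decoder (
    input  wire [3:0] opcode,
    output wire       alu_en,
    output wire [1:0] alu_op,
    output wire       mem_read,
    output wire       reg_write
);
    reg [4:0] ucode_rom [0:15];   // 5 control bits per control word
    integer i;

    initial begin
        for (i = 0; i < 16; i = i + 1)
            ucode_rom[i] = 5'b0_00_0_0;   // default: NOP control word
        ucode_rom[0] = 5'b1_00_0_1;       // ADD-like: alu_en, ADD, reg_write
        ucode_rom[1] = 5'b1_01_0_1;       // SUB-like
        ucode_rom[4] = 5'b0_00_1_1;       // LDR-like: mem_read, reg_write
    end

    assign {alu_en, alu_op, mem_read, reg_write} = ucode_rom[opcode];
endmodule
```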
In the case of ARM, given the orthogonality of the instruction set, the complexity involved in implementing a microcoded approach would outweigh its benefits when you can simply change the relevant hardware directly in the instruction decoder block. While microcoding is certainly possible, it ends up "reinventing the wheel", so to speak, given that you are able to directly modify (and compile/test/simulate) the changes in hardware.
You can "think" of the ARM source code itself as a type of microcode in this case, except that instead of each micro-operation/micro-program being stored in a ROM that can be modified after the fact, they are implemented directly in hardware in the instruction decoder. Since the instruction decoder itself is written in VHDL/Verilog, making changes to existing instructions is as simple as modifying the source code, recompiling, and testing the new hardware (e.g. on an FPGA or in a simulator). This contrasts with modern x86 hardware, which is much more difficult to test/simulate during development, and even more difficult to modify after production (its size in transistors far exceeds what will fit inside even the most expensive modern FPGAs, which adds to the benefit of a modifiable microcode store). The same is true of ARM silicon once fabricated, of course, but the difference is in development: one can make changes to the processor hardware and then directly see and test those changes on physical hardware using an FPGA.
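As a rough illustration of that development loop, an edit to a decoder like the hypothetical one sketched earlier can be checked immediately in a simulator (e.g. Icarus Verilog) with a throwaway testbench, long before the design goes anywhere near an FPGA or silicon:

```verilog
// Throwaway testbench: drive a few opcodes through the hypothetical
// hardwired_decoder and print the micro-op control signals it produces.
// Re-run after each change to the decoder source to check the edit.
module decoder_tb;
    reg  [3:0] opcode;
    wire       alu_en, mem_read, reg_write;
    wire [1:0] alu_op;

    hardwired_decoder dut (
        .opcode(opcode), .alu_en(alu_en), .alu_op(alu_op),
        .mem_read(mem_read), .reg_write(reg_write)
    );

    initial begin
        opcode = 4'b0000; #1 $display("ADD : alu_op=%b reg_write=%b", alu_op, reg_write);
        opcode = 4'b0100; #1 $display("LDR : mem_read=%b reg_write=%b", mem_read, reg_write);
        opcode = 4'b1111; #1 $display("UND : alu_en=%b reg_write=%b", alu_en, reg_write);
        $finish;
    end
endmodule
```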