What's the difference between a superscalar and a vector processor?

They can both process multiple instructions at the same time, but I suppose there is a fundamental difference which explains why there are two names and why we haven't just switched to always using superscalar ones?

Also, if I understood correctly, both scalar and vector instructions are present in a modern CPU, so I suppose those two are not mutually exclusive (scalar instructions such as mov or add will be executed superscalar-ly and e.g. dot product will be calculated vector-ly in some special black magic-kind of way)?


A superscalar processor is capable of executing multiple instructions within a single program in parallel. It does this by analyzing the instruction stream to determine which instructions do not depend on each other, and having multiple execution units within the processor to do the work simultaneously (e.g. multiple ALUs). Compiler support is generally not required to optimize code for superscalar processors as the functionality is typically implemented entirely in hardware.[1]
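To make "instructions that do not depend on each other" a bit more concrete, here is a minimal C sketch. The function names are made up for illustration, and whether the second version actually runs faster depends on the compiler and the CPU, so treat it as an illustration of the idea rather than a benchmark.

```c
#include <stddef.h>

/* One accumulator: every addition needs the result of the previous one,
   so the additions form a single dependency chain. */
float sum_one_chain(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the additions into s0 and s1 do not depend on each
   other, so a superscalar core is free to overlap them. */
float sum_two_chains(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd element count: one value left over */
        s0 += a[i];
    return s0 + s1;
}
```

Both functions use only ordinary scalar additions; the only difference is how many independent chains the hardware gets to work with. Note that the two versions add the values in a different order, so for general floating-point input the results can differ slightly in rounding.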

A vector processor contains instructions specifically designed to operate on whole groups of data values at once (called arrays or vectors). Most modern high-performance processors contain some form of vector processing capability; for example, the SSE ADDPS instruction available in most x86 processors computes the sum of two vectors, each containing four single-precision values. Compiler, developer, and operating system support are typically required to use vector instructions, and not every processor, even in current generations, supports the most advanced vector instructions (e.g. Intel Celeron and Pentium processors, even as of Kaby Lake, do not support AVX).
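As a rough sketch of what using ADDPS looks like from C, the SSE intrinsic _mm_add_ps from <xmmintrin.h> adds four packed floats at once. The helper name add4 and the HAVE_SSE macro are invented for this example, and the scalar fallback is only there so the code still builds on targets without SSE.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Adds four floats element-wise: out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
#if HAVE_SSE
    __m128 va = _mm_loadu_ps(a);       /* load 4 floats, no alignment needed */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);  /* usually becomes one ADDPS (VADDPS with AVX) */
    _mm_storeu_ps(out, vsum);
#else
    for (size_t i = 0; i < 4; i++)     /* scalar fallback: four separate adds */
        out[i] = a[i] + b[i];
#endif
}
```

Calling add4 with {1, 2, 3, 4} and {10, 20, 30, 40} fills out with {11, 22, 33, 44}. This is the "developer support" part mentioned above: the code has to be written (or compiled) specifically to use the vector instruction.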

More technical information about how today's processors achieve high performance is available in this answer.


[1] An alternative, and rather unusual, design approach is to have multiple execution units but let the compiler determine which instructions to issue to each execution unit for each clock cycle. This is called very long instruction word (VLIW) and is typically only found on specialized processors.


Since nobody came up with an answer, I think I have figured it out in the meantime.

A scalar processor is just a regular processor, executing scalar instructions that work on one number at a time. Nothing special.

A vector processor, on the other hand, uses vector instructions that work on multiple numbers at the same time. There are special, wider registers intended for this purpose (e.g. SSE's 128-bit xmm* registers, into which multiple values can be packed, for instance four 32-bit integers; AVX-512 introduces 512-bit registers, which are the widest I could find). Vector ops are done by special units in the processor which are made for that purpose. A typical example of a vector processor would be a GPU - it is built primarily for this kind of calculation.

Superscalar is the term used to denote a specific optimization that allows scalar instructions to be executed in parallel, on different "regular" execution units (e.g. multiple ALUs). It divides instructions into multiple "streams" (I just made this term up) which are then executed at the same time.

So how are they different from their vector counterparts? Scalar instructions are not meant to be executed in that way. There are multiple possible hazards which could arise and prevent completely parallel execution, such as data or procedural dependencies. In that case, execution of that instruction would have to wait for its dependencies to be satisfied, pausing the execution of that "stream". The CPU has to take care of all dependencies in order to avoid data corruption, so special care has to be taken while optimizing the execution in this way.
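Here is a toy model of that waiting, as a sketch only. It assumes a made-up in-order machine where every instruction takes one cycle, and the struct insn type and cycles_needed function are invented for this example; real CPUs are far more complicated (out-of-order execution, register renaming, varying latencies).

```c
#include <stddef.h>

#define NUM_REGS 8

/* Hypothetical toy instruction: dst = src1 op src2, registers 0..NUM_REGS-1. */
struct insn { int dst, src1, src2; };

/* Cycles needed to issue n instructions in program order on a machine
   that can issue up to `width` (>= 1) instructions per cycle.
   Toy assumptions: each instruction takes one cycle, and an instruction
   cannot issue in the same cycle as an earlier instruction whose result
   it reads or whose destination register it also writes. */
int cycles_needed(const struct insn *prog, size_t n, int width)
{
    int ready[NUM_REGS] = {0};  /* first cycle in which each register is usable */
    int cycle = 0;              /* cycle currently being filled */
    int issued = 0;             /* instructions already issued in `cycle` */

    if (n == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        int earliest = cycle;
        if (ready[prog[i].src1] > earliest) earliest = ready[prog[i].src1];
        if (ready[prog[i].src2] > earliest) earliest = ready[prog[i].src2];
        if (ready[prog[i].dst]  > earliest) earliest = ready[prog[i].dst];
        if (earliest == cycle && issued == width)
            earliest = cycle + 1;       /* no free issue slot left */
        if (earliest > cycle) {         /* stall: move on to a later cycle */
            cycle = earliest;
            issued = 0;
        }
        issued++;
        ready[prog[i].dst] = cycle + 1; /* result is usable from the next cycle */
    }
    return cycle + 1;
}
```

In this model, four independent instructions (say r1 = r0+r0, r2 = r0+r0, r3 = r0+r0, r4 = r0+r0) need 2 cycles at width 2, while a chain where each instruction reads the previous result (r1 = r0+r0, r2 = r1+r1, r3 = r2+r2, r4 = r3+r3) still needs 4 cycles, no matter how wide the machine is.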

It also doesn't introduce any new instructions - everything looks just like normal scalar CPU operation. Vector CPUs, by contrast, have special instructions for vector operations. The main difference is that for vector ops, the programmer (or, rather, the compiler) must arrange the data, and because all values are packed into one wide register and each element is processed independently of the others, the hazards described above do not arise between the elements of a single vector instruction. Superscalar CPUs, on the other hand, do their best to figure out at run time which instructions are independent of each other and execute them at the same time.


Notice how I never said that any of these categories are mutually exclusive? They aren't. Vector units will execute vector instructions, and the CPU will try to find the best way to parallelize scalar ones. In fact, modern CPUs support both vector instructions (SSE*, 3DNow!, AVX, ...) and scalar ones (x86), which will be executed in a "superscalar" way.