How can I find a list of all SSE instructions? What happens if a CPU doesn't support SSE?
I've been reading about how processors work, and now I'm on the instruction set extensions (SSE, SSE2, etc.), which are pretty interesting.
I have a lot of questions (I've been reading this stuff on Wikipedia):
I've seen the names of some instructions that were added in SSE, but there's no explanation of any of them (Maybe SSE4? They're not even listed on Wikipedia). Where can I read about what they do?
How do I know which of these instructions are being used?
If we do know which are being used, let's say I'm doing a comparison, (This may be the most stupid question I've ever asked, I don't know about assembly, though) Is it possible to use the instruction directly in assembly code? (I've been looking at this: http://asm.inightmare.org/opcodelst/index.php?op=CMP)
How does the processor interpret the instructions?
What would happen if I had a processor without any of the SSE instructions? (I suppose in the case we want to do a comparison, we wouldn't be able to, right?)
I've seen the names of some instructions that were added in SSE, but there's no explanation of any of them (Maybe SSE4? They're not even listed on Wikipedia). Where can I read about what they do?
The best source would be straight from the people who designed the extensions: Intel. The definitive references are the Intel® 64 and IA-32 Architectures Software Developer Manuals; I would recommend that you download the combined Volumes 1 through 3C (first download link on that page). You may want to look at Vol. 1, Ch. 12 - Programming with SSE3, SSSE3, SSE4 and AESNI. To refer to specific instructions, see Vol. 2, Ch. 3-4. (Appendix B is also helpful.)
How do I know which of these instructions are being used?
The instructions are only used if a program you're running actually uses them (i.e. its machine code contains the opcodes of the various SSE4 instructions and the CPU executes them). To find out what instructions a program uses, you need to use a disassembler.
If we do know which are being used, let's say I'm doing a comparison, (This may be the most stupid question I've ever asked, I don't know about assembly, though) Is it possible to use the instruction directly in assembly code? (I've been looking at this: http://asm.inightmare.org/opcodelst/index.php?op=CMP)
How does the processor interpret the instructions?
You may want to have a look at my answer to the question, "How does a CPU 'know' what commands and instructions actually mean?". When you write out assembly code by hand, to make an executable, you pass the "human readable" assembly code to an assembler, which turns the instructions into the actual 0's and 1's the processor executes.
What would happen if I had a processor without any of the SSE instructions? (I suppose in the case we want to do a comparison, we wouldn't be able to, right?)
Since your computer is Turing complete, it can compute any computable function with a software algorithm, even when it has no dedicated hardware for it. Intense parallel or matrix mathematics is much faster in hardware than in software (which requires many loops of instructions), so the end user would see a slowdown. Depending on how the program was created, it may require a particular instruction (e.g. one from the SSE4 set), although since the same thing can be done in software (and is thus usable on more processors), this practice is rare.
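A program that wants an optional instruction without requiring it can check for it at run time and fall back to software otherwise. Below is a minimal sketch of that idea, assuming GCC or Clang on x86 (the `__builtin_cpu_supports` builtin is compiler-specific). The helper names `count_bits`, `count_bits_hw` and `count_bits_sw` are made up for illustration, and POPCNT is used only because its software equivalent is short.

```c
/* Assumption: GCC or Clang targeting x86; anything else uses the software path. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_CHECK 1
#else
#define HAVE_X86_CPU_CHECK 0
#endif

/* Software fallback: clears the lowest set bit until none are left. */
static unsigned count_bits_sw(unsigned x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if HAVE_X86_CPU_CHECK
/* Hardware path: the target attribute lets the compiler emit POPCNT here
   even if the rest of the program is built for a generic CPU. */
static __attribute__((target("popcnt"))) unsigned count_bits_hw(unsigned x)
{
    return (unsigned)__builtin_popcount(x);
}
#endif

/* Dispatcher: uses the instruction only if the running CPU reports it. */
static unsigned count_bits(unsigned x)
{
#if HAVE_X86_CPU_CHECK
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return count_bits_hw(x);
#endif
    return count_bits_sw(x);
}
```

Calling `count_bits(0xFFu)` gives the same answer on either path; only the speed differs, which is the point made above.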
As an example of the above, you may recall when processors first came out with the MMX instruction set extension. Let's say we want to add two 8-element vectors of signed 8-bit values together (so each vector is 64 bits, equal to a single MMX register), or in other words, A + B = C. This could be done with a single MMX instruction called paddsb (packed add with signed saturation). For brevity, let's say our vectors are held at memory locations A, B, and C as well. Our equivalent assembly code would be:
movq MM0, [A]
paddsb MM0, [B]
movq [C], MM0
However, this operation could also easily be done in software. For example, the following C code performs the equivalent operation (since a char is 8 bits wide):
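#define LEN 8
signed char A[LEN], B[LEN], C[LEN];
/* Code to initialize vectors A and B... */
for (int i = 0; i < LEN; i++)
{
    int sum = A[i] + B[i];        /* add in a wider type so nothing overflows */
    if (sum > 127) sum = 127;     /* paddsb saturates instead of wrapping around */
    if (sum < -128) sum = -128;
    C[i] = (signed char)sum;
}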
You can probably guess how the assembly code of the above loop would look, but it's clear that it would contain significantly more instructions (as we now need a loop to handle adding the vectors), and thus we would need to perform that many more fetches. This is similar to how the word length of a processor affects a computer's performance (the purpose of MMX/SSEx is to provide both larger registers and the ability to perform the same instruction on multiple pieces of data).
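To make the comparison concrete without writing assembly by hand, here is a sketch of the same kind of addition using compiler intrinsics. It assumes a compiler that provides `<emmintrin.h>` (SSE2), where `_mm_adds_epi8` is the 128-bit form of paddsb, so it handles sixteen bytes instead of the eight an MMX register holds. The function names `add_sat` and `add_sat_scalar` are made up for illustration. Note that paddsb saturates, so the scalar version has to clamp explicitly to match it.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, only when the target supports them */
#endif

#define LEN 16           /* one 128-bit XMM register holds sixteen 8-bit lanes */

/* Software version: one element per loop iteration, clamped like paddsb. */
static void add_sat_scalar(const signed char *a, const signed char *b,
                           signed char *c)
{
    for (size_t i = 0; i < LEN; i++) {
        int sum = a[i] + b[i];
        if (sum > 127) sum = 127;
        if (sum < -128) sum = -128;
        c[i] = (signed char)sum;
    }
}

/* Hardware version: all sixteen additions in a single instruction. */
static void add_sat(const signed char *a, const signed char *b,
                    signed char *c)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)c, _mm_adds_epi8(va, vb));
#else
    add_sat_scalar(a, b, c);   /* no SSE2 at compile time: plain loop */
#endif
}
```

Both functions produce identical results for any input; the intrinsic version simply replaces the whole loop with a load, one add and a store.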
Answering in the same order as your questions:
- The easiest way would be to go to Intel's site and download the whitepapers. Even the processor's Software Developer's Manual will have all the required details. Here is one such link. Here is another link to the SSE instruction set's mnemonics and explanations.
- What exactly do you mean by "which of these instructions are being used"? Are you looking for information about your processor or about a particular application?
For processors, I don't know about Windows, but on Linux you simply read the processor's flags. This is easily done with the lshw command.
For a specific application, on the other hand, I'm not really sure; you could always disassemble the executable and check which instructions it uses. Because most applications are compiled for a mass audience, they will use only the generic x86 instruction set. To use the more processor-specific instructions, you should compile the application yourself on your system.
- You could always run a simulator. If you want to use assembly code within your programming projects, you can do it in C and C++. I have only used ASM code inside C, so I don't know whether any other language supports it. For help on using inline ASM, refer to this SO question.
- That question lies heavily in the field of computer architecture. While I could explain it here, it would not be easy. There was another SU question that dealt with this subject.
- To answer your specific question, the SSE instruction set only came out in 1999, while the CMP instruction has been around since long before that; it was part of the 8080 instruction set too. In any case, since our machines are Turing-complete, even the older microprocessors could perform comparisons; it was just harder to do without an explicit instruction. Every instruction set extension is only a faster, easier and more optimized way to carry out certain operations. It barely adds new functionality, since a Turing-complete machine can always compute everything that is computable.
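As a rough illustration of using an instruction directly from C, here is a sketch that issues CMP through GCC-style extended inline assembly. It assumes GCC or Clang on x86 (other compilers use a different inline assembly syntax or none at all), and the function name `equal_via_cmp` is made up for this example.

```c
/* Returns 1 if a == b. On x86 with GCC/Clang the comparison is done by an
   explicit CMP instruction; elsewhere it falls back to plain C. */
static int equal_via_cmp(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char result;
    __asm__ ("cmpl %2, %1\n\t"   /* compare a with b; only the flags change */
             "sete %0"           /* result = 1 if the zero flag is set */
             : "=q"(result)      /* output: a byte-addressable register */
             : "r"(a), "r"(b)    /* inputs: any general-purpose registers */
             : "cc");            /* tell the compiler the flags are clobbered */
    return result;
#else
    return a == b;
#endif
}
```

In practice the compiler emits the same CMP for an ordinary `a == b`, so inline assembly is mostly worth it for instructions the compiler will not generate on its own.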
I've seen the names of some instructions that were added in SSE, but there's no explanation of any of them (Maybe SSE4? They're not even listed on Wikipedia).
That's not correct. Wikipedia has a list of every x86 instruction, including deprecated and undocumented ones.
Where can I read about what they do?
To learn about any CPU, you need to read its manufacturer's manual; in this case Intel's, or maybe AMD's. For a compact compilation of instructions, these are two reliable sources:
- X86 Opcode and Instruction Reference
- x86 and amd64 instruction reference
If we do know which are being used, let's say I'm doing a comparison, (This may be the most stupid question I've ever asked, I don't know about assembly, though) Is it possible to directly use the instruction on an assembly code? (I've been looking at this: http://asm.inightmare.org/opcodelst/index.php?op=CMP)
Assembly is just a human-readable version of the machine code. The names you see are the assembly mnemonics for the instructions, so of course they can be, and always have been, used directly in assembly.
What would happen if I had a processor without any of the SSE instructions? (I suppose in the case we want to do a comparison, we wouldn't be able to, right?)
In reality, nowadays you can hardly get an x86 CPU that doesn't support SSE, because it was introduced with the Pentium III 20 years ago. But typically, if the CPU encounters an invalid instruction/opcode, it raises an exception. Normally the OS simply reports the error and then terminates the program. But if needed, the application can catch that exception and process the instruction in software. This is extremely inefficient because of the switching between the program and the exception handler, but the program can run without modification.
This was used in the past, when some CPUs had no built-in FPU and floating-point math was done in a separate coprocessor. In that case, if a coprocessor was not attached, floating-point instructions would raise exceptions, and the exception handler would calculate the operation in software before transferring the result back to the program. See What is the protocol for x87 floating point emulation in MS-DOS?
It was also used by some Hackintosh patches to make Mac OS X (which requires SSE2/3 or more) run on older CPUs with only SSE.
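On a POSIX system the invalid-opcode exception reaches the program as the SIGILL signal, so the catching part can be sketched as below. This is a simplified illustration under stated assumptions, not a real emulator: it assumes POSIX signals and GCC or Clang, it deliberately executes the x86 ud2 instruction (which is defined to be invalid) to trigger the fault, and it abandons the faulting instruction and jumps to a fallback branch instead of decoding and emulating it. All function names are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf resume_point;

/* Signal handler: give up on the faulting instruction and jump back. */
static void on_illegal_instruction(int signo)
{
    (void)signo;
    siglongjmp(resume_point, 1);
}

/* Returns 1 if the fallback branch ran, 0 if the instruction executed
   normally, and -1 if the handler could not be installed. */
static int try_instruction_with_fallback(void)
{
    struct sigaction sa, old;
    volatile int used_fallback = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_illegal_instruction;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(resume_point, 1) == 0) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
        __asm__ volatile ("ud2");   /* stands in for an unsupported opcode */
#else
        raise(SIGILL);              /* non-x86: simulate the same fault */
#endif
    } else {
        used_fallback = 1;          /* a real program would redo the work in software here */
    }

    sigaction(SIGILL, &old, NULL);  /* restore the previous handler */
    return used_fallback;
}
```

A true emulation handler, like the x87 case above, would instead inspect the faulting instruction, compute its result, and resume execution right after it, which is where the heavy overhead comes from.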