SSE2 for Dummies (who know C/C++)
Intro

What is SSE2?

SSE2 is an extension of the x86 instruction set which allows programs to execute one operation on multiple pieces of data at a time. Because SSE2 instructions are real machine instructions, however, they only work on processors that support them. If a program tries to execute them on a machine that does not, the processor raises an invalid-opcode (illegal instruction) exception and the program will normally crash. Luckily there are easy ways to tell if the processor(s) you are running on support SSE2.

Basic Structure of SSE2

SSE2 works just like any other set of assembly instructions. There are registers in which data can be stored and operations that can execute on these registers. Each register is 16 bytes wide (enough for 2 doubles). In 32-bit code there are 8 of them, named xmm0 through xmm7.

Basics

Some Code

    inline void Add(double *x, double *y, double *retval)
    {
        asm
        {
            // Copy the first 16 bytes into xmm0, starting at the memory x points to
            movupd xmm0, [x]
            // Copy the first 16 bytes into xmm1, starting at the memory y points to
            movupd xmm1, [y]
            // Add the 2 doubles in xmm1 to the 2 doubles in xmm0, and put the
            // result in xmm0, overwriting the previous data stored there
            addpd xmm0, xmm1
            // Copy the 16 bytes of data in xmm0 to the memory retval points to
            movupd [retval], xmm0
        }
    }

Hopefully my comments before each line were enough to let you know what was going on. In case they weren't, I'll go into a little more detail about each line.

    asm{}

This keyword lets your compiler know that the code inside the braces is assembly and that it should be assembled as such. The exact spelling depends on the compiler; Microsoft's compiler, for example, spells it __asm. Conveniently, the instructions are also emitted in place, exactly where the block appears, so there is NO overhead for the asm block itself.

    movupd xmm0, [x]
    movupd xmm1, [y]

This instruction (movupd stands for "move unaligned packed double") copies data from the second operand to the first; as always in Intel syntax, the operands are in dest, src order. By putting brackets around the x, we tell the mov instruction to copy the data that x points to rather than the actual value of the pointer. The square brackets can be thought of as a way of dereferencing a pointer.
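If your compiler does not accept asm blocks, the movupd load and store described above can also be sketched with the SSE2 intrinsics declared in <emmintrin.h>. This is only an illustration, not part of the original listing: the helper name CopyPair is made up here, and the compiler (not you) decides which xmm register actually holds the value.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics (_mm_loadu_pd, _mm_storeu_pd)

// Hypothetical helper, for illustration only: copies two doubles from src
// to dst by passing them through a 16-byte SSE2 value.
inline void CopyPair(const double *src, double *dst)
{
    assert(src != nullptr && dst != nullptr);

    // Unaligned 16-byte load: the intrinsic counterpart of
    //   movupd xmmN, [src]
    // where the compiler picks the register.
    __m128d v = _mm_loadu_pd(src);

    // Unaligned 16-byte store: the counterpart of
    //   movupd [dst], xmmN
    _mm_storeu_pd(dst, v);
}
```

Calling CopyPair(a, b) with two arrays of 2 doubles leaves b holding the same two values as a. Both pointers must refer to at least 16 readable (or writable) bytes, just as with the asm version.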
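    addpd xmm0, xmm1

This is the line that does the actual arithmetic. It takes the value of the 2nd operand, src, adds it to the 1st operand, dest, and stores the result in the 1st operand, dest. Both doubles are added at once: the low double of xmm1 is added to the low double of xmm0, and the high double to the high double.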
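For comparison, here is a minimal sketch of the whole Add function written with intrinsics instead of an asm block, together with one possible runtime support check of the kind the intro mentions. Several things here are assumptions rather than part of the original code: the names HasSse2 and AddIntrinsics are invented for this example, and __builtin_cpu_supports is a GCC/Clang builtin, so other compilers need a different query (Microsoft's compiler offers __cpuid, for instance).

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: reports whether the running CPU supports SSE2.
// Uses a GCC/Clang builtin; on other compilers this sketch conservatively
// answers "no" instead of guessing.
inline bool HasSse2()
{
#if defined(__GNUC__)
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Hypothetical intrinsic version of Add: retval[i] = x[i] + y[i] for i = 0, 1.
inline void AddIntrinsics(const double *x, const double *y, double *retval)
{
    assert(x != nullptr && y != nullptr && retval != nullptr);

    __m128d a = _mm_loadu_pd(x);     // like: movupd xmmA, [x]
    __m128d b = _mm_loadu_pd(y);     // like: movupd xmmB, [y]
    __m128d sum = _mm_add_pd(a, b);  // like: addpd  xmmA, xmmB
    _mm_storeu_pd(retval, sum);      // like: movupd [retval], xmmA
}
```

A caller would typically test HasSse2() once at startup and fall back to a plain C loop when it returns false. With x = {1.0, 2.0} and y = {10.0, 20.0}, AddIntrinsics writes 11.0 and 22.0 into retval.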