Vertex Shader Architecture
Let's get deeper into vertex shader programming by looking on a graphical representation of the vertex shader architecture:
All data are in a vertex shader is represented by 128-bit quad-floats (4 x 32-bit):
A hardware vertex shader can be seen as a typical SIMD (Single Instruction Multiple Data) processor for you are applying one instruction and affecting a set of up to four 32-bit variables. This data format is very useful, because most of the transformation and lighting calculations are performed using 4x4 matrices or quaternions. The instructions are very simple and easy to understand. The vertex shader does not allow any loops, jumps or conditional branches, which means that it executes the program linearly - one instruction after the other. The maximum length of a vertex shader program in DirectX 8.x is limited to 128 instructions. Combining vertex shaders to have one to compute the transformation and the next one to compute the lighting is impossible. Only one vertex shader can be active at a time and the active vertex shader must compute all required per-vertex output data.
A vertex shader use up to 16 input registers (named v0 - v15, where each register consists of 128 bit (4x32bit) quad-floats) to access vertex input data. The vertex input register can easily hold the data for a typical vertex: its position coordinates, normal, diffuse and specular color, fog coordinate and point size information with space for the coordinates of several textures.
The constant registers (Constant Memory) are loaded by the CPU, before the vertex shader starts executing parameters defined by the programmer. The vertex shader is not able to write to the constant registers. They are used to store parameters such as light position, matrices, procedural data for special animation effects, vertex interpolation data for morphing/key frame interpolation and more. The constants can be applied within the program and they can even be addressed indirectly with the help of the address register a0.x, but only one constant can be used per instruction. If an instruction needs more than one constant, it must be loaded into on e of the temporary regsiters before it its required. The names of the constant registers are c0 - c95 or in case of the ATI RADEON 8500 c0 - c191.
The temporary Rgisters consist of 12 registers used to perform intermediate calculations. They can be used to load and store data (read/write). The names of the temporary registers are r0 - r11.
There are up to 13 output registers (Vertex Output), depending on the underlying hardware. The names of the output registers always start with o for output. The Vertex Output is available per rasterizer and your vertex shader program has write-only access to it. The final result is yet another vertex, a vertex transformed to the "homogenous clip space". Here is an overview of all available registers:
An identifier of the streaming nature of this vertex shader architecture is the read-only input registers and the write-only output registers.