Vertex Shader Architecture

A graphical representation of the vertex shader unit might look like this:

[Figure: block diagram of the vertex shader unit with its Vertex Input, Constant Memory, Registers and Vertex Output blocks]

A vertex shader uses 16 input registers to access vertex data (Vertex Input), so it can process vertices with up to 16 data entries, each a 128-bit (4 x 32-bit) quad-float. This is quite a lot. It easily fits an average vertex with its position coordinates, weight, normal, diffuse and specular color, fog coordinate and point size information, leaving plenty of space for the coordinates of several textures. The names of the vertex input registers are v0 - v15.

Because it works on these 128-bit quad-floats, a hardware vertex shader can be seen as a typical SIMD (single instruction, multiple data) processor: one instruction operates on a set of up to four 32-bit values at once. This makes sense because most transformation and lighting calculations use 4x4 matrices or quaternions.

The instructions are very simple and easy to understand. The vertex shader does not allow any loops, jumps or conditional branches, which means that it executes the program linearly, one instruction after the other. The maximum length of a vertex shader program is 128 instructions; this limit cannot be exceeded. Combining vertex shaders so that, for example, one computes the transformation and the next one computes the lighting is impossible. The one and only active vertex shader must compute all required per-vertex output data.

The so-called constant registers (Constant Memory) are loaded with parameters defined by the programmer before the vertex shader starts, because a vertex shader program cannot write to constant registers. They store constants such as lights, matrices, procedural data for special animation effects, vertex interpolation data for morphing/key frame interpolation and more.
Those constants can be used within the program and can even be addressed indirectly with the help of the address register a0.x, but only one constant register can be used per instruction. If an instruction needs more than one constant, the others must first be loaded into one of the Registers with a preceding load instruction. The names of the constant registers are c0 - c95.

The so-called Registers consist of 12 temporary registers for holding intermediate results. They can be both read and written, which gives the vertex shader room to juggle values during a calculation. The names of the temporary registers are r0 - r11.

There are up to 13 output registers (Vertex Output), depending on the underlying hardware. The names of the output registers always start with o for output. The Vertex Output is passed on to the rasterizer, and your vertex shader program has write-only access to it. The final result is yet another vertex, transformed at least to "homogeneous clip space". Making the input registers read-only and the output registers write-only shows the streaming nature of this vertex shader architecture.
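
To make the register model above more concrete, here is a minimal Python sketch of it. This is a toy simulation, not real shader assembly and not any actual API: the class name VertexShaderModel, the tuple encoding of operands, and the choice of just three component-wise opcodes are all assumptions made for illustration. It only tries to mirror the rules stated in the text: 16 read-only inputs, 96 read-only constants, 12 read/write temporaries, write-only outputs, indirect constant addressing through a0.x, one constant register per instruction, and strictly linear execution of at most 128 instructions.

```python
from dataclasses import dataclass, field

MAX_INSTRUCTIONS = 128  # program length limit stated in the text

# Component-wise operations: one instruction touches all four 32-bit lanes.
OPS = {
    "mov": lambda a: a,
    "add": lambda a, b: tuple(x + y for x, y in zip(a, b)),
    "mul": lambda a, b: tuple(x * y for x, y in zip(a, b)),
}


def _bank(size):
    """Hypothetical helper: a register bank of `size` zeroed quad-floats."""
    return field(default_factory=lambda: [(0.0, 0.0, 0.0, 0.0)] * size)


@dataclass
class VertexShaderModel:
    v: list = _bank(16)   # vertex input v0-v15, read-only for the program
    c: list = _bank(96)   # constant memory c0-c95, read-only for the program
    r: list = _bank(12)   # temporaries r0-r11, read/write
    o: dict = field(default_factory=dict)  # vertex output, write-only
    a0_x: int = 0         # address register for indirect constant access

    def _read(self, operand):
        bank, index = operand[0], operand[1]
        if bank == "o":
            raise ValueError("output registers are write-only")
        if bank == "c" and len(operand) == 3:  # ("c", base, "a0.x")
            index += self.a0_x
        return getattr(self, bank)[index]

    def _write(self, operand, value):
        bank, index = operand
        if bank in ("v", "c"):
            raise ValueError(f"{bank} registers are read-only")
        getattr(self, bank)[index] = value

    def run(self, program):
        if len(program) > MAX_INSTRUCTIONS:
            raise ValueError("program exceeds 128 instructions")
        for op, dest, *srcs in program:  # strictly linear, no branching
            constants = {s[1:] for s in srcs if s[0] == "c"}
            if len(constants) > 1:
                raise ValueError("only one constant register per instruction")
            self._write(dest, OPS[op](*(self._read(s) for s in srcs)))
        return self.o


vs = VertexShaderModel()
vs.v[0] = (1.0, 2.0, 3.0, 1.0)   # position, supplied per vertex
vs.c[4] = (2.0, 2.0, 2.0, 1.0)   # scale, loaded before the shader starts
vs.c[5] = (0.5, 0.0, 0.0, 0.0)   # offset, also loaded up front
vs.a0_x = 1

program = [
    ("mov", ("r", 0), ("c", 4, "a0.x")),         # r0 = c[4 + a0.x], i.e. c5
    ("mul", ("r", 1), ("v", 0), ("c", 4)),       # r1 = v0 * c4, all four lanes
    ("add", ("o", "oPos"), ("r", 1), ("r", 0)),  # oPos = r1 + r0
]
print(vs.run(program))  # {'oPos': (2.5, 4.0, 6.0, 1.0)}
```

Note how the second constant (c5) has to be copied into r0 first: a single instruction that names both c4 and c5 as sources would be rejected by the model, which is the one-constant-per-instruction rule in action.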