GameDev.net -- Shader Programming Part I: Fundamentals of Vertex Shaders

Vertex Shader Architecture

Let's get deeper into vertex shader programming by looking on a graphical representation of the vertex shader architecture:

Figure 13 - Vertex Shader Architecture

All data are in a vertex shader is represented by 128-bit quad-floats (4 x 32-bit):

Figure 14 - 128 bits

A hardware vertex shader can be seen as a typical SIMD (Single Instruction Multiple Data) processor for you are applying one instruction and affecting a set of up to four 32-bit variables. This data format is very useful, because most of the transformation and lighting calculations are performed using 4x4 matrices or quaternions. The instructions are very simple and easy to understand. The vertex shader does not allow any loops, jumps or conditional branches, which means that it executes the program linearly - one instruction after the other. The maximum length of a vertex shader program in DirectX 8.x is limited to 128 instructions. Combining vertex shaders to have one to compute the transformation and the next one to compute the lighting is impossible. Only one vertex shader can be active at a time and the active vertex shader must compute all required per-vertex output data.

A vertex shader use up to 16 input registers (named v0 - v15, where each register consists of 128 bit (4x32bit) quad-floats) to access vertex input data. The vertex input register can easily hold the data for a typical vertex: its position coordinates, normal, diffuse and specular color, fog coordinate and point size information with space for the coordinates of several textures.

The constant registers (Constant Memory) are loaded by the CPU, before the vertex shader starts executing parameters defined by the programmer. The vertex shader is not able to write to the constant registers. They are used to store parameters such as light position, matrices, procedural data for special animation effects, vertex interpolation data for morphing/key frame interpolation and more. The constants can be applied within the program and they can even be addressed indirectly with the help of the address register a0.x, but only one constant can be used per instruction. If an instruction needs more than one constant, it must be loaded into on e of the temporary regsiters before it its required. The names of the constant registers are c0 - c95 or in case of the ATI RADEON 8500 c0 - c191.

The temporary Rgisters consist of 12 registers used to perform intermediate calculations. They can be used to load and store data (read/write). The names of the temporary registers are r0 - r11.

There are up to 13 output registers (Vertex Output), depending on the underlying hardware. The names of the output registers always start with o for output. The Vertex Output is available per rasterizer and your vertex shader program has write-only access to it. The final result is yet another vertex, a vertex transformed to the "homogenous clip space". Here is an overview of all available registers:

Registers: Number of Registers Properties

Input (v0 - v15) 16 RO1

Output (o*) GeForce 3/4TI: 9; RADEON 8500: 11 WO

Constants (c0 - c95) vs.1.1 Specification: 96; RADEON 8500: 192 RO1

Temporary (r0 - r11) 12 R1W3

Address (a0.x) 1 (vs.1.1 and higher) WO (W: only with mov)

An identifier of the streaming nature of this vertex shader architecture is the read-only input registers and the write-only output registers.

High Level View of Vertex Shader Programming

Contents

Introduction

Vertex Shaders in the Pipeline

Vertex Shader Tools

Vertex Shader Architecture

High Level View of Vertex Shader Programming

Conclusion

Printable version

Discuss this article

The Series

Fundamentals of Vertex Shaders

Programming Vertex Shaders

Fundamentals of Pixel Shaders

Programming Pixel Shaders

Diffuse & Specular Lighting with Pixel Shaders