GameDev.net -- Shader Programming Part II: Programming Vertex Shaders

Preface

In this part of the introduction, a very simple program that shows a rotating quad evolves into a more sophisticated application showing a Bézier patch class with a diffuse and specular reflection model, featuring a point light source. The example applications build on each other, so that most of the code of one example is reused in the next. This way the explanation stays focused on what each example adds.

RacorX

Figure 1 - RacorX

RacorX displays a green color that is applied evenly to the quad. This example demonstrates the use of the Common Files framework provided with the DirectX 8.1 SDK and shows how to compile vertex shaders with the D3DXAssembleShader() function.

As with all the upcoming examples, which are based on the Common Files, <Alt>+<Enter> switches between windowed and full-screen mode, <F2> gives you a selection of the usable drivers and <Esc> shuts down the application.

First let's take a look at the files you need to compile the program:

Figure 2 - Directory Content

The source file is RacorX.cpp, the resource files are winmain.rc and resource.h. The icon file is directx.ico and the executable is RacorX.exe. The remaining files are for the use of the Visual C/C++ 6 IDE.

To compile this example, you should link it with the following *.lib files:

d3d8.lib

d3dx8dt.lib

dxguid.lib

d3dxof.lib

winmm.lib

gdi32.lib

user32.lib

kernel32.lib

advapi32.lib

Most of these *.lib files are COM wrappers. The d3dx8dt.lib is the debug version of the Direct3DX static link library.

The release Direct3DX static link library is called d3dx8.lib. There is also a *.dll version of the debug build called d3dx8d.dll in the system32 directory. It is used by linking to the d3dx8d.lib COM wrapper.

All of these *.lib files have to be included in the <Object/library modules:> entry field. It is located under the <Link> tab of <Project->Settings>:

Figure 3 - Project Settings

The provided Visual C/C++ 6 IDE workspace references the common files in a folder with the same name:

Figure 4 - Workspace

They were added to the project with Project->Add to the Project->Files:

Figure 5 - Add Files to Project

The Common Files Framework

The common files framework helps you get up to speed quickly, because:

It helps to avoid how-tos for Direct3D in general, so that the focus of this text is the real stuff.

It's a common and tested foundation, which helps reduce the debug time.

All of the Direct3D samples in the DirectX SDK use it. Learning time is very short.

Its window mode makes debugging easier.

Self-developed production code could be based on the common files, so knowing them is always a win.

A high-level view of the Common Files shows 14 *.cpp files in

C:\DXSDK\samples\Multimedia\Common\src

These files encapsulate the basic functionality you need to start programming a Direct3D application. The most important file, d3dapp.cpp, contains the class CD3DApplication. It provides seven functions that can be overridden and that are used in the main *.cpp file of any project in this introduction:

virtual HRESULT OneTimeSceneInit() { return S_OK; }

virtual HRESULT InitDeviceObjects() { return S_OK; }

virtual HRESULT RestoreDeviceObjects() { return S_OK; }

virtual HRESULT DeleteDeviceObjects() { return S_OK; }

virtual HRESULT Render() { return S_OK; }

virtual HRESULT FrameMove( FLOAT ) { return S_OK; }

virtual HRESULT FinalCleanup() { return S_OK; }

All that has to be done to create an application based on this framework code is to create a new project and provide new implementations of these overridable functions in the main source file. This is also shown in all Direct3D examples in the DirectX SDK.

RacorX uses these framework functions in RacorX.cpp. They can be called the public interface of the common files framework.

Figure 6 - Framework Public Interface

The following functions are called in the following order in racorx.cpp at startup:

ConfirmDevice()

OneTimeSceneInit()

InitDeviceObjects()

RestoreDeviceObjects()

Now the application is running. While it is running, the framework calls

FrameMove()

Render()

in a loop.

If the user resizes the window, the framework will call

InvalidateDeviceObjects()

RestoreDeviceObjects()

If the user presses F2 or clicks <File>-><Change device> and changes the device by choosing for example another resolution or color quality, the framework will call

InvalidateDeviceObjects()

DeleteDeviceObjects()

InitDeviceObjects()

RestoreDeviceObjects()

If the user quits the application, the framework will call

InvalidateDeviceObjects()

DeleteDeviceObjects()

FinalCleanup()

There are matching functional pairs. InvalidateDeviceObjects() destroys what RestoreDeviceObjects() has built up and DeleteDeviceObjects() destroys what InitDeviceObjects() has built up. The FinalCleanup() function destroys what OneTimeSceneInit() built up.
The idea is to give every functional pair its own tasks. The OneTimeSceneInit() / FinalCleanup() pair is called once at the beginning and once at the end of the life-cycle of the game. Both are used to load or delete data that is not device dependent. A good candidate might be geometry data. The target of the InitDeviceObjects() / DeleteDeviceObjects() pair is, as the name implies, data that is device dependent. If already loaded data has to be changed when the device changes, it should be loaded here. The following examples will load, re-create or destroy their vertex and index buffers and their textures in these functions.
The InvalidateDeviceObjects() / RestoreDeviceObjects() pair has to react to changes of the window size. So, for example, code that handles the projection matrix might be placed here. Additionally, the following examples will set most of the render states in RestoreDeviceObjects().
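The call order described above can be traced with a small stand-alone sketch. MockFramework is a hypothetical stand-in and not part of the Common Files; it only records the names of the functions in the order in which the framework is described to call them, so no Direct3D device is needed to run it.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for CD3DApplication: it only records the order in
// which the framework would call the overridable functions.
class MockFramework {
public:
    std::vector<std::string> log;

    void Startup() {
        Call("ConfirmDevice");
        Call("OneTimeSceneInit");
        Call("InitDeviceObjects");
        Call("RestoreDeviceObjects");
    }
    void Frame() {
        Call("FrameMove");
        Call("Render");
    }
    void Resize() {
        Call("InvalidateDeviceObjects");
        Call("RestoreDeviceObjects");
    }
    void ChangeDevice() {
        Call("InvalidateDeviceObjects");
        Call("DeleteDeviceObjects");
        Call("InitDeviceObjects");
        Call("RestoreDeviceObjects");
    }
    void Shutdown() {
        Call("InvalidateDeviceObjects");
        Call("DeleteDeviceObjects");
        Call("FinalCleanup");
    }

private:
    void Call(const std::string& name) { log.push_back(name); }
};

// Runs a short session: start, one frame, one window resize, then quit.
inline std::vector<std::string> RunShortSession() {
    MockFramework app;
    app.Startup();
    app.Frame();
    app.Resize();
    app.Shutdown();
    return app.log;
}
```

Reading the returned log from top to bottom shows the pairing: every RestoreDeviceObjects() entry is eventually followed by an InvalidateDeviceObjects() entry, and FinalCleanup() comes last.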

Now back to RacorX. As shown in part 1 of this introduction, the following list tracks the life-cycle of a vertex shader:

Check for vertex shader support by checking the D3DCAPS8::VertexShaderVersion field

Declaration of the vertex shader with the D3DVSD_* macros, to map vertex buffer streams to input registers

Setting the vertex shader constant registers with SetVertexShaderConstant()

Compiling an already written vertex shader with D3DXAssembleShader*() (Alternatively: could be pre-compiled using a Shader Assembler)

Creating a vertex shader handle with CreateVertexShader()

Setting a vertex shader with SetVertexShader() for a specific object

Free vertex shader resources handled by the Direct3D engine with DeleteVertexShader()

We will walk step-by-step through this list in the following pages.

Check for Vertex Shader Support

The supported vertex shader version is checked in ConfirmDevice() in racorx.cpp:

HRESULT CMyD3DApplication::ConfirmDevice( D3DCAPS8* pCaps,
                                          DWORD dwBehavior,
                                          D3DFORMAT Format )
{
    if( (dwBehavior & D3DCREATE_HARDWARE_VERTEXPROCESSING ) ||
        (dwBehavior & D3DCREATE_MIXED_VERTEXPROCESSING ) )
    {
        if( pCaps->VertexShaderVersion < D3DVS_VERSION(1,1) )
            return E_FAIL;
    }
    return S_OK;
}

If the framework has already initialized hardware or mixed vertex processing, the vertex shader version is checked. If the framework initialized software vertex processing, the software implementation provided by Intel and AMD jumps in and a check of the hardware capabilities is not needed.
The globally available pCaps capability data structure is filled by the framework with a call to GetDeviceCaps(). pCaps->VertexShaderVersion holds the vertex shader version in a DWORD. The macro D3DVS_VERSION helps to check the version number. For example, support of at least vs.2.0 in hardware would be checked with D3DVS_VERSION(2,0).
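To make the DWORD comparison more concrete, here is a minimal sketch that restates the encoding of the D3DVS_VERSION macro from d3d8types.h (0xFFFE in the high word, major and minor version in the two low bytes) as a plain function. VsVersion() and SupportsVs() are hypothetical names used only for this illustration; real code should use the SDK macro.

```cpp
#include <cstdint>

// Local re-statement of the D3DVS_VERSION encoding:
// 0xFFFE in the high word, major version in bits 8..15, minor in bits 0..7.
constexpr std::uint32_t VsVersion(std::uint32_t major, std::uint32_t minor) {
    return 0xFFFE0000u | (major << 8) | minor;
}

// Hypothetical helper that mirrors the comparison done in ConfirmDevice():
// the reported caps value must be at least the requested version.
constexpr bool SupportsVs(std::uint32_t capsVersion,
                          std::uint32_t major, std::uint32_t minor) {
    return capsVersion >= VsVersion(major, minor);
}
```

Because the major version sits in the higher byte, a plain integer comparison orders the versions correctly, which is why ConfirmDevice() can simply use the < operator.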

After checking the hardware capabilities for vertex shader support, the vertex shader has to be declared.

Vertex Shader Declaration

Declaring a vertex shader means mapping vertex data to specific vertex shader input registers. The vertex shader declaration must therefore reflect the vertex buffer layout, because the vertex buffer has to deliver the vertex data in the declared order. The declaration used in this example program is very simple. The vertex shader will get the position data via v0.

// shader decl
DWORD dwDecl[] =
{
    D3DVSD_STREAM(0),
    D3DVSD_REG(0, D3DVSDT_FLOAT3 ), // D3DVSDE_POSITION,0
    D3DVSD_END()
};

The corresponding layout of the vertex buffer looks like this:

// vertex type
struct VERTEX
{
    FLOAT x, y, z; // The untransformed position for the vertex
};

// Declare custom FVF macro.
#define D3DFVF_VERTEX (D3DFVF_XYZ)

The position values will be stored in the vertex buffer and bound through the SetStreamSource() function to a device data stream port, which feeds the primitive processing functions (this is the Higher-Order Surfaces (HOS) stage or directly the vertex shader, depending on the usage of HOS; see the Direct3D pipeline in part 1).

We do not use vertex color here, so no color values are declared.
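The requirement that the declaration and the vertex structure agree can be checked at compile time. The following sketch assumes plain float members in place of FLOAT; PositionVertex and kVertexStride are hypothetical names for this illustration. It verifies that the structure is exactly three tightly packed floats, which is what a FLOAT3 register declaration expects to read.

```cpp
#include <cstddef>

// Same layout as the article's VERTEX struct, written with plain float.
struct PositionVertex {
    float x, y, z;  // untransformed position, read by the shader through v0
};

// A FLOAT3 declaration reads three consecutive floats per vertex, so the
// struct must not contain padding or extra members.
static_assert(sizeof(PositionVertex) == 3 * sizeof(float),
              "position must be three packed floats");
static_assert(offsetof(PositionVertex, x) == 0, "x must come first");
static_assert(offsetof(PositionVertex, y) == sizeof(float), "y follows x");
static_assert(offsetof(PositionVertex, z) == 2 * sizeof(float), "z follows y");

// The stride that would be passed to SetStreamSource() for this layout.
constexpr std::size_t kVertexStride = sizeof(PositionVertex);
```

If a color or normal is added to the structure later, the static_assert lines fail until the declaration and the checks are updated together.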

Setting the Vertex Shader Constant Registers

The vertex shader constant registers have to be filled with a call to SetVertexShaderConstant(). We set the material color in RestoreDeviceObjects() in c8 in this example:

// set material color
FLOAT fMaterial[4] = {0,1,0,0};
m_pd3dDevice->SetVertexShaderConstant(8, fMaterial, 1);

SetVertexShaderConstant() is declared like:

HRESULT SetVertexShaderConstant (DWORD Register, CONST void* pConstantData, DWORD ConstantCount);

The first parameter provides the number of the first constant register that should be used, in this case 8. The second parameter points to the 128-bit data that is stored in that constant register, and the third parameter gives the number of consecutive registers to fill, so the following registers can be used as well. A 4x4 matrix can be stored with one SetVertexShaderConstant() call by passing four in ConstantCount. This is done for the clipping matrix in FrameMove():

// set the clip matrix
...
m_pd3dDevice->SetVertexShaderConstant(4, matTemp, 4);

This way the c4, c5, c6 and c7 registers are used to store the matrix.
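The effect of the ConstantCount parameter can be modeled without a device. The sketch below is a hypothetical model of the constant register file, not the Direct3D implementation: ConstantFile, Set() and LoadExample() are names invented for this illustration, and the size of 96 registers is the minimum that vs.1.1 guarantees.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the constant register file: each register holds
// four floats (128 bits). 96 registers is the vs.1.1 minimum.
struct ConstantFile {
    static constexpr std::size_t kRegisters = 96;
    std::array<std::array<float, 4>, kRegisters> c{};

    // Mirrors the shape of SetVertexShaderConstant(Register, pData, Count):
    // copies count * 4 floats into consecutive registers starting at reg.
    void Set(std::size_t reg, const float* data, std::size_t count) {
        assert(reg + count <= kRegisters);
        for (std::size_t i = 0; i < count; ++i)
            std::memcpy(c[reg + i].data(), data + 4 * i, 4 * sizeof(float));
    }
};

// Fills c8 with the material color and c4..c7 with a 4x4 matrix whose
// sixteen floats are 0..15, so each register's content is easy to recognize.
inline ConstantFile LoadExample() {
    ConstantFile file;
    const float material[4] = {0.0f, 1.0f, 0.0f, 0.0f};
    file.Set(8, material, 1);

    float matrix[16];
    for (int i = 0; i < 16; ++i) matrix[i] = static_cast<float>(i);
    file.Set(4, matrix, 4);
    return file;
}
```

After LoadExample(), register c5 starts with the fifth float of the matrix, which shows that one call with a count of four spreads the sixteen floats row by row over c4, c5, c6 and c7 and leaves c8 untouched.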

The Vertex Shader

The vertex shader that is used by RacorX is very simple:

// reg c4-7 = WorldViewProj matrix
// reg c8 = constant color
// reg v0 = input register
const char BasicVertexShader[] =
"vs.1.1 //Shader version 1.1 \n"\
"dp4 oPos.x, v0, c4 //emit projected position \n"\
"dp4 oPos.y, v0, c5 //emit projected position \n"\
"dp4 oPos.z, v0, c6 //emit projected position \n"\
"dp4 oPos.w, v0, c7 //emit projected position \n"\
"mov oD0, c8 //material color = c8 \n";

It is stored inline in a constant char array in RacorX.cpp. This vertex shader follows the vs.1.1 vertex shader implementation rules. With the four dp4 instructions it transforms the vertex position into clip space, using the concatenated and transposed world, view and projection matrix, and with mov it writes a green material color to oD0.
As shown above, the values of the c4 - c7 constant registers are set in FrameMove(). These values are calculated by the following code snippet:

// rotates the object about the y-axis
D3DXMatrixRotationY( &m_matWorld, m_fTime * 1.5f );

// set the clip matrix
D3DXMATRIX matTemp;
D3DXMatrixTranspose( &matTemp , &(m_matWorld * m_matView * m_matProj) );
m_pd3dDevice->SetVertexShaderConstant(4, matTemp, 4);

First the quad is rotated around the y-axis by the D3DXMatrixRotationY() call, then the concatenated matrix is transposed and stored in the constant registers c4 - c7. The source of such a rotation function might look like this:

VOID D3DMatrixRotationY(D3DXMATRIX* mat, FLOAT fRads)
{
    D3DXMatrixIdentity(mat);
    mat->_11 = cosf(fRads);
    mat->_13 = -sinf(fRads);
    mat->_31 = sinf(fRads);
    mat->_33 = cosf(fRads);
}

=

cosf(fRads)  0  -sinf(fRads)  0
0            1  0             0
sinf(fRads)  0  cosf(fRads)   0
0            0  0             1

So fRads equals the amount you want to rotate about the y-axis. After changing the values of the matrix this way, we transpose the matrix by using D3DXMatrixTranspose(), so that its columns are stored as rows. Why do we have to transpose the matrix?
A 4x4 matrix looks like this:

a b c d
e f g h
i j k l
m n o p

The formula for transforming a vector (v0) through the matrix is:

dest.x = (v0.x * a) + (v0.y * e) + (v0.z * i) + (v0.w * m)
dest.y = (v0.x * b) + (v0.y * f) + (v0.z * j) + (v0.w * n)
dest.z = (v0.x * c) + (v0.y * g) + (v0.z * k) + (v0.w * o)
dest.w = (v0.x * d) + (v0.y * h) + (v0.z * l) + (v0.w * p)

So each component of the result is the dot product of the vector with one column of the matrix. Our vertex shader uses four dp4 instructions:

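dest = (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z) + (src1.w * src2.w)

The scalar result is written to the destination components selected by the write mask (oPos.x, oPos.y, oPos.z and oPos.w in our shader). Each dp4 instruction multiplies one row of the matrix, held in one constant register, with the vector component by component and sums the products. Without transposing we would end up with: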

dest.x = (v0.x * a) + (v0.y * b) + (v0.z * c) + (v0.w * d)
dest.y = (v0.x * e) + (v0.y * f) + (v0.z * g) + (v0.w * h)
dest.z = (v0.x * i) + (v0.y * j) + (v0.z * k) + (v0.w * l)
dest.w = (v0.x * m) + (v0.y * n) + (v0.z * o) + (v0.w * p)

which is wrong. By transposing the matrix it looks like this in constant memory:

a e i m
b f j n
c g k o
d h l p

so the 4 dp4 operations would now yield:

dest.x = (v0.x * a) + (v0.y * e) + (v0.z * i) + (v0.w * m)
dest.y = (v0.x * b) + (v0.y * f) + (v0.z * j) + (v0.w * n)
dest.z = (v0.x * c) + (v0.y * g) + (v0.z * k) + (v0.w * o)
dest.w = (v0.x * d) + (v0.y * h) + (v0.z * l) + (v0.w * p)

or

oPos.x = (v0.x * c4.x) + (v0.y * c4.y) + (v0.z * c4.z) + (v0.w * c4.w)
oPos.y = (v0.x * c5.x) + (v0.y * c5.y) + (v0.z * c5.z) + (v0.w * c5.w)
oPos.z = (v0.x * c6.x) + (v0.y * c6.y) + (v0.z * c6.z) + (v0.w * c6.w)
oPos.w = (v0.x * c7.x) + (v0.y * c7.y) + (v0.z * c7.z) + (v0.w * c7.w)

which is exactly how the vector transformation should work.

dp4 gets the matrix values via the constant registers c4 - c7 and the vertex position via the input register v0. Temporary registers are not used in this example. The dot products of the dp4 instructions are written to the oPos output register, and the value of the constant register c8 is moved into the output register oD0, which is usually used to output diffuse color values.
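The transpose argument can be verified on the CPU. The following sketch is an emulation under simple assumptions, not shader code: Dp4() stands for one dp4 instruction, ShaderTransform() for the four dp4 lines of the shader with regs[0..3] playing the role of c4 - c7, and MulRowVector() for the reference row-vector-times-matrix formula. All names and the example matrix are invented for this illustration.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][column]

// Emulates one dp4 instruction: a four-component dot product.
inline float Dp4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

inline Mat4 Transpose(const Mat4& m) {
    Mat4 t{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}

// Reference transformation: row vector times matrix, so each result
// component is the dot product of v with one column of m.
inline Vec4 MulRowVector(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            out[c] += v[r] * m[r][c];
    return out;
}

// What the shader does: one dp4 per constant register, where each
// register holds one row of whatever matrix was uploaded.
inline Vec4 ShaderTransform(const Vec4& v, const Mat4& regs) {
    return Vec4{{Dp4(v, regs[0]), Dp4(v, regs[1]),
                 Dp4(v, regs[2]), Dp4(v, regs[3])}};
}

// Arbitrary non-symmetric example matrix (a..p = 1..16).
inline Mat4 ExampleMatrix() {
    return Mat4{{Vec4{{1.0f, 2.0f, 3.0f, 4.0f}},
                 Vec4{{5.0f, 6.0f, 7.0f, 8.0f}},
                 Vec4{{9.0f, 10.0f, 11.0f, 12.0f}},
                 Vec4{{13.0f, 14.0f, 15.0f, 16.0f}}}};
}
```

Calling ShaderTransform() with Transpose(ExampleMatrix()) gives the same vector as MulRowVector() with the untransposed matrix, while ShaderTransform() with the untransposed matrix gives a different one. That is the reason for the D3DXMatrixTranspose() call in FrameMove().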

Compiling a Vertex Shader

The vertex shader that is stored in a char array is compiled with the following code snippet in RestoreDeviceObjects():

// Assemble the shader
rc = D3DXAssembleShader( BasicVertexShader, sizeof(BasicVertexShader) - 1,
                         0, NULL, &pVS, &pErrors );
if ( FAILED(rc) )
{
    OutputDebugString( "Failed to assemble the vertex shader, errors:\n" );
    OutputDebugString( (char*)pErrors->GetBufferPointer() );
    OutputDebugString( "\n" );
}

D3DXAssembleShader() creates a binary version of the shader in a buffer object via the ID3DXBuffer interface in pVS.

HRESULT D3DXAssembleShader(
    LPCVOID pSrcData,
    UINT SrcDataLen,
    DWORD Flags,
    LPD3DXBUFFER* ppConstants,
    LPD3DXBUFFER* ppCompiledShader,
    LPD3DXBUFFER* ppCompilationErrors );

The source data is provided in the first parameter and its length in bytes is provided in the second parameter. There are two possible flags for the third parameter:

#define D3DXASM_DEBUG 1
#define D3DXASM_SKIPVALIDATION 2

The first one inserts debug info as comments into the shader and the second one skips validation. The second flag can be set for a shader that is known to work.
Via the fourth parameter an ID3DXBuffer interface can be exported, to get a vertex shader declaration fragment of the constants. To ignore this parameter, it is set to NULL here. In case of an error, the error explanation is stored in a buffer object that is accessible via the ID3DXBuffer interface in pErrors. To see the output of OutputDebugString(), the debug process in the Visual C/C++ IDE must be started with <F5>.
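The length argument sizeof(BasicVertexShader) - 1 relies on the shader being a char array rather than a pointer. A minimal sketch of that detail follows; kShaderText and kShaderLength are hypothetical names and the shader text is shortened for the illustration.

```cpp
#include <cstddef>
#include <cstring>

// Shortened stand-in for BasicVertexShader; it must be declared as an
// array so that sizeof yields the size of the text, not of a pointer.
const char kShaderText[] =
    "vs.1.1 \n"
    "mov oD0, c8 \n";

// sizeof counts the terminating '\0' of the string literal, so one byte
// is subtracted to get the length of the source text itself.
constexpr std::size_t kShaderLength = sizeof(kShaderText) - 1;
```

If the shader text is held in a const char* instead (for example after loading it from a file), sizeof would return the pointer size, and the length has to come from strlen() or from the file size.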

Creating a Vertex Shader

The vertex shader is validated by a call to CreateVertexShader(), which returns a handle for it in m_dwVertexShader. The following lines of code can be found in RestoreDeviceObjects():

// Create the vertex shader
rc = m_pd3dDevice->CreateVertexShader( dwDecl, (DWORD*)pVS->GetBufferPointer(),
                                       &m_dwVertexShader, 0 );
if ( FAILED(rc) )
{
    OutputDebugString( "Failed to create the vertex shader, errors:\n" );
    D3DXGetErrorStringA(rc,szBuffer,sizeof(szBuffer));
    OutputDebugString( szBuffer );
    OutputDebugString( "\n" );
}

CreateVertexShader() gets a pointer to the binary version of the vertex shader, which is retrieved from the ID3DXBuffer interface with pVS->GetBufferPointer(). It also gets the vertex shader declaration via dwDecl, which maps vertex data to specific vertex shader input registers. If an error occurs, it is reported through the returned HRESULT. D3DXGetErrorStringA() interprets all Direct3D and Direct3DX HRESULTs and returns an error message in szBuffer.
It is possible to force the usage of software vertex processing with the last parameter by using the D3DUSAGE_SOFTWAREPROCESSING flag. It must be used when the D3DRS_SOFTWAREVERTEXPROCESSING member of the D3DRENDERSTATETYPE enumerated type is TRUE.

Setting a Vertex Shader

The vertex shader is set via SetVertexShader() in the Render() function:

// set the vertex shader
m_pd3dDevice->SetVertexShader( m_dwVertexShader );

The only parameter that must be provided is the handle to the vertex shader. Once it is set, the vertex shader is executed once for every vertex that is drawn.

Free Vertex Shader Resources

Vertex shader resources must be freed with a call to

if ( m_dwVertexShader != 0xffffffff )
{
    m_pd3dDevice->DeleteVertexShader( m_dwVertexShader );
    m_dwVertexShader = 0xffffffff;
}

This example frees the vertex shader resources in the InvalidateDeviceObjects() framework function, because this has to happen when the window size or the device changes.
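The 0xffffffff value acts as a sentinel for "no shader created", which makes the cleanup safe to call more than once. The sketch below shows the pattern with a hypothetical MockDevice that only counts delete calls; it is not the Direct3D device interface, and ReleaseShader() and the handle value 42 are invented for this illustration.

```cpp
#include <cstdint>

// Sentinel meaning "no shader handle", as used by the article's code.
constexpr std::uint32_t kInvalidHandle = 0xffffffffu;

// Hypothetical stand-in for the device: it only counts delete calls.
struct MockDevice {
    int deleteCalls = 0;
    void DeleteVertexShader(std::uint32_t /*handle*/) { ++deleteCalls; }
};

// Frees the shader once and resets the handle, so that a second call
// (for example a resize followed by quitting) does nothing.
inline void ReleaseShader(MockDevice& device, std::uint32_t& handle) {
    if (handle != kInvalidHandle) {
        device.DeleteVertexShader(handle);
        handle = kInvalidHandle;
    }
}

// Calls ReleaseShader() twice on the same handle and reports how often
// the device was actually asked to delete the shader.
inline int DeleteCallsAfterTwoReleases() {
    MockDevice device;
    std::uint32_t handle = 42;  // arbitrary valid-looking handle
    ReleaseShader(device, handle);
    ReleaseShader(device, handle);
    return device.deleteCalls;
}
```

Because InvalidateDeviceObjects() runs on every resize, device change and shutdown, resetting the handle to the sentinel after deletion keeps the same handle from being deleted twice.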

Non-Shader specific Code

The non-shader specific code of RacorX deals with setting render states and the handling of the vertex and index buffer. A few render states have to be set in RestoreDeviceObjects():

// z-buffer enabled
m_pd3dDevice->SetRenderState( D3DRS_ZENABLE, TRUE );

// Turn off D3D lighting, since we are providing our own vertex shader lighting
m_pd3dDevice->SetRenderState( D3DRS_LIGHTING, FALSE );

// Turn off culling, so we see the front and back of the quad
m_pd3dDevice->SetRenderState( D3DRS_CULLMODE, D3DCULL_NONE );

The first statement enables the z-buffer (a corresponding flag has to be set in the constructor of the Direct3D framework class, so that the device is created with a z-buffer).
The fixed-function lighting is not needed, so it is switched off with the second statement. To be able to see both sides of the quad, backface culling is switched off with the third statement.

The vertex and index buffers are created in InitDeviceObjects():

// create and fill the vertex buffer
// Initialize vertices for rendering a quad
VERTEX Vertices[] =
{
    //   x      y     z
    { -1.0f, -1.0f, 0.0f, },
    {  1.0f, -1.0f, 0.0f, },
    {  1.0f,  1.0f, 0.0f, },
    { -1.0f,  1.0f, 0.0f, },
};
m_dwSizeofVertices = sizeof (Vertices);

// Create the vertex buffers with four vertices
if( FAILED( m_pd3dDevice->CreateVertexBuffer( 4 * sizeof(VERTEX),
                                              D3DUSAGE_WRITEONLY, sizeof(VERTEX),
                                              D3DPOOL_MANAGED, &m_pVB ) ) )
    return E_FAIL;

// lock and unlock the vertex buffer to fill it with memcpy
VOID* pVertices;
if( FAILED( m_pVB->Lock( 0, m_dwSizeofVertices, (BYTE**)&pVertices, 0 ) ) )
    return E_FAIL;
memcpy( pVertices, Vertices, m_dwSizeofVertices);
m_pVB->Unlock();

// create and fill the index buffer
// indices
WORD wIndices[]={0, 1, 2, 0, 2, 3};
m_wSizeofIndices = sizeof (wIndices);

// create index buffer
if(FAILED (m_pd3dDevice->CreateIndexBuffer(m_wSizeofIndices, 0, D3DFMT_INDEX16,
                                           D3DPOOL_MANAGED, &m_pIB)))
    return E_FAIL;

// fill index buffer
VOID *pIndices;
if (FAILED(m_pIB->Lock(0, m_wSizeofIndices, (BYTE **)&pIndices, 0)))
    return E_FAIL;
memcpy(pIndices, wIndices, m_wSizeofIndices);
m_pIB->Unlock();

The four vertices of the quad are stored in an array of VERTEX structures, each of which holds three FLOAT values for the position.
By using the D3DFMT_INDEX16 format in CreateIndexBuffer(), each index is stored in 16 bits, which matches the WORD elements of the wIndices array. So an index can address at most 64 k (65,536) vertices. Both buffers use a managed memory pool with D3DPOOL_MANAGED, so they will be cached in the system memory.
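How the six indices describe the two triangles of the quad can be seen by expanding them on the CPU. This sketch uses plain standard types in place of VERTEX and WORD; QuadVertex, kQuadVertices, kQuadIndices, ExpandToTriangleList() and kMaxIndex16 are hypothetical names for this illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct QuadVertex {
    float x, y, z;
};

// The same four corners and six indices as in InitDeviceObjects().
const QuadVertex kQuadVertices[4] = {
    {-1.0f, -1.0f, 0.0f},
    { 1.0f, -1.0f, 0.0f},
    { 1.0f,  1.0f, 0.0f},
    {-1.0f,  1.0f, 0.0f},
};
const std::uint16_t kQuadIndices[6] = {0, 1, 2, 0, 2, 3};

// Expands the indexed quad into the equivalent non-indexed triangle list:
// six vertices, with corners 0 and 2 duplicated.
inline std::vector<QuadVertex> ExpandToTriangleList() {
    std::vector<QuadVertex> out;
    for (std::uint16_t index : kQuadIndices)
        out.push_back(kQuadVertices[index]);
    return out;
}

// Largest vertex number a 16-bit index can hold.
constexpr std::size_t kMaxIndex16 = std::numeric_limits<std::uint16_t>::max();
```

The expanded list needs six vertices where the indexed version stores four plus six small indices. For a single quad the saving is tiny, but on meshes in which most vertices are shared by several triangles it adds up.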

D3DPOOL_MANAGED resources are read from system memory, which is quite fast. They are written to system memory and afterwards uploaded to wherever the non-system copy has to go (AGP or video memory). This upload happens when the resource is unlocked. So there are always two copies of a resource, one in system memory and one in AGP or video memory. This is a less efficient but bullet-proof way. It works for any class of driver and must be used with unified memory architecture boards. Handling resources with D3DPOOL_DEFAULT is more efficient. In this case the driver will choose the best place for the resource.

Why do we use a vertex buffer at all? The vertex buffer can be stored in the memory of your graphics card or in AGP memory, where it can be accessed very quickly by 3-D hardware. So a copy between system memory and the graphics card/AGP memory can be avoided. This is important for hardware that accelerates transformation and lighting. Without vertex buffers, transforming and lighting the vertices would cause a lot of bus traffic.

Why do we use an index buffer? You will get the maximum performance when you reduce the duplication in vertices transformed and sent across the bus to the rendering device. A nonindexed triangle list, for example, achieves no vertex sharing, so it's the least optimal method, because DrawPrimitive*() is called several times. Using indexed lists or strips reduces the call overhead of the DrawPrimitive*() methods (reducing the number of DrawPrimitive*() calls is also called batching) and, because fewer vertices have to be sent through the bus, it saves memory bandwidth. Indexed strips are more hardware-cache friendly on newer hardware than indexed lists. The performance of index processing operations depends heavily on where the index buffer exists in memory. At the time of this writing, the only graphics cards that support index buffers in hardware are the RADEON 8x00 series.

Summary

RacorX shows a simple vertex shader together with its infrastructure. The shader is inlined in racorx.cpp and compiled with D3DXAssembleShader(). It uses four dp4 instructions for the transformation of the quad and only one material color.

The upcoming examples are built on this example and only the functional additions will be shown on the next pages.
