Using VertexBuffers With DirectX
by Erik "Wazoo" Yuzwa

Introduction

Even though DirectX 8 (and now 9.0) has been out for quite a while, a lot of people still seem to have problems getting used to the "proper" usage of VertexBuffers. Even a cursory glance through the DirectX forum reveals posts by confused programmers and game developers who are making the switch from OpenGL to Direct3D, or simply trying to figure out why their frame rate isn't as high as it could be. I will attempt to cover some known (and maybe not so known) tidbits of knowledge on VertexBuffers, and hopefully people will benefit from it.

To help those who need it, I decided to whip up a small tutorial to ease the suffering. (Note that I'm not really covering anything "new" here; I'm just compiling information gathered from MSDN and other online documentation.)

What Are Vertex Buffers?

Vertex Buffers are the mechanism Direct3D 8 uses to let the CPU and the GPU (of the video hardware) share the work of the rendering pipeline. They let us fill in vertex data with the CPU while the GPU processes an earlier batch of vertices, in effect giving us a small degree of parallel processing in our game.

So how does a vertex buffer help us compared with just allocating a hunk of memory to stick our vertex data in? Well, theoretically, the device driver optimizes a vertex buffer for faster access and flexibility within our rendering pipeline.

Static or Dynamic?

Vertex Buffers can be created in two forms: static and dynamic. Once a static vertex buffer is created, the device driver places it in an "optimal location". That location, chosen by the driver, makes switching between static vertex buffers as fast as possible.

Dynamic Vertex Buffers, on the other hand, are filled and tossed away every frame. One advantage of dynamic VBs is that you can build large batches of triangles to send off to the GPU, which according to both ATI and NVidia is the way to go for maximum performance. Note that even this point is debatable: the "Performance Optimizations" tips included with the SDK recommend using static vertex buffers wherever possible.

This article deals with Dynamic Vertex Buffers, as the majority of your vertex information will probably be changing throughout the lifetime of your scene. IMHO, even though the jury is still kinda out on dynamic vs. static, dynamic VBs are the way to go. In all of my DirectX 8 projects so far, I just create one dynamic VB that I empty and fill every frame to maximize the triangle batches I can send to the GPU.

Creation of Dynamic Vertex Buffers

The first step in optimizing our use of Vertex Buffers is to carefully examine how we create them. Both the DirectX SDK documentation and whitepapers from NVidia claim that improper initialization of Vertex Buffers will seriously impede the performance of your application.


The only CreateVertexBuffer parameters we need to worry about for this article are Usage and Pool. For dynamic vertex buffers, which hold primitives that change often in the scene, we specify the D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY flags for Usage and the D3DPOOL_DEFAULT flag for Pool. The DYNAMIC and WRITEONLY flags tell the driver that it may place the vertex buffer in AGP memory, since we write to it more often than we would static vertex data.
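
To make the flag choice concrete, here is a minimal sketch. The constants are stand-ins whose values mirror d3d9types.h so that it compiles without the SDK, and VbCreateArgs, dynamicVbArgs and looksLikeGoodDynamicVb are hypothetical names of my own, not part of Direct3D.

```cpp
#include <cstdint>

namespace sketch {
// Stand-in constants so the sketch compiles without the DirectX SDK.
// The values mirror d3d9types.h; a real project includes <d3d9.h> instead.
using DWORD = std::uint32_t;
constexpr DWORD D3DUSAGE_WRITEONLY = 0x00000008;
constexpr DWORD D3DUSAGE_DYNAMIC   = 0x00000200;
enum D3DPOOL { D3DPOOL_DEFAULT = 0, D3DPOOL_MANAGED = 1 };

// Hypothetical helper type: bundles the two creation parameters discussed above.
struct VbCreateArgs {
    DWORD   usage;
    D3DPOOL pool;
};

// The combination recommended in the text for a dynamic vertex buffer.
inline VbCreateArgs dynamicVbArgs() {
    return { D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, D3DPOOL_DEFAULT };
}

// Hypothetical sanity check: true only for DYNAMIC | WRITEONLY in the default pool.
inline bool looksLikeGoodDynamicVb(const VbCreateArgs& a) {
    const bool dynamic   = (a.usage & D3DUSAGE_DYNAMIC) != 0;
    const bool writeOnly = (a.usage & D3DUSAGE_WRITEONLY) != 0;
    return dynamic && writeOnly && a.pool == D3DPOOL_DEFAULT;
}
} // namespace sketch

// With the real SDK the values would be passed along these lines (Direct3D 9;
// the Direct3D 8 call has no final shared-handle argument):
//   device->CreateVertexBuffer(sizeInBytes, args.usage, fvf, args.pool, &vb, NULL);
```

A check like this is cheap to run once at startup, which is one way to act on the "triple-check your creation flags" advice.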

Locking / Unlocking Dynamic Vertex Buffers

In order to update the vertex information contained within a vertex buffer, we need to get a handle to the Vertex Buffer resource. Using the Lock method of the vertex buffer interface (IDirect3DVertexBuffer8 or IDirect3DVertexBuffer9), we signal the hardware that we wish to acquire a handle to the area in memory containing our primitive information. Note that while an area of memory is locked, no other area of memory containing primitive information can be touched. For this reason, we naturally keep Locks to a bare minimum.


Again, the most important parameter of this method is Flags, and here we have two options. The first, D3DLOCK_NOOVERWRITE, is used when we wish to keep the existing vertex information within the buffer. By specifying this flag, we promise not to overwrite the vertices already there, which lets us append more vertex information to the primitive data the vertex buffer already contains. The alternative is the D3DLOCK_DISCARD flag, which signals the device that we wish to throw away the current contents of the vertex buffer and start anew. Note that the VertexBuffer interface returns a new area of memory to us with this call, just in case there's a DMA conflict with the existing area of memory (i.e. the existing vertex buffer could still be in use by the GPU of the video hardware).

Once we are finished either appending additional vertices or creating new ones, we need to Unlock the area of memory.


Very simple, and no explanation needed. It just frees our handle to the area of video memory we were working with.
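
One way to keep every Lock paired with an Unlock, and to keep the locked region short-lived, is a small scope guard. This is only a sketch of my own: ScopedVbLock and FakeVertexBuffer are hypothetical names, and the guard assumes the Direct3D 9 shape of Lock (a void** out-parameter and a negative return value on failure).

```cpp
#include <cstddef>

namespace sketch {
// Hypothetical RAII guard: locks in the constructor and unlocks in the
// destructor, so the lock is held only for the enclosing scope.
// BufferT must expose Lock(offset, size, void**, flags) and Unlock(), returning
// a negative value on failure, which is the shape of IDirect3DVertexBuffer9.
template <typename BufferT>
class ScopedVbLock {
public:
    ScopedVbLock(BufferT& vb, unsigned offset, unsigned size, unsigned long flags)
        : vb_(vb), data_(nullptr) {
        if (vb_.Lock(offset, size, &data_, flags) < 0) data_ = nullptr;
    }
    ~ScopedVbLock() { if (data_ != nullptr) vb_.Unlock(); }

    ScopedVbLock(const ScopedVbLock&) = delete;
    ScopedVbLock& operator=(const ScopedVbLock&) = delete;

    void* data() const { return data_; } // nullptr if the lock failed

private:
    BufferT& vb_;
    void*    data_;
};

// Stand-in buffer so the sketch runs without the SDK; it only counts calls.
struct FakeVertexBuffer {
    unsigned char bytes[64];
    int locks = 0;
    int unlocks = 0;
    long Lock(unsigned, unsigned, void** out, unsigned long) {
        ++locks;
        *out = bytes;
        return 0;
    }
    long Unlock() { ++unlocks; return 0; }
};
} // namespace sketch
```

Declaring a ScopedVbLock inside a pair of braces, copying the vertices through data(), and letting the guard fall out of scope gives exactly one Lock and one Unlock, with the unlock happening as early as the braces allow.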

Some Common Usages of Vertex Buffers: Good and Bad

Now that we've basically gone over the preamble to using VertexBuffers, we should follow up with some clearer code examples that demonstrate the point I attempted to outline above: proper dynamic VB usage.

Example #1: The vanilla "OpenGL" way (OpenGL v1.1)

If you're coming from the world of OpenGL, then you might make some mistakes in using VBs. Consider this:

// OpenGL 1.1 method of drawing some particles
for (int i = 0; i < max_particles; i++) {
   glLoadIdentity();   // reset the modelview matrix
   glTranslatef(particles[i].x, particles[i].y, particles[i].z);   // move to the particle
   glBegin(GL_TRIANGLE_STRIP);   // one small strip per particle
   // vertex data here
   glEnd();
}

Not that there's anything wrong with that, but look what it might translate to in Direct3D.
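
As a rough sketch of that direct port, the block below walks the same per-particle loop but only counts the calls a real device would receive; the actual Direct3D calls appear as comments. CallLog and drawParticlesNaively are hypothetical names of my own, and nothing is rendered.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {
struct Particle { float x, y, z; };

// Hypothetical stand-in that only counts the calls a real device and
// vertex buffer would receive.
struct CallLog {
    std::size_t transforms = 0; // SetTransform(D3DTS_WORLD, ...) calls
    std::size_t locks      = 0; // Lock / Unlock pairs
    std::size_t draws      = 0; // DrawPrimitive calls
    std::size_t triangles  = 0; // triangles submitted in total
};

// Direct port of the OpenGL loop: one tiny strip per particle.
inline CallLog drawParticlesNaively(const std::vector<Particle>& particles) {
    CallLog log;
    for (const Particle& p : particles) {
        (void)p;            // the position would go into the world matrix
        ++log.transforms;   // device->SetTransform(D3DTS_WORLD, &world);
        ++log.locks;        // vb->Lock(...); write 4 vertices; vb->Unlock();
        ++log.draws;        // device->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2);
        log.triangles += 2; // a 4-vertex strip is two triangles
    }
    return log;
}
} // namespace sketch
```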


Looks perfectly reasonable, right? We might even try to set the StreamSource and VertexShader BEFORE the for loop in an attempt to increase performance, right?

This example is a good one to show, as it points out the bad usage of Vertex Buffers. Here we are only sending a measly 2 triangles to the GPU every iteration of the for loop. Not only does this waste state changes for EACH iteration of the loop, but we're nowhere NEAR flexing the muscle of our VertexBuffer. In fact, our application is almost entirely CPU-bound, as we are fiddling with the vertices in memory before sending off a paltry few to the rendering pipeline.

Example #2: Optimizing Example #1

Well now that we've seen so much about Vertex Buffers, let's take a good crack at speeding up the performance of the rendering mechanism we outlined in Example #1. Our approach will try to take into account the dynamic usage of a Vertex Buffer, and try to keep the GPU and CPU more in parallel. After all, we've just spent a lot of money on our new video hardware and want to sport it!
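
Here is a minimal sketch of the batching idea, under my own assumptions: each particle becomes two triangles whose positions are computed on the CPU (so no per-particle world matrix is needed), a std::vector stands in for the memory returned by Lock, and the real Direct3D calls are shown as comments. All names (writeQuad, drawParticlesBatched, DrawCall) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace sketch {
struct Particle { float x, y, z; };
struct Vertex   { float x, y, z; };

// Hypothetical record of one DrawPrimitive call.
struct DrawCall { std::size_t triangleCount; };

// Writes one particle as two triangles (6 vertices) around its position.
// Baking the position into the vertices replaces the per-particle world matrix.
inline void writeQuad(Vertex* out, const Particle& p, float h) {
    const Vertex tl{p.x - h, p.y + h, p.z}, tr{p.x + h, p.y + h, p.z};
    const Vertex bl{p.x - h, p.y - h, p.z}, br{p.x + h, p.y - h, p.z};
    out[0] = tl; out[1] = tr; out[2] = bl; // first triangle
    out[3] = bl; out[4] = tr; out[5] = br; // second triangle
}

// 'vbStorage' stands in for the memory returned by Lock; 'batchSize' is the
// number of particles the vertex buffer can hold at once.
inline std::vector<DrawCall> drawParticlesBatched(
        const std::vector<Particle>& particles,
        std::vector<Vertex>& vbStorage,
        std::size_t batchSize, float halfSize) {
    std::vector<DrawCall> calls;
    if (batchSize == 0) return calls;
    vbStorage.resize(batchSize * 6);
    for (std::size_t first = 0; first < particles.size(); first += batchSize) {
        const std::size_t n = std::min(batchSize, particles.size() - first);
        // vb->Lock(0, 0, &ptr, D3DLOCK_DISCARD);  // fresh memory for this batch
        for (std::size_t i = 0; i < n; ++i)
            writeQuad(&vbStorage[i * 6], particles[first + i], halfSize);
        // vb->Unlock();
        // device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, n * 2);
        calls.push_back(DrawCall{n * 2});
    }
    return calls;
}
} // namespace sketch
```

With 10 particles and a batch size of 4, this makes three draw calls instead of ten, and the gap widens quickly as the batch size grows.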


Okay, so the rendering loop is a teeny bit longer here than the OpenGL port. But we're not doing anything entirely complex here. Our goal is to use the GPU to our advantage, and render some vertex information while we continue filling up another area of video memory with the CPU.

The samples above could probably be used for rendering particles, but they can be easily modified to render simple triangle data of other objects, such as a terrain engine.

Example #3: The Microsoft Way

Yet another way to use Vertex Buffers is used in most of the samples contained within the SDK. While the algorithm is similar to the one used above, it differs in some subtle ways. Primarily, we simply append vertices to the existing vertex buffer using the D3DLOCK_NOOVERWRITE flag. Once we've finished appending a small chunk of vertices, we blast it to the GPU using the DrawPrimitive method. We then repeat this process of locking, appending, unlocking and blasting until we run out of space within the vertex buffer. The bonus of this method is that, once again, our GPU is able to render our primitive data while we're filling in some more vertices.
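
The lock, append, unlock, blast cycle can be sketched as follows. This is my own simplified model rather than SDK code: it only records which flag and start vertex each chunk would use, it assumes one point per vertex (as in a particle system) so that the primitive count equals the vertex count, and it switches to a discard once the buffer is out of space. streamInChunks, Submission and LockFlag are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace sketch {
// Stand-ins for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.
enum class LockFlag { NoOverwrite, Discard };

// Hypothetical record of one lock / fill / unlock / draw round trip.
struct Submission {
    LockFlag    flag;        // flag that would be passed to Lock
    std::size_t startVertex; // StartVertex that would be passed to DrawPrimitive
    std::size_t vertexCount; // vertices written and drawn in this chunk
};

// Streams 'totalVertices' through a buffer holding 'capacity' vertices,
// 'chunkSize' vertices at a time, and returns what would have been submitted.
inline std::vector<Submission> streamInChunks(std::size_t totalVertices,
                                              std::size_t chunkSize,
                                              std::size_t capacity) {
    std::vector<Submission> log;
    if (chunkSize == 0 || chunkSize > capacity) return log; // nothing sensible to do

    std::size_t cursor = 0; // next free vertex slot in the buffer
    for (std::size_t done = 0; done < totalVertices; ) {
        const std::size_t n = std::min(chunkSize, totalVertices - done);
        LockFlag flag = LockFlag::NoOverwrite; // append behind earlier chunks
        if (cursor + n > capacity) {           // out of space:
            flag = LockFlag::Discard;          // ask for a fresh area of memory
            cursor = 0;                        // and start again at the beginning
        }
        // Real code, roughly:
        //   vb->Lock(cursor * stride, n * stride, &ptr, flag);
        //   ...write n vertices through ptr...
        //   vb->Unlock();
        //   device->DrawPrimitive(D3DPT_POINTLIST, cursor, n);
        log.push_back(Submission{flag, cursor, n});
        cursor += n;
        done += n;
    }
    return log;
}
} // namespace sketch
```

Streaming 10 vertices in chunks of 4 through an 8-vertex buffer, for example, gives two appended chunks followed by one discard that restarts at vertex 0.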


Common Mistakes

Avoiding some of the more common mistakes with Vertex Buffers can boost performance in your app tremendously. Again, these gems are straight from the DX SDK documentation, but I'll drop them here for the lazy people.

Render your scene from front to back.
Minimize vertex buffer switching
Use triangle strips instead of lists and fans wherever possible
Batch, batch, batch!
Keep vertex buffer locking down to a minimum
Use D3DLOCK_DISCARD wherever possible
Triple-check Vertex Buffer creation flags!

I urge you to read the documents in the MSDN pertaining to Direct3D performance optimization for more tips!

Closing

Well, that's a pretty whirlwind tour of the usage of Vertex Buffers. Hopefully we all learned something along our journey, or at the very least, enough to get us interested in finding out more about them! If you are struggling to make your application faster than 10 FPS even though you're not doing much at all, chances are it's due to poor vertex buffer usage. Use them right and they scream, but use them wrong and they become quicksand.

If you have any questions or comments about this article, feel free to send me some email.

References

MSDN. Microsoft DirectX 8 Developer FAQ. February 2001.

MSDN. Microsoft DirectX 9 Developer FAQ. May 2003.

Huddy, Richard. D3D Optimization. GDC 2001.

Huddy, Richard. Basic Mistakes. GDC 2001.


Date this article was posted to GameDev.net: 5/26/2003
(Note that this date does not necessarily correspond to the date the article was written)
