An Overview of Microsoft's Direct3D 10 API
About This Article
This article is intended as a high-level overview for developers familiar with Direct3D 9 development. The contents are primarily an elaboration of the many personal notes I made whilst reading all of the information available to me as an MVP. Some of the details in this document will be familiar if you've watched the PDC presentations. It is also worth noting that the API is not finished yet. As far as the contents of this article are concerned it should remain accurate, but it's worth realising that if you're reading this long after December 2005 then some parts might have changed.
Given the number of changes and the general complexity of Direct3D 10, this document won't be able to cover everything, but it'll hopefully give you a starting point. To go beyond the information covered in this article (or simply to try it out), make sure you get the latest DirectX SDK; as of December 2005 the DirectX 9 SDK also contains the Direct3D 10 Technical Preview.
I've divided the document into five sections:
About Direct3D 10
The most important point to realise with Direct3D 10 is that, whilst it retains many similarities with previous iterations, it was fundamentally redesigned from the ground up. For a start, it's intended to be for Windows Vista only; that is, you will not be running Direct3D 10 applications on Windows XP.
Amongst a number of other technologies, Windows Vista will be introducing the Vista Display Driver Model (VDDM). Graphical features and effects are a much bigger part of Windows Vista than they have been in previous Windows operating systems; as such, it requires the GPU to go beyond its current (primary) role as gaming hardware. Take a look at the "Aero Glass" GUI skin for a perfect example of this.
The GPU is to be viewed as a shared resource in the system, with multiple applications using and relying upon it, which makes stability a much more important factor. It's also worth noting that as GPUs become more powerful it is necessary to have a clean and efficient path for utilizing them. VDDM moves much more of the command scheduling and translation into 'user mode' and keeps only the essential parts in 'kernel mode', such that if the hardware or driver crashes it's possible for the system to effectively restart the driver/hardware and avoid taking the whole system down.
Sharing the GPU is a big part of VDDM, to the extent that the video memory will be virtualized by the operating system. This will in turn allow for resource sharing across threads, which could become an important feature with the recent turn towards multi-programming. Another bonus of the GPU becoming a more central resource to the system is that the "lost device" scenario is gone, so applications no longer need to worry about handling it. However, there is a "device removed" state, which exists for the increasing number of laptops that come with docking stations.
Direct3D 10 also introduces the DirectX Graphics Infrastructure (DXGI), a common foundation for this new release as well as any subsequent versions (e.g. Direct3D 10.1, 10.2, 11, 12 etc.). Many of the basic low-level resources and operations stay constant and common across most versions of Direct3D, so they've now been isolated from the core runtime. The benefit is that there is a stable and consistent foundation for the API to be based upon, and for application developers it should allow different APIs (e.g. D3D10 and D3D11) to share resources.
The Programmable Pipeline
We've had some form of programmable pipeline for five years now, ever since Direct3D 8 back in the year 2000. Over the number of revisions since then it has become both more powerful and more flexible, and with Direct3D 10 it becomes the only choice. That's right: the fixed function pipeline is history!
With the fixed function pipeline gone it won't be too surprising to see a lot of complaints online; many people still seem quite happy with the style of programming. More importantly, it provides for a much easier "step-up" into the world of graphics programming: you don't really need to understand what's happening internally to get some basic graphics running. Yet, at the same time, it becomes a source of confusion when it comes to moving over to the programmable pipeline, as it's not always entirely clear where the boundary between fixed-function and programmable exists. Moving away from the fixed function hardware might make it initially more complicated for beginners, but in the long run it is by far the best way to learn. Being able to directly express algorithms and equations should make learning from one of the many textbooks much more straightforward.
The advantages of a programmable pipeline have been discussed many times across the internet and printed media. Suffice to say that "one size fits all" doesn't really apply now that we have the desire for richer and more intense graphics. This has already made itself evident in recent titles, and individual "characteristics" of a game are likely to become even more prevalent. With the programmers directly expressing the equations and then exposing the parameters to artists, many subtle differences in the final images become possible.
With Direct3D 10 we have a new programmable unit, giving three in total: Vertex Shaders (VS), Geometry Shaders (GS) and Pixel Shaders (PS). All three form "Shader Model 4.0". Both vertex and pixel shaders are fundamentally the same as they always have been, but with a few added bells and whistles. However, the Geometry Shader is completely new and allows us to write code that operates on a per-primitive basis. Not only that, but it also allows us to add geometry procedurally, effectively extending the hardware to a whole new class of algorithm.
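To make the "per-primitive" idea a little more concrete, here is a minimal CPU-side sketch in C++. It is not Direct3D code (a real geometry shader is written in HLSL and runs on the GPU); the names `Vec3`, `Triangle`, `geometryStage` and `runGeometryStage` are my own inventions for illustration. It only models the shape of the stage: one call per input triangle, with each call free to emit several output triangles (here a simple four-way subdivision).

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
using Triangle = std::array<Vec3, 3>;

// Helper: point halfway between two vertices.
static Vec3 midpoint(const Vec3& a, const Vec3& b) {
    return Vec3{ (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f };
}

// Hypothetical CPU model of a geometry shader: it is invoked once per
// primitive and may append any number of primitives to the output stream.
// Here every input triangle is split into four smaller ones.
void geometryStage(const Triangle& in, std::vector<Triangle>& out) {
    const Vec3 ab = midpoint(in[0], in[1]);
    const Vec3 bc = midpoint(in[1], in[2]);
    const Vec3 ca = midpoint(in[2], in[0]);
    out.push_back(Triangle{ in[0], ab, ca });
    out.push_back(Triangle{ ab, in[1], bc });
    out.push_back(Triangle{ ca, bc, in[2] });
    out.push_back(Triangle{ ab, bc, ca });
}

// Runs the stage over a whole stream of primitives.
std::vector<Triangle> runGeometryStage(const std::vector<Triangle>& input) {
    std::vector<Triangle> output;
    for (const Triangle& t : input) {
        geometryStage(t, output);
    }
    return output;
}
```

Feeding one triangle through `runGeometryStage` yields four, which is the "adding geometry procedurally" part; a stage that pushed nothing for some inputs would model discarding primitives instead.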
A powerful feature connected to Geometry Shaders is Stream Output (SO). Conventionally the graphics pipeline has moved in one direction: data gets fed in by the application and, via a number of steps, generates an image on the screen. Locking render targets is about as close as we have come to retrieving the outputs of a given stage. The stream output mechanism allows the GS to circulate its results back to the Input Assembler (discussed further on) such that they can be re-processed. It doesn't have to circulate them back exclusively, though: it can circulate and render by passing the output to both the rasterizer and the Input Assembler.
SO essentially allows for multi-pass geometry processing with minimal intervention by the CPU (good for parallelism). An example of this might be creating geometry in the first pass (Bezier patches and/or skinning) and then doing shadow-volume extrusion in a second pass.
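The two-pass flow can be sketched on the CPU as follows. This is only a model under my own assumptions: `runPassWithStreamOutput`, `translatePass` and `extrudePass` are hypothetical names, the "skinning" pass is reduced to a fixed translation and the "extrusion" pass simply duplicates each vertex at an offset. The point is the data flow: the buffer written by the first pass becomes the input of the second without the application touching individual vertices in between.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Vertex { float x, y, z; };
using VertexBuffer = std::vector<Vertex>;

// A pass receives one input vertex and appends its results to the
// stream-output buffer (hypothetical signature, for illustration only).
using GeometryPass = std::function<void(const Vertex&, VertexBuffer&)>;

// Models one trip through the pipeline with stream output enabled:
// everything the pass emits is captured in a new buffer.
VertexBuffer runPassWithStreamOutput(const VertexBuffer& input,
                                     const GeometryPass& pass) {
    VertexBuffer streamOut;
    for (const Vertex& v : input) {
        pass(v, streamOut);
    }
    return streamOut;
}

// Pass 1, a stand-in for skinning: move every vertex by a fixed offset.
void translatePass(const Vertex& v, VertexBuffer& out) {
    out.push_back(Vertex{ v.x + 1.0f, v.y, v.z });
}

// Pass 2, a stand-in for shadow-volume extrusion: keep the vertex and
// add a copy pushed a long way down the y axis.
void extrudePass(const Vertex& v, VertexBuffer& out) {
    out.push_back(v);
    out.push_back(Vertex{ v.x, v.y - 10.0f, v.z });
}

// The output of pass 1 is circulated straight back in as the input of pass 2.
VertexBuffer runTwoPasses(const VertexBuffer& source) {
    const VertexBuffer generated = runPassWithStreamOutput(source, translatePass);
    return runPassWithStreamOutput(generated, extrudePass);
}
```

On real hardware the intermediate buffer would stay in video memory, which is where the saving in CPU intervention comes from; the sketch cannot show that, only the ordering of the passes.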
Although I said that the fixed-function methodology was dead, there are two major components in the pipeline that are essentially fixed function.
The Input Assembler (IA) is a refinement of a number of existing technologies. Its responsibility is to take the index and vertex streams and compose the actual geometric data that is fed into the VS and GS components. At the simplest level this component takes the various primitive types (line list, triangle strip etc.) and constructs the actual triangles (remember that some primitive types share vertices). At the more complex level it'll be dealing with geometry instancing and stream-out data. A useful feature is that it generates a set of counters as it walks through the geometry: vertex IDs and primitive IDs. These can be used further down the pipeline to vary processing (or source data) depending on the result.
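As a rough illustration of the simplest case, the sketch below expands a triangle strip into individual triangles on the CPU and tags each one with a running primitive ID. The types and the function name `assembleTriangleStrip` are hypothetical, and swapping the last two vertices of every odd triangle is one common convention for keeping the winding order consistent; I'm assuming it here rather than quoting the hardware's behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One assembled primitive: a counter value plus the three vertex IDs
// (indices into the vertex stream) that make up the triangle.
struct AssembledTriangle {
    std::uint32_t primitiveId;
    std::array<std::uint32_t, 3> vertexIds;
};

// Expands a triangle strip: N indices produce N - 2 triangles, with
// neighbouring triangles sharing two vertices.
std::vector<AssembledTriangle> assembleTriangleStrip(
        const std::vector<std::uint32_t>& indices) {
    std::vector<AssembledTriangle> out;
    for (std::size_t k = 0; k + 2 < indices.size(); ++k) {
        const bool odd = (k % 2) == 1;
        AssembledTriangle t;
        t.primitiveId = static_cast<std::uint32_t>(k);
        // Odd triangles swap their last two vertices so that every
        // triangle in the strip faces the same way.
        t.vertexIds = odd
            ? std::array<std::uint32_t, 3>{ indices[k], indices[k + 2], indices[k + 1] }
            : std::array<std::uint32_t, 3>{ indices[k], indices[k + 1], indices[k + 2] };
        out.push_back(t);
    }
    return out;
}
```

A strip of four indices `{0, 1, 2, 3}` therefore becomes two triangles, with primitive IDs 0 and 1, that later stages could use to vary their processing.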
The Output Merger (OM) is fixed function and also the final stage in the pipeline. Its job is to take all of the results generated by the pipeline and merge them into the final pixel value that we see on the screen. It uses the stencil and depth values, together with multiple render targets and various blending functions, to create the final result.
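The merge step can be pictured with a small CPU model. This sketch makes several simplifying assumptions of my own: a single render target, no stencil test, a "less than" depth comparison and conventional source-alpha blending. The names `Color`, `TargetPixel` and `outputMerge` are hypothetical and do not come from the API.

```cpp
#include <cassert>

struct Color { float r, g, b, a; };

// What the render target and depth buffer currently hold for one pixel.
struct TargetPixel {
    Color color;
    float depth;
};

// Merges one incoming pixel-shader result into the target.
// Returns true if the incoming value passed the depth test and was written.
bool outputMerge(TargetPixel& dst, const Color& src, float srcDepth) {
    // Depth test (assumed "less"): anything at or behind the stored
    // depth is rejected and leaves the target untouched.
    if (!(srcDepth < dst.depth)) {
        return false;
    }
    // Blend (assumed src * alpha + dst * (1 - alpha)).
    const float a = src.a;
    dst.color.r = src.r * a + dst.color.r * (1.0f - a);
    dst.color.g = src.g * a + dst.color.g * (1.0f - a);
    dst.color.b = src.b * a + dst.color.b * (1.0f - a);
    dst.color.a = a + dst.color.a * (1.0f - a);
    dst.depth = srcDepth;
    return true;
}
```

The real stage is configured through state rather than written as code, which is what makes it "fixed function"; the comparison and blend equation above are just one of the combinations such state could select.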
The Direct3D 10 pipeline should allow not only for a wider class of algorithms (neural networks and physics on a GPU) and improved performance (single pass cube map rendering), but also for application developers to offload more work to the GPU.
An interesting consequence of the new attributes that the IA generates, as well as the GS's ability to work at the triangle level, is that of GPU-selectable properties. It is quite conceivable that most (if not all) of a material system can be executed directly on the GPU. Consider a case where each triangle is given a Primitive ID by the IA, which the GS or PS then uses to look up a set of attributes, from an array provided as a set of constants, that determines how the pixels are finally rendered. Whether this eliminates the need for material-based sorting in the application won't be known until developers get their hands on some real Direct3D 10 hardware, but it definitely opens up the possibilities.
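The lookup described above might be modelled like this. It is a CPU-side sketch, not shader code, and the layout is purely my assumption: one table maps each primitive ID to a material index, and a second array (standing in for shader constants) holds the materials themselves. `Material`, `MaterialTable`, `materialForPrimitive` and `shadeRed` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Material {
    float diffuseR, diffuseG, diffuseB;
    float specularPower;
};

struct MaterialTable {
    std::vector<Material> materials;                 // stand-in for a constant array
    std::vector<std::uint32_t> materialOfPrimitive;  // indexed by primitive ID
};

// Selects the material for a triangle purely from its primitive ID.
// at() throws std::out_of_range if either index is invalid.
const Material& materialForPrimitive(const MaterialTable& table,
                                     std::uint32_t primitiveId) {
    const std::uint32_t materialIndex = table.materialOfPrimitive.at(primitiveId);
    return table.materials.at(materialIndex);
}

// A toy "pixel shader": diffuse-only red channel for the given triangle,
// where nDotL is the clamped cosine between normal and light direction.
float shadeRed(const MaterialTable& table, std::uint32_t primitiveId, float nDotL) {
    const Material& m = materialForPrimitive(table, primitiveId);
    const float lit = nDotL > 0.0f ? nDotL : 0.0f;
    return m.diffuseR * lit;
}
```

Because the material is chosen per triangle inside the draw, triangles with different materials could in principle share one draw call, which is where the question about material-based sorting comes from.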
Data inputs for the programmable pipeline are less strict in Direct3D 10, further blurring the boundaries of what the different components are actually capable of. Under Direct3D 9's shader model 3.0 it was possible (albeit performance made it prohibitive) to sample some textures inside the vertex shader. This still exists in Direct3D 10, but courtesy of the more unified resource model it is now what it probably should always have been.
The unified resource model is supported by "views"; that is, different stages (or even separate uses of the same stage) can view the same resource in different ways. This allows complex resources to become a lot more flexible, which should simplify the application side of D3D programming as well as offload more work to the GPU and hopefully act as a performance optimization. Examples of this include interpreting a cube-map as an array of 6 separate render targets and performing single pass cube-map rendering; being able to use different mip-map levels as inputs/outputs should also help to avoid the "ping-pong" nature of down-sampling algorithms.
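One way to picture a view is as a small descriptor that selects part of a larger resource without copying it. The sketch below assumes, purely for illustration, that a texture array is stored as one flat block of texels, slice after slice, with each slice holding its full mip chain; real hardware layouts are opaque and will differ. `TextureArrayDesc`, `SubresourceView` and `makeView` are hypothetical names, not API types.

```cpp
#include <cassert>
#include <cstddef>

// Describes the whole resource, e.g. a cube-map is six slices.
struct TextureArrayDesc {
    std::size_t width;
    std::size_t height;
    std::size_t sliceCount;
    std::size_t mipCount;
};

// A "view": where one slice/mip pair starts and how big it is.
struct SubresourceView {
    std::size_t firstTexel;
    std::size_t width;
    std::size_t height;
};

// Each mip level halves the extent, never dropping below one texel.
std::size_t mipExtent(std::size_t baseExtent, std::size_t mip) {
    const std::size_t extent = baseExtent >> mip;
    return extent > 0 ? extent : 1;
}

// Number of texels occupied by mip levels 0 .. mip-1 of a single slice.
std::size_t texelsBeforeMip(const TextureArrayDesc& desc, std::size_t mip) {
    std::size_t total = 0;
    for (std::size_t m = 0; m < mip; ++m) {
        total += mipExtent(desc.width, m) * mipExtent(desc.height, m);
    }
    return total;
}

// Builds a view onto one slice and mip level of the same underlying memory.
SubresourceView makeView(const TextureArrayDesc& desc,
                         std::size_t slice, std::size_t mip) {
    assert(slice < desc.sliceCount && mip < desc.mipCount);
    const std::size_t texelsPerSlice = texelsBeforeMip(desc, desc.mipCount);
    return SubresourceView{
        slice * texelsPerSlice + texelsBeforeMip(desc, mip),
        mipExtent(desc.width, mip),
        mipExtent(desc.height, mip)
    };
}
```

Six views with `mip == 0` and `slice` running from 0 to 5 would correspond to treating a cube-map as six render targets, while views onto successive mip levels of one slice would correspond to reading one level and writing the next during down-sampling.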