An Overview of Microsoft's Direct3D 10 API
About This Article
This article is intended as a high-level overview for developers familiar with Direct3D 9 development. The contents are primarily an elaboration of the many personal notes I made whilst reading the information available to me as an MVP. Some of the details in this document will be familiar if you've watched the PDC presentations. It is also worth noting that the API is not yet finished; as far as the contents of this article are concerned it should remain accurate, but if you're reading this long after December 2005 then some parts might have changed.
Given the number of changes and the general complexity of Direct3D 10, this document won't be able to cover everything, but it'll hopefully give you a starting point. To go beyond the information covered in this article (or simply to try it out), make sure you get the latest DirectX SDK; as of December 2005 the DirectX 9 SDK also contains the Direct3D 10 Technical Preview.
I've divided the document into five sections:
About Direct3D 10
The most important point to realise with Direct3D 10 is that, whilst it retains many similarities with previous iterations, it was fundamentally redesigned from the ground up. For a start, it's intended for Windows Vista only; that is, you will not be running Direct3D 10 applications on Windows XP.
Amongst a number of other technologies, Windows Vista will be introducing the Vista Display Driver Model (VDDM). Graphical features and effects are a much bigger part of Windows Vista than they have been in previous Windows operating systems; as such, it requires the GPU to go beyond its current (primary) role as gaming hardware. Take a look at the "Aero Glass" GUI skin for a perfect example of this.
The GPU is to be viewed as a shared resource in the system, with multiple applications using and relying upon it, making stability a much more important factor. It's also worth noting that as GPUs become more powerful it is necessary to have a clean and efficient path for utilizing them. VDDM moves much more of the command scheduling and translation into 'user mode' and keeps only the essential parts in 'kernel mode', such that if the hardware or driver crashes it's possible for the system to effectively restart the driver/hardware and avoid taking the whole system down.
Sharing the GPU is a big part of VDDM, to the extent that video memory will be virtualized by the operating system. This will in turn allow for resource sharing across threads, which could become an important feature with the recent turn towards multi-programming. Another bonus of the GPU becoming a more central resource to the system is that the "lost device" scenario is gone, so applications no longer need to worry about handling it. However, there is a "device removed" state, which exists for the increasing number of laptops that come with docking stations.
Direct3D 10 also introduces the DirectX Graphics Infrastructure (DXGI): a common foundation for this new release as well as any subsequent versions (e.g. Direct3D 10.1, 10.2, 11, 12 etc.). Much of the basic low-level resource handling stays constant and common across most versions of Direct3D, so it has now been isolated from the core runtime. The benefit is a stable and consistent foundation for the API to be built upon; for application developers it should also allow different APIs (e.g. D3D10 and D3D11) to share resources.
The Programmable Pipeline
We've had some form of programmable pipeline for five years now, ever since Direct3D 8 back in 2000. Over the revisions since then it has become both more powerful and more flexible, and with Direct3D 10 it becomes the only choice. That's right: the fixed-function pipeline is history!
With the fixed-function pipeline gone it won't be too surprising to see a lot of complaints online; many people still seem quite happy with that style of programming. More importantly, it provides a much easier "step-up" into the world of graphics programming: you don't really need to understand what's happening internally to get some basic graphics running. Yet, at the same time, it becomes a source of confusion when moving over to the programmable pipeline, as it's not always entirely clear where the boundary between fixed-function and programmable lies. Moving away from the fixed-function hardware might make things initially more complicated for beginners, but in the long run it is by far the best way to learn. Being able to directly express algorithms and equations should make learning from one of the many textbooks much more straightforward.
The advantages of a programmable pipeline have been discussed many times across the internet and in print. Suffice to say that "one size fits all" doesn't really apply now that we have the desire for richer and more intense graphics. This has already made itself evident in recent titles, and it's likely to become even more prevalent in the individual "characteristics" of a game. With programmers directly expressing the equations and then exposing the parameters to artists, many subtle differences in the final images become possible.
With Direct3D 10 we have a new programmable unit, giving three in total: Vertex Shaders (VS), Geometry Shaders (GS) and Pixel Shaders (PS). All three form "Shader Model 4.0". Both vertex and pixel shaders are fundamentally the same as they have always been, but with a few added bells and whistles. However, the Geometry Shader is completely new and allows us to write code that operates on a per-primitive basis. Not only that, but it also allows us to add geometry procedurally, effectively extending the hardware to a whole new class of algorithms.
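As a flavour of what this looks like, here is a minimal pass-through geometry shader in the Shader Model 4.0 style shown in the Technical Preview. The structure and names here are illustrative, and as noted earlier the API isn't final, so the syntax may still change:

```hlsl
struct GSData
{
    float4 position : SV_POSITION;
    float2 texCoord : TEXCOORD0;
};

// Declare the maximum number of vertices this shader may emit per input primitive.
[maxvertexcount(3)]
void PassThroughGS( triangle GSData input[3], inout TriangleStream<GSData> output )
{
    // The GS sees a whole triangle at once. Here we simply re-emit it unchanged,
    // but we could just as easily emit extra primitives - or none at all.
    for( int i = 0; i < 3; i++ )
        output.Append( input[i] );
    output.RestartStrip();
}
```

The key point is the signature: the shader receives a complete primitive (three vertices for a triangle) and writes zero or more primitives to a stream, rather than transforming a single vertex.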
A powerful feature connected to Geometry Shaders is Stream Output (SO). Conventionally the graphics pipeline has moved in one direction: data gets fed in by the application and, via a number of steps, generates an image on the screen. Locking render targets is about as close as we have come to being able to retrieve the output of a given stage. The stream output mechanism allows the GS to circulate its results back to the Input Assembler (discussed further on) so that they can be re-processed. It doesn't exclusively have to circulate the data back, though: it can circulate and render at the same time by passing the output to both the rasterizer and the Input Assembler.
SO essentially allows for multi-pass geometry processing with minimal intervention by the CPU (good for parallelism). An example might be to create geometry in a first pass (Bezier patches and/or skinning) and then perform shadow-volume extrusion in a second pass.
Despite mentioning that the fixed-function methodology was dead, there are two major components in the pipeline that are essentially fixed function.
The Input Assembler (IA) is a refinement of a number of existing technologies; its responsibility is to take the index and vertex streams and compose the actual geometric data that is fed into the VS and GS components. At the simplest level this component takes the various primitive types (line list, triangle strip etc.) and constructs the actual triangles (remember that some primitive types share vertices). At the more complex level it'll be dealing with geometry instancing and stream-out data. A useful feature is the set of counters it generates as it walks through the geometry: vertex IDs and primitive IDs. These can be used further down the pipeline to vary processing (or source data) depending on their values.
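These counters are delivered to the shaders via system-generated semantics. A hypothetical vertex shader input using them might be declared like this (structure and member names are my own; the `SV_` semantic names are as per the Technical Preview):

```hlsl
struct VSInput
{
    float4 position   : POSITION;
    uint   vertexID   : SV_VertexID;    // counter generated by the IA per vertex
    uint   instanceID : SV_InstanceID;  // counter generated by the IA per instance
};
```

A primitive-level counter, `SV_PrimitiveID`, is similarly available to the GS and PS stages.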
The Output Merger (OM) is also fixed function and is the final stage in the pipeline. Its job is to take all of the results generated by the pipeline and merge them into the final pixel value that we see on the screen. It uses stencil values and depth values, along with multiple render targets and various blending functions, to create the final result.
The Direct3D 10 pipeline should allow for not only a wider class of algorithms (neural networks and physics on a GPU) and improved performance (single-pass cube-map rendering), but it should also allow application developers to offload more work onto the GPU.
An interesting consequence of the new attributes that the IA generates, together with the GS's ability to work at the triangle level, is GPU-selectable properties. It is quite conceivable that most (if not all) of a material system could be executed directly on the GPU. Consider a case where each triangle is given a primitive ID by the IA, which is then used by the GS or PS to look up a set of attributes (from an array provided as constants) that determines how the pixels are finally rendered. Whether this eliminates the need for material-based sorting in the application won't be known until developers get their hands on real Direct3D 10 hardware, but it definitely opens up the possibilities.
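A sketch of that idea, assuming a small table of per-material values uploaded by the application as constants (all of the names here are hypothetical, invented for illustration):

```hlsl
struct Material
{
    float4 diffuse;
    float4 specular;
};

// A small table of materials provided by the application as shader constants.
Material materials[16];

float4 MaterialPS( float4 pos    : SV_POSITION,
                   uint   primID : SV_PrimitiveID ) : SV_Target
{
    // Use the IA-generated primitive ID to select this triangle's material,
    // rather than having the application sort and batch draw calls by material.
    Material m = materials[primID % 16];
    return m.diffuse;
}
```

In a real system the primitive ID would more likely index into a per-mesh lookup rather than wrap around a fixed-size array, but the principle is the same: the pipeline selects the properties, not the CPU.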
Data inputs for the programmable pipeline are less strict in Direct3D 10, further blurring what the different components are actually capable of. Under Direct3D 9's shader model 3.0 it was possible (albeit prohibitively slow in practice) to sample some textures inside the vertex shader. This still exists in Direct3D 10, but courtesy of the more unified resource model it now works the way it probably always should have.
The unified resource model is supported by "views"; that is, different stages (or even separate uses of the same stage) can view the same resource in different ways. This makes complex resources a lot more flexible, which should simplify the application side of D3D programming as well as offload more work to the GPU and hopefully act as a performance optimization. Examples of this include interpreting a cube map as an array of six separate render targets and performing single-pass cube-map rendering; being able to use different mip-map levels as inputs/outputs should also help to avoid the "ping-pong" nature of down-sampling algorithms.
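The single-pass cube-map case relies on the GS being able to route each primitive to one of the six faces of the array view. This is done through a render-target array index written as a GS output; a sketch of the output structure (member names are illustrative, the `SV_` semantics as per the Technical Preview):

```hlsl
struct CubeGSOut
{
    float4 position : SV_POSITION;
    // Selects which element of the render-target array (i.e. which cube face)
    // the rasterizer sends this primitive to.
    uint   face     : SV_RenderTargetArrayIndex;
};
```

A GS would then emit each input triangle up to six times, once per face, each time transformed by that face's view matrix and tagged with the corresponding index.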
Direct3D 10 Application Development
Given the features and general information discussed so far in this article, it should be expected that application development will have to change to make use of the new API. This is true, but at the same time it's an evolution of what seasoned Direct3D programmers should be used to; the core theory remains the same.
Because Direct3D 10 targets a new operating system and driver model, and has been redesigned internally, it has only a few variable capabilities. This is no small change: under Direct3D 9, the D3DCAPS9 structure contains enumerations that cover a huge spectrum of hardware (as demonstrated by the 'CardCaps' files in the SDK). Initial application development with such a number of potential combinations was at the very least an interesting balancing act; testing the final product against all combinations was difficult if not impossible.
With fixed capabilities it is conceivable that testing will become more about performance than stability and correctness. This works well both for professional studios, who can reduce the overhead (time and cost) of testing, and for indie developers, who simply don't have access to such wide-ranging resources. It has been hinted that there will be more frequent point releases of Direct3D to allow for innovation in hardware features.
Given that the IHVs can no longer compete on who has the best feature set, it will come down to who has the best performance and quality. This does pose the question of whether performance-related capabilities will become the new problem: an IHV might be required to provide a feature, but whether it's actually usable is another issue entirely. In current Direct3D 9 development it is possible on some hardware to use "vertex texturing", but it is generally regarded as being too slow to rely on as a central component. With this in mind it's unlikely that a graphics engine can be abstracted completely from the hardware; multiple code paths (optimized for each IHV) will probably still exist.
Another big aspect of Direct3D that has changed with this release is the push for create-time validation of resources. On the surface this is about performance: moving any expensive checking to the time of creation and avoiding it each and every time a resource is used by the core rendering functions. Having resources and configurations verified at create time should allow application developers to write cleaner and simpler code in the core parts of an application; checking and handling of errors should become less of an intrusion into the core code. Also, by verifying resources when they are loaded, the application is better placed to handle any incorrect resources. With the increasing bias towards data-driven effects (more shaders, more textures) any additional 'safety' can't be a bad thing!
Render states have undergone some changes with Direct3D 10 that could require some changes to existing code bases. Simply put, they are now organised into a number of immutable state objects. Each of these object types encompasses the group of render states related to its part of the pipeline; for example, the D3D10_DEPTH_STENCIL_DESC structure describes the render states related to depth and stencil buffering. This makes things a bit easier to manage: it would be quite simple to create a number of these state objects, one per effect to be used. Where it could get difficult is for existing scene graphs and state-management systems that are finely tuned to the current Direct3D API. Efficient state management and ordered drawing have become the cornerstone of high-performance Direct3D applications, and if existing implementations are fine-grained down to individual states then this change towards immutable groups could invalidate much of that code.
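In effect-file form the grouping looks something like the following: a named block of depth/stencil states that is created once as a whole, rather than a series of individual SetRenderState() calls. The block name is my own, and as with everything in the Technical Preview the exact field names may change:

```hlsl
DepthStencilState EnableDepthNoStencil
{
    DepthEnable    = TRUE;
    DepthWriteMask = ALL;
    DepthFunc      = LESS_EQUAL;
    StencilEnable  = FALSE;
};
```

A technique's pass can then bind the whole group in one operation, which is what makes the immutable-group model cheap for the runtime but coarse-grained for finely tuned state sorters.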
HLSL and the Effects Framework
Direct3D 8's shaders were written entirely in assembly (which was itself still an intermediary step), and Direct3D 9 continued to allow this form but also introduced HLSL. With Direct3D 10 the use of assembly shaders has been almost eliminated: effect files and "pure" shaders are now expressed only in HLSL form. The output from the HLSL compiler will still be visible so that you can use it for sanity checking and debugging, but it won't be a valid input from the application. The compiler itself has been merged into the core Direct3D 10 runtime; it's no longer owned by D3DX as it was under Direct3D 9.
The HLSL-only approach ties into the previous point about create-time validation. By examining the input/output semantics, the runtime can verify that the output of one stage matches the input of the next, and not only produce an error indicating that the shader will fail but also eliminate the need to check shader linkage during the core rendering segments.
The fundamentals of HLSL shaders (and their related "fx" files) haven't changed, but there are a number of new instructions and functions available: refinements on what most developers will be familiar with. It is notable that, apart from a few specialist instructions (such as cut and emit in the GS), all three programmable units share a common set of instructions. 'Standard Annotations and Semantics' has featured in later Direct3D 9 updates and still exists in Direct3D 10, albeit more standardised and likely to be more widely used and relied upon. With annotations becoming a more basic part of the effects and HLSL systems, it should be more convenient to move effects between tools (e.g. authoring a shader effect inside 3DSMax and having it automatically picked up and used by your Direct3D application).
One of the bigger changes apparent in HLSL is the introduction of "constant buffers": these apply logical groupings to the constants that the application feeds into a stage of the pipeline. These buffers allow the API and hardware to manipulate constants in a much cleaner way; in current applications it is very possible to adversely affect performance with inefficient constant updates. Despite the way they appear, they are more of a semantic than a syntactic feature: they don't work like namespaces (it's still a flat global namespace), and constants are assigned to buffers based on their usage. For example, a number of constant buffers can be created to reflect how often their contents are updated (always constant, per frame, per user input, etc.).
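Grouping by update frequency might look something like this (the buffer and constant names are illustrative, not from the SDK):

```hlsl
cbuffer PerApplication   // filled once at start-up and never touched again
{
    float4x4 projection;
};

cbuffer PerFrame         // updated once per frame
{
    float4x4 view;
    float    time;
};

cbuffer PerObject        // updated for every draw call
{
    float4x4 world;
    float4   tint;
};
```

Because a buffer is updated as a unit, keeping rarely changing and frequently changing constants in separate buffers avoids re-uploading data that hasn't changed.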
HLSL now allows for a full set of integer instructions and operations, which behave much as you would expect from most programming languages: logical and bitwise operators. Given that stages in the pipeline can now access multiple resources, it can be useful to work in integer format so as to get correct addressing (and where filtering by floating-point addressing is not desirable). With clean and direct access to raw binary data (via bitwise operations), it will be interesting to see what geometry compression systems can be mapped directly to the GPU hardware.
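As a simple illustration of the sort of thing this enables, a shader could unpack four 8-bit channels from a single 32-bit integer using shifts and masks (a hypothetical helper function, not part of the SDK):

```hlsl
float4 UnpackColour( uint packed )
{
    // Extract each 8-bit channel with a shift and a mask,
    // then rescale from the [0,255] range to [0,1].
    float4 colour;
    colour.r = (float)((packed >> 24) & 0xFF);
    colour.g = (float)((packed >> 16) & 0xFF);
    colour.b = (float)((packed >>  8) & 0xFF);
    colour.a = (float)( packed        & 0xFF);
    return colour / 255.0;
}
```

Under Shader Model 3.0 and earlier this kind of bit-level manipulation had to be faked with floating-point arithmetic, if it was possible at all.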
Direct3D 10 shaders have no instruction count limit. Looping and extended execution times can be accommodated by the new Vista driver model: because the GPU is considered a shared resource, the drivers and OS monitor situations where the GPU is deemed to have "hung" and requires resetting.
The effects framework has undergone a complete redesign, in part so that it matches the changes in the API, but also to take advantage of the opportunity that a completely new API offers.
The memory footprint for effect files has been reduced by internal restructuring, and changes to the reflection system allow application developers to query for information and then discard whatever is no longer required.
Access to the internals of an effect file (constants/parameters, techniques etc.) has been refined and is now through a series of lightweight interfaces rather than through the slightly complicated handle system currently employed by Direct3D 9. Performance-wise, updating and applying changes is in line with the other design changes explained in this article: fast! Validation is done at load time such that no runtime checking is required; consequently, applying a change becomes a constant-time operation.
Related to both of these is the ability to have "effect pools"; it is advised that you load all effects into a pool such that any common information can be efficiently stored and updated.
Preparing for Direct3D 10
The article you've just been reading has focussed on the changes and enhancements in Direct3D 10, with reference to existing technologies where applicable. For this final section the focus reverses: instead of what is coming, it covers some of the issues that application writers might need to consider today.
The first big change is going to be to stop relying on fixed-function processing. Obviously there will be a lot of legacy code around that uses this functionality, but a plan to deprecate it and slowly phase it out would be a good idea. Any new Direct3D code should be written entirely in the form of shaders (it doesn't have to be via the Effect framework). Remember that FX files (via the Effect framework) can contain fixed-function statements, so it's worth checking those as well.
Following on from this, you should also look at any shaders written in assembly code. As mentioned in the main body of the article, the new Direct3D runtime deals with HLSL only, making your ASM code obsolete. Given that most assembly shaders target the older 1.x models, it shouldn't be too hard to write an equivalent HLSL version. Going forward, it would be best to start writing all shaders in HLSL so that they are Direct3D 10 ready.
Consider making use of the new pipeline-generated values (VertexID, PrimitiveID and InstanceID). Their values can almost be simulated by attaching per-vertex attributes under Direct3D 9, but this might be something you have to wait for Direct3D 10 to use properly. The reason for considering these attributes is that, with a bit of creative thinking, they can be used to simplify application-level code by getting the programmable pipeline to select properties for you.
When thinking about the new geometry shader functionality it is probably best not to think of it as a programmable unit for higher-order surfaces (e.g. Bezier patches) or compressed meshes. Whilst the GS has much of the functionality required for these tessellation algorithms, it is quite likely that the first round of Direct3D 10 compliant hardware won't have the necessary performance to make heavy use of it. As hinted at previously in this article, it is probably worth utilizing the GS more as an optimization (single-pass rendering to multiple render targets) and as a means of gathering additional information about what is being rendered. For example, it's possible to compute plane equations and surface (rather than vertex) properties.
Combined with the increased availability of information inside the pipeline and the new grouping of render states, it will be worth reconsidering how any scene graphs and state-management algorithms work. Emulating state objects should be simple enough, and designing your algorithm(s) so that they can take advantage of the pipeline would definitely be beneficial for any future upgrades.
As you could quite rightly expect, Windows XP and DirectX 9 are still going to be around for some time yet. Officially, Windows XP will cease to be supported at the end of 2006, but that doesn't mean it'll just vanish from home users' machines (it may well vanish very quickly from corporate networks, though). Application compatibility is a big factor in a new OS, such that DirectX 9 applications built for Windows XP will still work under Windows Vista, and there will even be an updated version of DirectX 9 to take advantage of the new driver model in Vista ("DirectX 9.L").
For the period running up to Windows Vista (currently estimated somewhere towards the end of 2006), it makes sense to continue developing for DirectX 9 with an eye towards targeting DirectX 9.L (as and when details are available). If you follow the guidelines in this article (as well as any others you can find), then updating to be Direct3D 10 compatible (or making use of its features) should be fairly straightforward.
About The Author
Jack Hoxley is a Microsoft Most Valuable Professional (MVP) for DirectX. You can view his full profile here. He maintains a developer journal on the game development website GameDev.net and answers questions in their forums under the alias "Jollyjeffers". You can contact Jack by email at the following address: Jack *dot* Hoxley *at* f1cm *dot* co *dot* uk.
13th December 2005