The Future of PC Gaming – The Possibilities of Direct3D 10
APRIL 24, NEW YORK – The excitement was palpable as Will Willis, Senior PR Manager for ATI Technologies, Inc., opened the door into the suite at the W Hotel in Times Square where the company was giving a preview presentation under the ostentatious title, "The Future of PC Gaming – The Possibilities of DirectX 10." Dominating the wall that came into view as the door opened was a simply mammoth flat-screen HDTV, with the oh-so-familiar Windows XP desktop on display. Palpable anticipation…
I turned to receive introductions and warm handshakes from Guennardi Rigeur, a Senior ISV Engineer, and Bob Drevin, who has the fascinating title "ATI Fellow" and the aura of an evangelist – which, I quickly discover, is about right. Bob explains that he helps craft ATI's technology directions and represents the company to Microsoft's DirectX architectural group. Guennardi then tells me that he and other ISV Engineers often spend time on site at game studios, working on production code and helping developers squeeze every last bit of performance out of the company's GPUs, which provides the ideal segue into the substance of this session.
We begin with a discussion of the limitations inherent in DirectX 9, specifically in Direct3D. Guennardi explains that the API as it currently stands is "batch limited": "You can only perform so many operations within the allotted frame time, because of the amount of overhead inherent in state changes, texture unit access and so on, as well as the organization of the vertex and pixel shaders." To illustrate, he shows me a slide in which the vertex shader is heavily loaded during geometry operations while the pixel shader sits virtually idle, with the load inverted when framebuffer operations are being performed. These fundamental constraints on access to computing resources, he explains, restrict what developers can do, forcing them to keep bringing a "false reality" to life.
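To make the "batch limited" idea concrete, here is a back-of-the-envelope sketch in Python. It assumes a fixed CPU cost for every batch (draw call) submitted and asks how many batches fit into the slice of a frame reserved for rendering. The function name and every number in it are hypothetical placeholders chosen for illustration, not figures ATI quoted or measured Direct3D 9 costs.

```python
# Back-of-the-envelope model of a "batch limited" API.
# All costs are hypothetical placeholders, not measured Direct3D 9 figures.

def max_batches_per_frame(target_fps: float,
                          cpu_budget_fraction: float,
                          overhead_per_batch_us: float) -> int:
    """Return how many batches fit in the CPU time allotted to submitting them."""
    frame_time_us = 1_000_000.0 / target_fps                # total time per frame
    render_budget_us = frame_time_us * cpu_budget_fraction  # share spent on submission
    return int(render_budget_us // overhead_per_batch_us)


if __name__ == "__main__":
    # Assume 60 fps, half of each frame's CPU time for rendering, 25 us per batch.
    print(max_batches_per_frame(60, 0.5, 25.0))   # 333
    # Halving the assumed per-batch overhead roughly doubles the batch count.
    print(max_batches_per_frame(60, 0.5, 12.5))   # 666
```

Under these assumptions, the number of batches per frame is capped by the fixed per-batch overhead rather than by how much work each batch does, which is why cutting that overhead matters more than raw shader speed.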
Guennardi is referring to the variety of mathematically inaccurate models for environmental effects and objects that are used as "good enough" approximations for the real thing, due to the inability of current hardware to keep up with full simulations. Or, at least, that's what I thought.
Bob takes over, talking about ATI's objective of eliminating the DX9 constraints and solving the small batch problem – reducing the overhead associated with each batch so that more operations fit within each frame, to the point where accurate mathematical models can power our simulations. The challenge, he explains, is balancing the diverse needs of vertex and pixel processing, which seem to be orthogonal at best, if not anti-parallel. In a sense, Bob expounds, "things are about to get worse… in the hopes of getting better."
And then he really gets into his element, letting out his inner technology evangelist, and before I know it I've drunk the Kool-Aid and I'm seeing happy-happy joy-joy images of developers frolicking in glorious rendered fields of Direct3D 10 goodness. Bob introduces me to the Unified Shader Architecture.
"Current shader architectures use fairly different approaches for the vertex and pixel shader units, and this is reflected in their supporting different operations and sometimes requiring different techniques to program. With Direct3D 10, all shader units support the same basic operations and use the same syntax." In addition, Guennardi chips in to tell me about optimizations to the low-level graphics driver such that shader development no longer requires assembly language. "Everything can be done in HLSL," he gushes. "Almost everything," Bob corrects. I draw an analogy to the contemporary use of high(er)-level languages like C or C++, with only occasional use of assembly for machine-specific extensions (such as SSE3), and they both eagerly seize and run with it. The driver analyzes the bytecode produced by the runtime and generates optimized opcodes for the specific hardware, a process I compare to just-in-time (JIT) compilation, an analogy Guennardi is quite pleased with.
Bob continues, "This is the Unified Shader Architecture, and it enables us to do a lot of really cool things that were just really difficult and tedious before, stuff that was either being bus-limited or CPU-limited but is now possible because of how we've been able to reorganize the GPU's internal architecture." Now all shader units can fetch textures or access vertex memory, and some operations can be shifted from one unit to another; in essence, those operations are shader-unit agnostic. As a consequence, an executive process running on the GPU known as Arbitration, which decides what gets to execute next, can avoid stalls: it determines that the next several operations do not depend on the result from a blocked unit (one perhaps waiting on I/O) and runs them in the meantime. I say that it's like having a spare CPU, except that it's sort of running on the GPU.
Bob likes the analogy. "We actually have the unified shader architecture running on a production system already – in the Xbox 360, with the custom GPU we designed for that. It's allowed developers for the 360 to do all sorts of cool stuff, and we'll get into that in a minute." He pauses to point out, however, that the Unified Shader Architecture is not a requirement of the Direct3D 10 specification. Rather, "the specification is written in such a way that encourages and is compatible with the Unified Shader Architecture. This is just ATI's take – and just a first take at that, and you'll see some of the amazing stuff we've been able to do with it. Essentially, the Direct3D 10 'refresh' of the API presents an opportunity for a more natural mapping to the capabilities of the underlying hardware."
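Why a unified pool helps with the lopsided loads on Guennardi's slide can be shown with a toy model. The Python sketch below compares a fixed split of dedicated vertex and pixel units against a single pool in which any unit can take any work item. It assumes idealized, perfectly balanced scheduling, and the unit counts, workloads, and function names are hypothetical illustrations; it does not model ATI's actual Arbitration logic or any real GPU.

```python
# Toy model: fixed vertex/pixel split vs. a unified pool of shader units.
# Unit counts and workloads are hypothetical; scheduling is idealized.
from math import ceil


def split_cycles(vertex_work: int, pixel_work: int,
                 vertex_units: int, pixel_units: int) -> int:
    """Cycles when each kind of work may only run on its dedicated units."""
    return max(ceil(vertex_work / vertex_units), ceil(pixel_work / pixel_units))


def unified_cycles(vertex_work: int, pixel_work: int, total_units: int) -> int:
    """Cycles when any unit can take any work item (perfect load balancing)."""
    return ceil((vertex_work + pixel_work) / total_units)


if __name__ == "__main__":
    # Two phases like the slide: geometry-heavy, then pixel-heavy.
    phases = [(960, 64), (64, 960)]
    split = sum(split_cycles(v, p, 8, 24) for v, p in phases)
    unified = sum(unified_cycles(v, p, 32) for v, p in phases)
    print(split, unified)   # 160 64
```

With the same 32 units in total, the fixed split spends most of the geometry-heavy phase with its pixel units idle, while the unified pool spreads both phases across every unit. Real hardware adds dependencies, memory latency, and scheduling costs that this sketch ignores.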