GameDev.net -- Introduction to Shader Programming Part IV: Programming Pixel Shaders

RacorX8

The main difference between RacorX8 and RacorX7 is the use of a Look-up table to store the specular power values for the specular reflection, instead of a few mul instructions in the pixel shader. The advantage of a table Look-up is that banding is reduced. This is due to the higher value range of the Look-up table compared to the solution with multiple mul instructions.
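The banding argument can be illustrated on the CPU. The following is only a minimal sketch, not code from RacorX8: it assumes an 8-bit fixed-point register (real hardware precision differs), and the names Quantize8, PowerByMulChain, PowerByTable and MaxError are hypothetical. It compares x^16 built from four squarings that are rounded after every step with x^16 evaluated once and rounded a single time, as a value baked into a table texel would be.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumption: an 8-bit fixed-point register, chosen only to make the effect visible.
inline double Quantize8(double v) {
    assert(v >= 0.0 && v <= 1.0);
    return std::round(v * 255.0) / 255.0;
}

// x^16 from four successive squarings, rounding after every mul
// the way a low-precision register would.
inline double PowerByMulChain(double x) {
    double r = Quantize8(x);
    for (int i = 0; i < 4; ++i)
        r = Quantize8(r * r);
    return r;
}

// x^16 evaluated at full precision and rounded once, as it would be
// when the value is stored in an 8-bit texel of a look-up table.
inline double PowerByTable(double x) {
    return Quantize8(std::pow(x, 16.0));
}

// Largest absolute deviation from the exact x^16 over 1001 samples in [0, 1].
template <typename F>
double MaxError(F f) {
    double worst = 0.0;
    for (int i = 0; i <= 1000; ++i) {
        const double x = i / 1000.0;
        worst = std::max(worst, std::fabs(f(x) - std::pow(x, 16.0)));
    }
    return worst;
}
```

Because the table value is rounded only once, its error stays within half a quantization step, whereas the mul chain accumulates a rounding error at every squaring.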

Figure 6 - RacorX8 Specular Lighting with Specular Power from a Look-up table

The drawback of using a Look-up table in the way shown in this example is the need for an additional texture stage.

Set Texture Operation Flags (with D3DTSS_* flags)

In RestoreDeviceObjects(), RacorX8 additionally sets the following two texture stage states for the texture stage that holds the specular map:

// specular map
m_pd3dDevice->SetTextureStageState( 2, D3DTSS_ADDRESSU, D3DTADDRESS_CLAMP );
m_pd3dDevice->SetTextureStageState( 2, D3DTSS_ADDRESSV, D3DTADDRESS_CLAMP );

With the D3DTADDRESS_CLAMP flag, the texture is applied once and then the color of the edge pixel is smeared. Clamping sets texture coordinates below 0 to 0 and coordinates above 1 to 1, whereas coordinates inside [0..1] remain unchanged. Without the clamping, a white ring around the earth would be visible.
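To make the difference between the addressing modes concrete, here is a small CPU-side sketch. AddressClamp and AddressWrap are hypothetical helper names, not Direct3D functions; they only mimic what the clamp mode and the default wrap mode do to a single texture coordinate.

```cpp
#include <cassert>
#include <cmath>

// Clamp addressing: coordinates outside [0, 1] stick to the nearest edge.
inline float AddressClamp(float u) {
    assert(!std::isnan(u));
    return u < 0.0f ? 0.0f : (u > 1.0f ? 1.0f : u);
}

// Wrap addressing keeps only the fractional part, so the texture repeats.
inline float AddressWrap(float u) {
    assert(!std::isnan(u));
    return u - std::floor(u);
}
```

With wrapping, a slightly negative coordinate such as -0.01 maps to 0.99 and therefore to the bright end of the table, which is presumably where the white ring would come from. With clamping, the same coordinate maps to 0.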

Set Texture (with SetTexture())

This example sets the specular map with the handle m_pLightMap16 in Render():

...
m_pd3dDevice->SetTexture(2, m_pLightMap16);
...

This look-up table is created as a texture map and filled with specular power values in the following lines:

// specular light lookup table
void LightEval(D3DXVECTOR4 *col, D3DXVECTOR2 *input,
               D3DXVECTOR2 *sampSize, void *pfPower)
{
  float fPower = (float) pow(input->y,*((float*)pfPower));
  col->x = fPower;
  col->y = fPower;
  col->z = fPower;
  col->w = 1;
}
...
//
// create light texture
//
if (FAILED(D3DXCreateTexture(m_pd3dDevice, 256, 256, 0, 0,
           D3DFMT_A8R8G8B8, D3DPOOL_MANAGED, &m_pLightMap16)))
  return S_FALSE;

FLOAT fPower = 16;
if (FAILED(D3DXFillTexture(m_pLightMap16, LightEval, &fPower)))
  return S_FALSE;

D3DXFillTexture() (new in DirectX 8.1) uses the user-provided function LightEval(), passed as its second parameter, to fill each texel of each mip level of the Look-up table texture that is passed in the first parameter.

HRESULT D3DXFillTexture( LPDIRECT3DTEXTURE8 pTexture, LPD3DXFILL2D pFunction, LPVOID pData );

This function is useful for building all kinds of procedural textures that might be used in the pixel shader as a Look-up table.

The startup time of your app should be shorter if you build the procedural texture with this function once and save it to a file that can be loaded on later runs, for example with
D3DXSaveTextureToFile("function.dds", D3DXIFF_DDS, m_pNormalMap, 0);

LightEval(), which is provided in pFunction, has to follow this declaration:

VOID (*LPD3DXFILL2D)( D3DXVECTOR4* pOut, D3DXVECTOR2* pTexCoord, D3DXVECTOR2* pTexelSize, LPVOID pData );

The first parameter returns the result in a pointer to a vector. The second parameter gets a vector containing the coordinates of the texel currently being processed. In our case this is a pointer to a 2D vector named input. The third parameter is unused in LightEval() and might be useful to provide the texel size. The fourth parameter is a pointer to user data. Here LightEval() receives, in its pfPower parameter, a pointer to the fPower variable. This pointer is transferred via the third parameter of the D3DXFillTexture() function.
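The callback mechanism can be sketched without Direct3D. The following is a rough emulation, not the D3DX implementation: Vec2 and Vec4 stand in for D3DXVECTOR2 and D3DXVECTOR4, FillTable and LightEvalSketch are hypothetical names, and evaluating the coordinates at texel centers is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };        // stand-in for D3DXVECTOR2
struct Vec4 { float x, y, z, w; };  // stand-in for D3DXVECTOR4

// Same shape as LPD3DXFILL2D, expressed with the stand-in types.
typedef void (*Fill2D)(Vec4* out, const Vec2* texCoord,
                       const Vec2* texelSize, void* data);

// Hypothetical helper: calls fn once per texel of a width x height image.
// Assumption: coordinates are taken at texel centers.
std::vector<Vec4> FillTable(int width, int height, Fill2D fn, void* data) {
    assert(width > 0 && height > 0 && fn != nullptr);
    std::vector<Vec4> texels(static_cast<std::size_t>(width) * height);
    const Vec2 texelSize = { 1.0f / width, 1.0f / height };
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Vec2 coord = { (x + 0.5f) * texelSize.x,
                                 (y + 0.5f) * texelSize.y };
            fn(&texels[static_cast<std::size_t>(y) * width + x],
               &coord, &texelSize, data);
        }
    }
    return texels;
}

// Mirrors LightEval(): the v coordinate raised to the power
// that arrives through the user data pointer.
void LightEvalSketch(Vec4* col, const Vec2* input, const Vec2*, void* pfPower) {
    const float p = std::pow(input->y, *static_cast<float*>(pfPower));
    col->x = col->y = col->z = p;
    col->w = 1.0f;
}
```

A call such as FillTable(256, 256, LightEvalSketch, &fPower) with a float fPower of 16 produces a table whose rows are constant along u, because only the v coordinate enters the power function.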

This example sets the same constants as the previous example, so we can proceed further with the pixel shader source.

Pixel Shader Instructions

The vertex shader that drives the pixel shader differs from the vertex shader in the previous examples only in the last four lines:

; oT0 coordinates for normal map
; oT1 half angle
; oT2 half angle
; oT3 coordinates for color map
mov oT0.xy, v7.xy
mov oT1.xyz, r8
mov oT2.xyz, r8
mov oT3.xy, v7.xy

The texture coordinates for the normal map and the color map are stored in oT0 and oT3. The half angle vector is stored as a texture coordinate in oT1 and oT2. This example uses a 3x2 table of exponents, stored in the specular map in t2.
The two pixel shaders TableSpec.psh and TableSpecps14.psh calculate the u and v position and sample a texel from the specular map. After the color texture is sampled, the color value and the value from the specular map are modulated:

ps.1.1
; t0 holds normal map
; (t1) holds row #1 of the 3x2 matrix
; (t2) holds row #2 of the 3x2 matrix
; t2 holds the Look-up table
; t3 holds color map
tex t0                    ; sample normal
texm3x2pad t1, t0_bx2     ; calculates u from first row
texm3x2tex t2, t0_bx2     ; calculates v from second row
                          ; samples texel with u,v
tex t3                    ; sample base color
mul r0,t2,t3              ; blend terms

; specular power from a Look-up table
ps.1.4
; r0 holds normal map
; t1 holds half vector
; r2 holds the lookup table
; r3 holds color map
texld r0, t0
texcrd r1.rgb, t1
dp3 r1.rg, r1, r0_bx2     ; calculates u
phase
texld r3, t0
texld r2, r1              ; samples texel with u,v
mul r0, r2, r3

In the ps.1.1 pixel shader, texm3x2pad performs a three component dot product between the texture coordinate set corresponding to the destination register number and the data of the source register and stores the result in the destination register. The texm3x2tex instruction calculates the second row of a 3x2 matrix by performing a three component dot product between the texture coordinate set corresponding to the destination register number and the data of the source register.
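As a rough CPU-side sketch of the address computation performed by the texm3x2pad/texm3x2tex pair: the types Vec3 and TexCoord2 and the functions Dot3, ExpandBx2 and Texm3x2Address are hypothetical names introduced for illustration only. The sketch assumes the normal map texel arrives as a color in [0, 1] and is expanded by the _bx2 modifier before the two dot products.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct TexCoord2 { float u, v; };

inline float Dot3(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// _bx2 source modifier: maps a color in [0, 1] to a signed value in [-1, 1].
inline Vec3 ExpandBx2(const Vec3& c) {
    assert(c.x >= 0.0f && c.x <= 1.0f);
    assert(c.y >= 0.0f && c.y <= 1.0f);
    assert(c.z >= 0.0f && c.z <= 1.0f);
    return Vec3{ 2.0f * c.x - 1.0f, 2.0f * c.y - 1.0f, 2.0f * c.z - 1.0f };
}

// row1 plays the role of the t1 texture coordinates (texm3x2pad),
// row2 the role of the t2 texture coordinates (texm3x2tex).
// The returned (u, v) is the address used for the table look-up.
inline TexCoord2 Texm3x2Address(const Vec3& row1, const Vec3& row2,
                                const Vec3& normalTexel) {
    const Vec3 n = ExpandBx2(normalTexel);
    return TexCoord2{ Dot3(row1, n), Dot3(row2, n) };
}
```

In this example both rows hold the same half vector, so u and v come out identical.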
texcrd in the ps.1.4 shader copies the texture coordinate set corresponding to the source register into the destination register as color data. It clamps the texture coordinates of the source register, which have a range of [-MaxTextureRepeat, MaxTextureRepeat] (RADEON 8500: 2048), to the range of the destination register [-8, 8] (MaxPixelShaderValue). This clamp might behave differently on different hardware. To be safe, provide data in the range of [-8, 8].

Values from the output registers of the vertex shader are clamped to [0..1]. That means negative values are set to 0 and values above 1 are set to 1, while values in between remain unchanged. To bypass the problem of clamping, the data can be loaded into the pixel shader directly via a texture.
In a ps.1.1 - ps.1.3 pixel shader, the rn, tn and cn registers can handle a range of [-1..1]. The color registers can only handle a range of [0..1].
To load data in the range [-1..1] via a texture in ps.1.1 - ps.1.3, the tex tn instruction can be used.
In ps.1.4 the rn registers can handle a range of [-8..8] and the tn registers can handle, in case of the RADEON 8500, a range of [-2048..2048]. So data from a texture in the range of [-8..8] can be loaded via texcrd rn, tn, via texld rn, tn or texld rn, rn (only phase 2) in a ps.1.4 pixel shader.

A .rgb or .rg modifier should be provided to the destination register of texcrd, because the fourth channel of the destination register is unset/undefined in all cases.
The arithmetic instruction dp3 performs a three component dot product between the two source registers. Its result is stored in r and g of r1.
Both shaders perform a dependent read. A dependent read is a read from a texture map using a texture coordinate which was calculated earlier in the pixel shader. The texm3x2pad/texm3x2tex instruction pair calculates the texture coordinate that is then used by the texm3x2tex instruction to sample a texel. In the ps.1.4 shader, the second texld instruction uses the texture coordinate that was calculated earlier with the dp3 instruction.
It is interesting to note that the first texld instruction after the phase marker uses the same texture coordinate pair as the normal map. This reuse of texture coordinates is only possible in ps.1.4.
It is also important to note that using the texm3x2pad/texm3x2tex pair to load a value from a specular map is inefficient, because both instructions calculate the same value and get the same half vector via two texture coordinate registers. Using only the texm3x2tex instruction is not possible, because this instruction can only be used together with a texm3x2pad instruction.
A more elegant solution comparable to the ps.1.4 shader is possible by using the texdp3tex instruction together with a 1D specular map, but this instruction needs ps.1.2 or ps.1.3 capable hardware.
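The idea of a single dot product indexing a 1D table can be sketched on the CPU as follows. This is an illustration under stated assumptions, not shader code: BuildSpecularTable1D, SampleClamped and ShadeChannel are hypothetical names, the table is point sampled without filtering, and the table entries are evaluated at texel centers.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 1D table: entry i holds pow(u, power) at the
// texel center u = (i + 0.5) / size.
std::vector<float> BuildSpecularTable1D(int size, float power) {
    assert(size > 0);
    std::vector<float> table(size);
    for (int i = 0; i < size; ++i)
        table[i] = std::pow((i + 0.5f) / size, power);
    return table;
}

// Point-sampled look-up with clamp addressing.
float SampleClamped(const std::vector<float>& table, float u) {
    assert(!table.empty());
    const float c = u < 0.0f ? 0.0f : (u > 1.0f ? 1.0f : u);
    const int last = static_cast<int>(table.size()) - 1;
    int i = static_cast<int>(c * table.size());
    if (i > last) i = last;
    return table[i];
}

// Dependent read: the coordinate nDotH was computed earlier (dot product of
// normal and half vector); the fetched value then modulates one color channel.
float ShadeChannel(const std::vector<float>& table, float nDotH, float baseColor) {
    return SampleClamped(table, nDotH) * baseColor;
}
```

Compared with the 2D table, only one coordinate has to be computed per pixel, which is the kind of saving the single-instruction variant is after.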

You cannot change the order of the t0 - t3 registers in a ps.1.1 - ps.1.3 pixel shader. In these pixel shader versions, the registers must be arranged in increasing numerical order. For example, setting the color map in texture stage 1 in the above ps.1.1 shader won't work. In ps.1.4 it is not necessary to order the r0 - r5 or t0 and t1 registers in any way.

Summarize

This example improved the specular power precision by using a specular power Look-up table. The drawback of this technique is the use of an additional texture, which may be overcome by using the alpha channel of the normal map.
