High Level View on Pixel Shader Programming

Pixel Shading takes place on a per-pixel, per-object basis during a rendering pass.

Let's start by focusing on the steps required to build a pixel shader-driven application. The following list shows these steps in their order of execution:

  • Check for Pixel Shader Support
  • Set Texture Operation Flags (with D3DTSS_* flags)
  • Set Texture (with SetTexture())
  • Define Constants (with SetPixelShaderConstant()/def)
  • Pixel Shader Instructions
    • Texture Address Instructions
    • Arithmetic Instructions
  • Assemble Pixel Shader
  • Create Pixel Shader
  • Set Pixel Shader
  • Free Pixel Shader Resources

The following text will work through this list step-by-step:

Check for Pixel Shader Support

It is important to check for proper pixel shader support, because there is no feasible way to emulate pixel shaders. If there is no pixel shader support, or the required pixel shader version is not supported, the application has to fall back to a default behaviour (i.e. the multitexturing unit or ps.1.1). The following statement checks the supported pixel shader version:

if( pCaps->PixelShaderVersion < D3DPS_VERSION(1,1) )
  return E_FAIL;

This example checks for support of pixel shader version 1.1. Support of at least ps.1.4 in hardware might be checked with D3DPS_VERSION(1,4). The D3DCAPS8 structure has to be filled in the startup phase of the application with a call to GetDeviceCaps(). If the Common Files Framework provided with the DirectX 8.1 SDK is used, this is done by the framework. If your graphics card does not support the requested pixel shader version and there is no fallback mechanism that switches to the multitexturing unit, the reference rasterizer will jump in. This is the default behaviour of the Common Files Framework, but it is not useful in a game, because the REF is too slow.
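
A fallback chain built on this check could be sketched as follows. To keep the sketch compilable without the DirectX headers, the macro PS_VERSION mirrors the bit layout of the SDK's D3DPS_VERSION under a local name (an assumption to verify against your d3d8types.h). The RenderPath enum and ChooseRenderPath() are hypothetical names; in a real application the version value would come from the PixelShaderVersion member of the caps structure.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the SDK macro D3DPS_VERSION (assumed layout:
// 0xFFFF0000 | major << 8 | minor).
#define PS_VERSION(major, minor) \
    (0xFFFF0000u | (static_cast<std::uint32_t>(major) << 8) | \
     static_cast<std::uint32_t>(minor))

// Hypothetical render paths an application might offer.
enum class RenderPath { FixedFunction, PS11, PS14 };

// Picks the most capable path the reported version allows and falls
// back to the multitexturing unit if no pixel shader version fits.
RenderPath ChooseRenderPath(std::uint32_t pixelShaderVersion)
{
    if (pixelShaderVersion >= PS_VERSION(1, 4)) return RenderPath::PS14;
    if (pixelShaderVersion >= PS_VERSION(1, 1)) return RenderPath::PS11;
    return RenderPath::FixedFunction;
}
```

Choosing the path once at startup keeps the per-frame code free of capability checks.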

Set Texture Operation Flags (D3DTSS_* flags)

The pixel shader functionality replaces the D3DTSS_COLOROP and D3DTSS_ALPHAOP operations and their associated arguments and modifiers that were used with the fixed-function pipeline. For example the following four SetTextureStageState() calls could be handled now by the pixel shader:

m_pd3dDevice->SetTextureStageState( 0, D3DTSS_COLORARG1, D3DTA_TEXTURE );
m_pd3dDevice->SetTextureStageState( 0, D3DTSS_COLORARG2, D3DTA_DIFFUSE); 
m_pd3dDevice->SetTextureStageState( 0, D3DTSS_COLOROP, D3DTOP_MODULATE); 
m_pd3dDevice->SetTexture( 0, m_pWallTexture);

But the following texture stage states are still observed.

D3DTSS_ADDRESSU 
D3DTSS_ADDRESSV
D3DTSS_ADDRESSW
D3DTSS_BUMPENVMAT00
D3DTSS_BUMPENVMAT01
D3DTSS_BUMPENVMAT10
D3DTSS_BUMPENVMAT11
D3DTSS_BORDERCOLOR
D3DTSS_MAGFILTER
D3DTSS_MINFILTER
D3DTSS_MIPFILTER
D3DTSS_MIPMAPLODBIAS
D3DTSS_MAXMIPLEVEL
D3DTSS_MAXANISOTROPY
D3DTSS_BUMPENVLSCALE
D3DTSS_BUMPENVLOFFSET
D3DTSS_TEXCOORDINDEX
D3DTSS_TEXTURETRANSFORMFLAGS

The D3DTSS_BUMP* states are used with the bem, texbem and texbeml instructions.

In ps.1.1 - ps.1.3 all D3DTSS_TEXTURETRANSFORMFLAGS flags are available and have to be set properly for a projective divide, whereas in ps.1.4 the texture transform flag D3DTTFF_PROJECTED is ignored. There, the projective divide is accomplished by using source register modifiers with the texld and texcrd instructions.

The D3DTSS_TEXCOORDINDEX flag is valid only for fixed-function vertex processing. When rendering with vertex shaders, each stage's texture coordinate index must be set to its default value. The default index for each stage is equal to the stage index.

ps.1.4 gives you the ability to change the association of the texture coordinates and the textures in the pixel shader.

The texture wrapping, filtering, color border and mip mapping flags are fully functional in conjunction with pixel shaders.

A change of these texture stage states doesn't require the regeneration of the currently bound shader, because they are not available at shader compile time and the driver can therefore make no assumptions about them.

Set Texture (with SetTexture())

After checking the pixel shader support and setting the proper texture operation flags, all textures have to be set by SetTexture(), as with the DX6/7 multitexturing unit. The prototype of this call is:

HRESULT SetTexture(DWORD Stage, IDirect3DBaseTexture8* pTexture);

The texture stage that should be used by the texture is provided in the first parameter and the pointer to the texture interface is provided in the second parameter. A typical call might look like:

m_pd3dDevice->SetTexture( 0, m_pWallTexture);

This call sets the already loaded and created wall texture to texture stage 0.

Define Constants (with SetPixelShaderConstant() / def)

The constant registers can be filled with SetPixelShaderConstant() or the def instruction in the pixel shader. Similar to the SetVertexShaderConstant() call, the prototype of the pixel shader equivalent looks like this:

HRESULT SetPixelShaderConstant(
    DWORD Register,
    CONST void* pConstantData,
    DWORD ConstantCount
);

First the constant register must be specified in Register. The data to transfer into the constant registers is provided in the second argument as a pointer. The number of constant registers that have to be filled is provided in the last parameter. For example, to fill c0 - c3, you might provide 0 as the Register and 4 as the ConstantCount.
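
How the start register and the count map onto the register file can be illustrated with a small mock. This is not the Direct3D API: ConstantFile and SetConstants() are hypothetical names, and the sketch assumes the eight constant registers c0 - c7 of ps.1.x, each holding four floats.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Mock of the ps.1.x constant file (assumed: c0 - c7, four floats each).
struct ConstantFile {
    std::array<std::array<float, 4>, 8> c{};  // zero-initialised
};

// Copies 'count' four-float constants from 'data' into consecutive
// registers, starting at register 'firstRegister'.
void SetConstants(ConstantFile& file, std::size_t firstRegister,
                  const float* data, std::size_t count)
{
    assert(firstRegister + count <= file.c.size());
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(file.c[firstRegister + i].data(), data + 4 * i,
                    4 * sizeof(float));
    }
}
```

A count of 4 starting at register 0 therefore consumes 16 floats and touches exactly four registers.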

The def instruction is an alternative to SetPixelShaderConstant(). When SetPixelShader() is called, it is effectively translated into a SetPixelShaderConstant() call. Using the def instruction makes the pixel shader easier to read. A def instruction in the pixel shader might look like this:

def c0, 0.30, 0.59, 0.11, 1.0

Each value of the constant source registers has to be in the range [-1.0..1.0].

Pixel Shader Instructions

Using vertex shaders the programmer is free to choose the order of the used instructions in any way that makes sense, whereas pixel shaders need a specific arrangement of the used instructions. This specific instruction flow differs between ps.1.1 - ps.1.3 and ps.1.4.

ps.1.1 - ps.1.3 allow four types of instructions, that must appear in the order shown below:


Figure 8 - ps.1.1 - ps.1.3 Pixel Shader Instruction Flow
(Specular Lighting with Lookup table)

This example shows a per-pixel specular lighting model, that evaluates the specular power with a lookup table. Every pixel shader starts with the version instruction. It is used by the assembler to validate the instructions which follow. Below the version instruction a constant definition could be placed with def. Such a def instruction is translated into a SetPixelShaderConstant() call, when SetPixelShader() is executed.

The next group of instructions are the so-called texture address instructions. They are used to load data from the tn registers and additionally in ps.1.1 - ps.1.3 to modify texture coordinates. Up to four texture address instructions could be used in a ps.1.1 - ps.1.3 pixel shader.

In this example the tex instruction is used to sample the normal map, that holds the normal data. texm* instructions are always used at least as pairs:

texm3x2pad t1, t0_bx2
texm3x2tex t2, t0_bx2

Both instructions calculate the proper u/v texture coordinate pair with the help of the normal map in t0 and sample the texture data from t2 with it. This texture register holds the light map data with the specular power values. The last texture addressing instruction samples the color map into t3.
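
The arithmetic behind the texm3x2pad/texm3x2tex pair can be sketched on the CPU. The names Vec3, Bx2(), Dot3() and Texm3x2() are hypothetical, and the sketch only covers the coordinate calculation, not the texture sampling that follows it.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

// _bx2 source modifier: expands [0, 1] color data to [-1, 1].
Vec3 Bx2(const Vec3& c)
{
    return { 2.0f * (c[0] - 0.5f), 2.0f * (c[1] - 0.5f), 2.0f * (c[2] - 0.5f) };
}

float Dot3(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

struct UV { float u, v; };

// texm3x2pad yields u from the texture coordinates (t1),
// texm3x2tex yields v from (t2); both use the expanded normal.
UV Texm3x2(const Vec3& row1, const Vec3& row2, const Vec3& normalTexel)
{
    const Vec3 n = Bx2(normalTexel);
    return { Dot3(row1, n), Dot3(row2, n) };
}
```

The resulting u/v pair is what the hardware then uses to address the lookup table bound to the stage of the texm3x2tex destination.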

The next type of instructions are the arithmetic instructions. There could be up to eight arithmetic instructions in a pixel shader.

mad adds t2 and c0, the ambient light, and multiplies the result with t3 and stores it into the output register r0.

Instructions in a ps.1.4 pixel shader must appear in the order shown below:


Figure 9 - ps.1.4 Pixel Shader Instruction Flow
(Simple transfer function for sepia or heat signature effects)

This is a simple transfer function, which could be useful for sepia or heat signature effects. It is explained in detail in [Mitchell]. The ps.1.4 pixel shader instruction flow has to start with the version instruction ps.1.4. After that, as many def instructions as needed might be placed into the pixel shader code. This example sets a Luminance constant value with one def.

There could be up to six texture addressing instructions used after the constants. The texld instruction loads a texture from texture stage 0, with the help of the texture coordinate pair 0, which is chosen by using t0. In the following up to eight arithmetic instructions, color, texture or vector data might be modified. This shader uses only the arithmetic instruction to convert the texture map values to luminance values.

So far a ps.1.4 pixel shader has the same instruction flow as a ps.1.1 - ps.1.3 pixel shader, but the phase instruction allows it to double the number of texture addressing and arithmetic instructions. It divides the pixel shader into two phases: phase 1 and phase 2. That means as of ps.1.4 a second pass through the pixel shader hardware can be done.

Another way to re-use the result of a former pixel shader pass is to render into a texture and use this texture in the next pixel shader pass. This is accomplished by rendering into a separate render target.

The additional six texture addressing instruction slots after the phase instruction are only used by the texld r5, r0 instruction. This instruction uses the color in r0, which was converted to Luminance values before as a texture coordinate to sample a 1D texture (sepia or heat signature map), which is referenced by r5. The result is moved with a mov instruction into the output register r0.
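
What the two phases compute can be emulated on the CPU as a rough sketch. The luminance weights are the ones used with def in this article; Color, Luminance() and TransferFunction() are hypothetical names, and the 1D texture read is modelled as a nearest-texel lookup, whereas real hardware would apply the filtering set for that stage.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Color = std::array<float, 3>;

// Phase 1: dp3 r0, r0, c0 with c0 = (0.30, 0.59, 0.11).
float Luminance(const Color& rgb)
{
    return 0.30f * rgb[0] + 0.59f * rgb[1] + 0.11f * rgb[2];
}

// Phase 2: texld r5, r0, i.e. the luminance is used as the texture
// coordinate of a dependent read into the 1D sepia/heat map.
Color TransferFunction(const Color& rgb, const std::vector<Color>& table)
{
    assert(!table.empty());
    const float u = std::min(std::max(Luminance(rgb), 0.0f), 1.0f);
    const std::size_t index = std::min(
        table.size() - 1, static_cast<std::size_t>(u * table.size()));
    return table[index];
}
```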

Adding the number of arithmetic and addressing instructions shown in the pixel shader instruction flow above, leads to 28 instructions. If no phase marker is specified, the default phase 2 allows up to 14 addressing and arithmetic instructions.
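
The slot arithmetic can be written down as a compile-time check. PhaseLimits and TotalSlots() are hypothetical names; the numbers are the per-phase limits quoted in this article.

```cpp
#include <cassert>

struct PhaseLimits {
    int addressSlots;
    int arithmeticSlots;
};

constexpr PhaseLimits kPs11To13 = { 4, 8 };      // ps.1.1 - ps.1.3
constexpr PhaseLimits kPs14PerPhase = { 6, 8 };  // ps.1.4, per phase

// Total number of instruction slots for the given number of phases.
constexpr int TotalSlots(PhaseLimits limits, int phases)
{
    return phases * (limits.addressSlots + limits.arithmeticSlots);
}

static_assert(TotalSlots(kPs14PerPhase, 2) == 28, "ps.1.4 with phase marker");
static_assert(TotalSlots(kPs14PerPhase, 1) == 14, "ps.1.4 without phase marker");
static_assert(TotalSlots(kPs11To13, 1) == 12, "ps.1.1 - ps.1.3");
```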

Both of the preceding examples show the ability to use dependant reads. A dependant read is a read from a texture map using a texture coordinate which was calculated earlier in the pixel shader. More details on dependant reads will be presented in the next section.

Texture Address Instructions

Texture address instructions are used on texture coordinates. The texture coordinate address is used to sample data from a texture map. Controlling the u/v pair, u/v/w triplet or a u/v/w/q quadruplet of texture coordinates with address operations, gives the ability to choose different areas of a texture map. Texture coordinate "data storage space" can also be used for other reasons than sampling texture data. The registers that reference texture coordinate data, are useful to "transport" any kind of data from the vertex shader to the pixel shader via the oTn registers of a vertex shader. For example the light or half-angle vector or a 3x2, 3x3 or a 4x4 matrix might be provided to the pixel shader this way.

ps.1.1 - ps.1.3 texture addressing

The following diagram shows the ways that texture address instructions work in ps.1.1 - ps.1.3 for texture addressing:


Figure 10 - Texture Addressing in ps.1.1 - ps.1.3

All of the texture addressing happens "encapsulated" in the texture address instructions masked with a grey field. That means results of texture coordinate calculations are not accessible in the pixel shader. The texture address instruction uses these results internally to sample the texture. The only way to get access to texture coordinates in the pixel shader is the texcoord instruction. This instruction converts texture coordinate data to color values, so that they can be manipulated by texture addressing or arithmetic instructions. These color values could be used as texture coordinates to sample a texture with the help of the texreg2* instructions.

The following instructions are texture address instructions in ps.1.1 - ps.1.3. The d and s in the column named Para are the destination and source parameters of the instruction. The usage of texture coordinates is shown by parentheses around the texture register, for example (t0).

Instruction 1.1 1.2 1.3 Para Action
tex x x x d Loads tn with color data (RGBA) sampled from the texture
texbem x x x d, s Transforms red and green components as du, dv signed values of the source register using a 2-D bump mapping matrix, to modify the texture address of the destination register.

Can be used for a variety of techniques based on address perturbation such as fake per-pixel environment mapping, diffuse lighting (bump mapping), environment matting etc.

There is a difference in the usage of the texture stages between environment mapping with a pixel shader (that is, with texbem, texbeml or bem) and environment mapping with the DX 6/7 multitexturing unit. texbem (texbeml or bem) needs the matrix data connected to the texture stage that is sampled. This is the environment map. Environment mapping with the DX 6/7 multitexturing unit needs the matrix data connected to the texture stage used by the bump map (see also the example program Earth Bump on the DVD):

------------------
// texture stages for environment mapping
// with a pixel shader:
SetRenderState(D3DRS_WRAP0,D3DWRAP_U|D3DWRAP_V);
// color map
SetTexture( 0, m_pEarthTexture );
SetTextureStageState(0, D3DTSS_TEXCOORDINDEX, 1);
// bump map
SetTexture(1, m_psBumpMap);
SetTextureStageState(1, D3DTSS_TEXCOORDINDEX, 1);
// environment map
SetTexture(2, m_pEnvMapTexture);
SetTextureStageState(2, D3DTSS_TEXCOORDINDEX, 0);
SetTextureStageState(2, D3DTSS_BUMPENVMAT00,
                        F2DW(0.5f)); 
SetTextureStageState(2, D3DTSS_BUMPENVMAT01,
                        F2DW(0.0f));
SetTextureStageState(2, D3DTSS_BUMPENVMAT10,
                        F2DW(0.0f));
SetTextureStageState(2, D3DTSS_BUMPENVMAT11,
                        F2DW(0.5f));

texbem performs the following operations:

u += 2x2 matrix(du)
v += 2x2 matrix(dv)

Then sample (u, v)

Read more in the section "Bump Mapping Formulas" in the DirectX 8.1 SDK documentation.
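
Written out per component, the perturbation might look like the following sketch. It assumes the matrix layout in which BUMPENVMAT00 and BUMPENVMAT10 contribute to u, and BUMPENVMAT01 and BUMPENVMAT11 contribute to v; check this against the SDK section named above. BumpMatrix, UV and PerturbTexbem() are hypothetical names.

```cpp
#include <cassert>

// The four D3DTSS_BUMPENVMAT* values of the sampled stage.
struct BumpMatrix { float m00, m01, m10, m11; };

struct UV { float u, v; };

// Offsets the base texture coordinates by the transformed du/dv pair
// read from the bump map (red and green components of s).
UV PerturbTexbem(UV base, float du, float dv, const BumpMatrix& m)
{
    return { base.u + m.m00 * du + m.m10 * dv,
             base.v + m.m01 * du + m.m11 * dv };
}
```

With the diagonal matrix (0.5, 0.0, 0.0, 0.5) from the listing above, the du/dv values are simply halved before they are added.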

Some rules for texbem/texbeml:
The s register can not be read by any arithmetic instruction, until it is overwritten again by another instruction:

...
texbem/l t1, t0
mov r0, t0 ; not legal
...
texbem/l t1, t0
mov t0, c0
mov r0, t0 ; legal
...

The s register of texbem/l can not be read by any texture instruction except for the texbem/l instruction:

...
texbem/l t1, t0
texreg2ar t2, t0 ; not legal
...
texbem/l t1, t0
texbem/l t2, t0 ; legal
...
---------------------------------
; t2 environment map
; (t2) texture coordinates environment map
; t1 du/dv perturbation data
ps.1.1
tex t0 ; color map
tex t1 ; bump map
texbem t2, t1 ; Perturb and then sample the
; environment map.
add r0, t2, t0

See the bem instruction for the ps.1.4 equivalent. See also the chapter on particle flow from Daniel Weiskopf and Matthias Hopf [Weiskopf] for an interesting use of texbem.

texbeml x x x d, s

Same as above, but additionally applies luminance. Three components of the source register are now used: red, green and blue as du, dv and l for luminance.

u += 2x2 matrix(du)
v += 2x2 matrix(dv)

Then sample (u, v) & apply Luminance. See the texbem/l rules in the texbem section.

--------------------------------
; t1 holds the color map
; bump matrix set with the 
; D3DTSS_BUMPENVMAT* flags
ps.1.1
tex t0         ; bump map with du, dv, l data
texbeml t1, t0  ; compute u, v
; sample t1 using u, v
; apply luminance correction
mov r0, t1     ; output result
texcoord x x x d Clamps the texture coordinate to the range [0..1.0] and outputs it as color. If the texture coordinate set contains fewer than three components, the missing components are set to 0. The fourth component is always 1. Provides a way to pass vertex data interpolated at high precision directly into the pixel shader.
--------------------------------
ps.1.1
texcoord t0 ; convert texture coordinates 
; to color
mov r0, t0  ; move color into output 
; register r0
texdp3   x x d, s Performs a three-component dot product between the texture coordinate set corresponding to the d register number and the texture data in s, and replicates the clamped result to all four color channels of d.
--------------------------------
; t0 holds color map
; (t1) holds texture coordinates
ps.1.2
tex t0        ; color map
texdp3 t1, t0 ; t1 = (t1) dot t0
mov r0, t1 ; output result
texdp3tex   x x d, s Performs a three-component dot product between the texture coordinate set corresponding to the d register number and the texture data in s. Uses the result to do a 1D texture lookup in d and places the result of the lookup into d. A common application is to lookup into a function table stored in a 1-D texture for procedural specular lighting terms.
---------------------------------
; t1 holds 1D color map
ps.1.2
tex t0        ; vector data (x, y, z, w)

texdp3tex t1, t0 ; u = (t1) dot t0
; lookup data in t1
; result in t1
mov r0, t1 ; output result
texkill x x x s Cancels rendering of the current pixel if any of the first three components of the texture coordinates in s is less than zero. When using vertex shaders, the application is responsible for applying the perspective transform. If the arbitrary clipping planes contain anisomorphic scale factors, you have to apply the perspective transform to these clip planes as well.
---------------------------------
ps.1.1
texkill t0 ; mask out pixel using 
;  uvw texture coordinates < 0.0 
mov r0, v0
texm3x2depth     x d, s

Calculates together with a texm3x2pad instruction the depth value to be used in depth testing for this pixel. Performs a three component dot product between the texture coordinate set corresponding to the d register number and the second row of a 3x2 matrix in the s register and stores the resulting w into d. After execution the d register is no longer available for use in the shader.

The benefit of a higher resolution of the depth buffer resulting from multisampling is eliminated, because texm3x2depth (same with ps.1.4: texdepth) will output the single depth value to each of the sub-pixel depth comparison tests.

The w and z values have to be clamped to [0..1], otherwise the result stored in the depth buffer will be undefined.

---------------------------------
; (t1) holds row #1 of the 3x2 matrix
; (t2) holds row #2 of the 3x2 matrix
; t0 holds normal map
ps.1.3
tex t0 ; normal map
texm3x2pad t1, t0  ; calculates z from row #1

; calculates w from row #2
; stores a result in t2 depending on
; if (w == 0)
;  t2 = 1.0;
; else
;  t2 = z/w;
texm3x2depth t2, t0
texm3x2pad x x x d, s This instruction cannot be used by itself. It must be combined with either texm3x2tex or texm3x2depth. It performs a three component dot product between the texture coordinate set corresponding to the d register number and the data of the s register and stores the result in d.

See example shown for the texm3x2depth instruction or the example shown for the texm3x2tex instruction.

texm3x2tex x x x d, s It calculates the second row of a 3x2 matrix by performing a three component dot product between the texture coordinate set corresponding to the d register number and the data of the s register to get a v value, which is used to sample a 2D texture. This instruction is used in conjunction with texm3x2pad, that calculates the u value.
--------------------------------
; Dot3 specular lighting with a lookup table
ps.1.1
; t0 holds normal map
; (t1) holds row #1 of the 3x2 matrix (light vector)
; (t2) holds row #2 of the 3x2 matrix (half vector)
; t2 holds a 2D texture (lookup table)
; t3 holds color map
tex t0 ; sample normal
texm3x2pad t1, t0_bx2 ; calculates u from first row
texm3x2tex t2, t0_bx2 ; calculates v from second row
                      ; samples texel with u,v 
                      ; from t2 (lookup table)
tex t3 ; sample base color
mul r0,t2,t3 ; blend terms
-------
; A ps.1.4 equivalent to the 
; texm3x2pad/texm3x2tex pair could be
; specular power from a lookup table
ps.1.4
; r0 holds normal map
; t1 holds light vector
; t2 holds half vector
; r2 holds 2D texture (lookup table)
; r3 holds color map
texld r0, t0 
texcrd r1.rgb, t1 
texcrd r4.rgb, t2

dp3 r1.r, r1, r0_bx2 ; calculates u 
dp3 r1.g, r4, r0_bx2 ; calculates v 

phase

texld r3, t3
texld r2, r1 ; samples texel with u,v 
             ; from r2 (lookup table)
mul r0, r2, r3
texm3x3   x x d, s

Performs a 3x3 matrix multiply similar to texm3x3tex, except that it does not automatically perform a lookup into a texture. The returned result vector is placed in d with no dependent read. The .a value in d is set to 1.0. The 3x3 matrix is comprised of the texture coordinates of the third texture stage, and by the two preceding texture stages. Any texture assigned to any of the three texture stages is ignored. This instruction must be used with two texm3x3pad instructions.

-----------------------------------
; (t1) holds row #1 of the 3x3 matrix
; (t2) holds row #2 of the 3x3 matrix
; (t3) holds row #3 of the 3x3 matrix 
ps.1.2
tex t0 ; normal map
texm3x3pad t1, t0 ; calculates u from row #1
texm3x3pad t2, t0 ; calculates v from row #2
texm3x3 t3, t0    ; calculates w from row #3
                  ; store u, v , w in t3
mov r0, t3

; ps.1.4 equivalent
; r1 holds row #1 of the 3x3 matrix
; r2 holds row #2 of the 3x3 matrix
; r3 holds row #3 of the 3x3 matrix
ps.1.4
def c0, 1.0, 1.0, 1.0, 1.0
texld r0, t0 ; r0 normal map
texcrd r1.rgb, t1 ; load row #1
texcrd r2.rgb, t2 ; load row #2
texcrd r3.rgb, t3 ; load row #3
dp3 r4.r, r1, r0 ; calculates u from row #1
dp3 r4.g, r2, r0 ; calculates v from row #2
dp3 r4.b, r3, r0 ; calculates w from row #3, u, v, w now in r4.rgb
mov r0.a, c0
+mov r0.rgb, r4
texm3x3pad x x x d, s Performs the first or second row of a 3x3 matrix multiply. The instruction can not be used by itself and must be used with texm3x3, texm3x3spec, texm3x3vspec or texm3x3tex.
texm3x3spec x x x d,s1, s2

Performs together with two texm3x3pad instructions a 3x3 matrix multiply. The resulting vector is used as a normal vector to reflect the eye-ray vector from a constant register in s2 and then uses the reflected vector as a texture address for a texture lookup in d.

The 3x3 matrix is typically useful for orienting a normal vector to the correct tangent space for the surface being rendered. The 3x3 matrix is comprised of the texture coordinates of the third texture stage and the results in the two preceding texture stages. Any texture assigned to the two preceding texture stages is ignored.

This can be used for specular reflection and environment mapping.
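
The reflection step can be sketched as follows, assuming the usual formula R = 2 * ((N.E) / (N.N)) * N - E, with N being the result of the matrix multiply and E the eye-ray vector. Vec3, Dot3() and ReflectEye() are hypothetical names, and the sketch stops before the texture lookup.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

float Dot3(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Reflects the eye-ray vector e about the normal n. The division by
// N.N means n does not have to be normalized beforehand.
Vec3 ReflectEye(const Vec3& n, const Vec3& e)
{
    const float nn = Dot3(n, n);
    assert(nn > 0.0f);
    const float s = 2.0f * Dot3(n, e) / nn;
    return { s * n[0] - e[0], s * n[1] - e[1], s * n[2] - e[2] };
}
```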

---------------------------------------
; (t1) holds row #1 of the 3x3 matrix
; (t2) holds row #2 of the 3x3 matrix 

; (t3) holds row #3 of the 3x3 matrix
; t3 is assigned a cube or volume map with 
; color data (RGBA)
; t0 holds a normal map
; c0 holds the eye-ray vector E
ps.1.1
tex t0
texm3x3pad t1, t0 ; calculates u from row #1
texm3x3pad t2, t0 ; calculates v from row #2

; calculates w from row #3
; reflect u, v and w by the 
; eye-ray vector in c0
; use reflected vector to lookup texture in t3
texm3x3spec t3, t0, c0 
mov r0, t3 ; output result

; A similar effect is possible with the following 
; ps.1.4 pixel shader.
; The eye vector is stored as a normalized 
; vector in a cube map
ps.1.4
texld r0, t0 ; Look up normal map.
texld r1, t4 ; Eye vector through normalizer cube map
texcrd r4.rgb, t1 ; 1st row of environment matrix
texcrd r2.rgb, t2 ; 2nd row of environment matrix
texcrd r3.rgb, t3 ; 3rd row of environment matrix

dp3 r4.r, r4, r0_bx2 ; 1st row of matrix multiply
dp3 r4.g, r2, r0_bx2 ; 2nd row of matrix multiply
dp3 r4.b, r3, r0_bx2 ; 3rd row of matrix multiply
dp3 r5, r4, r1_bx2 ; (N.Eye)
mov r0, r5
texm3x3tex x x x d, s

Performs together with two texm3x3pad instructions a 3x3 matrix multiply and uses the result to lookup the texture in d. The 3x3 matrix is typically useful for orienting a normal vector to the correct tangent space for the surface being rendered. The 3x3 matrix is comprised of the texture coordinates of the third texture stage and the two preceding texture stages. The resulting u, v and w are used to sample the texture in stage 3. Any textures assigned to the two preceding texture stages are ignored.

------------------------------------
; (t1) holds row #1 of the 3x3 matrix
; (t2) holds row #2 of the 3x3 matrix 
; (t3) holds row #3 of the 3x3 matrix
; t3 is assigned a cube or volume map with 
; color data (RGBA)
ps.1.1
tex t0 ; normal map
texm3x3pad t1, t0 ; calculates u from row #1
texm3x3pad t2, t0 ; calculates v from row #2

; calculates w from row #3
; uses u, v and w to sample t3
texm3x3tex t3, t0
mov r0, t3 ; output result

; ps.1.4 equivalent
; r1 holds row #1 of the 3x3 matrix
; r2 holds row #2 of the 3x3 matrix 
; r3 holds row #3 of the 3x3 matrix
; r3 is assigned a cube or volume map with 
; color data (RGBA)
ps.1.4
texld r0, t0
texcrd r1.rgb, t1
texcrd r2.rgb, t2
texcrd r3.rgb, t3
dp3 r4.r, r1, r0  ; calculates u from row #1
dp3 r4.g, r2, r0  ; calculates v from row #2
dp3 r4.b, r3, r0  ; calculates w from row #3
phase
texld r3, r4
mov r0, r3
texm3x3vspec x x x d, s Performs together with two texm3x3pad instructions a 3x3 matrix multiply. The resulting vector is used as a normal vector to reflect the eye-ray vector and then uses the reflected vector as a texture address for a texture lookup. It works just like texm3x3spec, except that the eye-vector is taken from the q coordinates of the three sets of 4D textures. The 3x3 matrix is typically useful for orienting a normal vector to the correct tangent space for the surface being rendered. The 3x3 matrix is comprised of the texture coordinates of the third texture stage and the results in the two preceding texture stages. Any texture assigned to the two preceding texture stages is ignored.

This can be used for specular reflection and environment mapping, where the eye-vector is not constant.

-----------------------------------------
; (t1) holds row #1 of the 3x3 matrix
; (t2) holds row #2 of the 3x3 matrix
; (t3) holds row #3 of the 3x3 matrix
; t3 is assigned a cube or volume map with 
; color data (RGBA)
; t0 holds a normal map
; used for Cubic bump mapping 
; the per-vertex eye vector is derived using 
; the camera position in the vertex shader
ps.1.1
tex t0
texm3x3pad t1, t0 ; calculates u from row #1
texm3x3pad t2, t0 ; calculates v from row #2

; calculates w from row #3
; calculates eye-ray vector from the q texture 
; coordinate values of t1 - t3
; reflect u, v and w by the eye-ray vector 
; use reflected vector to lookup texture in t3
texm3x3vspec t3, t0
mov r0, t3 ; output result
texreg2ar x x x d, s General dependant texture read operation that takes the alpha and red color component of s as texture address data (u, v) consisting of unsigned values, to sample a texture at d.
---------------------------------------
ps.1.1
tex t0 ; color map
texreg2ar t1, t0
mov r0, t1
texreg2gb x x x d, s General dependant texture read operation that takes the green and blue color component of s as texture address data (u, v) consisting of unsigned values, to sample a texture at d.
---------------------------------------
ps.1.1
tex t0 ; color map
texreg2gb t1, t0
mov r0, t1
texreg2rgb   x x d, s General dependant texture read operation that takes the red, green and blue color component of s as texture address data (u, v, w) consisting of unsigned values, to sample a texture at d. This is useful for color-space remapping operations.
---------------------------------------
ps.1.2
tex t0 ; color map
texreg2rgb t1, t0
mov r0, t1

All of these texture address instructions use only the tn registers, with the exception of texm3x3spec, which uses a constant register for the eye-ray vector. In a ps.1.1 - ps.1.3 pixel shader, the destination register numbers of texture addressing instructions have to be in increasing order.

In ps.1.1 - ps.1.3, the ability to re-use a texture coordinate after modifying it in the pixel shader is available through specific texture address instructions, that are able to modify the texture coordinates and sample a texture with these afterwards. The following diagram shows this reliance:


Figure 11 - Dependant Read in ps.1.1 - ps.1.3

The texture address operations that sample a texture after modifying the texture coordinates are:

  • texbem/texbeml
  • texdp3tex
  • texm3x2tex
  • texm3x3spec
  • texm3x3tex
  • texm3x3vspec

The following instructions sample a texture using color values as texture coordinates. If one of these color values was modified earlier in the shader, the sampling is a dependant read.

  • texreg2ar
  • texreg2gb
  • texreg2rgb

Therefore these instructions are called general dependant texture read instructions.
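
Which color components end up as which texture coordinate can be summarised in a short sketch. The struct and function names are hypothetical; only the component selection described in the table above is modelled.

```cpp
#include <cassert>

struct RGBA { float r, g, b, a; };
struct UV { float u, v; };
struct UVW { float u, v, w; };

// texreg2ar: alpha becomes u, red becomes v.
UV Texreg2ar(const RGBA& c) { return { c.a, c.r }; }

// texreg2gb: green becomes u, blue becomes v.
UV Texreg2gb(const RGBA& c) { return { c.g, c.b }; }

// texreg2rgb: red, green and blue become u, v and w.
UVW Texreg2rgb(const RGBA& c) { return { c.r, c.g, c.b }; }
```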

As already stated above, each ps.1.1 - ps.1.3 pixel shader has a maximum of 8 arithmetic instructions and 4 texture address instructions. Every texture address instruction uses one of the texture address slots, with the exception of texbeml, which uses one texture address slot plus one arithmetic slot.
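
A tool that budgets these slots before assembling a shader might count them as in the following sketch. SlotUsage, CountAddressInstructions() and FitsPs11To13() are hypothetical names; the only special case modelled is the extra arithmetic slot of texbeml.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SlotUsage {
    int address = 0;
    int arithmetic = 0;
};

// Counts the slots taken by a list of texture address instructions.
SlotUsage CountAddressInstructions(const std::vector<std::string>& names)
{
    SlotUsage usage;
    for (const std::string& name : names) {
        ++usage.address;
        if (name == "texbeml") ++usage.arithmetic;  // extra arithmetic slot
    }
    return usage;
}

// True if the shader stays within the ps.1.1 - ps.1.3 limits.
bool FitsPs11To13(const SlotUsage& usage, int arithmeticInstructions)
{
    assert(arithmeticInstructions >= 0);
    return usage.address <= 4 &&
           usage.arithmetic + arithmeticInstructions <= 8;
}
```

A shader using texbeml can therefore hold at most seven arithmetic instructions.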

ps.1.4 Texture Addressing

To use texels or texture coordinates in ps.1.4, you always have to load them first with texld or texcrd. These instructions are the only way to get access to texels or texture coordinates. Texture coordinates can be modified after a conversion to color data via texcrd, with all available arithmetic instructions. As a result, texture addressing is more straightforward with ps.1.4.

The following instructions are texture address instructions in ps.1.4:

Instruction Para Action
texcrd d, s

Copies the texture coordinate set corresponding to s into d as color data (RGBA). No texture is sampled. Clamps the texture coordinates in tn with a range of [-MaxTextureRepeat, MaxTextureRepeat] (RADEON 8500: 2048) to the range of rn [-8, 8] (MaxPixelShaderValue). This clamp might behave differently on different hardware. To be safe, provide data in the range of [-8, 8].

A .rgb or .rg modifier should be provided to d. The fourth channel of d is unset/undefined in all cases. The third channel is unset/undefined for a projective divide with _dw.xyw (D3DTTFF_PROJECTED is ignored in ps.1.4). The allowed syntax, taking into account all valid source modifier/selector and destination write mask combinations, is shown below:

texcrd rn.rgb, tn.xyz
texcrd rn.rgb, tn
texcrd rn.rgb, tn.xyw
texcrd rn.rg, tn_dw.xyw
texdepth d

Calculates the depth value used in the depth buffer comparison test for the pixel by using the r5 register. The r5 register is then unavailable for any further use in the pixel shader. texdepth updates the depth buffer with the value of r5.r and r5.g. The .r channel is treated as the z-value and the .g channel is treated as the w-value. The value in the depth buffer is replaced by the result of the .r channel divided by the .g channel == z/w. If the value in .g channel is zero then the depth buffer is updated with 1.0.

texdepth is only available in phase 2.

Using this instruction eliminates the benefit of a higher resolution of the depth buffer resulting from multisampling, because texdepth (same with texm3x2depth) will output the single depth value to each of the sub-pixel depth comparison tests.
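
The depth calculation described above boils down to a guarded division, sketched here with the hypothetical function name TexDepth(). The inputs stand for r5.r (z) and r5.g (w).

```cpp
#include <cassert>

// Depth value written by texdepth: z / w, or 1.0 if w is zero.
float TexDepth(float z, float w)
{
    return (w == 0.0f) ? 1.0f : z / w;
}
```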

----------------------------------------
ps.1.4
texld r0, t0 ; samples from texture stage 0 with 
; texture coordinates set t0 
texcrd r1.rgb, t1 ; load texture coordinates from 
; t1 into r1 as color values
add r5.rg, r0, r1 ; add both values
phase
texdepth r5 ; calculate pixel depth as r5.r/r5.g
; do other color calculation here and output it to r0
texkill s

Cancels rendering of the current pixel if any of the first three components of the texture data (ps.1.1 - ps.1.3: texture coordinates) in s is less than zero. You can use this instruction to implement arbitrary clipping planes in the rasterizer.

When using vertex shaders, the application is responsible for applying the perspective transform. If the arbitrary clipping planes contain anisomorphic scale factors, you have to apply the perspective transform to the clip planes as well.

texkill is only available in phase 2 and sources rn or tn registers.

---------------------------------
ps.1.4
...        ; include other shader instructions here
phase
texkill t0 ; mask out pixel using 
; uvw texture coordinates < 0.0
mov r0, v0 ; move diffuse into r0
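
The kill test itself is simple enough to state in a few lines. The Python sketch below is a CPU-side model of the rule given above (any of the first three components below zero cancels the pixel); the function name and the tuple layout are assumptions made for this illustration.

```python
def texkill(s):
    """Return True if the pixel would be discarded.

    s is a 4-tuple (x, y, z, w). Only the first three components are
    tested; the fourth one is ignored.
    """
    return any(component < 0.0 for component in s[:3])
```

A clipping plane can be modelled by feeding the signed distance to the plane into one of the first three components: negative distances kill the pixel, all others pass.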
texld d, s

Loads d with the color data (RGBA) sampled using the texture coordinates from s. The texture stage to sample from is determined by the number of d (r0 - r5), and in phase 1 the texture coordinate set is determined by the number of s (t0 - t5). In phase 2, texld can additionally use an rn register as s.

The allowed syntax is:

texld rn, tn 
texld rn, tn.xyz ; same as previous
texld rn, tn.xyw
texld rn, tn_dw.xyw
texld rn, rn
texld rn, rn.xyz ; same as previous
texld rn, rn_dz ; only valid on rn
; no more than two times per shader
texld rn, rn_dz.xyz ; same as previous
----------------------------------
; Simple transfer function for sepia or 
; heat signature effects [Mitchell]
; c0 holds the luminance value
; t0 holds the texture coordinates
; r0 holds the original image
; r5 holds the 1D sepia or heat signature map
ps.1.4
def c0, 0.30, 0.59, 0.11, 1.0
texld r0, t0
dp3 r0, r0, c0
phase
texld r5, r0 ; dependent read
mov r0, r5

; ps.1.2 equivalent 
; t0 holds the original image
; t1 holds the 1D sepia or heat signature map
; (t1) holds 0.30, 0.59, 0.11, 1.0
ps.1.2
tex t0 ; color map
texdp3tex t1, t0 ; u = (t1) dot t0
; lookup data in t1
; result in t1
mov r0, t1 ; output result
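
To see what the transfer function does to a single pixel, the two steps (luminance via dp3, then a dependent lookup) can be sketched on the CPU. The Python code below is a simplified model, not the shader itself: it assumes the lookup coordinate is clamped to [0, 1] and that the nearest table entry is returned without filtering. The names transfer and lookup are made up for this example.

```python
def dp3(a, b):
    """Three-component dot product, as computed by the dp3 instruction."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def transfer(color, lookup):
    """Map an RGB color through a 1D lookup table indexed by its luminance.

    Assumptions of this sketch: clamp addressing and point sampling.
    """
    weights = (0.30, 0.59, 0.11)  # the luminance weights held in c0
    u = min(max(dp3(color, weights), 0.0), 1.0)
    index = min(int(u * len(lookup)), len(lookup) - 1)
    return lookup[index]
```

With a three-entry table, black maps to the first entry, mid-grey to the second and white to the last, which is the behaviour a sepia or heat signature ramp relies on.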

In ps.1.4, there are only four texture address instructions but, as mentioned before, all the arithmetic instructions can be used to manipulate texture address information. So there are plenty of instruments to solve texture addressing tasks.

Valid source registers for first phase texture address instructions are tn. Valid source registers for second phase texture address instructions are tn and also rn. Each rn register may be specified as a destination to a texture instruction only once per phase. Aside from this, destination register numbers for texture instructions do not have to be in any particular order (as opposed to previous pixel shader versions in which destination register numbers for texture instructions had to be in increasing order).

No dependencies are allowed in a block of tex* instructions. The destination register of a texture address instruction cannot be used as a source in a subsequent texture address instruction in the same block of texture address instructions (same phase).

Dependent reads with ps.1.4 are not difficult to locate in the source. Pseudo code of the two possible dependent read scenarios in ps.1.4 might look like:

; transfer function
texld ; load first texture
modify color data here
phase
texld ; sample second texture with changed color data as address

or

texcrd ; load texture coordinates
modify texture coordinates here
phase
texld ; sample texture with changed address

Another way to think of it is that if the second argument to a texld after the phase marker is rn (not tn) then it's a dependent read, because the texture coordinates are in a temp register so they must have been computed:

.....
phase
texld rn, rn

The first three channels of an rn register that is used as a source register have to be set before it is used as a source parameter. Otherwise the shader will fail.

To manipulate texture coordinates with arithmetic instructions, they have to be loaded into texture data registers (ps.1.1 - ps.1.3: tn; ps.1.4: rn) via texcoord or texcrd. There is one important difference between these two instructions. texcoord clamps to [0..1] and texcrd does not clamp at all.
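
The difference between the two instructions can be illustrated with a small CPU-side sketch. The Python functions below only model the clamping behaviour described in this text; their names mirror the instructions, and the [-8, 8] limit used for texcrd is an assumption taken from the MaxPixelShaderValue range mentioned earlier.

```python
def texcoord(t):
    """ps.1.1 - ps.1.3 model: every coordinate is clamped to [0, 1]."""
    return tuple(min(max(x, 0.0), 1.0) for x in t)


def texcrd(t, max_value=8.0):
    """ps.1.4 model: coordinates pass through unchanged as long as they
    fit into the register range [-max_value, max_value] (assumed 8.0)."""
    return tuple(min(max(x, -max_value), max_value) for x in t)
```

A light vector such as (-0.6, 0.5, 1.7) survives texcrd unchanged, whereas texcoord turns it into (0.0, 0.5, 1.0) and thereby loses the sign and the magnitude above one.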

If you compare texture addressing used in ps.1.1 - ps.1.3 and texture addressing used in ps.1.4, it is obvious that the more CISC-like approach uses much more powerful instructions to address textures compared to the more RISC-like ps.1.4 approach. On the other hand, ps.1.4 offers a greater flexibility in implementing different texture addressing algorithms by using all of the arithmetic instructions compared to ps.1.1 - ps.1.3.

Arithmetic Instructions

The arithmetic instructions are used by ps.1.1 - ps.1.3 and ps.1.4 in a similar way, to manipulate texture or color data. Here is an overview of all available instructions in these implementations:

Instruction  Arguments   Registers       Version
add          dest        tn, rn          ps.1.1 - ps.1.3
             dest        rn              ps.1.4
             src0, src1  vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1  cn, rn          ps.1.4 phase 1
             src0, src1  vn, cn, rn      ps.1.4 phase 2

Performs a component-wise addition of register src0 and src1:

dest.r = src0.r + src1.r
dest.g = src0.g + src1.g
dest.b = src0.b + src1.b
dest.a = src0.a + src1.a
--------------------------------------
; glow mapping
ps.1.1
tex t0 ; color map
tex t1 ; glow map
add r0, t0, t1 ; add the color values
; the increased brightness leads to a glow effect

; glow mapping
ps.1.4
texld r0, t0 ; color map
texld r1, t1 ; glow map
add r0, r0, r1 ; add the color values
; the increased brightness leads to a glow effect

---------
; detail mapping
ps.1.1
tex t0 ; color map
tex t1 ; detail map
add r0, t0, t1_bias ; detail map is add-signed to the color map
; watch out for the used texture coordinates of 
; the detail map

; detail mapping
ps.1.4
texld r0, t0 ; color map
texld r1, t0 ; sample detail map with the texture coords of the color map
add r0, r0, r1_bias ; detail map is add-signed to the color map

You may increase the detail map effect by using _bx2 as a modifier in the add instruction.
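
The effect of the two modifiers on the detail map can be checked with a few numbers. The Python sketch below models the add instruction with a _bias or _bx2 source modifier on the second operand, assuming the usual DirectX 8 definitions (_bias subtracts 0.5, _bx2 subtracts 0.5 and multiplies by 2). The function names are invented for this example, and no saturation is applied.

```python
def bias(x):
    """_bias source modifier: subtract 0.5."""
    return x - 0.5


def bx2(x):
    """_bx2 source modifier: subtract 0.5, then multiply by 2."""
    return 2.0 * (x - 0.5)


def add_detail(color, detail, modifier=bias):
    """Component-wise add of the color texel and the modified detail texel."""
    return tuple(c + modifier(d) for c, d in zip(color, detail))
```

With a mid-grey color texel of 0.5 and a detail texel of 0.75, _bias produces 0.75 while _bx2 produces 1.0, so the same detail map shifts the color twice as far.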

Instruction  Arguments   Registers       Version
bem          dest        rn              ps.1.4 phase 1
             src0        cn, rn          ps.1.4 phase 1
             src1        rn              ps.1.4 phase 1

Apply a fake bump environment transform.

There is a difference in the usage of the texture stages between environment mapping with a pixel shader (that is, with texbem, texbeml or bem) and environment mapping with the DX 6/7 multitexturing unit. bem (like texbeml or texbem) needs the matrix data connected to the texture stage that is sampled, which is the stage of the environment map. Environment mapping with the DX 6/7 multitexturing unit needs the matrix data connected to the texture stage used by the bump map (see the example code for the texbem instruction).

bem has a lot of restrictions when used in a pixel shader:

  • bem must appear in the first phase of a shader (that is, before a phase marker)
  • bem consumes two arithmetic instruction slots
  • Only one use of this instruction is allowed per shader
  • Destination writemask must be .rg /.xy
  • This instruction cannot be co-issued
  • Aside from the restriction that destination write mask be .rg, modifiers on source src0, src1, and instruction modifiers are unconstrained

bem performs the following calculation:

(Given n == dest register #)
dest.r = src0.r + D3DTSS_BUMPENVMAT00(stage n) * src1.r
                + D3DTSS_BUMPENVMAT10(stage n) * src1.g

dest.g = src0.g + D3DTSS_BUMPENVMAT01(stage n) * src1.r
                + D3DTSS_BUMPENVMAT11(stage n) * src1.g

------------------------------------
ps.1.4
; r2 environment map texture coordinates
; r1 du/dv perturbation data
texld r1, t1 ; bump map
texcrd r2.rgb, t2
bem r2.rg, r2, r1 ; perturb 
; r2.rg = tex coordinates to sample environment map
phase
texld r0, t0 ; color map
texld r2, r2 ; environment map
add r0, r0, r2
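
The bem calculation is a 2x2 matrix transform of the perturbation data added to the base coordinates. The following Python sketch restates the formula given above for checking values by hand; the function name and the nested-tuple layout of the matrix (mat[i][j] standing for D3DTSS_BUMPENVMATij of the sampled stage) are assumptions of this example.

```python
def bem(src0, src1, mat):
    """Model of bem: perturb the coordinates in src0 by src1.

    mat[i][j] stands for D3DTSS_BUMPENVMATij of stage n, where n is the
    number of the destination register. Only .r and .g are produced.
    """
    r = src0[0] + mat[0][0] * src1[0] + mat[1][0] * src1[1]
    g = src0[1] + mat[0][1] * src1[0] + mat[1][1] * src1[1]
    return (r, g)
```

With an identity matrix the du/dv values are simply added to the coordinates, for example bem((0.5, 0.5), (0.25, -0.25), ((1.0, 0.0), (0.0, 1.0))) gives (0.75, 0.25).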

See the example program BumpEarth on the ShaderX DVD. See also the articles on improved environment mapping techniques such as Cube Mapping [Hurley][Brennan2][Brennan3] and Per-Fresnel Term [Brennan].

Instruction  Arguments         Registers       Version
cmp          dest              tn, rn          ps.1.2, ps.1.3
             dest              rn              ps.1.4
             src0, src1, src2  vn, cn, tn, rn  ps.1.2, ps.1.3
             src0, src1, src2  cn, rn          ps.1.4 phase 1
             src0, src1, src2  vn, cn, rn      ps.1.4 phase 2

Conditionally chooses between src1 and src2 based on a per-channel comparison src0 >= 0.

For ps.1.2 and ps.1.3, cmp uses two arithmetic instruction slots. CreatePixelShader() erroneously assumes that this instruction consumes only one instruction slot, so you have to check the instruction count of a pixel shader that uses this instruction manually. Another validation problem is that in ps.1.2 and ps.1.3 the destination register of cmp cannot be the same as any of the source registers.
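
The per-channel selection performed by cmp can be written down as a short CPU-side model. The Python function below is only a sketch of the rule src0 >= 0 ? src1 : src2 applied to each channel; it ignores instruction slots, modifiers and register restrictions.

```python
def cmp(src0, src1, src2):
    """Per-channel select: src1 where src0 >= 0, otherwise src2."""
    return tuple(b if a >= 0.0 else c for a, b, c in zip(src0, src1, src2))
```

Note that a channel holding exactly 0 selects src1, because the comparison is "greater than or equal".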

------------------------------------------
// Compares all four components.
ps.1.2

... fill t1, t2 and t3
// t1 holds -0.6, 0.6, 0, 0.6
// t2 holds 0, 0, 0, 0
// t3 holds 1, 1, 1, 1
cmp r0, t1, t2, t3 // r0 is assigned 1,0,0,0 based on the following:
// r0.x = t3.x because t1.x < 0
// r0.y = t2.y because t1.y >= 0
// r0.z = t2.z because t1.z >= 0
// r0.w = t2.w because t1.w >= 0
----------
// Compares all four components.
ps.1.4
texcrd r1, t1
texcrd r2, t2
texcrd r3, t3
// r1 holds -0.6, 0.6, 0, 0.6
// r2 holds 0, 0, 0, 0
// r3 holds 1, 1, 1, 1
cmp r0, r1, r2, r3 // r0 is assigned 1,0,0,0 based on the following:
// r0.x = r3.x because r1.x < 0
// r0.y = r2.y because r1.y >= 0
// r0.z = r2.z because r1.z >= 0
// r0.w = r2.w because r1.w >= 0
----------
; Cartoon pixel shader
; explained in detail in [Card/Mitchell]
; c0 holds falloff 1
; c1 holds falloff 2
; c2 holds dark
; c3 holds average
; c4 holds bright
; t0 holds normal information
; t1 holds the light vector
ps.1.4
def c0, 0.1f, 0.1f, 0.1f, 0.1f
def c1, 0.8f, 0.8f, 0.8f, 0.8f
def c2, 0.2f, 0.2f, 0.2f, 1.0f
def c3, 0.6f, 0.6f, 0.6f, 1.0f
def c4, 0.9f, 0.9f, 1.0f, 1.0f

texcrd r0.xyz, t0 ; place normal vector in r0
texcrd r1.xyz, t1 ; place light vector in r1
dp3 r3, r0, r1 ; n.l
sub r4, r3, c0 ; subtract falloff #1 from n.l
cmp_sat r0, r4, c3, c2 ; check if n.l is > zero
                       ; if yes use average color                        
                       ; otherwise darker color
sub r4, r3, c1 ; subtract falloff #2 from n.l
cmp_sat r0, r4, c4, r0 ; check if n.l is > zero
                      ; if yes use bright color                       
                      ; otherwise use what is there

; ps.1.2 equivalent with less precision        
ps.1.2
def c0, 0.1f, 0.1f, 0.1f, 0.1f
def c1, 0.8f, 0.8f, 0.8f, 0.8f
def c2, 0.2f, 0.2f, 0.2f, 1.0f
def c3, 0.6f, 0.6f, 0.6f, 1.0f
def c4, 0.9f, 0.9f, 1.0f, 1.0f

texcoord t0 ; place normal vector in t0
texcoord t1 ; place light vector in t1
dp3 r1, t0, t1 ; n.l
sub t3, r1, c0 ; subtract falloff #1 from n.l
cmp_sat r0, t3, c3, c2 ; check if n.l is > zero
                   ; if yes use average color                        
                   ; otherwise darker color
sub t3, r1, c1     ; subtract falloff #2 from n.l
cmp_sat r0, t3, c4, r0 ; check if n.l is > zero
                   ; if yes use bright color                       
                   ; otherwise use what is there

The ps.1.2 version is not able to provide the same precision as the ps.1.4 version because of texcoord, which clamps to [0..1]. texcrd does not clamp at all; it is able to handle values in the range of the rn registers [-8..+8].
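
The banding logic of the cartoon shader reduces to two threshold tests on n.l. The Python sketch below reproduces it with the constants from the shader listings; it is a simplified model that takes n.l as an input instead of computing it from the normal and light vectors, and the name toon_color is made up for this example.

```python
DARK = (0.2, 0.2, 0.2, 1.0)     # c2
AVERAGE = (0.6, 0.6, 0.6, 1.0)  # c3
BRIGHT = (0.9, 0.9, 1.0, 1.0)   # c4


def toon_color(n_dot_l, falloff1=0.1, falloff2=0.8):
    """Pick one of three flat colors depending on n.l.

    Mirrors the two sub/cmp pairs: the first pair chooses between the
    average and the dark color, the second one overrides the result
    with the bright color where n.l reaches the second falloff.
    """
    color = AVERAGE if n_dot_l - falloff1 >= 0.0 else DARK
    color = BRIGHT if n_dot_l - falloff2 >= 0.0 else color
    return color
```

Surfaces facing away from the light end up dark, surfaces at a moderate angle get the average color, and only those nearly facing the light get the bright color.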

Instruction  Arguments         Registers       Version
cnd          dest              tn, rn          ps.1.1 - ps.1.3
             dest              rn              ps.1.4
             src0              r0.a            ps.1.1 - ps.1.3
             src1, src2        vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1, src2  cn, rn          ps.1.4 phase 1
             src0, src1, src2  vn, cn, rn      ps.1.4 phase 2

In ps.1.1 - ps.1.3, cnd conditionally chooses between src1 and src2 based on the comparison r0.a > 0.5. In ps.1.4, it chooses between src1 and src2 for each channel separately, based on the comparison src0 > 0.5 for that channel.

// Version 1.1 to 1.3
if (r0.a > 0.5)
  dest = src1
else
  dest = src2

// Version 1.4 compares each channel separately.
for each channel in src0
{
   if (src0.channel > 0.5)
     dest.channel = src1.channel 
   else
     dest.channel = src2.channel 
}
------------------------------------------
// Compares r0.a > 0.5
ps.1.1
def c0, -0.5, 0.5, 0, 0.6
def c1, 0, 0, 0, 0
def c2, 1, 1, 1, 1
mov r0, c0
cnd r1, r0.a, c1, c2 // r1 is assigned 0,0,0,0 based on the following:
// r0.a > 0.5, therefore r1.xyzw = c1.xyzw
-----------
// Compares all four components.
ps.1.4
// r1 holds -0.6, 0.5, 0, 0.6
// r2 holds 0, 0, 0, 0
// r3 holds 1, 1, 1, 1
texcrd r1, t1
texcrd r2, t2
texcrd r3, t3
cnd r0, r1, r2, r3 // r0 is assigned 1,1,1,0 based on the following:
// r0.x = r3.x because r1.x < 0.5
// r0.y = r3.y because r1.y = 0.5
// r0.z = r3.z because r1.z < 0.5
// r0.w = r2.w because r1.w > 0.5
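
Both flavours of cnd can be modelled in a few lines to verify such examples by hand. The Python functions below are a sketch of the selection rule only; the names cnd_ps11 and cnd_ps14 are invented for this illustration.

```python
def cnd_ps11(r0_alpha, src1, src2):
    """ps.1.1 - ps.1.3 model: one comparison on r0.a selects the whole register."""
    return src1 if r0_alpha > 0.5 else src2


def cnd_ps14(src0, src1, src2):
    """ps.1.4 model: every channel of src0 is compared separately."""
    return tuple(b if a > 0.5 else c for a, b, c in zip(src0, src1, src2))
```

A channel holding exactly 0.5 selects src2, because the comparison is strictly "greater than".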

See the chapter by Steffen Bendel [Bendel] for an interesting usage of cnd to smooth fonts.

Instruction  Arguments   Registers       Version
dp3          dest        tn, rn          ps.1.1 - ps.1.3
             dest        rn              ps.1.4
             src0, src1  vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1  cn, rn          ps.1.4 phase 1
             src0, src1  vn, cn, rn      ps.1.4 phase 2

Calculates a three-component dot product. The scalar result is replicated to all four channels:

dest.w = (src0.x * src1.x) + (src0.y * src1.y) + (src0.z * src1.z);
dest.x = dest.y = dest.z = dest.w = the scalar result of dp3

It does not automatically clamp the result to [0..1]. This instruction executes in the vector pipeline. So it can be paired or co-issued with an instruction that executes in the alpha pipeline (More on co-issuing below).

dp3 r0.rgb, t0, v0
+mov r2.a, t0

In ps.1.1 - ps.1.3 dp3 always writes out to .rgb. In ps.1.4 you are free to choose three channels of the four rgba channels in any combination by masking the destination register.
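
A CPU-side model of dp3 makes the replication and the missing clamp easy to verify. The Python function below is a sketch that follows the formula given for dp3 and replicates the scalar into all four channels; it ignores destination write masks and co-issuing.

```python
def dp3(src0, src1):
    """Three-component dot product, replicated to all four channels.

    The fourth component of both sources is ignored and the result is
    not clamped.
    """
    s = src0[0] * src1[0] + src0[1] * src1[1] + src0[2] * src1[2]
    return (s, s, s, s)
```

Because nothing is clamped, two vectors of length 2 pointing in the same direction produce 4.0 in every channel.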

----------------------------------------------
; Dot3 per-pixel specular lighting
; specular power comes from a lookup table
ps.1.4
; r0 holds normal map
; t1 holds light vector
; t2 holds half vector
; r2 holds a 2D texture (lookup table)
; r3 holds color map
texld r0, t0 ; normal map
texcrd r1.rgb, t1
texcrd r4.rgb, t2

dp3 r1.r, r1, r0_bx2 ; calculates u 
dp3 r1.g, r4, r0_bx2 ; calculates v

phase

texld r3, t3
texld r2, r1 ; samples the lookup table (stage 2) with u,v from r1
mul r0, r2, r3

You will find the ps.1.1 equivalent as an example for the texm3x2tex instruction. See the example RacorX8 and RacorX9 in part 3 of this introduction.

Instruction  Arguments   Registers       Version
dp4          dest        rn              ps.1.2, ps.1.3
             dest        rn              ps.1.4
             src0, src1  vn, cn, tn, rn  ps.1.2, ps.1.3
             src0, src1  cn, rn          ps.1.4 phase 1
             src0, src1  vn, cn, rn      ps.1.4 phase 2

Calculates a four-component dot product. It does not automatically clamp the result to [0..1]. This instruction executes in the vector and alpha pipeline. So it can not be co-issued. Unfortunately, CreatePixelShader() assumes that this instruction consumes only one instruction slot, whereas it really consumes two. So the instruction count in a pixel shader that uses dp4 must be checked manually.

Additionally in ps.1.2 and ps.1.3, the destination register for dp4 cannot be the same as any of the source registers. CreatePixelShader() will not catch a wrong usage.

A maximum of 4 dp4 commands are allowed in a single pixel shader.

dp4 is useful to handle 4x4 matrices or quaternions in a pixel shader. dp4 does not seem to be useful in conjunction with most of the texture address instructions of ps.1.2 and ps.1.3, because these instructions support only matrices with three columns.

Instruction  Arguments         Registers       Version
lrp          dest              tn, rn          ps.1.1 - ps.1.3
             dest              rn              ps.1.4
             src0, src1, src2  vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1, src2  cn, rn          ps.1.4 phase 1
             src0, src1, src2  vn, cn, rn      ps.1.4 phase 2

Performs a linear interpolation based on the following formula:

dest = src2 + src0 * (src1 - src2)

src0 determines the amount for the blend of src1 and src2.

-------------------------------------------
ps.1.1
def c0, 0.4, 0.2, 0.5, 1.0
tex t0
tex t3
lrp r0, c0, t3, t0 ; the texture values of t3 and t0 are 
; interpolated depending on c0
-----------------------
ps.1.4
def c0, 0.4, 0.2, 0.5, 1.0
texld r0, t0
texld r3, t3
lrp r0, c0, r3, r0 ; the texture values of t3 and t0 are 
; interpolated depending on c0
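
The interpolation formula can be tried out with a short CPU-side model. The Python function below is a sketch that applies dest = src2 + src0 * (src1 - src2) to each channel; it ignores modifiers and register restrictions.

```python
def lrp(src0, src1, src2):
    """Per-channel linear interpolation: src2 + src0 * (src1 - src2)."""
    return tuple(c + a * (b - c) for a, b, c in zip(src0, src1, src2))
```

A blend factor of 1 returns src1, a factor of 0 returns src2, and each channel of src0 can carry its own factor, as the constant c0 does in the shader examples.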

[Vlachos] shows how to programmatically interpolate with lrp between two textures based on their normal.

Instruction  Arguments         Registers       Version
mad          dest              tn, rn          ps.1.1 - ps.1.3
             dest              rn              ps.1.4
             src0, src1, src2  vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1, src2  cn, rn          ps.1.4 phase 1
             src0, src1, src2  vn, cn, rn      ps.1.4 phase 2

Performs a multiply-accumulate operation based on the following formula:

dest = src0 * src1 + src2

This might be useful for example for dark mapping with diffuse color.

-------------------------------------------
; The following examples show a modulation of a light map with a color map. 
; This technique is often used to darken parts of a color map. In this case 
; it is called Dark Mapping. Additionally a diffuse color is added.
ps.1.1
tex t0 ; color map
tex t3 ; light map
mad r0, t3, t0, v0

; Dark Mapping + diffuse color
ps.1.4
texld r0, t0 ; color map
texld r3, t3 ; light map
mad r0, r3, r0, v0

Instruction  Arguments   Registers       Version
mov          dest        tn, rn          ps.1.1 - ps.1.3
             dest        rn              ps.1.4
             src         vn, cn, tn, rn  ps.1.1 - ps.1.3
             src         cn, rn          ps.1.4 phase 1
             src         vn, cn, rn      ps.1.4 phase 2

Copies the content of the source to the destination register. Question every use of mov, because a more suitable instruction might exist.

Instruction  Arguments   Registers       Version
mul          dest        tn, rn          ps.1.1 - ps.1.3
             dest        rn              ps.1.4
             src0, src1  vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1  cn, rn          ps.1.4 phase 1
             src0, src1  vn, cn, rn      ps.1.4 phase 2

Performs the following operation:
dest = src0 * src1
---------------------------------------
; The following examples show a modulation of a light map with a color map. 
; This technique is often used to darken parts of a color map. In this case 
; it is called Dark Mapping
ps.1.1
tex t0 ; color map
tex t3 ; light map
mul r0, t0, t3

; Dark Mapping
ps.1.4
texld r0, t0 ; color map
texld r3, t3 ; light map
mul r0, r0, r3

Instruction  Arguments   Registers       Version
nop

Performs no operation in ps.1.1 - ps.1.4.

Instruction  Arguments   Registers       Version
sub          dest        tn, rn          ps.1.1 - ps.1.3
             dest        rn              ps.1.4
             src0, src1  vn, cn, tn, rn  ps.1.1 - ps.1.3
             src0, src1  cn, rn          ps.1.4 phase 1
             src0, src1  vn, cn, rn      ps.1.4 phase 2

Performs the following operation:
dest = src0 - src1
---------------------------------
ps.1.1
tex t0 ; color map #1
tex t1 ; color map #2
sub r0, t1, t0 ; subtract t0 from t1

All arithmetic instructions can use the temporary registers rn. The rn registers are initially unset and cannot be used as source operands until they are written. This requirement is enforced independently for each channel of each rn register. In ps.1.4 the tn registers cannot be used with any arithmetic instruction, so they are restricted to texture addressing instructions (exception: texdepth).

Valid source registers for first phase arithmetic instructions are rn and cn. Valid source registers for second phase arithmetic instructions are rn, vn, and cn.

The comparison of ps.1.1 - ps.1.3 to ps.1.4 shows only a few differences. The only ps.1.4-specific instruction is bem. It provides the texbem and texbeml capabilities as an arithmetic operation in ps.1.4. Furthermore, the cmp and cnd instructions are more powerful in ps.1.4. The scope of the arithmetic instructions is much bigger in ps.1.4 than in ps.1.1 - ps.1.3, because they are used for all texture addressing and blending tasks in ps.1.4.

As with the vertex shader, the pixel shader arithmetic instructions provide no if-statement, but this functionality can be emulated with cmp or cnd.

All of the rn.a channels are marked as unset at the end of the first phase, and thus cannot be used as a source operand until written. As a result, the fourth channel of color data will be lost during the phase transition. This problem can be partly solved by re-ordering the instructions. For example, the following code snippet will lose the alpha value in r3:

ps.1.4
...
texld r3, t3
phase
...
mul r0, r2, r3

The next code snippet will not lose the alpha value:

ps.1.4
...
phase
texld r3, t3
...
mul r0, r2, r3

If no phase marker is present, then the entire shader is validated as being in the second phase.

All four channels of the shader result r0 must be written.

ps.1.1 - ps.1.3 and ps.1.4 are limited in different ways regarding the maximum number of source registers of the same type that can be read.

Read Port Limit

The read port limit gives you the maximum number of different registers of the same register type that can be used as source registers in a single instruction.

Register  ps.1.1  ps.1.2  ps.1.3  ps.1.4
cn        2       2       2       2
rn        2       2       2       3
tn        2       3       3       1
vn        2       2       2       2

The color registers have a read port limit of two in all versions. In the following code snippet, mad uses v0 and v1 as a source register:

ps.1.1 // Version instruction
tex t0 // Declare texture
mad r0, v0, t0, v1

This example reads two different color registers and therefore uses the full readport limit of 2. The following example uses only one read port, because v0 is used twice:

ps.1.1 // Version instruction
tex t0 // Declare texture
mad r0, v0, t0, v0

The following pixel shader fails in ps.1.1:

ps.1.1
tex t0
tex t1
tex t2
mad r0, t0, t1, t2

It exceeds the readport limit of 2 for the texture registers. This shader won't fail with ps.1.2 and ps.1.3, because these versions have a readport limit of 3 for the tn registers. The functional equivalent in ps.1.4 won't fail either:

ps.1.4
texld r0, t0
texld r1, t1
texld r2, t2
mad r0, r0, r1, r2

Another example for the usage of three temporary registers in ps.1.4 in the same instruction is shown in the examples for the cmp and cnd instructions. In ps.1.4 the tn registers cannot be used with arithmetic instructions and none of the texture address instructions can use more than one tn register as a source, therefore it is not possible to cross the readport limit of the tn registers in ps.1.4.
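
The read port rule lends itself to a small offline check before a shader is handed to the assembler. The Python sketch below is a hypothetical helper, not part of any DirectX tool: it takes the source operands of one instruction as strings, counts the different registers per type, and compares the counts with the limits from the table above. It assumes plain register names with an optional leading minus, _modifier or .selector.

```python
READ_PORT_LIMITS = {
    "ps.1.1": {"c": 2, "r": 2, "t": 2, "v": 2},
    "ps.1.2": {"c": 2, "r": 2, "t": 3, "v": 2},
    "ps.1.3": {"c": 2, "r": 2, "t": 3, "v": 2},
    "ps.1.4": {"c": 2, "r": 3, "t": 1, "v": 2},
}


def exceeded_read_ports(version, sources):
    """Return the register types whose read port limit is exceeded.

    sources holds the source operands of a single instruction, for
    example ["t0", "t1_bias", "v0.a"]. A register used twice counts once.
    """
    distinct = {}
    for name in sources:
        # Strip negation, source modifier and selector: "-t1_bias.a" -> "t1"
        base = name.lstrip("-").split("_")[0].split(".")[0]
        distinct.setdefault(base[0], set()).add(base)
    limits = READ_PORT_LIMITS[version]
    return sorted(kind for kind, regs in distinct.items()
                  if len(regs) > limits[kind])
```

For the failing ps.1.1 shader above, exceeded_read_ports("ps.1.1", ["t0", "t1", "t2"]) reports the texture registers, while the same operands pass under ps.1.2.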

There is no write port limit, because every instruction has only one destination register.





Instruction Modifiers
