Perspective Corrected Texture Mapping

 [From Tom Hammersley's Graphics Coding Page
 Check it out at http://www.users.globalnet.co.uk/~tomh ]

 ---------------------------------------------------------------------------

 Introduction

 Perspective texturing is an almost essential addition to any 3D engine
 nowadays. People are sick of watching affine textures slip and slide all
 over their polygons; they want something better, and perspective texturing
 solves this problem. I'll be describing Chris Hecker-style perspective
 texturing here, rather than the 3 "Magic Vectors" method.
 ---------------------------------------------------------------------------

 Affine Texturing

 Ordinary texturing is very simple to code. We specify u, v at each vertex
 of our polygon (I'll be describing triangles, however). We interpolate U
 and V along the triangle edges, and then across scanlines. With triangles,
 the delta for the scanline is constant, which speeds things up a lot. Then
 it's just a case of interpolating texture co-ords across a scanline, and
 sampling the texture map. Very simple, very fast. And very limited. If your
 triangles get too big, or the Zs at the vertices differ too much, the
 texture starts to slide. It looks horrible. We need perspective correction.
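
 As a rough illustration of the inner loop described above, here is a
 minimal C sketch of one affine scanline. The 256x256 8-bit texture, the
 16.16 fixed point format and the name affine_span are assumptions made for
 the example, not something taken from a particular engine.

```c
#include <stdint.h>

/* Hypothetical texture layout: 256x256 texels, one byte per texel. */
#define TEX_SIZE 256

/* Draw one affine-textured scanline into dest (one byte per pixel).
   u, v   : texture co-ords at the left end of the span, 16.16 fixed point
   du, dv : per-pixel deltas, constant for the whole triangle */
static void affine_span(uint8_t *dest, int count,
                        const uint8_t *texture,
                        int32_t u, int32_t v,
                        int32_t du, int32_t dv)
{
    while (count-- > 0) {
        int tu = (u >> 16) & (TEX_SIZE - 1);   /* integer part, wrapped */
        int tv = (v >> 16) & (TEX_SIZE - 1);
        *dest++ = texture[tv * TEX_SIZE + tu];
        u += du;                               /* plain linear stepping */
        v += dv;
    }
}
```

 Nothing here knows about Z at all, which is exactly why the texture slides
 when the Zs at the vertices differ a lot.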

 ---------------------------------------------------------------------------

 Perspective Texturing

 Probably the most common form of perspective texturing is done via a divide
 by Z. It's a very simple algorithm. Instead of interpolating U and V, we
 interpolate U/Z and V/Z. 1/Z is also interpolated. At each pixel, we take
 our interpolated U/Z and V/Z, and divide them by the interpolated 1/Z. Hang
 on, you're thinking - if we divide by Z and then divide by 1/Z, don't we
 get back to where we started - like a double reciprocal? Well, sort of. At
 the vertices we do, but U/Z, V/Z and 1/Z are each interpolated separately
 across the screen, so in between we get the perspective correct U and V,
 not the affine ones. We then take the new U and V values, index into our
 texture map, and plot the pixel. Pseudo-code might be:

 su = Screen-U = U/Z
 sv = Screen-V = V/Z
 sz = Screen-Z = 1/Z

 for x=startx to endx
         u = su / sz
         v = sv / sz
         PutPixel(x, y, texture[v][u])
         su += deltasu
         sv += deltasv
         sz += deltasz
 end

 Very simple, and very slow.
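
 To see what the divide actually buys you, here is a small C sketch that
 compares the two interpolation schemes at a point along a span. The
 function names are made up for the example, and z is assumed to be positive
 (in front of the viewer).

```c
/* u at fraction t (0..1) along a screen-space span, interpolating u
   directly: this is what the affine mapper does. */
static double affine_u(double u0, double u1, double t)
{
    return u0 + (u1 - u0) * t;
}

/* The same point, but interpolating u/z and 1/z and dividing per pixel. */
static double perspective_u(double u0, double z0,
                            double u1, double z1, double t)
{
    double su = u0 / z0 + (u1 / z1 - u0 / z0) * t;     /* lerp of u/z */
    double sz = 1.0 / z0 + (1.0 / z1 - 1.0 / z0) * t;  /* lerp of 1/z */
    return su / sz;                                    /* recover u   */
}
```

 Take a span running from (u=0, z=1) to (u=1, z=3). Halfway across the
 screen, affine_u(0, 1, 0.5) gives 0.5, but perspective_u(0, 1, 1, 3, 0.5)
 gives 0.25: the near end of the surface takes up more of the screen, so we
 have only covered a quarter of the texture. When z0 equals z1 the two
 functions agree, which is why affine mapping looks fine on polygons that
 face the viewer.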

 ---------------------------------------------------------------------------

 Speeding Up The Routine

 The first thing that comes to mind when speeding up this routine is the two
 divides - divides are a slow operation, and should be avoided. So, we'll
 turn those 2 divides into one reciprocal and two multiplies:

 for x=startx to endx
         recip = 1.0 / sz
         u = su * recip
         v = sv * recip
         PutPixel(x, y, texture[v][u])
         su += deltasu
         sv += deltasv
         sz += deltasz
 end

 This helps things a little. The second big way of speeding it up is to lerp
 (linearly interpolate) between sets of 'correct' u, v. We calculate correct
 u, v every n pixels, and interpolate between them. This cuts down on the
 divides overall, but it can lead to problems: if your correction value (n)
 is too high for your resolution, the sample rate is too low. The texture
 will 'wiggle', and you'll see all sorts of weird bendy patterns at certain
 viewing angles. It takes a little time to find the best correction level
 for a given resolution. Pseudo-code for this would look something like:

 zinv = 1.0 / sz;        // do the divide here
 width++;
 oddedge = width & cormask; // test for case of raggy-edge

 zinv *= 65536.0;
 u = su * zinv;
 v = sv * zinv;          // reciprocal then multiply
 RoundToInt(&logu1, u);
 RoundToInt(&logv1, v);
 sv += cordvdelta;       // cordvdelta etc are deltasv*correction
 su += cordudelta;
 sz += cordzdelta;

 zinv = (1.0 / sz) * 65536.0; // muls by 65536 are used to do
                              // u << 16 a little better
                              // one fmul = 3 clocks
                              // 2 shls = 2*2 clocks

 while(width > 0) {
         if(width >= correction)
                 pixels = correction;
         else
                 pixels = oddedge;       // we have a raggy edge

         width -= correction;    // even if edge is raggy loop will
                                 // still terminate due to this

         u = su * zinv;
         v = sv * zinv;
         RoundToInt(&logu2, u);
         RoundToInt(&logv2, v);

         luadd = (logu2 - logu1) >> corshift; // deltas for linear
         lvadd = (logv2 - logv1) >> corshift; // pass

         logu = logu1;   // 'logical' u and v
         logv = logv1;

         sv += cordvdelta;
         su += cordudelta;
         sz += cordzdelta;

         zinv = 1.0 / sz;        // again, do divide in parallel
         while(pixels--) {
                 index = ((logv >> 8) & 0xFF00) +
                         ((logu >> 16) & 0xFF);
                 PutPixel(x++, y, texture[index]); // step along scanline
                 logu += luadd;
                 logv += lvadd;
         }

         zinv *= 65536.0;
         logu1 = logu2;
         logv1 = logv2;
 }

 This is based on the loop I use. I use the idea of doing floating point
 operations in parallel a lot here, because it means we can effectively get
 them for free. However, it is often quite hard to persuade the compiler
 that this is what you want to do; it'll take a little experimentation. Note
 also that this loop doesn't have a separate if () {} statement to cover the
 case of a 'raggy-edge', like most perspective texturers do. I see that in a
 lot of code, but this way is smaller, and easier to maintain and optimize.
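
 If the fixed point and the overlapped divide make the loop above hard to
 follow, the bare idea can be sketched in plain floating point C. This is a
 simplified stand-in, not the loop itself: the name subdivided_u and its
 parameters are invented for the example, only u is handled, and it writes
 u values to an array instead of plotting pixels.

```c
/* Fill out[0..width-1] with u for one scanline, doing a true perspective
   divide only every `step` pixels and lerping in between.
   su, sz   : u/z and 1/z at the left edge of the scanline
   dsu, dsz : their per-pixel deltas
   Like the loop above, the last (raggy) span is handled by the same code
   path: it uses the delta for a full span but plots fewer pixels, so it
   evaluates su/sz slightly past the end of the scanline. */
static void subdivided_u(double *out, int width, int step,
                         double su, double sz, double dsu, double dsz)
{
    double u_left = su / sz;          /* correct u at start of first span */
    int x = 0;

    while (x < width) {
        int pixels = (width - x < step) ? (width - x) : step;

        su += dsu * step;             /* move one whole span along */
        sz += dsz * step;
        double u_right = su / sz;     /* correct u at end of this span */

        double du = (u_right - u_left) / step;  /* delta for linear pass */
        double u = u_left;
        for (int i = 0; i < pixels; i++) {
            out[x++] = u;
            u += du;
        }
        u_left = u_right;
    }
}
```

 With step equal to the scanline width this degenerates to affine mapping
 across the scanline, and with step equal to 1 it is the divide-per-pixel
 loop, so the correction value is really a dial between the two.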

 ---------------------------------------------------------------------------

 Other Considerations

 A lot of people have a religious hatred of floating point calculations,
 because they think they are slow. Well, they may have been slow in times
 past, but now CPUs can do them very quickly indeed; in my tests they were
 up to 29% faster than integer on a 486DX, and 40% faster on a 586 (Intel).
 I found these figures by doing 1,000,000 matrix muls, 1,000,000 *
 (Add/Sub/Div/Mul). Note, however, that conversion to integer is still slow.

 I know one person in particular was adamant that FPU was slower in his
 tests. What was his test? Something like:

 for(x=0;x<65536; x++)
         array[x] = 1.0 * x;

 What a stupid test! For a number of reasons:

   1. It's not representative of the kind of work you'd do in a 3D engine.
   2. Most compilers would optimize away a multiplication by 1.0.
   3. Conversion to integer can be slow; especially in Watcom C 10.6 for
      DOS. Did you know that to convert to integer, it calls a function
      __CHP, which contains the following code:

      __CHP:          push    eax
                      fstcw   dword ptr [esp]
                      wait
                      push    dword ptr [esp]
                      mov     byte ptr +1H[esp],1fH
                      fldcw   dword ptr [esp]
                      frndint
                      fldcw   dword ptr +4H[esp]
                      wait
                      lea     esp,+8H[esp]
                      ret

      It's amazing what can be done with WDISASM, and WLIB, and a little
      lateral thought...

 FPU operations can also be done in parallel with the integer unit on Intel
 chips. I don't think this can be done on Cyrix, but that's not worth
 worrying about. However much you might hate Intel's monopoly, Cyrix have no
 real chance of ever breaking it. So you may as well optimize with an Intel
 chip in mind.

 The 1/z values can also be used for Z-buffering, which is very handy. You
 can then have perspective correct texturing, and perspective correct
 Z-buffering, at little speed cost. See my page on Z-Buffering for more
 information on that.
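
 As a hedged sketch of how the 1/z values might be reused, here is a depth
 test written against a buffer of 1/z rather than z. The function name and
 the float buffer are assumptions for the example, and the Z-Buffering page
 may well do it differently.

```c
#include <stddef.h>

/* Depth test on interpolated 1/z, assuming z > 0 in front of the viewer.
   A larger 1/z means nearer, so the comparison is the reverse of a plain
   z-buffer, and the buffer is cleared to 0.0 (infinitely far) each frame.
   Returns 1 if the pixel should be drawn, 0 if it is hidden. */
static int depth_test_inv_z(float *inv_z_buffer, size_t index, float inv_z)
{
    if (inv_z > inv_z_buffer[index]) {
        inv_z_buffer[index] = inv_z;   /* remember the new nearest surface */
        return 1;
    }
    return 0;
}
```

 Since sz is already being stepped along the scanline for the texturing,
 the only extra per-pixel cost is the compare and the store.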

 I also toyed with the idea of pre-perspective correcting textures. I heard
 that in Quake, textures are lit in a separate pass to the texturing, due to
 the lack of registers on the 586 (can Intel count higher than the number of
 fingers they possess?). I wonder if it would be possible to do a similar
 thing with perspective texturing? Theoretically, it shouldn't work, because
 I think that the routine is dependent on the shape of the polygon being
 mapped to. If anyone has any thoughts on this, I'd be very interested.

 Another possible speed up would be to use an affine texturer where the
 change in z is very little, and a perspective texturer where the change is
 large. Hmm.. what would this look like in code?

 average-z = (z1 + z2 + z3) / 3

 zdiff = 0
 for n=1 to 3
         zdiff += (z(n) - average-z)**2
 end

 if zdiff < z-threshold**2
         Affinetexture(polygon)
 else
         PerspectiveTexture(polygon)
 end

 Sound about right? Maybe I'll try this one day. The idea here is to find
 the difference between the average Z and the Z of each vertex. The distance
 is not square-rooted, just kept as a square, then compared against the
 squared threshold. If too much Z change is present, then perspective is
 used. This however may fail with large triangles.
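
 The pseudo-code above translates fairly directly into C. This is only a
 sketch of the untested idea: the name needs_perspective is made up, and
 the threshold is something you would have to tune by eye.

```c
/* Returns 1 if the triangle should go to the perspective texturer,
   0 if the affine one is likely to be good enough.
   z1, z2, z3  : view-space Z at the three vertices
   z_threshold : tolerated Z spread; compared as a square, so no sqrt */
static int needs_perspective(double z1, double z2, double z3,
                             double z_threshold)
{
    double average_z = (z1 + z2 + z3) / 3.0;
    double d1 = z1 - average_z;
    double d2 = z2 - average_z;
    double d3 = z3 - average_z;
    double zdiff = d1 * d1 + d2 * d2 + d3 * d3;  /* sum of squared diffs */

    return zdiff >= z_threshold * z_threshold;
}
```

 For example, needs_perspective(10, 10, 10, 1) returns 0, because the
 triangle is flat on to the viewer, while needs_perspective(1, 10, 20, 1)
 returns 1. The test only looks at Z, not at how big the triangle is on
 screen, which fits the warning about large triangles.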

 Tom Hammersley, tomh@globalnet.co.uk

Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)