Fast Convex Polygons

 Fast Convex Polygons

 Journal:   Dr. Dobb's Journal  March 1991 v16 n3 p129(6)
 -----------------------------------------------------------------------------
 Title:     Fast convex polygons. (Graphics programming) (tutorial)
 Author:    Abrash, Michael.
 AttFile:    Program:  GP-MAR91.ASC  Source code listing.

 Summary:   Assembly language programming can improve graphics performance and
            help programmers understand the function of their programs.  Fast
            convex polygon filling can be performed by optimizing the existing
            polygon filling implementation.  The drawing portion of the
            program is easily accelerated by using the 'memset' library
            function, which uses the REP STOS instruction.  The code
            replacement yields a speed an order of magnitude faster than the
            old implementation.  The polygon edge tracing procedures are slow
            because they use floating-point calculations, but integer
            calculations are both faster and more accurate.  Modification of
            the drawing and edge tracing portions of the program increase
            performance almost 20 times.  Assembly language can be used in
            place of C to improve speed even further.
 -----------------------------------------------------------------------------
 Descriptors..
 Topic:     Graphics Software
            Programming Instruction
            Painting
            Tutorial
            Type-In Programs
            Solids Modeling.
 Feature:   illustration
            table
            program.
 Caption:   Execution times of various versions of the convex polygon drawing
            code. (table)
            Drawing polygons. (program)

 -----------------------------------------------------------------------------
 Full Text:

 Fast Convex Polygons

 Last month, we explored the surprisingly intricate process of filling convex
 polygons.  This month we're going to fill them an order of magnitude or so
 faster.

 Two thoughts may occur to some of you at this point: "Oh, no, he's not going
 to get into assembly language and device dependent code, is he?" and, "Why
 bother with polygon filling--or, indeed, any drawing primitives--anyway?
 Isn't that what GUIs and third-party libraries are for?"

 To which I answer, "Well, yes, I am," and, "If you have to ask, you've missed
 the magic of microcomputer programming."  Actually, both questions ask the
 same thing, and that is: "Why should I, as a programmer, have any idea how my
 program actually works?"

 Put that way, it sounds a little different, doesn't it?

 GUIs, reusable code, portable code written entirely in high-level languages,
 and object-oriented programming are all the rage now, and promise to remain
 so for the foreseeable future.  The thrust of this technology is to enhance
 the software development process by offloading as much responsibility as
 possible to other programmers, and by writing all remaining code in modular,
 generic form.  This modular code then becomes a black box to be reused
 endlessly without another thought about what actually lies inside.  GUIs also
 reduce development time by making many interface choices for you.  That, in
 turn, makes it possible to create quickly and reliably programs that will be
 easy for new users to pick up, so software becomes easier to both produce and
 learn.  This is, without question, a Good Thing.

 The "black box" approach does not, however, necessarily cause the software
 itself to become faster, smaller, or more innovative; quite the opposite, I
 suspect.  I'll reserve judgement on whether that is a good thing or not, but
 I'll make a prediction: In the short run, the aforementioned techniques will
 lead to noticeably larger, slower programs, as programmers understand less
 and less of what the key parts of their programs do and rely increasingly on
 general-purpose code written by other people.  (In the long run, programs
 will be bigger and slower yet, but computers will be so fast and will have so
 much memory that no one will care.)  Over time, PC programs will also come to
 be more similar to one another--and to programs running on other platforms,
 such as the Mac--as regards both user interface and performance.

 Again, I am not saying that this is bad.  It does, however, have major
 implications for the future nature of PC graphics programming, in ways that
 will directly affect the means by which many of you earn your livings.  Not
 so very long from now, graphics programming--all programming, for that
 matter--will become mostly a matter of assembling in various ways components
 written by other people, and will cease to be the all-inclusively creative,
 mind-bendingly complex pursuit it is today.  (Using legally certified black
 boxes is, by the way, one direction in which the patent lawyers are leading
 us; legal considerations may be the final nail in the coffin of homegrown
 code.)  For now, a PC programmer, to understand and even control every single
 thing that happens on a computer if you so desire, to realize any vision you
 may have.  Take advantage of this unique window of opportunity to create some
 magic!

 Neither does it hurt to understand what's involved in drawing, say, a filled
 polygon, even if you are using a GUI.  You will better understand the
 performance implications of the available GUI functions, and you will be able
 to fill in any gaps in the functions provided.  You may even find that you
 can out-perform the GUI on occasion by doing your own drawing into a system
 memory bitmap, then copying the result to the screen.  You will also be able
 to understand why various quirks exist, and will be able to put them to good
 use.  For example, X Window follows the polygon drawing rules described last
 month (although it's not obvious from the documentation); if you understood
 last month's discussion, you're in good shape to use polygons under X.

 In short, even though it runs counter to current trends, it helps to
 understand how things work, especially when they're very visible parts of the
 software you develop.  That said, let's learn more about filling convex
 polygons.

 Fast Convex Polygon Filling

 When last we left the topic of filling convex polygons, the implementation we
 had met all of our functional requirements.  In particular, it met stringent
 rules that guaranteed that polygons would never overlap at shared edges, an
 important consideration when building polygon-based images.  Unfortunately,
 the implementation was also slow as molasses.  This month we'll work up
 polygon filling code that's fast enough to be truly usable.

 Last month's polygon filling code involved three major tasks, each performed
 by a separate function: Tracing each polygon edge to generate a coordinate
 list (performed by the function ScanEdge); drawing the scanned-out horizontal
 lines that constitute the filled polygon (DrawHorizontalLineList); and
 characterizing the polygon and coordinating the tracing and drawing
 (FillConvexPolygon).  The amount of time that the sample program from last
 month spent in each of these areas is shown in Table 1.  As you can see, half
 the time was spent drawing and the other half was spent tracing the polygon
 edges (the time spent in FillConvexPolygon was relatively minuscule), so we
 have our choice of where to begin optimizing.

 Fast Drawing

 Let's start with drawing, which is easily sped up.  The original code used a
 double-nested loop that called a draw-pixel function to plot each pixel in
 the polygon individually.  That's a ridiculous approach in a graphics mode
 that offers linearly mapped memory, as does VGA mode 13h, the mode with which
 we're working.  At the very least, we could point a far pointer to the left
 edge of each polygon scan line, then draw each pixel in that scan line in
 quick succession, using something along the lines of *ScrPtr++ = FillColor;
 inside a loop.

 However, it seems silly to use a loop when the 80x86 has an instruction, REP
 STOS, that's uniquely suited to filling linear memory buffers.  There's no
 way to use REP STOS directly in C code, but it's a good bet that the memset
 library function uses REP STOS, so you could greatly enhance performance by
 using memset to draw each scan line of the polygon in a single shot.  That,
 however, is easier said than done.  The memset function linked in from the
 library is tied to the memory model in use; in small (tiny, small, or medium)
 data models memset accepts only near pointers, so it can't be used to access
 screen memory.  Consequently, a large (compact, large, or huge) data model
 must be used to allow memset to draw to display memory--a clear case of the
 tail wagging the dog.  This is an excellent example of why, although it is
 possible to use C to do virtually anything, it's sometimes much simpler just
 to use a little assembly code and be done with it.

 At any rate, Listing One (page 146) shows a version of DrawHorizontalLineList
 that uses memset to draw each scan line of the polygon in a single call.
 When linked to last month's test program, Listing One increases pure drawing
 speed (disregarding edge tracing and other nondrawing time) by more than an
 order of magnitude over last month's draw-pixel-based code.  This despite the
 fact that Listing One requires a large (in this case, compact) data model.
 Listing One works fine with Turbo C++, but may not work with other compilers,
 for it relies on the aforementioned interaction between memset and the
 selected memory model.

 At this point, I'd like to mention that benchmarks are notoriously
 unreliable; the results in Table 1 are accurate only for the test program,
 and then only when running on a particular system.  Results could be vastly
 different if smaller, larger, or more complex polygons were drawn, or if a
 faster or slower computer/VGA combination were used.  These factors
 notwithstanding, the test program does fill a variety of polygons of varying
 complexity sized from large to small and in between, and certainly the order
 of magnitude difference between Listing One and the old version of
 DrawHorizontalLineList is a clear indication of which code is superior.

 Anyway, Listing One has the desired effect of vastly improving drawing time.
 There are cycles yet to be had in the drawing code, but as tracing polygon
 edges now takes 92 percent of the polygon filling time, it's logical to
 optimize the tracing code next.

 Fast Edge Tracing

 There's no secret as to why last month's ScanEdge was so slow: It used
 floating point calculations.  One secret of fast graphics is using integer or
 fixed-point calculations, instead.  (Sure, the floating point code would run
 faster if a math coprocessor were installed, but it would still be slower
 than the alternatives; besides, why require a math coprocessor when you don't
 have to?)  Both integer and fixed-point calculations are fast.  In many
 cases, fixed-point is faster, but integer calculations have one tremendous
 virtue: They're completely accurate.  The tiny imprecision inherent in either
 fixed- or floating-point calculations can result in occasional pixels being
 one off from their proper location.  This is no great tragedy, but after
 going to so much trouble to ensure that polygons don't overlap at common
 edges, why not get it exactly right?

 In fact, when I tested out the integer edge tracing code by comparing an
 integer-based test image to one produced by floating point calculations, two
 pixels out of the whole screen differed, leading me to suspect a bug in the
 integer code.  It turned out, however, that in those two cases, the floating
 point results were sufficiently imprecise to creep from just under an integer
 value to just over it, so that the ceil function returned a coordinate that
 was one too large.  Floating point is very accurate--but it is not precise.
 Integer calculations, properly performed, are.

 Listing Two (page 146) shows a C implementation of integer edge tracing.
 Vertical and diagonal lines, which are trivial to trace, are special-cased.
 Other lines are broken into two categories: Y-major (closer to vertical) and
 X-major (closer to horizontal).  The handlers for the Y-major and X-major
 cases operate on the principle of similar triangles: The number of X pixels
 advanced per scan line is the same as the ratio of the X delta of the edge to
 the Y delta.  Listing Two is more complex than the original floating point
 implementation, but not painfully so.  In return for that complexity, Listing
 Two is more than 80 times faster--and, as just mentioned, it's actually more
 accurate than the floating point code.

 You gotta love that integer arithmetic.

 The Finishing Touch: Assembly Language

 The C implementation is now nearly 20 times as fast as the original, good
 enough for most purposes.  Still, it requires that a large data model be used
 (for memset), and it's certainly not the fastest possible code.  The obvious
 next step is assembly language.

 Listing Three (page 146) is an assembly language version of
 DrawHorizontalLineList.  In actual use, it proved to be about 36 percent
 faster than Listing One; better than a poke in the eye, but just barely.
 There's more to these timing results than meets that eye, though.  Display
 memory generally responds much more slowly than system memory, especially in
 386 and 486 systems.  That means that much of the time taken by Listing Three
 is actually spent waiting for display memory accesses to complete, with the
 processor forced to idle by wait states.  If, instead, Listing Three drew to
 a local buffer in system memory or to a particularly fast VGA, the assembly
 implementation might well display a far more substantial advantage over the C
 code.

 And indeed it does.  When the test program is modified to draw to a local
 buffer, both the C and assembly language versions get 0.29 seconds faster,
 that being a measure of the time taken by display memory wait states.  With
 those wait states factored out, the assembly language version of
 DrawHorizontalLineList becomes almost three times as fast as the C code.

 There is a lesson here.  An optimization has no fixed payoff; its value
 fluctuates according to the context in which it is used.  There's relatively
 little benefit to further optimizing code that already spends half its time
 waiting for display memory; no matter how good your optimizations, you'll get
 only a two-times speedup at best, and generally much less than that.  There
 is, on the other hand, potential for tremendous improvement when drawing to
 system memory, so if that's where most of your drawing will occur,
 optimizations such as Listing Three are well worth the effort.

 Know the environments in which your code will run, and know where the cycles
 go in those environments.

 Maximizing REP STOS

 Listing Three doesn't take the easy way out and use REP STOSB to fill each
 scan line; instead, it uses REP STOSW to fill as many pixel pairs as possible
 via word-sized accesses, using STOSB only to do odd bytes.  Word accesses to
 odd addresses are always split by the processor into 2-byte accesses.  Such
 word accesses take twice as long as word accesses to even addresses, so
 Listing Three makes sure that all word accesses occur at even addresses, by
 performing a leading STOSB first if necessary.

 Listing Three is another case in which it's worth knowing the environment in
 which your code will run.  Extra code is required to perform aligned
 word-at-a-time filling, resulting in extra overhead.  For very small or
 narrow polygons, that overhead might overwhelm the advantage of drawing a
 word at a time, making plain old REP STOSB faster.

 Faster Edge Tracing

 Finally, Listing Four (page 147) is an assembly language version of ScanEdge.
 Listing Four is a relatively straightforward translation from C to assembly,
 but is nonetheless almost twice as fast as Listing Two.

 The version of ScanEdge in Listing Four could certainly be sped up still
 further by unrolling the loops.  FillConvexPolygon, the overall coordination
 routine, hasn't even been converted to assembly language, so that could be
 sped up as well.  I haven't bothered with these optimizations because all
 code other than DrawHorizontalLineList takes only 14 percent of the overall
 polygon filling time when drawing to display memory; the potential return on
 optimizing nondrawing code simply isn't great enough to justify the effort.
 Part of the value of a profiler is being able to tell when to stop
 optimizing; with Listings Three and Four in use, more than two-thirds of the
 time taken to draw polygons is spent waiting for display memory, so
 optimization is pretty much maxed out.  However, further optimization might
 be worthwhile when drawing to system memory, where wait states are out of the
 picture and the nondrawing code takes a significant portion (46 percent) of
 the overall time.

 Again, know where the cycles go.

 By the way, note that all the versions of ScanEdge and FillConvexPolygon that
 we've looked at are adapter independent, and that the C code is also machine
 independent; all adapterspecific code is isolated in DrawHorizontalLineList.
 This makes it easy to add support for other graphics formats, such as the
 8514/A, the XGA, or, for that matter, a non-PC system.

 Books of the Month

 Two books share honors this month.  One isn't new, but is well worth having
 if you're a PC graphics programmer.  Programmer's Guide to PC & PS/2 Video
 Systems, by Richard Wilton (Microsoft Press, 1987) is pretty much the
 standard graphics reference for the PC, covering all standards up through the
 VGA, including MCGA, EGA, Hercules, CGA, and MDA (but not 8514/A).  For
 8514/A programming, Graphics Programming for the 8514/A, by Jake Richter and
 Bud Smith (M&T Books, 1990), is good; which is lucky, because it's the only
 8514/A book I know of!

 [BACK] Back
Discuss this article in the forums
Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)
See Also:
Michael Abrash's Articles