Raw Speed and more (X-sharp 3-d prog.)

 Journal:   Dr. Dobb's Journal  April 1992 v17 n4 p139(7)
 Title:     Raw speed and more. (Graphics Programming column; X-Sharp 3-D
            animation program) (Tutorial)
 Author:    Abrash, Michael.
 AttFile:    Program:  GP-APR92.ASC  Source code listing.

 Abstract:  X-Sharp three-dimensional (3-D) software has assembly language
            added for performance improvement, and for matrix math functions.
            Several functions can be eliminated by converting them to pure
            assembly language, such as the function to multiply a matrix by a
            vector, and the function to concatenate matrices.  Depth sorting
            is useful for handling hidden surfaces, however it does not
            address the issue of nonconvex objects.  Convex polyhedrons are
            addressed.  Fast microcomputer-based animation is not seriously
            affected by problems with depth sorting, problems that can affect
            photo-realistic rendering to a greater extent.  Features that are
            still to be added to X-Sharp include shading, antialiasing,
            support for non-convex objects, 3-D clipping, texture mapping and
            performance improvement.
 Topic:     Assembly Language
            Programming Instruction
            Graphics Software
            Performance Improvement
            Software Design
 Feature:   illustration

 Full Text:

 As usual, there's an awful lot to cover this month, so I'll have to make this
 quick.  That's a shame, because this absolutely true story is even lovelier
 in the long version, as told by--well, I promised I wouldn't even hint at who
 he is, for reasons that will soon be obvious.

 Years ago, this friend of mine--let's call him Bert--went to Hawaii with
 three other fellows to celebrate their graduation from high school.  This was
 an unchaperoned trip, and they behaved pretty much as responsibly as you'd
 expect four teenages to behave, which is to say, not; there's a story about a
 rental car that, to this day, Bert can't bring himself to tell.  They had a
 good time, though, save for one thing: no girls.

 By and by, they met a group of girls by the pool, but the boys couldn't get
 past the hi-howya-doin stage, so they retired to their hotel room to plot a
 better approach.  This being the early '70s, and them being slightly tipsy
 teenagers with raging hormones and the effective combined IQ of four
 eggplants, it took them no time at all to come up with a brilliant plan:
 streaking.  The girls had mentioned their room number, so the boys piled into
 the elevator, pushed the button for the girls' floor, shucked their clothes
 as fast as they could, and sprinted to the girls' door.  They knocked on the
 door and ran on down the hall.  As the girls opened their door, Bert and his
 crew raced past, toward the elevator, laughing hysterically.

 Bert was by far the fastest of them all.  He whisked between the elevator
 doors just as they started to close; by the time his friends got there, it
 was too late, and the doors slid shut in their faces.  As the elevator began
 to move, Bert could hear the frantic pounding of six fists thudding on the
 closed doors.  As Bert stood among the clothes littering the elevator floor,
 the thought of his friends stuck in the hall, naked as jaybirds, was just too
 much, and he doubled over with helpless laughter, tears streaming down his
 face.  The universe had blessed him with one of those exceedingly rare
 moments of perfect timing and execution.

 The universe wasn't done with Bert quite yet, though.  He was still contorted
 with laughter--and still quite thoroughly undressed--when the elevator doors
 opened again.  On the lobby.

 And with that, we come to this month's topics, raw speed and hidden surfaces.

 Raw Speed, Part I: Assembly Language

 I would like to state, here and for the record, that I am not an assembly
 language fanatic.  Frankly, I prefer programming in C; assembly language is
 hard work, and I can get a whole lot more done with fewer hassles in C.
 However, I am a performance fanatic, performance being defined as having
 programs be as nimble as possible in those areas where the user wants fast
 response.  And, in the course of pursuing performance, there are times when a
 little assembly language goes a long way.

 We're now in the fourth month of working on the X-Sharp #-D animation
 package.  In real-time animation, performance is sine qua non--Latin for
 "Make it fast or find another line of work"--so some judiciously applied
 assembly language is in order.  Last month, we got up to a serviceable
 performance level by switching to fixed-point math, then implementing the
 fixed-point multiplication and division functions in assembler in order to
 take advantage of the 386's 32-bit capabilities.  There's another area of the
 program that fairly cries out for assembly language: matrix math.  The
 function to multiply a matrix by a vector (XformVec()) and the function to
 concatenate matrices (ConcatXforms()) both loop heavily around calls to
 FixedMul(); a lot of calling and looping can be eliminated by converting
 these functions to pure assembly language.

 Listing One, page 157, is the module FIXED.ASM from the current version of
 X-Sharp, with XformVec() and ConcatXforms() implemented in assembly language.
 The code is heavily optimized, to the extent of completely unrolling the
 loops via macros so that looping is eliminated althogether.  FIXED.ASM is
 highly effective; the time taken for matrix math is now down to the point
 where it's a fairly minor component of execution time, representing less than
 ten percent of the total.  It's time to turn our optimization sights

 Raw Speed, Part II: Look it Up

 It's a funny thing about Turbo Profiler: Time spent in the Borland C++ 80x87
 emulator doesn't show up directly anywhere that I can see in the timing
 results.  The only way to detect it is by way of the line that reports what
 percent of total time is represented by all the areas that were profiled; if
 you're profiling all areas, whatever's not explicitly accounted for seems to
 be the floating-point emulator time.  This quirk fooled me for a while,
 leading me to think sine and and cosine weren't major drags on performance,
 because the sin() and cos() functions spend most of their time in the
 emulator, and that time doesn't show up in Turbo Profiler's statistics on
 those functions.  Once I figured out what was going on, it turned out that
 not only were sin() and cos() major drags, they were taking up over half the
 total execution time by themselves.

 The solution is a lookup table.  Listing One contains a function called
 CosSin() that calculates both the sine and cosine of an angle, via a lookup
 table.  The function accepts agnles in tenths of degrees; I decided to use
 tenths of degrees rather than radians because that way it's always possible
 to look up the sine and cosine of the exact angle requested, rather than
 approximating, as would be required with radians.  Tenths of degrees should
 be fine enough control for most purposes; if not, it's easy to alter CosSin()
 for finer gradations yet.  GENCOS.C, the program used to generate the lookup
 table (COSTABLE.INC), included in Listing One, can be found in the X-Sharp
 archive.  GENCOS.C can generate a cosine table with any integral number of
 steps per degree.

 FIXED.ASM speeds X-Sharp up quite a bit, and it changes the performance
 balance a great deal.  When we started out with 3-D animation, calculation
 time was the dragon we faced; more than 90 percent of the total time was
 spent doing matrix and projection math.  Additional optimizations in the area
 of math could still be made (using 32-bit multiplies in the backface-removal
 code, for example), but fixed-point math, the sine and cosine lookup, and
 selective assembler optimizations have done a pretty good job already.  The
 bulk of the time taken by X-Sharp is now spent drawing polygons, drawing
 rectangles (to erase objects), and waiting for the page to flip.  In other
 words, we've slain the dragon of 3-D math, or at least wounded it grievously;
 now we're back to the dragon of polygon filling.  We'll address faster
 polygon filling soon, but for the moment, we have more than enough horsepower
 to have some fun with.  First, though, we need one more feature: hidden

 Hidden Surfaces

 So far, we've made a number of simplifying assumptions in order to get the
 animation to look good; for example, all objects must currently be convex
 polyhedrons.  We'll deal with that one down the road a little, but first we
 have to address a still more fundamental limitation, which is that right now,
 objects can never pass behind or in front of each other.  In short, it's time
 to have a look at hidden surfaces.

 There are a passel of ways to do hidden surfaces.  Way off at one end (the
 slow end) of the spectrum is z-buffering, whereby each pixel of each polygon
 is checked as it's drawn to see whether it's the frontmost version of the
 pixel at those coordinates.  At the other end is the technique of simply
 drawing the objects in back-to-front order, so that nearer objects are drawn
 on top of farther objects.  The latter approach, depth sorting, is the one
 we'll take today.  (Actually, true depth sorting involves detecting and
 resolving possible ambiguities when objects overlap in z; simply sorting on
 z, which we'll be doing this month, is more precisely known as the "painter's

 Depth sorting is fast but less than perfect.  For one thing, it doesn't
 address the issue of nonconvex objects, so we'll have to stick with convex
 polyhedrons for now.  For another, there's the question of what part of each
 object to use as the sorting key; the nearest point, the center, and the
 farthest point are all possibilities--and, whichever point is used, depth
 sorting doesn't handle some overlap cases properly.  Figure 1 illustrates one
 case in which back-to-front sorting doesn't work, regardless of what point is
 used as the sorting key.

 For photo-realistic rendering, these are serious problems.

 For fast PC-based animation, however, they're manageable.  Choose objects
 that aren't too elongated; arrange their paths of travel so they don't
 intersect in problematic ways; and, if they do overlap incorrectly, trust
 that the glitch will be lost in the speed of the animation and the complexity
 of the screen.

 Listing Two, page 159, shows the key routines for depth sorting, from the
 X-Sharp file OLIST.C.  Objects are now stored in a linked list.  The initial,
 empty list, created by InitializeObjectList(), consists of a sentinel entry
 at either end, one at the farthest possible z coordinate, and one at the
 nearest.  New entries are inserted by AddObject() in z-sorted order.  Each
 time the objects are moved, before they're drawn at their new locations,
 SortObjects() is called to z-sort the object list, so that drawing wil
 proceed from back to front.  The z-sorting is done on the basis of the
 objects' center points; a center-point filed has been added to the object
 structure to support this, and the center point for each object is now
 transformed along with the vertices.  That's really all there is to depth
 sorting--and now we can have objects that overlap in x and y.

 I wish I could reproduce the lastes X-Sharp demonstration program here; I
 really do.  It's a lot of fun, with 11 spinning cubes and a whirling faceted
 ball bouncing back and forth between the screen and deep space at varying
 speeds.  The number of faces has nearly doubled since last month, to 138, and
 the number of vertices is up to 150, but thanks to FIXED.ASM, the animation
 is snappier than ever, and of course there's a much-enhanced sense of depth,
 thanks to the visual cues made possible by depth sorting.  We've reached the
 point of genuinely dynamic animation on a 386, even with a slowpoke VGA; my
 daughter thinks it's a little scary having those things hurtling toward her
 so fast.  On a 486 with a fast (Tseng ET-4000) VGA, the animation is too
 fast; the illusion of reality is broken.  We should all have such problems.

 I'd like to list the demo program in its entirety, but I can't.  As I
 explained last month, X-Sharp is way too large, over 2000 lines now; even the
 changes since last month would blow my page budget out of the water.
 Instead, I've made the full source available in the file XSHARPn.ARC in the
 DDJ Forum on CompuServe, on M&T Online, and in the graphic.disp conference on
 Bix.  Alternatively, you can send me a 360K or 720K formatted diskette and an
 addressed, stamped diskette mailer, care of DDJ, 411 Borel Ave., San Mateo,
 CA 94402, and I'll send you the latest copy of X-Sharp.  There's no charge,
 but it'd be very much appreciated if you'd slip in a dollar or so to help out
 the folks at the Vermont Association for the Blind and Visually Impaired.
 It's not every day that you can make a difference so easily--and, believe me,
 their work does make a difference.

 I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
 name mabrash in both cases).


 FIXED.ASM containes the equate ROUNDING[underscore]ON.  When this equate is
 1, the results of multiplications and divisions are rounded to the nearest
 fixed-point values; when it's 0, the results are truncated.  The difference
 between the results produced by the two approaches is, at most, [2.sup.-16];
 you wouldn't think that would make much difference, now, would you?  But it
 does.  When the animation is run with rounding disabled, the cubes start to
 distort visibly after a few minutes, and after a few minutes more they look
 like they've been run over.  In contrast, I've never seen any significant
 distortion with rounding on, even after a half-hour or so.  I think the
 difference with rounding is not that it's so much more accurate, but rather
 that the errors are evenly distributed; with truncation, the errors are
 biased, and biased errors become very visible when they're applied to
 right-angle objects.  Even with rounding, though, errors will eventually
 creep in, and reorthogonalization will become necessary at some point.  We'll
 have a look at that soon.

 The performance cost of rounding is small, and the benefits are highly
 visible.  Still, truncation errors become significant only when they
 accumulate over time, as, for example, when rotation matrices are repeatedly
 concatenated over the course of many transformations.  Some time could be
 saved by rounding only in such cases.  For example, division is performed
 only in the course of projection, and the results do not accumulate over
 time, so it would be reasonable to disable rounding for division.

 Having a Ball

 For three months, we've had nothing to look at but triangles and cubes.  It's
 time for something a little more visually appealing, so the demonstration
 program now features a 72-sided ball.  What's particularly interesting about
 this ball is that it's created by the GENBALL.C program in the BALL
 subdirectory of X-Sharp, and both the size of the ball and the number of
 bands of faces are programmable.  GENBALL.C spits out to a file all the
 arrays of vertices and faces needed to create the ball, ready for inclusion
 in INITBALL.C.  True, if you change the number of bands, you must change the
 Color array in INITBALL.C to match, but that's a tiny detail; by and large,
 the process of generating a ballshaped object is now automated.  In fact,
 we're not limited to ball-shaped objects; substitute a different vertex and
 face generation program for GENBALL.C, and you can make whatever convex
 polyhedron you want; again, all you have to do is change the Color array
 correspondingly.  You can easily create multiple versions of the base object,
 too; INITCUBE.C is an example of this, creating 11 different cubes.

 What we have here is the first glimmer of an object-editing system.
 GENBALL.C is the prototype for object definition, and INITBALL.C is the
 prototype for general-purpose object instantiation.  Certainly, it would be
 nice to have an interactive 3-D object editing tool and resource management
 setup, and perhaps we'll do that someday.  We have our hands full with the
 drawing end of things at the moment, though, and for now it's enough to be
 able to create objects in a semiautomated way.

 Eating Crow and Other Delicacies

 Back in November, I said, "Hicolor programming unavoidably requires handling
 broken rasters"  (Hicolor mode features 32K colors, by means of the Sierra
 Hicolor DAC combined with a Hicolor-capable Super-VGA.)  Any absolute
 assertion on my part seems to be the cue for readers to come swarming out of
 the woodworks with examples to the contrary.  In this case, M&T Online user
 rfredrick was quick to point out that setting the bitmap width--th offset
 from the start of one line to the start of the next--to 1928 bytes in 640X480
 Hicolor mode eliminates broken rasters (displayed lines that span banks).  A
 bitmap width of 1928 is selected by setting the Row Offset register (CRTC
 register 13h) to 241, like so: outpw(0x3d4, 0x13 \ (241 << 8));.  Once the
 width is set, all you have to do is use 1928 for the offset from one row to
 the next, rather than 1280, and you're all set.  Actually, the breaks are
 still there, but they can be ignored because they happen off to the right of
 the displayed portion of the bitmap.  To get the Hicolor drawing code from
 the November 1991 column to support 1928-wide Hicolor bitmaps, just change
 BitmapWidthInBytes from 640*2 to 1928.

 rfrederick credits this information to the folks at Everex.  Thanks to all
 involved for passing it along.

 Coming Up

 There are a lot of wonderful things to add to X-Sharp, including shading of
 many sorts, antialiasing, support for non-convex objects, faster polygon
 drawing and clipping, 3-D clipping, texture mapping, still better
 performance--you get the idea.  Personally, I'm excited as all get-out about
 this stuff, and I'll certainly cover more of it in the near future.  Graphics
 is a broad field, though, and you folks have diverse interests; I'd like to
 hear what you're interested in seeing in this space.  More 3-D animation?
 More animation of other types?  Programming techniques for PC graphics
 hardware, such as 24-bpp VGAs, the S3 Super-VGA accelerator, and XGA?  JPEG?
 Graphics operations such as seed fills and fast (and I do mean fast!) line
 drawing?  Color mapping to a 256-color palette?  Dithering?  Something else?

 It's a big list, because this is a big--and tremendously exciting--field.  So
 long as there's code to be written for a topic (after all, this is the
 Graphics Programming column), I'm game.  Let me know what you'd like to see.

 [BACK] Back

Discuss this article in the forums

Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)

See Also:
Michael Abrash's Articles

© 1999-2011 Gamedev.net. All rights reserved. Terms of Use Privacy Policy
Comments? Questions? Feedback? Click here!