More undocumented 256-color VGA magic

 Journal:   Dr. Dobb's Journal  August 1991 v16 n8 p165(7)
 Title:     More undocumented 256-color VGA magic. (Graphics Programming)
 Author:    Abrash, Michael.
 AttFile:    Program:  GP-AUG91.ASC  Source code listing.

 Summary:   Programmers should remember that there are many subtle approaches
            to any problem and should keep the big picture in mind when
            implementing programs.  Mode X is an undocumented 320 x 240
            256-color mode of the VGA standard that supports page flipping,
            makes available off-screen memory, has square pixels, and permits
            users to increase performance by as much as four times by using
            the VGA's hardware.  There are four latches in VGA, one for each
            plane of display memory, and these latches are used to copy data
            from one part of display memory to another.  Latches are suitable
            for patterned fills and screen-to-screen copies, including
            scrolls.  Four-pixel-wide patterns are extremely useful.
 Topic:     Tutorial
            Computer Graphics
            VGA Standard.
 Feature:   illustration
 Caption:   The latches are loaded by every display memory read. (chart)
            Bytes written from the latches to corresponding planes. (chart)
            One useful way to organize display memory in Mode X. (chart)

 Full Text:

 Every so often, a programming demon that I'd thought I'd forever laid to rest
 arises to haunt me once again.  A minor example of this -- an imp, if you
 will -- is the use of "=" when I mean "==," which I've done all too often in
 the past, and am sure I'll do again.  That's minor deviltry, though, compared
 to the considerably greater evils of one of my personal scourges, of which I
 was recently reminded anew: too-close attention to detail.  Not seeing the
 forest for the trees.  Looking low when I should have looked high.  Missing
 the big picture, if you catch my drift.

 Thoreau said it best: "Our life is frittered away by detail. . . .
 Simplify, simplify."  That quote sprang to mind when I received a letter from
 Anton Treuenfels of Fridley, Minnesota, thanking me for clarifying the
 principles of filling adjacent convex polygons, as discussed in this column
 in February and March.  Anton then went on to describe his own method for
 filling convex polygons.

 Anton's approach had its virtues and drawbacks, foremost among the virtues
 being a simplicity Thoreau would have admired.  For instance, in writing my
 polygon-filling code, I had spent quite some time trying to figure out the
 best way to identify which edge was the left edge and which the right,
 finally settling on comparing the slopes of the edges if the top of the
 polygon wasn't flat, and comparing the starting points of the edges if the
 top was flat.  Anton simplified this tremendously by not bothering to figure
 out ahead of time which was the right edge of the polygon and which the left,
 instead scanning out the two edges in whatever order he found them and
 letting the low-level drawing code test, and if necessary swap, the
 end-points of each horizontal line of the fill, so that filling started at
 the leftmost edge.  This is a little slower than my approach (although the
 difference is almost surely negligible), but it also makes quite a bit of
 code go away.

 What that example, and others like it in Anton's letter, did was kick my mind
 into a mode that it hadn't -- but should have -- been in when I wrote the
 code, a mode in which I began to wonder, "How else can I simplify this
 code?"; what you might call Occam's Razor mode.  You see, I created the
 convex polygon-drawing code by first writing pseudocode, then writing C code,
 and finally writing assembly code, and once the pseudocode was finished, I
 stopped thinking about the interactions of the various portions of the
 program.  In other words, I became so absorbed in individual details that I
 forgot to consider the code as a whole.  That was a mistake, and an
 embarrassing one for someone who constantly preaches that programmers should
 look at their code from a variety of perspectives.  May my embarrassment be
 your enlightenment.

 The point is not whether, in the final analysis, my code or Anton's code is
 better; both have their advantages.  The point is that I was programming with
 half a deck because I was so fixated on the details of a single sort of
 implementation; I ended up with relatively hard-to-write, complex code, and
 missed out on many potentially useful optimizations by being so focused.
 It's a big world out there, and there are many subtle approaches to any
 problem, so relax and keep the big picture in mind as you implement your
 programs.  Your code will likely be not only better, but also simpler.  And
 whenever you see me walking across hot coals in this column when there's an
 easier way to go, please, let me know!

 Thanks, Anton.

 Mode X Continued

 Last month, I introduced you to what I call mode X, an undocumented 320 X 240
 256-color mode of the VGA.  Mode X is distinguished from mode 13h, the
 documented 320 X 200 256-color VGA mode, in that it supports page flipping,
 makes off-screen memory available, has square pixels, and, above all, lets
 you use the VGA's hardware to increase performance by as much as four times
 (at the cost of more complex and demanding programming, to be sure -- but end
 users care about results, not how hard the code was to write, and mode X
 delivers results in a big way).  Last month we saw how the VGA's
 plane-oriented hardware can be used to speed solid fills.  That's a nice
 technique, but this month we're going to move up to the big guns -- the
 latches.

 The VGA has four latches, one for each plane of display memory.  Each latch
 stores exactly one byte, and that byte is always the last byte read from the
 corresponding plane of display memory, as shown in Figure 1.  Furthermore,
 whenever a given address in display memory is read, all four planes' bytes at
 that address are read and stored in the corresponding latches, regardless of
 which plane supplied the byte returned to the CPU (as determined by the Read
 Map register).  As with so much else about the VGA, the above will make
 little sense to VGA neophytes, but the important point is this: By reading
 one display memory byte, 4 bytes -- one from each plane -- can be loaded into
 the latches at once.  Any or all of those 4 bytes can then be written
 anywhere in display memory with a single byte-sized write, as shown in Figure
 2.  The upshot is that the latches make it possible to copy data around from
 one part of display memory to another, 32 bits (four pixels) at a time --
 four times as fast as normal.  (Recall from last month that in mode X, pixels
 are stored one per byte, with four pixels in a row stored in successive
 planes at the same address, one pixel per plane.)  However, any one latch can
 only be loaded from and written to the corresponding plane, so an individual
 latch can only work with every fourth pixel on the screen; the latch for
 plane 0 can work with pixels 0, 4, 8, ..., the latch for plane 1 with
 pixels 1, 5, 9, ..., and so on.
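
 The latch behavior described above can be modeled in a few lines of C.  This
 is only a software sketch, not one of the column's listings: the VgaModel
 structure and the vga_read, vga_write_latches, put_pixel, and get_pixel names
 are hypothetical, and the write helper assumes the Bit Mask register has
 already been set to 0 so that every written bit comes from the latches.

```c
#include <stdint.h>

#define PLANES     4
#define PLANE_SIZE 65536u   /* 64K addresses per plane on a 256K VGA */

/* Hypothetical software model of the VGA; these names are assumptions
   made for illustration only. */
typedef struct {
    uint8_t plane[PLANES][PLANE_SIZE]; /* four planes of display memory  */
    uint8_t latch[PLANES];             /* one latch per plane            */
    uint8_t read_map;                  /* plane returned to the CPU, 0-3 */
    uint8_t map_mask;                  /* bit n set = plane n writable   */
} VgaModel;

/* Any read loads all four latches, whichever plane the CPU sees. */
static uint8_t vga_read(VgaModel *v, uint16_t addr)
{
    for (int p = 0; p < PLANES; p++)
        v->latch[p] = v->plane[p][addr];
    return v->latch[v->read_map & 3];
}

/* A write with the Bit Mask register at 0: the CPU byte is ignored and
   each plane enabled in the Map Mask receives its own latch. */
static void vga_write_latches(VgaModel *v, uint16_t addr)
{
    for (int p = 0; p < PLANES; p++)
        if (v->map_mask & (1u << p))
            v->plane[p][addr] = v->latch[p];
}

/* Mode X addressing: pixel x on a 320-wide line lives in plane x & 3
   at address y * 80 + x / 4. */
static void put_pixel(VgaModel *v, int x, int y, uint8_t color)
{
    v->plane[x & 3][y * 80 + (x >> 2)] = color;
}

static uint8_t get_pixel(const VgaModel *v, int x, int y)
{
    return v->plane[x & 3][y * 80 + (x >> 2)];
}
```

 In this model, one vga_read followed by one vga_write_latches moves four
 adjacent pixels, which is the source of the fourfold speedup.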

 The latches aren't intended for use in 256-color mode -- they were designed
 to allow individual bits of display memory to be modified in 16-color mode --
 but they are nonetheless very useful in mode X, particularly for patterned
 fills and screen-to-screen copies, including scrolls.  Patterned filling is a
 good place to start, because patterns are widely used in windowing
 environments for desktops, window backgrounds, and scroll bars, and for
 textures and color dithering in drawing and game software.

 Fast mode X fills with patterns that are four pixels in width can be
 performed by drawing the pattern once to the four pixels at any one address
 in display memory, reading that address to load the pattern into the latches,
 setting the Bit Mask register to 0 to specify that all bits drawn to display
 memory should come from the latches, and then performing the fill pretty much
 as we did last month, except that each line of the pattern must be loaded
 into the latches before the corresponding scan line on the screen is filled.
 Listings One and Two (page 181) together demonstrate a variety of fast mode X
 four-by-four pattern fills.  (The mode set function called by Listing One is
 from last month's column.)
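
 The sequence of steps above can be sketched against a simulated display
 memory.  This is not Listing Two: the fill_pattern, load_latches, and
 write_latches names are hypothetical, the pattern buffer is assumed to sit in
 the last four addresses of a 64K-address plane, the pattern row is assumed to
 be chosen by the scan line number modulo four, and the left and right edges
 are assumed to fall on multiples of four pixels so that no edge masking is
 needed.

```c
#include <stdint.h>

#define PLANES      4
#define PLANE_SIZE  65536u
#define LINE_ADDRS  80          /* 320 pixels / 4 pixels per address   */
#define PATTERN_BUF 0xFFFCu     /* assumed: last four addresses of VRAM */

static uint8_t plane[PLANES][PLANE_SIZE];  /* simulated display memory */
static uint8_t latch[PLANES];              /* simulated latches        */

/* Models a display memory read: all four latches are loaded. */
static void load_latches(unsigned addr)
{
    for (int p = 0; p < PLANES; p++)
        latch[p] = plane[p][addr];
}

/* Models a write with Bit Mask = 0 and Map Mask = 0Fh. */
static void write_latches(unsigned addr)
{
    for (int p = 0; p < PLANES; p++)
        plane[p][addr] = latch[p];
}

/* Fill the rectangle [x0, x1) x [y0, y1) with a 4x4 pattern, where
   pattern[row][col] holds pixel colors.  Simplifying assumption: x0 and
   x1 are multiples of four. */
static void fill_pattern(int x0, int y0, int x1, int y1,
                         const uint8_t pattern[4][4])
{
    /* Draw the pattern once off-screen, one pattern row per address. */
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            plane[col][PATTERN_BUF + (unsigned)row] = pattern[row][col];

    for (int y = y0; y < y1; y++) {
        /* Load the latches with the pattern row for this scan line. */
        load_latches(PATTERN_BUF + (unsigned)(y & 3));
        /* Each write then lays down four pattern pixels at once. */
        for (int a = x0 / 4; a < x1 / 4; a++)
            write_latches((unsigned)(y * LINE_ADDRS + a));
    }
}
```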

 Four-pixel-wide patterns are more useful than you might imagine.  There are
 actually 2^128 possible patterns (16 pixels, each with 2^8
 possible colors); that set is certainly large enough for most color-dithering
 purposes, and includes many often-used patterns, such as halftones, diagonal
 stripes, and crosshatches.

 Furthermore, eight-wide patterns, which are widely used, can be drawn with
 two passes, one for each half of the pattern; this principle can in fact be
 extended to patterns of arbitrary multiple-of-four widths.  (Widths that
 aren't multiples of four are considerably more difficult to handle, because
 the latches are four pixels wide.)

 Allocating Memory in Mode X

 Listing Two raises some interesting questions about the allocation of display
 memory in mode X.  In Listing Two, whenever a pattern is to be drawn, that
 pattern is first drawn in its entirety at the very end of display memory; the
 latches are then loaded from that copy of the pattern before each scan line
 of the actual fill is drawn.  Why this double copying process, and why is the
 pattern stored in that particular area of display memory?

 The double copying process is used because it's the easiest way to load the
 latches.  Remember, there's no way to get information directly from the CPU
 to the latches; the information must first be written to some location in
 display memory, because the latches can be loaded only from display memory.
 By writing the pattern to off-screen memory, we don't have to worry about
 interfering with whatever is currently displayed on the screen.

 As for why the pattern is stored exactly where it is, that's part of a master
 memory allocation plan that will come to fruition next month when I implement
 a mode X animation program.  Figure 3 shows this master plan; the first two
 pages of memory (each 76,800 pixels long, spanning 19,200 addresses -- that
 is, 19,200 pixel quadruplets -- in display memory) are reserved for page
 flipping, the next page of memory (also 76,800 pixels long) is reserved for
 storing the background (this is used to restore the holes left after images
 move), the last 16 pixels (four addresses) of display memory are reserved for
 the pattern buffer, and the remaining 31,728 pixels (7932 addresses) of
 display memory are free for storage of icons, images, temporary buffers, or
 whatever.  This is an efficient organization for animation, but there are
 certainly many other possible setups.  For example, you might choose to have
 a solidly-colored background, in which case you could dispense with the
 background page (instead using the solid rectangle fill routine to replace
 the background after images move), freeing up another 76,800 pixels of
 off-screen storage for images and buffers.  You could even eliminate
 page-flipping altogether if you needed to free up a great deal of display
 memory.  For example, with enough free display memory it is possible in mode
 X to create a virtual bitmap three times larger than the screen, with the
 screen becoming a scrolling window onto that larger bitmap.  This technique
 has been used to good effect in a number of games, although I don't know if
 any of those games use mode X.
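
 The arithmetic behind that layout can be written out as constants.  The macro
 names below are hypothetical, and the sketch assumes a standard VGA with 256K
 of display memory (65,536 addresses of four pixels each) and the page order
 described above.

```c
/* Offsets and sizes are in display memory addresses (four pixels each).
   All names are illustrative assumptions, not taken from the listings. */
#define DISPLAY_ADDRS   65536L                           /* 256K / 4 planes */
#define PAGE_ADDRS      (320L * 240L / 4L)               /* 19,200 */

#define PAGE0_OFFSET    0L                               /* first flip page  */
#define PAGE1_OFFSET    (PAGE0_OFFSET + PAGE_ADDRS)      /* second flip page */
#define BG_PAGE_OFFSET  (PAGE1_OFFSET + PAGE_ADDRS)      /* background page  */
#define FREE_OFFSET     (BG_PAGE_OFFSET + PAGE_ADDRS)    /* free storage     */

#define PATTERN_ADDRS   4L                               /* 16 pixels */
#define PATTERN_OFFSET  (DISPLAY_ADDRS - PATTERN_ADDRS)  /* 0FFFCh    */
#define FREE_ADDRS      (PATTERN_OFFSET - FREE_OFFSET)   /* 7,932     */

/* Convert a count of addresses to a count of pixels. */
static long addrs_to_pixels(long addrs)
{
    return addrs * 4L;
}
```

 With these definitions, addrs_to_pixels(FREE_ADDRS) works out to 31,728, which
 matches the free space quoted above.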

 Copying Pixel Blocks Within Display Memory

 Another fine use for the latches is copying pixels from one place in display
 memory to another.  Whenever both the source and the destination share the
 same nibble alignment (that is, their start addresses modulo four are the
 same), it is not only possible but quite easy to use the latches to perform
 the copy four pixels at a time.  Listing Three (page 182) shows a routine
 that copies via the latches.  (When the source and destination do not share
 the same nibble alignment, the latches cannot be used, because the source and
 destination planes for any given pixel differ; in that case, you can set the
 Read Map register to select a source plane and the Map Mask register to
 select the corresponding destination plane, then copy all pixels in that
 plane; repeat for all four planes.)
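
 The alignment test and the two copy strategies can be sketched on a simulated
 display memory.  This is not Listing Three: copy_block, copy_via_latches, and
 copy_plane_by_plane are hypothetical names, and the sketch ignores overlap
 between source and destination.

```c
#include <stdint.h>

#define PLANES     4
#define PLANE_SIZE 65536u
#define LINE_ADDRS 80                      /* 320 pixels / 4 per address */

static uint8_t plane[PLANES][PLANE_SIZE];  /* simulated display memory */
static uint8_t latch[PLANES];              /* simulated latches        */

/* Latch path: valid only when (sx & 3) == (dx & 3). */
static void copy_via_latches(int sx, int sy, int dx, int dy, int w, int h)
{
    int first = sx >> 2;                   /* first source address column */
    int last  = (sx + w - 1) >> 2;         /* last source address column  */
    int delta = (dx >> 2) - first;         /* column offset to destination */
    unsigned left  = (0x0Fu << (sx & 3)) & 0x0Fu;        /* left-edge mask  */
    unsigned right = 0x0Fu >> (3 - ((sx + w - 1) & 3));  /* right-edge mask */

    for (int row = 0; row < h; row++) {
        for (int a = first; a <= last; a++) {
            unsigned mask = 0x0Fu;         /* models the Map Mask register */
            if (a == first) mask &= left;
            if (a == last)  mask &= right;
            unsigned src = (unsigned)((sy + row) * LINE_ADDRS + a);
            unsigned dst = (unsigned)((dy + row) * LINE_ADDRS + a + delta);
            for (int p = 0; p < PLANES; p++)   /* one read loads all latches */
                latch[p] = plane[p][src];
            for (int p = 0; p < PLANES; p++)   /* one write stores enabled ones */
                if (mask & (1u << p))
                    plane[p][dst] = latch[p];
        }
    }
}

/* Fallback: for each source plane (Read Map), copy its pixels to whichever
   destination plane (Map Mask) each pixel lands in. */
static void copy_plane_by_plane(int sx, int sy, int dx, int dy, int w, int h)
{
    for (int sp = 0; sp < PLANES; sp++)
        for (int row = 0; row < h; row++)
            for (int i = (sp + 4 - (sx & 3)) & 3; i < w; i += 4) {
                int x = sx + i, tx = dx + i;
                plane[tx & 3][(dy + row) * LINE_ADDRS + (tx >> 2)] =
                    plane[sp][(sy + row) * LINE_ADDRS + (x >> 2)];
            }
}

/* Returns 1 if the latch path was used, 0 if the fallback was needed. */
static int copy_block(int sx, int sy, int dx, int dy, int w, int h)
{
    if ((sx & 3) == (dx & 3)) {
        copy_via_latches(sx, sy, dx, dy, w, h);
        return 1;
    }
    copy_plane_by_plane(sx, sy, dx, dy, w, h);
    return 0;
}
```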

 Listing Three has an important limitation: It does not guarantee proper
 handling when the source and destination overlap, as in the case of a
 downward scroll, for example.  Listing Three performs top-to-bottom,
 left-to-right copying.  Downward scrolls require bottom-to-top copying;
 likewise, rightward horizontal scrolls require right-to-left copying.  As it
 happens, my intended use for Listing Three is to copy images between
 off-screen memory and on-screen memory, and to save areas under pop-up menus
 and the like, so I don't really need overlap handling -- and I do really need
 to keep the size of this column down.  However, you will surely want to add
 overlap handling if you plan to perform arbitrary scrolling and copying in
 display memory.
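
 One way to add overlap handling is to pick the copy direction from the
 relative positions of source and destination, much as memmove does.  The
 sketch below is an assumption about how that might look, not an extension of
 Listing Three: copy_overlap_safe is a hypothetical name, addresses are in
 four-pixel units, and each array element stands for one display memory
 address across all four planes (the 32 bits a latch copy moves).

```c
#include <stdint.h>

#define LINE_ADDRS 80              /* addresses per 320-pixel scan line */

/* Each element models one address across all four planes. */
static uint32_t vram[65536];

/* Copy a block width addresses wide and height lines tall from address
   src to address dst, choosing a direction that is safe when the two
   blocks overlap. */
static void copy_overlap_safe(long src, long dst, int width, int height)
{
    if (dst > src) {
        /* Destination lies later in memory (downward or rightward move):
           copy bottom-to-top, right-to-left. */
        for (int row = height - 1; row >= 0; row--)
            for (int col = width - 1; col >= 0; col--)
                vram[dst + (long)row * LINE_ADDRS + col] =
                    vram[src + (long)row * LINE_ADDRS + col];
    } else {
        /* Destination lies earlier in memory (upward or leftward move):
           copy top-to-bottom, left-to-right. */
        for (int row = 0; row < height; row++)
            for (int col = 0; col < width; col++)
                vram[dst + (long)row * LINE_ADDRS + col] =
                    vram[src + (long)row * LINE_ADDRS + col];
    }
}
```

 Because both blocks share the same line pitch, every element moves by the
 same address delta, so walking away from the destination in this way never
 overwrites a source element before it has been read.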

 Now that we have a fast way to copy images around in display memory, we can
 draw icons and other images between two and four times faster than in mode
 13h, depending on the speed of the VGA's display memory.  (In case you're
 worried about the nibble-alignment limitation on fast copies, don't be; I'll
 address that fully next time, but the secret is to store all four possible
 rotations in off-screen memory, then select the correct one for each copy.)
 However, before our fast display memory-to-display memory copy routine can do
 us any good, we must have a way to get pixel patterns from system memory into
 display memory, so that they can be copied with the fast copy routine.

 Copying to Display Memory

 The final piece of the puzzle is the system memory-to-display memory copy
 routine shown in Listing Four (page 182).  This routine
 assumes that pixels are stored in system memory in exactly the order in which
 they will ultimately appear on the screen; that is, in the same linear order
 that mode 13h uses.  It would be more efficient to store all the pixels for
 one plane first, then all the pixels for the next plane, and so on for all
 four planes, because many OUTs could be avoided, but that would make images
 rather hard to create.  And, while it is true that the speed of drawing
 images is, in general, often a critical performance factor, the speed of
 copying images from system memory to display memory is not particularly
 critical in mode X.  Important images can be stored in off-screen memory and
 copied to the screen via the latches much faster than even the speediest
 system memory-to-display memory copy routine could manage.

 I'm not going to present a routine to perform mode X copies from display
 memory to system memory, but such a routine would be a straightforward
 inverse of Listing Four.
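
 As a rough illustration of both directions, here is a sketch that copies a
 linear, mode 13h-style image into simulated mode X display memory and back
 again.  It is not Listing Four: copy_system_to_screen and
 copy_screen_to_system are hypothetical names, and the plane index stands in
 for the Map Mask (or Read Map) setting that real hardware would need an OUT
 to change.

```c
#include <stdint.h>

#define PLANES     4
#define PLANE_SIZE 65536u
#define LINE_ADDRS 80                      /* 320 pixels / 4 per address */

static uint8_t plane[PLANES][PLANE_SIZE];  /* simulated display memory */

/* Copy a w x h image stored linearly in system memory to display memory,
   with its upper left corner at pixel (dx, dy). */
static void copy_system_to_screen(const uint8_t *src, int w, int h,
                                  int dx, int dy)
{
    for (int row = 0; row < h; row++) {
        unsigned addr = (unsigned)((dy + row) * LINE_ADDRS + (dx >> 2));
        int p = dx & 3;                    /* plane of the first pixel */
        for (int col = 0; col < w; col++) {
            /* Real hardware would reprogram the Map Mask here. */
            plane[p][addr] = *src++;
            if (++p == PLANES) {           /* wrapped past plane 3:    */
                p = 0;                     /* back to plane 0 at the   */
                addr++;                    /* next display address     */
            }
        }
    }
}

/* The inverse: read a w x h block at (sx, sy) back into a linear buffer. */
static void copy_screen_to_system(uint8_t *dst, int w, int h,
                                  int sx, int sy)
{
    for (int row = 0; row < h; row++) {
        unsigned addr = (unsigned)((sy + row) * LINE_ADDRS + (sx >> 2));
        int p = sx & 3;
        for (int col = 0; col < w; col++) {
            /* Real hardware would reprogram the Read Map here. */
            *dst++ = plane[p][addr];
            if (++p == PLANES) {
                p = 0;
                addr++;
            }
        }
    }
}
```

 The per-pixel plane change is what makes this order slow on real hardware;
 storing the image one plane at a time would need only four such changes per
 image, at the cost of a less convenient source format.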

 Coming Up: Our Hero Risks Life, Limb, and Word Count in a Thrilling Conclusion

 Next month, I'll take all the mode X tools we've developed, together with
 one more tool -- masked image copying -- and the remaining unexplored feature
 of mode X, page flipping, and build an animation application.  I hope that
 when I'm done, you'll agree with me that mode X is the way to animate on the
 PC.  I also hope that I can fit everything into one column; there are always
 so many interesting things to say that I have trouble keeping the size of
 these columns down, and mode X animation covers even more fertile ground than
 usual.

 But, hey -- you've already heard about my programming demons; I'll spare you
 the writing demons.  Besides, as I'm fond of saying, end users care about
 results, not how you produced them.  For my writing, you folks are the end
 users -- and notice how remarkably little you care about how this magazine
 gets written and produced.  You care that it shows up in your mailbox every
 month, but you don't care about how it got there.  When you're a creator, the
 process matters.  When you're a buyer, results are everything.  All
 important.  Sine qua non.  The whole enchilada.

 If you catch my drift.

 Late Flash!

 The Mode X mode set code in my July '91 column (Listing One, page 154) has a
 small -- but critical -- bug.  On line 46, the value loaded into AL should be
 0E3h, not 0E7h.  Without this correction, the screen will roll on
 fixed-frequency (IBM 851X-style) monitors.

Date this article was posted to 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)
