Mode X: 256-color VGA Magic

 Journal:   Dr. Dobb's Journal  July 1991 v16 n7 p133(7)
 -----------------------------------------------------------------------------
 Title:     Mode X: 256-color VGA magic. (Graphics Programming)
 Author:    Abrash, Michael.
 AttFile:    Program:  GP-JUL91.ASC  Source code listing.

 Summary:   VGA's 320 x 240 256-color mode is most likely the single best mode
            of VGA, especially for animation.  Features that make this mode so
            special include its 1:1 aspect ratio, which results in equal pixel
            spacing vertically and horizontally.  Square pixels create the
            most attractive displays.  In addition, mode X allows page
            flipping, which helps create smooth animation.  Mode X pixels are
            processed in parallel, improving performance up to four times.
            However, the use of mode X is not widespread, since it is entirely
            undocumented.  Only a very experienced VGA programmer would know
            that such a mode exists.  The author provides mode set code,
            delineates the bitmap organization, and demonstrates how the basic
            write pixel and read pixel operations work.
 -----------------------------------------------------------------------------
 Descriptors..
 Topic:     Programming Instruction
            VGA Standard
            Pixels
            Color
            Animation
            Performance Improvement.
 Feature:   illustration
            chart.
 Caption:   The organization of display memory in mode X. (chart)
            The Map Mask register selects which planes are written to in
            planar modes. (chart)

 -----------------------------------------------------------------------------
 Full Text:

 There's a well-known Latin saying, in complexitate est opportunitas ("in
 complexity there is opportunity"), that must have been invented with the VGA
 in mind.  Well, actually, it's not exactly well-known (I just thought of it
 this afternoon), but it should be.  As evidence, witness the strange case of
 the VGA's 320 x 240 256-color mode, which is undeniably complex to program
 and isn't even documented by IBM -- but which is, nonetheless, perhaps the
 single best mode the VGA has to offer, especially for animation.

 What Makes 320 x 240 Special?

 Five features set the 320 x 240 256-color mode (which I'll call "mode X,"
 befitting its mystery status in IBM's documentation) apart from other VGA
 modes.  First, it has a 1:1 aspect ratio, resulting in equal pixel spacing
 horizontally and vertically (square pixels).  Square pixels make for the most
 attractive displays, and avoid considerable programming effort that would
 otherwise be necessary to adjust graphics primitives and images to match the
 screen's pixel spacing.  (For example, with square pixels, a circle can be
 drawn as a circle; otherwise, it must be drawn as an ellipse that corrects
 for the aspect ratio -- a slower, more complicated process.)  In contrast,
 mode 13h, the only documented 256-color mode, provides a nonsquare 320 x 200
 resolution.
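
 To put a number on "nonsquare," here is a minimal C sketch of the pixel-shape
 arithmetic.  It assumes the image fills a 4:3 display, which is my
 assumption rather than something stated above, and the function names are
 hypothetical.

```c
#include <assert.h>

/* Greatest common divisor, used to reduce the ratio to lowest terms. */
static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Pixel shape as a reduced width:height ratio.
   Assumption: the res_x by res_y image exactly fills a 4:3 screen, so a
   pixel is 4/res_x wide and 3/res_y tall in the same units. */
static void pixel_aspect(unsigned res_x, unsigned res_y,
                         unsigned *w, unsigned *h)
{
    assert(res_x > 0 && res_y > 0);
    unsigned pw = 4 * res_y;    /* (4/res_x) scaled by res_x * res_y */
    unsigned ph = 3 * res_x;    /* (3/res_y) scaled by res_x * res_y */
    unsigned g = gcd(pw, ph);
    *w = pw / g;
    *h = ph / g;
}
```

 Under that assumption, 320 x 240 reduces to 1:1, while 320 x 200 reduces to
 5:6; that is, each mode 13h pixel is 1.2 times as tall as it is wide.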

 Second, mode X allows page flipping, a prerequisite for the smoothest
 possible animation.  Mode 13h does not allow page flipping, nor does mode
 12h, the VGA's high-resolution 640 x 480 16-color mode.

 Third, mode X allows the VGA's plane-oriented hardware to be used to process
 pixels in parallel, improving performance by up to four times over mode 13h.

 Fourth, like mode 13h but unlike all other VGA modes, mode X is a
 byte-per-pixel mode (each pixel is controlled by one byte in display memory),
 eliminating the slow read-before-write and bit-masking operations often
 required in 16-color modes.  In addition to cutting the number of memory
 accesses in half, this is important because the memory caching schemes used
 by many VGA clones speed up writes more than reads.

 Fifth, unlike mode 13h, mode X has plenty of offscreen memory free for image
 storage.  This is particularly effective in conjunction with the use of the
 VGA's latches; together, the latches and the off-screen memory allow images
 to be copied to the screen four pixels at a time.

 There's a sixth feature of mode X that's not so terrific: It's hard to
 program efficiently.  If you've ever programmed a VGA 16-color mode directly,
 you know that VGA programming can be demanding; mode X is often as demanding
 as 16-color programming, and operates by a set of rules that turns everything
 you've learned in 16-color mode sideways.  Programming mode X is nothing like
 programming the nice, flat bitmap of mode 13h, or, for that matter, the flat,
 linear (albeit banked) bitmap used by 256-color SuperVGA modes.  (I'd like to
 emphasize that mode X works on all VGAs, not just SuperVGAs.)  Many
 programmers I talk to love the flat bitmap model, and think that it's the
 ideal organization for display memory because it's so straightforward to
 program.  Remember the saying I started this column with, though; the
 complexity of mode X truly is opportunity -- opportunity for the best
 combination of performance and appearance the VGA has to offer.  If you do
 256-color programming, especially if you use animation, you're missing the
 boat if you're not using mode X.

 Although some developers have taken advantage of mode X, its use is certainly
 not widespread, being entirely undocumented; only an experienced VGA
 programmer would have the slightest inkling that it exists, and figuring out
 how to make it perform beyond the write pixel/read pixel level is no mean
 feat.  I've never seen anything in print about it, and, in fact, the only
 articles I've seen about any of the undocumented 256-color modes were my own
 articles about the 320 x 200, 320 x 400, and 360 x 480 256-color modes in
 Programmer's Journal (January and September, 1989).  (However, John Bridges
 has put code for a number of undocumented 256-color resolutions into the
 public domain, and I'd like to acknowledge the influence of his code on the
 mode set routine presented in this article.)

 Given the tremendous advantages of 320 x 240 over the documented mode 13h,
 I'd very much like to get it into the hands of as many developers as
 possible, so I'm going to spend the next few columns exploring this odd but
 worthy mode.  I'll provide mode set code, delineate the bitmap organization,
 and show how the basic write pixel and read pixel operations work.  Then I'll
 move on to the magic stuff: rectangle fills, screen clears, scrolls, image
 copies, pixel inversion, and, yes, polygon fills (just a different driver),
 all blurry fast; hardware raster ops; and page flipping.  In the end, I'll
 build a working animation program that shows many of the features of mode X
 in action.

 The mode set code is the logical place to begin.

 Selecting 320 x 240 256-Color Mode

 We could, if we wished, write our own mode set code for mode X from scratch
 -- but why bother?  Instead, we'll let the BIOS do most of the work by having
 it set up mode 13h, which we'll then turn into mode X by changing a few
 registers.  Listing One (page 154) does exactly that.

 After setting up mode 13h, Listing One alters the vertical counts and timings
 to select 480 visible scan lines.  (There's no need to alter any horizontal
 values, because mode 13h and mode X both have 320-pixel horizontal
 resolutions.)  The Maximum Scan Line register is programmed to double scan
 each line (that is, repeat each scan line twice), however, so we get an
 effective vertical resolution of 240 scan lines.  It is, in fact, possible to
 get 400 or 480 independent scan lines in 256-color mode (see the
 aforementioned articles for details); however, 400-scan-line modes lack
 square pixels and can't support simultaneous offscreen memory and page
 flipping, and 480-scan-line modes lack page flipping altogether, due to
 memory constraints.
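
 The memory constraints come down to simple arithmetic.  The following C
 sketch assumes the standard 256 KB of VGA display memory and one byte per
 pixel; the names are mine, not taken from the listings.

```c
#include <assert.h>

/* Assumption: a standard VGA with 256 KB of display memory. */
#define VGA_MEMORY_BYTES (256L * 1024L)

/* Bytes needed for one 320-pixel-wide, byte-per-pixel page. */
static long page_bytes(int scan_lines)
{
    assert(scan_lines > 0);
    return 320L * scan_lines;
}

/* Number of complete pages that fit in display memory. */
static int pages_that_fit(int scan_lines)
{
    return (int)(VGA_MEMORY_BYTES / page_bytes(scan_lines));
}

/* Bytes left for offscreen storage after reserving `pages` pages. */
static long bytes_left_over(int scan_lines, int pages)
{
    return VGA_MEMORY_BYTES - (long)pages * page_bytes(scan_lines);
}
```

 By this count, 240 scan lines leaves room for two pages plus 108,544 bytes
 of offscreen memory; 400 scan lines fits two pages with only 6,144 bytes to
 spare; and 480 scan lines fits just one page, which rules out page flipping.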

 At the same time, Listing One programs the VGA's bitmap to a planar
 organization that is similar to that used by the 16-color modes, and utterly
 different from the linear bitmap of mode 13h.  The bizarre bitmap
 organization of mode X is shown in Figure 1.  The first pixel (the pixel at
 the upper left corner of the screen) is controlled by the byte at offset 0 in
 plane 0.  (The one thing that mode X blessedly has in common with mode 13h is
 that each pixel is controlled by a single byte, eliminating the need to mask
 out individual bits of display memory.)  The second pixel, immediately to the
 right of the first pixel, is controlled by the byte at offset 0 in plane 1.
 The third pixel comes from offset 0 in plane 2, and the fourth pixel from
 offset 0 in plane 3.  Then the fifth pixel is controlled by the byte at
 offset 1 in plane 0, and that cycle continues, with each group of four pixels
 spread across the four planes at the same address.  The offset M of pixel N
 in display memory is M = N/4, and the plane P of pixel N is P = N mod 4.  For
 display memory writes, the plane is selected by setting bit P of the Map Mask
 register (Sequence Controller register 2) to 1 and all other bits to 0; for
 display memory reads, the plane is selected by setting the Read Map register
 (Graphics Controller register 4) to P.
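
 The addressing arithmetic can be sketched in C as follows.  This is an
 illustration only: it assumes the page starts at offset 0, so that
 N = y * 320 + x, and the function names are hypothetical.

```c
#include <assert.h>

#define SCREEN_WIDTH_PIXELS 320
#define BYTES_PER_ROW (SCREEN_WIDTH_PIXELS / 4)   /* 80 addresses per line */

/* Offset M within a plane of the byte that controls pixel (x, y).
   With N = y * 320 + x, N / 4 equals y * 80 + x / 4 (integer division). */
static unsigned pixel_offset(unsigned x, unsigned y)
{
    assert(x < SCREEN_WIDTH_PIXELS);
    return y * BYTES_PER_ROW + x / 4;
}

/* Plane P that holds pixel (x, y): N mod 4, which reduces to x mod 4
   because 320 is a multiple of 4. */
static unsigned pixel_plane(unsigned x)
{
    return x & 3;
}

/* Map Mask value with bit P set and all other bits clear. */
static unsigned char map_mask_for(unsigned x)
{
    return (unsigned char)(1u << pixel_plane(x));
}
```

 For example, pixel (5, 1) lives at offset 81 in plane 1, so the Map Mask
 value that selects it for writing is 02h.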

 It goes without saying that this is one ugly bitmap organization, requiring a
 lot of overhead to manipulate a single pixel.  The write pixel code shown in
 Listing Two (page 154) must determine the appropriate plane and perform a
 16-bit OUT to select that plane for each pixel written, and likewise for the
 read pixel code shown in Listing Three (page 154).  Calculating and mapping
 in a plane once for each pixel written is scarcely a recipe for performance.
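
 To make the per-pixel overhead concrete, here is a rough software model of
 the write pixel and read pixel operations in C.  It is not the code from
 Listings Two and Three: the planes are plain arrays, and the two
 plane-select registers are ordinary variables standing in for the OUTs that
 real hardware would need.  All names are assumptions of this sketch.

```c
#include <assert.h>

#define BYTES_PER_ROW 80        /* 320 pixels / 4 planes */
#define PLANE_SIZE    65536u    /* 64 KB per plane */

/* Software stand-in for the VGA: four planes plus the two plane-select
   registers.  On real hardware these are set with OUTs, not assignments. */
static unsigned char planes[4][PLANE_SIZE];
static unsigned char map_mask = 0x0F;   /* Sequence Controller register 2 */
static unsigned char read_map = 0;      /* Graphics Controller register 4 */

/* A CPU byte write lands in every plane enabled by the Map Mask. */
static void cpu_write(unsigned offset, unsigned char value)
{
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            planes[p][offset] = value;
}

/* A CPU byte read returns the byte from the plane named by Read Map. */
static unsigned char cpu_read(unsigned offset)
{
    return planes[read_map][offset];
}

/* Select the one plane that holds the pixel, then write its byte. */
static void write_pixel(unsigned x, unsigned y, unsigned char color)
{
    assert(x < 320 && y < 240);
    map_mask = (unsigned char)(1u << (x & 3));
    cpu_write(y * BYTES_PER_ROW + x / 4, color);
}

/* Select the one plane that holds the pixel, then read its byte. */
static unsigned char read_pixel(unsigned x, unsigned y)
{
    assert(x < 320 && y < 240);
    read_map = (unsigned char)(x & 3);
    return cpu_read(y * BYTES_PER_ROW + x / 4);
}
```

 Note that every call changes a plane-select register before touching
 display memory; that register write is the cost the text is complaining
 about.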

 That's all right, though, because most graphics software spends little time
 drawing individual pixels.  I've provided the write and read pixel routines
 as basic primitives, and so you'll understand how the bitmap is organized,
 but the building blocks of high-performance graphics software are fills,
 copies, and bitblts, and it's here that mode X shines.

 Designing From a Mode X Perspective

 Listing Four (page 154) shows mode X rectangle fill code.  The plane is
 selected for each pixel in turn, with drawing cycling from plane 0 to plane 3
 then wrapping back to plane 0.  This is the sort of code that stems from a
 write-pixel line of thinking; it reflects not a whit of the unique
 perspective that mode X demands, and although it looks reasonably efficient,
 it is in fact some of the slowest graphics code you will ever see.  I've
 provided Listing Four partly for illustrative purposes, but mostly so we'll
 have a point of reference for the substantial speed-up that's possible with
 code that's designed from a mode X perspective.

 The two major weaknesses of Listing Four both result from selecting the plane
 on a pixel-by-pixel basis.  First, endless OUTs (which are particularly slow
 on 386s and 486s, often much slower than accesses to display memory) must be
 performed, and, second, REP STOS can't be used.  Listing Five (page 156)
 overcomes both these problems by tailoring the fill technique to the
 organization of display memory.  Each plane is filled in its entirety in one
 burst before the next plane is processed, so only five OUTs are required in
 all, and REP STOS can indeed be used.  (I've used REP STOSB in Listings Five
 and Six (page 156).  REP STOSW could be used and would improve performance on
 some 16-bit VGAs; however, REP STOSW requires extra overhead to set up, so it
 can be slower for small rectangles, especially on 8-bit VGAs.)  Doing an
 entire plane at a time can produce a "fading-in" effect for large images,
 because all columns for one plane are drawn before any columns for the next;
 if this is a problem, the four planes can be cycled through once for each
 scan line, rather than once for the entire rectangle.
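
 The plane-at-a-time idea can be sketched in C against a software model of
 the planes.  This is not Listing Five itself; memset() stands in for REP
 STOSB, and the OUT accounting (one OUT to point the Sequence Controller
 index at the Map Mask, then one per plane) is an assumption chosen to match
 the count of five given above.  The rectangle's right and bottom edges are
 treated as exclusive.

```c
#include <assert.h>
#include <string.h>

#define BYTES_PER_ROW 80        /* 320 pixels / 4 planes */
#define PLANE_SIZE    65536u

static unsigned char planes[4][PLANE_SIZE];
static unsigned char map_mask;
static unsigned long out_count;     /* simulated OUT instructions */

static void select_map_mask_index(void) { out_count++; }

static void set_map_mask(unsigned char mask)
{
    map_mask = mask;
    out_count++;
}

/* Stand-in for REP STOSB: store `count` bytes in each enabled plane. */
static void cpu_fill(unsigned offset, unsigned char value, unsigned count)
{
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            memset(&planes[p][offset], value, count);
}

/* Fill columns x0..x1-1 of rows y0..y1-1, one whole plane at a time. */
static void fill_plane_at_a_time(unsigned x0, unsigned y0,
                                 unsigned x1, unsigned y1,
                                 unsigned char color)
{
    assert(x0 <= x1 && x1 <= 320 && y0 <= y1 && y1 <= 240);
    select_map_mask_index();
    for (unsigned p = 0; p < 4; p++) {
        /* First column at or after x0 that lives in plane p. */
        unsigned first = x0 + ((p + 4 - (x0 & 3)) & 3);
        if (first >= x1)
            continue;               /* no columns of this plane in range */
        /* Columns first, first + 4, ... that fall below x1. */
        unsigned count = (x1 - 1 - first) / 4 + 1;
        set_map_mask((unsigned char)(1u << p));
        for (unsigned y = y0; y < y1; y++)
            cpu_fill(y * BYTES_PER_ROW + first / 4, color, count);
    }
}
```

 In this model a full-screen fill costs five simulated OUTs, against one per
 pixel when the plane is selected for each pixel in turn.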

 Listing Five is 2.5 times faster than Listing Four at clearing the screen on
 a 20-MHz cached 386 with a Paradise VGA.  Although Listing Five is slightly
 slower than an equivalent mode 13h fill routine would be, it's not grievously
 so.  In general, performing plane-at-a-time operations can make almost any
 mode X operation, at the worst, nearly as fast as the same operation in mode
 13h (although this sort of mode X programming is admittedly fairly complex).
 In this pursuit, it can help to organize data structures with mode X in mind.
 For example, icons could be prearranged in system memory with the pixels
 organized into four plane-oriented sets (or, again, in four sets per scan
 line to avoid a fading-in effect) to facilitate copying to the screen a plane
 at a time with REP MOVS.
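
 As an illustration of that kind of prearrangement, here is a minimal C
 sketch that converts a linear, byte-per-pixel image into four consecutive
 plane-oriented sets.  It assumes the image width is a multiple of four and
 that the image will be drawn at an x coordinate that is also a multiple of
 four; the function name and layout are hypothetical, not taken from the
 listings.

```c
#include <assert.h>
#include <stddef.h>

/* Rearrange a width x height linear image (src) into four plane sets
   stored back to back in dst: all of plane 0's bytes, then plane 1's,
   and so on.  Each set holds every fourth pixel of every scan line.
   Assumption: width is a multiple of 4 and dst holds width * height bytes. */
static void linear_to_planar(const unsigned char *src, unsigned char *dst,
                             unsigned width, unsigned height)
{
    assert(width % 4 == 0);
    unsigned cols = width / 4;      /* bytes per scan line in each plane */
    size_t out = 0;
    for (unsigned p = 0; p < 4; p++)
        for (unsigned y = 0; y < height; y++)
            for (unsigned c = 0; c < cols; c++)
                dst[out++] = src[(size_t)y * width + c * 4 + p];
}
```

 Each of the four sets can then be copied to display memory with a single
 plane selected, one scan line's worth of bytes at a time.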

 Hardware Assist from an Unexpected Quarter

 Listing Five illustrates the benefits of designing code from a mode X
 perspective; this is the software aspect of mode X optimization, which
 suffices to make mode X about as fast as mode 13h.  That alone makes mode X
 an attractive mode, given its square pixels, page flipping, and offscreen
 memory, but superior performance would nonetheless be a pleasant addition to
 that list.  Superior performance is indeed possible in mode X, although,
 oddly enough, it comes courtesy of the VGA's hardware, which was never
 designed to be used in 256-color modes.

 All of the VGA's hardware assist features are available in mode X, although
 some are not particularly useful.  The VGA hardware feature that's truly the
 key to mode X performance is the ability to process four planes' worth of
 data in parallel; this includes both the latches and the capability to fan
 data out to any or all planes.  For rectangular fills, we'll just need to fan
 the data out to various planes, so I'll defer a discussion of other hardware
 features until another column.  (By the way, the ALUs, bit mask, and most
 other VGA hardware features are also available in mode 13h -- but parallel
 data processing is not.)

 In planar modes, such as mode X, a byte written by the CPU to display memory
 may actually go to anywhere between zero and four planes, as shown in Figure
 2.  Each plane for which the setting of the corresponding bit in the Map Mask
 register is 1 receives the CPU data, and each plane for which the
 corresponding bit is 0 is not modified.

 In 16-color modes, each plane contains one-quarter of each of eight pixels,
 with the 4 bits of each pixel spanning all four planes.  Not so in mode X.
 Look at Figure 1 again; each plane contains one pixel in its entirety, with
 four pixels at any given address, one per plane.  Still, the Map Mask
 register does the same job in mode X as in 16-color modes; set it to 0Fh (all
 1-bits), and all four planes will be written to by each CPU access.  Thus, it
 would seem that up to four pixels could be set by a single mode X byte-sized
 write to display memory, potentially speeding up operations like rectangle
 fills by four times.
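
 A small C model makes the fan-out visible.  As before, this is a software
 stand-in for the hardware under the assumptions already stated (planes as
 arrays, 80 bytes per scan line per plane), with hypothetical names.

```c
#include <assert.h>

static unsigned char planes[4][65536u];

/* Model of one CPU byte write under a given Map Mask setting.
   Returns the number of planes written, which in mode X is also the
   number of pixels set. */
static int cpu_write(unsigned char map_mask, unsigned offset,
                     unsigned char value)
{
    int planes_written = 0;
    for (int p = 0; p < 4; p++) {
        if (map_mask & (1u << p)) {
            planes[p][offset] = value;
            planes_written++;
        }
    }
    return planes_written;
}

/* Color of pixel (x, y) under the mode X layout. */
static unsigned char pixel_at(unsigned x, unsigned y)
{
    return planes[x & 3][y * 80 + x / 4];
}
```

 With the mask at 0Fh, one write to offset 2 sets pixels 8 through 11 of the
 top scan line; with the mask at 05h, one write to offset 4 sets pixels 16
 and 18 and leaves 17 and 19 alone.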

 And, as it turns out, four-plane parallelism works quite nicely indeed.
 Listing Six is yet another rectangle-fill routine, this time using the Map
 Mask to set up to four pixels per STOS.  The only trick to Listing Six is
 that any left or right edge that isn't aligned to a multiple-of-four pixel
 column (that is, a column at which one four-pixel set ends and the next
 begins) must be clipped via the Map Mask register, because not all pixels at
 the address containing the edge are modified.  Performance is as expected;
 Listing Six is nearly ten times faster at clearing the screen than Listing
 Four and just about four times faster than Listing Five -- and also about
 four times faster than the same rectangle fill in mode 13h.  Understanding
 the bitmap organization and display hardware of mode X does indeed pay.
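
 The edge clipping can be sketched in C as follows.  This is a model of the
 technique rather than a transcription of Listing Six: memset() stands in
 for STOS, the Map Mask is a plain variable, the right and bottom edges are
 exclusive, and the function names are my own.

```c
#include <assert.h>
#include <string.h>

#define BYTES_PER_ROW 80        /* 320 pixels / 4 planes */

static unsigned char planes[4][65536u];
static unsigned char map_mask;

/* Stand-in for STOS: store `count` bytes in each enabled plane. */
static void cpu_fill(unsigned offset, unsigned char value, unsigned count)
{
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            memset(&planes[p][offset], value, count);
}

/* Planes to write at the address holding left edge x0:
   the plane of pixel x0 and every higher-numbered plane. */
static unsigned char left_edge_mask(unsigned x0)
{
    return (unsigned char)((0x0F << (x0 & 3)) & 0x0F);
}

/* Planes to write at the address holding the last pixel, x1 - 1:
   that pixel's plane and every lower-numbered plane. */
static unsigned char right_edge_mask(unsigned x1)
{
    return (unsigned char)(0x0F >> (3 - ((x1 - 1) & 3)));
}

/* Fill columns x0..x1-1 of rows y0..y1-1, up to four pixels per byte. */
static void fill_four_at_a_time(unsigned x0, unsigned y0,
                                unsigned x1, unsigned y1,
                                unsigned char color)
{
    assert(x0 < x1 && x1 <= 320 && y0 <= y1 && y1 <= 240);
    unsigned first = x0 / 4, last = (x1 - 1) / 4;
    unsigned char left = left_edge_mask(x0);
    unsigned char right = right_edge_mask(x1);

    for (unsigned y = y0; y < y1; y++) {
        unsigned row = y * BYTES_PER_ROW;
        if (first == last) {            /* both edges share one address */
            map_mask = left & right;
            cpu_fill(row + first, color, 1);
            continue;
        }
        map_mask = left;                /* clipped left edge */
        cpu_fill(row + first, color, 1);
        if (last > first + 1) {         /* whole four-pixel sets between */
            map_mask = 0x0F;
            cpu_fill(row + first + 1, color, last - first - 1);
        }
        map_mask = right;               /* clipped right edge */
        cpu_fill(row + last, color, 1);
    }
}
```

 For a rectangle spanning columns 5 through 13, the left edge mask works out
 to 0Eh (pixels 5, 6, and 7), the right edge mask to 03h (pixels 12 and 13),
 and the one address between them is written with all four planes enabled.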

 Just so you can see mode X in action, Listing Seven (page 158) is a sample
 program that selects mode X and draws a number of rectangles.  Listing Seven
 links to any of the rectangle fill routines I've presented.

 And now, I hope, you begin to see why I'm so fond of mode X.  Next month,
 we'll continue with mode X by exploring the wonders that the latches and
 parallel plane hardware can work on scrolls, copies, blits, and pattern
 fills.

 Notes From the Edsun Front

 Comments coming my way indicate a great deal of programmer interest in the
 Edsun CEG/DAC, of which I wrote in April and May.  However, everyone who has
 actually programmed the CEG/DAC complains about how hard it is; the results
 are nice, but the process of getting there is anything but.  Nonetheless,
 programming the CEG/DAC is certainly a solvable problem, and whoever solves
 it best will come out looking mighty good.  A fair analogy is writing active
 TSRs.  Six years ago, TSR-writing was black magic, and Sidekick, primitive by
 today's standards, made a fortune.  Today, any dope can choose from dozens of
 books and toolkits and make a rock-solid TSR in a few hours.  As programmers
 develop better tools and a better understanding of the CEG/DAC, the grumbling
 will subside, and the software will take off.  Another case of complexity
 providing opportunity.

 Book of the Month

 This month's book is Advanced Programmer's Guide to SuperVGAs, by Sutty and
 Blair (Brady, 1990, ISBN 0-13-010455-8; $44.95).  Pricey for softcover, but
 included in that price is a diskette of SuperVGA assembly code (which I have
 not tried out).  This book is the single best guide I've seen to the
 Byzantine world of SuperVGA programming, where every one of dozens of VGA
 models has different mode numbers and banking schemes.  Take it from someone
 who's waded through a slew of chip databooks and applications notes -- this
 book will save you a lot of time and aggravation if you have to program
 SuperVGAs directly.

 Still, not everything I'd like to see is in there.  For example, they cover
 only the Tseng Labs ET3000 chip, not the now widely used ET4000 that supports
 15-bpp graphics.  That's not the authors' fault, of course; it's a reflection
 of the incredible diversity and rate of change in the SuperVGA arena.

 Mode X.  The Edsun CEG/DAC.  SuperVGA programming.  In complexitate est
 opportunitas.  Q.E.D.

Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)