Geometry Culling in 3D Engines
Efficient algorithms for determining the visible parts of 3D environments are a key to visualizing complex scenes at interactive rates. Visibility culling algorithms reduce the number of polygons sent down the rendering pipeline based on the simple principle that if something is not seen, it does not have to be drawn. In the following I detail the underlying concepts of the major visibility culling algorithms in use today. We will cover view frustum culling, occlusion culling and contribution culling and finally discuss what else can be used in environments in which this alone is not enough.
View frustum culling
Perhaps the most obvious form of culling is view frustum culling, which is based on the fact that only the objects in the current view volume must be drawn. View volume is usually defined by six planes, namely the front, back, left, right, top, and bottom clipping planes, which together form a cut pyramid. Front and back clipping planes may be defined to lie at the viewer point and infinity, respectively. If a polygon is entirely outside the pyramid, it cannot be visible and can be discarded. If it is partially inside, it is clipped to the planes so that its outside parts are removed.
In the naive version of this algorithm, every object in the world is tested against the six planes. This leads to an asymptotic complexity of O(n) (for an introduction to asymptotic notation, see [CLR90]) if reasonable assumptions hold, meaning that the running speed of the algorithm scales linearly with scene object count. Fortunately, we can do much better than this by hierarchically subdividing space into smaller chunks, and tagging which objects belong to which chunks. A popular way to do this is called the octree. Octree is a hierarchy of axially aligned boxes in which every box is always subdivided to 8 smaller subboxes (i.e. the box is split along every axis) until some criterion to terminate the subdivision process is met. A good choice for the criterion is to stop subdivision when an octree node contains less than a predefined amount of objects. Densely occupied areas can thus have small nodes, whereas predominantly empty areas can be covered with just one big box.
This kind of regular space subdivision leads to hierarchical view frustum culling and logarithmic asymptotic time complexity. The octree is traversed top-down; immediately when a box is totally outside the view pyramid, we can discard all of its children, as they are contained within the parent. Otherwise the recursion continues to the children. Further, as can be immediately seen, octrees can be used for other things, such as speeding up collision queries.
This primitive form of culling is based on the observation that if all objects in the world are closed, then the polygons which don't face the viewer cannot be seen. This directly translates to the vector angle between the direction where the viewer is facing and the normal of the polygon: if the angle is more than 90 degrees, the polygon can be discarded. Back-face culling is usually automatically performed by the rendering API (Direct3D, OpenGL) and it can be expected to cull roughly half of the polygons inside the viewing frustum.
Cell-based occlusion culling
As good as view frustum and back-face culling are, they still can't sufficiently reduce polygon counts in modern games. Thus, we turn our attention to occlusion culling, which, as the name suggests, attempts to identify the visible parts of the scene, thus reducing the number of primitives rendered. First of all we cover cell-based methods.
All cell-based occlusion culling methods are based on the assumpion that the game world can be divided to cells which are connected to each other using portals. Clearly, if a portal is not seen from a given point of view, then none of the cells behind the portal can be seen and they can thus be culled away. There are two dominating forms of cell-based engines in use today: BSP and "portal" engines.
BSP stands for Binary Space Partitioning. In Binary Space Partitioning, space is split with a plane to two halfspaces, which are again recursively split. This can be used to force a strict back-to-front drawing ordering (for more details, see [FDFH90]). Occlusion culling in BSP engines, such as Quake, is usually based on the Potentially Visible Set (PVS). This means that, for every cell in the world, the engine finds all the other cells that are potentially visible from the cell, from any point in the cell. [Tel92] details how to find this PVS analytically. Then, at rendering time, the cell where the viewer is located is found, view frustum culling is performed for the cells in the PVS, and finally those cells that reside in the PVS and inside the frustum are drawn.
This introduces several major concepts. PVS-based culling is conservative: it always overestimates the set of visible cells, never underestimates it. This means that no artifacts due to objects that weren't drawn even though they should have been can occur. Further, as we know from Quake, calculating the PVS can take quite a bit of time. Thus, an expensive preprocessing step is traded for fast displaying speed. There is still overdraw, however, meaning that redundant polygons are drawn; this is due to the fact that PVS contains all the cells that can be seen from anywhere in the viewing cell. The overdraw typically ranges from 50% to 150%.
A more recent form of cell-based occlusion culling is present in so-called "portal" engines. The world is again defined as cells and portals connecting them. At runtime, the cell where the viewer is in is found. Then, all portals in the viewing cell are tested against the view frustum planes and accepted, discarded, or clipped as necessary. Those that are either wholly or partially inside the frustum are recursively traversed, so that, mathematically speaking, the planes that define the portal polygon's boundaries are added as clipping planes, so that the new set of clipping planes define a smaller and smaller portion of the screen. This solution leads to exactly 0% overdraw for the world geometry thanks to the analytical clipping performed. It is thus well suited for software rendering.
However, analytical clipping is expensive. Thus, the projected bounding rectangle of the portal polygons can be used instead, as detailed in [LG95], to provide an approximation for discarding portals.
Portal engines have the advantage that no preprocessing step is required and that scenes can be very dynamic as no PVS is present. Also, "unnatural" space-warping effects are possible. The disadvantage is potentially larger CPU overhead for portal culling at runtime, especially in complex scenes. Recently portals have only been reserved for special effects in industry-strength engines, so that the eventual value of the dynamic portal approach is unclear.
Clearly, cell engines are only suited for scenes where large occluders, such as walls, are present. Thus, they make excellent high-performance indoor engines for games.
PVS-based arbitrary geometry occlusion culling
Recently, PVS-based occlusion culling algorithms for more general environments have been proposed. These algorithms follow the PVS principle in that they try to find a tight overestimation of the true PVS for a given viewcell. The algorithm by Schaufler et. al. [SDDS00] first discretizes the scene by building an octree containing empty, boundary and opaque nodes. Then, they find opaque (solid) octree nodes, and, for all viewnodes, check what portion of space the node definitely occludes with respect to any position in the viewnode. Only the nodes not hidden are inserted into the PVS for the viewcell node. The approach is also well suited for finding the definitely hidden parts of terrain in terrain rendering.
An alternative algorithm to the problem by Durand et. al. [DDTP00] introduces something called extended projections to find the PVS for viewing cells. Both the presented algorithms, unlike some earlier ones, can handle occluder fusion; this means that two or more occluders can hide an object even if none of them could hide it alone.
As the trend from Quake-like towards arbitrary geometry environments continues, these kind of algorithms will probably gain more and more widespread acceptance. Like PVS algorithms in general, these algorithms are only suited for mostly static environments. At this point, however, it is unclear what the real value of the presented algorithms for high-performance game environments is, as no engine I know of has implemented them.
Other forms of occlusion culling
More dynamic, or "on-line", methods for occlusion culling exist. These are well suited for highly dynamic environments, or for engines where the long preprocess time of a PVS approach is not desired. Zhang [Zha98] has invented a novel method based on Hierarchical Occlusion Mapping, or HOM for short. This method is a two-pass algorithm which first takes advantage of the rendering hardware to build a hierarchy of occlusion maps. What this means is that a group of objects are chosen as occluders; these are rendered to an occlusion map. Depth information is also extracted. Then, for every potential occludee, we see whether its projection is inside the combined occluder projection, and whether the occludee is behind the occluders. If the test returns positive results, the occludee can be safely discarded. As detailed in Zhang's thesis, the algorithm can also be modified to provide aggressive (nonconservative) culling.
Yet another interesting on-line algorithm for the visibility problem is detailed in [GKM93]. Greene's hierarchical Z-buffer is also a dynamic algorithm which takes advantage of frame-to-frame coherence. Space is subdivided to an octree. The first frame is rendered in front-to-back order. Octree nodes are tested against the viewing frustum, and, if accepted, their projections are then tested against the hierarchical Z-buffer. If the projection of the front faces of the node is hidden, the contents of the node need not be drawn. Subsequent frames are rendered by first rendering all the nodes that were visible in the last frame, and then checking for cube visibility as in the first frame, the difference being that far more cubes can now be trivially rejected if reasonable spatial coherency is present.
Both of the presented algorithms are image-space algorithms. Their rejection percentage is generally very good, but on the other hand they require additional rendering and scan-converting, thus putting additional strain on the CPU at runtime. Whether these algorithms or a PVS approach is desirable depends on your requirements. A hybrid approach might also be investigated.
Raycasting algorithms have also been proposed. Bungie's game Oni is one to use this solution [Pea00]. The idea is to cast a finite number of rays from the point of view to space, and check where they intersect with a polygon or a surface. To accelerate the process, an octree is used. The general raycasting solution is to cast one ray per pixel, but this is too expensive. Instead, in Bungie's version, octree nodes are rendered (i.e. all polygons in the node output) when a ray goes through the box. When the ray hits something, the process is terminated. The number of rays can be varied for higher or lower quality. Note how this can introduce aliasing artifacts; these can be dealt with, but it is still unclear how useful this algorithm is versus the algorithms presented above.
Contribution culling discards objects if their screen projection is too small (in practice, the projection of their bounding volume is used). This form of culling isn't conservative, but is still very often used in, say, driving or open-environment roleplaying games. To smooth out the transitions, distance fog is sometimes added.
So you've incorporated a good visibility culling algorithm to your engine. What next? Even if the visibility process provides too much polygons for reasonable 3D card consumption, the game is still not lost. There is namely the mysterious LOD, or Level of Detail. By implementing LOD in form of adaptive terrain, procedural meshes (like curved surfaces), mesh simplification in one end and possibly subdivision surfaces in the other end of the spectrum you can potentially incorporate massive scalability to your engine. This is going to become increasingly important in the future, as the gap between high-end and low-end PC is getting wider and wider (well, unless the consoles eat the gaming market, something that is happening I think!).
We have briefly covered some of the major algorithms used for visibility culling, but the work certainly does not stop here. To really understand the concepts involved, it would be beneficial to get and read some of the papers mentioned in the references (which are all available in the Internet, though in various places). There is also an excellent article up at http://www.gamasutra.com/features/19991109/moller_haines_01.htm which goes into more detail with some of the occlusion culling algorithms presented here.
Summa summarum: It is great to be a game programmer in an era in which the possibilities are opening up in front of all of us. There is no longer one right way to solve the visibility problem, as upcoming games that are taking distance to the Quake model show. Much research will undoubtedly have to be done before the best solutions for arbitrary scenes are found. Progress has been rapid, however, so that there already is a commercial package whose sole purpose is to solve scene visibility (Surrender Umbra, www.hybrid.fi). Doing the same thing, just open sourced, would certainly be beneficial for game developers all around the globe.
This article is provided as-is, and no claim of the correctness of the information presented above is made. It certainly presents the state of my current knowledge of the subject matter. :) I happily welcome any feedback, whether positive or negative. Your comments are the only way I can improve my writing to be more useful for you.
Until next time!
Pietari Laurila is a programmer who specializes in J2EE, XML and game technology. He is also a member of Twilight Minds Design Group (www.twilightminds.com). He can be reached at plaurila at twilightminds.com.