3D Basics
by Thierry Tremblay
http://frogengine.net-connect.net

What this is about

In this document I will outline the mathematics that I am using. I assume you already know some linear algebra (vectors and matrices). I won't explain things in much detail; if you need more information, I suggest you look on the net.

Spaces, there are a lot of spaces

What is a space?

Here I am talking about a 3D Euclidean (linear) space. A 3D space is defined by three axes: X, Y and Z. They are all perpendicular to each other. There are two conventions in use: the right-handed and the left-handed systems. On a sheet of paper, draw the X axis going to the right (that is, positive X values are on the right side of the origin) and the Y axis going up. Then, in a right-handed system, the positive Z axis goes from the sheet toward you. In the left-handed system it is the opposite: the positive Z axis goes away from you, i.e. behind the sheet.

I will use the right-handed system in the Frog engine, because that's what I am used to. Some people use the left-handed system, saying that the computer screen is a left-handed system, which is true, but I feel it is not hard to map a right-handed system to the screen.

What are the implications of selecting one system over the other? Quite simply, the order of operands in vector and matrix operations is reversed. For example, the vector (cross) product N = A * B in a right-handed system would become N = B * A in a left-handed system. This is important to remember if you have used left-handed systems before. If not, then forget about it, because I will.
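To make the operand-order point concrete, here is a minimal Python sketch of the vector (cross) product. The function name `cross` and the tuple representation are my own choices for illustration, not something taken from the Frog engine. It only shows that swapping the operands flips the direction of the result.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

x_axis = (1, 0, 0)
y_axis = (0, 1, 0)

n1 = cross(x_axis, y_axis)  # N = A * B
n2 = cross(y_axis, x_axis)  # N = B * A, the operands swapped

print(n1, n2)  # (0, 0, 1) (0, 0, -1)
```

The two results point in opposite directions along Z, which is why the operand order has to be swapped when moving from one convention to the other.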

Spaces in 3D

It is very helpful to define different 3D spaces when dealing with 3D graphics. There are different conventions about the view/camera/screen spaces, so here I will define what I use:

Object space (3D space)
This coordinate system is local to an object, an object being a set of polygons. If you want to have multiple instances of the same object at different locations in the world, you need object space. Object space is also the space in which you model your objects. For example, a mesh designed in 3D Studio is defined in object space.
World space (3D space)
This coordinate system is the most important one. This is where all objects are positioned, and where you compute physics, movement and collision detection. It is also here that lighting is computed. Think of world space as your game world.
View space (3D space)
This coordinate system is relative to the camera (see below). Objects in world space are transformed to view space to determine what is visible on the screen. This space is also commonly called "eye space" or "camera space". Some people will also refer to this as "screen space", but I personally think it is a bad habit... Screens aren't 3D yet.
Screen space (2D space)
What I call screen space is in fact a homogeneous-coordinate representation of the screen. Coordinates are not pixels! The viewport maps view space to screen space, and the origin in screen space is in the middle of the screen (for perspective projections anyway).

Spaces are represented using matrix notation. In a 3x3 matrix, you have three vectors. Each vector defines the direction of an axis. The first vector is "right" (positive X axis), the second is "up" (positive Y axis) and the last one is "ahead" (positive Z axis). All vectors are orthogonal and of unit length.
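The "orthogonal and of unit length" requirement can be checked with dot products. The following is only a sketch: the helper names `dot` and `is_orthonormal` and the tolerance `eps` are assumptions made for this example.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(p * q for p, q in zip(a, b))

def is_orthonormal(right, up, ahead, eps=1e-9):
    """Return True if the three axes are mutually perpendicular unit vectors."""
    axes = (right, up, ahead)
    for i in range(3):
        for j in range(3):
            # An axis dotted with itself must give 1 (unit length);
            # dotted with any other axis it must give 0 (perpendicular).
            expected = 1.0 if i == j else 0.0
            if abs(dot(axes[i], axes[j]) - expected) > eps:
                return False
    return True

print(is_orthonormal((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # True
print(is_orthonormal((1, 0, 0), (0, 2, 0), (0, 0, 1)))  # False
```

The second call fails because the "up" vector has length 2 instead of 1.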

Camera and viewport

What is a camera?

A camera is a simple entity that resides in world space. It provides the origin of the view frustum, which is a pyramid encompassing what is visible in the world (this is usually where you are in the world). The camera also specifies what you are looking at, and which way is up. All information in the camera is defined in world space. It is really handy to use a 3x3 matrix and a translation vector to represent the camera information. The first vector is where "right" is, the second is "up" and the third is the direction you are looking in.

        | Ux  Vx  Nx |
        | Uy  Vy  Ny |
        | Uz  Vz  Nz |

where U is the "right" vector, V the "up" vector and N the direction you are looking. The translation vector is simply the location of the camera.

For example, you are at (3,10,5), right is along vector (1,0,0), up along vector (0,0,1) and you are looking at something at point (3,15,5), so the viewing direction is (0,1,0) once normalized. The matrix representation would be:

        | 1 0 0 |
        | 0 0 1 |
        | 0 1 0 |

The translation vector, or camera location, would be (3,10,5).
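The example above can be reproduced with a few lines of Python. This is a sketch under my own assumptions: the names `normalize` and `make_camera` are hypothetical, and the matrix is stored as a list of rows with U, V and N as its columns, matching the layout shown earlier.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def make_camera(position, right, up, target):
    """Build the (matrix, translation) pair for a camera.

    N is the normalized direction from the camera position to the target.
    """
    n = normalize(tuple(t - p for t, p in zip(target, position)))
    u, v = right, up
    matrix = [
        [u[0], v[0], n[0]],
        [u[1], v[1], n[1]],
        [u[2], v[2], n[2]],
    ]
    return matrix, position

matrix, translation = make_camera((3, 10, 5), (1, 0, 0), (0, 0, 1), (3, 15, 5))
for row in matrix:
    print(row)
print(translation)
```

The rows printed correspond to the matrix in the example (the N column comes out as floating-point values because of the normalization).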

What is a viewport?

Think of a viewport as a kind of window. The viewport delimits what is visible to the camera. It defines the sides of the view frustum or "viewing pyramid". The viewport defines the distance from the camera to the projection plane. The viewport also defines a rectangle on the projection plane, and anything outside of this rectangle is not visible. In effect, this rectangle defines the field of view, or FOV. The FOV is usually represented with an angle, which is the angle formed by the sides of the rectangle with the camera location.
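As a rough illustration of how the rectangle and the projection plane distance determine the FOV, the angle can be computed with basic trigonometry. The sketch assumes the rectangle is centered on the viewing direction; `fov_degrees` and its parameters are names made up for this example.

```python
import math

def fov_degrees(rect_size, plane_distance):
    """Angle (in degrees) formed at the camera by two opposite sides of the
    viewport rectangle. rect_size is the width (for a horizontal FOV) or the
    height (for a vertical FOV); plane_distance is the distance from the
    camera to the projection plane."""
    half_angle = math.atan((rect_size / 2.0) / plane_distance)
    return math.degrees(2.0 * half_angle)

# A rectangle 2 units wide on a plane 1 unit away gives a FOV of about
# 90 degrees; moving the plane farther away narrows the FOV.
print(fov_degrees(2.0, 1.0))
print(fov_degrees(2.0, 2.0))
```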

From one space to another

We have seen that there are four different spaces. So how do we go from one space to another? Simple: define transformations that map objects from one space to another. A transformation from one space to another is defined in the same way as a camera is! In fact, the matrix/vector pair defined for the camera is a transform that positions and orients the camera in world space. In the case of space transforms, the matrix represents the orientation of space B relative to space A, and the vector is the origin of space B in space A.

Since we have four spaces, we need three transformations. Each object in the world has a transform that maps the original object (in object space) to world coordinates. To transform from world to view space, we use the inverse of the camera transform. That is, we invert its matrix and negate its vector to obtain the transformation from world to view space (the negated vector is applied first, then the inverted matrix; since the axes are orthonormal, the inverse of the matrix is simply its transpose). Why are we inverting that transform? Because it is the camera that is moving in the world, and not the world that is moving in view space. Suppose that the camera is moving to the right: the objects in the world should be moving left, not right! The same goes for forward movement: if the camera moves forward, the world appears to be moving toward us.
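Here is a small sketch of the world-to-view transform, assuming the camera matrix holds the "right", "up" and "ahead" vectors as its columns and that those vectors are orthonormal (so the inverse of the matrix is its transpose). The function name `world_to_view` is hypothetical.

```python
def world_to_view(point, cam_matrix, cam_position):
    """Transform a world-space point into view space."""
    # Step 1: apply the negated camera vector (subtract the camera location).
    d = [p - c for p, c in zip(point, cam_position)]
    # Step 2: apply the inverted matrix. For orthonormal axes the inverse is
    # the transpose, so each view coordinate is one matrix column dotted with d.
    return tuple(
        sum(cam_matrix[row][col] * d[row] for row in range(3))
        for col in range(3)
    )

camera = [[1, 0, 0],
          [0, 0, 1],
          [0, 1, 0]]

# Camera at (3,10,5) looking at the point (3,15,5): the point is straight ahead.
print(world_to_view((3, 15, 5), camera, (3, 10, 5)))  # (0, 0, 5)

# Move the camera one unit to the right: the same point now appears to the left.
print(world_to_view((3, 15, 5), camera, (4, 10, 5)))  # (-1, 0, 5)
```

The second call shows the effect described above: when the camera moves right, the object moves left in view space.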

Finally, there is the transition from view space to screen space. This transformation is a projection defined by the viewport.
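The text does not give the exact projection used, so the following is only a sketch of a standard perspective projection, assuming the camera looks along the third ("ahead") axis of view space and that `plane_distance` is the distance from the camera to the projection plane. The function name `project` is made up for this example.

```python
def project(view_point, plane_distance):
    """Project a view-space point onto the projection plane.

    Returns (x, y) on the plane, with the origin in the middle of the
    viewport rectangle. These are not pixel coordinates.
    """
    x, y, z = view_point
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    scale = plane_distance / z  # similar triangles: farther points shrink
    return (x * scale, y * scale)

print(project((2, 4, 10), 5))  # (1.0, 2.0)
print(project((2, 4, 20), 5))  # (0.5, 1.0)
```

Doubling the distance of the point halves its projected coordinates, which is the perspective effect.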

Playing in spaces

We need these different spaces for different things. Different algorithms can be applied in different spaces. Usually, you perform lighting and collision detection in world space. Clipping can be done in world space, view space, screen space or any combination of them. This is not the right place to go further, so I will stop here for now.

The Z buffer

There are two kinds of "Z buffer" I want to talk about: the Z buffer, and the 1/Z or W buffer. In one case, you put Z values in the Z buffer; in the other case you put 1/Z values. Why would you do that?

Suppose you have your near clip plane at Z = 1 (everything here is in view space) and the far clip plane at Z = 100. You want to map the values between 1 and 100 to values between 0 and 1 (that's what the hardware Z buffer uses).

The Z buffer

You simply do a linear mapping:

Z buffer = ( Z - nearZ ) / ( farZ - nearZ )

For our example, we get:

Z buffer = Z / 99 - 1/99

So for Z = 1, we get Z buffer = 0 and for Z = 100 we get Z buffer = 1. For Z = 50.5 we would get Z Buffer = 0.5.
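The linear mapping and the three sample values can be checked with a short Python sketch (the function name `linear_depth` is my own):

```python
def linear_depth(z, near_z, far_z):
    """Map a view-space Z between near_z and far_z linearly to 0..1."""
    return (z - near_z) / (far_z - near_z)

for z in (1.0, 50.5, 100.0):
    print(z, linear_depth(z, 1.0, 100.0))
```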

The 1/Z buffer

The mapping equation ain't linear anymore:

Z buffer = ( 1/Z - 1/nearZ ) / ( 1/farZ - 1/nearZ )

For our example, we get:

Z buffer = (-1/Z) / 0.99 + 1 / 0.99

For Z = 1, we get Z buffer = 0 and for Z = 100 we get Z buffer = 1. But for Z = 50.5 we get Z Buffer = 0.99.
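The same check for the 1/Z mapping, again as a sketch with a made-up function name (`reciprocal_depth`):

```python
def reciprocal_depth(z, near_z, far_z):
    """Map a view-space Z between near_z and far_z to 0..1 using 1/Z."""
    return (1.0 / z - 1.0 / near_z) / (1.0 / far_z - 1.0 / near_z)

for z in (1.0, 2.0, 50.5, 100.0):
    print(z, round(reciprocal_depth(z, 1.0, 100.0), 4))
```

Note the value at Z = 2, which comes out at about 0.505: with this mapping, roughly half of the buffer range is spent on the first unit of depth past the near plane.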

So what?

The difference is that in the first case, values are mapped linearly, and this ensures the Z buffer precision is the same for any value of Z in view space. In the second case, the mapping is non-linear: it is proportional to -1/Z, which means that the farther an object is in view space, the less precision the Z buffer will have. This is not necessarily bad.

The 1/Z approach does have less precision for far objects, but it also has better precision at close range, and this is not negligible. Another advantage of using the 1/Z buffer is the fact that I use texture planes in the engine, which gives me the 1/Z I need. If I were to use Z, I would have to do a division for each vertex!
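To put rough numbers on the precision trade-off, the sketch below estimates how much view-space depth one buffer step covers at different distances. It assumes a 16-bit buffer (an assumption of this example, the text does not state a bit depth), and the 1/Z figure is an approximation obtained from the derivative of the mapping. The names `STEPS`, `linear_step` and `reciprocal_step` are hypothetical.

```python
STEPS = 65535  # assumed: 16-bit buffer, 65535 steps between 0 and 1

def linear_step(z, near_z, far_z):
    """View-space depth covered by one buffer step with the linear mapping.
    It does not depend on z."""
    return (far_z - near_z) / STEPS

def reciprocal_step(z, near_z, far_z):
    """Approximate view-space depth covered by one buffer step with the
    1/Z mapping. It grows with the square of z."""
    return z * z * (1.0 / near_z - 1.0 / far_z) / STEPS

for z in (1.0, 10.0, 100.0):
    print(z, linear_step(z, 1.0, 100.0), reciprocal_step(z, 1.0, 100.0))
```

With the near plane at 1 and the far plane at 100, the two mappings have about the same step size around Z = 10. Closer than that the 1/Z buffer is finer (about 100 times finer at Z = 1), and farther away it is coarser (about 100 times coarser at Z = 100).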

Another thing to consider is that when rendering polygons, I use the "less or equal" function. So if you draw objects that need Z buffering in a back to front order (which is the case in the Frog Engine), then the precision doesn't matter so much. Of course, I may change this later in the project if the results are not good... But in my experience, this works well.
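The following sketch illustrates why "less or equal" combined with back-to-front drawing is forgiving of low precision. It simulates a single pixel with a deliberately coarse 8-bit 1/Z buffer (near plane at 1, far plane at 100), so that two distant surfaces quantize to the same buffer value. Everything here (`render`, the 8-bit size, the surface names) is an assumption made for the illustration, not the Frog Engine's code.

```python
import operator

def reciprocal_depth(z, near_z=1.0, far_z=100.0):
    """Map a view-space Z between near_z and far_z to 0..1 using 1/Z."""
    return (1.0 / z - 1.0 / near_z) / (1.0 / far_z - 1.0 / near_z)

def render(surfaces, compare, steps=255):
    """Draw (z, name) surfaces in the given order into a one-pixel buffer
    and return the name of the surface that ends up visible."""
    depth = steps  # buffer cleared to the far plane
    color = None
    for z, name in surfaces:
        d = round(reciprocal_depth(z) * steps)  # quantize to buffer precision
        if compare(d, depth):                   # the depth test
            depth, color = d, name
    return color

# Both surfaces quantize to the same buffer value (255) at this precision.
back_to_front = [(95.0, "far wall"), (90.0, "near wall")]

print(render(back_to_front, operator.le))  # near wall
print(render(back_to_front, operator.lt))  # None
```

With "less or equal", the nearer surface is drawn last and still passes the test, so it correctly ends up on top. With a strict "less" test, neither surface passes against the cleared buffer in this example.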

Date this article was posted to GameDev.net: 8/23/1999
(Note that this date does not necessarily correspond to the date the article was written)
