Introduction to Debugging
by Richard "superpig" Fine

Miscellany

Prevention, not prescription

Debugging, while something you should definitely be able to do if necessary, is however something you should always avoid having to do - you should always be trying to write code in ways that make it harder for bugs to creep in. While the thing that will best help you write solid code is a solid understanding of your program, your computer, your tools and how they are all supposed to work together, there are also some common practices that I want to mention.

Firstly, there's an attitude amongst some beginners that ignoring compiler warnings is OK. That's very much not the case! The fact that your code compiles with warnings but no errors guarantees that it's written well enough to run - it does not guarantee that it is written well enough to run correctly. The compiler isn't generating those warnings for its own amusement; it's trying to draw your attention to the location of potential bugs. Check each one out, and decide whether the warning is warranted; you can usually make it go away by making your code more explicit.

Of course, whenever you change the code, think about the change that you're making. Does it make sense? Is it the right thing to do? Don't just do it for the sake of silencing the warning; do it for the sake of improving the code quality. Compilers include an option for treating warnings as errors, preventing compilation from succeeding if there are any warnings generated; turning this on is not a bad idea. It's perfectly possible to keep a few million lines of code free from warnings. You can also increase the strictness of the compiler by increasing the warning level - this will cause it to report things that it would normally just let slide. In the event that you're faced with a warning that you really can't do anything about, or really don't need to do anything about (such as the use of a compiler-specific extension when you have no intentions of ever using other compilers), then it may be possible to disable the warning - under Microsoft's compilers, this is done using the #pragma directive. This is not something to be done lightly, but sometimes it is necessary as a flood of unwanted warnings can make it harder to spot the ones you need to be paying attention to. Aim to disable warnings only temporarily, and for as short a block of code as possible - Visual Studio supports both explicit enabling of warnings, and saving/restoring the warning state, so it's not hard to minimize the affected area.

Use safe containers. Many people - particularly beginners - fall foul of buffer overruns and general pointer mismanagement issues because they use straight arrays when the container classes in the STL would be more suitable. While it's important to know about arrays, using the STL containers like vector and string will help you avoid mistakes like using an array index that is out of bounds or forgetting to reserve space for the null terminator on your string. Similar containers exist in other technologies - ArrayList under .NET, for example. Also, don't forget that std::string is a container type, and does support iterators, just like list or vector - this means you can use functions from <algorithm> like for_each(), which let you achieve pretty much anything the C string functions can do but in a safer manner.

In a similar vein, for C++ coders, use C++-style IO - the iostream library. The C-style functions like printf and scanf rely on you to ensure that the format string matches the other arguments correctly; because there's no guarantees made about types it becomes very simple to accidentally specify a pointer to a variable when you meant the variable itself, or something similar. The C++ iostream library is typesafe, allowing the compiler to do a better job of checking your code for you, and does not require that you keep values in sync with some format string. Also, don't forget that iostream goes beyond input and output. In-memory string formatting a la sprintf can be done by the stringstream class.

The last thing that should be mentioned is assertions. Assertions are "things that are supposed to be true" - for example, you might assert that a particular pointer should never be null, or that the number of items in a list at the end of your function is the same as the number of items in the list at the beginning. If the assertion is tested and found to be false, then the program can immediately halt itself to be checked out in the debugger, instead of waiting for the point at which the situation leads to a crash or something like it. Assertions can save you a lot of time because they catch bugs earlier, and they reinforce ideas about the way your code is supposed to work. One particularly common place for assertions is at the beginning of a function, checking that the parameters that have been passed to the function are all valid.

However, assertions are not designed to replace error checks. In a release build, you're often no longer interested in why something went wrong, because the build has been released and you're not a position to fix the problem - you just want to know that it did go wrong, so that your other code can account for that. Consider also the fact that testing an assertion does take a small amount of time, so having lots of them around can reduce performance.

For these reasons, release builds will still contain the broader, coarser error checks designed to stop the program from crashing, but will often be configured to ignore assertions, leaving them out of the compiled executable. For this reason, you should never assert expressions that have side-effects - you shouldn't assert 'i++,' for example, as it will not happen in a release build, thus causing behaviour to differ. Assertion expressions should treat everything as read-only.

How do you put assertions into your code, and have them tested? The fastest approach is to use the assert() function in the C Runtime Library, which will give you a generic error dialogue reporting the thing you were asserting. However, you can often get a lot more information by writing your own assertion macro and handler; you can include state about your game, you can have the handler write a log message instead of throwing up a dialogue, you can provide the option to ignore the assertion and continue on anyway, etc.

Post-mortem Debugging

One of the features Visual Studio provides that people tend to forget about is the fact that it supports post-mortem debugging - that is, debugging your program after it has crashed. If you had the debugger attached to your program when it crashed then you'd see it catch the crash and pause the program for you, but if you didn't have it attached then it may seem like any information is lost. Not so!

Windows contains a component that allows the creation of dump files. A dump is a simple copy of some part of your program's current state, ranging from the contents of memory that it's using, to the contents of the processor registers and the call stack, to lists of Windows resources that it owns. They can get quite large - in some cases, can contain a copy of everything that was in memory on your machine at the time of the crash - but can also be quite small, no more than 64K, depending on what you include. The tool usually used to create these dump files is called 'Dr. Watson;' it launches automatically upon a crash, and can also be launched manually if a program has hung (but not crashed). You can also initiate a dump from code using the DbgHelp API, and the MiniDumpWriteDump() function. Here's an article from CodeProject, if you're feeling particularly adventurous.

Bright sparks amongst you may have already spotted that if you want to write a dump after a crash, you'll be in no position to call functions like MiniDumpWriteDump() - in fact you'll be in no position to call functions at all, because control has passed to the code that produces the crash dialogue. In truth, Windows allows you to install your own code that gets run when the program crashes, via SetUnhandledExceptionFilter(). You can write a function that calls MiniDumpWriteDump() to make the dump just the way you want it, and then install that function using SetUnhandledExceptionFilter(). There may be other useful tricks you can do, such as automatically submitting the dump to an internal bug tracking system; you need to be careful, however, that whatever code you write in the function cannot throw any unhandled exceptions itself. A try/except block may help you here.

If you obtain a dump file from a crash, you can load it up into Visual Studio (though "Open Project/Solution"), and you will be able to inspect the state of the application at the moment of the crash as if you had the debugger attached. You can't try to continue running the program, but the information provided is often invaluable.

Bear in mind that for the debugger to make sense of the dump file, it needs to have both the executable/DLLs and the symbol files for the program present, and they need to be the exact same version as was in use on the crashing machine. As such, if you intend to collect and analyse dump files from users, it is very important that you clearly track version numbers on builds that you release to them, and that you keep the symbol files from that build somewhere safe. Without them, you're wasting your time.

Rolling your own debugging facilities

While we've looked at a fair number of reasonably powerful tools in this article, they're general-purpose beasts, and it's usually going to be true that they do not report information in a format that is really optimal for your situation. They're good at presenting a text-based view of a static shot of your program, but frequently one needs more than that. What if you want to view how a value is changing over time? What if you want to display information graphically? For whatever reason, whether the usual tools are unsuitable or unavailable, it is sometimes necessary to "throw good code after bad." And you may even wish to do so pre-emptively - debug code can be used to check that things are working correctly just as easily as it can be used to investigate when they break.

The oldest instances of this are things like traces. If you're working in a situation where you can't break/step program flow but you want to see how things are executing, it's fairly common to insert some (temporary) code that something highly visible to happen - a message to be displayed, the screen to change colour, etc.

Moving into a more game-specific context, something that people often overlook is developer cheats. The ability to have infinite lives or infinite ammo may seem very handy to players, but it's likely even more handy to developers who wish to test out the second part of a level without having to worry about getting killed working their way through the first part. Things like the ability to move the camera completely freely ("flymode") can be very handy for inspecting graphics issues.

An entire subcategory of developer cheats is data visualisation. This means putting data about the internal state of your game into the screen somehow. The most common instance of this is an on-screen framerate counter; you might also see memory allocation counts or network activity graphs. The information presented is often related directly to what is visible in the regular game view; the current AI state of an NPC might be drawn over that character's head, for instance.

Another component, useful for diagnosing problems in release builds if crash dumps are not available, is log files. A log file is just a file to which the game writes messages about operation - information about the game startup process, information about the shutdown process, information about when it loads a new level or fails to find a particular sound. Log files are often used in very similar ways to the debugger's output window, but have the advantage of persisting on disk after the debugger is closed, and of not requiring that a debugger be installed (allowing them to be used for error reporting on end-user machines).

Lastly, some projects have included a facility for remote debugging, which is debugging the program from another machine via a network. It's worth noting that Visual Studio itself supports this via the 'Attach to Process' command, though as has been said, more specialised tools can be useful to have. It's perfectly possible to write a small information server into your game that you can communicate with via a customised client program over whatever network technology you like, perhaps displaying your scene graph or your entity list in an easily consumable format. Shea Street wrote recently that he has embedded a tiny HTTP server in his project, allowing him to inspect and interact with the game on a developer level using a simple web browser.

The Probe Effect

Unfortunately, some bugs - particularly timing-related ones - tend to be affected by the mere presence of the debugger; the added slowdown just happens to be enough to stop the bug from happening, to cause threads to switch execution order and so on. This is known as the "probe" effect, in that "probing" the bug with the debugger causes it to disappear (making the bug in question a "heisenbug").

How does one combat the probe effect? You can't attach a debugger because that'll cause the bug to disappear. You can try and change the nature of the bug to stop it from being affected - this can sometimes be achieved by trying to make the bug more pronounced and then attaching a debugger in the hope that it will no longer be enough to knock the bug undercover.

Most often, the probe effect is seen when handling bugs related to multiple threads or multiple processes and shared memory. If one thread is consistently performing an action before another thread that is leading to the bug, then attaching the debugger could affect the timing of the threads to the extent that the order switches around. If you suspect that this is the case, then you can try forcibly desynchronising the threads to exaggerate the condition (using a mutex that forces one thread to wait for the other, for example), allowing you to attach the debugger without causing the bug to disappear. (Desynchronising the threads in the other direction is not usually a suitable fix). It's important to collect whatever information you can to try and understand what is going on at the moment the bug occurs; this may mean resorting to a straight-out code inspection to see where threads interact with shared data and resources. (Bear in mind that the threads in contention may not necessarily be ones that you yourself created, or even ones that show up in your process space).

At the last resort, however, the only option is exploratory surgery - guessing at where the bug lies, creating and applying a fix, then testing to see if it worked. While a fix may not work, it will usually provide you with more information to improve your second guess, and can sometimes stop the probe effect from happening (allowing you to gather information through more orthodox channels). If this route appears to be necessary, it's important to ensure that a backup of your code exists before you start, and that the bug is very clearly defined - a bug that nobody can really consistently reproduce and that doesn't show up under the debugger is practically a ghost and will be impossible to track down.

Conclusion

Contents

	Introduction
	Issue Recognition
	Execution Flow tools
	Other Tools
	Diagnosis and Prescription
	Common runtime bugs
	Miscellany
	Conclusion

	Printable version
	Discuss this article