Upcoming Events
Unite 2010
11/10 - 11/12 @ Montréal, Canada

GDC China
12/5 - 12/7 @ Shanghai, China

Asia Game Show 2010
12/24 - 12/27  

GDC 2011
2/28 - 3/4 @ San Francisco, CA

More events...
Quick Stats
118 people currently visiting GDNet.
2406 articles in the reference section.

Help us fight cancer!
Join SETI Team GDNet!
Link to us Events 4 Gamers
Intel sponsors gamedev.net search:

Introduction to Debugging


Dealing with a couple of common runtime bugs

Now that we've covered the full debugging process, let's look at how it's put into action on a couple of common types of bug. We'll only look at the first three stages of the process here.

Access Violation

Symptom: The program crashes. The error report gives an exception information code of 0xc0000005.

Intelligence-gathering:

  • Run the program in the debugger and do what you did to make the program crash again. The debugger should catch it this time, telling you that it has halted the program because "an unhandled exception of type 0xc0000005 has occurred: Access Violation" and a memory address.
  • Look at the memory address reported. If it's 0x00000000, or a very low value like 0x0000000B, then we're probably looking at a null pointer dereferencing. If it's a higher value like 0x00455CD2, then we're probably looking at a pointer corruption bug.
  • There are a few other special codes to look out for as well - values close to 0xCDCDCDCD, 0xCCCCCCCC or 0xBAADF00D indicate an uninitialised variable, while values close to 0xDDDDDDDD and 0xFEEEFEEE indicate recently deleted variables. If you see these in a pointer, it doesn't mean that the pointer is pointing to uninitialised or deleted memory - it means that the pointer itself is uninitialised or has been deleted. The other one to watch out for is 0xFDFDFDFD - it can indicate that you're reading past the beginning or end of a buffer.
  • Dismiss the exception report message, and the debugger will show you the point in your program at which the exception occurred.
  • Mouse-over (or add to the watch window) the variables in the line of code in question. Do they all appear to have reasonable values? If the error is due to dereferencing a null pointer then you will probably find at least one variable with value 0x00000000; if it's not a null pointer, then look for values that are close to the one reported in the error (usually on the side of being slightly smaller). Also, check array indices - accessing a normal array using a way-out-of-bounds array index can sometimes result in an access violation, too.
  • Use the watch window, if necessary, to evaluate larger parts of the expression. For example, your code might contain "arrayvar[index]->member," where both arrayvar and index are valid but arrayvar[index] is actually null.
  • Once you've found the suspect data, consider what that means in the context of your program. What does it mean to say that the current player object is null? Is that something that makes sense to have? Is it something you should allow?
  • It's possible that the data in question shouldn't happen - you shouldn't be able to call "Player::SetCurrentWeapon" if the player object is null. Use the Call Stack window to see how your program reached this point; if the data is a parameter then you might be able to see it being passed up the stack. Trace it back, checking that things are behaving correctly as they go along, to find the point where your program first started going off the rails.
  • If the pointer in question is becoming corrupted after working fine for a while, and you can't figure out where it's being changed, try setting up a data breakpoint on it. Unless the pointer is global, its address can change with each run of the program and so you may need to run through program initialisation before you can set up the breakpoint. Using an unconditional breakpoint at the end of your initialisation code is the simplest way to achieve that.

Diagnosis: It depends on the code in question, but the common diagnoses for access violations are the following:

  • Failure to check for null before dereferencing a pointer, i.e. you just do ptr->something without first doing if(ptr)
  • Failure to check for invalid function parameters, i.e. there is no if(!isValid(argument)) return; at the beginning of a function
  • Failure to reset a pointer to null after deleting the memory it points to
  • Pointer overwritten due to a bug elsewhere, e.g. a buffer overrun

Logic Bug

Symptom: The program runs without crashing, but behaves incorrectly, e.g. allowing you to select weapons you do not currently have, or declaring the game won/lost at the wrong time.

Intelligence-gathering:

  • First, decide whether the program is doing something it should not be doing, or whether it is not doing something it should be doing.
  • If it's doing something it should not be doing, stick a breakpoint on it and then push the context up the call stack to see how it got there.
  • If it's not doing something that it should be doing, find the function that you expect to be calling it and stick a breakpoint on that function. If that breakpoint never gets hit, stick a breakpoint on a function that calls that function, etc. Eventually you should end up in some code that is actually being run, so the breakpoint will get hit. As a sanity check, it may first be worth sticking a breakpoint on the code itself to be sure that it's not being called - sometimes it can seem like the code isn't being called, when actually, it's being called just fine, it's just not doing anything.
  • When you find the code that is erroneously calling the function (or not calling it), put a breakpoint at the beginning of it and step through it to see how it behaves in the run-up to the call. Put key variables in the watch window to monitor them; you can put more complex expressions, such as parts of the conditions used for if statements, there as well if necessary.

Diagnosis: Usually, logic bugs come from incorrect 'if' or 'while' statements - either using conditions that are incorrect (commonly, using 'and' instead of 'or', or 'greater than' instead of 'less than'), or conditions that are incomplete (i.e. there's an extra thing you need to be checking for). Less commonly, they can come from syntax errors - an accidental semicolon after an if statement, for example.





Miscellany


Contents
  Introduction
  Issue Recognition
  Execution Flow tools
  Other Tools
  Diagnosis and Prescription
  Common runtime bugs
  Miscellany
  Conclusion

  Printable version
  Discuss this article