Using Modern C++ to Eliminate Memory Problems
by Dave Mikesell



0. Introduction

Programming is hard. Programming in C++ is even harder. Unfortunately, it is often made unnecessarily hard by programmers' resistance to adopting modern, safer methods and idioms. Bring up the topic of C++ at the lunch table and -- especially if there are Java programmers present -- you will be greeted with the customary horror stories of buffer overruns, memory leaks, and wild pointer errors that led to caffeine binges and marathon programming sessions.

Sadly, these kinds of errors occur far too often in C++ programs. Not because the language is inherently unsafe, but because many C++ programmers don't know how to use it safely. If you are tired of these kinds of errors in your C++ programs, you've come to the right place. Relax, put down that Java compiler before it goes off, and follow the simple rules outlined in this article.

1. Use std::string instead of char * or char []

Character arrays are the only way to encapsulate string data in C. They're quick and easy to use, but unfortunately their use can be fraught with peril. Let's look at some of the more common errors that occur with character pointers and arrays. Keep in mind that most if not all of these problems will go undetected by the compiler.

Ex. 1 - Forgetting to allocate enough space for string terminator

char myName[4] = "Dave";  // Oops! No room for the '\0' terminator!
                          // (legal in C; a C++ compiler rejects it)
char anotherName[16];

strcpy(anotherName, myName);  // Might copy four characters or 4000

Ex. 2 - Forgetting to allocate memory for a char *

char * errorString;
...
strcpy(errorString, "SomeValueDeterminedAtRuntime");

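The pointer is uninitialized, so strcpy() writes to an arbitrary address. If you are lucky, this error is caught rather quickly with a segmentation violation; if not, it silently corrupts memory.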

Ex. 3 - Returning a pointer to space allocated on the stack

char * getName()
{
  char name[256];
  
  strcpy(name, "SomeStaticValue");
  ...
  strcat(name, "SomeValueDeterminedAtRuntime");

  return name;
}

char * myName = getName();

Once the function returns, the stack space that held name is reclaimed and may be reused by later function calls. This means myName is a dangling pointer that might point to something unexpected later.

Ex. 4 - The dreaded function sprintf()

char buf[128];

sprintf(buf, "%s%d", someValueGottenAtRuntime, someInteger);

Unless you are absolutely sure of how much space you need, it's all too easy to overrun a buffer with sprintf().

Now, let's revisit each example and show how a std::string eliminates the aforementioned problems:

Ex. 1a

std::string myName = "Dave"; 
std::string anotherName = myName;

Ex. 2a

std::string errorString;
...
errorString = "SomeValueDeterminedAtRuntime";

Ex. 3a

std::string getName()
{
  std::string name;

  name = "SomeStaticValue";
  ...
  name += "SomeValueDeterminedAtRuntime";

  return name;
}

std::string myName = getName();

Ex. 4a

std::string buf;
std::ostringstream ss;

ss << someValueGottenAtRuntime << someInteger;
buf = ss.str();

This one's a no-brainer, folks. Avoid the headaches associated with character arrays and pointers and use std::string. For legacy functions that expect a character pointer, you can use std::string's c_str() member function.
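
As a minimal sketch of that last point, the snippet below builds a string with std::string and then lends it to a C-style function through c_str(). Note that legacyLength() and measureGreeting() are hypothetical names made up for illustration; they stand in for whatever legacy API you actually have to call.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical legacy function: it only understands '\0'-terminated char arrays.
std::size_t legacyLength(const char * text)
{
  return std::strlen(text);
}

// Build the text safely with std::string, then lend the legacy code a
// read-only pointer. The pointer is valid only until greeting is modified
// or destroyed, so don't store it.
std::size_t measureGreeting(const std::string & name)
{
  std::string greeting = "Hello, " + name;
  return legacyLength(greeting.c_str());
}

void cStrExample()
{
  assert(measureGreeting("Dave") == 11);  // "Hello, Dave" is 11 characters
}
```

The string keeps ownership of its memory the whole time; the legacy function only borrows a pointer for the duration of the call.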

2. Use standard containers instead of homegrown containers

Besides std::string, the standard library provides the following container classes that you should prefer over your homegrown alternatives: vector, deque, list, set, multiset, map, multimap, stack, queue, and priority_queue (the last three are adaptors built on top of other containers). It is beyond the scope of this article to describe these in detail; however, you can probably work out what most of them are from their names. For a proper treatment of the subject, I highly recommend the book by Josuttis listed in my references.

An important feature of the standard containers is that they are all template classes. This is a powerful concept. Templates let you define lists (or stacks or vectors) of *any* data type. The compiler generates type-safe code for each type of list you create. With C, you either needed a list for each type of data it would hold (e.g. IntList, MonsterList, StringList) or the list would hold a void * that pointed to data in each node; somewhat the antithesis of type-safety.
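
Here is a rough sketch of what that buys you. The sumAll() helper and templateExample() are hypothetical names invented for this illustration, not part of the standard library; the point is only that one template definition works for several element types, and that the compiler rejects elements of the wrong type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One template definition serves every element type; the compiler
// generates a separate, fully type-checked version for each T used.
template <typename T>
T sumAll(const std::vector<T> & items)
{
  T total = T();
  for (typename std::vector<T>::size_type i = 0; i < items.size(); ++i) {
    total += items[i];
  }
  return total;
}

void templateExample()
{
  std::vector<int> numbers;
  numbers.push_back(2);
  numbers.push_back(40);

  std::vector<std::string> words;
  words.push_back("type");
  words.push_back("safe");
  // words.push_back(42);  // would not compile: 42 is not a std::string

  assert(sumAll(numbers) == 42);
  assert(sumAll(words) == "typesafe");
}
```

With a void * list, the mistake on the commented-out line would compile cleanly and only show up at run time, if at all.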

Let's look at a simple example with the commonly used std::vector. You'll want to use std::vector (or std::deque) instead of variable length arrays.

#include <vector>
#include <iostream>

using namespace std;

int main()
{
  vector<int> v;

  // Add elements to the end of the vector.
  // The vector class handles resizing
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  v.push_back(4);

  // Careful - operator[] does not check bounds (at() does)
  cout << v[2] << endl;

  // iterate like you would with arrays; size_type matches the
  // unsigned type returned by v.size()
  for (vector<int>::size_type i = 0; i < v.size(); ++i) {
    cout << v[i] << endl;
  }

  // iterate with an iterator
  vector<int>::iterator p;
  for (p = v.begin(); p != v.end(); p++) {
    cout << *p << endl;
  }
}    

In addition to providing generic, type-safe containers for any data type, these classes also provide multiple ways to search and iterate, and like std::string, they manage their own memory - a huge win over rolling these things yourself. I can't stress enough how important it is to familiarize yourself with the standard containers. Josuttis' book is an invaluable reference that is always within arm's reach of my keyboard.
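
To make the searching and checked-access points concrete, here is a small sketch using two standard facilities: the at() member function, which throws std::out_of_range for a bad index, and the std::find algorithm. The helper names isOutOfRange(), contains(), and containerExample() are my own, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Returns true if reading v.at(index) throws std::out_of_range.
// Unlike operator[], at() checks the index on every call.
bool isOutOfRange(const std::vector<int> & v, std::vector<int>::size_type index)
{
  try {
    (void)v.at(index);
    return false;
  } catch (const std::out_of_range &) {
    return true;
  }
}

// Searching with the standard std::find algorithm.
bool contains(const std::vector<int> & v, int value)
{
  return std::find(v.begin(), v.end(), value) != v.end();
}

void containerExample()
{
  std::vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);

  assert(!isOutOfRange(v, 2));  // last valid index
  assert(isOutOfRange(v, 3));   // one past the end
  assert(contains(v, 2));
  assert(!contains(v, 7));
}
```

Use at() where an index comes from outside your control, and operator[] where you have already established that the index is valid.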

3. Manage your class data to avoid resource leaks and crashes

There are other resources besides pointers that you might have to manage in your program: file handles, sockets, semaphores, etc. And sometimes you will manage raw pointers. The way to effectively manage resources in C++ is to wrap them in a class that performs their allocation and initialization in the constructor and their deallocation in the destructor:

class ResourceHog {
  SomeObject * object_;
  FILE * file_;
public:
  ResourceHog()
  {
    object_ = new SomeObject();
    file_ = fopen("myfile", "r");
  }
  ~ResourceHog()
  {
    delete object_;
    if (file_) fclose(file_);  // fopen() returns NULL on failure
  }
};

The constructor allocates and opens resources, the destructor closes and deallocates them. But you're not done yet. When your objects are copied or assigned, their copy constructors and assignment operators are called.

"But I didn't write a copy constructor or assignment operator!", you protest.

Perhaps you didn't, but your compiler did for you, and their default behavior is to do a memberwise copy of your member data -- probably not what you want if your object manages pointers or other resources. Consider the following:

class String {
  char * data_;
public:
  // Member functions
};

String s1("hello, memory problems");
String s2 = s1;  // String copy constructor called
String s3;

s3 = s1;   // String assignment operator called

If you didn't write a copy constructor or assignment operator for String to allocate space for the data_ member and copy its contents, then your compiler generated one that merely copies the data_ *pointer*. Now you have three String objects, each holding a copy of the same pointer but no copy of the data it points to.

When the destructor is called for the first of these String objects that goes out of scope, the pointer is deleted and the memory pointed to is reclaimed by the system. Recall that the other two copies of the pointer still point to that reclaimed memory -- i.e., they are "dangling pointers" -- so when one of their destructors is called, the same memory is deleted a second time. That is undefined behavior; in practice it usually corrupts the heap or crashes your program with an access violation.

One way to fix this problem is to not let client code copy or assign objects of your class. The trick here is to *declare* the copy constructor and assignment operator, but not implement them:

class String {
  char * data_;
  // Copy c'tor with no implementation
  String(const String &);
  
  // Assignment op with no implementation
  String & operator=(const String &);
public:
  // Member functions
};

Now code that tries to copy or assign objects of your class will not compile. That's because it can't access your private functions. Friend and member functions that try the same will compile, but the linker will stop them since these functions are not defined.
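
If your compiler supports C++11 or later (an assumption; the technique above works on older compilers too), the same intent can be sketched with deleted functions, which turn even the friend and member cases into compile-time errors instead of link-time ones. NonCopyableString and nonCopyableExample() are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

class NonCopyableString {
  char * data_;
public:
  explicit NonCopyableString(const char * text)
    : data_(new char[std::strlen(text) + 1])
  {
    std::strcpy(data_, text);
  }
  ~NonCopyableString() { delete [] data_; }

  // C++11: copying is rejected at compile time for all callers.
  NonCopyableString(const NonCopyableString &) = delete;
  NonCopyableString & operator=(const NonCopyableString &) = delete;

  const char * c_str() const { return data_; }
};

// These checks are evaluated by the compiler, not at run time.
static_assert(!std::is_copy_constructible<NonCopyableString>::value,
              "copy construction should be disabled");
static_assert(!std::is_copy_assignable<NonCopyableString>::value,
              "copy assignment should be disabled");

void nonCopyableExample()
{
  NonCopyableString s("hello");
  assert(std::strcmp(s.c_str(), "hello") == 0);
  // NonCopyableString t = s;  // would not compile: copy constructor is deleted
}
```

Either way, the class can still be passed around by reference or by pointer; only copying is ruled out.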

Not allowing copying and assignment is, in fact, my default MO when designing a new class. Rarely do you need multiple copies of objects floating around your program. If you do need a copy constructor and assignment operator, they are pretty easy to code up. You're just creating a new object from an existing one:

Matrix::Matrix(const Matrix & m)
{
  data_ = new float[16];
  memcpy(data_, m.data_, 16 * sizeof(float));
}

Matrix & Matrix::operator=(const Matrix & m)
{
  if (this != &m) {  // Check for self-assignment
    float * newData = new float[16];
    delete [] data_;
    data_ = newData;
    memcpy(data_, m.data_, 16 * sizeof(float));
  }

  return *this;
}

A good rule of thumb is that if you need a destructor, then you also need a copy constructor and assignment operator (or you need to prevent other code from calling them as shown above).
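
To see what the rule buys you, here is a minimal sketch of a complete class that follows it. Buffer, setFirst(), and ruleOfThreeExample() are hypothetical names invented for this example; the checks at the end show that each copy owns its own data, so changing one does not affect the others.

```cpp
#include <cassert>
#include <cstring>

class Buffer {
  char * data_;
public:
  explicit Buffer(const char * text)
    : data_(new char[std::strlen(text) + 1])
  {
    std::strcpy(data_, text);
  }

  // Destructor: releases the memory this object owns.
  ~Buffer() { delete [] data_; }

  // Copy constructor: allocates a separate block and copies the contents.
  Buffer(const Buffer & other)
    : data_(new char[std::strlen(other.data_) + 1])
  {
    std::strcpy(data_, other.data_);
  }

  // Assignment: allocate first, so a failed new leaves *this unchanged.
  Buffer & operator=(const Buffer & other)
  {
    if (this != &other) {  // Check for self-assignment
      char * newData = new char[std::strlen(other.data_) + 1];
      std::strcpy(newData, other.data_);
      delete [] data_;
      data_ = newData;
    }
    return *this;
  }

  // Overwrites the first character; does nothing for an empty string.
  void setFirst(char c) { if (data_[0] != '\0') data_[0] = c; }
  const char * c_str() const { return data_; }
};

void ruleOfThreeExample()
{
  Buffer a("hello");
  Buffer b = a;      // copy constructor
  Buffer c("x");
  c = a;             // assignment operator

  b.setFirst('j');   // changes b's private copy only
  assert(std::strcmp(a.c_str(), "hello") == 0);
  assert(std::strcmp(b.c_str(), "jello") == 0);
  assert(std::strcmp(c.c_str(), "hello") == 0);
}
```

All three objects are destroyed at the end of ruleOfThreeExample(), and each destructor frees a different block of memory.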

4. Use virtual destructors in class hierarchies

Compile and run the following program and observe what happens when the pointer to Base is deleted:

#include <iostream>

using namespace std;

class Base {
  // Base private data
  // Base private member functions
public:
  ~Base() { cout << "~Base()" << endl; }
  // Base public member functions
};

class Derived : public Base {
  // Derived private data
  // Derived private member functions
public:
  ~Derived() { cout << "~Derived()" << endl; }
  // Derived public member functions
};

int main()
{
  Base * bp = new Derived();

  delete bp;
}

On my system, the output looks like this:

$ myprogram
~Base()
$

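The Derived destructor wasn't called! Strictly speaking, deleting a Derived object through a pointer to a Base that lacks a virtual destructor is undefined behavior. What typically happens is what you see here: only the Base destructor runs, so any resources owned by the Derived portion of the object are never released. That's called a resource leak. (It is sometimes confused with the "slicing" of an object, which is a different problem: copying a Derived object by value into a Base object.) Fortunately, it is easily prevented by declaring your base class destructors as virtual: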

virtual ~Base() { cout << "~Base()" << endl; }

Now the destructor is called polymorphically through the pointer to Base. What this means is that the object is destroyed in the reverse of the order in which it was constructed: the Derived destructor is called, then the Base destructor, which is what you want.

Making ~Base() virtual in the above example, you'll see that the object is now properly destroyed, from the most derived class down to the base:

$ myprogram
~Derived()
~Base()
$
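
If you would rather check this in code than by reading console output, the following sketch records the order in which the destructors run. Shape, Circle, destructionLog, and virtualDestructorExample() are hypothetical names of my own, standing in for Base and Derived.

```cpp
#include <cassert>
#include <string>

// Records which destructors ran, and in what order.
static std::string destructionLog;

class Shape {
public:
  virtual ~Shape() { destructionLog += "~Shape()"; }
};

class Circle : public Shape {
public:
  ~Circle() { destructionLog += "~Circle() "; }
};

void virtualDestructorExample()
{
  destructionLog.clear();

  Shape * sp = new Circle();
  delete sp;  // virtual dispatch: ~Circle() runs first, then ~Shape()

  assert(destructionLog == "~Circle() ~Shape()");
}
```

Because ~Shape() is virtual, the delete expression finds the right destructor even though it only sees a Shape *.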

5. Use smart pointers instead of raw pointers

In addition to wrapping pointers in classes as shown above, "smart" or "managed" pointers can be a great help in managing memory. The standard library offers std::auto_ptr, which is designed to prevent memory leaks in the face of thrown exceptions. As shown here, raw pointers are prone to leaks when exceptions are thrown before they are cleaned up:

Foo * fp = new Foo();   
someFunctionThatMightThrow();
delete fp;

If the function throws an exception before you delete fp, you've leaked memory. You can fix this with std::auto_ptr, whose destructor will clean up the pointer it owns:

std::auto_ptr<Foo> fp(new Foo());
// if it throws, ~auto_ptr() cleans up the Foo *
someFunctionThatMightThrow();
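
On a C++11 or later compiler (an assumption on my part), std::unique_ptr gives the same exception-safety guarantee. The sketch below uses a hypothetical Tracker type in place of Foo, and a hypothetical alwaysThrows() function, so that the cleanup can be checked rather than taken on faith.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

static int liveTrackers = 0;  // number of Tracker objects currently alive

// Hypothetical stand-in for Foo that counts live instances.
struct Tracker {
  Tracker()  { ++liveTrackers; }
  ~Tracker() { --liveTrackers; }
};

// Hypothetical function that always fails, to force the exceptional path.
void alwaysThrows()
{
  throw std::runtime_error("something went wrong");
}

void useTracker()
{
  std::unique_ptr<Tracker> tp(new Tracker());
  alwaysThrows();  // stack unwinding runs ~unique_ptr(), which deletes the Tracker
}

void uniquePtrExample()
{
  try {
    useTracker();
  } catch (const std::runtime_error &) {
    // the exception was expected
  }
  assert(liveTrackers == 0);  // nothing leaked
}
```

There is no delete anywhere in useTracker(); the smart pointer's destructor does the work on both the normal and the exceptional path.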

auto_ptr has limitations, however. Its copying semantics transfer ownership of the raw pointer. Therefore, if you pass an auto_ptr *by value* to a function, the function's parameter gains ownership of the raw pointer and the caller's auto_ptr is left holding a null pointer. When the function returns, the parameter is destroyed and deletes the object it owns -- almost certainly not what you want! (This pitfall is one reason auto_ptr was deprecated in C++11 and removed in C++17.) If you want to avoid this kind of behavior, use a smart pointer that implements "reference counting", like Boost's shared_ptr:

boost::shared_ptr<Foo> fp(new Foo());

You can now pass fp around and not worry about ownership issues. shared_ptr will manage the reference count and delete the object when the last shared_ptr that refers to it goes out of scope. Smart pointers are often used in containers:

std::vector<boost::shared_ptr<Monster> > monsters;

boost::shared_ptr<Monster> imp(new Imp());
monsters.push_back(imp);

boost::shared_ptr<Monster> troll(new Troll());
monsters.push_back(troll);

boost::shared_ptr<Monster> ogre(new Ogre());
monsters.push_back(ogre);

Contrast the above with the old-fashioned C solution using a variable-sized array holding raw pointers. I get chills up and down my spine just thinking about it. std::vector and boost::shared_ptr handle all of the memory management for you as well as provide access, iteration, searching, etc. You are now free to work on your application logic and stop chasing pointer errors.
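
The reference counting can be observed directly. The sketch below uses std::shared_ptr from <memory>, the standard counterpart of Boost's class in C++11 and later (an assumption about your compiler), so that it builds without Boost installed. The Monster hierarchy here is a hypothetical one written just for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical Monster hierarchy, defined only for this illustration.
class Monster {
public:
  virtual ~Monster() {}
  virtual std::string name() const = 0;
};

class Imp : public Monster {
public:
  std::string name() const { return "imp"; }
};

class Troll : public Monster {
public:
  std::string name() const { return "troll"; }
};

void sharedPtrExample()
{
  std::vector<std::shared_ptr<Monster> > monsters;

  std::shared_ptr<Monster> imp(new Imp());
  monsters.push_back(imp);  // the vector now shares ownership of the Imp
  monsters.push_back(std::shared_ptr<Monster>(new Troll()));

  assert(imp.use_count() == 2);  // imp and monsters[0]
  assert(monsters[1]->name() == "troll");

  monsters.clear();              // the Troll is deleted here
  assert(imp.use_count() == 1);  // the Imp lives on until imp goes out of scope
}
```

Note that Monster has a virtual destructor, per rule 4, since its objects are owned through pointers to the base class.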

For more information on boost::shared_ptr and the entire Boost library, visit http://boost.org.

6. Summary

To recap, the following simple rules can help you avoid memory problems in your C++ programs:

  1. Use std::string instead of char * or char []
  2. Use standard containers instead of homegrown containers
  3. Manage your class data to avoid resource leaks and crashes
  4. Use virtual destructors in class hierarchies
  5. Use smart pointers instead of raw pointers

Finally, a word about performance. While these idioms can help you write safer and more robust programs, there is a cost associated with them, both in code size and performance overhead. You may not be managing the memory behind a std::vector or std::string yourself, but the code inside them certainly is (and checked accessors such as at() validate every index), and this costs CPU cycles. So, you might conclude that the moral of the story is that while this stuff is nice in theory, your application needs to scream, so in reality you'll just continue using good old-fashioned pointers, arrays, and homegrown containers. Right?

Wrong. As a wise programmer once said, premature optimization is the root of all evil. Do things the safe way first. If your program is slower than you need, run it under a profiler and find the bottlenecks. Then and only then hand tune the offending code. Chances are -- especially in graphics applications and games -- the bottlenecks in your code won't be in the C++ standard library, but rather in your algorithms.

I hope you found this article helpful. You can find more articles, software, and other downloads at http://davemikesell.com.

7. References

  1. Josuttis, Nicolai, The C++ Standard Library, Addison Wesley
  2. Meyers, Scott, Effective C++, Addison Wesley
  3. Meyers, Scott, More Effective C++, Addison Wesley
  4. Cline, M., Lomow, G., Girou, M., C++ FAQs, Addison Wesley



Date this article was posted to GameDev.net: 10/26/2003
(Note that this date does not necessarily correspond to the date the article was written)
