Enginuity, Part III
Profiling, Settings, and the Kernel;
or, Stop Watching That Military Control Panel
by Richard "superpig" Fine

Download the source code for this article here.

It's time for another helping of game engine goodness. This time we're going to add a simple profiler to the engine, look at working with game settings, and finally move on to the heart of the engine - the Kernel.

This article builds on the code built up in the previous one, so if you've not read it, go do that already.

Runtime Profiler

Have you ever had one of those days when everything seems to be going really slowly? Everyone seems quiet and muffled, as if there's some invisible, thick fog throughout the air. And that's *after* you've had your morning coffee.

Your code will have times like that, too. But while you can just go to sleep and assume that things will be better in the morning, your code isn't going to change itself. So when your framerate's dropped lower than Sanford Wallace, what do you do? It's time for that wonderful process of Optimization.

"Optimizing?" you cry, "Optimizing?! Optimizing is for professional-edition compilers, masochists, and little girly-men! I do not need to optimize!" Er, OK. I'll let you get back to chopping down trees somewhere. To put it to you straight: ALL professional development houses profile and optimize their games. I don't just mean at the end of development, either - it's an ongoing process. You don't want to build something which relies on a particular aspect of an algorithm somewhere else, if you then go in and replace that algorithm because it's too slow. (Well, you don't want to build something which relies on a particular aspect of an algorithm somewhere else *full stop* - to never make assumptions about implementation details is a fairly important object-orientation concept). Because optimization usually involves going back through your code and looking for slow points, it'll help you spot bugs as well.

But before you start ripping apart your algorithms and converting things to assembler, how do you know which parts you should be working on? It's quite possible that the math code you thought was horribly slow isn't actually being called very much, so it's not going to be responsible for your speed problems. What you need is a way of timing bits of your code, to see where you're losing momentum. Enter the Runtime Profiler.

A Profiler, in case any of you haven't figured it out yet, is a tool for timing bits of your code. Professional Profiling tools can analyze your entire game loop, processing the debug information generated by your compiler to give you the very line numbers of the sluggish bits of code responsible for your problem; and, understandably, they cost large amounts of money. So we're not going to do that. We're going for something much simpler - more of a 'manual' approach. I don't mean that you're going to need to go buy a stopwatch; I mean that when you want to time something, you're going to have to make a couple of calls, and the engine will handle it. You'll be able to see what percentage of the game loop's time gets spent on your chosen blocks of code, along with minimum and maximum peaks (so you can watch for when you walk into a crowded room, for example).

Here's a rough overview of how the profiler will look to the user (i.e. you):

A CProfileSample object is created with a given name at the beginning of the block of code. It starts the timer.
At the end of the block of code, the CProfileSample object is destroyed (either explicitly, or by letting it fall out of scope). It stops the timer.
When starting up the engine, you set a pointer to an object derived from IProfilerOutputHandler, which is responsible for logging/drawing the statistics the way you want.
Static functions of CProfileSample allow you to reset the statistics for individual samples or for all samples in the system.

Not exactly hard. You can just enclose a block of questionable code in curly braces, put a CProfileSample sample("MyProfileSample"); line at the beginning of it, and the profiler will do the rest. Neat, hmm? Of course, such ease-of-use comes with a price, and that price is the difficulty of the implementation. But given that you can just download the source code I already made for you, what do you care? ;)
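To make the scoped usage concrete, here is a minimal sketch using a stand-in class. It is not the article's CProfileSample: ScopedSample, g_elapsedSeconds, g_callCounts and SumToN are hypothetical names, and std::chrono stands in for the engine's own timer. It also uses std::map and std::string, which allocate, so it ignores the overhead concerns a real profiler has to respect.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical stand-in storage: total seconds and call counts per sample name.
static std::map<std::string, double> g_elapsedSeconds;
static std::map<std::string, int> g_callCounts;

class ScopedSample
{
public:
  explicit ScopedSample(const std::string &name)
    : name_(name), start_(std::chrono::steady_clock::now())
  {
    ++g_callCounts[name_];
  }
  ~ScopedSample()
  {
    // Stop the clock first, then do the bookkeeping.
    std::chrono::duration<double> taken =
      std::chrono::steady_clock::now() - start_;
    g_elapsedSeconds[name_] += taken.count();
  }
private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

// Wrapping a questionable block of code in braces.
inline long SumToN(long n)
{
  assert(n >= 0);
  long total = 0;
  {
    ScopedSample sample("SumToN loop"); // timer starts here
    for (long i = 1; i <= n; ++i) total += i;
  } // sample falls out of scope here, so the timer stops
  return total;
}
```

The braces are what control the timed region: everything between the sample's declaration and the closing brace is measured, and nothing else.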

Two of the important things about a profiler are that it has to be passive (because if it interferes with the code we're timing, the test is pointless), and that it has to minimize overhead. There's no point using the profiler if it takes longer to time the code than it does to execute it! So, we avoid the more costly operations - things like dynamic memory allocation (hence creating the CProfileSample on the stack rather than from the heap).

Another useful feature we want to add is a parent/child relationship setup. Say I profile my ErrorLogging routine, and it's horribly slow - ideally, I'd like to then profile *within* that sample to see which parts of the code are slowing it down. It's more useful when you're higher up - for example, I profiled my Server routine (yes, I've written it already :) ), and found that it was using up to 6% of the loop time (compared with the other tasks which were using around 0.2%). So, I added extra profiling samples within the server routine, and discovered that the section I had expected to be slowest - the socketGroup updater - was actually fine, and it was the gamestate broadcast message routine that was causing the problem.

Also - a relatively simple addition - it might well be useful to see how many times per frame a block is executed. If you see something using 20% of your loop time, but it's being called 1000 times, you perhaps don't need to worry so much about optimising the routine itself - it would possibly be more efficient to try and make fewer calls to it.
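The parent/child bookkeeping can be sketched on its own, without any timing. This is an illustration under assumed names (NestedScope, ScopeRecord, g_records, g_lastOpened and g_openCount are all hypothetical); it only shows how a 'last opened' index and an open count behave like a stack when scoped objects nest.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of each scope opened: who its parent was, how deep it sat.
struct ScopeRecord { std::string name; int parentIndex; int depth; };

static std::vector<ScopeRecord> g_records; // one entry per scope, in opening order
static int g_lastOpened = -1;              // index of the topmost open scope
static int g_openCount  = 0;               // number of scopes currently open

class NestedScope
{
public:
  explicit NestedScope(const std::string &name)
    : parentIndex_(g_lastOpened)
  {
    ScopeRecord r = { name, parentIndex_, g_openCount };
    g_records.push_back(r);
    g_lastOpened = static_cast<int>(g_records.size()) - 1; // push
    ++g_openCount;
  }
  ~NestedScope()
  {
    assert(g_openCount > 0 && "scope closed without being opened");
    g_lastOpened = parentIndex_; // pop: the parent is topmost again
    --g_openCount;
  }
private:
  int parentIndex_;
};

// Opens a main loop scope with two children, one of which has its own child.
// Returns the number of scopes recorded.
inline int RunNestedExample()
{
  NestedScope loop("Main loop");
  {
    NestedScope server("Server");
    { NestedScope flush("Flushing socketGroup"); }
  }
  { NestedScope client("Client"); }
  return static_cast<int>(g_records.size());
}
```

The depth recorded for each scope is simply the number of scopes already open when it was created, which is the number you would indent by when printing the results.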

So how do we work? Upon creation of a CProfileSample object, it needs to do a few things:

Record the current 'topmost' sample - that's the parent sample, the sample 'enclosing' our new one
Retrieve the sample's statistics and data from storage by searching using the profile name, or create a new set of data for the sample
Increment the number of times this sample has been executed this frame
Flag this sample as being active
Increment the global number of open samples
Lastly, as close as possible to finishing, record the current time as the sample's start time.

There's also the issue of the 'topmost sample.' Given that we're eventually going to display the statistics in the form of percentages, we need to know what '100%' *is*, time-wise, so that we can work out what fraction of it 0.06 seconds would be. The easiest way to do this is to rule that there will be a 'main game loop' sample, which will have no parent sample (it will be topmost). When creating a CProfileSample as a topmost sample, it should record the start and end times in special global variables; that way, the times will be available to all samples to calculate their percentages. More on this later, when we actually look at the statistics function.

(It's worth noting, in case some of you find this all sounds a little familiar, that I'm basing my design on Steve Rabin's profiler in Game Programming Gems 1, though I've added my own little improvements to it. The GPG books are an excellent series, and should be on the shelf of any serious game developer).

We can wrap almost all of our profiler up in the CProfileSample class. Take a look:

class CProfileSample
{
public:
  CProfileSample(std::string sampleName);
  ~CProfileSample();

  static void Output();

  static void ResetSample(std::string sampleName);
  static void ResetAll();

  static IProfilerOutputHandler *outputHandler;

protected:
  //index into the array of samples
  int iSampleIndex;
  int iParentIndex;

  inline float GetTime(){ return ((float)SDL_GetTicks())/1000.0f; }

  static struct profileSample
  {
    profileSample()
    {
      bIsValid=false; 
      dataCount=0;
      averagePc=minPc=maxPc=-1;
    }

    bool bIsValid;    //whether or not this sample is valid to be used
    bool bIsOpen;     //is this sample currently being profiled?
    unsigned int callCount; //number of times this sample has been executed
    std::string name; //name of the sample
    
    float startTime;  //starting time on the clock, in seconds
    float totalTime;  //total time recorded across all executions of this sample
    float childTime;  //total time taken by children of this sample

    int parentCount;  //number of parents this sample has
                      //(useful for neat indenting)

    float averagePc;  //average percentage of game loop time taken up
    float minPc;      //minimum percentage of game loop time taken up
    float maxPc;      //maximum percentage of game loop time taken up
    unsigned long dataCount; //number of times values have been stored since
                             //sample creation/reset
  } samples[MAX_PROFILER_SAMPLES];
  static int lastOpenedSample;
  static int openSampleCount;
  static float rootBegin, rootEnd;
};

There's quite a lot there... in the public section of the class, you've got the standard constructor and destructor - slightly more significant than usual in this class, as we'll see in a minute - and the constructor takes the name of the sample you want to use (that is, the 'unique name' you use to recognise the block of code you're profiling). The rest are all static members: an Output() function to send all the statistics to the output handler, ResetSample/ResetAll functions for resetting sample values, and a static pointer to the aforementioned output handler, an object derived from IProfilerOutputHandler (which is an abstract class).

Then we move down into the protected section of the class, and things get a little more interesting.

The first thing I want to draw your attention to is the profileSample structure. Because each CProfileSample object is created and destroyed every time you execute the code you're profiling, we can't keep its vital statistics as member data; this is the 'storage' I referred to earlier. There's a basic constructor to flag the structure as invalid (so that it will be initialised when it gets used). The comments should be fairly self-explanatory; all of those members will, of course, be covered in more detail when we come to use them.

Besides the profileSample structure there's not much in the protected section. You've got iSampleIndex and iParentIndex - the sample numbers for this sample and the parent sample respectively - and an inlined GetTime() function, which returns the time since a fixed point (such as app startup) in seconds.

There are also four static variables at the end there. lastOpenedSample records the index of the sample that was last started - the 'top of the stack,' so to speak - and it is used to set the iParentIndex upon creation of a CProfileSample. openSampleCount simply tracks how many samples are currently open - it gets used to set parentCount. Finally, rootBegin and rootEnd are the 'special global variables' I said the topmost sample would have to store its start and end times in. Got all that? Good. Let's look at the functions themselves.

CProfileSample::CProfileSample(std::string sampleName)
{
  //The first thing we need to do is restore our previous pieces of sample
  //data from storage. That is, look in the samples[] array to see if there's
  //a valid sample with our name on it
  int i=0;
  //If we don't find it, we're going to need to create a new sample, and rather
  //than looping through the list a second time we store the first non-valid
  //index that we find this time round
  int storeIndex=-1;
  for(i=0;i<MAX_PROFILER_SAMPLES;++i)
  {
    if(!samples[i].bIsValid)
    {
      if(storeIndex<0)storeIndex=i;
    }else{
      if(samples[i].name==sampleName)
      {
        //this is the sample we want
        //check that it's not already open
        //assert only works in debug builds, but given that you don't use
        //the profiler in release builds, it doesn't really matter
        assert(!samples[i].bIsOpen &&
          "Tried to profile a sample which was already being profiled");
        //first, store its index
        iSampleIndex=i;
        //the parent sample is the last opened sample
        iParentIndex=lastOpenedSample;
        lastOpenedSample=i;
        //and the number of parents is the number of open
        //samples (excluding ourselves)
        samples[i].parentCount=openSampleCount;
        ++openSampleCount;
        samples[i].bIsOpen=true;
        //increment the number of times we've been called
        ++samples[i].callCount;
        ///finally (more or less) store the current time to start the sample
        samples[i].startTime=GetTime();
        //if this has no parent, it must be the 'main loop' sample, so copy
        //to the global timer as well
        if(iParentIndex<0)rootBegin=samples[i].startTime;
        //done
        return;
      }
    }
  }
  //we've not found it, so it must be a new sample
  //use the storeIndex value to store the new sample
  assert(storeIndex>=0 && "Profiler has run out of sample slots!");
  samples[storeIndex].bIsValid=true;
  samples[storeIndex].name=sampleName;
  iSampleIndex=storeIndex;
  iParentIndex=lastOpenedSample;
  lastOpenedSample=storeIndex;
  //note: i has run off the end of the array by now, so index with storeIndex
  samples[storeIndex].parentCount=openSampleCount;
  openSampleCount++;
  samples[storeIndex].bIsOpen=true;
  samples[storeIndex].callCount=1;

  //init the statistics for this sample
  samples[storeIndex].totalTime=0.0f;
  samples[storeIndex].childTime=0.0f;
  samples[storeIndex].startTime=GetTime();
  if(iParentIndex<0)rootBegin=samples[storeIndex].startTime;
}

Pretty simple. It seeks out the existing sample, and if it can't find it, it sets up a new one; either way, the sample is set up for this run of the profiler. We do a little more processing in the destructor, and the bulk of the statistics calculations in the Output() function. Here's the destructor:

CProfileSample::~CProfileSample()
{
  float fEndTime=GetTime();
  //phew... ok, we're done timing
  samples[iSampleIndex].bIsOpen=false;
  //calculate the time taken this profile, for ease of use later on
  float fTimeTaken = fEndTime - samples[iSampleIndex].startTime;

  if(iParentIndex>=0)
  {
    samples[iParentIndex].childTime+=fTimeTaken;
  }else{
    //no parent, so this is the end of the main loop sample
    rootEnd=fEndTime;
  }
  samples[iSampleIndex].totalTime+=fTimeTaken;
  lastOpenedSample=iParentIndex;
  --openSampleCount;
}

As you can see, the first thing we do is to stop the clock. Once we've done that, we can relax a little when it comes to speed overhead... we clean up the sample, updating the statistics for this sample on this frame, and the parent sample (if there is one). We then 'pop the stack' - reset the lastOpenedSample and openSampleCount values to what they were before we started profiling (assuming nobody has done anything stupid like try to construct and destruct CProfileSamples out-of-order). Also fairly simple. So let's move on to the Output() function, which calculates all the statistics for a given sample across the entire app, and sends them to the OutputHandler:

void CProfileSample::Output()
{
  assert(outputHandler && "Profiler has no output handler set");
  
  outputHandler->BeginOutput();

  for(int i=0;i<MAX_PROFILER_SAMPLES; ++i)
  {
    if(samples[i].bIsValid)
    {
      float sampleTime, percentage;
      //calculate the time spent on the sample itself (excluding children)
      sampleTime = samples[i].totalTime-samples[i].childTime;
      percentage = ( sampleTime / ( rootEnd - rootBegin ) ) * 100.0f;

      //add it to the sample's values
      float totalPc;
      totalPc=samples[i].averagePc*samples[i].dataCount;
      totalPc+=percentage; samples[i].dataCount++;
      samples[i].averagePc=totalPc/samples[i].dataCount;
      if((samples[i].minPc==-1)||(percentage<samples[i].minPc))
        samples[i].minPc=percentage;
      if((samples[i].maxPc==-1)||(percentage>samples[i].maxPc))
        samples[i].maxPc=percentage;

      //output these values
      outputHandler->Sample(samples[i].minPc,
                samples[i].averagePc,
                samples[i].maxPc,
                samples[i].callCount,
                samples[i].name,
                samples[i].parentCount);

      //reset the sample for next time
      samples[i].callCount=0;
      samples[i].totalTime=0;
      samples[i].childTime=0;
    }
  }

  outputHandler->EndOutput();
}

So we begin with a quick check that we actually *have* an output handler, followed by a call to outputHandler->BeginOutput() to allow our output handler to clear buffers, print headers, or whatever it wants.

Then we move on to the more interesting stuff. Firstly, the time spent on the sample in itself is calculated - that is, the total time spent on the sample, minus the time spent on children. This time is then turned into a percentage of the time taken for the whole loop to run, making it easier to work with (because it'll be roughly the same across different-speed machines; a chunk of code will be slower, but so will everything else). Next we calculate our statistics. We take the current average, and multiply it by the number of values recorded so far (dataCount) - that gives us the running total of all the percentages recorded for this sample since the profiler was started/reset. We add our new value, increment the number of values in the data, and divide again to get the average back. Then, a pair of quick checks let us update the maximum and minimum percentages, if required.
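The arithmetic is easier to check in isolation. The two helpers below are a sketch that mirrors the calculations in Output(); the names SelfPercentage and UpdateAverage are hypothetical and do not appear in the engine source.

```cpp
#include <cassert>

// Percentage of the root (main loop) time spent in a sample itself,
// excluding time spent in its children.
inline float SelfPercentage(float totalTime, float childTime,
                            float rootBegin, float rootEnd)
{
  assert(rootEnd > rootBegin && "root sample must have a non-zero duration");
  return ((totalTime - childTime) / (rootEnd - rootBegin)) * 100.0f;
}

// Folds one new value into a running average of dataCount earlier values.
// The caller is responsible for incrementing dataCount afterwards.
inline float UpdateAverage(float average, unsigned long dataCount, float newValue)
{
  float total = average * dataCount; // recover the running sum
  total += newValue;
  return total / (dataCount + 1);
}
```

For instance, a sample with 0.06 seconds in total, of which 0.02 seconds belonged to children, inside a loop that ran from 1.0 to 1.5 seconds, comes out at 8% of the loop. Note that the initial averagePc of -1 is harmless, because it gets multiplied by a dataCount of zero the first time round.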

Then we move on to sending the data to the OutputHandler. If you don't see how that's done, I probably can't help you. :)

Lastly, we reset the sample for the next loop. Given that the Output function should be called once per frame - ideally at the end of the game loop, outside of the top-level sample (as we'll see later in this article) - it's the 'tick point' that we can use to reset things without worrying. Once all the samples are processed, the outputHandler's EndOutput() function is called, again to allow it to perform whatever clean-up processing is necessary.

The only other functions in CProfileSample are ResetSample and ResetAll - all they need to do is clear the bIsValid flags for the appropriate sample(s), so that the slots get initialised afresh the next time they're used. I won't go over them here. Instead, let's move on to the output handler we're going to use for the time being - CProfileLogHandler.
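For the curious, a rough sketch of the flag-clearing idea follows. It works on a simplified, hypothetical slot array (SampleSlot, g_slots, kMaxSamples, OpenSlot and CountValid are all made up for illustration) rather than on the real samples[] array, and the versions in the downloadable source may well differ, for example by also resetting the stored statistics.

```cpp
#include <cassert>
#include <string>

const int kMaxSamples = 8; // hypothetical stand-in for MAX_PROFILER_SAMPLES

struct SampleSlot
{
  SampleSlot() : bIsValid(false) {}
  bool bIsValid;
  std::string name;
};

static SampleSlot g_slots[kMaxSamples];

// Claims the first free slot for a name; returns false when out of slots.
inline bool OpenSlot(const std::string &name)
{
  for (int i = 0; i < kMaxSamples; ++i)
  {
    if (!g_slots[i].bIsValid)
    {
      g_slots[i].bIsValid = true;
      g_slots[i].name = name;
      return true;
    }
  }
  return false;
}

inline int CountValid()
{
  int count = 0;
  for (int i = 0; i < kMaxSamples; ++i)
    if (g_slots[i].bIsValid) ++count;
  return count;
}

// Invalidates the named slot; returns true if a matching slot was found.
inline bool ResetSample(const std::string &sampleName)
{
  for (int i = 0; i < kMaxSamples; ++i)
  {
    if (g_slots[i].bIsValid && g_slots[i].name == sampleName)
    {
      g_slots[i].bIsValid = false; // slot will be set up again on next use
      return true;
    }
  }
  return false;
}

// Invalidates every slot.
inline void ResetAll()
{
  for (int i = 0; i < kMaxSamples; ++i) g_slots[i].bIsValid = false;
}
```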

Before we do that, it's necessary to look briefly at the IProfilerOutputHandler class. You've seen all the functions it provides already:

class IProfilerOutputHandler
{
public:
  virtual void BeginOutput()=0;
  virtual void Sample(float fMin, float fAvg, float fMax,
                      int callCount, std::string name, int parentCount)=0;
  virtual void EndOutput()=0;
};

It's just a simple abstract (interface) class, allowing you to derive any class from it to handle profiler output. The class I've derived from it for you is CProfileLogHandler:

class CProfileLogHandler : public IProfilerOutputHandler  
{
public:
  void BeginOutput();
  void EndOutput();
  void Sample(float fMin, float fAvg, float fMax,
              int callCount, std::string name, int parentCount);
};

void CProfileLogHandler::BeginOutput()
{
  CLog::Get().Write(LOG_APP,IDS_PROFILE_HEADER1);
  CLog::Get().Write(LOG_APP,IDS_PROFILE_HEADER2);
}

void CProfileLogHandler::EndOutput()
{
  CLog::Get().Write(LOG_APP,"\n");
}

void CProfileLogHandler::Sample(float fMin, float fAvg, float fMax,
                   int callCount, std::string name, int parentCount)
{
  char namebuf[256], indentedName[256];
  char avg[16], min[16], max[16], num[16];

  sprintf(avg, "%3.1f", fAvg);
  sprintf(min, "%3.1f", fMin);
  sprintf(max, "%3.1f", fMax);
  sprintf(num, "%3d",   callCount);

  strcpy( indentedName, name.c_str());
  for( int indent=0; indent<parentCount; ++indent )
  {
    sprintf(namebuf, " %s", indentedName);
    strcpy( indentedName, namebuf);
  }

  CLog::Get().Write(LOG_APP,IDS_PROFILE_SAMPLE,min,avg,max,num,indentedName);
}

And that, ladies and gentlemen, prints a profile table to the APP logfile. IDS_PROFILE_HEADER1 and IDS_PROFILE_HEADER2 are the two header lines that make up the table column names and underlining; IDS_PROFILE_SAMPLE is a simple string to put the statistics into the right places. The only really noteworthy thing here is the fact that the name gets indented relative to the parentCount - making it much easier to see parent/child relationships in the logs. EndOutput() logs a newline, just to space things out in my logs (because we're calling this once per frame, so we get a lot of them in the logs). Here's a sample of the output:

  Min :   Avg :   Max :   # : Profile Name
--------------------------------------------
  0.0 :  96.6 : 100.0 :   1 : Kernel task loop
  0.0 :   0.6 :   6.3 :   1 :  Renderer
  0.0 :   0.1 :   6.3 :   1 :  Server
  0.0 :   0.7 : 100.0 :   1 :   Flushing socketGroup
  0.0 :   0.0 :   6.2 :   1 :   Processing messages
  0.0 :   1.4 :   6.3 :   1 :   Server state update
  0.0 :   0.5 :   6.7 :   1 :  Client

So you should see that the Avg (average) values roughly add up to 100% (give or take a little due to floating-point truncation). The Min and Max values tend to be of limited use - they're better if you examine profiler output over a specific section of the game by calling ResetAll at the beginning of the section, because otherwise startup in particular tends to produce some freak values (as you see there). So, I can immediately see that if I need to make my game faster, I ought to look into the Server State Update code first - as opposed to the message processor, which I thought would be the worst culprit (and it actually has an average time of <0.1%).
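Going back to the name indentation in CProfileLogHandler::Sample for a moment: the same effect can be had without the fixed-size character buffers. This is only a sketch of an alternative, and IndentName is a hypothetical helper that is not part of the engine source.

```cpp
#include <cassert>
#include <string>

// Prefixes a sample name with one space per parent.
inline std::string IndentName(const std::string &name, int parentCount)
{
  assert(parentCount >= 0);
  // std::string(n, ' ') builds a string of n spaces.
  return std::string(static_cast<std::string::size_type>(parentCount), ' ') + name;
}
```

Because the result is a std::string, there is no 256-character limit to overrun when a sample has a long name or is nested deeply.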

Settings

There comes a time in every engine's life when you, or the user, want to customise something about its operation. Perhaps you'd like to run it at a higher screen resolution, or turn off the sound. You have the luxury of the code, so you can go in and change what you want, and just recompile; but the user can't do that. It's time for a user-targeted settings mechanism.

So how do we expose our settings in a flexible, reusable way? We want them all in a central place - so that a CSettingsManager can load into them with ease - but we don't want to hard-code them (especially given that your game will probably add its own settings to the engine's list). We need to handle different types of setting - floats, ints, strings, and so on. We also need to handle 'single' settings versus 'multiple' settings - that is, settings which contain a list of values, such as the maps in the map rotator for a multiplayer server.

We're going to use another utility object, a distant cousin of one introduced in the previous article: a Dator (loose relative of Functor). As you may have guessed from the name, a dator is an object which wraps a data member (as opposed to a Functor which wraps a function). Because we need to establish a single format in which all data is passed to the dator (even if it then gets converted to another specific type), we're going to say that a dator accepts std::strings as input; it will then convert from that std::string to the format it's been templated to handle, using a trick borrowed from boost::lexical_cast that you'll see in a minute.

It may also be necessary to have a list of dators for some reason. Because Dator<int> and Dator<float> are considered distinctly separate types, we can't put them in a list together; instead we have to establish a common base class for all dators and have a list of that. Here's the base class:

class BaseDator : public IMMObject
{
protected:
  BaseDator(){}
  BaseDator(BaseDator &b){(*this)=b;}
public:
  virtual BaseDator &operator =(std::string &s)=0;
  virtual BaseDator &operator +=(std::string &s)=0;
  virtual BaseDator &operator -=(std::string &s)=0;
  virtual bool operator ==(std::string &s)=0;
  virtual bool operator !=(std::string &s)=0;

  virtual bool hasMultipleValues()=0;

  virtual operator std::string()=0;
};

You can see how all the operators take std::strings as parameters (and have return types, this time!), establishing it as the 'global type' for dators to transfer values in and out. There's also hasMultipleValues() - overridden by derived classes to indicate whether the dator works with a single value (like a normal variable) or a set of values (a std::list). Finally, there's an operator to convert the value back to a std::string for other uses. Pretty much all of these functions are pure virtual, and the constructors are protected: you're not going to be instantiating this directly, folks. :)

template<class T>
class Dator : public BaseDator
{
protected:
  T& target;
  T toVal(std::string &input)
  {
    std::stringstream str;
    str.unsetf(std::ios::skipws);
    str<<input;
    T res;
    str>>res;
    return res;
  }
  std::string toString(T &val)
  {
    std::stringstream str;
    str.unsetf(std::ios::skipws);
    str<<val;
    std::string res;
    str>>res;
    return res;
  }
public:
  Dator(T& t) : target(t) {}
  BaseDator &operator =(std::string &input)
    { target=toVal(input); return *this; }
  BaseDator &operator +=(std::string &input)
    { target+=toVal(input); return *this; }
  BaseDator &operator -=(std::string &input)
    { target-=toVal(input); return *this; }
  bool operator ==(std::string &s) { return (s==(std::string)(*this)); }
  bool operator !=(std::string &s) { return (s!=(std::string)(*this)); }
  operator std::string() { return toString(target); }

  bool hasMultipleValues() { return false; }

  AUTO_SIZE;
};

OK. That's the class for a dator which represents a single value (hasMultipleValues returns false). Look at the constructor and protected data member - we're working with references, which means that when you create a dator, you'll bind it to an existing variable - any assignments to the dator will result in assignments to the variable itself. That's why we'll hardly ever need to work with the dator when we know what type it is - we'll work with the variable it's bound to instead.

What's the std::stringstream stuff about, I hear you ask? That's the trick I mentioned, borrowed from boost::lexical_cast. You see, we can't just assign a std::string to an arbitrary type - there's no built-in conversion available. In C, we had to use the atoi()/atof()/atol() family of functions; we could, in theory, do that here, but that requires template specialisation, which is not much good (at least under MSVC) - it practically forces us to copy+paste code for each specialised type, which is exactly what the templates were there to avoid. So, what we do is take advantage of the fact that std::stringstream has overloaded input/output operators for most standard types; we write the string out to it using operator <<(std::string), and then read it back in again using operator >>(T &). The type-conversion is performed for us by the stringstream class. You can see how it works in the toVal() and toString() functions - it's reversible, too. If you try and set up a variable for a type that stringstream can't convert to, then you'll get a compile error, but it's possible to provide global overloads (for example, operator<<(std::string) isn't actually a member of the class, it's globally defined in the 'string' header).
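Stripped of the dator around it, the conversion trick looks something like the sketch below. FromString and ToString are hypothetical free-function names, not part of the engine, and there are two deliberate differences from the member functions above: the parameters are const references, and ToString takes the stream's contents with str() rather than reading them back out with operator >>, so a value containing spaces comes back whole.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Converts a string to any type that has a stream extraction operator.
template<class T>
T FromString(const std::string &input)
{
  std::stringstream str;
  str << input;
  T res = T(); // value-initialise so a failed conversion yields a known value
  str >> res;
  return res;
}

// Converts any type that has a stream insertion operator to a string.
template<class T>
std::string ToString(const T &val)
{
  std::stringstream str;
  str << val;
  return str.str();
}
```

A conversion that fails (say, FromString<int>("abc")) does not throw here; it quietly returns zero, which is worth bearing in mind when reading settings typed in by a user.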

template<class T>
class ListDator : public BaseDator
{
protected:
  std::list<T> &values;
  T toVal(std::string &input)
  {
    std::stringstream str;
    str.unsetf(std::ios::skipws);
    str<<input;
    T res;
    str>>res;
    return res;
  }
  std::string toString(T &val)
  {
    std::stringstream str;
    str.unsetf(std::ios::skipws);
    str<<val;
    std::string res;
    str>>res;
    return res;
  }
public:
  ListDator(std::list<T> &v) : values(v) { }
  BaseDator &operator =(std::string &s)
    { values.clear(); values.push_back(toVal(s)); return *this; }
  BaseDator &operator +=(std::string &s)
    { values.push_back(toVal(s)); return *this; }
  BaseDator &operator -=(std::string &s)
    { values.remove(toVal(s)); return *this; }
  bool operator ==(std::string &s)
    { return (std::find(values.begin(),values.end(),toVal(s))!=values.end()); }
  bool operator !=(std::string &s) { return !((*this)==s); }
  
  operator std::string() { return toString(values.back()); }
  operator std::list<T>&() { return values; }

  bool hasMultipleValues(){return true;}

  AUTO_SIZE;
};

And that's the class for a dator which can hold multiple values (hasMultipleValues returns true). It's pretty similar to the previous class - ideally, we'd use the previous class for this - but std::list doesn't provide +=, -=, or = operators so we can't (and my attempt to provide global overloads failed). Operator = will clear the list and add a single value; operators += and -= will add and remove values respectively. Operator == returns true if the list *contains* the given value - it could only be one of many though. The std::string type converter returns the last value added to the list (values.back()), and a function is there to get a reference to the list itself (though, like normal dators, you'll usually have direct access to the list without needing the dator).
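The list semantics are worth seeing on a bare std::list, with no dator involved. The free functions below are hypothetical names chosen for illustration; each one performs the same std::list call as the matching ListDator operator.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// operator = : throw away the old contents and keep a single value
inline void Assign(std::list<int> &values, int v) { values.clear(); values.push_back(v); }
// operator += : append a value
inline void Add(std::list<int> &values, int v) { values.push_back(v); }
// operator -= : std::list::remove erases EVERY element equal to v
inline void Remove(std::list<int> &values, int v) { values.remove(v); }
// operator == : true if the value appears anywhere in the list
inline bool Contains(const std::list<int> &values, int v)
{
  return std::find(values.begin(), values.end(), v) != values.end();
}

// Builds {5}, then {5,6}, then {5,6,5}, then removes 5 and returns what is left.
inline std::list<int> RunListExample()
{
  std::list<int> values;
  Assign(values, 5);
  Add(values, 6);
  Add(values, 5);
  Remove(values, 5);
  return values;
}
```

One thing to watch is that removal is by value, not by position: if the same value has been added twice, a single -= takes out both copies.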

How do you use these classes?

//we can have a variable of pretty much any type - let's pick int for simplicity
int someValue;
//we can then create a dator bound to it like this
CMMPointer< Dator<int> > dator=new Dator<int>(someValue);
//if we then assign to the dator...
(*dator)=std::string("5");
//the value of someValue should now be 5.

//using ListDators is pretty similar, as I said earlier
std::list<int> someValues;
CMMPointer< ListDator<int> > listDator=new ListDator<int>(someValues);
(*listDator)=std::string("5");
(*listDator)+=std::string("6");
(*listDator)-=std::string("5");
//someValues should now have the single entry 6.

Dators will be useful later on for scripting, but right now, we're going to use them for our settings mechanism.

As you might expect, we need a Singleton-based Manager to handle all the settings for us.

class CSettingsManager : public Singleton<CSettingsManager>  
{
public:
  CSettingsManager();
  virtual ~CSettingsManager();

  void RegisterVariable(std::string &name, CMMPointer<BaseDator> &var);
  void SetVariable(std::string &name, std::string &value, int bias=0);

  void CreateStandardSettings();
  void DestroyStandardSettings();

  void ParseSetting(std::string str);
  void ParseFile(std::string filename);

protected:
  std::map<std::string, CMMPointer<BaseDator> > settingMap;
};

We've got the familiar Singleton syntax, a constructor and a destructor. RegisterVariable should be obvious - and that's where the dators come in. SetVariable has one parameter, 'bias,' that you might be wondering about, but we'll get to that in a minute. ParseSetting takes a single name=value expression, splits it up and passes it onto SetVariable; and ParseFile opens a given file and hands each line to ParseSetting (allowing you to build up 'settings files'). The only totally unexpected functions there are CreateStandardSettings and DestroyStandardSettings.

Now I don't like the StandardSettings mechanism. It feels messy, and like there should be a better way to do it. But as far as I can tell, there isn't. CreateStandardSettings is the function which creates all of the settings used by the engine itself - the renderer's screenX, screenY, and screenBPP, for example. I feel that creation of those should fall to the renderer (obviously), but we get a chicken-and-the-egg situation; we can't create the renderer until we've loaded in its settings, but we can't load in its settings till they've been registered. The one possible ray of hope is to use static object variables for the dators; but they couldn't be registered in the CSettingsManager because it needs to be created first, with a 'new CSettingsManager()' line. As Ned Flanders would say, it's quite a dilly of a pickle.

So the best we can do is provide this function, CreateStandardSettings, to be called when the CSettingsManager is created. Because it's called by the manager itself, there's no risk that the manager doesn't exist when it gets called. Also, we don't have to desert static variables completely - we can put static pointers and values into the renderer class, and have CSettingsManager store the dators there when they get created. We also, therefore, need to pair up CreateStandardSettings() with DestroyStandardSettings(), because those static pointers (which are smart pointers, of course) don't try and delete the dators they point to until the program is unloaded from memory (which is *after* the call to CollectRemainingObjects). DestroyStandardSettings will set all those static pointers to zero, so they don't try and call Release() on already-deleted objects.

We're about to look at some function code, but first I want to explain the 'bias' parameter passed to SetVariable. 'Bias,' in this situation, basically means 'Add/Remove.' When we're talking about list settings, we can't assign to them in the normal way; and flags need some kind of boolean expression for whether they should be set or unset. So, when dealing with list or flag settings, the parser requests that the name of the setting be prefixed with '+' or '-'; '+' for positive, 'enable' bias, and '-' for negative, 'disable' bias. They can still be specified for non-list, non-flag variables - they'll just be ignored. If the prefix is left off, the bias is zero, which the code below treats as plain assignment (and a bare flag name with no prefix and no '=' comes out as 0).
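To make the bias idea a bit more concrete, here's a minimal standalone sketch. It is not the engine's ListDator: a plain std::list of strings stands in for a list setting, and applyBias is a hypothetical helper name. It also assumes that a zero bias replaces the whole list with the single value, which is what plain assignment would suggest.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for a list setting - not the engine's ListDator.
typedef std::list<std::string> StringList;

// Apply one value to the list according to the bias:
//   bias > 0  adds the value
//   bias < 0  removes the value
//   bias == 0 replaces the whole list with the single value (assumption)
void applyBias(StringList &setting, const std::string &value, int bias)
{
  if(bias>0)
  {
    setting.push_back(value);
  }
  else if(bias<0)
  {
    setting.remove(value);
  }
  else
  {
    setting.clear();
    setting.push_back(value);
  }
}
```

Starting from an empty list, applying "a" and "b" with positive bias and then "a" with negative bias leaves just "b" behind.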

Here are the simple functions from CSettingsManager, along with the macros used by the StandardSettings mechanism:

CSettingsManager::CSettingsManager()
{
  settingMap.clear();
  CreateStandardSettings();
}

CSettingsManager::~CSettingsManager()
{
  DestroyStandardSettings();
}

void CSettingsManager::RegisterVariable(std::string &name,
                                        CMMPointer<BaseDator> &var)
{
  settingMap[name]=var;
}

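void CSettingsManager::ParseFile(std::string filename)
{
  std::ifstream in(filename.c_str());
  if(!in.is_open())return; //couldn't open
  //read line by line; testing the stream after each read (rather than
  //checking eof() beforehand) avoids parsing a bogus line at end-of-file
  std::string line;
  while(std::getline(in,line))
  {
    if(line.empty())continue; //skip blank lines
    ParseSetting(line);
  }
}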

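//set up a couple of macros for the StandardSettings mechanism - just convenience
//jobs. Each macro takes the type of dator, the CMMPointer<> to store it in,
//the variable it's bound to, and the name the manager should use to refer to it.
//Note: both macros pass temporaries to RegisterVariable's non-const reference
//parameters, which standard C++ rejects; MSVC accepts it as an extension.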
#define SETTING(type, target, var, name) target=new Dator<type>(var); \
        RegisterVariable(std::string(name),CMMPointer<BaseDator>(target));
#define LIST(type, target, var, name) target=new ListDator<type>(var); \
        RegisterVariable(std::string(name),CMMPointer<BaseDator>(target));

void CSettingsManager::CreateStandardSettings()
{
  //empty for the time being
}

void CSettingsManager::DestroyStandardSettings()
{
  //also empty

}

OK, those are simple enough. The CSettingsManager constructor/destructor spends most of its time setting up the StandardSettings; and RegisterVariable() just adds the pointer to the map. Let's move on to the two more complex functions, SetVariable and ParseSetting:

void CSettingsManager::SetVariable(std::string &name,
                          std::string &value, int bias)
{
  //look the setting up with find(); operator[] would silently insert
  //an empty entry into the map for every unknown name
  std::map<std::string, CMMPointer<BaseDator> >::iterator entry=
      settingMap.find(name);
  if(entry==settingMap.end() || !entry->second)return; //setting doesn't exist
  BaseDator &dator=*(entry->second);

  if(dator.hasMultipleValues())
  {
    std::list<std::string> valueList;

    //split the string into semicolon-separated chunks; a value with no
    //semicolon in it simply ends up as a single chunk
    std::string::size_type first=0, last;
    while((last=value.find(';',first))!=std::string::npos)
    {
      valueList.push_back(value.substr(first,last-first));
      first=last+1;
    }
    valueList.push_back(value.substr(first));

    for(std::list<std::string>::iterator it=valueList.begin();
        it!=valueList.end(); ++it)
    {
      if(bias>0)
      {
        dator+=(*it); //add to the list
      }else if(bias<0)
      {
        dator-=(*it); //remove from the list
      }else{
        dator=(*it);  //no bias: plain assignment
      }
    }
  }else{
    //just assign the value
    dator=value;
  }
}

void CSettingsManager::ParseSetting(std::string str)
{
  int bias=0; std::string name, value;
  //test for bias
  if(!str.empty() && ((str[0]=='+')||(str[0]=='-')))
  {
    bias=((str[0]=='+')*2)-1; //+ maps to 1*2-1=1, - maps to 0*2-1=-1
    str=str.substr(1); //remove the first character from the string
  }
  //test for '='
  std::string::size_type eqPos=str.find('=');
  if(eqPos!=std::string::npos)
  {
    //there's an = sign in there
    //so split either side of it
    name=str.substr(0,eqPos);
    value=str.substr(eqPos+1);
  }else{
    //there's no equal sign
    //we use the bias to construct a boolean value
    //so that flags can be +flag (mapping to flag=1) or -flag (mapping to flag=0)
    //(with no prefix at all, bias is 0 and the value also comes out as 0)
    name=str;
    char szBuf[5];
    sprintf(szBuf,"%i",(bias+1)/2);
    value=szBuf;
  }
  //set the variable
  SetVariable(name,value,bias);
}

Still fairly simple, just a little larger. The SetVariable function splits the value string down into its component parts (if they exist). Then, using the bias value, it adds or removes each part from the list. If, on the other hand, it's a simple non-list variable, it just sets the value. The SetVariable function is called by the ParseSetting function, which simply extracts the bias and splits the string using the '=' sign in the middle (if, again, it exists).

So what syntax should we use for our options, in the end? The syntax rules look something like this:

Setting: [biasValue]valueName[=valueList]
biasValue: +, -
valueList: value[;valueList]

Where [] indicates 'optional,' and a comma indicates 'or.' If you want to test it, try creating a Dator of some basic type such as 'int', registering it with the CSettingsManager, passing a value to it through the command-line (MSVC users: Project->Settings->Debug->Program Arguments), and then logging that value back out. Play around; after all, this is a game engine! ;-)
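If you'd like to see the grammar at work without dragging the whole engine along, here's a rough standalone sketch of the same splitting logic. ParsedSetting and parseSettingLine are hypothetical names invented for this example; in the engine the work is done by CSettingsManager::ParseSetting, which hands the result to SetVariable rather than returning it.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for one parsed setting line.
struct ParsedSetting
{
  int bias;           // +1 for '+', -1 for '-', 0 when there is no prefix
  std::string name;
  std::string value;  // still unsplit: may contain ';'-separated parts
};

// Hypothetical free function mirroring the splitting done in ParseSetting.
ParsedSetting parseSettingLine(std::string str)
{
  ParsedSetting out;
  out.bias=0;
  if(!str.empty() && (str[0]=='+' || str[0]=='-'))
  {
    out.bias=(str[0]=='+') ? 1 : -1;
    str=str.substr(1); // drop the prefix character
  }
  std::string::size_type eqPos=str.find('=');
  if(eqPos!=std::string::npos)
  {
    out.name=str.substr(0,eqPos);
    out.value=str.substr(eqPos+1);
  }
  else
  {
    // no '=': treat it as a flag, "+flag" gives "1", anything else "0"
    out.name=str;
    out.value=(out.bias>0) ? "1" : "0";
  }
  return out;
}
```

With this sketch, "screenX=800" comes back with no bias, name "screenX" and value "800"; "+mods=a;b" has positive bias and the unsplit value "a;b"; and "-fullscreen" becomes the name "fullscreen" with value "0". (The names 'mods' and 'fullscreen' are made up for illustration; only screenX is mentioned as an engine setting.)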

The Kernel

That's right, it's that moment you've all been waiting for... the heart of the engine, the Kernel, and its jaw-dropping task-management ensemble.

The Kernel, which the more old-school game developers amongst you might know better as the Game Loop, is the beating heart of the engine - almost literally. It 'pumps' the various things the engine needs to do at any given time, looping through them in a round-robin. Each thing the engine is doing - a 'task' - gets a single Update() call every frame, in which it does whatever it needs to do. The Kernel exits the loop when there are no more tasks. It's quite a lot like multithreading, if you're familiar with that - in fact, the term 'Kernel' is borrowed from OS terminology. (Combining the game kernel with multithreading is something you should probably avoid; however, the concept of multiple game kernels for multiprocessor systems, where each processor runs one kernel in its own thread, is something I quite like).

The first thing to do is to define a 'task.' A task can be any object which supports the Update() function, along with a couple of others - so the sensible approach is to use a base ITask class as an interface to be implemented. (However, because I'm totally insane, I'm going to use....a herring. Or maybe not.)

class ITask : public IMMObject
{
public:
  ITask(){canKill=false;priority=5000;}
  virtual bool Start()=0;
  virtual void OnSuspend(){};
  virtual void Update()=0;
  virtual void OnResume(){};
  virtual void Stop()=0;

  bool canKill;
  long priority;
};

There's a little more there than just an Update() function. The interface provides for all stages in a task's life - creation, pause, resume, and destruction. The creation function - Start() - is boolean, allowing the task to say, 'No, can't start right now,' so we don't get confused tasks being updated. OnSuspend() and OnResume() are used when pausing and resuming tasks (for example, you'd pause the AI task while the user is accessing the in-game menu), just to notify the task that it should release any high-demand resources and get itself ready for a little wait; and, conversely, that it should pick up its resources and carry straight on again. The Update() function we've already covered; which leaves the Stop() function, called when a task is being removed from the kernel's list. Note that the Start(), Update(), and Stop() functions are pure virtual - forcing derived classes to implement them - but the OnSuspend() and OnResume() functions are only regular virtual, meaning that you can override them if you want the notification, but you don't have to. Some tasks won't need to do anything to pause or resume - they just stop getting Update() messages for a while.

The last two parts of the class, canKill and priority, are used to manage task lifetime and execution order, respectively. The task sets canKill to true when it's ready to be Stop()'d; and the kernel calls Update() on the tasks in order of their priority numbers, with the lowest numbers happening first. The order can be very important - your custom rendering task should have a higher priority than the 'flip buffers' task, but a lower priority than the 'clear back buffer' task, for example.
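Here's a small sketch of what a concrete task might look like. So that it compiles on its own, it uses a cut-down stand-in for ITask with the IMMObject base left out; CFrameCounterTask and its members are hypothetical and not part of the engine.

```cpp
#include <cassert>

// Stand-in for the article's ITask, minus the IMMObject base class,
// so that this sketch is self-contained.
class ITask
{
public:
  ITask(){canKill=false;priority=5000;}
  virtual ~ITask(){}
  virtual bool Start()=0;
  virtual void OnSuspend(){}
  virtual void Update()=0;
  virtual void OnResume(){}
  virtual void Stop()=0;

  bool canKill;
  long priority;
};

// Hypothetical task: counts frames and asks to be stopped after a limit.
class CFrameCounterTask : public ITask
{
public:
  CFrameCounterTask(int limit) : frames(0), maxFrames(limit), running(false) {}

  virtual bool Start(){ frames=0; running=true; return true; }
  virtual void Update()
  {
    ++frames;
    if(frames>=maxFrames)canKill=true; // ready to be Stop()'d
  }
  virtual void Stop(){ running=false; }
  // OnSuspend/OnResume are not overridden: this task holds no resources

  int frames;
  int maxFrames;
  bool running;
};
```

The task never removes itself; it just raises canKill and leaves it to whoever is driving it (the kernel, in the real engine) to call Stop().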

Now we present the Kernel itself. It's a singleton, and only really provides task management functions:

class CKernel : public Singleton<CKernel>
{
public:
  CKernel();
  virtual ~CKernel();

  int Execute();

  bool AddTask(CMMPointer<ITask> &t);
  void SuspendTask(CMMPointer<ITask> &t);
  void ResumeTask(CMMPointer<ITask> &t);
  void RemoveTask(CMMPointer<ITask> &t);
  void KillAllTasks();

protected:
  std::list< CMMPointer<ITask> > taskList;
  std::list< CMMPointer<ITask> > pausedTaskList;
};

You've got your standard constructor/destructor, an Execute() function (the function which runs the task loop and exits when the game is shut down), a load of task-management functions, and then two protected lists for ITask pointers - one for running tasks, and one for paused tasks.

Why have two separate lists? The alternative is to have a flag bIsPaused in the ITask class, which you set to true when the task should pause, and so on. The problem with such a system is that it's very prone to abuse - anything could pause any task at any time. There's no guarantee that the OnSuspend or OnResume functions would be called. We would also incur a cost each loop, in that we'd have to check the flag; given that the actions of pausing/resuming tasks will be fairly rare, the cost of moving the pointer from one list to another is negligible.

CKernel::CKernel()
{
  SDL_Init(0);
  SDLNet_Init();
}

CKernel::~CKernel()
{
  SDLNet_Quit();
  SDL_Quit();
}

So far so simple. These functions bring up and shut down SDL, without initialising any of its subsystems. Seemed like a good enough place to me, at least; you're welcome to move them if you can think of some place better.

int CKernel::Execute()
{

  while(taskList.size())
  {
    {
      PROFILE("Kernel task loop");

      std::list< CMMPointer<ITask> >::iterator it;
      for(it=taskList.begin();it!=taskList.end();)
      {
        ITask *t=(*it);
        it++;
        if(!t->canKill)t->Update();
      }
      //loop again to remove dead tasks
      for(it=taskList.begin();it!=taskList.end();)
      {
        ITask *t=(*it);
        it++;
        if(t->canKill)
        {
          t->Stop();
          taskList.remove(t);
          t=0;
        }
      }
      IMMObject::CollectGarbage();
    }
#ifdef DEBUG
    CProfileSample::Output();
#endif
  }

  return 0;
}

And there you have it; the game loop. You can see the task system, the Memory Manager, and the Profiler all coming into play - this is where garbage collection gets called and the profiler gets told to output its statistics (as the "Kernel task loop" is the top-level profiler sample). All tasks are updated, and then any dead tasks are removed from the system; this is repeated until all tasks are dead, one way or another. You can see some slightly weird constructions with the STL iterators - that's because there's a risk of the task being removed from the list (either to be deleted, or to be moved to the paused list) in the Update() function, which would mess up the iterator.
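The 'advance the iterator before you use the element' trick is easier to see in isolation. The sketch below applies it to a plain std::list of ints; visitAndDropEvens is a hypothetical function made up for the demonstration, and erasing even numbers stands in for a task taking itself out of the list.

```cpp
#include <cassert>
#include <list>

// Remove every even number from the list while iterating over it.
// Returns how many elements were visited.
int visitAndDropEvens(std::list<int> &values)
{
  int visited=0;
  std::list<int>::iterator it;
  for(it=values.begin();it!=values.end();)
  {
    std::list<int>::iterator current=it;
    ++it;                    // step forward first...
    ++visited;
    if(*current%2==0)
      values.erase(current); // ...so erasing 'current' leaves 'it' valid
  }
  return visited;
}
```

Note that this only protects against the current element disappearing. If the work done for one element removed the *next* element from the list, the already-advanced iterator would be left dangling, so the same caveat presumably applies to a task that suspends or removes its neighbour during Update().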

bool CKernel::AddTask(CMMPointer<ITask> &t)
{
  if(!t->Start())return false;

  //keep the order of priorities straight
  std::list< CMMPointer<ITask> >::iterator it;
  for(it=taskList.begin();it!=taskList.end();it++)
  {
    CMMPointer<ITask> &comp=(*it);
    if(comp->priority>t->priority)break;
  }
  taskList.insert(it,t);
  return true;
}

void CKernel::SuspendTask(CMMPointer<ITask> &t)
{
  //check that this task is in our list - we don't want to suspend
  //a task that isn't running
  if(std::find(taskList.begin(),taskList.end(),t)!=taskList.end())
  {
    t->OnSuspend();
    taskList.remove(t);
    pausedTaskList.push_back(t);
  }
}

void CKernel::ResumeTask(CMMPointer<ITask> &t)
{
  if(std::find(pausedTaskList.begin(),pausedTaskList.end(),t)
        !=pausedTaskList.end())
  {
    t->OnResume();
    pausedTaskList.remove(t);
    //keep the order of priorities straight
    std::list< CMMPointer<ITask> >::iterator it;
    for(it=taskList.begin();it!=taskList.end();it++)
    {
      CMMPointer<ITask> &comp=(*it);
      if(comp->priority>t->priority)break;
    }
    taskList.insert(it,t);
  }
}

void CKernel::RemoveTask(CMMPointer<ITask> &t)
{
  if(std::find(taskList.begin(),taskList.end(),t)!=taskList.end())
  {
    t->canKill=true;
  }
}

void CKernel::KillAllTasks()
{
  for(std::list<
      CMMPointer<ITask> >::iterator it=taskList.begin();
      it!=taskList.end();it++)
  {
    (*it)->canKill=true;
  }
}

That's the task-management API of the kernel. It covers all the basic things you can do to a task, including a KillAllTasks() function, which essentially causes the app to quit on the next task cycle. AddTask and ResumeTask keep the live tasks list sorted by priority, so that the highest-priority task (the one with the lowest number) is at the front and will therefore be executed first each cycle.
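The priority-ordered insert used by AddTask and ResumeTask can be tried out on its own. In this sketch a tiny Task struct stands in for CMMPointer<ITask>, and insertByPriority is a hypothetical helper; the loop itself follows the one in the kernel.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for a task: just a priority and a label.
struct Task
{
  long priority;
  std::string name;
};

// Insert t so that the list stays sorted by ascending priority number.
// Tasks with equal priority keep the order in which they were added.
void insertByPriority(std::list<Task> &tasks, const Task &t)
{
  std::list<Task>::iterator it;
  for(it=tasks.begin();it!=tasks.end();++it)
  {
    if(it->priority>t.priority)break; // first entry that should run later
  }
  tasks.insert(it,t); // insert() places t just before 'it'
}
```

Adding 'render' (5000), 'flip' (10000) and 'clear' (10) in that order leaves the list as clear, render, flip - the order the rendering example from earlier needs.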

Th-th-th-that's all, folks

That's the whole of the foundation tier done. It's solid enough to support the rest of the engine, and if there are any problems it's centralised enough to make fixing things pretty simple. Next time we'll set up a few basic tasks (such as the video or input tasks), and look for the first time at how the engine integrates into a full program. Because what we've got so far is good, but doesn't do much without a main() function to start it all up...

Next time will also be the first time we do any significant work with SDL and FMOD, so make sure they're up to scratch as far as you're concerned. The addresses are http://www.libsdl.org/ and http://www.fmod.org/ for anyone who has forgotten.

Date this article was posted to GameDev.net: 7/5/2003
(Note that this date does not necessarily correspond to the date the article was written)
