Introduction

This series is intended to give the reader the information necessary to create a scripting system of his or her own from the ground up. There are many reasons to build such a system from scratch, most of them analogous to the reasons for building anything else from scratch, such as a 3D engine. Most important, in my opinion, is that it's a valuable learning experience. After all, who doesn't want to learn? Certainly nobody who is taking the time to read this article!

Many of the articles devoted to scripting that I've seen in the past do not do enough to cater to practical-minded programmers. These are the programmers who want to learn how the entire process of bringing a script from a high-level language down to some procedural format relates to their own programming efforts. They want to design a system that suits their needs without being clouded by complexity. As such, this series is geared towards enabling a programmer to fashion his or her own system rather than depend on handout code. Hopefully it will be helpful to those who have found other articles lacking in this respect. (No offense intended to anyone who may have written an article on scripting. Please don't take this personally.)

In addition, this series will include example code snippets written in C++. It's recommended that you be at least familiar with the basics of C++ classes.

Basic Format of this Series

The format of this series will be somewhat reversed with respect to the seemingly "normal" approach. I will not begin with the high-level language and end with the low-level implementation. Rather, I will use a bottom-up approach, as it is more natural to develop a scripting system in this manner. A significant advantage of this approach is that the code produces results immediately, so problems can be found much more easily, and before they become serious. This is in direct contrast to tutorials that begin with language theory and ask the programmer to maintain faith that somewhere down the line everything will work itself out and be free of bugs.

This first article will provide a simple overview to gain some perspective on what the purpose of a scripting system is, the problems it is usually intended to solve, and possible implementations. A simple example will be provided, and built upon in future articles.

A rough outline of future articles is as follows:

The next articles will first describe in more detail the low-level characteristics of the form of implementation that I will be writing about.
They will then move on to mechanisms for "embedding" this system into an already existing game or application.
Topics involving language theory, parsing, and compiling will possibly conclude the series.

This outline is deliberately rough, to leave room for new topics, depending mostly on feedback to this first article. So please, let me know what you think.

An Overview

Most useful programs, not just games, are not completely isolated systems; without some form of input, a program's capabilities are normally quite static (limited). Think of the difference between a "Hello World!" program, and a program that asks for your name, certain personal traits (which it may then process in some manner), and then spits out some kind of analysis. You could not achieve the same effect without taking some form of input unless you went to some ridiculous effort, such as creating a different program to suit each user's needs.

This would be insane.

This is why most applications are designed as a structure of rules and pipelines through which information flows and is processed, from the input data to the resulting output. It is akin to a machine. This is why the term "engine" is thrown around so often.

For many purposes, this is enough to obtain the functionality you desire from your application or game. But what happens if you want to be able to modify the rules? From a development standpoint, modifying rules which are hard-coded into an application can be very annoying, and in some cases bug-inviting. The annoyance can come in many forms, not the least of which is the need to recompile all components dependent on the source of the changes. This is where scripting comes in.

The main purpose of scripting from a development standpoint is to provide a way to make your application's "rules structure" as dynamic as possible. Game-dependent logic and data, therefore, become prime candidates for scripting.

However, a script itself has to run on top of code. Every operation executed in a script carries the cost of the operation itself plus the cost of the code that decodes and dispatches it. Because of this, scripted instructions inherently run more slowly than the hard-coded kind. For now, this makes multimedia components better candidates for remaining hard-coded, although scripts can still be used appropriately to perform some kind of initialization of such components.

Forms of Scripting

One possible way of implementing a system capable of executing scripts is to create a language to be interpreted on the fly. Because little preprocessing is done to translate the script into instructions better suited to the procedural nature of a computer, interpreters are, as far as I know, among the slowest implementations available.

The second way is to go a step further, doing as much preprocessing of a script as possible before execution, and then running it through a virtual machine. In this case the instructions are generated as bytecode, with each opcode representing a definitive action to perform, possibly accompanied by some data to use in the process. This idea of bytecode is very useful in other areas of game programming as well, such as in the networking aspect, where the information received from a server can be used to drive the state of the client's game.

Furthermore, treating data as code allows the code to be reordered on the fly according to some analysis of how it is currently being run, which makes runtime optimizations possible. I've heard that Java is capable of this, but for now this topic is beyond the scope of this article series.

The scripting system implemented in this series will be of the virtual machine variety.
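
To make the bytecode idea a little more concrete, here is a minimal sketch of a flat byte stream in which an opcode may be followed by one byte of data. It is not the system built in this series: the opcode names (OP_TALK, OP_REPEAT_TALK, OP_END), the one-byte operand layout, and the RunBytecode function are all assumptions made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical opcodes for illustration only; not the article's instruction set.
enum Op : std::uint8_t { OP_TALK = 0, OP_REPEAT_TALK = 1, OP_END = 2 };

// Decode and run a flat byte stream, returning the text that was "spoken".
// Assumed layout: OP_REPEAT_TALK is followed by one operand byte (a count).
std::string RunBytecode(const std::vector<std::uint8_t>& code)
{
    std::string out;
    std::size_t pc = 0;  // index of the next byte to decode
    while (pc < code.size())
    {
        switch (code[pc++])
        {
        case OP_TALK:
            out += "talk\n";
            break;
        case OP_REPEAT_TALK:
        {
            if (pc >= code.size()) return out;  // truncated stream: stop
            std::uint8_t count = code[pc++];    // operand byte follows the opcode
            for (std::uint8_t i = 0; i < count; ++i) out += "talk\n";
            break;
        }
        case OP_END:
            return out;
        default:
            return out;  // unknown opcode: stop rather than guess
        }
    }
    return out;
}
```

Feeding it the bytes OP_TALK, OP_REPEAT_TALK, 2, OP_END yields three lines of "talk". The same decode loop would work whether the bytes came from a compiled script or, as mentioned above, from a network packet.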

Starting from Familiar Ground

This first example is designed with simplicity in mind, so as not to distract from getting the system up and running. You will want to create a console application to use this example code as provided. This example will be object-based, but not necessarily object-oriented; the classes can therefore easily be replaced by structures for those dealing with a pure C mentality.

Let's say you have a very basic desire to see your computer speak on command. You may request that it talk a specified number of times for each execution of a particular script. In its simplest form, you would write such a script in an unrolled form. For example, here is a script that talks twice and then knows it has finished its execution:

talk
talk
end

Pretty basic for now, but it's enough to see some results and know you're on track. We will enumerate these operations:

enum opcode
{
  op_talk,
  op_end
};
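
With the operations enumerated, the "unrolled" form described earlier can also be generated rather than typed out by hand. The following is only a sketch: MakeTalkScript is a hypothetical helper name, and it works on bare opcodes, before any instruction or script abstraction is involved.

```cpp
#include <cstddef>
#include <vector>

// Repeated here only so the sketch compiles on its own;
// it mirrors the enumeration shown above.
enum opcode
{
  op_talk,
  op_end
};

// Hypothetical helper: build an unrolled list of opcodes that talks
// `times` times and then ends.
std::vector<opcode> MakeTalkScript(std::size_t times)
{
  std::vector<opcode> ops(times, op_talk);  // `times` copies of op_talk
  ops.push_back(op_end);                    // every script finishes with op_end
  return ops;
}
```

MakeTalkScript(2) produces the same talk, talk, end sequence as the hand-written script.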

We may choose to pair opcodes with data to make them more useful later on. It would be in our interest to make an abstraction now, so that we don't have to change a lot of code later on when we decide to encapsulate the pairing as an instruction:

// the basic instruction, currently just encapsulating an opcode
class Instruction
{
public:
  Instruction(opcode code) : _code(code)	{}
  opcode Code() const         { return _code; }
private:
  opcode	_code;
  //char*	_data;  // additional data, currently not used
};

Reasonably, a script is then a collection of these instructions. Because the list of instructions generally will be formed during an initialization process, it's ok to use an arrayed form for implementation, such as a vector. The arrayed form is also useful in later optimizations, and for random access:

#include <vector>

// the basic script, currently just encapsulating an arrayed list of instructions
class Script
{
public:
  Script(const std::vector<Instruction>& instrList)
    : _instrList(instrList) {}
  // the list must not be empty; every script holds at least op_end
  const Instruction* InstrPtr() const { return &_instrList[0]; }
private:
  std::vector<Instruction>	_instrList;
};

Given a pointer to the beginning of a list of instructions, all that remains necessary is a procedure for iterating through the list and executing each instruction:

// note that _instrPtr must point to a valid list of instructions
const Instruction* _instr = _instrPtr;	// set our iterator to the beginning
while (_instr)	// the end operation will set _instr to 0
{
  switch(_instr->Code())
  {
  case op_talk:
    std::cout << "I am talking." << std::endl;
    ++_instr;    // iterate
    break;
  case op_end:
    _instr = 0;  // discontinue the loop
    break;
  default:
    _instr = 0;  // unknown opcode: stop instead of looping forever
    break;
  }
}

For the sake of convenience, you will probably want to encapsulate this functionality into its own class, and allow it to internally manage the instruction lists (as scripts). This would be the virtual machine, provided with useful management utilities for loading and selecting scripts:

#include <cassert>   // assert
#include <iostream>  // std::cout, used by Execute()
#include <vector>

// rudimentary virtual machine with methods inlined for convenience
class VirtualMachine
{
public:
  VirtualMachine()
    : _scriptPtr(0), _instrPtr(0), _instr(0), _scriptCount(0) {}
  // a very basic interface
  inline void Execute(size_t scriptId);
  size_t Load(const Script& script)   { return AddScript(script); }
private:  // useful abstractions
  // pointers used as non-modifying dynamic references
  typedef const Script*       ScriptRef;
  typedef const Instruction*  InstrRef;
private:  // utilities
  size_t AddScript(const Script& script) // add script to list and retrieve id
  {
    _scriptList.push_back(script);
    return _scriptCount++;
  }
  void SelectScript(size_t index)    // set current script by id
  {
    assert(index < _scriptCount);  // make sure the id is valid
    _scriptPtr = &_scriptList[index];
    _instrPtr = _scriptPtr->InstrPtr();
  }
private:  // data members
  std::vector<Script> _scriptList;
  ScriptRef           _scriptPtr;    // current script
  InstrRef            _instrPtr;     // root instruction
  InstrRef            _instr;        // current instruction
  size_t              _scriptCount;  // track the loaded scripts
};

The virtual machine maintains a list of scripts that have been loaded as a vector. It also internally maintains a count of the number of scripts so that an offset (id) into the vector can be returned upon loading a script, allowing it to be stored. This makes it very easy to execute a pre-loaded script by simply passing that offset to the machine.

Although currently unnecessary, it also keeps track of the currently executing script. This can be useful if the script contains more than just a list of instructions, as it will in a future article.

Its Execute() method uses the procedure previously described:

void VirtualMachine::Execute(size_t scriptId)
{
  SelectScript(scriptId);  // select our _instrPtr by script ID
  _instr = _instrPtr;      // set our iterator to the beginning
  while (_instr)
  {
    switch(_instr->Code())
    {
    case op_talk:
      std::cout << "I am talking." << std::endl;
      ++_instr;  // iterate
      break;
    case op_end:
      _instr = 0;  // discontinue the loop
      break;
    default:
      _instr = 0;  // unknown opcode: stop instead of looping forever
      break;
    }
  }
}

A side note about OOP:

Using an object-oriented approach, you could eliminate this switch statement and derive specific instruction types from a base instruction type with some kind of virtual Process() command. To add support for a new instruction, you would simply inherit from a base instruction class, and isolate its specific processing to that class. Lists of these instructions would of course have to support polymorphism, for example as a vector of pointers to instructions, or some equivalent.

This extensible approach can be very convenient, and is worthy of some investigation. In my own toy experiment, however, it ran at roughly one-third the speed of my non-OO VM system, which is a pretty significant performance hit. Later on in development, you will probably want to optimize the heck out of your VM's processing loop. The overhead introduced with the OO version, at least in my own experience, is not worth it. I encourage the curious to explore this some more, however, as I probably did not perform the best test possible. And please give me some feedback if you do! That's it for the side note.
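
For the curious, here is roughly what that object-oriented variant could look like. This is a sketch under my own assumptions, not the code used in the experiment mentioned above: the class names (InstructionBase, TalkInstruction, EndInstruction), the bool return value of Process(), and the use of std::unique_ptr for the vector of pointers are all illustrative choices.

```cpp
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Base instruction: Process() performs the action and returns
// false when execution of the script should stop.
class InstructionBase
{
public:
  virtual ~InstructionBase() {}
  virtual bool Process(std::ostream& out) const = 0;
};

class TalkInstruction : public InstructionBase
{
public:
  bool Process(std::ostream& out) const override
  { out << "I am talking.\n"; return true; }
};

class EndInstruction : public InstructionBase
{
public:
  bool Process(std::ostream&) const override { return false; }
};

// A polymorphic script: a vector of owning pointers to instructions.
typedef std::vector<std::unique_ptr<InstructionBase> > PolyScript;

// The switch statement is replaced by a virtual call per instruction.
void ExecutePoly(const PolyScript& script, std::ostream& out)
{
  for (std::size_t i = 0; i < script.size(); ++i)
    if (!script[i]->Process(out))
      break;
}

// Build and run "talk, talk, end", returning what was printed.
std::string RunTalkTwicePoly()
{
  PolyScript script;
  script.push_back(std::unique_ptr<InstructionBase>(new TalkInstruction));
  script.push_back(std::unique_ptr<InstructionBase>(new TalkInstruction));
  script.push_back(std::unique_ptr<InstructionBase>(new EndInstruction));
  std::ostringstream out;
  ExecutePoly(script, out);
  return out.str();
}
```

Adding a new instruction means adding a new class and touching nothing else. The price is one virtual call and one pointer indirection per instruction, which is one plausible source of the slowdown described above.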

Now, let's see how we would use these components to create and execute a script which talks twice, and then ends:

VirtualMachine vm;

// build the script
std::vector<Instruction> InstrList;
InstrList.push_back(Instruction(op_talk)); // talk twice
InstrList.push_back(Instruction(op_talk));
InstrList.push_back(Instruction(op_end));  // then end
Script script(InstrList);

// load the script and save the id
size_t scriptID = vm.Load(script);

// execute the script by its id
vm.Execute(scriptID);

Conclusion

In the next article, I will probably get into some more interesting topics regarding instruction data, and some form of registered data (variables). This will lead into some simple mathematical functionality at the very least.

I didn't want to get too buried in example code this time around as the introduction was quite lengthy. Hopefully this was at least enough to be of some inspirational value until a more comprehensive second article. The main thing to keep in mind is that you want an efficient implementation, or else you'll end up with a system that drains all the processing power needed for your game. You will want algorithms that allow for optimizations later on for this purpose, but of course don't sacrifice the clarity of your code too early.

I'd like to hear any questions, criticism, preferences, advice, corrections of mistakes I made, or scolding (if deserved) you'd like to express. I'll keep an eye on the forum discussion, but you can always email me: glr9940@rit.edu

Date this article was posted to GameDev.net: 1/14/2002
(Note that this date does not necessarily correspond to the date the article was written)