Creating a Scripting System in C++ Part V: Functionality
by by Greg Rosenblatt

Get the code for this article here

Agenda

During the course of this article I will illustrate a way to implement the following:

  • primitives to support the abstraction of a high-level function
  • calls to hard-coded "external" functions

Scripted functions will simply allow your scripts to become more modular, while externally called functions will allow your scripting system to interact with an existing code-base. The second method is how we will create a system that can grab an interface of function pointers from something like a game engine, and then be able to operate this system by calling scripts.

Improving the Stack

As I said at the end of the last article, the tools for expressing the higher-level concept of functions in scripts have already been provided. Although this is true, the ScopeStack can be enhanced to make things easier for us when trying to pass parameters to a new scope without having to hack around the system. One way to do this is to introduce a second parameter to its PushScope() member which will allow the new scope pushed to include variables pushed onto the stack during the previous scope.

In ScopeStack "would" be:

void PushScope(size_t scopeSize, size_t numParams)
{
    assert(_stack.size()>=numParams);       // make sure our stack is as large as we think
    _indices.push_back(_stackBase);         // store the current index
    _stackBase = _stack.size()-numParams;   // get new scope offset
    _stack.resize(_stackBase+scopeSize);    // accommodate new scope data
}

This is good, except that it could cause problems later on if we are not careful. The danger is hinted at by the assertion. The problem here is that if the proper number of variables intended to be parameters aren't pushed onto the stack before calling PushScope(), the new scope will actually leak into data that is strictly intended for use by the old scope. Furthermore, when this scope is popped off the stack, the amount of data the scope had leaked into will be lost. So let's fix the inherent problem in the ScopeStack to deal with this issue.

We will make use of an additional data member in the ScopeStack to store the next base offset for the stack in the event of a scope being pushed:

size_t	_nextBase;

A few changes to the scope interface's member functions will now determine the starting offset for the next scope to be pushed onto the stack initially, and every time a new scope is pushed. We have succeeded in squashing this possible bug preemptively.

Actual ScopeStack code:

// scope interface
void SetInitialScope(size_t scopeSize)
{
    assert(_stack.empty() && _indices.empty()); // initial scope should only be set once
    _nextBase = _stackBase+scopeSize;           // preemptively set the next offset
    _stack.resize(_nextBase);                   // accommodate initial scope data
    _indices.push_back(_stackBase);             // register the base offset
}        
void PushScope(size_t scopeSize)
{
    _stackBase = _nextBase;                 // get new scope offset
    _indices.push_back(_stackBase);         // register this offset
    _nextBase = _stackBase+scopeSize;       // store the next index
    _stack.resize(_nextBase);               // accommodate new scope data
}
void PopScope()
{
    assert(!_indices.empty());      // don't try to pop a non-existent scope
    _stack.resize(_stackBase);      // shrink back to the end of the former scope
    _nextBase = _stackBase;         // store what the next index would be
    _indices.pop_back();            // take that index off the index stack
    _stackBase = _indices.back();   // grab the base index for the former scope
}

With the changes to our stack, we will need to modify the execution's function to expose its stack state. The difference is that the end of the current scope is now found using the next depth, rather than the current one. If there is no next depth, the end of the stack is used. There was also a bug where the value output was not being taken relative to the first depth (index 0), as it should have been. Well, no harm done. Here is the modified function:

void VirtualMachine::Execution::ExposeStackState(ostream& out) const
{
    size_t numScopes = _stack.ScopeDepth();
    size_t stackSize = _stack.Size();
    size_t index, label, curDepth = 0, curEnd = 0;
    for (index = 0, label = 0; index < stackSize; ++index, ++label) // for the entire stack
    {
        if (index == curEnd)    // for every new scope in the stack
        {
            label = 0;
            out << "Scope Depth: " << curDepth << endl; // output the scope depth
            if (curDepth < numScopes)
            {
                if (curDepth+1 < numScopes)
                    curEnd = _stack.GetScopeIndex(curDepth+1);  // get end of this scope
                else
                    curEnd = _stack.Size(); // end of stack
                ++curDepth; // advance to next scope depth
            }
            else
                curEnd = 0;
        }
        out << "    " << label << ": " << static_cast<int>(_stack.GetAtDepth(index, 0)) << endl;  // output each value
    }
}

To help during debugging, I have also introduced an instruction that will expose the stack state at our whim. It simply calls ExposeStackState() with the std::cout object as the output stream:

// expose state
case op_expose_state:
    ExposeStackState(cout);
    ++_instr;
    break;

Scripted Functions

The first step to supporting scripted functions is to add instructions that make use of the stack's scoping capabilities:

case op_push_scope:
    _stack.PushScope(_instr->Data()[0]);
    ++_instr;
    break;
case op_pop_scope:
    _stack.PopScope();
    ++_instr;
    break;

The instruction to push a new scope onto the stack will require a single instruction parameter to describe the size of the scope being pushed.

So how exactly does this describe a higher-level function? Let's say we have the following useless function in a higher-level language:

void DumbFunc(int param1, int param2)
{
	int first, second, third;
}

And we make a call to it somewhere:

DumbFunc(11, 7);

Such a function requires a stack scope that can support space for both parameters, as well as three additional variables (a total of five variables). The calling code needs to push the values of 11 and 7 onto the stack for use (or non-use, as the case may be) in DumbFunc's code. Our entry into such a function from calling code at this point might look like the following in psuedo-script:

push_const 11
push_const 7
jump DumbFunc
CallingCodeDestination:
...

DumbFunc:
push_scope 5
// DumbFunc does nothing
pop_scope
jump CallingCodeDestination

Before the scope is popped, the stack from left to right should now look like this:

11 7 X X X

Where X is an undefined value. The undefined values are expected because DumbFunc doesn't initialize the values for its other three variables.

Now that the scoping issues are handled, there is another problem to solve. How does the function know where to jump in the calling code? Unless you are inlining, which I will discuss in a later article, you would certainly like to have only one instance of a function's code within your script, rather than a duplicate for every single time you call the code (where each duplicate would have a unique returning jump destination). Duplicates would make your script larger than necessary.

One thing that might come to mind initially is to store another InstrRef which is set to hold the destination before jumping to the function code. The function would then jump to this location. This would work if you never had functions that called more functions, but that is a restriction we can do without.

Again, a stack provides a desirable solution. With a stack, we can trace the path we take like in a manner that is similar to unraveling a string while exploring a maze. As long as we continue to push the desired destination point onto such a stack before entering the function code, the function can then jump back to that destination later. Subsequent function calls will push a new destination onto the stack in a recursive fashion, rather than corrupting its parent's destination.

Here I provide an implementation for such a call stack:

// traces the call path through a script    
class CallStack
{
public:
    CallStack(InstrRef& current) : _instr(current)  {}
    void Call(InstrRef funcBegin)
    {
        _stack.push_back(_instr+1); // set the return point to the next instruction
        _instr = funcBegin;         // set current instruction to the function starting point
    }
    void Return()
    {
        assert(!_stack.empty());
        _instr = _stack.back();     // return to proper instruction in calling code
        _stack.pop_back();
    }
private:
    std::vector<InstrRef>   _stack; // stores function return destinations
    InstrRef&               _instr; // reference to current instruction
};

As you can see, it requires that a reference to the root and current instruction registers be passed to it during creation. The CallStack will be placed in the Execution class, which will construct the CallStack using its instruction pointers:

// construct callstack with current instruction
Execution() : _scriptPtr(0), _instrPtr(0), _instr(0), _callStack(_instr)    {}

InstrRef            _instr;     // current instruction
CallStack           _callStack; // traces call path

In our pseudo-script, our new concept would be expressed like this:

push_const 11
push_const 7
call_func DumbFunc
...

DumbFunc:
push_scope 5
// DumbFunc does nothing
pop_scope
end_func

The call_func instruction would use the CallStack's Call(), while end_func would use its Return() function. A handler for call_func would require a single data parameter that stores the position of the function to be called. So let's create these two new instruction handlers:

case op_call_func:
    _callStack.Call(_instrPtr + _instr->Data()[0]);
    break;
case op_end_func:
    _callStack.Return();
    break;

Simple enough. Note how the instruction offset is added to the root instruction pointer when being passed to Call(). It's very similar to the jump instructions created last time.

Now suppose DumbFunc were to return an int:

int DumbFunc(int param1, int param2)
{
	int first, second, third;
	return 1;
}

With the way we currently have things set up, we will need a way to place the value returned at the top of the stack after returning from the function call. To avoid the use of hacks, a simple solution is to create a general-purpose register that can store the returned value during stack state changes.

In class Execution:

int	_register;  // general purpose register

To make use of it, we simply need instructions to handle access:

// registers
case op_store:
    _register = _stack.Top();
    _stack.Pop();
    ++_instr;
    break;
case op_load:
    _stack.Push(_register);
    ++_instr;
    break;

Currently, we only need to support this single register. Later on, if we find a need for more, we can always adjust these handlers. The op_store instruction will store the value at the top of the stack in a register, and pop it off. Then op_load can be used to push the register's value back onto the stack at a later time.

The new DumbFunc pseudo-script would look like this:

push_const 11
push_const 7
call_func DumbFunc
load	// if we want to use the return value
...

DumbFunc:
push_scope 5
push_const 1	// using a return value of 1
store		// save the return value
pop_scope
end_func

Recursive Test

We now have what it takes to implement a recursive factorial. Not that a recursive factorial algorithm is to be preferred over an iterative version, but such an algorithm can still test that the system can now make recursive function calls. At a high level, our function would look like this:

int Factorial(unsigned int input)
{
	if (input > 1)
		return input * Factorial(input-1);
	return 1;
}

We would like to call this function with a particular input parameter, and then output the result. We have to translate this into something our system will understand. At the outermost scope, our program looks like this:

push_const input
call_func factorial
load
output
end

The input value is pushed onto the stack to be used as a parameter as the factorial function is called. The return value is placed onto the stack, and then shown to us. The tricky part is the factorial function itself. Using the same method for converting from high-level to our machine code, we will come up with something like this:

factorial:
push_scope 1
push_var 0
// if (input > 1)
push_const 1
greater
jump_if_false A
// Factorial(input-1)
push_var 0
push_const 1
subtract
call_func factorial
load
// input *
push_var 0
multiply
jump B
A:
// 1
push_const 1
B:
// return
store
pop_scope
end_func

The factorial of 7 should come out to be 5040. So we will put everything together with 7 as the input parameter, and resolve the label offsets. (Note the numbering of the lines in this example to more clearly show the instruction indices.)

0:	push_const 7
1:	call_func 5
2:	load
3:	output
4:	end
5:	push_scope 1
6:	push_var 0
7:	push_const 1
8:	greater
9:	jump_if_false 18
10:	push_var 0
11:	push_const 1
12:	subtract
13:	call_func 5
14:	load
15:	push_var 0
16:	multiply
17:	jump 19
18:	push_const 1
19:	store
20:	pop_scope
21:	end_func

When we execute this script we are pleased to see that the output is 5040, as expected.

External Functions

One of the most important things for a scripting system to be able to do is bind with code that you have written to provide non-trivial and/or high-performance functionality for an application. Games and other multi-media applications normally require such functionality. Creating hard-coded instruction types to deal with one particular external system or another is not very flexible, and will prevent you from easily reusing the same scripting system for other purposes. A consistent method of calling external code is to be preferred, and function pointers are a good way to achieve this.

Callbacks

The relationship between the virtual machine and some external system will be described through callbacks that make use of pointers to member functions of the external system. These callbacks, as well as a reference to an external system, will be stored in the virtual machine, to be called by scripts that use a set of calling instructions, which we will create in the next section.

To illustrate a simple, yet flexible, callback system, the member functions called will take an argument vector as a parameter and possibly return some value. This is much like the common "int main(int argc, char* argv[])", except the argument count will be embedded in the vector's size(). A generic solution to defining such callbacks follows:

template <class T, typename ReturnType, typename ArgType>
class Callback
{
public:
    typedef std::vector<ArgType>    ArgVector;
    typedef ReturnType (T::*Func)(const ArgVector&);
    explicit Callback(Func f) : _f(f)   {}
    ReturnType operator()(T& t, const ArgVector& argv) const   { return (t.*_f)(argv); }
private:
    Func    _f;
};

template <class T, typename ReturnType, typename ArgType>
Callback<T, ReturnType, ArgType> make_callback(ReturnType (T::*func)(const std::vector<ArgType>&))
{
    return Callback<T, ReturnType, ArgType>(func);
}

These templates form a simple interface for creating callbacks from member function pointers, and then calling them through a given object reference. The return and argument types can be chosen at our discretion. Because the Callback constructor is explicit to avoid implicitly creating callbacks from function pointers, the "make_callback" template is useful for constructing such callbacks without having to redundantly define the template parameters of the particular Callback being created. The reasoning is that the compiler should already know what kind of callback to create given a particular function pointer.

Using the Callbacks

Now that we have these callbacks, it's time to put them to use. The first thing to do is establish this relationship between the virtual machine and an external system. To do this we will template the virtual machine according to a SystemType. Some clerical changes have to be made to foster this, including moving the virtual machine code from the cpp into the header. We can then define a "Command" as a Callback in terms of the SystemType. For this example, we will be using a callback type that takes integers as arguments and returns an integer, so we can also define the argument vector to save us some typing:

template <class SystemType>
class VirtualMachine
{
private:
    // example system command taking int arguments, and returning an int
    typedef Callback<SystemType, int, int>  Command;
    typedef std::vector<int>                ArgVector;
. . .

When instantiating a virtual machine, we will pass it a system reference to bind itself to, as well as the system's command interface in the form of a list of function pointers. These pointers will then be consolidated into a list of commands. The straightforward procedure for creating system callbacks uses the "make_callback()" utility defined earlier:

VirtualMachine(SystemType& system, const Command::Func funcList[], size_t numFuncs)
    : _system(system)    { CreateSystemCallbacks(funcList, numFuncs); }

void CreateSystemCallbacks(const Command::Func funcList[], size_t numFuncs)
{
    for (size_t i = 0; i != numFuncs; ++i)
        _commandList.push_back(make_callback(funcList[i]));
}

The last thing to implement purely with the virtual machine itself is a way to call one of its stored commands by ID, and passing it an argument list. Needless to say, when we begin using a higher-level language, system call IDs will be resolved during compilation. This functionality makes straightforward use of the Callback:

template <class SystemType>
int VirtualMachine<SystemType>::CallCommand(size_t id, const ArgVector& argv)
{
    assert(id < _commandList.size());
    return _commandList[id](_system, argv);
}

A few changes now have to be made to the virtual machine's execution class. First, in order to be able to make a system call through the virtual machine, it needs to be passed a reference to the machine when it is told to execute, which is simple enough:

void Execute(VirtualMachine& vm, ScriptRef scriptPtr);

Using new instructions that will be provided shortly, a script will be able to pass variable or constant arguments to an argument vector stored in the execution. Using a provided reference to a virtual machine, and this argument vector member, a system call by ID is simple to make from the execution:

int CallSystemCommand(VirtualMachine& vm, size_t id)  { return vm.CallCommand(id, _argv); }

Finally, to be implemented are three new instructions for working with system commands:

// system commands
case op_pusharg_const:
    _argv.push_back(_instr->Data()[0]);
    ++_instr;
    break;
case op_pusharg_var:
    _argv.push_back(_stack[_instr->Data()[0]]);
    ++_instr;
    break;
case op_call_command:
    _register = CallSystemCommand(vm, _instr->Data()[0]);
    _argv.clear();
    ++_instr;
    break;

The first two instructions resemble those used for pushing variables or constants onto the stack. The main difference is the destination of the push, which is now the argument vector. The next instruction simply calls the system command through the execution's interface, and stores the result in the general-purpose register. The argument vector is then cleared. Use of these instructions is quite simple.

This is an example of calling a command with an ID of 0, taking two parameters. The first parameter is the value 5, and the next is the value of the variable at index 3. The result is then placed on the stack (which is optional):

pusharg_const 5
pusharg_var 3
call_command 0
load

Testing Commands

To properly test this new implementation, we will need a simple example system to bind with a virtual machine. The system I provide is simply used to illustrate the processes of binding and using. The functionality provided by it will be essentially trivial. The class doesn't even make use of any data members. One command will draw a certain number of asterisks (stars) with a defined spacing between each. The next will compute a factorial (Another factorial test! Yay!) hard-coded. Notice that the number of functions provided to the virtual machine as an interface is enumerated. This is simply one dirty way of ensuring that a proper number of function pointers are passed:

class System
{
public:
    enum { num_commands = 2 };
    // vm interface
    int DrawStars(const std::vector<int>& argv)
        { if (argv.size() < 2) return -1; OutputStars(argv[0], argv[1]); return 0; }
    int CalcFactorial(const std::vector<int>& argv)
        { if (argv.size() < 1) return -1; return Factorial(argv[0]); }
private:
    void OutputStars(size_t numStars, size_t padding)
    {
        std::string padString;
        size_t i;
        for (i = 0; i < padding; ++i)
            padString += ' ';
        for (i = 0; i < numStars; ++i)
            std::cout << '*' << padString;
        std::cout << std::endl;
    }
    int Factorial(size_t input)
    {
        // initialize
        int counter = input;
        int result = 1;
        // iterative loop
        while (counter > 1)
        {
            result *= counter;
            --counter;
        }
        // output
        return result;
    }
};

The only changes necessary in the main.cpp to make use of a system is to instantiate one, provide an array of function pointers to its interface, and alter the declaration of the virtual machine:

// typedef for simplification
typedef Callback<System, int, int>::Func SystemFunc;
SystemFunc commandList[System::num_commands] =
{
    &System::DrawStars,
    &System::CalcFactorial
};

System system;
VirtualMachine<System> vm(system, commandList, System::num_commands);

Two example scripts follow, one to test each of the system commands.

Starcommand:

// draws 7 stars with a spacing of 2
pusharg_const 7
pusharg_const 2
call_command 0
end

Factorialcommand:

// computes the factorial of 9 and displays the result (which should be 362880)
pusharg_const 9
call_command 1
load
output
end

Conclusion

The most important parts of our run-time system have now been completed. Our virtual machine's capability for binding with other systems is very powerful, and has lots of avenues that can be explored.

Inlining of script functions is a topic that I would eventually like to cover, but in a part related to compiler optimizations. It doesn't really make sense to cover it yet, as writing a function to be inline itself in a script is trivial, and we've been doing it all along. The interesting things come when compiling code from a higher-level language.

The callback solution introduced in this part of the series could have made use of implementations found in other libraries, such as the popular boost. Such a route was not taken as such generic libraries often take into account everything under the sun, which would have certainly obfuscated the topic being conveyed.

Currently, the utilities for binding a virtual machine to an external system are a bit unwieldy, and require a bit of diligence to make sure the system's interface is passed appropriately. Perhaps we can improve upon this a little bit in the future.

The next article will probably begin dealing with creating a compiler. To reduce the learning curve of this series, the language to be compiled will probably be very C-like. Further development of the virtual machine itself may be continued at a later time, though the emphasis of the next wave of articles will be primarily on the compilation process.

As always, feel free to use the forum discussion, or contact me by email: glr9940@rit.edu

Discuss this article in the forums


Date this article was posted to GameDev.net: 1/28/2003
(Note that this date does not necessarily correspond to the date the article was written)

See Also:
Featured Articles
Scripting Languages

© 1999-2011 Gamedev.net. All rights reserved. Terms of Use Privacy Policy
Comments? Questions? Feedback? Click here!