Creating a Scripting System in C++ Part III: Dynamic Loading
by Greg Rosenblatt

A Small Detour

In this article, I will be taking a break from developing the runtime component of the scripting system in order to deal with more pressing issues. Although I would have liked for the issues dealt with in this article to be minor enough to allow further development afterwards, they take quite a bit of explanation. This article is a bit larger than the last two without any additional development, and would grow to be monstrous with anything else attached. For this reason, I have opted to set these issues apart, and continue developing the virtual machine next time.

Cleaning Up

The first thing to do is to fix a little problem from last time. The Instruction class did not copy its data properly when an Instruction was itself copied, which caused problems for some readers, though I was not aware of it at the time. What it needs is a deep copy of the data in a copy constructor and an assignment operator. It can be corrected rather easily, as I showed in the forum discussion for the second article:

// the basic instruction
class Instruction
{
public:
    Instruction(opcode code) : _code(code), _datsize(0), _data(0) {}
    Instruction(opcode code, const char* data, size_t dataSize)
        : _code(code), _datsize(dataSize), _data(new char[dataSize])
    { memcpy(_data, data, dataSize); }  // we must store our own internal copy
    Instruction(const Instruction& instr)
        : _code(instr._code), _datsize(instr._datsize), _data(new char[_datsize])
    { memcpy(_data, instr._data, _datsize); }
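    Instruction& operator=(const Instruction& instr)
    { if (this != &instr) {     // guard against self-assignment
          delete[] _data;
          _code = instr._code;
          _datsize = instr._datsize;
          _data = new char[_datsize];
          memcpy(_data, instr._data, _datsize); }
      return *this; }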
    ~Instruction()  { delete[] _data; }   // and we must then clean up after it
    opcode Code() const         { return _code; }
    const char* Data() const    { return _data; }
private:
    opcode  _code;
    size_t  _datsize;
    char*   _data;  // additional data
};

One thing you should notice is that these additions now make up just about 50% of the Instruction class. It's bloated. We should improve this situation by separating the deep-copy functionality from Instruction. We do this by creating a simple Buffer class that will encapsulate the data.

// data buffer to encapsulate the deep-copy functionality
class Buffer
{
public:
    Buffer() : _size(0), _data(0)   {}
    Buffer(const char* data, size_t size)
        : _size(size), _data(new char[_size])
    { memcpy(_data, data, _size); }
    Buffer(const Buffer& buf) : _size(buf._size), _data(new char[_size])
    { memcpy(_data, buf._data, _size); }
    ~Buffer()   { delete[] _data; }
    Buffer& operator=(const Buffer& buf)
    {
        if (this != &buf)   // guard against self-assignment
        {
            delete[] _data;
            _size = buf._size;
            _data = new char[_size];
            memcpy(_data, buf._data, _size);
        }
        return *this;
    }
    const char* Data() const    { return _data; }
private:
    size_t  _size;
    char*   _data;
};

// the basic instruction
class Instruction
{
public:
    Instruction(opcode code) : _code(code)  {}
    Instruction(opcode code, const char* data, size_t dataSize)
        : _code(code), _data(data, dataSize)  {}
    opcode Code() const         { return _code; }
    const char* Data() const    { return _data.Data(); }
private:
    opcode  _code;
    Buffer  _data;  // additional data
};

The Instruction class is now more concise, and still maintains its interface and functionality.
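As a point of comparison, and not the approach the sample code takes, the same ownership rules can be had by letting std::vector<char> hold the bytes. The sketch below is an assumption-laden variant: the class name VecInstruction, the DataSize() accessor, and the trimmed opcode list are all hypothetical and exist only for illustration. Because the vector copies and frees its own storage, no hand-written copy constructor, assignment operator, or destructor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum opcode { op_talk = 0, op_print, op_end };  // trimmed list, for this sketch only

// Hypothetical variant of Instruction: std::vector<char> owns the bytes,
// so the compiler-generated copy constructor, assignment operator and
// destructor already perform the deep copy and the clean-up.
class VecInstruction
{
public:
    VecInstruction(opcode code) : _code(code) {}
    VecInstruction(opcode code, const char* data, std::size_t dataSize)
        : _code(code)
    {
        assert(data != 0 || dataSize == 0);   // a null pointer is only valid with no data
        _data.assign(data, data + dataSize);  // store our own internal copy
    }
    opcode Code() const          { return _code; }
    // returns 0 when the instruction carries no additional data
    const char* Data() const     { return _data.empty() ? 0 : &_data[0]; }
    std::size_t DataSize() const { return _data.size(); }
private:
    opcode            _code;
    std::vector<char> _data;  // additional data
};
```

The trade-off is one extra header and slightly less control over allocation; in exchange, copying mistakes of the kind fixed above become impossible to write.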

This now leads us to the second part of tidying up. The file is somewhat cluttered. We should move the new Buffer class to a separate file, since it is more of a supporting utility. We can also move some of the larger inlined functions to a source file.

Lastly, as I mentioned at the end of the previous article, the code presented in main() is becoming a nightmarish hack. It's definitely time to introduce a utility for loading scripts dynamically. Ultimately, we will want to be able to load a script from file. However, before we deal with file handling, we will need to develop a parser to properly read in lines of text as script instructions. Simple console I/O will suit us best for this at the moment: introducing too many untested things at once runs the risk of errors that are harder to trace.

To set up our testing, we can use the getline() functionality of cin to read an entire line of input into a buffer at once:

const int BUFFER_SIZE = 256;

int main()
{
    char buffer[BUFFER_SIZE];

    // begin reading text lines from user
    cout << "Input script instructions:" << endl;
    cin.getline(buffer, BUFFER_SIZE);

    // an input processing loop will follow once we can scan the input

    return 0;
}

Loading From Text

We now must decide how we are going to process scripts from a simple text-based format. The format will use a syntax similar to the one illustrated in examples from the previous articles. The only real difference is a lack of commas for additional simplicity.

A single instruction will have syntax of the form:

opcode arg1 arg2 ... argN

Where the opcode and possible arguments are delimited by white space. A script would then be a list of such instructions, with each instruction on a new line:

opcode1 arg1 arg2 ... argN
opcode2 arg1 arg2 ... argN
...and so on

In order to utilize the information present in this text-based format, we will have to construct a simple parser. The first part of this parser will scan a line of text and create a list of tokens. For the sake of simplicity, these tokens will simply be of type char (like our instruction and variable data), so that they can correspond directly to a value.

An interface for the Scanner:

typedef std::vector<char> CharArray;  // an array of char tokens

class Scanner
{
public:
    bool ScanLine(const std::string& line);  // tokenize a line and return success
    const CharArray& GetTokens() const;      // get tokens from most recent scan
};

With this interface in hand, we can now outline our test. After every successful scan, the tokens will be used to create an instruction, which will be placed at the end of a list of instructions. Any failure to scan a line will notify the user and not create an instruction for that particular scan. To complete the entering of a list of instructions, the user inputs "end" and a final op_end instruction will be appended to the list, to ensure a script based off these instructions will not run beyond its own length.

int main()
{
    VirtualMachine vm;
    Scanner scanner;
    vector<Instruction> instrList;

    char buffer[BUFFER_SIZE];
    string input;

    // begin reading text lines from user
    cout << "Input script instructions:" << endl;
    // stop on "end", or when a read fails (end of input or an overlong line)
    while (input != "end" && cin.getline(buffer, BUFFER_SIZE))
    {
        input = buffer; // may as well use the string
        if (scanner.ScanLine(input))
        {
            const CharArray& tokens = scanner.GetTokens();
            opcode code = static_cast<opcode>(tokens[0]);
            // &tokens[0] + 1 stays valid even when the opcode has no arguments
            instrList.push_back(Instruction(code, &tokens[0] + 1, tokens.size()-1));
        }
        else
            cout << "Invalid Instruction!" << endl;
    }

    // as a safety precaution, it couldn't hurt to have redundant ends
    instrList.push_back(Instruction(op_end));
    . . .

The user will then be prompted to provide the number of variables the script will need to store in order to run properly. With the instruction list and the variable count, the script will be created, loaded, executed, and have its variable state exposed.

    . . .
    // obtain a variable count
    size_t varCount;
    cout << "Input required variables count: ";
    cin >> varCount;

    Script script(instrList, varCount);

    // load the script and save the id
    size_t scriptID = vm.Load(script);

    // execute script by its id
    cout << endl << "***EXECUTING SCRIPT***" << endl;
    vm.Execute(scriptID);

    // check out the variable states
    cout << endl << "***VARIABLE STATES***" << endl;
    vm.ShowVariableState();

    return 0;
}

This test puts a bit of faith in the user's ability to identify the number of variables required by a particular set of instructions. Without a mechanism for analyzing the tokens scanned for correctness, it will also fail if the user does not provide sufficient data with a particular opcode. As a result, this is definitely not suitable for a final interface for scripting, but will suffice for now to test the scanner. Now we need to complete the scanner's implementation.

Scanning

Given the text-based format created and a little thought, it is not too difficult to create a process by which to scan a line of text and break it into its constituent tokens. It may look something like this:

if line contains no content, break out prematurely
scan an opcode
for the entire length of the line:
1. skip over white space
2. scan in next numerical value

To handle the subsets of this process we will create a few private utilities for the scanner:

private:
    bool SkipSpacing(const std::string& line);
    bool ScanCode(const std::string& line);
    bool ScanNum(const std::string& line);

Each of these functions will make use of private data:

private:
    CharArray   _tokBuffer;
    size_t      _offset;

An offset into the string "line" and an array of tokens will be initialized at the start of ScanLine() and will be used and modified throughout by the specialized utility functions just described.

Now we can put some code into ScanLine():

bool Scanner::ScanLine(const std::string& line)
{
    // reset offset and token buffer
    _offset = 0;
    _tokBuffer.clear();

    // check for an empty line
    if (line.empty())
        return false;

    // check for valid line content
    if (!SkipSpacing(line))
        return false;

    // check for a valid opcode
    if (!ScanCode(line))
        return false;

    size_t len = line.length();
    while (_offset < len)   // scan args until the end of line
    {
        if (!SkipSpacing(line)) // get to next arg
            return true;        // unless we're done
        if (!ScanNum(line))
            return false;
    }

    return true;
}

Theoretically, this design should work once we implement the private functions. Take note that up until now, we have not had to deal directly with manipulating the string. The low-level implementation has been effectively separated from design of the actual process itself. The time has come to deal with these details, so here is an implementation:

bool Scanner::SkipSpacing(const std::string& line)
{
    // the <cctype> functions are undefined for negative values, hence the casts
    while (isspace(static_cast<unsigned char>(line.c_str()[_offset])))
        ++_offset;
    if (line.c_str()[_offset] == 0)
        return false;   // reached the end of the line
    return true;
}

bool Scanner::ScanCode(const std::string& line)
{
    size_t begin = _offset;
    while (isalpha(static_cast<unsigned char>(line.c_str()[_offset])))
        ++_offset;
    return MatchCode(std::string(line, begin, _offset-begin));
}

bool Scanner::ScanNum(const std::string& line)
{
    size_t begin = _offset;
    while (isdigit(static_cast<unsigned char>(line.c_str()[_offset])))
        ++_offset;
    if (_offset == begin)   // were any digits scanned?
        return false;
    std::string number(line, begin, _offset-begin);
    _tokBuffer.push_back(static_cast<char>(atoi(number.c_str())));
    return true;
}
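One caveat with ScanNum() as written: atoi() followed by a cast to char silently alters any argument that does not fit in a char (anything above 127 where char is a signed 8-bit type). A minimal sketch of a range-checked alternative follows. ScanByte is a hypothetical free function, not part of the Scanner class or the sample source, and the 0 to 127 limit is an assumption based on tokens being stored as char.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: reads a run of decimal digits starting at 'offset'
// and stores the value in 'out' only if it lies in the range 0..127.
// On success 'offset' is advanced past the digits; on failure both
// 'offset' and 'out' are left untouched.
bool ScanByte(const std::string& line, std::size_t& offset, char& out)
{
    assert(offset <= line.length());
    std::size_t pos = offset;
    int value = 0;
    while (pos < line.length() &&
           std::isdigit(static_cast<unsigned char>(line[pos])))
    {
        value = value * 10 + (line[pos] - '0');
        if (value > 127)        // would not fit in a signed char
            return false;
        ++pos;
    }
    if (pos == offset)          // were any digits scanned?
        return false;
    out = static_cast<char>(value);
    offset = pos;
    return true;
}
```

Checking the limit inside the loop also keeps the accumulator from overflowing on very long digit strings.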

The only thing not accounted for now is this call to MatchCode(). This call is here to separate the actual string manipulation code from the code that will identify whether a string is actually an opcode or not. To complete the scanner, we will need to introduce the capability for this necessary identification.

Matching

An array of strings containing the name of each opcode could easily be searched for the purpose of matching. Certainly this is not the most efficient way to do it. A more efficient algorithm might make use of a hash table to allow matching in constant time, but a simple array will work for this example.

typedef std::vector<std::string>    StringArray;
. . .

private:
    StringArray _codeNames;
    . . .

MatchCode() will assume that _codeNames contains the name of each opcode at the time it is called:

bool Scanner::MatchCode(const std::string& str)
{
    char codeVal;
    StringArray::iterator itr = _codeNames.begin();
    for (codeVal = 0; itr != _codeNames.end(); ++itr, ++codeVal)
    {
        if (str == *itr)
        {
            _tokBuffer.push_back(codeVal);
            return true;
        }
    }
    return false;
}

To make our assumption legitimate, we will provide the scanner with the list of opcode names at the time of its creation:

public:
    Scanner(const StringArray& codeNames) : _codeNames(codeNames)   {}
    Scanner(const std::string* codeNames, size_t nameCount)
    {
        _codeNames.resize(nameCount);
        const std::string* end = codeNames+nameCount;
        std::copy(codeNames, end, _codeNames.begin());
    }
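As an aside on the efficiency remark made earlier, the matching step could also be backed by a lookup table built once from the same ordered list of names. The sketch below uses std::map, which gives logarithmic rather than constant-time lookup; a true hash table would be a further step. CodeTable and its Match() method are hypothetical names, not part of the sample source.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical alternative to the linear search in MatchCode(): the table
// maps each opcode name to its position in the original list.
class CodeTable
{
public:
    CodeTable(const std::string* codeNames, std::size_t nameCount)
    {
        assert(codeNames != 0 || nameCount == 0);
        assert(nameCount <= 128);   // values must fit in a char token
        for (std::size_t i = 0; i < nameCount; ++i)
            _values[codeNames[i]] = static_cast<char>(i);
    }
    // stores the opcode value in 'codeVal' and returns true on a match;
    // 'codeVal' is left untouched when the name is unknown
    bool Match(const std::string& str, char& codeVal) const
    {
        std::map<std::string, char>::const_iterator itr = _values.find(str);
        if (itr == _values.end())
            return false;
        codeVal = itr->second;
        return true;
    }
private:
    std::map<std::string, char> _values;
};
```

With only a handful of opcodes the linear search is perfectly adequate; a table like this would start to matter only as the instruction set grows.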

In our testing code we now include an aggregate list of strings containing opcode names in the order you find them enumerated, and use it to create the scanner:

int main()
{
    string codeList[num_codes] =
    {
        "talk",
        "print",
        "set",
        "inc",
        "dec",
        "add",
        "end"
    };

    VirtualMachine vm;
    Scanner scanner(codeList, num_codes);
    . . .

One small change was made to the opcode enumeration list to allow an implicit count of the number of opcodes included:

// enumerations describing the bytecode instructions the VM is able to process
enum opcode
{
    op_talk=0,
    op_print,   // our new printing code

    // variable manipulators
    op_set, // char, char : destination index, value to set
    op_inc, // char : index to increment
    op_dec, // char : index to decrement
    op_add, // char, char, char : dest index, srce index1, srce index2

    op_end,

    // not an opcode
    num_codes   // value is the number of opcodes if first opcode has value of 0
};

It should now be possible to compile and execute the test to verify that the scanning process does indeed function correctly. With this complete we may move on to file handling.

Basic File Handling

At some point in your programming existence, if you want to make things easier on yourself, you are going to end up writing utilities to make simple file handling easier. That is, unless you have already found ready-made substitutes, which is fine too.

There are some options to choose from when implementing such utilities. Some are more standard than others, with certain platform specific options as well. Two of the more standard options are the stdio FILE, and the fstream. Streams always make things very intuitive, but the stdio FILE handle isn't that bad either.

In this case I will go with the stdio approach, since I already have the stdio implementation completed. If you already have your own file handling utilities, feel free to use them instead of the ones I provide. I will be using a very generic interface that can easily be substituted for.

To save some (actually, a large amount of) space, the snippets I include for file handling will only illustrate the interface. The downloadable code will contain everything, of course.

First, there is a base file class:

class File
{
public:
    // reset _fp after closing so the destructor does not close the file twice
    bool Close()       { assert(Good()); bool ok = (fclose(_fp) == 0); _fp = 0; return ok; }
    bool Good() const  { return (_fp != 0); }
    bool Bad() const   { return (_fp == 0); }
    // file detection
    static bool Exists(const std::string& fileName)  { . . . }
    // file length    (only use before reading/writing)
    long Length()      { /* use fseek() to determine file length */ }
protected:
    File() : _fp(0)    {}    // this class should not be directly instantiated
    ~File()            { if (_fp) fclose(_fp); }
    . . .
    FILE*   _fp;
};

The purpose of this class is to wrap a stdio FILE and provide basic utilities that should be accessible for most, if not all, file processing. It provides no actual I/O functionality; it is mainly a stepping-stone class that reduces copy/paste, so its constructor is protected to prevent outside code from instantiating it directly. The static function File::Exists() is there for conveniently detecting whether a file with the given name already exists on disk. Good() and Bad() are used to determine whether a file is usable for reading/writing. A File will also handle closing itself (if it's valid) when it leaves scope, although the option remains to Close() it manually.
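The body of Length() is elided in the listing, so here is one way the fseek() approach it mentions might look. This is a sketch under assumptions rather than the code from the download: FileLength is a hypothetical free function, and unlike the "only use before reading/writing" note on Length(), it restores the original position before returning. The result is a byte count for binary streams on common platforms; for text-mode streams the value may not match the number of characters actually read.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: returns the length of an open file in bytes,
// or -1 if the position cannot be queried. The current read/write
// position is restored before returning.
long FileLength(std::FILE* fp)
{
    assert(fp != 0);
    long start = std::ftell(fp);            // remember where we are
    if (start < 0)
        return -1;
    if (std::fseek(fp, 0, SEEK_END) != 0)   // jump to the end of the file
        return -1;
    long length = std::ftell(fp);           // offset of the end = length
    std::fseek(fp, start, SEEK_SET);        // go back to where we started
    return length;
}
```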

You will notice that I separate the functionality for file reading from that of file writing. In our case, there is no real need to mix reading and writing on the same file, and the separation is beneficial for two reasons. First, each interface becomes much more concise and easier to read. Second, it becomes impossible to accidentally read from a file you should be writing to, and vice versa: the compiler will not allow it, since a reader class only provides Read() and a writer class only provides Write(). It's somewhat like the safety that a strictly typed language gives.

The next two types of files are for reading and writing respectively. They also open in binary mode.

class ReaderFile : public File
{
public:
    ReaderFile()   {}
    ReaderFile(const std::string& fileName) { Open(fileName); }
    bool Open(const std::string& fileName)  { . . . }
    // reading
    int Read(char* dstBuf, size_t len)      { /* return an fread() call here */ }
    // for many basic variable types
    int Read(particular_type& val)   { /* typecast particular_type and call Read() with proper size*/ }
};

class WriterFile : public File
{
public:
    WriterFile()  {}
    WriterFile(const std::string& fileName)               { Open(fileName); }
    WriterFile(const std::string& fileName, bool append)  { Open(fileName, append); }
    bool Open(const std::string& fileName) { /* Truncates an existing file, rather than appending. */ }
    bool Open(const std::string& fileName, bool append)   { . . . }
    // writing
    int Write(const char* srcBuf, size_t len) { /* return an fwrite() call here */ }
    // for many basic variable types
    int Write(particular_type val)         { /* typecast particular_type and call Write() with proper size*/ }
};

These two classes are very useful for reading or writing binary data without having to include convoluted typecasting in your own handling code. Calls to Read() and Write() shield you from such details, and allow you to work directly with most data types you will be using. The WriterFile has the option to Open() a file for writing, and if it already exists, append the written data to the end. By default, however, it will overwrite an existing file.
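To make the "typecast and call Read()/Write() with the proper size" idea concrete, here is a minimal sketch using templates over a raw FILE pointer. WriteValue and ReadValue are hypothetical names, and the downloadable code may well use a set of overloads instead. The technique is only appropriate for plain data types such as int, float, or char, and the bytes written are not portable between machines with different sizes or byte orders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes the raw bytes of 'val' to a binary file.
// Returns true if the whole value was written.
template <typename T>
bool WriteValue(std::FILE* fp, const T& val)
{
    assert(fp != 0);
    const char* bytes = reinterpret_cast<const char*>(&val);
    return std::fwrite(bytes, sizeof(T), 1, fp) == 1;
}

// Hypothetical helper: reads sizeof(T) raw bytes back into 'val'.
// Returns false if the file ends before a whole value could be read.
template <typename T>
bool ReadValue(std::FILE* fp, T& val)
{
    assert(fp != 0);
    char* bytes = reinterpret_cast<char*>(&val);
    return std::fread(bytes, sizeof(T), 1, fp) == 1;
}
```

Calling code then reads as WriteValue(fp, someInt) or ReadValue(fp, someFloat), with the casts and sizes kept in one place.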

To deal with text-based files, there are two more related classes:

class TextReaderFile : public File
{
public:
    TextReaderFile()  {}
    TextReaderFile(const std::string& fileName)  { Open(fileName); }
    bool Open(const std::string& fileName)       { . . . }
    // reading
    int Read(char* buffer, size_t len)           { /* call fread() */ }
    int Read(std::string& str, size_t len)       { /* call fread() */ }
};

class TextWriterFile : public File
{
public:
    TextWriterFile()  {}
    TextWriterFile(const std::string& fileName)  { Open(fileName); }
    TextWriterFile(const std::string& fileName, bool append)  { Open(fileName, append); }
    bool Open(const std::string &fileName)       { /* Truncates an existing file. */ }
    bool Open(const std::string& fileName, bool append)       { . . . }
    // writing
    int Write(const char* str)          { /* call fwrite() */ }
    int Write(const std::string& str)   { /* call fwrite() */ }
};

The interfaces of these text-based versions of ReaderFile and WriterFile are similar to those of the binary versions. However, to keep things simple given their text-based nature, they only deal with strings in std::string or char array form.

So we have some file utilities. Why make these anyway? Would it not have been much easier to simply use file streams much like the test using console I/O? The short answer is yes. The longer answer is that these utilities will be very handy once it's necessary to work with binary files. Eventually we would have had to create these anyway. May as well make use of them now!

Loading From File

Using what we have, the simplest way to read in the data from a file written in our format would be to read the entire file into a string and be done with it:

// prompt user to enter a filename
cout << "Enter path/name of script file: ";
cin.getline(buffer, BUFFER_SIZE);

// attempt to open the file
TextReaderFile file(buffer);
if (file.Bad())
{
    cout << "Could not open file!" << endl;
    return 0;
}

// read in file data
string fileData;
fileData.resize(file.Length());
file.Read(fileData, fileData.length());

Sub-strings that represent each line of the file could then be fed into the scanner to build the instruction list:

size_t begin = 0, end = 0, lineNum = 1, varCount = NUM_VARIABLES;
// feed data into scanner and build instructions
while (end != string::npos)
{
    // grab a line from the file data
    end = fileData.find_first_of("\n", begin);
    string line(fileData, begin, end-begin);
    begin = end+1;  // move past '\n' character

    // scan the line
    if (scanner.ScanLine(line))
    {
        const CharArray& tokens = scanner.GetTokens();
        opcode code = static_cast<opcode>(tokens[0]);
        // &tokens[0] + 1 stays valid even when the opcode has no arguments
        instrList.push_back(Instruction(code, &tokens[0] + 1, tokens.size()-1));
    }
    else
    {
        cout << "Invalid Instruction!" << endl;
        cout << "Line number: " << lineNum << endl;
        cout << "Line: " << line << endl;
    }
    ++lineNum;
}

// as a safety precaution, it couldn't hurt to have redundant ends
instrList.push_back(Instruction(op_end));
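Note that the loop above hands every sub-string to the scanner, so a file that ends with a newline produces one final empty line that is reported as invalid, and files with Windows line endings may leave a '\r' at the end of each line if they are not translated on reading. One possible way to tidy this up is to split the text first. SplitLines below is a hypothetical helper, not part of the sample source; it keeps blank lines in the middle of the file so that index + 1 still matches the line number.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits 'text' into lines at '\n', strips a trailing
// '\r' from each line, and does not emit an extra empty line when the text
// ends with a newline.
std::vector<std::string> SplitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string::size_type begin = 0;
    while (begin < text.length())
    {
        std::string::size_type end = text.find('\n', begin);
        if (end == std::string::npos)
            end = text.length();            // last line has no '\n'
        assert(end >= begin);
        std::string line(text, begin, end - begin);
        if (!line.empty() && line[line.length() - 1] == '\r')
            line.erase(line.length() - 1);  // drop the carriage return
        lines.push_back(line);
        begin = end + 1;                    // move past the '\n' character
    }
    return lines;
}
```

The caller can then skip empty entries before calling ScanLine(), and report errors with the entry's index plus one as the line number.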

The only thing we lack now is a way to determine the number of variables to reserve for the script. It would probably be best to place this information at the start of the file as a sort of header. Although it doesn't necessarily have to be in this format, here is an example:

Variables: 4
... list of instructions ...

To avoid further complications for now, it is enough to reserve a reasonable number of variables through use of a pre-defined constant, as I have shown:

size_t . . . varCount = NUM_VARIABLES;
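For reference, should the header approach be wanted later, a line of the form shown in the example could be recognised along these lines. This is only a sketch: ParseVariableHeader is a hypothetical name, and it assumes the label is written exactly as "Variables:" and separated from the count by white space.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: recognises a line such as "Variables: 4" and stores
// the count. Returns false for any other line and leaves 'count' untouched,
// so the caller can fall back to a default such as NUM_VARIABLES.
bool ParseVariableHeader(const std::string& line, std::size_t& count)
{
    std::istringstream in(line);
    std::string label;
    long value = 0;
    if (!(in >> label >> value))    // need a word followed by a number
        return false;
    if (label != "Variables:" || value < 0)
        return false;
    assert(value >= 0);             // guaranteed by the check above
    count = static_cast<std::size_t>(value);
    return true;
}
```

If the first line of the file parses as a header, it would be consumed and the remaining lines passed to the scanner as before.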

Once the NUM_VARIABLES constant is given a value, it should be possible to compile this new code and feed it an example script as a text file. The sample script I have provided looks like this:

talk
print 72 101 108 108 111 32 87 111 114 108 100
set 0 12
set 1 10
dec 0
inc 1
end

You'll notice of course that the final "end" isn't actually needed, as we automatically provide one after scanning. However, as I commented, it's safe to add redundant ends. This script should have output that looks like this:

I am talking.
Hello World

You'll notice that the values provided to "print" in the script are the ASCII codes for the corresponding characters of the string it displays. The variable state output of the script for 0 and 1 respectively should be:

0: 11
1: 11

Exercises for the Reader

There are many possible improvements to be made to what has been given as example so far. A few improvements stand out, and I leave these for the reader to solve (optionally of course). With a little thought, none of these should be too difficult.

Outputting the variable state to a file instead of the console.
Creating an instruction to display a particular variable state.
Scanning in the desired variable count from a script file to allow it to be determined at run-time.
Implementing instructions to allow the system to drive something you've written in the past.

Conclusion

In the next article I intend to cover topics involving a stack machine and program flow, which will then lead into functions. It's possible that I may cover some of the process involved in compiling down to bytecode, depending on how I decide to handle incorporating functions into the current syntax. It may be desirable to stay away from that for now, however, for the sake of brevity.

A few notes about the sample source code:
I renamed the "ScriptState" class to "Execution" (now nested in the VirtualMachine), which sounds more appropriate for the role it will be taking in the future. I've also separated the code for the console loading from that of the file loading. To switch between either mode using the sample source, simply uncomment the #define for the desired mode, and comment out the other.

As always, feel free to express concern, criticism, or ask any questions you may have. The forum discussion is available, or you may email me: glr9940@rit.edu


Date this article was posted to GameDev.net: 3/13/2002
(Note that this date does not necessarily correspond to the date the article was written)