Creating a Scripting System in C++
Part II: Data Manipulation

by Greg Rosenblatt

Continuing On

I last left off with a very simple example of a machine capable of outputting some text. That was all it could do, and it was always the same text. If you remember, last time I spoke about the difference between programs built in this static manner, and programs that are able to handle more dynamic situations. If it were really necessary to create a different type of instruction for every type of message you wanted to output, it could end up being a nightmare.

The Benefit of Data

The simplest remedy to this situation is to create a new style of instruction that makes use of optional data to dictate the message you would like printed. With this type of instruction, all that would be necessary to print a custom message would be to assign it the proper data. No need for hordes of specialized instruction types.

So now we will add support in our Instruction class for using additional data:

// the basic instruction
class Instruction
{
public:
    Instruction(opcode code) : _code(code), _data(0) {}
    Instruction(opcode code, const char* data, size_t dataSize)
        : _code(code), _data(new char[dataSize])
    { memcpy(_data, data, dataSize); }   // store our own private copy
    ~Instruction()  { delete[] _data; }

    opcode Code() const         { return _code; }
    const char* Data() const    { return _data; }   // read the data

    // NOTE: no copy constructor is defined yet, so a copied Instruction
    // (e.g. one stored in a std::vector) shares, and later re-deletes, _data
private:
    opcode  _code;
    char*   _data;  // additional data
};

While creating an instruction, additional data can be paired with an opcode by using the second form of constructor. This constructor allocates memory of the correct length to store this data and then copies the source data into its own private storage. This data can be read, but will never be changed again, according to the current interface. A destructor has been added to handle deletion of the data.

If you're asking why the constructor creates a copy of the data provided when it seems simple enough just to assign the internal pointer to the address of the data provided, consider this: What would happen if the source data were to leave scope? You would be left with a dangling pointer. This is why the class owns its data buffer.
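
To make the ownership argument concrete, here is a minimal sketch rather than the article's own class. It assumes a stand-in opcode enum and a hypothetical OwningInstruction that keeps its private copy in a std::vector<char>, so that copies made by a container stay independent of each other. MakePrintInstruction is likewise invented for illustration: it builds an instruction from a local buffer that leaves scope as soon as the function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for the article's opcode enum (assumed for this sketch).
enum opcode { op_talk, op_print, op_end };

// Hypothetical variant of Instruction: the vector owns the copied bytes,
// so copy construction, assignment and destruction need no extra code.
class OwningInstruction
{
public:
    explicit OwningInstruction(opcode code) : _code(code) {}
    OwningInstruction(opcode code, const char* data, std::size_t dataSize)
        : _code(code)
    {
        assert(data != 0 || dataSize == 0);   // a size needs a source buffer
        _data.assign(data, data + dataSize);  // copy the source bytes
    }

    opcode Code() const { return _code; }
    // Returns 0 when the instruction carries no data.
    const char* Data() const { return _data.empty() ? 0 : &_data[0]; }

private:
    opcode            _code;
    std::vector<char> _data;  // private copy of the source bytes
};

// Builds a print instruction from a local buffer that dies on return.
OwningInstruction MakePrintInstruction()
{
    char local[] = "copied before the source left scope";
    return OwningInstruction(op_print, local, std::strlen(local) + 1);
}
```

The instruction returned by MakePrintInstruction remains readable after the local buffer is gone, and copying it yields a second, separate buffer. That second property matters as soon as instructions are stored by value in a std::vector.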

Now, we would like to add a new opcode to designate the new functionality we require:

enum opcode
{
    op_talk,
    op_print,    // our new printing code
    op_end
};

The last new inclusion to make is in the virtual machine's processing loop. In the case of our new opcode, it must print the message described by the data, and then go to the next instruction:

void VirtualMachine::Execute(size_t scriptId)
{
    SelectScript(scriptId);   // select our _instrPtr by script ID
    _instr = _instrPtr;       // set our iterator to the beginning
    while (_instr)
    {
        switch (_instr->Code())
        {
        case op_talk:
            std::cout << "I am talking." << std::endl;
            ++_instr;         // iterate
            break;
        case op_print:
            std::cout << _instr->Data() << std::endl;    // print data
            ++_instr;         // iterate
            break;
        case op_end:
            _instr = 0;       // discontinue the loop
            break;
        }
    }
}

It would be a good idea to make sure things work correctly. In our main source, we will test the new instruction. All we need is some data to print, which we then pass to the printing instruction's constructor, along with its proper length (the string length + 1 for the terminating null character):

VirtualMachine vm;

// simulate some external data
const char* buffer = "this is printed data";

// build the script
vector<Instruction> InstrList;
InstrList.push_back(Instruction(op_talk));  // talk still works the same way
InstrList.push_back(Instruction(op_print, buffer, strlen(buffer)+1));  // print
InstrList.push_back(Instruction(op_end));   // then end
Script script(InstrList);

// load the script and save the id
size_t scriptID = vm.Load(script);

// execute the script by its id

If all is in working order, this code should talk, and then print the message provided by the data.

Another Form of Data

Data paired with an instruction is all well and good for allowing flexibility on a per-instruction basis. But what about flexibility between instructions? In order to achieve this, we need data that is accessible by all instructions, for reading and possibly writing. This data is therefore reasonably placed at the level of a running script.

The ownership of this data should be dealt with carefully. Unlike an Instruction's data, we would like this new data to be write-able in addition to being readable. If the ownership is carelessly placed at the hands of a script, then issues may arise when trying to enhance the features your system is capable of, such as when implementing some type of pseudo-multi-processing (parallel execution of scripts). This is because any changes to the script data in one "process" will affect any other "processes" running this same script.

For this reason, we would like to abstract a script's executional state. If and when we do implement such a feature, we can safely create executional states for each process being run. This script state will own the variable data we'd like to use, while the script itself will merely store a count describing how much data it needs when executing. The script state should also include some utilities for manipulating this data, otherwise what's the point of having it?

Our class may look something like this:

// a script's executional state
class ScriptState
{
public:
    // initialization
    void SetDataSize(size_t varCount)   { _varData.resize(varCount); }

    // data access
    void SetVar(size_t i, char val) { _varData[i] = val; }
    char GetVar(size_t i) const     { return _varData[i]; }
    const std::vector<char>& DataArray() const  { return _varData; }

private:
    std::vector<char>   _varData;
};

For current demonstrative purposes, char variables will be sufficient. Variables can be set or retrieved by index. If you'd like, you can even retrieve the data in a semi-string form. Keep in mind that it isn't necessarily null-terminated, however.
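
As a rough illustration of both points (independent states, and data that is not null-terminated), the sketch below repeats the ScriptState interface in compact form. The bounds-checking asserts and the VarsAsString helper are additions assumed for this example only; they are not part of the class shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Compact copy of the ScriptState interface, with bounds checks added
// (the asserts are an addition for this sketch).
class ScriptState
{
public:
    void SetDataSize(std::size_t varCount) { _varData.resize(varCount); }
    void SetVar(std::size_t i, char val)
    {
        assert(i < _varData.size());
        _varData[i] = val;
    }
    char GetVar(std::size_t i) const
    {
        assert(i < _varData.size());
        return _varData[i];
    }
    const std::vector<char>& DataArray() const { return _varData; }

private:
    std::vector<char> _varData;
};

// Hypothetical helper: builds a std::string from the raw variable data.
// The iterator-range constructor is used because the data carries no
// terminating null character.
std::string VarsAsString(const ScriptState& state)
{
    return std::string(state.DataArray().begin(), state.DataArray().end());
}
```

Two ScriptState objects sized from the same variable count can then hold different values without affecting one another, which is the property the "process" discussion above relies on.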

An aside regarding organization:
At the moment, all of our classes reside at the same namespace level. While this is OK for the limited number of classes we're working with, the organization could be improved somewhat, possibly through nesting. Instruction would make the most sense nested in Script, with Script and ScriptState nested in VirtualMachine. This is something to keep in mind, and I may make this organizational change in the future.

Now, to make use of this in our VirtualMachine class, we will simply add a ScriptState as a data member. At the moment, since we aren't dealing with parallel executions of scripts, we can get away with this. Later, when implementing this parallel script execution, we will have to relocate this member.

For now, to make use of it, we simply initialize its data size at the start of execution:

void VirtualMachine::Execute(size_t scriptId)
{
    SelectScript(scriptId);  // select our _instrPtr by script ID

    // initialize variable data: size _curState (via SetDataSize) to the
    // variable count stored in the selected script

    _instr = _instrPtr;      // set our iterator to the beginning
    . . .
}

A Helpful Tool

Before we go on to make any new instructions to play around with this variable data, we should take care of one minor, yet very crucial thing. As anyone who has ever had to debug his or her code should know, the debugging process can be a real pain. Utilities to aid in debugging can help a great deal, so we should definitely have a utility built to view the data values stored in a ScriptState at any given time.

Something like this should suffice for now:

// a const member function (declared inside the class body)
void ExposeVariableState(const ScriptState& state) const
{
    std::vector<char>::const_iterator itr;
    int n = 0;  // used to denote indexed position of value
    for (itr = state.DataArray().begin(); itr != state.DataArray().end(); ++itr, ++n)
    {
        std::cout << n << ": ";
        std::cout << static_cast<int>(*itr);   // cast for numeric value
        std::cout << std::endl;
    }
}

Little things like these can save you a lot of trouble later on when you just can't seem to get a script to work correctly.

Manipulating the Data

Now let's add some pretty basic instructions just to prove that we can manipulate this data predictably.

op_set, // char, char : destination index, value to set
op_inc, // char : index to increment
op_dec, // char : index to decrement
op_add, // char, char, char : dest index, srce index1, srce index2

The commenting here describes the instructional data format, followed by a description of what each value represents to the instruction. For instance, the set op will set the variable at the specified index to the specified value, while the add op will set the variable at the destination index to the result of adding the values at source indices 1 and 2.

We are ready to add proper handlers for these opcodes in the virtual machine:

. . .
case op_set:
    _curState.SetVar(_instr->Data()[0], _instr->Data()[1]);
    ++_instr;
    break;
case op_inc:
    _curState.SetVar(_instr->Data()[0], _curState.GetVar(_instr->Data()[0])+1);
    ++_instr;
    break;
case op_dec:
    _curState.SetVar(_instr->Data()[0], _curState.GetVar(_instr->Data()[0])-1);
    ++_instr;
    break;
case op_add:
    _curState.SetVar(_instr->Data()[0], _curState.GetVar(_instr->Data()[1])
                     + _curState.GetVar(_instr->Data()[2]));
    ++_instr;
    break;
. . .

If you trace through each handler carefully, you will see that, although a bit circuitous, each instruction is handled as we have described. Because of this circuitous style, the handlers are certainly not optimized to their fullest extent, partly because they have no direct write access to the ScriptState's data. At the moment, however, individual instruction handlers are not critical; they are merely filler to make sure the key components of the virtual machine system are operating. You will certainly want to rewrite these later on. Right now we are more concerned with the design of the overall system, and with efficiency in the parts that do not deal directly with handlers.
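
For comparison, here is one way the handlers might look if the state offered direct write access. This is only a sketch under assumptions: DirectScriptState, its Var accessor, the Index helper and the four Handle functions are hypothetical names that do not appear in the article's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ScriptState variant exposing a reference to each variable.
class DirectScriptState
{
public:
    explicit DirectScriptState(std::size_t varCount) : _varData(varCount) {}

    // Direct read/write access; the bounds assert is an addition for this sketch.
    char& Var(std::size_t i)
    {
        assert(i < _varData.size());
        return _varData[i];
    }

private:
    std::vector<char> _varData;
};

// Converts an operand byte into a variable index.
inline std::size_t Index(char c) { return static_cast<unsigned char>(c); }

// Each handler receives the instruction's raw data, laid out as described
// in the opcode comments (destination first, then sources or value).
void HandleSet(DirectScriptState& s, const char* d) { s.Var(Index(d[0])) = d[1]; }
void HandleInc(DirectScriptState& s, const char* d) { ++s.Var(Index(d[0])); }
void HandleDec(DirectScriptState& s, const char* d) { --s.Var(Index(d[0])); }
void HandleAdd(DirectScriptState& s, const char* d)
{
    s.Var(Index(d[0])) =
        static_cast<char>(s.Var(Index(d[1])) + s.Var(Index(d[2])));
}
```

The trade-off is that handing out a reference exposes the state's internals to every handler, so whether this is worth it depends on how much the handlers end up mattering for performance.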

Another Little Test

We will test this out with another little script. Lacking any creativity at the moment, you may simply put a few pseudo-random manipulation instructions into the script. We will use 4 variables, set the first 3 to a value of 7, then increment the 2nd variable (index 1), decrement the 3rd (index 2), and finally add the 1st and 3rd variables, placing the result in the 4th slot (index 3).

It should resemble the following, in opcode-with-data format:

set 0, 7
set 1, 7
set 2, 7
inc 1
dec 2
add 3, 0, 2

With this deterministic script, we are able to predict the final states of each of the 4 variables. If you follow closely, you will see that they should be as follows, in index-value format:

0: 7
1: 8
2: 6
3: 13
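
If you would rather check the trace mechanically, the following standalone sketch replays the six steps on four zeroed variables. It does not use the virtual machine at all: TraceOp, TraceStep and RunTestScript are names assumed purely for this example, and the add step uses the operands from the description above (1st and 3rd variables, that is, indices 0 and 2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Opcodes for this standalone trace only (not the article's full enum).
enum TraceOp { t_set, t_inc, t_dec, t_add };

// One step of the test script: an opcode plus up to three operands.
struct TraceStep
{
    TraceOp op;
    int a, b, c;
};

// Runs the six-step test script on four zeroed variables and returns them.
std::vector<int> RunTestScript()
{
    const TraceStep steps[] = {
        { t_set, 0, 7, 0 },  // set 0, 7
        { t_set, 1, 7, 0 },  // set 1, 7
        { t_set, 2, 7, 0 },  // set 2, 7
        { t_inc, 1, 0, 0 },  // inc 1
        { t_dec, 2, 0, 0 },  // dec 2
        { t_add, 3, 0, 2 }   // add 3, 0, 2  (1st + 3rd into 4th)
    };
    std::vector<int> vars(4, 0);
    for (std::size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); ++i)
    {
        const TraceStep& s = steps[i];
        assert(s.a >= 0 && s.a < 4);  // destination index must be valid
        switch (s.op)
        {
        case t_set: vars[s.a] = s.b; break;
        case t_inc: ++vars[s.a]; break;
        case t_dec: --vars[s.a]; break;
        case t_add: vars[s.a] = vars[s.b] + vars[s.c]; break;
        }
    }
    return vars;
}
```

The returned vector holds 7, 8, 6 and 13, matching the index-value list above.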

So let's try out our enhancements with the virtual machine. We will create a second script, load it into the machine, and then execute it using the ID returned from loading. In addition, we will use our new debugging tool to check out the variable states after execution.

To create the instructions for this script, we are going to need to simulate some external data (as was done for the previous data example) for reading into the proper instructions:

// create variable manipulation data
char setData1[] = {0, 7}; char setData2[] = {1, 7}; char setData3[] = {2, 7};
char incData = 1;
char decData = 2;
char addData[] = {3, 0, 2};  // add 1st and 3rd var, and store in 4th

// proper instruction data size constants (temporary for safety)
const int SET_SIZE  = 2*sizeof(char);
const int INC_SIZE  = sizeof(char);
const int DEC_SIZE  = sizeof(char);
const int ADD_SIZE  = 3*sizeof(char);

Loading the data looks something like this. Notice that we have to use a different syntax for passing single chars than for passing char arrays:

// build the variable manipulation script
vector<Instruction> varInstrList;
varInstrList.push_back(Instruction(op_set, setData1, SET_SIZE));   // set first 3 vars to 7
varInstrList.push_back(Instruction(op_set, setData2, SET_SIZE));
varInstrList.push_back(Instruction(op_set, setData3, SET_SIZE));
varInstrList.push_back(Instruction(op_inc, &incData, INC_SIZE));   // inc 2nd var
varInstrList.push_back(Instruction(op_dec, &decData, DEC_SIZE));   // dec 3rd var
varInstrList.push_back(Instruction(op_add, addData, ADD_SIZE));
varInstrList.push_back(Instruction(op_end));                       // then end

Finish by passing the instruction list, and our variable requirement. Then we can load and execute the script:

Script varScript(varInstrList, 4);  // we need 4 variables

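size_t varManipID = vm.Load(varScript);

// execute the script by its id
vm.Execute(varManipID);

// check out the variable states with the ExposeVariableState debugging utility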

If all goes well, you should see the previously mentioned values at the appropriate indices.


If our testing methods are beginning to seem like glorious hacks to you, you're probably right. Things are beginning to get messy in main(). We seem to be following sloppy, if not outright dangerous, practices to properly load the necessary data into particular instructions. What we are lacking is a centralized procedure for the proper handling and loading of instructions and their data.

While all of this may be fine for our small examples right now, if we are ever to go into larger things, we certainly want the centralization described to localize all possible bugs to one section of the code. That way, if we find we screwed up somewhere, we know exactly where to look while debugging. If you've not heard this before, the idea of localizing functionality is certainly something that is applicable in most, if not all, programming practices.

A mechanism to handle loading in a localized manner is definitely needed soon.
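
Purely as a sketch of what such centralization could look like, and not a design this series commits to, the hypothetical ScriptBuilder below is the only place that knows how many data bytes each opcode takes. RawInstruction is a simplified stand-in for Instruction, and the opcode enum is reduced to the values needed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in opcode subset and a simplified instruction record for this sketch.
enum opcode { op_set, op_inc, op_dec, op_add, op_end };

struct RawInstruction
{
    opcode            code;
    std::vector<char> data;
};

// Hypothetical builder: the only code that knows each opcode's data layout.
class ScriptBuilder
{
public:
    ScriptBuilder& Set(char dest, char value)          { return Push(op_set, 2, dest, value); }
    ScriptBuilder& Inc(char index)                     { return Push(op_inc, 1, index); }
    ScriptBuilder& Dec(char index)                     { return Push(op_dec, 1, index); }
    ScriptBuilder& AddVars(char dest, char a, char b)  { return Push(op_add, 3, dest, a, b); }
    ScriptBuilder& End()                               { return Push(op_end, 0); }

    const std::vector<RawInstruction>& Instructions() const { return _instrs; }

private:
    // Packs the first 'count' operands into the instruction's data.
    ScriptBuilder& Push(opcode code, std::size_t count,
                        char a = 0, char b = 0, char c = 0)
    {
        assert(count <= 3);
        const char operands[3] = { a, b, c };
        RawInstruction instr;
        instr.code = code;
        instr.data.assign(operands, operands + count);
        _instrs.push_back(instr);
        return *this;
    }

    std::vector<RawInstruction> _instrs;
};
```

With something along these lines, the test script could be written as a chain such as builder.Set(0, 7).Inc(1).End(), and a mistake in a data size would have exactly one place to hide.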


Quite a bit was covered in this article, even though the underlying concept was pretty simple. As basic as it may seem, the inclusion of data increases the flexibility of our instructions a great deal. What would have required hordes of different instructions now requires only a small handful, with some additional data. The virtual machine is also now capable of retaining some kind of "state" during execution, which definitely has beneficial consequences.

At this point, there is a lot of metaphorical territory to be explored on your own. As easy as it may have seemed, we already have laid out much of our foundation. There is a lot to be discovered, and the possibilities are quickly becoming endless.

I am not yet exactly sure what I will be covering in the next article, though it will still be in accordance with my original outline. I am open to suggestions. Please make use of the forum discussion, or email me.

Date this article was posted to 2/1/2002
(Note that this date does not necessarily correspond to the date the article was written)
