The foundation of the World Wide Web, HTML, is known as a 'mark-up' language (HTML is short for "HyperText Mark-up Language", a child of SGML). This means that you 'mark up' pieces of data to allow them to be recognised in particular ways. You might mark up the text "Welcome" as a heading, or the URL "bob.gif" as an image.

The good, clever folks down at the W3C have drafted another language (also derived from SGML, HTML's big daddy). This language is known as XML, and is a very good thing. Firstly, it's another mark-up language ("eXtensible Mark-up Language", to be precise). But it doesn't have any defined 'tags' or keywords whatsoever!

How is that useful? you ask. What can we do with a language that has no words in it?! Well, the reason it's known as an 'extensible' mark-up language is that you can 'extend' it - that is, make it up. You create the words, and everything adheres to the language 'grammar.' XML is meta-data: data about data. The full language specification is at http://www.w3.org/TR/REC-xml, and while it's heavy reading, it describes every aspect of the language from start to finish.

OK, so what's so great about this, then?

If your program dumps a whole load of data to a file, then what happens when you want to use that data in another program? You have to drag up format specifications, the code that created the file in the first place, and so on. The reason is that once your data's in the file, that's all it is: data. A stream of numbers with no real meaning to anyone or anything. You've effectively encrypted it - anyone who doesn't have the format specification will have no idea how to read the data. Sure, they could try and figure it out - but that's as slow and difficult as standard code-breaking.

Surely, in the days of object-orientation and massively-multiplayer online games, there must be a better way? I think XML can fill part of the gap.

XML 101

A single 'item' in XML is called an "element". An element consists, at a bare minimum, of a tagname (in HTML, things like P, H1, or TABLE are tagnames), which appears in both an opening tag and a closing tag. If my tagname is "gibbon", I could write it like this:

<gibbon></gibbon>

In fact, because there are several cases where there's nothing between the two tags, you're allowed to shorten it to this:

<gibbon/>

You have to have the forward-slash at the end there, so that the XML parser knows not to look for a closing tag.
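If you'd like to convince yourself that the two forms really mean the same thing, here's a minimal sketch. It assumes Python and its standard xml.etree.ElementTree module, which is just my pick of parser for illustration; any XML parser should agree.

```python
import xml.etree.ElementTree as ET

# The long form and the condensed form of the same empty element.
long_form = ET.fromstring("<gibbon></gibbon>")
short_form = ET.fromstring("<gibbon/>")

# Both parse to an element with the same tagname, no children and no data,
# so they serialise back to identical bytes.
same = ET.tostring(long_form) == ET.tostring(short_form)
print(same)
```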

At its fullest, an element can have three things: Attributes, Children, and Data.

An Attribute is a "name=value" pair (e.g. 'family="mammal"'). All the attributes go in the opening tag, after the tagname:

<gibbon family="mammal" size="big" bottom="red"/>
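To show what a program actually sees when it reads those attributes, here's a rough sketch, again assuming Python's standard xml.etree.ElementTree module (an illustrative choice, not part of XML itself). The parser hands the attributes back as ordinary name/value pairs.

```python
import xml.etree.ElementTree as ET

# Parse the gibbon element from the text above.
gibbon = ET.fromstring('<gibbon family="mammal" size="big" bottom="red"/>')

# The tagname is a string; the attributes behave like a dictionary.
tag = gibbon.tag
attributes = dict(gibbon.attrib)

print(tag)
for name, value in attributes.items():
    print(name, "=", value)
```

Note that every attribute value comes back as a string; if you wanted a number, it's up to your program to convert it.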

Children are other elements, which are 'contained' within the first element. What that really means depends on how you interpret it; it could be that the child is 'inside' the parent (if the parent is, perhaps, a box of some sort), or it could be that the child is literally a child of the parent, like so:

<gibbon>
 <babyGibbon/>
</gibbon>

The babyGibbon, as a separate element, is in its simplest form - no attributes or children. However, because the gibbon now has a child, you can't write it in the condensed form; you have to have separate opening and closing tags, as shown.
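Here's a small sketch of how the parent/child relationship looks from code, assuming Python's standard xml.etree.ElementTree module (my choice for illustration). An element can be looped over like a list of its children.

```python
import xml.etree.ElementTree as ET

# The gibbon with one child, as written above.
gibbon = ET.fromstring("<gibbon>\n <babyGibbon/>\n</gibbon>")

# Iterating over an element yields its direct children, in document order.
children = [child.tag for child in gibbon]
print(children)

# The child is itself a full element; here it has no attributes or children.
baby = gibbon[0]
print(baby.attrib, len(baby))
```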

Finally, an element can have 'data.' Data is anything that you put between the opening and closing tag, and which isn't an element. At its simplest, you can just have plain text in there - there's also something called CDATA, which you use when your text might contain <, > or & symbols (which would otherwise confuse the parser).
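As a quick illustration of both kinds of data, here's a sketch assuming Python's standard xml.etree.ElementTree module. The 'note' element and its contents are made up for this example; they aren't from any real format.

```python
import xml.etree.ElementTree as ET

# Plain text data between the opening and closing tags.
# ('note' is a hypothetical tagname, invented for this example.)
plain = ET.fromstring("<note>feed the gibbon</note>")

# A CDATA section lets the data contain <, > and & without escaping them.
cdata = ET.fromstring("<note><![CDATA[if (a < b && b > c) feed();]]></note>")

plain_text = plain.text
cdata_text = cdata.text
print(plain_text)
print(cdata_text)
```

Either way, the program just gets a string back; the CDATA wrapper itself is gone by the time the parser is done.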

There's one last rule about XML. All your XML has to be 'well-formed.' To do that, you just have to make sure that every opening tag has a matching close-tag (or is in the condensed form), and that you close things in the same order you open them. So, you can't do this:

<box>
 <bag>
  <thing/>
 </box>
</bag>

Instead, you need to do this:

<box>
 <bag>
  <thing/>
 </bag>
</box>

Keeping things well-formed will help you out a lot. It's much, much easier for the parser to treat the XML code as a tree or stack; and if your code isn't well-formed, it won't be able to. In the first example, after the 3rd line (<thing/>) my stack of elements looks like ":box:bag". After the next line (</box>), it would become ":bag". That doesn't work, because 'box' has been closed while 'bag' was still open inside it, and there is no 'bag' element at the top level. HTML lets you get away with this; XML is not so forgiving.
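The stack idea can be sketched in a few lines. This is a toy checker in Python, written only to illustrate the nesting rule: the check_nesting function and its regular expression are my own invention, and it ignores comments, CDATA and attribute values containing '>', all of which a real parser has to handle.

```python
import re

# Matches <name ...>, </name> and <name .../>, capturing the slashes and name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def check_nesting(xml_text):
    """Return True if every opening tag is closed in the reverse order."""
    stack = []
    for closing, name, condensed in TAG.findall(xml_text):
        if condensed:
            continue              # <thing/> opens and closes in one go
        if closing:
            if not stack or stack[-1] != name:
                return False      # closing something that isn't on top
            stack.pop()
        else:
            stack.append(name)    # a new element is now open
    return not stack              # nothing may be left open at the end

bad = "<box>\n <bag>\n  <thing/>\n </box>\n</bag>"
good = "<box>\n <bag>\n  <thing/>\n </bag>\n</box>"

print(check_nesting(bad))
print(check_nesting(good))
```

In the bad example, the checker gives up at </box> because 'bag' is on top of the stack at that point, which is exactly the problem described above.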

Conveniently, Internet Explorer (up till IE6, at least), when presented with an XML file, will check it and display it as a tree (and tell you if you messed it up), so you can check your XML syntax and layout by opening it in IE. There are plenty of other syntax-checking utilities out there, of course - including, I'm sure, something to write the XML for you, while you just build up a tree of your elements.

An Example

Here's a little chunk of XML:

<?xml version="1.0"?>
<fridge>
 <cheese type="cheddar" flavour="mild"/>
 <cola/>
 <tupperware_box size="large">
  <sandwich state="half-eaten"/>
 </tupperware_box>
</fridge>

And voilà, the contents of my fridge.

According to the code above, my fridge contains some mild cheddar cheese, a can of cola, and a Tupperware box containing a half-eaten sandwich. Could you get that just by reading it? I'd guess you could - well-written XML is very easy to understand like that. If I were to eat more of the sandwich, and put, I dunno, a piece of broccoli into the box, I could just change the code to:

  <sandwich state="three-quarters-eaten"/>
  <broccoli desirability="none"/>

You get the idea.
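A program can read and update the fridge just as easily. Here's a rough sketch, assuming Python's standard xml.etree.ElementTree module; the describe function is a helper I've made up for this example, and this is only one of many ways you could walk the tree.

```python
import xml.etree.ElementTree as ET

FRIDGE_XML = """<?xml version="1.0"?>
<fridge>
 <cheese type="cheddar" flavour="mild"/>
 <cola/>
 <tupperware_box size="large">
  <sandwich state="half-eaten"/>
 </tupperware_box>
</fridge>"""

def describe(element, depth=0):
    """Return one indented line per element, listing its attributes."""
    attrs = ", ".join(f"{k}={v}" for k, v in element.attrib.items())
    lines = ["  " * depth + element.tag + (f" ({attrs})" if attrs else "")]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

fridge = ET.fromstring(FRIDGE_XML)

# Eat more of the sandwich and put some broccoli in the box.
box = fridge.find("tupperware_box")
box.find("sandwich").set("state", "three-quarters-eaten")
ET.SubElement(box, "broccoli", desirability="none")

report = describe(fridge)
print("\n".join(report))
```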

You may be wondering about that first line - <?xml version="1.0"?>. The spec says 'proper' XML data should start with it - really, it just gives the version of the language used to make the file (as the language will, nay, has, changed - they're already up to 1.1, but the parsers are still catching up). In XML 1.0 it's not strictly necessary, if your file sizes are constrained or something, but it's a good thing to use.
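For what it's worth, a parser will typically accept a version 1.0 file with or without that line. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and a cut-down fridge invented for the example:

```python
import xml.etree.ElementTree as ET

# The same tiny document, with and without the declaration line.
with_decl = ET.fromstring('<?xml version="1.0"?><fridge><cola/></fridge>')
without_decl = ET.fromstring("<fridge><cola/></fridge>")

# The declaration isn't an element, so both parse to the same tree.
same_tree = ET.tostring(with_decl) == ET.tostring(without_decl)
print(same_tree)
```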



XML in Games
