Stack 'em, Pack 'em, and Rack 'em.
by Chris "Kiwidog" Hargrove
Download this week's code files: cotc4src.zip (69k)
Be sure to read the license agreement before using the code.
"For most men life is a search for the proper manila envelope in which to get themselves filed" - Clifton Fadiman.
Many happy returns! As the title suggests, today we'll be adding a small file system to our code base. This article won't be as long as last time, as the stuff shouldn't be too tough to deal with (plus we won't be contending with DirectX :) But first,
A Slight Change Of Plans...
I've got some good news and bad news.
First, the bad news. It looks like I'm not going to be able to do the 3D billiards-esque game after all. Actually, it looks like I'm not going to do a 3D game of any sort (at least not for this column). Surprisingly, the reason doesn't have to do with time or complexity, but rather a conflict of interest. The 3D wrapper that I wrote (simply called "UniRender", with a "ur_" prefix), which was going to be the wrapper I used as the 3D example in this column, has started getting used for some programs here at the office. Because of this, it's also starting to get maintained by more than one person, not just myself. Once that started happening, it was no longer a side project that I could openly distribute; instead it became work-related source code. While most of the subsystems I cover or plan to cover in this column are relatively small, 3D graphics was not going to be one of them (it took me several days just to get UniRender's Glide driver hammered out, let alone the D3D driver). Since I'm quite happy with the results, I can't justify spending several more days at minimum to write a publicly-available equivalent. Even if I did, it would look almost identical (I like OpenGL so UniRender's API is extremely similar to it, and I doubt I'll change from the OpenGL paradigm anytime soon). So writing a second D3D/Glide/OpenGL-independent layer specifically for this column is pretty unlikely at this point.
Now, the good news. Because a 3D wrapper would have taken many articles to cover (minimum 5, probably 10 or more in the end), I now have a big chunk of article time available that I wouldn't otherwise have, to cover topics I might not otherwise have gotten to. For example, creating scripting languages or creating level editing tools (both areas that are irrelevant to a pool game set inside a simple cube). So by trading in a 3D game in exchange for a 2D game, we might get the chance to cover a broader range of subsystems.
For those of you who were reading this column specifically for 3D information, I'm sorry to disappoint, but I'll try not to leave you completely empty-handed. If enough people request it, I'll write some supplemental information on 3D theory and wrapper creation, outside of the context of my specific implementation (which I can no longer give out).
In the meantime, I am once again stumped for game ideas. Most of my existing plans have been 3D up to this point (as have a large portion of your suggestions), and those have now gone out the window. Fortunately, all the code we've done so far isn't specific to any particular type of game, so we haven't lost anything (see what modularity gains you? Code reuse baby!) But a decision still has to be made pretty soon as to what game to make. Should I go for Breakout? Super Sprint? Tetris? A big RPG? Just plain ol' Pong? :) I'm once again entertaining suggestions, provided they're 2D in nature. Got thoughts? Toss me an email.
Time for Q&A
A couple questions from last time:
Anyway, now that all that stuff's out of the way, it's time for our file system!
So What Do I Mean By "File System"?
In a nutshell, just "a subsystem that loads and manages files". At its core, a file system is no different than calls to regular file I/O functions like fopen(), fread(), fseek() and so forth. Many applications don't need anything more than these; they can get by with those standard I/O functions alone. But games are often a different story... they can have resource archive files that are several hundred megs in size, containing thousands of pieces of data. I'm sure you're all familiar with seeing "pack" files by now, ending in all kinds of extensions like WAD, GRP, PAK, MPQ, etc. Many games also allow these archive files to go on top of each other, for user mods and the like. So how do you allow your data files to be collapsed into these big hulking archives without putting all kinds of special cases in your code? And how do you allow other people to add on more resources that you didn't plan for?
Wrap around it. Like you've heard over and over from me, the more you can isolate and keep in small little encapsulated boxes, the better. So what's our approach to wrapping around these file concerns? Well, let's put it this way. If you didn't need a file system like this, what would you use? Probably the I/O functions mentioned above, like fopen(), fread(), etc. So why not make a wrapper that looks the same way? After all, standard I/O should be pretty intuitive by now, and there's no reason to get rid of something intuitive, right? So today, we'll be adding two more files to the project, both starting with the "xf_" prefix (short for "eXtended File" system). Just by using functions like XF_Open() instead of fopen(), and XF_Read() instead of fread(), we get to hide all the details of archive "pack" files and overriding resources from our application. Sound good? :)
Toss It On The Pile
We only have two files added this time, xf_file.h and xf_file.cpp. The interface is pretty tiny, but there's a good bit of functionality underneath.
There are two big issues that we want this subsystem to cover. One, allow us to load files whether they exist as-is, or in an archive "pack" file. And two, allow users to supplement or override one archive with another so that modifications can be made more easily. The second issue is actually no big deal, and can be done by allowing multiple search paths for the files you want to load (you'll see this when we get to it). But the first, the archiving issue, is more involved.
There are a lot of ways to do resource archive files, but most practical systems involve ways to mimic real directory structures. I've seen some nasty (and I mean nasty) systems in the past where people would actually hand-generate and hard code file locations in the archive, all kinds of evil stuff. There's absolutely no reason to subject yourself to that kind of pain when a directory-like structure will solve things much more easily. So in our case, we're going to treat our archives like they were actual subdirectories under our distribution directory.
For example, if we have a directory "Stuff" underneath the Dist directory, it'd be cool if we could have our archive named "Stuff.ext" or whatever that acted the same as that subdirectory. That way if there were a file Stuff\Sounds\happy.wav, and the Stuff archive were built such that it knew where "Sounds\happy.wav" was, it could give us the archived file in the same way.
Now beyond that basic "have a directory" premise, you have a lot of choices on how you want to structure your archives. How you do it generally doesn't matter too much, and you can support whatever archive types you see fit. As long as the specifics of an archive's layout are hidden from the rest of the program, everything underneath is up to you. For us, I just chose a pretty simple format to work with, resembling many other common uncompressed "pack" schemes out there. The extension is just "MDA" for Madness Directory Archive, but that doesn't really matter.
[Look at xf_file.h]
There are only a few functions in this subsystem, and right off the bat you should recognize what most of them are meant to do. XF_Init we just call once at initialization time, like we normally do with init functions. After that there's a whole slew of file functions, which resemble their stdio.h counterparts pretty closely. File opening is split between XF_Open and XF_Create, depending on whether the file is to be read or written (fopen() equivalents of modes "rb" and "wb", respectively; we only need binary file support). We won't even be using XF_Create that much in practice, except to dump out screenshots or crap like that. XF_Read and XF_Write are similar to fread() and fwrite() except that they merge the two size values of those functions into single "size" parameters, and return a boolean of whether or not they succeeded. There are three seeking functions which match the three seeking modes of fseek(). Finally there's XF_Tell (the equivalent of ftell()), and XF_Size, which is a convenience function to return the file size. Altogether, pretty straightforward file stuff.
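To make the "merged size, boolean result" idea concrete, here's a minimal sketch on top of plain stdio. The names ReadBytes, WriteBytes and FileSize are hypothetical stand-ins made up for this illustration; they are not the actual XF_Read, XF_Write and XF_Size from xf_file.cpp, which work with handles rather than a raw FILE*.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helpers for illustration only (not the real XF_* functions).

// fread() takes an element size and a count; here they collapse into one
// byte count, and the result is simply "did we get all of it?".
bool ReadBytes(FILE* fp, void* buffer, size_t size)
{
    assert(fp != NULL);
    return fread(buffer, 1, size, fp) == size;
}

// Same idea for fwrite().
bool WriteBytes(FILE* fp, const void* buffer, size_t size)
{
    assert(fp != NULL);
    return fwrite(buffer, 1, size, fp) == size;
}

// Convenience size query: seek to the end, ask where we are, then put the
// file position back where it was. Returns -1 on failure.
long FileSize(FILE* fp)
{
    assert(fp != NULL);
    long here = ftell(fp);
    if (here < 0 || fseek(fp, 0, SEEK_END) != 0) return -1;
    long size = ftell(fp);
    if (fseek(fp, here, SEEK_SET) != 0) return -1;
    return size;
}
```

The boolean return keeps calling code short: a read either gave you everything you asked for or it didn't, which is usually all a game loader cares about.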
Next, there are two functions to create and extract archive files. Many games have their "packing" utilities as external executables; I thought it'd be more convenient for us if we just added the functionality right into the game itself. The create function takes a subdirectory name directly underneath our "Dist" directory and creates an archive file for it with the same name, and the extract function takes an archive file in the Dist directory and extracts all its files into a subdirectory of the same name... direct opposites of each other. Even though the two functions are interface-accessible, we'll probably never need to call them from the outside in our case; XF_Init will check the command line options to see if the user wants to work with pack files. Finally there's the XF_RootPath function, which we'll also probably never need to call from outside, as once again XF_Init checks for a command line option that uses this. Still, these three functions might make useful interface functions someday. :)
That's not a whole lot of functions, and hopefully not difficult ones to understand. As usual, we want to keep our interfaces as clean and simple as possible. Now the implementation can be a whole different story...
[Look at xf_file.cpp]
The archives start out with a header, holding some marker/version information, archive size, and information about the archive's "directory". The directory is an array of entries (one per file or subdirectory), each entry holding the name and size of the file it describes, and that file's location in the archive. So the basic idea is if you want to get a file out of the archive, you first look at the header of the archive to find the directory's location, seek to there, load up the directory into memory, find the entry for the name of the file you want, look at the entry's "file offset" field, seek to there, and read out however many bytes the entry's "file size" field says you should. That's pretty much the whole scheme. For efficiency reasons, you don't want to load up the whole directory every time for every single file you want to load; it makes more sense to read the directory the first time the archive is loaded, and keep it in memory from then on (which is what we'll do).
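The lookup sequence above can be sketched roughly like this. Keep in mind that the structure names, field names, field sizes and the "MDA" marker check are all assumptions made for the example; the real layouts live in the "private structures" section of xf_file.cpp and may well differ.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <vector>

// Hypothetical on-disk layout (names and sizes are assumptions).
struct ArchiveHeader {
    char     marker[4];    // "MDA" plus terminator
    uint32_t version;
    uint32_t archiveSize;  // total size of the archive in bytes
    uint32_t dirOffset;    // where the directory array starts
    uint32_t dirCount;     // number of directory entries
};

struct ArchiveEntry {
    char     name[32];     // file (or subdirectory) name
    uint32_t fileOffset;   // where the file's bytes start
    uint32_t fileSize;     // how many bytes it occupies
};

// Read the header, seek to the directory, and load it once so it can be
// kept in memory for every later lookup.
bool LoadDirectory(FILE* fp, ArchiveHeader& hdr, std::vector<ArchiveEntry>& dir)
{
    assert(fp != NULL);
    if (fseek(fp, 0, SEEK_SET) != 0) return false;
    if (fread(&hdr, sizeof(hdr), 1, fp) != 1) return false;
    if (memcmp(hdr.marker, "MDA", 4) != 0) return false;
    dir.resize(hdr.dirCount);
    if (hdr.dirCount == 0) return true;
    if (fseek(fp, (long)hdr.dirOffset, SEEK_SET) != 0) return false;
    return fread(dir.data(), sizeof(ArchiveEntry), dir.size(), fp) == dir.size();
}

// Linear search of the in-memory directory by name.
const ArchiveEntry* FindEntry(const std::vector<ArchiveEntry>& dir, const char* name)
{
    for (const ArchiveEntry& e : dir)
        if (strncmp(e.name, name, sizeof(e.name)) == 0) return &e;
    return nullptr;
}

// Seek to the entry's offset and read exactly fileSize bytes.
bool ReadEntry(FILE* fp, const ArchiveEntry& e, std::vector<char>& out)
{
    out.resize(e.fileSize);
    if (fseek(fp, (long)e.fileOffset, SEEK_SET) != 0) return false;
    return e.fileSize == 0 || fread(out.data(), 1, out.size(), fp) == out.size();
}
```

Reading the structs straight off disk like this only works when the archive was written on a machine with the same byte order and struct packing, which is fine for a sketch but worth remembering.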
Now by itself, this only covers the directory of the archive name as-is, not any subdirectories. Handling subdirectories in a small packing scheme like this generally happens one of two ways... you either embed the subdirectory name as part of the entry's filename (nothing wrong with that, except it makes filenames long and can add to search time, albeit insignificantly) or you can treat subdirectories as files where the file is actually another archive for that subdirectory. Which method you choose is entirely up to you, although for this subsystem I'm using the latter. If we have a file Sounds\happy.wav, our archive will internally have a file named "Sounds" (flagged as a directory) which is another archive with the file "happy.wav" in its own directory. As you can probably guess, managing these files recursively makes things much easier, so xf_file.cpp has many private recursive functions.
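Here's a small sketch of why the "subdirectory is just another archive" approach lends itself to recursion. The Node structure and the Resolve function are hypothetical names invented for this example, and the tree is purely in-memory; the real code in xf_file.cpp works with its own archive structures and linked lists.

```cpp
#include <assert.h>
#include <string>
#include <vector>

// Hypothetical in-memory view of one archive entry. A directory entry is
// itself an archive, so it carries its own list of children.
struct Node {
    std::string       name;
    bool              isDirectory;
    unsigned long     offset;     // offset within the topmost archive
    std::vector<Node> children;   // only used when isDirectory is true
};

// Resolve a backslash-separated path such as "Sounds\\happy.wav" by peeling
// off the first component, finding it in this directory, and recursing into
// it if more components remain. Names are compared case-sensitively here
// to keep the sketch short.
const Node* Resolve(const Node& dir, const std::string& path)
{
    assert(dir.isDirectory);
    const std::string::size_type slash = path.find('\\');
    const std::string head = path.substr(0, slash);
    for (const Node& child : dir.children) {
        if (child.name != head) continue;
        if (slash == std::string::npos) return &child;   // last component
        if (!child.isDirectory) return nullptr;          // can't descend into a file
        return Resolve(child, path.substr(slash + 1));
    }
    return nullptr;
}
```

With a "Stuff" node holding a "Sounds" directory that holds "happy.wav", calling Resolve on the Stuff node with "Sounds\\happy.wav" walks down one level per path component.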
If you go down to the "private structures" section, you'll see two structures for the archive header and archive entry, respectively. These two structures should be pretty easy to understand based on my description of the archive format. All the structures beyond those two are only internal, and exist in memory.
The internal archive structure contains the name of the archive, the file offset of that archive within the topmost parent archive (for "Stuff" this would be zero, but for "Sounds" it'd be wherever the "Sounds" file started within Stuff), the loaded header and directory of the archive, and a few pointers to other archives. Two of the pointers are just to describe the archive hierarchy forest (typical linked list stuff), like where the subdirectories are. The third points back to the topmost parent archive so subarchives can know what filename they're in when they need to open the thing.
Then there's the file handle structure, which the subsystem equates to the dword handle numbers it gives you when you work with files. Since our file stuff sits on top of fopen(), fread() and so forth, it makes sense to see a FILE* in there. The start, length, and position fields are also necessary though. After all, even when an application thinks it may be reading from a regular file, it might be reading from an archive. So the FILE* will be a pointer within the archive file, and in a completely different place than if the file were being read directly. If we hold on to position and other file information though, we can hide that fact from the application (by subtracting the start positions etc). Look at the implementations of the XF_Seek* functions to see an example of these fields in action.
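As a rough illustration of how start, length and position can hide an archive from the caller, here's a sketch. The FileHandle layout and the function names (OpenAt, SeekFromStart, SeekFromCurrent, SeekFromEnd, Tell, Read) are hypothetical; the real handle structure and XF_Seek* functions in xf_file.cpp may be organized differently.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical handle: the caller only ever sees offsets relative to "start".
struct FileHandle {
    FILE* fp;        // underlying stream (a loose file, or the whole archive)
    long  start;     // where this file's bytes begin within fp (0 if loose)
    long  length;    // size of this file in bytes
    long  position;  // current offset, relative to start
};

// All three seek modes boil down to one bounds-checked absolute seek.
static bool SeekTo(FileHandle& f, long pos)
{
    if (pos < 0 || pos > f.length) return false;
    if (fseek(f.fp, f.start + pos, SEEK_SET) != 0) return false;
    f.position = pos;
    return true;
}

bool OpenAt(FileHandle& f, FILE* fp, long start, long length)
{
    assert(fp != NULL && start >= 0 && length >= 0);
    f = FileHandle{fp, start, length, 0};
    return SeekTo(f, 0);
}

bool SeekFromStart(FileHandle& f, long offset)   { return SeekTo(f, offset); }
bool SeekFromCurrent(FileHandle& f, long offset) { return SeekTo(f, f.position + offset); }
bool SeekFromEnd(FileHandle& f, long offset)     { return SeekTo(f, f.length + offset); }
long Tell(const FileHandle& f)                   { return f.position; }

// Refuse to read past the end of *this* file, even though the underlying
// archive stream may have plenty more bytes after it.
bool Read(FileHandle& f, void* buffer, long size)
{
    if (size < 0 || f.position + size > f.length) return false;
    if (size > 0 && fread(buffer, 1, (size_t)size, f.fp) != (size_t)size) return false;
    f.position += size;
    return true;
}
```

For a loose file, start is zero and length is the whole file, so the same code path serves both cases.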
Now down in the function area, you'll notice the majority of the "work" for this subsystem is in private implementation functions. There's an assortment of functions here ranging from finding and adding archives in the loaded tree of them, switching between file structures and their numeric handle numbers, recursive archive management, etc. If you look at those functions, all they really involve is some linked list control and file/directory control stuff. A few of the functions are a bit bulky, but none of them should seem complex. Regardless, they're all there to handle our specific implementation of the archives... stuff that would probably look considerably different under a different archiving scheme anyway. But that's the cool thing; if the archive scheme changed, only this implementation-specific stuff should need modification. Meanwhile the interface (and everything using it) should stay happy no matter what we do :)
Scrolling down to the interface function block, you'll see XF_Init. This doesn't really initialize much in terms of the file system internals (which don't need a lot of initialization), so much as check the command line parameters. If you run the program with "-packcreate" or "-packextract" followed by a directory name (like Stuff, or "Resource", which we'll be using), it will do that and stop the program afterward.
There's also the "-rootpath" check below. This is where we contend with our second big concern in the file system, the override factor. If you've played Quake, you probably know that you can run user mods (like CTF or TeamFortress or whatever) by using the -game parameter. In Quake, that redirects the starting search path to a different subdirectory other than Id1. For us, -rootpath does pretty much the same thing. We always have a default directory "Resource" underneath our distribution directory, and that doesn't change. But if you use -rootpath, you can add on additional subdirectories that the file system will go to beforehand, allowing user mods.
For example, when we want to open a file "Sounds\happy.wav", it will internally look at whichever subdirectories are on the rootpath list (including Resource), and see if the file is there. If it's not, it'll check if there's an archive for that same directory, and try and use that. So if a user mod named "MyMod" replaces Sounds\happy.wav, and if the game is run with "-rootpath MyMod", it will find MyMod\Sounds\happy.wav before it finds Resource\Sounds\happy.wav, and the mod works as planned. XF_Open looks through these rootpath directories when loading up files.
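The search order can be sketched like so. This is only an illustration of the idea: the Locate function, the Located structure and the two callback types are hypothetical, and the existence checks are passed in as callbacks so the example doesn't touch the real disk. It also assumes the root list is already in search order, with mod directories ahead of Resource.

```cpp
#include <assert.h>
#include <functional>
#include <string>
#include <vector>

enum class Source { NotFound, LooseFile, Archive };

struct Located {
    Source      source;
    std::string root;   // which root path supplied the file
};

// Hypothetical checks: "is there a loose file at this full path?" and
// "does this root's archive contain this relative path?"
using LooseCheck   = std::function<bool(const std::string& fullPath)>;
using ArchiveCheck = std::function<bool(const std::string& root,
                                        const std::string& relPath)>;

// Walk the roots in order. Within each root, a loose file on disk wins
// over the same file inside that root's archive.
Located Locate(const std::vector<std::string>& roots, const std::string& relPath,
               const LooseCheck& looseExists, const ArchiveCheck& archiveHas)
{
    assert(!relPath.empty());
    for (const std::string& root : roots) {
        if (looseExists(root + "\\" + relPath)) return { Source::LooseFile, root };
        if (archiveHas(root, relPath))          return { Source::Archive, root };
    }
    return { Source::NotFound, "" };
}
```

With roots of { "MyMod", "Resource" }, a Sounds\happy.wav that exists in both places resolves to the MyMod copy, while a file that only lives in the Resource archive still gets found on the second pass through the loop.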
All the other interface functions are pretty small, and should make enough sense. All in all, there's only 800 or so lines of code in this subsystem. Pretty small, but with a nice bit of functionality in it. If you look in sys_main, the example this time around demonstrates the file system's use by loading two small text files and displaying them in a message box. One file is actually there, below the Resource directory. But the other is nowhere to be found; it only exists within the Resource archive file. The system treats it all the same, no special intervention required. :)
End Of File
That's about it for this issue. Next time, we'll be supplementing our file system with some memory management code. After all, being able to pull resources out of files is all well and good, but once you've done that, you've gotta have some place to put them, right? :)
Until next time,
- Chris "Kiwidog" Hargrove is a programmer at 3D Realms Entertainment working on Duke Nukem Forever.
Code on the Cob is © 1998 Chris Hargrove.
Reprinted with permission.