Hey! Hey! Don't Touch Me!
by Chris "Kiwidog" Hargrove
Download this week's code files: cotc3src.zip (60k)
"Input... need input!" - Johnny Five, Short Circuit.
Back again! Okay, you can stop groaning now. :)
I said last time that I was going to do some more system stuff like file management etc. before moving on to user input, but I changed my mind and decided to do user input first. Since the low-level input system won't depend on the file or memory management stuff, and since feedback might end up being handy pretty quickly for debugging, it made a bit more sense to switch the order like this. So we'll be doing another subsystem for user input, once again separated into a lower-level layer specific to Windows (using DirectInput), with a general layer sitting on top of it. Unlike last time, I'll be going into more detail explaining the functions themselves in this article... and more importantly, the logic that went behind them. Remember, it's the logic and design philosophies that you should want to pull out of this series, not the code itself.
So sit back and relax, it's time to smack the machine around a bit. :)
From Our Last Episode...
A few quick things before we begin... once again the feedback has been great since the last article, thanks for the positive response everyone. :) I apologize if I can't end up getting back to all of you who have questions, but as you can imagine I do get a lot of mail. I try to address as many people as I can though, and if something comes up enough I'll try and answer it here too. Here are a couple of quick answers to a few of the more frequent questions I've gotten:
One other quick side note: beginning with this article, the source code archive now comes with a small "license agreement" text, with some pseudo-legalese in it. This is basically just to cover my ass, as well as a way for me to openly object to people ripping off my work. I'm writing this series to help you guys learn, and if some of the code itself ends up being handy then I have no objection if you guys use it in your own stuff as is. But I'd be much happier if you consider the code purely as an example and not as some kind of public domain library. I'm not going to pretend that this won't get ripped here and there; that'd be a foolish presumption. But if you do that, at least be kind enough to let me know so I know who the code is helping. Don't worry that this license thing means I'm gonna try and sue anybody who uses the code; far from it. I just want people to be aware why I'm doing this, and that if they rip the stuff without really knowing how it works, they do neither me nor themselves any favors.
Alright, enough opening stuff for one day, it's time to get down to work. :)
Just Click It... Click It Good...
There are a whole lot of ways to handle user input. There are only three real input devices that most games concern themselves with: the keyboard, mouse, and joystick (and we're not even worrying about joystick in this game). But people have gone to all kinds of lengths to get information from those few devices and scatter it to the input-hungry parts of their game. Sometimes it's direct, sometimes it's indirect (through a keybinding relay or a general console system à la Quake), but there are several parts of a game that want feedback from the user, and getting that feedback to the right place can be a real pain if you're not careful.
Even in this somewhat small game, we're going to run into these same issues. When the menu's up, the menu will want input focus. We'll be developing a rudimentary Quake-ish console later on too (mostly for debugging), and when it's up the input should go there too, unless the menu is up. When only the main game is running, we'll want it to go through a keybinding relay, driven by the console. Sometimes we'll care about when you first press a key or mouse button, sometimes we'll care about when you let a key or button go, and sometimes we'll want to know about all the times in between while you're holding a key or button down. Overall, there's a lot of context-sensitive control logic that needs to go on in the user input department. And we don't even have the primary game, let alone a menu or console, developed yet... so how does one get all this input stuff dealt with cleanly so early in the project?
Looking at the problem as a whole, it can seem like a nightmare. But if you break things up into smaller pieces, things become much simpler. Many (actually most) programming problems work this way. It's hard to chew a whole pack of gum at once, but you don't have to. That's why gum comes in packs of smaller, more chewable sticks. Programming is often no different; you just have to find your own sticks.
So let's look at the lowest level first. What possible states does a keyboard have? Well, each key can either be up, or it can be down. Same goes with mouse and joystick buttons. And what about a mouse or joystick itself? Well, it can either be still (not rolling / centered), or moving (rolling / angled). That's about it really.
Now if we wanted to, we could make a low-level input layer that exposes purely this information, and layers above it could check it manually. But that feels like a bit more exposure than we need. Let's look at things from a different perspective... what possible things would a "client" of the user input system (i.e. the menu, the console, whatever) want to know? Offhand I can only think of a few things, and they cover all three main devices (keyboard, mouse, joystick): the client might want to know when the mouse was rolling, when the joystick was tilted, when a key, mouse button, or joystick button was first pressed, when it remains pressed, and when it is released. Anything else? There're plenty of clients that would end up sitting on top of this, but all of them only concern themselves with this same information. All a client wants to know is when something happens that's important to it, so it can deal with it. When nothing is happening, clients shouldn't have to concern themselves with details just to find out that nothing is happening. So we don't need to expose anything underneath if important events are all a client cares about.
There's that word again, "event". Event-driven programming is already the law of the land when dealing with operating systems like Windows, but even if we weren't doing a Windows game, it would be appropriate here. Since all the user input we care about can be broken up into events that a client might find meaningful, it makes sense to structure the system around a few common events. Following what we came up with in the previous paragraph (and the fact that we're only concerning ourselves with keyboard and mouse), there are only four events we need: MouseMove, Press, Drag, and Release. MouseMove gets called whenever the mouse moves (duh :) and the other three get called when a key's state has something relevant to say (mouse buttons are considered keys).
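Since each key is only ever up or down, comparing a key's previous state with its current one is enough to pick which of the three key events applies. Here is a small sketch of that idea; the names InputEventType and ClassifyKey are made up for illustration and are not taken from this article's source:

```cpp
#include <cassert>

// Hypothetical event types mirroring the four events described above.
enum InputEventType
{
    IEV_NONE,       // nothing worth reporting
    IEV_MOUSEMOVE,  // the mouse rolled
    IEV_PRESS,      // key went from up to down
    IEV_DRAG,       // key was down and is still down
    IEV_RELEASE     // key went from down to up
};

// Decide which key event (if any) a pair of states represents.
// Mouse buttons are treated as keys, so the same logic covers them.
InputEventType ClassifyKey(bool wasDown, bool isDown)
{
    if (!wasDown && isDown)
        return IEV_PRESS;
    if (wasDown && isDown)
        return IEV_DRAG;
    if (wasDown && !isDown)
        return IEV_RELEASE;
    return IEV_NONE; // up before, up now: nothing happened
}
```

Note that "up before, up now" produces no event at all, which is exactly the point: clients never have to poll just to learn that nothing is happening.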
Our input system will be split into two pieces to deal with this. The lower layer (the part tied to DirectInput) doesn't have to know anything about what's using it, all it has to do is tell "whom it may concern" when any of those events listed above happen. How those events are used, it doesn't know and it doesn't really care. By the same token, the "general" input layer above doesn't care where the events came from, but it knows where they're going. This general layer will dispatch the input to whichever subsystem(s) are concerned with it. As time goes on, we can register more subsystems with this general layer as we create them... while leaving the heart of the input beast untouched.
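The split boils down to a single callback: the lower layer holds a pointer to "whom it may concern" and calls it, while the upper layer supplies that function. A minimal sketch under that assumption (all names here, such as LowLevel_SetHandler and General_OnEvent, are hypothetical stand-ins for the in_win and in_main halves, not their real functions):

```cpp
#include <cassert>

// Hypothetical minimal event: just enough to show the hand-off.
struct InputEvent
{
    int type; // press, drag, release, or mousemove
    int key;  // engine key constant
};

// ---- Lower layer (stands in for the DirectInput side) ----

// The one outside function the low level reports to.
typedef void (*InputHandler)(const InputEvent& ev);

static InputHandler s_handler = nullptr;

void LowLevel_SetHandler(InputHandler handler) { s_handler = handler; }

// Called when something happens; the low level has no idea who listens.
void LowLevel_Emit(int type, int key)
{
    if (!s_handler)
        return; // no handler registered yet, so the event is dropped
    InputEvent ev = { type, key };
    s_handler(ev);
}

// ---- Upper layer (stands in for the general layer) ----
static int s_eventsSeen = 0;
static int s_lastKey    = 0;

// Doesn't care where events came from, only where they go next.
void General_OnEvent(const InputEvent& ev)
{
    ++s_eventsSeen;
    s_lastKey = ev.key;
    // ...relay to whichever subsystems want input...
}
```

Swapping DirectInput for some other API later would only mean rewriting the lower half; the upper half's handler stays the same.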
Let's start with the ground floor, shall we?
DirectX - How Direct Is It?
Not all that direct in my opinion. As far as the API goes, I don't much care for DirectX. I don't like COM, I don't like being forced to "acquire" or "restore" devices all the time when my app loses focus, and so forth (I also don't like Hungarian notation). Altogether, DirectX feels like it has far too much baggage. It's good that Microsoft gives us something that's closer to the hardware than Windows' other APIs, but this is far from an ideal solution in my opinion. If I don't intend to lose input focus but I do, then I want that focus back, right? It's the things like that which bug me. Later on we'll be working with Direct3D, which has plenty of "issues" as well.
But not liking something is not sufficient excuse to ignore it. DirectX is the standard these days when writing games for Windows, and Windows is now the standard when writing PC games in general. So DirectX is a reality that we have to deal with.
Besides, just because DirectX can be somewhat nasty doesn't mean we can't wrap around it, right? And that's what we're going to do today with DirectInput. DirectInput is actually one of the easiest DirectX components to work with (sad but true), so if you're not used to DirectX then it's a good place to start.
Obviously you'll need the DirectX SDK in order to build the game from this point on. This DirectInput code only requires DirectX 5, but later on we might start requiring DirectX 6, so you'll want to get that if you haven't already. I added dxguid.lib and dinput.lib to the project's library list (just like I did with winmm.lib up in the Q&A section), so if you're mimicking the project then you'll want to do that as well.
The Insides Of Input
There were five files added this time around, all starting with "in_". I also made a few touchups to some of the other files too, but nothing directly related to input. MSVC comes with a "diff"ing utility named windiff.exe if you want to see exactly what lines changed, or if you have another diffing program then you can use it as well. If you're fortunate enough to have MS SourceSafe or some other version control software, then consider all modified files to be new versions.
Two of the files (in_win.*) are specific to DirectInput, our lowest layer. Two others (in_main.*) are the primary user input interface layer that sits on top. Nearly everything from the outside only goes through this layer, and doesn't touch the DirectInput layer directly. The fifth file is another header file (in_event.h) which holds the common event structure that both halves use yet neither has control over.
We'll dive into the specifics of both in_win and in_main in just a moment, but here's a quick summary of what's going on in each side.
in_win: Lowlevel DirectInput interface. Whenever its frame function is called, it processes all input information and sends out any necessary events to a function provided by the outside.
in_main: General user input interface. Uses in_win below it, and calls in_win's frame function from its own frame function (which the main loop uses). Contains the callback function which in_win sends events to, and this callback in turn relays these events to a set of "receiver" functions that are looking for input. These receiver functions sit in a stack and pass the events down from one function to the next until one of the functions consumes it.
[Look at in_event.h]
First, take a peek at this header. This is where the common definitions and structural info sits. It includes the event types, flags, key constants, and the event structure. Why is this in a separate header? Well, think of it this way. If we put it in in_win, that would imply the stuff is Windows-specific. And since it's not, and since non-Windows-related subsystems will need to access this stuff, then we shouldn't put it there. We could also put it in in_main, but remember that in_win needs to use this stuff too, and in_main sits ABOVE in_win. So putting this common stuff in either of those two headers would be inappropriate. By keeping it in a separate header, we only lock each half to this common structure and not to the other half, making for easier detachment.
In this header, we first have the event types (mousemove, press, drag, and release), like we talked about before. After that are a few flags, then the big list of keys. Is there anything special about why the key constants were chosen this way? Not really. The low 128 map to their ASCII equivalents, that way we can use the actual character values. The extended keys are added above that, in a somewhat arbitrary order. Then the mouse button keys start at 256. These are the key values that everything hooks to, so any other set of key values in the low level code (DirectInput in this case) must convert to these values.
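To make that layout concrete, here is a sketch of what such a key list could look like. The specific names and which extended keys come first are assumptions for illustration; only the overall scheme (ASCII below 128, extended keys above, mouse buttons from 256) comes from the text:

```cpp
#include <cassert>

// Hypothetical key constants following the layout described above.
// Values 0-127 are plain ASCII, so 'a' or '1' can be used directly.
enum InputKey
{
    KEY_BACKSPACE = 8,
    KEY_TAB       = 9,
    KEY_ENTER     = 13,
    KEY_ESCAPE    = 27,
    KEY_SPACE     = 32,

    // Extended keys with no ASCII equivalent start above 127.
    KEY_UPARROW   = 128,
    KEY_DOWNARROW,
    KEY_LEFTARROW,
    KEY_RIGHTARROW,
    KEY_F1,
    // ...remaining extended keys...

    // Mouse buttons are "keys" too, starting at 256.
    KEY_MOUSE1    = 256,
    KEY_MOUSE2,
    KEY_MOUSE3,

    KEY_NUMKEYS   // one past the last key; handy for sizing state arrays
};

inline bool IsAsciiKey(int key)    { return key >= 0 && key < 128; }
inline bool IsMouseButton(int key) { return key >= KEY_MOUSE1 && key < KEY_NUMKEYS; }
```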
Finally, after the key values, we have the event structure. This is the structure that in_win fills in, in_main takes and dispatches, and all the user input receiver functions use. It contains a whole bunch of information about the event, including the event type and what key it deals with, current keyboard flags, mouse positions and changes, a couple timing values, etc.
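A rough idea of what such a plain-data event could look like is below. The field names, the flag bits, and the two helper functions are hypothetical; the article only lists the kinds of information the structure carries:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layout of the shared event structure.
struct InputEvent
{
    int      type;             // mousemove, press, drag, or release
    int      key;              // engine key constant (mouse buttons included)
    unsigned flags;            // current keyboard flags (shift, ctrl, alt...)
    int      mouseX, mouseY;   // current mouse position
    int      mouseDX, mouseDY; // change since the last frame
    unsigned time;             // when the event was generated (ms)
    unsigned pressTime;        // when the key went down, useful during drags
};

// Hypothetical flag bits for the 'flags' field.
enum
{
    INEVF_SHIFT = 1 << 0,
    INEVF_CTRL  = 1 << 1,
    INEVF_ALT   = 1 << 2
};

// Plain data, so zero-filling gives a valid "empty" event.
inline void ClearEvent(InputEvent& ev) { std::memset(&ev, 0, sizeof(ev)); }

// How long a key has been held, e.g. for a drag handler.
inline unsigned HeldTime(const InputEvent& ev) { return ev.time - ev.pressTime; }
```

Keeping any helpers as free functions leaves the structure itself as pure data, in line with the reasoning in the next paragraph.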
Considering that we're using C++ and not just C, my choice of declaring this as a structure rather than a class may annoy some OOP-religious types. I do this for the simple reason that I only use classes when classes have an obvious implementation advantage over structures. I'm not an OOP zealot or a functional zealot, I'm a practicality zealot. If classes don't have a definite and distinct advantage over structures for a particular scenario, I'll use a struct. We'll have plenty of cases in this project where I use classes instead, but this is not one of those cases. The event structure is just data, and it has no control over its contents, so putting it in a class has neither a conceptual nor implementation advantage to me.
So that's the glue definition stuff; let's move on to DirectInput.
[Look at in_win.h]
The DirectInput layer doesn't have a whole lot of functions. Aside from the usual three (init, shutdown, and frame), there are a couple functions specific to Windows for setting and killing focus (which are called by the window function in sys_win), the function to set the input handler, a key stuffing function, and some mouse control stuff. So the interface is small, as it should be. But the insides are considerably larger...
[Look at in_win.cpp]
If you haven't used DirectX before, this file will likely confuse you to some degree. If you don't have any books or other available references on DirectX, you might want to consider keeping the SDK documentation handy. I don't have the luxury of going into detail about every DirectInput call in here (I'd have to write a book for that), so if you want a reference for the DirectInput API itself you'll have to dig one up. I'll just be describing how this file fits in the rest of the subsystem.
After a few definitions, we've got a bunch of private data to this file, commented according to what it does. The data's usage will be apparent in the functions that deal with it.
Next we've got some implementation functions. NameForDIError just gets a text string for a DirectInput error, for debugging. SetMouseCooperative calls the mouse device's SetCooperativeLevel function depending on our exclusive state. We'll be using exclusive mode for the mouse when in fullscreen, and nonexclusive when we're not (note that the keyboard cannot be exclusive). FlushKeyboardData purges any pending DirectInput buffer data and resets our key states. InitDIKToINKEY just maps DirectInput's DIK_ key constants to the ones we use. Next the Acquire/Unacquire functions for the keyboard and mouse do just that (acquiring and unacquiring devices is how DirectInput controls who has what). Finally, the CheckKeyEvents function is called by the INW_Frame function below, and is effectively a macro function for triggering key events (since that same code is used multiple times). Whether a press, drag, or release event gets sent depends on the key's current state and its previous state.
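DirectInput's DIK_ constants are keyboard scan codes, not ASCII, which is why a translation step like InitDIKToINKEY is needed. One straightforward approach (a sketch, not necessarily how the real function is written) is a 256-entry table built once at init, so each event costs a single lookup. The scan code values below match DirectInput's DIK_ESCAPE, DIK_RETURN, DIK_A, DIK_SPACE, and DIK_UP, copied locally so the sketch compiles without dinput.h; the engine-side values and the choice of lowercase letters are assumptions:

```cpp
#include <cassert>

// Hypothetical engine key constants (see the in_event.h discussion).
enum { KEY_ENTER = 13, KEY_ESCAPE = 27, KEY_SPACE = 32, KEY_UPARROW = 128 };

// A few DirectInput scan codes, copied here so the sketch builds
// without dinput.h. Real code would use the DIK_ constants directly.
enum
{
    SC_ESCAPE = 0x01, // DIK_ESCAPE
    SC_RETURN = 0x1C, // DIK_RETURN
    SC_A      = 0x1E, // DIK_A
    SC_SPACE  = 0x39, // DIK_SPACE
    SC_UP     = 0xC8  // DIK_UP
};

static int s_dikToKey[256]; // scan code -> engine key, 0 means "unmapped"

// Fill the table once at init time.
void InitScanCodeTable()
{
    for (int i = 0; i < 256; ++i)
        s_dikToKey[i] = 0;
    s_dikToKey[SC_ESCAPE] = KEY_ESCAPE;
    s_dikToKey[SC_RETURN] = KEY_ENTER;
    s_dikToKey[SC_A]      = 'a'; // assumption: letters map to lowercase ASCII
    s_dikToKey[SC_SPACE]  = KEY_SPACE;
    s_dikToKey[SC_UP]     = KEY_UPARROW;
}

// Per-event translation is then a single array lookup.
inline int KeyForScanCode(unsigned char sc) { return s_dikToKey[sc]; }
```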
Now we get to the interface functions. INW_Init calls all the DirectInput crap that we need to get our keyboard and mouse devices set up. If any of these DirectInput calls is unfamiliar to you, check the prototypes of each one in the SDK. Once we've acquired both devices, we're ready to roll. INW_Shutdown unacquires both devices and kills the DirectInput interface handle.
INW_Frame is where all the real work happens. This is the function that gets called each frame to trigger events to the event handler function. First, the mouse gets handled. After getting the state of the mouse (including position change and button info), it sets up an event with all the relevant information at the time. If the mouse has moved, a mousemove event gets sent first. Then the buttons are checked, the button key states are updated, and press/drag/release events are sent as necessary. The keyboard is processed in a similar manner, only instead of checking the current state of every key in DirectInput, it uses GetDeviceData's buffered input mechanism, so it only updates the keys that have changed since the last read, in the order the changes happened. That way we won't lose input as easily if the framerate drops too low. Since we only call this function once per frame, the buffer should hold all key changes since the last frame (as long as it's big enough to hold them all), unlike a regular key state check, which might miss a key that went down and back up within a single frame. The keyboard stuffing "injection" buffer (used by another interface function in this file) is handled the same way as this DirectInput keyboard buffer is.
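Why buffered input matters is easiest to see with a key that goes down and up between two frames. The sketch below replaces DirectInput with a plain vector of records (in the real API, GetDeviceData fills an array of DIDEVICEOBJECTDATA entries); KeyRecord, ProcessKeyBuffer, and the event codes are hypothetical, and drag events for held keys are left out to keep it short:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one buffered keyboard record.
struct KeyRecord
{
    int  key;  // already translated to an engine key constant (< 512 assumed)
    bool down; // true = pressed, false = released
};

enum { EV_PRESS = 1, EV_RELEASE = 2 };

struct SentEvent { int type; int key; };

static bool s_keyDown[512];           // our copy of each key's state
static std::vector<SentEvent> s_sent; // stands in for the event handler

static void Send(int type, int key) { s_sent.push_back(SentEvent{ type, key }); }

// Apply buffered changes in the order they happened, so a key that was
// pressed and released between two frames still yields both events.
// A key-stuffing "injection" buffer could be drained by the same routine.
void ProcessKeyBuffer(const std::vector<KeyRecord>& records)
{
    for (const KeyRecord& r : records)
    {
        if (r.down && !s_keyDown[r.key])
            Send(EV_PRESS, r.key);
        else if (!r.down && s_keyDown[r.key])
            Send(EV_RELEASE, r.key);
        s_keyDown[r.key] = r.down;
    }
}
```

A polled snapshot taken at the end of that same frame would show the key as up and report nothing at all.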
After the frame function are only a few more functions, all pretty small and to the point.
There... we got through DirectInput, and guess what? We'll barely have to touch this stuff again for the rest of the project. That's the great thing about encapsulation. Not only do you get to protect your code from itself, you get to protect it from Microsoft ;)
The Higher Ground
We're not entirely out of the user input water yet, though... now we have to do the higher layer. Fortunately, this is considerably smaller and simpler stuff.
[Look at in_main.h]
The layer above DirectInput is what other subsystems know and use when they want to deal with user input. The regular trio of functions is there of course (init, shutdown, and frame), and there are a few utility functions to turn key constants into strings and vice versa, and to get a key's shifted version if there is one. But the other three functions with "Receiver" in their names are where the real interface is.
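Those key utilities are mostly table lookups. Here is one possible shape for them, with hypothetical names (NameForKey, KeyForName, ShiftedKey) and an assumed US keyboard layout for the shifted digits; printable ASCII keys are simply named by their own character:

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical subset of engine key constants.
enum { KEY_ENTER = 13, KEY_ESCAPE = 27, KEY_SPACE = 32, KEY_UPARROW = 128, KEY_MOUSE1 = 256 };

struct KeyName { int key; const char* name; };

static const KeyName s_keyNames[] =
{
    { KEY_ENTER,   "ENTER" },
    { KEY_ESCAPE,  "ESCAPE" },
    { KEY_SPACE,   "SPACE" },
    { KEY_UPARROW, "UPARROW" },
    { KEY_MOUSE1,  "MOUSE1" },
};
static const int s_numKeyNames = sizeof(s_keyNames) / sizeof(s_keyNames[0]);

static bool EqualNoCase(const char* a, const char* b)
{
    for (; *a && *b; ++a, ++b)
        if (std::tolower((unsigned char)*a) != std::tolower((unsigned char)*b))
            return false;
    return *a == *b; // equal only if both strings ended together
}

// Key constant -> string. Single characters use a static buffer,
// so that result is only valid until the next call.
const char* NameForKey(int key)
{
    for (int i = 0; i < s_numKeyNames; ++i)
        if (s_keyNames[i].key == key)
            return s_keyNames[i].name;
    static char single[2];
    if (key > 32 && key < 127)
    {
        single[0] = (char)key;
        single[1] = 0;
        return single;
    }
    return "<UNKNOWN>";
}

// String -> key constant (case-insensitive); returns -1 if not found.
int KeyForName(const char* name)
{
    if (name[0] && !name[1])
        return std::tolower((unsigned char)name[0]);
    for (int i = 0; i < s_numKeyNames; ++i)
        if (EqualNoCase(s_keyNames[i].name, name))
            return s_keyNames[i].key;
    return -1;
}

// Shifted version of a key, assuming a US layout; otherwise the key itself.
int ShiftedKey(int key)
{
    if (key >= 'a' && key <= 'z')
        return key - 'a' + 'A';
    static const char digits[]  = "1234567890";
    static const char shifted[] = "!@#$%^&*()";
    for (int i = 0; digits[i]; ++i)
        if (key == digits[i])
            return shifted[i];
    return key;
}
```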
[Look at in_main.cpp]
Receivers are just a term I'm using to describe a function that wants to receive user input. This is no different from the event handler that in_win uses, since a receiver is just a callback. The difference is, there can be more than one receiver at a time. Which one gets to handle an event depends on where it is in the "food chain" of receivers, and whether or not the receiver before it cared about the event it was getting.
The receiver structure is private to this file, and all receivers are allocated from a small array. The structure just contains a handler and pointers to its neighbor receivers in the chain (a doubly linked list). When you add a receiver, your receiver gets added to the top of the list and becomes the first function to receive any input that comes in. Each receiver that gets added pushes the previous receivers down the stack, and when a receiver is removed, the stack closes in around it. Think about how simple this can make dealing with menu or console redirection. When the menu pops up, its receiver is added and it gets a chance to deal with any input first. When it's done, it removes itself. And no other receiver has to worry about it. Convenient, huh?
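Here's a sketch of that receiver chain: a fixed pool, a doubly linked list whose head is the top of the stack, and a dispatch loop that stops at the first receiver that consumes the event. AddReceiver, RemoveReceiver, DispatchEvent, the pool size, and the example receivers are all hypothetical names and values, not the actual in_main.cpp code:

```cpp
#include <cassert>

struct InputEvent { int type; int key; };

// A receiver returns true if it consumed the event.
typedef bool (*ReceiverFunc)(const InputEvent& ev);

struct Receiver
{
    ReceiverFunc func;
    Receiver*    prev; // toward the top of the stack
    Receiver*    next; // toward the bottom
    bool         inUse;
};

static const int MAX_RECEIVERS = 16; // small fixed pool
static Receiver  s_pool[MAX_RECEIVERS];
static Receiver* s_top = nullptr;    // first to see each event

// Push a receiver on top; returns a handle for later removal.
Receiver* AddReceiver(ReceiverFunc func)
{
    for (int i = 0; i < MAX_RECEIVERS; ++i)
    {
        Receiver* r = &s_pool[i];
        if (r->inUse)
            continue;
        r->func = func;
        r->inUse = true;
        r->prev = nullptr;
        r->next = s_top;
        if (s_top)
            s_top->prev = r;
        s_top = r;
        return r;
    }
    return nullptr; // pool exhausted
}

// Unlink a receiver; its neighbors close in around the gap.
void RemoveReceiver(Receiver* r)
{
    if (!r || !r->inUse)
        return;
    if (r->prev) r->prev->next = r->next; else s_top = r->next;
    if (r->next) r->next->prev = r->prev;
    r->inUse = false;
}

// Pass the event down the chain until someone consumes it.
bool DispatchEvent(const InputEvent& ev)
{
    for (Receiver* r = s_top; r; r = r->next)
        if (r->func(ev))
            return true;
    return false; // nobody wanted it
}

// Example receivers for illustration.
static int s_gameHits = 0, s_menuHits = 0, s_logHits = 0;
static bool GameReceiver(const InputEvent&) { ++s_gameHits; return true; }
static bool MenuReceiver(const InputEvent&) { ++s_menuHits; return true; }
static bool LogReceiver(const InputEvent&)  { ++s_logHits; return false; } // looks, passes on
```

The menu case from the paragraph above is just an AddReceiver when it opens and a RemoveReceiver when it closes; the game's receiver underneath never knows the menu was there.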
The reasoning behind this system isn't that tough to figure out... more than one place is going to want to respond to input, and not everyone who wants input can get it, especially if something else already got the input first. I'm not saying this system is perfect by any means, but it certainly looks like a good place to start. While you can't predict every problem that will come up in programming (especially game programming), you still have to make a concerted effort. Looking at your situation, figuring out what might be necessary (not just now but further down the road), and working through how to deal with things before you code, can all help you turn what otherwise might be a sticky problem into a pretty simple matter.
If all this seems like a little too much for you to swallow at the moment, don't worry. We only have new articles once every two weeks, right? :) So take your time and look it over. The next issue will be a bit smaller (and less DirectX-ish), as we add a small file management library into the project. Until then, happy coding (and keep the feedback coming! :)
Until next time,
- Chris "Kiwidog" Hargrove is a programmer at 3D Realms Entertainment working on Duke Nukem Forever.
Code on the Cob is © 1998 Chris Hargrove.
Reprinted with permission.