Shaving Ping
Squeezing More Out of Hosted Game Servers with Optimized Game Server Code
General OptimizationA good first step toward beefing up your server program is to compile it with an effective optimizing compiler. One choice is the Intel C/C++ Compiler 2), available in both Linux and Windows flavors. If the server is a Windows system, the software developer can use their pre-existing Microsoft DevStudio IDE to manage projects and compiles, with the Intel C and C++ compilers underneath. If the server is a Linux system, the user can choose to use the Eclipse software development CDT environment or good, old fashioned command line editors and "make". How to start? For this exercise, a solid game engine example was selected: Richard Stanway's R1Q2 3). This is a tightened and enhanced version of the Quake 2 engine, which was release to the Open Source community by ID Software back in 2001. Older code? Yes, but many game programmers cut their teeth on Q2 mod development. It's a known space and a good reference point. Rich's R1Q2 was coupled with code from the LOX Q2 mod, an "extreme weapons" mod built by David S. Martin and friends, and enhanced by Geoff Joy and others. The LOX mod is a good example of performance challenging code, as the massive number of events that can be created by a single player with the right weapon selections and feature combinations can bring an otherwise healthy server to its knees. Again for this example, the target server platform is Linux, the default choice among server hosting companies where game server engines have a Linux server offering. The test server used was a vanilla Red Hat Enterprise Linux 3 (Taroon Update 4) server, running on a 3.7 GHz Pentium 4 with 1 Gig of RAM, spinning a standard Serial ATA hard drive. Note that all of the steps being discussed here, including the optimization techniques and compiler features, are applicable to or available on Windows as well. Step One:Get the code. Unwrapping the code and doing a straight gcc compile using the ---O2 optimization switch with the provided makefiles generated usable binaries that performed as expected. A pair of client machines running on an isolated net connected without issue and achieved pings from varying from 15 to 35 ms. Since this code has had some level of grooming, compiler warnings were minimal. Step Two:Perform reference benchmarking. In this case, two client machines were connected to the server, running its standard version of binaries, from a local network connection. Their static pings were recorded, as were their pings when the server was stressed. In this case, the stress test involved having the players from both client test machines launch 4 napalm grenades per second from a fixed location on the servers default level, generating at least 128 in-game explosions per second. Client "freeze" behavior, typical in this server stress condition, was monitored, as was the frequency of "RATEDROP" warnings, issued from the server when a significant drop in server-client data exchange rate is detected. Step Three:Get the Intel compiler. The Intel C/C++ compiler package is available for demo download, with academic, non-commercial, and commercial licenses. The software installs on nearly all major Linux distributions, including those not supporting RPM. Step Four:Update the makefiles to enable optimization options. In this case, that meant changing "CC=gcc" to "CC=icc". The R1Q2 makefile required no dependency changes or LDFLAGS changes. The LOX makefile required a minor change to the LDFLAGS setting to accommodate the new library home for a couple of key string functions. For round one of our compiler optimization exercise, CFLAGS was changed to add the -02 optimization switch. This is the most commonly recommended option, performing many optimizations for speed without significant regard to the impact on code size, including but not limited to:
One thing that became clear during the initial build with the Intel compiler was that the number of warnings increased, going from 4 to 62. Most of the warnings were variable type checking issues. Some of them warranted further investigation. In this case, only minor code changes were required. The newly rebuilt binaries were tested and results gathered. For the next round of optimization, the -02 CFLAGS option was changed to -03. This option, according to the documentation, contains "more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations". This includes all of the features of the -02 optimization, plus loop unrolling, code replication to eliminate branches, and padding of certain power-of-two arrays to improve cache use. Again, the newly built binaries were tested and results were gathered. For round three, the binaries were built with an added switch: -axN. This switch enables processor-targeted optimization, in this case specifically for Intel Pentium 4 and compatible chips. Once again the new binaries were tested. The final round of compiler switch optimization called for changing the -axN switch to -axP. This option optimizes the output for Intel Pentium 4 processors with Streaming SIMD Extensions 3 (SSE3) instruction support. Once more the resulting binaries were tested. |
|