String Usage and Architecture
Introduction

Strings are vital to any game programming project. They are used for many important tasks, including outputting text, reading and writing data to and from files, and multiplayer programming via sockets. Most games are currently programmed in C/C++. However, unlike languages such as Visual Basic, C++ has no high-level string type built into the core language. Higher-level strings can be used in C++ through the Standard C++ Library's string class, the MFC library's CString class, or a custom-written string class. Even so, there are many situations where standard C strings may be the better choice, such as when performance matters, or in multi-platform programming, where C is commonly used. And knowing how C strings actually work will let a programmer use higher-level string classes more effectively.

Array and Memory Management Basics

To understand how C strings actually work, it is first important to understand basic memory management, as well as arrays.

Basic Memory Management

There are two main ways that memory is allocated in C/C++: on the stack, and on the heap. Memory is allocated on the stack when you declare a variable or array inside a function, like this:

    void stackfunc()
    {
        char test;           // This variable is allocated on the stack
        char anarray[500];   // This entire array is allocated on the stack
    }

Memory allocated on the stack has many advantages. First, allocating it is fast, which is important in games. Second, memory allocated on the stack is freed automatically when the function returns. The main disadvantages are that you must know exactly how much memory you will need at compile time, and that stack space is limited, so very large arrays do not belong there.

The alternative is to allocate memory on the heap, using the malloc() and free() functions (C/C++) or the new and delete operators (C++ only). Global variables are a third case: they live in a separate static storage area that exists for the whole run of the program, so they are neither on the stack nor on the heap. For example:

    #include <cstdlib>   // malloc, free

    int myarray[300];    // A global array: static storage, not the stack or the heap

    void heapfunc()
    {
        char* heapvar;
        char* heapvar2;
        char* heaparray;
        int arraysize = 500;

        heapvar = new char;                       // C++: one char on the heap
        heaparray = new char[arraysize];          // C++: a run-time-sized array on the heap
        heapvar2 = (char*)malloc(sizeof(char));   // C/C++: one char on the heap

        delete heapvar;                           // Memory from new is released with delete
        delete [] heaparray;                      // Memory from new[] is released with delete []
        free(heapvar2);                           // Memory from malloc is released with free
    }

Global variables are freed automatically when the program ends. Memory obtained with malloc or new, however, must be freed explicitly, as done in the example. Allocating memory on the heap is slower than allocating it on the stack, but it lets us allocate a dynamic amount, such as here, where the size of heaparray is stored in an int. Heap memory also gives you explicit control over when it is allocated and freed. Note that in the example above, the pointers themselves are still allocated on the stack; it is the memory they point to that is on the heap.

Arrays

Arrays are taught early in most introductory computer science courses; however, their inner workings are rarely discussed in detail. To illustrate, I will use the following code sample:

    void arrayfunc()
    {
        int myarray[5];
    }

You may be surprised to learn that arrays aren't a magical data type in C/C++. Strictly speaking, myarray has type "array of 5 ints", but in almost every expression it is automatically converted ("decays") to a pointer to its first element, of type int*. The main exceptions are the sizeof and & operators. But what does that pointer point to? It points to the first of five ints stored one after another in memory, myarray[0] through myarray[4].

[Figure: graphic representation of myarray as five consecutive ints in memory]
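The difference between an array and the pointer it decays to shows up most clearly with sizeof. The following is a minimal C sketch; the function names are hypothetical and chosen only for this illustration:

```c
#include <stddef.h>

/* Inside the function that declares it, myarray is a real array,
   so sizeof measures all five ints. */
size_t count_elements(void)
{
    int myarray[5];
    return sizeof myarray / sizeof myarray[0];   /* always 5 */
}

/* Passed to a function, the array has decayed to int*, so sizeof
   now measures only the pointer, not the caller's array. */
size_t size_inside_callee(int *p)
{
    (void)p;            /* only the parameter's type matters here */
    return sizeof p;    /* equals sizeof(int *) */
}

/* The decayed pointer refers to the array's first element. */
int decays_to_first_element(void)
{
    int myarray[5] = {0};
    int *p = myarray;          /* implicit conversion: no & needed */
    return p == &myarray[0];   /* 1: both refer to the same address */
}
```

One practical consequence: a function that receives a string as a char* cannot use sizeof to learn how big the caller's array is, so any size limit has to be passed in separately.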
Also, you may not know that myarray[2] is really just shorthand for *(myarray+2). In fact, 2[myarray] works the same as myarray[2], because *(myarray+2) is the same as *(2+myarray).

Null-Terminated Strings

Memory Arrangement of Strings

A string in C is just an array of type char. A char takes up 1 byte of memory per element. Whether plain char is signed or unsigned depends on the compiler; a signed char typically holds values from -128 to 127, but negative values are rarely used in strings. Each element of the array holds an ASCII character code representing the character in that position in the string. These character codes are really just numbers: for example, the character A is 65, B is 66, C is 67, and so on. You usually do not need to know these codes while programming; if you place a character in single quotes, the compiler will replace it with the number that represents that character. For example, 'A' is equivalent to 65, 'B' to 66, 'C' to 67, etc.

Note: There is an alternative to the scheme described above, called Unicode, often handled in C/C++ as "wide chars". Wide strings use an array of type wchar_t rather than char (wchar_t is 16 bits on Windows and usually 32 bits on other platforms). However, Unicode is used mostly in applications, not games, and is thus beyond the scope of this article.

Standard C strings have one more property: they are "null-terminated". That means the element after the last character in the string holds the character code 0 ('\0'), called the null character. This is NOT the printable digit '0', whose character code is 48. For example, here is how the string "Hello" is stored in memory:

    Index:       0     1     2     3     4     5
    Character:  'H'   'e'   'l'   'l'   'o'   '\0'
    Code:        72    101   108   108   111   0

Assigning Strings by Hand

The following code example stores "Hello" in a string, albeit in a very primitive manner:

    void stringfunc()
    {
        char str[50];
        str[0] = 'H';
        str[1] = 'e';
        str[2] = 'l';
        str[3] = 'l';
        str[4] = 'o';
        str[5] = 0;    // The terminating null character
    }

Observe that, when constructing a string by hand, we must explicitly place the null character after the last character. Also note that we have completely ignored the contents of the array after the null character.
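To make the index arithmetic and the terminator concrete, here is a small C sketch. build_hello and count_chars are hypothetical helper names; count_chars does by hand what the standard strlen function (covered later) does:

```c
#include <stddef.h>

/* Fills buf with "Hello" one element at a time, as stringfunc does.
   buf must have room for at least 6 chars. */
void build_hello(char *buf)
{
    buf[0] = 'H';
    buf[1] = 'e';
    buf[2] = 'l';
    buf[3] = 'l';
    buf[4] = 'o';
    buf[5] = '\0';   /* without this, nothing marks where the string ends */
}

/* Counts characters up to (not including) the null character.
   *(s + n) is the same element as s[n]. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (*(s + n) != '\0')
        n++;
    return n;
}
```

Because count_chars stops only when it finds a 0, calling it on an array that was never terminated would keep reading past the end of the array.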
Using Math Operators on Strings

You may be tempted to use operators like equals (=) or addition (+) to assign or concatenate strings, as you would in many languages. However, that does not work. Let's look at what happens when you use the equals operator on a string:

    void stringfunc2()
    {
        char str1[3];
        char* str2;
        str1[0] = 'H';
        str1[1] = 'i';
        str1[2] = 0;
        str2 = str1;    // Compiles, but does not copy the string!
    }

This example tries to do something that seems intuitive: set one string equal to the contents of another. However, as we learned in the previous section, str1 decays to a pointer to its first element, and str2 is a pointer. All that code does is make str2 point to the same area of memory as str1; no characters are copied. Thus, str2 becomes an alias for str1: any change made through str1 is visible through str2, and vice versa. (The assignment cannot go the other way at all: str1 = str2 will not compile, because an array cannot be assigned to.) There will actually be cases where we want this aliasing, as we'll see later in this article, but it is not how you copy the contents of one string to another.

Basic String Functions

String Assignment and Concatenation

The standard C library comes with a plethora of functions to manipulate strings. The two most basic operations are assignment and concatenation. We covered the "hard way" to do assignment in the last section: setting each element, including the trailing null character, by hand. Here, we'll show the "hard way" to do concatenation (adding one string onto the end of another) by concatenating the string " World" onto the end of "Hello":

Warning: This code example is advanced. Do not worry if you do not understand it. A much simpler way to accomplish the same task will be presented shortly.
    void stringfunc3()
    {
        char str[50];

        // Set to "Hello"
        str[0] = 'H';
        str[1] = 'e';
        str[2] = 'l';
        str[3] = 'l';
        str[4] = 'o';
        str[5] = 0;

        // Concatenate " World"
        // First, find the end of the string
        char* strp = &str[0];   // Set a pointer to the first element
        // Note that we could have written str rather than &str[0], since str
        // decays to a pointer to the first element

        // Increment strp until it points at the terminating null character
        while (*strp)
            strp++;

        // Now strp is effectively a string that starts just after the last character of str!
        // We can now set it to " World" just like we set str to "Hello" above.
        strp[0] = ' ';
        strp[1] = 'W';
        strp[2] = 'o';
        strp[3] = 'r';
        strp[4] = 'l';
        strp[5] = 'd';
        strp[6] = 0;

        // If you printf or cout str, you will find it is now "Hello World"
    }

Note that, when concatenating " World" onto the end of "Hello", the first character of " World" overwrote the terminating null character, which was then re-added at the end of the new string.

Easy String Assignment and Concatenation

It seems like a lot of work to assign and concatenate strings! Fortunately, there are two functions that make our lives a lot easier: strcpy() and strcat(). Here is the last code example rewritten using strcpy and strcat:

    #include <string.h>   // strcpy, strcat

    void stringfunc3()
    {
        char str[50];
        strcpy(str, "Hello");
        strcat(str, " World");
    }

Talk about easier! The strcpy function takes two parameters: the destination string and the source string (in that order). It copies the contents of the source string into the destination string (much like you would expect dest = source to do). Note that rather than copying "Hello", we could have copied the contents of another string. The strcat function concatenates the source string (the second parameter) onto the end of the destination string, much like you would expect dest += source to do. Observe that we did not have to do anything special with the trailing null character: both strcpy and strcat handle it for us.
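The difference between copying with strcpy and assigning a pointer, as stringfunc2 did, can be seen directly. This is a minimal C sketch; copy_vs_alias and its parameters are hypothetical names used only for this illustration:

```c
#include <string.h>

/* original and copy must each have room for at least 3 chars.
   After the call, copy still holds "Hi", while alias points at
   original and therefore sees the later change to "Ho". */
void copy_vs_alias(char *original, char *copy, const char **alias)
{
    strcpy(original, "Hi");
    strcpy(copy, original);   /* copies the characters into separate memory */
    *alias = original;        /* copies only the address: nothing is duplicated */

    original[1] = 'o';        /* modify the original afterwards */
}
```

Only the strcpy result survives a later change to the source unaffected; the pointer simply follows whatever is stored in the original array.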
We still have to ensure there is enough room in the array for both the string and its trailing null character, though.

Protecting Against Overflows

I just mentioned that you need to make sure there is enough space in the array for the string and its trailing null character. But what if there isn't? What if we ask the user to type in their name, and they type in a really long name? strcpy and strcat will happily write past the end of the array. The result is undefined behavior: often a crash, sometimes silently corrupted data. To prevent that, many string functions, including strcpy and strcat, have so-called "counted" variants that take a length limit. Here is the previous function safely rewritten using counted functions:

    #include <string.h>   // strncpy, strncat, strlen

    void stringfunc3()
    {
        char str[50];
        strncpy(str, "Hello", 49);
        str[49] = 0;                                // strncpy does not terminate a string it truncates
        strncat(str, " World", 49 - strlen(str));
    }

The first line isn't much changed, except that we call the counted version of strcpy, strncpy, and pass it the maximum string length. Note that we pass 49 rather than 50, because the maximum string length is actually 49 (one element must be left for the trailing null character). This tells strncpy not to copy more than 49 characters, preventing an overflow. Be aware, though, that if the source is 49 characters or longer, strncpy stops without writing a null character, so we set the last element to 0 ourselves to guarantee the string is terminated.

The strncat line is a bit more complicated, and introduces a new function, strlen(). strlen returns the length of a string, not including the trailing null character. It is needed here because the count passed to strncat says how many characters to append, not how long the final string may be. So we subtract the current length of the string from 49 to find the maximum number of characters we can append; strncat always adds the null character after them.

Except in time-critical sections of code, ALWAYS use the counted variants of string functions! The rest of this article will always use counted functions where they are available.

Advanced String Functions

Comparing Strings

Just as you can't assign one string to another using = in C/C++, you can't compare strings using == as you can with numbers: == compares the two pointers (the addresses of the first characters), not the characters themselves.
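A minimal C sketch of that difference, using two separate arrays that hold the same text. The function names are hypothetical, and the content comparison uses strcmp, the function introduced just below:

```c
#include <string.h>

/* Two separate arrays that happen to hold identical text. */
static char first[]  = "Hi";
static char second[] = "Hi";

/* Compares addresses: 0, because the arrays are distinct objects
   even though their characters match. */
int equal_by_pointer(void)
{
    const char *a = first;
    const char *b = second;
    return a == b;
}

/* Compares characters: 1, because strcmp returns 0 for identical strings. */
int equal_by_contents(void)
{
    return strcmp(first, second) == 0;
}
```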
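To compare strings, you must use the strcmp function. strcmp's return values are a bit counter-intuitive. This table shows what strcmp(str1, str2) returns:

    Return value      Meaning
    less than 0       str1 sorts before str2 (at the first position where they differ, str1 has the lower character code)
    0                 str1 and str2 are identical
    greater than 0    str1 sorts after str2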
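Only the sign of strcmp's result is meaningful; the exact magnitude varies between C libraries, so code should test it against 0 rather than against -1 or 1. A minimal C sketch; earlier_of is a hypothetical helper:

```c
#include <string.h>

/* Returns whichever of a and b sorts first according to strcmp.
   Only the sign of strcmp's result is checked, never its exact value. */
const char *earlier_of(const char *a, const char *b)
{
    return (strcmp(a, b) <= 0) ? a : b;
}
```

For example, earlier_of("cat", "car") returns "car", because the strings first differ at 't' versus 'r'. The ordering follows character codes, so in ASCII every uppercase letter sorts before every lowercase one: strcmp("Zebra", "apple") is negative.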
Here is an example of using strcmp:

    #include <stdio.h>    // printf
    #include <string.h>   // strcmp, strncpy

    void stringfunc4()
    {
        char str1[50];
        char str2[50];
        char str3[50];
        char str4[50];

        strncpy(str1, "Hi", 49);
        strncpy(str2, "Hi", 49);
        strncpy(str3, "Bye", 49);
        strncpy(str4, "hI", 49);

        if (!strcmp(str1, str2))   // This if statement is TRUE: they are equivalent
        {
            printf("str1 and str2 are equivalent\n");
        }
        if (!strcmp(str1, str3))   // This if statement is FALSE: they are NOT equivalent
        {
            printf("str1 and str3 are equivalent\n");
        }
        if (!strcmp(str1, str4))   // This if statement is FALSE: they are NOT equivalent (different case)
        {
            printf("str1 and str4 are equivalent\n");
        }
    }

Contrary to what seems obvious, strcmp returns 0 if the strings ARE equivalent, which is why we negated it with the ! operator. Refer to the table above for more detail on what strcmp returns. strcmp is also case sensitive: it considers "Hi" a different string from "HI", "hi", and "hI". Most compilers and C libraries also provide a case-insensitive version, but it is not part of standard C. It is usually called stricmp or _stricmp (Microsoft compilers), or strcasecmp (POSIX systems, declared in <strings.h>).

Note: We did not use the counted version of strcmp (strncmp) here. This is because strcmp only reads its arguments and never writes to either string, so it cannot overflow a buffer as long as both strings are properly null-terminated. strncmp is only needed in certain situations, such as when you want to compare just the beginning parts of strings.

Advanced String Formatting

Building complicated strings with strcpy and strcat can get tedious. For advanced, powerful output, the sprintf function is available. It works exactly like the printf function, except that it takes an extra first parameter: the string to "print" to. The format specifiers of sprintf are very powerful, and beyond the scope of this article; look them up in your help file or man pages for more options. Here are some examples, using the counted version, snprintf:

    #include <stdio.h>   // snprintf

    void stringfunc5()
    {
        char str1[100];
        char str2[100];

        // Produces: "30 people ate 20 pieces of cheese."
        snprintf(str1, sizeof(str1), "%d people ate %d pieces of cheese.", 30, 20);

        // Produces: "3.300000 quick brown foxes jumped over 27 lazy dogs."
        // (%f prints six digits after the decimal point by default)
        snprintf(str2, sizeof(str2), "%f quick %s foxes jumped over %d %s dogs.", 3.3, "brown", 27, "lazy");
    }

With sprintf, it's even more important to use the counted version (snprintf) than with strcpy and strcat, because you usually won't have much of an idea how long the final string will be. Unlike strncpy, snprintf (standard since C99) takes the full size of the array, including room for the null character, and always terminates the string it writes, truncating the output if necessary. That is why the examples pass sizeof(str1), that is 100, rather than 99.

Advanced String Parsing

Parsing strings can be one of the more complicated parts of string programming. The sscanf function, which works like scanf but reads from a string instead of the keyboard, can make life a lot easier. It uses format specifiers similar to those of sprintf, except that you pass the addresses of the variables to fill in (for example &anint); a char array needs no &, since it already decays to a pointer. Note also that, unlike printf, sscanf needs %lf to read a double; %f reads a float. Let's take an example:

    #include <stdio.h>    // sscanf
    #include <string.h>   // strncpy

    void stringfunc6()
    {
        char str1[100];
        char str2[100];
        char str3[100];
        int anint;

        // Example 1: Separate the words in a string
        strncpy(str1, "Hello there", 99);
        sscanf(str1, "%99s %99s", str2, str3);   // str2 = "Hello", str3 = "there"

        // Example 2: Expects a string giving a noun, and the number of that noun present
        strncpy(str1, "5 bears", 99);
        sscanf(str1, "%d %99s", &anint, str2);   // anint = 5, str2 = "bears"
    }

The 99 in "%99s" is a counted limit, just like the ones above: it stops sscanf from writing more than 99 characters (plus the null character) into each 100-element array. sscanf also returns the number of items it successfully converted, which you can check to detect malformed input. Like sprintf, sscanf is a complicated function, with complex formatting options. Refer to your help file or man pages for more detail about using sscanf.

String Manipulation Tricks

This last section describes some advanced tricks we can do with strings by playing with pointers and null characters.

Stripping Characters from the Beginning of a String

One of the more common things to do with strings is to strip a certain number of characters off the beginning or end of a string. Here is how to strip characters off the beginning:

    #include <stdio.h>    // printf
    #include <string.h>   // strncpy

    void stringfunc7()
    {
        char str1[100];
        char* str2;

        strncpy(str1, "The first four characters will be stripped off.", 99);
        str2 = str1 + 4;
        printf("%s\n", str2);   // Prints: first four characters will be stripped off.
    }

str2 is now str1, but with the first 4 characters stripped off. The nice thing about this method is that str1, including the first 4 characters, is still intact.
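This method works because a C string is identified only by a pointer to its first character. By making str2 a pointer to the fifth character, we strip off the first four characters; str1 and str2 share the same memory, just starting at different positions. This is one of the cases, mentioned earlier, where making two pointers refer to the same memory is exactly what we want. Here is a graphic representation of what we just did:

[Figure: str1 and str2 pointing into the same array, with str2 four characters after str1]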
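The pointer trick needs one caution in practice: if the number of characters to skip is larger than the string's length, str1 + n points past the terminator into unrelated memory. A minimal C sketch of a guarded version; strip_prefix is a hypothetical helper, not a standard library function:

```c
#include <string.h>

/* Returns a pointer n characters into s, without modifying or copying s.
   If s is shorter than n, returns a pointer to its terminating null
   character (an empty string) instead of running past the end. */
const char *strip_prefix(const char *s, size_t n)
{
    size_t len = strlen(s);
    return s + (n < len ? n : len);
}
```

Because the result points into the original array, it stays valid only as long as that array does.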
Stripping Characters from the End of a String

Unfortunately, a string is identified by a pointer to its beginning, and its end is marked only by the null character, so we can't strip characters from the end by moving a pointer. But we can strip characters by placing a null character before the actual end of the string, effectively moving the end. Keep in mind that all string functions assume a string has ended as soon as they reach a null character, and ignore anything past it. Here is an example:

    #include <stdio.h>    // printf
    #include <string.h>   // strncpy, strlen

    void stringfunc8()
    {
        char str1[100];
        char str2[100];

        strncpy(str1, "All but the first 15 characters of this string will be stripped off.", 99);
        strncpy(str2, "The last 9 characters of this string will be removed.", 99);

        // Adding a null character at the 16th element (index 15) leaves the first 15 (0-14) intact.
        str1[15] = 0;

        // Similar to the above line, but uses strlen to calculate the position.
        str2[strlen(str2) - 9] = 0;

        printf("%s\n", str1);   // Prints: All but the fir
        printf("%s\n", str2);   // Prints: The last 9 characters of this string will be
    }

This method is slightly less elegant than the previous one, because it modifies the original string: the character we overwrite with the null character is lost, unlike stripping from the beginning, which leaves the whole string intact. In our example, we COULD later restore str1 by saving the value of str1[15] in a char before overwriting it, and writing it back when we wanted the full string again. With str2, we would have to save both the index (strlen(str2) - 9) and the value of the character at that position, since strlen(str2) changes once the string is shortened. Also make sure the string is at least as long as the number of characters you strip: strlen returns an unsigned size_t, so if str2 were shorter than 9 characters, strlen(str2) - 9 would wrap around and the write would land far outside the array.

Conclusion

As you have learned, standard C strings are powerful and complex, yet elegant. While you may prefer to use the standard C++ library string class, since it is easier and safer to use, you now have a good idea of how things work behind the scenes.
© 1999-2011 Gamedev.net. All rights reserved.