GameDev.net -- Algorithmic Forays Part 8

Introduction

In the last column we talked about accelerating recursive computations with Memoization. I mentioned that Memoization is just a method of caching that can be applied to a wider range of problems. And indeed, not only recursive computations can benefit from caching. In this column we'll implement a complete caching class that can be readily applied to any problem, and we'll see a concrete example of using the cache to speed up calculations.

Software vs. Hardware cache

By reading this column you are most likely already using a cache. Your computer caches what you see on the screen and thus is able to work faster. This cache is a hardware cache, deep below in the hierarchy of computer architecture / hardware / software. It is completely transparent to us, the simple PC users.

It was noticed a long time ago that while CPU (Central Processing Unit) speeds accelerate quickly, the speed of computer memory (DRAM - Dynamic Random Access Memory) and the bus to the memory (Front Side Bus in a PC, or FSB in short) can't keep up. Each access to the memory is expensive, and the CPU spends many cycles just waiting for the data to arrive. Thus, caches were invented. A hardware cache is a small and extremely fast memory that is usually located on the same chip with the CPU. Its access time is almost as fast as the CPU itself and there is no external bus to wait for. When the CPU accesses some memory location, it stores it in the cache and in future accesses it's done much more quickly. Caches usually guess that if the CPU reads some memory location, there's a good chance that it'll read the next one as well, so they store a whole chunk of memory which often results in a very high cache hit-rate (the percentage of memory accesses that find what they want in the cache).

In reality, things are much more complicated. Caches pre-fetch data from the memory with methods based on the principles of time and space locality. These days there are at least 2 (sometimes 3) levels of cache in a computer, and complex algorithms are involved in cleaning up the cache and keeping it coherent (in sync) with the main memory, especially when multiple CPUs / cores are working together. This is a fascinating topic, and if you're interested there's a lot of free and well-written information floating on the web; just run a web search.

But this is all hardware cache. In this article I want to discuss software caches. A software cache is a programming technique used to speed up repetitive calculations, and we saw concrete examples of this in the previous column. The implementation level may be different, but the principles are the same. All we really want is to remember computations we already did and not repeat them unnecessarily.

Basic requirements and definitions

A cache has to store results of computations. That is, for inputs, it stores outputs. Therefore, a cache is somewhat similar to a dictionary data structure - it stores key/value pairs - given a key we want to quickly find the value. A hardware cache, for example, stores the contents of memory locations. Therefore its key is a memory address and its value is the address's contents.

How fast should a cache be? The obvious answer - as fast as possible - is not accurate. It depends on the keys we use. A solution that works fastest in the most general case is not always the solution that works fastest for some specific cases. We'll get back to this a bit later.

To infinity and beyond?

There is an important complication with caches we still haven't considered, however. Caches, like all data structures (and physical structures) have some finite size. Can we store all the calculations we'll ever need in a cache? Can we store the contents of all memory locations in a hardware cache?

The answer to both questions is, of course, no. Hardware caches, for example, are far smaller than the main memory (caches are made of fast hardware which makes them very expensive). Since the amount of memory is finite, we can't let our software caches grow forever. Therefore, the cache must have a limited size. The exact limit depends heavily on the application, the data and the amount of available mamory, so it's best to let the user of the cache decide how large he wants it to be.

Cache removal

So now we know our cache should be of limited size. This raises an important question: what to do when the cache is full? We can just stop and not add new keys, but this is obviously a bad solution. The alternative is to make free space for new keys by removing old ones - cache removal.

There are many algorithms and methods of cache removal, some of which depend on the data. Here are some of the more popular approaches:

Random
Using this approach, when the cache is full and we want to add a new key to it, we just throw out some old key at random.

LRU
LRU stands for Least Recently Used. Using this approach, we throw out the key that is the oldest - that is, it was accessed least recently.

MRU
MRU is Most Recently Used. We throw out the newest key, the one accessed most recently.

All three have their merits, and may be useful for certain types of data. In our cache implementation, I will use LRU, since I believe it fits the more common applications of a cache, and has a certain logical sense. After all, if there is some key we accessed more recently than another, it makes sence that the more recent key takes part in the current computations and the older key is the one that should be thrown away.

Requirements revisited

Contents

Introduction

Requirements revisited

Using the cache

Source code

Printable version

Discuss this article

The Series

Part 1

Part 2

Part 3

Part 4

Part 5

Part 6

Part 7

Part 8