Using the cache
Using this cache is very simple. Here's a small demonstration:
cache<string, double> cc(4); cc.insert("pi", 3.14); cc.insert("e", 2.71); cc.insert("gold", 1.61); cc.insert("sq2", 1.41); cc.debug_dump(); cc.insert("zero", 0); cc.debug_dump(); double* e_value = cc.find("e"); cc.insert("one", 1); cc.debug_dump(); cc.statistics(); for (int i = 0; i < 30; ++i) double* one_value = cc.find("one"); cc.statistics();
Run this (don't forget to #include "cache.h" and run in debug mode, so that statistics will be collected and printed). Try to predict what the state of the cache is during the execution.
In the first dump, you see the items you inserted, in MRU order. In the second dump, you don't see "pi". That's because it's LRU and was removed when "zero" was added. In the second dump you don't see "gold". Why not "e", which was inserted before "gold"? Because "e" was accessed by find, and thus was marked MRU.
The way the cache is currently implemented, it does all operations in O(log N) (N being the cache size). LRU removal/splicing is very efficient (O(1)). What takes most time is the map lookups. Can't we make those more efficient?
As a matter of fact, we can. Well, in most cases. By using a hash table instead of map (which uses trees and hence is logarithmic), we can make all cache operations O(1). There's only one catch though - this can be done only if we have good hashing functions for our keys. But since most of the keys would be either numbers or strings, and good hashing functions for those exist, it's not a real problem.
Interestingly, the C++ standard library has an extension container named hash_map, which is a hash table. Since it's not standard yet (it's only an extension), its implementations differ and aren't very stable. Bruce Eckel, in his "Thinking in C++" book, creates a benchmark that gives him 4x speedup with hash_map against map.
Maybe his implementation of hash_map is better, but I didn't get such results with my tests (on Microsoft Visual C++ .NET's implementation of STL). I got only a minimal (about 20%) speedup for integer keys (Eckel's benchmark, in my opinion, is very dubious - the data he uses isn't too good for reliable benchmarking). When I tried strings as keys, hash_map was in fact twice as slow as map.
Hence, I stuck with map, but I'm confident that given a good implementation of a hash map and a good hashing function for the keys, the cache can be made more efficient. The fact that the cache size is limited and known beforehand only helps to create a very speedy hash table. This is left as an exercise for the astute reader.
Copyright © 2005 Eli Bendersky