I recently wrote a timer (in the sense of returning a timestamp on request), and am writing this report in the spirit of "learn now, pay forward", in the sincere hope that it helps someone avoid the same experience. Windows time APIs are discussed, but the concepts apply to all PCs. Acronyms are explained in the Glossary; let's jump right in.

Timer Requirements

The timer was intended for 0ad and whatever future needs may present themselves. The design goals were:
The timer would only be necessary on Windows; Linux already has gettimeofday, which fits the bill.

PC time sources
There are some other esoteric ways to get the time, e.g. polling the DRAM refresh detect bit, but these aren't relevant here.

Windows time APIs
libc functions such as _ftime and clock are built on top of GSTAFT.

Timer problems

The resolution of GTC and GSTAFT is far too low: 10 ms isn't enough to resolve individual frames in a game (at 60 fps, a frame lasts about 16.7 ms), not to mention that it's useless for profiling. TGT falls behind over time due to delayed or lost clock interrupts. I read that this sometimes broke StarTopia multiplayer games, and that the problem went unnoticed because it didn't happen on the development machines; unfortunately, I no longer remember where I read this, and cannot find it again. QPC has several issues:
.. and how to solve them

Two approaches were considered:
Choosing a safe timer is done by checking, in order of preference (based on resolution and overhead), whether each timer's known problems occur on this system and whether we can work around them. Examples: the TSC must not be used on SMP systems (it is inconsistent between processors), or when SpeedStep (throttling the CPU frequency for cooling or to save power) is active. The QPC race condition is worked around by reading its value every second from a thread. The jump problem is dodged by recognizing that the systems on which it is known to occur (single-processor, PIII-class desktops) have a safe TSC, so the TSC is chosen there instead.

Locking timer to system time

So far, so good. We have a high resolution timer (HRT), and it deals with the above errors. What's left is to lock it to the system time (ST). Again, there were two possibilities:
This design is basically a PLL, as described in [tcl_timer] and [prec_time]. If the ST is only accurate to 10 ms, we can obviously never return the current time more accurately than that. The only time the ST value is (almost) what it should be is right after it's updated. Without OS support, the best we can do is assume that GSTAFT and the scheduler are driven by the same clock, i.e. that the ST has just been updated when our calibration thread wakes up. (Note: we can't poll for the start of the next tick; 10 ms delays are unacceptable.) If this assumption is false, we are toast; otherwise, it works quite well. A simpler way to schedule periodic wakeups is via timeSetEvent, which has its own (high priority) thread.

With the basic code in place, I set about finding an error correction function h, mapping hrt_to_st_difference to hrt_freq_slew. The simple method given in [tcl_timer] is a P controller (2) with a clamped maximum adjustment. In my tests, its gain was far too high, and it basically bounced back and forth at the maximum allowed adjustment. I hit the books on control theory and understood enough to write a PID controller (3). This worked surprisingly well: the HRT was typically only a few microseconds (!) off from the given ST value.

.. only to get bitten

I left Winamp running during a long test, and the results were horrible. The calibration thread wasn't waking up on time (as shown by differing HRT and ST time deltas since the last wakeup); it was up to 5 ms late. With inaccurate system time values, the controller was constantly adjusting the frequency back and forth. I couldn't quite explain these delays: my thread was _HIGH priority, so the cause had to be drivers, yet streaming a little music from the hard drive to the sound card shouldn't account for that much latency.
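Returning to the error correction function h described earlier: the sketch below shows a clamped PID controller of the general kind mentioned there. It is not the author's code. The struct name `PidController`, the gains, the clamp value, and the sign convention (a positive error means the HRT is ahead of the ST, so the returned slew is negative) are all hypothetical. Setting `ki` and `kd` to zero reduces it to a clamped P controller.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical clamped PID controller: maps the HRT-to-ST difference
// (in seconds) to a relative frequency slew. Illustrative only.
struct PidController {
    double kp;               // proportional gain
    double ki;               // integral gain
    double kd;               // derivative gain
    double max_slew;         // maximum magnitude of the returned adjustment
    double integral = 0.0;   // accumulated error * dt
    double prev_error = 0.0; // error seen at the previous wakeup
    bool has_prev = false;   // no derivative term on the first call

    // error: HRT time minus ST at this calibration wakeup (positive = HRT ahead)
    // dt:    seconds since the previous wakeup; must be > 0
    // returns the slew to apply to the HRT frequency (negative slows it down)
    double update(double error, double dt) {
        integral += error * dt;
        const double derivative = has_prev ? (error - prev_error) / dt : 0.0;
        prev_error = error;
        has_prev = true;
        const double out = -(kp * error + ki * integral + kd * derivative);
        // clamp to [-max_slew, +max_slew]
        return std::max(-max_slew, std::min(out, max_slew));
    }
};
```

With `ki = kd = 0`, the output saturates at `±max_slew` whenever the error's magnitude exceeds `max_slew / kp`, so a large `kp` spends most of its time at the clamp.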
I added band-aids, such as using TGT boosted to 1 ms resolution to get a more accurate ST, and even using the HRT to guess how late the wakeup came (contrary to the goal of syncing to the actual system time). This ran to 1200 lines of C++ code. It helped, but the timer still drifted too much.

Shakedown

Having spent insane amounts of time on the simple matter of a timer, I came across a discussion of these problems ([ntdev]). I now completely agree with its assessment: without a hardware timer capable of it, long-term accurate, high resolution timing is very difficult and not worth the trouble. Time of day doesn't usually need high resolution, and timers for profiling don't need to be free of long-term drift. Moreover, you don't want to saddle a simple timer with time of day corrections. The Windows ST is updated in a strange fashion: every 57th or so update comes later, and every 32nd comes earlier (from memory). Sometimes, the ST had not been updated after the thread woke up; is it done from a DPC? Finally, the time can jump due to NTP correction (built into WinXP).

Aftermath

I stripped out the PLL, PID controller, and error correction code, leaving the HRT and the part of the calibration thread that measured its frequency. The result is a simpler timer that works and is only 430 lines long. A few days later, I found out that the excessive wakeup delays when running Winamp were due to a dying hard drive, which needed many retries to read the data. I now think the timer could have worked, but I believe splitting it into a high resolution timestamp function and a time of day function is better. A decent timer (HPET) has finally been added to the architecture, but isn't widespread yet; failing that, the OS really should take care of this business. In the meantime, this timer will do.

I hope this helps someone; at least I learned a lot during the whole mess ;) If you're interested in the (GPLed) source, drop me a line. Also, questions/comments/suggestions/bug reports are welcome!
u9rktiiistud.uni-karlsruhe.de (remove digit, replace "iii" with "@")

Notes

All CPU clock timings are for my system, an Athlon XP.
Glossary
References

Code

```cpp
// decide upon a HRT implementation, checking if we can work around
// each timer's issues on this platform, but allow user override
// in case there are unforeseen problems with one of them.
// order of preference (due to resolution and speed): TSC, QPC, TGT.
// split out of reset_impl so we can just return when impl is chosen.
static void choose_impl()
{
    bool safe;
#define SAFETY_OVERRIDE(impl)\
    if(overrides[impl] == HRT_DISABLE)\
        safe = false;\
    if(overrides[impl] == HRT_FORCE)\
        safe = true;

#if defined(_M_IX86) && !defined(NO_TSC)
    // CPU Timestamp Counter (incremented every clock)
    // ns resolution, moderate precision (poor clock crystal?)
    //
    // issues:
    // - multiprocessor systems: may be inconsistent across CPUs.
    //   could fix by keeping per-CPU timer state, but we'd need
    //   GetCurrentProcessorNumber (only available on Win Server 2003).
    //   spinning off a thread with set CPU affinity is too slow
    //   (we may have to wait until the next timeslice).
    //   we could discard really bad values, but that's still inaccurate.
    //   => unsafe.
    // - deep sleep modes: TSC may not be advanced.
    //   not a problem though, because if the TSC is disabled, the CPU
    //   isn't doing any other work, either.
    // - SpeedStep/'gearshift' CPUs: frequency may change.
    //   this happens on notebooks now, but eventually desktop systems
    //   will do this as well (if not to save power, for heat reasons).
    //   frequency changes are too often and drastic to correct,
    //   and we don't want to mess with the system power settings.
    //   => unsafe.
    if((cpu_caps & TSC) && cpu_freq > 0.0)
    {
        safe = (cpus == 1 && !cpu_speedstep);
        SAFETY_OVERRIDE(HRT_TSC);
        if(safe)
        {
            hrt_impl = HRT_TSC;
            hrt_nominal_freq = (i64)cpu_freq;
            return;
        }
    }
#endif  // TSC

#if defined(_WIN32) && !defined(NO_QPC)
    // Windows QueryPerformanceCounter API
    // implementations:
    // - PIT on Win2k - 838 ns resolution, slow to read (~3 µs)
    // - PMT on WinXP - 279 ns ", moderate overhead (700 ns?)
    //
    // issues:
    // 1) Q274323: may jump several seconds under heavy PCI bus load.
    //    not a problem, because the older systems on which this occurs
    //    have safe TSCs, so that is used instead.
    // 2) "System clock problem can inflate benchmark scores":
    //    incorrect value if not polled every 4.5 seconds? solved
    //    by calibration thread, which reads timer every second anyway.
    // - TSC on MP HAL - see TSC above.

    // cache freq because QPF is fairly slow.
    static i64 qpc_freq = -1;

    // first call - check if QPC is supported
    if(qpc_freq == -1)
    {
        LARGE_INTEGER i;
        BOOL qpc_ok = QueryPerformanceFrequency(&i);
        qpc_freq = qpc_ok? i.QuadPart : 0;
    }

    // QPC is available
    if(qpc_freq > 0)
    {
        // PIT and PMT are safe.
        if(qpc_freq == 1193182 || qpc_freq == 3579545)
            safe = true;
        // make sure QPC doesn't use the TSC
        // (if it were safe, we would have chosen it above)
        else
        {
            // can't decide yet - assume unsafe
            if(cpu_freq == 0.0)
                safe = false;
            else
            {
                // compare QPC freq to CPU clock freq - can't rule out HPET,
                // because its frequency isn't known (it's at least 10 MHz).
                double freq_dist = fabs(cpu_freq / qpc_freq - 1.0);
                // safe if freqs not within 5% (i.e. it doesn't use TSC)
                safe = freq_dist > 0.05;
            }
        }

        SAFETY_OVERRIDE(HRT_QPC);
        if(safe)
        {
            hrt_impl = HRT_QPC;
            hrt_nominal_freq = qpc_freq;
            return;
        }
    }
#endif  // QPC

    //
    // TGT
    //
    hrt_impl = HRT_TGT;
    hrt_nominal_freq = 1000;
    return;

    // not reached: TGT is always chosen above.
    assert(0 && "hrt_choose_impl: no safe timer found!");
    hrt_impl = HRT_NONE;
    hrt_nominal_freq = -1;
    return;
}
```

The rest of the code is straightforward and mostly uninteresting.
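The part of the calibration thread that survived the cleanup measures the HRT's actual frequency, but its code is not shown above. The following is a minimal sketch of such a measurement. It assumes the thread records a (counter, reference time) pair at each wakeup. The names `Sample`, `estimate_freq` and `smooth_freq` are hypothetical, and so is the use of an exponential moving average to damp wakeup jitter; this is not necessarily how the original timer does it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical calibration sample: a raw HRT counter reading (e.g. TSC or
// QPC ticks) taken together with a reference clock reading in seconds.
struct Sample {
    int64_t ticks;
    double ref_sec;
};

// Measured counter frequency in ticks per second between two samples.
// Returns 0 if the reference clock did not advance (no usable interval).
double estimate_freq(const Sample& a, const Sample& b) {
    const double dt = b.ref_sec - a.ref_sec;
    if (dt <= 0.0)
        return 0.0;
    return static_cast<double>(b.ticks - a.ticks) / dt;
}

// Exponential moving average (an assumption, not the original design):
// blends a new measurement into the current estimate so that one bad
// interval only nudges the result. alpha is in (0, 1]; a non-positive
// current value means "no estimate yet".
double smooth_freq(double current, double measured, double alpha) {
    if (current <= 0.0)
        return measured;
    return current + alpha * (measured - current);
}
```

If the reference clock only advances in 10 ms steps, a one-second interval measured with it can be off by up to one step, about 1%. Averaging several measurements, or measuring over longer intervals, reduces that error.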