Recommendations

I strongly recommend that you experiment with some of these techniques. You can get large performance improvements for relatively little work. Prefetching is supported by a lot of processors and is quite easy to use. It also isn't a disaster if you get it wrong. Non-temporal stores, however, are much harder to get right. If you get these wrong, the results can be disastrous. But when you get them right, the results are amazing (remember that you are stopping the caches from reading in all the memory that you are overwriting - a stupid and time-consuming thing to do).

Both techniques are well supported only on the latest processors - but these are now perfectly affordable. The Duron and newer Celerons support both prefetching and non-temporal stores. Compile your code with VectorC for different processors and run the appropriate version for your user's computer. Cache sizes, available instructions and the situations in which prefetching is beneficial are different on each processor. Aaaaaarrrrrrgghhh! But then that's PCs for you.

Naming

The names are not my fault. I suppose prefetching is a reasonably sensible name, but what about "non-temporal store"? Intel seems to be going in for longer and longer names. MMX was nice and simple, but "Internet Streaming SIMD Extensions" is ridiculous. "Non-temporal", I suppose, means that there is a long time between writing and reading the same bit of memory. But I think that "uncached write" would be a much more self-explanatory name. Maybe I should start a campaign to change the name. Write some petitions. Get signatures. Lobby my member of parliament. Or maybe not.
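As a rough illustration of the two techniques recommended above, here is a minimal sketch in C. It is not VectorC output and not the author's code. It assumes a GCC- or Clang-style compiler for `__builtin_prefetch`, and it uses the SSE2 intrinsic `_mm_stream_si128` as a stand-in for whichever non-temporal store your target processor offers. The function names, the 64-byte cache line and the prefetch distance are illustrative guesses that would need tuning per processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si128, _mm_set1_epi8, _mm_sfence */
#endif

/* Assumed values: tune these for the processor you are targeting. */
#define INTS_PER_LINE   16                    /* 64-byte line / 4-byte int  */
#define PREFETCH_AHEAD  (4 * INTS_PER_LINE)   /* look four lines ahead      */

/* Prefetching: sum an array, hinting once per (assumed) cache line that
   data further ahead will be read soon. A wrong hint only costs a little
   time; the result is the same either way. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i % INTS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
#endif
        total += data[i];
    }
    return total;
}

/* Non-temporal stores: fill a buffer without first pulling the old
   contents into the cache. Only worthwhile for large buffers that will
   not be read again soon. Falls back to memset without SSE2. */
void fill_nontemporal(void *dst, uint8_t value, size_t bytes)
{
    assert(dst != NULL);
#if defined(__SSE2__)
    uint8_t *p = (uint8_t *)dst;

    /* Ordinary stores until p is 16-byte aligned (required by the
       streaming store). */
    while (bytes > 0 && ((uintptr_t)p & 15u) != 0) {
        *p++ = value;
        bytes--;
    }

    __m128i v = _mm_set1_epi8((char)value);
    while (bytes >= 16) {
        _mm_stream_si128((__m128i *)p, v); /* write around the cache */
        p += 16;
        bytes -= 16;
    }
    _mm_sfence(); /* make the streamed writes visible before continuing */

    /* Ordinary stores for the unaligned tail. */
    while (bytes > 0) {
        *p++ = value;
        bytes--;
    }
#else
    memset(dst, value, bytes);
#endif
}
```

Calling `fill_nontemporal(buffer, 0, size)` on a buffer much larger than the cache is the sort of case where the uncached write should pay off. On a small buffer that you read straight afterwards it is likely to be slower than a plain `memset`, which is the "disastrous results" case described above. Measure both on each processor you care about.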