

Non-Temporal Stores

If you are going to overwrite a whole cache line, then reading its old values in first is a massive waste of time. Yet that is what a normal store does: if the target line is not already in the cache, the processor fetches it from memory before modifying it. When writing out large amounts of data, you overwrite many whole cache lines, so you need a way of writing directly to memory without going through the cache. This also has the useful side-effect of not evicting useful data from the cache. This operation is called a "non-temporal store" and is supported on the Athlon and Pentium III.
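To put a rough number on the waste, here is a minimal C sketch of an idealized model (an assumption, not a measurement): a normal store to a line that is not cached causes the whole line to be read from memory and later written back, while non-temporal stores that cover whole lines skip the read. The names `bus_traffic` and `fill_traffic` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Idealized model (an assumption, not measured behaviour):
   - a write-allocate cache reads every line before a normal store may
     modify it, then writes the whole line back later;
   - non-temporal stores to whole lines go straight to memory, no read.
   The buffer is assumed to start on a line boundary and cover whole lines. */
typedef struct {
    size_t bytes_read;     /* memory -> CPU traffic caused by the fill */
    size_t bytes_written;  /* CPU -> memory traffic caused by the fill */
} bus_traffic;

static bus_traffic fill_traffic(size_t buffer_bytes, size_t line_bytes,
                                int nontemporal)
{
    assert(line_bytes > 0 && buffer_bytes % line_bytes == 0);
    size_t lines = buffer_bytes / line_bytes;
    bus_traffic t;
    /* normal stores: one line fill per line, i.e. the wasted read */
    t.bytes_read = nontemporal ? 0 : lines * line_bytes;
    /* either way, every line eventually has to reach memory */
    t.bytes_written = lines * line_bytes;
    return t;
}
```

Under this model, filling a 1 MB buffer (1,048,576 bytes) with normal stores moves 2 MB across the bus, half of it the useless read, whereas non-temporal stores move only the 1 MB that actually has to be written. Real timings depend on more than this count, so treat it as an upper bound on the saving, not a prediction.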

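Unfortunately, only a few non-temporal store instructions are available (for example MOVNTQ, which stores 8 bytes from an MMX register, and MOVNTPS, which stores 16 bytes from an SSE register), so they can usually only be used after you have "vectorized" your code and are writing out 8 or 16 bytes at a time from MMX, 3DNow or Streaming SIMD Extensions (SSE) registers.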
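For compilers other than VectorC, the 16-byte SSE store can be reached through the `xmmintrin.h` intrinsics found in GCC, Clang and MSVC. The sketch below is an illustration, not VectorC output: `fill_stream` and `HAVE_STREAMING_STORES` are hypothetical names, the compile-time detection macros are an assumption about the compiler, it writes floats because MOVNTPS stores four floats at once, it uses ordinary stores until the destination is 16-byte aligned (MOVNTPS requires that), and it falls back to plain stores when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */
#define HAVE_STREAMING_STORES 1   /* hypothetical helper macro */
#endif

/* Fill count floats at dst with value. On SSE-capable x86 the bulk of the
   array is written with MOVNTPS (16 bytes per store, bypassing the cache);
   otherwise ordinary stores are used throughout. */
static void fill_stream(float *dst, size_t count, float value)
{
    size_t i = 0;
    assert(dst != NULL || count == 0);
#ifdef HAVE_STREAMING_STORES
    /* ordinary stores until dst + i is 16-byte aligned */
    while (i < count && ((uintptr_t)(dst + i) & 15) != 0)
        dst[i++] = value;
    __m128 v = _mm_set1_ps(value);       /* value in all four lanes */
    for (; i + 4 <= count; i += 4)
        _mm_stream_ps(dst + i, v);       /* non-temporal 16-byte store */
    _mm_sfence();   /* order the weakly-ordered streaming stores */
#endif
    for (; i < count; i++)   /* leftover tail, or the whole array */
        dst[i] = value;
}
```

The scalar head and tail are ordinary cached stores, so for short arrays there is little to gain; the technique pays off when the streamed middle part is much larger than the cache.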

Here is a simple example of using non-temporal stores, first in C with VectorC and then in assembly language.


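/* __hint__((nontemporal)) is a VectorC extension that asks for
   non-temporal stores through this pointer; SIZE is assumed to be
   #defined elsewhere as the number of ints in the array. */
void writetest (int __hint__ ((nontemporal)) *a, int v)
    {
    int i;

    for (i=0; i<SIZE; i++)
        a [i] = v;      /* simple loop that VectorC can vectorize with MMX */
    }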

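        ; On entry (set-up not shown): eax = a; ecx = SIZE/2, since each
        ; store writes two ints; mm0 = v duplicated into both 32-bit halves
Loop:
        movntq [eax],mm0    ; non-temporal 8-byte store from an MMX register
        add eax,8           ; advance to the next 8 bytes
        dec ecx             ; one fewer pair of ints to write
        jne Loop
        sfence              ; order the weakly-ordered non-temporal stores
        emms                ; clear MMX state before any x87 code runs
        ret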

This example writes out the same value to a large array using non-temporal stores. In the VectorC version, I have added "__hint__((nontemporal))" to the definition of the pointer to tell VectorC that I want non-temporal stores. I have also written a loop that I know VectorC can vectorize with MMX.

In the assembly-language version, I have used an MMX register to write out the data. I couldn't have used a general-purpose register like eax, because the Athlon and Pentium III have no non-temporal store from a general-purpose register.
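Without VectorC's hint, roughly the same loop as `writetest` can be written with MMX intrinsics. This is a sketch under assumptions: `writetest_mmx` is a hypothetical name, `size` replaces the original's `SIZE` macro, and the `__MMX__`/`__SSE__` test is the GCC/Clang way of detecting support (other compilers take the plain fallback). Like the assembly version, it stores 8 bytes at a time from an MMX register with MOVNTQ, then calls `_mm_sfence()` and `_mm_empty()`, the intrinsic forms of SFENCE and EMMS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>   /* _mm_stream_pi, _mm_sfence; also pulls in mmintrin.h */
#endif

/* Plain-C counterpart of writetest(): fill size ints at a with v,
   writing two ints (8 bytes) per MOVNTQ where MMX and SSE are available. */
static void writetest_mmx(int *a, int v, size_t size)
{
    size_t i = 0;
    assert(a != NULL || size == 0);
#if defined(__MMX__) && defined(__SSE__)
    /* one ordinary store if needed so that a + i is 8-byte aligned */
    if (i < size && ((uintptr_t)(a + i) & 7) != 0)
        a[i++] = v;
    __m64 pair = _mm_set1_pi32(v);              /* v in both 32-bit halves */
    for (; i + 2 <= size; i += 2)
        _mm_stream_pi((__m64 *)(a + i), pair);  /* MOVNTQ: bypasses the cache */
    _mm_sfence();   /* order the non-temporal stores before later writes */
    _mm_empty();    /* EMMS: release MMX state so x87 code works again */
#endif
    for (; i < size; i++)   /* odd leftover int, or everything without MMX/SSE */
        a[i] = v;
}
```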

Problems with Non-Temporal Stores

When using non-temporal stores, there are a few potential problems. If you mix non-temporal stores with normal (cached) stores, you get a massive speed reduction. So be careful: this can be a problem when compiling with VectorC, because you don't have complete control over which type of store the compiler generates. It is worth timing routines compiled with this hint, or checking the assembly language that VectorC produces. This will become a bit easier with the "Interactive Optimizer", which CodePlay will be releasing very soon.
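One way to follow the advice to check speed is a small timing harness. The sketch below assumes a common signature for the routines being compared; `fill_fn`, `fill_plain` and `time_fill` are hypothetical names, and `clock()` measures processor time with coarse resolution, so the results are only indicative.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature shared by the fill routines being compared (hypothetical). */
typedef void (*fill_fn)(int *dst, size_t count, int value);

/* Baseline: ordinary cached stores. */
static void fill_plain(int *dst, size_t count, int value)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}

/* Average processor time in seconds per call of fn, over reps calls.
   clock() is coarse, so choose count and reps large enough that the total
   is well above its resolution, and use a buffer much larger than the cache. */
static double time_fill(fill_fn fn, int *dst, size_t count, int value, int reps)
{
    assert(fn != NULL && reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(dst, count, value + r);   /* vary the value between calls */
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

For example, call `time_fill` once with `fill_plain` and once with a wrapper around the routine compiled with `__hint__((nontemporal))`. If the hinted routine is not clearly faster, its assembly is worth inspecting for a mix of cached and non-temporal stores.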



