Q: Pentium 4 L2 cache optimization for cached writes (No Answer, 3 Comments)
Question  
Subject: Pentium 4 L2 cache optimization for cached writes
Category: Computers > Programming
Asked by: homersan-ga
List Price: $200.00
Posted: 12 Jul 2004 22:06 PDT
Expires: 11 Aug 2004 22:06 PDT
Question ID: 373331
How can I keep the CPU (Pentium 4, Northwood or Prescott core) from
first filling an L2 cache line from memory on a write miss?  I should
clarify that I am NOT talking about non-temporal stores here.  I cannot
afford to have these writes always propagate to main memory (and I'd
also like the written data to be cached), so neither non-temporal
stores nor write-through approaches would suffice.  Also, the solution
must not disturb the normal write-back behavior of other memory
operations.  Finally, I need to do this from userspace: I can't afford
to lock down pages, supply physical addresses (as opposed to the
logical addresses normally used in C programs), or execute privileged
instructions.


To give an example of what I mean, another CPU I programmed (which
also had a write-back cache) had an instruction to allocate a cache
line for a given address.  After executing it, the contents of the
cache line were undefined (or maybe zeros), since the instruction was
meant to be used only when you intend to overwrite the entire line
(hence, there would be no reason to first fill it with the current
memory contents of the addresses it cached).
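
For illustration only: the CPU is not named here, so the sketch below assumes PowerPC's `dcbz`, which is one instruction with this allocate-and-zero behavior. The names `cache_block_alloc_zero` and `fill_blocks` and the `block_bytes` parameter are hypothetical, and the non-PowerPC branch is just a plain-store fallback that does NOT avoid the line fill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: establish one zeroed cache block at p without first
 * reading it from memory. Assumption: block_bytes equals the platform's
 * real dcbz block size; the caller is responsible for getting that right. */
void cache_block_alloc_zero(void *p, size_t block_bytes)
{
    assert(block_bytes != 0 && ((uintptr_t)p % block_bytes) == 0);
#if defined(__powerpc__) || defined(__powerpc64__)
    (void)block_bytes;
    /* Zero the block in the cache; no read of the old memory contents. */
    __asm__ volatile ("dcbz 0,%0" : : "r"(p) : "memory");
#else
    /* Fallback with ordinary stores; a write miss here may still
     * fetch the line from memory first. */
    memset(p, 0, block_bytes);
#endif
}

/* Overwrite a buffer one block at a time: allocate each block, then
 * store the new contents over the whole block. */
void fill_blocks(unsigned char *dst, size_t bytes, size_t block_bytes,
                 unsigned char value)
{
    for (size_t off = 0; off + block_bytes <= bytes; off += block_bytes) {
        cache_block_alloc_zero(dst + off, block_bytes);
        memset(dst + off, value, block_bytes);
    }
}
```

The point of the pattern is that the allocate step is only safe when every byte of the block is about to be overwritten, which is exactly the case described above.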

I'm not too familiar with the more recent IA32 CPUs, but it's possible
there simply is no answer.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Pentium 4 L2 cache optimization for cached writes
From: wsc9tt-ga on 29 Jul 2004 14:38 PDT
 
You are describing the ZALLOC instruction: a zero-allocate of memory
that has the property of "forgetting" any dirty data that might be in
a cache line.  Unfortunately, it was never implemented.

It is talked about here:
http://groups.google.com/groups?q=zalloc+instruction+glew&hl=en&lr=&ie=UTF-8&c2coff=1&selm=7em221%24tmo%241%40news.doit.wisc.edu&rnum=1

Have you tried benchmarking the non-temporal stores?  Those will make
it to memory, but in a lazy fashion, and the data is still cached in
the near data cache; it just bypasses the L2 cache.

-Wayne
Subject: Re: Pentium 4 L2 cache optimization for cached writes
From: homersan-ga on 29 Jul 2004 20:27 PDT
 
No, as specified in the question, non-temporal stores are not deemed
an acceptable solution.  I did, in fact, benchmark non-temporal
stores, as well as 32-, 64-, and 128-bit stores.  I can't afford to
propagate every store to main memory, yet I don't know in advance
which data I'll later need to be cached and which will be overwritten.
I got over 10 GB/sec on writes to L2 cache, and only 2 GB/sec to main
memory (4 GB/sec non-cached).
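
A rough sketch of how a store-bandwidth measurement like this could be set up, not the actual benchmark used. The names `fill_cached` and `store_bandwidth_gbs` are hypothetical, timing uses the standard `clock()`, and the buffer is assumed to be 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*fill_fn)(void *dst, size_t bytes);

/* Ordinary write-back 64-bit stores. The pattern is not a repeated byte,
 * so the compiler cannot replace the loop with a library memset call. */
void fill_cached(void *dst, size_t bytes)
{
    uint64_t *p = (uint64_t *)dst;
    size_t n = bytes / sizeof *p;
    for (size_t i = 0; i < n; i++)
        p[i] = 0x0123456789ABCDEFull;
}

/* Run fill() reps times over buf and return the store rate in GB/s
 * (10^9 bytes per second of CPU time). Returns 0.0 if the run was too
 * short for clock() to resolve. */
double store_bandwidth_gbs(fill_fn fill, void *buf, size_t bytes, int reps)
{
    assert(fill != NULL && buf != NULL && bytes > 0 && reps > 0);
    volatile uint64_t sink = 0;

    fill(buf, bytes);                 /* warm-up: touch the pages once */
    clock_t t0 = clock();
    for (int i = 0; i < reps; i++) {
        fill(buf, bytes);
        sink += *(volatile unsigned char *)buf;  /* keep the stores live */
    }
    clock_t t1 = clock();
    (void)sink;

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)bytes * (double)reps) / secs / 1e9;
}
```

Calling `store_bandwidth_gbs(fill_cached, buf, bytes, reps)` once with a buffer well under the L2 size and once with a buffer several times larger would be one way to separate the "writes to L2" figure from the "writes to main memory" figure.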


BTW, a potential solution they could have implemented was a CACHED
write that used write-combining buffers (which the non-temporal stores
also use).  I'm not aware of any such instruction, but if you are (and
it's not privileged and doesn't require physical addresses), you can
answer my question!


Thanks for the comment!
Subject: Re: Pentium 4 L2 cache optimization for cached writes
From: homersan-ga on 29 Jul 2004 20:29 PDT
 
BTW, from what I've seen/read, non-temporal stores DO NOT use the L1
cache; they use write-combining buffers.
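
For reference, a minimal sketch of what a non-temporal fill looks like with the SSE2 intrinsics, assuming a 64-byte line and a 16-byte-aligned destination. The function name `fill_nontemporal` is hypothetical; `_mm_stream_si128` and `_mm_sfence` are the standard intrinsics. Without SSE2 the code falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Fill `bytes` bytes at dst with a repeated 32-bit value.
 * Assumptions: dst is 16-byte aligned and bytes is a multiple of 64. */
void fill_nontemporal(void *dst, size_t bytes, int32_t value)
{
    assert(((uintptr_t)dst % 16) == 0 && (bytes % 64) == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i += 4) {
        /* Four back-to-back 16-byte streaming stores cover one assumed
         * 64-byte line, so a write-combining buffer can be filled whole. */
        _mm_stream_si128(p + i + 0, v);
        _mm_stream_si128(p + i + 1, v);
        _mm_stream_si128(p + i + 2, v);
        _mm_stream_si128(p + i + 3, v);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    /* No SSE2: ordinary cached stores, not non-temporal. */
    int32_t *p = (int32_t *)dst;
    for (size_t i = 0; i < bytes / sizeof *p; i++)
        p[i] = value;
#endif
}
```

This is the path the question rules out, since the data goes to main memory instead of staying in the cache; it is shown only to make the write-combining behavior concrete.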
