Subject: Pentium 4 L2 cache optimization for cached writes
Category: Computers > Programming
Asked by: homersan-ga
List Price: $200.00
Posted: 12 Jul 2004 22:06 PDT
Expires: 11 Aug 2004 22:06 PDT
Question ID: 373331
How can I avoid having the CPU (Pentium 4, Northwood or Prescott core) first fill an L2 cache line from memory upon a write miss?

I should clarify that I am NOT talking about non-temporal stores here. I cannot afford for these writes always to propagate to main memory (and I'd also like the written data to be cached), so neither non-temporal stores nor write-through approaches would suffice. Also, the solution must not disturb the normal write-back characteristics of other memory operations. Finally, I need to do this from userspace: I can't afford to lock down pages, provide a physical address (as opposed to the logical addresses normally used in C programs), or execute privileged instructions.

To give an example of what I mean, another CPU I programmed (which also had a write-back cache) had an instruction to allocate a cache line for a given address. After this instruction executed, the contents of the cache line were undefined (or maybe '0's), since the instruction was intended for cases where you are about to overwrite the entire line (hence, there would be no reason to first fill it with the current memory contents of the addresses it cached).

I'm not too familiar with the more recent IA32 CPUs, so it's possible there simply is no answer.
There is no answer at this time.
Subject: Re: Pentium 4 L2 cache optimization for cached writes
From: wsc9tt-ga on 29 Jul 2004 14:38 PDT
You are describing the ZALLOC instruction: a zero-allocate of memory that has the property of "forgetting" any dirty data that might be in a cache line. Unfortunately, it was never implemented. It is discussed here: http://groups.google.com/groups?q=zalloc+instruction+glew&hl=en&lr=&ie=UTF-8&c2coff=1&selm=7em221%24tmo%241%40news.doit.wisc.edu&rnum=1

Have you tried benchmarking the non-temporal stores? Those will make it to memory, but in a lazy fashion, and the data is still cached in the near data cache; it just bypasses the L2 cache.

-Wayne
Subject: Re: Pentium 4 L2 cache optimization for cached writes
From: homersan-ga on 29 Jul 2004 20:27 PDT
No, as specified in the question, non-temporal stores are not an acceptable solution. I did, in fact, benchmark non-temporal stores, as well as 32-, 64-, and 128-bit stores. I can't afford to propagate every store to main memory, yet I don't know in advance which data I'll later need cached and which will be overwritten. I got over 10 GB/sec on writes to L2 cache, and only 2 GB/sec to main memory (4 GB/sec non-cached).

BTW, a potential solution they could have implemented is a CACHED write that uses the write-combining buffers (which the non-temporal stores also use). I'm not aware of any such instruction, but if you are (and it's not privileged and doesn't require physical addresses), you can answer my question!

Thanks for the comment!
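
For readers who want to reproduce this kind of comparison, here is a minimal sketch in C. It is not the benchmark used for the numbers above. It assumes a compiler that provides the SSE2 intrinsics in `<emmintrin.h>` (with a plain-store fallback elsewhere), and the names `fill_cached`, `fill_streaming`, and `throughput_gbps` are made up for illustration. The first routine uses ordinary cached 128-bit stores, so a write miss still triggers a line fill; the second uses non-temporal stores, which go through the write-combining buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef void (*fill_fn)(void *dst, size_t bytes, int32_t value);

/* Ordinary write-back stores: each write miss first fills the line. */
static void fill_cached(void *dst, size_t bytes, int32_t value)
{
    /* Aligned 128-bit stores need a 16-byte aligned buffer and length. */
    assert(((uintptr_t)dst % 16) == 0 && (bytes % 16) == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_store_si128(p + i, v);
#else
    int32_t *p = (int32_t *)dst;
    for (size_t i = 0; i < bytes / 4; i++)
        p[i] = value;
#endif
}

/* Non-temporal stores: no line fill, but the data is not kept cached. */
static void fill_streaming(void *dst, size_t bytes, int32_t value)
{
    assert(((uintptr_t)dst % 16) == 0 && (bytes % 16) == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi32(value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(p + i, v);
    _mm_sfence(); /* make the streamed data globally visible */
#else
    fill_cached(dst, bytes, value); /* fallback: no streaming stores */
#endif
}

/* Rough throughput in GB/sec using CPU time; returns 0 if too fast to time. */
static double throughput_gbps(fill_fn fill, void *buf, size_t bytes, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fill(buf, bytes, (int32_t)r);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)bytes * reps / secs / 1e9 : 0.0;
}
```

To use it, allocate a 64-byte aligned buffer (for example with `aligned_alloc(64, bytes)`) and call `throughput_gbps` with each fill routine, once with a buffer well under the L2 size and once with one far larger, so that the cached and main-memory cases both show up. `clock()` is coarse, so a large repetition count is needed for small buffers.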
Subject: Re: Pentium 4 L2 cache optimization for cached writes
From: homersan-ga on 29 Jul 2004 20:29 PDT
BTW, from what I've seen and read, non-temporal stores DO NOT use the L1 cache; they use the write-combining buffers.