Hello Aaslam,
Background: when referring to SRAM, I assume you are talking about
main memory. If you really mean SRAM caches (as the comment implies),
the performance gap is much smaller - perhaps 5:1 - but the
explanations still apply. I would be glad to revise the answer if
needed to describe SRAM in caches more accurately.
Q: Are registers inherently faster than SRAM?
A: Yes.
Q: Why?
A: Several factors including:
- distance signals must travel (speed of light; registers are part of the CPU)
- speed of the components implementing the registers (also more expensive)
- complexity of the interface (registers are accessed directly; memory
goes through one or more cache levels / memory controllers)
- access to memory may be delayed by I/O units (e.g., DMA, PCI transfers)
When a cache is involved, there is also some complexity related to
independent updates of memory. This will require cache flushes to keep
the cache / memory contents consistent (sometimes referred to as cache
coherency).
Refer to:
http://www.sc2001.org/papers/pap.pap120.pdf
(describes cycle stealing)
[several subsequent links also help describe this]
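To make the "direct vs. through the memory system" difference concrete,
here is a rough C sketch (my own illustration, not taken from the
references above). The first loop keeps its accumulator in a local
variable the compiler will normally place in a register; the second
forces every access through memory with "volatile". In practice that
memory location stays in the L1 cache, so this mostly shows the
register-vs-cache gap rather than register-vs-main-memory, and an
optimizing compiler may fold the first loop away entirely - treat it
as an illustration, not a benchmark.

  #include <stdio.h>
  #include <time.h>

  #define N 100000000LL

  int main(void)
  {
      long long sum = 0;              /* likely kept in a register */
      volatile long long mem_sum = 0; /* every access goes through the memory system */
      clock_t t0, t1, t2;

      t0 = clock();
      for (long long i = 0; i < N; i++)
          sum += i;                   /* register add */
      t1 = clock();
      for (long long i = 0; i < N; i++)
          mem_sum += i;               /* load + add + store on every iteration */
      t2 = clock();

      printf("register loop: %.2fs   memory loop: %.2fs   (sum=%lld mem_sum=%lld)\n",
             (double)(t1 - t0) / CLOCKS_PER_SEC,
             (double)(t2 - t1) / CLOCKS_PER_SEC,
             sum, mem_sum);
      return 0;
  }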
Q: How much faster are registers compared to SRAM?
A: Some old designs had basically a 1:1 ratio - each instruction took
one cycle (instruction fetch / memory operation). For modern systems
the ratio varies, but 30:1 is typical. The 100:1 ratio noted in the
comment is also possible. There are also "Non Uniform Memory Access"
(NUMA) systems where access times to local memory are similar to a
normal system, but access to memory on other nodes can be 10 times
worse (or more). Also note that write access may be "faster" than read
access due to caching effects (fewer CPU stalls).
For reference:
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/mem_hierarchy.html
(data from 1996 - 200 to 400 MHz processors)
http://www-courses.cs.uiuc.edu/~cs333/slides/chapter5%5B1%5D-1.pdf
(general material, describes ratios of 2:1 up to 200:1 in one table)
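If it helps tie those numbers together, here is a small
back-of-the-envelope sketch in C (my own illustration; the cycle
counts and hit rate are made-up round numbers, not measurements). It
shows why "registers vs SRAM" comes out around 3:1 to 5:1 when the
SRAM is a cache that is usually hit, but 30:1 to 100:1 when an access
has to go all the way to main memory.

  #include <stdio.h>

  int main(void)
  {
      double reg_cycles   = 1.0;    /* register operand: part of the instruction */
      double cache_cycles = 3.0;    /* SRAM cache hit */
      double dram_cycles  = 100.0;  /* access that goes to main memory */
      double hit_rate     = 0.97;   /* fraction of accesses satisfied by the cache */

      /* average memory access time when the SRAM acts as a cache */
      double avg = hit_rate * cache_cycles + (1.0 - hit_rate) * dram_cycles;

      printf("cache hit vs register : %5.1f : 1\n", cache_cycles / reg_cycles);
      printf("average   vs register : %5.1f : 1\n", avg / reg_cycles);
      printf("memory    vs register : %5.1f : 1\n", dram_cycles / reg_cycles);
      return 0;
  }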
Q: How many storage bits are provided on chip by registers?
A: Wow - that varies a lot by processor type and age of the system.
The "right answer" is a pretty wide range. In the early to mid '70s it
was common to have a single register or a pair of registers. For a 16
bit word size, that would be 16 or 32 bits in registers. You may still
see that today in embedded microprocessors. Several CISC machines
would later have 8 to 16 registers - so at 32 bits each, you get 256
to 512 bits. Several RISC machines would have up to 32 registers, so
1024 to 2048 bits (at 32 or 64 bits each). Note this is MUCH smaller
than cache sizes and memory sizes.
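The arithmetic behind those bit counts is simple, but here it is
spelled out in a few lines of C, with a cache size for scale (the
32 KB figure is just an example I picked, not from the references
below).

  #include <stdio.h>

  struct machine { const char *name; int regs; int width_bits; };

  int main(void)
  {
      struct machine m[] = {
          { "PDP-8 (single 12-bit accumulator)", 1, 12 },
          { "typical CISC (16 x 32-bit)",        16, 32 },
          { "typical RISC (32 x 32-bit)",        32, 32 },
          { "typical RISC (32 x 64-bit)",        32, 64 },
      };
      long cache_bits = 32L * 1024 * 8;   /* a 32 KB cache, for comparison */

      for (int i = 0; i < 4; i++)
          printf("%-36s %6d bits\n", m[i].name, m[i].regs * m[i].width_bits);
      printf("%-36s %6ld bits\n", "32 KB cache, for comparison", cache_bits);
      return 0;
  }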
For reference, see:
http://www.cs.uiowa.edu/~jones/pdp8/man/registers.html
(PDP-8 series, single accumulator and some models w/ extended accumulator)
http://www.8052.com/tutbregs.phtml
(8051 series, w/ single A/B accumulator, a set of 8 limited "registers")
http://www.osdata.com/topic/language/asm/register.htm
(several machine references)
http://www.sics.se/~psm/sparcstack.html
(Sparc register explanation - a RISC machine, also describes stack usage)
Search phrases included:
memory delay I/O
8051 registers
pdp-8 registers
vax registers
sparc registers
memory register cycle ratio
If any part of this answer is unclear or does not meet your needs,
please use a clarification request.
--Maniac

Clarification of Answer by maniac-ga on 28 Jul 2004 10:05 PDT
Hello Aaslam,
Hmm. Getting "real world" (instead of student homework) data and
schematics is taking some digging. I can give you a partial answer to
your clarification now and will try to get more detailed information
later today.
For another top level diagram of use of SRAM in caches, see:
http://www.gsitechnology.com/MemoryTechnologyForCacheApps.pdf
Describes the use of SRAM in a cache; includes several block diagrams
showing the interconnects (not at a schematic level) as well as the
timing involved.
For some real world (and freely available) designs of systems and components, see:
http://www.opencores.org/
Has a number of publicly available designs for processors and
supporting items (e.g., arithmetic units, hardware interfaces). More
specifically, see
http://www.opencores.org/projects.cgi/web/or1k/openrisc_1200
which describes a full CPU implementation including instruction and
data caches. It has been implemented in some demonstration devices as
well. You can freely download the specifications and design from the
opencores web site.
I am still digging to find some specific timing / size and complexity
answers to your question clarification and will follow up later today.
--Maniac

Clarification of Answer by maniac-ga on 28 Jul 2004 16:43 PDT
Hello Aaslam,
I found some good references to answer the points raised in your clarification.
For the most part, the time taken to access (e.g., read / write cycle)
a register is included in the cycle time of the instruction. So an add
instruction doing something like:
R = R+M
will read and write the register within the time of the CPU
instruction. The memory value (M), however, must be fetched from the
appropriate level of the cache or from memory. Using
http://www.systemlogic.net/articles/01/8/p4/page2.php
as a guide, it indicates that:
- "up to 4 simple arithmetic instructions per clock"
- L1 cache has 2 clock delay
- L2 cache can deliver 1 value each clock after a 10 clock latency
From this information:
- you can manipulate a register at least once per clock
- the access to L1 cache introduces a two clock delay
- the access to L2 cache introduces a ten clock delay
So - yes, you can get an order of magnitude difference in timing
between register operations and SRAM operations (cache).
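To see where those clock counts bite, here is a small C sketch (again
my own illustration, not from the article). It contrasts a chain of
dependent register adds, which can retire roughly one per clock, with
a chain of dependent loads, where each step must wait at least the L1
latency (about 2 clocks in the figures above) before the next address
is known.

  #include <stdio.h>
  #include <stdlib.h>

  #define N (1 << 16)

  int main(void)
  {
      /* dependent register adds: roughly one per clock */
      long r = 0;
      for (long i = 0; i < N; i++)
          r = r + i;

      /* dependent loads: build a chain of indices, then walk it; every
         step is a load whose address comes from the previous load, so
         each one pays at least the L1 latency (or L2 on a miss) */
      long *next = malloc(N * sizeof *next);
      if (!next) return 1;
      for (long i = 0; i < N; i++)
          next[i] = (i + 1) % N;
      long p = 0;
      for (long i = 0; i < N; i++)
          p = next[p];

      printf("r=%ld p=%ld\n", r, p);   /* keep the results live */
      free(next);
      return 0;
  }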
I would like to be able to give you a firm answer on the ratio of
"speed of light" effects compared to "added components" effects on
cache access time but cannot. There are some good references that
describe how design of components has affected cache times but they
don't go into sufficient detail to answer that particular issue. For
example:
http://www.anandtech.com/showdoc.html?i=1235
compares the design of the Pentium III and Athlon, where the L2 cache
in the Pentium III is "on die" and the Athlon's is "off die". The
larger Athlon L2 cache was much slower than the smaller Pentium III
cache due to clock rates and distance. Clock rates may be the dominant
factor in this case.
http://www.kickassgear.com/Articles/Coppermine.htm
Describes the Pentium III Coppermine design. Note the number of
cache-related items described, including:
- width of cache accesses (fetch 256 bits, not 64)
- associative access
- speed increases (since the cache is now on die)
which increase the complexity of the cache and affect its performance.
Note that some of these improve throughput (e.g., the width of the
data path) but do not help latency. Others (such as the speed
increase) do improve latency.
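A quick back-of-the-envelope way to see why a wider data path helps
throughput but not latency (all numbers below are made up for
illustration, not taken from the Coppermine article):

  #include <stdio.h>

  int main(void)
  {
      double clock_ghz   = 1.0;          /* cache clock, GHz */
      double latency_clk = 7.0;          /* clocks from request to first data */
      int    widths[]    = { 64, 256 };  /* bits transferred per clock */

      for (int i = 0; i < 2; i++) {
          /* bytes per clock times clocks per nanosecond = GB/s */
          double gb_per_s = widths[i] / 8.0 * clock_ghz;
          printf("%3d-bit path: %5.1f GB/s bandwidth, latency still "
                 "%.0f clocks (%.1f ns)\n",
                 widths[i], gb_per_s, latency_clk, latency_clk / clock_ghz);
      }
      return 0;
  }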
http://www.hardwareanalysis.com/action/printarticle/1269/
Another look at the Pentium IV, also describing the Pentium Pro
through Pentium III. Talks about other factors, including the use of
branch prediction and a deep pipeline, that mitigate the impact of the
latency of accessing data values (in cache or memory).
For schematics / design data - I'll refer you again to the opencores site
http://www.opencores.com/
which provides complete designs to implement a system (or parts of a
system). The architectural information for the OpenRISC 1000 family is
at
http://www.opencores.com/projects.cgi/web/or1k/architecture/
and more specifics on the OpenRISC 1200 at
http://www.opencores.com/projects.cgi/web/or1k/openrisc_1200
which includes links to the design, tutorials on implementation, and a
mailing list for discussion.
Good luck with your work.
--Maniac