Q: Registers vs SRAM (Answered, 2 out of 5 stars, 1 Comment)
Question  
Subject: Registers vs SRAM
Category: Computers > Hardware
Asked by: aaslam-ga
List Price: $20.00
Posted: 23 Jul 2004 14:28 PDT
Expires: 22 Aug 2004 14:28 PDT
Question ID: 378292
Are registers inherently faster than SRAM? Why or why not? If
registers are faster, how much faster are they compared to SRAM? How
many storage bits (typically) are provided by registers implemented
on-chip?

Clarification of Question by aaslam-ga on 26 Jul 2004 10:36 PDT
I want an answer from a technological point of view. What is the
technological difference between registers and SRAM that makes
registers faster than SRAM?
Answer  
Subject: Re: Registers vs SRAM
Answered By: maniac-ga on 26 Jul 2004 19:16 PDT
Rated: 2 out of 5 stars
 
Hello Aaslam,

Background: when referring to SRAM, I assume you are talking about
main memory. If you really mean caches (as the comment implies), the
performance hit is much smaller - perhaps 5:1 - but the explanations
still apply. I would be glad to revise the answer if needed to more
accurately describe SRAM in caches.

Q: Are registers inherently faster than SRAM?
A: Yes.

Q: Why?
A: Several factors including:
 - distance signals must travel (speed of light; registers are part of the CPU)
 - speed of the components implementing the registers (which are also more expensive)
 - complexity of the interface (registers are accessed directly; memory goes
through one or more cache levels / memory controllers)
 - access to memory may be delayed by I/O units (e.g., DMA, PCI transfers)
When a cache is involved, there is also some complexity related to
independent updates of memory. This will require cache flushes to keep
the cache / memory contents consistent (sometimes referred to as cache
coherency).

Refer to:
  http://www.sc2001.org/papers/pap.pap120.pdf
(describes cycle stealing)
  [several subsequent links also help describe this]
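
As a rough, hypothetical C sketch of that interface difference (the loop
bound and the volatile trick are my own illustration, not something taken
from the references above): an ordinary local accumulator is normally kept
in a CPU register by an optimizing compiler, so updating it never touches
the cache / memory interface, while a volatile variable must be loaded and
stored through that interface on every update.

  /* Minimal sketch, not a benchmark: contrasts a register-resident
     accumulator with one the compiler must re-read from and re-write to
     memory on every iteration. */
  #include <stdio.h>

  #define N 10000L    /* illustrative loop bound */

  int main(void) {
      long reg_sum = 0;           /* typically kept in a register when optimizing */
      volatile long mem_sum = 0;  /* "volatile" forces a load + store per update  */

      for (long i = 0; i < N; i++)
          reg_sum += i;           /* register add: no memory interface involved */

      for (long i = 0; i < N; i++)
          mem_sum += i;           /* load, add, store through the cache / memory */

      printf("%ld %ld\n", reg_sum, mem_sum);
      return 0;
  }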

Q: How much faster are registers compared to SRAM?
A: Some old designs had basically a 1:1 ratio - each instruction took
one cycle (instruction fetch / memory operation). For more modern
systems it varies by system, but a typical ratio is 30:1. The 100:1
ratio noted in the comment is also possible. There are also "Non
Uniform Memory Access" (NUMA) systems where times to local memory are
similar to a normal system, but access to memory on other nodes can be
10 times slower (or more). Also note that write access may be "faster"
than read access due to caching effects (fewer CPU stalls).

For reference:
  http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/mem_hierarchy.html
(data for 1996 - 200 to 400 MHz processors)
  http://www-courses.cs.uiuc.edu/~cs333/slides/chapter5%5B1%5D-1.pdf
(general material, describes ratios of 2:1 up to 200:1 in one table)
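
To see why the observed ratio varies so much, here is a hypothetical
back-of-the-envelope calculation in C (the cycle counts and hit rate are
assumptions chosen for illustration, not figures from the references
above): the effective cost of a memory operand depends on how often it is
served by the SRAM cache versus main memory.

  /* Effective access time for a two-level hierarchy.
     All numbers below are assumed, illustrative values. */
  #include <stdio.h>

  int main(void) {
      double reg_cycles = 1.0;   /* register operand: folded into the instruction */
      double l1_cycles  = 3.0;   /* assumed SRAM cache hit latency (cycles)       */
      double mem_cycles = 30.0;  /* assumed miss penalty to main memory (cycles)  */
      double hit_rate   = 0.95;  /* assumed cache hit rate                        */

      double effective = hit_rate * l1_cycles + (1.0 - hit_rate) * mem_cycles;
      printf("effective memory access: %.2f cycles (register: %.1f cycle)\n",
             effective, reg_cycles);
      printf("ratio: about %.1f : 1\n", effective / reg_cycles);
      return 0;
  }

With these particular assumptions the ratio comes out a bit over 4:1, in
the same ballpark as the 5:1 cache figure in the background note; a lower
hit rate or a larger miss penalty pushes it toward the 30:1 or 100:1
figures above.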

Q: How many storage bits are provided on chip by registers?
A: Wow - that varies a lot by processor type and age of the system.
The "right answer" is a pretty wide range. In the early to mid 1970s it
was common to have a single register or a pair of registers. For a 16-bit
word size, that would be 16 or 32 bits in registers. You may still see
that today in embedded microprocessors as well. Several CISC machines
would later have 8 to 16 registers - so at 32 bits, you get 256 to 512
bits. Several RISC machines would have up to 32 registers, so 1024 to
2048 bits (at 32 or 64 bits per register). Note this is MUCH smaller
than cache sizes and memory sizes.

For reference, see:
  http://www.cs.uiowa.edu/~jones/pdp8/man/registers.html
(PDP-8 series, single accumulator and some models w/ extended accumulator)
  http://www.8052.com/tutbregs.phtml
(8051 series, w/ single A/B accumulator, a set of 8 limited "registers")
  http://www.osdata.com/topic/language/asm/register.htm
(several machine references)
  http://www.sics.se/~psm/sparcstack.html
(Sparc register explanation - a RISC machine, also describes stack usage)
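
As a small worked tally of those figures (the machine classes and register
counts are the ones quoted above; the code itself is just illustrative):

  /* Register-file capacity for the machine classes mentioned above,
     using the upper end of each quoted range. */
  #include <stdio.h>

  int main(void) {
      struct { const char *name; int regs; int width; } m[] = {
          { "accumulator machine (16-bit)", 2, 16 },  /* single or paired accumulator */
          { "typical CISC (32-bit)", 16, 32 },        /* 8 to 16 registers            */
          { "typical RISC (64-bit)", 32, 64 },        /* up to 32 registers           */
      };
      for (int i = 0; i < 3; i++)
          printf("%-30s %2d x %2d = %4d bits\n",
                 m[i].name, m[i].regs, m[i].width, m[i].regs * m[i].width);
      return 0;
  }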

Search phrases included:
  memory delay I/O
  8051 registers
  pdp-8 registers
  vax registers
  sparc registers
  memory register cycle ratio

If any part of this answer is unclear or does not meet your needs,
please use a clarification request.

  --Maniac

Request for Answer Clarification by aaslam-ga on 27 Jul 2004 11:14 PDT
I need a more quantitative and qualitative answer. My comments are as follows:

1. How much time does a signal take to travel the distance when accessing
registers compared to SRAM (by SRAM, I mean cache memory, which does
not need refresh)? Is it an order of magnitude difference? What are
typical values for the signal travel / access time?

2. What is the difference in the components implementing registers
compared to SRAM? Can you please give me some hardware schematics of
registers and SRAM and identify which components make the difference
in access time, and by how much?

3. Does the memory controller contribute a lot to SRAM access time? What
is the percentage?

Clarification of Answer by maniac-ga on 28 Jul 2004 10:05 PDT
Hello Aaslam,

Hmm. Getting "real world" (instead of student homework) data and
schematics is taking some digging. I can give you a partial answer to
your clarification now and will try to get more detailed information
later today.

For another top level diagram of use of SRAM in caches, see:
  http://www.gsitechnology.com/MemoryTechnologyForCacheApps.pdf
Describes use of SRAM in cache, includes several block diagrams
showing the interconnects (not at a schematic level) as well as timing
involved.

For some real world (and freely available) designs of systems and components, see:
  http://www.opencores.org/
Has a number of publicly available designs for processors and
supporting items (e.g., arithmetic units, hardware interfaces). More
specifically, see
  http://www.opencores.org/projects.cgi/web/or1k/openrisc_1200
which describes a full CPU implementation including instruction and
data caches. It has been implemented in some demonstration devices as
well. You can freely download the specifications and design from the
opencores web site.

I am still digging to find some specific timing / size and complexity
answers to your question clarification and will follow up later today.
  --Maniac

Clarification of Answer by maniac-ga on 28 Jul 2004 16:43 PDT
Hello Aaslam,

I found some good references to answer the points raised in your clarification.

For the most part, the time taken to access (e.g., read / write cycle)
a register is included in the cycle time of the instruction. So an add
instruction doing something like:
  R = R+M
will read and write the register in the time of the CPU instruction.
To get to that memory value (M), it must be fetched from the
appropriate part of the cache or memory. Using
    http://www.systemlogic.net/articles/01/8/p4/page2.php
as a guide, it indicates that:
 - "up to 4 simple arithmetic instructions per clock"
 - L1 cache has 2 clock delay
 - L2 cache can deliver 1 value each clock after a 10 clock latency

From this information:
 - you can manipulate a register at least once per clock
 - the access to L1 cache introduces a two clock delay
 - the access to L2 cache introduces a ten clock delay
So - yes, you can get an order of magnitude difference in timing
between register operations and SRAM operations (cache).
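
If you want to observe that difference yourself, a common trick is a
dependent-load (pointer-chasing) loop. The sketch below is my own, with
assumed sizes and a coarse clock()-based timer rather than a cycle counter,
so treat the output as a rough estimate only. With a working set that fits
in L1 the per-load time approximates the L1 latency; making ARRAY_BYTES
larger than the caches (and randomizing the chain to defeat prefetching)
pushes it toward main-memory latency.

  /* Hypothetical pointer-chasing sketch: each load depends on the previous
     one, so the loop time approximates the load-to-use latency of whatever
     level of the hierarchy the working set fits in. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define ARRAY_BYTES (16 * 1024)        /* assumed: small enough for L1 */
  #define ITERS       50000000L          /* assumed iteration count      */

  int main(void) {
      size_t n = ARRAY_BYTES / sizeof(size_t);
      size_t *chain = malloc(n * sizeof(size_t));
      if (!chain) return 1;

      /* Simple cyclic chain: element i points to i+1, the last wraps to 0. */
      for (size_t i = 0; i < n; i++)
          chain[i] = (i + 1) % n;

      size_t idx = 0;
      clock_t t0 = clock();
      for (long i = 0; i < ITERS; i++)
          idx = chain[idx];              /* serialized dependent loads */
      clock_t t1 = clock();

      double ns_per_load = (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / ITERS;
      printf("~%.1f ns per dependent load (idx=%zu)\n", ns_per_load, idx);

      free(chain);
      return 0;
  }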

I would like to be able to give you a firm answer on the ratio of
"speed of light" effects compared to "added components" effects on
cache access time but cannot. There are some good references that
describe how design of components has affected cache times but they
don't go into sufficient detail to answer that particular issue. For
example:
  http://www.anandtech.com/showdoc.html?i=1235
compares the design of the Pentium III and Athlon, where the L2 cache
in the Pentium III is "on die" and the Athlon's is "off die". The larger
Athlon L2 cache was much slower than the smaller Pentium III cache due
to clock rates and distance. Clock rates may be the dominant factor in
this case.

  http://www.kickassgear.com/Articles/Coppermine.htm
Describes the Pentium III Coppermine design. Note the number of items
described with the cache including:
 - width of cache accesses (fetch 256 bits, not 64)
 - associative access
 - speed increases (since now on die)
which increase the complexity of the cache / have impacts on the cache
performance. Note that some of these improve throughput (e.g., width
of data path) but do not help latency. Others (such as speed increase)
do improve latency.

  http://www.hardwareanalysis.com/action/printarticle/1269/
Another look at the Pentium IV but also describing the Pentium Pro
through Pentium III. Talks about other factors including the use of
branch prediction and a deep pipeline to mitigate the impact of
latency to access data values (in cache or memory).

For schematics / design data - I'll refer you again to the opencores site
  http://www.opencores.com/
which provides complete designs to implement a system (or parts of a
system). The architectural information for the OpenRISC 1000 family is
at
  http://www.opencores.com/projects.cgi/web/or1k/architecture/
and more specifics on the OpenRISC 1200 at
  http://www.opencores.com/projects.cgi/web/or1k/openrisc_1200
which includes links to the design, tutorials on implementation, and a
mailing list for discussion.

Good luck with your work.

  --Maniac
aaslam-ga rated this answer: 2 out of 5 stars
The answer was not specific. The researcher was not able to provide
concise reference material for the question. Instead, the reference
material is vague and I have to dig further myself.

Comments  
Subject: Re: Registers vs SRAM
From: grthumongous-ga on 24 Jul 2004 04:31 PDT
 
Registers are the absolute fastest home for a chunk of data.
Think of a register as a tiny hardware storage location used as a
scratch-pad and built into the microprocessor. It would be mere
millimeters from the execution units, so propagation delays are
minimized. It can be read or written by an instruction in one clock
cycle (or even *perhaps* less on the newest implementations), so a
1GHz microprocessor could read a value from a register in 1 nanosecond
(one-billionth of a second).

The term SRAM (static RAM) that I am familiar with (I am not an
electronics engineer) is used to describe a type of RAM that doesn't
need its internal values to be refreshed as often (or at all?)
compared to DRAM (Dynamic RAM).

This SRAM *can* be used for the "normal" RAM memory of a PC (the kind
now sold in 128MB, 256MB, or 512MB sizes) but is usually reserved for
use as cache memory.

What is cache memory?
Conceptually, cache memory resides between the super-pipelined microprocessor
and the "normal" RAM memory. It is much faster than RAM and more
expensive than RAM, but has a smaller capacity (e.g. 32KB, 512KB), and is
physically close to the microprocessor.  The purpose of cache memory is
to hold the recently used instructions and data fetched from RAM so that
if the microprocessor needs to access them again they are available
for RE-use in less time than a RAM access would require.  The hungry
microprocessor needs a steady feed of instructions
and data or it stalls.
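
A hypothetical C sketch of that reuse idea (the sizes are my assumptions,
and hardware prefetching narrows the gap for simple sequential streaming,
so this only illustrates the principle): both calls below read the same
number of bytes, but the first keeps revisiting a block small enough to
stay in the SRAM cache, while the second streams once through a block
larger than the cache.

  /* Same amount of work, different reuse pattern. */
  #include <stdio.h>
  #include <stdlib.h>

  #define SMALL (8 * 1024)         /* assumed: fits in cache         */
  #define LARGE (8 * 1024 * 1024)  /* assumed: larger than the cache */

  static long sum(const char *buf, size_t len, int passes) {
      long total = 0;
      for (int p = 0; p < passes; p++)
          for (size_t i = 0; i < len; i++)
              total += buf[i];
      return total;
  }

  int main(void) {
      char *small = calloc(SMALL, 1);
      char *large = calloc(LARGE, 1);
      if (!small || !large) return 1;

      long a = sum(small, SMALL, LARGE / SMALL);  /* heavy reuse: mostly cache hits */
      long b = sum(large, LARGE, 1);              /* single pass: many cache misses */
      printf("%ld %ld\n", a, b);

      free(small);
      free(large);
      return 0;
  }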

What is cache memory made from?
I believe this kind of microprocessor cache memory is usually SRAM for speed.

How much faster is a register access versus a SRAM cache access?
It depends on the specific implementations. I use a rule of thumb of 10 times.
that is, the register is 10 times faster than the SRAM cache, and the
SRAM cache is at least 10 times faster than dRAM.

There are many trade-offs.
The faster the clock rates on the newer chips (e.g. 3.0 GHz!), the
harder it is to keep the execution units busy.  Caches are crucial in
building a balanced part. Some of the factors to consider are cache
speeds, sizes, and whether there is one level, two levels (e.g. a 32KB
L1 cache backed by a 512KB L2 cache), or even three levels.

How big is a register?
Typically a 32-bit chip procesees instructions of up to 32-bits in
length, accesses data in 32-bit chunks, generates 32-bit addresses and
stores into 32-bit registers.  So a register in that case is 4 bytes. 
There are many registers in a microprocessor. A programmer or compiler
can never have too many.
 
Some chips are now 64-bit so I suppose their registers would be 64-bit.


Remember, I am not an electronics engineer. Your official Answerer may
note some logic errors in my old "execution unit".
