UNIT-3
Capacity
Word size
The natural unit of organisation
Number of words
Unit of Transfer
Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word
Addressable unit
Access Methods (1)
Sequential
Start at the beginning and read through in order
Access time depends on location of data and previous location
e.g. tape
Direct
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
Access Methods (2)
Random
Individual addresses identify locations exactly
Access time is independent of location or previous access
e.g. RAM
Associative
Data is located by a comparison with contents of a portion of the store
Access time is independent of location or previous access
Memory Hierarchy
Registers
In CPU
Internal or Main memory
May include one or more levels of cache
“RAM”
External memory
Performance
Access time
Time between presenting the address and getting the valid data
Memory Cycle time
Time may be required for the memory to “recover” before next access
Cycle time is access + recovery
Transfer Rate
Physical Characteristics
Organisation
Memory hierarchy
The purpose of a memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.
Locality of reference principle
Locality of Reference
- References to memory in any given time interval tend to be confined to a localized area
- This area contains a set of information, and its membership changes gradually over time
- Temporal Locality
Information that has been used recently is likely to be used again in the near future (e.g. reuse of information in loops)
- Spatial Locality
If a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g.
CACHE
Cache
- The property of Locality of Reference makes the Cache memory systems work
- Cache is a fast, small-capacity memory that should hold the information most likely to be accessed
Main memory
Cache memory
Cache
Small amount of fast memory
Cache operation overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of
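The read sequence above can be sketched in a few lines of Python. This is a toy model, not a description of real hardware: the block size, the `SimpleCache` class and the use of the block number as the tag are all assumptions made for illustration, and eviction is left out entirely.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)


class SimpleCache:
    """Toy cache that stores whole blocks, keyed by block number (used as the tag)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words standing in for main memory
        self.lines = {}                 # tag (block number) -> list of words
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number, offset = divmod(address, BLOCK_SIZE)
        if block_number in self.lines:
            self.hits += 1              # present: serve from cache (fast)
        else:
            self.misses += 1            # not present: read the whole block into cache
            start = block_number * BLOCK_SIZE
            self.lines[block_number] = self.main_memory[start:start + BLOCK_SIZE]
        return self.lines[block_number][offset]  # always deliver from cache


memory = list(range(100, 132))          # 32 words of made-up data
cache = SimpleCache(memory)
values = [cache.read(a) for a in (5, 6, 7, 5, 20)]
print(values, cache.hits, cache.misses)
```

Addresses 5, 6 and 7 fall in the same block, so only the first of them misses; this is the locality effect the cache relies on.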
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Direct Mapping
Each block of main memory maps to only one
cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least Significant w bits identify unique word
Most Significant s bits specify one memory
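As a rough illustration of the address split, the sketch below assumes the common textbook rule that a block maps to line (block number mod number of lines), which divides the s block bits into a tag field and a line field. The field widths `W_BITS` and `R_BITS` and the function name are hypothetical.

```python
W_BITS = 2  # width of the word-in-block field (assumed)
R_BITS = 3  # width of the cache line field, i.e. 8 lines (assumed)


def split_address(address):
    """Split an address into (tag, line, word) for a direct-mapped cache."""
    word = address & ((1 << W_BITS) - 1)   # least significant w bits
    block = address >> W_BITS              # most significant s bits: block number
    line = block & ((1 << R_BITS) - 1)     # block number mod number of lines
    tag = block >> R_BITS                  # remaining high bits distinguish blocks
    return tag, line, word


print(split_address(0b1011_010_11))  # tag 0b1011, line 0b010, word 0b11
print(split_address(8), split_address(40))
```

Addresses 8 and 40 belong to blocks 2 and 10, which both map to line 2 and differ only in the tag, so they cannot be in this cache at the same time.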
CACHE - ASSOCIATIVE MAPPING
• A main memory block can load into any line of cache
• Memory address is interpreted as 2 fields: tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined simultaneously for a match
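A minimal sketch of an associative lookup follows. Hardware compares every line's tag at the same time; the loop here only models that comparison sequentially. The field width `W_BITS`, the function name and the sample contents are assumptions for illustration.

```python
W_BITS = 2  # width of the word field (assumed)


def associative_lookup(lines, address):
    """Return the addressed word on a hit, or None on a miss.

    `lines` is a list whose entries are either None (empty line)
    or a (tag, block_words) pair; a block may sit in any line.
    """
    tag = address >> W_BITS
    word = address & ((1 << W_BITS) - 1)
    for entry in lines:                      # models the parallel tag comparison
        if entry is not None and entry[0] == tag:
            return entry[1][word]
    return None


lines = [(5, ["a", "b", "c", "d"]), None, (2, ["w", "x", "y", "z"])]
print(associative_lookup(lines, 9))    # tag 2, word 1
print(associative_lookup(lines, 22))   # tag 5, word 2
print(associative_lookup(lines, 0))    # tag 0 is not cached
```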
Basic terminologies
Hit: CPU finding contents of memory address in cache
Hit rate (h) is probability of successful lookup in cache by CPU.
Miss: CPU failing to find what it wants in cache (incurs trip to deeper levels of memory hierarchy)
Miss rate (m) is probability of missing in cache and is equal to 1-h.
Miss penalty: Time penalty associated with servicing a miss at any particular level of memory hierarchy
Effective Memory Access Time (EMAT): Effective access time experienced by the CPU when accessing memory. It has two components:
Tc: time to look up the cache to see if the memory location is already there
Tm: upon a cache miss, time to go to deeper levels of the memory hierarchy (the miss penalty)
EMAT = Tc + m * Tm
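A small worked example of the formula, taking Tc as the cache lookup time and Tm as the miss penalty. The numbers (2 ns lookup, 5% miss rate, 100 ns penalty) are made up for illustration, and the function name is hypothetical.

```python
def emat(t_cache, miss_rate, t_miss):
    """Effective memory access time: EMAT = Tc + m * Tm."""
    return t_cache + miss_rate * t_miss


# Hypothetical values: Tc = 2 ns, m = 0.05 (so h = 0.95), Tm = 100 ns
print(emat(2, 0.05, 100))
```

With these values the formula gives 2 + 0.05 × 100 = 7 ns, so even a 5% miss rate more than triples the access time compared with a cache hit alone.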
Write Access to Cache from CPU
Two choices
Write through policy
Write allocate
No-write allocate
Write Through Policy
Each write goes to cache. Tag is set and valid
bit is set
This is write allocate
There is also a no-write allocate where the cache is not written to if there was a write miss
Each write also goes to write buffer
Write buffer writes data into main memory
Write back policy
CPU writes data to cache setting dirty bit
Write back policy
We write to the cache
We don't bother to update main memory
Is the cache consistent with main memory?
Is this a problem?
Comparison of the Write Policies
Write Through
Cache logic simpler and faster
Creates more bus traffic
Write back
Requires dirty bit and extra logic
Multilevel cache processors may use both
L1 Write through
Write Policy
Must not overwrite a cache block unless main
memory is up to date
Write through
All writes go to main memory as well as cache
Multiple CPUs can monitor main memory traffic
to keep local (to CPU) cache up to date
Lots of traffic
Slows down writes
Write back
Updates initially made in cache only
Update bit for cache slot is set when update
occurs
If block is to be replaced, write to main memory
only if update bit is set
Other caches get out of sync
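The traffic difference between the two policies can be sketched by counting main-memory writes. This is a simplified model under stated assumptions: every write allocates a cache slot, the cache holds two blocks, the oldest block is evicted first, and dirty blocks are flushed at the end so both policies finish with main memory up to date. The function name and the write sequence are hypothetical.

```python
from collections import OrderedDict


def count_memory_writes(writes, policy, capacity=2):
    """Count main-memory writes caused by a sequence of block writes."""
    cache = OrderedDict()  # block -> update (dirty) bit, oldest block first
    memory_writes = 0
    for block in writes:
        if block not in cache and len(cache) == capacity:
            _, dirty = cache.popitem(last=False)  # evict the oldest block
            if dirty:
                memory_writes += 1                # write back only if update bit is set
        if policy == "write-through":
            cache[block] = False                  # memory is updated on every write
            memory_writes += 1
        else:                                     # "write-back"
            cache[block] = True                   # update cache only, set update bit
    # Final flush of any blocks that are still dirty
    memory_writes += sum(1 for dirty in cache.values() if dirty)
    return memory_writes


writes = [1, 1, 1, 2, 2, 3, 1]
print(count_memory_writes(writes, "write-through"))
print(count_memory_writes(writes, "write-back"))
```

For this sequence write through performs one memory write per CPU write (7), while write back performs 4, because repeated writes to the same block are absorbed by the cache.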
Types of Cache
There are three main types of cache
Static Cache
This cache has session lifetime: once the session is complete, the cache is deleted. It brings entire
Dynamic cache
It should be used when the databases are large. It brings records into the cache from the database one at a time. If a record is already present (identified using key columns), that record is not brought into the cache again. This helps when the database has a lot of redundant
Persistent cache
The lifetime of this kind of cache is the entire
workflow. When a particular table is
used in many sessions across the workflow,
the table can be made a persistent cache in
the lookup table properties. It will be
Levels in cache
A computer can have several different levels of cache memory. The level numbers refer to distance from the CPU, where Level 1 is the closest. All levels of cache memory are faster than RAM. The cache closest to the CPU is always faster but generally costs more and stores less data than the other levels of cache.
Levels of cache
Level 1 (L1) Cache
It is also called primary or internal cache. It is built directly into the processor chip. It has a small capacity, from 8 KB to 128 KB.
Level 2 (L2) Cache
It is slower than L1 cache. Its storage capacity is larger, i.e. from 64 KB to 16 MB. Current processors contain advanced transfer cache on the processor chip, which is a type of L2 cache. The common size of this cache is from 512 KB to 8 MB.
Level 3 (L3) Cache
This cache is separate from the processor chip, on the motherboard. It exists on computers that use L2 advanced transfer cache. It is slower than L1 and L2
Split I and D Cache
High-performance processors invariably have 2 separate L1 caches, the instruction cache and the data cache (I-cache and D-cache). This "split cache" has several advantages over a unified cache:[8]
Wiring simplicity: the decoder and scheduler are only hooked to the I-cache; the registers and ALU and FPU are only
hooked to the D-cache.
Speed: the CPU can be reading data from the D-cache, while simultaneously loading the next instruction(s) from the
I-cache.
Multi-CPU systems typically have a separate L1 I-cache and L1 D-cache for each CPU, each one direct-mapped for speed. On the other hand, in a high-performance processor, other
Unified vs Split I and D (Instruction and Data) Caches
Given a fixed total size (in bytes) for the cache, is it better to have
two caches, one for instructions and one for data; or is it better to have a single unified cache?
Unified is better because it automatically performs load balancing.
If the current program needs more data references than instruction references, the cache will accommodate. Similarly if more
instruction references are needed.
Split is better because it can do two references at once (one
instruction reference and one data reference).
The winner is ...
split I and D (at least for L1).
But unified has the better (i.e. higher) hit ratio.
So hit ratio is not the ultimate measure of good cache
Multilevel Caches
Ubiquitous in high-performance processors
Gap between L1 (core frequency) and main memory too high
Level 2 usually on chip, level 3 on or off-chip, level 4 off chip
Inclusion in multilevel caches
Multi-level inclusion holds if L2 cache is superset of L1
Can handle virtual address synonyms
Filter coherence traffic: if L2 misses, L1 needn’t see snoop
Makes L1 writes simpler
Replacement Policy
When a line must be evicted from a cache to
Direct Mapped Replacement
There is no choice about which line to evict,
Replacement Goals
The general goal of the replacement policy is
to minimize future cache misses by evicting a
line that will not be referenced often in the
future
Least-recently used (LRU) replacement policy
The cache ranks each of the lines in a set
Random replacement policy
A randomly selected line from the
Virtual to real translation
A real cache is addressed with the real memory address, that is, the address translated by the TLB or by another mechanism used by the physical memory.
There are at least three important performance aspects that relate directly to virtual-to-real address translation.
1. Improperly organized or insufficiently sized TLBs may create excessive not-in-TLB faults, adding time to execution.
2. For a real cache, the TLB access must occur before the cache access, effectively extending the cache access time.
What is Virtual to physical translation?
In a virtual memory system, the program
memory is divided into fixed sized pages and
allocated in fixed sized physical memory frames.
The pages do not have to be contiguous in
memory. A page table keeps track of where each
page is located in physical memory. This allows
the operating system to load a program of any
size into any available frames. Only the currently
used pages need to be loaded. Unused pages
can remain on disk until they are referenced.
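The page table lookup can be sketched as follows, assuming a virtual address is split into a page number and an offset within the page. The page size, the table contents and the names `translate` and `PageFault` are all hypothetical; a real system does this in hardware with operating system support.

```python
PAGE_SIZE = 4096  # bytes per page and per frame (assumed)

# page number -> frame number; None means the page is still on disk
page_table = {0: 7, 1: 3, 2: None}


class PageFault(Exception):
    """Raised when the referenced page is not resident in physical memory."""


def translate(virtual_address):
    """Translate a virtual address to a physical address via the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)             # the OS would load the page from disk here
    return frame * PAGE_SIZE + offset     # frames need not be contiguous


print(translate(10))      # page 0 -> frame 7
print(translate(4100))    # page 1 -> frame 3, offset 4
```

Pages 0 and 1 are adjacent in the virtual address space but land in frames 7 and 3, which shows that pages do not have to be contiguous in physical memory.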
What are Flags?
The page table also includes several other flags
to keep track of memory usage.
A resident flag in the page table indicates whether or not the page is in memory.
A use flag is set whenever the page is
The addresses that appear in programs are the
virtual addresses or program addresses. For
every memory access, either to fetch an
instruction or data, the CPU must translate the
virtual address to a real physical address. A
virtual memory address can be considered to
be composed of two parts: a page number and
an offset into the page. The page number
determines which page contains the
TLB (Translation Lookaside Buffer)
Overlapping the Tcycle in V->R translation
There are three general approaches to avoiding the serial translation step in cache access. To avoid sequential translation, the translation must be arranged so that it can be performed simultaneously with the data access in the cache array. This can be done by three means:
1. Using high degrees of set associativity, so that the directory index bits are not affected by the translation.
2. Using a virtual cache.
Cache Write Policy and Replacement at hit
Need of Write Policy
• A block in cache might have been updated, but the corresponding update in main memory might not have been done
• Multiple CPUs have individual caches, so a write by one CPU can invalidate the data in another processor's cache
Cache Write Policy
• Write through
The value is written to both the cache line and to the lower level memory.
• Write back
Write Through
• In this technique, all write operations are made to main memory as well as to cache, ensuring MM is always valid.
• Any other processor-cache module may monitor traffic to MM to maintain consistency
DISADVANTAGE
• It generates memory traffic and may create bottleneck.
Pseudo Write Through
• Also called Write Buffer
• Processor writes data into the cache and the write buffer
• Memory controller writes contents of the buffer to memory
• FIFO (typical number of entries 4)
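A minimal sketch of such a write buffer, assuming four entries as noted above. The `WriteBuffer` class, its method names and the choice to drain one entry when the buffer is full (standing in for a processor stall) are assumptions for illustration.

```python
from collections import deque


class WriteBuffer:
    """FIFO of pending (address, value) writes between the cache and main memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()

    def put(self, address, value, memory):
        """Processor side: queue a write, draining the oldest entry first if full."""
        if len(self.entries) == self.capacity:
            self.drain_one(memory)       # models the processor waiting for a free slot
        self.entries.append((address, value))

    def drain_one(self, memory):
        """Memory controller side: retire the oldest pending write to memory."""
        if self.entries:
            address, value = self.entries.popleft()
            memory[address] = value


memory = {}
buf = WriteBuffer()
for i in range(5):
    buf.put(i, i * 10, memory)           # the fifth write forces one drain
print(memory, len(buf.entries))
while buf.entries:
    buf.drain_one(memory)
print(memory)
```

The processor only waits when the buffer is full; otherwise writes are absorbed at cache speed and reach memory later, in the order they were issued.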
Write Back
• In this technique, the updates are made only in cache.
• When an update is made, a dirty bit or use bit, associated with the line is set
• Then, when a block is replaced, it is written back into main memory if and only if the dirty bit is set
• Thus it minimizes memory writes
DISADVANTAGE
• Portions of MM are still invalid, hence I/O should be allowed access only through cache
Cache Replacement Policy
• Random
Replace a randomly chosen line
• FIFO
Replace the oldest line
• LRU (Least Recently Used)
Replace the least recently used line
• NRU (Not Recently Used)
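The difference between FIFO and LRU can be seen by counting misses on a short reference string. The sketch below is a simplified model of a single fully associative set; the function name, the capacity of three lines and the reference string are assumptions for illustration, and Random and NRU are not modelled.

```python
from collections import OrderedDict


def count_misses(references, capacity, policy):
    """Count cache misses for a reference string under "FIFO" or "LRU"."""
    cache = OrderedDict()                  # lines ordered from next victim to safest
    misses = 0
    for line in references:
        if line in cache:
            if policy == "LRU":
                cache.move_to_end(line)    # a hit makes the line most recently used
            continue                       # FIFO ignores hits when ordering lines
        misses += 1
        if len(cache) == capacity:
            cache.popitem(last=False)      # evict the oldest / least recently used line
        cache[line] = True
    return misses


references = [1, 2, 3, 1, 4, 1, 2]
print(count_misses(references, 3, "FIFO"))
print(count_misses(references, 3, "LRU"))
```

On this reference string FIFO misses 6 times and LRU 5 times: FIFO evicts line 1 just before it is reused, while LRU keeps it because it was touched recently.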