UNIT-3 Cache Memory.ppt

Academic year: 2020

(1)

UNIT-3

(2)

Capacity

Word size

The natural unit of organisation

Number of words

(3)

Unit of Transfer

Internal

Usually governed by data bus width

External

Usually a block which is much larger than a word

Addressable unit

(4)

Access Methods (1)

Sequential

Start at the beginning and read through in order

Access time depends on location of data and previous location

e.g. tape

Direct

Individual blocks have unique address

Access is by jumping to vicinity plus sequential search

Access time depends on location and previous location

(5)

Access Methods (2)

Random

Individual addresses identify locations exactly

Access time is independent of location or previous access

e.g. RAM

Associative

Data is located by a comparison with contents of a portion of the store

Access time is independent of location or previous access

(6)

Memory Hierarchy

Registers

In CPU

Internal or Main memory

May include one or more levels of cache

“RAM”

External memory

(7)
(8)

Performance

Access time

Time between presenting the address and getting the valid data

Memory Cycle time

Time may be required for the memory to “recover” before next access

Cycle time is access + recovery

Transfer Rate

(9)

Physical Characteristics

(10)

Organisation

(11)

Memory hierarchy

The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

(12)

Locality of reference principle

Locality of Reference

- References to memory during any given time interval tend to be confined within a localized area

- This area contains a set of information, and its membership changes gradually as time goes by

- Temporal Locality

Information that will be used in the near future is likely to be in use already (e.g. reuse of information in loops)

- Spatial Locality

If a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g.

(13)

CACHE

Cache

- The property of Locality of Reference is what makes cache memory systems work

- Cache is a fast, small-capacity memory that should hold the information most likely to be accessed

Main memory

Cache memory

(14)

Cache

Small amount of fast memory

(15)
(16)

Cache operation – overview

CPU requests contents of memory location

Check cache for this data

If present, get from cache (fast)

If not present, read required block from main memory to cache

Then deliver from cache to CPU

Cache includes tags to identify which block of main memory is in each cache line
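
The read sequence above can be sketched as a tiny simulation. This is only an illustration under stated assumptions, not a model of a real cache: `BLOCK_SIZE`, `main_memory`, `cache`, `stats` and `read` are hypothetical names, and the cache is assumed to be large enough that no block ever has to be replaced.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)

# Main memory modelled as a flat list of words.
main_memory = list(range(100, 132))

# Cache modelled as {block_number: words}; the key plays the role of the tag.
cache = {}
stats = {"hits": 0, "misses": 0}


def read(address):
    """Return the word at `address`, loading its block into the cache on a miss."""
    block = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    if block in cache:
        stats["hits"] += 1            # present: get from cache (fast)
    else:
        stats["misses"] += 1          # not present: read the whole block
        start = block * BLOCK_SIZE
        cache[block] = main_memory[start:start + BLOCK_SIZE]
    return cache[block][offset]       # always deliver from cache to CPU
```

Reading address 5 misses and loads block 1 (addresses 4 to 7); a following read of address 6 then hits, because the whole block was brought in.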

(17)
(18)

Size does matter

Cost

More cache is expensive

Speed

More cache is faster (up to a point)

(19)
(20)

Direct Mapping

Each block of main memory maps to only one cache line

i.e. if a block is in cache, it must be in one specific place

Address is in two parts

Least Significant w bits identify unique word

Most Significant s bits specify one memory block
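
The address split can be sketched with a few bit operations. The `w` and `s` fields come from the slide; the further split of the `s` bits into an `r`-bit line number and an `s - r`-bit tag is the usual direct-mapping arrangement and is an assumption here, as is the function name `split_address`.

```python
def split_address(address, w, r, s):
    """Split an (s + w)-bit address into (tag, line, word) fields.

    w: bits selecting the word within a block
    r: bits selecting the cache line (assumed; the cache has 2**r lines)
    s: bits identifying the memory block (tag is the top s - r bits)
    """
    if not 0 <= address < (1 << (s + w)):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)   # least significant w bits
    block = address >> w              # most significant s bits
    line = block & ((1 << r) - 1)     # block number modulo number of lines
    tag = block >> r                  # remaining s - r bits
    return tag, line, word
```

With `w = 2`, `r = 3`, `s = 6`, the address `0b10110110` splits into tag 5, line 5, word 2. Blocks 5 and 13 both map to line 5 and are told apart only by their tags, which is why a direct-mapped block can live in exactly one place.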

(21)
(22)

CACHE - ASSOCIATIVE MAPPING

• A main memory block can load into any line of cache

• Memory address is interpreted as 2 fields: tag and word

• Tag uniquely identifies block of memory

• Every line’s tag is examined simultaneously for a match
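
A minimal sketch of the lookup described above, assuming a hypothetical `AssociativeCache` class that stores only tags. Hardware compares every line's tag at the same time; the loop below does the same comparison sequentially, purely for illustration.

```python
class AssociativeCache:
    """Fully associative cache model that tracks tags only (no data)."""

    def __init__(self, num_lines, w):
        self.w = w                        # bits of the word field
        self.tags = [None] * num_lines    # None marks an empty line

    def lookup(self, address):
        """Return the index of the line holding this address, or None on a miss."""
        tag = address >> self.w           # everything above the word field
        for index, stored in enumerate(self.tags):
            if stored == tag:
                return index
        return None

    def load(self, address, line_index):
        """Place the block containing `address` into any chosen line."""
        self.tags[line_index] = address >> self.w
```

Because a block may be loaded into any line, `load` takes the line index as a free choice; selecting that line is the job of the replacement policy.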

(23)
(24)
(25)

Basic terminologies

Hit: CPU finding contents of memory address in cache

Hit rate (h) is probability of successful lookup in cache by CPU.

Miss: CPU failing to find what it wants in cache (incurs trip to deeper levels of memory hierarchy)

Miss rate (m) is probability of missing in cache and is equal to 1-h.

Miss penalty: Time penalty associated with servicing a miss at any particular level of memory hierarchy

Effective Memory Access Time (EMAT): Effective access time experienced by the CPU when accessing memory. It has two components:

Tc: time to look up the cache to see if the memory location is already there

Tm: upon a cache miss, time to go to deeper levels of the memory hierarchy (the miss penalty)

EMAT = Tc + m * Tm
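
As a worked example of the formula, the sketch below evaluates EMAT for one set of numbers. The function name `emat` and the timing values are assumptions chosen for illustration, not measurements.

```python
def emat(t_cache, miss_rate, t_miss):
    """Effective memory access time: EMAT = Tc + m * Tm."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss rate must be a probability between 0 and 1")
    return t_cache + miss_rate * t_miss


# Hypothetical values: 2 ns cache lookup, 5% miss rate, 100 ns miss penalty.
example = emat(t_cache=2, miss_rate=0.05, t_miss=100)
```

With these values the result is about 7 ns: every access pays the 2 ns lookup, and the 100 ns penalty is paid on only 5% of accesses.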

(26)

Write Access to Cache from CPU

Two choices

Write through policy

Write allocate

No-write allocate

(27)

(28)

Write Through Policy

Each write goes to cache. Tag is set and valid bit is set

This is write allocate

There is also a no-write allocate where the cache is not written to if there was a write miss

Each write also goes to write buffer

Write buffer writes data into main memory

(29)

Write back policy

CPU writes data to cache setting dirty bit

(30)

Write back policy

We write to the cache

We don't bother to update main memory

Is the cache consistent with main memory?

Is this a problem?

(31)

Comparison of the Write Policies

Write Through

Cache logic simpler and faster

Creates more bus traffic

Write back

Requires dirty bit and extra logic

Multilevel cache processors may use both

L1 Write through

(32)

Write Policy

Must not overwrite a cache block unless main memory is up to date

(33)

Write through

All writes go to main memory as well as cache

Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date

Lots of traffic

Slows down writes

(34)

Write back

Updates initially made in cache only

Update bit for cache slot is set when update occurs

If block is to be replaced, write to main memory only if update bit is set

Other caches get out of sync

(35)

Types of Cache

There are three main types of cache

(36)

Static Cache

This has session lifetime: once the session is complete, the cache is deleted. It brings entire

(37)

Dynamic cache

It should be used when the databases are large. The principle behind it is that it brings records one by one into the cache from the database. If a record is already present (identified using key columns), then that record is not brought into the cache. It helps when the database has a lot of redundant

(38)

Persistent cache

The lifetime of this kind of cache is the entire workflow. When a particular table is used in many sessions across the workflow, the table can be made a persistent cache in the lookup table properties. It will be

(39)

Levels in cache

A computer can have several different levels of cache memory. The level numbers refer to distance from the CPU, where Level 1 is the closest. All levels of cache memory are faster than RAM. The cache closest to the CPU is always faster but generally costs more and stores less data than the other levels of cache.

(40)

Levels of cache

Level 1 (L1) Cache

It is also called primary or internal cache. It is built directly into the processor chip. It has a small capacity, from 8 KB to 128 KB.

Level 2 (L2) Cache

It is slower than L1 cache. Its storage capacity is larger, i.e. from 64 KB to 16 MB. Current processors contain an advanced transfer cache on the processor chip, which is a type of L2 cache. The common size of this cache is from 512 KB to 8 MB.

Level 3 (L3) Cache

This cache is separate from the processor chip, on the motherboard. It exists on computers that use an L2 advanced transfer cache. It is slower than L1 and L2

(41)

Split i and d Cache

High-performance processors invariably have 2 separate L1 caches, the instruction cache and the data cache (I-cache and D-cache). This "split cache" has several advantages over a unified cache:[8]

Wiring simplicity: the decoder and scheduler are only hooked to the I-cache; the registers and ALU and FPU are only hooked to the D-cache.

Speed: the CPU can be reading data from the D-cache, while simultaneously loading the next instruction(s) from the I-cache.

Multi-CPU systems typically have a separate L1 I-cache and L1 D-cache for each CPU, each one direct-mapped for speed. On the other hand, in a high-performance processor, other

(42)

Unified vs Split I and D (Instruction and Data) Caches

Given a fixed total size (in bytes) for the cache, is it better to have two caches, one for instructions and one for data; or is it better to have a single unified cache?

Unified is better because it automatically performs load balancing. If the current program needs more data references than instruction references, the cache will accommodate. Similarly if more instruction references are needed.

Split is better because it can do two references at once (one instruction reference and one data reference).

The winner is ... split I and D (at least for L1).

But unified has the better (i.e. higher) hit ratio.

So hit ratio is not the ultimate measure of good cache

(43)

Multilevel Caches

Ubiquitous in high-performance processors

Gap between L1 (core frequency) and main memory too high

Level 2 usually on chip, level 3 on or off-chip, level 4 off chip

Inclusion in multilevel caches

Multi-level inclusion holds if L2 cache is superset of L1

Can handle virtual address synonyms

Filter coherence traffic: if L2 misses, L1 needn’t see snoop

Makes L1 writes simpler

(44)

Replacement Policy

When a line must be evicted from a cache to

(45)

Direct Mapped Replacement

There is no choice about which line to evict,

(46)

Replacement Goals

The general goal of the replacement policy is to minimize future cache misses by evicting a line that will not be referenced often in the future

(47)

Least-recently used (LRU) replacement policy

The cache ranks each of the lines in a set

(48)

Random replacement policy

A randomly selected line from the

(49)

Virtual to real translation

The cache is addressed with the real memory address, that is, the address translated by the TLB or by whatever mechanism the physical memory uses.

There are at least three important performance aspects that directly relate to virtual-to-real address translation.

1. Improperly organized or insufficiently sized TLBs may create excessive not-in-TLB faults, adding time to execution.

2. For a real cache, the TLB time must occur before the cache access, effectively extending the cache access time.

(50)

What is Virtual to physical translation?

In a virtual memory system, the program memory is divided into fixed-size pages and allocated in fixed-size physical memory frames. The pages do not have to be contiguous in memory. A page table keeps track of where each page is located in physical memory. This allows the operating system to load a program of any size into any available frames. Only the currently used pages need to be loaded. Unused pages can remain on disk until they are referenced.
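
The page table lookup described above can be sketched as follows. Everything here is a simplified assumption for illustration: the 4096-byte `PAGE_SIZE`, the dictionary used as `page_table`, and the `PageFault` exception are hypothetical, and a real system would load the missing page from disk instead of just raising an error.

```python
PAGE_SIZE = 4096  # bytes per page and per frame (assumed)

# Hypothetical page table: page number -> frame number.
# None means the page is still on disk and has no frame yet.
page_table = {0: 5, 1: None, 2: 1}


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def translate(virtual_address):
    """Translate a virtual address into a physical address."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(f"page {page} is not in physical memory")
    # The offset within the page is unchanged by translation.
    return frame * PAGE_SIZE + offset
```

Pages 0 and 2 sit in frames 5 and 1, so consecutive pages need not be contiguous in physical memory, and a reference to page 1 triggers a page fault because that page has not been loaded.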

(51)

What are Flags?

The page table also includes several other flags to keep track of memory usage.

The resident flag in the page table indicates whether or not the page is in memory.

The use flag is set whenever the page is

(52)

The addresses that appear in programs are the virtual addresses or program addresses. For every memory access, either to fetch an instruction or data, the CPU must translate the virtual address to a real physical address. A virtual memory address can be considered to be composed of two parts: a page number and an offset into the page. The page number determines which page contains the

(53)
(54)

TLB (Translation Lookaside Buffer)

(55)
(56)

Overlapping the Tcycle in V->R translation

There are three general approaches to avoiding the serial translation step in cache access. In order to avoid the sequential translation, the translation must be arranged so that it can be performed simultaneously with data access in the cache array. This can be done by three means:

1. Using high degrees of set associativity, so that the directory index bits are not affected by the translation.

2. Using a virtual code.
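
One way to read the first approach: the index bits are unaffected by translation when they fall entirely inside the page offset, which holds when the cache size divided by the associativity is no larger than the page size. The sketch below checks that condition; the function names and the example sizes are assumptions for illustration.

```python
def index_is_untranslated(cache_size, ways, page_size):
    """True if the set index and block offset fit inside the page offset.

    Equivalent to (cache_size / ways) <= page_size, written without division.
    """
    return cache_size <= ways * page_size


def min_associativity(cache_size, page_size):
    """Smallest number of ways for which the index bits avoid translation."""
    return max(1, -(-cache_size // page_size))  # ceiling division


# Hypothetical example: a 32 KiB cache with 4 KiB pages.
ways_needed = min_associativity(32 * 1024, 4 * 1024)
```

For this example, 8 ways are needed; with only 4 ways some index bits would come from the page number, so the cache lookup would have to wait for the translation.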

(57)

Cache Write Policy and Replacement at hit

(58)

Need of Write Policy

• A block in cache might have been updated, but the corresponding update in main memory might not have been made

• Multiple CPUs have individual caches, so a write by one CPU can invalidate the data in other processors' caches

(59)

Cache Write Policy

• Write through

The value is written to both the cache line and to the lower level memory.

• Write back

(60)
(61)

Write Through

• In this technique, all write operations are made to main memory (MM) as well as to cache, ensuring MM is always valid.

• Any other processor-cache module may monitor traffic to MM to maintain consistency

DISADVANTAGE

• It generates memory traffic and may create bottleneck.

(62)

Pseudo Write Through

• Also called Write Buffer

• Processor writes data into the cache and the write buffer

• Memory controller writes contents of the buffer to memory

• FIFO (typical number of entries 4)
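
The write buffer described above behaves like a small bounded FIFO queue. The sketch below is a minimal model under assumptions: the `WriteBuffer` class, its method names, and the dictionary standing in for main memory are all hypothetical, and the 4-entry capacity follows the typical figure given on the slide.

```python
from collections import deque


class WriteBuffer:
    """Bounded FIFO sitting between the processor and main memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()

    def push(self, address, value):
        """Processor side: queue a write. Returns False if the buffer is full,
        in which case the processor would have to stall."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((address, value))
        return True

    def drain_one(self, memory):
        """Memory controller side: write the oldest entry to memory."""
        if not self.entries:
            return False
        address, value = self.entries.popleft()
        memory[address] = value
        return True
```

The processor only waits when `push` returns False, so short bursts of writes are absorbed by the buffer while the memory controller drains it in the background.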

(63)
(64)

Write Back

• In this technique, the updates are made only in cache.

• When an update is made, a dirty bit or use bit, associated with the line is set

• Then when a block is replaced, it is written back into the main memory, iff the dirty bit is set

• Thus it minimizes memory writes

DISADVANTAGE

• Portions of MM are still invalid, hence I/O should be allowed access only through cache
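
To illustrate how write back reduces memory writes compared with write through, the sketch below counts main-memory writes for a sequence of writes to block numbers. It assumes a direct-mapped cache with write allocate, and the function name `count_memory_writes` and the trace are hypothetical.

```python
def count_memory_writes(write_trace, num_lines, policy):
    """Count main-memory writes caused by a sequence of block writes."""
    lines = [None] * num_lines    # block currently held by each line
    dirty = [False] * num_lines   # dirty bit per line (write back only)
    memory_writes = 0
    for block in write_trace:
        i = block % num_lines     # direct-mapped placement (assumed)
        if policy == "write-through":
            lines[i] = block
            memory_writes += 1    # every write also goes to memory
        elif policy == "write-back":
            if lines[i] is not None and lines[i] != block and dirty[i]:
                memory_writes += 1  # replaced block is dirty: write it back
            lines[i] = block
            dirty[i] = True       # update made in cache only
        else:
            raise ValueError(f"unknown policy: {policy}")
    return memory_writes


trace = [0, 0, 0, 4, 0, 1, 1]
```

For this trace and a 4-line cache, write through performs 7 memory writes and write back performs 2. Blocks that are still dirty in the cache at the end are not counted, which is exactly the window in which main memory is out of date.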

(65)

Cache Replacement Policy

• Random

Replace a randomly chosen line

• FIFO

Replace the oldest line

• LRU (Least Recently Used)

Replace the least recently used line

• NRU (Not Recently Used)
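
The difference between FIFO and LRU can be seen by counting misses on a short reference string. The sketch below assumes a fully associative cache and covers only those two policies; `count_misses` and the reference string are hypothetical names and data chosen for illustration.

```python
from collections import OrderedDict


def count_misses(references, capacity, policy):
    """Count misses for a fully associative cache under FIFO or LRU."""
    if policy not in ("FIFO", "LRU"):
        raise ValueError(f"unsupported policy: {policy}")
    cache = OrderedDict()   # front of the dict is the next line to evict
    misses = 0
    for block in references:
        if block in cache:
            if policy == "LRU":
                cache.move_to_end(block)   # a hit refreshes recency
            continue                       # FIFO ignores hits
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)      # evict oldest / least recently used
        cache[block] = True
    return misses


refs = [1, 2, 3, 1, 4, 1, 2]
```

With 3 lines, FIFO misses 6 times on this string and LRU misses 5 times: FIFO evicts block 1 even though it was just reused, while LRU keeps it.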

(66)