Computer Organization and Architecture
Chapter 9
Characteristics
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
CPU
Capacity
Word size
The natural unit of organisation
Number of words
Unit of Transfer
Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word
Addressable unit
Smallest location which can be uniquely addressed
Access Methods
Sequential
Start at the beginning and read through in order
Access time depends on location of data and previous location
e.g. tape
Direct
Individual blocks have a unique address
Access is by jumping to the vicinity plus a sequential search
Access time depends on location and previous location
Access Methods
Random
Individual addresses identify locations exactly
Access time is independent of location or previous access
e.g. RAM
Associative
Data is located by a comparison with the contents of a portion of the store
Access time is independent of location or previous access
Performance
Access time
Time between presenting the address and getting the valid stored data (for read and write operations)
Memory cycle time
Time that may be required for the memory to "recover" before the next access
Cycle time = access time + recovery time (the time required before a 2nd access can commence)
Performance
Transfer rate
Rate at which data can be moved (into and out of memory)
RAM: 1 / (cycle time)
Non-RAM: TN = TA + N/R
where:
TN = average time to read or write N bits
TA = average access time
N = number of bits
R = transfer rate, in bits per second (bps)
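The non-RAM formula above can be checked with a short sketch. The device numbers here (10 ms access time, 40 Mbps rate) are made up for illustration:

```python
# Average time to read or write N bits from a non-random-access
# device: T_N = T_A + N/R (the slide's formula).
def transfer_time(t_a, n_bits, rate_bps):
    """t_a: average access time in seconds; rate_bps: transfer rate in bits/s."""
    return t_a + n_bits / rate_bps

# Hypothetical disk: 10 ms average access time, 40 Mbps transfer rate,
# reading a 4 KB (32768-bit) block.
t = transfer_time(10e-3, 32768, 40e6)
print(round(t * 1e3, 3), "ms")  # access time dominates for small transfers
```

Note how TA dominates for small N: shrinking the block barely changes the total, which is one reason block devices transfer data in large chunks.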
Memory Hierarchy
Registers
In CPU
Internal or main memory
May include one or more levels of cache
"RAM"
External memory
Memory Hierarchy
Registers → L1 cache → L2 cache → Main memory → Disk cache → Magnetic disk → Optical → Tape
Towards the top: faster access time, greater cost per bit
Towards the bottom: greater capacity, smaller cost per bit, slower access time
Going down the hierarchy:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by the processor
Memory Hierarchy
Design constraints on a computer's memory can be summed up by three questions:
How much?
• If the capacity is there, the application will use it
How fast?
• To achieve the greatest performance, the memory must be able to keep up with the processor
• The processor should not pause waiting for instructions or operands
How expensive?
• Must be reasonable in relation to other components
Effectiveness of Memory Hierarchy
Hit ratio:
TA = h*M1 + (1-h)(M1 + M2)
where:
h = hit rate
M1 = access time of 1st-level memory
M2 = access time of 2nd-level memory
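The hit-ratio formula above can be worked through numerically. The timings here (10 ns first-level, 100 ns second-level, 95% hit rate) are assumed values for illustration:

```python
# Two-level average access time: T_A = h*M1 + (1-h)*(M1 + M2),
# where h is the hit rate (the slide's formula). A miss costs the
# first-level lookup plus the second-level access.
def avg_access_time(h, m1, m2):
    return h * m1 + (1 - h) * (m1 + m2)

# Assumed example: 10 ns cache, 100 ns main memory, 95% hit rate.
print(avg_access_time(0.95, 10, 100))  # 15.0 ns average
```

Even a 5% miss rate raises the average from 10 ns to 15 ns, which is why hierarchies only pay off when the hit rate stays very high.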
Physical Types
Semiconductor
RAM
Magnetic
Disk & tape
Optical
CD & DVD
Others
Bubble
Physical Characteristics
Volatility
Volatile memory – information decays naturally or is lost when electrical power is switched off
Non-volatile – information remains without deterioration until deliberately changed (e.g. magnetic surface)
Erasable
Non-erasable memories cannot be altered except by destroying the storage unit, e.g. ROM
Semiconductor Memory
RAM
Misnamed, as all semiconductor memory is random access
Read/write
Volatile
Dynamic RAM
Bits stored as charge in capacitors
Charges leak
Needs refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Needs refresh circuits
Slower
Static RAM
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Read Only Memory (ROM)
Permanent storage
Microprogramming (see later)
Library subroutines
Types of ROM
Written during manufacture
Very expensive for small runs
Programmable (once)
PROM
Needs special equipment to program
Read "mostly"
Erasable Programmable (EPROM)
• Erased by UV
Electrically Erasable (EEPROM)
• Takes much longer to write than read
Flash memory
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on the CPU chip or module
Cache Design
Size
Mapping function
Replacement algorithm
Write policy
Block size
Cache Operation - Overview
CPU requests the contents of a memory location
Check the cache for this data
If present, get it from the cache (fast)
If not present, read the required block from main memory into the cache
Then deliver from the cache to the CPU
Cache includes tags to identify which block of main memory is in each cache slot
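The lookup sequence above can be sketched as a toy model. Block size, memory contents, and the dictionary-based cache are simplifications for illustration, not a real cache design:

```python
# Toy model of the overview above: check the cache by block tag; on a
# miss, fetch the whole block from "main memory" into the cache first,
# then deliver the requested word from the cache.
BLOCK_SIZE = 4
main_memory = {addr: addr * 2 for addr in range(64)}  # fake contents
cache = {}  # block number (tag) -> list of words

def read(addr):
    tag, offset = divmod(addr, BLOCK_SIZE)
    if tag not in cache:                      # miss: load the whole block
        base = tag * BLOCK_SIZE
        cache[tag] = [main_memory[a] for a in range(base, base + BLOCK_SIZE)]
    return cache[tag][offset]                 # hit path: deliver from cache

print(read(10))    # miss: loads block 2, returns 20
print(read(11))    # hit in the same block, returns 22
print(len(cache))  # 1 block resident
```

Reading address 11 right after 10 hits because both fall in block 2, which is the locality effect the cache exploits.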
Typical Cache Organization
What Is A Cache Line?
Data is hauled into the cache from memory in "chunks".
If you ask for 4 bytes of data, you'll get the whole line (32/64/128 bytes)
Locality of reference says you'll need that data anyway
Incur the cost only once rather than each time you ask for a piece of data.
How Is The Cache Laid Out?
The cache is made up of a number of cache lines.
The Level 1 data cache of a Pentium 4 Xeon processor contains 8K bytes.
The cache lines are each 64 bytes.
This gives 8192 bytes / 64 bytes = 128 cache lines.
How Does The Cache Manage the Cache Lines?
Associativity describes how the data is stored in the cache.
Direct mapped (associativity == 1) means each line has its own slot. (Analogy: each person gets their own mailbox.)
X-way associativity means X cache lines share a slot. (All the "A"s share a mailbox, but it's a bigger mailbox.)
Fully associative means all cache lines share the same possible places. (All the letters are put into one giant mailbox.)
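The line-count arithmetic above, plus the direct-mapped "own mailbox" rule, in a couple of lines (the example block number is arbitrary):

```python
# An 8 KB L1 cache with 64-byte lines holds 8192 / 64 = 128 lines,
# as computed on the slide.
CACHE_BYTES, LINE_BYTES = 8192, 64
num_lines = CACHE_BYTES // LINE_BYTES
print(num_lines)  # 128

# Direct mapped (associativity == 1): each memory block has exactly
# one slot it can occupy.
def slot(block_number, num_slots=num_lines):
    return block_number % num_slots

print(slot(130))  # block 130 -> slot 2
```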
Cache Memory
Divided into lines
Each line holds 16 to 128 bytes, depending on the CPU
On the majority of current CPUs the memory cache is organized in 64-byte lines
Where can a block be placed?
Example: block 12 placed in an 8-block cache (fully associative, direct mapped, 2-way set associative)
Set-associative mapping: set = block number modulo number of sets
Direct mapped: block 12 can go only into cache block 4 (12 mod 8)
2-way set associative: block 12 can go anywhere in set 0 (12 mod 4), with sets 0-3 each holding two blocks
Fully associative: block 12 can go anywhere in the cache
Mapping Function
Cache of 64 KBytes
Cache block of 4 bytes
i.e. cache is 16K (2^14) lines of 4 bytes
16 MBytes main memory
24-bit address
Cache Mapping Techniques
Cache mapping is the method by which the contents of main memory are brought into the cache and referenced by the CPU
The mapping method used directly affects the performance of the entire computer system.
Direct Mapping
Main memory locations can only be copied into one location in the cache
Accomplished by dividing main memory into pages that correspond in size with the cache
Direct Mapping
Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least significant w bits identify a unique word
Most significant s bits specify one memory block
The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant)
Direct Mapping
Address Structure
| Tag s-r (8 bits) | Line or slot r (14 bits) | Word w (2 bits) |
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
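The 8/14/2 split above is just shifts and masks. A sketch, with an arbitrary 24-bit address as input:

```python
# Split a 24-bit address for the running direct-mapped example:
# | 8-bit tag | 14-bit line | 2-bit word |
def split_direct(addr):
    word = addr & 0x3            # low 2 bits: byte within the 4-byte block
    line = (addr >> 2) & 0x3FFF  # next 14 bits: cache line (slot)
    tag  = addr >> 16            # top 8 bits: tag stored with the line
    return tag, line, word

tag, line, word = split_direct(0x16339C)
print(hex(tag), hex(line), hex(word))  # 0x16 0xce7 0x0
```

On a reference, the hardware indexes the cache with the line field and compares the stored tag with the address's tag field to decide hit or miss.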
Direct Mapping
Cache Line Table
Cache line   Main memory blocks held
0            0, m, 2m, 3m, ..., 2^s - m
1            1, m+1, 2m+1, ..., 2^s - m + 1
Direct Mapping Cache Organization
Associative Mapping
A main memory block can load into any line of the cache
The memory address is interpreted as tag and word
| Tag (22 bits) | Word (2 bits) |
Associative Mapping
Address Structure
22-bit tag stored with each 32-bit block of data
Compare the tag field with each tag entry in the cache to check for a hit
Least significant 2 bits of the address identify which byte is required from the 4-byte data block
e.g.
Address   Tag      Data       Cache line
FFFFFC    FFFFFC   24682468   (any)
Fully Associative Mapping
The most complex
But the most flexible with regard to where data can reside
A newly read block of main memory can be placed anywhere in the fully associative cache
If the cache is full, a replacement algorithm is used to determine which block in the cache gets replaced by the new data
It needs to keep track of which memory locations are stored in the cache
Fully Associative Cache Organization
Set Associative Mapping
Combines the best of the direct and associative cache mapping techniques
As with a direct mapped cache, blocks of main memory data will still map into a specific set, but they can now be in any of the N cache block frames within each set
Set Associative Mapping
Cache is divided into a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g. block B can be in any line of set i
e.g. 2 lines per set
2-way associative mapping
A given block can be in one of 2 lines in only one set
Set Associative Mapping
Address Structure
| Tag (9 bits) | Set (13 bits) | Word (2 bits) |
Use the set field to determine which cache set to look in
Compare the tag field to see if we have a hit
e.g.
Address     Tag   Data       Set number
1FF 7FFC    1FF   12345678   1FFF
001 7FFC    001   11223344   1FFF
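The 9/13/2 split above can be checked against the slide's own example, where address "1FF 7FFC" (tag 1FF followed by the remaining 15 bits 7FFC) has tag 0x1FF and set number 0x1FFF:

```python
# Split a 24-bit address for the set-associative example:
# | 9-bit tag | 13-bit set | 2-bit word |
def split_set_assoc(addr):
    word = addr & 0x3
    set_no = (addr >> 2) & 0x1FFF  # 13 set-select bits
    tag = addr >> 15               # 9 tag bits
    return tag, set_no, word

addr = (0x1FF << 15) | 0x7FFC      # the slide's "1FF 7FFC" address
print([hex(x) for x in split_set_assoc(addr)])  # ['0x1ff', '0x1fff', '0x0']
```

Address "001 7FFC" yields the same set number 0x1FFF with a different tag, which is exactly why both rows of the slide's table can coexist in one set.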
Address Bit Partitioning (Direct Mapping)
| TAG | INDEX | OFFSET |
Bits 17-16: compare bits; bits 15-1: set select bits; bit 0: byte select bit
The compare bits are compared with the corresponding tag bits in the cache directory.
The set select bits are used to select a particular set in the cache.
The byte select bits are used to select a particular byte in the accessed block.
Memory size = 256 KB = 2^18 bytes
Block size = 2 bytes = 2^1
Number of blocks in cache = cache size / block size = 64 KB / 2 B = 2^16 / 2^1 = 2^15
Number of bits in tag = total bits - index bits - offset bits = 18 - 15 - 1 = 2
Address Bit Partitioning (4-Way Set Associative Mapping)
| TAG | INDEX | OFFSET |
Bits 17-14: compare bits; bits 13-1: set select bits; bit 0: byte select bit
The compare bits are compared with the corresponding tag bits in the cache directory.
The set select bits are used to select a particular set in the cache.
The byte select bits are used to select a particular byte in the accessed block.
Memory size = 256 KB = 2^18 bytes
Block size = 2 bytes = 2^1
Number of sets in cache = cache size / (set size * block size) = 64 KB / (4 blocks * 2 B) = 2^16 / (2^2 * 2^1) = 2^13
Number of bits in tag = total bits - index bits - offset bits = 18 - 13 - 1 = 4
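The field widths in both partitionings above follow mechanically from the sizes. A sketch of the arithmetic for this 18-bit-address configuration:

```python
# Field widths for an 18-bit address, 64 KB cache, 2-byte blocks
# (the two partitioning examples above).
from math import log2

ADDR_BITS = 18
offset_bits = int(log2(2))                     # 2-byte blocks -> 1 offset bit

# Direct mapped: index selects one of 64 KB / 2 B = 2^15 blocks.
index_direct = int(log2((64 * 1024) // 2))     # 15 index bits
print(ADDR_BITS - index_direct - offset_bits)  # tag = 2 bits

# 4-way set associative: 64 KB / (4 * 2 B) = 2^13 sets.
index_4way = int(log2((64 * 1024) // (4 * 2))) # 13 index bits
print(ADDR_BITS - index_4way - offset_bits)    # tag = 4 bits
```

Raising associativity shrinks the index field and widens the tag: fewer sets to select, more bits needed to tell the set's occupants apart.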
Replacement Algorithms
Direct mapping
No choice
Replacement Algorithms
Associative & Set Associative
Hardware-implemented algorithm (for speed)
Least recently used (LRU)
e.g. in 2-way set associative: which of the 2 blocks is LRU?
First in first out (FIFO)
Replace the block that has been in the cache longest
Least frequently used (LFU)
Replace the block which has had the fewest hits
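LRU bookkeeping for one set can be sketched with Python's `OrderedDict`, which keeps insertion/recency order; real caches do this with a few hardware bits per set, so this is only a behavioral model:

```python
# LRU replacement for a single cache set: on a hit, mark the line most
# recently used; on a miss with a full set, evict the least recently
# used line (the first key in recency order).
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways, self.lines = ways, OrderedDict()

    def access(self, tag):
        if tag in self.lines:              # hit: move to most-recent end
            self.lines.move_to_end(tag)
            return "hit"
        if len(self.lines) == self.ways:   # full: evict least recently used
            self.lines.popitem(last=False)
        self.lines[tag] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# A miss, B miss, A hit, C miss (evicts B), B miss (evicts A)
```

For a 2-way set, the "which of the 2 blocks is LRU?" question on the slide needs only a single bit per set in hardware.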
Write Policy
Must not overwrite a cache block unless main memory is up to date
Write through
All writes go to main memory as well as the cache
Multiple CPUs can monitor main memory traffic to keep the local (to the CPU) cache up to date
Lots of traffic
Slows down writes
Write back
Updates are initially made in the cache only
The update bit for the cache slot is set when an update occurs
If a block is to be replaced, write it to main memory only if the update bit is set
Other caches get out of sync
I/O must access main memory through the cache
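The update (dirty) bit mechanism above can be sketched with a single cache line; a one-line "cache" is a deliberate simplification to isolate the write-back behavior:

```python
# Write-back sketch: writes set a dirty (update) bit; main memory is
# touched only when a dirty block gets evicted.
class WriteBackLine:
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False
        self.writebacks = 0

    def write(self, tag, data, memory):
        if self.tag is not None and self.tag != tag and self.dirty:
            memory[self.tag] = self.data   # flush the victim only if dirty
            self.writebacks += 1
        self.tag, self.data, self.dirty = tag, data, True

memory = {}
line = WriteBackLine()
line.write(1, "x", memory)   # fills the line, no memory traffic
line.write(1, "y", memory)   # re-write of the same block, still no traffic
line.write(2, "z", memory)   # evicts dirty block 1 -> one write-back
print(line.writebacks, memory)  # 1 {1: 'y'}
```

Note that memory receives only the final value "y", not the intermediate "x": repeated writes to the same block cost one memory write instead of one per store, which is the traffic saving over write-through.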
Virtual Memory
Divides physical memory into blocks and allocates them to different processes
VM's purpose is to enlarge the set of memory addresses a program can utilize
Virtual Memory
With virtual memory, the computer can look through RAM for areas that have not been used recently and copy them onto the hard disk. This frees up space in RAM to load a new application.
The area of the hard disk that stores the RAM image is called a page file, which holds pages.
Virtual Memory
A page is a fixed-size block of memory addresses
The area of the hard disk that stores the RAM image is called a page file, which holds pages
The OS performs address translation using a page table
• Each process has its own page table
• The OS knows the address of each process's page table
• The page table is an array of page table entries (PTEs): one entry for each VPN of each process, indexed by VPN
• Each PTE contains:
  - physical page number
  - permissions
  - dirty bit
  - LRU information
  - ~4 bytes total
Address Translation
A virtual address is split into a virtual page number and a page offset; the page table maps the virtual page number to a page in main memory.
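The translation step above can be sketched in a few lines. The 4 KB page size, the VPN-to-PPN pairs, and the dictionary-based page table are assumptions for illustration, not any particular OS's layout:

```python
# Address translation sketch: split the virtual address into virtual
# page number (VPN) and page offset, look the VPN up in a per-process
# page table, and attach the physical page number to the same offset.
OFFSET_BITS = 12  # assumed 4 KB pages

page_table = {0x00400: 0x12345, 0x00401: 0x0BEEF}  # VPN -> PPN (made up)

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise KeyError("page fault")   # the OS would load the page here
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0x00400ABC)))  # 0x12345abc
```

The offset passes through unchanged; only the page-number bits are rewritten, which is why pages must be power-of-two sized.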
References
William Stallings. Computer Organization and Architecture, 2000.
Illinois State University, 2006 (http://www.acs.ilstu.edu/faculty/cjong/Spring2006/ITK225/ClassNotes)
Vasanth Venkatachalam and Michael Franz. Power Reduction Techniques for Microprocessor Systems. ACM Computing Surveys, 2005, pp. 195–233.
Prof. David Brooks. Computer Architecture. Lecture 17: