
(1)

Computer Organization and Architecture

Chapter 9: Memory

(2)

Characteristics

 Location
 Capacity
 Unit of transfer
 Access method
 Performance
 Physical type
 Physical characteristics
 Organisation

(3)

Location

 CPU

(4)

Capacity

 Word size
• The natural unit of organisation
 Number of words

(5)

Unit of Transfer

 Internal
• Usually governed by data bus width
 External
• Usually a block which is much larger than a word
 Addressable unit
• Smallest location which can be uniquely addressed

(6)

Access Methods

 Sequential
• Start at the beginning and read through in order
• Access time depends on location of data and previous location
• e.g. tape
 Direct
• Individual blocks have a unique address
• Access is by jumping to the vicinity plus a sequential search
• Access time depends on location and previous location

(7)

Access Methods

 Random
• Individual addresses identify locations exactly
• Access time is independent of location or previous access
• e.g. RAM
 Associative
• Data is located by a comparison with the contents of a portion of the store
• Access time is independent of location or previous access

(8)

Performance

 Access time
• Time between presenting the address and getting the valid stored data (the time to perform a read or write operation)
 Memory cycle time
• Time may be required for the memory to “recover” before the next access
• Cycle time is access time plus recovery time (access time + the time required before a 2nd access can commence)

(9)

Performance

 Transfer rate
• Rate at which data can be moved (in and out of memory)
• RAM: 1 / (cycle time)
• Non-RAM: TN = TA + N/R

where:
TN = average time to read or write N bits
TA = average access time
N = number of bits
R = transfer rate, in bits per second (bps)
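Plugging hypothetical device numbers into the non-RAM formula (the access time and transfer rate below are made-up illustrative values):

```python
# Average time to read N bits from a non-random-access device:
# TN = TA + N/R  (seek/positioning cost, then streaming cost).
def transfer_time(t_access_s, n_bits, rate_bps):
    """TN = TA + N/R, all times in seconds."""
    return t_access_s + n_bits / rate_bps

# Hypothetical device: 10 ms average access time, 1 Mbps transfer rate.
t = transfer_time(0.010, 8000, 1_000_000)   # read 8000 bits
print(t)  # 0.018 s: 10 ms to reach the data, 8 ms to stream it
```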

(10)

Memory Hierarchy

 Registers
• In CPU
 Internal or main memory
• May include one or more levels of cache
• “RAM”
 External memory

(11)

Memory Hierarchy

[Figure: the hierarchy from fastest/most expensive to slowest/cheapest: registers, L1 cache, L2 cache, main memory, disk cache, magnetic disk, optical, tape]

(12)

Memory Hierarchy

[Figure: registers, L1 cache, L2 cache, main memory, disk cache, magnetic disk, optical, tape, ordered by price per byte and performance]

 Faster access time, greater cost per bit
 Greater capacity, smaller cost per bit
 Greater capacity, slower access time

Going down the hierarchy:
 Decreasing cost per bit
 Increasing capacity
 Increasing access time
 Decreasing frequency of access of the memory by the processor

(13)

Memory Hierarchy

 Design constraints on a computer’s memory can be summed up by three questions:
 How much?
• If the capacity is there, applications will use it
 How fast?
• To achieve greatest performance, the memory must be able to keep up with the processor
• The processor should not pause waiting for instructions or operands
 How expensive?
• Must be reasonable in relation to the other components

(14)

Effectiveness of Memory Hierarchy

Average access time for a two-level hierarchy:

TA = h*M1 + (1-h)(M1 + M2)

where:
h = hit ratio (fraction of accesses found in the 1st-level memory)
M1 = access time of the 1st-level memory
M2 = access time of the 2nd-level memory

(15)

Physical Types

 Semiconductor
• RAM
 Magnetic
• Disk & tape
 Optical
• CD & DVD
 Others
• Bubble

(16)

Physical Characteristics

 Volatility
• Volatile memory – information decays naturally or is lost when electrical power is switched off
• Non-volatile – information remains without deterioration until deliberately changed (e.g. a magnetic surface)
 Erasable
• Non-erasable memories cannot be altered except by destroying the storage unit, e.g. ROM

(17)

Semiconductor Memory

 RAM
• Misnamed, as all semiconductor memory is random access
• Read/write
• Volatile

(18)

Dynamic RAM

 Bits stored as charge in capacitors
 Charges leak
 Needs refreshing even when powered
 Simpler construction
 Smaller per bit
 Less expensive
 Needs refresh circuits
 Slower

(19)

Static RAM

 Bits stored as on/off switches
 No charges to leak
 No refreshing needed when powered
 More complex construction
 Larger per bit
 More expensive
 Does not need refresh circuits
 Faster

(20)

Read Only Memory (ROM)

 Permanent storage
 Microprogramming (see later)
 Library subroutines

(21)

Types of ROM

 Written during manufacture
• Very expensive for small runs
 Programmable (once)
• PROM
• Needs special equipment to program
 Read “mostly”
• Erasable Programmable (EPROM) – erased by UV
• Electrically Erasable (EEPROM) – takes much longer to write than read
• Flash memory

(22)

Cache

 Small amount of fast memory
 Sits between normal main memory and CPU
 May be located on CPU chip or module

(23)

Cache Design

 Size
 Mapping function
 Replacement algorithm
 Write policy
 Block size

(24)

Cache operation – overview

 CPU requests contents of a memory location
 Check cache for this data
 If present, get from cache (fast)
 If not present, read the required block from main memory into the cache
 Then deliver from cache to CPU
 Cache includes tags to identify which block of main memory is in each cache slot
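The lookup flow above can be sketched as a toy direct-mapped cache (the memory contents, slot count, and helper names are invented for illustration):

```python
# Minimal sketch of the lookup flow: a cache of tagged slots; on a miss
# the block is fetched from "main memory" into the cache, then served.
main_memory = {addr: addr * 2 for addr in range(64)}   # toy contents
cache = {}          # slot -> (tag, data); tags identify the memory block
NUM_SLOTS = 4

def read(addr):
    slot, tag = addr % NUM_SLOTS, addr // NUM_SLOTS
    if cache.get(slot, (None, None))[0] == tag:        # hit: serve from cache
        return cache[slot][1], "hit"
    cache[slot] = (tag, main_memory[addr])             # miss: fill cache first
    return cache[slot][1], "miss"

print(read(5))   # (10, 'miss') -- first access brings the block into cache
print(read(5))   # (10, 'hit')  -- second access is served from the cache
```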

(25)

Typical Cache Organization

(26)

What Is A Cache Line?

Data is hauled into the cache from memory in “chunks”.
 If you ask for 4 bytes of data, you’ll get the whole line (32/64/128 bytes)
 Locality of reference says you’ll need that data anyway
 Incur the cost only once rather than each time you ask for a piece of data

How Is The Cache Laid Out?

The cache is made up of a number of cache lines.
 The Level 1 data cache of a Pentium 4 Xeon processor contains 8K bytes
 The cache lines are each 64 bytes
 This gives 8192 bytes / 64 bytes = 128 cache lines

How Does The Cache Manage the Cache Lines?

Associativity describes how the data is stored in the cache.
 Direct mapped (associativity == 1) means each line has its own slot. (Analogy: each person gets their own mailbox.)
 X-way associativity means X cache lines share a slot. (All the “A’s” share a mailbox, but it’s a bigger mailbox.)
 Fully associative means all cache lines share the same possible places. (All the letters are put into one giant mailbox.)

(27)

Cache Memory

 Divided into lines
 Each line holds 16 to 128 bytes, depending on the CPU
 On the majority of current CPUs the memory cache is organized in 64-byte lines

(28)

Where can a block be placed? Block 12 placed in an 8-block cache:
 Fully associative, direct mapped, 2-way set associative
 Set-associative mapping = block number modulo number of sets

 Direct mapped: block 12 can go only into block 4 (12 mod 8)
 Set associative (4 sets of 2): block 12 can go anywhere in set 0 (12 mod 4)
 Fully associative: block 12 can go anywhere

[Figure: an 8-block cache (blocks 0–7) under each scheme, and a main memory with block-frame addresses 0–31]
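The placement rules reduce to simple modulo arithmetic (the helper names below are illustrative):

```python
# Where main-memory block 12 may be placed in an 8-block cache.
CACHE_BLOCKS = 8

def direct_mapped_slot(block):
    return block % CACHE_BLOCKS     # exactly one possible slot

def set_assoc_set(block, num_sets):
    return block % num_sets         # any way within this one set

print(direct_mapped_slot(12))       # 4  (12 mod 8)
print(set_assoc_set(12, 4))         # 0  (2-way: 8/2 = 4 sets, 12 mod 4)
# Fully associative: block 12 may occupy any of the 8 cache blocks.
```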

(29)

Mapping Function

Example configuration:
 Cache of 64 kBytes
 Cache block of 4 bytes
• i.e. cache is 16k (2^14) lines of 4 bytes
 16 MBytes main memory
 24-bit address (2^24 = 16M)

(30)

Cache Mapping Techniques

 Cache mapping is the method by which the contents of main memory are brought into the cache and referenced by the CPU
 The mapping method used directly affects the performance of the entire computer system

(31)

Direct Mapping

 Main memory locations can only be copied into one location in the cache
 Accomplished by dividing main memory into pages that correspond in size with the cache

(32)

Direct Mapping

 Each block of main memory maps to only one cache line
• i.e. if a block is in cache, it must be in one specific place
 Address is in two parts
• Least significant w bits identify a unique word
• Most significant s bits specify one memory block
• The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant)

(33)

Direct Mapping
Address Structure

Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits

 24-bit address
 2-bit word identifier (4-byte block)
 22-bit block identifier
• 8-bit tag (= 22 - 14)
• 14-bit slot or line
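A sketch of splitting a 24-bit address into these fields with shifts and masks (the sample address is arbitrary):

```python
# Split a 24-bit address into the direct-mapping fields above:
# 8-bit tag, 14-bit line, 2-bit word.
def split_direct(addr):
    word = addr & 0b11                     # low 2 bits: byte within 4-byte block
    line = (addr >> 2) & ((1 << 14) - 1)   # next 14 bits: cache line
    tag  = addr >> 16                      # top 8 bits: tag stored with the line
    return tag, line, word

tag, line, word = split_direct(0x16CA37)
print(hex(tag), hex(line), word)   # 0x16 0x328d 3
```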

(34)

Direct Mapping
Cache Line Table

Cache line   Main memory blocks held
0            0, m, 2m, 3m, …, 2^s - m
1            1, m+1, 2m+1, …, 2^s - m + 1

(35)

Direct Mapping Cache Organization

(36)

Direct Mapping Cache Organization

(37)

Direct Mapping Cache Organization

(38)

Associative Mapping

 A main memory block can load into any line of the cache
 Memory address is interpreted as tag and word

(39)

Associative Mapping
Address Structure

Tag: 22 bits | Word: 2 bits

 22-bit tag stored with each 32-bit block of data
 Compare tag field with tag entry in cache to check for a hit
 Least significant 2 bits of address identify which byte is required from the 32-bit data block
 e.g.
Address   Tag      Data      Cache line
FFFFFC    3FFFFF   24682468  (any free line)
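Under this layout the 22-bit tag is simply the address with the 2-bit word field shifted off, so the tag for address FFFFFC works out to 3FFFFF (a sketch; the function name is illustrative):

```python
# Associative mapping over a 24-bit address with 4-byte blocks:
# 22-bit tag, 2-bit word-in-block field.
def split_associative(addr):
    return addr >> 2, addr & 0b11      # (tag, word)

tag, word = split_associative(0xFFFFFC)
print(f"{tag:06X}", word)   # 3FFFFF 0
```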

(40)

Full Associative Mapping

 The most complex
 But the most flexible with regard to where data can reside
 A newly read block of main memory can be placed anywhere in the fully associative cache
 If the cache is full, a replacement algorithm is used to determine which block in the cache gets replaced by the new data
 It needs to keep track of which memory locations are held in the cache

(41)

Fully Associative Cache Organization

(42)

Fully Associative Cache Organization

(43)

Set Associative Mapping

 Combines the best of the direct and associative cache mapping techniques
 As with a direct mapped cache, blocks of main memory data still map into a specific set, but they can now be in any of the N cache block frames within each set

(44)

Set Associative Mapping

 Cache is divided into a number of sets
 Each set contains a number of lines
 A given block maps to any line in a given set
• e.g. block B can be in any line of set i
 e.g. 2 lines per set
• 2-way associative mapping
• A given block can be in one of 2 lines in only one set

(45)

Set Associative Mapping
Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

 Use the set field to determine which cache set to look in
 Compare the tag field to see if we have a hit
 e.g.
Address    Tag  Data      Set number
1FF 7FFC   1FF  12345678  1FFF
001 7FFC   001  11223344  1FFF
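A sketch of splitting a 24-bit address under the 9/13/2-bit layout, reproducing the two example rows (the function name is illustrative; 1FF 7FFC and 001 7FFC are the full addresses FFFFFC and 00FFFC):

```python
# Set-associative split: 9-bit tag, 13-bit set, 2-bit word.
def split_set_assoc(addr):
    word = addr & 0b11
    set_no = (addr >> 2) & ((1 << 13) - 1)
    tag = addr >> 15
    return tag, set_no, word

for addr in (0xFFFFFC, 0x00FFFC):
    tag, s, w = split_set_assoc(addr)
    print(f"tag={tag:03X} set={s:04X} word={w}")
# tag=1FF set=1FFF word=0
# tag=001 set=1FFF word=0
```

Both addresses fall in set 1FFF but carry different tags, so in a 2-way cache they can reside in the same set simultaneously.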

(46)

Set Associative Mapping
Address Structure

(47)

Set Associative Mapping
Address Structure

(48)

Address Bit Partitioning (Direct Mapping)

TAG | INDEX | OFFSET
Bits: 17 16 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 | 0

The Compare Bits (tag) are compared with the corresponding Tag Bits in the Cache Directory.
The Set Select Bits (index) are used to select a particular set in the cache.
The Byte Select Bits (offset) are used to select a particular byte in the accessed block.

Memory size = 256KB = 2^18
Block size = 2 bytes = 2^1
Number of blocks in cache = cache size / block size = 64KB/2B = 2^16/2^1 = 2^15
Number of bits in tag = total bits - index bits - offset bits = 18 - 15 - 1 = 2
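The field widths can be rechecked mechanically, remembering that 64 KB = 2^16 bytes (a sketch of the arithmetic above):

```python
# Recompute the direct-mapped field widths from the slide's parameters.
from math import log2

memory_bits = int(log2(256 * 1024))        # 256 KB address space -> 18 bits
offset_bits = int(log2(2))                 # 2-byte blocks -> 1 offset bit
cache_blocks = (64 * 1024) // 2            # 64 KB cache / 2 B blocks = 2^15
index_bits = int(log2(cache_blocks))       # 15 index (block-select) bits
tag_bits = memory_bits - index_bits - offset_bits
print(tag_bits, index_bits, offset_bits)   # 2 15 1
```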

(49)

Address Bit Partitioning (4-way Set Associative Mapping)

TAG | INDEX | OFFSET
Bits: 17 16 15 14 | 13 12 11 10 9 8 7 6 5 4 3 2 1 | 0

The Compare Bits (tag) are compared with the corresponding Tag Bits in the Cache Directory.
The Set Select Bits (index) are used to select a particular set in the cache.
The Byte Select Bits (offset) are used to select a particular byte in the accessed block.

Memory size = 256KB = 2^18
Block size = 2 bytes = 2^1
Number of sets in cache = cache size / (set size * block size)
= 64KB/(4 blocks * 2B) = 2^16/(2^2 * 2^1) = 2^13
Number of bits in tag = total bits - index bits - offset bits = 18 - 13 - 1 = 4

(50)

Where can a block be placed? Block 12 placed in an 8-block cache:
 Fully associative, direct mapped, 2-way set associative
 Set-associative mapping = block number modulo number of sets

 Direct mapped: block 12 can go only into block 4 (12 mod 8)
 Set associative (4 sets of 2): block 12 can go anywhere in set 0 (12 mod 4)
 Fully associative: block 12 can go anywhere

[Figure: an 8-block cache (blocks 0–7) under each scheme, and a main memory with block-frame addresses 0–31]

(51)

Replacement Algorithms
Direct Mapping

 No choice – each block maps to only one possible line

(52)

Replacement Algorithms
Associative & Set Associative

 Hardware-implemented algorithm (for speed)
 Least recently used (LRU)
• e.g. in 2-way set associative: which of the 2 blocks is LRU?
 First in first out (FIFO)
• Replace the block that has been in the cache longest
 Least frequently used (LFU)
• Replace the block which has had the fewest hits
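A minimal sketch of LRU bookkeeping for a single 2-way set, using dictionary insertion order to track recency (the class and method names are invented for illustration; real caches do this in hardware):

```python
# Sketch of LRU for one set: the first key in the OrderedDict is
# always the least recently used line.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways=2):
        self.ways, self.lines = ways, OrderedDict()   # tag -> block data

    def access(self, tag, fetch):
        if tag in self.lines:                 # hit: mark most recently used
            self.lines.move_to_end(tag)
            return self.lines[tag]
        if len(self.lines) >= self.ways:      # miss on a full set:
            self.lines.popitem(last=False)    # evict the LRU line
        self.lines[tag] = fetch()             # fill from main memory
        return self.lines[tag]

s = LRUSet(ways=2)
s.access(1, lambda: "A"); s.access(2, lambda: "B")
s.access(1, lambda: "A")            # touch tag 1: tag 2 is now LRU
s.access(3, lambda: "C")            # evicts tag 2
print(list(s.lines))                # [1, 3]
```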

(53)

Write Policy

 Must not overwrite a cache block unless main memory is up to date

(54)

Write through

 All writes go to main memory as well as to the cache
 Multiple CPUs can monitor main memory traffic to keep their local (to the CPU) cache up to date
 Lots of traffic
 Slows down writes

(55)

Write back

 Updates are initially made in the cache only
 An update bit for the cache slot is set when an update occurs
 If a block is to be replaced, write it to main memory only if the update bit is set
 Other caches can get out of sync
 I/O must access main memory through the cache

(56)

Virtual Memory

 Divides physical memory into blocks and allocates them to different processes
 VM’s purpose is to enlarge the set of memory addresses a program can utilize

(57)

Virtual Memory

 With virtual memory, the computer can look through RAM for areas that have not been used recently and copy them onto the hard disk. This frees up space in RAM to load a new application.
 The area of the hard disk that stores the RAM image is called a page file, which holds pages.

(58)

Virtual Memory

 A page is a fixed-size block of memory addresses
 The area of the hard disk that stores the RAM image is called a page file, which holds pages

(59)

Page ID: Address Translation

• The OS performs address translation using a page table
– Each process has its own page table
• The OS knows the address of each process’s page table
– The page table is an array of Page Table Entries (PTEs)
• One entry for each VPN of each process, indexed by VPN
– Each PTE contains:
• Physical page number
• Permissions
• Dirty bit
• LRU information
• ~4 bytes total

[Figure: a virtual address is split into a virtual page number and a page offset; the page table in main memory translates the virtual page number]
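A sketch of the translation step, assuming 4 KB pages and an invented per-process page table (here holding only physical page numbers; a real PTE also carries permissions, dirty and LRU bits):

```python
# Page-table address translation: split the virtual address into
# (virtual page number, offset), look up the VPN, rebuild the address.
PAGE_SIZE = 4096   # 4 KB pages => 12 offset bits

page_table = {0: 7, 1: 3, 4: 12}      # VPN -> physical page number

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:         # no PTE: the OS must handle a page fault
        raise LookupError("page fault")
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # VPN 1 -> physical page 3: 0x3234
```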

(60)

References

 William Stallings. Computer Organization and Architecture. 2000.
 Illinois State University, 2006 (http://www.acs.ilstu.edu/faculty/cjong/Spring2006/ITK225/ClassNotes)
 Vasanth Venkatachalam and Michael Franz. Power Reduction Techniques For Microprocessor Systems. ACM Computing Surveys, 2005, pp. 195–233.
 Prof. David Brooks. Computer Architecture. Lecture 17.
