Computer Organization and Architecture
Chapter 9
Characteristics
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
CPU
Capacity
Word size
The natural unit of organisation
Number of words
Unit of Transfer
Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word
Addressable unit
Smallest location which can be uniquely addressed
Access Methods
Sequential
Start at the beginning and read through in order
Access time depends on location of data and previous location
e.g. tape
Direct
Individual blocks have a unique address
Access is by jumping to the vicinity plus a sequential search
Access time depends on location and previous location
Access Methods
Random
Individual addresses identify locations exactly
Access time is independent of location or previous access
e.g. RAM
Associative
Data is located by a comparison with the contents of a portion of the store
Access time is independent of location or previous access
Performance
Access time
Time between presenting the address and getting the valid stored data (for read and write operations)
Memory cycle time
Time that may be required for the memory to "recover" before the next access
Cycle time = access time + recovery time (the time required before a 2nd access can commence)
Performance
Transfer rate
Rate at which data can be moved (into and out of memory)
RAM: 1 / (cycle time)
Non-RAM: TN = TA + N/R
where:
TN = average time to read or write N bits
TA = average access time
N = number of bits
R = transfer rate, in bits per second (bps)
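The non-RAM formula above can be checked with a short sketch. The device numbers here (10 ms access time, 40 Mbps rate) are made up for illustration:

```python
# Average time to read or write N bits from a non-random-access
# device: T_N = T_A + N/R (the slide's formula).
def transfer_time(t_a, n_bits, rate_bps):
    """t_a: average access time in seconds; rate_bps: transfer rate in bits/s."""
    return t_a + n_bits / rate_bps

# Hypothetical disk: 10 ms average access time, 40 Mbps transfer rate,
# reading a 4 KB (32768-bit) block.
t = transfer_time(10e-3, 32768, 40e6)
print(round(t * 1e3, 3), "ms")  # access time dominates for small transfers
```

Note how TA dominates for small N: shrinking the block barely changes the total, which is one reason block devices transfer data in large chunks.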
Memory Hierarchy
Registers
In CPU
Internal or main memory
May include one or more levels of cache
"RAM"
External memory
Memory Hierarchy
Registers → L1 cache → L2 cache → Main memory → Disk cache → Magnetic disk → Optical → Tape
Towards the top: faster access time, greater cost per bit
Towards the bottom: greater capacity, smaller cost per bit, slower access time
Going down the hierarchy:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by the processor
Memory Hierarchy
Design constraints on a computer's memory can be summed up by three questions:
How much?
• If the capacity is there, the application will use it
How fast?
• To achieve the greatest performance, the memory must be able to keep up with the processor
• The processor should not pause waiting for instructions or operands
How expensive?
• Must be reasonable in relation to other components
Effectiveness of Memory Hierarchy
Hit ratio:
TA = h*M1 + (1-h)(M1 + M2)
where:
h = hit rate
M1 = access time of 1st-level memory
M2 = access time of 2nd-level memory
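The hit-ratio formula above can be worked through numerically. The timings here (10 ns first-level, 100 ns second-level, 95% hit rate) are assumed values for illustration:

```python
# Two-level average access time: T_A = h*M1 + (1-h)*(M1 + M2),
# where h is the hit rate (the slide's formula). A miss costs the
# first-level lookup plus the second-level access.
def avg_access_time(h, m1, m2):
    return h * m1 + (1 - h) * (m1 + m2)

# Assumed example: 10 ns cache, 100 ns main memory, 95% hit rate.
print(avg_access_time(0.95, 10, 100))  # 15.0 ns average
```

Even a 5% miss rate raises the average from 10 ns to 15 ns, which is why hierarchies only pay off when the hit rate stays very high.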
Physical Types
Semiconductor
RAM
Magnetic
Disk & tape
Optical
CD & DVD
Others
Bubble
Physical Characteristics
Volatility
Volatile memory – information decays naturally or is lost when electrical power is switched off
Non-volatile – information remains without deterioration until deliberately changed (e.g. magnetic surface)
Erasable
Non-erasable memories cannot be altered except by destroying the storage unit, e.g. ROM
Semiconductor Memory
RAM
Misnamed, as all semiconductor memory is random access
Read/write
Volatile
Dynamic RAM
Bits stored as charge in capacitors
Charges leak
Needs refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Needs refresh circuits
Slower
Static RAM
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Read Only Memory (ROM)
Permanent storage
Microprogramming (see later)
Library subroutines
Types of ROM
Written during manufacture
Very expensive for small runs
Programmable (once)
PROM
Needs special equipment to program
Read "mostly"
Erasable Programmable (EPROM)
• Erased by UV
Electrically Erasable (EEPROM)
• Takes much longer to write than read
Flash memory
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on the CPU chip or module
Cache Design
Size
Mapping function
Replacement algorithm
Write policy
Block size
Cache Operation - Overview
CPU requests the contents of a memory location
Check the cache for this data
If present, get it from the cache (fast)
If not present, read the required block from main memory into the cache
Then deliver from the cache to the CPU
Cache includes tags to identify which block of main memory is in each cache slot
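The lookup sequence above can be sketched as a toy model. Block size, memory contents, and the dictionary-based cache are simplifications for illustration, not a real cache design:

```python
# Toy model of the overview above: check the cache by block tag; on a
# miss, fetch the whole block from "main memory" into the cache first,
# then deliver the requested word from the cache.
BLOCK_SIZE = 4
main_memory = {addr: addr * 2 for addr in range(64)}  # fake contents
cache = {}  # block number (tag) -> list of words

def read(addr):
    tag, offset = divmod(addr, BLOCK_SIZE)
    if tag not in cache:                      # miss: load the whole block
        base = tag * BLOCK_SIZE
        cache[tag] = [main_memory[a] for a in range(base, base + BLOCK_SIZE)]
    return cache[tag][offset]                 # hit path: deliver from cache

print(read(10))    # miss: loads block 2, returns 20
print(read(11))    # hit in the same block, returns 22
print(len(cache))  # 1 block resident
```

Reading address 11 right after 10 hits because both fall in block 2, which is the locality effect the cache exploits.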
Typical Cache Organization
What Is A Cache Line?
Data is hauled into the cache from memory in "chunks".
If you ask for 4 bytes of data, you'll get the whole line (32/64/128 bytes)
Locality of reference says you'll need that data anyway
Incur the cost only once rather than each time you ask for a piece of data.
How Is The Cache Laid Out?
The cache is made up of a number of cache lines.
The Level 1 data cache of a Pentium 4 Xeon processor contains 8K bytes.
The cache lines are each 64 bytes.
This gives 8192 bytes / 64 bytes = 128 cache lines.
How Does The Cache Manage the Cache Lines?
Associativity describes how the data is stored in the cache.
Direct mapped (associativity == 1) means each line has its own slot. (Analogy: each person gets their own mailbox.)
X-way associativity means X cache lines share a slot. (All the "A"s share a mailbox, but it's a bigger mailbox.)
Fully associative means all cache lines share the same possible places. (All the letters are put into one giant mailbox.)
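The line-count arithmetic above, plus the direct-mapped "own mailbox" rule, in a couple of lines (the example block number is arbitrary):

```python
# An 8 KB L1 cache with 64-byte lines holds 8192 / 64 = 128 lines,
# as computed on the slide.
CACHE_BYTES, LINE_BYTES = 8192, 64
num_lines = CACHE_BYTES // LINE_BYTES
print(num_lines)  # 128

# Direct mapped (associativity == 1): each memory block has exactly
# one slot it can occupy.
def slot(block_number, num_slots=num_lines):
    return block_number % num_slots

print(slot(130))  # block 130 -> slot 2
```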
Cache Memory
Divided into lines
Each line holds 16 to 128 bytes, depending on the CPU
On the majority of current CPUs the memory cache is organized in 64-byte lines
Where can a block be placed?
Example: block 12 placed in an 8-block cache (fully associative, direct mapped, 2-way set associative)
Set-associative mapping: set = block number modulo number of sets
Direct mapped: block 12 can go only into cache block 4 (12 mod 8)
2-way set associative: block 12 can go anywhere in set 0 (12 mod 4), with sets 0-3 each holding two blocks
Fully associative: block 12 can go anywhere in the cache
Mapping Function
Cache of 64 KBytes
Cache block of 4 bytes
i.e. cache is 16K (2^14) lines of 4 bytes
16 MBytes main memory
24-bit address
Cache Mapping Techniques
Cache mapping is the method by which the contents of main memory are brought into the cache and referenced by the CPU
The mapping method used directly affects the performance of the entire computer system.
Direct Mapping
Main memory locations can only be copied into one location in the cache
Accomplished by dividing main memory into pages that correspond in size with the cache
Direct Mapping
Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least significant w bits identify a unique word
Most significant s bits specify one memory block
The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant)
Direct Mapping
Address Structure
| Tag s-r (8 bits) | Line or slot r (14 bits) | Word w (2 bits) |
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
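The 8/14/2 split above is just shifts and masks. A sketch, with an arbitrary 24-bit address as input:

```python
# Split a 24-bit address for the running direct-mapped example:
# | 8-bit tag | 14-bit line | 2-bit word |
def split_direct(addr):
    word = addr & 0x3            # low 2 bits: byte within the 4-byte block
    line = (addr >> 2) & 0x3FFF  # next 14 bits: cache line (slot)
    tag  = addr >> 16            # top 8 bits: tag stored with the line
    return tag, line, word

tag, line, word = split_direct(0x16339C)
print(hex(tag), hex(line), hex(word))  # 0x16 0xce7 0x0
```

On a reference, the hardware indexes the cache with the line field and compares the stored tag with the address's tag field to decide hit or miss.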
Direct Mapping
Cache Line Table
Cache line   Main memory blocks held
0            0, m, 2m, 3m, ..., 2^s - m
1            1, m+1, 2m+1, ..., 2^s - m + 1
Direct Mapping Cache Organization
Associative Mapping
A main memory block can load into any line of the cache
The memory address is interpreted as tag and word
| Tag (22 bits) | Word (2 bits) |
Associative Mapping
Address Structure
22-bit tag stored with each 32-bit block of data
Compare the tag field with each tag entry in the cache to check for a hit
Least significant 2 bits of the address identify which byte is required from the 4-byte data block
e.g.
Address   Tag      Data       Cache line
FFFFFC    FFFFFC   24682468   (any)
Fully Associative Mapping
The most complex
But the most flexible with regard to where data can reside
A newly read block of main memory can be placed anywhere in the fully associative cache
If the cache is full, a replacement algorithm is used to determine which block in the cache gets replaced by the new data
It needs to keep track of which memory locations are stored in the cache
Fully Associative Cache Organization
Set Associative Mapping
Combines the best of the direct and associative cache mapping techniques
As with a direct mapped cache, blocks of main memory data will still map into a specific set, but they can now be in any of the N cache block frames within each set
Set Associative Mapping
Cache is divided into a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g. block B can be in any line of set i
e.g. 2 lines per set
2-way associative mapping
A given block can be in one of 2 lines in only one set
Set Associative Mapping
Address Structure
| Tag (9 bits) | Set (13 bits) | Word (2 bits) |
Use the set field to determine which cache set to look in
Compare the tag field to see if we have a hit
e.g.
Address     Tag   Data       Set number
1FF 7FFC    1FF   12345678   1FFF
001 7FFC    001   11223344   1FFF
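The 9/13/2 split above can be checked against the slide's own example, where address "1FF 7FFC" (tag 1FF followed by the remaining 15 bits 7FFC) has tag 0x1FF and set number 0x1FFF:

```python
# Split a 24-bit address for the set-associative example:
# | 9-bit tag | 13-bit set | 2-bit word |
def split_set_assoc(addr):
    word = addr & 0x3
    set_no = (addr >> 2) & 0x1FFF  # 13 set-select bits
    tag = addr >> 15               # 9 tag bits
    return tag, set_no, word

addr = (0x1FF << 15) | 0x7FFC      # the slide's "1FF 7FFC" address
print([hex(x) for x in split_set_assoc(addr)])  # ['0x1ff', '0x1fff', '0x0']
```

Address "001 7FFC" yields the same set number 0x1FFF with a different tag, which is exactly why both rows of the slide's table can coexist in one set.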
Address Bit Partitioning (Direct Mapping)
| TAG | INDEX | OFFSET |
Bits 17-16: compare bits; bits 15-1: set select bits; bit 0: byte select bit
The compare bits are compared with the corresponding tag bits in the cache directory.
The set select bits are used to select a particular set in the cache.
The byte select bits are used to select a particular byte in the accessed block.
Memory size = 256 KB = 2^18 bytes
Block size = 2 bytes = 2^1
Number of blocks in cache = cache size / block size = 64 KB / 2 B = 2^16 / 2^1 = 2^15
Number of bits in tag = total bits - index bits - offset bits = 18 - 15 - 1 = 2
Address Bit Partitioning (4-Way Set Associative Mapping)
| TAG | INDEX | OFFSET |
Bits 17-14: compare bits; bits 13-1: set select bits; bit 0: byte select bit
The compare bits are compared with the corresponding tag bits in the cache directory.
The set select bits are used to select a particular set in the cache.
The byte select bits are used to select a particular byte in the accessed block.
Memory size = 256 KB = 2^18 bytes
Block size = 2 bytes = 2^1
Number of sets in cache = cache size / (set size * block size) = 64 KB / (4 blocks * 2 B) = 2^16 / (2^2 * 2^1) = 2^13
Number of bits in tag = total bits - index bits - offset bits = 18 - 13 - 1 = 4
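The field widths in both partitionings above follow mechanically from the sizes. A sketch of the arithmetic for this 18-bit-address configuration:

```python
# Field widths for an 18-bit address, 64 KB cache, 2-byte blocks
# (the two partitioning examples above).
from math import log2

ADDR_BITS = 18
offset_bits = int(log2(2))                     # 2-byte blocks -> 1 offset bit

# Direct mapped: index selects one of 64 KB / 2 B = 2^15 blocks.
index_direct = int(log2((64 * 1024) // 2))     # 15 index bits
print(ADDR_BITS - index_direct - offset_bits)  # tag = 2 bits

# 4-way set associative: 64 KB / (4 * 2 B) = 2^13 sets.
index_4way = int(log2((64 * 1024) // (4 * 2))) # 13 index bits
print(ADDR_BITS - index_4way - offset_bits)    # tag = 4 bits
```

Raising associativity shrinks the index field and widens the tag: fewer sets to select, more bits needed to tell the set's occupants apart.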
Replacement Algorithms
Direct mapping
No choice
Replacement Algorithms
Associative & Set Associative
Hardware-implemented algorithm (for speed)
Least recently used (LRU)
e.g. in 2-way set associative: which of the 2 blocks is LRU?
First in first out (FIFO)
Replace the block that has been in the cache longest
Least frequently used (LFU)
Replace the block which has had the fewest hits
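LRU bookkeeping for one set can be sketched with Python's `OrderedDict`, which keeps insertion/recency order; real caches do this with a few hardware bits per set, so this is only a behavioral model:

```python
# LRU replacement for a single cache set: on a hit, mark the line most
# recently used; on a miss with a full set, evict the least recently
# used line (the first key in recency order).
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways, self.lines = ways, OrderedDict()

    def access(self, tag):
        if tag in self.lines:              # hit: move to most-recent end
            self.lines.move_to_end(tag)
            return "hit"
        if len(self.lines) == self.ways:   # full: evict least recently used
            self.lines.popitem(last=False)
        self.lines[tag] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# A miss, B miss, A hit, C miss (evicts B), B miss (evicts A)
```

For a 2-way set, the "which of the 2 blocks is LRU?" question on the slide needs only a single bit per set in hardware.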
Write Policy
Must not overwrite a cache block unless main memory is up to date
Write through
All writes go to main memory as well as the cache
Multiple CPUs can monitor main memory traffic to keep the local (to the CPU) cache up to date
Lots of traffic
Slows down writes
Write back
Updates are initially made in the cache only
The update bit for the cache slot is set when an update occurs
If a block is to be replaced, write it to main memory only if the update bit is set
Other caches get out of sync
I/O must access main memory through the cache
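The update (dirty) bit mechanism above can be sketched with a single cache line; a one-line "cache" is a deliberate simplification to isolate the write-back behavior:

```python
# Write-back sketch: writes set a dirty (update) bit; main memory is
# touched only when a dirty block gets evicted.
class WriteBackLine:
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False
        self.writebacks = 0

    def write(self, tag, data, memory):
        if self.tag is not None and self.tag != tag and self.dirty:
            memory[self.tag] = self.data   # flush the victim only if dirty
            self.writebacks += 1
        self.tag, self.data, self.dirty = tag, data, True

memory = {}
line = WriteBackLine()
line.write(1, "x", memory)   # fills the line, no memory traffic
line.write(1, "y", memory)   # re-write of the same block, still no traffic
line.write(2, "z", memory)   # evicts dirty block 1 -> one write-back
print(line.writebacks, memory)  # 1 {1: 'y'}
```

Note that memory receives only the final value "y", not the intermediate "x": repeated writes to the same block cost one memory write instead of one per store, which is the traffic saving over write-through.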
Virtual Memory
Divides physical memory into blocks and allocates them to different processes
VM's purpose is to enlarge the set of memory addresses a program can utilize
Virtual Memory
With virtual memory, the computer can look through RAM for areas that have not been used recently and copy them onto the hard disk. This frees up space in RAM to load a new application.
The area of the hard disk that stores the RAM image is called a page file, which holds pages.
Virtual Memory
A page is a fixed-size block of memory addresses
The area of the hard disk that stores the RAM image is called a page file, which holds pages
The OS performs address translation using a page table
• Each process has its own page table
• The OS knows the address of each process's page table
• The page table is an array of page table entries (PTEs): one entry for each VPN of each process, indexed by VPN
• Each PTE contains:
  - physical page number
  - permissions
  - dirty bit
  - LRU information
  - ~4 bytes total
Address Translation
A virtual address is split into a virtual page number and a page offset; the page table maps the virtual page number to a page in main memory.
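The translation step above can be sketched in a few lines. The 4 KB page size, the VPN-to-PPN pairs, and the dictionary-based page table are assumptions for illustration, not any particular OS's layout:

```python
# Address translation sketch: split the virtual address into virtual
# page number (VPN) and page offset, look the VPN up in a per-process
# page table, and attach the physical page number to the same offset.
OFFSET_BITS = 12  # assumed 4 KB pages

page_table = {0x00400: 0x12345, 0x00401: 0x0BEEF}  # VPN -> PPN (made up)

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise KeyError("page fault")   # the OS would load the page here
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0x00400ABC)))  # 0x12345abc
```

The offset passes through unchanged; only the page-number bits are rewritten, which is why pages must be power-of-two sized.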
References
William Stallings. Computer Organization and Architecture, 2000.
Illinois State University, 2006 (http://www.acs.ilstu.edu/faculty/cjong/Spring2006/ITK225/ClassNotes)
Vasanth Venkatachalam and Michael Franz. Power Reduction Techniques for Microprocessor Systems. ACM Computing Surveys, 2005, pp. 195–233.
Prof. David Brooks. Computer Architecture. Lecture 17: