Computer Architecture
Chapter 5
Chapter Overview
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
Introduction
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss
Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples
of Virtual Memory
The Big Picture: Where are
We Now?
The Five Classic Components of a Computer
Control
Datapath
Memory
Processor
Input
Output
Topics In This Chapter:
SRAM Memory Technology
DRAM Memory Technology
Memory Organization
Levels of the Memory Hierarchy
CPU Registers
100s Bytes
1s ns
Cache
K Bytes
4 ns
1-0.1
cents/bit
Main Memory
M Bytes
100ns- 300ns
$.0001-.00001 cents
/bit
Disk
G Bytes, 10 ms
(10,000,000 ns)
10 - 10 cents/bit
-5
-6
Capacity
Access
Time
Cost
Tape
infinite
sec-min
10
-8
Register
s
Cach
e
Memor
y
Dis
k
Tap
e
Instr.
Operands
Block
s
Page
s
File
s
Staging
Xfer Unit
prog.
/compiler
1-8 bytes
cache cntl
8-128
bytes
OS
512-4K
bytes
user/operato
r
Mbytes
Upper
Level
Lower
Level
faste
r
Large
r
Introduction
The Big Picture: Where are
The ABCs of Caches
In this section we will:
Learn lots of definitions about caches – you
can’t talk about something until you
understand it (this is true in computer science
at least!)
Answer some fundamental questions about
caches:
Q1: Where can a block be placed in the
upper level?
(Block placement)
Q2: How is a block found if it is in the
upper level?
(Block identification)
Q3: Which block should be replaced on a
miss?
(Block replacement)
Q4: What happens on a write?
(Write strategy)
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss
Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples
of Virtual Memory
Cache Memory
The purpose of cache memory is to speed up
accesses by storing recently used data closer to
the CPU, instead of storing it in main memory.
Although cache is much smaller than main
memory, its access time is a fraction of that of
main memory.
Unlike main memory, which is accessed by
address, cache is typically accessed by content;
hence, it is often called content addressable
memory .
Because of this, a single large cache memory
isn’t always desirable-- it takes longer to search.
Cache
Small amount of fast memory
Sits between normal main memory
and CPU
May be located on CPU chip or
module
Cache/Main Memory
Structure
Cache operation – overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main
memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of
main memory is in each cache slot
Comparison of Cache Sizes
a Two values seperated by a slash refer to instruction and data caches b Both caches are instruction only; no data caches
Processor Type Year of Introduction
L1 cachea L2 cache L3 cache
IBM 360/85 Mainframe 1968 16 to 32 KB — — PDP-11/70 Minicomputer 1975 1 KB — — VAX 11/780 Minicomputer 1978 16 KB — — IBM 3033 Mainframe 1978 64 KB — — IBM 3090 Mainframe 1985 128 to 256 KB — — Intel 80486 PC 1989 8 KB — — Pentium PC 1993 8 KB/8 KB 256 to 512 KB — PowerPC 601 PC 1993 32 KB — — PowerPC 620 PC 1996 32 KB/32 KB — — PowerPC G4 PC/server 1999 32 KB/32 KB 256 KB to 1 MB 2 MB IBM S/390 G4 Mainframe 1997 32 KB 256 KB 2 MB IBM S/390 G6 Mainframe 1999 256 KB 8 MB — Pentium 4 PC/server 2000 8 KB/8 KB 256 KB —
IBM SP High-end server/ supercomputer
2000 64 KB/32 KB 8 MB —
CRAY MTAb Supercomputer 2000 8 KB 2 MB —
Itanium PC/server 2001 16 KB/16 KB 96 KB 4 MB
SGI Origin 2001 High-end server 2001 32 KB/32 KB 4 MB —
Itanium 2 PC/server 2002 32 KB 256 KB 6 MB
IBM POWER5 High-end server 2003 64 KB 1.9 MB 36 MB
The Principle of Locality
The Principle of Locality:
Program access a relatively small portion of the address space at
any instant of time.
Three Different Types of Locality:
Temporal Locality (Locality in Time): If an item is referenced, it
will tend to be referenced again soon (e.g., loops, reuse)
Spatial Locality (Locality in Space): If an item is referenced, items
whose addresses are close by tend to be referenced soon
(e.g., straightline code, array access)
Sequential Locality : Sequential order of program execution
except branch instructions.
A few terms
Inclusion Property
Coherence Property
Access frequency
Access time
Cycle time
Latency
Bandwidth
Capacity
Unit of transfer
Memory Hierarchy: Terminology
Hit
: data appears in some block in the upper level (example: Block X)
Hit Rate
: the fraction of memory access found in the upper level
Hit Time
: Time to access the upper level which consists of
Upper level access time + Time to determine hit/miss
Miss
: data needs to be retrieve from a block in the lower level (Block Y)
Miss Rate
= 1 - (Hit Rate)
Miss Penalty
: Time to replace a block in the upper level +
Time to deliver the block the processor
Consider a memory with three levels
Average memory access time (assuming hit at 3
rdlevel)
h1 * t1 + (1 – h1) [t1 + h2 * t2 + (1 – h2) * ( t2 + t3)] where t1, t2 and t3 are access times at
the three levels
Access frequency of level Mi: fi = (1- h1) (1- h2)…(1-hi)hi
Effective Access time =
(fi * ti)
Cache Measures
Hit rate
: fraction found in that level
So high that usually talk about
Miss rate
Average memory-access time
= Hit time + Miss rate x Miss penalty
(ns or clocks)
Miss penalty
: time to replace a block from lower level,
including time to replace in CPU
access time
: time to lower level
= f(latency to lower level)
transfer time
: time to transfer block
=f(Bandwidth between upper & lower levels)
Measures
CPU Execution time = (CPU Clock Cycles + Memory Stall Cycles) *
Clock Cycle Time
CPU clock cycles includes cache hit and CPU is stalled during miss
Memory Stall cycles
= Number of misses * Miss penalty
= IC * (Misses / Instruction) * Miss penalty
= IC * (Memory Accesses / Instruction) * Miss Rate * Miss penalty
Miss rate and miss penalties are different for reads and writes
Memory Stall Cycles
=
IC * (Reads / Instruction) * Read Miss Rate * Read Miss penalty
+ IC * (Writes / Instruction) * Write Miss Rate * Write Miss penalty
Miss Rate = Misses / Instruction
= (Miss rate * Memory Accesses ) / Instruction Count
= Miss rate * (Memory Accesses / Instruction)
Simplest Cache: Direct Mapped
Memor
y
4 Byte Direct Mapped Cache
Memory
Address
0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
Cache Index
0
1
2
3
Location 0 can be occupied by data from:
Memory location 0, 4, 8, ... etc.
In general: any memory location
whose 2 LSBs of the address are 0s
Address<1:0> => cache index
Which one should we place in the cache?
How can we tell which one is in the
cache?
Block 12 is placed in an 8 block cache:
Fully associative, direct mapped, 2-way set associative
S.A. Mapping = Block Number Modulo Number Sets
0 1 2 3 4 5 6
7
Bloc
k
no.
Direct mapped:
block 12 can go
only into block 4
(12 mod 8)
0 1 2 3 4 5 6
7
Bloc
k
no.
Set associative:
block 12 can go
anywhere in set
0 (12 mod 4)
Set
0
Set
1
Set
2
Set
3
0 1 2 3 4 5 6
7
Bloc
k
no.
Fully associative:
block 12 can go
anywhere
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
1
Block-frame
address
1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
3
Bloc
k
no.
Where can a block be placed in the Cache?
Each entry in the cache stores words
Tag on each block
No need to check index or block offset
The ABCs of Caches
How is a block found if it
is in the cache?
Address
Byte Offset
Each entry in the cache stores words
Tag on each block
No need to check index or block offset
The ABCs of Caches
How is a block found if it
is in the cache?
Address
Byte Offset
Cache Memory
Take advantage of spatial locality
Store multiple words
The diagram below is a schematic of what
cache looks like.
Block 0 contains multiple words from main memory, identified
with the tag 00000000. Block 1 contains words identified with
the tag 11110101.
Cache Memory
As an example, suppose a program generates
the address 1AA. In 14-bit binary, this number is:
00000110101010
.
The first 7 bits of this address go in the tag field,
the next 4 bits go in the block field, and the final
3 bits indicate the word within the block.
Cache Organizations I :
DirectMapped Cache
Cache Index
0
1
2
3
:
Cache Data
Byte 0
0
4
31
:
Cache Tag
Example: 0x50
Ex: 0x01
0x50
Stored as part
of the cache “state”
Valid Bit
:
31
Byte 1
Byte 31
:
Byte 32
Byte 33
Byte 63
:
Byte 992
Byte 1023
:
Cache Tag
Byte Select
Ex: 0x00
9
Block
address
Cache Memory
Direct Mapping
Address Structure
Tag s-r
Line or Slot r
Word w
8
14
2
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks in the same line have the same Tag
field
Check contents of cache by finding line and
checking Tag
Direct Mapping
Cache Line Table
Cache line Main Memory blocks held
0 0, m, 2m, 3m…2s-m
1 1,m+1, 2m+1…2s-m+1
m-1 m-1, 2m-1,3m-1…2s-1
Direct Mapping pros & cons
Simple
Inexpensive
Fixed location for given block
If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very high
Associative Mapping
A main memory block can load into
any line of cache
Memory address is interpreted as
tag and word
Tag uniquely identifies block of
memory
Every line’s tag is examined for a
match
Tag 22 bit
Word
2 bit
Associative Mapping Address Structure
22 bit tag stored with each 32 bit block of data
Compare tag field with tag entry in cache to
check for hit
Least significant 2 bits of address identify which
16 bit word is required from 32 bit data block
e.g.
Address Tag Data Cache line
Cache Organizations II :
SetAssociative Cache
Cache Memory
Cache Index
Set 1
:
Cache Data
Byte 0
0
4
31
:
Cache Tag
Example: 0x50
Ex: 0x01 mod 16
0x50
Stored as part
of the cache “state”
Valid Bit
:
Byte 1
Byte 31
:
Byte 32
Byte 33
Byte 63
:
Byte 992
Byte 1023
:
Cache Tag
Byte Select
Ex: 0x00
9
Block
address
Set 0
Set 15
8
Set Associative Mapping
Cache is divided into a number of
sets
Each set contains a number of lines
A given block maps to any line in a
given set
e.g. Block B can be in any line of set i
e.g. 2 lines per set
2 way associative mapping
A given block can be in one of 2 lines in only one
set
Set Associative Mapping
Example
13 bit set number
Block number in main memory is
modulo 2
13
000000, 00A000, 00B000, 00C000 …
map to same set
Two Way Set Associative
Cache Organization
Set Associative Mapping
Address Structure
Use set field to determine cache set to look in
Compare tag field to see if we have a hit
e.g
Address Tag Data Set number
1FF 7FFC 1FF 12345678 1FFF
001 7FFC 001 11223344 1FFF
Tag 9 bit
Set 13 bit
Word
Two Way
Set
Associative
Mapping
Replacement Algorithms (1)
Direct mapping
No choice
Each block only maps to one line
Replace that line
Replacement Algorithms (2)
Associative & Set Associative
Hardware implemented algorithm (speed)
Least Recently used (LRU)
e.g. in 2 way set associative
Which of the 2 block is lru?
First in first out (FIFO)
replace block that has been in cache longest
Least frequently used
replace block which has had fewest hits
Random
Write Policy
Must not overwrite a cache block
unless main memory is up to date
Multiple CPUs may have individual
caches
I/O may address main memory
directly
Write through
All writes go to main memory as well as cache
Multiple CPUs can monitor main memory traffic
to keep local (to CPU) cache up to date
Lots of traffic
Slows down writes
Write back
Updates initially made in cache only
Update bit for cache slot is set when update
occurs
If block is to be replaced, write to main memory
only if update bit is set
Other caches get out of sync
I/O must access main memory through cache
N.B. 15% of memory references are writes
Let’s Do An Example:
The Memory Addresses We’ll Be
Using
Here’s a number of addresses. We’ll be asking for the data at these
addresses and see what happens to the cache when we do so.
1090
0000000000000000000001
0
001
0
0001
0
0
4
5
8
9
3
1
Result
Off-set
Set
Tag
Address
Miss
1440
0000000000000000000001
0
110
1
0000
0
0
4
5
8
9
3
1
Miss
5000
xxxxxxxxxxxxxxxxxxxxxxx
xxx
x
0100
0
0
4
5
8
9
3
1
1470
xxxxxxxxxxxxxxxxxxxxxx
x
xxx
x
0
4
5
8
9
3
1
xxxxx
Cache:
1. Is Direct Mapped
2. Contains 512 bytes.
3. Has 16 sets.
4. Each set can hold 32 bytes or
1 cache line.
Here’s the Cache We’ll Be
Touching
Initially the
cache is
empty.
N
15 (1111)
N
14 (1110)
N
13 (1101)
N
12 (1100)
N
11 (1011)
N
10 (1010)
N
9 (1001)
N
8 (1000)
N
7 (0111)
N
6 (0110)
N
5 (0101)
N
4 (0100)
N
3 (0011)
N
2 (0010)
N
1 (0001)
N
0 (0000)
Data
(Can hold a 32-byte cache line.)
Tag
V
Set
Address
Cache:
1. Is Direct Mapped
2. Contains 512
bytes.
3. Has 16 sets.
4. Each set can hold
32 bytes or 1
Doing Some Cache Action
We want to
READ
data from address
1090
= 010|0010|00010
N 15 (1111) N 14 (1110) N 13 (1101) N 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001) N 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011) N 2 (0010) N 1 (0001) N 0 (0000) Data(Always holds a 32-byte cache line.) Tag V Set Address 1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset
Add. Y 00000….10 Data from memory loc. 1088 - 1119
We want to
READ
data from address
1440
= 010|1101|00000
N 15 (1111) N 14 (1110) N 13 (1101) N 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001) N 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)Data from memory loc. 1088 - 1119 00000….10 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….10 Data from memory loc. 1440 - 1471
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
We want to
READ
data from address
5000
= 1001|1100|01000
N 15 (1111) N 14 (1110)Data from memory loc. 1440 - 1471 00000…0010 Y 13 (1101) N 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001) N 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)
Data from memory loc. 1088 - 1119 00000…….10 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….1001 Data from memory loc. 4992 - 5023
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
We want to
READ
data from address
1470
= 0010|1101|11110
N 15 (1111) N 14 (1110)Data from memory loc. 1440 - 1471 00000…00010
Y 13 (1101)
Data from memory loc. 4992 - 5023 00000….1001 Y 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001) N 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)
Data from memory loc. 1088 - 1119 00000…….10 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….0010 Data from memory loc. 1440 - 1471
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
We want to
READ
data from address
1600
= 0011|0010|00000
N 15 (1111) N 14 (1110)Data from memory loc. 1440 - 1471 00000…00010
Y 13 (1101)
Data from memory loc. 4992 - 5023 00000….1001 Y 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001) N 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)
Data from memory loc. 1060 - 1091 00000…….10 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….0011 Data from memory loc. 1600 - 1631
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
We want to
WRITE
data to address
256
= 0000|1000|00000
N 15 (1111) N 14 (1110)Data from memory loc. 1440 - 1471 00000…00010
Y 13 (1101)
Data from memory loc. 4992 - 5023 00000….1001 Y 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001) N 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)
Data from memory loc. 1600 - 1631 00000….0011 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….0000 Data from memory loc. 256 - 287
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
We want to
WRITE
data to address
1620
= 0011|0010|10100
N 15 (1111) N 14 (1110)Data from memory loc. 1440 - 1471 00000…00010
Y 13 (1101)
Data from memory loc. 4992 - 5023 00000….1001 Y 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001)
Data from memory loc. 256 - 287 00000….0000 Y 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)
Data from memory loc. 1060 - 1091 00000…….10 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….0011 Data from memory loc. 1600 - 1631
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
We want to
WRITE
data to address
1099
= 0010|0010|01011
N 15 (1111) N 14 (1110)Data from memory loc. 1440 - 1471 00000…00010
Y 13 (1101)
Data from memory loc. 4992 - 5023 00000….1001 Y 12 (1100) N 11 (1011) N 10 (1010) N 9 (1001)
Data from memory loc. 256 - 287 00000….0000 Y 8 (1000) N 7 (0111) N 6 (0110) N 5 (0101) N 4 (0100) N 3 (0011)
Data from memory loc. 1600 - 1631 00000…00011 Y 2 (0010) N 1 (0001) N 0 (0000) Data
(Always holds a 32-byte cache line.) Tag
V Set
Address
Y 00000….0010 Data from memory loc. 1088 - 1119
1001 1000 0100 0011 0011 0010 0010 0010 0010 0010 0001 0000 Tag 1100 0000 0000 0010 0010 1101 1101 0010 0010 0000 0000 1000 Set 10100 1620 11110 1470 01000 5000 00000 4096 00000 2048 00000 1600 00000 1440 01011 1099 00010 1090 00000 1024 00000 512 00000 256 Offset Add.
Doing Some Cache Action
Write through
—The information is written to both the block in
the cache and to the block in the lower-level memory.
Write back
—The information is written only to the block in the
cache. The modified cache block is written to main memory
only
when it is replaced.
is block clean or dirty?
WT always combined with write buffers so that don’t wait for
lower level memory
What happens on a write?
A
Write Buffer
is needed between the Cache
and Memory
Processor: writes data into the cache and the write buffer;
Memory controller: write contents of the buffer to memory.
Write buffer is just a FIFO:
Typical number of entries: 4;
Must handle bursts of writes
;
Processor
Cache
Write Buffer
DRA
M
Write Buffer for Write Through
Assume: a 16-bit (sub-block) write to memory location 0x0 and causes a miss.
Do
we allocate space in cache and possibly read in the block?
Yes
: Write Allocate (Write back caches)
No
: Not Write Allocate (Write through)
Example:
WriteMem[100]
WriteMem[100]
ReadMem[200]
WriteMem[200]
WriteMem[100]
NWA: four misses and one hit
WA: two misses and three hits
Write-miss Policy:
Write Allocate vs . Not Allocate
Pentium 4 Cache
80386 – no on chip cache
80486 – 8k using 16 byte lines and four way set associative
organization
Pentium (all versions) – two on chip L1 caches
Data & instructions
Pentium III – L3 cache added off chip
Pentium 4
L1 caches
8k bytes 64 byte lines
four way set associative
L2 cache
Feeding both L1 caches 256k
128 byte lines
8 way set associative
Intel Cache Evolution
Problem Solution Processor on which feature
first appears
External memory slower than the system bus. Add external cache using faster memory technology.
386
Increased processor speed results in external bus becoming a bottleneck for cache access.
Move external cache on-chip, operating at the same speed as the processor.
486
Internal cache is rather small, due to limited space on chip Add external L2 cache using faster technology than main memory
486
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.
Create separate data and instruction caches.
Pentium
Increased processor speed results in external bus becoming a bottleneck for L2 cache access.
Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.
Pentium Pro
Move L2 cache on to the processor chip.
Pentium II
Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
Add external L3 cache. Pentium III