Memory System Design
Memory System
There are two basic parameters that determine memory system performance:
1. Access Time: Time for a processor request to be transmitted to the memory system, access a datum and return it to the processor (depends on physical parameters such as bus delay, chip delay, etc.).
2. Memory Bandwidth: Ability of the memory to respond to requests per unit time.
Memory System Organization
A memory system consists of a number of memory banks, each consisting of a number of memory modules, each capable of performing one memory access at a time.
Multiple memory modules in a memory bank share the same input and output buses.
In one bus cycle, only one module within a memory bank can begin or complete a memory operation.
Memory System Organization
In systems with multiple processors or with complex single processors,
multiple requests may occur at the same time causing bus or network congestion.
Even in single processor system requests arising from different buffered
Memory System Organization
The maximum theoretical bandwidth of the memory system is given by the
number of memory modules divided by memory cycle time.
The Offered Request Rate is the rate at which the processor would submit memory requests if the memory had unlimited bandwidth.
The offered request rate and the maximum memory bandwidth determine the achieved bandwidth.
Achieved vs. Offered Bandwidth
Offered Request Rate:
- Rate at which the processor(s) would make requests if memory had unlimited bandwidth.
Memory System Organization
The offered request rate does not depend on the organization of the memory system. It depends on the processor architecture, instruction set, etc.
The analysis and modeling of a memory system depend on the number of processors that request service from a common shared memory system.
For this we use a model where n simple processors access m independent modules.
Memory System Organization
Contention develops when multiple processors access the same module. A single pipelined processor making n requests to memory system during a
The Physical Memory Module
A memory module has two important parameters:
Module Access Time: Amount of time to retrieve a word into the output memory buffer of the module, given a valid address in its address register.
Module Cycle Time: Minimum time between requests directed at the same module.
Semiconductor Memories
Semiconductor memories fall into two categories.
Static RAM or SRAM
Dynamic RAM or DRAM
The data retention method of SRAM is static, whereas that of DRAM is dynamic. Data in SRAM remains in a stable state as long as power is on.
SRAM Vs DRAM
An SRAM cell uses six transistors and resembles a flip-flop in construction. Data remains in a stable state as long as power is on.
SRAM is much less dense than DRAM but has much faster access and cycle times.
In a DRAM cell, data is stored as charge on a capacitor, which decays with time.
SRAM Vs DRAM
DRAM cells, constructed using a capacitor controlled by a single transistor, offer very high storage density.
DRAM uses a destructive readout process, so the data read out must be amplified and subsequently written back to the cell.
This operation can be combined with the periodic refreshing required by DRAMs. The main advantage of the DRAM cell is its small size, offering very high storage density.
Memory Module
Memory modules are composed of DRAM chips.
A DRAM chip is usually organized as $2^n \times 1$ bit, where n is an even number.
Internally, the chip is a two-dimensional array of memory cells consisting of rows and columns.
Half of the memory address is used to specify a row address (one of $2^{n/2}$ row lines), and the other half specifies a column address (one of $2^{n/2}$ column lines).
Memory Module
To save on pinout for better overall density, the row and column addresses are multiplexed on the same lines.
Two additional lines, RAS (Row Address Strobe) and CAS (Column Address Strobe), gate first the row address and then the column address into the chip.
The row and column addresses are then decoded to select one out of $2^{n/2}$ possible lines.
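A minimal Python sketch of the address multiplexing described above: an n-bit address for a $2^n \times 1$ chip is split into two n/2-bit halves that share the same pins, one latched by RAS and one by CAS. The function name `split_dram_address` is hypothetical, and treating the high-order half as the row address is an assumption; the text only says that half of the address selects the row.

```python
def split_dram_address(addr: int, n: int) -> tuple[int, int]:
    """Split an n-bit address for a 2^n x 1 DRAM chip into (row, column).

    Assumption (hypothetical convention): the high-order n/2 bits form the
    row address and the low-order n/2 bits form the column address.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for a 2^(n/2) x 2^(n/2) cell array")
    if not 0 <= addr < (1 << n):
        raise ValueError("address out of range for a 2^n x 1 chip")
    half = n // 2
    row = addr >> half                 # driven first, gated in by RAS
    col = addr & ((1 << half) - 1)     # driven second on the same pins, gated in by CAS
    return row, col


# A 2^16 x 1 chip (n = 16) needs only 8 multiplexed address pins instead of 16.
row, col = split_dram_address(0xA5C3, 16)
print(f"row={row:#04x} col={col:#04x}")
```

The pin saving is the point of the design: each half travels over the same n/2 lines, at the cost of two strobe phases per access.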
Memory Module
The column line signals are then amplified by a sense amplifier and transmitted to the output pin Dout during a Read Cycle.
During a Write Cycle, the write enable signal stores the contents of Din at the selected cell (the intersection of the active row and column lines).
Memory Timing
At the beginning of a Read Cycle, the RAS line is activated first and the row address is put on the address lines.
With RAS active and CAS inactive, the row address is stored in the row address register.
Memory Timing
CAS gates the column address into the column address register. The column address decoder then selects a column line.
The desired data bit lies at the intersection of the active row and column address lines.
During a Read Cycle, the Write Enable is inactive (low) and the output line
Memory Timing
The time from the beginning of RAS until the data output line is activated is called the chip access time ($t_{\text{chip access}}$).
The chip cycle time ($t_{\text{chip cycle}}$) is the time required by the row and column address lines to recover before the next address can be entered and a read or write process initiated.
This is determined by the amount of time that the RAS line is active and
Memory Module
In addition to memory chips, a memory module consists of a Dynamic Memory Controller and a Memory Timing Controller to provide the following functions:
Multiplexing of the n address bits into row and column addresses.
Creation of correct RAS and CAS signals at the appropriate time.
Memory Module
[Figure: memory module block diagram showing the Dynamic Memory Controller, Memory Timing Controller, $2^n \times 1$ memory chips, and bus drivers; labels include n address bits, n/2 address bits, p bits, and Dout.]
Memory Module
As a memory read operation is completed, the data-out signals are directed to bus drivers, which interface with the memory bus common to all the memory modules.
The access and cycle times of a module differ from the chip access and cycle times.
Module access time includes the delays due to dynamic memory
Memory Module
So in a memory system we have three access and cycle times:
Chip access and Chip cycle time
Module access and Module Cycle time
Memory (System) access and cycle time.
Memory Systems Design
The basic design steps are as follows:
1. Determine the number of memory modules and the partitioning of the memory system.
2. Determine the offered bandwidth: peak instruction processing rate multiplied by expected memory references per instruction multiplied by number of processors.
3. Decide the interconnection network: physical delay through the network plus
Memory Systems Design
4. Assess referencing behavior: a program's sequence of requests to memory can be
- Purely sequential: each request follows the previous one in sequence.
- Random: requests are uniformly distributed across modules.
- Regular: each access is separated by a fixed number of addresses, a stride (vector or array references).
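As a rough sizing aid for steps 1 and 2, the Python sketch below computes the offered bandwidth (peak instruction rate times references per instruction times processors) and compares it with the maximum theoretical bandwidth (number of modules divided by memory cycle time). The function names are hypothetical, and the example figures (50 MIPS, 1.5 references per instruction, 16 modules, 100 ns cycle) are illustrative assumptions borrowed from the Flores example in these notes.

```python
def offered_bandwidth(peak_mips: float, refs_per_instr: float, processors: int = 1) -> float:
    """Offered request rate in accesses/s: peak rate x refs/instruction x processors."""
    return peak_mips * 1e6 * refs_per_instr * processors


def max_bandwidth(modules: int, cycle_time_s: float) -> float:
    """Maximum theoretical bandwidth in accesses/s: modules / memory cycle time."""
    return modules / cycle_time_s


offered = offered_bandwidth(50, 1.5)     # one processor, 50 MIPS, 1.5 refs/instruction
peak = max_bandwidth(16, 100e-9)         # 16 modules, 100 ns memory cycle time
print(f"offered = {offered / 1e6:.1f} MAPS, max = {peak / 1e6:.1f} MAPS")
print(f"offered / max = {offered / peak:.3f}")
```

The ratio offered/max equals $\lambda_s T_c / m$, which is the per-module occupancy ρ used by the open-queue (Flores) model, so it is a quick first check on whether a candidate m leaves enough headroom.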
Memory Systems Design
Memory Models
Nature of Processor:
Simple Processor: Makes a single request and waits for the response from memory.
Pipelined Processor: Makes multiple requests for various buffers in each memory cycle.
Multiple Processors: Each requests once every memory cycle.
Memory Models
Achieved Bandwidth: Bandwidth available from the memory system.
B(m) or B(m, n): Number of requests that are serviced each module service time $T_s = T_c$ (m is the number of modules and n is the number of requests each cycle).
Hellerman’s Model
One of the best-known memory models. It assumes a single sequence of addresses.
Bandwidth is determined by the average length of a conflict-free sequence of addresses (i.e., no two addresses in the sequence match in their w low-order bit positions, where $w = \log_2 m$ and m is the number of modules).
Modeling assumption is that no address queue is present and no out of order
Hellerman’s Model
Under these conditions the maximum available bandwidth is found to be approximately
$B(m) \approx m^{0.56} \approx \sqrt{m}$
and $B_w = m^{0.56}/T_s$
The lack of queuing limits the applicability of this model to simple unbuffered processors.
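Hellerman's bandwidth can be estimated directly as the expected length of a conflict-free run of addresses. This Python sketch assumes each address maps to one of m modules independently and uniformly at random (an assumption for illustration) and uses $E[L] = \sum_k P(L \ge k)$, where $P(L \ge k)$ is the probability that the first k addresses hit k distinct modules. The function name is hypothetical; for small and moderate m the result lands close to Hellerman's $m^{0.56}$ approximation.

```python
def hellerman_bandwidth(m: int) -> float:
    """Expected length of a conflict-free run of module addresses.

    Assumptions: addresses map to one of m modules independently and
    uniformly at random; the processor stops at the first repeated module
    (no address queue, no out-of-order issue).
    """
    expected = 0.0
    p_distinct = 1.0  # P(first k addresses all hit different modules)
    for k in range(1, m + 1):
        p_distinct *= (m - (k - 1)) / m
        expected += p_distinct  # E[L] = sum over k of P(L >= k)
    return expected


for m in (4, 16, 64):
    print(f"m={m:3d}: exact {hellerman_bandwidth(m):.2f}, m**0.56 = {m ** 0.56:.2f}")
```

The loop stops at k = m because a run can never be longer than the number of modules.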
Strecker’s Model
Model Assumptions:
n simple-processor requests are made per memory cycle and there are m modules. There is no bus contention.
Requests are random and uniformly distributed across modules. The probability of any one request going to a particular module is 1/m.
Any busy module serves one request per cycle.
All unserviced requests are dropped each cycle.
Strecker’s Model
Model Analysis:
Bandwidth B(m,n) is the average number of memory requests serviced per memory cycle. This equals the average number of memory modules busy during each memory cycle.
Probability that a module is not referenced by one processor = $(1 - 1/m)$. Probability that a module is not referenced by any processor = $(1 - 1/m)^n$. Probability that a module is busy = $1 - (1 - 1/m)^n$.
So B(m,n) = average number of busy modules = $m[1 - (1 - 1/m)^n]$.
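Multiplying the busy probability by the number of modules gives Strecker's $B(m,n) = m[1 - (1 - 1/m)^n]$. The Python sketch below evaluates that expression and checks it against a small Monte Carlo run that follows the model's assumptions (n uniformly random requests per cycle, each referenced module serves one request, the rest are dropped). Function names, the cycle count, and the seed are illustrative choices.

```python
import random


def strecker_bandwidth(m: int, n: int) -> float:
    """Strecker's model: B(m, n) = m * (1 - (1 - 1/m)**n)."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def simulate_strecker(m: int, n: int, cycles: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check of the same assumptions: every cycle, n requests pick
    modules uniformly at random; each referenced module serves exactly one
    request and all other requests are dropped (no carry-over)."""
    rng = random.Random(seed)
    busy_total = 0
    for _ in range(cycles):
        busy = {rng.randrange(m) for _ in range(n)}  # distinct modules referenced
        busy_total += len(busy)
    return busy_total / cycles


for m, n in ((4, 4), (8, 8), (16, 8)):
    print(f"m={m}, n={n}: formula {strecker_bandwidth(m, n):.3f}, "
          f"simulated {simulate_strecker(m, n):.3f}")
```

Because the simulation drops rejected requests instead of resubmitting them, it reproduces the model's optimism rather than real contention.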
Strecker’s Model
Achieved memory bandwidth is less than the theoretical maximum because of contention. Neglecting congestion carried over from previous cycles results in calculated
Processor Memory Modeling Using Queuing Theory
Most real life processors make buffered requests to memory.
Whenever requests are buffered the effect of contention and resulting delays
are reduced.
More powerful tools like Queuing Theory are needed to accurately model
Queuing Theory
A statistical tool applicable to general environments where some
requestors desire service from a common server.
The requestors are assumed to be independent of each other, and they make requests based on a certain request probability distribution function.
The server is able to process requests one at a time, each independently of
Queuing Theory
The mean of the arrival or request rate (measured in items per unit of time) is called λ.
The mean of the service rate distribution is called μ (mean service time $T_s = 1/\mu$).
The ratio of the arrival rate (λ) to the service rate (μ) is called the utilization or occupancy of the system and is denoted by ρ: $\rho = \lambda/\mu$.
Queuing Theory
Queue models are categorized by the triple
Arrival Distribution / Service Distribution / Number of servers.
Terminology used to indicate a particular probability distribution:
M: Poisson / Exponential, c = 1
MB: Binomial, c = 1
D: Constant, c = 0
Queuing Theory
c is the coefficient of variation.
c = standard deviation of service time / mean service time = $\sigma/(1/\mu) = \sigma\mu$.
Queue Properties
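[Figure: single-server queue with service rate μ; labels show queue length Q, waiting time Tw, service time Ts, occupancy ρ, items in system N, total time T, and queue size.]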
Queue Properties
The average time spent in the system (T) consists of the average service time ($T_s$) plus the waiting time ($T_w$).
$T = T_s + T_w$
Average queue length (including requests being serviced):
$N = \lambda T$ (Little's formula).
Since N consists of the items in the queue and an item in service, $N = Q + \rho$.
Queue Properties
Since $N = \lambda T$: $Q + \rho = \lambda(T_s + T_w) = \lambda(1/\mu + T_w) = \lambda/\mu + \lambda T_w = \rho + \lambda T_w$, or $Q = \lambda T_w$.
Queue Properties
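For the M/G/1 queue model:
Mean waiting time $T_w = \frac{1}{\lambda}\left[\frac{\rho^2(1 + c^2)}{2(1 - \rho)}\right]$; mean items in queue $Q = \lambda T_w = \frac{\rho^2(1 + c^2)}{2(1 - \rho)}$.
For the M/M/1 queue model: $c^2 = 1$; $T_w = \frac{1}{\lambda}\left[\frac{\rho^2}{1 - \rho}\right]$, $Q = \frac{\rho^2}{1 - \rho}$.
For the M/D/1 queue model: $c^2 = 0$; $T_w = \frac{1}{\lambda}\left[\frac{\rho^2}{2(1 - \rho)}\right]$, $Q = \frac{\rho^2}{2(1 - \rho)}$.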
Queue Properties
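For the MB/D/1 queue model:
$T_w = \frac{1}{\lambda}\left[\frac{\rho^2 - p\rho}{2(1 - \rho)}\right]$
$Q = \frac{\rho^2 - p\rho}{2(1 - \rho)}$
For the simple binomial model, $p = 1/m$ (the probability of a processor making a request each $T_c$ is 1).
For the δ (delta) binomial model, $p = \delta/m$, where δ is the probability of a processor making a request each $T_c$.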
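The waiting-time formulas above translate directly into code. This Python sketch (function names are hypothetical) implements the M/G/1 expression, which covers M/M/1 with $c^2 = 1$ and M/D/1 with $c^2 = 0$, plus the MB/D/1 expression, and uses $Q = \lambda T_w$ for the queue length. It assumes $0 < \rho < 1$, since the formulas diverge at saturation.

```python
def _occupancy(lam: float, mu: float) -> float:
    """rho = lam / mu; the closed forms below are only valid for 0 < rho < 1."""
    rho = lam / mu
    if not 0.0 < rho < 1.0:
        raise ValueError("these formulas require 0 < rho < 1")
    return rho


def mg1_wait(lam: float, mu: float, c2: float) -> float:
    """M/G/1 mean waiting time: Tw = (1/lam) * rho^2 (1 + c^2) / (2 (1 - rho))."""
    rho = _occupancy(lam, mu)
    return (1.0 / lam) * rho**2 * (1.0 + c2) / (2.0 * (1.0 - rho))


def mbd1_wait(lam: float, mu: float, p: float) -> float:
    """MB/D/1 mean waiting time: Tw = (1/lam) * (rho^2 - p rho) / (2 (1 - rho))."""
    rho = _occupancy(lam, mu)
    return (1.0 / lam) * (rho**2 - p * rho) / (2.0 * (1.0 - rho))


def queue_length(lam: float, tw: float) -> float:
    """Items waiting in the queue (excluding the one in service): Q = lam * Tw."""
    return lam * tw


lam, mu = 0.5, 1.0  # arrivals and services per unit time, so rho = 0.5
for name, tw in [
    ("M/M/1 (c^2 = 1)", mg1_wait(lam, mu, 1.0)),
    ("M/D/1 (c^2 = 0)", mg1_wait(lam, mu, 0.0)),
    ("MB/D/1 (p = 1/8)", mbd1_wait(lam, mu, 1 / 8)),
]:
    print(f"{name}: Tw = {tw:.3f}, Q = {queue_length(lam, tw):.3f}")
```

At ρ = 0.5 the M/M/1 case gives $T_w = 1.0$, matching the familiar M/M/1 result $\rho/(\mu - \lambda)$, and the constant-service case halves it, which is why deterministic memory cycles queue less than exponential ones.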
Open, Closed and Mixed Queue Models
Open queue models are the simplest queuing form. These models assume that the arrival rate is independent of the service rate.
This results in a queue of unbounded length as well as an unbounded waiting time.
In a processor memory interaction,
Open, Closed and Mixed Queue Models
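This situation can be modeled by a queue with feedback.
[Figure: queue with feedback; labels show service rate µ, arrival rates λ, λa, and λ0, and the fed-back rate λ0 − λa.]
Such systems are called closed queues, as they have bounded size and waiting time.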
Open, Closed and Mixed Queue Models
Certain systems can behave as open queue up to a certain queue size and
then behave as closed queues.
Open Queue (Flores) Memory Model
The open queue model is not very suitable for processor-memory interaction, but it is the simplest model and can be used as an initial guess for partitioning the memory modules.
This model was originally proposed by Flores using an M/D/1 queue, but MB/D/1
Open Queue (Flores) Memory Model
The total processor request rate $\lambda_s$ is assumed to split uniformly over m modules.
So the request rate at each module is $\lambda = \lambda_s/m$. Since $\mu = 1/T_c$ ($T_c$ is the memory cycle time), $\rho = \lambda/\mu = (\lambda_s/m) \cdot T_c$.
We can now use the MB/D/1 model to determine $T_w$ and $Q_0$ (per-module buffer
Open Queue (Flores) Memory Model
Design Steps:
Find the peak processor instruction execution rate in MIPS.
MIPS × references/instruction = MAPS.
Choose m so that ρ = 0.5 and $m = 2^k$ (k an integer).
Calculate $T_w$ and $Q_0$.
Total memory access time = $T_w + T_a$.
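The design steps can be sketched as one small Python function. It assumes the simple binomial case $p = 1/m$, picks the smallest power-of-two m with ρ ≤ 0.5, and applies the MB/D/1 result $T_w = T_c(\rho - p)/(2(1 - \rho))$. The function name, the `target_rho` parameter, and the module access time $T_a$ = 200 ns in the usage line are assumptions: the slides do not state $T_a$, and 200 ns is simply what the 238 ns total in the worked example implies.

```python
def flores_design(mips: float, refs_per_instr: float, tc_ns: float,
                  ta_ns: float, target_rho: float = 0.5) -> dict:
    """Open-queue (Flores) sizing sketch using the MB/D/1 model with p = 1/m."""
    maps = mips * refs_per_instr                      # millions of memory accesses per second
    requests_per_cycle = maps * 1e6 * tc_ns * 1e-9    # lambda_s * Tc
    m = 1
    while requests_per_cycle / m > target_rho:        # smallest m = 2^k with rho <= target
        m *= 2
    rho = requests_per_cycle / m                      # rho = (lambda_s / m) * Tc
    p = 1.0 / m                                       # simple binomial model
    if rho < p:
        # The MB/D/1 expression goes negative here; the sketch does not handle it.
        raise ValueError("formula assumes rho >= p (at least one request per Tc overall)")
    tw_ns = tc_ns * (rho - p) / (2.0 * (1.0 - rho))   # Tw = Tc (rho - 1/m) / 2(1 - rho)
    q0 = (rho**2 - p * rho) / (2.0 * (1.0 - rho))     # per-module queue length
    return {"MAPS": maps, "m": m, "rho": rho, "Tw_ns": tw_ns,
            "Q0": q0, "access_ns": ta_ns + tw_ns}


design = flores_design(50, 1.5, tc_ns=100, ta_ns=200)
print(design)
```

With these inputs the sketch selects m = 16 and gives $T_w \approx 38$ ns, so the total access time comes out near 238 ns with $Q_0 \approx 0.18$.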
Open Queue (Flores) Memory Model
Example:
Design a memory system for a processor with peak performance of 50 MIPS
and one instruction decoded per cycle.
Open Queue (Flores) Memory Model
Solution:
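MAPS = 1.5 × 50 = 75 MAPS. Now $\rho = (\lambda_s/m) \cdot T_c$.
So $\rho = 75 \times 10^6 \times (1/m) \times 0.1 \times 10^{-6} = 7.5/m$.
Now choose m so that ρ is close to 0.5. If m = 16, then ρ = 0.47.
For the MB/D/1 model, $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2 - \rho p}{2(1 - \rho)} = T_c \cdot \frac{\rho - 1/m}{2(1 - \rho)} \approx 100\text{ ns} \times \frac{0.469 - 0.0625}{2 \times 0.531} \approx 38\text{ ns}$.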
Open Queue (Flores) Memory Model
Total memory access time = $T_a + T_w$ = 238 ns
$Q_0 = \frac{\rho^2 - \rho p}{2(1 - \rho)} = 0.18$
Closed Queues
The closed queue model assumes that the arrival rate is immediately affected by service contention.
Let λ be the offered arrival rate and $\lambda_a$ the achieved arrival rate. Let ρ be the occupancy for λ and $\rho_a$ the occupancy for $\lambda_a$.
Closed Queues
Suppose we have an (n, m) system in overall stability.
The average queue size (including items in service) is $N = n/m$, and the closed queue size is $Q_c = n/m - \rho_a = \rho - \rho_a$, where $\rho_a$ is the achieved occupancy. From the discussion on open queues we know that
Closed Queues
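Since in a closed queue the achieved occupancy is $\rho_a$, and for M/D/1 $Q_0 = \rho^2/2(1 - \rho)$, we have
$N = n/m = \frac{\rho_a^2}{2(1 - \rho_a)} + \rho_a$, which rearranges to $\rho_a^2 - 2(1 + n/m)\rho_a + 2n/m = 0$. Solving for $\rho_a$ (taking the root below 1),
we have $\rho_a = (1 + n/m) - \sqrt{(n/m)^2 + 1}$. Bandwidth $B(m,n) = m \cdot \rho_a$, so
$B(m,n) = m + n - \sqrt{n^2 + m^2}$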
Closed Queues
Since $N = n/m$ is the same as the open queue occupancy ρ, we can say
$\rho_a = (1 + \rho) - \sqrt{\rho^2 + 1}$
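A quick numeric check of the asymptotic closed-queue solution: the Python sketch computes $\rho_a = (1 + n/m) - \sqrt{(n/m)^2 + 1}$ and $B(m,n) = m + n - \sqrt{n^2 + m^2}$, then confirms that $\rho_a$ satisfies the M/D/1 equilibrium condition $N = \rho_a^2/(2(1 - \rho_a)) + \rho_a = n/m$. Function names and the sample values of m and n are illustrative.

```python
import math


def closed_queue_occupancy(m: int, n: int) -> float:
    """Achieved occupancy: rho_a = (1 + n/m) - sqrt((n/m)^2 + 1)."""
    r = n / m
    return (1.0 + r) - math.sqrt(r * r + 1.0)


def closed_queue_bandwidth(m: int, n: int) -> float:
    """Asymptotic closed-queue bandwidth: B(m, n) = m + n - sqrt(n^2 + m^2)."""
    return m + n - math.sqrt(n * n + m * m)


def md1_population(rho_a: float) -> float:
    """N = rho_a^2 / (2 (1 - rho_a)) + rho_a: M/D/1 queue plus the item in service."""
    return rho_a**2 / (2.0 * (1.0 - rho_a)) + rho_a


m = 8
for n in (4, 8, 16):
    rho_a = closed_queue_occupancy(m, n)
    # N should reproduce n/m, and B should equal m * rho_a.
    print(f"n={n}: rho_a={rho_a:.3f}, N={md1_population(rho_a):.3f} (n/m={n / m}), "
          f"B={closed_queue_bandwidth(m, n):.3f}")
```

For m = n the achieved occupancy is $2 - \sqrt{2} \approx 0.586$, so roughly 59% of the modules are busy per cycle under this model.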
Simple Binomial Model: While deriving the asymptotic solution, we had assumed m and n to be very large and used the M/D/1 model.
Comparison of Memory Models
Each model is valid for a particular type of processor-memory interaction. Hellerman's model represents the simplest type of processor. Since the processor cannot skip over conflicting requests and has no buffer, it achieves the lowest bandwidth.
Strecker's model anticipates out-of-order requests but no queues. Its
Comparison of Memory Models
The M/D/1 open (Flores) model has limited accuracy; still, it is useful for initial estimates or in mixed queue models.
The closed queue MB/D/1 model represents a processor-memory system in equilibrium, where the queue length including the item in service equals n/m on a per-module basis.
The simple binomial model is suitable only for processors making n requests per
Comparison of Memory Models
The δ binomial model is suitable for simple pipelined processors where n
Review and Selection of Queuing Models
There are basically three dimensions to simple (single-server) queuing models.
These three represent the statistical characterization of the arrival rate, the service rate, and the amount of buffering present before the system saturates.
For arrival rate, if the source always requests service during a service
Review and Selection of Queuing Models
If a particular requestor has a diminishingly small probability of making a request during a particular service interval, use Poisson arrival.
For the service rate, if the service time is fixed, use the constant (D) service distribution. If the service time varies but the variance is unknown, choose $c^2 = 1$ for ease of analysis.
Review and Selection of Queuing Models
If the variance is known and $c^2$ can be calculated, use the M/G/1 model. The third parameter determining the simple queuing model is the amount of
Processors with Cache
The addition of a cache to a memory system complicates the performance
evaluation and design.
For CBWA caches, the requests to memory consist of line read and line write requests.
For WTNWA caches, they are line read requests and word write requests. In order to develop models of memory systems with caches two basic
Processors with Cache
1. $T_{\text{line access}}$: the time it takes to access a line in memory.
2. $T_{\text{busy}}$: potential contention time (when memory is busy and processor/cache