UNIT-4.pptx

(1)

Memory System Design

(2)

Memory System

- Two basic parameters determine memory system performance:

1. Access Time: the time for a processor request to be transmitted to the memory system, access a datum, and return it to the processor (depends on physical parameters such as bus delay and chip delay).

2. Memory Bandwidth: the ability of the memory to respond to requests per unit time.

(3)

(4)

Memory System Organization

- The memory system consists of a number of memory banks, each containing a number of memory modules, and each module is capable of performing one memory access at a time.

- Multiple memory modules in a memory bank share the same input and output buses.

- In one bus cycle, only one module within a memory bank can begin or complete a memory operation.

(5)

Memory System Organization

- In systems with multiple processors or with complex single processors, multiple requests may occur at the same time, causing bus or network congestion.

- Even in a single-processor system, requests arising from different buffered

(6)

Memory System Organization

- The maximum theoretical bandwidth of the memory system is given by the number of memory modules divided by the memory cycle time.

- The Offered Request Rate is the rate at which the processor would submit memory requests if memory had unlimited bandwidth.

- The offered request rate and the maximum memory bandwidth determine the achieved bandwidth.
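A minimal Python sketch of these quantities follows. It computes the maximum theoretical bandwidth $m/T_c$ and uses the simple fact that achieved bandwidth can exceed neither that maximum nor the offered request rate. The 16 modules and 100 ns cycle time are assumed example values, and the helper names are hypothetical.

```python
def max_bandwidth(m: int, t_c: float) -> float:
    """Maximum theoretical bandwidth: m modules / cycle time t_c (requests per second)."""
    return m / t_c


def achieved_upper_bound(offered_rate: float, m: int, t_c: float) -> float:
    """Achieved bandwidth is bounded by both the offered rate and the maximum bandwidth."""
    return min(offered_rate, max_bandwidth(m, t_c))


# Assumed example: 16 modules with a 100 ns cycle time
print(max_bandwidth(16, 100e-9))                # about 1.6e8 requests/s
print(achieved_upper_bound(75e6, 16, 100e-9))   # 75000000.0: limited by the offered rate
```

This is only an upper bound. The memory models in this unit estimate how far contention pulls the achieved bandwidth below it.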

(7)

Achieved vs. Offered Bandwidth

Offered Request Rate:

– the rate at which the processor(s) would make requests if memory had unlimited bandwidth.

(8)

Memory System Organization

- The offered request rate does not depend on the organization of the memory system.

- It depends on the processor architecture, the instruction set, etc.

- The analysis and modeling of a memory system depend on the number of processors that request service from the common shared memory system.

- For this we use a model where n simple processors access m independent modules.

(9)

Memory System Organization

- Contention develops when multiple processors access the same module.

- A single pipelined processor making n requests to the memory system during a

(10)

The Physical Memory Module

- A memory module has two important parameters:

  - Module Access Time: the amount of time to retrieve a word into the output memory buffer of the module, given a valid address in its address register.

  - Module Cycle Time: the minimum time between requests directed at the same module.

(11)

Semiconductor Memories

- Semiconductor memories fall into two categories:

  - Static RAM or SRAM

  - Dynamic RAM or DRAM

The data retention method of SRAM is static, whereas that of DRAM is dynamic. Data in SRAM remains in a stable state as long as power is on.

(12)

SRAM Vs DRAM

- An SRAM cell uses 6 transistors and resembles a flip-flop in construction.

- Data remains in a stable state as long as power is on.

- SRAM is much less dense than DRAM but has much faster access and cycle times.

- In a DRAM cell, data is stored as charge on a capacitor, which decays with time.

(13)

SRAM Vs DRAM

- DRAM cells, each built from a capacitor controlled by a single transistor, offer very high storage density.

- DRAM uses a destructive readout process, so data read out must be amplified and subsequently written back to the cell.

- This operation can be combined with the periodic refreshing required by DRAMs.

- The main advantage of the DRAM cell is its small size, which offers very high storage density.

(14)

Memory Module

- Memory modules are composed of DRAM chips.

- A DRAM chip is usually organized as $2^n \times 1$ bit, where n is an even number.

- Internally, the chip is a two-dimensional array of memory cells arranged in rows and columns.

- Half of the memory address is used to specify a row address (one of $2^{n/2}$ row lines).

(15)

(16)

Memory Module

- To save on pinout for better overall density, the row and column addresses are multiplexed on the same lines.

- Two additional lines, RAS (Row Address Strobe) and CAS (Column Address Strobe), gate first the row address and then the column address into the chip.

- The row and column addresses are then decoded to select one out of $2^{n/2}$ possible lines.
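The following Python sketch shows this addressing scheme. It splits an n-bit chip address into row and column halves and lists the order in which they are strobed onto the shared n/2 address pins. The slides say only that half the address selects the row. Treating the high-order half as the row address is an assumption for illustration, and `split_address` and `multiplexed_sequence` are hypothetical helper names.

```python
def split_address(address: int, n: int) -> tuple[int, int]:
    """Split an n-bit chip address into (row, column), each n/2 bits wide.

    Assumption: the high-order half is the row address and the
    low-order half is the column address.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    if not 0 <= address < (1 << n):
        raise ValueError("address does not fit in n bits")
    half = n // 2
    row = address >> half                   # selects one of 2^(n/2) row lines
    column = address & ((1 << half) - 1)    # selects one of 2^(n/2) column lines
    return row, column


def multiplexed_sequence(address: int, n: int) -> list[tuple[str, int]]:
    """Values placed on the shared n/2 address pins, in strobe order."""
    row, column = split_address(address, n)
    return [("RAS", row), ("CAS", column)]  # row first, then column


print(split_address(0b1011_0011, 8))         # (11, 3)
print(multiplexed_sequence(0b1011_0011, 8))  # [('RAS', 11), ('CAS', 3)]
```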

(17)

Memory Module

- The column line signals are then amplified by a sense amplifier and transmitted to the output pin $D_{out}$ during a Read Cycle.

- During a Write Cycle, the write enable signal stores the contents on $D_{in}$ at the

(18)

(19)

Memory Timing

- At the beginning of a Read Cycle, the RAS line is activated first and the row address is put on the address lines.

- With RAS active and CAS inactive, the information is stored in the row address register.

(20)

Memory Timing

- CAS gates the column address into the column address register.

- The column address decoder then selects a column line.

- The desired data bit lies at the intersection of the active row and column address lines.

- During a Read Cycle, Write Enable is inactive (low) and the output line

(21)

Memory Timing

- The time from the beginning of RAS until the data output line is activated is called the chip access time ($t_{chip\ access}$).

- The chip cycle time ($t_{chip\ cycle}$) is the time required by the row and column address lines to recover before the next address can be entered and a read or write process initiated.

- This is determined by the amount of time that the RAS line is active and

(22)

Memory Module

- In addition to memory chips, a memory module contains a Dynamic Memory Controller and a Memory Timing Controller that provide the following functions:

  - Multiplexing of the n address bits into row and column addresses.

  - Creation of the correct RAS and CAS signals at the appropriate time.

(23)

Memory Module

[Figure: memory module block diagram showing the Dynamic Memory Controller, Memory Timing Controller, bus drivers, and $2^n \times 1$ memory chips; labels include n address bits, n/2 address bits, p bits, and $D_{out}$.]

(24)

Memory Module

- As a memory read operation completes, the data-out signals are directed to bus drivers, which interface with the memory bus common to all the memory modules.

- The access and cycle times of a module differ from the chip access and cycle times.

- Module access time includes the delays due to the dynamic memory

(25)

Memory Module

- A memory system therefore has three pairs of access and cycle times:

  - Chip access and chip cycle time

  - Module access and module cycle time

  - Memory (system) access and cycle time

(26)

Memory Systems Design

The basic design steps are as follows:

1. Determine the number of memory modules and the partitioning of the memory system.

2. Determine the offered bandwidth: the peak instruction processing rate multiplied by the expected memory references per instruction, multiplied by the number of processors.

3. Decide on the interconnection network: physical delay through the network plus

(27)

Memory Systems Design

4. Assess referencing behavior: a program's sequence of requests to memory can be

- Purely sequential: consecutive requests go to consecutive addresses.

- Random: requests are uniformly distributed across modules.

- Regular: successive accesses are separated by a fixed stride (vector or array references).
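Steps 2 and 4 can be sketched in Python. `offered_bandwidth` multiplies the three factors from step 2. `module_sequence` generates the module numbers hit by each of the three referencing behaviors. The slides do not state how addresses map to modules, so `module = address mod m` (low-order interleaving) is an assumption here. Both function names are hypothetical. The usage reuses 50 MIPS and 1.5 references per instruction from the Flores example in this unit.

```python
import random


def offered_bandwidth(peak_mips: float, refs_per_instr: float, processors: int = 1) -> float:
    """Step 2: offered requests per second = peak rate x refs/instruction x processors."""
    return peak_mips * 1e6 * refs_per_instr * processors


def module_sequence(pattern: str, count: int, m: int, stride: int = 1, seed: int = 0) -> list[int]:
    """Module numbers hit by `count` requests under one referencing behavior.

    Assumption: low-order interleaving, i.e. module = address mod m.
    """
    rng = random.Random(seed)
    if pattern == "sequential":
        addresses = range(count)                                 # 0, 1, 2, ...
    elif pattern == "regular":
        addresses = (i * stride for i in range(count))           # fixed stride
    elif pattern == "random":
        addresses = (rng.randrange(1 << 20) for _ in range(count))  # uniform addresses
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return [a % m for a in addresses]


print(offered_bandwidth(50, 1.5))                     # 75000000.0 requests/s
print(module_sequence("sequential", 6, 4))            # [0, 1, 2, 3, 0, 1]
print(module_sequence("regular", 8, 4, stride=2))     # [0, 2, 0, 2, 0, 2, 0, 2]
```

With stride 2 on 4 interleaved modules, only modules 0 and 2 are ever used. Under this mapping, regular references can therefore conflict more often than purely sequential ones.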

(28)

Memory Systems Design

(29)

Memory Models

Nature of Processor:

- Simple Processor: makes a single request and waits for the response from memory.

- Pipelined Processor: makes multiple requests for various buffers in each memory cycle.

- Multiple Processors: each makes one request every memory cycle.

(30)

Memory Models

Achieved Bandwidth: the bandwidth available from the memory system.

$B(m)$ or $B(m, n)$: the number of requests serviced per module service time $T_s = T_c$ (m is the number of modules and n is the number of requests each cycle).

(31)

Hellerman’s Model

- One of the best-known memory models.

- Assumes a single sequence of addresses.

- Bandwidth is determined by the average length of a conflict-free sequence of addresses (i.e., no match in the w low-order bit positions, where $w = \log_2 m$ and m is the number of modules).

- The modeling assumption is that no address queue is present and no out-of-order

(32)

Hellerman’s Model

- Under these conditions, the maximum available bandwidth is found to be approximately

$B(m) \approx \sqrt{m}$

and $B(w) = \sqrt{m} / T_s$.

- The lack of queuing limits the applicability of this model to simple unbuffered processors.
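The Python sketch below shows where the square-root behaviour comes from. It assumes each address maps to one of m modules uniformly at random, which is an assumption for illustration. Under that assumption it computes the exact expected length of a conflict-free run: the run stops at the first address whose module already appears in it. The result is compared with $m^{0.56}$ (the form in which Hellerman's result is often quoted) and with $\sqrt{m}$. The function name is hypothetical.

```python
import math


def hellerman_bandwidth(m: int) -> float:
    """Expected length of a conflict-free run of uniformly random module numbers.

    Sums P(run length >= k) for k = 1..m.
    """
    expected = 0.0
    p_run_at_least_k = 1.0
    for k in range(1, m + 1):
        # the k-th address must avoid the k-1 modules already used in the run
        p_run_at_least_k *= (m - (k - 1)) / m
        expected += p_run_at_least_k
    return expected


for m in (4, 8, 16, 32):
    print(m, round(hellerman_bandwidth(m), 2), round(m ** 0.56, 2), round(math.sqrt(m), 2))
# the m = 16 line reads: 16 4.7 4.72 4.0
```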

(33)

Strecker’s Model

- Model assumptions:

- n simple processor requests are made per memory cycle, and there are m modules.

- There is no bus contention.

- Requests are random and uniformly distributed across modules. The probability of any one request going to a particular module is $1/m$.

- Any busy module serves 1 request.

- All unserviced requests are dropped each cycle.

(34)

Strecker’s Model

- Model analysis:

- Bandwidth B(m,n) is the average number of memory requests serviced per memory cycle.

- This equals the average number of memory modules busy during each memory cycle.

Probability that a module is not referenced by one processor $= (1 - 1/m)$.

Probability that a module is not referenced by any processor $= (1 - 1/m)^n$.

Probability that a module is busy $= 1 - (1 - 1/m)^n$.

So $B(m,n)$ = average number of busy modules $= m\left[1 - (1 - 1/m)^n\right]$.
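Strecker's result can be checked numerically. The Python sketch below evaluates $B(m,n) = m[1 - (1 - 1/m)^n]$ and compares it with a Monte Carlo simulation built directly from the model assumptions: uniform random module choice, one request served per busy module, and unserviced requests dropped. The function names, seed, and number of simulated cycles are illustrative choices.

```python
import random


def strecker_bandwidth(m: int, n: int) -> float:
    """B(m, n) = m * [1 - (1 - 1/m)^n]: expected number of busy modules per cycle."""
    return m * (1 - (1 - 1 / m) ** n)


def simulate_strecker(m: int, n: int, cycles: int = 20_000, seed: int = 1) -> float:
    """Average busy modules per cycle under Strecker's assumptions.

    Each of the n requests picks a module uniformly at random; each referenced
    module serves one request; the rest are dropped, so cycles are independent.
    """
    rng = random.Random(seed)
    busy = 0
    for _ in range(cycles):
        busy += len({rng.randrange(m) for _ in range(n)})  # distinct modules = busy modules
    return busy / cycles


print(round(strecker_bandwidth(16, 16), 2))  # 10.3
print(round(simulate_strecker(16, 16), 2))   # close to the analytic value
```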

(35)

Strecker’s Model

- The achieved memory bandwidth is less than the theoretical maximum because of contention.

- Neglecting congestion carried over from previous cycles results in calculated

(36)

Processor Memory Modeling Using Queuing Theory

- Most real-life processors make buffered requests to memory.

- Whenever requests are buffered, the effects of contention and the resulting delays are reduced.

- More powerful tools such as queuing theory are needed to accurately model

(37)

Queuing Theory

- A statistical tool applicable to general environments where a number of requestors seek service from a common server.

- The requestors are assumed to be independent of each other, and they make requests according to a given request probability distribution function.

- The server is able to process requests one at a time, each independently of

(38)

Queuing Theory

- The mean of the arrival or request rate (measured in items per unit of time) is called λ.

- The mean of the service rate distribution is called μ (mean service time $T_s = 1/\mu$).

- The ratio of the arrival rate (λ) to the service rate (μ) is called the utilization or occupancy of the system and is denoted by ρ ($\rho = \lambda/\mu$).

(39)

Queuing Theory

- Queue models are categorized by the triple:

Arrival Distribution / Service Distribution / Number of servers.

- Terminology used to indicate a particular probability distribution:

  - M: Poisson / Exponential, c = 1

  - $M_B$: Binomial, c = 1

  - D: Constant, c = 0

(40)

Queuing Theory

- c is the coefficient of variation:

$c = \frac{\text{standard deviation of service time}}{\text{mean service time}} = \frac{\sigma}{1/\mu} = \sigma\mu$
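This Python sketch makes the c values in the distribution list concrete by estimating $c = \sigma / \text{mean}$ from samples. Exponentially distributed service times (M) give c close to 1, and constant service times (D) give c = 0. The service rate μ = 2, the seed, and the sample sizes are arbitrary assumptions. Only the standard library is used.

```python
import random
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """c = standard deviation / mean of the service times."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


rng = random.Random(42)
mu = 2.0                                   # assumed service rate, so T_s = 1/mu = 0.5
exponential = [rng.expovariate(mu) for _ in range(50_000)]
constant = [1 / mu] * 1_000

print(round(coefficient_of_variation(exponential), 2))  # close to 1 (M)
print(coefficient_of_variation(constant))               # 0.0 (D)
```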

(41)

Queue Properties

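[Figure: single-server queue labeled with service rate μ, queue length Q, waiting time $T_w$, service time $T_s$, occupancy ρ, items in system N, time in system T, and size.]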

(42)

Queue Properties

- The average time spent in the system ($T$) consists of the average service time ($T_s$) plus the waiting time ($T_w$):

$T = T_s + T_w$

Average queue length (including requests being serviced):

$N = \lambda T$ (Little's formula).

Since N consists of the items in the queue (Q) and the item in service (on average ρ), $N = Q + \rho$.

(43)

Queue Properties

Since $N = \lambda T$,

$Q + \rho = \lambda (T_s + T_w) = \lambda (1/\mu + T_w) = \lambda/\mu + \lambda T_w = \rho + \lambda T_w$, or $Q = \lambda T_w$.

(44)

Queue Properties

For M/G/1 Queue Model:

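- Mean waiting time $T_w = \frac{1}{\lambda}\left[\frac{\rho^2 (1 + c^2)}{2(1 - \rho)}\right]$

- Mean items in queue $Q = \lambda T_w = \frac{\rho^2 (1 + c^2)}{2(1 - \rho)}$

For M/M/1 Queue Model: $c^2 = 1$; $T_w = \frac{1}{\lambda}\left[\frac{\rho^2}{1 - \rho}\right]$

$Q = \frac{\rho^2}{1 - \rho}$

For M/D/1 Queue Model: $c^2 = 0$; $T_w = \frac{1}{\lambda}\left[\frac{\rho^2}{2(1 - \rho)}\right]$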
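The M/G/1 family differs only in $c^2$. This minimal Python sketch assumes λ > 0 and ρ < 1. It evaluates $Q = \rho^2(1 + c^2)/2(1 - \rho)$ and $T_w = Q/\lambda$, then cross-checks the M/D/1 waiting time with a short simulation. The simulation uses Lindley's recursion, a standard technique for single-server FIFO queues that the slides do not use. The rates λ = 0.5 and μ = 1 are arbitrary example values, and the function names are hypothetical.

```python
import random


def mg1(lam: float, mu: float, c2: float) -> tuple[float, float]:
    """Return (Q, T_w) for an M/G/1 queue with squared coefficient of variation c2."""
    rho = lam / mu
    if lam <= 0 or rho >= 1:
        raise ValueError("need lam > 0 and rho = lam/mu < 1")
    q = rho**2 * (1 + c2) / (2 * (1 - rho))
    return q, q / lam                     # Little's formula: Q = lam * T_w


def simulate_md1_wait(lam: float, mu: float, customers: int = 200_000, seed: int = 7) -> float:
    """Mean wait in an M/D/1 queue via Lindley's recursion:
    W[k+1] = max(0, W[k] + S - A[k+1]), constant S = 1/mu, exponential A."""
    rng = random.Random(seed)
    service = 1 / mu
    wait = 0.0
    total = 0.0
    for _ in range(customers):
        total += wait
        wait = max(0.0, wait + service - rng.expovariate(lam))
    return total / customers


lam, mu = 0.5, 1.0                         # assumed rates giving rho = 0.5
print(mg1(lam, mu, 1.0))                   # M/M/1: (0.5, 1.0)
print(mg1(lam, mu, 0.0))                   # M/D/1: (0.25, 0.5)
print(round(simulate_md1_wait(lam, mu), 2))  # close to the M/D/1 T_w of 0.5
```

At the same occupancy, the constant-service (M/D/1) queue waits half as long as the exponential-service (M/M/1) queue. This matches the factor $(1 + c^2)/2$.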

(45)

Queue Properties

For $M_B/D/1$ Queue Model:

$T_w = \frac{1}{\lambda}\left[\frac{\rho^2 - p\rho}{2(1 - \rho)}\right]$

$Q = \frac{\rho^2 - p\rho}{2(1 - \rho)}$

For the simple binomial model, $p = 1/m$ (the probability of a processor making a request each $T_c$ is 1).

For the δ (delta) binomial model, $p = \delta/m$, where δ is the probability of a processor making a request each $T_c$.
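A minimal Python sketch of the $M_B/D/1$ formulas, covering both the simple ($p = 1/m$) and δ ($p = \delta/m$) binomial cases, follows. It assumes one memory cycle $T_c$ is the service time, so $\mu = 1/T_c$ and $\lambda = \rho / T_c$ per module, as in the Flores model. The function name and example values (ρ = 0.5, m = 16, $T_c$ = 100 ns) are hypothetical.

```python
def mbd1(rho: float, m: int, t_c: float, delta: float = 1.0) -> tuple[float, float]:
    """Return (Q, T_w) for the M_B/D/1 model with p = delta / m.

    Assumes mu = 1 / t_c, so the per-module arrival rate is lambda = rho / t_c.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie in (0, 1)")
    p = delta / m
    q = (rho**2 - p * rho) / (2 * (1 - rho))
    lam = rho / t_c
    return q, q / lam                    # T_w = Q / lambda


q, t_w = mbd1(0.5, 16, 100e-9)           # simple binomial, p = 1/16
print(q, round(t_w * 1e9, 2))            # 0.21875 43.75 (Q, T_w in ns)
q_half, _ = mbd1(0.5, 16, 100e-9, delta=0.5)
print(q_half)                            # 0.234375
```

The binomial term $p\rho$ makes $Q$ smaller than the M/D/1 value $\rho^2 / 2(1 - \rho) = 0.25$ at the same ρ. As $p \to 0$ (small δ or many modules), the formula reduces to M/D/1.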

(46)

Open, Closed and Mixed Queue Models

- Open queue models are the simplest queuing form. These models assume an arrival rate independent of the service rate.

- This results in a queue of unbounded length as well as an unbounded waiting time.

In a processor memory interaction,

(47)

Open, Closed and Mixed Queue Models

- This situation can be modeled by a queue with feedback.

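[Figure: queue with feedback, showing offered arrival rate $\lambda_0$, achieved arrival rate $\lambda_a$, service rate µ, and $\lambda_0 - \lambda_a$ on the feedback path.]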

Such systems are called closed queues, as they have bounded size and waiting time.

(48)

Open, Closed and Mixed Queue Models

- Certain systems can behave as open queues up to a certain queue size and then behave as closed queues.

(49)

Open Queue (Flores) Memory Model

- The open queue model is not very suitable for processor-memory interaction, but it is the simplest model and can be used as an initial guess for partitioning the memory modules.

- This model was originally proposed by Flores using an M/D/1 queue, but $M_B/D/1$

(50)

Open Queue (Flores) Memory Model

- The total processor request rate $\lambda_s$ is assumed to split uniformly over m modules.

- So the request rate at a module is $\lambda = \lambda_s / m$.

- Since $\mu = 1/T_c$ ($T_c$ is the memory cycle time), $\rho = \lambda / \mu = (\lambda_s / m) \cdot T_c$.

- We can now use the $M_B/D/1$ model to determine $T_w$ and $Q_0$ (per-module buffer

(51)

Open Queue (Flores) Memory Model

- Design steps:

  - Find the peak processor instruction execution rate in MIPS.

  - MIPS × references/instruction = MAPS.

  - Choose m so that ρ ≈ 0.5 and $m = 2^k$ (k an integer).

  - Calculate $T_w$ and $Q_0$.

  - Total memory access time $= T_w + T_a$.

(52)

Open Queue (Flores) Memory Model

- Example:

- Design a memory system for a processor with a peak performance of 50 MIPS and one instruction decoded per cycle.

(53)

Open Queue (Flores) Memory Model

- Solution:

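- MAPS = 1.5 × 50 = 75 MAPS

- Now $\rho = (\lambda_s / m) \cdot T_c$

- So $\rho = 75 \times 10^6 \times \frac{1}{m} \times 0.1 \times 10^{-6} = 7.5/m$

- Now choose m so that ρ ≈ 0.5. If m = 16, then ρ = 7.5/16 ≈ 0.47.

- For the $M_B/D/1$ model, $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2 - \rho p}{2(1 - \rho)} = T_c \cdot \frac{\rho - 1/m}{2(1 - \rho)}$ (using $\lambda = \rho / T_c$ and $p = 1/m$), which gives $T_w = 100\ \text{ns} \times \frac{0.47 - 0.0625}{2(1 - 0.47)} \approx 38$ ns.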

(54)

Open Queue (Flores) Memory Model

- Total memory access time $= T_a + T_w = 238$ ns

- $Q_0 = \frac{\rho^2 - \rho p}{2(1 - \rho)} = 0.18$
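The Flores design steps and the worked example can be reproduced with a short Python sketch. Two inputs are assumptions. First, "choose m so that ρ ≈ 0.5" is read as "the smallest power of two with ρ ≤ 0.5". Second, the surviving slide text does not state $T_a$, so 200 ns is assumed because that value makes $T_a + T_w$ come out at the slide's 238 ns. The 1.5 references per instruction and $T_c$ = 100 ns come from the solution. `flores_design` is a hypothetical name.

```python
def flores_design(mips: float, refs_per_instr: float, t_c: float, t_a: float,
                  target_rho: float = 0.5) -> dict:
    """Open-queue (Flores) design steps with the M_B/D/1 model and p = 1/m."""
    lam_s = mips * 1e6 * refs_per_instr      # total request rate (MAPS x 10^6 per second)
    m = 1
    while lam_s / m * t_c > target_rho:      # assumed rule: smallest m = 2^k with rho <= target
        m *= 2
    rho = lam_s / m * t_c                    # rho = (lambda_s / m) * T_c
    p = 1 / m
    if rho < p:
        raise ValueError("rho < 1/m: the M_B/D/1 formula would give a negative wait")
    q0 = (rho**2 - p * rho) / (2 * (1 - rho))
    t_w = t_c * (rho - p) / (2 * (1 - rho))
    return {"m": m, "rho": rho, "Q0": q0, "T_w": t_w, "total": t_a + t_w}


r = flores_design(50, 1.5, t_c=100e-9, t_a=200e-9)
print(f"m={r['m']}, rho={r['rho']:.2f}, Q0={r['Q0']:.2f}, "
      f"T_w={r['T_w'] * 1e9:.1f} ns, total={r['total'] * 1e9:.1f} ns")
# m=16, rho=0.47, Q0=0.18, T_w=38.2 ns, total=238.2 ns
```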

(55)

Closed Queues

- The closed queue model assumes that the arrival rate is immediately affected by service contention.

- Let λ be the offered arrival rate and $\lambda_a$ the achieved arrival rate.

- Let ρ be the occupancy for λ and $\rho_a$ the occupancy for $\lambda_a$.

(56)

Closed Queues

- Suppose we have an (n, m) system in overall stability.

- The average queue size (including items in service) is denoted by $N = n/m$, and the closed queue size is $Q_c = n/m - \rho_a = \rho - \rho_a$, where $\rho_a$ is the achieved occupancy. From the discussion of open queues we know that

(57)

Closed Queues

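- Since the achieved occupancy in the closed queue is $\rho_a$, and for M/D/1 $Q_0 = \rho^2 / 2(1 - \rho)$, we have

$N = n/m = \frac{\rho_a^2}{2(1 - \rho_a)} + \rho_a$

This rearranges to $\rho_a^2 - 2(1 + n/m)\rho_a + 2n/m = 0$. Solving for $\rho_a$ and taking the root below 1, we have

$\rho_a = (1 + n/m) - \sqrt{(n/m)^2 + 1}$. Bandwidth $B(m,n) = m \cdot \rho_a$, so

$B(m,n) = m + n - \sqrt{n^2 + m^2}$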
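A quick Python check of the closed-queue result follows. It computes $\rho_a = (1 + n/m) - \sqrt{(n/m)^2 + 1}$ and $B(m,n) = m + n - \sqrt{n^2 + m^2}$. It then confirms that $\rho_a$ satisfies $N = \rho_a^2 / 2(1 - \rho_a) + \rho_a$ with $N = n/m$. The example m = n = 16 and the function names are illustrative choices.

```python
import math


def closed_queue_occupancy(m: int, n: int) -> float:
    """Achieved occupancy rho_a = (1 + n/m) - sqrt((n/m)^2 + 1)."""
    big_n = n / m
    return (1 + big_n) - math.sqrt(big_n**2 + 1)


def closed_queue_bandwidth(m: int, n: int) -> float:
    """B(m, n) = m * rho_a = m + n - sqrt(n^2 + m^2)."""
    return m + n - math.sqrt(n**2 + m**2)


m, n = 16, 16
rho_a = closed_queue_occupancy(m, n)
# residual of N = rho_a^2 / (2 (1 - rho_a)) + rho_a with N = n/m (should be ~0)
residual = rho_a**2 / (2 * (1 - rho_a)) + rho_a - n / m
print(round(rho_a, 4), round(closed_queue_bandwidth(m, n), 2), abs(residual) < 1e-12)
# 0.5858 9.37 True
```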

(58)

Closed Queues

- Since N = n/m is the same as the open queue occupancy ρ, we can say

$\rho_a = (1 + \rho) - \sqrt{\rho^2 + 1}$

Simple Binomial Model: in deriving the asymptotic solution, we assumed m and n to be very large and used the M/D/1 model.

(59)

Comparison of Memory Models

- Each model is valid for a particular type of processor-memory interaction.

- Hellerman's model represents the simplest type of processor. Since the processor cannot skip over conflicting requests and has no buffer, it achieves the lowest bandwidth.

- Strecker's model anticipates out-of-order requests but no queues. Its

(60)

Comparison of Memory Models

- The M/D/1 open (Flores) model has limited accuracy, but it is still useful for initial estimates or in mixed queue models.

- The closed queue $M_B/D/1$ model represents a processor-memory system in equilibrium, where the queue length including the item in service equals n/m on a per-module basis.

- The simple binomial model is suitable only for processors making n requests per

(61)

Comparison of Memory Models

- The δ binomial model is suitable for simple pipelined processors where n

(62)

Review and Selection of Queuing Models

- There are basically three dimensions to simple (single-server) queuing models.

- These three represent the statistical characterization of the arrival rate, the service rate, and the amount of buffering present before the system saturates.

- For the arrival rate, if the source always requests service during a service

(63)

Review and Selection of Queuing Models

- If a particular requestor has a diminishingly small probability of making a request during a particular service interval, use Poisson arrival.

- For the service rate, if the service time is fixed, use the constant (D) service distribution.

- If the service time varies but the variance is unknown, choose $c^2 = 1$ for ease of analysis.

(64)

Review and Selection of Queuing Models

- If the variance is known and $c^2$ can be calculated, use the M/G/1 model.

- The third parameter determining the simple queuing model is the amount of

(65)

Processors with Cache

- Adding a cache to a memory system complicates performance evaluation and design.

- For CBWA (copy-back, write-allocate) caches, the requests to memory consist of line read and line write requests.

- For WTNWA (write-through, no-write-allocate) caches, the requests to memory are line read requests and word write requests.

- In order to develop models of memory systems with caches, two basic

(66)

Processors with Cache

1. $T_{line\ access}$: the time it takes to access a line in memory.

2. $T_{busy}$: potential contention time (when memory is busy and processor/cache