
UNIT-4.pptx


Academic year: 2020


(1)

Memory System Design

(2)

Memory System

There are two basic parameters that determine memory system performance:

1. Access Time: the time for a processor request to be transmitted to the memory system, access a datum and return it to the processor. (It depends on physical parameters such as bus delay, chip delay, etc.)

2. Memory Bandwidth: the ability of the memory to respond to requests per unit time.

(3)
(4)

Memory System Organization

A memory system consists of a number of memory banks, each consisting of a number of memory modules, each capable of performing one memory access at a time.

Multiple memory modules in a memory bank share the same input and output buses.

In one bus cycle, only one module within a memory bank can begin or complete a memory operation.

(5)

Memory System Organization

In systems with multiple processors or with complex single processors, multiple requests may occur at the same time, causing bus or network congestion.

Even in a single-processor system, requests arising from different buffered

(6)

Memory System Organization

The maximum theoretical bandwidth of the memory system is given by the number of memory modules divided by the memory cycle time.
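The following minimal sketch implements this formula. It uses a hypothetical system of 16 modules with a 100 ns cycle time. Both values are illustrative assumptions and do not come from the slides.

```python
def max_bandwidth(m: int, tc_seconds: float) -> float:
    """Maximum theoretical bandwidth (accesses per second) of m modules,
    each able to complete one access per memory cycle time Tc."""
    if m <= 0 or tc_seconds <= 0:
        raise ValueError("m and Tc must be positive")
    return m / tc_seconds


# Hypothetical example: 16 modules, Tc = 100 ns.
peak = max_bandwidth(16, 100e-9)
print(f"{peak / 1e6:.0f} million accesses per second")
```

This is an upper bound. The achieved bandwidth is lower because of contention between requests that target the same module.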

The Offered Request Rate is the rate at which the processor would submit memory requests if memory had unlimited bandwidth.

Offered request rate and maximum memory bandwidth determine

(7)

Achieved vs. Offered Bandwidth

Offered Request Rate:

Rate at which the processor(s) would make requests if memory

(8)

Memory System Organization

The offered request rate does not depend on the organization of the memory system. It depends on the processor architecture, instruction set, etc.

The analysis and modeling of a memory system depend on the number of processors that request service from the common shared memory system.

For this we use a model where n simple processors access m independent

(9)

Memory System Organization

Contention develops when multiple processors access the same module. A single pipelined processor making n requests to the memory system during a

(10)

The Physical Memory Module

A memory module has two important parameters:

Module Access Time: the amount of time to retrieve a word into the output memory buffer of the module, given a valid address in its address register.

Module Cycle Time: the minimum time between requests directed at the same module.

(11)

Semiconductor Memories

Semiconductor memories fall into two categories.

Static RAM or SRAM

Dynamic RAM or DRAM

SRAM retains data statically, whereas DRAM retains it dynamically. Data in SRAM remains in a stable state as long as power is on.

(12)

SRAM Vs DRAM

An SRAM cell uses six transistors and resembles a flip-flop in construction. Data remains in a stable state as long as power is on.

SRAM is much less dense than DRAM but has much faster access and cycle times.

In a DRAM cell, data is stored as charge on a capacitor, which decays with time

(13)

SRAM Vs DRAM

DRAM cells, constructed using a capacitor controlled by a single transistor, offer very high storage density.

DRAM uses a destructive readout process, so the data read out must be amplified and subsequently written back to the cell.

This operation can be combined with the periodic refreshing required by DRAMs. The main advantage of the DRAM cell is its small size, offering very high storage density.

(14)

Memory Module

Memory modules are composed of DRAM chips.

A DRAM chip is usually organized as $2^n \times 1$ bit, where n is an even number.

Internally, the chip is a two-dimensional array of memory cells arranged in rows and columns.

Half of the memory address is used to specify a row address (one of $2^{n/2}$ row lines).

(15)
(16)

Memory Module

To save on pinout for better overall density, the row and column addresses are multiplexed on the same lines.

Two additional lines, RAS (Row Address Strobe) and CAS (Column Address Strobe), gate first the row address and then the column address into the chip.

The row and column addresses are then each decoded to select one out of $2^{n/2}$ possible lines.
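The sketch below shows how an n-bit address can be split into the two halves that are multiplexed onto the same pins. It assumes that the high-order n/2 bits form the row address, which is gated in with RAS, and the low-order n/2 bits form the column address, which is gated in with CAS. The slides only say that "half" of the address selects the row, so the bit assignment is an assumption. The function name `split_address` is hypothetical.

```python
from typing import Tuple


def split_address(addr: int, n: int) -> Tuple[int, int]:
    """Split an n-bit address for a 2^n x 1 DRAM chip into (row, column).

    Assumption: high-order n/2 bits = row (RAS), low-order n/2 bits = column (CAS).
    """
    if n % 2:
        raise ValueError("n must be even")
    if not 0 <= addr < (1 << n):
        raise ValueError("address out of range")
    half = n // 2
    row = addr >> half              # selects one of 2^(n/2) row lines
    col = addr & ((1 << half) - 1)  # selects one of 2^(n/2) column lines
    return row, col


# Hypothetical 2^16 x 1 chip: a 256 x 256 cell array, 8 multiplexed address pins.
print(split_address(0xA5C3, 16))
```

Each half needs only n/2 pins. This is why multiplexing halves the address pin count at the cost of a second strobe.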

(17)

Memory Module

The column line signals are then amplified by a sense amplifier and transmitted to the output pin Dout during a Read Cycle.

During a Write Cycle, the write enable signal stores the contents on Din at the

(18)
(19)

Memory Timing

At the beginning of a Read Cycle, the RAS line is activated first and the row address is put on the address lines.

With RAS active and CAS inactive, the information is stored in the row address register.

(20)

Memory Timing

CAS gates the column address into the column address register. The column address decoder then selects a column line.

The desired data bit lies at the intersection of the active row and column address lines.

During a Read Cycle, Write Enable is inactive (low) and the output line

(21)

Memory Timing

The time from the beginning of RAS until the data output line is activated is called the chip access time ($t_{\text{chip access}}$).

The chip cycle time ($t_{\text{chip cycle}}$) is the time required by the row and column address lines to recover before the next address can be entered and a read or write process initiated.

This is determined by the amount of time that the RAS line is active and

(22)

Memory Module

In addition to memory chips, a memory module contains a Dynamic Memory Controller and a Memory Timing Controller, which provide the following functions:

Multiplexing of the n address bits into row and column addresses.

Creation of the correct RAS and CAS signals at the appropriate time

(23)

Memory Module

[Figure: memory module block diagram with Dynamic Memory Controller, Memory Timing Controller, Bus Drivers and $2^n \times 1$ memory chips; n address bits in, n/2 multiplexed address bits to the chips, p-bit data paths, Dout.]

(24)

Memory Module

When a memory read operation is completed, the data-out signals are directed to bus drivers, which interface with the memory bus common to all the memory modules.

The access and cycle times of a module differ from the chip access and cycle times.

Module access time includes the delays due to the dynamic memory

(25)

Memory Module

So in a memory system we have three pairs of access and cycle times:

Chip access and Chip cycle time

Module access and Module Cycle time

Memory (System) access and cycle time.

(26)

Memory Systems Design

The basic design steps are as follows:

1. Determine the number of memory modules and the partitioning of the memory system.

2. Determine the offered bandwidth: peak instruction processing rate multiplied by expected memory references per instruction multiplied by the number of processors.

3. Decide the interconnection network: Physical delay through the network plus

(27)

Memory Systems Design

4. Assess referencing behavior: a program's sequence of requests to memory can be

- Purely sequential: each request follows the previous one in sequence.

- Random: requests are uniformly distributed across modules.

- Regular: successive accesses are separated by a fixed address stride (vector or array references).
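Step 2 is a simple product. The sketch below evaluates it for a hypothetical configuration of 4 processors, each with a 50 MIPS peak and 1.5 memory references per instruction. These numbers are assumptions chosen for illustration.

```python
def offered_bandwidth(peak_mips: float, refs_per_instr: float, processors: int) -> float:
    """Offered request rate in MAPS (million accesses per second):
    peak instruction rate x memory references per instruction x processors."""
    return peak_mips * refs_per_instr * processors


# Hypothetical configuration: 4 processors, 50 MIPS each, 1.5 references/instruction.
print(offered_bandwidth(50, 1.5, 4))
```

The result is independent of the memory organization. It is the demand that the modules chosen in step 1 must be able to absorb.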

(28)

Memory Systems Design

(29)

Memory Models

Nature of Processor:

Simple Processor: makes a single request and waits for the response from memory.

Pipelined Processor: makes multiple requests for various buffers in each memory cycle.

Multiple Processors: each requests once every memory cycle.

(30)

Memory Models

Achieved Bandwidth: the bandwidth available from the memory system.

$B(m)$ or $B(m,n)$: the number of requests serviced per module service time $T_s = T_c$ (m is the number of modules and n is the number of requests each cycle).

(31)

Hellerman’s Model

One of the best-known memory models. It assumes a single sequence of addresses.

Bandwidth is determined by the average length of a conflict-free sequence of addresses (i.e., no match in the $w$ low-order bit positions, where $w = \log_2 m$ and m is the number of modules).

The modeling assumption is that no address queue is present and no out-of-order

(32)

Hellerman’s Model

Under these conditions the maximum available bandwidth is found to be approximately

$B(m) = m^{0.56}$

and $B(w) = m^{0.56}/T_s$

The lack of queuing limits the applicability of this model to simple unbuffered
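The following sketch computes the exact expected length of a conflict-free run, assuming each request goes to one of m modules uniformly at random and independently. A run is the sequence of requests before the first request whose low-order $w$ bits repeat an earlier one. The expectation is computed as $E[L] = \sum_k P(\text{first } k \text{ requests all distinct})$. The sketch prints this value next to $m^{0.56}$, the power-law approximation usually quoted for Hellerman's model. The function name is hypothetical.

```python
def hellerman_bandwidth(m: int) -> float:
    """Expected conflict-free run length for m modules (requests per cycle).

    Assumption: each request hits a uniformly random module, independently.
    """
    expected = 0.0
    p_distinct = 1.0  # P(first k requests all hit different modules)
    for k in range(1, m + 1):
        p_distinct *= 1 - (k - 1) / m  # request k must miss the k-1 modules already hit
        expected += p_distinct         # E[L] = sum over k of P(L >= k)
    return expected


for m in (4, 16, 64):
    print(m, round(hellerman_bandwidth(m), 2), round(m ** 0.56, 2))
```

The exact value grows roughly like $\sqrt{m}$. This is why the bandwidth of an unbuffered processor falls far short of the theoretical maximum of m requests per cycle.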

(33)

Strecker’s Model

Model Assumptions:

n simple processor requests are made per memory cycle and there are m modules. There is no bus contention.

Requests are random and uniformly distributed across modules. The probability of any one request going to a particular module is 1/m.

Any busy module serves one request.

All unserviced requests are dropped each cycle.

(34)

Strecker’s Model

Model Analysis:

Bandwidth $B(m,n)$ is the average number of memory requests serviced per memory cycle. This equals the average number of memory modules busy during each memory cycle.

Probability that a module is not referenced by one processor $= (1 - 1/m)$. Probability that a module is not referenced by any processor $= (1 - 1/m)^n$. Probability that a module is busy $= 1 - (1 - 1/m)^n$.

So $B(m,n)$ = average number of busy modules $= m\,[1 - (1 - 1/m)^n]$.
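The following sketch evaluates Strecker's closed form $m[1 - (1 - 1/m)^n]$ and checks it with a small Monte Carlo simulation of the model's assumptions. In each cycle, n requests pick modules uniformly at random. Each referenced module serves one request, and the remaining requests are dropped. The seed, cycle count and function names are arbitrary choices for illustration.

```python
import random


def strecker_bandwidth(m: int, n: int) -> float:
    """B(m, n) = m * [1 - (1 - 1/m)^n]: expected number of busy modules."""
    return m * (1 - (1 - 1 / m) ** n)


def simulate(m: int, n: int, cycles: int = 20000, seed: int = 1) -> float:
    """Average number of distinct modules referenced per cycle when n requests
    are spread uniformly over m modules (unserviced requests are dropped)."""
    rng = random.Random(seed)
    busy_total = 0
    for _ in range(cycles):
        busy_total += len({rng.randrange(m) for _ in range(n)})
    return busy_total / cycles


print(strecker_bandwidth(4, 4), round(simulate(4, 4), 2))
```

Because dropped requests are not carried into the next cycle, this figure tends to be optimistic for real processors that resubmit conflicting requests.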

(35)

Strecker’s Model

Achieved memory bandwidth is less than the theoretical maximum due to contention. Neglecting congestion carried over from previous cycles results in calculated

(36)

Processor Memory Modeling Using Queuing Theory

Most real-life processors make buffered requests to memory.

Whenever requests are buffered, the effect of contention and the resulting delays are reduced.

More powerful tools like Queuing Theory are needed to accurately model

(37)

Queuing Theory

A statistical tool applicable to general environments where some requestors desire service from a common server.

The requestors are assumed to be independent of each other, and they make requests based on a certain request probability distribution function.

The server is able to process requests one at a time, each independently of

(38)

Queuing Theory

The mean of the arrival or request rate (measured in items per unit of time) is called λ.

The mean of the service rate distribution is called μ (mean service time $T_s = 1/\mu$).

The ratio of arrival rate (λ) to service rate (μ) is called the utilization or occupancy of the system and is denoted by ρ: $\rho = \lambda/\mu$.

(39)

Queuing Theory

Queue models are categorized by the triple

Arrival Distribution / Service Distribution / Number of servers.

Notation used to indicate particular probability distributions:

M: Poisson / Exponential, c = 1

$M^B$: Binomial, c = 1

D: Constant, c = 0

(40)

Queuing Theory

c is the coefficient of variation.

$c = \dfrac{\text{standard deviation of service time}}{\text{mean service time}} = \dfrac{\sigma}{1/\mu} = \sigma\mu$

(41)

Queue Properties

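[Figure: single-server queue labeled with service rate μ, utilization ρ, queue size Q and waiting time $T_w$, service time $T_s$, and total system size N and time T.]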

(42)

Queue Properties

The average time spent in the system (T) consists of the average service time ($T_s$) plus the waiting time ($T_w$):

$T = T_s + T_w$

Average queue length (including requests being serviced):

$N = \lambda T$ (Little's formula).

Since N consists of items in the queue and an item in service

(43)

Queue Properties

Since $N = \lambda T$ and $N = Q + \rho$:

$Q + \rho = \lambda(T_s + T_w) = \lambda(1/\mu + T_w) = \lambda/\mu + \lambda T_w = \rho + \lambda T_w$, so $Q = \lambda T_w$.

(44)

Queue Properties

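For the M/G/1 queue model:

Mean waiting time $T_w = \dfrac{1}{\lambda}\left[\dfrac{\rho^2(1+c^2)}{2(1-\rho)}\right]$; mean items in queue $Q = \lambda T_w = \dfrac{\rho^2(1+c^2)}{2(1-\rho)}$

For the M/M/1 queue model: $c^2 = 1$; $T_w = \dfrac{1}{\lambda}\left[\dfrac{\rho^2}{1-\rho}\right]$

$Q = \dfrac{\rho^2}{1-\rho}$

For the M/D/1 queue model: $c^2 = 0$; $T_w = \dfrac{1}{\lambda}\left[\dfrac{\rho^2}{2(1-\rho)}\right]$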

(45)

Queue Properties

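For the $M^B$/D/1 queue model:

$T_w = \dfrac{1}{\lambda}\left[\dfrac{\rho^2 - \rho p}{2(1-\rho)}\right]$

$Q = \dfrac{\rho^2 - \rho p}{2(1-\rho)}$

For the simple binomial model, $p = 1/m$ (the probability of a processor making a request each $T_c$ is 1).

For the δ (delta) binomial model, $p = \delta/m$, where δ is the probability of a processor making a request.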
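The waiting-time formulas above translate directly into code. The following minimal sketch implements them and derives Q, N and T using $Q = \lambda T_w$ and Little's formula. The module parameters are hypothetical: λ = 5 requests/µs and μ = 10 requests/µs, so ρ = 0.5. The helper names are illustrative.

```python
def _occupancy(lam: float, mu: float) -> float:
    rho = lam / mu
    if not 0 <= rho < 1:
        raise ValueError("steady state requires 0 <= rho < 1")
    return rho


def mg1_wait(lam: float, mu: float, c2: float) -> float:
    """M/G/1: Tw = (1/lambda) * rho^2 (1 + c^2) / (2 (1 - rho)).
    c2 = 1 gives M/M/1, c2 = 0 gives M/D/1."""
    rho = _occupancy(lam, mu)
    return (rho ** 2 * (1 + c2)) / (2 * (1 - rho)) / lam


def mbd1_wait(lam: float, mu: float, p: float) -> float:
    """MB/D/1: Tw = (1/lambda) * (rho^2 - rho p) / (2 (1 - rho))."""
    rho = _occupancy(lam, mu)
    return (rho ** 2 - rho * p) / (2 * (1 - rho)) / lam


def summarize(lam: float, mu: float, tw: float) -> dict:
    """Other averages from Tw: Q = lambda Tw, N = Q + rho, T = Ts + Tw."""
    rho = lam / mu
    q = lam * tw
    return {"rho": rho, "Tw": tw, "Q": q, "N": q + rho, "T": 1 / mu + tw}


# Hypothetical module: lambda = 5 requests/us, mu = 10 requests/us (Ts = 0.1 us).
lam, mu = 5.0, 10.0
for name, tw in [("M/M/1", mg1_wait(lam, mu, 1.0)),
                 ("M/D/1", mg1_wait(lam, mu, 0.0)),
                 ("MB/D/1, p=1/16", mbd1_wait(lam, mu, 1 / 16))]:
    print(name, summarize(lam, mu, tw))
```

At the same occupancy, constant service (c² = 0) halves the M/M/1 waiting time. The binomial $M^B$/D/1 model lowers it further, because a single source cannot generate arbitrarily many simultaneous requests.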

(46)

Open, Closed and Mixed Queue Models

Open queue models are the simplest queuing form. These models assume an arrival rate independent of the service rate.

This results in a queue of unbounded length as well as an unbounded waiting time.

In a processor memory interaction,

(47)

Open, Closed and Mixed Queue Models

This situation can be modeled by a queue with feedback

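[Figure: queue with feedback; labels: summing node (+), service rate µ, offered rate $\lambda_0$, achieved rate $\lambda_a$, and $\lambda_0 - \lambda_a$.]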

Such systems are called closed queues, as they have bounded size and waiting time.

(48)

Open, Closed and Mixed Queue Models

Certain systems can behave as open queues up to a certain queue size and then behave as closed queues.

(49)

Open Queue (Flores) Memory Model

The open queue model is not well suited to processor-memory interaction, but it is the simplest model and can be used as an initial guess for partitioning memory into modules.

This model was originally proposed by Flores using an M/D/1 queue, but $M^B$/D/1

(50)

Open Queue (Flores) Memory Model

The total processor request rate $\lambda_s$ is assumed to split uniformly over m modules.

So the request rate at each module is $\lambda = \lambda_s/m$.

Since $\mu = 1/T_c$ ($T_c$ is the memory cycle time), $\rho = \lambda/\mu = (\lambda_s/m)\,T_c$.

We can now use the $M^B$/D/1 model to determine $T_w$ and $Q_0$ (per module buffer
(51)

Open Queue (Flores) Memory Model

Design Steps:

Find the peak processor instruction execution rate in MIPS.

MIPS × references/instruction = MAPS

Choose m so that $\rho \approx 0.5$ and $m = 2^k$ (k an integer).

Calculate $T_w$ and $Q_0$.

Total memory access time $= T_w + T_a$

(52)

Open Queue (Flores) Memory Model

Example:

Design a memory system for a processor with peak performance of 50 MIPS

and one instruction decoded per cycle.

(53)

Open Queue (Flores) Memory Model

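Solution:

MAPS = 1.5 × 50 = 75 MAPS. Now $\rho = (\lambda_s/m)\,T_c$.

So $\rho = 75 \times 10^6 \times \frac{1}{m} \times 0.1 \times 10^{-6} = 7.5/m$.

Now choose m so that $\rho \approx 0.5$. If $m = 16$, then $\rho = 7.5/16 \approx 0.47$.

For the $M^B$/D/1 model, $T_w = \dfrac{1}{\lambda}\cdot\dfrac{\rho^2 - \rho p}{2(1-\rho)}$

$= T_c\,\dfrac{\rho - 1/m}{2(1-\rho)}$ (using $1/\lambda = T_c/\rho$ and $p = 1/m$) $\approx 100\text{ ns} \times \dfrac{0.47 - 0.0625}{2(1 - 0.47)} \approx 38\text{ ns}$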

(54)

Open Queue (Flores) Memory Model

Total memory access time $= T_a + T_w = 238$ ns

$Q_0 = \dfrac{\rho^2 - \rho p}{2(1-\rho)} = 0.18$
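The following sketch implements the design steps and reruns the example. It takes $\lambda_s$ from MIPS × references/instruction and picks the smallest power-of-two m with ρ ≤ 0.5, which is one reading of "choose m so that ρ ≈ 0.5". It then applies the $M^B$/D/1 formulas with the simple binomial $p = 1/m$. The 1.5 references per instruction and $T_c$ = 100 ns come from the example. $T_a$ = 200 ns is an assumption: it is not in the visible slide text but is the value implied by the 238 ns total. The function name is hypothetical.

```python
def flores_design(mips: float, refs_per_instr: float, tc_ns: float,
                  ta_ns: float, target_rho: float = 0.5):
    """Open-queue (Flores) sizing with the MB/D/1 model.

    Returns (m, rho, Tw in ns, Q0, total access time in ns).
    Assumption: m is the smallest power of two giving rho <= target_rho.
    """
    lam_s = mips * refs_per_instr * 1e6  # offered requests per second (MAPS x 1e6)
    tc = tc_ns * 1e-9                    # memory cycle time in seconds
    m = 1
    while lam_s * tc / m > target_rho:   # rho = (lambda_s / m) * Tc
        m *= 2
    rho = lam_s * tc / m
    p = 1 / m                            # simple binomial model
    tw_ns = tc_ns * (rho - p) / (2 * (1 - rho))  # Tw = Tc (rho - 1/m) / 2(1 - rho)
    q0 = (rho ** 2 - rho * p) / (2 * (1 - rho))
    return m, rho, tw_ns, q0, ta_ns + tw_ns


# Example from the slides; ta_ns = 200 is an assumed value (see lead-in).
m, rho, tw, q0, total = flores_design(50, 1.5, tc_ns=100, ta_ns=200)
print(m, round(rho, 2), round(tw, 1), round(q0, 2), round(total))
```

With these inputs, the sketch selects m = 16 and gives ρ ≈ 0.47, $T_w$ ≈ 38 ns, $Q_0$ ≈ 0.18 and a total of about 238 ns, consistent with the worked example.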
(55)

Closed Queues

The closed queue model assumes that the arrival rate is immediately affected by service contention.

Let λ be the offered arrival rate and $\lambda_a$ the achieved arrival rate. Let ρ be the occupancy for λ and $\rho_a$ the occupancy for $\lambda_a$.

(56)

Closed Queues

Suppose we have an (n, m) system in overall stability.

The average queue size (including items in service) is $N = n/m$, and the closed queue size is $Q_c = n/m - \rho_a = \rho - \rho_a$, where $\rho_a$ is the achieved occupancy. From the discussion on open queues we know that

(57)

Closed Queues

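Since in a closed queue the achieved occupancy is $\rho_a$, and for M/D/1 $Q_0 = \rho^2/(2(1-\rho))$, we have

$N = \dfrac{n}{m} = \dfrac{\rho_a^2}{2(1-\rho_a)} + \rho_a$

Solving for $\rho_a$, we have $\rho_a = \left(1 + \dfrac{n}{m}\right) - \sqrt{\left(\dfrac{n}{m}\right)^2 + 1}$. Bandwidth $B(m,n) = m\,\rho_a$, so

$B(m,n) = m + n - \sqrt{n^2 + m^2}$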
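Clearing the denominator in $N = \rho_a^2/(2(1-\rho_a)) + \rho_a$ gives the quadratic $\rho_a^2 - 2(1+N)\rho_a + 2N = 0$. Its smaller root is the one quoted above, because the larger root exceeds 1. The following sketch evaluates that root and the resulting bandwidth, and substitutes $\rho_a$ back into the M/D/1 relation as a check. The configuration m = n = 16 is a hypothetical example, and the function names are illustrative.

```python
import math


def closed_occupancy(m: int, n: int) -> float:
    """rho_a = (1 + n/m) - sqrt((n/m)^2 + 1), the root of
    n/m = rho_a^2 / (2 (1 - rho_a)) + rho_a that lies in [0, 1)."""
    r = n / m
    return (1 + r) - math.sqrt(r * r + 1)


def closed_bandwidth(m: int, n: int) -> float:
    """Asymptotic closed-queue bandwidth B(m, n) = m + n - sqrt(m^2 + n^2)."""
    return m + n - math.sqrt(m * m + n * n)


m, n = 16, 16
rho_a = closed_occupancy(m, n)
# Substituting rho_a back into the M/D/1 relation should recover N = n/m.
n_check = rho_a ** 2 / (2 * (1 - rho_a)) + rho_a
print(round(rho_a, 4), round(n_check, 4), round(closed_bandwidth(m, n), 3))
```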

(58)

Closed Queues

Since $N = n/m$ is the same as the open queue occupancy ρ, we can say

$\rho_a = (1 + \rho) - \sqrt{\rho^2 + 1}$

Simple Binomial Model: while deriving the asymptotic solution, we assumed m and n to be very large and used the M/D/1 model.

(59)

Comparison of Memory Models

Each model is valid for a particular type of processor-memory interaction. Hellerman's model represents the simplest type of processor. Since the processor cannot skip over conflicting requests and has no buffer, it achieves the lowest bandwidth.

Strecker's model anticipates out-of-order requests but no queues. Its

(60)

Comparison of Memory Models

The M/D/1 open (Flores) model has limited accuracy; still, it is useful for initial estimates or in mixed queue models.

The closed queue $M^B$/D/1 model represents a processor-memory system in equilibrium, where the queue length including the item in service equals n/m on a per-module basis.

The simple binomial model is suitable only for processors making n requests per

(61)

Comparison of Memory Models

The δ binomial model is suitable for simple pipelined processors where n

(62)

Review and Selection of Queuing Models

There are basically three dimensions to simple (single-server) queuing models.

These three represent the statistical characterization of the arrival rate, the service rate, and the amount of buffering present before the system saturates.

For arrival rate, if the source always requests service during a service

(63)

Review and Selection of Queuing Models

If the particular requestor has a diminishingly small probability of making a request during a particular service interval, use Poisson arrivals.

For service rate, if the service time is fixed, use a constant (D) service distribution. If the service time varies but its variance is unknown, choose $c^2 = 1$ for ease of analysis.

(64)

Review and Selection of Queuing Models

If the variance is known and $c^2$ can be calculated, use the M/G/1 model.

The third parameter determining the simple queuing model is the amount of

(65)

Processors with Cache

The addition of a cache to a memory system complicates performance evaluation and design.

For CBWA caches, the requests to memory consist of line read and line write requests.

For WTNWA caches, they consist of line read requests and word write requests. In order to develop models of memory systems with caches, two basic

(66)

Processors with Cache

1. $T_{\text{line access}}$: the time it takes to access a line in memory.

2. $T_{\text{busy}}$: potential contention time (when memory is busy and the processor/cache
