
William Stallings
Computer Organization and Architecture
8th Edition

Chapter 18

Multicore Computers

Hardware Performance Issues

Microprocessors have seen an exponential increase in performance
  • Improved organization
  • Increased clock frequency

Increase in parallelism
  • Pipelining
  • Superscalar
  • Simultaneous multithreading (SMT)

Diminishing returns
  • More complexity requires more logic

Prepared by Engineer S M Asim Ali Rizvi

Alternative Chip Organizations [figure]


Intel Hardware Trends [figure]


Increased Complexity

Power requirements grow exponentially with chip density and clock frequency
  • Can use more chip area for cache
      - Smaller
      - Order of magnitude lower power requirements

By 2015
  • 100 billion transistors on 300 mm² die
      - Cache of 100 MB
      - 1 billion transistors for logic

Pollack’s rule:
  • Performance is roughly proportional to square root of increase in complexity
      - Double complexity gives 40% more performance

Multicore has potential for near-linear improvement

Unlikely that one core can use all cache effectively
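As a rough worked example of Pollack's rule, the sketch below compares spending a doubled logic budget on one larger core against spending it on two cores of the original size. The function names are illustrative, and the multicore figure assumes perfectly parallel work, which is an upper bound rather than something the slides claim.

```python
import math


def pollack_performance(complexity_ratio: float) -> float:
    """Relative single-core performance under Pollack's rule (square root of complexity)."""
    return math.sqrt(complexity_ratio)


def ideal_multicore_performance(core_count: int) -> float:
    """Upper bound for identical cores, assuming perfectly parallel work (an assumption)."""
    return float(core_count)


one_big_core = pollack_performance(2.0)           # one core with twice the logic
two_small_cores = ideal_multicore_performance(2)  # two cores of the original size

print(f"one core, 2x complexity: {one_big_core:.2f}x")
print(f"two cores, ideal case:   {two_small_cores:.2f}x")
```

The square root of 2 is about 1.41, which is where the "40% more performance" figure comes from.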


Power and Memory Considerations [figure]


Chip Utilization of Transistors [figure]


Software Performance Issues

Performance benefits dependent on effective exploitation of parallel resources

Even small amounts of serial code impact performance
  • 10% inherently serial on 8 processor system gives only 4.7 times performance

Communication, distribution of work and cache coherence overheads
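The 4.7 figure follows from Amdahl's law. A minimal sketch of the calculation is shown below; the function name is illustrative, and the model ignores the communication and coherence overheads listed above, so real speedups would be lower.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction and processor count."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# 10% serial code, as in the slide, on a range of processor counts.
for n in (2, 4, 8, 16):
    print(f"{n:2d} processors: {amdahl_speedup(0.10, n):.2f}x")
```

With 10% serial code the speedup can never exceed 10, however many processors are added.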


Effective Applications for Multicore Processors

Database
Servers handling independent transactions
Multi-threaded native applications
  • Lotus Domino, Siebel CRM
Multi-process applications
  • Oracle, SAP, PeopleSoft
Java applications
  • Java VM is multi-threaded, with scheduling and memory management
  • Sun’s Java Application Server, BEA’s WebLogic, IBM WebSphere, Tomcat
Multi-instance applications
  • One application running multiple times
E.g. Valve game software


Multicore Organization

  • Number of core processors on chip
  • Number of levels of cache on chip
  • Amount of shared cache
  • Next slide: examples of each organization
      • (a) ARM11 MPCore
      • (b) AMD Opteron
      • (c) Intel Core Duo
      • (d) Intel Core i7


Multicore Organization Alternatives [figure]


Advantages of Shared L2 Cache

Constructive interference reduces overall miss rate

Data shared by multiple cores not replicated at cache level

With proper frame replacement algorithms, the amount of shared cache dedicated to each core is dynamic
  • Threads with less locality can have more cache

Easy inter-process communication through shared memory
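The dynamic-allocation point can be illustrated with a toy simulation. The sketch below compares one shared 8-line cache with two private 4-line caches, under assumptions that are not from the slides: a fully associative cache, LRU replacement, and a synthetic workload in which core 0 cycles over 6 lines while core 1 cycles over 2. All names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> core that last touched the line

    def access(self, core: int, address: int) -> bool:
        """Return True on a hit; on a miss, load the line, evicting the LRU line if full."""
        hit = address in self.lines
        if hit:
            self.lines.move_to_end(address)   # mark as most recently used
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[address] = core
        return hit

    def occupancy(self, core: int) -> int:
        """Number of lines currently held on behalf of `core`."""
        return sum(1 for owner in self.lines.values() if owner == core)


def run(shared: bool, rounds: int = 60):
    """Interleave accesses from two cores; return (hits per core, lines held per core)."""
    if shared:
        cache = LRUCache(8)
        caches = {0: cache, 1: cache}
    else:
        caches = {0: LRUCache(4), 1: LRUCache(4)}
    working_set = {0: list(range(6)), 1: [100, 101]}  # core 0 needs more lines
    hits = {0: 0, 1: 0}
    for i in range(rounds):
        for core in (0, 1):
            addresses = working_set[core]
            hits[core] += caches[core].access(core, addresses[i % len(addresses)])
    return hits, {core: caches[core].occupancy(core) for core in (0, 1)}


print("shared :", run(shared=True))
print("private:", run(shared=False))
```

In the shared case core 0 ends up holding 6 of the 8 lines and misses only while warming up; with a fixed 4-line private cache its cyclic pattern defeats LRU and every access misses, while half of core 1's cache sits unused.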


Individual Core Architecture

Intel Core Duo uses superscalar cores

Intel Core i7 uses simultaneous multi-threading (SMT)
  • Scales up number of threads supported
      - 4 SMT cores, each supporting 4 threads, appear as 16 cores


Intel x86 Multicore Organization - Core Duo (1)

2006
Two x86 superscalar cores, shared L2 cache
Dedicated L1 cache per core
  • 32 KB instruction and 32 KB data
Thermal control unit per core
  • Manages chip heat dissipation
  • Maximizes performance within constraints
  • Improved ergonomics
Advanced Programmable Interrupt Controller (APIC)


Intel x86 Multicore Organization - Core Duo (2)

Power Management Logic
  • Monitors thermal conditions and CPU activity
  • Adjusts voltage and power consumption
  • Can switch individual logic subsystems
2 MB shared L2 cache
  • Dynamic allocation
  • MESI support for L1 caches
  • Extended to support multiple Core Duo in SMP
      - L2 data shared between local cores or external
Bus interface


Intel x86 Multicore Organization - Core i7

November 2008
Four x86 SMT processors
Dedicated L2, shared L3 cache
Speculative pre-fetch for caches
On chip DDR3 memory controller
  • Three 8 byte channels (192 bits) giving 32 GB/s
  • No front side bus
QuickPath Interconnection
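The memory controller figures can be checked with a short calculation. The slide gives the channel count and width but not the transfer rate; the sketch below assumes about 1333 million transfers per second (DDR3-1333) because that rate reproduces the quoted 32 GB/s. The constant and function names are illustrative.

```python
CHANNELS = 3
BYTES_PER_TRANSFER = 8          # each channel is 8 bytes (64 bits) wide
TRANSFERS_PER_SECOND = 1333e6   # assumption: DDR3-1333; not stated on the slide


def peak_bandwidth_gb_per_s(channels: int, bytes_per_transfer: int,
                            transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * bytes_per_transfer * transfers_per_second / 1e9


bus_width_bits = CHANNELS * BYTES_PER_TRANSFER * 8
bandwidth = peak_bandwidth_gb_per_s(CHANNELS, BYTES_PER_TRANSFER, TRANSFERS_PER_SECOND)

print(f"total bus width: {bus_width_bits} bits")
print(f"peak bandwidth:  {bandwidth:.1f} GB/s")
```

This is a theoretical peak; sustained bandwidth in practice is lower.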


Cache Coherency

Snoop Control Unit (SCU) resolves most shared data bottleneck issues

L1 cache coherency based on MESI

Direct data intervention
  • Copying clean entries between L1 caches without accessing external memory
  • Reduces read after write from L1 to L2
  • Can resolve local L1 miss from remote L1 rather than L2

Duplicated tag RAMs
  • Cache tags implemented as separate block of RAM
  • Same length as number of lines in cache
  • Duplicates used by SCU to check data availability before sending coherency commands
  • Only send to CPUs that must update coherent data cache

Migratory lines
  • Allows moving dirty data between CPUs without writing to L2 and reading back from external memory
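To make the MESI states concrete, the sketch below tracks the state of a single memory line across the L1 caches of several cores. It is a simplified textbook model with hypothetical names: it records state transitions only and does not model the SCU, the duplicated tag RAMs, or where the data is actually supplied from.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MesiLine:
    """MESI state of one memory line in the L1 cache of each core."""

    def __init__(self, cores: int) -> None:
        self.states = [State.INVALID] * cores

    def read(self, core: int) -> None:
        if self.states[core] is not State.INVALID:
            return  # read hit: M, E and S all allow a local read
        holders = [c for c, s in enumerate(self.states)
                   if c != core and s is not State.INVALID]
        for c in holders:
            # A snooped read demotes M (after the dirty data is supplied) and E to S.
            self.states[c] = State.SHARED
        self.states[core] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, core: int) -> None:
        for c in range(len(self.states)):
            if c != core:
                self.states[c] = State.INVALID  # invalidate every other copy
        self.states[core] = State.MODIFIED

    def snapshot(self) -> str:
        return "".join(s.value for s in self.states)


line = MesiLine(cores=2)
for action, core in (("read", 0), ("read", 1), ("write", 1), ("read", 0)):
    getattr(line, action)(core)
    print(f"core {core} {action:5s} -> {line.snapshot()}")
```

The last step, a read by core 0 while core 1 holds the line modified, is the case that direct data intervention and migratory lines aim to make cheaper by moving the data between L1 caches.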


Intel Core i7 Block Diagram [figure]


Intel Core Duo Block Diagram [figure]


Performance Effect of Multiple Cores [figure]
