EC6009 ADVANCED COMPUTER ARCHITECTURE
Review of fundamentals of CPU, Memory and IO - Trends in Technology, Power, Energy and Cost, Dependability - Performance Evaluation.
UNIT I FUNDAMENTALS OF COMPUTER DESIGN
INTRODUCTION
• 65-70 years back, the first general-purpose electronic computer was created.
• Today, less than $500 buys a mobile computer that has more performance, more main memory and more disk storage than a computer bought in 1985 for $1 million.
• This rapid improvement has come both from advances in the technology used to build computers and from innovations in computer design.
• RISC-based machines focused on two critical performance techniques: exploitation of Instruction Level Parallelism (initially through pipelining and later through multiple instruction issue) and the use of caches.
• For many applications, the highest-performance microprocessors of today outperform the supercomputer of less than 10 years ago.
• Dramatic improvement in cost-performance leads to new classes of computers.
• The last decade saw the rise of smart cell phones and tablet computers, which many people are using as their primary computing platform instead of PCs.
• These mobile client devices are increasingly using the internet to access warehouses containing tens of thousands of servers.
• Mainframe computers and high-performance supercomputers are all collections of microprocessors.
• Today the nature of applications is also changing. Speech, sound, images and video are becoming increasingly important, along with predictable response time that is so critical to the user experience.
• An inspiring example is Google Goggles.
• This application lets you hold up your cell phone to point its camera at an object, and the image is sent wirelessly over the internet to a WSC that recognizes the object and tells you interesting information about it.
• It can also read the bar code on a book cover to tell you if a book is available online and its price.
• Since 2003, single-processor performance improvement has dropped to less than 22% per year due to the twin hurdles of maximum power dissipation and the lack of more ILP.
• In 2004, Intel canceled its high-performance uniprocessor projects and joined others in declaring that the road to higher performance would be via multiple processors per chip rather than via faster uniprocessors.
REVIEW OF FUNDAMENTALS OF CPU
The functional blocks in a computer are
1. ALU
2. Control Unit
3. Memory
4. Input Unit
5. Output Unit
• The ALU contains the necessary electronic circuits to perform arithmetic and logical operations.
• The Control Unit analyses each instruction in the program and sends the relevant control signals to all other units: ALU, Memory, Input and Output Unit.
• The program is fed into the computer through the input unit and stored in the memory. In order to execute the program, the instructions have to be fetched from memory one by one. This fetching of instructions is done by the control unit.
• After an instruction is fetched, the control unit decodes the instruction. According to the instruction, the control unit issues control signals to other units.
• After an instruction is executed, the result of the instruction is stored in memory or stored temporarily in the control unit or ALU, so that it can be used by the next instruction.
• The results of a program are taken out of the computer through the output unit.
• The control unit and ALU are collectively known as the Central Processing Unit (CPU).
• The physical units in a computer such as the CPU, Memory, Input and Output units form the Hardware.
• The compilers as well as user programs (high level language or machine language) form the software.
• Hardware works as dictated by the software. The operating system is a special software that manages the H/W and S/W.
Arithmetic and Logic Unit
The ALU has hardware circuits which perform primitive arithmetic and logical operations. The H/W sections in the ALU are
1. Adder
2. Accumulator
3. General Purpose Register
4. Counters
5. Shifters
6. Complementer.
Adder: adds two numbers and gives the result.
Accumulator: Register which temporarily holds the results of a previous operation in
General Purpose Register (GPR): When an operand is stored in main memory, it takes time to retrieve it. If it is stored within the CPU, it is immediately available to the CPU.
The GPRs store different types of information:
1. Operand
2. Operand address
3. Constant
Since they are used for multiple purposes, these registers are known as GPRs.
Scratch Pad Memory or Registers: During complex operations like multiplication, division etc., it is necessary to store intermediate results temporarily. For this purpose there are usually one or more scratch pad registers. These are purely internal H/W resources and are not addressable by the program.
Shifter and Complementer: The shifter provides the left and right shifts required for various operations. The complementer provides the 2's complement of binary numbers.
CONTROL UNIT:
The control unit is the most complex unit in a computer. Its main functions are
1. Fetching instructions
2. Analyzing the OPCODE
3. Generating control signals for performing various operations.
H/W resources of a control unit
Program Counter or Instruction Address Counter (IAC)
The IAC contains the memory address of the next instruction to be fetched. When an instruction is fetched, the IAC is incremented so that it points to the address of the next instruction. Every instruction contains an opcode. In addition it may contain one or more of the following:
1. Operand
2. Operand address
3. Register address
PSW Register: It contains various status bits describing the current condition of the CPU. These are known as flags. Two such flags are
1. Interrupt Enable: When this bit is 1, the CPU will recognize interrupt requests. When this bit is 0, interrupt requests will be ignored by the CPU and they remain pending. The NMI (non-maskable interrupt) is an exception to this.
2. Overflow: When this bit is 1, it indicates there was an overflow condition in the ALU in the previous arithmetic operation.
MEMORY AND IO
The memory is organized into locations. Each memory location is known as one Memory Word.
Memory Types
Older computers used magnetic core memory, while present-day computers use semiconductor memory.
Core memory is non-volatile, whereas semiconductor memory is volatile. Semiconductor memory is of two types: SRAM and DRAM.
SRAM preserves the contents of all the locations as long as the power supply is present.
DRAM can retain the content of any location only for a few milliseconds, so it must be refreshed periodically.
Random Access and Sequential Access Memories
In a RAM, the access time is the same for all locations. (Core and semiconductor memories are RAM.)
In a sequential access memory, the read or write access is sequential. The time taken for accessing the first location is the shortest and the time taken for the last location is the longest. (Example: magnetic tape.)
Memory Organization:
The memory unit consists of the following sections:
1. Memory Address Register (MAR)
2. Memory Data Register (MDR)
3. Memory Control Logic
4. Memory cells
For the read operation, the CPU does the following sequence:
(i) Sends the address to MAR.
(ii) Sends the READ signal to the memory control unit.
The memory control unit decodes the address bits and identifies the location to be accessed. Then it initiates a read operation of the memory. The memory takes some amount of time to present the contents of the location in MDR.
(iii) After a sufficient time interval, the CPU transfers the information from
For the write operation, the CPU does the following sequence:
(i) Sends the address to MAR.
(ii) Sends the data to MDR.
(iii) Sends the WRITE signal to the memory control unit.
The memory control unit decodes the address bits and identifies the location where the write operation has to be performed. It then routes the MDR contents to memory and initiates the write operation.
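The read and write sequences above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `Memory`, the helpers `cpu_read` and `cpu_write`, and the 16-word size are hypothetical names and values chosen for this example, and the final read step assumes the CPU takes the word from the MDR.

```python
class Memory:
    """Toy word-addressable memory with MAR and MDR (illustrative only)."""

    def __init__(self, num_words):
        self.cells = [0] * num_words  # memory cells
        self.mar = 0                  # Memory Address Register
        self.mdr = 0                  # Memory Data Register

    def control(self, signal):
        """Memory control logic: decode the address in MAR, then read or write."""
        if not 0 <= self.mar < len(self.cells):
            raise IndexError("address out of range")
        if signal == "READ":
            self.mdr = self.cells[self.mar]   # present the contents in MDR
        elif signal == "WRITE":
            self.cells[self.mar] = self.mdr   # route MDR contents to memory
        else:
            raise ValueError("unknown control signal")


def cpu_write(mem, address, data):
    mem.mar = address      # (i) send address to MAR
    mem.mdr = data         # (ii) send data to MDR
    mem.control("WRITE")   # (iii) send WRITE signal to the control unit


def cpu_read(mem, address):
    mem.mar = address      # (i) send address to MAR
    mem.control("READ")    # (ii) send READ signal to the control unit
    return mem.mdr         # (iii) assumed: CPU takes the word from MDR


mem = Memory(16)
cpu_write(mem, 3, 42)
value = cpu_read(mem, 3)
print(value)
```

A real memory adds timing (access time and cycle time), which this sketch deliberately leaves out.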
Memory Access Time: The time taken by the memory to supply the contents of a location, from the time it receives "Read", is called the memory access time. Core memory: 800 ns; semiconductor memory: 100 ns.
Memory Cycle Time: The memory access time plus the additional recovery time (memory is busy due to internal operation) is known as the memory cycle time.
Auxiliary Memory
1. Floppy disk drive 2. Hard disk drive 3. Magnetic tape drive 4. CD-ROM.
Input / Output Units
Common input units are keyboard, floppy disk, hard disk, magnetic tape, mouse, light pen, scanner, optical disk, etc.
Common output units are display terminal, printer, plotter, floppy disk drive, hard disk drive, magnetic tape drive and optical disk drive, etc.
CLASSES OF COMPUTERS
PERSONAL MOBILE DEVICES (PMD)
• Collection of wireless devices with multimedia user interfaces: cell phones and tablet computers.
• Price of a system is $100-$1000 and price of a microprocessor is $10-$100.
• Energy and size requirements lead to the use of flash memory for storage instead of magnetic disks.
• Responsiveness and predictability are key characteristics for media applications.
• For example, when playing a video on a PMD, the time to process each video frame is limited, since the processor must access and process the next frame shortly.
• The memory can be a substantial portion of the system cost
DESKTOP COMPUTING
• Spans from low-end netbooks that sell for $300 to high-end, heavily configured workstations that may sell for $2500.
• Since 2008, more than half of the desktop computers made each year have been battery-operated laptop computers.
• The combination of performance (measured in terms of compute performance and graphics performance) and price of a system drives this market; as a result, the newest high-performance microprocessors and cost-reduced microprocessors often appear in desktop systems.
• Desktop computing tends to be reasonably well characterized in terms of applications and
SERVERS:
• The role of servers grew to provide larger-scale and more reliable file and computing services.
• For servers, different characteristics are important. First, availability is critical.
• Consider the servers running ATM machines for banks or airline reservation systems.
• Failure of such a server system is far more catastrophic than failure of a single desktop, since these servers must operate seven days a week, 24 hours a day.
• The second key feature of server systems is scalability. The ability to scale up the computing capacity, the memory, the storage and the I/O bandwidth of a server is crucial.
• Servers are designed for efficient throughput. The overall performance of the server is measured in terms of transactions per minute or web pages served per second.
• Overall efficiency and cost effectiveness of a server is determined by how
CLUSTERS/WAREHOUSE SCALE COMPUTERS
• Growth of Software as a Service (SaaS) for applications like search, social networking, video sharing, multiplayer games and online shopping has led to the growth of a class of computers called clusters.
• Clusters are collections of desktop computers or servers connected by a LAN to act as a single larger computer.
• Each node runs its own operating system, and nodes communicate using a networking protocol.
• The largest of the clusters are called Warehouse Scale Computers (WSC), designed so that tens of thousands of servers can act as one.
• Price-performance and power are critical to WSCs
• 80% of the cost of a $90M warehouse is associated with power and cooling of the computers inside.
• Networking gear costs another $70M and must be replaced every few years.
• WSCs are related to servers, in that availability is critical.
• For example, Amazon.com had $13 billion in sales in the fourth quarter of 2010. As there are about 2200 hours in a quarter, the average revenue per hour was almost $6M. During a peak hour for Christmas shopping, the potential loss would be many times higher.
• Supercomputers are related to WSCs: they are equally expensive, costing hundreds of millions of dollars, but differ by emphasizing floating-point performance.
TRENDS IN TECHNOLOGY
INTEGRATED CIRCUIT LOGIC TECHNOLOGY
• Transistor density increases by about 35% per year.
• Die size increases range from 10% to 20% per year.
• The combined effect is a growth in transistor count on a chip of about 40% to 55% per year, or doubling every 18 to 24 months.
• This trend is popularly known as Moore's law.
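As a rough check of the growth figures above, a compound annual growth rate can be converted into a doubling time. The sketch below is only illustrative; the helper name `doubling_time_months` is an assumption made for this example.

```python
import math


def doubling_time_months(annual_growth):
    """Months needed to double at a compound annual growth rate (0.40 means 40%)."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


slow = doubling_time_months(0.40)  # lower end of the quoted growth range
fast = doubling_time_months(0.55)  # upper end of the quoted growth range
print(f"40%/year doubles in about {slow:.1f} months")
print(f"55%/year doubles in about {fast:.1f} months")
```

The results land close to the rounded "18 to 24 months" quoted on the slide, which suggests that range is a convenient approximation rather than an exact conversion.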
[Figure: transistors per chip (1,000 to 1,000,000,000) versus year (1970 to 2000) for Intel processors from the 4004 to the Pentium 4.]
SEMICONDUCTOR DRAM
• Capacity per DRAM chip has increased by about 25% to 40% per year recently, doubling roughly every two to three years.
Year    DRAM growth rate    Characterization of impact on DRAM capacity
1990    60%/year            Quadrupling every 3 years
1996    60%/year            Quadrupling every 3 years
2003    40%-60%/year        Quadrupling every 3 to 4 years
2007    40%/year            Doubling every 2 years
SEMICONDUCTOR FLASH (Electrically Erasable Programmable Read-Only Memory)
• Non-volatile semiconductor memory; the standard storage device in PMDs.
• Capacity per Flash chip has increased by about 50% to 60% per year recently, doubling roughly every two years.
• Flash memory is 15 to 20 times cheaper per bit than DRAM.
MAGNETIC DISK TECHNOLOGY
• Prior to 1990, density increased by about 30% per year, doubling in 3 years. It increased by 100% per year in 1996. Since 2004, it has dropped back to 40% per year.
• Disks are 15 to 25 times cheaper per bit than Flash.
• Disks are 300 to 500 times cheaper per bit than DRAM.
• This technology is central to server and warehouse scale storage.
NETWORK TECHNOLOGY
• Network performance depends both on the performance of
PERFORMANCE TRENDS
BANDWIDTH OR THROUGHPUT
• It is the total amount of work done in a given time, such as megabytes per second for a disk transfer.
LATENCY OR RESPONSE TIME
• It is the time between the start and completion of an event, such as milliseconds for a disk access.
TRENDS IN POWER AND ENERGY IN IC
• For CMOS chips, the primary energy consumption has been in switching transistors, also called dynamic energy.
• The energy required per transistor is proportional to the product of the capacitive load driven by the transistor and the square of the voltage:
  Energy_dynamic ∝ Capacitive load × Voltage^2
• This is the energy of a pulse of the logic transition 0→1→0 or 1→0→1.
• The energy of a single transition (0→1 or 1→0) is then
  Energy_dynamic ∝ 1/2 × Capacitive load × Voltage^2
• The power required per transistor is the product of the energy of a transition and the frequency of transitions:
  Power_dynamic ∝ 1/2 × Capacitive load × Voltage^2 × Frequency switched
• Dynamic power and energy are greatly reduced by lowering the voltage. Voltages have dropped from 5 V to just under 1 V in 20 years.
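To make the effect of voltage concrete, the proportionalities above can be evaluated for the 5 V and 1 V cases. This is a hedged sketch: the function names are hypothetical, the proportionality constant is taken as 1, and the 1 nF load and 1 GHz switching frequency are made-up illustrative values, not figures from the slides.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy of a single 0->1 or 1->0 transition (proportionality constant taken as 1)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Dynamic power: energy per transition times the switching frequency."""
    return dynamic_energy(capacitive_load, voltage) * frequency_switched


# Assumed, illustrative values only.
load_farads = 1e-9
frequency_hz = 1e9

power_at_5v = dynamic_power(load_farads, 5.0, frequency_hz)
power_at_1v = dynamic_power(load_farads, 1.0, frequency_hz)
ratio = power_at_5v / power_at_1v
print(f"Power at 5 V is about {ratio:.0f} times the power at 1 V")
```

Because voltage enters squared, the load and frequency cancel in the ratio: dropping from 5 V to 1 V cuts dynamic power by a factor of about 25 when everything else is held constant.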
Do nothing well
• Most microprocessors today turn off the clock of inactive modules to save energy and dynamic power. For example, if no floating-point instructions are executing, the clock of the floating-point unit is disabled. If some cores are idle, their clocks are stopped.
Dynamic Voltage-Frequency Scaling (DVFS)
• PMDs, laptops and servers have periods of low activity where there is no need to operate at the highest clock frequency and voltage.
• Modern microprocessors offer a few clock frequencies and voltages at which to operate with lower power and energy.
• Power savings via DVFS: a server may be operated at 3 different clock rates: 2.4 GHz, 1.8 GHz and 1 GHz.
Design for typical case
• PMDs and laptops are often idle; memory and storage offer low-power modes to save energy and extend battery lifetime.
• On-chip temperature sensors to detect when activity should be
Overclocking
• Intel offered Turbo mode in 2008: the chip decides it is safe to run at a higher clock rate for a short time, possibly on just a few cores, until the temperature starts to rise.
• For single-threaded code, these microprocessors can turn off all cores but one and run it at an even higher clock rate.
• The operating system can turn off Turbo mode, but there is no notification once it is enabled, so programs may vary in performance due to room temperature.
• Power_static ∝ Current_static × Voltage
• Static power is proportional to the number of devices. Increasing the number of transistors increases power even if they are idle.
• SRAM caches need power to maintain the stored values.
• The processor is only a portion of the whole energy cost of a system, so it can pay to use a faster, less energy-efficient processor to allow the rest of the system to go into a sleep mode. This strategy is known as race-to-halt.
TRENDS IN COST
Cost of an IC
• Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
• Cost of die = Cost of wafer / (Dies per wafer × Die yield)
• Dies per wafer = π × (Wafer diameter / 2)^2 / Die area - π × Wafer diameter / sqrt(2 × Die area)
• Problem 1
Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side and for a die that is 1.0 cm on a side.
• Dies per wafer for the 1.5 cm die (Die area = 1.5 × 1.5 = 2.25 cm^2) = 270
• Dies per wafer for the 1.0 cm die (Die area = 1.0 × 1.0 = 1 cm^2) = 640
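The arithmetic in Problem 1 can be checked with a short script. This is a minimal sketch of the dies-per-wafer and cost-of-die formulas above; the function names are assumptions made for this example.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area divided by die area, minus a term for partial dies lost at the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, die_yield):
    """Cost of die = cost of wafer / (dies per wafer x die yield)."""
    return wafer_cost / (dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield)


large_die = round(dies_per_wafer(30, 1.5 * 1.5))  # 1.5 cm on a side
small_die = round(dies_per_wafer(30, 1.0 * 1.0))  # 1.0 cm on a side
print(large_die, small_die)  # prints: 270 640
```

Shrinking the die side from 1.5 cm to 1.0 cm more than doubles the number of dies, which is why die area has such a strong effect on cost.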
DEPENDABILITY
• Dependability is a measure of system availability, reliability and maintainability.
• Infrastructure providers started offering Service Level Agreements (SLA) to guarantee that their networking or power service would be dependable.
• For example, they would pay the customer a penalty if they did not meet an agreement for more than some hours per month.
Two main measures of dependability
Module Reliability
• Mean Time To Failure (MTTF) is a reliability measure. The reciprocal of MTTF is a rate of failures.
• Service interruption is measured as Mean Time To Repair (MTTR).
• Mean Time Between Failures (MTBF) = MTTF + MTTR.
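The reliability measures above translate directly into a few one-line helpers. This is a sketch under assumptions: the MTTF and MTTR values are hypothetical, and the `availability` helper uses the standard definition MTTF / (MTTF + MTTR), which the slides name as a dependability aspect but do not define.

```python
def failure_rate(mttf_hours):
    """Rate of failures: the reciprocal of MTTF (failures per hour)."""
    return 1.0 / mttf_hours


def mtbf(mttf_hours, mttr_hours):
    """Mean Time Between Failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Assumed standard definition: fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module: fails on average every 1,000,000 hours, repaired in 24 hours.
module_mttf = 1_000_000
module_mttr = 24

print(failure_rate(module_mttf))
print(mtbf(module_mttf, module_mttr))
print(f"{availability(module_mttf, module_mttr):.6f}")
```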
MEASURING, REPORTING AND SUMMARIZING PERFORMANCE
• An Amazon.com administrator may say a computer is faster when it completes more transactions per hour.
• The computer user is interested in reducing response time: the time between the start and the completion of an event, also referred to as execution time.
• The operator of a warehouse scale computer may be interested in increasing throughput: the total amount of work done in a given time.
• We often want to relate the performance of two different computers, say X and Y. The phrase "X is faster than Y" means that the response or execution time is lower on X than on Y for the given task. In particular, "X is n times faster than Y" means
  Execution time_Y / Execution time_X = n
• Since execution time is the reciprocal of performance,
  n = Execution time_Y / Execution time_X = Performance_X / Performance_Y
QUANTITATIVE PRINCIPLES OF COMPUTER DESIGN
PRINCIPLE OF LOCALITY
• Programs tend to reuse data and instructions they have used recently. The principle of locality also applies to data accesses, though not as strongly as to code accesses.
• Two different types of locality have been observed.
• Temporal Locality: Recently accessed items are likely to be accessed in the near future.
• Spatial Locality: Items whose addresses are near one another tend to be referenced close together in time.
AMDAHL'S LAW
• It states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
• Speedup = Performance for entire task using the enhancement when possible / Performance for entire task without using the enhancement
Alternatively,
• Speedup = Execution time for entire task without using the enhancement / Execution time for entire task using the enhancement when possible
Execution time_new = Execution time_old × ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
PROBLEM 2
• Suppose that we want to enhance the processor used for web serving. The new processor is 10 times faster on computation in the web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
Fraction_enhanced = 0.4
Speedup_enhanced = 10
Speedup_overall = 1 / (0.6 + 0.4 / 10) = 1 / 0.64 ≈ 1.56
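Amdahl's law is easy to explore numerically. The sketch below evaluates the overall-speedup formula for the values in Problem 2 and also for an infinitely fast enhancement, to show the ceiling imposed by the unenhanced fraction. The function name `amdahl_speedup` is an assumption made for this example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time benefits from an enhancement."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Problem 2: computation is 40% of the time and becomes 10 times faster.
overall = amdahl_speedup(0.4, 10)

# Upper bound: even an infinitely fast processor cannot remove the 60% I/O wait.
ceiling = amdahl_speedup(0.4, float("inf"))

print(f"overall speedup: {overall:.4f}")
print(f"ceiling with infinite speedup: {ceiling:.4f}")
```

The gap between the two numbers is small, which is the practical message of the law: once the enhanced fraction is fast enough, further improving it yields little.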
THE PROCESSOR PERFORMANCE EQUATION
• Essentially all computers are constructed using a clock running at a constant rate. These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles or clock cycles.
• CPU time = CPU clock cycles for a program × Clock cycle time
  (or)
  CPU time = CPU clock cycles for a program / Clock rate
• Clock cycles Per Instruction (CPI):
  CPI = CPU clock cycles for a program / Instruction count
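Combining the two relations above gives CPU time = Instruction count × CPI / Clock rate. The sketch below illustrates that arithmetic; the function names and the program figures (2 billion instructions, CPI of 1.5, 3 GHz clock) are hypothetical values chosen so the result is easy to verify by hand.

```python
def cpi(cpu_clock_cycles, instruction_count):
    """Average clock cycles per instruction."""
    return cpu_clock_cycles / instruction_count


def cpu_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time in seconds = clock cycles for the program / clock rate."""
    cpu_clock_cycles = instruction_count * cycles_per_instruction
    return cpu_clock_cycles / clock_rate_hz


# Hypothetical program and machine.
instructions = 2_000_000_000
clock_rate = 3_000_000_000  # 3 GHz

seconds = cpu_time(instructions, 1.5, clock_rate)
print(f"CPU time: {seconds:.2f} s")
print(f"CPI recovered from cycle count: {cpi(3_000_000_000, instructions):.2f}")
```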
CLASSES OF PARALLELISM
• Parallelism is the driving force of computer design, with energy and cost being the primary design constraints.
• There are basically two kinds of parallelism in applications.
1. Data-Level Parallelism (DLP) arises because there are many data items that can be operated on at the same time.
2. Task-Level Parallelism (TLP) arises because tasks of work are created that can operate independently and largely in parallel.
Computer hardware can exploit these two kinds of application parallelism in four major ways.
1. Instruction-Level Parallelism (ILP)
• Exploits DLP with the help of the compiler.
• All processors since about 1985 use pipelining to overlap the execution of instructions and improve performance.
• This potential overlap among instructions is called Instruction Level Parallelism.
• The instructions can be evaluated in parallel.
2. Vector Architectures and Graphics Processor Units (GPU)
Exploit DLP by applying a single instruction to a collection of data in parallel.
3. Thread-Level Parallelism
Exploits either DLP or TLP in a tightly coupled hardware module that allows for interaction among parallel threads.
4. Request-Level Parallelism
Exploits parallelism among largely decoupled tasks specified by the programmer or the operating system.
• Michael Flynn placed all computers into one of four categories:
1. Single Instruction stream, Single Data stream (SISD):
• Uniprocessor category.
• Standard sequential computer, but it can exploit ILP.
• SISD architectures can use ILP techniques such as superscalar execution.
2. Single Instruction stream, Multiple Data streams (SIMD):
• In a SIMD machine, the same instruction is executed by multiple processors using different data streams.
• Each processor has its own data memory, but there is only one instruction memory and control processor, which fetches and dispatches instructions.
• It exploits DLP by applying the same operations to
3. Multiple Instruction streams, Single Data stream (MISD):
• No commercial multiprocessor of this type has been built to date.
4. Multiple Instruction streams, Multiple Data streams (MIMD):
• Each processor fetches its own instructions and operates on its own data.
• These processors either use a centralized shared-memory architecture, or each has its own memory and they communicate with each other through crossbar networks.
SIMD processors can exploit data parallelism, but are not as flexible as MIMD processors. They are suitable for algorithms with high data parallelism and little data-dependent control flow.
MIMD processors are more flexible: they can function either as single-user machines, focusing on high performance for one particular application, or as multiprogrammed machines running many tasks simultaneously.
However, they are much more expensive and complicated due to replication of control hardware, high instruction bandwidth requirements and synchronization of the data path.
Besides pure SIMD and MIMD approaches, a combination of both is also possible, exploiting the advantages of both SIMD and MIMD architectures.
Tightly coupled MIMD architectures exploit TLP, since multiple cooperating threads operate in parallel.
Loosely coupled MIMD architectures (clusters and WSCs) exploit RLP, where many independent tasks can proceed in parallel with little need for communication and synchronization.
MULTITHREADING
• Multithreading: Simultaneous execution of two or more threads by multiple processors.
• On a single processor, multithreading generally occurs by Time Division Multiplexing (TDM). The processor switches between different threads.
• On a multiprocessor, the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task.
Types
1. Coarse-grained Multithreading.
2. Fine-grained Multithreading.
3. Simultaneous Multithreading.
Advantages of Multithreading
1. If a thread gets a lot of cache misses, the other thread can continue, taking advantage of unused computing resources, which thus can lead to faster overall execution, as these resources would have been idle if only a single thread was
Disadvantages
1. Multiple threads can interfere with each other when sharing hardware resources such as caches or TLBs (translation lookaside buffers).
2. The execution time of a single thread is not improved, due to slower frequency or additional pipeline stages that are necessary to accommodate thread-switching H/W.
3. Requires more changes to both application programs and the OS than multiprocessing.
Coarse-grained Multithreading
• Also known as block or cooperative multithreading.
• Simplest type of multithreading; it occurs when one thread runs until it is blocked by an event that normally would create a long-latency stall.
• Such a stall might be a cache miss that has to access off-chip memory, which might take a huge number of CPU cycles for the data to return.
Fine-grained Multithreading
• Its purpose is to remove all dependency stalls from the execution pipeline.
• Since one thread is relatively independent from other threads, there is less chance of one instruction in one pipeline stage needing an output from an older instruction in the pipeline.
Hardware Cost
• It has the additional cost of each pipeline stage tracking the thread ID of the instruction it is processing.
• Since there are more threads being executed concurrently in the pipeline, demand on shared resources increases. Caches need to be larger to avoid thrashing between the different threads.
Simultaneous Multithreading