• No results found

CS211 Advanced Computer Architecture

N/A
N/A
Protected

Academic year: 2021

Share "CS211 Advanced Computer Architecture"

Copied!
45
0
0

Loading.... (view fulltext now)

Full text

(1)

CS211

Advanced Computer Architecture

L01 Introduction

Chundong Wang

September 14th, 2021

(2)

Welcome to CS211

• Advanced topics related to computer architecture

• Prerequisite

• CS110 (Computer Architecture I) of ShanghaiTech University, or equivalents

• Reference Textbook

• 计算机体系结构:量化研究方法(英文版·原书第6版). 约翰·L. 亨尼斯(John L.

Hennessy),戴维·A. 帕特森 (David A. Patterson)、机械工业出版社、2019年07月、ISBN 9787111631101

• Instructor

• Chundong Wang (王春东)

• Office: SIST 1A-504.D

• Email: wangchd

• Office hour: 10:00-12:00, every Friday in the Fall semester of 2021

• Teaching assistant (TA) of Fall 2021

• Mr. Chongnan Ye (叶崇南)

• Email: yechn

• Office hour: 19:00-21:00, every Wednesday in the Fall semester of 2021

• Online

https://toast-lab.sist.shanghaitech.edu.cn/courses/CS211@ShanghaiTech/Fall-2021/

• Gradescope, Piazza, etc.

CS211@ShanghaiTech 2

(3)

Why CS211?

•To become a computer architect

• Alumni of this class helped design your computer

• To make the frequent cases fast and the rare case correct

•To learn what is under the hood of a computer

• Innate curiosity

• To better understand when things break

• To write better code/applications

• To write better system software (OS, compiler, DB, etc.)

•Because it is intellectually fascinating!

• What is the most complex man-made single device?

(4)

Topics covered by CS211

• Microcode, instructions, ISA

• Pipeline

• Memory hierarchy

• CPU cache/main memory/disk

• In-order execution and out-of- order execution

• Speculative execution

• Branch prediction

• VLIW

• Very long instruction word

• Multi-threading

• Vectors

• Cache coherence

• Memory consistency

• Synchronization

• Virtual machines

• TEE

• Trusted Execution Environment

• Security and Privacy

CS211@ShanghaiTech 6

(5)

Evaluation

Item Percentage Note

Participation and quizzes 10% In the middle of some live lectures

Homework 15% Three or more to be assigned

Mid-term examination 15% 26th October, Tuesday (tentative)

Literature reading 15% Three (tentative) papers, details to be issued

Labs 20% Three (tentative) labs, details to be announced

Final examination 25% To be decided

(6)

Basic Rules

• No Abuse

• Abusive words or behaviours are NOT allowed.

• Be graceful and respectful.

• No Plagiarism

• Do not copy ANYTHING from ANYWHERE at ANYTIME.

• Be yourself.

CS211@ShanghaiTech 8

(7)

Policy on Assignments and Independent Work

• With the exception of laboratories and assignments that explicitly permit you to work in groups, all homework and projects are to be YOUR work and your work ALONE.

• PARTNER TEAMS MAY NOT WORK WITH OTHER PARTNER TEAMS

• You can discuss your assignments with other students, and credit will be assigned to students who help others by answering questions (participation), but we expect that what you hand in is yours.

• Level of detail allowed to discuss with other students: Concepts (Material taught in the class/ in the text book)! Pseudocode is NOT allowed!

• Use the Office Hours of the TA and the Prof. if you need help with your homework/ project!

• Rather submit an incomplete homework with maybe 0 points than risking an F!

• It is NOT acceptable to copy solutions from other students.

• You can never look at homework/ project code not by you/your team!

• You cannot give your code to anybody else –> secure your computer when not around it

• It is NOT acceptable to copy (or start your) solutions from the Web.

• It is NOT acceptable to use PUBLIC github/gitlab/gitee archives (giving your answers away)

• We have tools and methods, developed over many years, for detecting this. You WILL be caught, and the penalties WILL be severe.

• At the minimum F in the course, and a letter to your university record documenting the incidence of cheating.

• Both Giver and Receiver are equally culpable and suffer equal penalties

(8)

Tenative Timetable before 1st October, 2020

CS211@ShanghaiTech 10

Week Date Lecture Task

1 14 Sept., Tuesday

Introduction & Review

Survey

16 Sept., Thursday Lab 0 + Paper Reading 0

2 21 Sept., Tuesday

Public holiday

23 Sept., Thursday Instruction, ISA, Microcode

3 28 Sept., Tuesday

Pipeline

HW 1 issued

30 Sept., Thursday Lab 1 + Paper Reading 1

(9)

L01 Survey

https://www.wjx.cn/vj/Y5wX94T.aspx

(10)

Now, let’s dive into CA.

CS211@ShanghaiTech 12

(11)

Computer Architecture

Algorithm, e.g., KMP, Quicksort

Gates/Register-Transfer Level (RTL) Application, e.g., WeChat, Meituan

Instruction Set Architecture (ISA) Operating System/Virtual Machines

Microarchitecture

Devices

Programming Language, e.g., C/C++/Python

Circuits Physics

Copyright © 2016 Elsevier Ltd. All rights reserved.

(12)

14

Computing Devices Long Long ago

EDSAC (Electronic delay storage automatic calculator), University of Cambridge, UK, May 1949

CS211@ShanghaiTech

(13)

The 1st computer of China

http://www.ict.cas.cn/kxcb/kxtp/200909/t20090922_2514368.html

http://news.sciencenet.cn/sbhtmlnews/2019/9/349501.shtm

Computer 103 (103机)

Kick-off: On August 1st 1958, four instructions were executed on Computer 103 which was with 1KB memory and 30 instructions per second.

Products: In all, 41 machines were manufactured.

The remaining intact one: Fudan University (复旦大学) purchased it in 1964. In 1973, Fudan donated it to Qufu Normal University (曲阜师范大学). It was delivered using two trucks with three instructors accompanied. Qufu Normal University built a two-story building for it.

(14)

16

Computing Devices Now

Robots

Supercomputers Automobiles

Laptops

Set-top boxes

Smart phones

Servers Media

Players Sensor Nets

Routers Cameras

Games

CS211@ShanghaiTech

(15)

Cost of software development makes compatibility a major force in market

Architecture continually changing

Applications Technology Applications

suggest how to improve

technology, provide

revenue to fund

development

Improved

technologies

make new

applications

possible

(16)

Moore’s Law

18

Gordon Moore Intel Cofounder

Predicts:

2X Transistors / chip every 2 years

"The number of transistors incorporated in a chip will approximately double every 24 months."—Gordon Moore, Intel co-founder

(17)

Single-Thread Processor Performance

[ Hennessy & Patterson, 2019 ]

Growth in processor performance over 40 years

(18)

1 10 100

1999 2003 2006 2008 2011 2013 2014 2015 2016 2017

DR AM Im pr ov em en t ( lo g)

Capacity Bandwidth Latency

DRAM Capacity, Bandwidth, Latency Trends

128x

20x

1.3x

(19)

Upheaval in Computer Design

• Most of last 50 years, Moore’s Law ruled

• Technology scaling allowed continual performance/energy improvements without changing software model

• Last decade, technology scaling slowed/stopped

• Dennard (voltage) scaling over (supply voltage ~fixed)

• Moore’s Law (cost/transistor) over?

• No competitive replacement for CMOS anytime soon

• Energy efficiency constrains everything

• No “free lunch” for software developers, must consider:

• Parallel systems

• e.g., AMD released their first dual-core processor, the Athlon 64 X2 3800+ (2.0 GHz, 512 KB L2 cache per core), on April 21, 2005.

• Heterogeneous systems

• e.g., ARM’s big.LITTLE

• ”LITTLE” processors (e.g., Cortex-A53) are designed for maximum power efficiency while ”big”

processors (e.g., Cortex-A73) are designed to provide maximum compute performance.

CS211@ShanghaiTech

(20)

Today’s Dominant Target Systems

• Mobile (smartphone/tablet)

• >1 billion sold/year

• Market dominated by ARM-ISA-compatible general-purpose processor in system-on-a-chip (SoC)

• Plus sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)

• Warehouse-Scale Computers (WSCs)

• 100,000’s cores per warehouse

• Market dominated by x86-compatible server chips

• Dedicated apps, plus cloud hosting of virtual machines

• Now seeing increasing use of GPUs, FPGAs, custom hardware to accelerate workloads

• Embedded computing

• Wired/wireless network infrastructure, printers

• Consumer TV/Music/Games/Automotive/Camera/MP3

• Internet of Things!

22 CS211@ShanghaiTech

(21)

A toy example to review CA

(22)

What are not covered by CA?

CS211@ShanghaiTech 24

Programming, and

language design Compiling

Data structure and Algorithm

Process scheduling

File operations

(23)

What are covered by CA?

Instructions and micro-codes Instruction execution:

pipeline, in-order or out-of- order, speculation, etc.

Memory hierarchy: cache, main memory, disk, etc.

Exception, interrupt, etc.

Single-threaded or multi-threaded I/O

(24)

Conventional Machine Structure

26

CA

I/O system Processor

Compiler

Operating System (Mac OSX) Application (ex: browser)

Digital Design Circuit Design

Instruction Set Architecture Datapath & Control

transistors

Memory

Hardware

Software

Assembler

Today: ISA, Pipeline, etc.

Next: memory hierarchy, I/O, threading, WSC, etc.

(25)

Instruction Set Architecture

(26)

ISA: instruction set architecture

9/14/2021 28

instruction set software

hardware

ISA is the actual programmer-visible instruction set, a critical interface/boundary/contract between software and hardware

An instruction is a single operation, with an opcode and zero or more operands, of a processor, defined by the processor instruction set.

(27)

An example, add of RISC-V

add rd, rs1, rs2

add: opcode

rd, rs1, rs2: registers as operands

[rd]  [rs1] + [rs2]

• No memory access

• 32-bit long

0000000 rs2 rs1 000 rd 0110011

Reg-Reg OP rd

add rs2 rs1 add

7 5 5 3 5 7

31 25 24 20 19 1514 12 11 76 0

funct7 rs2 rs1 funct3 rd opcode

(28)

7 Dimensions of ISA

• D1: Class of ISA

• General-purpose register architectures

• Operands are either registers or memory locations

• Two versions

• Register-memory ISA, e.g., 80x86, in which many instructions can access memory

• Load-store ISA, e.g., ARMv8 and RISC-V, in which only load or store can access memory

• D2: Memory addressing

• Byte-level addressing to access memory operands.

• D3: Addressing modes

• How memory addresses are interpreted and specified

• Basic: Register, Immediate (constant), Displacement (constant + register)

• RISC-V supports these three modes

• Other ISAs have more variations of displacement

• e.g., 80x86 has three variations of displacementCS211@ShanghaiTech 30

1. 80x86 has 16 general-purpose registers and 16 registers for floating-point data.

2. RISC-V has 32 general-purpose registers and 32 registers for floating-point data

ARMv8 demands address

alignment in load/store. An access to an object of size 𝑠𝑠 bytes at byte address 𝐴𝐴 is aligned if 𝐴𝐴 % 𝑠𝑠 = 0.

80x86 and RISC-V do not require such alignment.

(29)

7 Dimensions of ISA (cont.)

• D4: Types and sizes of operands

• 8-bit (character), 16-bit (half word), 32-bit (word), and 64-bit (double word)

• IEEE 754 floating point in 32-bit (single precision) and 64-bit (double precision)

• D5: Operations

• Data transfer, arithmetic, control, floating point

• Floating point operations are complicated

• Versus fixed point data

• Using binary (0/1) to represent real numbers is inexact as precision matters!!!

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

https://software.intel.com/sites/default/files/article/164389/fp-consistency-102511.pdf

Data transfer

Arithmetic

Control

(30)

7 Dimensions of ISA (cont.)

• D6: Control flow instructions

• Conditional branch

• Unconditional jump

• Procedure call

• Return

• D7: Encoding an ISA

• Fixed length or variable length

• ARMv8 and RISC-V instructions are 32-bit long

• 80x86 encoding is variable length, ranging from 1 to 18 bytes

CS211@ShanghaiTech 33

(31)

Implementing the add instruction of RISC-V

add rd, rs1, rs2

• Instruction makes two changes to machine’s state:

Reg[rd] = Reg[rs1] + Reg[rs2]

PC = PC + 4 %%% To get next instruction

0000000 rs2 rs1 000 rd 0110011

Reg-Reg OP rd

add rs2 rs1 add

7 5 5 3 5 7

31 25 24 20 19 1514 12 11 76 0

funct7 rs2 rs1 funct3 rd opcode

(32)

Datapath for add

+4

Add

clk

addr inst

IMEM

PC pc+4

Inst[24:20] ALU+

clk Reg [ ] Inst[19:15]

Inst[11:7]

AddrB

AddrA DataA DataB AddrD

DataD

alu Reg[rs1]

Reg[rs2]

Inst[31:0]

Control logic

RegWriteEnable (RegWEn)

=1

add 5 5 add 5 Reg-Reg OP

31 25 24 20 19 1514 12 11 76 0

0000000 rs2 rs1 000 rd opcode

Reg[rd] = Reg[rs1] + Reg[rs2]

PC = PC + 4

35

(33)

From architecture to microarchitecture

• Instructions are visible to programmers

• Two processors may have the same ISA but different microarchitectures

• e.g., AMD Opteron and Intel Core i7, with the same 80x86 ISA, have very different pipelines and cache organizations.

• An instruction is partitioned into multiple stages

• Five classic stages of executing an instruction

• Instruction fetch

• Instruction decode/register fetch

• Execute

• Memory access

• Write back

(34)

Stages of Execution on Datapath

ins truc tion me mo ry

+4

rs2 rs1 rd

regi st er s

ALU

D at a me mo ry

imm

1. Instruction

Fetch 2. Decode/

Register Read

3. Execute 4. Memory 5. Write Back

PC

37

(35)

Pipeline

(36)

Instruction execution one by one

CS211@ShanghaiTech 39

Inst. No. 1 2 3 4 5 6 7 8 9 10 11 12 13

i (add) F D X M W

i+1 F D X

i+2 F D X M

i+3 F

i+4

Low utilization of hardware, and long execution time

(37)

Idle hardware

ins truc tion me mo ry

+4

rs2 rs1 rd

regi st er s

ALU

D at a me mo ry

imm

1. Instruction

Fetch 2. Decode/

Register Read

3. Execute 4. Memory 5. Write Back

PC

(38)

Pipeline: instruction-level parallelism

CS211@ShanghaiTech 41

Inst. No. 1 2 3 4 5 6 7 8 9 10 11 12 13

i F D X M W

i+1 F D X

i+2 F D X M

i+3 F D X M W

i+4 F D X M W

(39)

Well, an imperfect world

• Hazards

• A situation that prevents starting the next instruction in the next clock cycle

• A hazard may stall the pipeline

• Structural hazard

• A required hardware resource is busy

• Data hazard

• An instruction depends on the result(s) of a previous instruction

• Control hazard

• The pipelining of braches and other instructions that change the PC

(40)

Structural Hazard

add t0, t1, t2 lw t5, 0(t4) sll t6, t0, t3

instruction sequence

sw t0, 4(t3) lw t0, 8(t3)

• Instruction and data memory used simultaneously

 Use two separate memories

43

A required hardware resource is busy

(41)

Data hazard

An instruction depends on the result(s) of a previous instruction

add t0, t1, t2 lw t5, 0(t4) sll t6, t0, t3

instruction sequence

sw t0, 4(t3) lw t0, 8(t3)

 Solution 1: stall the pipeline

 Solution 2: register  register, fast enough to do write then read in one clock cycle?

 Solution 3: forwarding

(42)

Control Hazard

• Branches

• We need to calculate next PC

• Exception

• Something unusual that breaks the normal flow of instruction execution

• Exception handling

CS211@ShanghaiTech 45

Pipeline may stall due to branch

Jump to next instruction, with

a new PC

“assert” may terminate the program

(43)

Control Hazard

• Branches

• We need to calculate next PC

• Exception

• Something unusual that breaks the normal flow of instruction execution

• Exception handling, for synchronous and asynchronous events

• Jump to exception handler program, and return back if possible

PC

MemInst.

D

Decode

E + M

MemData

W

Illegal

Opcode Overflow Data address

Exceptions PC address

Exception

Asynchronous Interrupts

(44)

Conclusion

• The domain of CA

• Many interesting topics

• CA is a set of rules and methods that describe the functionality, organization, and implementation of computer systems (from wikipedia)

• CA describes the capabilities and programming model of a computer but not a particular implementation (also from wikipedia)

• The evolution of CA

• New challenges emerge from time to time

• Review of ISA and Pipeline

CS211@ShanghaiTech 48

(45)

Acknowledgements

• These slides contain materials developed and copyright by:

• Prof. Krste Asanovic (UC Berkeley)

• Prof. Xuehai Zhou (USTC)

• Prof. Onur Mutlu (ETH Zurich)

• Prof. Mikko Lipasti (UW-Madison)

• Prof. Sören Schwertfeger (ShanghaiTech)

• Prof. Kenji Kise (Tokyo Tech)

• Prof. Wenzhi Chen (ZJU)

References

Related documents

Abstracts should be submitted electronically to ECS headquarters, and questions and inquiries should be sent to the symposium organizers: Pe- ter Hesketh , Georgia Institute

Potential Severe Liver Injury Signal in Rivaroxaban Trials.. Cardiovascular and

#3168 Analogies for Critical Thinking 14 ©Teacher Created Resources Past and Present 2.. Directions: These analogies are based on past and present

If all attempts to establish radio contact with a ground station fail, the pilot of an airplane shall transmit messages preceded by the phrase:?.

Our primary finding was that skilled deaf readers had an enhanced perceptual span in comparison to the matched hearing control group: reading rate for the SKD group reached asymptote

ЗАПОРІЗЬКИЙ НАЦІОНАЛЬНИЙ ТЕХНІЧНИЙ УНІВЕРСИТЕТ Факультет радіоелектроніки і телекомунікацій Захист інформації Пояснювальна записка до

Located in the "Zona Hotelera" of Los Cabos, the Santa María Hotel & Suites is just 10 minutes from the International Airport, 5 minutes away from San José del