(1)

A Scalable VISC™ Processor Platform for Modern Client and Cloud Workloads

Mohammad Abdallah

Founder, President and CTO

Soft Machines

Linley Processor Conference

October 7, 2015

(2)

Soft Machines™ Background

Soft Machines VISC™ Architecture

Roadmap

Shasta VISC Processor

Mojave VISC SoC

Summary

(3)

Introduced Soft Machines VISC™ Architecture Oct’14

2-3x IPC speedup for up to 4x Perf/Watt, portable to all CPU ISAs

Working 28nm VISC CPU and SoC prototype

Developing VISC Architecture Processors and SoCs

Customized to Guest ISA & I/F, Processor configuration, SoC features

CPU/SoC licensing, Co-development and technology licensing

Today we will preview Shasta and Mojave

Shasta VISC Processor delivers server-class performance at mobile power

Mojave VISC SoC platform scalable from smart mobile to servers

To be announced in 2016

(4)

Soft Machines

(5)

VISC™ Architecture

[Figure: VISC architecture stack. Software: guest sequential code (single thread) with OS & hypervisor, running on the VISC virtual SW layer, which sits between the guest ISA and the virtual ISA. Hardware: a global front end issues virtual HW threadlets to Virtual Core1 through Virtual Core4 (the VISC virtual cores), which map onto physical Core1 through Core4, each with its own L1 D$, above a shared L2$ & memory.]

(6)

VISC™ Virtual Cores Dynamically Load Balance ST & MT Apps

• VISC dynamically allocates resources across virtual cores based on individual application needs

• Performance/watt balanced for both single & multi-thread applications

[Figure: two configurations of a two-core VISC processor (physical Core1 and Core2, each with an L1 D$), shown side by side as alternatives. Dual SW threads: a heavy app and a light app run on Virtual Core1 and Virtual Core2, which share the virtual HW threads/threadlets of both physical cores. Single SW thread: one heavy app runs on a single Virtual Core1 that spans both physical cores.]
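
The dynamic allocation described on this slide can be pictured with a toy model. The sketch below is purely illustrative and is not Soft Machines' algorithm: it assumes a hypothetical pool of hardware resource slots per physical core and splits the pooled slots between virtual cores in proportion to each application's demand. The function `allocate_slots`, the `slots_per_core` parameter, and the demand weights are all invented for this example.

```python
def allocate_slots(demands, physical_cores=2, slots_per_core=4):
    """Split pooled core resources across virtual cores by demand.

    demands: dict mapping a virtual core name to a relative demand weight.
    Returns a dict mapping each virtual core to a whole number of slots.
    All parameters are hypothetical; the slides give no such figures.
    """
    total_slots = physical_cores * slots_per_core
    total_demand = sum(demands.values())
    if total_demand <= 0:
        raise ValueError("at least one virtual core must have positive demand")

    # Largest-remainder method: floor each proportional share, then hand
    # leftover slots to the virtual cores with the largest fractional parts.
    shares = {vc: total_slots * d / total_demand for vc, d in demands.items()}
    alloc = {vc: int(s) for vc, s in shares.items()}
    leftover = total_slots - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda vc: shares[vc] - alloc[vc], reverse=True)
    for vc in by_remainder[:leftover]:
        alloc[vc] += 1
    return alloc


# Single SW thread: one heavy app gets the resources of both physical cores.
single = allocate_slots({"VC1": 1.0})
# Dual SW threads: a heavy app and a light app share the same pool.
dual = allocate_slots({"VC1": 3.0, "VC2": 1.0})
print(single, dual)
```

In this model the single-thread case hands every slot to one virtual core, while the dual-thread case gives the heavy app the larger share, which mirrors the two configurations drawn on the slide.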

(7)

VISC™ Virtual Cores Scale Power Linearly

[Figure: power ratio (vertical axis) versus performance ratio (horizontal axis), with curves for Core 1, Virtual Core 1, and Virtual Core 2.]

P ∝ V² * F

P ∝ No. of virtual core resources
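
The two power relations on this slide can be compared numerically. The sketch below uses the standard dynamic-power approximation (power proportional to V² times F) and additionally assumes that voltage must rise in proportion to frequency, which the slide does not state; under that assumption, raising performance by clock frequency alone costs roughly cubic power. It contrasts this with the slide's claim that power grows linearly with the number of virtual core resources. The function names and the idealized perfect-scaling behavior are illustrative only.

```python
def power_ratio_frequency_scaling(perf_ratio):
    """Power ratio when performance is raised by clock frequency alone.

    Uses P proportional to V^2 * F and assumes V scales linearly with F
    (a simplifying assumption), giving P proportional to F^3.
    """
    return perf_ratio ** 3


def power_ratio_resource_scaling(perf_ratio):
    """Power ratio when performance is raised by adding core resources.

    Assumes performance and power both scale linearly with the amount of
    virtual core resources, at fixed voltage and frequency (idealized).
    """
    return perf_ratio


for perf in (1.0, 1.5, 2.0):
    by_freq = power_ratio_frequency_scaling(perf)
    by_res = power_ratio_resource_scaling(perf)
    print(f"perf x{perf:.1f}: frequency scaling x{by_freq:.2f}, "
          f"resource scaling x{by_res:.2f}")
```

Under these assumptions, doubling performance by frequency costs about eight times the power, whereas doubling it by adding resources costs about twice the power.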

(8)

VISC™ Architecture Platforms

M Abdallah, 10/7/2015 ©Copyright 2015 Soft Machines Inc.

[Figure: VISC™ Architecture Processors across market segments. Labels: Guest ISA; two VISC™ Architecture Processors tiers marked 3-4x IPC speedup and one tier marked 2-3x IPC speedup. Market segments: Cloud, Networking, Mobile / Desktop, Smart Phones, IoT Gateways / Embedded.]

(9)
(10)

VISC™ Processor & SoC Roadmap

2015: VISC Proof-of-Concept: 1VC/2C, 32 bit, 28nm

VISC™ Processors

2016: Shasta (Mid ’16)*: 1-2VC/2C or SMP 2-4VC/4C, 64 bit, 2GHz, 16nm

2017: Shasta+: 1-4VC, 10nm

2018: Tahoe: 1-8VC, 10nm

VISC™ SoCs

2016: Mojave (Mid ’16)**: Shasta SMP 2-4VC/4ML2, customizable I/O features, 16nm

2017: Tabernas: Shasta+ SMP, 10nm, SoC Ref Design

2018: Ordos: Tahoe SMP, 10nm

*RTL available **SoC tape-out

(11)
(12)

Shasta VISC™ Processor

Single and Dual Virtual Core configuration

Two physical cores act as 1 or 2 Virtual Cores

Virtual Cores dynamically load balance to service threads

64-bit ISA

Supports larger memory space addressing and more registers

Support for Multiple Guest ISAs

Also runs native VISC Apps

2GHz Frequency (16FF+)

Up from ~500MHz prototype

SMP configuration on top of Virtual Cores

Proprietary coherency protocol

1 MB L2$ per physical core

System interface unit

Generic high-speed 256-bit read/write bus adaptable to customer specification (AMBA, OCP, CoreConnect, etc.)

[Figure: Shasta VISC™ Dual Virtual Core Processor. A global front end feeds Core1 and Core2, each with an L1 D$; Virtual Core1 and Virtual Core2 run as virtual HW threads (HW threadlets) across the physical cores, above a shared L2$ & memory.]

(13)

Shasta VISC™ Processor uArchitecture

[Figure: block diagram of the two-core Shasta microarchitecture. Each of the two lanes (1 and 2) contains: L1 I$ 32KB, Fetch with BP, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, Core EXE with RH and RF, LSQ, L1 D$ 32KB, and L2$. Both lanes are shown running thread TH0.]

(14)

Shasta VISC™ Processor Pipeline

[Figure: the same two-lane block diagram as the previous slide (L1 I$ 32KB, Fetch with BP, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, Core EXE with RH and RF, LSQ, L1 D$ 32KB, L2$), annotated with pipeline depths. Annotations in diagram order: 3 Stages, 3 Stages, 6 Stages+1, 1 Stage, 1-2/4 Cycles.]

(15)

Shasta VISC™ Processor SMP

[Figure: two VISC™ Dual Virtual Core Processors (Processor 0 and Processor 1) joined by coherency support. Each processor contains a global front end, Core1 and Core2 with an L1 D$ each, Virtual Core1 and Virtual Core2, virtual HW threads (HW threadlets), and L2$ & memory.]

(16)

Single Thread OOO Ways Perf/Watt

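[Figure: power chart for out-of-order (OOO) designs, with Mobile and Server regions marked. Axis label: Power. Data points: OOO 2-Wide, OOO 3-Wide, OOO 5-Wide, OOO 8-Wide, OOO Dual Core 4-Wide (2+2), OOO Dual Core 6-Wide (3+3), OOO Dual Core 10-Wide (5+5), OOO Dual Core 16-Wide (8+8).]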

(17)

Shasta Delivers Server Performance at Mobile Power

[Figure: Power versus SPEC 2006 Score (geomean of int & fp), with Mobile and Server regions marked. Data points: OOO 2-Wide, OOO 3-Wide, OOO 5-Wide, OOO 8-Wide, OOO Dual Core 4-Wide (2+2), OOO Dual Core 6-Wide (3+3), OOO Dual Core 10-Wide (5+5), OOO Dual Core 16-Wide (8+8), and the Shasta VISC™ Processor (1VC/2C).]

(18)
(19)

VISC™ SoC Platform

Scalable SoC Architecture

Ease of adding / deleting devices in SoC

Robust design methodology allows Specification to tape out in < 9 months

High Performance Low Power System

Focus on Memory / Interconnect performance

>200GB/s coherent fabric, 40 GB/s dual channel DDR4, 200 GB/s L3

High bandwidth Network and Storage connectivity

High performance Multimedia & Graphics

Industry Standard APIs and IP Blocks

OpenGL, OpenGL ES, OpenMAX, OpenCL, AHCI SATA and xHCI USB

Soft Machines Enhanced SoC Subsystems

Plug-n-play HW/SW architecture for simplified system S/W development

Security & Virtualization
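
The 40 GB/s dual channel DDR4 figure on this slide is close to what a theoretical peak-bandwidth calculation gives. The sketch below assumes 64-bit (8-byte) channels, which the slide does not state, and uses 2400 and 3200 MT/s as example DDR4 transfer rates; `ddr_peak_bandwidth_gbs` is a name made up for this example, and sustained bandwidth in a real system would be lower than these peaks.

```python
def ddr_peak_bandwidth_gbs(transfer_rate_mts, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes).

    transfer_rate_mts: transfers per second in millions (e.g. 2400).
    bus_width_bits: per-channel data width; 64 is an assumption here.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


for rate in (2400, 3200):
    peak = ddr_peak_bandwidth_gbs(rate, channels=2)
    print(f"DDR4-{rate}, 2 channels: {peak:.1f} GB/s peak")
```

With two 64-bit channels, a 2400 MT/s rate lands just under the quoted 40 GB/s, which suggests the slide's number is a rounded peak figure for a configuration in that range.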

(20)

Mojave VISC™ SoC

Quad VISC™ CPU: 2x Shasta Processor

Fast System Memory: 1-4 Ch. LP/DDR4 2400-3200; 1-8 MB 4-way interleaved system cache (WB/PF/DMA)

Network / Storage: 1-2 1G E-net w/ TCP partial offload/SRIOV; Dual Storage SATA 6G; Dual Flash UFS; PCIe 3.0 8 Lanes

Multimedia/Graphics: 400G-1TFLOPS, 800M-2B Tri/Sec; OpenCL 2.0, OpenGL ES 3.2; HEVC Video Enc/Dec; DTS Audio DSP

Display/Imaging: Triple 4K display outputs; Dual 20MP ISP inputs; HD Audio codec

Enterprise/Management: Trusted Platform; HW AES/DES/HMAC/SHA; Remote Management; Fine grain DVFS

Virtualization/Security: System MMU & GIC; Secure Zones: Secured Peripherals, Memory and Message Signaled Interrupts

[Figure: Mojave block diagram. Blocks: Quad VISC™ Shasta, DRAM, OCI, System Cache, Network, Storage, GPU, Video Enc/Dec, Audio, Display, ISP, System MMU/GIC, Secure Zones, Mgmt CPU, PCIe, USB2/3.]
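
For a sense of scale, the I/O links listed for Mojave can be converted to usable bandwidth. The sketch below relies on the standard line rates and encodings for these interfaces (PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding, SATA 6G at 6 Gb/s with 8b/10b encoding) rather than on anything stated in the slides, and it ignores protocol overhead above the line encoding. The helper name `link_bandwidth_gbs` is invented for this example.

```python
def link_bandwidth_gbs(line_rate_gbps, payload_bits, encoded_bits, lanes=1):
    """Usable bandwidth in GB/s, per direction, after line-encoding overhead.

    line_rate_gbps: raw signaling rate per lane in Gb/s (or GT/s).
    payload_bits / encoded_bits: encoding efficiency, e.g. 128/130 or 8/10.
    """
    return line_rate_gbps * payload_bits / encoded_bits / 8 * lanes


# PCIe 3.0 with 8 lanes: 8 GT/s per lane, 128b/130b encoding.
pcie3_x8 = link_bandwidth_gbs(8.0, 128, 130, lanes=8)
# One SATA 6G port: 6 Gb/s, 8b/10b encoding.
sata_6g = link_bandwidth_gbs(6.0, 8, 10)

print(f"PCIe 3.0 x8: {pcie3_x8:.2f} GB/s per direction")
print(f"SATA 6G:     {sata_6g:.2f} GB/s per port")
```

These are upper bounds on payload bandwidth; real throughput depends on packet overhead and the attached devices.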

(21)

VISC™ Architecture provides up to 4x Perf/Watt

Dynamic Virtual Cores and Threadlets provide 2-3x IPC speedup

Portable to all CPU ISAs

Applicable to a broad range of markets

First VISC products to be announced in 2016

Shasta VISC Processor delivers server-class performance at mobile power

Mojave VISC SoC platform scalable from smart mobile to servers

Contact Soft Machines for more information

[email protected]
