(1)

A Scalable VISC™ Processor Platform for Modern Client and Cloud Workloads

Mohammad Abdallah

Founder, President and CTO

Soft Machines

Linley Processor Conference

October 7, 2015

(2)

Soft Machines™ Background

Soft Machines VISC™ Architecture

Roadmap

Shasta VISC Processor

Mojave VISC SoC

Summary

(3)

• Introduced the Soft Machines VISC™ Architecture in October 2014
  - 2-3x IPC speedup for up to 4x Perf/Watt, portable to all CPU ISAs
  - Working 28nm VISC CPU and SoC prototype

• Developing VISC Architecture processors and SoCs
  - Customized to guest ISA and interface, processor configuration, and SoC features
  - CPU/SoC licensing, co-development, and technology licensing

• Today we will preview Shasta and Mojave
  - Shasta VISC Processor delivers server-class performance at mobile power
  - Mojave VISC SoC platform scalable from smart mobile to servers
  - To be announced in 2016

(4)

Soft Machines

(5)

VISC™ Architecture

[Figure: VISC architecture block diagram. Software side: guest sequential code (single thread, guest ISA) with OS & hypervisor, above the VISC virtual SW layer, which presents the virtual ISA. Hardware side: a global front end and virtual HW threadlets, VISC Virtual Cores 1-4 over physical Cores 1-4, each core with its own L1 D$, and a shared L2$ & memory.]

(6)

VISC™ Virtual Cores Dynamically Load Balance ST & MT Apps

• VISC dynamically allocates resources across virtual cores based on individual application needs

• Performance/watt balanced for both single & multi-thread applications
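As a rough illustration of the load-balancing idea in the bullets above, the sketch below divides a fixed pool of physical-core resource slices among software threads in proportion to a per-thread demand estimate. The slice count, the demand numbers, and the `allocate_slices` function are all hypothetical; the slides do not disclose the policy VISC actually uses.

```python
def allocate_slices(demands, total_slices):
    """Split total_slices among threads in proportion to their demand.

    Hypothetical policy for illustration only: proportional shares with
    largest-remainder rounding so the slice counts always sum to the pool.
    """
    total_demand = sum(demands.values())
    if total_demand <= 0:
        return {name: 0 for name in demands}

    # Exact (fractional) share of the pool for each thread.
    exact = {name: total_slices * d / total_demand for name, d in demands.items()}
    # Start from the integer part of each share.
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_slices - sum(alloc.values())

    # Hand the remaining slices to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# One heavy single-threaded app gets the whole pool (one wide virtual core).
single = allocate_slices({"heavy": 1.0}, total_slices=8)

# A heavy and a light thread share the pool unevenly (two virtual cores).
dual = allocate_slices({"heavy": 3.0, "light": 1.0}, total_slices=8)

print(single)
print(dual)
```

The point of the sketch is only that the same physical pool can back one wide virtual core or two unequal ones, depending on what is running.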

[Figure: two configurations of the same two physical cores (Core1 and Core2, each with an L1 D$), joined by "or". Dual SW threads: a heavy app and a light app run on Virtual Core1 and Virtual Core2. Single SW thread: one heavy app runs on a single Virtual Core1. Each configuration shows the layers "Virtual Cores" and "Virtual HW Threads/Threadlets".]

(7)

VISC™ Virtual Cores Scale Power Linearly

[Figure: power ratio versus performance ratio, with series Core 1, Virtual Core 1, and Virtual Core 2. Annotations: P ∝ V² * F; P ∝ No. of virtual core resources.]
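The two power relations on this slide can be compared numerically. The sketch below assumes, for illustration only, that a conventional core gains performance purely by raising frequency with supply voltage scaled in proportion to frequency (so P ∝ V² * F becomes P ∝ F³), while a virtual-core design gains performance by adding resources at fixed voltage and frequency, with performance taken as proportional to the resources used. Neither assumption is stated in the slides, and the function names are hypothetical.

```python
def dynamic_power_ratio(voltage_ratio, freq_ratio):
    """Dynamic power relative to baseline, using P proportional to V^2 * F."""
    return voltage_ratio ** 2 * freq_ratio


def power_ratio_frequency_scaling(perf_ratio):
    """Assumption: performance tracks frequency, and voltage tracks frequency."""
    return dynamic_power_ratio(voltage_ratio=perf_ratio, freq_ratio=perf_ratio)


def power_ratio_resource_scaling(perf_ratio):
    """Assumption: performance and power both track the resources in use."""
    return perf_ratio


for perf in (1.0, 1.5, 2.0):
    cubic = power_ratio_frequency_scaling(perf)
    linear = power_ratio_resource_scaling(perf)
    print(f"perf x{perf:.1f}: frequency scaling x{cubic:.2f}, "
          f"resource scaling x{linear:.2f}")
```

Under these assumptions, doubling performance costs 8x the power with voltage and frequency scaling but only 2x with resource scaling, which is the contrast the slide title describes.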

(8)

VISC™ Architecture Platforms

M Abdallah, 10/7/2015 ©Copyright 2015 Soft Machines Inc.

[Figure: VISC™ Architecture Processors running guest ISA code across market segments: Cloud, Networking, Mobile / Desktop, Smart Phones, and IoT Gateways / Embedded. Callouts: 3-4x IPC speedup (shown twice) and 2-3x IPC speedup.]

(9)
(10)

VISC™ Processor & SoC Roadmap

2015
- VISC Proof-of-Concept: 1VC/2C, 32 bit, 28nm

2016
- VISC™ Processors: Shasta (Mid '16)*, 1-2VC/2C or SMP 2-4VC/4C, 64 bit, 2GHz, 16nm
- VISC™ SoCs: Mojave (Mid '16)**, Shasta SMP 2-4VC/4ML2, customizable I/O features, 16nm

2017
- VISC™ Processors: Shasta+, 1-4VC, 10nm
- VISC™ SoCs: Tabernas, Shasta+ SMP, 10nm, SoC Ref Design

2018
- VISC™ Processors: Tahoe, 1-8VC, 10nm
- VISC™ SoCs: Ordos, Tahoe SMP, 10nm

*RTL available **SoC tape-out

(11)
(12)

Shasta VISC™ Processor

• Single and Dual Virtual Core configuration
  - Two physical cores act as 1 or 2 Virtual Cores
  - Virtual Cores dynamically load balance to service threads

• 64-bit ISA
  - Supports larger memory space addressing and more registers

• Support for Multiple Guest ISAs
  - Also runs native VISC Apps

• 2GHz Frequency (16FF+)
  - Up from ~500MHz prototype

• SMP configuration on top of Virtual Cores
  - Proprietary coherency protocol
  - 1 MB L2$ per physical core

• System interface unit
  - Generic high speed 256-bit read/write bus adaptable to customer specification (AMBA, OCP, CoreConnect, etc.)

[Figure: Shasta VISC™ Dual Virtual Core Processor. Blocks shown: global front end, virtual HW threads (HW threadlets), Virtual Core1 and Virtual Core2, Core1 and Core2 each with an L1 D$, and L2$ & memory.]

(13)

Shasta VISC™ Processor uArchitecture

[Figure: Shasta microarchitecture block diagram with two parallel lanes. Each lane shows: L1I$ 32KB, Fetch, BP, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, and a core (Core 1, Core 2) with EXE, RH, RF, LSQ, L1 D$ 32KB, and L2$. Both lanes are labeled TH0.]

(14)

Shasta VISC™ Processor Pipeline

[Figure: Shasta pipeline diagram with two parallel lanes. Each lane shows: L1I$ 32KB, Fetch, BP, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, and a core (Core 1, Core 2) with EXE, RH, RF, LSQ, L1 D$ 32KB, and L2$. Both lanes are labeled TH0. Pipeline depth annotations, in order: 3 Stages, 3 Stages, 6 Stages+1, 1 Stage, 1-2/4 Cycles.]

(15)

Shasta VISC™ Processor SMP

[Figure: SMP configuration of two VISC™ Dual Virtual Core Processors (Processor 0 and Processor 1) linked by coherency support. Each processor shows: global front end, virtual HW threads (HW threadlets), Virtual Core1 and Virtual Core2, Core1 and Core2 each with an L1 D$, and L2$ & memory.]

(16)

Single Thread OOO Ways Perf/Watt

[Figure: chart with a Power axis and a scale running from Mobile to Server, placing out-of-order (OOO) designs by issue width. Single core: OOO 2-Wide, 3-Wide, 5-Wide, 8-Wide. Dual core: OOO 4-Wide (2+2), 6-Wide (3+3), 10-Wide (5+5), 16-Wide (8+8).]

(17)

Shasta Delivers Server Performance at Mobile Power

[Figure: chart with axes Power and SPEC 2006 Score (geomean of int & fp), on a scale running from Mobile to Server. It places the Shasta VISC™ Processor (1VC/2C) against out-of-order (OOO) designs by issue width. Single core: OOO 2-Wide, 3-Wide, 5-Wide, 8-Wide. Dual core: OOO 4-Wide (2+2), 6-Wide (3+3), 10-Wide (5+5), 16-Wide (8+8).]

(18)
(19)

VISC™ SoC Platform

• Scalable SoC Architecture
  - Ease of adding / deleting devices in SoC
  - Robust design methodology allows specification to tape-out in < 9 months

• High Performance Low Power System
  - Focus on Memory / Interconnect performance
  - >200 GB/s coherent fabric, 40 GB/s dual channel DDR4, 200 GB/s L3
  - High bandwidth Network and Storage connectivity
  - High performance Multimedia & Graphics

• Industry Standard APIs and IP Blocks
  - OpenGL, OpenGL|ES, OpenMAX, OpenCL, AHCI SATA and XHCI USB

• Soft Machines Enhanced SoC Subsystems
  - Plug-n-play HW/SW architecture for simplified system S/W development
  - Security & Virtualization
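The 40 GB/s dual channel DDR4 figure above is close to the theoretical peak of two standard 64-bit DDR4 channels. The sketch below computes that peak as channels × transfer rate × bytes per transfer. The `peak_dram_bandwidth_gbs` helper, the 64-bit channel width, and the choice of DDR4-2400 and DDR4-3200 are illustrative assumptions; the slide does not say which speed grade or channel width its number is based on.

```python
def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel moves bus_width_bits of data per transfer,
    with no refresh, turnaround, or protocol overhead.
    """
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_second = channels * mega_transfers_per_s * 1e6 * bytes_per_transfer
    return bytes_per_second / 1e9


# Two 64-bit channels at two common DDR4 speed grades (assumed, not from the slide).
ddr4_2400_dual = peak_dram_bandwidth_gbs(channels=2, mega_transfers_per_s=2400)
ddr4_3200_dual = peak_dram_bandwidth_gbs(channels=2, mega_transfers_per_s=3200)

print(f"Dual-channel DDR4-2400 peak: {ddr4_2400_dual:.1f} GB/s")
print(f"Dual-channel DDR4-3200 peak: {ddr4_3200_dual:.1f} GB/s")
```

Sustained bandwidth in a real system is lower than these peaks, so the calculation is only a sanity check on the order of magnitude.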

(20)

Mojave VISC™ SoC

Quad VISC™ CPU: 2x Shasta Processor

Fast System Memory: 1-4 Ch. LP/DDR4 2400-3200; 1-8 MB 4-way interleaved system cache (WB/PF/DMA)

Network / Storage: 1-2 1G E-net w/TCP partial offload/SRIOV; Dual Storage SATA – 6G; Dual Flash UFS; PCIe 3.0 8 Lanes

Multimedia/Graphics: 400G–1TFLOPS, 800M-2B Tri/Sec; OpenCL 2.0, OpenGL ES 3.2; HEVC Video Enc/Dec; DTS Audio DSP

Display/Imaging: Triple 4K display outputs; Dual 20MP ISP inputs; HD Audio codec

Enterprise/Management: Trusted Platform, HW AES/DES/HMAC/SHA, Remote Management, Fine grain DVFS

Virtualization/Security: System MMU & GIC; Secure Zones: Secured Peripherals, Memory and Message Signaled Interrupts

[Figure: Mojave SoC block diagram. Blocks shown: Quad VISC™ Shasta, DRAM, OCI, System Cache, Network, Storage, GPU, Video Enc/Dec, Audio, Display, ISP, System MMU/GIC, Secure Zones, Mgmt CPU, PCIe, USB2/3.]

(21)

• VISC™ Architecture provides up to 4x Perf/Watt
  - Dynamic Virtual Cores and Threadlets provide 2-3x IPC speedup
  - Portable to all CPU ISAs
  - Applicable to a broad range of markets

• First VISC products to be announced in 2016
  - Shasta VISC Processor delivers server-class performance at mobile power
  - Mojave VISC SoC platform scalable from smart mobile to servers

• Contact Soft Machines for more information
  - Smi-info@softmachines.com
