A Scalable VISC™ Processor Platform
for Modern Client and Cloud Workloads
Mohammad Abdallah
Founder, President and CTO
Soft Machines
Linley Processor Conference
October 7, 2015
• Soft Machines™ Background
• Soft Machines VISC™ Architecture
• Roadmap
• Shasta VISC Processor
• Mojave VISC SoC
• Summary
• Introduced Soft Machines VISC™ Architecture Oct '14
  2-3x IPC speedup for up to 4x Perf/Watt, portable to all CPU ISAs
  Working 28nm VISC CPU and SoC prototype
• Developing VISC Architecture Processors and SoCs
  Customized to Guest ISA & I/F, Processor configuration, SoC features
  CPU/SoC licensing, Co-development and technology licensing
• Today we will preview Shasta and Mojave
  Shasta VISC Processor delivers server-class performance at mobile power
  Mojave VISC SoC platform scalable from smart mobile to servers
  To be announced in 2016
Soft Machines VISC™ Architecture
[Figure: VISC architecture layer diagram. Top: Guest Sequential Code (Single Thread) and OS & Hypervisor, expressed in the Guest ISA. Middle: the VISC SW Layer (Virtual SW layer), which presents a Virtual ISA. Bottom: VISC Virtual Cores HW, in which a Global Front End feeds Virtual HW Threadlets to Virtual Core1 through Virtual Core4, drawn over physical Core1 through Core4, each with its own L1 D$, above L2$ & Memory.]
VISC™ Virtual Cores Dynamically Load Balance ST & MT Apps
• VISC dynamically allocates resources across virtual cores based on
individual application needs
• Performance/watt balanced for both single & multi-thread applications
[Figure: two allocation scenarios on a two-core VISC processor (Core1 and Core2, each with an L1 D$), both fed by Virtual HW Threads/Threadlets. Dual SW Threads: a Heavy App and a Light App run on Virtual Core1 and Virtual Core2. Single SW Thread: a Heavy App runs on Virtual Core1 alone.]
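The slide says resources are allocated across virtual cores according to application needs, but it does not describe the policy. The following is a minimal sketch of one possible proportional policy, not Soft Machines' actual mechanism. The notion of a countable "resource unit", the `demand` field, and the `allocate` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class VirtualCore:
    name: str
    demand: float  # hypothetical relative demand of the software thread it serves


def allocate(total_units: int, vcores: list[VirtualCore]) -> dict[str, int]:
    """Split total_units across virtual cores in proportion to demand.

    Uses the largest-remainder method so the integer shares always
    sum to total_units. This policy is an assumption for illustration.
    """
    total_demand = sum(vc.demand for vc in vcores)
    if total_demand <= 0:
        raise ValueError("at least one virtual core must have positive demand")

    # Exact (fractional) share for each virtual core.
    exact = {vc.name: total_units * vc.demand / total_demand for vc in vcores}
    # Start from the integer part of each share.
    shares = {name: int(value) for name, value in exact.items()}
    leftover = total_units - sum(shares.values())

    # Hand the remaining units to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Heavy app plus light app on two virtual cores.
print(allocate(8, [VirtualCore("VC1", 3.0), VirtualCore("VC2", 1.0)]))
# Single heavy thread: one virtual core receives everything.
print(allocate(8, [VirtualCore("VC1", 1.0)]))
```

With a single software thread the one virtual core receives every unit, which mirrors the single-thread scenario on the slide; with two threads the heavier one receives the larger share.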
VISC™ Virtual Cores Scale Power Linearly
[Figure: Power Ratio versus Performance Ratio. Series: Core 1, Virtual Core 1, Virtual Core 2. Annotations: P ∝ V²·F, and P ∝ No. of virtual core resources. Axis ticks run from 0.5 to 8.5 and from 0.8 to 2.2.]
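The two power annotations on this slide can be compared numerically. The sketch below assumes that performance tracks frequency and that supply voltage must rise roughly in proportion to frequency, so P ∝ V²·F becomes P ∝ F³. Both are simplifications not stated on the slide, and the function names are hypothetical.

```python
def frequency_scaled_power_ratio(perf_ratio: float) -> float:
    """Power ratio when extra performance comes from frequency alone.

    Assumes P is proportional to V^2 * F and, as a simplification,
    V is proportional to F, so power grows with the cube of performance.
    """
    return perf_ratio ** 3


def resource_scaled_power_ratio(perf_ratio: float) -> float:
    """Power ratio when extra performance comes from adding virtual core
    resources at fixed voltage and frequency (power proportional to the
    amount of resources, as the slide annotates)."""
    return perf_ratio


for ratio in (1.0, 1.5, 2.0):
    cubic = frequency_scaled_power_ratio(ratio)
    linear = resource_scaled_power_ratio(ratio)
    print(f"perf x{ratio}: frequency scaling x{cubic}, resource scaling x{linear}")
```

Under these assumptions, doubling performance costs about 8x power through frequency scaling but about 2x through added resources, which is the contrast the chart is drawing.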
VISC™ Architecture Platforms
M Abdallah, 10/7/2015 ©Copyright 2015 Soft Machines Inc.
[Figure: VISC™ Architecture Processors beneath a common Guest ISA, targeting Cloud, Networking, Mobile / Desktop, Smart Phones, and IoT Gateways / Embedded. Speedup labels on the processor tiers: 3-4x IPC speed up, 3-4x IPC speed up, 2-3x IPC speed up.]
VISC™ Processor & SoC Roadmap
2015
  VISC Proof-of-Concept: 1VC/2C, 32 bit, 28nm
2016
  Shasta (Mid '16)* [VISC™ Processor]: 1-2VC/2C or SMP 2-4VC/4C, 64 bit, 2GHz, 16nm
  Mojave (Mid '16)** [VISC™ SoC]: Shasta SMP 2-4VC/4ML2, customizable I/O features, 16nm
2017
  Shasta+ [VISC™ Processor]: 1-4VC, 10nm
  Tabernas [VISC™ SoC]: Shasta+ SMP, 10nm, SoC Ref Design
2018
  Tahoe [VISC™ Processor]: 1-8VC, 10nm
  Ordos [VISC™ SoC]: Tahoe SMP, 10nm
*RTL available **SoC tape-out
• Single and Dual Virtual Core configuration
  Two physical cores act as 1 or 2 Virtual Cores
  Virtual Cores dynamically load balance to service threads
• 64-bit ISA
  Supports larger memory space addressing and more registers
• Support for Multiple Guest ISAs
  Also runs native VISC Apps
• 2GHz Frequency (16FF+)
  Up from ~500MHz prototype
• SMP configuration on top of Virtual Cores
  Proprietary coherency protocol
• 1 MB L2$ per physical core
• System interface unit
  Generic high speed 256-bit read/write bus adaptable to customer specification (AMBA, OCP, CoreConnect, etc.)
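The configuration rules listed above (two physical cores acting as 1 or 2 virtual cores, SMP on top, 1 MB L2$ per physical core) and the roadmap's "1-2VC/2C or SMP 2-4VC/4C" can be captured as a small validity check. This is a sketch for illustration only: the `ShastaConfig` class and its field names are hypothetical, and limiting SMP to two processors is an assumption based on the roadmap listing nothing larger than 4C.

```python
from dataclasses import dataclass

CORES_PER_PROCESSOR = 2        # slide: two physical cores per Shasta processor
L2_MB_PER_PHYSICAL_CORE = 1    # slide: 1 MB L2$ per physical core
MAX_SMP_PROCESSORS = 2         # assumption: roadmap shows at most 4C


@dataclass(frozen=True)
class ShastaConfig:
    processors: int      # 1 = single processor, 2 = SMP pair
    virtual_cores: int   # total virtual cores exposed to software

    def __post_init__(self) -> None:
        if not 1 <= self.processors <= MAX_SMP_PROCESSORS:
            raise ValueError("processors must be 1 or 2 in this sketch")
        # Each two-core processor acts as 1 or 2 virtual cores.
        lowest = self.processors
        highest = 2 * self.processors
        if not lowest <= self.virtual_cores <= highest:
            raise ValueError(
                f"{self.processors} processor(s) support "
                f"{lowest}-{highest} virtual cores"
            )

    @property
    def physical_cores(self) -> int:
        return self.processors * CORES_PER_PROCESSOR

    @property
    def total_l2_mb(self) -> int:
        return self.physical_cores * L2_MB_PER_PHYSICAL_CORE


single = ShastaConfig(processors=1, virtual_cores=1)   # 1VC/2C
smp = ShastaConfig(processors=2, virtual_cores=4)      # SMP 4VC/4C
print(single.physical_cores, single.total_l2_mb)
print(smp.physical_cores, smp.total_l2_mb)
```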
Shasta VISC™ Processor
[Figure: Shasta VISC™ Dual Virtual Core Processor. A Global Front End issues Virtual HW Threads (HW threadlets) to Virtual Core1 and Virtual Core2, drawn over Core1 and Core2, each with an L1 D$, above L2$ & Memory.]
Shasta VISC™ Processor uArchitecture
[Figure: microarchitecture block diagram with two mirrored halves, numbered 1 and 2. Each half is labeled, in order: L1I$ 32KB, Fetch, BP, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, RH, RF, Core EXE, LSQ, L1 D$ 32KB, L2 $. Both halves carry a TH0 label.]
Shasta VISC™ Processor Pipeline
[Figure: pipeline view of the two-core block diagram (L1I$ 32KB, Fetch, BP, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, RH, RF, Core EXE, LSQ, L1 D$ 32KB, L2 $ for each of cores 1 and 2). Stage annotations, left to right: 3 Stages, 3 Stages, 6 Stages+1, 1 Stage, 1-2/4 Cycles.]
Shasta VISC™ Processor SMP
[Figure: two VISC™ Dual Virtual Core Processors, Processor 0 and Processor 1, joined by Coherency Support. Each processor shows a Global Front End, Virtual HW Threads (HW threadlets), Virtual Core1 and Virtual Core2 over Core1 and Core2 with L1 D$, and L2$ & Memory.]
Single Thread OOO Ways Perf/Watt
[Figure: single-thread Perf/Watt chart against Power, with Mobile and Server regions. Plotted designs: OOO 2-Wide, OOO 3-Wide, OOO 5-Wide, OOO 8-Wide, OOO Dual Core 4-Wide (2+2), OOO Dual Core 6-Wide (3+3), OOO Dual Core 10-Wide (5+5), OOO Dual Core 16-Wide (8+8).]
Shasta Delivers Server Performance at Mobile Power
[Figure: SPEC 2006 Score (geomean of int & fp) against Power, with Mobile and Server regions. Plotted designs: OOO 2-Wide, OOO 3-Wide, OOO 5-Wide, OOO 8-Wide, OOO Dual Core 4-Wide (2+2), OOO Dual Core 6-Wide (3+3), OOO Dual Core 10-Wide (5+5), OOO Dual Core 16-Wide (8+8), and the Shasta VISC™ Processor (1VC/2C).]
VISC™ SoC Platform
• Scalable SoC Architecture
• Ease of adding / deleting devices in SoC
• Robust design methodology allows Specification to tape out in < 9 months
• High Performance Low Power System
• Focus on Memory / Interconnect performance
• >200 GB/s coherent fabric, 40 GB/s dual channel DDR4, 200 GB/s L3
• High bandwidth Network and Storage connectivity
• High performance Multimedia & Graphics
• Industry Standard APIs and IP Blocks
• OpenGL, OpenGL ES, OpenMAX, OpenCL, AHCI SATA and xHCI USB
• Soft Machines Enhanced SoC Subsystems
• Plug-n-play HW/SW architecture for simplified system S/W development
• Security & Virtualization
Mojave VISC™ SoC
Quad VISC™ CPU: 2x Shasta Processor
Fast System Memory: 1-4 Ch. LP/DDR4 2400-3200; 1-8 MB 4-way interleaved system cache (WB/PF/DMA)
Network / Storage: 1-2 1G E-net w/TCP partial offload/SRIOV; Dual Storage SATA 6G; Dual Flash UFS; PCIe 3.0 8 Lanes
Multimedia/Graphics: 400 GFLOPS to 1 TFLOPS, 800M to 2B Tri/Sec; OpenCL 2.0, OpenGL ES 3.2; HEVC Video Enc/Dec; DTS Audio DSP
Display/Imaging: Triple 4K display outputs; Dual 20MP ISP inputs; HD Audio codec
Enterprise/Management: Trusted Platform, HW AES/DES/HMAC/SHA, Remote Management, Fine grain DVFS
Virtualization/Security: System MMU & GIC; Secure Zones: Secured Peripherals, Memory and Message Signaled Interrupts
[Figure: Mojave SoC block diagram. Blocks: Quad VISC™ Shasta, OCI, System Cache, DRAM, Network, Storage, GPU, Video Enc/Dec, Audio, Display, ISP, System MMU/GIC, Secure Zones, Mgmt CPU, PCIe, USB2/3.]
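As a sanity check on the memory figures quoted for the SoC platform (40 GB/s dual channel DDR4, 1-4 channels at 2400-3200), peak DRAM bandwidth can be estimated from transfer rate, channel width and channel count. The sketch below assumes 64-bit channels, which is typical of standard DDR4 but is not stated on the slides (LPDDR4 channels are narrower); the function name is hypothetical.

```python
def ddr_peak_gb_per_s(mega_transfers_per_s: int, channels: int, bus_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes).

    mega_transfers_per_s: data rate, e.g. 2400 for DDR4-2400.
    bus_bits: data width of one channel; 64 is an assumption here.
    """
    bytes_per_transfer = bus_bits // 8
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


for rate in (2400, 3200):
    for channels in (1, 2, 4):
        bandwidth = ddr_peak_gb_per_s(rate, channels)
        print(f"DDR4-{rate} x{channels} channel(s): {bandwidth:.1f} GB/s")
```

Under the 64-bit assumption, dual channel DDR4-2400 peaks at 38.4 GB/s, which is roughly in line with the 40 GB/s quoted for the platform.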