Copyright 2014 Cavium Inc.
Trends Disrupting Server Industry
Compute, Network & Storage Virtualization
Application Specific Servers
Large end users designing server HW optimized for their applications – ODM Direct Model
• Single threaded or limited multi-threaded program(s)
• Workload performance primarily dependent on CPU/memory performance
• 1000s of applications used by 1000s of users
• Virtualization used to improve server utilization
• Managed by traditional IT
• Traditional benchmarks
• Highly Distributed – the “System(s) are the Computer”
    Shared-nothing architectures: distributed data and distributed computation
    Many-node environments
    Highly parallel: add more threads, go faster
    Multiple OS instances: fault tolerance in SW
• BIG DATA
Large and highly distributed data sets
• Nodes are often special purpose
Cloud Applications can benefit from a “New Class of Servers”
New class of servers requires new class of benchmarks
Multiple applications consolidated in MULTI-TENANT SERVER FARM
ONE APPLICATION used by 10M+ USERS
Need for Workload Optimized Servers
Example applications: Office365, SQL Server, SharePoint, Video Media, CRM Server, Media Streaming, ERP Server, Web Service, Mail + FTP, MySQL
Workload             Example/Use Case
Graph Search         Social media data analysis (e.g. GraphLab, Giraph)
Web Caching          Memcached
Media Serving        Video server, e.g. “DASH” servers
Web Serving          LAMP + Java/Tomcat/Ruby…
Data Analytics       Hadoop (Mahout, Nutch)
Distributed Search   Elasticsearch
Distributed Storage  Ceph (Object/Block) and HDFS (File)
Data Serving         NoSQL-type databases (e.g. Cassandra, HBase, …)
[Figure: bar chart. Categories: data caching, data serving, map reduce, media streaming, web front end, web search, specint, tpc-c, tpc-e. Legend: SpecINT2006, Scaleout workloads.]
Source data from : A Case for Specialized Processors for Scale-Out Workloads Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi, In IEEE Micro's Top Picks, 2014
Cloud Workloads are Different
Example #1 – Very different instruction miss rates
Y-axis: instruction misses per kilo-instruction (MPKI)
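The miss rates in Example #1 are reported as MPKI, meaning misses per kilo-instruction. A minimal sketch of how that metric is computed from two hardware counter readings follows; the function name and the counter values are hypothetical and are not taken from the source data.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction: miss count normalized to 1000 retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000 / instructions


# Hypothetical counter values, for illustration only (not from the cited paper).
print(mpki(misses=25_000, instructions=1_000_000))  # 25.0
```

Normalizing by instruction count rather than by time makes the miss rate comparable across workloads that run for different durations.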
Source data from: A Case for Specialized Processors for Scale-Out Workloads, Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi, in IEEE Micro's Top Picks, 2014
Cloud Workloads are Different
Example #2 – IPC
Y-axis: instructions per cycle (IPC). Chart label: CPU Intensive Traditional Benchmarks
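IPC in Example #2 is the number of instructions retired per clock cycle. A small sketch of the calculation follows, assuming the two counts come from hardware performance counters; the function name and the sample numbers are hypothetical.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: retired instructions divided by elapsed core cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter values, for illustration only (not from the cited paper).
print(ipc(instructions=1_200_000, cycles=2_000_000))  # 0.6
```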
Cloud Workloads are Different
Example #3 – Performance Sensitivity to LLC & L2 Caches
Source data from: A Case for Specialized Processors for Scale-Out Workloads, Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi, in IEEE Micro's Top Picks, 2014
• Optimal choice and size of caches differ for scale-out workloads
• Lower IPC
    Less instruction-level parallelism available
    Less benefit from aggressive, out-of-order, wide-issue machines
• Scale-out workloads are highly parallel by nature and favor more independent processing cores
    A large number of more efficient cores provides lower power and more performance for scale-out workloads
• Multiple cores can fit in the area of one complex core
• For scale-out workloads, multiple cores provide the best performance per unit area per watt
Complex Single Core
Multiple Cores
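The comparison above rests on performance per unit area per watt. The sketch below shows one way to compute that figure of merit for two design points. Every number in it is a made-up placeholder chosen only to show the arithmetic, and the `Design` class is a hypothetical helper, not something from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    throughput: float  # aggregate work per second, in any consistent unit
    area_mm2: float    # silicon area in square millimetres
    power_w: float     # power draw in watts

    def perf_per_area_per_watt(self) -> float:
        """Throughput normalized by both silicon area and power."""
        return self.throughput / (self.area_mm2 * self.power_w)


# Hypothetical design points with equal area; values are illustrative only.
complex_core = Design("one complex core", throughput=100.0, area_mm2=20.0, power_w=10.0)
simple_cores = Design("four simple cores", throughput=4 * 40.0, area_mm2=20.0, power_w=8.0)

for design in (complex_core, simple_cores):
    print(design.name, design.perf_per_area_per_watt())
```

With these placeholder inputs the multi-core design scores higher, but the result depends entirely on the assumed per-core throughput and power, which would need to be measured on a real workload.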
• Cloud Benchmarks need to address more than CPU and memory
• Need to include efficiency of storage and network functions and IO
• Challenge remains to benchmark at scale
[Figure: benchmark scope by vendor type. Four panels labeled Traditional CPU vendor, SoC vendor, System vendor, and Cloud vendor, each showing the components CPU, Memory, Network I/O, Data Center I/O, and Storage.]
Introducing
Family of Workload Optimized Processors
• Up to 48 custom ARMv8 cores @ 2.5GHz
• 1S and 2S configurations
• Up to 4 DDR3/4 memory controllers
• Family-specific I/Os
• Standards-based low-latency Ethernet fabric
• virtSOC™: low-latency end-to-end virtualization
• Family-specific accelerators
4 workload optimized families:
ThunderX_CP: Private/Public cloud, web search, web serving, web caching
ThunderX_ST: Cloud storage, Analytics, Distributed Databases
ThunderX_NT: Telco servers, NFV apps
ThunderX_SC: Secure cloud servers
[Block diagram: up to 48 2.5GHz ARM64 cores; 16MB cache subsystem; Cavium Coherent Processor Interconnect (CCPI™); workload accelerators; security; Ethernet fabric; up to 4x 72-bit DDR3/4 controllers; PCIe Gen3; SATAv3; 40 GbE/100 GbE; 10/40/100 GbE; other I/O.]
Virtualization
Public & Private Clouds
Application Specific Servers
Highest VM density, Highest VM performance
• High core count, high memory bandwidth & low latency
• virtSOC™ - core to IO low latency virtualization
• Integrated high bandwidth, low latency network & storage IO
Compute, Network and storage virtualization
• virtSOC™ - Full virtualization of core, network and storage IO
• Custom network and storage IO for each target workload
• Custom hardware accelerators for compute, networking, storage and security