Introducing EEMBC Cloud and Big Data
Server Benchmarks
Quick Background:
Industry-Standard Benchmarks for the Embedded Industry
EEMBC formed in 1997 as a non-profit
consortium
Defining and developing application-specific
benchmarks
Targeting processors and systems
Expansive Industry Support
• >47 members
• >90 commercial licensees
• >120 university licensees
Big Data Influx
General Characteristics of Cloud and Big Data
Drinking from the fire hose
Distribute data to many compute
nodes
• Graph analytics
• Hadoop – map reduce
• Unstructured data search and indexing
IOT
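The map-reduce pattern named above can be sketched in a few lines. This is a minimal single-process illustration, not Hadoop code: each input chunk stands in for the data held by one compute node, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text, as a single node would."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map, shuffle, and reduce over all chunks in one process."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))
```

In a real cluster the map calls run on the nodes that hold each chunk and the shuffle moves data across the network, which is exactly the traffic that CPU-only benchmarks do not exercise.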
Traditional Method of Measuring
Server Performance
Single-threaded program(s)
• Databases
• Compilers
• Interpreters
Single or a few machines
Most successful are
CPU/Memory (examples)
• Linpack
• SPECint®
• LMbench
• CoreMark
• …
How Cloud and Big Data Workloads Differ
CPU: CPU/Memory Speed
Transaction: Access and Update Data
ScaleOut: Analysis – Generate Insight
Data sets typically larger
• Trending towards petabytes
• Rapid growth
Many node
environment
• Distributed data (e.g. HDFS)
• Distributed computation
Nodes often special
purpose
• Webserver
• Database server
• Caching layer
Introducing EEMBC Cloud and Big
Data Server Benchmark Working
Group
Goal: Provide an industry-standard suite of
performance and efficiency benchmarks that
address the needs of ODMs and OEMs
providing compute systems to the scale-out
datacenter marketplace and their consumers.
Phased rollout starting with standalone
workloads
• First phase will comprise graph analytics, memory caching, media serving
Chaired by Narayan Iyengar, Lead Software
Engineer at Cavium, Inc.
Industry Benchmark Qualifications
Automated install and build process ensures
consistent execution (multiplatform support)
Relatively low cost to implement
• Does not require a large or expensive infrastructure
Predictable performance at scale
Repeatable, verifiable, and certifiable - as in
other EEMBC benchmarks
Memory Caching Analysis
Basics
• Caching is used in data centers to optimize performance and energy usage
• Memcached is middleware that provides a caching layer to a web framework
• http://en.wikipedia.org/wiki/Memcached
EEMBC version
• Provide web workloads that mimic real-world scenarios
• Provide a mechanism to run repeatable and verifiable experiments
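The caching layer described in the basics can be illustrated with a cache-aside lookup. This is a minimal sketch under stated assumptions: a plain dictionary stands in for a Memcached server, `backend_fetch` is a hypothetical slow lookup (for example a database query), and nothing here reflects the actual EEMBC workload.

```python
class CacheAside:
    """Cache-aside lookup: try the cache first, fall back to the backend on a miss."""

    def __init__(self, backend_fetch):
        self._fetch = backend_fetch  # hypothetical slow lookup, e.g. a database query
        self._cache = {}             # in-process stand-in for a Memcached server
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Miss: fetch from the backend and remember the result for next time.
        self.misses += 1
        value = self._fetch(key)
        self._cache[key] = value
        return value

    def hit_rate(self):
        """Fraction of lookups served from the cache (0.0 if nothing was requested)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Replaying a fixed list of keys through `get` and then reading `hit_rate()` gives a repeatable measurement, which is the property a caching benchmark needs.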
Media Serving
Basics
• Real-time video streaming function for on-demand access using large server clusters to packetize and transmit media files
• Automatically adjust quality based on various pre-encoded formats and bit-rates to suit a wide client base
• Example media streaming services include Netflix, YouTube, Pandora
EEMBC version
• Simulate multiple users making requests simultaneously and asynchronously
• Provide a mechanism to run repeatable and verifiable experiments for how well clients are being serviced
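The two ideas above, choosing among pre-encoded bit rates and issuing requests from many clients asynchronously, can be sketched together. The rendition list, the function names, and the selection rule (highest bit rate that fits the client's bandwidth) are assumptions made for illustration; they are not taken from the EEMBC benchmark.

```python
import asyncio

# Hypothetical pre-encoded renditions in kbit/s.
RENDITIONS_KBPS = [400, 1200, 2500, 5000]


def pick_rendition(bandwidth_kbps):
    """Return the highest pre-encoded bit rate that fits the client's bandwidth,
    falling back to the lowest rendition when none fits."""
    fitting = [rate for rate in RENDITIONS_KBPS if rate <= bandwidth_kbps]
    return max(fitting) if fitting else min(RENDITIONS_KBPS)


async def client(client_id, bandwidth_kbps):
    """One simulated client: yield to the event loop, then request a rendition."""
    await asyncio.sleep(0)  # lets requests from different clients interleave
    return client_id, pick_rendition(bandwidth_kbps)


async def run_clients(bandwidths):
    """Launch one client per bandwidth value and wait for all of them."""
    tasks = (client(i, bw) for i, bw in enumerate(bandwidths))
    return dict(await asyncio.gather(*tasks))


def simulate(bandwidths):
    """Synchronous entry point: map each client index to its chosen bit rate."""
    return asyncio.run(run_clients(bandwidths))
```

A real load generator would replace the `asyncio.sleep(0)` with network requests and record how quickly each client was served.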
Graph Analytics
Basics
• Take big-data data sets (e.g. social media output) and analyze them using graph algorithms (find connectivity,
common qualities of nodes).
• Example is PageRank: deriving website popularity from social data.
• Also used in applications such as Facebook and Twitter
EEMBC version
• Standardized implementation of PageRank using GraphLab
• Provide a mechanism to run repeatable and verifiable experiments on a multi-node platform
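For readers unfamiliar with PageRank, the iteration can be sketched in plain Python. This is a single-node illustration only and does not use GraphLab or reflect the EEMBC implementation; the damping factor of 0.85 and the fixed iteration count are conventional defaults chosen here as assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it points to.
    Returns a dict of node -> rank, with ranks summing to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Every node passes an equal share of its rank along each out-link.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank
```

On a multi-node platform the inner loop becomes message passing between the machines that own each part of the graph, which is where network and memory behavior start to dominate.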
EEMBC’s Expanding Scope
[Figure: four panels, each showing CPU, Memory, Network I/O, Data Center I/O, and Storage]
• Traditional EEMBC Target - CPU Vendor
• Expanded EEMBC Target - SoC Vendor
• EEMBC Transition - System Vendor
• Requires Benchmark Scaling - Cloud Vendor
EEMBC’s Expanding Scope
SoC integration requires testing more than
CPU and memory
Focus on real-world benchmarking
• Single purpose servers/clusters run a small set of applications
Hardware configured for an application
• Memory Size
• CPU Scalar Performance vs. Throughput
• Storage Capacity
Cloud and Big Data Benchmarks the
EEMBC Way
EEMBC has a long track record of producing reliable, equitable benchmarks
Open, multi-partner cooperative working group
• Participating members include Cavium, Imagination Technologies, Intel, and others (pending permission to announce)
Join this working group and help influence the future of cloud and big data benchmarking
CPU Benchmarks Aren’t Fit for Big Data
and Cloud
SPECint2006 – today’s server CPU benchmark standard
• A mixture of cache-friendly and very memory-intensive applications from a variety of fields
– CPU focused (scalar performance)
– Not a distributed application
– Essentially no I/O (network or disk)
– No operating system or hypervisor impact
• SPECrate is a simple aggregation of SPECint®
– No cooperative tasks
– No sharing, no communication
EEMBC MultiBench™
• Similar to SPECint2006, except that it includes operating system impact and cooperative tasks
Why Transaction-Oriented Benchmarks Are Not Suitable for Cloud and Big Data
TPC
• Includes system overhead
• Can be large (and expensive to set up and run)
• Generally requires a big system
SPECjbb®
• Requires Java - is it a Java benchmark?
Other Benchmarks
SPEC OSG Working Group*
• Addresses “Cloud” environment (SaaS, PaaS, IaaS)
• Hardware and cloud providers and cloud customers
• Black box and white box environments
• Agility, elasticity, provisioning, etc.
EPFL CloudSuite
• Specific sets of workloads
• Does not address SaaS, PaaS or IaaS specifically
• Great for academic focus, but not designed for ease of use, verification, and validity
Significantly Different Instruction Miss Rate
[Chart: Instruction Misses Per Thousand Instructions (axis 0 to 160) for the CloudSuite workloads (data caching, data serving, map reduce, media streaming, SAT solver, web front end, web search) alongside SPECint2006, TPC-C, and TPC-E]
• See Ferdman et al., ACM Transactions on Computer Systems, Nov. 2012 (compares CloudSuite characteristics to SPEC, TPC, PARSEC)
– Large I-cache footprint
– Lower IPC
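The metric on this slide, instruction misses per thousand instructions, is a simple ratio of two counters. A minimal sketch follows; the function name is hypothetical, and real counter values would come from hardware performance counters rather than from this code.

```python
def misses_per_kilo_instruction(misses, instructions):
    """Return cache misses per thousand retired instructions (MPKI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions
```

For example, 2,500,000 instruction-cache misses over 100,000,000 retired instructions works out to 25 misses per thousand instructions. These input numbers are illustrative and are not measurements from the cited study.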