Architecture Support for
Big Data Analytics
Ahsan Javed Awan EMJD-DC (KTH-UPC)
(http://uk.linkedin.com/in/ahsanjavedawan/)
Motivation
Why should we care?
Motivation (cont.)
● A mismatch between the characteristics of emerging workloads and the underlying hardware.
– M. Ferdman et al., “Clearing the clouds: A study of emerging scale-out workloads on modern hardware,” in ASPLOS, 2012.
– Z. Jia et al., “Characterizing data analysis workloads in data centers,” in IISWC, 2013.
– C. Zheng et al., “Characterizing OS behavior of scale-out data center workloads,” in WIVOSCA, 2013.
– M. Dimitrov et al., “Memory system characterization of big data workloads,” in BigData Conference, 2013.
– Z. Jia et al., “Characterizing and subsetting big data workloads,” in IISWC, 2014.
– A. Yasin et al., “Deep-dive analysis of the data analytics workload in CloudSuite,” in IISWC, 2014.
– L. Wang et al., “BigDataBench: A big data benchmark suite from internet services,” in HPCA, 2014.
Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
Motivation (cont.): Our Focus
Improve the single-node performance in a scale-out configuration.
Scale-up (single-node) frameworks: Phoenix++, Metis, Ostrich, etc.
Scale-out frameworks: Hadoop, Spark, Flink, etc.
Progress Meeting 12-12-14
Which Scale-out Framework?
Our Approach
● A three-fold analysis method at the application, thread and micro-architectural levels
– Tuning of Spark internal parameters
– Tuning of JVM parameters (heap size, etc.)
– Concurrency analysis
– General architectural exploration
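To make the tuning steps above concrete, here is a minimal sketch of how a thread-count sweep at a fixed JVM heap size could be scripted. It assumes Spark is run in local mode on a single node, which the slides do not state explicitly. The application name `workload.py`, the heap size `50g` and the helper `spark_submit_args` are hypothetical and used only for illustration; `--master local[N]` and `--driver-memory` are standard `spark-submit` options.

```python
from typing import List


def spark_submit_args(threads: int, heap: str, app: str = "workload.py") -> List[str]:
    """Build a spark-submit command line for one single-node run.

    Assumption: local mode, where the driver JVM also runs the tasks,
    so --driver-memory sets the heap that the workload actually uses.
    """
    return [
        "spark-submit",
        "--master", f"local[{threads}]",  # size of the executor thread pool
        "--driver-memory", heap,          # JVM heap size (illustrative value)
        app,                              # hypothetical application script
    ]


# Sweep the thread count at a fixed heap size (values are illustrative only).
for threads in (1, 6, 12, 24):
    print(" ".join(spark_submit_args(threads, "50g")))
```

Each printed line is one configuration to run and time; the same sweep could be repeated for several heap sizes.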
Our Approach
Benchmarks
3 GB of raw Wikipedia data, Amazon movie reviews and numerical records were used.
Our Hardware Configuration
Machine Details
Hyper-Threading and Turbo Boost are disabled.
Multicore Scalability of Spark
Application Level Performance
Multicore Scalability of Spark
Stage Level Performance
● Shuffle map stages do not scale beyond 12 threads across different workloads.
● The number of files open concurrently in map-side shuffling is C*R, where C is the number of threads in the executor pool and R is the number of reduce tasks.
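As a rough illustration of the C*R relation above, the following sketch shows how the number of concurrently open shuffle files grows with the executor pool size. The value R = 200 and the function name are hypothetical and do not come from the experiments.

```python
def concurrent_shuffle_files(executor_threads: int, reduce_tasks: int) -> int:
    """Files open at once in map-side shuffling: C * R."""
    return executor_threads * reduce_tasks


REDUCE_TASKS = 200  # hypothetical R, chosen only for illustration

# The open-file count grows linearly with the number of threads C.
for c in (1, 6, 12, 24):
    print(f"C={c:2d}  R={REDUCE_TASKS}  open files={concurrent_shuffle_files(c, REDUCE_TASKS)}")
```

Doubling the pool from 12 to 24 threads doubles the number of open files for the same R, which adds file I/O pressure in the shuffle map stages.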
Multicore Scalability of Spark
Task Level Performance
CPU Utilization is not
scaling with performance
Is there any work-time inflation?
How does the micro-architecture contribute to work-time inflation?
Key Findings
● Using more than 12 threads in an executor pool does not yield a significant performance gain.
● The Spark runtime system needs to be improved to provide better load balancing and avoid work-time inflation.
● Work-time inflation and load imbalance across threads are the scalability bottlenecks.
● Removing the bottlenecks in the front-end of the processor would not remove more than 20% of stalls.
● Effort should be focused on removing the memory-bound stalls, since they account for up to 72% of stalls in the pipeline slots.
● Memory bandwidth of current processors is sufficient for
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
Motivation
Do Spark-based data analytics benefit from using larger scale-up servers?
How does data size affect micro-architectural performance?
Key Findings
● Spark workloads do not benefit significantly from executors with
more than 12 cores.
● The performance of Spark workloads degrades with large volumes of data due to a substantial increase in garbage collection and file I/O time.
● Without any tuning, the Parallel Scavenge garbage collection scheme outperforms the Concurrent Mark Sweep and G1 garbage collectors for Spark workloads.
● Spark workloads exhibit improved instruction retirement at large volumes of data, due to fewer L1 cache misses and better utilization of the functional units inside the cores.
● Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available off-chip bandwidth on our test machine.
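For readers who want to repeat the garbage collector comparison mentioned above, the following sketch maps each of the three collectors to its standard HotSpot JVM flag and builds a Java options string. It is an assumption that the options are passed through `spark-submit --driver-java-options` in local mode; the helper name is hypothetical. Note that the Concurrent Mark Sweep flag is only accepted by JDK releases that still ship that collector.

```python
# Standard HotSpot flags selecting each collector compared in the findings.
GC_FLAGS = {
    "Parallel Scavenge": "-XX:+UseParallelGC",
    "Concurrent Mark Sweep": "-XX:+UseConcMarkSweepGC",  # removed in recent JDKs
    "G1": "-XX:+UseG1GC",
}


def driver_java_options(collector: str) -> str:
    """Return JVM options selecting one collector, with basic GC logging."""
    return f"{GC_FLAGS[collector]} -verbose:gc"


# One option string per collector; no other GC tuning is applied.
for name in GC_FLAGS:
    print(f'{name}: --driver-java-options "{driver_java_options(name)}"')
```

Leaving every other GC parameter at its default matches the "without any tuning" condition of the comparison.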
Our Approach
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “Performance characterization of in-memory data analytics on a modern cloud server,” in 5th International IEEE Conference on Big Data and Cloud Computing, 2015.
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How Data Volume Affects Spark Based Data Analytics on a Scale-up Server,” in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB, 2015.
Future Directions
● NUMA-Aware Task Scheduling
● Cache-Aware Transformations
● Exploiting Processing-in-Memory Architectures
● HW/SW Data Prefetching