Secured Embedded Many-Core Accelerator for Big Data Processing
Amey Kulkarni
PhD Candidate
Advisor: Professor Tinoosh Mohsenin
Energy Efficient High Performance Computing (EEHPC) Lab
University of Maryland, Baltimore County
Agenda
PENC: Power Efficient Nano Clusters Many-Core and its implementation results
Cognitive-based Hardware Security for Many-Core Architecture
Compressive Sensing (CS) OMP Reconstruction Algorithm modifications and their implementation on 65 nm CMOS technology, PENC Many-Core, and FPGA
CS-based framework for Big Data acceleration on hardware platforms
Reduction in data transfers and secure communication of encrypted data
Implementation on three different platforms and evaluation in terms of hardware overhead
Integration of the CS-based framework with the Hadoop platform for Big Data
PENC: Power Efficient Nano Clusters by EEHPC
PENC many-core acts as an accelerator that works with a host processor for data analytics and machine learning applications
Architecture, simulator, and Verilog ASIC implementation were fully developed by EEHPC lab members
Composed of 64 processing clusters with 192 low-power RISC cores
Processors and routers fully placed and routed in 65 nm, 1 V CMOS, with a small chip area of 5.5 mm² for 64 clusters. Total chip power @ 1 GHz: 8.7 W
ISCAS’12, ISQED’13, ISLPED’14, ISCAS’16, GLSVLSI’16, JETC’16
NSF Grant# 00010145
Cognitive Security Framework for PENC Many-Core
[Figure: FPGA platform with the Security Kernel & Interface, a Trojan Insertion Module (inter-cluster and intra-cluster triggers; clocks CLKMC and CLKADM), and the 64-core many-core platform of 16 clusters with routers (R1, R2) and Attack Detection Modules (Feature Sample, Core Feed-Back, Core Enable)]
Test setup for the PENC Many-Core Platform (64-core), where the Attack Detection Module is implemented using an online machine learning technique to prevent unexpected attacks
JETC’16, ISQED’16, HOST’16
Assumptions: processing cores and memories are safe; the Trojan is inserted at the design phase and triggers malicious activity inside the router at run-time
Detects three different Denial-of-Service attacks
Hardware area overhead of only 0.26%; Trojan detection requires 3 cycles and performs 2.4x faster than the state-of-the-art implementation
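As a rough illustration of online learning-based detection (the publications listed below describe an SVM-based detector), the Python sketch below trains a linear SVM one sample at a time with stochastic sub-gradient steps on the regularized hinge loss, then labels router feature vectors as normal (-1) or attack (+1). The two features (packet injection rate, input-buffer occupancy), the training values, and all function names are hypothetical; the actual features and hardware datapath of the Attack Detection Module are not described on this slide.

```python
from typing import List, Sequence, Tuple

# Hypothetical router features, normalized to [0, 1]:
# (packet injection rate, input-buffer occupancy). Label +1 = attack, -1 = normal.
Sample = Tuple[Sequence[float], int]

TRAINING: List[Sample] = [
    ((0.10, 0.20), -1), ((0.90, 0.80), 1),
    ((0.20, 0.10), -1), ((0.80, 0.95), 1),
    ((0.15, 0.25), -1), ((0.85, 0.90), 1),
    ((0.05, 0.10), -1), ((0.95, 0.85), 1),
]


def _augment(x: Sequence[float]) -> List[float]:
    """Append a constant 1.0 so the bias is learned as an extra weight."""
    return list(x) + [1.0]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_online_svm(samples: Sequence[Sample], eta: float = 0.1,
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Linear SVM trained online: one sub-gradient step per incoming sample."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for x, y in samples:
            xa = _augment(x)
            margin = y * _dot(w, xa)
            # Gradient step for the L2 regularizer (weight decay).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # The hinge-loss sub-gradient is non-zero only inside the margin.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def classify(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (attack) or -1 (normal traffic)."""
    return 1 if _dot(w, _augment(x)) >= 0.0 else -1


if __name__ == "__main__":
    weights = train_online_svm(TRAINING)
    print(classify(weights, (0.1, 0.1)))  # quiet router
    print(classify(weights, (0.9, 0.9)))  # flooding-like traffic
```

A linear model is used here because its decision is a single dot product, which is the kind of operation that maps cheaply to hardware; whether the real module uses exactly this update rule is not stated.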
We propose a platform-independent and reconfigurable OMP CS Reconstruction Algorithm (evaluated on PENC, FPGA, and GPU)
[Figures: OMP CS Reconstruction Algorithm; Architecture of OMP CS Reconstruction Algorithm; Fixed-Point Hardware Implementation; Analysis of OMP algorithm for a 1024x1024 image on PENC Many-Core; Analysis of OMP algorithm on Xilinx Virtex-7 FPGA]
Compressive Sensing (CS): OMP Reconstruction Algorithm
GLSVLSI’14, ISCAS’15
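For reference, a minimal pure-Python sketch of textbook OMP, showing the two kernels discussed on these slides: Identification (pick the atom most correlated with the residual) and Least Squares (refit coefficients on the current support). For brevity it solves least squares through the normal equations with Gaussian elimination; hardware designs often use a decomposition such as QR or Cholesky instead, and nothing here reflects the fixed-point datapath of the proposed architecture. Function names and the 3x4 example matrix are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def column(phi: Matrix, j: int) -> List[float]:
    return [row[j] for row in phi]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def least_squares(phi: Matrix, support: List[int], y: Sequence[float]) -> List[float]:
    """Least Squares kernel: minimize ||y - Phi_S z|| via the normal equations."""
    cols = [column(phi, j) for j in support]
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    return solve(gram, [dot(ci, y) for ci in cols])


def omp(phi: Matrix, y: Sequence[float], k: int, tol: float = 1e-10) -> List[float]:
    """Recover a k-sparse x from y = Phi x."""
    n = len(phi[0])
    residual = list(y)
    support: List[int] = []
    coeffs: List[float] = []
    for _ in range(k):
        # Identification kernel: atom most correlated with the residual.
        scores = [abs(dot(column(phi, j), residual)) if j not in support else -1.0
                  for j in range(n)]
        support.append(max(range(n), key=lambda j: scores[j]))
        # Least Squares kernel on the current support.
        coeffs = least_squares(phi, support, y)
        # Residual update: r = y - Phi_S z.
        approx = [sum(row[j] * z for j, z in zip(support, coeffs)) for row in phi]
        residual = [yi - ai for yi, ai in zip(y, approx)]
        if math.sqrt(dot(residual, residual)) < tol:
            break
    x = [0.0] * n
    for j, z in zip(support, coeffs):
        x[j] = z
    return x


if __name__ == "__main__":
    s = 1.0 / math.sqrt(3.0)
    phi = [[1.0, 0.0, 0.0, s], [0.0, 1.0, 0.0, s], [0.0, 0.0, 1.0, s]]
    print(omp(phi, [2.0, 0.0, -1.0], k=2))  # 2-sparse signal measured by a 3x4 matrix
```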
We propose two modifications to the OMP CS Reconstruction Algorithm:
Gradient Descent OMP (GD-OMP) reduces the complexity of the Least Squares kernel
Hard Thresholding OMP (HT-OMP) reduces the complexity of the Identification kernel
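To make the two modifications concrete, the Python sketch below shows generic versions of the replaced kernels that could drop into a standard OMP loop: a Least Squares kernel that runs a fixed number of gradient-descent steps instead of an exact solve (GD-OMP idea), and an Identification kernel that admits every atom whose correlation clears a threshold, so several atoms can join per iteration (HT-OMP idea). The step size `mu`, iteration count `steps`, threshold rule `ratio`, and function names are assumptions for illustration, not the parameters of the proposed architectures.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def column(phi: Matrix, j: int) -> List[float]:
    return [row[j] for row in phi]


def gd_least_squares(phi: Matrix, support: List[int], y: Sequence[float],
                     z0: Optional[List[float]] = None,
                     steps: int = 50, mu: float = 0.1) -> List[float]:
    """GD-style Least Squares kernel: approximately minimize
    0.5 * ||y - Phi_S z||^2 by gradient descent, avoiding a matrix inverse.
    Converges when mu < 2 / (largest eigenvalue of Phi_S^T Phi_S)."""
    cols = [column(phi, j) for j in support]
    z = list(z0) if z0 is not None else [0.0] * len(support)
    for _ in range(steps):
        approx = [sum(c[i] * zj for c, zj in zip(cols, z)) for i in range(len(y))]
        resid = [yi - ai for yi, ai in zip(y, approx)]
        # The gradient is -Phi_S^T r, so step along +Phi_S^T r.
        z = [zj + mu * dot(c, resid) for c, zj in zip(cols, z)]
    return z


def ht_identify(phi: Matrix, residual: Sequence[float], support: List[int],
                ratio: float = 0.5) -> List[int]:
    """HT-style Identification kernel: return every unselected atom whose
    |correlation| is at least ratio * (largest |correlation|)."""
    n = len(phi[0])
    candidates = [j for j in range(n) if j not in support]
    scores = {j: abs(dot(column(phi, j), residual)) for j in candidates}
    if not scores:
        return []
    best = max(scores.values())
    return [j for j in candidates if scores[j] > 0.0 and scores[j] >= ratio * best]
```

Both kernels trade exactness for simpler arithmetic: gradient descent needs only multiply-accumulate operations, and thresholding can add several atoms per pass, reducing the number of outer iterations.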
[Figures: Architecture of GD-OMP Algorithm; Architecture of HT-OMP Algorithm]
Architecture | Signal Size | Max Freq (MHz) | Reconstruction Time (µs) | Area (mm²) | ADP (mm²·µs)
OMP (base) [Jerome et al.] | 256 | 165 | 13.69 | 0.69 | 9.44
HT-OMP (This Work) | 256 | 317 | 9.32 | 0.63 | 5.87 (1.6x)
GD-OMP (This Work) | 256 | 317 | 12.52 | 0.40 | 5.01 (1.9x)
ASIC implementation analysis in 65 nm, 1 V CMOS technology
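The ADP column is the area-delay product, area × reconstruction time, and the bracketed improvements are the baseline ADP divided by each design's ADP. The short Python check below recomputes both from the tabulated area and time; the last digit can differ slightly (e.g. 9.4461 vs 9.44 for the baseline) because the tabulated inputs are themselves rounded.

```python
# (architecture, reconstruction time in µs, area in mm²), copied from the table
ROWS = [
    ("OMP (base)", 13.69, 0.69),
    ("HT-OMP", 9.32, 0.63),
    ("GD-OMP", 12.52, 0.40),
]


def adp(time_us: float, area_mm2: float) -> float:
    """Area-delay product in mm²·µs (lower is better)."""
    return time_us * area_mm2


if __name__ == "__main__":
    base = adp(ROWS[0][1], ROWS[0][2])
    for name, t, a in ROWS:
        value = adp(t, a)
        print(f"{name}: ADP = {value:.2f} mm2*us, improvement = {base / value:.1f}x")
```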
[Figure: Quality of OMP CS Reconstruction]
CS-based Framework Implementation on Different Platforms
Platform | Image Size | Chip Area (mm²) | Power (W) | Execution Time (ms)
ARM CPU (28 nm, 0.9 V) | 2 MB | 16 | 12.75 | 378,120
Nvidia Jetson TK1 GPU (28 nm, 0.9 V) | 2 MB | 37 | 9.52 | 169,225
PENC Many-Core (65 nm, 1 V) | 2 MB | 5.5 | 8.67 | 38,019
The CS-based framework is fully implemented for image reconstruction and face detection applications on the NVIDIA TK1 CPU+GPU platform and on the PENC many-core
Compared to the CPU and GPU implementations, PENC achieves 15x and 200x lower energy consumption and 8x and 177x faster execution time, respectively
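Energy per run can be estimated as power × execution time, so a platform comparison reduces to two ratios: time ratio (speedup) and energy ratio. The Python sketch below applies this to the three table rows as printed, treating the power column as average power over the whole run (an assumption; the slide does not say how power was averaged). Because only ratios are returned, the power and time units cancel. It computes ratios from these rows only.

```python
from typing import Dict, Tuple

# (power, execution time) per platform, copied from the table above.
PLATFORMS: Dict[str, Tuple[float, float]] = {
    "ARM CPU": (12.75, 378_120),
    "Jetson TK1 GPU": (9.52, 169_225),
    "PENC": (8.67, 38_019),
}


def energy(power: float, time: float) -> float:
    """Energy of one run, assuming constant average power."""
    return power * time


def compare(baseline: str, target: str = "PENC") -> Tuple[float, float]:
    """Return (speedup, energy ratio) of target relative to baseline."""
    p_b, t_b = PLATFORMS[baseline]
    p_t, t_t = PLATFORMS[target]
    return t_b / t_t, energy(p_b, t_b) / energy(p_t, t_t)


if __name__ == "__main__":
    for name in ("ARM CPU", "Jetson TK1 GPU"):
        speedup, ratio = compare(name)
        print(f"PENC vs {name}: {speedup:.1f}x faster, {ratio:.1f}x less energy")
```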
[Figures: Power Measurement Setup; Current Analysis on ARM CPU; Current Analysis on K1 GPU; Quality Analysis of CS-based Framework]
CS-based Framework for Big Data Acceleration using Secured PENC on Hadoop Platform
We propose using compressive sensing (CS) together with the PENC accelerator to reduce data communication and storage in Big Data streaming by up to 70%
The CS-based framework with PENC has been tested on machine learning and data analytics algorithms, e.g., health monitoring, convolutional neural networks, deep learning, and statistical analysis of sparse and dense matrices
The framework has been implemented on the low-power Jetson GPU, ARM CPU, and PENC
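The data reduction comes from transmitting M compressive measurements y = Φx instead of the N raw samples of x; keeping M = 0.3N (the "up to 70%" figure above) means only 30% of the values are sent or stored, and the receiver reconstructs x with an algorithm such as OMP when x is sparse enough. The Python sketch below assumes a random Bernoulli ±1/√M sensing matrix with a fixed seed and a block length of N = 1000; the framework's actual sensing matrix and block size are not given here, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def measurements_for(n: int, reduction: float) -> int:
    """Number of measurements M that achieves the requested data reduction."""
    return max(1, round(n * (1.0 - reduction)))


def bernoulli_matrix(m: int, n: int, seed: int = 0) -> Matrix:
    """Random M x N sensing matrix with entries +/- 1/sqrt(M)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[scale if rng.random() < 0.5 else -scale for _ in range(n)]
            for _ in range(m)]


def compress(phi: Matrix, x: Sequence[float]) -> List[float]:
    """Compressive measurement y = Phi x (only y is transmitted/stored)."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


if __name__ == "__main__":
    n = 1000
    m = measurements_for(n, 0.7)
    phi = bernoulli_matrix(m, n, seed=1)
    x = [0.0] * n
    x[10], x[500] = 1.5, -2.0  # a sparse input block
    y = compress(phi, x)
    print(f"sent {len(y)} of {n} values ({1 - len(y) / n:.0%} reduction)")
```

A seeded generator is used so that sender and receiver can regenerate the same Φ instead of transmitting it.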
[Figures: Streaming Data; Reconstruction Quality Analysis]
Publications
1) Amey Kulkarni, Youngok Pino, Matthew French and Tinoosh Mohsenin, "Adaptive Real-time Trojan Detection Framework through Machine Learning", in 2016 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 3-5 May 2016
2) Amey Kulkarni, Ali Jafari, Chris Sagedy and Tinoosh Mohsenin, "Sketching-Based High-Performance Biomedical Big Data Processing Accelerator", 49th ISCAS 2016, Canada, May 2016 (Invited Talk)
3) Amey Kulkarni, Youngok Pino and Tinoosh Mohsenin, "SVM-based Real-Time Hardware Trojan Detection for Many-Core Platform", in 17th International Symposium on Quality Electronic Design (ISQED), March 2016
4) Amey Kulkarni, Youngok Pino, Matthew French and Tinoosh Mohsenin, "Real-Time Anomaly Detection Framework for Many-Core Router through Machine Learning Techniques", ACM Journal on Emerging Technologies in Computing Systems (JETC)
5) Amey Kulkarni, Ali Jafari, Colin Shea and Tinoosh Mohsenin, "CS-based Secured Big Data Processing on FPGA", 24th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2016, Washington DC, USA
6) Amey Kulkarni, Tahmid Abtahi, Emily Smith and Tinoosh Mohsenin, "Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration", in Proceedings of the 26th Edition of the Great Lakes Symposium on VLSI (GLSVLSI'16), Boston, MA, USA
7) Amey Kulkarni and Tinoosh Mohsenin, "Accelerating Compressive Sensing Reconstruction OMP Algorithm with CPU, GPU, FPGA and Domain Specific Many-Core", 48th ISCAS 2015, Portugal, May 2015
8) Tawana Khawari, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin and Houman Homayoun, "Energy-Efficient Mapping of Biomedical Applications on Domain-Specific Accelerator under Process Variation", International Symposium on Low Power Electronics and Design (ISLPED'14)
9) Amey Kulkarni, Houman Homayoun and Tinoosh Mohsenin, "A Parallel and Reconfigurable Architecture for Efficient OMP Compressive Sensing Reconstruction", 24th GLSVLSI 2014, Houston, Texas, USA, May 2014 (27.32% acceptance rate)
10) Amey Kulkarni, Colin Shea, Tahmid Abtahi and Tinoosh Mohsenin, "Low Overhead CS-based Heterogeneous Framework for Big Data Acceleration", ACM Transactions on Embedded Computing Systems, 2016 (Submitted)