Secured Embedded Many-Core Accelerator for Big Data Processing

(1)


Amey Kulkarni

PhD Candidate

Advisor: Professor Tinoosh Mohsenin

Energy Efficient High Performance Computing (EEHPC) Lab

University of Maryland, Baltimore County

(2)

Agenda



• PENC: Power Efficient Nano Clusters many-core and its implementation results
• Cognitive-based hardware security for many-core architecture
• Compressive Sensing (CS) OMP reconstruction algorithm modifications and their implementation on 65 nm CMOS technology, PENC many-core, and FPGA
• CS-based framework for Big Data acceleration on hardware platforms
  ◦ Reduction in data transfers, and communication of secured encrypted data
  ◦ Implementation on three different platforms and evaluation in terms of hardware overhead
• Integration of the CS-based framework with the Hadoop platform for Big Data

(3)

PENC: Power Efficient Nano Clusters by EEHPC



• The PENC many-core acts as an accelerator alongside a host processor for data analytics and machine learning applications
• The architecture, simulator, and Verilog ASIC implementation were fully developed by EEHPC lab members
• Composed of 64 processing clusters: 192 low-power RISC cores
• Processors and routers are fully placed and routed in 65 nm, 1 V CMOS, with a small chip area of 5.5 mm² for 64 clusters. Total chip power at 1 GHz: 8.7 W

ISCAS’12, ISQED’13, ISLPED’14, ISCAS’16, GLSVLSI’16, JETC’16. NSF Grant# 00010145

(4)

Cognitive Security Framework for PENC Many-Core

[Figure: Test setup on the FPGA platform for the PENC many-core platform (64-core). Labeled blocks: Trojan Insertion Module (inter-cluster and intra-cluster triggers), Security Kernel & Interface, clocks CLKMC and CLKADM, Clusters 1 to 16 with routers, and Attack Detection Modules labeled Feature Sample, Core Feed-Back, and Core Enable.]

Test setup for the PENC many-core platform (64-core), where the Attack Detection Module is implemented using an online machine learning technique to prevent unexpected attacks

JETC’16, ISQED’16, HOST’16



• Assumptions: processing cores and memories are safe; the Trojan is inserted at the design phase and triggers malicious activity inside the router at run-time
• Detects three different Denial-of-Service attacks
• Hardware area overhead is only 0.26%, Trojan detection requires 3 cycles, and the design performs 2.4x faster than the state-of-the-art implementation
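As a rough illustration of what "online" learning means for the detector, the sketch below updates a linear classifier one sample at a time. It is a minimal perceptron, assumed here only as a stand-in; the slides do not specify the learner at this level of detail. The class `OnlineDetector`, the two features (`injection_rate`, `stall_rate`), and all sample values are hypothetical.

```python
# Hypothetical sketch: an online perceptron standing in for the attack detector.
# Feature names and values are illustrative assumptions, not measured router data.

class OnlineDetector:
    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features   # one weight per feature
        self.b = 0.0                  # bias term
        self.lr = lr                  # learning rate

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        """Return +1 (attack suspected) or -1 (normal)."""
        return 1 if self.score(x) > 0 else -1

    def update(self, x, label):
        """Learn from one labeled sample; weights change only on a mistake."""
        if label * self.score(x) <= 0:
            self.w = [wi + self.lr * label * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * label


# Each sample: (hypothetical normalized [injection_rate, stall_rate], label)
window = [
    ([0.2, 0.1], -1),
    ([0.9, 0.8], +1),
    ([0.1, 0.2], -1),
    ([0.8, 0.9], +1),
]

detector = OnlineDetector(n_features=2)
for x, label in window * 3:           # the same traffic pattern observed three times
    detector.update(x, label)

print(detector.predict([0.85, 0.85]), detector.predict([0.15, 0.15]))
```

Because the model is updated per sample and keeps only a weight vector, its state stays small, which is the property that makes this style of learner plausible for a low-overhead hardware module.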

(5)



Compressive Sensing (CS): OMP Reconstruction Algorithm

• We propose a platform-independent and reconfigurable OMP CS reconstruction algorithm (experimented on PENC, FPGA, and GPU)

[Figures: OMP CS Reconstruction Algorithm; Architecture of OMP CS Reconstruction Algorithm; Fixed Point Hardware Implementation; Analysis of OMP algorithm for 1024x1024 size image on PENC Many-Core; Analysis of OMP algorithm on Xilinx Virtex-7 FPGA]

GLSVLSI’14, ISCAS’15
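For readers unfamiliar with Orthogonal Matching Pursuit, the textbook loop has three kernels per iteration: identification (pick the column most correlated with the residual), least squares on the selected columns, and a residual update. The sketch below is the generic algorithm in plain Python, not the proposed hardware architecture; the function names and the tiny 3x4 dictionary are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve a small square system A z = b (Gaussian elimination, partial pivoting).
    Assumes A is nonsingular, which holds when the selected columns are independent."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def omp(cols, y, sparsity):
    """cols: columns of the measurement matrix; y: measurements; returns sparse x."""
    residual, support, coef = y[:], [], []
    for _ in range(sparsity):
        # Identification kernel: column most correlated with the residual
        candidates = [i for i in range(len(cols)) if i not in support]
        support.append(max(candidates, key=lambda i: abs(dot(cols[i], residual))))
        # Least-squares kernel: normal equations restricted to the support
        gram = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], y) for a in support]
        coef = solve(gram, rhs)
        # Residual update
        residual = [y[m] - sum(c * cols[s][m] for c, s in zip(coef, support))
                    for m in range(len(y))]
    x = [0.0] * len(cols)
    for c, s in zip(coef, support):
        x[s] = c
    return x

# Tiny illustrative dictionary: three unit vectors plus one mixed column
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
              [0.5 ** 0.5, 0.5 ** 0.5, 0.0]]
x_hat = omp(dictionary, [2.0, 0.0, -3.0], sparsity=2)
print(x_hat)
```

The least-squares solve grows with the support size at every iteration, which is why it is a natural target for simplification in hardware.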

(6)

Compressive Sensing (CS): OMP Reconstruction Algorithm



• We propose two modifications to the OMP CS reconstruction algorithm:
  ◦ Gradient Descent OMP (GD-OMP) reduces the complexity of the least-squares kernel
  ◦ Hard Thresholding OMP (HT-OMP) reduces the complexity of the identification kernel
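One way to see how gradient descent can lighten the least-squares kernel is to replace the exact solve with a few steepest-descent steps on the selected coefficients. The sketch below is only an illustration of that idea under stated assumptions: the slides do not give the GD-OMP update rule, so the exact line-search step, the `gd_steps` parameter, and the function names are hypothetical choices rather than the authors' design.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combine(cols, support, weights, length):
    """Return sum_k weights[k] * cols[support[k]] as a dense vector."""
    return [sum(w * cols[s][m] for w, s in zip(weights, support))
            for m in range(length)]

def gd_omp(cols, y, sparsity, gd_steps=20):
    """OMP variant whose least-squares step is approximated by gradient descent."""
    residual, support, coef = y[:], [], []
    for _ in range(sparsity):
        # Identification: same rule as plain OMP
        candidates = [i for i in range(len(cols)) if i not in support]
        support.append(max(candidates, key=lambda i: abs(dot(cols[i], residual))))
        coef.append(0.0)  # warm start: keep earlier coefficients, new one starts at 0
        for _ in range(gd_steps):
            # Descent direction = A_S^T r (negative gradient of 0.5 * ||y - A_S z||^2)
            direction = [dot(cols[s], residual) for s in support]
            mapped = combine(cols, support, direction, len(y))
            denom = dot(mapped, mapped)
            if denom == 0.0:
                break  # direction vanished: already at the least-squares solution
            alpha = dot(direction, direction) / denom  # exact line search
            coef = [c + alpha * d for c, d in zip(coef, direction)]
            fit = combine(cols, support, coef, len(y))
            residual = [yi - fi for yi, fi in zip(y, fit)]
    x = [0.0] * len(cols)
    for c, s in zip(coef, support):
        x[s] = c
    return x

# Tiny illustrative dictionary: three unit vectors plus one mixed column
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
              [0.5 ** 0.5, 0.5 ** 0.5, 0.0]]
x_gd = gd_omp(dictionary, [2.0, 0.0, -3.0], sparsity=2)
print(x_gd)
```

Each inner step needs only matrix-vector products and two dot products, with no matrix factorization, which is the kind of trade that tends to map well onto simple parallel hardware.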

[Figures: Architecture of GD-OMP Algorithm; Architecture of HT-OMP Algorithm]

ASIC Implementation Analysis on 65nm CMOS, 1V technology

Architecture                 Signal Size  Max Freq (MHz)  Reconstruction Time (µs)  Area (mm²)  ADP (mm²-µs)
---------------------------  -----------  --------------  ------------------------  ----------  ------------
OMP (base) [Jerome et al.]   256          165             13.69                     0.69        9.44
HT-OMP (This Work)           256          317             9.32                      0.63        5.87 (1.6x)
GD-OMP (This Work)           256          317             12.52                     0.40        5.01 (1.9x)
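The ADP column is the area-delay product, that is, area multiplied by reconstruction time, and the factors in parentheses are improvements over the baseline. The short check below recomputes both from the table's own time and area values. The baseline product comes out near 9.45 rather than the listed 9.44, presumably because the listed inputs are rounded; that reading is an assumption.

```python
# (reconstruction time in µs, area in mm²), copied from the table above
designs = {
    "OMP (base)": (13.69, 0.69),
    "HT-OMP": (9.32, 0.63),
    "GD-OMP": (12.52, 0.40),
}

# Area-delay product: lower is better
adp = {name: time_us * area_mm2 for name, (time_us, area_mm2) in designs.items()}
baseline = adp["OMP (base)"]

for name, value in adp.items():
    print(f"{name}: ADP = {value:.2f} mm2-us, {baseline / value:.1f}x vs. base")
```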

[Figure: Quality of OMP CS Reconstruction]

(7)

CS-based Framework Implementation on Different Platforms

Platform                             Image Size  Chip Area (mm²)  Power (mW)  Execution Time (ms)
-----------------------------------  ----------  ---------------  ----------  -------------------
ARM CPU (28nm, 0.9V)                 2 MB        16               12.75       378,120
Nvidia Jetson TK1 GPU (28nm, 0.9V)   2 MB        37               9.52        169,225
PENC Many-Core (65nm, 1V)            2 MB        5.5              8.67        38,019

• The CS-based framework is fully implemented for image reconstruction and face detection applications on the NVIDIA TK1 CPU+GPU platform and the PENC many-core
• Compared to CPU and GPU implementations, PENC achieves 15x and 200x lower energy consumption and 8x and 177x faster execution time

[Figures: Power Measurement Setup; Current Analysis on ARM CPU; Current Analysis on K1 GPU; Quality Analysis of CS-based Framework]

(8)

CS-based Framework for Big Data Acceleration using Secured PENC on Hadoop Platform



• We propose compressive sensing (CS) together with the PENC accelerator to reduce data communication and storage in big data streaming by up to 70%
• The CS-based framework with PENC has been tested on machine learning and data analytics algorithms, e.g., health monitoring, convolutional neural networks, deep learning, and statistical analysis of sparse and dense matrices
• The framework has been implemented on a low-power Jetson GPU, an ARM CPU, and PENC
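To make the data-reduction claim concrete: in compressive sensing a block of N samples x is replaced by M < N measurements y = Φx, so keeping M = 0.3·N values corresponds to a 70% cut in what has to be stored or transmitted. The sketch below shows only this compression step. The random Gaussian matrix, the block size, the fixed seed, and the sparse test signal are all assumptions for illustration, not details taken from the framework.

```python
import random

def measurement_matrix(m, n, seed=0):
    """Random Gaussian M x N matrix (a common CS choice; assumed here)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Return y = phi @ x, the M compressed measurements of x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

n = 200                 # samples in one hypothetical data block
keep_ratio = 0.3        # keeping 30% of the values matches a 70% reduction
m = round(n * keep_ratio)

# A sparse test signal: CS recovery relies on the data being sparse in some basis
x = [0.0] * n
for idx, val in [(5, 1.5), (70, -2.0), (150, 0.75)]:
    x[idx] = val

phi = measurement_matrix(m, n)
y = compress(phi, x)
reduction = 1.0 - len(y) / len(x)
print(f"{len(x)} samples -> {len(y)} measurements ({reduction:.0%} fewer values to move)")
```

Only `y` needs to travel or be stored; a receiver that knows Φ (here, the seed) can run a reconstruction algorithm such as OMP to recover the signal. A party without Φ cannot easily invert the measurements, which is consistent with the agenda's mention of secured, encrypted data.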

[Figures: Reconstruction Quality Analysis; Streaming Data]

(9)

Publications

1) Amey Kulkarni, Youngok Pino, Matthew French, and Tinoosh Mohsenin, "Adaptive Real-time Trojan Detection Framework through Machine Learning", in Hardware Oriented Security and Trust (HOST), 2016 IEEE International Symposium on, 3-5 May 2016.

2) Amey Kulkarni, Ali Jafari, Chris Sagedy, and Tinoosh Mohsenin, "Sketching-Based High-Performance Biomedical Big Data Processing Accelerator", 49th ISCAS 2016, Canada (Invited Talk), May 2016.

3) Amey Kulkarni, Youngok Pino, and Tinoosh Mohsenin, "SVM-based Real-Time Hardware Trojan Detection for Many-Core Platform", in 17th International Symposium on Quality Electronic Design (ISQED), March 2016.

4) Amey Kulkarni, Youngok Pino, Matthew French, and Tinoosh Mohsenin, "Real-Time Anomaly Detection Framework for Many-Core Router through Machine Learning Techniques", ACM Journal on Emerging Technologies in Computing Systems.

5) Amey Kulkarni, Ali Jafari, Colin Shea, and Tinoosh Mohsenin, "CS-based Secured Big Data Processing on FPGA", 24th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2016, Washington DC, USA.

6) Amey Kulkarni, Tahmid Abtahi, Emily Smith, and Tinoosh Mohsenin, "Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration", in Proceedings of the 26th Edition of the Great Lakes Symposium on VLSI, GLSVLSI'16, Boston, MA, USA.

(10)

Publications

7) Amey Kulkarni and Tinoosh Mohsenin, "Accelerating Compressive Sensing Reconstruction OMP Algorithm with CPU, GPU, FPGA and Domain Specific Many-Core", 48th ISCAS 2015, Portugal, May 2015.

8) Tawana Khawari, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin, and Houman Homayoun, "Energy-Efficient Mapping of biomedical applications on Domain-Specific Accelerator under Process Variation", International Symposium on Low Power Electronics and Design, ISLPED'14.

9) Amey Kulkarni, Houman Homayoun, and Tinoosh Mohsenin, "A Parallel and Reconfigurable Architecture for Efficient OMP Compressive Sensing Reconstruction", 24th GLSVLSI 2014, Houston, Texas, USA, May 2014 (27.32% Acceptance Rate).

10) Amey Kulkarni, Colin Shea, Tahmid Abtahi, and Tinoosh Mohsenin, "Low Overhead CS-based Heterogeneous Framework for Big Data Acceleration", ACM Transactions on Embedded Computing Systems, 2016 (Submitted).