• No results found

Real-Time Anomaly Detection from Edge to HPC-Cloud

N/A
N/A
Protected

Academic year: 2019

Share "Real-Time Anomaly Detection from Edge to HPC-Cloud"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Real-Time Anomaly Detection

from Edge to HPC-Cloud

September 6,

2018

Prof. Judy Qiu

Intelligent Systems Engineering Department

Indiana University

(2)

Gartner’s report on Strategic Technology Trend for

(3)
(4)

Future of Cloud Computing

(5)

HPC-Cloud

Software Tools: Harp-DAAL

(6)

HPC-Cloud

Data Analytics and Computing Ecosystem Compared

Application Level

Middleware and Management

System Software

Cluster Hardware

Mahout, R and Applications Applications and Community Codes

C lou d Se rvic es (e. g., A WS) Zoo kee p er (c oo rd in at ion )

Hive Pig Sqoop Flume

Storm A V R O Map-Reduce

Hbase BigTable (key-value store)

HDFS (Hadoop File System)

Virtual machines and Cloud Services (optional)

Linux OS variant

FORTRAN, C, C++, and IDEs

Domain-specific libraries MPI/OpenMP + Accelerator Tools Numerical Libraries Performance and Debugging

(such as PAPI)

Lustre (Parallel File System)

Batch Scheduler (such as SLURM)

System Monitoring

Tools

Linux OS variant

Ethernet Switches Local Node

Storage

Commodity X86 Racks

Infiniband + Ethernet Switches

SAN + Local Node Storage

X86 Racks + GPUs or Accelerators

Data Analysis Ecosystem Computational Science Ecosystem

(7)

HPC-Cloud |

Harp and Harp-DAAL allow our data analytics to be scalable and interoperable across a range of computing systems, including clouds (Azure, Amazon), clusters (Haswell, Knights Landing) and supercomputers (IU Big Red II).

(8)

HPC-Cloud:

(9)

Example: K-means Clustering

The Allreduce Computation Model

Model

Worker Worker Worker

broadcast reduce

allreduce

rotate push & pull

allgather regroup

When the model size is small When the model size is large but can still be held in each machine’s memory

(10)

High Performance Data Analytics

Harp-DAALprovides fast machine learning solutions for Big Data applications. By leveraging Harp’s inter-node

communication innovation and Intel’s highly optimized computation kernel, it delivers 2x to 15x speedups over

other frameworks on Intel high-end servers.

K-means

Yahoo! Flickr, including 100 million images, each with 4096 dimensional deep features

(30xspeedup)

MF-SGD

Twitter with 44 million vertices, 2 billion edges, subgraph template size of 10 to 12 vertices

(3xspeedup)

Subgraph Counting

Twitter with 44 million vertices, 2 billion edges, subgraph templates of 10 to 12 vertices

(11)

Taxonomy for Machine Learning Algorithms

SVD, PCA, LR, QR

Linear Algebra Kernel Classification General Linear Model Kernel Method Nearest Neighbor Decision Tree Factorizatio n Machine Graphical Model Neural Networks

GD, SGD, LBFGS, CCD

Numerical Optimization

EM, VB ….

Expectation Maximization

Gibbs Sampling, Metropolis-Hastings

Markov Chain Monte Carlo

Clustering Regression Structure Learning Recommendati on Dimension Reduction Model Structure Level Task Level Solver Level

Optimization and related issues

• Task level only can't capture the traits of computation

• Model is the key for iterative algorithms. The structure (e.g. vectors, matrix, tree, matrices) and size are critical for performance

(12)

MatrixFactorization-SGD

SubGraph Counting

Computer Vision

Complex Networks Bioinformatics

Text Mining

(13)

Real-Time Analytics

Anomaly Detection

(14)

IndyCar

• The IndyCar Series is the premier level of open-wheel racing in North America.The series' premier event is the Indianapolis 500.

(15)

Timing and Score Data

• Sensors in the cars and under the track.

• Antenna and communication system.

• Telemetry data(including the many performance information of the cars like speed, gear,

brake, throttle,etc) stream into the on-site computer system in a real-time fasion.

Transmitter placed equally on all cars (35 inches from tip of nose)

Detection Antenna under the track (Coax)

24 inches wide–up to 65 feet across ¾ inch below the surface

Track Side Decoder (TSU) Back-up System Main System Results Module (Pit Video) (Track VIdeo) Database Module (Pit Feed) (Live Internet) (Historical data) Distribution Module (ABC/ESPN) Track scoreboards and pylons Gameboy Systems Print Module (team reports) (media reports) Operator (Main) Operator (Backup)

PL/PO: Pit Lane Entrance & Exit

T1-T2-T3-T4: Intermediate Sectors

T1T/T3T: Velocity Trap (Top Speed Measurement)

SF/SFP: Start Finish Line (Track & Pit Lane)

SS1-SS2: Short Shoot Intermediators (Indy)

(16)

Dataset

The INDYCAR Timing system

supports retrieving timing data

from the primary timing system

serial or sequential data feed for

live data and report querying for

historical or archived data.

The

Results Protocol

is designed to

deliver more detailed results

information through the use of a

single record command.

Example: one Indianapolis 500 car

race on the 28th of May 2017

(17)

Problem

" And we want to know the data corresponding to the anomaly; when car got into problems what kind of event it is and what is causing it."

"We want to know if it’s an expected event or a

minor deviation that we need to be worried about. Helps race control people. "

(18)

• Anomaly Detection: Learning algorithms themselves can only find the abnormal pattern in the data with best efforts under predefined assumption

of what is “normal”.

• Correlation Analysis: Learning algorithms can find out the

“events” from data and mine

correlation relationship among the events.

(19)
(20)

Axon, Dendrite and Synapse

• Neurons have specialized projections called dendrites and axons.

• Dendrites bring information to the cell body and axons take information away from the cell body.

(21)

Hierarchical Temporal Memory(HTM)

A. The neuron model used in

most artificial neural

networks has few synapses

and no dendrites

B. Different source of input:

feedforward, feedback and

context

(22)

Representing High-order Context

A. The layer consists of a set of mini-columns, with each mini-column containing multiple neurons.

B. Each sequence element invokes a sparse set of mini-columns, only three in this illustration. C. Learning with context input, the inputs invoke

the same mini-columns but only one cell is active in each column. Because C'and C''are unique, they can invoke the correct high-order prediction of either Y or D depending on the input from two time steps ago.

(23)

Anomaly Detection Based on HTM

• The input time series xt are fed to the HTM component. It models temporal patterns in a(xt) and

output a prediction in π(xt).

(24)

Anomaly Detection

(25)

Correlation Analysis

• The anomaly occurs

around 15:03 pm, where the RPM of car #9 totally disappeared. In fact, car #9 got

totaled due to a

collision with car #77. The others cars,

including car #24, all slowed down after the crash.

(26)
(27)

System Architecture

Data collector

• collects the input from the multiple sensors

• stores data HDFS or MongoDB

Data Preprocessor

• accesses and filters data

• stores preprocessed data in HDFS or MongoDB

Anomaly Detection Engine

• involves high performance batch and streaming data processing frameworks

(28)

Streaming Results using Apache Storm

• Performance metrics for single node execution with parallelism =1 for two input rates: 10 and 20 message/sec

(29)

Real-Time Simulation Data Analysis

• These Harp-DAAL HPC kernels could well serve in accelerating the process of clustering the nanoBIO simulation results to obtain different nanoparticle trajectories.

• NP Self-Assembly Lab simulates assembly of nanoparticles and generates 1 million high dimensional trajectory data. Harp-DAAL takes the input data and runs clustering over different nanoparticle

(30)

Source codes became available on Github in February, 2017.

• Harp-DAAL follows the same

standard of DAAL’s original codes

• Twelve Applications

▪ Harp-DAAL Kmeans

▪ Harp-DAAL MF-SGD

▪ Harp-DAAL MF-ALS

▪ Harp-DAAL SVD

▪ Harp-DAAL PCA

▪ Harp-DAAL Neural Networks

▪ Harp-DAAL Naïve Bayes

▪ Harp-DAAL Linear Regression

▪ Harp-DAAL Ridge Regression

▪ Harp-DAAL QR Decomposition

▪ Harp-DAAL Low Order Moments

▪ Harp-DAAL Covariance Harp-DAAL: Prototype and Production Code

(31)
(32)

Acknowledgements

We gratefully acknowledge support from NSF, IU and Intel Parallel Computing Center (IPCC) Grant.

Langshi Cheng , Bo Peng , Supun Kamburugamuve, Sahil Tyagi , Sabra Ossen, Lynne Wang, Tiana Deckard,

Robert Henschel, Craig Stewart, Shaojuan Zhu, Lisa Smith

Intelligent Systems Engineering School of Informatics and Computing

References

Related documents

II Contributions 52 3 Contributions to network science methods and the multi-level analysis of functional brain networks 53 3.1 Motif detection in samples of binary directed

 Compared with nucleophilic activation provided by canonical metal catalysis, gold offers orthogonal electrophilic activation..

Primary data about the various aspects of the scheme and its utilisation were collected by means of surveys administered to selected representatives of various stakeholders such as

The Swift/XRT spectrum obtained in October 9–11 has a power-law shape and is consistent with those observed in the coordinated XMM-Newton and NuSTAR observations in 2012.. In

Firstly, we show how feasible the species discrimination based on the averaged acoustic parameters of individual calls by means of Support Vector Machine (SVM) method.. Secondly,

The diversity of phenotypes displayed by the various breeds of livestock is controlled by an equally broad genetic diversity, which provides the opportunity for the selection of

The study concludes that, the threats incurred by Boko Haram sects on Nigerian state have profound security and economic implications; Boko Haram insurgents today

60,150/- (Rupees Sixty Thousand One Hundred and Fifty Only) through the download online Bank Challan Form in any branch of Allied Bank Limited of your City or by Bank Draft payable