Big Learning Data Management and Data Analysis

Academic year: 2021

(1)

SCCH is an initiative of ... SCCH is located in ...

Big Learning – Data Management and

Data Analysis

... for industrial applications

Thomas Natschläger

+43 7236 3343 868

[email protected] www.scch.at

(2)

SCCH Key Facts

 application-oriented research organization

 initiated by institutes of the Johannes Kepler University Linz

 cooperation science - industry

 non-profit organization, constituted as „Ltd“

 owners

 Johannes Kepler University Linz

 Upper Austrian Research GmbH

 Association of Company Partners of SCCH

 ~ 60 employees (>80 with partners)

 5.7 million euros income incl. subsidies in business year 2010/2011

 founded in July 1999 within the Kplus program

(3)

Research Topics

Process and Quality Engineering

 software engineering

 software quality

 process and approaches

Rigorous Methods in Software Engineering

 software specification, verification, validation

 formal methods (ASM, Event-B, etc.)

 process modeling, workflows

Models, Architectures and Tools

 software architecture

 model-based development

 integration of architecture in development

Knowledge-Based Vision Systems

 machine vision

 object recognition

 object tracking

Data Analysis Systems

 automated and intelligent data analysis

 prediction and optimization

(4)

DAS - Data Analysis Systems

Topics

 Computational Models

 Semantic Knowledge Models

 Knowledge Discovery

 Machine Learning

 Stream Data Analysis

 Data Warehousing

 Data Management

Application Domains

(6)

Overview

Temporal Analytics on Big Data

 Applications

 Fault Detection

 Proposed Architecture

 Related Work

Learning Big Models

 Causal Inference

 Enabled by parallelization

(8)

Domain: Industrial Production

Subsystems generate streams of sensor data

Stored in the Production Information Management System (PIMS)

Analysis Tasks

 Quality Assurance

 Process Optimization

 Fault Detection

 Fault Diagnosis

 ...

[Figure: systems 1 … n, PIMS]

(9)

Selected References

voestalpine Stahl GmbH

 Analysis of continuous casting process

 Integration of expert knowledge

 „visual“ Data Mining, Interpretation

Böhler Edelstahl

 Quality analysis of high-grade steel production

unisoftware plus

 machine learning framework (mlf)

 Basis for many projects in the area of process analysis

Siemens Transformers Austria

 Optimization of power transformer cores

Voith Paper, SCA Laakirchen

 Analysis and optimization in paper production

Analysis tool PaperMiner

AMS Engineering

 Knowledge discovery in discrete manufacturing

(10)

Domain: Machine Manufacturer

Machines at different locations generate streams of sensor data

Stored in data center

Analysis Tasks

 Usage Monitoring

 Profile Analysis

 Condition Monitoring

 Fault Detection

 Fault Diagnosis

 ...

[Figure: Data Center]

(11)

Domain: Decentralized Renewable Energy, Home Automation

Sensors of different kinds at each building generate streams of sensor data

 Temperature

 Solar radiation

 Energy production

 ...

Analysis Tasks

 Usage Monitoring

 Profile Analysis

 Condition Monitoring

 Fault Detection

 Fault Diagnosis

[Figure: Data Center]

(12)

Application: Fault Detection for Renewable Energy Units

(near) real time detection of faults of units

 naturally temporal task

 => Data Stream Processing

profile analysis of units

 Need access to all units

 => central application

large number of devices

 => Big Data

low false positive rate, i.e. good model

 needs considerable amount of historical data

 especially for long term drifts

(13)

Fault Detection Algorithms

A) Compare measured channels to a model

 Deviations indicate a fault and its type

 A good model needs to be identified (learned)

 Typically using historical “good” data

B) Fit known model type

 e.g. ARX: $y(t) = \sum_{k} a_k \, y(t-k) + \sum_{i,k} b_{i,k} \, x_i(t-k)$
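
As an illustration of fitting a known model type, the following is a minimal sketch that fits the smallest possible ARX model (one input channel, one lag) by ordinary least squares and uses the one-step prediction errors as a fault indicator. The function names `fit_arx`, `residuals` and `simulate`, the model order and the synthetic data are assumptions made for this example; the slides do not specify how the model is fitted.

```python
def fit_arx(y, x):
    """Least-squares fit of y(t) = a*y(t-1) + b*x(t-1) (one input, one lag)."""
    syy = sxx = sxy = ry = rx = 0.0
    for t in range(1, len(y)):
        py, px = y[t - 1], x[t - 1]
        # Accumulate the entries of the 2x2 normal equations.
        syy += py * py
        sxx += px * px
        sxy += py * px
        ry += py * y[t]
        rx += px * y[t]
    det = syy * sxx - sxy * sxy  # zero if the two regressors are collinear
    a = (ry * sxx - rx * sxy) / det
    b = (rx * syy - ry * sxy) / det
    return a, b


def residuals(y, x, a, b):
    """One-step prediction errors; entry i belongs to time t = i + 1."""
    return [y[t] - (a * y[t - 1] + b * x[t - 1]) for t in range(1, len(y))]


def simulate(a, b, x):
    """Generate a noise-free output series from the same model (test data only)."""
    y = [0.0]
    for t in range(1, len(x)):
        y.append(a * y[t - 1] + b * x[t - 1])
    return y


# Synthetic "good" data, then a copy with an injected jump at t = 30.
x = [float((t % 5) - 2) for t in range(50)]
y = simulate(0.5, 2.0, x)
a, b = fit_arx(y, x)
y_faulty = list(y)
y_faulty[30] += 5.0
res = residuals(y_faulty, x, a, b)
print(a, b, max(abs(r) for r in res))
```

On the clean series the residuals stay near zero, while the injected jump shows up as a large residual at the faulty sample and a smaller one at the following sample.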

(14)

Evaluated Solution

Combination of

 Big Data Storage (BDS) for off-line MapReduce processing and

 Stream Processing Engine (SPE) for on-line, real-time processing

[Figure: units 1 … n, MUX, BDS, SPE]

(15)

Fault Detection Method A

Compare measured channels to a model

MapReduce is used to calibrate model on historical data

SPE applies model in user-defined operator (UDO)

REPLAY for testing

[Figure: units 1 … n, MUX, BDS, SPE, Model, MapReduce, REPLAY; model read e.g. from RDBMS]
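
The division of labour in Method A can be sketched as follows. This is a minimal, single-process illustration, not the evaluated system: the per-unit mean and standard deviation serve as an assumed stand-in for the unspecified model, and the names `map_record`, `reduce_stats`, `calibrate` and `detect` are hypothetical. The map and reduce steps are written so that they could be distributed, and `detect` plays the role of the user-defined operator in the SPE.

```python
from math import sqrt


def map_record(record):
    """Map step: emit (unit, sufficient statistics) for one historical reading."""
    unit, value = record
    return unit, (1, value, value * value)


def reduce_stats(left, right):
    """Reduce step: counts, sums and sums of squares add component-wise."""
    return (left[0] + right[0], left[1] + right[1], left[2] + right[2])


def calibrate(history):
    """Run map and reduce locally; a real job would distribute both steps."""
    acc = {}
    for record in history:
        unit, stats = map_record(record)
        acc[unit] = reduce_stats(acc[unit], stats) if unit in acc else stats
    model = {}
    for unit, (n, s, ss) in acc.items():
        mean = s / n
        var = max(ss / n - mean * mean, 0.0)  # guard against rounding below zero
        model[unit] = (mean, sqrt(var))
    return model


def detect(stream, model, k=3.0):
    """Stand-in for the user-defined operator: yield readings outside mean +/- k*std."""
    for unit, value in stream:
        mean, std = model[unit]
        if abs(value - mean) > k * std:
            yield unit, value


history = [("u1", v) for v in [9, 10, 11, 10, 9, 11]]
history += [("u2", v) for v in [100, 102, 98, 100]]
model = calibrate(history)
live = [("u1", 10.5), ("u1", 14), ("u2", 101), ("u2", 90)]
print(list(detect(live, model)))
```

Because the reduce step only adds sufficient statistics, partial results from different storage nodes can be combined in any order before the model is handed to the streaming side.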

(16)

Fault Detection Method B

Fit known model structure to data

BDS supplies historical data for testing via REPLAY

SPE incrementally fits certain kind of regression model

[Figure: units 1 … n, MUX, BDS, SPE, Model, REPLAY]
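
One common way to fit a linear regression model incrementally is recursive least squares (RLS). The slides do not say which algorithm the SPE uses, so the sketch below is an assumption chosen for illustration; the class name `RecursiveLeastSquares` and the parameters `delta` (initial covariance scale) and `lam` (forgetting factor) are hypothetical.

```python
class RecursiveLeastSquares:
    """Incremental least-squares fit of y = theta . phi, one sample at a time."""

    def __init__(self, n, delta=1e4, lam=1.0):
        self.theta = [0.0] * n
        # Large initial covariance means "no prior knowledge about theta".
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.lam = lam  # values below 1.0 gradually forget old samples

    def update(self, phi, y):
        """Consume one sample; return the prediction error made before updating."""
        n = len(phi)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        err = y - sum(self.theta[i] * phi[i] for i in range(n))
        self.theta = [self.theta[i] + gain[i] * err for i in range(n)]
        # P stays symmetric, so phi^T P equals (P phi)^T.
        self.P = [
            [(self.P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(n)]
            for i in range(n)
        ]
        return err


rls = RecursiveLeastSquares(2)
for t in range(50):
    phi = [float((t % 7) - 3), float((t % 5) - 2)]
    rls.update(phi, 2.0 * phi[0] - 3.0 * phi[1])
print(rls.theta)
```

The returned prediction error can double as a fault indicator, and the same `update` call works whether the samples arrive live or are replayed from storage.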

(17)

Stream Data Mining:

Incremental Algorithms

1. Process an example at a time, and inspect it only once

2. Use a limited amount of memory

3. Work in a limited amount of time

4. Be ready to predict at any time
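
A minimal sketch of a learner that meets these four requirements is a running mean and variance computed with Welford's update. The class name `RunningStats` and its method names are assumptions for illustration and are not taken from MOA or any other library.

```python
class RunningStats:
    """Running mean and variance: one pass, constant memory, constant time per example."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def learn_one(self, x):
        """Process a single example; it is inspected once and not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def predict(self):
        """Available at any time, even before the first example has arrived."""
        return self.mean

    def variance(self):
        """Sample variance of the examples seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.learn_one(value)
print(stats.predict(), stats.variance())
```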

(18)

Stream Data Mining:

Open Source Framework MOA

MOA: Massive Online Analysis

 WEKA community, Java

 Big Data stream mining (classification, regression, and clustering) in real time

 Can be easily used with e.g. Hadoop

 Extendable with new mining algorithms

 Goal: provide a benchmark suite for the stream mining community

18 © Software Competence Center Hagenberg GmbH

(19)

Discussion

General Setting

Units generate streams of sensor data (time,value)

 Central storage of data for analysis tasks

 Many analysis tasks are temporal in nature; e.g. fault detection

Can be implemented with current technology without much effort

REPLAY partially solves the problem of implementing algorithms for both MapReduce and SPE

Issues:

 Usage of multiple SPE per machine or combiner

 Integration of existing incremental learning tools such as MOA
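
The REPLAY idea can be pictured with a small sketch: stored records are re-emitted in time order, so an operator written for the stream can also be run on historical data without changes. The function names `replay` and `count_exceedances` and the record layout `(time, unit, value)` are assumptions for this example, not part of the evaluated system.

```python
def replay(stored_records):
    """Yield stored (time, unit, value) records in time order, like a live feed."""
    yield from sorted(stored_records, key=lambda record: record[0])


def count_exceedances(events, limit):
    """Example operator; it only iterates, so live and replayed input look the same."""
    counts = {}
    for _, unit, value in events:
        if value > limit:
            counts[unit] = counts.get(unit, 0) + 1
    return counts


stored = [(3, "u1", 9.0), (1, "u1", 2.0), (2, "u2", 8.0)]
print(count_exceedances(replay(stored), limit=5.0))
```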

(20)

Related Work: TiMR Framework

 Combination of M-R and SPE (DSMS)

 Temporal queries for off-line and on-line

 Implemented using StreamInsight and SCOPE/Dryad

Badrish Chandramouli, Jonathan Goldstein, and Songyun Duan. 2012. Temporal Analytics on Big Data for Web Advertising. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering (ICDE '12). IEEE Computer Society, Washington, DC, USA

(21)

Overview

Temporal Analytics on Big Data

 Applications

 Failure detection

 Proposed Architecture

 Related Work

Learning Big Models

 Causal Inference

 Enabled by parallelization

 Prediction and optimal control

(22)

Causal Models for

Prediction and Fault Detection

Setting

 Complex industrial process

 Limited knowledge about interdependencies

Goal

 e.g. predict the amount of TOC (total organic carbon) in wastewater for the next 48 h

Challenges

 Robustness of model

 Precision of model

 Several thousand sensors => computational complexity

Approach

Identify causal model structure

Use parallelization to tackle computational complexity

(23)

Linear Model

Various methods to estimate parameters

Prominent Method to estimate structure:

 Graphical Lasso (Friedman 2007, 2012) based on L1 regularized minimization of log-likelihood
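
For reference, the graphical lasso is usually stated as the following optimization problem. The notation is introduced here and does not come from the slides: $S$ is the empirical covariance matrix of the variables, $\Theta$ the estimated inverse covariance (precision) matrix, and $\lambda \ge 0$ the regularization weight.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda\,\lVert \Theta \rVert_{1}
```

Entries of $\hat{\Theta}$ that are driven exactly to zero correspond to pairs of variables that are conditionally independent given all others, so the sparsity pattern yields the model structure. A larger $\lambda$ gives a sparser graph.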

(24)

Extension to time: Granger Causality

X would “Granger-cause” Y if it contains information useful in forecasting Y

Implemented by graphical lasso on time-lagged variables

Work in progress

 Grouped Granger Graphical Lasso

 Detection of control loops

 Non-linear extensions

=> increases computational complexity
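
The notion of "information useful in forecasting" can be made concrete with a small pairwise check. This is a deliberately simplified sketch, not the graphical-lasso approach named on the slide: it compares the residual sum of squares of a lag-1 autoregression of the target with and without the lagged candidate cause. The function names and the synthetic data are assumptions for illustration.

```python
import random


def rss_restricted(y):
    """RSS of y(t) = a*y(t-1): forecast from the series' own past only."""
    pairs = [(y[t - 1], y[t]) for t in range(1, len(y))]
    a = sum(p * c for p, c in pairs) / sum(p * p for p, _ in pairs)
    return sum((c - a * p) ** 2 for p, c in pairs)


def rss_full(y, x):
    """RSS of y(t) = a*y(t-1) + b*x(t-1): forecast that also sees lagged x."""
    rows = [(y[t - 1], x[t - 1], y[t]) for t in range(1, len(y))]
    syy = sum(p * p for p, _, _ in rows)
    sxx = sum(q * q for _, q, _ in rows)
    sxy = sum(p * q for p, q, _ in rows)
    ry = sum(p * c for p, _, c in rows)
    rx = sum(q * c for _, q, c in rows)
    det = syy * sxx - sxy * sxy  # zero if the two regressors are collinear
    a = (ry * sxx - rx * sxy) / det
    b = (rx * syy - ry * sxy) / det
    return sum((c - a * p - b * q) ** 2 for p, q, c in rows)


def granger_gain(target, source):
    """Fraction of the target's forecast error removed by adding the lagged source."""
    return 1.0 - rss_full(target, source) / rss_restricted(target)


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(300)]
y = [0.0]
for t in range(1, len(x)):
    y.append(0.3 * y[t - 1] + 1.5 * x[t - 1])  # x drives y with one step delay

print(granger_gain(y, x), granger_gain(x, y))
```

In this synthetic setup the gain for x → y is close to 1 and the gain for y → x is small, which matches the direction in which the data were generated. A statistical test (for example an F-test on the two residual sums) would be needed before drawing conclusions from real data.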

(25)

Parallelization of

Machine Learning Algorithms

MapReduce (see first part of talk)

Good for data-parallel tasks; problems with iterative algorithms and complex dependencies in the data

GraphLab

 intuitively expresses computational dependencies; applied to dependent records, which are stored as vertices in a large distributed data graph

GPGPU

 complex low level code (kernel) or:

 High-Level languages: SAC, Matlab, Mathematica ...

 Meta-Programming: PyCUDA / CL, ...

Hardware agnostic Parallel Patterns

 Esp. Parallel Patterns for Machine Learning

(27)

ParaPhrase

High-level design and implementation patterns

 useful parallelism for a wide range of parallel applications

 heterogeneous multicore/manycore systems

Hardware Abstraction

 Basis : FastFlow – Framework (Turin, Pisa)

General Purpose Patterns

 Master – Slave, Farm, Pipeline, work queue, data dependency

Domain Specific Patterns (SCCH, HLR Stuttgart)

 Suitability of generic patterns for machine learning

 ML - Patterns: pool oriented, graphical models patterns, time series, ...
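
Two of the general-purpose patterns named above, farm and pipeline, can be sketched in a few lines. This is a plain Python illustration of the pattern idea using the standard library, not FastFlow or ParaPhrase code; the function names `farm` and `pipeline` are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def farm(worker, tasks, n_workers=4):
    """Farm pattern: independent tasks are handed to a pool of identical workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))  # results keep the input order


def pipeline(*stages):
    """Pipeline pattern: each stage consumes the output of the previous one."""
    def run(items):
        for stage in stages:
            items = map(stage, items)  # lazily chain the stages
        return list(items)
    return run


squares = farm(lambda v: v * v, range(5))
shifted_and_doubled = pipeline(lambda v: v + 1, lambda v: v * 2)([1, 2, 3])
print(squares, shifted_and_doubled)
```

The caller only states which pattern applies; how the work is mapped to threads, cores or accelerators stays hidden behind the pattern, which is the kind of hardware abstraction the slide refers to.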

(28)

Relevant Use-Cases / Project Competencies (selection)

TRUMPF Austria

 Improving precision of bending machines

K-Projekt SoftNet (I + II)

 Fault prediction in software systems

 „Mining Repositories“

K-Projekt PAC

 Process Analytical Chemistry

 Virtual sensors for chemical process analysis and control

BlueSky

 Locally optimized weather predictions

 Application: Energy Efficiency

Verbund

 Prediction of available water flow to optimize renewable energy usage

(29)

Use Case:

Local Weather Prediction

Data sources

 Global Weather Models

 Local Sensors: Weather stations, power plants, ...

 Topography, Expert knowledge

Goal

 Planning of events, maintenance, ...

 Basis for optimization of energy usage

[Figure: workflow panels Data collection, Analysis, Models, Prediction, Expert Knowledge; map of Austria showing Wien, Linz, St. Pölten, Graz, Salzburg, Klagenfurt, Innsbruck, Eisenstadt, Bregenz]

(30)

Optimization of

Renewable Energy Usage

Goals

 Short Term: Inclusion of availability of renewable energy (water, wind, solar) in energy planning and trading

[Figure: map of hydropower plants on the rivers Inn, Mur, Donau, Enns, Drau and Salzach. Legend: jointly operated power plants of AHP, storage power plants of AHP, run-of-river power plants of AHP, holdings of Verbund]

HYSIM II (Drabek et al. 2002)

Snow melt, ground humidity (Holzmann & Nachtnebel 2002)

Rainfall-Runoff-Model (Hebenstreit 2000)

Data Driven Models (e.g. Ridge Regression, Neural Networks)

SAMBA: Optimal weighting of all models

Flow values, Precipitation / Temperature & Forecast

(31)

Summary

Temporal Analytics on Big Data

 Applications

 Failure detection

 Proposed Architecture

 Related Work (MOA, TiMR)

Learning Big Models

 Causal Inference

 Enabled by parallelization

 Prediction and optimal control

(32)

Event tip!

Achieving sustainable software quality with the right strategy: TRUST-IT

18 April, 09:00 - 14:00

Österreichische Computergesellschaft, Wien

Target audience: software development managers, process owners, project managers, software quality engineers and those responsible for architecture.

(33)

Contact

Dr. Holger Schöner, +43 7236 3343 816, [email protected], www.scch.at

DI Michael Zwick, +43 7236 3343 843, [email protected], www.scch.at

Dr. Thomas Natschläger, +43 7236 3343 868, [email protected], www.scch.at
