Big Learning – Data Management and
Data Analysis
... for industrial applications
Thomas Natschläger
+43 7236 3343 868
[email protected] www.scch.at
SCCH Key Facts
application-oriented research organization
initiated by institutes of the Johannes Kepler University Linz
cooperation science - industry
non-profit organization
constituted as „Ltd“
owners
Johannes Kepler University Linz
Upper Austrian Research GmbH
Association of Company Partners of SCCH
~ 60 employees (>80 with partners)
5.7 million euros income incl. subsidies in business year 2010/2011
founded in July 1999 within the K plus Program
Research Topics
Process and Quality Engineering
software engineering
software quality
process and approaches
Rigorous Methods in Software Engineering
software specification, verification, validation
formal methods (ASM, Event-B, etc.)
process modeling, workflows
Models, Architectures and Tools
software architecture
model-based development
integration of architecture in development
Knowledge-Based Vision Systems
machine vision
object recognition
object tracking
Data Analysis Systems
automated and intelligent data analysis
prediction and optimization
DAS - Data Analysis Systems
Topics
Computational Models
Semantic Knowledge Models
Knowledge Discovery
Machine Learning
Stream Data Analysis
Data Warehousing
Data Management
Application Domains
Overview
Temporal Analytics on Big Data
Applications
Fault Detection
Proposed Architecture
Related Work
Learning Big Models
Causal Inference
Enabled by parallelization
Domain: Industrial Production
Subsystems generate
streams of sensor data
Stored in Production
Information Management
System
Analysis Tasks
Quality Assurance
Process Optimization
Fault Detection
Fault Diagnosis
...
[Figure: system 1, system 2, ..., system i, ..., system n; PIMS]
Selected References
voestalpine Stahl GmbH
Analysis of continuous casting process
Integration of expert knowledge
„visual“ Data Mining, Interpretation
Böhler Edelstahl
Quality analysis of high-grade steel production
unisoftware plus
machine learning framework (mlf)
Basis for many projects in the area of process analysis
Siemens Transformers Austria
Optimization of power transformer cores
Voith Paper, SCA Laakirchen
Analysis and optimization in paper production
Analysis tool PaperMiner
AMS Engineering
Knowledge discovery in discrete manufacturing
Domain: Machine Manufacturer
Machines at different
locations generate streams
of sensor data
Stored in data center
Analysis Tasks
Usage Monitoring
Profile Analysis
Condition Monitoring
Fault Detection
Fault Diagnosis
...
[Figure: Data Center]
Domain: Decentralized Renewable Energy, Home Automation
Sensors of different kind at
each building generate
streams of sensor data
Temperature, solar radiation, energy production, ...
Analysis Tasks
Usage Monitoring
Profile Analysis
Condition Monitoring
Fault Detection
Fault Diagnosis
[Figure: Data Center]
Application : Fault Detection for Renewable Energy Units
(near) real time detection of faults of units
naturally temporal task
=> Data Stream Processing
profile analysis of units
Need access to all units
=> central application
large number of devices
=> Big Data
low false positive rate, i.e. good model
needs considerable amount of historical data
especially for long term drifts
Fault Detection Algorithms
A) Compare measured channels to a model
Deviations indicate a fault and its type
A good model needs to be identified (learned)
Typically using historical “good” data
B) Fit known model type
e.g. ARX: $y(t) = \sum_{k} a_k \, y(t-k) + \sum_{i,k} b_{i,k} \, x_i(t-k)$
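A minimal sketch of how an ARX model of the form above could be fitted by ordinary least squares and then used to compute residuals for fault detection. The function names `fit_arx` and `residual`, the synthetic data, and the injected fault are illustrative assumptions, not part of the evaluated solution.

```python
import numpy as np

def fit_arx(y, x, order):
    """Least-squares fit of y(t) = sum_k a_k y(t-k) + sum_{i,k} b_{i,k} x_i(t-k)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)              # shape (n_samples, n_inputs)
    n_samples, n_inputs = x.shape
    rows = []
    for t in range(order, n_samples):
        past_y = [y[t - k] for k in range(1, order + 1)]
        past_x = [x[t - k, i] for i in range(n_inputs) for k in range(1, order + 1)]
        rows.append(past_y + past_x)
    theta, *_ = np.linalg.lstsq(np.array(rows), y[order:], rcond=None)
    return theta[:order], theta[order:].reshape(n_inputs, order)

def residual(y, x, a, b, t):
    """Difference between the measured y(t) and the ARX prediction."""
    order = len(a)
    pred = sum(a[k - 1] * y[t - k] for k in range(1, order + 1))
    pred += sum(b[i, k - 1] * x[t - k, i]
                for i in range(b.shape[0]) for k in range(1, order + 1))
    return y[t] - pred

# Synthetic "good" data generated from a known ARX(1) process (assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 2.0 * x[t - 1, 0]

a_hat, b_hat = fit_arx(y, x, order=1)

# Inject a hypothetical fault: an offset on one measurement.
y_faulty = y.copy()
y_faulty[150] += 5.0
r_ok = residual(y, x, a_hat, b_hat, 150)
r_fault = residual(y_faulty, x, a_hat, b_hat, 150)
```

A large residual such as `r_fault` would then be compared against a threshold chosen from historical data.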
Evaluated Solution
Combination of
Big Data Storage (BDS) for off-line MapReduce and
Stream Processing Engine (SPE) for on-line, real-time processing
[Figure: unit 1, unit 2, ..., unit i, ..., unit n; MUX; BDS; SPE]
Fault Detection Method A
Compare measured channels to a model
MapReduce is used to calibrate model on historical data
SPE applies model in user-defined operator (UDO)
REPLAY for testing
[Figure: unit 1, unit 2, ..., unit i, ..., unit n; MUX; BDS; SPE; Model; MapReduce; Read e.g. from RDBMS; REPLAY]
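The division of labour in Method A can be sketched in plain Python: a map/reduce pass over historical "good" data calibrates a model, and a small operator applies it to each incoming value. The mean and standard deviation model, the 3-sigma threshold, and all names below are assumptions made for illustration; the slides do not specify the model type or any engine API.

```python
from functools import reduce
from math import sqrt

def map_record(value):
    """Map step: emit sufficient statistics (count, sum, sum of squares)."""
    return (1, value, value * value)

def reduce_stats(a, b):
    """Reduce step: sufficient statistics are simply added."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def calibrate(history):
    """Off-line calibration of a mean/std model from historical data."""
    n, s, ss = reduce(reduce_stats, map(map_record, history))
    mean = s / n
    var = max(ss / n - mean * mean, 0.0)
    return mean, sqrt(var)

def make_operator(mean, std, n_sigma=3.0):
    """Stand-in for a user-defined operator: flag values far from the model."""
    def operator(value):
        return abs(value - mean) > n_sigma * std
    return operator

history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 19.7, 20.0]
mean, std = calibrate(history)
detect = make_operator(mean, std)

# On-line use: the operator sees one value at a time.
live_stream = [20.1, 19.9, 25.0, 20.0]
flags = [detect(v) for v in live_stream]

# Replaying historical data through the same operator, for testing.
replayed = [detect(v) for v in history]
```

Because the same `detect` function is used for the live stream and for the replayed history, the detection logic only has to be written once.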
Fault Detection Method B
Fit known model structure to data
BDS supplies historical data for testing via REPLAY
SPE incrementally fits certain kind of regression model
[Figure: unit 1, unit 2, ..., unit i, ..., unit n; MUX; BDS; SPE; Model; REPLAY]
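The slides do not say which regression model the SPE fits incrementally. As one possible instance, the sketch below uses recursive least squares (RLS), which updates a linear model one sample at a time. The class name, the `forgetting` and `delta` parameters, and the test data are assumptions.

```python
import numpy as np

class RecursiveLeastSquares:
    """Linear regression updated one sample at a time (RLS)."""

    def __init__(self, n_features, forgetting=1.0, delta=1e6):
        self.theta = np.zeros(n_features)      # current parameter estimate
        self.P = np.eye(n_features) * delta    # large value = weak prior
        self.lam = forgetting                  # < 1.0 discounts old samples

    def update(self, phi, y):
        """Process one sample; return the prediction error before the update."""
        phi = np.asarray(phi, dtype=float)
        error = y - phi @ self.theta
        gain = self.P @ phi / (self.lam + phi @ self.P @ phi)
        self.theta = self.theta + gain * error
        self.P = (self.P - np.outer(gain, phi @ self.P)) / self.lam
        return error

# Synthetic stream generated from known parameters (assumption).
rng = np.random.default_rng(1)
true_theta = np.array([2.0, -1.0])
model = RecursiveLeastSquares(n_features=2)
for _ in range(200):
    phi = rng.normal(size=2)
    model.update(phi, phi @ true_theta)
```

The error returned by `update` is available before the model changes, so it could double as a fault indicator, and a forgetting factor below 1.0 would let the model follow slow drifts.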
Stream Data Mining:
Incremental Algorithms
1. Process an example at a time, and inspect it only once
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any time
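A small example of an algorithm that meets the four requirements above is Welford's running mean and variance: each value is seen once, memory and time per update are constant, and an estimate is available at any time. The class and method names are illustrative only.

```python
class RunningStats:
    """Welford's online algorithm for mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the current mean

    def learn_one(self, x):
        # Requirements 1-3: one pass, constant memory, constant time per example.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Requirement 4: an estimate can be requested at any time.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.learn_one(value)
```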
Stream Data Mining:
Open Source Framework MOA
MOA: Massive Online Analysis
WEKA community, Java
Big Data stream mining (classification, regression, and clustering) in real time
Can be easily used with e.g. Hadoop
Extendable with new mining algorithms
Goal: provide a benchmark suite for the stream mining community
© Software Competence Center Hagenberg GmbH
Discussion
General Setting
Units generate streams of sensor data (time,value)
Central storage of data for analysis tasks
Many analysis tasks are temporal in nature; e.g. fault detection
Can be implemented with current technology without much effort
REPLAY partially solves the problem of implementing
algorithms for MapReduce and SPE
Issues:
Usage of multiple SPE per machine or combiner
Integration of existing incremental learning tools such as MOA
Related Work: TiMR Framework
Combination of M-R and SPE (DSMS)
Temporal queries for off-line and on-line
Implemented using StreamInsight and SCOPE/Dryad
Badrish Chandramouli, Jonathan Goldstein, and Songyun Duan. 2012. Temporal Analytics on Big Data for Web Advertising. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering (ICDE '12). IEEE Computer Society, Washington, DC, USA
Overview
Temporal Analytics on Big Data
Applications
Failure detection
Proposed Architecture
Related Work
Learning Big Models
Causal Inference
Enabled by parallelization
Prediction and optimal control
Setting
Complex industrial process
Limited knowledge about interdependencies
Goal
E.g. Predict amount of TOC in wastewater for next 48h
Challenges
Robustness of model
Precision of model
Several thousands of sensors => computational complexity
Approach
Identify causal model structure
Use parallelization to tackle computational complexity
Causal Models for
Prediction and Fault Detection
Linear Model
Various methods to estimate parameters
Prominent method to estimate structure:
Graphical Lasso (Friedman 2007, 2012) based on L1-regularized minimization of the negative log-likelihood
Extension to time: Granger Causality
X would “Granger cause” Y if it contains information useful in forecasting Y
Implemented by graphical lasso on time-lagged variables
Work in progress
Grouped Granger Graphical Lasso
Detection of control loops
Non-linear extensions
=> increases computational complexity
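To illustrate the idea of sparse estimation on time-lagged variables, the sketch below uses a plain lasso regression (solved with ISTA) rather than the graphical lasso named on the slide; it is a simpler relative, swapped in to keep the example short. A variable whose lagged coefficients stay non-zero is a candidate Granger cause of the target. All function names, the synthetic data, and the penalty `alpha` are assumptions.

```python
import numpy as np

def lagged_matrix(series, max_lag):
    """series has shape (T, n_vars); feature columns are v_j(t-k), targets are v(t)."""
    n_samples, n_vars = series.shape
    cols = [series[max_lag - k:n_samples - k, j]
            for j in range(n_vars) for k in range(1, max_lag + 1)]
    return np.column_stack(cols), series[max_lag:]

def lasso_ista(A, y, alpha, n_iter=500):
    """Minimize (1/2n)||y - A theta||^2 + alpha * ||theta||_1 by ISTA."""
    n = len(y)
    step = n / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = theta - step * (A.T @ (A @ theta - y) / n)
        theta = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return theta

# Synthetic data: y is driven by x with one step delay, z is unrelated.
rng = np.random.default_rng(0)
n_samples = 500
x = rng.normal(size=n_samples)
z = rng.normal(size=n_samples)
y = np.zeros(n_samples)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=n_samples - 1)

features, targets = lagged_matrix(np.column_stack([y, x, z]), max_lag=1)
coef = lasso_ista(features, targets[:, 0], alpha=0.1)   # order: y, x, z at lag 1
```

In this toy setting the coefficient of the lagged `x` stays clearly non-zero while that of `z` is shrunk to zero, which is the kind of structure the L1 penalty is meant to expose.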
Parallelization of
Machine Learning Algorithms
MapReduce (see first part of talk)
Good for data-parallel tasks; problems with iterative algorithms and complex dependencies in the data
GraphLab
intuitively expresses computational dependencies
applied to dependent records which are stored as vertices in a large distributed data-graph
GPGPU
complex low level code (kernel) or:
High-Level languages: SAC, Matlab, Mathematica ...
Meta-Programming: PyCUDA / CL, ...
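The remark about iterative algorithms under MapReduce can be made concrete with a toy example: in gradient descent, every iteration needs its own complete map and reduce pass over all partitions. This is a plain-Python sketch without any MapReduce framework; the function names, data, and learning rate are assumptions.

```python
from functools import reduce

def partial_gradient(partition, w):
    """Map step: gradient contribution of one partition for the model y = w * x."""
    return (sum((w * x - y) * x for x, y in partition), len(partition))

def add(a, b):
    """Reduce step: sum gradient contributions and sample counts."""
    return (a[0] + b[0], a[1] + b[1])

def fit(partitions, n_iter=50, lr=0.1):
    w = 0.0
    for _ in range(n_iter):
        # One full map + reduce pass over the data per iteration.
        grad_sum, n = reduce(add, (partial_gradient(p, w) for p in partitions))
        w -= lr * grad_sum / n
    return w

partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w_hat = fit(partitions)
```

Each single pass is perfectly data-parallel, but 50 iterations mean 50 separate jobs, which is where the overhead of a batch-oriented framework starts to dominate.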
Hardware agnostic Parallel Patterns
Esp. Parallel Patterns for Machine Learning
ParaPhrase
High-level design and implementation patterns
useful parallelism for a wide range of parallel applications
heterogeneous multicore/manycore systems
Hardware Abstraction
Basis : FastFlow – Framework (Turin, Pisa)
General Purpose Patterns
Master – Slave, Farm, Pipeline, work queue, data dependency
Domain Specific Patterns (SCCH, HLR Stuttgart)
Suitability of generic patterns for machine learning
ML - Patterns: pool oriented, graphical models patterns, time series, ...
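Two of the general-purpose patterns listed above, farm and pipeline, can be sketched with the Python standard library. This is not the FastFlow or ParaPhrase API; the helper names `farm` and `pipeline` are hypothetical, and the pipeline here only expresses the composition of stages, whereas a real parallel runtime would run the stages concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, tasks, n_workers=4):
    """Farm: apply the same worker to independent tasks in parallel, keeping order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))

def pipeline(*stages):
    """Pipeline: every item flows through the stages in the given order."""
    def run(items):
        for stage in stages:
            items = map(stage, items)   # lazy; evaluated by list() below
        return list(items)
    return run

squares = farm(lambda v: v * v, range(5))
scaled = pipeline(lambda v: v + 1, lambda v: v * 2)([1, 2, 3])
```

The point of such patterns is that application code states what is a farm or a pipeline, while the mapping onto threads, cores, or accelerators stays behind the pattern implementation.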
Relevant Use-Cases / Project Competencies (selection)
TRUMPF Austria
Improving precision of bending machines
K-Projekt SoftNet (I + II)
Fault prediction in software systems
„Mining Repositories“
K-Projekt PAC
Process Analytical Chemistry
Virtual sensors for chemical process analysis and control
BlueSky
Locally optimized weather predictions
Application : Energy Efficiency
Verbund
Prediction of available water flow to optimize renewable energy
usage
Use Case:
Local Weather Prediction
Data sources
Global Weather Models
Local Sensors: Weather stations, power plants, ...
Topography, expert knowledge
Goal
Planning of events, maintenance, ...
Basis for optimization of energy usage
[Figure: Data collection, Analysis, Models, Prediction; Expert Knowledge; map of Austria showing Wien, Linz, St. Pölten, Graz, Salzburg, Klagenfurt, Innsbruck, Eisenstadt, Bregenz]
Optimization of
Renewable Energy Usage
Goals
Short Term: Inclusion of availability of renewable energy in energy planning and trading
(Water, Wind, Solar)
[Figure: map of hydropower plants on the rivers Inn, Mur, Donau, Enns, Drau and Salzach. Legend: joint power plants of AHP, storage power plants of AHP, run-of-river power plants of AHP, Verbund shareholdings]
HYSIM II (Drabek et al. 2002)
Snow melt, ground humidity (Holzmann & Nachtnebel 2002)
Rainfall-Runoff-Model (Hebenstreit 2000)
Data Driven Models (e.g. Ridge Regression, Neural Networks)
SAMBA: Optimal weighting of all models
Flow values, Precipitation / Temperature & Forecast
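How SAMBA computes its weights is not described on the slide. Purely as an illustration of what weighting several model outputs can mean, the sketch below fits combination weights by least squares against observed values. The function name, the toy forecasts, and the choice of least squares are all assumptions and should not be read as the SAMBA method.

```python
import numpy as np

def fit_weights(predictions, observed):
    """predictions: (n_samples, n_models). Returns least-squares combination weights."""
    weights, *_ = np.linalg.lstsq(np.asarray(predictions, dtype=float),
                                  np.asarray(observed, dtype=float), rcond=None)
    return weights

# Hypothetical flow forecasts of two models and the observed flow.
model_1 = np.array([1.0, 2.0, 3.0, 4.0])
model_2 = np.array([2.0, 1.0, 4.0, 3.0])
observed = 0.7 * model_1 + 0.3 * model_2

weights = fit_weights(np.column_stack([model_1, model_2]), observed)
combined = np.column_stack([model_1, model_2]) @ weights
```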
Summary
Temporal Analytics on Big Data
Applications
Failure detection
Proposed Architecture
Related Work (MOA, TiMR)
Learning Big Models
Causal Inference
Enabled by parallelization
Prediction and optimal control
Event tip!
With the right strategy to sustainable software quality: TRUST-IT
18 April, 09:00 - 14:00
Österreichische Computergesellschaft, Vienna
Target audience: software development managers, process owners, project managers, software quality engineers and architecture leads.
Contact
Dr. Holger Schöner, +43 7236 3343 816, [email protected], www.scch.at
DI Michael Zwick, +43 7236 3343 843, [email protected], www.scch.at
Dr. Thomas Natschläger, +43 7236 3343 868, [email protected], www.scch.at