Optimized for the Industrial Internet: GE's Industrial Data Lake Platform

(1)

Optimized for the Industrial Internet:

(2)

Agenda

• The Opportunity

• The Solution

• The Challenges

• The Results

(3)

Big opportunities with Industrial Big Data

The power of 1%: driving outcomes that matter

Rail: increasing freight utilization. $27B industry value by reducing system inefficiency.

Power: predictive diagnostics. $66B industry value with efficiency improvements in gas-fired power plant fleets.

Healthcare: predictive maintenance. $63B industry value by reducing process inefficiency.

(4)

Industrial Big Data – fast and vast

50B machines will be connected on the internet by 2020

2X industrial data growth within next 10 years

In practice only 3% of potentially useful data is tagged, and even less is analyzed*

9MM data points per hour for each locomotive

500GB data per blade by gas turbines

35GB data per day from each Smart Meter

50X data growth in healthcare (2012 – 2020)

1TB data per flight

Data types: sensor data; content (images, videos, manuals, etc.); historian data; machine data; CRM, ERP, etc.; logs; social network data; geo-location data

*Sources: IDC, IDC, Ericsson, Wikibon, Fast Company, ComputerWeekly

(5)

80% of an analytics project typically involves gathering and then preparing the data for analysis*

Today’s approaches are not prepared for the onslaught of Industrial Big Data

Too slow

Too rigid

Too expensive

(6)

Yesterday’s data warehouse architecture

All over the place: data across multiple locations

Snapshot: limited to narrow snapshots and time

Limited data types: mostly structured and semi-structured data types

TRADITIONAL DATA WAREHOUSE: ONE STATIC DATA MODEL

Sources: logs; social network data; geo-location data; CRM, ERP, etc.

Users: data scientist, field operations, business analyst

Questions answered: What is it telling me? How does it look? How is it doing?

(7)

Industrial Data Lake architecture

Rapid access to all data for analytics

All data: access to real-time and historical data, not limited to a snapshot of data

Any data: handling of all data types, including documents, images, machine data, and sensor data

One place: access to all data in one place to quickly respond to the speed of business change

INDUSTRIAL DATA LAKE: FLEXIBLE DATA MODELS

Underpinned by data governance appropriate to Business and Location

Sources: sensor data; content (images, videos, manuals, etc.); machine data; historian data; CRM, ERP, etc.; logs, click streams; geo-location data; social network data

Users: data scientist, field operations, business analyst

Questions answered: How long will it last without failures or maintenance? Is my asset ready when there is market opportunity? Is my asset performing optimally? How to configure for best operational results?

(8)

A day in the life – data management

Stages: data collection, data ingestion, data governance, analytics and operations

Current situation: rigid. Data loading.

New way (INDUSTRIAL DATA LAKE): agile. Real-time ingestion, replica of source data, add semantic metadata.

Compared on: agility, cost, time to analyze

Users: data scientist, field operations, business analyst

Sources: CRM, ERP, etc.; logs, click streams; geo-location data; social network data; sensor data; content (images, videos, manuals, etc.); machine data; historian data
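The "new way" on this slide keeps a replica of the source data and adds semantic metadata at ingestion time, instead of reshaping everything into one static model first. A minimal Python sketch of that idea follows. The `ingest` and `read_as_json` helpers and every field name (`source`, `data_type`, `tags`, `asset_id`) are assumptions made for illustration, not part of GE's platform.

```python
import hashlib
import json
from datetime import datetime, timezone


def ingest(raw: bytes, source: str, data_type: str, tags: dict) -> dict:
    """Wrap an untouched copy of the source payload with semantic metadata.

    All field names here are illustrative assumptions.
    """
    return {
        "payload": raw,  # replica of the source data, stored as-is
        "metadata": {
            "source": source,
            "data_type": data_type,
            "tags": dict(tags),
            "sha256": hashlib.sha256(raw).hexdigest(),  # lets readers verify the replica
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
    }


def read_as_json(record: dict) -> dict:
    """Interpret the payload only when an analysis needs it (schema-on-read)."""
    return json.loads(record["payload"].decode("utf-8"))


# Hypothetical sensor reading; the asset id and values are made up.
raw_reading = b'{"asset_id": "loco-001", "temp_c": 81.5}'
record = ingest(raw_reading, source="historian", data_type="sensor", tags={"fleet": "rail"})
reading = read_as_json(record)
```

Because the payload is never modified, different users can apply different interpretations to the same record later, which is one way to read the slide's "flexible data models".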

(9)

Industrial Data Lake Appliance

Pre-integrated with data management, compute, and storage

Consume: data monetization and outcomes

Analyze: predictive / prescriptive analytics and visualization

Process: high performance computing

Manage: management of all data, any data in one place

Customer focus

Industrial Data Lake

Security

(10)

Industrial Data Lake

Optimized for industrial workloads

Optimized for mission-critical workloads: addressing key SLAs such as security, resiliency, etc. for Industrial Internet applications

Fast ingestion, storage and compute, including machine data, to support multiple schema and data types

High-performance analysis using massively parallel processing architecture supporting Apache Hadoop

Data governance and federation, with geographically-dispersed deployment options

(11)

Big Data without Governance

• Dumping data into a Big Data lake without repeatable processes and data governance will create a messy, uncontrollable data environment

• Insights harvested from an ungoverned data lake are not reliable or trustworthy

If the insights cannot be fully trusted, it’s difficult to make business decisions confidently.

Solutions for Industrial

Internet, deep domain

(12)

GE as a Custodian of Customer Owned Data & Services

Custodian: a person who has responsibility for or looks after something

Synonyms: keeper, guardian, steward, protector

"the custodian of the relic"

Custodian Roles

Enforcement & Measurement

Infrastructure

Protection

Privacy

Data Management

Access Controls – Visibility – Metrics…

(13)

Governance Disciplines

Metadata: Data Dictionary, Directory of all assets, Classification and Tagging

Lifecycle: Provenance, Lineage, Retention

Quality: Accuracy, Completeness, Consistency

Auditing: Monitoring, Logging, Log Analysis

Compliance: Regulatory, Corporate
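Of the Quality attributes listed, completeness is the simplest to quantify: one common reading is the share of required fields that are present and non-null. The sketch below illustrates that reading only. The `completeness` function, the field names, and the sample readings are hypothetical, and other definitions of completeness are equally valid.

```python
from typing import Iterable, Mapping, Sequence


def completeness(records: Iterable[Mapping], required: Sequence[str]) -> float:
    """Fraction of required fields that are present and not None across all records."""
    total = 0
    filled = 0
    for rec in records:
        for field in required:
            total += 1
            if rec.get(field) is not None:
                filled += 1
    # An empty input has nothing missing, so treat it as fully complete.
    return filled / total if total else 1.0


# Hypothetical readings: one has a null timestamp, one lacks a value.
readings = [
    {"asset_id": "turbine-7", "timestamp": "2014-06-01T00:00:00Z", "value": 412.0},
    {"asset_id": "turbine-7", "timestamp": None, "value": 415.5},
    {"asset_id": "turbine-9", "timestamp": "2014-06-01T00:00:00Z"},
]
score = completeness(readings, ["asset_id", "timestamp", "value"])
```

Here 7 of the 9 required fields are filled, so `score` is 7/9. A governance process could track such a score per data set and flag drops over time.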

(14)

Evolving Hadoop Data Governance

Apache Falcon (uses Oozie and Ambari)

Define data pipelines

Monitor data pipelines

Trace pipelines for dependency, lineage

Process

Data Set
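Tracing a pipeline for dependency and lineage amounts to walking a graph of data sets and the processes that connect them. The sketch below shows that traversal in plain Python. It does not use the Apache Falcon API, and the data set names and the `derived_from` mapping are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each data set maps to the data sets it was derived from.
derived_from = {
    "engine_health_report": ["cleaned_sensor_data", "maintenance_logs"],
    "cleaned_sensor_data": ["raw_sensor_data"],
    "maintenance_logs": [],
    "raw_sensor_data": [],
}


def upstream(dataset: str, graph: dict) -> list:
    """Return every data set that `dataset` depends on, nearest first (breadth-first)."""
    seen = set()
    order = []
    queue = deque(graph.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


lineage = upstream("engine_health_report", derived_from)
```

The same traversal run in the opposite direction (data set to its consumers) answers the dependency question: which downstream outputs are affected if a source changes.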

(15)

Industrial Data Lake

Optimized for mission-critical workloads for Industrial Internet applications

Supports SLAs for industrial workload KPIs

Industrial solutions – OT focus (ex: M&D, CBM, ALM, etc.)

Availability: >99.99%

Resiliency: Continuous operations, active-active

Security: High

Performance / latency: <30ms

Capacity: Elastic

Enterprise solutions – IT focus (ex: CRM, SCM, ERP, etc.)

Availability: 99.95%

Resiliency: Planned downtime, active disaster recovery

Security: Medium/High

Performance / latency: 30-40ms

(16)

Security Risk for Big Data

• More data implies higher risk of exposure

• New data types may give rise to new security breach scenarios

• Evolving and experimental analysis implies security policies are less likely to be in place

• Linkage to other data already under compliance may create scenarios where compliance could be violated.

(17)

Security Requirements

• Perimeter security

• Access control

• Data protection

• Data visibility

Challenge: Complete security solution does not exist for

(18)

Top Opportunity Areas for Security

Perimeter: Infrastructure

• Communication protocols
• Key management

Protection: Encryption

• Access policy based encryption
• Searching / filtering encrypted data
• Secure outsourcing of computation

Access Control: Privacy

• Secure dissemination
• Secure data collection / aggregation
• Secure collaboration

Visibility: Data Management

• Data integrity / Provenance
• Proof of data storage

(19)

Data Lake Security Solutions

• Physical Security
• Network Security
• Authentication
• Protecting the cluster(s)
• Data Center Deployments
• Kerberos Authentication
• LDAP integration
• Segregation of duties
• Data at rest and motion security
• Data obfuscation
• Change management
• Encryption and masking solutions
• File Permissions
• Group Authorizations
• RBAC
• Configuration Management
• FileSystem Groups
• LDAP Groups
• Identity Mgmt
• Data Provenance
• Data Lineage
• Data Tagging
• ETL Tools
• Map Reduce
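Two of the controls named on this slide, RBAC and data obfuscation (masking), can be illustrated together. The sketch below is a toy model only. It does not call Kerberos, LDAP, or any Hadoop security API, and the role names, permission names, and the `read_serial` helper are hypothetical.

```python
# Hypothetical roles and permissions; a real deployment would resolve these
# from LDAP groups or a policy store rather than a hard-coded dict.
ROLE_PERMISSIONS = {
    "data_scientist": {"read_masked"},
    "field_operations": {"read_masked", "read_clear"},
}


def can(role: str, permission: str) -> bool:
    """Role-based access control: a role is allowed only what it is explicitly granted."""
    return permission in ROLE_PERMISSIONS.get(role, set())


def mask(value: str, visible: int = 4) -> str:
    """Obfuscate all but the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def read_serial(role: str, serial: str) -> str:
    """Return the clear value, a masked value, or refuse, depending on the role."""
    if can(role, "read_clear"):
        return serial
    if can(role, "read_masked"):
        return mask(serial)
    raise PermissionError(f"role {role!r} may not read this field")
```

With these assumed roles, `read_serial("data_scientist", "ENG-12345678")` yields a masked string that keeps only the last four characters, while an unknown role raises `PermissionError`. Denying by default for unlisted roles is a deliberate choice in this sketch.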

(20)

Evolving Hadoop Security

• Apache Knox: Perimeter / Network security

• Apache Ranger:

  • Authorization

  • Data protection

  • Audit tracking

(21)

GESoftware.com | @GESoftware | #IndustrialInternet

Availability Excellence Framework

• CCB Process
• Continuous tool improvements
• Quick response to Alerts
• Pre-tested, pre-approved changes to be deployed only
• Restricted access to PROD
• Config files committed to Git
• NameNode HA
• Monitoring / Alerting
• JVM instrumentation
• Auditing changes
• DR Strategy
• Data Backup
• HA for NIC, ISP, Servers, Disk

(22)

Target Availability SLA Cost comparison

| SLA | Cost associated | Typical industry use case | Feature list required |
|---|---|---|---|
| <=99% | $ | Batch update systems, Retail Web Sites, Social Media sites, Big Data clusters | NameNode HA, Higher Data Replication than 3, Hardware redundancy, Monitoring and Alerting, Data Centre Redundancy, 2X Projected Capacity implementation |
| 99.9% | $$ | Retail Web Sites, Social Media sites, Relational Databases | All of the above + Full Data Centre Redundancy, Automatic Failover, 3X Projected Capacity implementation |
| 99.99% | $$$ | Hi-Frequency Trading, Medical support systems | All of the above + Full Data Centre Redundancy including near real time data replication, 4X Projected Capacity implementation |
| 99.999% | $$$$$ | Hi-Frequency Trading, Medical support systems, Stock Exchanges ex. Nasdaq, NYSE, Air-traffic controllers | All of the above + Auto-recovering components, 5X Projected Capacity implementation |
| 100% | $$$$$$ | Real-time Trading systems, Stock Exchanges ex. Nasdaq, NYSE, On-board flight computer, Air-traffic controllers | All of the above + 10X Projected Capacity implementation |
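Each SLA tier corresponds to a yearly downtime budget, which helps explain why cost climbs so steeply between tiers. The arithmetic can be sketched as follows, assuming a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
    print(f"{sla}% -> {downtime_minutes_per_year(sla):.1f} minutes/year")
```

Under this assumption, 99% allows about 5,256 minutes (roughly 3.65 days) of downtime a year, 99.99% allows about 52.6 minutes, and 99.999% allows only about 5.3 minutes.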

(23)

Case study – GE Aviation

Asset productivity, minimize disruptions, improved forecasting

25 airlines

3.4M flights

340TB data

10X cost reduction

7 days time-to-market for new analytic app

2000X performance improvement

Isolate root causes

Identify sub-optimal performance parts

Minimize disruptions

(24)

Thank you

General Electric reserves the right to make changes in specifications and features, or discontinue the product or service described at any time, without notice or obligation. These materials do not constitute a representation, warranty or documentation regarding the product or service featured. Illustrations are provided for informational purposes, and your configuration may differ.

This information does not constitute legal, financial, coding, or regulatory advice in connection with your use of the product or service. Please consult your professional advisors for any such advice.

GE, the GE Monogram, Predix, Predictivity are trademarks of General Electric Company.

©2014 General Electric Company – All rights reserved.
