Optimized for the Industrial Internet:
Agenda
The Opportunity
The Solution
The Challenges
The Results
Big opportunities with Industrial Big Data
The power of 1%
Increasing freight utilization (rail)
Predictive maintenance (healthcare)
Predictive diagnostics (power)
Driving outcomes that matter
$27B: Industry value by reducing system inefficiency
$66B: Industry value with efficiency improvements in gas-fired power plant fleets
$63B: Industry value by reducing process inefficiency
Industrial Big Data – fast and vast
50B: Machines will be connected on the internet by 2020
2X: Industrial data growth within next 10 years
In practice only 3% of potentially useful data is tagged, and even less is analyzed*
9MM: Data points per hour for each locomotive
500GB: Data per blade by gas turbines
35GB: Data per day from each Smart Meter
50X: Data growth in healthcare (2012 – 2020)
1TB: Data per flight
80% of an analytics project typically involves gathering and then preparing the data for analysis*
Data types: Sensor data; Content (images, videos, manuals, etc.); Historian data; Machine data; CRM, ERP, etc.; Logs; Social network data; Geo-location data
*Sources: IDC, IDC, Ericsson, Wikibon, Fast Company, ComputerWeekly
Today’s approaches are not prepared for the onslaught of Industrial Big Data
Too slow
Too rigid
Too expensive
All over the place: Data across multiple locations
Snapshot: Limited to narrow snapshots and time
Limited data types: Mostly structured and semi-structured data types
Logs
Social network data
Geo-location data
CRM, ERP, etc.
Yesterday’s data warehouse architecture
TRADITIONAL DATA WAREHOUSE
What is it telling me?
How does it look?
How is it doing?
Data scientist
Field operations
Business analyst
ONE STATIC DATA MODEL
1. All data: Access to real-time data and historical data, not limited to a snapshot of data
2. Any data: Handling of all data types, including documents, images, machine data, sensor data
3. One place: Access to all data in one place to quickly respond to the speed of business change
Rapid access to all data for analytics
How long will it last without failures or maintenance?
Is my asset ready when there is market opportunity?
Is my asset performing optimally?
How to configure for best operational results?
FLEXIBLE DATA MODELS
Industrial Data Lake architecture
Underpinned by data governance appropriate to Business and Location
INDUSTRIAL DATA LAKE
Data scientist
Field operations
Business analyst
Sensor data
Content (images, videos, manuals, etc.)
Machine data
Historian data
CRM, ERP, etc.
Logs, click streams
Geo-location data
Social network data
Data governance
Analytics and operations
Data collection
Data ingestion
A day in the life – data management
Figure: Current situation compared with New way (INDUSTRIAL DATA LAKE) on three measures: Agility (Rigid, Agile), Cost, and Time to analyze.
Figure labels: Data loading; Real-time ingestion; Replica of source data; Add semantic metadata; Data collection; Data ingestion; Data governance; Analytics and operations.
Users: Data scientist, Field operations, Business analyst.
Data sources: CRM, ERP, etc.; Logs, click streams; Geo-location data; Social network data; Sensor data; Content (images, videos, manuals, etc.); Machine data; Historian data.
Management of all data, any data in one place
Data monetization and outcomes
Predictive / prescriptive analytics and visualization
High performance computing
Industrial Data Lake Appliance
Pre-integrated with data management, compute, and storage
Consume
Analyze
Process
Manage
Customer focus
Industrial Data Lake
Security
Industrial Data Lake
Optimized for industrial workloads
Optimized for mission-critical workloads, addressing key SLAs such as security, resiliency, etc. for Industrial Internet applications
Fast ingestion, storage and compute, including machine data, to support multiple schema and data types
High-performance analysis using massively parallel processing architecture supporting Apache Hadoop
Data governance and federation, with geographically-dispersed deployment options
Big Data without Governance
Dumping data into a Big Data lake without repeatable processes and data governance will create a messy, uncontrollable data environment.
Insights harvested from an ungoverned data lake are not reliable or trustworthy.
If the insights cannot be fully trusted, it’s difficult to make business decisions confidently.
Solutions for Industrial
Internet, deep domain
GE as a Custodian of Customer Owned Data & Services
Custodian Roles
Enforcement & Measurement
Infrastructure
Protection
Privacy
Data Management
Custodian: a person who has responsibility for or looks after something
Synonyms: keeper, guardian, steward, protector
"the custodian of the relic"
Access Controls – Visibility – Metrics…
Governance Disciplines
Metadata: Data Dictionary, Directory of all assets, Classification and Tagging
Lifecycle: Provenance, Lineage, Retention
Quality: Accuracy, Completeness, Consistency
Auditing: Monitoring, Logging, Log Analysis
Compliance: Regulatory, Corporate
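To make the governance disciplines listed above more concrete, here is a minimal Python sketch of a catalog record that carries metadata, lineage, and retention information, plus a simple completeness measure. It is illustrative only: `CatalogEntry`, its fields, `completeness`, and the sample dataset are hypothetical names chosen for this example and do not describe any GE or Hadoop component.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    """Hypothetical catalog record covering metadata and lifecycle."""
    name: str                      # metadata: entry in the directory of assets
    description: str               # metadata: data dictionary text
    tags: set = field(default_factory=set)            # classification and tagging
    derived_from: list = field(default_factory=list)  # lineage: upstream datasets
    created: date = field(default_factory=date.today)
    retention_days: int = 365      # lifecycle: retention period

    def expired(self, today: date) -> bool:
        """Retention check: True once the retention period has passed."""
        return today > self.created + timedelta(days=self.retention_days)


def completeness(records: list, required: list) -> float:
    """Quality: fraction of records with every required field present."""
    if not records:
        return 0.0
    ok = sum(all(r.get(k) is not None for k in required) for r in records)
    return ok / len(records)


# Illustrative data only.
entry = CatalogEntry(
    name="turbine_sensor_raw",
    description="Raw sensor readings (illustrative)",
    tags={"sensor", "restricted"},
    created=date(2015, 1, 1),
    retention_days=30,
)
readings = [
    {"asset": "T1", "temp": 401.5},
    {"asset": "T2", "temp": None},   # missing value lowers completeness
]
score = completeness(readings, ["asset", "temp"])
```

Keeping the quality measure as a separate function means the same check could be applied to any dataset registered in such a catalog.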
Evolving Hadoop Data Governance
Apache Falcon (uses Oozie and Ambari)
Define data pipelines
Monitor data pipelines
Trace pipelines for dependency, lineage
Process
Data Set
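Tracing pipelines for dependency and lineage amounts to walking a graph of datasets and the processes that produce them. The sketch below shows the idea in plain Python. It does not use or imitate the Apache Falcon API, and the dataset names and the `UPSTREAM` mapping are hypothetical.

```python
# Hypothetical pipeline graph: each dataset maps to the datasets it is built from.
UPSTREAM = {
    "fleet_report": ["cleaned_sensor", "asset_master"],
    "cleaned_sensor": ["raw_sensor"],
    "asset_master": [],
    "raw_sensor": [],
}


def lineage(dataset, upstream):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = []
    stack = list(upstream.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        stack.extend(upstream.get(current, []))
    return sorted(seen)


def dependents(dataset, upstream):
    """Return datasets that would be affected if `dataset` changed."""
    return sorted(d for d in upstream if dataset in lineage(d, upstream))
```

For example, `lineage("fleet_report", UPSTREAM)` walks back to the raw inputs, while `dependents("raw_sensor", UPSTREAM)` answers the opposite question of which downstream datasets a change would touch.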
Optimized for mission-critical workloads for Industrial Internet applications
Industrial Data Lake
Supports SLAs for industrial workload KPIs
Industrial solutions – OT focus (ex: M&D, CBM, ALM, etc.)
Availability: >99.99%
Resiliency: Continuous operations, active-active
Security: High
Performance / latency: <30ms
Capacity: Elastic
Enterprise solutions – IT focus (ex: CRM, SCM, ERP, etc.)
Availability: 99.95%
Resiliency: Planned downtime, active disaster recovery
Security: Medium/High
Performance / latency: 30-40ms
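The gap between the two availability figures on this slide (>99.99% and 99.95%) is easier to judge as allowed downtime per year. The short Python calculation below is a worked example only. It assumes a 365-day year and that availability is measured over the whole year, neither of which the slide states.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumption: leap years ignored


def max_downtime_minutes(availability_percent):
    """Yearly downtime allowed by an availability target given in percent."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


industrial = max_downtime_minutes(99.99)   # about 52.6 minutes per year
enterprise = max_downtime_minutes(99.95)   # about 262.8 minutes per year
```

Because the industrial target is stated as greater than 99.99%, roughly 52.6 minutes per year is an upper bound on its downtime, compared with about 4.4 hours per year at 99.95%.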
Security Risk for Big Data
More data implies higher risk of exposure
New data types may give rise to new security breach
scenarios
Evolving and experimental analysis implies security
policies are less likely to be in place
Linkage to other data already under compliance may
create scenarios where compliance could be violated.
Security Requirements
Perimeter security
Access control
Data protection
Data Visibility
Challenge: Complete security solution does not exist for
Top Opportunity Areas for Security
Perimeter: Infrastructure
Communication protocols
Key management
Protection: Encryption
Access policy based encryption
Searching / filtering encrypted data
Secure outsourcing of computation
Access Control: Privacy
Secure dissemination
Secure data collection / aggregation
Secure collaboration
Visibility: Data Management
Data integrity/Provenance
Proof of data storage
Data Lake Security Solutions
Physical Security
Network Security
Authentication
Protecting the cluster(s)
Data Center Deployments
Kerberos Authentication
LDAP integration
Segregation of duties
Data at rest and motion security
Data obfuscation
Change management
Encryption and masking solutions
File Permissions
Group Authorizations
RBAC
Configuration Management
FileSystem Groups
LDAP Groups
Identity Mgmt
Data Provenance
Data Lineage
Data Tagging
ETL Tools
Map Reduce
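Data obfuscation and masking, listed above, can be illustrated with a small sketch using only the Python standard library. This is one possible approach, not the method of any product named in the deck. The key, field names, and sample record are hypothetical, and a real deployment would obtain the key from a key management system.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a key management system.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash so records can still be joined."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Illustrative record only.
record = {"customer_id": "CUST-000123", "meter_serial": "SM12345678", "kwh": 12.4}
safe = {
    "customer_id": pseudonymize(record["customer_id"]),
    "meter_serial": mask(record["meter_serial"]),
    "kwh": record["kwh"],
}
```

Pseudonymizing with a keyed hash keeps the same input mapping to the same token, so analysts can still group or join on the field without seeing the original identifier. Masking simply hides most of the value for display.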
Evolving Hadoop Security
Apache Knox: Perimeter / Network security
Apache Ranger: Authorization, Data protection, Audit tracking
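Authorization combined with audit tracking can be sketched as a policy lookup that records every decision. The Python below is a simplified illustration of that pattern. It is not the Apache Ranger API, and the roles, resources, `POLICIES` table, and `AUDIT_LOG` list are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical policy table: role -> resource -> allowed actions.
POLICIES = {
    "data_scientist": {"sensor_data": {"read"}},
    "field_operations": {"sensor_data": {"read", "write"}},
}

AUDIT_LOG = []  # in-memory stand-in for an audit store


def is_allowed(role, resource, action):
    """Check the policy table and record the decision in the audit log."""
    allowed = action in POLICIES.get(role, {}).get(resource, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "resource": resource,
        "action": action,
        "allowed": allowed,
    })
    return allowed
```

Denied requests are logged as well as granted ones, so the audit trail shows attempted access and not only successful access. Unknown roles fall through to a deny by default.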
GESoftware.com | @GESoftware | #IndustrialInternet