ACS 2015 Annual Canberra Conference
Are You Big Data Ready?
Vladimir Videnovic
“If you can't explain it simply,
you don't understand it well enough”
Albert Einstein
Introduction
Big Data
Term commonly used to describe phenomenon of
large and/or complex datasets
caused by the exponential growth and availability of data
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Big Data – Big Vs
Big Vs
Business Perspective
It’s just Data – nothing has changed...
Value
– Enable smarter decisions, faster actions and optimised results
Valuation
– Evaluation of outcomes and continuous optimisation
– Measure the impact on key performance indicators
– Assess the ability to forecast future outcomes
Visibility
– Discover important insights – demand for good quality analytics & insight
– Study patterns, behaviours, trends and relationships
Big Vs
Information Management Perspective
It’s Information – and many things have changed!
Variety
• Data Integration
• New data sources (Social Media, Data streams, Internet of Things)
• Lack of standards for managing various data types
Velocity
• Data streaming, machine/sensor generated data
• Real-time insights
Vulnerability
• Information Security
• Fraud discovery
Big Vs
Data Science Perspective
Value
• Understanding what is possible
• Prioritising/sizing up the opportunities
• Analytics that do not support decision making do not add value
Variety
• Internal document/text/audio sources may be inaccessible
• Need to prototype/test the value added by new unstructured data from social media and other external sources
Velocity
• Move to real-time services
Big Data
Big Vs
Survey
BIG Data
Better
BIG Information
Enterprise capability required to acquire relevant Data from multiple sources, organise it in a repository, convert it into Information by analysing it, and utilise it for deriving Knowledge
Promise of Analytics
An Information-Powered Organisation
is equipped with timely insights and knowledge
to continuously optimise the way they
• derive answers, decisions and actions
• conduct their business
• streamline their operations
• deliver services and/or products
Information Management Today
Common Challenges
• Access to Information
• Information Quality
• Efficient Use of Information
Barriers to Mature Use of Information
Survey
Information Management
Need
Analytic Value at Fast Pace
Tool Complexity
• Early Hadoop tools only for experts
• Existing BI tools not designed for Hadoop
• Emerging solutions lack broad capabilities
80% effort typically spent on evaluating and preparing data
Data Uncertainty
• Not familiar and overwhelming
• Potential value not obvious
• Requires significant manipulation
Overly dependent on
scarce and highly
Modern Information Environment
Roles
Data Engineer (Data Wrangler, Data Janitor)
Data Scientist (Data Analyst, Data Miner)
Data Engineer
• ability to quickly understand data
• transform and enrich data
• programming
• data visualisation
• statistics
• effective presentation of results
Skills
Prepare data for easier access, analytical processing and efficient interpretation
Data Scientist
• data pattern discovery
• ability to tell the story
• curiosity
• attention to detail
• research skills
• question everything
• strive to make a difference
Skills
Create insights from available data
Modern Information Environment
People
Everyone on the Journey
Change
Empowered people
Easy Access to ALL Information
Engineered Processes
Modern Information Environment
Business Processes
Modern Information Environment
Business Processes in the Digital Era
Data Reality
Existing sources + Emerging Sources
Data Warehouse
Existing Sources
Emerging Sources
Information Management – Logical View
Big Data Eco-System
[Figure: Big Data Eco-System. Diagram labels: Data, Fast Data, Events, Actions, Streams, Information Warehouse, Reservoir, Factory, Warehouse; numbered steps 1, 2, 3]
Cloudera Hadoop Distribution
HDFS – Storage – distributed file system that provides high-throughput access to data
MapReduce – Process – framework for performing distributed data processing
Hive – Data Warehouse system for Hadoop
HCatalog – metadata abstraction layer for referencing data
HBase – Hadoop DataBase – distributed columnar database
Hue – user interface for managing and using Hadoop components
Spark – parallel data processing framework for developing Big Data applications
YARN – cluster management – reliable distributed processing of very large data sets
Oozie – workflow – coordination system for managing Hadoop jobs
ZooKeeper – centralised service for maintaining configuration information
Sqoop – efficient transfer of bulk data between Hadoop and structured datastores
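To make the MapReduce entry above concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the example. It is plain Python rather than the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run the three steps in sequence on an in-memory list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Vs"]))
```

The point of the split is that each map call sees only one record and each reduce call sees only one key, which is what lets a framework distribute the work.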
Addressing Barriers
Information Management Capabilities
Focus: Organisation-wide Information Management
Goal: Information Capability Design Patterns enabling Information-Powered
Silos Standardise Optimise Information on Demand
• One off tools/solutions
• Bottom-up
• Local business unit driven
• Many versions of truth
• Independent data marts
• Enterprise standard tools/solutions
• Data warehouse and dependent data marts
• IT & LOB partnership
• Secure, consistent access to all data
• Agile, flexible architecture
• Analyse “just in time” structured & unstructured data together
• Advanced analytics & real-time recommendations
• All stage 3 capabilities delivered as a platform (service/cloud)
• Access of tools and data among broad subscriber group
Unified Information Architecture Maturity Phases
Data Processing - Exploration
Definition
• Understand Data and Data Value
• Explore attributes/metadata
• Understand data quality (needs/reality)
• Prioritise
• Identify custodians
IM Capabilities
• Data Staging
• Data Discovery
Enterprise Capabilities
• Planning
• Experience
• Continuous Improvement
• Data and Information
• Enablers
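Exploring attributes and understanding data quality, as listed in the Exploration definition, often starts with a simple profile of each attribute. The sketch below is one possible way to do that in plain Python; the `profile` function and the sample rows are hypothetical, and it assumes records arrive as dictionaries with hashable values, where `None` or an empty string counts as missing.

```python
def profile(rows):
    """Return, per attribute, the row count, missing count and distinct value count."""
    # Collect every attribute name seen in any row, so sparse rows are handled.
    columns = sorted({name for row in rows for name in row})
    report = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        # Assumption: None and the empty string both mean "missing".
        present = [value for value in values if value not in (None, "")]
        report[name] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


if __name__ == "__main__":
    sample = [
        {"id": 1, "city": "Canberra"},
        {"id": 2, "city": ""},
        {"id": 3},
    ]
    for attribute, stats in profile(sample).items():
        print(attribute, stats)
```

A profile like this gives a first view of the gap between data quality needs and reality before any staging or discovery tooling is chosen.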
Data Processing - Collection
Definition
• Acquisition/Ingestion/Extraction of data
• Quality audit
• Change Data Capture
• High-Volume Data Acquisition
• Stream/Event Data Capture
IM Capabilities
• Extract Data
• Ingest Content
• Change Data Capture
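Change Data Capture means picking up only the rows that changed since the last collection run. The sketch below illustrates the idea by comparing two snapshots keyed by primary key. This is a simplified stand-in: production tools typically read the database transaction log instead of comparing snapshots, and the `capture_changes` function and the sample data are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and return the change events."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            # For deletes, the last known version of the row is reported.
            changes.append(("delete", key, row))
    return changes


if __name__ == "__main__":
    before = {1: {"name": "A"}, 2: {"name": "B"}, 4: {"name": "E"}}
    after = {1: {"name": "A"}, 2: {"name": "C"}, 3: {"name": "D"}}
    for event in capture_changes(before, after):
        print(event)
```

Only the resulting events need to be sent downstream, which is what keeps collection cheap when source tables are large and change slowly.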
Data Processing - Storage
Definition
• Structured Data
• Content/Documents/Records
• Non-structured Data
• Storage Strategy
• Data availability
IM Capabilities
• Enterprise Data Warehouse (Curated Data)
• Data Reservoir (Exploratory Data)
Enterprise Capabilities
• Resource Management
• Business Continuity
• Knowledge
Data Processing – Data Use/Information Creation
Definition
• Generate Information Assets
• Derive intelligence
• Actionable Insights
• Support Decisions
• Enable Innovation
• Support Business Processes
IM Capabilities
• Transactional Reporting
• Ad-Hoc Reporting
• Structured Analysis
• Sentiment Analysis
• Advanced Analysis
• Enterprise Search
Enterprise Capabilities
• Decision Making
• Performance Management
• Change Management
• Strategy and Plans
Addressing Barriers
Data Processing – Data/Information Sharing and Collaboration
Definition
• Publish Information Assets
• Action on Intelligence
• Communicate Decisions
• Operationalise Innovation
IM Capabilities
• Enterprise Information Portal
Enterprise Capabilities
• Outcomes
• Decision Making
• Communication
• Collaboration
• Change Management
• Shared Vision
Data Processing - Optimisation
Definition
• Optimise Information Assets
• Optimise Business Processes
• Optimise Decisions
IM Capabilities
• Unknown Data Analysis
• Evidence-based Decision-Making
Enterprise Capabilities
• Performance Management
• Business Continuity
• Outcomes
• Shared Vision
Data Processing – Information Archiving/Disposal
Definition
• Define and apply disposal authorities
• Preserve information of interest
IM Capabilities
• Enterprise Content Management/Archiving
Core Information Governance Capabilities
IM Capabilities
• Change Management
• Data Quality Continuous Optimisation
• Data Lineage
• Master Data
• Information Security
Enterprise Capabilities
• Strategy and Planning
• Procedures
• Policies
• Data and Information
• Security
Modern Information Environment
• People
• Business Processes
• Technology
Information Management Capability Modernisation
Benefits
Information-powered users
Right Information
Operationalised Innovations
- Data available for analysis and value-add processes
- Information transparent, useable and actionable to larger audience
- Improved utilisation of existing information assets
- Information presented consistently and accurately to all authorised users via appropriate channels
- Standardised information platform ensures reuse of business patterns and information consistency
- Support Innovation to continuously improve business processes and/or delivery of services
- On-time insights available to larger population of information consumers, enabling better and consistent decision-making
ACCESS
QUALITY
EVOLUTION
BUSINESS
Informed Decisions
- Consolidated information platform can reduce costs of maintenance and increase staff efficiency
- Minimise support issues from inadequate delivery of information
- Real-time analytic environment to support forward-looking analysis and evidence-based decision-making