Big Data Success Stories
Dr. Vincenzo Chiochia
European TDWI Conference Munich, Germany June 22-24, 2015
Understanding Big Data
A majority of organizations carry out business based on insights gained from data analysis. There has been a shift in the size, type and form of data and in the way data is analyzed, leading to the term “Big Data”.
Big Data is generally characterized by the three Vs:
• Volume: Sheer quantity of data
• Velocity: Speed with which data is produced, processed, and stored
• Variety: Diversity of data sources and formats
Structured
• Fields/Tables/Columns
• Relational Database Management System (RDBMS)/Spreadsheet
Semi-structured
• Markers/Tags to separate elements
• Extensible Markup Language (XML)/Hyper Text Markup Language (HTML)
Unstructured
• No fields/attributes
• Free-form text (email body, notes, articles)
• Audio, video and image
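To make the three data forms concrete, the following minimal sketch reads the same value from a structured, a semi-structured and an unstructured representation. The sample record, the field names and the helper function names are hypothetical and chosen only for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample record, expressed in the three forms listed above.
structured = "customer_id,city,amount\n42,Munich,19.90\n"
semi_structured = (
    "<order><customer id='42'/><city>Munich</city>"
    "<amount>19.90</amount></order>"
)
unstructured = "Customer 42 from Munich called about an order of 19.90 EUR."


def amount_from_structured(text):
    """Fixed columns: the value is addressed by its field name."""
    row = next(csv.DictReader(io.StringIO(text)))
    return float(row["amount"])


def amount_from_semi_structured(text):
    """Tags separate the elements: the value is addressed by its tag."""
    return float(ET.fromstring(text).findtext("amount"))


def amount_from_unstructured(text):
    """No fields at all: the value has to be inferred from free text."""
    match = re.search(r"(\d+\.\d+)\s*EUR", text)
    return float(match.group(1)) if match else None
```

The effort to extract the value grows from a field lookup to a pattern match that can fail, which is one reason unstructured sources are harder to analyze with relational tools.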
Challenges for Organizations
The rapid shift in our data world challenges traditional infrastructure, technologies and information management
Relational approaches are not optimal when dealing with lack of structure, speed of processing at scale and associated cost.
Application silos prevent a holistic data analysis from effectively enabling the business.
Traditional information governance and management practices cannot cope with characteristics of Big Data.
Data visualization and analytics resources cannot deliver the representations and insights that can be elicited from Big Data.
Necessary architecture, analytics skills and talent that can benefit from the shift to Big Data are lacking
Big Data technologies can create measurable value for an enterprise in many possible ways
Potential Value of Big Data
Enhanced Productivity and Efficiency
• Improve operational intelligence of high-volume systems
• Respond real-time or near real-time
• Enable transparency through effective data sharing
Data Driven Insights
• Comprehensive overview across structured and unstructured data
• Bridge across enterprise data silos
• Enhanced predictive capabilities and models not limited by data samples
Better Return on Investment (ROI)
• Scale out and use Open Source Software (OSS) in storage and processing
• Eliminate redundant infrastructure, duplication and rework
Innovative New Business Services
• Start leveraging data inaccessible until recently
• Visualize and extract right data perspectives
Big Success with Big Data
The study is based on a survey across various industries, company sizes and job titles to ensure high representativeness
Industry n=1,007
RETAIL 176
INSURANCE 124
HEALTHCARE PROVIDERS & PAYERS 100
ENERGY 130
CONSUMER GOODS & SERVICES 120
COMMUNICATION 170
BANKING 187
Revenue n=1,007
GREATER THAN $10B 180
$5B-$10B 185
$1B-$5B 335
$500M-$1B 231
$250M-$500M 76
Job Title n=1,007
CHIEF DATA OFFICER 85
CHIEF ANALYTICS OFFICER 38
ANALYTICS LEAD 47
CFO 72
CMO 67
CIO 255
COO 141
OTHER SVP 3
SENIOR VICE PRESIDENT: DATA, ANALYTICS OR TECHNOLOGY 84
TECHNOLOGY DIRECTOR 126
DIRECTOR OF ANALYTICS OR EQUIVALENT 65
DATA SCIENTIST 24
Base: All respondents; n=1,007
The surveyed companies are mainly based in economically advanced countries around the globe
Headquarters n=1,007
SINGAPORE 51
MALAYSIA 50
JAPAN 52
INDIA 51
CHINA 52
AUSTRALIA 50
BRAZIL 51
UNITED STATES 101
CANADA 50
UNITED KINGDOM 50
SWEDEN 50
SPAIN 50
NORWAY 51
NETHERLANDS 51
ITALY 50
GERMANY 52
FRANCE 50
FINLAND 51
DENMARK 44
Why use Big Data?
58% TO MAINTAIN COMPETITIVENESS
35% TO BE AHEAD OF INDUSTRY PEER GROUP
6% NEED TO CHANGE OR FACE POTENTIAL DECLINE
Functions where companies use Big Data
55% Marketing
53% IT
47% Finance
46% Business Operations
40% Supply Chain
29% HR
27% Product Development
Immediate impact: Where Big Data is used today
Respondents use Big Data for analyzing customer behavior, combining data sources and improving customer personalization.
57% Analyzing customer behavior
56% Bringing together different data sources
53% Improving personalization of customer
47% Making data a revenue generator, not just a supporting function (Data as a platform)
45% Enhancing responsiveness to market dynamics
41% Generating reports faster than currently possible
37% Enhancing customer relationships
33% Developing new products/services
20% Identifying cost reduction opportunities
Implementation: Big Data demands broad learning
Security, budget, talent and integration with existing systems are challenges.
51% Security
47% Budget
41% Lack of talent to implement big data
37% Lack of talent to run big data and analytics on an ongoing basis
35% Integration with existing systems
33% Procurement limitations on big data vendors
27% Enterprise not ready for big data
7% Lack of executive sponsorship
1% Other
Source: Big Data, April 2014 –Q26
What are the main challenges to implementing Big Data in your company?
Help needed:
Most used external help for implementation and plan to hire
Did you get external help for your Big Data installation? Check all that apply.
57% Yes, consultants
45% Yes, contract employees
34% Yes, technology vendor resources
5% No, we used internal resources only
95% used one or more sources of external help
Source: Big Data, April 2014 –Q23, Q29
Does your company have or plan to build/increase your data science expertise within the next year?
55% Yes, within the next year
36% Yes, but not within the next year
6% No, we don’t see the need
1% No, we don’t have budget
1% No, other reasons
Size makes a difference:
Larger companies get more from Big Data
Larger companies ($10B+) vs. smaller companies ($250M to $500M)
Big Data is seen as extremely important by more large companies than small.
More large company users report that Big Data completely met their needs.
67% vs. 43%
58% vs. 22%
Base: All respondents; n=1,007 Source: Big Data, April 2014 –Q2, Q34
Big Data is expected to bring transformation
Biggest impact in the next five years
Top Impact / Top 3 Impact
37% / 63% Impacting customer relationships
26% / 58% Redefining product development
15% / 56% Changing the way we organize operations
8% / 48% Making the business more data-focused
9% / 47% Optimizing the supply chain
5% / 27% Fundamentally changing the way we do business
Base: All respondents; n=1,007 Source: Big Data, April 2014 –Q37
Big Success with Big Data
Key Findings
Customer focused
Companies mainly use Big Data in Marketing and IT to improve their competitiveness and enhance customer experience
Broad learning required
Organizations are learning the complexities of Big Data and how to address challenges including security, budget, lack of talent and integration with existing systems
Help needed
Companies are finding ways to get help with Big Data, whether bringing external resources for a project, hiring new talent or training their teams
Company size makes a difference
Larger companies are seeing better results by doing more with Big Data.
Potential for disruptive transformation
Organizations see Big Data as transforming the way business is done in the next five years.
Modern Architecture & Technology
The functional view of the hybrid data strategy outlines how various components should work together
Hybrid Data Architecture Strategy
Data Consumers Data Access Data Storage Data Processing
Let us take a deeper look at a reference diagram and how all of the technologies can be combined together
Hybrid Data Platform Architecture Reference Model
[Figure: Hybrid Data Platform reference model. Legend: Platform External, Platform Internal. Components: Data Sources (Unstructured and Structured), Ingestion, Integration, Data Warehouse, ODS, RDBMS, Management, Workflow, Services, Platform, Access, Interfaces, Standard UIs, Protocols/APIs, Third-party Envs., Users, Core Frameworks, Raw Data Storage, Processing, Core Services, Advanced Services, Core Platforms]
Data ingestion can take place through batch ingestion or stream ingestion
Hybrid Data Platform: Data Ingestion
Ingestion Frameworks
• Chukwa, Flume, Scribe
• Storm
• Splunk
• Sqoop
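The difference between batch and stream ingestion can be sketched in a few lines. This is a simplified illustration, not the API of any framework listed above; the function names, the in-memory `sink` list and the `max_events` cut-off are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def ingest_batch(records: Iterable[str], sink: List[str]) -> int:
    """Batch: collect the complete input first, then load it in one step."""
    batch = list(records)
    sink.extend(batch)
    return len(batch)


def ingest_stream(events: Iterator[str], sink: List[str], max_events: int) -> int:
    """Stream: hand each event to the sink as soon as it arrives.

    max_events is a hypothetical cut-off so the sketch terminates;
    a real stream consumer would typically run until it is stopped.
    """
    count = 0
    for event in events:
        sink.append(event)
        count += 1
        if count >= max_events:
            break
    return count
```

In the batch case nothing is visible in the sink until the whole input has been read, whereas in the stream case every event becomes available immediately, which is what enables near real-time use cases.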
The standard user interfaces, Protocols/API and third-party tools form the gateway to the core services
Hybrid Data Platform: Access
Framework and Services APIs
• Diverse libs and protocols
• Various languages
• Multiple IDLs
• Often RESTful
Packaged Environments
• Commercial offerings
• Usually service specific
• Generally for analytics and visualization
Top-level User Interfaces
• Shells and GUIs
• Enables user access to core framework and services
Big Data integration brings relevant business data into the Big Data platform
Hybrid Data Platform: Integration
DBs and DWs
• Oracle, MySQL
• MS SQL Server
• PostgreSQL
• Omniture
• Netezza
• Teradata
• Vertica
Integration Tools and Frameworks
• Sqoop
• Pentaho Kettle
• Talend
• Informatica PowerExchange
• SQL-H
The workflow, services and platform form the bulk of data management that helps to control the platform
Hybrid Data Platform: Management
• Hive MetaStore
• HBase Master
• Zookeeper
• Cloudera CM
• DataStax: OpsCenter
• Hortonworks: Ambari
• CouchDB: Futon
• Oozie
• Cascading
• Azkaban
• Talend
Core framework refers to a general framework, platform options, storage, processing and services
Hybrid Data Platform: Core System
Core Frameworks
• Hadoop: Version 1.0 and 2.0
• Cassandra: DP1, Brisk
• MongoDB
• CouchDB, Couchbase Server
• Redis
• Riak
Storage
• HDFS
• CassandraFS
• GridFS (MongoDB)
• AWS S3
• MPP databases
• In-memory databases: SAP HANA, Druid
• Azure storage
Processing
• MapReduce: M/R, M/Rv2 (YARN), Elastic MapReduce (EMR), Greenplum, Disco
• MPP: ETLs. Hadapt is an analytical platform that runs on Hadoop and treats it as an operating system.
• BSP: Hama
Core Services
• High latency/batch: Java API, Hive, Pig
• Low latency: HBase, Cassandra, Couchbase
• Document stores: MongoDB, CouchDB
• Redis, Riak
Advanced Services
• Mahout
• GraphDB
• OpenGPS
• HCatalog
• OpenTSDB
• GeoLoc
• Spark
Core Platforms
• Commodity hardware stack
• Enterprise hardware stack
• Distributed “networked” system
• Cloud implementations
• Hybrid implementations
Big Data Success Story 1
The rise of digital technologies forced the client to transform their IT landscape
Initial Situation
Rising Mainframe Costs
Mainframe costs have increased rapidly over the last years, with CPU usage growing by a further 5% per year
More Devices, More Channels
75% to 80% of all online transactions can already be attributed to mobile devices, tablets and other channels, and the share is increasing
~60% of the overall transactions are read-only requests
Digital Foundation
The client faced the challenge of investing in and building a platform as a Digital Foundation to be competitive in the digital area
Scenarios
Traditional vendors and the open source community provide solutions, but which one is better for the client?
Solution Recommendation > Evaluated Solution Scenarios
A: Traditional Vendors
• Usage of products from SAP (SAP HANA) and Oracle (Oracle Exalytics, Oracle TimesTen)
• Implementation of the “Transaction Cache” on a relational database, then leveraging the analytical solutions provided by the vendors
B: Open Source Data Lake
• Usage of NoSQL / Big Data technologies established by Web 2.0 companies (e.g. Amazon, Google, Facebook, Twitter, Yahoo) and further enhanced by the open source community
• Common data platform for an enterprise, which allows real-time operational and analytical queries to be executed
• Several commercial distributions are available, which provide enterprise-level support and actively contribute to the open source community to evolve the technology stack
Is a multi-workload Hadoop ready for the Enterprise?
Solution Recommendation > Open Source Data Lake
Concern: How about support?
Response:
• Enterprise-level support is available (Cloudera, Hortonworks, MapR Technologies, etc.)
• Openness: you can switch vendor
Concern: Is it just a hype?
Response:
• NO: Huge Eco-System
• The adoption rate is steadily increasing, filling a real gap
• All vendors are major contributors to the OS community
• Comparable to Linux in take-up
Concern: Should I use it everywhere?
Response:
• NO: Just like all of NoSQL it’s not for everything
• Be thoughtful about what to adopt; the core is very stable, newer tools may not be
Concern: Is it secure?
Response:
• Yes, integration with Kerberos and LDAP
• Encryption in transit fully supported in Open Source
• Encryption at rest is there, and easy with Linux
At the logical level, the “Data Lake” is not only required, it also has to be integrated into the client’s application landscape
Logical Solution Architecture
[Figure components: Multi-Channel Platform; Real-Time Analytics Application(s); Distributed Cache (Application Level); Real-Time Analytics Database; De-central Database; Central Database; Near Real-Time Integration]
• Improves the performance of the response to a user request
• Gets the transaction read load away from the host
• Provides Real-Time Analytics
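How a cache takes read load away from the host can be sketched as follows. This is a minimal single-process illustration under stated assumptions: the `Host` and `TransactionCache` classes and their methods are hypothetical, and the synchronous replication in `write` stands in for the near real-time integration named in the architecture.

```python
class Host:
    """Stand-in for the central system; counts how often it is hit."""

    def __init__(self):
        self.accounts = {}
        self.reads = 0
        self.writes = 0

    def write(self, key, value):
        self.writes += 1
        self.accounts[key] = value

    def read(self, key):
        self.reads += 1
        return self.accounts.get(key)


class TransactionCache:
    """Serves read-only requests; writes still go to the host."""

    def __init__(self, host):
        self.host = host
        self.store = {}

    def write(self, key, value):
        self.host.write(key, value)  # the host stays the system of record
        self.store[key] = value      # replicate the change into the cache

    def read(self, key):
        if key not in self.store:    # cache miss: ask the host once
            self.store[key] = self.host.read(key)
        return self.store[key]
```

With this split, every repeated read-only request is answered without touching the host, so the host load shrinks roughly by the share of read-only traffic.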
The final solution is based on Hadoop and a queue-based integration with the mainframe system
Physical Solution Architecture
[Figure components: Frontend; Multi-Channel Platform; Real-Time Analytics Application(s); Middleware; Core Banking Application; Central Database (RDBMS); Hadoop (Hortonworks Data Platform 2.1), Managed Service; Other Data Sources (e.g. Server Log-Files, …); Connector; Custom Adapter; Native API; ODBC, JDBC, …]
[Data flows: read; write / update / delete / read; write / update / delete; read (non-transaction data)]
With the Go-Live of the Transaction Cache the main project objectives were fulfilled
Achievements
Reduced Costs
With the Go-Live of the Transaction Cache, the client immediately reduced their online transactions by 60% and saved 50% of CPU on the mainframe
Project Duration
An agile and nimble onsite project team implemented the foundation of Hadoop, data replication and APIs for online channels in only 7 months
[Charts: Reduced Online Transactions (Old vs. Data Lake); Reduced Mainframe CPU (Old vs. Data Lake)]
[Project timeline: Milestone, Mobilization, Transition to Production, Mobile Push, Project Management, Hadoop Cluster, Service Performance, Decision Analytics, Managed Service]
As Hadoop skills are rare and operations are not a core capability of the client, Accenture set up a managed service, with AO and IO services from the PDC in Manila
IO Services
AM Services from Cebu
Besides the pure cost reduction, Accenture implemented the fundamentals for the next use cases
Digital Foundation
Customer Journey
Logging of all customer touchpoints with the bank in the data lake is designed and currently under implementation
Mobile Push
A POC for mobile push has been successfully conducted. Underlying real-time technologies were established at Twitter (Storm) and LinkedIn (Kafka).
Fraud Detection
Internal Fraud is the first use case of the roadmap to be implemented with the Hadoop ecosystem
Micro Segmentation
Implementing a functional data model and micro segmentation is on the roadmap
Analytics
Setup of ad hoc and self-service analytics
Big Data Success Story 2
Project main information
Customer Experience Management (CEM)
Change of architecture in operations system support area, including:
• Business process management for network planning
• Service assurance systems enhancements
• Network configuration
Close ties with OSS Factory project
• Outsourcing of the maintenance of OSS application to Accenture
• Realised by onshore and offshore teams
Two project phases
• Phase 1 – Pilot: realization of three new use cases
• Phase 2 – Consolidation and migration of all Network Analytics applications to the new platform
The client wants to strengthen its customer experience
The project aims to simplify the system architecture while improving scalability and implementing new functionalities
Project objectives and content
Project Objectives
• Consolidation and simplification of the system architecture
• Improvement of system scalability
• Implementation of new functionalities
• Real-time data monitoring
• Reporting for new products
Project Content
Enable reporting, monitoring and analysis of data from network interfaces and elements.
Analysis dimensions:
• Client
• Device
• Network Topology
• Time
In phase 1 a pilot is being implemented for 3 major use cases of the telecom industry
Pilot Use Cases
LTE where it matters
• Analyse LTE utilisation in the network cells from different dimensions
• Identify LTE-related potential:
  • Cells without LTE but with LTE contract customers and with LTE supported devices – Network Planning
  • Cells with LTE and with customers with LTE supported devices but without LTE contracts – Marketing
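The two kinds of LTE-related potential above amount to a simple classification rule per network cell. The sketch below illustrates that rule only; the `Cell` fields and the `lte_potential` function are hypothetical names, and the per-cell customer counts are assumed to be available from the analytics platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    cell_id: str
    has_lte: bool
    # Customers in the cell with an LTE contract and an LTE supported device.
    lte_contract_customers: int
    # Customers in the cell with an LTE supported device but no LTE contract.
    lte_device_no_contract: int


def lte_potential(cell: Cell) -> Optional[str]:
    """Return the team that can act on the cell, or None if there is no potential."""
    if not cell.has_lte and cell.lte_contract_customers > 0:
        return "network_planning"  # demand exists, coverage is missing
    if cell.has_lte and cell.lte_device_no_contract > 0:
        return "marketing"         # coverage exists, contracts are missing
    return None
```

For example, a cell without LTE that hosts customers with LTE contracts is flagged for Network Planning, while an LTE cell with LTE-capable devices on non-LTE contracts is flagged for Marketing.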
Fixed Mobile Substitution
• Implementation of the reporting for a new product
• Measurement of service quality and identification of areas for improvement
Real-time monitoring
• Monitoring of network usage (number of users, up/down transfer) in real time
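Real-time monitoring of this kind is commonly built on a sliding time window. The following sketch is one possible illustration and not the project's implementation; the `UsageWindow` class, its event fields and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import deque


class UsageWindow:
    """Keeps usage events of the last window_seconds and aggregates them."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, user_id, bytes_up, bytes_down)

    def add(self, timestamp, user_id, bytes_up, bytes_down):
        # Assumes events arrive in non-decreasing timestamp order.
        self.events.append((timestamp, user_id, bytes_up, bytes_down))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def snapshot(self):
        return {
            "users": len({event[1] for event in self.events}),
            "up": sum(event[2] for event in self.events),
            "down": sum(event[3] for event in self.events),
        }
```

Calling `snapshot()` after each `add` yields the current number of distinct users and the up/down transfer volume for the most recent window.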
In phase 2 all legacy systems will be migrated to the platform
Legacy Systems
Reporting
Static reports with data & voice KPIs presented from customer / network / device dimension
Monitoring
Alarms based on monitoring of network specific KPIs
Visualization / Dashboards
Graphical representation of network data
The solution is based upon Pivotal Real Time Intelligence combined with a Hadoop / Teradata platform
Simplified architecture and data sources
[Figure components: Data sources: Passive Probing, DPI, Active Probing, 4 T CDRs, Reference Data OSS, Reference Data BSS, Other Data; ETL; EMC Pivotal RTI; EMC Spring Framework; Hadoop / Teradata; Geomaps Server (e.g. ESRI); Visual Analytics Application; Business Intelligence Application; Alarming; Aggregated data; Real-time data]
Prerequisites for future network analytics use cases
Prospective Usage of the Big Data Platform
Prerequisites
Integration of new data sources
• BSS data
• Fixed net data
• Campaign information
• Customer surveys
• Ticketing systems
• …
Improving management awareness
Possible Use Cases
Telco-specific
• Investment optimisation
• Network load prediction
• Proactive monitoring
General
• Churn analysis
• Customer segmentation
• Product affinity analysis
• Up-sell / cross-sell
Big Data Success Stories
Big Data projects require acceptance, deep knowledge and might raise data privacy concerns
Big Data Challenges
Data Privacy
• Connecting new data sources
• Interacting with sensitive or personal data
Technology & Architecture
• Hadoop is Enterprise-ready. However, some powerful tools may be immature
• Rapidly evolving ecosystem
Decision Culture
• Influence requires acceptance
• Acceptance relies on data quality
Involve your Data Protection Officer early in the project to identify and resolve issues quickly
Get real experts
Only choose from tried and tested technology
Fully integrate, don’t misuse
Management should fully understand and agree to the value of the gained insight