(1)

Big Data Success Stories

Dr. Vincenzo Chiochia

European TDWI Conference Munich, Germany June 22-24, 2015

(2)

Understanding Big Data

A majority of organizations carry out business based on insights gained from data analysis. There has been a shift in the size, type and form of data and in the way data is analyzed, leading to the term “Big Data”.

Big Data is generally characterized by the three Vs:

• Volume: Sheer quantity of data
• Velocity: Speed with which data is produced, processed, and stored
• Variety: Diversity of data sources and formats

Structured
• Fields/Tables/Columns
• Relational Database Management System (RDBMS)/Spreadsheet

Semi-structured
• Markers/Tags to separate elements
• Extensible Markup Language (XML)/Hyper Text Markup Language (HTML)

Unstructured
• No fields/attributes
• Free-form text (email body, notes, articles)
• Audio, video and image
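The difference between structured, semi-structured and unstructured data can be illustrated with a minimal Python sketch. The sample record, the field names (`customer_id`, `amount`) and the regular expression are invented for illustration and do not come from the talk; the point is only that the same fact gets harder to extract as structure decreases.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed fields and columns, as in an RDBMS table or a spreadsheet.
structured = "customer_id,amount\n42,19.90\n"
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags separate the elements, but there is no fixed table schema.
semi_structured = "<order><customer_id>42</customer_id><amount>19.90</amount></order>"
order = ET.fromstring(semi_structured)

# Unstructured: free-form text without fields; values are extracted heuristically.
unstructured = "Hi, customer 42 here. I was charged 19.90 twice, please check."
match = re.search(r"customer (\d+).*?charged (\d+\.\d+)", unstructured)

structured_amount = float(row["amount"])
semi_structured_amount = float(order.findtext("amount"))
unstructured_amount = float(match.group(2))
print(structured_amount, semi_structured_amount, unstructured_amount)
```

The structured and semi-structured cases can rely on a parser, while the unstructured case depends on a pattern that breaks as soon as the wording changes.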

(3)

Challenges for Organizations

The rapid shift in our data world challenges traditional infrastructure, technologies and information management

Relational approaches are not optimum when dealing with lack of structure, speed of processing at scale and associated cost.

Business is not effectively enabled by a holistic data analysis because of application silos

Traditional information governance and management practices cannot cope with characteristics of Big Data.

Data visualization and analytics resources cannot deliver the representations and insights that could be elicited from Big Data.

Necessary architecture, analytics skills and talent that can benefit from the shift to Big Data are lacking

(4)

Big Data technologies can create measurable value for an enterprise in many possible ways

Potential Value of Big Data

Enhanced Productivity and Efficiency

• Improve operational intelligence of high-volume systems
• Respond in real time or near real time
• Enable transparency through effective data sharing

Data Driven Insights

• Comprehensive overview across structured and unstructured data
• Bridge across enterprise data silos
• Enhanced predictive capabilities and models not limited by data samples

Better Return on Investment (ROI)

• Scale out and use Open Source Software (OSS) in storage and processing
• Eliminate redundant infrastructure, duplication and rework

Innovative New Business Services

• Start leveraging data that was inaccessible until recently
• Visualize and extract the right data perspectives

(5)

Big Success with Big Data

(6)

The study is based on a survey across various industries, company sizes and job titles to ensure high representativeness

Industry (n=1,007)

• Retail: 176
• Insurance: 124
• Healthcare providers & payers: 100
• Energy: 130
• Consumer goods & services: 120
• Communication: 170
• Banking: 187

Revenue (n=1,007)

• Greater than $10B: 180
• $5B-$10B: 185
• $1B-$5B: 335
• $500M-$1B: 231
• $250M-$500M: 76

Job Title (n=1,007)

• CIO: 255
• COO: 141
• Technology director: 126
• Chief data officer: 85
• Senior vice president (data, analytics or technology): 84
• CFO: 72
• CMO: 67
• Director of analytics or equivalent: 65
• Analytics lead: 47
• Chief analytics officer: 38
• Data scientist: 24
• Other SVP: 3

(7)

Base: All respondents; n=1,007

The surveyed companies are mainly based in economically advanced countries around the globe

Headquarters (n=1,007)

• Asia-Pacific: Singapore 51, Malaysia 50, Japan 52, India 51, China 52, Australia 50
• Americas: Brazil 51, United States 101, Canada 50
• Europe: United Kingdom 50, Sweden 50, Spain 50, Norway 51, Netherlands 51, Italy 50, Germany 52, France 50, Finland 51, Denmark 44

(8)
(9)

Why use Big Data?

• 58%: To maintain competitiveness
• 35%: To be ahead of industry peer group
• 6%: Need to change or face potential decline

Functions where companies use Big Data

• Marketing: 55%
• IT: 53%
• Finance: 47%
• Business Operations: 46%
• Supply Chain: 40%
• HR: 29%
• Product Development: 27%

(10)

Immediate impact: Where Big Data is used today

Respondents use Big Data for analyzing customer behavior, combining data sources and improving customer personalization.

• 57%: Analyzing customer behavior
• 56%: Bringing together different data sources
• 53%: Improving customer personalization
• 47%: Making data a revenue generator, not just a supporting function (Data as a platform)
• 45%: Enhancing responsiveness to market dynamics
• 41%: Generating reports faster than currently possible
• 37%: Enhancing customer relationships
• 33%: Developing new products/services
• 20%: Identifying cost reduction opportunities

(11)

Implementation: Big Data demands broad learning

Security, budget, talent and integration with existing systems are challenges.

• 51%: Security
• 47%: Budget
• 41%: Lack of talent to implement big data
• 37%: Lack of talent to run big data and analytics on an ongoing basis
• 35%: Integration with existing systems
• 33%: Procurement limitations on big data vendors
• 27%: Enterprise not ready for big data
• 7%: Lack of executive sponsorship
• 1%: Other

Source: Big Data, April 2014 –Q26

What are the main challenges to implementing Big Data in your company?

(12)

Help needed:

Most used external help for implementation and plan to hire

Did you get external help for your Big Data installation? Check all that apply.

• 57%: Yes, consultants
• 45%: Yes, contract employees
• 34%: Yes, technology vendor resources
• 5%: No, we used internal resources only

95% used one or more sources of external help

Does your company have or plan to build/increase your data science expertise within the next year?

• 55%: Yes, within the next year
• 36%: Yes, but not within the next year
• 6%: No, we don’t see the need
• 1%: No, we don’t have budget
• 1%: No, other reasons

Source: Big Data, April 2014 –Q23, Q29

(13)

Size makes a difference:

Larger companies get more from Big Data

Larger companies ($10B+) vs. smaller companies ($250M to $500M):

• Big Data is seen as extremely important by more large companies than small: 67% vs. 43%
• More large company users report that Big Data completely met their needs: 58% vs. 22%

Base: All respondents; n=1,007 Source: Big Data, April 2014 –Q2, Q34

(14)

Big Data is expected to bring transformation

Biggest impact in the next five years

Each line gives Top Impact / Top 3 Impact:

• Impacting customer relationships: 37% / 63%
• Redefining product development: 26% / 58%
• Changing the way we organize operations: 15% / 56%
• Making the business more data-focused: 8% / 48%
• Optimizing the supply chain: 9% / 47%
• Fundamentally changing the way we do business: 5% / 27%

Base: All respondents; n=1,007 Source: Big Data, April 2014 –Q37

(15)

Big Success with Big Data

Key Findings

Customer focused

Companies mainly use Big Data in Marketing and IT to improve their competitiveness and enhance customer experience

Broad learning required

Organizations are learning the complexities of Big Data and how to address challenges including security, budget, lack of talent and integration with existing systems

Help needed

Companies are finding ways to get help with Big Data, whether bringing external resources for a project, hiring new talent or training their teams

Company size makes a difference

Larger companies are seeing better results by doing more with Big Data.

Potential for disruptive transformation

Organizations see Big Data as transforming the way business is done in the next five years.

(16)

Modern Architecture & Technology

(17)

The functional view of the hybrid data strategy outlines how various components should work together

Hybrid Data Architecture Strategy

Data Consumers Data Access Data Storage Data Processing

(18)

Let us take a deeper look at a reference diagram and how all of the technologies can be combined together

Hybrid Data Platform Architecture Reference Model

[Diagram: Hybrid Data Platform reference model. Legend: Platform External, Platform Internal. Blocks: Data Sources (Unstructured and Structured Raw Data), Ingestion, Integration (Data Warehouse, ODS, RDBMS), Core Platforms, Core Frameworks, Storage, Processing, Core Services, Advanced Services, Management (Workflow, Services, Platform), Access (Standard UIs, Protocols/APIs, Third-party Envs.), Interfaces, Users]

(19)

Data ingestion can take place through batch ingestion or stream ingestion

Hybrid Data Platform: Data Ingestion

Ingestion Frameworks

• Chukwa, Flume, Scribe
• Storm
• Splunk
• Sqoop
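The distinction between batch ingestion and stream ingestion can be sketched in a few lines of Python. This is a toy illustration, not the API of any of the frameworks listed above; the function names `batch_ingest` and `stream_ingest` and the sample events are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def batch_ingest(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Collect records and hand them over in fixed-size batches."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def stream_ingest(records: Iterable[str], sink: Callable[[str], None]) -> int:
    """Forward each record to the sink as soon as it arrives."""
    count = 0
    for record in records:
        sink(record)
        count += 1
    return count


events = [f"event-{i}" for i in range(5)]

# Batch: downstream storage sees three deliveries (2 + 2 + 1 records).
batches = list(batch_ingest(events, batch_size=2))

# Stream: downstream storage sees five deliveries, one per record.
received: List[str] = []
streamed = stream_ingest(events, received.append)
```

Batch ingestion trades latency for fewer, larger writes; stream ingestion delivers every record immediately at the price of more frequent writes.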

(20)

The standard user interfaces, Protocols/API and third-party tools form the gateway to the core services

Hybrid Data Platform: Access

Framework and Services APIs

• Diverse libs and protocols
• Various languages
• Multiple IDLs
• Often RESTful

Packaged Environments

• Commercial offerings
• Usually service specific
• Generally for analytics and visualization

Top-level User Interfaces

• Shells and GUIs
• Enable user access to core framework and services

(21)

Big Data integration brings in relevant business data into the Big Data platform

Hybrid Data Platform: Integration

DBs and DWs

• Oracle, MySQL
• MS SQL Server
• PostgreSQL
• Omniture
• Netezza
• Teradata
• Vertica

Integration Tools and Frameworks

• Sqoop
• Pentaho Kettle
• Talend
• Informatica, PowerExchange
• SQL-H

(22)

The workflow, services and platform form the bulk of data management that helps to control the platform

Hybrid Data Platform: Management

• Hive MetaStore
• HBase Master
• ZooKeeper
• Cloudera: CM
• DataStax: OpsCenter
• Hortonworks: Ambari
• CouchDB: Futon
• Oozie
• Cascading
• Azkaban
• Talend

(23)

Core framework refers to a general framework, platform options, storage, processing and services

Hybrid Data Platform: Core System

Core Services

• High latency/batch: Java API, Hive, Pig
• Low latency: HBase, Cassandra, Couchbase
• Document stores: MongoDB, CouchDB
• Redis, Riak

Processing

• MapReduce: M/R, M/Rv2 (YARN), Elastic MapReduce (EMR), Greenplum, Disco
• MPP: ETLs. Hadapt is an analytical platform that runs on Hadoop and treats it as an operating system.
• BSP: Hama

Core Frameworks

• Hadoop: Version 1.0 and 2.0
• Cassandra: DP1, Brisk
• MongoDB
• CouchDB, Couchbase Server
• Redis
• Riak

Storage

• HDFS
• CassandraFS
• GridFS (MongoDB)
• AWS S3
• MPP databases
• In-memory databases: SAP HANA, Druid
• Azure storage

Advanced Services

• Mahout
• GraphDB
• OpenGPS
• HCatalog
• OpenTSDB
• GeoLoc
• Spark

Core Platforms

• Commodity hardware stack
• Enterprise hardware stack
• Distributed “networked” system
• Cloud implementations
• Hybrid implementations
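MapReduce, the processing model named under Processing, can be illustrated with a single-process Python sketch of its three phases. This is not Hadoop code: the function names are hypothetical, and a real cluster would run the map and reduce phases in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big success with big data"])
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be distributed without coordination; only the shuffle moves data between nodes.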

(24)

Big Data Success Story 1

(25)

The rise of digital technologies forced the client to transform their IT landscape

Initial Situation

Rising Mainframe Costs

Mainframe Costs rapidly increased over the last years, with an ongoing increase of + 5% CPU usage per year

More Devices more Channels

75% - 80% of all online transactions can already be attributed to mobile devices, tablets and other channels, and the share is rising

~60% of all transactions are read-only requests

Digital Foundation

The client faced the challenge of investing in and building a platform as its Digital Foundation to be competitive in the digital area

(26)

Scenarios

Traditional vendors and the open source community provide solutions, but which one is better for the client?

Solution Recommendation > Evaluated Solution Scenarios

A: Traditional Vendors

• Usage of products from SAP (SAP HANA) and Oracle (Oracle Exalytics, Oracle TimesTen)
• Implementation of the “Transaction Cache” on a relational database, then leveraging the analytical solutions provided by the vendors

B: Open Source Data Lake

• Usage of NoSQL / Big Data technologies established by Web 2.0 companies (e.g. Amazon, Google, Facebook, Twitter, Yahoo) and further enhanced by the open source community
• Common data platform for an enterprise, which allows real-time operational and analytical queries to be executed
• Several commercial distributions are available, which provide enterprise-level support and actively contribute to the open source community to evolve the technology stack

(27)

Is a multi-workload Hadoop ready for the Enterprise?

Solution Recommendation > Open Source Data Lake

Concern: How about support?

Response: Enterprise-level support is available (Cloudera, Hortonworks, MapR Technologies, etc.)
• Openness – you can switch vendor

Concern: Is it just a hype?

Response: NO – Huge Eco-System
• The adoption rate is steadily increasing, filling a real gap
• All vendors are major contributors to the OS community
• Comparable to Linux in take-up

Concern: Should I use it everywhere?

Response: NO – Just like all of NoSQL it’s not for everything
• Be thoughtful about what to adopt: the core is very stable, newer tools may not be

Concern: Is it secure?

Response: Yes, integration with Kerberos and LDAP
• Encryption in transit fully supported in Open Source
• Encryption at rest is there, and easy with Linux

(28)

At the logical level, we not only require the “Data Lake”, but it has to be integrated in the client’s application landscape

Logical Solution Architecture

[Diagram components: Multi-Channel Platform, Application(s), Distributed Cache (Application Level), Central Database, De-central Database, Real-Time Analytics Database, Real-Time Analytics, Near Real-Time Integration]

• Improves the performance of the response to a user request
• Gets the transaction read load away from the host
• Provides Real-Time Analytics

(29)

The final solution is based on Hadoop and a queue-based integration with the mainframe system

Physical Solution Architecture

[Diagram components: Frontend, Multi-Channel Platform, Real-Time Analytics Application(s), Core Banking Application, Central Database (RDBMS), Middleware, Connector, Custom Adapter, Hadoop (Hortonworks Data Platform 2.1), Other Data Sources (e.g. Server Log-Files, …). Interfaces: Native API, ODBC, JDBC. Labeled data flows: read; write / update / delete / read; write / update / delete; read (non-transaction data)]
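The idea of a queue-fed Transaction Cache, where writes still reach the mainframe while reads are answered from a replicated copy, can be sketched as follows. This is a toy model under stated assumptions, not the client's implementation: the class `TransactionCacheRouter`, its attributes and the in-memory queue are all hypothetical stand-ins for the core banking system, the Hadoop-based data lake and the middleware queue.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class TransactionCacheRouter:
    """Toy model: writes hit the mainframe, reads are served from a replica."""

    def __init__(self) -> None:
        self.mainframe: Dict[str, float] = {}  # system of record (core banking)
        self.data_lake: Dict[str, float] = {}  # read-only copy (transaction cache)
        self.queue: Deque[Tuple[str, float]] = deque()  # changes awaiting replication
        self.mainframe_requests = 0

    def write(self, account: str, balance: float) -> None:
        """Writes always go to the mainframe and are queued for replication."""
        self.mainframe_requests += 1
        self.mainframe[account] = balance
        self.queue.append((account, balance))

    def replicate(self) -> int:
        """Drain the queue into the data lake; return the number of changes applied."""
        applied = 0
        while self.queue:
            account, balance = self.queue.popleft()
            self.data_lake[account] = balance
            applied += 1
        return applied

    def read(self, account: str) -> Optional[float]:
        """Reads are served from the data lake and never touch the mainframe."""
        return self.data_lake.get(account)


router = TransactionCacheRouter()
router.write("account-1", 100.0)
stale = router.read("account-1")   # change not replicated yet
applied = router.replicate()
fresh = router.read("account-1")   # visible after replication
```

The sketch also shows the trade-off of near real-time integration: between the write and the replication step, a read returns outdated (here: missing) data.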

(30)

With the Go-Live of the Transaction Cache the main project objectives were fulfilled

Achievements

Reduced Costs

With the Go-Live of the Transaction Cache, the client immediately reduced their online transactions by 60% and saved 50% of CPU on the mainframe

[Charts: Online Transactions and Mainframe CPU, Old vs. Data Lake, both reduced]

Project Duration

An agile and nimble onsite project team implemented the foundation of Hadoop, data replication and APIs for online channels in only 7 months

[Timeline labels: Milestone, Mobilization, Transition to Production, Mobile Push, Project Management, Hadoop Cluster, Service Performance, Decision Analytics, Managed Service]

Managed Service

As Hadoop skills are rare and operation is not a core capability of the client, Accenture set up a managed service, with AO and IO services from PDC (IO services from Manila, AM services from Cebu)

(31)

Besides the pure cost reduction, Accenture implemented fundamentals for next use-cases

Digital Foundation

Customer Journey

Logging of all customer touchpoints with the Bank in the data lake is designed and currently under implementation

Mobile Push

A POC for mobile push has been successfully conducted. The underlying real-time technologies were established at Twitter (Storm) and LinkedIn (Kafka).

Fraud Detection

Internal Fraud is the first use case of the roadmap to be implemented with the Hadoop ecosystem

Micro Segmentation

Implementing a functional data model and micro segmentation is on the roadmap

Analytics

Setup of ad hoc and self-service analytics

(32)

Big Data Success Story 2

(33)

Project main information

Customer Experience Management (CEM)

Change of architecture in operations system support area, including:

• Business process management for network planning
• Service assurance systems enhancements
• Network configuration

Close ties with OSS Factory project

• Outsourcing of the maintenance of OSS application to Accenture
• Realised by onshore and offshore teams

Two project phases

• Phase 1 – Pilot: realization of three new use cases
• Phase 2 – Consolidation and migration of all Network Analytics applications to the new platform

The client wants to strengthen its customer experience

(34)

The project aims to simplify the system architecture while improving scalability and implementing new functionalities

Project objectives and content

Project Objectives

• Consolidation and simplification of the system architecture
• Improvement of system scalability
• Implementation of new functionalities
• Real-time data monitoring
• Reporting for new products

Project Content

Enable reporting, monitoring and analysis of data from network interfaces and elements. Analysis dimensions:

• Client
• Device
• Network Topology
• Time

(35)

In phase 1 a pilot is being implemented for 3 major use cases of the telecom industry

Pilot Use Cases

LTE where it matters

• Analyse LTE utilisation in the network cells from different dimensions
• Identify LTE-related potential:
  – Network Planning: cells without LTE but with LTE contract customers and with LTE supported devices
  – Marketing: cells with LTE and with customers with LTE supported devices but without LTE contracts
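The two LTE-related potentials reduce to a simple per-cell rule, sketched here in Python. The `CellStats` fields and the function `lte_potential` are hypothetical; the talk does not describe the data model, so the sketch assumes that per-cell customer counts have already been aggregated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellStats:
    cell_id: str
    has_lte: bool
    contract_and_device: int      # customers with an LTE contract and an LTE-capable device
    device_without_contract: int  # customers with an LTE-capable device but no LTE contract


def lte_potential(cell: CellStats) -> Optional[str]:
    """Return the department that should act on this cell, or None."""
    if not cell.has_lte and cell.contract_and_device > 0:
        # Paying LTE customers cannot use LTE here: candidate for network roll-out.
        return "Network Planning"
    if cell.has_lte and cell.device_without_contract > 0:
        # LTE is available and devices are ready: candidate for contract upsell.
        return "Marketing"
    return None


cells = [
    CellStats("cell-a", has_lte=False, contract_and_device=120, device_without_contract=30),
    CellStats("cell-b", has_lte=True, contract_and_device=200, device_without_contract=80),
    CellStats("cell-c", has_lte=True, contract_and_device=50, device_without_contract=0),
]
potentials = {cell.cell_id: lte_potential(cell) for cell in cells}
```

In practice the rule would likely use thresholds rather than "greater than zero", so that only cells with a meaningful number of affected customers are flagged.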

Fixed Mobile Substitution

• Implementation of the reporting for a new product

• Measurement of service quality and identification of areas for improvement

Real-time monitoring

• Monitoring of network usage in real time (number of users, up/down transfer)

(36)

In phase 2 all legacy systems will be migrated to the platform

Legacy Systems

Reporting

Static reports with data & voice KPIs presented from customer / network / device dimension

Monitoring

Alarms based on monitoring of network specific KPIs

Visualization / Dashboards

Graphical representation of network data

(37)

The solution is based upon Pivotal Real Time Intelligence combined with a Hadoop / Teradata platform

Simplified architecture and data sources

[Diagram components: Passive Probing DPI, Active Probing, CDRs, Reference Data OSS, Reference Data BSS, Other Data, ETL, EMC Pivotal RTI, EMC Spring Framework, Hadoop / Teradata, Geomaps Server (e.g. ESRI), Visual Analytics Application, Business Intelligence Application, Alarming. Data flows: Real-time data, Aggregated data]

(38)

Prerequisites for future network analytics use cases

Prospective Usage of the Big Data Platform

Prerequisites

Integration of new data sources

• BSS data
• Fixed net data
• Campaign information
• Customer surveys
• Ticketing systems
• …

Improving management awareness

Possible Use Cases

Telco-specific

• Investment optimisation
• Network load prediction
• Proactive monitoring

General

• Churn analysis
• Customer segmentation
• Product affinity analysis
• Up-sell / cross-sell

(39)

Big Data Success Stories

(40)

Big Data projects require acceptance, deep knowledge and might raise data privacy concerns

Big Data Challenges

Data Privacy

• Connecting new data sources
• Interacting with sensitive or personal data

Recommendation: Involve your Data Protection Officer early in the project to identify and resolve issues quickly

Technology & Architecture

• Hadoop is Enterprise-ready. However, some powerful tools may be immature
• Rapidly evolving ecosystem

Recommendation: Get real experts. Only choose from tried and tested technology

Decision Culture

• Influence requires acceptance
• Acceptance relies on data quality

Recommendation: Fully integrate, don’t misuse. Management should fully understand and agree to the value of the gained insight

(41)

Thank you!

Time for your questions…
