Big Data Leadership Team
Chris Ward
Principal Consultant
20 years in management consulting and executive leadership
Expertise in retail, marketing, hospitality & financial services
Prior consulting experience with Opera Solutions and The Boston Consulting Group
BA from Princeton University, MBA from the University of Virginia Darden School of Business
James Bigger
Principal Consultant
20 years of management consulting and entrepreneurial experience
Expertise in financial services, insurance and telecom
Prior consulting experience with Opera Solutions and A. T. Kearney
Ph.D. in Physics from Oxford University
Brian Vaughan
Principal Consultant
15 years of management consulting, analytics and software experience
Expertise in healthcare and insurance
Prior experience with Opera Solutions, Mitchell Madison Group and Broadlane
Ph.D. in Physics from Stanford University
Matt DuBell
Principal Engineer
20 years of experience in a range of IT and security disciplines
Responsible for deploying large, secure, Hadoop-based platforms for the U.S. Government
10 years of international experience implementing networking and virtual data center environments
Undergraduate degree from AIU
Prem Jain
Principal Consultant
Prem has 20 years of technology experience in enterprise datacenter technologies.
He has built innovative solutions in Big Data, storage, HPC, virtualization, data migration and enterprise applications.
Prem was formerly at NetApp, where he was the lead architect for Big Data and FlexPod solutions.
Big Data Team
Chris Infanti
Consulting Manager
8+ years of experience in big data analytics consulting. Experience in business development and delivery of analytics projects in the education, wealth management, public safety, corporate security, online subscription, transportation, and retail sectors.
B.S. in Mathematics, B.A. in English Literature from Georgetown University
Jamie Milne
Consulting Manager
Over 7 years of management consulting and entrepreneurial experience. Expertise in financial services, travel, and retail sectors across the US and Europe. Led Big Data strategy and analytical engagements at Opera Solutions.
MSci in Astrophysics from the University of Cambridge.
Jason Lu
Chief Scientist
Eighteen years of analytics and software development experience. Expertise in financial services, healthcare, insurance, retail and marketing science. Prior analytics development experience at Opera Solutions, FICO and J.D. Power and Associates. Ph.D. in Physics from Stanford University.
Virtual Team
BDAs, Analytic Programmers, Storage Specialists, Network Architects, Hadoop Administrators and other professionals
Many years of experience architecting, deploying and managing compute, storage, network, Hadoop ecosystem and database solutions for Fortune 500 companies, augmenting the expertise of the core Big Data Leadership Team.
Yoni Malchi
Consulting Manager
Worked as an Engagement Manager for predictive analytics consulting engagements. Experience in both the Financial Services and Telecommunications industries, bridging the gap between the business and data scientists. PhD in Mech. Eng. in 2007 and worked in the Aerospace industry for 4 years.
Volume, Variety and Velocity of Data are Exploding
The production of data is expanding at an astonishing rate. Drivers include the switch from analog to digital technologies and the creation of structured and unstructured data by individuals and companies via social media and the Web.
[Charts: data volume in zettabytes (ZB), 2010 to 2020, for enterprise-managed vs. enterprise-created data; data storage in exabytes (EB), 2009 to 2014, for unstructured vs. structured data]
Volume
Variety
Velocity
• Every 60 seconds:
  - 98,000+ tweets
  - 695,000 status updates
  - 11 million instant messages
  - 698,445 Google searches
  - 168 million+ emails sent
  - 1,820 TB of data created
  - 217 new mobile web users
• The need to process more data faster to respond to dynamic business trends has brought new requirements for database architectures
• We believe the industry stands at the cusp of the most significant revolution in database and, therefore, application architectures in the past 20 years.
Vendor Landscape Is Crowded and Growing
[Diagram: vendor landscape spanning Data Sources & Capture, IT Infrastructure, Data Management & Integration, Analytics Platforms & Solutions, and Analytics Services & Support, populated by infrastructure vendors, extended infrastructure + data platforms, proprietary data platforms, open data platforms, data vendors, system integrators, specialized end-to-end solutions, analytics service providers and vertical analytics solutions]
Key Big Data Technologies
[Diagram: Hadoop, NoSQL, Columnar and In-Memory technologies, positioned from foundational to emerging]
Hadoop: Distributed File System and Processing Language
Characteristics
• Parallel storage/processing
• Flexible programming model
• Horizontal scaling
• Batch processing
Enablement / Uses
• Pre-processing of data for analytics
• ETL for transforming unstructured data to structured
• Data summarization
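The map/shuffle/reduce pattern behind Hadoop's flexible programming model and its data summarization use can be sketched in a few lines. This is a minimal single-process illustration in Python. It assumes simple `sensor_id, reading` text records, and the function names are hypothetical. A real Hadoop job would run each phase in parallel across the cluster and read from and write to HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Emit (key, value) pairs: here (sensor_id, reading) parsed from text lines."""
    for line in lines:
        sensor_id, reading = line.split(",")
        yield sensor_id.strip(), float(reading)


def shuffle_phase(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, tuple[int, float]]:
    """Summarize each key's values as (count, mean)."""
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}


raw = ["temp_1, 80.0", "temp_2, 70.0", "temp_1, 90.0"]
summary = reduce_phase(shuffle_phase(map_phase(raw)))
print(summary)  # {'temp_1': (2, 85.0), 'temp_2': (1, 70.0)}
```

Because each map call and each per-key reduce is independent, the framework can spread them across nodes. That independence is what gives Hadoop its horizontal scaling.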
NoSQL: Non-relational Key-Value Database
Characteristics
• Fast read/write
• Real-time query
• Horizontal scaling
• Simple programming model
• Dynamic schema
Enablement / Uses
• Real-time ingest
• Rapid retrieval
• Input to MapReduce
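Here is a minimal sketch of the key-value access pattern, using a toy in-memory store. The `KeyValueStore` class and its methods are hypothetical and do not mirror any particular NoSQL product's API. The sketch shows three things:

- Dynamic schema: records stored under different keys can have different fields.
- Rapid retrieval by key.
- A prefix scan whose output could feed a batch (MapReduce) job.

Partitioning keys across nodes for horizontal scaling is omitted.

```python
from typing import Any, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical; not a real product API)."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, Any]] = {}

    def put(self, key: str, record: dict[str, Any]) -> None:
        # Dynamic schema: records are free-form dicts; new fields merge in.
        self._data.setdefault(key, {}).update(record)

    def get(self, key: str) -> Optional[dict[str, Any]]:
        # Rapid retrieval: a single hash lookup, no joins or query planning.
        return self._data.get(key)

    def scan(self, prefix: str) -> list[tuple[str, dict[str, Any]]]:
        # Export matching records in key order, e.g. as input to a batch job.
        return [(k, v) for k, v in sorted(self._data.items()) if k.startswith(prefix)]


store = KeyValueStore()
store.put("truck:017", {"site": "A", "engine_temp": 88.5})
store.put("truck:042", {"site": "B", "oil_pressure": 41})  # different fields, no schema change
store.put("truck:017", {"last_service": "2013-06-01"})     # add a field to an existing record
print(store.get("truck:017"))
# {'site': 'A', 'engine_temp': 88.5, 'last_service': '2013-06-01'}
```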
Columnar: Column-Oriented Database Analytics
Characteristics
• Relational
• Efficient compression
• Optimized for fast read of many/all records
Enablement / Uses
• Online Analytical Processing (OLAP)
• Data storage and retrieval for advanced analytics
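A small example shows why column orientation helps both compression and analytic reads. The Python sketch below uses made-up sample data and does three things:

- Stores the same records row-wise and column-wise.
- Run-length encodes one column.
- Answers an aggregate by reading a single column.

Run-length encoding is only one of several schemes column stores use. It is an illustrative choice here, not a description of any specific product.

```python
from itertools import groupby

# Row-oriented layout: each record stored together (good for fetching whole records).
rows = [
    ("east", "2013-01-01", 120),
    ("east", "2013-01-02", 135),
    ("east", "2013-01-03", 128),
    ("west", "2013-01-01", 98),
    ("west", "2013-01-02", 101),
]

# Column-oriented layout: each attribute stored contiguously.
columns = {
    "region": [r[0] for r in rows],
    "day": [r[1] for r in rows],
    "sales": [r[2] for r in rows],
}


def rle_encode(values):
    """Run-length encode a column: consecutive repeats become (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) runs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


region_runs = rle_encode(columns["region"])
print(region_runs)  # [('east', 3), ('west', 2)]

# An analytic query reads only the column it needs, not every field of every row.
total_sales = sum(columns["sales"])
print(total_sales)  # 582
```

Repeated values sit next to each other in a column (especially when the data is sorted). That adjacency is why column stores compress far better than row stores holding the same data.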
In-Memory Database and Processing
Characteristics
• Relational
• Random access
• Extremely fast
Enablement / Uses
• Complex event processing
• Real-time analytics
• Potential to use a common database for transactions and analytics
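Complex event processing typically keeps a recent window of events in memory and evaluates rules as each event arrives. Below is a minimal sketch of one such rule. It assumes a hypothetical condition: raise an alert when at least `threshold` matching events occur within `window_s` seconds. The class name and parameters are illustrative only.

```python
from collections import deque


class SlidingWindowDetector:
    """Flag a pattern when `threshold` events arrive within `window_s` seconds.

    Hypothetical illustration of a complex-event-processing rule held in memory.
    """

    def __init__(self, window_s: float, threshold: int) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self._times: deque = deque()  # timestamps of events still inside the window

    def on_event(self, timestamp: float) -> bool:
        """Record an event (timestamps assumed non-decreasing); return True if the rule fires."""
        self._times.append(timestamp)
        # Evict events that have fallen out of the window.
        while self._times and timestamp - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.threshold


detector = SlidingWindowDetector(window_s=10.0, threshold=3)
alerts = [detector.on_event(t) for t in [0.0, 4.0, 15.0, 18.0, 21.0]]
print(alerts)  # [False, False, False, False, True]
```

Each event touches only the in-memory window, so the rule is evaluated in real time without a round trip to disk.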
The Big Data Software Stack
The big data ecosystem includes open source and proprietary distributions that span the stack from ingest through analytics.
[Diagram: the big data software stack along the user/machine workflow (Acquire, Organize, Analyze, Decide), showing for each layer its properties, open source and commercial options, example products and integrated offerings]
- Data sources: enterprise structured, enterprise unstructured, 3rd party and web/unstructured data (ODS, data warehouse, call center, server logs, financial, demographic)
- Ingest (interfaces to accept data; real time & batch, streaming): Sqoop, Flume, Splunk, Talend
- File system / database management: HDFS and Hadoop distributions (Cloudera, Hortonworks, MapR, PivotalHD); NoSQL (document, key-value, wide column) such as Cassandra, HBase and MongoDB; ZooKeeper
- Transform (parallel, distributed): MapReduce, Pig, Hive
- Analytics databases (columnar, in-memory, parallel RDBMS; optimized for high-volume reads): Teradata, Netezza, Greenplum, Vertica, SAP HANA, Terracotta BigMemory
- Analytics: R, Python, SAS, SPSS
- Access / queries: SQL, OLAP, natural language, custom analytics, custom APIs
- Solutions: MicroStrategy, Business Objects, Cognos
- Integrated offerings: EMC/Pivotal HD/Greenplum; HP/Vertica/Cloudera; Oracle Big Data/Exadata/Exalytics; IBM InfoSphere BigInsights
Dual Approach to Delivering Big Data Solutions
WWT offers customers both strategic and tactical approaches to derive value from the application of Big Data analytics and technology.
• Strategic Roadmap
  − Big Data Strategy
  − Use Case Design
• Use Case PoC
  − Analytics Development
  − Workflow Integration
• Data Warehouse Optimization
  − ETL/ELT Offload
  − Data Lake Creation
• SAP HANA Implementation
• Big Data Stack Build / Optimization
• Production Support & Sustainment
BIG DATA BUSINESS IMPACT: Extract value from data to drive multiple use cases
BIG DATA TECHNOLOGY OPTIMIZATION: Accomplish data tasks faster, cheaper, better
Defining The Opportunity Is The Starting Point
The power of “Big Data” lies in bringing together data in a timely fashion from sources within and external to the enterprise, structured and unstructured, to create a complete view of critical issues, thereby enabling advanced analytics to unlock key insights that drive significant value.
Outcome: Clearly defined use cases with the potential to deliver significant value by distilling vast data into new, previously unknowable intelligence
Analytics: Advanced machine learning techniques to analyze data and mine for insights to drive critical decisions
Data: Structured or unstructured, internal or external, requiring new methods of storage/integration
Technology: Emerging/new technology stacks using scalable, distributed architectures
HADOOP INFRASTRUCTURE
• Established Big Data infrastructure
• Migrated and normalized data sets
• Developing visualizations, tools and predictive analytics
Data sources: equipment maintenance (SAP); dispatch & operator (Teradata); fuel, oil analysis, etc. (SQL Server)
DISPARATE DATA SETS
• Integrating 15+ siloed data sources in multiple file formats
• 10 terabytes of data
• 3-year historical data ecosystem
MINING COMPANY
PROJECT SCOPE
• 252 trucks
• 200 sensors per truck
• 7 mine sites
• 10,000 readings per second
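The project scope figures imply a sizable event stream. The back-of-the-envelope calculation below rests on two assumptions:

- The 10,000 readings per second is the fleet-wide rate. If it were per truck, every throughput figure would be 252 times larger.
- Readings are spread evenly across sensors.

```python
trucks = 252
sensors_per_truck = 200
readings_per_second = 10_000  # assumed fleet-wide rate

total_sensors = trucks * sensors_per_truck                        # 50,400 sensors
seconds_between_readings = total_sensors / readings_per_second    # ~5.04 s per sensor, if evenly spread
readings_per_day = readings_per_second * 60 * 60 * 24             # 864,000,000
readings_per_year = readings_per_day * 365                        # 315,360,000,000

print(f"{total_sensors:,} sensors; one reading per sensor every ~{seconds_between_readings:.2f} s")
print(f"{readings_per_day:,} readings/day; {readings_per_year:,} readings/year")
```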
[Diagram: on-truck data loggers]
Stratifying Alarms:
1. Urgent component problem
2. Critical sensor problem
3. Important/not urgent component/sensor problem
4. Not important component/sensor problem
5. Noise – ignore
Urgent component failure models: engine, transmission, differentials, torque converters, final drives
Data/analytics-driven timing for preventative maintenance (e.g. oil changes) on individual trucks
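The five alarm tiers can be expressed as a rule-based triage function. The sketch below is hypothetical. The source lists the tiers and the components covered by urgent failure models, but not the rules themselves. The `Alarm` fields, the `failure_score` model output, and every threshold are therefore placeholders for illustration only.

```python
from dataclasses import dataclass
from enum import IntEnum


class AlarmTier(IntEnum):
    URGENT_COMPONENT = 1      # 1. Urgent component problem
    CRITICAL_SENSOR = 2       # 2. Critical sensor problem
    IMPORTANT_NOT_URGENT = 3  # 3. Important/not urgent component/sensor problem
    NOT_IMPORTANT = 4         # 4. Not important component/sensor problem
    NOISE = 5                 # 5. Noise: ignore


# Components covered by the urgent component failure models.
CRITICAL_COMPONENTS = {"engine", "transmission", "differential", "torque_converter", "final_drive"}


@dataclass
class Alarm:
    truck_id: str
    component: str
    is_sensor_fault: bool  # hypothetical: True if the sensor itself looks faulty
    failure_score: float   # hypothetical failure-model output in [0, 1]
    repeat_count: int      # hypothetical: firings of this alarm in the recent window


def stratify(alarm: Alarm) -> AlarmTier:
    """Assign an alarm to one of the five tiers (all thresholds are hypothetical)."""
    if alarm.failure_score < 0.1 and alarm.repeat_count < 2:
        return AlarmTier.NOISE
    critical = alarm.component in CRITICAL_COMPONENTS
    if critical and not alarm.is_sensor_fault and alarm.failure_score >= 0.8:
        return AlarmTier.URGENT_COMPONENT
    if critical and alarm.is_sensor_fault:
        return AlarmTier.CRITICAL_SENSOR
    if critical or alarm.failure_score >= 0.5:
        return AlarmTier.IMPORTANT_NOT_URGENT
    return AlarmTier.NOT_IMPORTANT


alarms = [
    Alarm("T-017", "engine", False, 0.92, 5),
    Alarm("T-042", "transmission", True, 0.40, 3),
    Alarm("T-108", "final_drive", False, 0.55, 2),
    Alarm("T-200", "cab_light", False, 0.30, 4),
    Alarm("T-201", "cab_light", False, 0.05, 1),
]
# Work the queue most-urgent first; IntEnum tiers sort numerically.
queue = sorted(alarms, key=stratify)
print([(a.truck_id, stratify(a).name) for a in queue])
```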
BUSINESS IMPACT
• Higher equipment uptime
• Reduced critical component failure
• Better preventative maintenance
• Increased productivity
TRUCK SENSOR DATA (OSI PI SERVER)
[Diagram: sensor data combined into a 360° view of the machine]
Data Warehouse Optimization: Value Proposition
Augmenting the data warehouse with a less expensive Hadoop system allows companies to free up valuable space on their DW systems to run faster queries and analysis, while storing large volumes of their data universe.
CURRENT
[Diagram: traditional data warehouse fed from the full data universe (CRM, social media, billing, web logs, payments, scheduling), holding cold, warm and hot data]
1. A significant amount of data that may be valuable in the future is thrown out during the ETL process
2. About 50% of the data brought into a typical data warehouse system is rarely accessed
3. About 80% of the queries and reporting performed on highly used data do not need to run at DW speeds
PROPOSED
[Diagram: WWT Hadoop Appliance alongside the traditional data warehouse; the full data universe lands in Hadoop, cold and warm data sit in Hadoop, warm and hot data remain in the DW]
1. Utilize additional Hadoop-based storage to store the full data universe
2. Move cold/warm data, ETL workflows, and ELT scripts to Hadoop, taking advantage of lower cost per TB
3. Continue to take advantage of DW agility and speed in real-time analysis and querying
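One way to decide which data moves to Hadoop is to tier tables by observed access and latency needs. This follows the observations that much DW data is rarely accessed and most queries do not need DW speeds. Below is a minimal sketch, assuming per-table query counts are available from DW logs. The `TableStats` fields, table names and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    size_tb: float
    queries_last_90d: int  # hypothetical access metric pulled from DW query logs
    needs_dw_speed: bool   # e.g. feeds real-time dashboards or strict SLAs


def classify(t: TableStats, hot_min: int = 1_000, warm_min: int = 10) -> str:
    """Label a table 'hot', 'warm' or 'cold'; thresholds are hypothetical."""
    if t.queries_last_90d >= hot_min and t.needs_dw_speed:
        return "hot"   # heavily used and genuinely needs DW performance
    if t.queries_last_90d >= warm_min:
        return "warm"  # used, but tolerant of Hadoop-level latency
    return "cold"      # rarely accessed


def offload_plan(tables: list[TableStats]) -> dict[str, list[str]]:
    """Keep hot tables in the DW; move warm and cold tables to Hadoop."""
    plan: dict[str, list[str]] = {"data_warehouse": [], "hadoop": []}
    for t in tables:
        target = "data_warehouse" if classify(t) == "hot" else "hadoop"
        plan[target].append(t.name)
    return plan


tables = [
    TableStats("billing_current", 4.0, 25_000, True),
    TableStats("web_logs_2013", 12.0, 3_000, False),
    TableStats("crm_history", 6.0, 40, False),
    TableStats("payments_2009", 8.0, 2, False),
]
plan = offload_plan(tables)
freed_tb = sum(t.size_tb for t in tables if classify(t) != "hot")
print(plan)
print(f"DW capacity freed: {freed_tb} TB")
```

Note that `web_logs_2013` is heavily queried but still lands in Hadoop, because it does not need DW speed. That is the case the 80% observation points to.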
Four Major Big Data Challenges
In our meetings with customers, four issues are consistently brought up as major challenges related to creating a big data capability that can effectively support the business units.
Deploying new technologies and combining with existing architecture
• How do we create an effective integrated Big Data stack?
• What new technologies do we need and how do they fit together?
Defining the outcome
• What problem/opportunity are we pursuing?
• What is the value that can be created?
Navigating a crowded and evolving vendor landscape
• How do we separate marketing hype from reality?
• Who should we use? Who can we trust?
Organizing for success
• Where does Big Data fit?
• Who is responsible for data integrity?
• Where do we find the critical resources needed to deliver Big Data solutions?
• Develop a roadmap for implementing Big Data
Use case exploration
Data Governance, Infrastructure and Analytics ownership
• Define high impact use cases
• Design and test appropriate reference architectures
Plan
Design
Pilot
Scale
WWT Services
Indicative Infrastructure
• Create detailed description of selected pilot use cases
Analytics
Workflow integration
• Test various reference architectures
• “Stand-up” reference architecture
• Design the pilot
Success criteria
Timeline
Scope
• Identify and prepare data
• Build analytical models
• Design workflow
• Implement, manage and monitor
Analytics-Ready Infrastructure
Solution Development
• Implement design changes from pilot learnings
• Invest in software development as necessary to improve UI
• Prepare ETL process for scale
• Build out infrastructure as required to support rollout
1. Strategic Roadmap
• Use case definition
• Organizational alignment
• Big Data architecture high-level design
2. Big Data Stack Build
• Detailed design of Big Data architecture and BOM
• Procure, configure and deploy Big Data stack
3. Proof of Concept
• POC design
• Analytical models
• Customer data loaded, processed and analyzed
4. Production Support
• Operationalizing POC
• Infrastructure Sustainment
• Training
• Ongoing support
EXAMPLE STARTER KIT
• Big Data Solution Stack: 2 UCS 6296UP, 2 Nexus 2232PP, 16 Cisco UCS C240, EMC Isilon; Software: PivotalHD, Greenplum, etc.
EXAMPLE SCALE OUT HARDWARE
• Multiple expansion racks: 2 Nexus 2232PP Fabric Extenders, 16 Cisco UCS C240, EMC Isilon
ENTERPRISE NETWORKS
• Next Generation Networking
• Nexus (7K, 5K, 3K & 2K)
• Virtual Networking (Nexus 1000v)
• OTV, LISP, FabricPath
• Layer 2 Extension
• DR/BC Networking
SECURITY
• BYOD (Bring Your Own Device) & Secure Mobility
• Jukebox
• ISE & RSA
• ASA 1000v
• VSG (Virtual Security Gateway)
• Cyber Security Solutions
COLLABORATION
• Unified Communications
• Tandberg Video
• VXI (View & XenDesktop)
• WebEx, Call Center & Collaboration Solutions
• Phones, Backpacks & Soft Phone Clients
• Telepresence & Business Video
DATA CENTER
• Vblock, FlexPod & CloudSystem Matrix
• EMC & NetApp Storage
• vSphere / XenServer
• vCloud Director
• VDI (View / XenDesktop)
• Cisco CIAC & BMC CLM
• EMC’s UIM & Cloupia
• FAST MDC (Mobile Data Center) Solutions
BIG DATA
• Cisco UCS C220, C240
• HP DL380
• Nexus 2200, UCS 6296
• FlexPod Select, Isilon storage
• Cloudera, MapR, PivotalHD
• Cloud Foundry
• Velocidata Appliance
• Next Generation provisioning tools
A highly collaborative ecosystem to design, build, educate, demo & deploy advanced technology solutions for our customers & partners
Big Data Environment Set-up: ATC Reference Architectures
Four analytics-ready infrastructure stacks have been developed in the ATC to showcase Big Data technologies.
[Diagram: four ATC reference architectures, each mapped across the data, ingest, compute, network, storage, file system/database, analytics database and analytics tool layers]
The four reference architectures are built around HP internal local storage, UCS with NetApp direct attached storage, UCS with Isilon network storage, and SAP HANA. Components shown across them include HP DL380, Cisco UCS C220 M3, UCS C240 and UCS B-Series blades, UCS 6296UP fabric interconnects, Nexus 2232PP/2200 fabric extenders, JBOD SATA, NetApp E5460, Isilon and Hitachi storage, Hadoop distributions (Cloudera, Hortonworks, MapR, PivotalHD), HBase, HAWQ, Impala, GemFire, Splunk, Velocidata, R, Python, Java and MicroStrategy.
How to Leverage ATC Architectures
We use the ATC for a variety of customer and partner use cases, ranging from technology testing to full solution deployment.
Proof of Concept
• Test customer solutions prior to full onsite implementation, e.g.
  − Run use case analytical models and architectures on Big Data machines
  − Create Big Data hardware/software stack, potentially with client data
Vendor Comparison
• Compare Big Data solutions to provide insight into the strengths and weaknesses of each
• Run “bake-offs” to gauge how well certain components can deliver a full solution
Field Demo
• Showcase Big Data capabilities by hosting demos of WWT PoCs and analysis
• Enable virtual access for field engineers to run customer demos
Performance Benchmarking
• Run benchmark tests to measure speed and performance of Big Data technologies, including competing Hadoop distributions and storage options
Technology Evaluation
• Evaluate new technologies in the ATC as they are released, allowing our engineers to get up to speed before working in customer environments
Training
• Hold training courses for customers and partners that allow them to work with Big Data software and hardware in a highly customizable environment that reaches across a variety of vendors
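Benchmarking competing distributions or storage options comes down to running the same workload repeatedly on each candidate and comparing robust statistics. Below is a minimal timing harness in Python, assuming each candidate can be wrapped as a callable. The two stand-in workloads are local sorts, used only so the sketch runs. In practice they would be replaced by the same job submitted to each cluster configuration.

```python
import statistics
import time
from typing import Callable


def benchmark(name: str, workload: Callable[[], object], runs: int = 5, warmup: int = 1) -> dict:
    """Time `workload` `runs` times after `warmup` discarded runs; report median/min/max seconds."""
    for _ in range(warmup):
        workload()  # warm up caches; timing discarded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return {
        "name": name,
        "runs": runs,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


# Hypothetical stand-ins: in practice each callable would submit the same job
# (e.g. a sort or a query) to a different distribution or storage configuration.
def candidate_a() -> list[int]:
    return sorted(range(200_000, 0, -1))


def candidate_b() -> list[int]:
    data = list(range(200_000, 0, -1))
    data.sort()
    return data


results = [benchmark("candidate_a", candidate_a), benchmark("candidate_b", candidate_b)]
for r in sorted(results, key=lambda r: r["median_s"]):
    print(f"{r['name']}: median {r['median_s']:.4f} s over {r['runs']} runs")
```

Reporting the median rather than the mean keeps one slow run (for example, a straggler task) from skewing the comparison.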
WWT Big Data Workshop
WHAT IS IT?
• A full-day interactive session with WWT consultants and Data Scientists designed to increase your understanding of Big Data and help you outline your strategy for using Big Data analytics solutions to add value.
• ESTIMATE use cases’ potential impact and ease of implementation
• IDENTIFY clear use cases that can’t be identified with the current setup
• DETERMINE which of the use cases can benefit from WWT capabilities
• CHOOSE high-value, actionable use cases
WHAT TO EXPECT
• Highly-Skilled Consultants and Engineers
• Emerging Technology
• Customized Technical and Strategic Whiteboard Session
• Best Practices
• Expert Insight
• Use Cases and Success Stories
[Chart: use cases plotted by $ impact against ease of implementation, with the high-value, actionable use case region highlighted]
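The impact-versus-ease screen can be sketched as a simple filter and ranking. Everything in the sketch is hypothetical: the `UseCase` fields, the $M impact estimates, the 1 to 5 ease scale, and the cut-off values. In a workshop these would come out of the estimation step.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact_usd_m: float  # hypothetical estimated annual impact, $M
    ease: int            # hypothetical ease of implementation, 1 (hard) to 5 (easy)


def prioritize(cases: list[UseCase], impact_min: float = 1.0, ease_min: int = 3) -> list[UseCase]:
    """Keep the high-impact, easier-to-implement quadrant, ranked by impact then ease."""
    chosen = [c for c in cases if c.impact_usd_m >= impact_min and c.ease >= ease_min]
    return sorted(chosen, key=lambda c: (c.impact_usd_m, c.ease), reverse=True)


cases = [
    UseCase("Predictive maintenance", 4.0, 3),
    UseCase("Churn prediction", 2.5, 4),
    UseCase("Real-time pricing", 6.0, 1),  # high impact but hard: falls outside the quadrant
    UseCase("Report archiving", 0.3, 5),   # easy but low impact
]
print([c.name for c in prioritize(cases)])
# ['Predictive maintenance', 'Churn prediction']
```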