ORACLE Oracle Press
Oracle Big Data Handbook
Tom Plunkett, Brian Macdonald, Bruce Nelson, Helen Sun, Mark F. Hornick, Keith Laker, Khader Mohiuddin, Debra L. Harding, David Segleau, Gokula Mishra, Robert Stackowiak
McGraw-Hill Education
New York, Chicago, San Francisco, Athens, London, Madrid, Mexico City, Milan, New Delhi, Singapore, Sydney, Toronto
Acknowledgments xxi
Introduction xxv
PART I
Introduction
1 Introduction to Big Data 3
Big Data 4
Google's MapReduce Algorithm and Apache Hadoop 5
Oracle's Big Data Platform 7
Summary 10
2 The Value of Big Data 11
Am I Big Data, or Is Big Data Me? 12
Big Data, Little Data—It's Still Me 15
What Happened? 16
Now What? 17
Reality, Check Please! 18
What Do You Make of It? 20
Information Chain Reaction (ICR) 21
Big Data, Big Numbers, Big Business? 23
Twitter 24
Facebook 25
Internal Source 25
ICR: Connect 26
ICR: Change 27
Wanted: Big Data Value 29
Big Data Example 1: Clinical Trial Research Within the Healthcare Industry 30
Example 2: Improvements in Car Design for Driver Safety Within the Automotive Industry 31
Summary 32
PART II
Big Data Platform
3 The Apache Hadoop Platform 37
Software vs. Hardware 39
The Hadoop Software Platform 39
Hadoop Distributions and Versions 40
The Hadoop Distributed File System (HDFS) 40
Scheduling, Compute, and Processing 43
Operating System Choices 45
I/O and the Linux Kernel 46
The Hadoop Hardware Platform 46
CPU and Memory 47
Network 47
Disk 48
Putting It All Together 48
4 Why an Appliance? 51
Why Would Oracle Create a Big Data Appliance? 52
What Is an Appliance? 53
What Are the Goals of Oracle Big Data Appliance? 54
Optimizing an Appliance 55
Oracle Big Data Appliance Version 2 Software 56
Oracle Big Data Appliance X3-2 Hardware 58
Where Did Oracle Get Hadoop Expertise? 61
Configuring a Hadoop Cluster 63
Choosing the Core Cluster Components 64
Assembling the Cluster 66
What About a Do-It-Yourself Cluster? 67
Total Costs of a Cluster 69
Time to Value 73
How to Build Out Larger Clusters 75
Can I Add Other Software to Oracle Big Data Appliance? 75
Drawbacks of an Appliance 76
5 BDA Configurations, Deployment Architectures, and Monitoring 79
Introduction 80
Big Data Appliance X3-2 Full Rack (Eighteen Nodes) 82
Big Data Appliance X3-2 Starter Rack (Six Nodes) 86
Big Data Appliance X3-2 In-Rack Expansion (Six Nodes) 89
Hardware Modifications to BDA 89
Software Supported on Big Data Appliance X3-2 90
BDA Install and Configuration Process 92
Critical and Noncritical Nodes 94
Automatic Failover of the NameNode 95
BDA Disk Storage Layout 96
Adding Storage to a Hadoop Cluster 99
Hadoop-Only Config and Hadoop+NoSQL DB 99
Hadoop-Only Appliance 100
Hadoop and NoSQL DB 100
Memory Options 103
Deployment Architectures 103
Multitenancy and Hadoop in the Cloud 103
Scalability 105
Multirack BDA Considerations 106
Installing Other Software on the BDA 107
BDA in the Data Center 107
Administrative Network 107
Client Access Network 108
InfiniBand Private Network 108
Network Requirements 109
Connecting to Data Center LAN 111
Example Connectivity Architecture 111
Oracle Big Data Appliance Restrictions on Use 112
BDA Management and Monitoring 113
Enterprise Manager 115
Cloudera Manager 117
Hadoop Monitoring Utilities: Web GUI 117
Oracle ILOM 120
Hue 122
DCLI Utility 123
6 Integrating the Data Warehouse and Analytics Infrastructure to Big Data 125
The Data Warehouse as a Historic Database of Record 126
The Oracle Database as a Data Warehouse 127
Why the Data Warehouse and Hadoop Are Deployed Together 128
Completing the Footprint: Business Analyst Tools 130
Building Out the Infrastructure 131
7 BDA Connectors 133
Oracle Big Data Connectors 134
Oracle Loader for Hadoop 136
Online Mode 137
Oracle OCI Direct Path Output 139
JDBC Output 139
Offline Mode 140
Oracle Data Pump Output 141
Delimited Text Output 141
Installation of Oracle Loader for Hadoop 142
Invoking Oracle Loader for Hadoop 143
Input Formats 144
DelimitedTextInputFormat 145
RegexInputFormat 146
AvroInputFormat 146
HiveToAvroInputFormat 146
KVAvroInputFormat 147
Custom Input Formats 147
Oracle Loader for Hadoop Configuration Files 147
Loader Maps 150
Additional Optimizations 152
Leveraging InfiniBand 152
Comparison to Apache Sqoop 153
Oracle SQL Connector for HDFS 153
Installation of Oracle SQL Connector for HDFS 157
HIVE Installation 159
Creating External Tables Using Oracle SQL Connector for HDFS 160
ExternalTable Configuration Tool 161
Data Source Types 161
Configuration Tool Syntax 162
Required Properties 163
Optional Properties 164
ExternalTable Tool for Delimited Text Files 164
Testing DDL with -noexecute 167
Adding a New HDFS File to the Location File 167
Manual External Table Configuration 168
Hive Sources 169
ExternalTable Example 170
Oracle Data Pump Sources 171
Configuration Files 173
Querying with Oracle SQL Connector for HDFS 175
Oracle R Connector for Hadoop 176
Oracle Data Integrator Application Adapter for Hadoop 177
8 Oracle NoSQL Database 181
What Is a NoSQL Database System? 182
NoSQL Applications 184
Oracle NoSQL Database 185
A Sample Use Case 186
Architecture 188
Client Driver 189
Key-Value Pairs 190
Storage Nodes 192
Replication 193
Smart Topology 194
Online Elasticity 194
No Single Point of Failure 195
Data Management 195
APIs 195
CRUD Operations 196
Multiple Update Operations 196
Lookup Operations 196
Transactions 197
Predictable Performance 198
Integration 199
Installation and Administration 200
Simple Installation 200
Administration 200
How Oracle NoSQL Database Stacks Up 201
Useful Links 202
PART III
Analyzing Information and Making Decisions
9 In-Database Analytics: Delivering Faster Time to Value 205
Introduction 206
Oracle's In-Database Analytics 208
Why Running In-Database Is So Important 211
Introduction to Oracle Data Mining and Statistical Analysis 211
Oracle's In-Database Advanced Analytics 213
Oracle Data Mining 213
Introduction to R 223
Text Mining 231
In-Database Statistical Functions 236
Making BI Tools Smarter 237
Spatial Analytics 238
Understanding the Spatial Data Model 239
Querying the Spatial Data Model 239
Using Spatial Analytics 240
Making BI Tools Smarter 241
Graph-Based Analytics 242
Graph Data Model 242
Querying Graph Data 243
Multidimensional Analytics 245
Making BI Tools Smarter and Faster 246
In-Database Analytics: Bringing It All Together 247
Integrating Analytics into Extract-Load-Transform Processing 247
Delivering Guided Exploration 248
Delivering Analytical Mash-ups 249
Conclusion 249
10 Analyzing Data with R 251
Introduction to Open Source R 252
CRAN, Packages, and Task Views 252
GUIs and IDEs 255
Traditional R and Database Interaction vs. Oracle R Enterprise 256
Oracle's Strategic R Offerings 258
Oracle R Enterprise 259
Oracle R Distribution 260
ROracle 261
Oracle R Connector for Hadoop 261
Oracle R Enterprise: Next-Level View 261
Oracle R Enterprise Installation and Configuration 263
Using Oracle R Enterprise 265
Transparency Layer 265
Embedded R Execution 276
Predictive Analytics 293
Oracle R Connector for Hadoop 309
Invoking MapReduce Jobs 311
Testing ORCH R Scripts Without the Hadoop Cluster 311
Interacting with HDFS from R 313
HDFS Metadata Discovery 314
Working with Hadoop Using the ORCH Framework 316
ORCH Predictive Analytics on Hadoop 317
ORCHhive 319
Oracle R Connector for Hadoop and Oracle R Enterprise Interaction 322
Summary 322
11 Endeca Information Discovery 325
Why Did Oracle Select Endeca? 326
Product Suites Overview 326
Endeca Information Discovery Platform 328
Major Functional Areas 328
Key Features 328
Endeca Information Discovery and Business Intelligence 331
Difference in Roles and Functions 332
BI Development Process vs. Information Discovery Approach 333
Complementary But Not Exclusive 334
Architecture 335
Oracle Endeca Server 336
Oracle Endeca Studio 339
Oracle Endeca Integration Suite 342
Endeca on Exalytics 343
Scalability and Load Balancing 344
Unifying Diverse Content Sets 348
Endeca Differentiator 349
Industry Use Cases 349
Hands-On with Endeca 351
Installation and Configuration 351
Developing an Endeca Application 353
12 Big Data Governance 357
Key Elements of Enterprise Data Governance 359
Business Outcome 359
Information Lifecycle Management 359
Regulatory Compliance and Risk Management 360
Metadata Management 360
Data Quality Management 361
Master and Reference Data Management 361
Data Security and Privacy Management 362
Business Process Alignment 362
How Does Big Data Impact Enterprise Data Governance? 363
Modeled Data vs. Raw Data 363
Types of Big Data 366
Applying Data Governance to Big Data 370
Leveraging Big Data Governance 373
Industry-Specific Use Cases 377
Utilities 377
Healthcare 379
Financial Services 380
Retail 382
Consumer Packaged Goods (CPG) 383
Telecommunications 384
Oil and Gas 386
How Does Big Data Impact Data Governance Roles? 388
Governance Roles and Organization 388
An Approach to Implementing Big Data Governance 389
13 Developing Architecture and Roadmap for Big Data 393
Architecture Capabilities for Big Data 394
New Characteristics of Big Data 394
Conceptual Architecture Capabilities of Big Data 395
Product Capabilities and Tools 397
Making Big Data Architecture Decisions 399
Architecture Development Process for Realizing Incremental Values 400
Overview of Oracle Information Architecture Framework 400
Overview of Applied OADP for Information Architecture 406
Big Data Architecture Development Process 408
Impact on Data Management and BI Processes 415
Traditional BI Development Process 415
Big Data and Analytics Development Process 415
Big Data Governance 416
Traditional Data Governance Focus 417
New Focus for Governance in Big Data 417
Developing Skills and Talent 418
Data Scientist 418
Big Data Developer 419
Big Data Administrator 419
Big Data Best Practices 419
Align Big Data Initiative with Specific Business Goals 420
Ensure a Centralized IT Strategy for Standards and Governance 420
Use a Center of Excellence to Minimize Training and Risk 420
Correlate