Oracle Big Data Handbook

Academic year: 2021


Oracle Press

Tom Plunkett, Brian Macdonald, Bruce Nelson, Helen Sun, Mark F. Hornick, Keith Laker, Khader Mohiuddin, Debra L. Harding, David Segleau, Gokula Mishra, Robert Stackowiak

McGraw-Hill Education

New York, Chicago, San Francisco, Athens, London, Madrid, Mexico City, Milan, New Delhi, Singapore, Sydney, Toronto

Acknowledgments
Introduction

PART I: Introduction

1 Introduction to Big Data
Big Data
Google's MapReduce Algorithm and Apache Hadoop
Oracle's Big Data Platform
Summary

2 The Value of Big Data
Am I Big Data, or Is Big Data Me?
Big Data, Little Data—It's Still Me
What Happened?
Now What?
Reality, Check Please!
What Do You Make of It?
Information Chain Reaction (ICR)
Big Data, Big Numbers, Big Business?
Twitter
Facebook
Internal Source
ICR: Connect
ICR: Change
Wanted: Big Data Value
Big Data Example 1: Clinical Trial Research Within the Healthcare Industry
Example 2: Improvements in Car Design for Driver Safety Within the Automotive Industry
Summary

PART II: Big Data Platform

3 The Apache Hadoop Platform
Software vs. Hardware
The Hadoop Software Platform
Hadoop Distributions and Versions
The Hadoop Distributed File System (HDFS)
Scheduling, Compute, and Processing
Operating System Choices
I/O and the Linux Kernel
The Hadoop Hardware Platform
CPU and Memory
Network
Disk
Putting It All Together

4 Why an Appliance?
Why Would Oracle Create a Big Data Appliance?
What Is an Appliance?
What Are the Goals of Oracle Big Data Appliance?
Optimizing an Appliance
Oracle Big Data Appliance Version 2 Software
Oracle Big Data Appliance X3-2 Hardware
Where Did Oracle Get Hadoop Expertise?
Configuring a Hadoop Cluster
Choosing the Core Cluster Components
Assembling the Cluster
What About a Do-It-Yourself Cluster?
Total Costs of a Cluster
Time to Value
How to Build Out Larger Clusters
Can I Add Other Software to Oracle Big Data Appliance?
Drawbacks of an Appliance

5 BDA Configurations, Deployment Architectures, and Monitoring
Introduction
Big Data Appliance X3-2 Full Rack (Eighteen Nodes)
Big Data Appliance X3-2 Starter Rack (Six Nodes)
Big Data Appliance X3-2 In-Rack Expansion (Six Nodes)
Hardware Modifications to BDA
Software Supported on Big Data Appliance X3-2
BDA Install and Configuration Process
Critical and Noncritical Nodes
Automatic Failover of the NameNode
BDA Disk Storage Layout
Adding Storage to a Hadoop Cluster
Hadoop-Only Config and Hadoop+NoSQL DB
Hadoop-Only Appliance
Hadoop and NoSQL DB
Memory Options
Deployment Architectures
Multitenancy and Hadoop in the Cloud
Scalability
Multirack BDA Considerations
Installing Other Software on the BDA
BDA in the Data Center
Administrative Network
Client Access Network
InfiniBand Private Network
Network Requirements
Connecting to Data Center LAN
Example Connectivity Architecture
Oracle Big Data Appliance Restrictions on Use
BDA Management and Monitoring
Enterprise Manager
Cloudera Manager
Hadoop Monitoring Utilities: Web GUI
Oracle ILOM
Hue
DCLI Utility

6 Integrating the Data Warehouse and Analytics Infrastructure to Big Data
The Data Warehouse as a Historic Database of Record
The Oracle Database as a Data Warehouse
Why the Data Warehouse and Hadoop Are Deployed Together
Completing the Footprint: Business Analyst Tools
Building Out the Infrastructure

7 BDA Connectors
Oracle Big Data Connectors
Oracle Loader for Hadoop
Online Mode
Oracle OCI Direct Path Output
JDBC Output
Offline Mode
Oracle Data Pump Output
Delimited Text Output
Installation of Oracle Loader for Hadoop
Invoking Oracle Loader for Hadoop
Input Formats
DelimitedTextInputFormat
RegexInputFormat
AvroInputFormat
HiveToAvroInputFormat
KVAvroInputFormat
Custom Input Formats
Oracle Loader for Hadoop Configuration Files
Loader Maps
Additional Optimizations
Leveraging InfiniBand
Comparison to Apache Sqoop
Oracle SQL Connector for HDFS
Installation of Oracle SQL Connector for HDFS
HIVE Installation
Creating External Tables Using Oracle SQL Connector for HDFS
ExternalTable Configuration Tool
Data Source Types
Configuration Tool Syntax
Required Properties
Optional Properties
ExternalTable Tool for Delimited Text Files
Testing DDL with -noexecute
Adding a New HDFS File to the Location File
Manual External Table Configuration
Hive Sources
ExternalTable Example
Oracle Data Pump Sources
Configuration Files
Querying with Oracle SQL Connector for HDFS
Oracle R Connector for Hadoop
Oracle Data Integrator Application Adapter for Hadoop

8 Oracle NoSQL Database
What Is a NoSQL Database System?
NoSQL Applications
Oracle NoSQL Database
A Sample Use Case
Architecture
Client Driver
Key-Value Pairs
Storage Nodes
Replication
Smart Topology
Online Elasticity
No Single Point of Failure
Data Management
APIs
CRUD Operations
Multiple Update Operations
Lookup Operations
Transactions
Predictable Performance
Integration
Installation and Administration
Simple Installation
Administration
How Oracle NoSQL Database Stacks Up
Useful Links

PART III: Analyzing Information and Making Decisions

9 In-Database Analytics: Delivering Faster Time to Value
Introduction
Oracle's In-Database Analytics
Why Running In-Database Is So Important
Introduction to Oracle Data Mining and Statistical Analysis
Oracle's In-Database Advanced Analytics
Oracle Data Mining
Introduction to R
Text Mining
In-Database Statistical Functions
Making BI Tools Smarter
Spatial Analytics
Understanding the Spatial Data Model
Querying the Spatial Data Model
Using Spatial Analytics
Making BI Tools Smarter
Graph-Based Analytics
Graph Data Model
Querying Graph Data
Multidimensional Analytics
Making BI Tools Smarter and Faster
In-Database Analytics: Bringing It All Together
Integrating Analytics into Extract-Load-Transform Processing
Delivering Guided Exploration
Delivering Analytical Mash-ups
Conclusion

10 Analyzing Data with R
Introduction to Open Source R
CRAN, Packages, and Task Views
GUIs and IDEs
Traditional R and Database Interaction vs. Oracle R Enterprise
Oracle's Strategic R Offerings
Oracle R Enterprise
Oracle R Distribution
ROracle
Oracle R Connector for Hadoop
Oracle R Enterprise: Next-Level View
Oracle R Enterprise Installation and Configuration
Using Oracle R Enterprise
Transparency Layer
Embedded R Execution
Predictive Analytics
Oracle R Connector for Hadoop
Invoking MapReduce Jobs
Testing ORCH R Scripts Without the Hadoop Cluster
Interacting with HDFS from R
HDFS Metadata Discovery
Working with Hadoop Using the ORCH Framework
ORCH Predictive Analytics on Hadoop
ORCHhive
Oracle R Connector for Hadoop and Oracle R Enterprise Interaction
Summary

11 Endeca Information Discovery
Why Did Oracle Select Endeca?
Product Suites Overview
Endeca Information Discovery Platform
Major Functional Areas
Key Features
Endeca Information Discovery and Business Intelligence
Difference in Roles and Functions
BI Development Process vs. Information Discovery Approach
Complementary But Not Exclusive
Architecture
Oracle Endeca Server
Oracle Endeca Studio
Oracle Endeca Integration Suite
Endeca on Exalytics
Scalability and Load Balancing
Unifying Diverse Content Sets
Endeca Differentiator
Industry Use Cases
Hands-On with Endeca
Installation and Configuration
Developing an Endeca Application

12 Big Data Governance
Key Elements of Enterprise Data Governance
Business Outcome
Information Lifecycle Management
Regulatory Compliance and Risk Management
Metadata Management
Data Quality Management
Master and Reference Data Management
Data Security and Privacy Management
Business Process Alignment
How Does Big Data Impact Enterprise Data Governance?
Modeled Data vs. Raw Data
Types of Big Data
Applying Data Governance to Big Data
Leveraging Big Data Governance
Industry-Specific Use Cases
Utilities
Healthcare
Financial Services
Retail
Consumer Packaged Goods (CPG)
Telecommunications
Oil and Gas
How Does Big Data Impact Data Governance Roles?
Governance Roles and Organization
An Approach to Implementing Big Data Governance

13 Developing Architecture and Roadmap for Big Data
Architecture Capabilities for Big Data
New Characteristics of Big Data
Conceptual Architecture Capabilities of Big Data
Product Capabilities and Tools
Making Big Data Architecture Decisions
Architecture Development Process for Realizing Incremental Values
Overview of Oracle Information Architecture Framework
Overview of Applied OADP for Information Architecture
Big Data Architecture Development Process
Impact on Data Management and BI Processes
Traditional BI Development Process
Big Data and Analytics Development Process
Big Data Governance
Traditional Data Governance Focus
New Focus for Governance in Big Data
Developing Skills and Talent
Data Scientist
Big Data Developer
Big Data Administrator
Big Data Best Practices
Align Big Data Initiative with Specific Business Goals
Ensure a Centralized IT Strategy for Standards and Governance
Use a Center of Excellence to Minimize Training and Risk
Correlate Big Data with Structured Data
Provide High-Performance and Scalable Analytical Sandboxes
Reshape the IT Operating Model

Index
