• No results found

Professional Hadoop Solutions

N/A
N/A
Protected

Academic year: 2021

Share "Professional Hadoop Solutions"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

Brochure

More information from http://www.researchandmarkets.com/reports/2542488/

Professional Hadoop Solutions

Description: The go-to guidebook for deploying Big Data solutions with Hadoop

Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and Hbase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth.

With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.

- The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications

- Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions

- Includes detailed, real-world examples and code-level guidelines - Explains when, why, and how to use these tools effectively

- Written by a team of Hadoop experts in the programmer-to-programmer Wrox style

Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

Contents: Introduction xvii

Chapter 1: Big Data and the Hadoop Ecosystem 1 Big Data Meets Hadoop 2

Hadoop: Meeting the Big Data Challenge 3 Data Science in the Business World 5 The Hadoop Ecosystem 7

Hadoop Core Components 7 Hadoop Distributions 10

Developing Enterprise Applications with Hadoop 12 Summary 16

Chapter 2: Storing Data in Hadoop 19 HDFS 19

HDFS Architecture 20 Using HDFS Files 24

Hadoop-Specific File Types 26

(2)

HBase 34

HBase Architecture 34 HBase Schema Design 40 Programming for HBase 42 New HBase Features 50

Combining HDFS and HBase for Effective Data Storage 53 Using Apache Avro 53

Managing Metadata with HCatalog 58

Choosing an Appropriate Hadoop Data Organization for Your Applications 60 Summary 62

Chapter 3: Processing Your Data with MapReduce 63 Getting to Know MapReduce 63

MapReduce Execution Pipeline 65

Runtime Coordination and Task Management in MapReduce 68 Your First MapReduce Application 70

Building and Executing MapReduce Programs 74 Designing MapReduce Implementations 78

Using MapReduce as a Framework for Parallel Processing 79 Simple Data Processing with MapReduce 81

Building Joins with MapReduce 82

Building Iterative MapReduce Applications 88 To MapReduce or Not to MapReduce? 94 Common MapReduce Design Gotchas 95 Summary 96

Chapter 4: Customizing MapReduce Execution 97 Controlling MapReduce Execution with InputFormat 98

(3)

Implementing RecordReader for XML Data 119

Organizing Output Data with Custom Output Formats 123 Implementing OutputFormat for Splitting MapReduce Job’s Output into Multiple Directories 124

Writing Data Your Way with Custom RecordWriters 133 Implementing a RecordWriter to Produce Outputtar Files 133 Optimizing Your MapReduce Execution with a Combiner 135 Controlling Reducer Execution with Partitioners 139

Implementing a Custom Partitioner for One-to-Many Joins 140 Using Non-Java Code with Hadoop 143

Pipes 143

Hadoop Streaming 143 Using JNI 144

Summary 146

Chapter 5: Building Reliable MapReduce Apps 147 Unit Testing MapReduce Applications 147

Testing Mappers 150 Testing Reducers 151 Integration Testing 152

Local Application Testing with Eclipse 154 Using Logging for Hadoop Testing 156 Processing Applications Logs 160 Reporting Metrics with Job Counters 162 Defensive Programming in MapReduce 165 Summary 166

Chapter 6: Automating Data Processing with Oozie 167 Getting to Know Oozie 168

Oozie Workflow 170

Executing Asynchronous Activities in Oozie Workflow 173 Oozie Recovery Capabilities 179

(4)

Oozie Bundle 187

Oozie Parameterization with Expression Language 191 Workflow Functions 192

Coordinator Functions 192 Bundle Functions 193 Other EL Functions 193 Oozie Job Execution Model 193 Accessing Oozie 197

Oozie SLA 199 Summary 203

Chapter 7: Using Oozie 205

Validating Information about Places Using Probes 206 Designing Place Validation Based on Probes 207 Designing Oozie Workflows 208

Implementing Oozie Workflow Applications 211 Implementing the Data Preparation Workflow 212 Implementing Attendance Index and Cluster Strands Workflows 220

Implementing Workflow Activities 222

Populating the Execution Context from a java Action 223 Using MapReduce Jobs in Oozie Workflows 223

Implementing Oozie Coordinator Applications 226 Implementing Oozie Bundle Applications 231

Deploying, Testing, and Executing Oozie Applications 232 Deploying Oozie Applications 232

Using the Oozie CLI for Execution of an Oozie Application 234 Passing Arguments to Oozie Jobs 237

Using the Oozie Console to Get Information about Oozie Applications 240

(5)

Summary 247

Chapter 8: Advanced Oozie FEATURES 249 Building Custom Oozie Workflow Actions 250 Implementing a Custom Oozie Workflow Action 251 Deploying Oozie Custom Workflow Actions 255 Adding Dynamic Execution to Oozie Workflows 257 Overall Implementation Approach 257

A Machine Learning Model, Parameters, and Algorithm 261 Defining a Workflow for an Iterative Process 262

Dynamic Workflow Generation 265 Using the Oozie Java API 268

Using Uber Jars with Oozie Applications 272 Data Ingestion Conveyer 276

Summary 283

Chapter 9: Real-Time Hadoop 285

Real-Time Applications in the Real World 286

Using HBase for Implementing Real-Time Applications 287 Using HBase as a Picture Management System 289 Using HBase as a Lucene Back End 296

Using Specialized Real-Time Hadoop Query Systems 317 Apache Drill 319

Impala 320

Comparing Real-Time Queries to MapReduce 323 Using Hadoop-Based Event-Processing Systems 323 HFlame 324

Storm 326

Comparing Event Processing to MapReduce 329 Summary 330

Chapter 10: Hadoop Security 331

A Brief History: Understanding Hadoop Security Challenges 333 Authentication 334

(6)

Delegated Security Credentials 344 Authorization 350

HDFS File Permissions 350 Service-Level Authorization 354 Job Authorization 356

Oozie Authentication and Authorization 356 Network Encryption 358

Security Enhancements with Project Rhino 360 HDFS Disk-Level Encryption 361

Token-Based Authentication and Unified Authorization Framework 361 HBase Cell-Level Security 362

Putting it All Together — Best Practices for Securing Hadoop 362 Authentication 363

Authorization 364 Network Encryption 364

Stay Tuned for Hadoop Enhancements 365 Summary 365

Chapter 11: Running Hadoop Applications on AWS 367 Getting to Know AWS 368

Options for Running Hadoop on AWS 369 Custom Installation using EC2 Instances 369 Elastic MapReduce 370

Additional Considerations before Making Your Choice 370 Understanding the EMR-Hadoop Relationship 370 EMR Architecture 372

Using S3 Storage 373

Maximizing Your Use of EMR 374

Utilizing CloudWatch and Other AWS Components 376 Accessing and Using EMR 377

Using AWS S3 383

(7)

Content Browsing with the Console 386 Programmatically Accessing Files in S3 387

Using MapReduce to Upload Multiple Files to S3 397 Automating EMR Job Flow Creation and Job Execution 399 Orchestrating Job Execution in EMR 404

Using Oozie on an EMR Cluster 404 AWS Simple Workflow 407

AWS Data Pipeline 408 Summary 409

Chapter 12: Building Enterprise Security Solutions for Hadoop Implementations 411 Security Concerns for Enterprise Applications 412

Authentication 414 Authorization 414 Confidentiality 415 Integrity 415 Auditing 416

What Hadoop Security Doesn’t Natively Provide for Enterprise Applications 416 Data-Oriented Access Control 416

Differential Privacy 417 Encrypted Data at Rest 419 Enterprise Security Integration 419

Approaches for Securing Enterprise Applications Using Hadoop 419 Access Control Protection with Accumulo 420

Encryption at Rest 430

Network Isolation and Separation Approaches 430 Summary 434

Chapter 13: Hadoop’s Future 435

Simplifying MapReduce Programming with DSLs 436 What Are DSLs? 436

DSLs for Hadoop 437

(8)

Tez 452

Security Enhancements 452 Emerging Trends 453 Summary 454

APPENDIX : Useful Reading 455 Index

Ordering: Order Online - http://www.researchandmarkets.com/reports/2542488/

Order by Fax - using the form below

Order by Post - print the order form below and send to Research and Markets,

(9)

Page 1 of 2

Fax Order Form

To place an order via fax simply print this form, fill in the information below and fax the completed form to 646-607-1907 (from USA) or +353-1-481-1716 (from Rest of World). If you have any questions please visit

http://www.researchandmarkets.com/contact/

Order Information

Please verify that the product information is correct.

Product Format

Please select the product format and quantity you require:

* Shipping/Handling is only charged once per order.

Contact Information

Please enter all the information below in BLOCK CAPITALS

Product Name: Professional Hadoop Solutions

Web Address: http://www.researchandmarkets.com/reports/2542488/

Office Code: SC

Quantity

Hard Copy

(Paper back): USD 115 + USD 29 Shipping/Handling

Title: Mr Mrs Dr Miss Ms Prof

First Name: Last Name:

Email Address: * Job Title: Organisation: Address: City:

Postal / Zip Code: Country:

Phone Number: Fax Number:

(10)

Page 2 of 2

Payment Information

Please indicate the payment method you would like to use by selecting the appropriate box.

Please fax this form to:

(646) 607-1907 or (646) 964-6609 - From USA

+353-1-481-1716 or +353-1-653-1571 - From Rest of World

Pay by credit card: You will receive an email with a link to a secure webpage to enter yourcredit card details.

Pay by check: Please post the check, accompanied by this form, to: Research and Markets,

Guinness Center, Taylors Lane, Dublin 8, Ireland.

Pay by wire transfer: Please transfer funds to:

Account number 833 130 83

Sort code 98-53-30

Swift code ULSBIE2D

IBAN number IE78ULSB98533083313083 Bank Address Ulster Bank,

27-35 Main Street, Blackrock, Co. Dublin, Ireland. If you have a Marketing Code please enter it below:

Marketing Code:

References

Related documents

We base our so-called G-Net on a vehicle detection followed by a wheel localization phase on the cropped image of the vehicle, both based on a recurrent neural network [7, 11]

The mission transponder operations begin with a radar station making a calibration request to the RADCAL coordi- nator at Vandenberg Air Force Base (VAFB).. VAFB then generates

The newly added article that will provide requirements to address low-voltage Class 2 ac and dc volt equipment connected to ceiling grids, and walls built specifically for this type

Correlation between Breast Self-Examination Practices and Demographic Characteristics, Risk Factors and Clinical Stage of Breast Cancer among Iraqi Patients..

However, capital market turbulence in the late 1980s, especially following the Black Monday on 19th October 1987, as well as changes in the financial market environment and defaults

[r]

Although robots can do some jobs better, cheaper, and faster than humans in the service sector and are in huge demand at various levels, which is always expected to increase in

The process of selecting the workers in a crew and assigning crews to different tasks is crucial for ensuring the success of a construction project and improved labor