Course: Training on Bigdata/Hadoop with Hands-on
Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30 - 17:30 Hrs
Venue: Eagle Photonics Pvt Ltd
First Floor, Plot No 31, Sector 19C, Vashi, Navi Mumbai - 400705
Ph: 022 27841425
Fee Details:
For Indian participants: 20,000 INR
For Foreign participants: 350 USD
Course Description:
Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools or processing applications. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets using simple programming models. This hands-on course equips participants to manage Big Data using Hadoop.
Who should attend?
This course is meant for software developers/programmers who are interested in Bigdata/Hadoop.
Key benefits:
On course completion, participants would be knowledgeable on managing Big Data and comfortable working with the Hadoop Distributed File System (HDFS) and related components.
Course Outline:
Module 1: Introduction to Big Data
Session 1: Introduction to Big Data
• So What Is Big Data?
• History of Data Management: Evolution of Big Data
• Structuring of Big Data
• Types of Big Data
• Elements of Big Data
• Application of Big Data in the Business Context
• Careers in Big Data
Session 2: Business application of Big Data
• Significance of Social Network Data
• Uses of Social Network Data Analysis
• Financial Fraud and Big Data
• Preventing Fraud Using Big Data Analytics
• Use of Big Data in the Retail Industry
Session 3: Technologies for handling Big Data
• Distributed and Parallel Computing for Big Data
• Virtualization and its Importance to Big Data
• Introducing Hadoop
• Cloud Computing and Big Data
• Features of Cloud Computing
• Providers in Big Data Cloud Market
• Issues in Using Cloud Services
• In-Memory Technology for Big Data
Session 4: Understanding the Hadoop Ecosystem
• The Hadoop Ecosystem
• Processing Data with Hadoop MapReduce
• Managing Resources and Applications with Hadoop YARN
• Storing Big Data with HBase
• Using Hive for Querying Big Databases
• Interacting with Hadoop Ecosystem
Session 5: MapReduce Fundamentals
• Origins of MapReduce
• Characteristics of MapReduce
• How MapReduce Works
• More about Map and Reduce Functions
• Optimization Techniques for MapReduce Jobs
• Hardware/Network Topology
• Applications of MapReduce
• Role of HBase in Processing Big Data
• Mining Big Data with Hive
Module 2: Managing an Enterprise Wide Big Data Ecosystem
Session 1: Big Data Technology Foundations
• Exploring the Big Data Stack
• Virtualization and Big Data
• Processor and Memory Virtualization
• Data and Storage Virtualization
• Managing Virtualization with Hypervisor
• Abstraction and Virtualization
• Implementing Virtualization to Work with Big Data
Session 2: Big Data Management Systems – Databases and Warehouses
• RDBMSs and Big Data Environment
• PostgreSQL Relational Database
• Nonrelational Databases
• Key-Value Pair Databases
• Document Databases
• Columnar Databases
• Graph Databases
• Spatial Databases
• Polyglot Persistence
• Integrating Big Data with Traditional Data Warehouse
• Rethinking Extraction, Transformation, and Loading
• Big Data Analysis and Data Warehouse
• Changing Deployment Models in Big Data Era
Session 3: Analytics and Big Data
• Using Big Data to Get Results
• What Constitutes Big Data
• Exploring Unstructured Data
• Understanding Text Analytics
• Building New Models and Approaches to Support Big Data
Session 4: Integrating Data, Real-Time Data and Implementing Big Data
• Stages in Big Data Analysis
• Fundamentals of Big Data Integration
• Streaming Data and Complex Event Processing
• Making Big Data a Part of Your Operational Process
• Ensuring Validity, Veracity, and Volatility of Big Data
• Data Validity and Veracity
• Data Volatility
Session 5: Big Data Solutions and Data in Motion
• Big Data as a Business Strategy Tool
• Analysis in Real-Time: Adding New Dimensions to the Cycle
• The Needs for Data in Motion
• Case 1: Using Streaming Data for Environmental Impact
• Case 2: Using Streaming Data for Public Policy
• Case 3: Use of Streaming Data in Health Care Industry
• Case 4: Use of Streaming Data in Energy Industry
• Case 5: Improving Customer Experience with Real-Time Text Analytics
• Case 6: Using Real-Time Data in Finance Industry
• Case 7: Using Real-Time Data for Insurance Fraud Prevention
Module 3: Storing and Processing Data – HDFS and MapReduce
Session 1: Storing Data in Hadoop
• HDFS, HBase
• Combining HDFS and HBase for Effective Data Storage
• Choosing an Appropriate Hadoop Data Organization for Your Applications
Session 2: Processing Your Data with MapReduce
• Getting to Know MapReduce
• Your First MapReduce Application
• Designing MapReduce Implementations
Session 3: Customizing MapReduce Execution
• Controlling MapReduce Execution with Input Format
• Reading Data Your Way with Custom Record Reader
• Organizing Output Data with Custom Output Formats
• Optimizing Your MapReduce Execution with a Combiner
• Controlling Reducer Execution with Partitioners
Session 4: Testing and Debugging MapReduce Applications
• Unit Testing MapReduce Applications
• Local Application Testing with Eclipse
• Using Logging for Hadoop Testing
• Reporting Metrics with Job Counters
• Defensive Programming in MapReduce
Session 5: Implementing MapReduce Wordcount Program - A Case Study
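As a taste of what this session covers, the word-count flow can be sketched in plain Python. This is a minimal, single-machine illustration of the map, shuffle and reduce steps, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration only, and a real Hadoop job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    sample = ["big data with hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

Keeping the three steps as separate functions mirrors how the framework separates mapper and reducer logic; in this sketch the grouping step is an ordinary in-memory dictionary.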
Module 4: Increasing Efficiency with Hadoop Tools: Hive and Pig
Session 1: Exploring Hive
• Introducing Hive
• Starting Hive
• Executing Hive Queries from Files
• Data Types
• Hive Built-In Functions
• Compressed Data Storage
• Data Manipulation in Hive
Session 2: Advanced Querying with Hive
• Queries
• Manipulating Column Values Using Functions
• JOINS in Hive
• Hive Best Practices
• Performance-Tuning and Query Optimizations
• Various Execution Types
• Hive File and Record Formats
• Hive Thrift Service
• Security in Hive
Session 3: Analyzing Data with Pig
• Introduction to Pig
• Installing Pig
• Properties of Pig
• Running Pig
• Pig Latin Application Flow
• Beginning with Pig Latin
• Relational Operators in Pig
Module 5: Additional Hadoop Tools: Sqoop, Flume, YARN and Storm
Session 1: Efficiently Transferring Bulk Data Using Sqoop
• Introducing Sqoop
• Using Sqoop 1
• Importing Data with Sqoop
• Controlling Parallelism
• Encoding NULL Values
• Importing Data into Hive Tables
• Importing Data into HBase
• Exporting Data
• Exporting Data into Subset of Columns
• Drivers and Connectors in Sqoop
• Sqoop Architecture Overview
• Sqoop 2
Session 2: Flume
• Introducing Flume
• The Flume Architecture
• Setting Up Flume
• Building Flume
Session 3: Beyond MapReduce – YARN
• Why YARN?
• The YARN Ecosystem
• A YARN API Example
• Mesos versus YARN
Session 4: Storm on YARN
• Storm and Hadoop
• Overview of Storm
• The Storm API
• Storm on YARN
• Installing Storm on YARN
• An Example of Storm on YARN
Module 6: Leveraging NoSQL, Hadoop Security, on Cloud and Real Time
Session 1: Hello NoSQL
• Two Simple Examples
• Storing and Accessing Data
• Storing and Accessing Data in MongoDB
• Storing and Accessing Data in HBase
• Storing and Accessing Data in Apache Cassandra
• Language Bindings for NoSQL Data Stores
Session 2: Working with NoSQL
• Creating Records
• Accessing Data
• Updating and Deleting Data
• MongoDB Query Language Capabilities
• Accessing Data from Column-Oriented Databases Like HBase
Session 3: Hadoop Security
• Hadoop Security Challenges
• Authentication
• Delegated Security Credentials
• Authorization
Session 4: Running Hadoop Applications on AWS
• Getting to Know AWS
• Options for Running Hadoop on AWS
• Understanding the EMR–Hadoop Relationship
• Using AWS S3
• Automating EMR Job Flow Creation and Job Execution
• Orchestrating Job Execution in EMR
Session 5: Real Time Hadoop
• Real-Time Hadoop Applications
• Using Specialized Real-Time Hadoop Query Systems
• Using Hadoop-Based Event-Processing Systems
Trainer Profile
Mr Biswajyoti Kar holds A.M.I.E from Institution of Engineers (India), Gokhale Road, Calcutta, and a B.Sc in Physics from University of Calcutta. He is a Senior Architect with over 19 years of rich experience and a proven record in architecture, designing and implementing systems software. He has experience of Big Data Analytics, UNIX/Linux kernel mode development, and data structures and algorithm development in C. His area of interest is building solutions around Big Data and Analytics and IP creation in the Big Data space.
Training Experience
• Big Data, Hadoop Distributed file systems in Dell.
• Algorithm and Data Structures in C/C++, UNIX/Linux advanced programming, shell scripting in Dell
• Algorithm and Data Structures in C Proton solutions
Project Experiences
1. BIG Data Work
Leading a project that involved setting up of Hadoop distributed file system (HDFS) on Linux box to test the elasticity part of cloud computing.
Bench-marking the Hadoop system for crunching terabytes of data using macro-programming model called PIG Latin.
Statistical analysis was done using R language.
2. Parallel network file system
Leading a project that involved setting up a pNFS client-server file and block layout configuration.
Figuring out pros and cons of each configuration in HPC and NAS environments.
3. Big Data Analytics
Providing consulting in the area of Big Data Analysis to credit rating agency