
BIG DATA TRENDS AND TECHNOLOGIES


Academic year: 2022


(1)

BIG DATA

TRENDS AND TECHNOLOGIES

(2)

THE WORLD OF DATA IS CHANGING

Cloud

(3)

WHAT IS BIG DATA?

Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualization.

We are entering the era in which sensors collect data in our physical world and deliver it to networks that aggregate and analyze the information. Big data defines us and will increasingly dictate how we live in a fully interconnected world.

(4)

WHAT IS BIG DATA?

Forrester’s Brian Hopkins describes big data as “techniques and technologies that make handling data at extreme scale economical.”

(5)

DIGITAL DATA AGE

• BEFORE 1990

THE PHOTOGRAPHS AND VIDEO ONE PROFESSIONAL TAKES OVER AN ENTIRE LIFETIME OCCUPY AROUND 10 GIGABYTES

• 2010

THE PHOTOGRAPHS AND VIDEO AN ORDINARY PERSON TAKES IN ONE YEAR TAKE UP ABOUT 5 GB

(6)

WHERE DATA COME FROM

• PHOTOGRAPHS

• VIDEO

• MACHINE LOGS

• RFID READERS

• VEHICLE GPS TRACES

• RETAIL TRANSACTIONS

• FINANCIAL TRANSACTIONS

(7)

DIGITAL DATA FACTS AND FIGURES

(8)

STORAGE CAPACITY AND TRANSFER RATE

1990

1,000 MB WITH 4.4 MB/S TRANSFER RATE

It takes approximately 230 s, or about 4 minutes, to read the whole disk.

2011

1,000 GB WITH 135 MB/S TRANSFER RATE

It takes approximately 7,600 s, or about 2 hours, to read the whole disk.

(9)

MORE OXEN BETTER THAN ONE BIGGER OX?

• READING 2 TERABYTES FROM 1 DISK

TAKES ABOUT 4 HOURS

• WHAT ABOUT READING 2 TB FROM 1,000 DISKS IN PARALLEL?

TAKES ABOUT 15 SECONDS
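
The comparison can be checked with a few lines of arithmetic. The sketch below is only an idealized estimate: it assumes 2 TB of data in decimal units, the 135 MB/s transfer rate quoted for the 2011 disk, and a perfectly even split across disks. It ignores seek time, network transfer, and coordination overhead. The function name `read_time_seconds` is made up for this illustration.

```python
def read_time_seconds(data_mb: float, rate_mb_per_s: float, disks: int = 1) -> float:
    """Idealized time to scan `data_mb` split evenly across `disks` drives read in parallel."""
    if disks < 1 or rate_mb_per_s <= 0:
        raise ValueError("disks must be >= 1 and rate must be positive")
    # Each disk holds an equal share and all disks stream at the same time.
    return data_mb / disks / rate_mb_per_s


TWO_TB_IN_MB = 2_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB
RATE_MB_PER_S = 135.0     # transfer rate quoted for the 2011 disk

single = read_time_seconds(TWO_TB_IN_MB, RATE_MB_PER_S)
parallel = read_time_seconds(TWO_TB_IN_MB, RATE_MB_PER_S, disks=1000)

print(f"1 disk:      {single / 3600:.1f} hours")
print(f"1,000 disks: {parallel:.1f} seconds")
```

Under these assumptions the single disk needs roughly four hours, while a thousand disks working in parallel finish in well under a minute. This is the motivation for spreading data over many ordinary machines.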

(10)

HADOOP

• Hadoop Distributed File System (HDFS): reliable data storage

DATA IS DISTRIBUTED AND REPLICATED OVER MULTIPLE MACHINES; DESIGNED FOR LARGE FILES (TB, PB, OR LARGER)

• MapReduce: high-performance parallel data processing

Inspired by Google's File System (GFS) and MapReduce papers, circa 2003-2004; created by Doug Cutting

(11)

HADOOP DISTRIBUTED FILE SYSTEM
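
As a rough illustration of "distributed and replicated over multiple machines", the following toy sketch splits a file into fixed-size blocks and assigns each block to several distinct nodes. It is not HDFS code. The round-robin policy, the node names, and the function `place_blocks` are assumptions made for this example, and real HDFS delegates placement to the NameNode using a rack-aware policy. The 128 MB block size and replication factor of 3 are common HDFS defaults in recent releases, not figures from the slides.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: a common HDFS default block size
REPLICATION = 3      # assumption: a common HDFS default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Map each block index to the list of nodes holding a replica (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Consecutive offsets modulo the node count give `replication` distinct nodes.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_blocks(1000, nodes)                  # a 1,000 MB file

for block, holders in placement.items():
    print(block, holders)
```

Because every block lives on three different machines, losing any single machine never removes the only copy of a block.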

(12)

MAP/REDUCE ADVANTAGES

• SCALABLE

Automatically Parallelizes Map & Reduce Operations, Supporting Thousands of Processors and Petabytes of Data

• FAULT TOLERANCE

Replicated Data in HDFS

Failed Tasks Are Automatically Restarted without Losing the Rest of the Job

• ELASTIC AND FLEXIBLE

Degree of Parallelism Can Be Determined at Runtime

Flexible Data Model and Programming

• AFFORDABLE AND EASY TO USE

Open Source and Designed to Work on Commodity Hardware

Two Routines: Map & Reduce
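
The "two routines" idea can be shown with the classic word-count example. The sketch below runs in a single Python process and only imitates the programming model. It is not the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for this illustration. On a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data on a cluster"])
print(counts)
```

The programmer writes only the map and reduce functions. Because each map call depends on a single line and each reduce call on a single key, the work can be split across as many machines as are available.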

(13)

HADOOP ARCHITECTURE

(14)

HADOOP ADVANTAGES

• DISTRIBUTED

DATA IS REPLICATED AND PROCESSED ACROSS THE CLUSTER

• FAULT TOLERANT

KEEPS RUNNING WHEN NODES FAIL

• SELF HEALING

REBALANCES FILES ACROSS CLUSTER

• SCALABLE

JUST BY ADDING NEW NODES
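
A minimal sketch of the fault-tolerance and self-healing idea follows. When a node disappears, every block that lost a replica is copied to another live node until the target replica count is restored. This is a toy model under stated assumptions, not Hadoop's implementation: the function `re_replicate`, the node names, and the "least-loaded node first" rule are all hypothetical.

```python
def re_replicate(placement, live_nodes, replication=3):
    """Return a new block-to-nodes map in which replicas on dead nodes are replaced."""
    live = list(live_nodes)
    if len(live) < replication:
        raise ValueError("not enough live nodes to restore replication")
    load = {node: 0 for node in live}
    healed = {}
    # First pass: keep only replicas that sit on live nodes and count their load.
    for block, holders in placement.items():
        survivors = [node for node in holders if node in load]
        if not survivors:
            raise RuntimeError(f"block {block} has no surviving replica")
        for node in survivors:
            load[node] += 1
        healed[block] = survivors
    # Second pass: copy under-replicated blocks to the least-loaded eligible node.
    for block, holders in healed.items():
        while len(holders) < replication:
            candidates = [node for node in live if node not in holders]
            target = min(candidates, key=lambda node: load[node])
            holders.append(target)
            load[target] += 1
    return healed


placement = {  # hypothetical cluster state before the failure
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
    2: ["node3", "node4", "node5"],
    3: ["node4", "node5", "node1"],
}
live_after_failure = ["node1", "node2", "node4", "node5"]  # node3 has failed
healed = re_replicate(placement, live_after_failure)

for block, holders in healed.items():
    print(block, holders)
```

Adding a node to `live_nodes` makes it a candidate for new replicas straight away, which is the sense in which such a cluster scales "just by adding new nodes".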

(15)

HADOOP FACTS

• OPEN SOURCE

• BATCH / OFF-LINE ORIENTED

• DATA AND I/O INTENSIVE (READ)

• HADOOP IS NOT A RELATIONAL DATABASE

• HADOOP IS NOT AN OLTP SYSTEM AND NOT A STRUCTURED DATA STORE OF ANY KIND

(16)

HADOOP STACK

HIVE

DATA WAREHOUSE PLATFORM ON HADOOP

HBASE

TABLE STORAGE ON HADOOP

CASSANDRA

DATA STORE

ZOOKEEPER

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services

FLUME, PIG, etc.

(17)

WHO’S USING HADOOP?

• TWITTER

WHO TO FOLLOW

• YAHOO

SEARCH ASSIST

• LINKEDIN

PEOPLE YOU MAY KNOW

• YOUTUBE

VIDEO SUGGESTIONS

• FACEBOOK

FRIENDS YOU MAY KNOW AND ALMOST EVERYTHING

• AMAZON, EBAY, GOOGLE

(18)

LEVERAGES TRADITIONAL AND NEW CAPABILITIES

TRADITIONAL

Relational Database Management System

NEW

Petabyte-Scale Services

(19)

Microsoft’s approach to Big Data

Any Data, Any Size, Anywhere

• Simplicity and manageability of Windows to Hadoop

• Extended data warehousing with Hadoop

• Scale & elasticity of cloud

Connecting with the World’s Data

• Share your data with the world via Azure Marketplace

• Enrich with social media data via Social Analytics

• Advanced analytics with Hadoop

Immersive Insight, Wherever you are

• Analyze Big Data with familiar tools

• Immersive insights from any data

• JavaScript-based simple programming

(20)

MICROSOFT BIG DATA ANALYTICS

• Sqoop-based Hadoop connectors that move data seamlessly between Hadoop and SQL Server or SQL Server Parallel Data Warehouse.

• A new Hive ODBC Driver and an Excel Hive Add-in that let customers move data from Hive directly into Excel, or into Microsoft BI tools such as PowerPivot, for analysis.

(21)

Extending your Enterprise Data Warehouse with Hadoop

Benefits:

• Integration with enterprise BI solutions

• Integration with Microsoft Enterprise Data Warehouses

• Deeper insights from structured and unstructured data

Key Features:

• Microsoft SQL Server connector for Apache Hadoop with SQOOP (SQL to Hadoop)

• SQL Server Parallel Data Warehouse connector for Apache Hadoop with SQOOP

(22)

Delivering insights to everyone by enabling big data analysis with familiar end-user tools

Benefits:

• Interaction and analysis of unstructured data in Hadoop from Microsoft Excel

Key Features:

• Hive add-in for Excel

(23)

Unlocking new insights from all data with Microsoft BI tools

Benefits:

• Familiar BI tools with structured and unstructured data

Key Features:

• Hive ODBC Driver integrates Hadoop with SQL Server Analysis Services, PowerPivot, and Power View

(24)

Simplifying programming on Hadoop with JavaScript

Benefits:

• MapReduce programs in JavaScript

• Simplified programming

• Simplified deployment of MapReduce jobs

Key Features:

• New JavaScript libraries for Hadoop

• Deploy JavaScript Hadoop jobs from a simple web browser

(25)

Providing Choice of Deployment Options

Benefits:

• Elastic peta-scale analytics on Microsoft’s cloud platform

• Enterprise-class Big Data platform on-premises

Key Features:

• Hadoop-based Service on Windows Azure platform

• Hadoop-based distribution on Windows Server

(26)

Connects Hadoop to the world via Windows Azure Marketplace

Benefits:

• Integration with third-party data and services

• Sharing of data and insights through Windows Azure Marketplace

Key Features:

• Mashing up of internal and public data sets via Data Explorer

• Integration with Windows Azure Marketplace

(27)

Simplicity and manageability of Windows to Hadoop

Benefits:

• Enterprise-class security

• Simplified management of Hadoop on Windows

• Easy setup on-premises and in the cloud

Key Features:

• Integration with Windows Server® Active Directory

• Integration with Microsoft System Center

• Smart packaging of Hadoop on premises

• Fast deployment of Hadoop on Azure

(28)

A holistic BIG DATA Solution from Microsoft spanning relational and non-relational Worlds

[Diagram: three layers, DATA MANAGEMENT, DATA ENRICHMENT, and INSIGHTS. Labels shown: RELATIONAL, NON-RELATIONAL, MULTIDIMENSIONAL, STREAMING; DISCOVER AND RECOMMEND, TRANSFORM AND CLEAN, SHARE AND GOVERN; SELF-SERVICE, COLLABORATIVE, MOBILE, REAL-TIME, OPERATIONAL, PREDICTIVE; External Data and Services MARKETPLACE.]

(29)

Hadoop on Windows & Azure: Roadmap

[Roadmap chart by calendar year (CY); rows: DATA MANAGEMENT, DATA ENRICHMENT, INSIGHTS]

H2 2011

Hadoop on Azure Preview 2

• More capacity

• Disaster Recovery for HDFS

• Support for Mahout

Hadoop on Azure Private CTP

Hadoop on Server Private TAP

• Hadoop Core & Common

• JavaScript Framework

Hadoop on Azure GA

• Portal Integration & Billing

• Azure SDK integration

Hadoop on Server GA

• JavaScript, Pig, Hive, HBase

• Active Directory Integration

• System Center Integration

2012

Hive ODBC Driver Preview 2

Azure Labs

Data Explorer

Social Analytics

Data Hub (Private Data Market)

Hadoop Connectors

Azure Data Market

Excel Integration Preview 2

• Hive Add-in for Excel

• PowerPivot Add-in for Excel

• Power View for SharePoint

(30)

RESOURCES

• http://www.microsoft.com/bigdata

• http://hadoop.apache.org

• http://www.cloudera.com

• http://www.youtube.com

• https://www.hadooponazure.com/
