Age of Big data. Presented by: Mohammad Iqbal BCM -2014


(1)

Age of Big data

Presented by:

Mohammad Iqbal

(2)

Agenda

• What is Big Data?

• Big Data Attributes

• Big Data Sources

• Getting Value from Big Data

• New Tools for Big Data

• Hadoop's Architecture

• Hadoop evolution from Google

• The future is here!

(3)

What is Big Data?

Name        Symbol   Value (bytes)

Kilobyte    KB       10^3

Megabyte    MB       10^6

Gigabyte    GB       10^9

Terabyte    TB       10^12

Petabyte    PB       10^15

Exabyte     EB       10^18

Zettabyte   ZB       10^21

Yottabyte   YB       10^24
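
The unit table above can be turned into a small conversion helper. This is a minimal sketch that assumes the decimal prefixes shown in the table (each step is a factor of 10^3); the function names `to_bytes` and `humanize` are illustrative and not part of the original slides.

```python
# Decimal byte units from the table: each step up is a factor of 10^3.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to a number of bytes."""
    return value * 10 ** (3 * UNITS.index(unit))


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(humanize(to_bytes(500, "TB")))     # 500 TB
print(humanize(25 * 10**17))             # 2.5 EB
```
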

BIG DATA

Data so large that it becomes difficult to process using traditional systems

Getting new tools: Hadoop

(4)

Difficult to process by traditional systems

Example documents: 100 MB, 100 GB, 100 TB

Possible problems: unable to send, unable to edit, unable to view

Which problem occurs depends on the capability of the system

(5)

Organization/Context Specific

500 TB of text, audio, and video data per day

Company A: Big Data

Company B: NOT Big Data

Whether it counts as big data depends on the capabilities of the organization

Getting new tools: Hadoop

(6)

Areas of Challenges

Capture

Curation

Storage

Analysis

Visualization

Transfer

Sharing

Search

(7)

Big Data Attributes

Big Data (V^3)

• VOLUME: data results in large and growing files

• VELOCITY: data comes at high speed

• VARIETY: files come in various formats

(8)

Structured / Unstructured

Unstructured data (90%): mostly wasted

Structured data (10%): used in decision making

Challenge/Opportunity: to analyze & extract meaningful information
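
As a small illustration of extracting meaningful information from unstructured data, the sketch below turns free-form log lines into structured records. The log format, the field names, and the `extract` helper are assumptions made up for this example; they do not come from the slides.

```python
import re

# Hypothetical log format: free text containing user=..., action=..., item=... fields.
LOG_PATTERN = re.compile(
    r"user=(?P<user>\w+) action=(?P<action>\w+) item=(?P<item>[\w-]+)"
)


def extract(lines):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            records.append(match.groupdict())
    return records


raw_lines = [
    "2014-11-27 10:02 user=alice action=view item=laptop-15",
    "corrupted ### line with no usable fields",
    "2014-11-27 10:05 user=bob action=buy item=phone",
]

for record in extract(raw_lines):
    print(record)
```

Lines that do not match are skipped rather than guessed at, so the structured output only contains fields that were actually present in the raw text.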

(9)

Big data Sources

Users

Applications

Systems

Sensors

Large & growing files (Big data files)

(10)

Data Generation point Examples

Mobile devices

Microphones

Readers/Scanners

Software/programs

Social media

Cameras

Machine Sensors

Science facilities

(11)

Sample Events generating Data

• Every day, we create 2.5 exabytes of data, i.e. 2.5 billion GB, so much that 90% of the data in the world today has been created in the last few years alone.

• The CERN atomic facility generates 40 TB of data per second.

• Twitter generates 12 TB of data every day.

• An Airbus A380 generates 10 TB every 30 minutes of flight. About 650 TB is generated in one flight.

• In 2009 the total data in the world was estimated to be 1 ZB. By 2020 it is estimated to be 35 ZB.

(Source: IBM.com)
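
Two of the figures above can be checked with simple arithmetic. The sketch below converts 2.5 exabytes to gigabytes and works out the compound annual growth rate implied by going from 1 ZB in 2009 to 35 ZB in 2020. The helper `annual_growth_rate` is an illustrative name, and steady compound growth is an assumption of this example, not a claim from the source.

```python
GB = 10**9
EB = 10**18
ZB = 10**21

# 2.5 exabytes per day, expressed in gigabytes.
daily_bytes = 2.5 * EB
daily_gb = daily_bytes / GB


def annual_growth_rate(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Assumes smooth compound growth between the two estimates.
rate = annual_growth_rate(1 * ZB, 35 * ZB, 2020 - 2009)

print(f"{daily_gb:.3g} GB per day")
print(f"implied growth: {rate:.1%} per year")
```

Under that assumption the two estimates correspond to growth of roughly 38% per year.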

(12)

Getting Value from Big data

Collect → Analyze → Understand

(13)

Big data Applications

• Companies gain an edge by collecting, analyzing, and understanding information.

• Governments forecast events and take proactive action.

(14)

New Tools for Big Data

Traditional systems (e.g. RDBMS, SQL): not able to handle big data

Big data tools (e.g. Hadoop, NoSQL): created to handle big data

Time axis: traditional systems, then big data tools

(15)

Traditional Enterprise Approach

Big data → powerful computer → processing limit

Only so much data could be processed

(16)

Modern Hadoop’s approach

Big data → Computation | Computation | Computation | Computation → Combined result
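
The idea on this slide (split the data, run several computations, combine the results) can be sketched in a few lines. This is not Hadoop code: it is a minimal single-machine illustration in which threads stand in for cluster nodes, and the function names `split`, `compute`, `combine`, and `run` are assumptions chosen for the example. Word counting is used only as a simple stand-in computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Divide the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def compute(chunk):
    """Per-chunk computation: count the words in this chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine(partials):
    """Merge the partial counts into one combined result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(records, n_workers=4):
    chunks = split(records, n_workers)
    # Threads stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(compute, chunks))
    return combine(partials)


lines = ["big data is big", "hadoop handles big data", "data data data"]
result = run(lines)
print(result.most_common(2))  # [('data', 5), ('big', 3)]
```

Each chunk is processed independently, so adding workers (or, in a real cluster, machines) raises the amount of data that can be handled, which is the contrast with the single powerful computer on the previous slide.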

(17)

Hadoop's Architecture

Source: hortonworks/hadoop/hdfs/.com/

Processing: MapReduce

File system: HDFS

Projects: HBase, Mahout, Pig, Oozie, Flume, Sqoop, Hive

(18)

Application

MASTER: Job Tracker, Name Node

Slaves: Task Tracker + Data Node (on each slave)

DATA

(19)

Application

MASTER: Job Tracker, Name Node

Slaves: Task Tracker + Data Node (on each slave)

DATA

Knows where the data is residing

Data can be taken directly
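
The two notes on this slide, that the location of the data is known and that the data can then be taken directly, can be illustrated with a toy model. The sketch below is not the HDFS API: the classes `NameNode` and `DataNode`, the helpers `write_file` and `read_file`, the tiny block size, and the round-robin placement are all simplifying assumptions for this example. The point it shows is that the master keeps only metadata about where blocks live, while the block contents are read from the slaves.

```python
class DataNode:
    """Toy slave node: stores block contents by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> block contents


class NameNode:
    """Toy master: stores only metadata about where each block lives."""

    def __init__(self):
        self.locations = {}  # filename -> ordered list of (block_id, DataNode)

    def register(self, filename, placements):
        self.locations[filename] = placements

    def locate(self, filename):
        return self.locations[filename]


def write_file(name_node, data_nodes, filename, data, block_size=4):
    """Split data into blocks, place them round-robin, and record the placement."""
    placements = []
    for index, start in enumerate(range(0, len(data), block_size)):
        block_id = f"{filename}#{index}"
        node = data_nodes[index % len(data_nodes)]
        node.blocks[block_id] = data[start:start + block_size]
        placements.append((block_id, node))
    name_node.register(filename, placements)


def read_file(name_node, filename):
    """Ask the master where the blocks are, then read them directly from the slaves."""
    return "".join(
        node.blocks[block_id] for block_id, node in name_node.locate(filename)
    )


name_node = NameNode()
data_nodes = [DataNode(f"slave-{i}") for i in range(3)]
write_file(name_node, data_nodes, "report.txt", "hello hadoop")
print(read_file(name_node, "report.txt"))  # hello hadoop
```
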

(20)

HDFS vs GFS

• Similarity with the Google File System (GFS) and MapReduce

• Back in the 1990s, search was provided by engines such as:

Excite

AltaVista

Lycos

Infoseek

(21)

Google Victory

1995: Excite, AltaVista, Lycos

2000: Google

(22)

Hadoop evolution from Google

2003: GFS paper released by Google

2004: Google released its paper on MapReduce

2005: Hadoop created by Doug Cutting & Mike Cafarella at Yahoo! (Nutch search engine)

2006: Yahoo! donated the project to Apache

Source: Google & Nutch white papers

(23)

The future is here!

(24)

• Big data scientists with just two years' experience can earn between $200,000 and $300,000 a year (Wall Street Journal).

• Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day" (Wall Street Journal).

• Hadoop is a super hot up-and-coming "big data" technology (Businessinsider.com).

• Many other data scientists, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to the Hadoop tool kit and are looking to develop it further (Harvard Business Review).

• "People are slapping buzzwords as "Hadoop" on résumés and looking to get 50 or 100 percent more, and they're getting it," said Scott Gnau, president of Teradata Data Lab.

(25)

References

• Dean, J. & Ghemawat, S. (2004): MapReduce: Simplified Data Processing on Large Clusters, google.com

• Doug Cutting (2005): Nutch: A Flexible and Scalable Open-Source Web Search Engine, yahoo.com

• Ghemawat, S., Gobioff, H. & Leung, S.-T. (2003): The Google File System, google.com

• https://www.ibm.com/developerworks/vn/library/contest/dw-freebooks/Tim_Hieu_Big_Data/Understanding_BigData.PDF [Accessed 27 Nov 2014]

• http://www.businessinsider.com/10-tech-skills-that-will-instantly-net-you-100000-salary-2012-8?op=1 [Accessed 27 Nov 2014]

• Big Data's High-Priests of Algorithms, http://online.wsj.com/articles/academic-researchers-find-lucrative-work-as-big-data-scientists-1407543088 [Accessed 27 Nov 2014]

(26)

Thank you for your attention

Q/A
