
Defense Industry & Open Source & BigData


Academic year: 2022


(1)

Elbit Systems Land and C4I

Defense Industry & Open Source & BigData

(2)

Speaker

German Gavrilov

[email protected]

Elbit Systems Land and C4I

Intelligence Division

Cyber Domain

(3)

Defense Industry

Open Source Big Data

Defense Industry & Open Source & Big Data

(4)

Agenda

• The need: growth in global data volume
• The need: requirements of intelligence systems
• What is Big Data?
• 3V Model of Big Data
• Scale up / Scale out
• CAP theorem
• Types of solutions
• The Apache Hadoop project
• HDFS
• MapReduce
• Hadoop projects

(5)

The need: growth in global data volume

As of 2012, Twitter had over 500 million registered users producing over 340 million tweets per day.

Over 32 billion searches were performed last month on Twitter.

Facebook creates over 30 billion pieces of content, ranging from web links, news and blogs to photos.

Zynga processes 1 petabyte of content for players every day.

More than 2 billion videos are watched on YouTube every day.

By 2015, nearly 3 billion people will be online, pushing the data created and shared to nearly 8 zettabytes.

(6)

The need: growth in global data volume

(7)

The need: growth in global data volume

quantity of global data

(8)

The need: requirements of intelligence systems

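• Ability to ingest large volumes of data in a short time (near real-time)
• Ability to ingest different types of data
• Ability to process large volumes of information
• Ability to run different analyses tailored to the type of information
• Ability to query and present information in a clear, fast and convenient way

The customer wants to be able to read the information that exists in the world in a convenient way.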

(9)

The need: requirements of intelligence systems

(10)

Examples of pictures that people uploaded to a Twitter account

(11)

What is Big Data?

What is data?

Data is information in raw or unorganized form, such as letters, numbers or symbols.

What is Big Data?

Big Data refers to large datasets which are difficult to store, manage and analyze.

Every day, we create over 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.

(12)

What is Big Data?

• O’Reilly Radar definition:

Big data is when the size of the data itself becomes part of the problem.

• EMC/IDC definition of big data:

Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.

• IBM says that "three characteristics define big data":

Volume (Terabytes -> Zettabytes)

Variety (Structured -> Semi-structured -> Unstructured)

Velocity (Batch -> Streaming Data)

(13)

3V Model of Big Data

(14)

Distributing data across machines

Scale up / Vertical scaling

To scale vertically means to add resources to a single node in a system, typically involving the addition of CPUs or memory to a single computer.

Scale out / Horizontal scaling / Distributed systems

To scale horizontally means to add more nodes to a system, such as adding a new computer to a distributed software application.
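
One common way to spread data over the nodes of a scaled-out system is to hash each record key and use the hash to pick a node. The sketch below is only an illustration of that idea: the node names, the record keys and the modulo scheme are assumptions, not something taken from the slides.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick a node for a key by hashing the key (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def distribute(keys, nodes):
    """Return a mapping node -> list of keys stored on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement

# Hypothetical records and node names, for illustration only.
keys = ["record-%d" % i for i in range(1000)]
two_nodes = distribute(keys, ["node-a", "node-b"])
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])

for name, placement in (("2 nodes", two_nodes), ("3 nodes", three_nodes)):
    print(name, {node: len(stored) for node, stored in placement.items()})
```

Adding a third node lowers the number of records each node has to hold. With plain modulo sharding most keys move to a different node when the node count changes, which is why production systems often prefer schemes such as consistent hashing.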

(15)

CAP theorem

CA: RDBMSs (MySQL, …), Greenplum, Vertica, Aster Data

AP: Cassandra, CouchDB, SimpleDB, Dynamo

CP: HBase, MongoDB, Terrastore, BigTable, MemcacheDB
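
The CAP theorem says that a distributed store cannot guarantee consistency (C), availability (A) and partition tolerance (P) all at once, so when the network splits, a system has to give up either consistency or availability. The toy model below is a sketch of that trade-off only: the class, its two replicas and the "CP"/"AP" mode flag are hypothetical and do not describe how any of the listed products is implemented.

```python
class TwoReplicaStore:
    """Toy store with two replicas; mode is "CP" or "AP" (hypothetical flag)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]   # one stored value per replica
        self.partitioned = False     # True when the replicas cannot talk

    def write(self, replica, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            self.values = [value, value]   # replicate to both sides
            return True
        if self.mode == "CP":
            return False                   # stay consistent, refuse the write
        self.values[replica] = value       # stay available, replicas diverge
        return True

    def read(self, replica):
        return self.values[replica]


def run(mode):
    """Write once, cut the network, write again, then read both replicas."""
    store = TwoReplicaStore(mode)
    store.write(0, "v1")
    store.partitioned = True
    accepted = store.write(0, "v2")
    return accepted, store.read(0), store.read(1)

cp_result = run("CP")
ap_result = run("AP")
print("CP:", cp_result)
print("AP:", ap_result)
```

In "CP" mode the second write is rejected and both replicas still agree. In "AP" mode the write succeeds but the two replicas return different values until the partition heals.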

(16)

Types of solutions

| Store type | Description | Conceptual structure |
|---|---|---|
| Key-value stores | Schema-less | Key -> Value |
| Column-oriented databases | Storage by column | Columns: Weight (2.85 kg, 1.23 kg, 3.76 kg), Price (24.00 $, 17.50 $, 27.30 $), Target (Israel, Italy, Turkey) |
| Graph databases | Uses nodes and edges to represent data | Data Node, Data Node, Data Node |
| Document-oriented databases | Store documents that are semi-structured; often XML databases | Key -> Structured Document (XML) |
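
To make "storage by column" concrete, the sketch below holds the Weight / Price / Target example first as rows and then as columns. It is an illustration only: the field names are made up, and pairing the values into rows by their order on the slide is an assumption.

```python
# Row-oriented layout: one record per item (row pairing is assumed).
rows = [
    {"weight_kg": 2.85, "price": 24.00, "target": "Israel"},
    {"weight_kg": 1.23, "price": 17.50, "target": "Italy"},
    {"weight_kg": 3.76, "price": 27.30, "target": "Turkey"},
]

def to_columns(records):
    """Regroup a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

# Column-oriented layout: all values of one attribute sit together.
columns = to_columns(rows)

# An aggregate over one attribute only needs to touch that column.
total_price = sum(columns["price"])
print(columns["target"])
print(round(total_price, 2))
```

Keeping each attribute contiguous is what lets a column store scan or aggregate a single attribute without reading the rest of every record.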

(17)

Types of solutions

| Type | Examples | Performance | Horizontal scalability | Flexibility in data variety | Complexity of operation | Functionality |
|---|---|---|---|---|---|---|
| Key-value stores | Berkeley DB, Scalaris, MemcacheDB | high | high | high | none | variable (none) |
| Column-oriented databases | Cassandra, HP Vertica, BigTable, HBase, OrientDB | high | high | moderate | low | minimal |
| Graph databases | Neo4j, InfiniteGraph, Titan, OrientDB | variable | variable | high | high | graph theory |
| Document-oriented databases | CouchDB, MongoDB, SimpleDB, Redis | high | variable (high) | high | low | variable (low) |
| Sharded RDBMS (MPP) | HP Vertica, EMC Greenplum | variable | variable | low | moderate | relational |

(18)

The Apache Hadoop project

hadoop.apache.org

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model”

wikipedia.org

Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.

Hadoop provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. It enables …

(19)

The Apache Hadoop project

Organizations using Hadoop to run large distributed computations:

Akamai, Amazon.com, American Airlines, Ancestry.com, AOL, Apple, eBay, Facebook.com, Federal Reserve Board of Governors, Fox Interactive Media, Gemvara, Google, Hewlett-Packard, Hortonworks, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp, Netflix, Ooyala, SAP AG, SAS Institute, StumbleUpon, The New York Times, Twitter, Yahoo!, Yodlee

Companies that provide Hadoop in their products:

• IBM - InfoSphere BigInsights
• Oracle - Big Data Appliance
• EMC - Pivotal HD
• Microsoft - HDInsight
• Others

(20)

The Apache Hadoop project

HDFS

HDFS is a distributed, scalable, and portable file system. It is designed to store large amounts of data across the servers of a cluster.
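
As a rough illustration of how a distributed file system can spread one file over several servers, the sketch below cuts a file into fixed-size blocks and stores each block on several distinct nodes. The node names, the tiny block size and the round-robin placement are assumptions made for the example; HDFS itself uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes (round-robin sketch)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# Hypothetical file and cluster, for illustration only.
data = b"x" * 1000
nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = split_into_blocks(data, block_size=256)
placement = place_replicas(len(blocks), nodes, replication=3)

for block, holders in placement.items():
    print("block", block, len(blocks[block]), "bytes ->", holders)
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block available.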

(21)

The Apache Hadoop project

MapReduce

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work across a cluster.
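
The classic illustration of the model is a word count. The sketch below runs the three steps (map, shuffle, reduce) in a single Python process, so it shows the shape of the computation rather than Hadoop's actual Java API; the function names and the sample documents are made up for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data big clusters", "data nodes store data"])
print(counts)
```

On a real cluster each document (or input split) would be mapped on a different node, and the reduce step for each key could run on any node, which is what lets the work be spread out and re-executed after a failure.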

(22)

The Apache Hadoop project

Data Access / Query abilities

• Pig (simple query language)
• Hive (SQL-like queries)
• Cascading (software abstraction layer)
• Mahout (machine learning)
• Hama (scientific computation)
• Avro (data serialization system)

MapReduce / Distributed processing

• Hadoop MapReduce implementation

Management tools

• Ambari (deploying, managing, and monitoring tool)
• Sqoop (data transfer tool)
• Oozie (workflow scheduler system)
• ZooKeeper (coordination service)
• Flume (framework for populating Hadoop with data)

• Hadoop Distributed File System

(23)

Hadoop Ecosystem

(24)

Example architecture of an information system using Hadoop

(25)

The End

• Growth in global data volume
• The need of intelligence systems
• Big Data solutions
• Implementation using Apache Hadoop
