Big Data Processing: Past, Present and Future

Orion Gebremedhin

National Solutions Director – BI & Big Data, Neudesic LLC.

VTSP – Microsoft Corp.

@OrionGM

© Copyright 2015, Neudesic. All rights reserved.

Topics Covered

•  History and Fundamentals of Big Data Processing

•  SQL Server for Big Data: Past, Present and Future

•  Summary

Characteristics of Big Data

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.

To gain value from this data, you must choose an alternative way to process it.

The Vs of Big Data

•  Volume

   •  40 zettabytes (43 trillion gigabytes) of data will be created by 2020, a 300-fold increase from 2005

   •  Most companies in the U.S. have at least 100 TB of data

•  Velocity

   •  The NYSE captures 1 TB of trade information every day

   •  The average modern car has over 100 sensors

•  Variety

   •  Nearly 420 million wearable health monitors

   •  Over 4 billion hours of video are watched on YouTube every month
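The volume figures above can be checked with a little unit arithmetic. The sketch below is only an illustration: it assumes the "43 trillion gigabytes" figure comes from binary prefixes (2^70 bytes per zettabyte, 2^30 per gigabyte), which the slide does not state, and it derives the 2005 volume implied by the 300-fold increase rather than quoting a measured value.

```python
# Unit arithmetic behind the volume figures quoted above.
# Assumption (not stated on the slide): "43 trillion gigabytes" uses binary prefixes.
ZB_DECIMAL = 10**21   # bytes per zettabyte, decimal (SI) prefix
GB_DECIMAL = 10**9    # bytes per gigabyte, decimal (SI) prefix
ZB_BINARY = 2**70     # bytes per zettabyte, binary prefix
GB_BINARY = 2**30     # bytes per gigabyte, binary prefix


def zettabytes_to_gigabytes(zb, binary=False):
    """Convert a whole number of zettabytes to gigabytes."""
    if binary:
        return zb * ZB_BINARY // GB_BINARY
    return zb * ZB_DECIMAL // GB_DECIMAL


decimal_gb = zettabytes_to_gigabytes(40)
binary_gb = zettabytes_to_gigabytes(40, binary=True)

# Volume in 2005 implied by "a 300-fold increase", in zettabytes.
implied_2005_zb = 40 / 300

print(f"{decimal_gb / 1e12:.2f} trillion GB (decimal prefixes)")
print(f"{binary_gb / 1e12:.2f} trillion GB (binary prefixes)")
print(f"{implied_2005_zb * 1000:.0f} exabytes implied for 2005")
```

With decimal prefixes, 40 zettabytes is exactly 40 trillion gigabytes; the binary reading gives just under 44 trillion, which is the closer match to the number on the slide.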

History of Big Data

A big data cluster is a highly interconnected platform built from a collection of commodity parts.

*Disruptive Possibilities by Jeffrey Needham Copyright © 2013

Scale Up vs. Scale Out

Scale up (SMP)

•  Upgrade components or buy a bigger server each time

•  Multiprocessor system in which the processors share resources (operating system, memory, I/O devices) and are connected by a common bus

Scale out (MPP)

•  Add nodes to the cluster

•  Multiple processors, each using its own OS and memory and communicating with the others through some form of messaging interface
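The scale-out model can be sketched in a few lines. This is a single-process simulation, not real MPP software: the `Node` and `Cluster` classes, the `"SUM"` message, and the round-robin distribution are all hypothetical names and choices made for illustration. The point it tries to show is that each node owns its own data, the coordinator only talks to nodes through messages, and capacity grows by adding nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One scale-out node: it owns its rows and shares no memory with peers."""
    name: str
    rows: list = field(default_factory=list)

    def handle(self, message):
        # The only way to reach this node's data is to send it a message.
        if message == "SUM":
            return {"node": self.name, "partial_sum": sum(self.rows)}
        raise ValueError(f"unknown message: {message}")


class Cluster:
    """Hypothetical coordinator that distributes rows and combines partial results."""

    def __init__(self):
        self.nodes = []

    def add_node(self, name):
        # Scaling out: capacity grows by adding a node, not by upgrading one.
        self.nodes.append(Node(name))

    def load(self, values):
        # Redistribute all rows round-robin so every node holds a similar share.
        for node in self.nodes:
            node.rows.clear()
        for i, value in enumerate(values):
            self.nodes[i % len(self.nodes)].rows.append(value)

    def total(self):
        # Each node aggregates locally; only small partial results are combined.
        replies = [node.handle("SUM") for node in self.nodes]
        return sum(reply["partial_sum"] for reply in replies)


cluster = Cluster()
for i in range(3):
    cluster.add_node(f"node-{i}")

data = list(range(1, 101))
cluster.load(data)
print(cluster.total())           # same answer on 3 nodes...

cluster.add_node("node-3")
cluster.load(data)
print(cluster.total())           # ...and on 4 nodes
print([len(n.rows) for n in cluster.nodes])
```

In a real cluster the nodes would run on separate machines and the messages would travel over a network; the simulation keeps everything in one process so that it stays runnable anywhere.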

Notable milestones in Commodity hardware

CDC 6600 by Control Data Corporation: "The 6600 CPU had multiple functional units which could operate simultaneously (i.e., in parallel), allowing the CPU to overlap instructions' execution times."

http://en.wikipedia.org/wiki/CDC_6600

A Beowulf cluster (1990s) is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them.

http://en.wikipedia.org/wiki/Beowulf_cluster

Some Applications of Big Data

Big Data supercomputers are pattern explorers.

•  Shopping Patterns

•  Sensor and intelligent-device data analytics

•  Social Network associations and suggestions

•  Predictive analytics

•  Crime investigation

SQL Server for Big Data

SQL Server Optimizations

About Analytics Platform System

Diagram components: SQL Server Parallel Data Warehouse, Microsoft HDInsight, and PolyBase, combined in the Microsoft Analytics Platform System.

APS Growth Topology

Diagram labels: Base Unit, Scale Unit, Extension Base Unit.

Introducing the Microsoft Analytics Platform System

•  Relational and non-relational data in a single appliance

•  Enterprise-ready Hadoop

•  Integrated querying across Hadoop and PDW using T-SQL

•  Direct integration with Microsoft BI tools such as Microsoft Excel

•  Near real-time performance with In-Memory Columnstore

•  Ability to scale out to accommodate growing data

•  Removal of data warehouse bottlenecks with MPP SQL Server

•  Concurrency that fuels rapid adoption

•  Industry’s lowest data warehouse appliance price per terabyte

•  Value through a single appliance solution

•  Value with flexible hardware options using commodity hardware

Deployment options and hybrid solutions

Connecting islands of data with PolyBase

•  Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL

•  Uses the power of MPP to enhance query execution performance

•  Supports Windows Azure HDInsight to enable new hybrid cloud scenarios

•  Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera

Diagram: a "Select…" query goes to PolyBase and a result set comes back. PolyBase connects SQL Server Parallel Data Warehouse with Microsoft Azure HDInsight, Microsoft HDInsight, Hortonworks for Windows and Linux, and Cloudera.

Microsoft’s modern data warehouse

Data Platform Analytics Platform System SQL Server 2014

Microsoft Azure HDInsight

Summary

•  Understand your data growth to determine when to “scale out”.

•  Determine the right tool for the workload you have.
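One way to put the first point into practice is to project when current capacity will run out. The sketch below assumes steady compound annual growth, which real data volumes may not follow; the function name and the figures (100 TB today, 40% yearly growth, 400 TB of capacity) are hypothetical and do not come from the talk.

```python
import math


def years_until_capacity(current_tb, annual_growth_rate, capacity_tb):
    """Years until data grows past capacity, assuming compound annual growth.

    Solves current_tb * (1 + annual_growth_rate) ** years = capacity_tb for years.
    """
    if current_tb >= capacity_tb:
        return 0.0          # already at or over capacity
    if annual_growth_rate <= 0:
        return math.inf     # data is not growing, so capacity is never reached
    return math.log(capacity_tb / current_tb) / math.log(1 + annual_growth_rate)


# Hypothetical figures for illustration only.
years = years_until_capacity(current_tb=100, annual_growth_rate=0.40, capacity_tb=400)
print(f"Capacity reached in about {years:.1f} years")
```

A projection like this gives only a planning horizon; the decision to scale out also depends on the workload, which is the second point above.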

Questions and Discussion

Questions?
