
Big data blue print for cloud architecture


Academic year: 2021


(1)

©2012, Cognizant


COGNIZANT

Prabhu Inbarajan

Srinivasan Thiruvengadathan

Muralicharan Gurumoorthy

Praveen Codur

(2)

• Big Data / Cloud challenges and opportunities
• Cognizant’s Solution Framework
• Solution deep dive
• Results
• Future Opportunities

(3)

Introduction

About Us

Projects: Large scale cloud transformation projects for enterprise data warehouses & analytics environments using open BI technologies

Big deal

• Business Expectations
• Complex Environment
• No frame of reference, no standard stacks

(4)

Expectations

Expectation

• Effective cost utilization
• Business expansion to other regions
• Zero tolerance for data / traffic loss
• Product/business available 24x7
• Business continuity plan

Technical Translation

• High Availability
• Elasticity
• Effective Backup
• Spanning across regions
• Efficient
• Seamless and transparent to end users

(5)

Recurring Questions?

• Build on premises vs. cloud source?
• What is the Return on Assets / Cost of Computing / Economics?
• What is my frame of reference?
• What are the technology choices?
• What is the optimal technology stack?
• What is the optimal time to market?
• What are the operational challenges and how to mitigate them?

Goal

(6)

Cost - Optimization potential vs reality

Environment: 10 Extra Large CPU instances, 60 Large CPU instances, and 30 Small CPU instances.

[Figure: traffic chart showing seasonality of traffic and the failover site]

(7)

Moore says it's not enough -…

Item | Jan-10 | Mar-12
On-Demand Instances: Small – Linux – N. Virginia | $0.085 | $0.080
On-Demand Instances: Quadruple Extra Large – Windows – N. California | $3.160 | $2.504
Data Transfer – In | Free till June 2010 | Free
Data Transfer – Out (per GB, depending on the total monthly volume) | $0.1 to $0.17 | $0.05 to $0.12
Storage (EBS) (per allocated GB per month) | $0.10 | $0.10
I/O Requests (per million I/O) | $0.10 | $0.10
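To make the price movement between the two dates concrete, the sketch below computes the percentage change for the rows that carry a single price in both columns. The figures are copied from the slide; the dictionary layout and the `percent_change` helper are illustrative names, not part of the original deck.

```python
# Prices (USD) copied from the Jan-10 and Mar-12 columns of the slide.
PRICES = {
    "Small, Linux, N. Virginia": (0.085, 0.080),
    "Quadruple Extra Large, Windows, N. California": (3.160, 2.504),
    "Storage (EBS), per allocated GB per month": (0.10, 0.10),
    "I/O requests, per million": (0.10, 0.10),
}


def percent_change(old, new):
    """Relative price change in percent; negative means the price dropped."""
    return (new - old) / old * 100.0


changes = {item: percent_change(old, new) for item, (old, new) in PRICES.items()}

for item, change in changes.items():
    print(f"{item}: {change:+.1f}%")
```

On these numbers the small Linux instance dropped by roughly 6% and the largest Windows instance by roughly 21%, while storage and I/O prices did not move.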

(8)

Cloud - Optimization Models

Cost & Utilization vs Complexity

Parameter | Fixed capacity | 2 step scaling | 3 step scaling | 4 step scaling
Utilization | 30-35 % | 50-55 % | 60-70 % | 75-80 %
Complexity | 0 % | 30 % | 50 % | 75 %

[Figure: four traffic distribution charts, one per model, showing peak traffic against planned capacity]
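Why more scaling steps raise utilization can be sketched with a small calculation. The traffic profile below is hypothetical and the equal-sized capacity levels are an assumption, so the resulting percentages are not expected to match the figures on the slide; only the trend is the point.

```python
def utilization(traffic, steps):
    """Average utilization when capacity follows `steps` equal-sized levels up to peak."""
    peak = max(traffic)
    levels = [peak * (i + 1) / steps for i in range(steps)]
    levels[-1] = peak  # guard against rounding so the top level always covers the peak
    # For each interval, provision the smallest level that still covers the load.
    provisioned = [min(level for level in levels if level >= load) for load in traffic]
    return sum(traffic) / sum(provisioned)


# Hypothetical traffic profile (load per interval), not taken from the slides.
traffic = [20, 30, 50, 100, 60, 40]

for steps in (1, 2, 4):
    print(steps, round(utilization(traffic, steps), 2))
```

With `steps=1` capacity is fixed at peak, which is the "Fixed capacity" column; each additional step lets provisioned capacity track the load more closely, at the price of more moving parts.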

(9)

Mission - Value Proposition

Mission: To create a framework/solution that abstracts all of the complexities below from application developers and operators, and provides a blue print implementation for Big data enterprise applications on Cloud

• Cloud Infrastructure
• Operating systems
• Security
• Monitoring
• Application stack for BI, Visualization, data collection and processing
• Data Stores

Value Prop: Encapsulates the body of knowledge around cloud and open BI into an automated solution, resulting in

• Higher Productivity
• Higher Efficiency
• Repeatability
• Highly optimized

(10)


Solution SAHANA – blue print for Cloud BI stack

Big Storage Data Integration Collection

BI & Visualization Scheduling

(11)


SAHANA v1.0

Big Storage Data Integration Collection

BI & Visualization Scheduling

Security

Monitoring

Storage

Infrastructure & OS Provisioning

Orchestration

(12)

Complex Architecture

[Figure: architecture diagram. Components shown: Ad server, API Cluster, Ad Center, Job Scheduler, MemCache, Middle tier, Load Balancer, SQL Server 2008, Data Warehouse (master and slaves), RPT (master and slaves), Activity History, ETL, Summary Reports, Display, Pentaho, Map Reduce, HDFS, HBase, Publishers, Advertisers]

• Infrastructure equipped for ad serving and analytics capabilities for a top tier search engine.
• An atlas of functional components of front ends, processing layers, data stores, spawning over

(13)
(14)


Architectures

(15)

Scaling up or scaling down as demand on the application fluctuates

Back up critical data to a persistent store (object-based, like S3).

(16)

Scaling Strategy

Scaling Parameters

• CPU
• Memory
• Disk/Network IO
• System load
• Response latency
• The threshold for all of the metrics listed above is 80%.

Scaling Up

• Scale horizontally if any parameter threshold is breached.
• Add 10% capacity per burst. If the load does not come back under the threshold, keep adding 10% capacity until it does.

Scaling Down

• Depends on the nature of the application: if the app is fault tolerant, scaling down can be automatic.
• If data needs to be backed up, scaling down requires human intervention.
• Scale down if the above parameter values come below 70%.

DR

• 20% capacity will be running as hot standby.
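The thresholds on this slide can be read as a small decision rule. The sketch below is one possible interpretation, not the deck's implementation: the function names are hypothetical, "below 70%" is taken to mean all metrics are below 70%, and an application that is not fault tolerant is treated as the case needing human intervention.

```python
SCALE_UP_THRESHOLD = 0.80    # any metric above this triggers scale-up
SCALE_DOWN_THRESHOLD = 0.70  # all metrics below this allow scale-down (assumed reading)


def scaling_decision(metrics, fault_tolerant):
    """Map normalized metrics (0.0 to 1.0) to a scaling action."""
    values = metrics.values()
    if any(v > SCALE_UP_THRESHOLD for v in values):
        return "scale_up"
    if all(v < SCALE_DOWN_THRESHOLD for v in values):
        # Automatic only for fault-tolerant apps; otherwise an operator decides.
        return "scale_down" if fault_tolerant else "needs_operator_approval"
    return "hold"


def burst_size(current_nodes):
    """10% of current capacity, rounded up, and at least one node."""
    return max(1, (current_nodes + 9) // 10)


def hot_standby_nodes(current_nodes):
    """20% of capacity kept running at the DR site, rounded up."""
    return (current_nodes + 4) // 5


print(scaling_decision({"cpu": 0.85, "memory": 0.40}, fault_tolerant=True))
print(burst_size(20), hot_standby_nodes(20))
```

A caller would re-evaluate `scaling_decision` after each burst and keep adding `burst_size` nodes while it still returns `scale_up`.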

(17)

Deployment strategy

• Can be done using a server template of the running application components. An auto scaling group is defined on the scaling parameters and scales up by launching servers from the template.

• The other approach is to bring up a base server and configure it for its application role using a configuration tool like Chef.

[Figure: chart comparing Chef and Server Templates, axis 0 to 6]

(18)
(19)

Store for historical data for analytics queries.

Readily available for ad-hoc querying.

Tiered data retention policy

Amazon S3 as a data backbone.

Choices available: AWS EMR, Hadoop cluster, Hive, HBase

(20)

Architecture

[Figure: Hadoop cluster diagram. Components shown: job clients, job tracker, TT/DN (task tracker / data node) nodes, Host1, Host2 ... HostN each holding ShardA to ShardD, message queue]

(21)

Architecture continued …

• HDFS (Hadoop Distributed File System)
• HBase (Key-Value store)
• MapReduce (Job Scheduling/Execution System)
• Pig (Data Flow)
• Hive (SQL)
• BI Reporting
• ETL Tools

(22)

Scaling Strategy

Scaling Parameters

• Index size for a traditional OLTP system.
• Task handler capacity for a batch processing system.
• Processing time of query/job.

Scaling Up

• Add a shard server to the OLTP system.
• Add a task tracker to the batch processing system.

Scaling Down

• Reduce task tracker nodes if there is excess processing capacity.
• Distribute the data to other nodes before shutting off a node.

DR

• 0% capacity active at any time for batch processing system.
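The scale-down rule "distribute the data to other nodes before shutting off a node" can be illustrated with a toy model. This is a sketch only: the cluster is a plain dictionary of node name to block list, and `decommission` is a hypothetical helper, not a Hadoop API.

```python
def decommission(cluster, node):
    """Return a new cluster without `node`, with its blocks moved to the
    least-loaded remaining nodes. The input cluster is left unchanged."""
    remaining = {name: list(blocks) for name, blocks in cluster.items() if name != node}
    if not remaining:
        raise ValueError("cannot remove the last node")
    for block in cluster[node]:
        # Pick the node currently holding the fewest blocks.
        target = min(remaining, key=lambda name: len(remaining[name]))
        remaining[target].append(block)
    return remaining


# Hypothetical three-node cluster.
cluster = {"dn1": ["a", "b"], "dn2": ["c"], "dn3": ["d", "e", "f"]}
after = decommission(cluster, "dn3")
print(after)
```

Only once the returned layout is in place would the node be shut off, so no block is lost when capacity is reduced.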

(23)

Deployment strategy – offline system

• Server template of a data node / task tracker. The launched server comes online and starts processing data.

• Through a configuration management tool. Launch a base server and configure it as a cluster node.

[Figure: chart comparing Chef and Server Templates, axis 0 to 5]

(24)
(25)


Data Stores

• Will have the raw, meta and summarized data sets.

• Summarized data is derived by processing raw data and used for historical comparisons.

• DR Site once operational will get the meta and summarized data sets from data bus.

[Figure: % of total data by data type (Raw Unprocessed, Raw Historical, Summarized, Meta); values shown: 10, 70, 20, 5]

Data Type | Access Frequency (Frequent / Moderate / Rare)
Raw Unprocessed | X
Raw Historical | X
Summarized | X

(26)

Scaling Strategy

Scaling Parameters

• Storage Capacity

Scaling Up

• Add capacity if utilization goes above 80%

Scaling Down

• Not applicable

DR
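For the storage rule (add capacity above 80% utilization), a short calculation shows how much to add. The function name and the choice to bring utilization back exactly to the threshold are assumptions for illustration.

```python
def capacity_to_add(used, capacity, threshold=0.80):
    """Extra capacity needed so that used / (capacity + extra) equals the threshold.
    Returns 0.0 when utilization is already at or below the threshold."""
    if used <= capacity * threshold:
        return 0.0
    return used / threshold - capacity


# Example in arbitrary units (for instance TB): 90 used out of 100 is 90% utilized.
print(capacity_to_add(90, 100))
print(capacity_to_add(70, 100))
```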

(27)

Orchestration Layout

[Figure: orchestration layout. Labels shown: Applications, Ops, System, Virtual machines, Model, Recipe, Dashboard, Configure, Provision, Monitor]

Chef

AWS CloudFormation

Automated launch scripts and server templatization

(28)

Monitoring Layout

[Figure: monitoring layout. Labels shown: Dashboard, System monitoring, DFS metrics, Utilization metrics, Active nodes, MR metrics, Ganglia, Zenoss, Java app with exposed JMX, Apache/Tomcat stats, System metrics / SNMP, Functional stats, System state, System count, System load]

New hosts should register themselves with the monitoring systems using the API.

When scaling down, the server needs to be removed from monitoring.

Run a Chef recipe to add the node to monitoring.
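The register-on-launch and remove-on-scale-down lifecycle can be sketched with an in-memory stand-in. `MonitoringRegistry` and its methods are hypothetical; they do not represent the Ganglia or Zenoss APIs, which a real setup would call instead.

```python
class MonitoringRegistry:
    """In-memory stand-in for a monitoring system's host inventory."""

    def __init__(self):
        self._hosts = {}

    def register(self, hostname, role):
        """Called by a new host (for example from its bootstrap recipe) when it comes up."""
        self._hosts[hostname] = role

    def deregister(self, hostname):
        """Called before a host is terminated during scale-down; safe to call twice."""
        self._hosts.pop(hostname, None)

    def active_hosts(self):
        return sorted(self._hosts)


registry = MonitoringRegistry()
registry.register("datanode-01", "data_node")
registry.register("datanode-02", "data_node")
registry.deregister("datanode-01")  # scale-down removes the host from monitoring
print(registry.active_hosts())
```

Keeping deregistration idempotent avoids false alerts when a scale-down script is retried.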

(29)
(30)

[Figure: deployment diagram. Components shown: Load Balancer, Log server (x3), Concentrator (x2), Secondary, Name Node, Data Node 1 ... Data Node n]

(31)

[Figure: deployment diagram with a primary site and a failover (DR) site behind managed DNS, coordinated by an orchestration layer. Components shown: Load Balancer, Log server, Concentrator, Secondary Concentrator, Name Node, Secondary Name Node, Data Node 1 ... Data Node n, Data Store, RDBMS (M), RDBMS, Job Scheduler, App server, Ops]

(32)

[Figure: deployment diagram. Components shown: Load Balancer, Log server (x3), Concentrator, Secondary Concentrator, Name Node, Secondary Name Node, Data Node 1 ... Data Node n, RDBMS (M), RDBMS, Job Scheduler, App server, Ops, and a second Load Balancer with Log servers and Data Node 1 ... Data Node n]

(33)


(34)

[Figure: deployment diagram with primary and failover (DR) sites behind managed DNS and an orchestration layer. Components shown: Load Balancer, Log server, Concentrator (x2), Secondary, Name Node, Secondary Name Node, Data Node 1 ... Data Node n, Data Store, RDBMS (M), RDBMS, Job Scheduler, App server, Ops]

(35)

Results

Productivity

• * Implementation time for a base Big data / ETL system reduced from multi-month to less than a day
• * Developers focus on business rather than infrastructure

Efficiency

• * Better utilization of system resources – operating at 30% more utilization than benchmark
• * Optimized for performance – 10% higher than stock configuration

Repeatability

• * Complete BI stack in less than a day regardless of scale

Opex

• * At least 50% better against benchmark

TCO

(36)


(37)


Contact us: [email protected][email protected][email protected]

Cognizant - Global Technology Consulting

7.2 billion gross revenue

1000+ customers, 50+ delivery centers

160,000 employees

23+ Verticals (E-commerce, Banking, Insurance, …)

E-commerce

Dedicated practice for internet businesses

Large scale, complex implementations with emerging Technologies

1000+ Enterprise Architects

Search & Advertising

Dedicated practice for Search, Advertising and Analytics

Mature Big data / Cloud solutions and frameworks

Research & Development

Innovation & Patents

Free – assessment of your challenges / environment

Credits:

Laxmana Gunta Viral Shah
