Big Data Blueprint for Cloud Architecture

(1)


COGNIZANT

Prabhu Inbarajan

Srinivasan Thiruvengadathan

Muralicharan Gurumoorthy

Praveen Codur

(2)

Big Data / Cloud challenges and opportunities
Cognizant’s Solution Framework
Solution deep dive
Results
Future Opportunities

(3)

Projects:

Large-scale cloud transformation projects for enterprise data warehouses & analytics environments using open BI technologies

Big deal

• Business Expectations
• Complex Environment
• No frame of reference, no standard stacks

Introduction

About Us

(4)

Expectation

 Effective cost utilization
 Business expansion to other regions
 Zero tolerance for data / traffic loss
 Product/business available 24x7
 Business continuity plan

Technical Translation

 High Availability
 Elasticity
 Effective Backup
 Spanning across regions
 Efficient
 Seamless and transparent to end users

Expectations

(5)

 Build on premises vs. cloud source?
 What is the Return on Assets / Cost of Computing / Economics?
 What is my frame of reference?
 What are the technology choices?
 What is the optimal technology stack?
 What is the optimal time to market?
 What are the operational challenges and how to mitigate them?

Recurring Questions?

Goal

(6)

Cost - Optimization potential vs reality

Environment: 10 Extra Large CPU instances, 60 Large CPU instances, and 30 Small CPU instances.

[Figure: Seasonality of traffic. Labels: Failover site, Traffic]

(7)

Moore says it's not enough -…

Item | Jan-10 | Mar-12
On-Demand Instances: Small – Linux – N. Virginia | $0.085 | $0.080
On-Demand Instances: Quadruple Extra Large – Windows – N. California | $3.160 | $2.504
Data Transfer – In | Free till June 2010 | Free
Data Transfer – Out (per GB, depending on the total monthly volume) | $0.1 to $0.17 | $0.05 to $0.12
Storage (EBS), per allocated GB per month | $0.10 | $0.10
I/O Requests, per million I/O | $0.10 | $0.10
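The size of the price movement between the two dates is easier to judge as a percentage. The sketch below is only an arithmetic illustration: it uses the two on-demand instance prices listed for Jan-10 and Mar-12, and the names `PRICES` and `percent_drop` are made up for this example.

```python
# Prices copied from the Jan-10 and Mar-12 columns of the slide (USD).
PRICES = {
    "Small - Linux - N. Virginia": (0.085, 0.080),
    "Quadruple Extra Large - Windows - N. California": (3.160, 2.504),
}


def percent_drop(old, new):
    """Return the price reduction as a percentage of the old price."""
    return (old - new) / old * 100.0


# Round to one decimal place for readability.
drops = {
    name: round(percent_drop(old, new), 1)
    for name, (old, new) in PRICES.items()
}

for name, drop in drops.items():
    print(f"{name}: {drop}% cheaper")
```

On these figures the small instance drops by roughly 6% and the quadruple extra large instance by roughly 21% over the period.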

(8)

Parameter | Fixed capacity | 2 step scaling | 3 step scaling | 4 step scaling
Utilization | 30-35 % | 50-55 % | 60-70 % | 75-80 %
Complexity | 0 % | 30 % | 50 % | 75 %

Cloud - Optimization Models

Cost & Utilization vs Complexity

[Figure: four traffic distribution panels. Labels: Peak Traffic, Planned capacity]
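The trend in the optimization models (utilization rising as capacity is provisioned in more steps) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the hourly traffic profile is hypothetical, capacity tiers are assumed to be equal-sized, and the resulting percentages are not meant to reproduce the figures on the slide.

```python
# Hypothetical hourly traffic, as a percentage of peak load (not from the deck).
TRAFFIC = [20, 15, 10, 10, 15, 30, 50, 70, 90, 100, 95, 85,
           80, 75, 70, 65, 60, 70, 80, 60, 45, 35, 30, 25]


def provisioned(load, peak, steps):
    """Capacity provisioned for `load` when capacity comes in `steps` equal tiers."""
    # Integer ceiling of load / (peak / steps); at least one tier always runs.
    tiers = max(1, -(-load * steps // peak))
    return tiers * peak / steps


def utilization(traffic, steps):
    """Fraction of provisioned capacity that the traffic actually uses."""
    peak = max(traffic)
    capacity = sum(provisioned(load, peak, steps) for load in traffic)
    return sum(traffic) / capacity


# steps=1 corresponds to fixed capacity sized for peak traffic.
results = {
    steps: round(utilization(TRAFFIC, steps) * 100, 1)
    for steps in (1, 2, 3, 4)
}

for steps, pct in results.items():
    print(f"{steps} step(s): {pct}% utilization")
```

With one step the whole day is provisioned for peak traffic; each additional step lets the provisioned capacity follow the traffic curve more closely, at the price of more scaling logic to operate.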

(9)

Mission - Value Proposition

Mission: To create a framework/solution that abstracts the complexities below from application developers and operators, and provides a blueprint implementation for Big Data enterprise applications on the cloud

• Cloud Infrastructure
• Operating systems
• Security
• Monitoring
• Application stack for BI, visualization, data collection and processing
• Data Stores

Value Prop: Encapsulates the body of knowledge around cloud and open BI into an automated solution, resulting in:

• Higher Productivity
• Higher Efficiency
• Repeatability
• Highly optimized

(10)

Solution SAHANA – blueprint for Cloud BI stack

Big Storage
Data Integration
Collection
BI & Visualization
Scheduling

(11)

SAHANA v1.0

Big Storage
Data Integration
Collection
BI & Visualization
Scheduling

Security

Monitoring

Storage

Infrastructure & OS Provisioning

Orchestration

(12)

Complex Architecture

[Architecture diagram. Labels: Ad server, API Cluster, Ad Center, Job Scheduler, MemCache, Middle tier, Load Balancer, SQL Server 2008, Data Warehouse, RPT, Activity History, ETL, Master and Slave nodes, Summary Reports Display, Pentaho, HDFS, HBase, Publishers, Advertisers]

 Infrastructure equipped for ad serving and analytics capabilities for a top-tier search engine.
 An atlas of functional components of front ends, processing layers, and data stores, spanning over

(13)

(14)

Architectures

(15)

Scaling up or scaling down as demand on the application fluctuates

Back up critical data on a persistent store (object-based, like S3).

(16)

 The threshold for all the metrics listed above is 80%.
 Scale horizontally if any parameter breaches its threshold.
 Add 10% capacity per burst; if the load does not come back under the threshold, keep adding 10% capacity until it does.

Scaling Strategy

 Depends on the nature of the application: if the app is fault tolerant, scaling down can be automatic.
 If data needs to be backed up, scaling down requires human intervention.
 Scale down if the above parameter values fall below 70%.

DR

 20% of capacity runs as hot standby.
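The scale-up and scale-down rules on this slide can be sketched as a single decision function. This is a minimal illustration, not the framework's implementation: the function name, the metric names in the usage lines, and the choice to also remove capacity in 10% steps are assumptions made for the example (the slide only gives the 10% step for scaling up).

```python
SCALE_UP_THRESHOLD = 80.0    # percent, from the scale-up rule on the slide
SCALE_DOWN_THRESHOLD = 70.0  # percent, from the scale-down rule on the slide
STEP_PERCENT = 10            # capacity is added in 10% increments


def next_capacity(current_nodes, metrics, fault_tolerant=True):
    """Return the node count to run next, given metric readings in percent.

    `metrics` maps a metric name to its current value. Scaling down in the
    same 10% step is an assumption; the slide does not state a step size.
    """
    # Integer ceiling of 10% of the current fleet, never less than one node.
    step = max(1, (current_nodes * STEP_PERCENT + 99) // 100)

    # Any single metric over the threshold triggers horizontal scale-up.
    if any(value > SCALE_UP_THRESHOLD for value in metrics.values()):
        return current_nodes + step

    # Scale down automatically only when every metric is below 70% and the
    # application is fault tolerant; otherwise a human has to decide.
    if fault_tolerant and all(
        value < SCALE_DOWN_THRESHOLD for value in metrics.values()
    ):
        return max(1, current_nodes - step)

    return current_nodes


grow = next_capacity(20, {"cpu": 85.0, "memory": 60.0})
shrink = next_capacity(20, {"cpu": 50.0, "memory": 60.0})
hold = next_capacity(20, {"cpu": 50.0, "memory": 60.0}, fault_tolerant=False)
```

Calling the function again after each burst reproduces the "keep adding 10% until the load comes back under the threshold" behaviour, since every call that still sees a breached metric adds another step.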

(17)

Deployment strategy

• Can be done using a server template of the running application components. An auto scaling group is defined on the scaling parameter and scales up by launching servers from the templates.

• The other approach is to bring up a base server and configure it for its application role using a configuration tool such as Chef.

[Chart: Chef vs. Server Templates, scale 0 to 6]

(18)

(19)

Store for Historical data for Analytics queries.

Readily available for ad-hoc querying.

Tiered data retention policy

Amazon S3 as a data backbone.

Choices available: AWS EMR, Hadoop cluster, Hive, HBase

(20)

Architecture

[Diagram: Hadoop cluster with a Job tracker, Job clients, and TT/DN (task tracker / data node) hosts; sharded store with Host1, Host2 ... HostN, each holding ShardA to ShardD; Message Queue]

(21)

Architecture continued …

HDFS (Hadoop Distributed File System)
HBase (Key-Value store)
MapReduce (Job Scheduling/Execution System)
Pig (Data Flow)
Hive (SQL)
BI Reporting
ETL Tools

(22)

 Index size, for a traditional OLTP system.
 Number of task handlers (capacity), for a batch processing system.
 Processing time of a query/job.
 Add a shard server to the OLTP system.
 Add a task tracker to the batch processing system.

Scaling Strategy

 Reduce task tracker nodes if there is excess processing capacity.
 Distribute the data to other nodes before shutting off a node.

DR

 0% capacity active at any time for the batch processing system.
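The scale-down rule of moving data off a node before shutting it down can be sketched as follows. This is a toy illustration only: `placement` is a hypothetical in-memory map from node name to block ids, and a real Hadoop cluster would rely on the platform's own decommissioning and re-replication rather than application code like this.

```python
def drain_node(placement, node):
    """Move every block held by `node` onto the least-loaded remaining node.

    `placement` maps node name -> list of block ids. The input is not
    modified; a new placement without `node` is returned.
    """
    remaining = {
        name: list(blocks)
        for name, blocks in placement.items()
        if name != node
    }
    if not remaining:
        raise ValueError("cannot drain the last node")

    for block in placement.get(node, []):
        # Pick the node currently holding the fewest blocks.
        target = min(remaining, key=lambda name: len(remaining[name]))
        remaining[target].append(block)
    return remaining


# Hypothetical three-node cluster; node3 is about to be shut off.
cluster = {
    "node1": ["b1", "b2"],
    "node2": ["b3"],
    "node3": ["b4", "b5", "b6"],
}
after = drain_node(cluster, "node3")
```

Only once the returned placement no longer references the node is it safe to terminate the instance, which is the ordering the slide asks for.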

(23)

Deployment strategy – offline system

• Server template of the data node / task tracker. The launched server comes online and starts processing data.

• Through a configuration management tool: launch a base server and configure it as a cluster node.

[Chart: Chef vs. Server Templates, scale 0 to 5]

(24)

(25)

Data Stores

• Will have the raw, meta and summarized data sets.

• Summarized data is derived by processing raw data and used for historical comparisons.

• The DR site, once operational, will get the meta and summarized data sets from the data bus.

[Chart: % of total data by data type (Raw Unprocessed, Raw Historical, Summarized, Meta); values shown: 10, 70, 20, 5]

Data Type | Access Frequency (Frequent / Moderate / Rare)
Raw Unprocessed | X
Raw Historical | X
Summarized | X

(26)

 Add Capacity if Utilization goes above 80%

Scaling Strategy

 Not applicable

DR

(27)

Orchestration Layout

Chef

AWS CloudFormation

Automated Launch Scripts and Server templatization

(28)

[Diagram: monitoring inputs: Java app with exposed JMX, Apache/Tomcat stats, system metrics / SNMP, functional stats, system state, system count, system load]

Monitoring Layout

• New hosts should register themselves with the monitoring system using its API.

• When scaling down, the server needs to be removed from monitoring.

• Run a Chef recipe to get the node added to monitoring.
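The register-on-launch and remove-on-shutdown rules above can be sketched with a small in-memory registry. Everything here is hypothetical: `MonitoringRegistry`, the hook functions, and the host names stand in for whatever API the real monitoring system exposes, which the slides do not name.

```python
class MonitoringRegistry:
    """Hypothetical in-memory stand-in for a monitoring system's host API."""

    def __init__(self):
        self._hosts = {}

    def register(self, hostname, role):
        """Start monitoring `hostname` under the given role."""
        self._hosts[hostname] = role

    def deregister(self, hostname):
        """Stop monitoring `hostname`; unknown hosts are ignored."""
        self._hosts.pop(hostname, None)

    def hosts(self):
        """Return a copy of the currently monitored hosts."""
        return dict(self._hosts)


def on_scale_up(registry, hostname, role):
    # Called by the new host itself (for example from its bootstrap recipe).
    registry.register(hostname, role)


def on_scale_down(registry, hostname):
    # Called before the instance is terminated, so no stale alerts fire.
    registry.deregister(hostname)


registry = MonitoringRegistry()
on_scale_up(registry, "tasktracker-07", "task tracker")
on_scale_up(registry, "tasktracker-08", "task tracker")
on_scale_down(registry, "tasktracker-07")
```

Keeping both hooks next to the scaling logic means the monitored host list always matches the running fleet, which is what the three bullets are asking for.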

(29)

(30)


Load Balancer

Log server Log server Log server

Data Node 1 Data Node n

Name Node

Concentrator, Secondary Concentrator

(31)


Load Balancer

Name Node Data Store Concentrator Secondary Concentrator RDBMS (M) Job Scheduler App server Secondary Name Node RDBMS Ops Load Balancer Log server Secondary Name Node App server Name Node RDBMS

Orchestration layer

Managed DNS

Failover - DR Primary

(32)


Load Balancer

Name Node Concentrator Secondary Concentrator RDBMS (M) Job Scheduler App server Secondary Name Node RDBMS Ops Load Balancer

(33)


(34)


Load Balancer

Name Node

Concentrator, Secondary Concentrator

RDBMS (M) Job Scheduler App server Secondary Name Node RDBMS Ops Load Balancer Log server Managed DNS Failover - DR Primary Data Store Orchestration layer

(35)

Results



Productivity

• Implementation time for a base Big Data / ETL system reduced from multiple months to less than a day

• Developers focus on business rather than infrastructure



Efficiency

 Better utilization of system resources – operating at 30% more utilization than benchmark
 Optimized for performance – 10% higher than stock configuration



Repeatability

 Complete BI stack in less than a day regardless of scale



Opex

 At least 50% better against benchmark



TCO

(36)

(37)


 Contact us:  [email protected]  [email protected]  [email protected]

Cognizant - Global Technology Consulting

7.2 billion gross revenue
1000+ customers, 50+ delivery centers
160,000 employees
23+ Verticals (E-commerce, Banking, Insurance, …)

E-commerce

Dedicated practice for internet businesses

Large scale, complex implementations with emerging Technologies

1000+ Enterprise Architects

Search & Advertising

Dedicated practice for Search, Advertising and Analytics

Mature Big data / Cloud solutions and frameworks
Research & Development
Innovation & Patents

Free – assessment of your challenges / environment

Credits:

Laxmana Gunta, Viral Shah