©2012, Cognizant
COGNIZANT
Prabhu Inbarajan
Srinivasan Thiruvengadathan
Muralicharan Gurumoorthy
Praveen Codur
• Big Data / Cloud challenges and opportunities
• Cognizant’s Solution Framework
• Solution deep dive
• Results
• Future Opportunities
Projects:
Large scale cloud transformation projects for enterprise data warehouses & analytics environments using open BI technologies
Big deal
• Business Expectations
• Complex Environment
• No frame of reference, no standard stacks
Introduction
About Us
Expectations
Expectation
• Effective cost utilization
• Business expansion to other regions
• Zero tolerance for data / traffic loss
• Product/business available 24x7
• Business continuity plan
Technical Translation
• High Availability
• Elasticity
• Effective Backup
• Spanning across regions
• Efficient
• Seamless and transparent to end users
Recurring Questions?
• Build on premises vs. cloud source?
• What is the Return on Assets / Cost of Computing / Economics?
• What is my frame of reference?
• What are the technology choices?
• What is the optimal technology stack?
• What is the optimal time to market?
• What are the operational challenges and how to mitigate them?
Goal
Cost - Optimization potential vs reality
Environment: 10 Extra Large CPU instances, 60 Large CPU instances, and 30 Small CPU instances.
[Figure: seasonality of traffic; labels: Traffic, Failover site]
Moore says it's not enough -…
Item | Jan-10 | Mar-12
On-Demand Instances: Small – Linux – N. Virginia | $0.085 | $0.080
On-Demand Instances: Quadruple Extra Large – Windows – N. California | $3.160 | $2.504
Data Transfer – In | Free till June 2010 | Free
Data Transfer – Out (per GB, depending on the total monthly volume) | $0.1 to $0.17 | $0.05 to $0.12
Storage (EBS) (per allocated GB per month) | $0.10 | $0.10
I/O Requests (per million I/O) | $0.10 | $0.10
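As a quick check on the figures listed above, the relative price change between Jan-10 and Mar-12 can be computed directly. This is a minimal sketch: the dictionary labels and the helper name `pct_change` are illustrative, and only rows with a single price on both dates are included.

```python
# Prices copied from the slide (USD); labels are shortened for illustration.
prices = {
    "Small, Linux, N. Virginia": (0.085, 0.080),
    "Quadruple Extra Large, Windows, N. California": (3.160, 2.504),
    "Storage (EBS), per GB-month": (0.10, 0.10),
    "I/O requests, per million": (0.10, 0.10),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means cheaper)."""
    return (new - old) / old * 100.0


changes = {item: pct_change(jan10, mar12) for item, (jan10, mar12) in prices.items()}

for item, change in changes.items():
    print(f"{item}: {change:+.1f}%")
```

On these numbers the small Linux instance dropped by roughly 6% and the quadruple extra large Windows instance by roughly 21%, while storage and I/O request prices did not move.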
Cloud - Optimization Models
Cost & Utilization vs Complexity
Parameter | Fixed capacity | 2 step scaling | 3 step scaling | 4 step scaling
Utilization | 30-35 % | 50-55 % | 60-70 % | 75-80 %
Complexity | 0 % | 30 % | 50 % | 75 %
[Figure: one panel per model showing traffic distribution against planned capacity and peak traffic]
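The trade-off above can be illustrated with a small simulation in which capacity is provisioned in a fixed number of equal steps up to peak, and utilization is total traffic divided by total provisioned capacity. This is a sketch under assumptions that are not in the slides: the hourly traffic profile is made up, the steps are equally sized, and the function name `utilization` is hypothetical. It is not meant to reproduce the percentages in the table.

```python
import math
from typing import Sequence


def utilization(traffic: Sequence[float], steps: int) -> float:
    """Average utilization when capacity is provisioned in `steps` equal
    increments of peak traffic (steps=1 means fixed capacity at peak)."""
    peak = max(traffic)
    step_size = peak / steps
    provisioned = 0.0
    for load in traffic:
        # Smallest number of increments that covers the load (at least one).
        needed = max(1, math.ceil(load / step_size))
        provisioned += min(needed, steps) * step_size
    return sum(traffic) / provisioned


# Hypothetical daily profile (arbitrary units): quiet night, busy evening peak.
traffic = [10, 10, 10, 10, 20, 30, 40, 50, 60, 60, 50, 40,
           40, 50, 60, 70, 80, 100, 100, 80, 60, 40, 20, 10]

for steps in (1, 2, 4):
    print(f"{steps}-step: {utilization(traffic, steps):.0%}")
```

For this profile, more steps track the traffic curve more closely and raise utilization, at the cost of the added operational complexity listed in the table.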
Mission - Value Proposition
Mission: To create a framework/solution that abstracts all of the complexities below from application developers and operators, and provides a blueprint implementation for Big Data enterprise applications on the cloud.
• Cloud Infrastructure
• Operating systems
• Security
• Monitoring
• Application stack for BI, visualization, data collection and processing
• Data Stores
Value Prop: Encapsulates the body of knowledge around cloud and open BI into an automated solution, resulting in
• Higher Productivity
• Higher Efficiency
• Repeatability
• Highly optimized
Solution SAHANA – blueprint for Cloud BI stack
• Big Storage
• Data Integration
• Collection
• BI & Visualization
• Scheduling
SAHANA v1.0
• Big Storage
• Data Integration
• Collection
• BI & Visualization
• Scheduling
• Security
• Monitoring
• Storage
• Infrastructure & OS Provisioning
• Orchestration
Complex Architecture
[Diagram components: Publishers, Advertisers, Ad server, API Cluster, Ad Center, Job Scheduler, MemCache, Middle tier, Load Balancer, SQL Server 2008, Data Warehouse (Master, Slave), RPT (Master, Slaves), Activity History, ETL (Master, Slave), Summary, Reports, Display, Pentaho, Map Reduce, HDFS, HBase]
Infrastructure equipped for ad serving and analytics capabilities for a top-tier search engine. An atlas of functional components of front ends, processing layers, and data stores, spanning over
Architectures
Scale up or scale down as demand on the application fluctuates.
Back up critical data on a persistent store (object-based, like S3).
Scaling Strategy
Scaling Parameters
• CPU
• Memory
• Disk/Network IO
• System load
• Response latency
Scaling Up
• The threshold for all of the metrics listed above is 80%.
• Scale horizontally if any parameter threshold is breached.
• Add 10% capacity at burst. If load does not come back under the threshold, keep adding 10% capacity until it does.
Scaling Down
• Depends on the nature of the application: if the app is fault tolerant, scaling down can be automatic.
• If data needs to be backed up, scaling down requires human intervention.
• Scale down if the above parameter values fall below 70%.
DR
• 20% capacity will be running as hot standby.
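The scaling rules above can be sketched as a small decision function. This is a minimal illustration, assuming every metric is reported as a percentage of its limit. The names `Metrics`, `desired_capacity`, and `fault_tolerant` are hypothetical, and both rounding up to whole servers and scaling down by the same 10% step are assumptions not stated in the slides.

```python
import math
from dataclasses import dataclass, astuple

SCALE_UP_THRESHOLD = 80.0    # percent, from the slide
SCALE_DOWN_THRESHOLD = 70.0  # percent, from the slide
STEP_PERCENT = 10            # capacity is added in 10% increments


@dataclass
class Metrics:
    cpu: float
    memory: float
    io: float           # disk/network IO
    system_load: float
    latency: float      # response latency as a percent of its limit (assumption)


def desired_capacity(current: int, m: Metrics, fault_tolerant: bool) -> int:
    """Return the server count to run next, given the current count and metrics."""
    values = astuple(m)
    # 10% of the current fleet, rounded up to a whole server (assumption).
    step = max(1, math.ceil(current * STEP_PERCENT / 100))
    if any(v > SCALE_UP_THRESHOLD for v in values):
        # Any breached threshold triggers a horizontal scale-up by one step;
        # calling this again on the next interval keeps adding until load recovers.
        return current + step
    if all(v < SCALE_DOWN_THRESHOLD for v in values) and fault_tolerant:
        # Automatic scale-down only for fault-tolerant apps (step size assumed);
        # otherwise the count is left unchanged for a human to decide.
        return max(1, current - step)
    return current


print(desired_capacity(50, Metrics(85, 60, 40, 55, 30), fault_tolerant=True))
```

Keeping the scale-up and scale-down thresholds apart (80% and 70%) leaves a band in which the fleet size is unchanged, which avoids flapping between the two actions.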
Deployment strategy
• Can be done using a server template of the running application components. An auto scaling group is defined on the scaling parameters and scales up by launching servers from the template.
• The other approach is to bring up a base server and configure it for its application role using a configuration tool such as Chef.
[Chart: Chef vs. Server Templates; axis ticks 0 to 6]
Store for Historical data for Analytics queries.
Readily available for ad-hoc querying.
Tiered data retention policy
Amazon S3 as a data backbone.
Choices available: AWS EMR, Hadoop cluster, Hive, HBase
Architecture
[Diagram: Hadoop cluster with Job tracker, Job clients, and TT/DN (TaskTracker/DataNode) nodes; Host1, Host2 ... HostN, each holding ShardA, ShardB, ShardC, ShardD; Message Queue]
Architecture continued …
• HDFS (Hadoop Distributed File System)
• HBase (Key-Value store)
• MapReduce (Job Scheduling/Execution System)
• Pig (Data Flow)
• Hive (SQL)
• BI Reporting
• ETL Tools
Scaling Strategy
Scaling Parameters
• Index size for a traditional OLTP system.
• Task handler capacity in the case of a batch processing system.
• Processing time of query/job.
Scaling Up
• Add a shard server to the OLTP system.
• Add a task tracker to the batch processing system.
Scaling Down
• Reduce task tracker nodes if there is excess processing capacity.
• Distribute the data to other nodes before shutting off a node.
DR
• 0% capacity active at any time for the batch processing system.
Deployment strategy – offline system
• Server template of a data node / task tracker. The launched server comes online and starts processing data.
• Through a configuration management tool. Launch a base server and configure it as a cluster node.
[Chart: Chef vs. Server Templates; axis ticks 0 to 5]
Data Stores
• Will have the raw, meta, and summarized data sets.
• Summarized data is derived by processing raw data and is used for historical comparisons.
• Once operational, the DR site will get the meta and summarized data sets from the data bus.
[Chart: % of total data by type (Raw Unprocessed, Raw Historical, Summarized, Meta); values shown: 10, 70, 20, 5]
Data Type Access Frequency
Frequent Moderate Rare
Raw Unprocessed X
Raw Historical X
Summarized X
Scaling Strategy
Scaling Parameters
• Storage capacity
Scaling Up
• Add capacity if utilization goes above 80%.
Scaling Down
• Not applicable
DR
Orchestration Layout
[Diagram labels: Applications, Ops, System, Virtual machines, Model, Recipe, Dashboard, Configure, Provision, Monitor]
Chef, AWS CloudFormation
Automated launch scripts and server templatization
Monitoring Layout
[Diagram labels: Dashboard, System monitoring, DFS metric, Utilization metrics, Active nodes, MR metrics, Ganglia, Zenoss, Java app with exposed JMX, Apache/Tomcat stats, System metrics / SNMP, Functional stats, System state, System count, System load]
• New hosts should register themselves with the monitoring systems using the API.
• While scaling down, the server needs to be removed from monitoring.
• Run a Chef recipe to get the node added to monitoring.
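The host lifecycle described above can be sketched with an in-memory stand-in for the monitoring system. Everything here is hypothetical: `MonitoringRegistry`, its methods, and the host names do not correspond to the Ganglia or Zenoss APIs. A real setup would call the monitoring system's own API, for example from a Chef recipe.

```python
class MonitoringRegistry:
    """Hypothetical in-memory stand-in for a monitoring system's host API."""

    def __init__(self) -> None:
        self._hosts: dict[str, str] = {}  # host name -> role

    def register(self, host: str, role: str) -> None:
        # Called by a new host at boot, after it has been provisioned.
        self._hosts[host] = role

    def deregister(self, host: str) -> None:
        # Called before a host is terminated during scale-down;
        # removing an unknown host is a no-op.
        self._hosts.pop(host, None)

    def active_nodes(self) -> list[str]:
        # Sorted host names currently under monitoring.
        return sorted(self._hosts)


registry = MonitoringRegistry()
registry.register("datanode-1", "data-node")  # scale up
registry.register("datanode-2", "data-node")  # scale up
registry.deregister("datanode-1")             # scale down
print(registry.active_nodes())
```

Deregistering before termination keeps the active node count on the dashboard in step with the servers that are actually running.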
[Diagram components: Load Balancer, Log servers, Concentrator, Secondary Concentrator, Name Node, Data Nodes 1..n]
[Diagram, built up over several slides: Primary and Failover - DR sites behind Managed DNS. Components: Load Balancer, Log servers, Concentrator, Secondary Concentrator, Name Node, Secondary Name Node, Data Nodes 1..n, RDBMS (M), RDBMS, Job Scheduler, App server, Ops, Data Store, Orchestration layer]
Results
Productivity
• Implementation time for a base Big Data / ETL system reduced from multiple months to less than a day
• Developers focus on business rather than infrastructure
Efficiency
• Better utilization of system resources: operating at 30% more utilization than benchmark
• Optimized for performance: 10% higher than stock configuration
Repeatability
• Complete BI stack in less than a day regardless of scale
Opex
• At least 50% better against benchmark
TCO
Contact us: [email protected] [email protected] [email protected]
Cognizant - Global Technology Consulting
• 7.2 billion gross revenue
• 1000+ customers, 50+ delivery centers
• 160,000 employees
• 23+ Verticals (E-commerce, Banking, Insurance, …)
E-commerce
Dedicated practice for internet businesses
Large scale, complex implementations with emerging Technologies
1000+ Enterprise Architects
Search & Advertising
Dedicated practice for Search, Advertising and Analytics
Mature Big Data / Cloud solutions and frameworks
Research & Development
Innovation & Patents
Free – assessment of your challenges / environment
Credits: Laxmana Gunta, Viral Shah