Accelerating the adoption of Cloud Computing
Planning an OpenStack PoC
Webinar
Speakers Today
– 20 years as architect of infrastructure solutions
for the enterprise
– Experience designing and deploying across US,
APAC and Emerging Markets
– Specializes in infrastructure adoption in the
world's largest enterprises across people,
process and technology
– Managed and delivered some of the largest
cloud deployments, both public and private,
worldwide
– Business and technical leadership to service
providers and enterprises around the world
– Prior to Solinea, Seth was a Director in the
Product Management Group at Cloudscaling
Brad Vaughan
Solinea Overview
Cloud is the only domain we focus on, with vertical
industry and horizontal solutions specialization
Purpose-built
for cloud
Track record of success architecting, building and
operating production clouds – private and public –
world-wide
Proven Delivery
Success
We understand cloud adoption challenges of global
companies
Enterprise IT
Experience
Integrated capabilities lifecycle: cloud strategy,
architecture, implementation and adoption services
Unique
Approach
Accelerating Open Infrastructure Adoption
Built the first OpenStack production clouds and
contributors to the platform since its inception
OpenStack™
Experience
Webinar Agenda
Why a Proof of Concept (PoC)?
Select PoC Candidate Workloads
Creating a Test Plan
PoC Architecture
Deployment Planning
Technology Evaluation Continuum
Sandbox
• Informal exploration of
technology
• Small scale
installation to allow for
experimentation
• Single user/operator
testing
Proof of Concept
• Quantifiable proof of business value to
multiple business stakeholders
• Scoped and budgeted project with
assigned staffing
• Proving technical viability for specific
use case and solution
• May also evaluate competing
solutions
• Fully understand the impact/value
across multiple business units/
workloads
Pilot
• Initial build-out of
tested solution
• Limited user
community and SLAs
• Operated with
production tooling and
support
Setting Goals & Criteria
Sandbox
• No predefined goals
or criteria
• Reduced HW
footprint
• Functional
understanding of
technology
Proof of Concept
• Prove a hypothesis
• Goals must be directly linked to the
business requirements for approving
next steps
• Generate convincing data for
comparison against the current-state solution
• Prove ROI and Investment
• Gain practical skills and
understanding, to properly design the
end state
• Understand impact on IT lifecycle
service development and delivery
process
Pilot
• Production quality/
performance goals
• Successful
completion of
Preproduction QA
testing
• Completion of user
testing
Candidate Workloads
! Selection Criteria
– Solve an existing problem
– Workload/application profile
– Representative architecture pattern
– Complexity and dependency
– Supportability, Customization
! Stakeholder Involvement
– Resource commitment
– Is the pain point real?
! Measurability
– Existing quantifiable testing
– Historical data
Selecting Tests
! Defining the scope (breadth and depth of PoC)
! Defines timeline, cost and complexity
! Application level testing
– Primary issue is finding an existing test with actual data
– Needs to be self-contained, with limited dependency on other production
or test/dev systems
– Many applications require refactoring to take advantage of cloud
architecture
! Largest number of tests are generally functional testing
– Auto-scaling
– High Availability
– Operational
! Non-functional tests can be challenging
– PoC is usually only a functional simulation of production
Creating a Test Plan
! The candidate selection process should have
identified a workload with an existing test harness
! Developing, architecting and implementing
testing tools is time-consuming and complicated
! Formal definition of use cases is required to
ensure a valid scope
Use Case ID
Purpose
Pre-requisites
Required Data
Steps
Expected Results
Actual Results
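One way to keep use case definitions consistent is to record them in a small data structure. The sketch below is only an illustration: the field names mirror the template above, while the class name, the `UC-001` identifier, and the sample flavor test are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    """One PoC test case, mirroring the template fields listed above."""
    use_case_id: str
    purpose: str
    prerequisites: List[str]
    required_data: List[str]
    steps: List[str]
    expected_results: str
    actual_results: str = ""  # filled in after the test has been executed

    def is_executed(self) -> bool:
        """True once an actual result has been recorded."""
        return bool(self.actual_results.strip())

    def passed(self) -> bool:
        # Simplistic string comparison; a real plan usually needs a
        # scripted or human judgment of whether the result matches.
        return (self.is_executed()
                and self.actual_results.strip() == self.expected_results.strip())


# Hypothetical example entry, not taken from the webinar.
example = UseCase(
    use_case_id="UC-001",
    purpose="Create and delete a flavor through the API",
    prerequisites=["Admin credentials for the PoC cloud"],
    required_data=["Flavor definition (vCPUs, RAM, disk)"],
    steps=["Create the flavor", "Confirm it is listed", "Delete the flavor"],
    expected_results="Flavor is created, listed, and removed without errors",
)
```

Keeping every use case in the same shape makes it straightforward to report which tests are still unexecuted when the PoC timeline runs short.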
OpenStack Operational Use Cases
! Exercise the APIs
– Create and destroy
Objects (e.g. users,
tenants, flavors, image)
– Start/Stop, Enable/
Disable
! Non-functional features
– Upgrading the
environment
– High availability /
Failover
OpenStack Testing Tools
! Several tools available
– Tempest: automated CI/CD test suite for OpenStack
– Rally: benchmark OpenStack at scale
! Valuable to validate PoC platform install prior to
running other tests
! Can be very complicated to configure
! Types of Tests
– API – RESTful calls
– CLI – read-only actions of the client
– Scenario – often operational actions
– Stress – used primarily to identify race condition bugs
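To give a feel for the configuration effort, here is a minimal sketch that builds a Rally task definition and writes it out as JSON. The layout follows the structure of Rally's published sample tasks as best understood here; the flavor name, image name, and runner numbers are placeholders, and the exact schema should be checked against the Rally release in use.

```python
import json

# Task definition in the style of Rally's sample task files.
# Flavor and image names are placeholders for whatever exists in the PoC cloud.
task = {
    "NovaServers.boot_and_delete_server": [
        {
            "args": {
                "flavor": {"name": "m1.tiny"},
                "image": {"name": "cirros"},
            },
            # Boot and delete a server 10 times, 2 at a time.
            "runner": {"type": "constant", "times": 10, "concurrency": 2},
            # Temporary tenant and user created for the run.
            "context": {"users": {"tenants": 1, "users_per_tenant": 1}},
        }
    ]
}

task_json = json.dumps(task, indent=2)
print(task_json)
```

Saved to a file, a task like this would typically be passed to `rally task start`.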
Benefits
This test showcases the ability of the cluster to grow and shrink as needed to handle expected and unexpected high load, scaling according to the level of load pushed against the cluster.
Results
1. Once the stress-testing load was initiated, there were about 60K to 80K requests per second. During this initial phase the single caching server generated a sustained CPU load of over 75% (red bars). This triggers a Heat alarm, which launches and configures a new caching server.
2. This new caching server is joined to the cluster and receives an equal share of the requests. This causes the overall cluster CPU load average to decrease by roughly half, which should allow the cluster to handle significantly more requests per second.
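The arithmetic behind this result can be sketched with a simple model. It assumes that requests are spread evenly and that CPU load grows linearly with request rate. The 75% threshold and the 80K requests per second come from the test above; the per-node capacity figure is a hypothetical value chosen only for illustration.

```python
CPU_ALARM_THRESHOLD = 0.75   # from the test: alarm fires above 75% sustained CPU
NODE_CAPACITY_RPS = 100_000  # hypothetical: request rate that saturates one node


def per_node_cpu(total_rps: float, nodes: int) -> float:
    """Estimated CPU utilisation per node when requests are spread evenly."""
    return (total_rps / nodes) / NODE_CAPACITY_RPS


def scale_out(total_rps: float, nodes: int = 1) -> int:
    """Add nodes one at a time until per-node CPU is at or below the threshold."""
    while per_node_cpu(total_rps, nodes) > CPU_ALARM_THRESHOLD:
        nodes += 1
    return nodes


load = 80_000  # upper end of the 60K to 80K requests per second observed
before = per_node_cpu(load, 1)
after_nodes = scale_out(load)
after = per_node_cpu(load, after_nodes)
print(f"1 node: {before:.0%} CPU; {after_nodes} nodes: {after:.0%} CPU each")
```

Under these assumptions one extra node is enough, and the per-node load halves, which matches the behaviour described in the results.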
Identifying the Prerequisites
! Equipment
– Rack
• RUs, Power, A/C
– Servers
• Controller, Storage, Compute
– Storage
• Storage software, drives, backup space
– Networking
• Networks, IPs, SSL certs
! Software & Data
– OpenStack Code
– Application Software
• Licenses
• Who will install
• Who will customize
– Testing Tools
• Install and configure
– Sample Datasets
• Which datasets (live, test)?
! Privacy and Security
Example Skills Matrix
Columns: Role, Networking, Compute, Storage, Other
“OpenStack” Generalist
• Good Linux networking experience
• Excellent hypervisor skills
• Excellent Linux administration skills
• Config management with Puppet, Chef, etc.
• Experience administering iSCSI or NFS servers
• General Python scripting
• Experience using OpenStack clouds
Network Specialist
• Strong general L2/L3 skills with chosen ToR switches
• Excellent virtualized networking skills (OVS, Linux bridging, etc.)
• Experience with chosen hypervisor(s)
• Experience with NICs and IPMI/iLO on chosen hardware
• Understanding of network tuning for iSCSI / NFS traffic
Storage Specialist
• Familiarity with iSCSI / NFS tuning
• Excellent tuning/troubleshooting with chosen storage
OpenStack Distributions
Sandbox
• DevStack
• RDO
• Fuel
Proof of Concept
• RDO/RHEL OSP
• Fuel
• Piston
• Cloudscaling
• Stackops
• Many others …
Pilot
• RHEL OSP
• Fuel
• Piston
• Cloudscaling
• Stackops
• Many others …
Distribution Selection Criteria
! Price
! Adoption
! Support Offerings
! Installation Simplicity
! Maintainability and Management
! OpenStack release
! Value Added Tools
! Specialized Features
– Storage
– VMware integration
– Quota
– SDN
! Familiarity
Logical Architecture
Object Store
• Swift Proxy
• Container
• Object
• Account
Controller(s)
• All APIs
except Swift
• Neutron
gateway
• Qpid
• MySQL
Jump Box
• Foreman
• Repository
• Heat VM
• Horizon
• SSH
Compute
• Nova compute
• Neutron agent
Block
• iSCSI
• Cinder
IPMI Network
Mgmt Network
Storage Network
Public Network
192.168.103.0/24
Private Network
192.168.1.0/24
Floating IPs
10.10.1.0/24
192.168.102.0/24
192.168.101.0/24
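Before deployment it can help to sanity-check the address plan. The sketch below uses only the five CIDR blocks listed above and checks that none of them overlap. It does not assume which block belongs to which named network.

```python
from ipaddress import ip_network
from itertools import combinations

# CIDR blocks listed on the logical architecture slide.
subnets = [ip_network(cidr) for cidr in (
    "192.168.103.0/24",
    "192.168.1.0/24",
    "10.10.1.0/24",
    "192.168.102.0/24",
    "192.168.101.0/24",
)]

# Any pair that shares addresses would cause routing conflicts.
overlaps = [(a, b) for a, b in combinations(subnets, 2) if a.overlaps(b)]

for net in subnets:
    # Subtract the network and broadcast addresses.
    print(net, "usable hosts:", net.num_addresses - 2)
print("overlapping pairs:", overlaps)
```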
Example Hardware Design
Unit  Segment     Role                 Hardware
42    Network     Switch (IPMI)        Cisco 2xxx
41                Switch (Service)     Arista 7150
40                Switch (Management)  Cisco 3xxx
39    Management  cntr-01              Quanta X12RS
38                cntr-02              Quanta X12RS
37                cntr-03              Quanta X12RS
36                cntr-04              Quanta X12RS
35                cntr-05              Quanta X12RS
34                cntr-06              Quanta X12RS
33    Compute     comp-01              Quanta X12RS
32                comp-02              Quanta X12RS
31                comp-03              Quanta X12RS
30                comp-04              Quanta X12RS
29                comp-05              Quanta X12RS
28                comp-06              Quanta X12RS
27                comp-07              Quanta X12RS
26                comp-08              Quanta X12RS
25                comp-09              Quanta X12RS
24                comp-10              Quanta X12RS
23    KVM         Monitor + KVM        Dell KVM
22
21    Admin       jump-01              Quanta X12RS
20    Block       iscsi-01             Quanta X22RQ
19
18                iscsi-02             Quanta X22RQ
17
16                iscsi-03             Quanta X22RQ
15
14    Object      obj-01               Quanta X22RQ
13
12                obj-02               Quanta X22RQ
11
10                obj-03               Quanta X22RQ
9
8                 obj-04               Quanta X22RQ
7
6                 obj-05               Quanta X22RQ
5
4
3
2
1
! Servers
– Minimal server hardware configuration diversity
– One model for compute, one for storage
– Most people segregate compute, object and block storage from controller nodes
! ToR Switches
– 10Gb networking for public, management and data networks
– 1Gb for IPMI
! Storage will be determined by workload needs
– NFS, iSCSI, Swift and Ceph dominate storage configs
OpenStack PoC Evaluation
Weighting (0 to 5), 5 = most important

1. Compute Resources
This category defines the attributes of the compute resource that are under the control of the end user. The end user should be able to configure the capacity and attributes of a compute unit with minimal friction and deploy the appropriate level of resources without the need to "over provision". The ideal situation is to have granular control over both the workload capacity of the compute unit and the service level. The compute unit should be able to scale easily to meet a variety of workloads, i.e. once the initial compute unit is provisioned you should be able to easily add incremental compute and storage resources.

Weighting  RHEL OSP        SUSE            Criteria
           Rank  Weighted  Rank  Weighted
Compute
4          5     20        3     12        B. Ability to configure private flavors
4          5     20        4     0         C. Ability to configure memory in GB increments from .5 to 128
4          5     20        3     12        D. Ability to configure attached storage in GB increments to 1TB
1          5     5         2     2         F. Ability to meter usage in 1 hour increments
5          5     25        1     5         G. Compute resource configuration changes can be made via the portal or via an API call
5          5     25        2     10        H. Ability to upload images into service catalog
3          5     15        2     6         I.
           5.0   18.6      2.4   6.7       Compute Score
15%        0.8   2.79      0.4   1.01      Allocation of Compute Score
2. Storage Resources
This category defines the attributes of the storage services that are under the control of the end user. Two categories of storage services are listed: object-based storage and block-based storage. Object-based storage is appropriate for storing backups, images, archives, etc. It is used when latency and performance are not top criteria and low-cost, high-volume requirements prevail. Object-based storage is not part of the locally attached file system. Amazon Web Services S3 and OpenStack Swift are examples of object-based storage. Block-based storage refers to the typical file system storage that is directly accessible by the OS and conforms to the file system structure in use by the guest OS. Block-based storage can be delivered using a variety of service levels and is often classified using IOPS, latency or QoS levels.

Weighting  RHEL OSP        SUSE            Criteria
           Rank  Weighted  Rank  Weighted
Object-based storage
2          3     6         1     2         A. Ability to read, write, delete and secure objects ranging in size from 1 byte to 5 terabytes
1          4     4         2     2         B. Objects can be stored over geographically tiered locations
1          5     5         3     3         E. Accessible via APIs
1          3     3         4     4         E. Objects are taggable and versioned
1          2     2         6     6         F. Objects are replicated to multiple locations
Block-based storage
3          2     6         3     9         A. Integrate with compute (attach/detach)
3          5     15        1     3         B. Multiple SLA-based tiers of block storage service
2          5     10        5     10        C. Ability to provide point-in-time snapshot backups
1          5     5         7     7         D. Ability to resize volumes
1          5     5         3     3         E. Available across geographically dispersed locations
1          5     5         1     1         F. Storage has configurable IOPS
1          5     5         2     2         G. Metering is produced on volume/GB hours
           4.1   5.9       3.2   4.3       Storage Score
15%        0.6   0.89      0.5   0.65      Allocation of Storage Score
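The sheet's arithmetic appears to be: weighted score = weighting × rank, category score = the mean of the weighted scores, and allocation = category score × the category's share (15% here). The sketch below applies that reading to the storage rows above. The variable and function names are illustrative only.

```python
# (weighting, RHEL OSP rank, SUSE rank) for each storage criterion in the sheet.
storage_rows = [
    # Object-based storage
    (2, 3, 1), (1, 4, 2), (1, 5, 3), (1, 3, 4), (1, 2, 6),
    # Block-based storage
    (3, 2, 3), (3, 5, 1), (2, 5, 5), (1, 5, 7), (1, 5, 3), (1, 5, 1), (1, 5, 2),
]

CATEGORY_ALLOCATION = 0.15  # share of the overall score given to storage


def category_score(rows, vendor_index):
    """Mean of weighting x rank over all criteria in the category."""
    weighted = [row[0] * row[vendor_index] for row in rows]
    return sum(weighted) / len(weighted)


results = {}
for name, idx in (("RHEL OSP", 1), ("SUSE", 2)):
    score = category_score(storage_rows, idx)
    results[name] = (score, score * CATEGORY_ALLOCATION)
    print(f"{name}: storage score {score:.1f}, "
          f"allocated {score * CATEGORY_ALLOCATION:.2f}")
```

Repeating the same calculation for each category and summing the allocations would give an overall score per distribution, provided the category shares add up to 100%.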