© 2014 IBM Corporation
IBM Platform Computing Cloud Service
Ready to use Platform LSF & Symphony clusters
in the SoftLayer cloud
Agenda
• Mapping client needs to cloud technologies
• Addressing your pain points
• Introducing IBM Platform Computing Cloud Service
• Product features and benefits
• Use cases
HPC cloud characteristics and economics are different than
general-purpose computing
• High-end hardware and special purpose devices (e.g. GPUs) are typically used to
supply the needed processing, memory, network, and storage capabilities
• The performance requirements of technical computing and service-oriented
workloads mean that performance may be impacted in a virtualized cloud
environment, especially when latency or I/O is a constraint
• HPC cluster/grid utilization is usually in the 70-90% range, removing a major
potential advantage of a public cloud service provider for stable workload volumes
HPC Workloads Recommended for Private Cloud
HPC Workloads with Best Potential for Virtualized Public & Hybrid Cloud
IBM’s HPC cloud strategy provides a flexible approach to address
a variety of client needs
• Private Clouds: Evolve existing infrastructure to HPC Cloud to enhance
responsiveness, flexibility, and cost effectiveness
• Hybrid Clouds: Enable integrated approach to improve HPC cost and capability
• Public Clouds: Access additional HPC capacity with variable cost model
Based on HPC Cloud’s potential impact, organizations are evolving their infrastructures to
enable private cloud deployments, exploring hybrid clouds, and considering public clouds.
Are you experiencing any of these pain points?
• Unable to meet business objectives (delay to market, etc.)
• Existing resources insufficient to meet peak compute demand
– Long run times on existing cluster or grid
– No access to local technical computing resources (workstation users)
• Technical resources expensive and time consuming to acquire
• The skills/staff to architect and manage a technical computing infrastructure can
be difficult to acquire
[Chart: Planned Daily Cycle (24 x 365), Financial Services]
[Chart: Planned Project, April to June]
IBM Platform Computing Cloud Service
Making the cloud work for you
Build
• Complete, ready to run
clusters in the cloud
• Add capacity
in hours instead of
months
Manage
• Seamless workload
management,
on-premise and in the
cloud
• Transparent user
experience
Support
• 24X7 cloud operation
support
• Access to technical
computing expertise
when you need it
Protect
• Data encryption,
dedicated physical
machines and network
• Security through
physical isolation
Ready to use Platform LSF & Platform Symphony clusters in the cloud
IBM Platform Computing Cloud Service (SaaS)
IBM Platform LSF
IBM Platform Symphony
SoftLayer, an IBM Company
Infrastructure
24X7 CloudOps Support
Dedicated physical and virtual machine infrastructure as a service
• 13+ data centers
• 17 network PoPs
• Global private network
• Bare metal and virtual machines
190,000+ SERVERS
21,000+ CUSTOMERS
22,000,000+ DOMAINS
Ready to use Platform LSF & Platform Symphony clusters in the cloud
DIFFERENTIATOR: Workload I/O intensity
RATING: [Scale: Low intensity workloads to High intensity workloads]
IBM ADVANTAGES:
• SoftLayer’s architecture outperforms equivalent AWS instances by >50% for
high I/O workloads
DIFFERENTIATOR: Control (APIs, hardware / network configurability)
RATING: [Scale: Low degree of control and customization to High degree of
control and customization]
IBM ADVANTAGES:
• SoftLayer offers hundreds of hardware configurations vs. 14 for AWS
• ~2,000 APIs for SoftLayer vs. ~60 for AWS and none for RAX
DIFFERENTIATOR: Integrated platform of multiple architectures
RATING: [Scale: Single platform to Seamless integration]
IBM ADVANTAGES:
• Unified integration & control panel for multiple cloud architectures
• RAX requires paid bridge, different control interfaces
[Rating charts place AWS and IBM on the scales above]
Non-shared physical machines for added security and performance
• Dedicated and isolated compute environment
• All machine instances are dedicated to the client
• Each cluster is isolated on a VLAN
• Only the VPN gateway has an addressable interface
• All customer data at rest is encrypted on shared file systems
• When machine instances are decommissioned, the disks are scrubbed using
DoD-approved methods
Optimal performance for technical computing apps
Industrial Manufacturing Benchmark – Structural Mechanics
EDA Benchmark (IBM-MESA)
Note: Benchmark results were obtained by IBM and have not yet been externally audited or validated.
Run and supported by dedicated, 24X7 HPC Cloud Operations Team
CloudOps functions
• Pre-provisioning: Provide guidance to the client on enabling VPN, multi-cluster, and
security settings in the client's on-premise environment
• One-time setup testing: Extensive testing of the cluster prior to release to the client
• Flex-up testing: Extensive testing of the cluster at every flex-up, prior to release to the client
• Email alerts prior to flex-down & cluster shutdown operations
• Email alerts in case of any overage (compute hours, download bandwidth)
• Provide billing details of monthly usage including overage details
• Provide support under IBM SLA by experts highly experienced in Platform Computing
products
Value: quality, peace of mind & minimum disruption to business
• Extensive quality checks ensure minimum loss of usage hours & disruptions
• Proactive alerts ensure that in-progress critical jobs are not killed in case of flex-down,
cluster shutdown, or overage
• Highly trained & experienced support ensures smooth on-boarding and minimizes
disruptions
Industry-leading workload management
• 20 years managing distributed scale-out systems with 2000+
customers in many industries
• High performance workload management combined with
intelligent resource scheduling engine
• Unmatched scalability (small clusters to global grids) and
production-proven reliability
• Heterogeneous – manages System x and Power plus 3rd party
systems, virtual and bare metal, accelerators / GPU, cloud, etc.
• Shared services for both compute and data intensive workloads
• Integrated solutions with vertical reference architectures
23 of 30 largest commercial enterprises
Over 5M CPUs under management
60% of top financial services companies
IBM Platform LSF
Overview
Powerful workload management for demanding, distributed and mission-critical high
performance computing environments.
Key Capabilities
• Powerful
- Policy and resource-aware scheduling
- Resource consolidation for optimal performance
- Advanced self-management
• Flexible
- Heterogeneous platform support
- Policy-driven automation
- CLI, web services, APIs
• Scalable
- Thousands of concurrent users and jobs
- Virtualized pool of shared resources
- Flexible control, multiple policies
Client Benefits
• Optimal utilization: reduced infrastructure cost
• Robust capabilities: improved productivity
• High throughput: faster time to results
IBM Platform Symphony
Overview
Low-latency grid management platform for distributed
computing and analytics with sophisticated resource
sharing
Key Capabilities
• Accelerates service-oriented applications
• Extreme app scalability and throughput with very low
latency
• Compute and data-intensive applications on a single
platform
• Sophisticated, hierarchical resource sharing
• Open and flexible: choice of OS, frameworks and
languages
Client Benefits
• Increase performance and analytic result quality
• Reduce IT costs: increase utilization, simplify
application onboarding, reduce administration costs
Low Latency / High throughput
Sub-millisecond, 17,000 tasks per second
Large Scale
10k cores per application, 40k cores per grid
Efficient shared services
Heterogeneous & Open
Linux, Windows, AIX, C/C++, C#, Java, Excel, Python, R
Use case 1 – hybrid cluster
The problem
• Existing resources cannot meet peak demand
• Resources are expensive and time consuming to acquire
• Skills to architect and manage clusters are difficult to find
• Fixed or reduced budgets
• On-premise constraints in space, cooling and power
The solution
• Fully functioning IBM Platform LSF or Symphony clusters are
provisioned on the SoftLayer cloud and connected to the
on-premise cluster, expanding capacity as needed
• Leverage MultiCluster capability for managed forwarding of
jobs from the on-premise cluster to the off-premise cluster
The Value
• Access to additional compute capacity on a temporary basis as needed
• Near-zero wait times
• Reduce costs by paying for only what is used
• Pay for additional capacity as an operating expense
• Fully supported, end-to-end solution, from the on-premise to the on-cloud clusters
• Expected and reliable performance from running technical computing workloads on physical machines
• Transparent access to cloud resources; the end-user experience does not change
Use case 2 – stand-alone cluster in the cloud
The problem
• New and emerging need for technical computing
• Skills to architect and manage clusters are difficult to find
• Resources are expensive and time consuming to acquire
• Inconsistent demand does not justify the investment
The solution
• Fully functioning Platform LSF and Symphony clusters are
provisioned on the SoftLayer cloud providing resources as
needed
The value
• Market-leading Platform LSF and Platform Symphony software
• Access to technical computing resources on a temporary basis
without the need to acquire, install and configure the infrastructure and cluster software
• Keep costs low by paying for only what is used
• Pay for capacity as an operating expense
• Fully supported solution
Is IBM Platform Computing Cloud Service a good fit for you?
Business pain points
• Are you experiencing lost profit due to missed deadlines?
• Do you experience pressure to convert your compute environment capital expense to
operational expense?
• Have you ever missed a deadline or delayed a project because technical computing
resource procurement took too long?
Technology pain points
• Do your users ever scale back their analyses to lower fidelity or less accuracy in order to fit
them into the local compute environment or to a time window?
• Do you regularly, occasionally, or permanently have fewer resources (CPUs, disk, memory,
etc.) than you would like to have to service your users' compute demand?
• Do you experience a large variance in compute resource utilization?
• Have you reached, or will you reach, the capacity of your datacenter(s), and do you need a
plan to grow beyond that capacity?
IBM Platform Computing Cloud Service
Making the Cloud Work for You
Unmatched Expertise
Analytics, Technical Computing, Software, Services and ISV Partnerships
IBM Hybrid Cloud Consolidation
Supporting heterogeneous IBM and non-IBM infrastructure, On SmartCloud and On Premise
Cloud Leadership
Expertise from Client Engagements
Unmatched Capabilities
Policy-driven Workload Management
SoftLayer and Amazon EC2 Products tested
NAME         IaaS Provider           CPU Cores  Memory (GB)  Disk Space (GB)  Physical / Virtual  Hourly Rate (USD)
SL PM        SoftLayer               16         64           1000[1]          Physical            $1.85[2]
SL VM        SoftLayer               8          8            500[3]           Virtual             $0.88
SL PM (ded)  SoftLayer               16         64           1000[1]          Physical            $3.83[5]
EC2 CC2      Amazon EC2 (CC2)        32         60.5         3360             Virtual             $2.40[4]
EC2 2XL      Amazon EC2 (c1.xlarge)  8          7            840              Virtual             $0.58
SL Physical Machine: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
SL Physical Machine (dedicated): Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
SL Virtual Machine: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Amazon CCI2: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Amazon 2XL: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Memory Bandwidth
[Chart: STREAM (higher is better); COPY, SCALE, ADD, TRIAD for SL PM, SL VM, EC2 CCI2, EC2 2XL, SL PM (ded)]
[Chart: STREAM Price Performance (higher is better); COPY, SCALE, ADD, TRIAD for SL PM, SL VM, EC2 CCI2, EC2 2XL, SL PM (ded)]
CPU Performance
[Chart: SuperPI, elapsed time (lower is better); SL PM, SL VM, EC2 CCI2, EC2 2XL, SL PM (ded)]
[Chart: SuperPI Price-Performance, throughput per dollar (higher is better); SL PM, SL VM, EC2 CCI2, EC2 2XL, SL PM (ded)]
Network Bandwidth
[Chart: openMPI bandwidth (Mbits/s) vs. message size (bytes), logarithmic axes; SL VM, EC2 2XL, EC2 CCI2, SL PM, SL PM Dedicated]
Network Latency
[Chart: openMPI Latency, MPI on 2 nodes; SL VM, EC2 2XL, EC2 CCI2, SL PM, SL PM (ded)]
Input / Output Performance
[Chart: I/O Bandwidth - WRITE (higher is better), kB/sec vs. I/O file size (factor of memory size); SL VM, EC2 2XL, EC2 CCI2, SL PM, SL PM Ded]
[Chart: I/O Bandwidth - READ (higher is better), kB/sec vs. I/O file size (factor of memory size); SL VM, EC2 CCI2, EC2 2XL, SL PM, SL PM Ded]
Software Compilation
[Chart: Software Compile Performance, elapsed time in seconds (lower is better); SL VM, SL PM, EC2 2XL, EC2 CCI, SL PM Ded]
[Chart: Software Compile Price-Performance, Runs / $; SL VM, SL PM, EC2 2XL, EC2 CCI, SL PM Ded]
Life Science (BWA)
Life Sciences Benchmark (BWA)
System       Elapsed time (sec, lower is better)  Price performance ($ / run)
SL PM (ded)  20846.481                            22.21
SL PM        26509.368                            7.79
SL VM        25897.44                             6.33
EC2 CCI2     22442.7                              14.96
EC2 2XL      37491                                6.04
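The $ / run figures for the BWA benchmark appear consistent with multiplying each run's elapsed time by the hourly rate listed in the products-tested table. The sketch below works under that assumption (pro-rata hourly billing, which the slides do not state) and also assumes that "EC2 CCI2" in the charts is the "EC2 CC2" row of the table. Function and variable names are illustrative only. The two SL PM rows are left out because their listed rates do not reproduce the slide's figures exactly under this assumption.

```python
# Hourly rates (USD) from the products-tested table and BWA elapsed times
# (seconds) from the benchmark slide. "EC2 CCI2" is ASSUMED to be the
# table's "EC2 CC2" row.
HOURLY_RATE_USD = {"SL VM": 0.88, "EC2 CCI2": 2.40, "EC2 2XL": 0.58}
BWA_ELAPSED_SEC = {"SL VM": 25897.44, "EC2 CCI2": 22442.7, "EC2 2XL": 37491.0}


def cost_per_run(elapsed_sec, hourly_rate_usd):
    """Cost of one run, ASSUMING the hourly rate is billed pro rata."""
    return elapsed_sec / 3600.0 * hourly_rate_usd


def runs_per_dollar(elapsed_sec, hourly_rate_usd):
    """Inverse metric, as used on the 'Runs / $' chart axes."""
    return 1.0 / cost_per_run(elapsed_sec, hourly_rate_usd)


for name, elapsed in BWA_ELAPSED_SEC.items():
    rate = HOURLY_RATE_USD[name]
    print(f"{name}: ${cost_per_run(elapsed, rate):.2f} per run, "
          f"{runs_per_dollar(elapsed, rate):.3f} runs per dollar")
```

Under these assumptions the three rows come out at $6.33, $14.96 and $6.04 per run, matching the slide. The same two functions can be applied to any other benchmark here once its elapsed times are known.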
EDA Benchmark (IBM-MESA)
[Chart: EDA - IBM Mesa, elapsed time in seconds (lower is better); SL PM (ded), SL PM, SL VM, EC2 2XL, EC2 CCI2]
[Chart: EDA - IBM Mesa - Price-Performance, Runs / $; SL PM (ded), SL PM, SL VM, EC2 2XL, EC2 CCI2]
Provisioning Time
[Chart: Provisioning Time (sec), logarithmic scale; SL PM, SL VM, EC2 CCI2, EC2 2XL, SL PM Ded]
Industrial Manufacturing – Structural Mechanics
[Charts: Speedup (relative to EC2 2XL) vs. CPUs for SL PM, EC2 CCI2, SL VM, EC2 2XL, SL PM (ded); panels: One Node - S4D, One Node - S6, Two Nodes - S4D, Two Nodes - S6]
Industrial Manufacturing – CFD
[Chart: OpenFoam Speedup Backplane (higher is better), speedup (relative to EC2 2XL) vs. # cores; SL PM (ded), SL PM, SL VM, EC2 CCI2, EC2 2XL]
[Chart: OpenFoam Speedup Ethernet (higher is better), speedup (relative to EC2 2XL) vs. # cores; SL PM (ded), SL PM, SL VM, EC2 CCI2, EC2 2XL]