Linux Workload Performance on
LinuxONE III / z15
Dr. Dominic Roehm
Technical Lead Systems Performance
June 17th, 2020
Agenda
§ LinuxONE III
§ SMC-D Performance
§ MongoDB Performance
§ Scale-out Performance with NGINX webserver
§ Integrated Accelerator for z Enterprise Data Compression
§ OpenShift Container Platform 4.2 Performance
Systems / LinuxONE / © 2020 IBM Corporation
LinuxONE III
• System Capacity & Performance
• Security
• Reliability
• Cost
• LinuxONE III vs x86
Businesses need the right technology at the best value
§ The focus of business:
§ Managing / processing / making sense of huge amounts of data
§ Securing that data all the time at the highest levels
§ Delivering results in real-time, yet being flexible to adapt in a moment
§ De facto standards for technology:
§ Driven by Linux and open standards
§ 100% compliant, 100% secure
§ Fully scalable and elastic, yet completely resilient and reliable
§ Cost-effective – don’t pay for what you don’t use
Typical platform options
Commodity servers
Advantages:
§ Cheap, commonplace, “good enough”
Disadvantages:
§ Vast numbers of servers make the data center extraordinarily complex to manage
§ Servers continue to run underutilized, with large amounts of wasted resources
§ Virtualization has maxed out – no room to grow
Cloud computing
Advantages:
§ Perceived to deliver faster access to infrastructure, greater scalability, higher availability and faster time-to-market
Disadvantages:
§ Lack of expertise (more complex than anticipated)
§ Lack of security, compliance and governance/control
§ Surprisingly expensive
IBM LinuxONE: An alternative to commodity servers
§ Large, centralized enterprise server
§ Supports large numbers of applications in a single box
§ Scales up/down with ease
§ Resources can be dedicated or shared
§ Exceptional virtualization efficiency
§ 100% Linux
§ Securable to highest levels possible
§ No exposed networks
§ Mean time between failures measured in decades, not months
§ Simplified management
IBM LinuxONE III
Architectural differences – at the chip level
Feature | IBM LinuxONE | Commodity x86 server
Core clock speed | 5.2 GHz (LinuxONE III) | 2.3 GHz (Xeon Gold 6140 – Skylake)
L1 / L2 cache | 128 KB I + 128 KB D / 4 MB I + 4 MB D – per core | 64 KB / 256 KB per core
L3 cache | 256 MB – shared by all active cores on the chip | 24.75 MB – shared by all cores on chip
L4 cache | 940 MB – on separate chip, shared by all active cores | N/A
SMT, SIMD, OOO, HTM | Yes, yes, yes, yes | Yes, yes, yes, yes
Java enhancements | Pause-less garbage collection | N/A
Cryptographic functions | Per core crypto co-processing | Crypto co-processing shared by all cores
Compression functions | Per chip compression unit (IF for zEDC) | QuickAssist Technology
[Figure: Intel Xeon “Skylake” processor chip and IBM LinuxONE processor chip; label: 14 nm SOI chip technology]
Encryption on LinuxONE is much faster than on x86
Disclaimer: Performance results based on IBM internal tests running OpenSSL 1.1.1 Speed AES-256-GCM cipher with 4k buffer size. Results may vary. LinuxONE III configuration: LPAR with 4 dedicated cores, 128 GB memory, SLES 12 SP4 (SMT mode) running OpenSSL Speed with 1-8 threads using OpenSSL 1.1.1c, libica 3.5.0 and OpenSSL-ibmca 2.0.3. x86 configuration: 2 Intel® Xeon® Gold 6140 CPU @ 2.30GHz with Hyperthreading turned on, 128 GB memory, SLES12 SP4 running OpenSSL Speed with 1-8 threads.
OpenSSL 1.1 using the AES-256-GCM cipher provides up to 3.8x better performance per core on a LinuxONE III LPAR versus a compared x86 platform
[Chart labels: 3.6x, 3.8x, 3.6x]
Total Compression Throughput with Integrated Accelerator for zEDC on LinuxONE III
DISCLAIMER: Performance result is extrapolated from IBM internal tests running in a LinuxONE III LPAR with 36 or 39 dedicated cores and 256 GB memory, a z/VM 7.1 instance in SMT mode with 4 guests running SLES 12 SP4. With 36 cores each guest was configured with 18 vCPU. With 39 cores 3 guests were configured with 20 vCPU and 1 guest was configured with 18 vCPU. Each guest was configured with 64 GB memory, had a direct-attached OSA-Express6S adapter, and was running a dockerized NGINX 1.15.9 web server (with patch https://github.com/nginx/nginx/commit/cfa1316368dcc6dc1aa82e3d0b67ec0d1cf7eebb applied) using zlib (based on patch from https://github.com/madler/zlib/pull/410) to compress web pages. The guest images were located on a FICON-attached DS8886. Each NGINX server was driven remotely by a separate x86 blade server with 24 Intel Xeon E5-2697 v2 @ 2.7GHz cores and 256 GB memory, running the wrk2 4.0.0.0 benchmarking tool (https://github.com/giltene/wrk2) with 18 parallel threads and 36 open HTTPS connections. The requested and compressed web pages had a size of 512 MB.
• Integrated Accelerator for z Enterprise Data Compression
Compress up to 275 GB of data per second on a single LinuxONE III server using the Integrated Accelerator for z Enterprise Data Compression
Each NGINX web server compressed on average 13.75 GB of data per second
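The total follows from the per-server average and the number of NGINX servers. A minimal sketch of that arithmetic, assuming 20 NGINX servers (5 LPARs with 4 z/VM guests each, as the setup diagram indicates); the variable names are illustrative only:

```python
# Aggregate compression throughput derived from the per-server average.
# Assumption: 20 NGINX servers, one per z/VM guest (5 LPARs x 4 guests).
LPARS = 5
GUESTS_PER_LPAR = 4
PER_SERVER_GB_PER_S = 13.75  # average per NGINX web server, from the slide

nginx_servers = LPARS * GUESTS_PER_LPAR
total_gb_per_s = nginx_servers * PER_SERVER_GB_PER_S

print(f"{nginx_servers} servers x {PER_SERVER_GB_PER_S} GB/s = {total_gb_per_s} GB/s")
```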
[Diagram: LinuxONE III with five LPARs (36 or 39 cores, 256 GB memory each), each running z/VM 7.1 with four SLES 12 SP4 guests (18 or 20 vCPU, 64 GB memory each) hosting NGINX.]
20 x86 systems running the wrk2 workload driver, one for each z/VM guest
• Most LinuxONE servers ship with two extra cores designated as spares
• In addition, any unused core can act as a spare
• Core failover (called sparing) is transparent to applications
• Spares need not be local on the same chip or in the same drawer
• Any core can fail over to a spare
If a core fails, a spare can be “turned on” without system or program interruption
Typical x86 servers do not have core sparing
[Diagram: processor chips with cores (Core0 to Core9) and a shared L3 cache]
LinuxONE systems never go down because of memory failures
A level of memory protection not found on typical servers
• LinuxONE uses special memory that is designed to eliminate even the most remote failures (due to cosmic radiation)
• Redundant Array of Independent Memory (RAIM)
• Very robust, very cost-effective
• No performance penalty
• Covers memory buses, DIMM connectors, clock failures, etc.
• Zero observable memory failures on systems using RAIM
IBM LinuxONE delivers the highest availability
§ IBM LinuxONE exhibits true fault tolerance
§ Close to 6 9’s availability – far better than traditional x86 servers, and better than converged systems
§ For IBM LinuxONE, the mean time between failures is measured in decades, not months
Maintain system availability even as resources are added or reallocated
• All boxes ship with all cores
• Activate only the number of cores needed
• As demand increases, activate additional cores
• Reallocate cores across VMs and across partitions as business and application needs change
• Optionally, activate cores temporarily and pay only for “on” time (Capacity on Demand)
• Example: Sales cycles may demand extra capacity during specific periods
[Diagram: physical hardware CPUs (active and inactive cores) beneath hypervisor partitions, each partition running multiple Linux guests]
Shared Memory Communications – Direct Memory Access (SMC-D) Performance
• Throughput & Response Time of SMC-D versus Network adapter
• Workload Performance on IBM LinuxONE III
SMC-D connection between LinuxONE III LPARs versus TCP/IP connection to x86 servers
• SMC-D Performance
• Benchmark Setup
• Ran uperf network benchmark with 50 parallel connections and a streaming writes workload profile, i.e. continuously writing data in 30720-byte chunks
1) On 2 LPARs with an SMC-D connection
2) On an LPAR and an x86 server using a 25 Gb TCP/IP connection
• System Stack
• LinuxONE III
• 2 LPARs, each with 4 dedicated cores, 64 GB memory, running SLES 12 SP4 with SMT enabled
• One LPAR connected with a 25 GbE RoCE Express2 network adapter to a 25 Gb network switch
• uperf network benchmark (https://github.com/uperf/uperf/tree/09fbbdb93e4f0e6569bd532ffd5a4d5969d3eb84)
• x86
• 4 Intel® Xeon® Gold 6126 CPU @ 2.60GHz with Hyperthreading turned on, 64 GB of memory, running SLES 12 SP4
• Connected with a 25 Gb Ethernet Adapter to a 25 Gb network switch
• uperf network benchmark (https://github.com/uperf/uperf/tree/09fbbdb93e4f0e6569bd532ffd5a4d5969d3eb84)
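To make the streaming writes profile concrete, the following is a rough sketch of the access pattern only: several parallel connections, each writing fixed 30720-byte chunks. It is not uperf and does not use SMC-D; it runs over plain TCP on the loopback interface, and the connection count and chunk cap are hypothetical values chosen so the example finishes quickly.

```python
import socket
import threading

CHUNK_SIZE = 30720      # bytes per write, as in the benchmark description
CONNECTIONS = 4         # the benchmark used 50; kept small for a quick local run
CHUNKS_PER_CONN = 100   # hypothetical cap so the sketch terminates


def drain(conn, totals, lock):
    """Receive until the peer closes, then record the byte count."""
    received = 0
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            received += len(data)
    with lock:
        totals.append(received)


def run_streaming_writes():
    """Stream fixed-size chunks over several loopback TCP connections."""
    totals, lock = [], threading.Lock()
    server = socket.create_server(("127.0.0.1", 0))  # port 0: pick a free port
    port = server.getsockname()[1]

    def accept_all():
        workers = []
        for _ in range(CONNECTIONS):
            conn, _addr = server.accept()
            worker = threading.Thread(target=drain, args=(conn, totals, lock))
            worker.start()
            workers.append(worker)
        for worker in workers:
            worker.join()

    acceptor = threading.Thread(target=accept_all)
    acceptor.start()

    def writer():
        payload = b"\0" * CHUNK_SIZE
        with socket.create_connection(("127.0.0.1", port)) as sock:
            for _ in range(CHUNKS_PER_CONN):
                sock.sendall(payload)  # one streaming write of CHUNK_SIZE bytes

    writers = [threading.Thread(target=writer) for _ in range(CONNECTIONS)]
    for thread in writers:
        thread.start()
    for thread in writers:
        thread.join()
    acceptor.join()
    server.close()
    return sum(totals)


if __name__ == "__main__":
    print(run_streaming_writes(), "bytes received")
```

Dividing the bytes received by the elapsed time would give a throughput figure, but numbers from this sketch are not comparable to the uperf results reported here.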
[Diagram: (1) LPAR1 and LPAR2 on LinuxONE III (each 4 cores, 64 GB memory, SLES 12 SP4, uperf) linked by an SMC-D connection; (2) an LPAR linked through a network switch to an x86 server (4 cores, 64 GB memory, SLES 12 SP4, uperf) by a 25 Gb TCP/IP connection]
SMC-D connection between LinuxONE III LPARs versus TCP/IP connection to x86 servers
DISCLAIMER: Performance results based on IBM internal tests running network benchmark uperf (https://github.com/uperf/uperf/tree/09fbbdb93e4f0e6569bd532ffd5a4d5969d3eb84) with streaming writes workload profile (30 KB payload) and 50 parallel connections on both LPARs using an SMC-D connection versus using LPAR and x86 server with a 25 Gb TCP/IP connection. Results may vary. LinuxONE III configuration: 2 LPARs, each with 4 dedicated cores, 64 GB memory, SLES 12 SP4 (SMT mode) running uperf.
25 GbE RoCE Express2 network adapter attached to one LPAR. x86 configuration: 4 Intel® Xeon® Gold 6126 CPU @ 2.6GHz with Hyperthreading turned on, 64 GB memory, 25 Gb Ethernet Adapter, SLES 12 SP4 running uperf. The LinuxONE III was connected over one 25 Gb network switch to the x86 server.
• SMC-D Performance
Get up to 7.3x more throughput and up to 7.3x lower latency using an SMC-D connection between two LinuxONE III LPARs for communications compared to using a 25 Gb TCP/IP connection between a LinuxONE III LPAR and the compared x86 server
[Chart labels: 7.3x, 7.3x]
MongoDB Performance
• MongoDB performance on LinuxONE III and x86 systems
• MongoDB scale-up on LinuxONE III vs. scale-out on x86 cluster with replication
MongoDB Performance LinuxONE III and x86 Systems
§ Benchmark Setup
• YCSB workload driver locally → emulates parallel client access to the DB
• read-only (100% read)
• write-heavy (50% write, 50% read)
• MongoDB database size 50 GB with 1 KB records
• no replication / sharding
§ System Stack
• LinuxONE III: LPAR with 2-8 dedicated cores, 128 GB memory, running SLES 12 SP4 with SMT enabled
• 120 GB LUN on FlashSystem 900
• MongoDB 4.0.6
• x86: 2-8 Intel® Xeon® Gold 6140 CPU @ 2.30GHz w/ Hyperthreading turned on, 128 GB memory, SLES 12 SP4
• 2 TB local RAID5 SSD storage
• MongoDB 4.0.6
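The two workload mixes and the database sizing can be illustrated with a small sketch. This is not YCSB itself; the function below is a hypothetical stand-in that draws operations with a given read proportion, and the record count assumes decimal units (1 GB = 10^9 bytes, 1 KB = 1000 bytes) and ignores storage overhead.

```python
import random


def operation_mix(read_proportion, operations, seed=0):
    """Count reads and writes drawn with the given read proportion."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = {"read": 0, "write": 0}
    for _ in range(operations):
        op = "read" if rng.random() < read_proportion else "write"
        counts[op] += 1
    return counts


# Workload mixes named on the slide
read_only = operation_mix(1.0, 10_000)    # 100% read
write_heavy = operation_mix(0.5, 10_000)  # 50% write, 50% read

# Approximate record count for a 50 GB database of 1 KB records
# (assumption: decimal units, no per-record overhead)
DB_SIZE_BYTES = 50 * 10**9
RECORD_SIZE_BYTES = 1000
approx_records = DB_SIZE_BYTES // RECORD_SIZE_BYTES

print(read_only, write_heavy, approx_records)
```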
[Diagram, LinuxONE III setup: two x86 blades, each running two YCSB instances with 64 threads each, connected over 10 Gbit/s links to a LinuxONE III LPAR running MongoDB with journaling on FlashSystem 900 storage.]
[Diagram, x86 setup: one x86 blade running four YCSB instances with 64 threads each, connected over a 10 Gbit/s link to an x86 server running MongoDB with journaling on local SSD.]
MongoDB Performance LinuxONE III and x86 Systems
Run the YCSB 0.15.0 read-only benchmark with up to 2.1x more throughput and the YCSB 0.15.0 write-heavy benchmark with up to 2x more throughput on MongoDB 4.0.6 on 2 cores on a LinuxONE III LPAR versus a compared x86 platform
Disclaimer: Performance results based on IBM internal tests running YCSB 0.15.0 (read-only, write-heavy) from remote x86 servers on MongoDB Enterprise Release 4.0.6. MongoDB database size was 50 GB and record size 1 KB. Results may vary. LinuxONE III configuration:
LPAR with 2 dedicated cores, 128 GB memory, 120 GB FlashSystem 900 storage, SLES 12 SP4 (SMT mode) running MongoDB, driven remotely by YCSB using 2 x86 servers with total 256 threads. x86 configuration: 2 Intel® Xeon® Gold 6140 CPU @ 2.30GHz with Hyperthreading turned on, 128 GB memory, 2 TB local RAID5 SSD storage, SLES12 SP4 running MongoDB, driven remotely by YCSB using 1 x86 server with total 256 threads.
[Chart labels: 2.1x, 1.7x, 1.8x, 2x, 1.8x, 1.9x]
MongoDB Scale-up on LinuxONE III vs. Scale-out on x86
§ Setup on LinuxONE III
• 1TB aggregated database size
• 3-node replica set
• Journaling turned on
• Database to memory ratio 4:1
• No sharding on LinuxONE III
• 2 OSA cards
• 1 primary + 2 secondaries in 3 LinuxONE LPARs
• FlashSystem 900 storage on LinuxONE III
§ Benchmark Setup
– 3x LinuxONE III LPARs, 4 cores for the primary and 1 or 2 cores per secondary, 128 GB memory per LPAR, FlashSystem 900 storage
• 1 primary (1 TB)
• 2 replicas (each 1 TB)
– YCSB Benchmark read-mostly
– MongoDB 4.0.6 on SLES 12 SP4, write concern “majority”
– 2 driving blades, each with 4 YCSB instances of 64 threads each, 512 threads in total
[Diagram, LinuxONE III setup: x86 blades 0 and 1, each running YCSB instances 0 to 3 with 64 threads each, connected over 10 Gbit/s links to three LinuxONE III LPARs running MongoDB (shard #0 primary, shard #0 secondary #0, shard #0 secondary #1) on FlashSystem 900 storage.]
MongoDB Scale-up on LinuxONE III vs. Scale-out on x86
§ Setup on x86
• 1TB aggregated database size
• 3-node replica set
• Database to memory ratio 6:1
• Sharding on x86
• 4 shards + 8 secondaries on 4 x86 servers
• Local SSD storage on x86
§ Benchmark Setup
– 5 x86 Skylake servers, each with 12 cores, 128 GB memory, local SSDs (~ 1TB)
• Each server hosting 1 shard (256 GB) and 2 replicas (2x 256 GB)
– YCSB Benchmark read-mostly
– MongoDB 4.0.6 (or newer) on SLES 12 SP4, write concern “majority”
– 2 driving blades, each with 4 YCSB instances of 64 threads each, 512 threads in total
[Diagram, x86 cluster configuration: x86 blades 0 and 1, each running YCSB instances 0 to 3 with 64 threads each, drive a router (mongos) in front of x86 servers with local SSDs; each of the shard primaries #0 to #3 is co-located with secondaries of other shards.]
MongoDB Scale-up on LinuxONE III vs. Scale-out on x86
Disclaimer: Performance results based on IBM internal tests running YCSB 0.10.0 benchmark (read-mostly) on MongoDB Enterprise Release 4.0.6 with 3-node replication. On LinuxONE III MongoDB was setup without sharding. On x86 MongoDB was setup with four shards. Results may vary. x86 config: 5 Intel® Xeon® Gold 6140 CPU @ 2.30GHz with Hyperthreading turned on, 128 GB memory, 2 TB local RAID5 SSD storage, SLES12 SP4 running MongoDB, driven remotely by YCSB using 5 x86 server with total 512 threads LinuxONE III configuration: LPAR with 4 dedicated cores and 2 LPARs with each 1 core, each with SMT and 128 GB memory, 5 TB FlashSystem 900 storage, SLES 12 SP4 (SMT mode) running MongoDB, driven remotely by YCSB using 4 x86 servers with total 512 threads.
Run the Yahoo Cloud Serving Benchmark (YCSB) on MongoDB without sharding on IBM LinuxONE III with 6 cores in total and achieve the same throughput as on MongoDB with 4 shards on compared x86 systems with 60 cores in total, which provides a 10:1 core consolidation ratio in favor of LinuxONE III
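The core counts behind the 10:1 ratio can be reproduced from the configurations stated in the setup and disclaimer: five x86 servers with 12 cores each, and one LinuxONE III LPAR with 4 cores plus two LPARs with 1 core each. A short sketch of that arithmetic, with illustrative variable names:

```python
# Core consolidation ratio from the stated configurations.
X86_SERVERS = 5
X86_CORES_PER_SERVER = 12
LINUXONE_LPAR_CORES = [4, 1, 1]  # primary, secondary, secondary

x86_cores = X86_SERVERS * X86_CORES_PER_SERVER
linuxone_cores = sum(LINUXONE_LPAR_CORES)
ratio = x86_cores / linuxone_cores

print(f"{x86_cores} x86 cores : {linuxone_cores} LinuxONE III cores = {ratio:.0f}:1")
```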
MongoDB Scale-up on LinuxONE III vs. Scale-out on x86
Run the Yahoo Cloud Serving Benchmark (YCSB) on MongoDB without sharding on IBM LinuxONE III with up to 3.7x better read latency and 2.4x better write latency than on MongoDB with four shards on compared x86 systems
[Chart labels: 2.4x, 3.7x]
Disclaimer: Performance results based on IBM internal tests running YCSB 0.10.0 benchmark (read-mostly) on MongoDB Enterprise Release 4.0.6 with 3-node replication. On LinuxONE III MongoDB was setup without sharding. On x86 MongoDB was setup with four shards. Results may vary. x86 config: 5 Intel® Xeon® Gold 6140 CPU @ 2.30GHz with Hyperthreading turned on, 128 GB memory, 2 TB local RAID5 SSD storage, SLES12 SP4 running MongoDB, driven remotely by YCSB using 5 x86 server with total 512 threads LinuxONE III configuration: LPAR with 4 dedicated cores and 2 LPARs with each 1 core, each with SMT and 128 GB memory, 5 TB FlashSystem 900 storage, SLES 12 SP4 (SMT mode) running MongoDB, driven remotely by YCSB using 4 x86 servers with total 512 threads.
• Scale-up Performance
Scale-out Performance & Secure Execution
• Scale-out with Docker under KVM on LinuxONE III versus x86
• Scale-out under KVM with Secure Execution
• Overhead of Secure Execution
• Workload Performance on IBM LinuxONE III
Scale-out under KVM on LinuxONE III versus x86 Skylake
DISCLAIMER: Performance result is extrapolated from IBM internal tests running 980 NGINX Docker containers in a LinuxONE III LPAR and bare-metal on a x86 server. LinuxONE III measurement configuration: LPAR with 1 dedicated core, 16 GB memory, running SLES 12 SP4 (SMT mode), Docker 18.09.6, NGINX 1.15.9. x86 measurement configuration: 1 Intel® Xeon® Gold 6126 CPU @ 2.60 GHz with Hyperthreading turned on, 16 GB memory, running SLES 12 SP4, Docker 18.09.6, NGINX 1.15.9. Based on the measurement results it is extrapolated that a LinuxONE III server with 190 cores and 40 TB memory can run 2.469 million NGINX Docker containers if configured with 20 LPARs, each having 9 cores, 2 TB memory, and running a KVM 2.11.2 instance with 126 KVM guests, each configured with 2 vCPUs, 16 GB memory, and running 980 dockerized NGINX web server. Based on the measurement results it is extrapolated that a x86 server with 8 Intel® Xeon®
Platinum 8156 processors (32 cores in total) and 6 TB memory can run 376 thousand NGINX Docker containers if configured with KVM 2.11.2 with 384 KVM guests, each configured with 2 vCPUs, 16 GB memory, and running 980 dockerized NGINX web server. Results may vary.
• Scale-out Performance
Run up to 6.6x more Docker containers under KVM on a LinuxONE III system versus a compared x86 platform
[Diagram: LinuxONE III (190 cores, 40 TB memory) with LPAR 1 through LPAR 20 (9 cores, 2 TB memory each), each running KVM with 126 KVM guests (2 vCPU, 16 GB memory), guests 1 through 2520 in total, each guest running 980 NGINX web servers; compared x86 platform (32 cores, 6 TB memory) running KVM with guests 1 through 384 (2 vCPU, 16 GB memory), each running 980 NGINX web servers.]
2.4 million Docker containers on LinuxONE III w/ 40 TB memory versus 376 thousand Docker containers on an x86 server w/ 6 TB memory
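The extrapolated container counts and the 6.6x ratio follow from the guest counts in the disclaimer. The sketch below reproduces the arithmetic and checks that the guests fit into the stated memory, assuming binary units (1 TB = 1024 GB); variable names are illustrative.

```python
# Extrapolated container counts from the configurations in the disclaimer.
CONTAINERS_PER_GUEST = 980
GUEST_MEMORY_GB = 16

# LinuxONE III: 20 LPARs, each with 126 KVM guests and 2 TB memory
LINUXONE_LPARS = 20
GUESTS_PER_LPAR = 126
linuxone_guests = LINUXONE_LPARS * GUESTS_PER_LPAR
linuxone_containers = linuxone_guests * CONTAINERS_PER_GUEST

# x86: 384 KVM guests in 6 TB memory
X86_GUESTS = 384
x86_containers = X86_GUESTS * CONTAINERS_PER_GUEST

ratio = linuxone_containers / x86_containers

# Memory used by guests (assumption: 1 TB = 1024 GB)
lpar_guest_memory_gb = GUESTS_PER_LPAR * GUEST_MEMORY_GB
x86_guest_memory_gb = X86_GUESTS * GUEST_MEMORY_GB

print(f"LinuxONE III: {linuxone_containers:,} containers in {linuxone_guests} guests")
print(f"x86:          {x86_containers:,} containers in {X86_GUESTS} guests")
print(f"ratio:        {ratio:.1f}x")
```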
Scale-out with KVM guests on LinuxONE III using Secure Execution
• DISCLAIMER: Performance result is extrapolated from IBM internal tests running in a LinuxONE III LT1 LPAR with 9 dedicated cores and 144 GB memory, an Ubuntu 20.04 KVM instance in SMT mode with 72 guests using Secure Execution. Each guest was configured with 1 vCPU, 8 GB memory and running a dockerized NGINX 1.15.9 web server on Ubuntu 20.04. Each NGINX server was driven remotely by an instance of the wrk2 4.0.0.0 benchmarking tool (https://github.com/giltene/wrk2) with 2 parallel threads and 8 open HTTPS connections. The transferred web pages had a size of 644 bytes. KVM guests were stored using qcow2 images. Results may vary.
• Security
Scale up to 1500 KVM guests running a web page serving workload on a LinuxONE III server using IBM Secure Execution
In total 8.3 million HTTPS requests/sec, 5.5k HTTPS requests/sec per Secure Execution KVM guest
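The total request rate is an extrapolation from the per-guest rate. A brief sketch of the arithmetic, assuming the 21 LPARs with 72 guests each that the setup diagram shows (1512 guests, which the headline rounds to 1500); names are illustrative.

```python
# Extrapolated HTTPS request rate across all Secure Execution guests.
LPARS = 21                 # assumption: taken from the setup diagram
GUESTS_PER_LPAR = 72
REQUESTS_PER_GUEST = 5500  # 5.5k HTTPS requests/sec per guest

total_guests = LPARS * GUESTS_PER_LPAR
requests_per_lpar = GUESTS_PER_LPAR * REQUESTS_PER_GUEST
total_requests = total_guests * REQUESTS_PER_GUEST

print(f"{total_guests} guests, {requests_per_lpar:,} req/s per LPAR, "
      f"{total_requests / 1e6:.1f} million req/s in total")
```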
[Diagram: LinuxONE III with LPAR 1 through LPAR 21 (9 cores, 144 GB memory each), each running Ubuntu 20.04 KVM with 72 guests (1 vCPU, 8 GB memory) that host an NGINX web server using Secure Execution, guests 1 through 1512 in total; x86 servers run wrk2 workload drivers 1 through 1512, one per guest.]
Overhead of Secure Execution on LinuxONE III
• Security
• Benchmark Setup
• Ran 72 KVM guests with and without using Secure Execution
• Each KVM guest was configured with 1 vCPU, 8 GB memory running an NGINX 1.15.9 web server on Ubuntu 20.04
• Each NGINX web server was driven remotely by a wrk2 benchmark instance
• The transferred web pages had a size of 644 bytes
• System Stack
• LinuxONE III
• LPAR with 9 dedicated cores, 144 GB memory, running Ubuntu 20.04 with SMT enabled
• 1 TB FlashSystem 900 storage
In total 396k HTTPS requests/sec, 5.5k HTTPS requests/sec per KVM guest
[Diagram: LinuxONE III LPAR (9 cores, 144 GB memory) running Ubuntu 20.04 KVM with guests 1 through 72 (1 vCPU, 8 GB memory each), each hosting an NGINX web server; an x86 server runs the wrk2 workload drivers.]
Overhead of Secure Execution on LinuxONE III
• DISCLAIMER: Performance results based on IBM internal tests running in a LinuxONE III LT1 LPAR with 9 dedicated cores and 144 GB memory, an Ubuntu 20.04 KVM instance in SMT mode with 72 guests using Secure Execution versus not using Secure Execution. Each guest was configured with 1 vCPU, 8 GB memory and running a dockerized NGINX 1.15.9 web server on Ubuntu 20.04. Each NGINX web server was driven remotely by an instance of the wrk2 4.0.0.0 benchmarking tool (https://github.com/giltene/wrk2) with 2 parallel threads and 8 open HTTPS connections. The transferred web pages had a size of 644 bytes. KVM guests were stored using qcow2 images. Results may vary.
• Security
Run NGINX web servers on KVM guests on LinuxONE III with only 6% CPU overhead when using IBM Secure Execution
[Chart label: 6%]
Integrated Accelerator for z Enterprise Data Compression
• Speed up versus software compression
• MongoDB backup
• Compressing HTTPS data before encryption
• Workload Performance on IBM LinuxONE III
Compression Time with Integrated Accelerator for zEDC versus Software Compression on LinuxONE III
• Integrated Accelerator for z Enterprise Data Compression
• Benchmark Setup
• Ran minigzip benchmark to compress source data files using
• zlib exploiting the Integrated Accelerator for zEDC
• zlib -1 software compression
• Source data files were taken from the Large Corpus (http://corpus.canterbury.ac.nz/descriptions)
• Canterbury.tar contained all files from all corpora
• System Stack
• LinuxONE III
• LPAR with 4 dedicated cores and 64 GB memory running SLES 12 SP4 with SMT enabled
• minigzip benchmark from the dfltcc branch of zlib (https://github.com/iii-i/zlib/tree/dfltcc-20190708)
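For readers who want a feel for what a level-1 compression timing looks like, here is a minimal sketch using Python's standard `gzip` module rather than minigzip. The sample data is synthetic and hypothetical, not the corpus files, and whether any hardware accelerator is used depends on the zlib build underneath the interpreter; the sketch makes no claim about that.

```python
import gzip
import time


def compress_level1(data: bytes):
    """Compress with gzip level 1 and return (compressed bytes, seconds)."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=1)
    elapsed = time.perf_counter() - start
    return compressed, elapsed


# Synthetic, highly repetitive sample data (assumption: stands in for a
# source data file; real corpus files compress very differently)
sample = b"LinuxONE III compression sample line\n" * 50_000

compressed, elapsed = compress_level1(sample)
restored = gzip.decompress(compressed)

print(f"input:      {len(sample):,} bytes")
print(f"compressed: {len(compressed):,} bytes")
print(f"time:       {elapsed:.4f} s")
```

Comparing the elapsed time of two runs on identical input, one with accelerated zlib and one with software zlib, is the kind of measurement the speedup factors on these slides describe.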
[Diagram: source data files are compressed with minigzip -1 (software-based compression) and with minigzip -1 using the Integrated Accelerator for zEDC, each producing compressed data files.]
Compression Time with Integrated Accelerator for zEDC versus Software Compression on LinuxONE III
DISCLAIMER: Performance results based on IBM internal tests running the minigzip benchmark with compression level -1 from the dfltcc branch of zlib (downloaded from https://github.com/iii-i/zlib/tree/dfltcc-20190708). Source data files were taken from the Large Corpus (downloaded from http://corpus.canterbury.ac.nz/descriptions). Results may vary. LinuxONE III configuration: LPAR with 4 dedicated cores, 64 GB memory, 40 GB DASD storage, SLES 12 SP4 (SMT mode).
• Integrated Accelerator for z Enterprise Data Compression
Compress data with zlib on LinuxONE III with 4 cores up to 42x faster with Integrated Accelerator for zEDC compared to using software compression
[Chart labels: 33x, 30x, 24x, 42x]
MongoDB Dump Performance on LinuxONE III versus x86 Skylake
• Integrated Accelerator for z Enterprise Data Compression
• Benchmark Setup
• Ran mongodump
• with software compression (pigz -1) on x86
• with gzip exploiting the Integrated Accelerator for zEDC on LinuxONE III
• MongoDB database size 355 GB
• System Stack
• LinuxONE III
• LPAR with 1-8 dedicated cores, 1.5 TB memory, running RHEL 7.6 with SMT enabled
• Database located on IBM DS8000 storage
• MongoDB 4.0.6
• gzip based on source code level https://git.savannah.gnu.org/git/gzip.git commit 7a6f9c9c3267185a299ad178607ac5e3716ab4a5
• x86
• 1-8 Intel® Xeon® Gold 6140 CPU @ 2.30GHz w/ Hyperthreading turned on, 1.5 TB memory running RHEL 7.6
• Database located on IBM DS8000 storage
• MongoDB 4.0.6
• pigz
[Diagram: MongoDB on IBM System Storage DS8000 dumped with pigz -1 (software-based compression) versus MongoDB on IBM System Storage DS8000 dumped using the Integrated Accelerator for zEDC.]
MongoDB Dump Performance on LinuxONE III versus x86 Skylake
DISCLAIMER: Performance results based on IBM internal tests running database dump with compression on MongoDB 4.0.6 on a database of size 355 GB using pigz on x86 and gzip on LinuxONE III. On x86 pigz was invoked with option -1 (compression level 1) and used software compression. On LinuxONE III gzip was invoked with option -1 and exploited the Integrated Accelerator for z Enterprise Data Compression. The database dump file size on LinuxONE III is 20% bigger than on x86. Results may vary. LinuxONE III configuration: LPAR with 2 dedicated cores, 1.5 TB memory, RHEL 7.6 in SMT mode, database located on IBM DS8000 storage. x86 configuration: 2 Intel® Xeon®
Gold 6140 CPU @ 2.30GHz with Hyperthreading turned on, 1.5 TB memory, RHEL 7.6, database located on IBM DS8000 storage.
• Integrated Accelerator for z Enterprise Data Compression
Perform database dump up to 6.5x faster and with up to 5.5x less CPU time for MongoDB on 2 cores on a LinuxONE III LPAR using the Integrated Accelerator for z Enterprise Data Compression versus a compared x86 platform using software compression
[Chart labels: 6.1x, 6.5x, 5x, 3.1x; 6x, 5.5x, 5.7x, 6.4x]
Compressing HTTPS Data before Encryption on LinuxONE III
• Integrated Accelerator for z Enterprise Data Compression
• Benchmark Setup
• Ran wrk2 4.0.0.0 benchmarking tool remotely on x86 blade server with fixed transaction rate of 5 HTTPS requests / core against NGINX web server, compressing transaction data before encryption using
• zlib exploiting the Integrated Accelerator for zEDC
• no compression
• Data transmitted via NGINX webserver was the Silesia compression corpus http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
• System Stack
• LinuxONE III
• LPAR with 2-8 dedicated cores, 32 GB memory, 40 GB DASD storage, running SLES 12 SP4 with SMT enabled
• NGINX 1.15.9 with patch https://github.com/nginx/nginx/commit/cfa1316368dcc6dc1aa82e3d0b67ec0d1cf7eebb
• zlib with NXU support https://github.com/madler/zlib/pull/410
[Diagram: x86 server running the wrk2 workload driver against an NGINX web server in a LinuxONE III LPAR (2-8 cores, 32 GB memory).]
Compressing HTTPS Data before Encryption on LinuxONE III
DISCLAIMER: Performance results based on IBM internal tests running the wrk2 4.0.0.0 benchmarking tool (https://github.com/giltene/wrk2) remotely with a fixed transaction rate against an NGINX 1.15.9 web server exploiting zlib (https://github.com/madler/zlib/pull/410) to compress transaction data before encryption versus not compressing transaction data before encryption. Data transmitted via NGINX webserver was the Silesia compression corpus (http://sun.aei.polsl.pl/~sdeor/index.php?page=Silesia). Results may vary. LinuxONE III configuration: LPAR with 8 dedicated cores, 32 GB memory, 40 GB DASD storage, 200 GB FlashSystem 900 storage, SLES12 SP4 (SMT mode), running NGINX 1.15.9 with patch https://github.com/nginx/nginx/commit/cfa1316368dcc6dc1aa82e3d0b67ec0d1cf7eebb.
• Integrated Accelerator for z Enterprise Data Compression
By compressing transaction data with the Integrated Accelerator for z Enterprise Data Compression prior to encryption, run secure web transactions with up to 2.7x lower latency, up to 1.8x less CPU utilization, and 2.6x less network bandwidth consumption on a LinuxONE III compared to running the transactions with encryption alone
[Chart labels: latency 2.2x, 2.3x, 2.7x; CPU utilization 1.6x, 1.7x, 1.8x; network bandwidth 2.6x, 2.6x, 2.6x]
Compressing HTTPS Data before Encryption with Integrated Accelerator for zEDC versus Software Compression
• Integrated Accelerator for z Enterprise Data Compression
• Benchmark Setup
• Ran wrk2 4.0.0.0 benchmarking tool remotely on x86 blade server with fixed transaction rate of 5 HTTPS requests / core against NGINX web server, compressing transaction data before encryption using
• zlib exploiting the Integrated Accelerator for zEDC
• zlib -1 software compression
• Data transmitted via NGINX webserver was the Silesia compression corpus http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
• System Stack
• LinuxONE III
• LPAR with 2-8 dedicated cores, 32 GB memory, 40 GB DASD storage, running SLES 12 SP4 with SMT enabled
• NGINX 1.15.9 with patch https://github.com/nginx/nginx/commit/cfa1316368dcc6dc1aa82e3d0b67ec0d1cf7eebb
• zlib with NXU support https://github.com/madler/zlib/pull/410
[Diagram: x86 server running the wrk2 workload driver against an NGINX web server in a LinuxONE III LPAR (2-8 cores, 32 GB memory).]
Compressing HTTPS Data before Encryption with Integrated Accelerator for zEDC versus Software Compression
DISCLAIMER: Performance results based on IBM internal tests running the wrk2 4.0.0.0 benchmarking tool (https://github.com/giltene/wrk2) remotely with a fixed transaction rate against an NGINX 1.15.9 web server exploiting zlib (https://github.com/madler/zlib/pull/410) to compress transaction data before encryption versus zlib -1 software compression. Data transmitted via NGINX webserver was the Silesia compression corpus (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia). Results may vary. LinuxONE III configuration: LPAR with 4 dedicated cores, 32 GB memory, 40 GB DASD storage, 200 GB FlashSystem 900 storage, SLES12 SP4 (SMT mode), running NGINX 1.15.9 with patch https://github.com/nginx/nginx/commit/cfa1316368dcc6dc1aa82e3d0b67ec0d1cf7eebb.
• Integrated Accelerator for z Enterprise Data Compression
Up to 30x lower latency and up to 28x less CPU utilization on LinuxONE III by compressing secure web transaction data before encryption using the Integrated Accelerator for z Enterprise Data Compression instead of using software compression
Chart data labels (three measurement points per metric): latency 30x, 30x, 12x; CPU utilization 26x, 28x, 27x
OpenShift Container Platform 4.2
• Acme Air throughput versus x86
• Acme Air scaling efficiency on z15
• Workload Performance on IBM LinuxONE III
Acme Air Performance on OpenShift Container Platform 4.2 on LinuxONE III vs. x86 Skylake
• OpenShift Container Platform (OCP)
• Benchmark Setup
• 3 OpenShift Container Platform (OCP) Master and 3 Worker nodes on LinuxONE III under z/VM versus on x86 under KVM
• Acme Air microservice benchmark (https://github.com/blueperf/acmeair-mainservice-java) instances placed manually on the OCP Worker nodes such that each OCP Worker node ran the same number of instances
• Acme Air instances were driven remotely from 3 x86 servers running JMeter 5.2.1
• System Stack
• LinuxONE III
• LPAR with 4 dedicated cores, 64 GB memory, RHEL 8.1 (SMT mode), running the OCP Proxy server
• LPAR with 30 dedicated cores, 160 GB memory, DASD storage, running z/VM 7.1
• 3 guests with 4 vCPUs, 16 GB memory, each running an OCP Master
• 3 guests with 16 vCPUs, 32 GB memory, each running an OCP Worker
• OpenShift Container Platform (OCP) 4.2.19
• x86
• 4 Intel® Xeon® Gold 6126 CPU @ 2.60GHz w/ Hyperthreading turned on, 64 GB memory, RHEL 8.1, running the OCP Proxy server
• 30 Intel® Xeon® Gold 6140 CPU @ 2.30GHz w/ Hyperthreading turned on, 160 GB memory, running KVM on RHEL 8.1
• 3 guests with 4 vCPUs, 16 GB memory, each running an OCP Master
• 3 guests with 16 vCPUs, 32 GB memory, each running an OCP Worker
• OpenShift Container Platform (OCP) 4.2.19
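The guest sizes listed above can be checked against the host resources. A minimal Python sketch of that accounting, assuming two hardware threads per core (the slide states Hyperthreading for the x86 host but does not state the SMT setting of the z/VM LPAR explicitly); the class and variable names are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuestGroup:
    role: str
    count: int
    vcpus: int
    memory_gb: int


# Guest definitions taken from the System Stack list (identical on both platforms).
guests = [
    GuestGroup("OCP Master", count=3, vcpus=4, memory_gb=16),
    GuestGroup("OCP Worker", count=3, vcpus=16, memory_gb=32),
]

HOST_CORES = 30
HOST_MEMORY_GB = 160
THREADS_PER_CORE = 2  # assumption: SMT-2 / Hyperthreading enabled on the host

total_vcpus = sum(g.count * g.vcpus for g in guests)
total_memory_gb = sum(g.count * g.memory_gb for g in guests)
host_threads = HOST_CORES * THREADS_PER_CORE

print(f"vCPUs defined: {total_vcpus} of {host_threads} host threads")
print(f"Guest memory:  {total_memory_gb} GB of {HOST_MEMORY_GB} GB")
```

Under that assumption the six guests define as many vCPUs as the host has hardware threads, and guest memory stays below the host total.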
Diagram (same layout on both platforms): x86 servers 1 - 3 running the JMeter workload driver drive the Proxy / Balancer (LinuxONE III LPAR or x86 server, 4 cores, 64 GB memory), which fronts the OCP cluster under z/VM 7.1 in an LPAR on the zHypervisor (LinuxONE III) or under KVM on RHEL 8.1 (compared x86 platform), each with 30 cores and 160 GB memory
Guests 1 - 3, each 16 vCPU, 32 GB memory: OCP Worker with Acme Air instances
Guests 4 - 6, each 4 vCPU, 16 GB memory: OCP Master
Acme Air Performance on OpenShift Container Platform 4.2 on LinuxONE III vs. x86 Skylake
DISCLAIMER: Performance results based on IBM internal tests running the Acme Air microservice benchmark (https://github.com/blueperf/acmeair-mainservice-java) on OpenShift Container Platform (OCP) 4.2.19 on LinuxONE III using z/VM versus on the compared x86 platform using KVM. On both platforms 12 Acme Air instances were running on 3 OCP Worker nodes. The z/VM and KVM guests with the OCP Master nodes were configured with 4 vCPUs and 16 GB memory each. The z/VM and KVM guests with the OCP Worker nodes were configured with 16 vCPUs and 32 GB memory each. Results may vary.
LinuxONE III configuration: The OCP Proxy server ran natively in an LPAR with 4 dedicated cores, 64 GB memory, RHEL 8.1 (SMT mode). The OCP Master and Worker nodes ran on z/VM 7.1 in an LPAR with 30 dedicated cores, 160 GB memory, and DASD storage.
x86 configuration: The OCP Proxy server ran on 4 Intel® Xeon® Gold 6126 CPU @ 2.60GHz with Hyperthreading turned on, 64 GB memory, RHEL 8.1. The OCP Master and Worker nodes ran on KVM on RHEL 8.1 on 30 Intel® Xeon® Gold 6140 CPU @ 2.30GHz with Hyperthreading turned on, 160 GB memory, and RAID5 local SSD storage.
Achieve up to 2.7x more throughput per core and up to 2.9x lower latency on OpenShift Container Platform 4.2 on LinuxONE III using z/VM versus the compared x86 platform using KVM, when running 12 Acme Air benchmark instances on 3 worker nodes
• OpenShift Container Platform (OCP)
Chart data labels (four measurement points per metric): throughput per core 2.6x, 2.7x, 2.2x, 2.4x; latency 2.7x, 2.9x, 2.4x, 2.6x
Acme Air Scaling on OpenShift Container Platform 4.2 on LinuxONE III
DISCLAIMER: Performance results based on IBM internal tests running the Acme Air microservice benchmark (https://github.com/blueperf/acmeair-mainservice-java) on OpenShift Container Platform (OCP) 4.2.19 on LinuxONE III LT1 using z/VM. The z/VM guests with the OCP Master nodes were configured with 4 vCPUs and 16 GB memory each. The z/VM guest with the OCP Worker node was configured with 2 - 24 vCPUs and 64 GB memory. One Acme Air instance per vCPU was running on the OCP Worker node. The Acme Air instances were driven remotely from JMeter 5.2.1. Results may vary.
LinuxONE III LT1 configuration: The OCP Proxy server ran natively in an LPAR with 4 dedicated cores, 64 GB memory, RHEL 8.1 (SMT mode). The OCP Master and Worker nodes ran on z/VM 7.1 in an LPAR with 30 dedicated cores, 160 GB memory, and DASD storage.
Scale out the Acme Air benchmark to 24 virtual CPUs with up to 88% scaling efficiency on an OpenShift Container Platform 4.2 worker node on LinuxONE III LT1 using z/VM
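The slide does not define "scaling efficiency". A minimal Python sketch, assuming the common definition of measured speedup divided by ideal linear speedup relative to the smallest configuration (2 vCPUs); the function name and the throughput numbers are hypothetical and are not results from these tests:

```python
def scaling_efficiency(base_vcpus: int, base_throughput: float,
                       vcpus: int, throughput: float) -> float:
    """Measured speedup divided by ideal (linear) speedup over the base run."""
    if base_vcpus <= 0 or vcpus <= 0 or base_throughput <= 0:
        raise ValueError("vCPU counts and base throughput must be positive")
    measured_speedup = throughput / base_throughput
    ideal_speedup = vcpus / base_vcpus
    return measured_speedup / ideal_speedup


# Hypothetical throughputs (requests/s), chosen only to illustrate the formula.
efficiency = scaling_efficiency(base_vcpus=2, base_throughput=1000.0,
                                vcpus=24, throughput=10560.0)
print(f"scaling efficiency at 24 vCPUs: {efficiency:.0%}")
```

An efficiency of 100% would mean throughput grows exactly in proportion to the number of vCPUs; values below that show the overhead of scaling out.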
• OpenShift Container Platform (OCP)
Diagram: x86 server running the JMeter workload driver drives the Proxy / Balancer in a LinuxONE III LT1 LPAR (4 cores, 64 GB memory), which fronts the OCP cluster under z/VM 7.1 in an LPAR on the zHypervisor (30 cores, 160 GB memory)
Guest 1, 2 - 24 vCPU, 64 GB memory: OCP Worker with 2 - 24 Acme Air instances
Guests 2 - 4, each 4 vCPU, 16 GB memory: OCP Master
Questions?
We welcome feedback, ideas and future directions …
Notices and Disclaimers
© 2020 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
Notices and Disclaimers continued