
Intel Virtual RAID on CPU (Intel VROC) Detailed Comparison to RAID HBA


Academic year: 2022


Intel® Virtual RAID on CPU (Intel® VROC)

Detailed Comparison to RAID HBA

(2)

Intel Optane Group 2

Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

(3)


Purpose:

Broad categorical comparison of Intel VROC (Integrated RAID) vs HW RAID HBAs on features, performance, latency, CPU% and power usage.

Agenda:

1. Architecture and Feature Comparison
2. Key findings
3. Intel® Optane™ SSD Comparisons
4. Test Configuration Details
5. Pass-thru Mode (No RAID) Comparison
6. RAID0/1/5/10 Performance Results
7. Detailed RAID0/5 Review (Latency, CPU%, Power)

(4)

Architecture and Feature Comparison

(5)

Intel® VROC onboards RAID HBA functionality onto Intel® Xeon® CPUs (see note 1)

Intel® VROC vs RAID HBA

| | RAID HBA (legacy RAID architecture) | Intel VROC (on Intel® Xeon® Scalable Processor) |
|---|---|---|
| Product | MegaRAID 9560-16i | Intel VROC |
| Category | HW RAID | Integrated RAID |
| PCIe Generation | Gen. 4 | Gen. 4 |
| Storage Uplink | x8 PCIe lanes (PCIe uplink is a potential bottleneck) | x4 PCIe per SSD |
| # Drives | 4 SSDs | 4 SSDs |

1-Intel VROC and Intel VMD are available on all generations (Gen. 1, 2 and 3) and SKUs (Bronze, Silver, Gold, and Platinum) of Intel Xeon Scalable Processor
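The uplink difference above can be put into rough numbers. The sketch below assumes the standard PCIe Gen4 signalling rate (16 GT/s per lane, 128b/130b encoding) and ignores protocol overhead, so the figures are theoretical per-direction ceilings rather than measured throughput. The function names are illustrative only.

```python
# PCIe Gen4 signalling: 16 GT/s per lane with 128b/130b line encoding.
GEN4_GT_PER_S = 16.0
ENCODING_EFFICIENCY = 128 / 130


def lane_gb_per_s(gt_per_s: float = GEN4_GT_PER_S) -> float:
    """Approximate raw GB/s per lane, per direction (protocol overhead ignored)."""
    return gt_per_s * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_gb_per_s(lanes: int) -> float:
    """Approximate raw GB/s for a link of the given width."""
    return lanes * lane_gb_per_s()


hba_uplink = link_gb_per_s(8)       # one x8 uplink shared by all four SSDs
vroc_total = 4 * link_gb_per_s(4)   # four SSDs, each with its own x4 link to the CPU

print(f"RAID HBA, x8 uplink  : {hba_uplink:.2f} GB/s")
print(f"Intel VROC, 4 x x4   : {vroc_total:.2f} GB/s")
```

With four x4 SSDs behind a single x8 uplink, the drives can collectively offer twice the raw link bandwidth that the uplink can carry.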

(6)


Intel® VROC vs RAID HBA

| Major RAID Features | HW RAID | Intel® VROC | Comment |
|---|---|---|---|
| Error Handling/Isolation | √ | √ | Both architectures isolate SSD error/event handling to reduce OS crash/reboot |
| Reliable data storage | √ | √ | Enterprise data protection, even when power loss occurs |
| Boot support | √ | √ | Redundant system volume = less down-time/crashes |
| In-band Management Tools | √ | √ | Various UEFI, GUI, and CLI utilities for each |
| Out-of-band RAID Config. | √ | X | Intel VROC has OOB on roadmap for upcoming releases |
| Full NVMe SSD x4 Bandwidth | X | √ | Intel VROC + Intel VMD allows full x4 access to SSDs, no HW uplink |
| RAID Processing Location | On HBA | On Intel® Xeon® | Uses powerful Intel® Xeon® CPU to RAID the fast NVMe* SSDs. Better scaling for heavy workloads (see Detailed CPU Review) |
| Supported RAID Levels | 0/1/5/6/10/50/60 | 0/1/5/10 | RAID6/50/60 not needed for perf./AFR of NVMe SSDs |
| Write back cache | DRAM + BBU | Integrated Caching + Intel® Optane™ SSD | Replace DRAM WB cache + BBU with persistent Intel® Optane™ media |
| SED Key Management | On HBA | Platform Integrated | Intel VROC uses platform protocols and remote KMS to manage keys |
| Idle Power (1) | 577W | 562W | Tested 15W reduction in idle power usage with Intel VROC |

See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

(7)

Key Findings

(8)

Summary (Highlights) 1,2

1. Intel VROC has compelling features to replace RAID HBA, plus a roadmap to fill any gaps (OOB)

2. Intel VROC is the only RAID solution that scales with the Intel Optane SSD solution to deliver extraordinary performance (over 5.6M IOPS!)

3. Intel VROC performance for all RAID levels is equal to or better than RAID HBA (↑ Performance, ↓ Latency)

4. Intel VROC can improve resource utilization by removing the HBA and related choke points (↓ CPU Usage, ↓ Power)

5. Intel VROC has a scalable, integrated design that is better suited to NVMe SSDs (↑ IOPS/CPU Core, ↑ IOPS/W)

See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

(9)

Test Configuration Details

(10)

Test Configuration Details (Optane)

4 x 400GB Intel Optane P5800X SSDs

• Write Spec: 1,150,000 IOPS

• Read Spec: 1,500,000 IOPS

Tested Configurations:

• Single Drive Performance

• 4x Drives pass-thru in parallel (no RAID)

• 4x Drive RAID0/5/10

• 2x Drive RAID1

Workload Details:

• 4k Random: 70/30 R/W

• 16 Threads, 16 IODepth
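For readers who want to approximate this workload, the sketch below builds an fio command line for a 4k random 70/30 mix at 16 jobs and 16 IODepth. The deck names FIO 3.25 as the generator but does not publish its job file, so the I/O engine, direct I/O, runtime, and the target device `/dev/md0` are assumptions made here for illustration, not the tested settings.

```python
import shlex
from typing import List


def fio_command(target: str, rw_mix_read: int, threads: int, iodepth: int,
                runtime_s: int = 60) -> List[str]:
    """Build an fio argument list for a 4k random mixed read/write workload."""
    return [
        "fio",
        "--name=randrw-4k",
        f"--filename={target}",        # hypothetical target device
        "--rw=randrw",                 # random mixed reads and writes
        f"--rwmixread={rw_mix_read}",  # percentage of reads in the mix
        "--bs=4k",
        f"--numjobs={threads}",
        f"--iodepth={iodepth}",
        "--ioengine=libaio",           # assumption: engine is not stated in the deck
        "--direct=1",                  # assumption: bypass the page cache
        "--time_based",
        f"--runtime={runtime_s}",      # assumption: run length is not stated
        "--group_reporting",
    ]


cmd = fio_command("/dev/md0", rw_mix_read=70, threads=16, iodepth=16)
print(shlex.join(cmd))
```

The script only prints the command; running it against a real device is destructive for write workloads and should be done on test hardware.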

Metrics

• Performance: IOPS

• Bandwidth: MB/sec

• Latency: µsec

• CPU Usage*: Effective Intel Xeon Cores used

Data on Slides 11-14

[Diagram: Legacy RAID HBA vs. Intel® Xeon® Scalable Processor]

*CPU usage is measured as total platform CPU % consumption, which includes workload generation, storage stack (RAID) usage, and background activity. It is reported as "Cores Used" = reported CPU% × number of cores on the system (64 cores).
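The "Cores Used" metric in the footnote is a simple scaling of platform CPU utilization. A minimal sketch, where the 12.5% input is a made-up sample value rather than a measured result:

```python
TOTAL_CORES = 64  # 2 x 32-core CPUs in the tested system


def cores_used(cpu_percent: float, total_cores: int = TOTAL_CORES) -> float:
    """Convert total platform CPU utilization (%) into effective cores used."""
    return cpu_percent / 100.0 * total_cores


# Hypothetical sample: 12.5% platform utilization on the 64-core system.
print(cores_used(12.5))
```

Because the percentage covers the whole platform, the result includes the cores spent generating the workload, not only those spent on RAID processing.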

(11)


Test Configuration Details (NAND)

4x 3.84TB Intel D7 P5510 SSDs

• Write Spec: 170,000 IOPS

• Read Spec: 700,000 IOPS

Tested Configurations:

• Single Drive Performance

• 4x Drives pass-thru in parallel (no RAID)

• 4x Drive RAID0/5/10

• 2x Drive RAID1

Workload Details:

• 4k Random: 100% Reads, 70/30 R/W, 100% Writes

• 1 Thread, 1 IODepth (Isolate Storage Path)

• 16 Threads, 64 and 256 IODepth (Peak performance)

Metrics

• Performance: IOPS

• Power: Watts (Idle and under load)

• Latency: µsec

• CPU Usage*: Effective Intel Xeon Cores used

Data on slides 15-28

[Diagram: Legacy RAID HBA vs. Intel® Xeon® Scalable Processor]

*CPU usage is measured as total platform CPU % consumption, which includes workload generation, storage stack (RAID) usage, and background activity. It is reported as "Cores Used" = reported CPU% × number of cores on the system (64 cores).

(12)

Intel Optane Comparisons

(13)

RAID Levels Performance Comparison 1

Intel VROC achieves up to 5.6 million IOPS with RAID0 on mixed workloads

Intel VROC has up to:

• 161% more IOPS on RAID0
• 50% more IOPS on RAID5
• 248% more IOPS on RAID10
• 138% more IOPS on RAID1

Intel VROC RAID5 > HBA RAID10 performance

[Chart: R/W IOPS Comparison (higher is better). IOPS by RAID level (RAID0, RAID5, RAID10, RAID1); series: VROC Reads/Writes, HBA Reads/Writes]

Intel® Optane™ SSDs: 16 Thread, 16 IODepth: 70/30 R/W

See backup for configuration details. Results may vary

(14)


RAID0 Simultaneous Read/Write Comparison 1

Intel VROC RAID0 reads/writes provide:

• ↑ IOPS

• ↓ Latency

• ↓ CPU Usage

• ↑ Bandwidth

RAID0 provides higher performance metrics but with lower resource usage (CPU)

Up to 161% more Read/Write IOPS

Up to 61% lower latency

| Metric | VROC | HBA |
|---|---|---|
| IOPS (higher is better) | 5,689,962 | 2,177,404 |
| Latency, µsec (lower is better) | 43 | 111 |
| CPU Cores Used (lower is better) | 8 | 9 |
| Bandwidth, MB/sec (higher is better) | 22,760 | 8,710 |

Intel® Optane™ SSDs: 16 Thread, 16 IODepth: 70/30 R/W

See backup for configuration details. Results may vary
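The headline percentages on this slide follow directly from the charted RAID0 values. A short sketch of the arithmetic, with helper names chosen here for illustration:

```python
def pct_more(new: float, base: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0


def pct_lower(new: float, base: float) -> float:
    """Percentage by which `new` is below `base`."""
    return (1.0 - new / base) * 100.0


# RAID0 values from the slide (Intel VROC vs RAID HBA).
vroc_iops, hba_iops = 5_689_962, 2_177_404
vroc_latency_us, hba_latency_us = 43, 111

print(f"{pct_more(vroc_iops, hba_iops):.0f}% more IOPS")
print(f"{pct_lower(vroc_latency_us, hba_latency_us):.0f}% lower latency")
```

Note that "161% more" means roughly 2.6 times the HBA figure, not 1.6 times.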

(15)

RAID5 Simultaneous Read/Write Comparison 1

Intel VROC RAID5 reads/writes provide:

• ↑ IOPS
• ↓ Latency
• ↑ CPU Usage*
• ↑ Bandwidth

*RAID5 uses 4 more cores but delivers up to 380K additional IOPS

Up to 50% more Read/Write IOPS

Up to 50% more Bandwidth

| Metric | VROC | HBA |
|---|---|---|
| IOPS (higher is better) | 1,121,999 | 743,373 |
| Latency, µsec (lower is better) | 362 | 461 |
| CPU Cores Used (lower is better) | 7 | 3 |
| Bandwidth, MB/sec (higher is better) | 4,488 | 2,973 |

Intel® Optane™ SSDs: 16 Thread, 16 IODepth: 70/30 R/W

See backup for configuration details. Results may vary

(16)

NAND SSD Comparisons

(17)

Pass-thru Mode (No RAID) Comparison

(18)

Low Workload, Pass-Thru Comparison 2

Intel VROC provides unimpeded access to storage for lower latency I/O

▪ Single Drive, 100% Write: {40% IOPS ↑, 32% Latency ↓}

▪ Single Drive, 100% Read: {29% IOPS ↑, 23% Latency ↓}

Single-drive performance improvements scale to multiple drives

NAND SSDs: 1 Thread, 1 IODepth

[Chart: 1x Pass-thru Comparisons. IOPS (higher is better) and average latency in µsec (lower is better) for 100% Writes, 70/30 R/W, and 100% Read; series: VROC 1x Pt and HBA 1x Pt]

[Chart: 4x Pass-thru Comparisons. IOPS (higher is better) and average latency in µsec (lower is better) for 100% Writes, 70/30 R/W, and 100% Read; series: VROC 4x Pt's and HBA 4x Pt's]

See backup for configuration details. Results may vary

(19)


Peak Performance, Pass-Thru Comparison 2

NAND SSDs: 16 Thread, 64 IODepth

[Chart: 1x Pass-thru Comparisons. IOPS (higher is better) and CPU Cores Used (lower is better) for 100% Writes, 70/30 R/W, and 100% Read; series: VROC 1x Pt and HBA 1x Pt]

[Chart: 4x Pass-thru Comparisons. IOPS (higher is better) and CPU Cores Used (lower is better) for 100% Writes, 70/30 R/W, and 100% Read; series: VROC 4x Pt's and HBA 4x Pt's]

Higher workloads saturate the storage on both solutions

▪ Latency differences are masked, performance becomes equivalent

Other architecture differences are exposed: Power and CPU usage

▪ Additional HBA power draw creates positive WΔ; Intel VROC ↓ Power

▪ RAID HBA on-card processing is oversaturated by larger workloads; Intel VROC ↓ CPU Usage (see Detailed CPU Review)

Power (W) usage delta: RAID HBA (W) – Intel VROC (W)

| IO | 1x W Δ | 4x W Δ |
|---|---|---|
| Write | 13W | 20W |
| 70/30 | 17W | 30W |
| Read | 22W | 46W |

See backup for configuration details. Results may vary


(20)

RAID0/1/5/10 Performance Results

(21)

RAID Levels Performance Comparison 2

[Chart: Write IOPS Comparison. IOPS by RAID level (RAID0, RAID5, RAID10, RAID1); series: VROC Writes, HBA Writes]

▪ Intel VROC has 33% more IOPS on RAID5 writes

Intel VROC Read Performance scales to maximum 4x SSD Spec (~2.8M IOPS RAID0/5/10)

▪ HBA hits 2.2M IOPS Bottleneck; Intel VROC delivers up to 27% more IOPS on RAID0/5/10 reads

[Chart: Reads IOPS Comparison. IOPS by RAID level (RAID0, RAID5, RAID10, RAID1); series: VROC Reads, HBA Reads]

NAND SSDs: 16 Thread, 64 IODepth

RAID Level Max. IOPS

See backup for configuration details. Results may vary
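The "maximum 4x SSD spec" figure is the per-drive read specification multiplied by the drive count. The sketch below compares that ceiling with the RAID0 read results charted on the later RAID0/5 read slide; the dictionary labels are illustrative.

```python
READ_SPEC_IOPS = 700_000  # per-drive 4k random read spec (NAND configuration slide)
DRIVES = 4

aggregate_spec = READ_SPEC_IOPS * DRIVES  # theoretical ceiling for four drives

# RAID0 read IOPS as charted on the RAID0/5 read comparison slide.
measured = {
    "Intel VROC RAID0": 2_811_404,
    "RAID HBA RAID0": 2_255_698,
}

for name, iops in measured.items():
    print(f"{name}: {iops / aggregate_spec:.1%} of the 4x drive spec")
```

A result slightly above 100% is plausible because a drive specification is a rated figure, not a hard limit.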

(22)

Detailed RAID0/5 Review (Latency, CPU%, Power)

(23)

RAID0/5 Read Comparison 2

Intel VROC RAID0/5 reads provide:

• ↑ IOPS

• ↓ Latency

• ↓ CPU Usage

• ↓ Power Consumption

Integrated RAID is a more effective RAID architecture for NVMe SSDs

Up to 30% more Read IOPS/W

Up to 164% more Read IOPS/CPU Cores Used

| Metric | VROC RAID0 | HBA RAID0 | VROC RAID5 | HBA RAID5 |
|---|---|---|---|---|
| IOPS (higher is better) | 2,811,404 | 2,255,698 | 2,808,785 | 2,196,795 |
| Latency, µsec (lower is better) | 362 | 447 | 362 | 461 |
| CPU Cores Used (lower is better) | 4 | 8 | 5 | 7 |
| Power, W (lower is better) | 748 | 771 | 751 | 768 |

NAND SSDs: 16 Thread, 64 IODepth

See backup for configuration details. Results may vary
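The IOPS/W claim can be reproduced from the charted read values. This sketch assumes the chart labels map to VROC and HBA in the order the legend lists them, which is an inference from the layout rather than something the slide states.

```python
# (IOPS, watts) per configuration, read workload, 16 threads / 64 IODepth.
reads = {
    "RAID0": {"VROC": (2_811_404, 748), "HBA": (2_255_698, 771)},
    "RAID5": {"VROC": (2_808_785, 751), "HBA": (2_196_795, 768)},
}


def iops_per_watt(iops: int, watts: int) -> float:
    """Storage efficiency: I/O operations per second delivered per watt."""
    return iops / watts


def efficiency_gain_pct(level: str) -> float:
    """Percentage by which VROC IOPS/W exceeds HBA IOPS/W at a RAID level."""
    vroc = iops_per_watt(*reads[level]["VROC"])
    hba = iops_per_watt(*reads[level]["HBA"])
    return (vroc / hba - 1.0) * 100.0


for level in reads:
    print(f"{level}: {efficiency_gain_pct(level):.1f}% more read IOPS/W with VROC")
```

The watt figures are total platform power under load, so the ratio reflects whole-server efficiency rather than the storage subsystem alone.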

(24)


RAID0/5 Write Comparison 2

Intel VROC RAID0/5 writes provide:

• ↑ IOPS

• ↓ Latency

RAID 0 also ↓ CPU Usage and Power Usage

RAID5 provides higher performance metrics but with higher resource usage (CPU and Power)....

This is not the whole story

Up to 28% more Write IOPS/W

See ‘CPU% Usage Explained’ for more

| Metric | VROC RAID0 | HBA RAID0 | VROC RAID5 | HBA RAID5 |
|---|---|---|---|---|
| IOPS (higher is better) | 601,711 | 595,907 | 214,286 | 161,029 |
| Latency, µsec (lower is better) | 850 | 857 | 2,386 | 3,178 |
| CPU Cores Used (lower is better) | 1.23 | 1.87 | 2.62 | 0.50 |
| Power, W (lower is better) | 707 | 726 | 730 | 703 |

NAND SSDs: 16 Thread, 64 IODepth

See backup for configuration details. Results may vary

(25)

CPU% Usage Explained

(26)

CPU% Usage-Perception 2

Common perception: RAID HBA consumes less host CPU resources due to HBA offload

Reality: Intel VROC can deliver ↑ performance and consumes ↓ CPU resources!

HOW?

[Chart: IOPS v CPU (writes). IOPS and CPU Cores Used for 1x Pt, 4x Pt, RAID0, and RAID5; series: VROC IOPS, HBA IOPS, VROC CPU Cores Used, HBA CPU Cores Used]

[Chart: IOPS v CPU (reads). IOPS and CPU Cores Used for 1x Pt, 4x Pt, RAID0, and RAID5; same series]

Chart callout: RAID5 Write Exception

See backup for configuration details. Results may vary

(27)


CPU% Usage-Reality Explained 2

NVMe SSD performance can overwhelm RAID HBA offload design

16 Threads, 64 IODepth → hundreds of thousands of write IOPS and millions of read IOPS

HBA architecture has choke points that can bottleneck performance. These limitations cause thrash on CPU% and can lead to iowait%.

| HBA choke point | Intel® VROC improvement |
|---|---|
| 1. Limited PCIe uplink (x8 PCIe lanes) | Full x4 bandwidth per NVMe SSD |
| 2. Fixed amount of RAID processing | Scaled compute on powerful Intel Xeon |
| 3. SCSI-based RAID stack | NVMe-optimized RAID stack |

See backup for configuration details. Results may vary

(28)


iowait% Closer Look 2

RAID5 writes require high CPU%

• Highest of any Intel VROC supported RAID level per IOP

RAID HBA offload generates iowait at higher workloads:

• If limits of HBA architecture are reached (more IO), host CPU usage ramps up in iowait%

• Iowait could be wasted cycles depending on application

Intel VROC is more efficient for RAID5 writes:

No ramping of iowait

Up to 4% more Write IOPS/CPU Cores Used*

| IOPS (higher is better) | RAID5 64-Thread | RAID5 256-Thread |
|---|---|---|
| VROC | 214,286 | 219,960 |
| HBA | 161,029 | 161,031 |

NAND SSDs: 16 Thread, 64 IODepth → 16 Thread, 256 IODepth

See backup for configuration details. Results may vary

[Chart: CPU Cores Used (lower is better). Labeled values: 2.6, 3.7, 0.5, and 2.2 cores in iowait]

*when accounting for iowait%

(29)


CPU% Usage-Customer Impact

Server design must plan for Peak Storage Load

Peak Storage Load (PSL): Max. IO during data center operation

RAID Solution Response to PSL

RAID HBA:
• Bottleneck performance
• Iowait% ramp and higher latency
• Operational thrash if storage architecture not properly planned

Intel VROC:
• Scale performance to absorb PSL
• Proportionally ramp CPU usage and latency
• Mitigate server thrash with fewer CPU cores dedicated for RAID

Intel VROC servers often require fewer CPU cores to handle Peak Storage Load

See backup for configuration details. Results may vary

(30)

Backup

(31)

Configuration Details

1. Intel VROC vs RAID HBA Comparison (Optane)

System configuration: Beta Coyote Pass M50CYP2SB2U/M50CYP2SBSTD (chassis M50CYP2UR208BPP), 2 x Intel® Xeon® Platinum 8358 CPU @ 2.60GHz, 32 cores each, DRAM 128GB, BIOS Release 04/02/2021, BIOS Version: SE5C6200.86B.0020.P24.2104020811

OS: RedHat* Enterprise Linux 8.1, kernel-4.18.0-147.el8.x86_64, mdadm - v4.1 - 2018-10-01, Intel® VROC Pre-OS version 7.5.0.1152

Storage: Both configurations used 4 x 400GB Intel Optane P5800X PCIe Gen4 U.2 SSDs (Model: SSDPF21Q400GB, Firmware: L0310100) connected to backplane which is connected via SlimSAS cables directly to a Broadcom 9560-16i (x8) card on Riser 2, PCIe slot 1 on CPU2

BIOS setting: SpeedStep(Enabled), Turbo(Enabled), ProcessorC6(Enabled), PackageC-State(C0/C1 State), CPU_PowerAndPerformancePolicy(Performance), HardwareP-States(NativeMode), WorkloadConfiguration(I/O Sensitive)

RAID Configurations: 4-Disk RAID0/5/10 and 2-Disk RAID1 with Intel VROC and Broadcom MegaRAID 9560-16i

Workload Generator: FIO 3.25, 16-thread 16-IODepth

Performance results are based on testing as of 6/25/2021 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.

(32)


Configuration Details

2. Intel VROC vs RAID HBA Comparison (NAND)

System configuration: Beta Coyote Pass M50CYP2SB2U/M50CYP2SBSTD (chassis M50CYP2UR208BPP), 2 x Intel® Xeon® Platinum 8358 CPU @ 2.60GHz, 32 cores each, DRAM 128GB, BIOS Release 03/22/2021, BIOS Version: SE5C6200.86B.0022.D08.2103221623

OS: RedHat* Enterprise Linux 8.1, kernel-4.18.0-147.el8.x86_64, mdadm - v4.1 - 2018-10-01, Intel® VROC Pre-OS version 7.5.0.1152

Storage: Both configurations used 4x 3.84 TB Intel® D7-P5510 Series SSDs (Model: SSDPF2KX038TZ, Firmware: JCV10016) connected to internal backplane. With the Intel VROC config, the backplane connects directly to CPU2 via SlimSAS. With the RAID HBA, the backplane connects to the RAID HBA on Riser 2, PCIe slot 1 on CPU2

BIOS setting: SpeedStep(Enabled), Turbo(Enabled), ProcessorC6(Enabled), PackageC-State(C0/C1 State), CPU_PowerAndPerformancePolicy(Performance), HardwareP-States(NativeMode), WorkloadConfiguration(I/O Sensitive)

RAID Configurations: 4-Disk RAID0/5/10 and 2-Disk RAID1 with Intel VROC and Broadcom MegaRAID 9560-16i

Workload Generator: FIO 3.25, 1-thread 1-IODepth, 16-thread 64/256 IODepth

Performance results are based on testing as of 5/3/2020 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
