Operating System, Storage Performance Analysis

(1)

Robert M. Smith, Microsoft Corporation

(2)

SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.

Member companies and individual members may use this material in presentations and literature under the following conditions:

Any slide or slides used must be reproduced in their entirety without modification

The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee.

Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.

The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.

NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

(3)

Abstract

OS Storage Performance Analysis

Analyzing and dealing with storage performance at the OS level can be challenging in many respects. This tutorial covers aspects of performance with respect to storage.

This tutorial will also cover tools that can be used to assist in the analysis of operating system performance.

This presentation will include the following:

Factors affecting storage performance

Examples of tools to monitor storage performance

Recommendations to improve storage performance

(4)

SAN I/O Path, 1000 ft. view

(5)

OS I/O: Closer View

[Diagram labels: Application; User Mode; Kernel Mode; File System; Volume / Partition; Device; Class; Command; Port; Storage]

(6)

Disk Drive Factors

Rotational Drives

“Capacity Optimized” drives

TB Size: 0.5, 1, 2, 3

*IOPS: >= 120 (worst case, random “full-stroke” workloads)

SAS or SATA

Regardless of size, same performance, same IOPS

~8.5 ms latency (½ platter seek); worst case 16 to 19 ms (on average across manufacturers)

“Performance Optimized” drives

GB Size: 72, 144, 450, 600, 900

*IOPS: 200-400 (worst case)

SAS, FC (some SATA)

2-4 ms latency (on average across manufacturers)
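The IOPS and latency figures for rotational drives describe the same limit from two sides: if each random request takes a given service time and requests are served one at a time, the drive completes roughly 1000 / latency_ms requests per second. The sketch below applies that arithmetic to the latency figures quoted on this slide. The function name is illustrative, and the single-outstanding-request model is an assumption that ignores command queuing and caching.

```python
def iops_from_latency_ms(latency_ms: float) -> float:
    """Approximate IOPS for one outstanding request at a given service time."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latency figures quoted on the slide, in milliseconds.
capacity_drive = iops_from_latency_ms(8.5)
performance_drive_range = (iops_from_latency_ms(4.0), iops_from_latency_ms(2.0))

print(f"capacity optimized:    ~{capacity_drive:.0f} IOPS")
print(f"performance optimized: ~{performance_drive_range[0]:.0f} to "
      f"{performance_drive_range[1]:.0f} IOPS")
```

The 8.5 ms figure works out to roughly 118 IOPS, which is in line with the ~120 IOPS quoted for capacity-optimized drives.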

(7)

Disk Drive Factors (2)

SSD & Hybrid Storage

SSD: Solid-State Drive

Cost: Dollars per GB

No moving parts

Less power consumption

75, 150, 300, 500, 600 GB

OS likely has native SSD support (TRIM, etc.)

Microsecond latency

Flash block erase before write

Undersized: provides spare cells for wear-leveling and bad-block mapping

(ex. 150 GB drive might be sold as a 100 GB drive)

Hybrid Storage Solutions

Solid-State and rotational disks in same chassis

(8)

Storage Hardware Factors

Controller Cache Configurations

How much cache?

What is read/write ratio of cache?

How effective is cache?

Enterprise storage usually has performance measuring capability onboard

What happens when a threshold is reached? (i.e., a flush)

Idle flushing: does not interrupt, I/O continues

Low and high watermark flushing: triggers flushing, minor performance impact

Forced flushing: to free cache pages, all I/O temporarily halted
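The three flushing behaviours above can be pictured as a decision on how full of unwritten ("dirty") pages the cache is. The sketch below is a simplified model, not any vendor's algorithm: the watermark percentages, the function name, and the rule that a flush drains down to the low watermark are all assumptions made for illustration.

```python
def plan_flush(dirty_pct: float, host_idle: bool,
               low_wm: float = 60.0, high_wm: float = 80.0):
    """Return (mode, target_dirty_pct) for a write cache.

    low_wm and high_wm are hypothetical watermark percentages.
    """
    if not 0.0 <= dirty_pct <= 100.0:
        raise ValueError("dirty_pct must be between 0 and 100")
    if dirty_pct >= 100.0:
        # No free pages left: host I/O waits while pages are freed.
        return ("forced", low_wm)
    if dirty_pct >= high_wm:
        # High watermark crossed: flush down to the low watermark.
        return ("watermark", low_wm)
    if host_idle and dirty_pct > 0.0:
        # Background flush that does not interrupt host I/O.
        return ("idle", 0.0)
    return ("none", dirty_pct)


for dirty, idle in [(30.0, True), (30.0, False), (85.0, False), (100.0, False)]:
    print(dirty, idle, plan_flush(dirty, idle))
```

A cache that often reaches the forced branch under the real workload is a candidate for a different read/write ratio or more cache.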

(9)

Storage Hardware Factors (2)

Is cache “mirroring” involved?

If so, is there a performance impact?

Are there other workloads on the storage device?

What hardware is between initiator and target?

If SAN, how many and what types of switches?

Virtualization Appliances

Some take the “LUNs” presented and virtualize those

Some have onboard storage

(10)

Storage Hardware Virtualization

Virtual Disks (AKA LUN)

Composed of a group, or a “chunk” of a group, of physical disks, and then presented by a storage device

Possibly indicated by:

Non-standard size

Device interrogations returning storage vendor vs. drive vendor

Virtualize to consolidate

Aggregation of underlying LUNs (virtualization appliance)

Adds complexity

Troubleshooting more difficult

(for example, it is very tough to find “hot spots”)

(11)

Storage Layout Factors:

Disk Configuration

RAID level

ex. 1, 5, 6, 1+0, 0+1, 5+0, 0+5, 6+0 etc.

Number of physical disk drives backing

Levels of virtualization between server(s) and disks?

Any storage pool sharing involved?

Dedicated disks or shared storage pools?

What is the backup schedule for ALL connected hosts?

LUN snapshots, database table scans, etc.

What decisions affected design?
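RAID level and the number of backing disks together set a rough ceiling on the random IOPS a LUN can deliver. A common rule of thumb charges each host write several back-end I/Os (2 for mirrored levels, 4 for RAID 5, 6 for RAID 6). The sketch below applies that rule. The per-disk IOPS and the write mix are illustrative inputs, and the model deliberately ignores controller cache and full-stripe writes, both of which can do much better than this estimate.

```python
# Rule-of-thumb back-end I/Os per host write, by RAID level.
WRITE_PENALTY = {"0": 1, "1": 2, "1+0": 2, "5": 4, "6": 6}


def host_iops(disks: int, iops_per_disk: float, raid: str,
              write_fraction: float) -> float:
    """Estimate host-visible random IOPS for a RAID group."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid]
    return raw / ((1.0 - write_fraction) + write_fraction * penalty)


# Hypothetical group: 8 disks at 200 IOPS each, 30% writes.
for level in ("1+0", "5", "6"):
    print(level, round(host_iops(8, 200, level, 0.3)))
```

With the same disks, the estimate drops from about 1231 IOPS on RAID 1+0 to about 842 on RAID 5 and 640 on RAID 6, which is the kind of trade-off the design questions above are meant to surface.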

(12)

Storage Layout Factors: Design Decisions

Cost

Consolidation

Migration

Risk

RAID types versus performance

Power and cooling

Expansion

Manageability

(13)

Storage Layout Factors (2)

What happens to a storage group if a disk drive fails?

What is the performance impact?

How long to rebuild?

Data could be vulnerable during rebuild

Is anyone notified of a failure?

(14)

Storage Path Factors (FC & iSCSI)

Path is usually a “mesh”

Multiple paths may be meshed, but not physically connected

Redundant paths on separate fabrics are common

Multipath I/O (MPIO) software can load balance

Load-balance policies: Failover Only, Least Queue Depth, Round Robin, Weighted Paths, Round Robin with Subset, Least Blocks

Designed Path Capacity

Oversubscription

Fan In, Fan Out

Inter-switch links (ISLs)

Intermediate devices

Core Switches (Fan-In / Fan-Out Ratio)

Routing across disparate fabrics
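Two of the MPIO load-balance policies named on this slide, Round Robin and Least Queue Depth, are easy to picture in a few lines. The sketch below is a toy model only: the class, the path names, and the way outstanding I/O is tracked are assumptions for illustration and do not correspond to any real MPIO driver interface.

```python
from itertools import cycle


class PathSelector:
    """Toy path selector; path names and queue depths are illustrative."""

    def __init__(self, paths):
        self.paths = list(paths)
        # Outstanding I/O count per path, updated by the caller.
        self.outstanding = {p: 0 for p in self.paths}
        self._round_robin = cycle(self.paths)

    def next_round_robin(self):
        """Rotate through every path in turn."""
        return next(self._round_robin)

    def next_least_queue_depth(self):
        """Choose the path with the fewest outstanding requests."""
        return min(self.paths, key=lambda p: self.outstanding[p])


selector = PathSelector(["fabric_a", "fabric_b"])
rr_order = [selector.next_round_robin() for _ in range(3)]
selector.outstanding.update({"fabric_a": 4, "fabric_b": 1})
lqd_choice = selector.next_least_queue_depth()
print(rr_order, lqd_choice)
```

Round Robin spreads requests evenly regardless of load, while Least Queue Depth steers new requests away from a path that is already backed up.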

(15)

Storage Controller Factors

Mass-Storage Controllers

Range from on-board to add-in

Some have battery backup ability in either case

Basic controllers report limited diagnostic information

Advanced controllers have diagnostics available

Vendor supplied tools

Capable of sending events to operating system through extended logging

Enterprise storage may have multiple controllers with shared cache

(16)

Fibre Channel or SAS HBAs

Host-Bus Adapter (HBA)

8 Gb and 16 Gb available today

SCSI command interface to OS

Often synonymous with Fibre Channel SAN

Offload packet assembly and disassembly

Provides OS a view into the SAN

(though most activity is abstracted by default)

Vendor provided diagnostics and performance tools

No software capture tools

Multiple HBAs, or multiple-port HBAs enable Multiple Path I/O (MPIO)

Most OS have native support for MPIO

(17)

Ethernet Adapters

Ethernet Network Interface Card (NIC)

10 GbE

TCP/IP and Chimney Offloads

Hardware parity, CRC, ECC

Converged Network Adapter (CNA)

Combines functionality of HBA and NIC

Fibre Channel over Ethernet (FCoE)

CPU offloads for FCoE and iSCSI

Can present NIC, FCoE, or iSCSI function to host

Teaming software for throughput and availability

Software analyzers likely unable to capture all traffic

(18)

Latency

Rotational Disks

Millisecond latency

Sequential writing to rotational drives is the most efficient

Sequential, and/or “full-stripe” writes to RAID disks are most efficient

Latency occurs as heads have to move position across rotating platter

Operating system logical address may be different from physical location on disk device

SSD

Microsecond latency

Small random writes slowest (Flash block)

Flushing

Firmware

Keeps improving performance and availability

(19)

Queuing

The art of keeping the I/O pipeline populated, but not congested

Can happen at many levels

Operating system can build up thousands of I/O requests

Can build up at switch ports (buffer credits)

Can build up at backend storage ports (inbound queue)

Can build up in storage controllers (HBA, NIC, etc.)

I/O throttling via queue depth setting

Individual disk devices

Native command queuing (NCQ) for SATA AHCI
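Keeping the pipeline "populated, but not congested" can be reasoned about with Little's Law: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch below uses it in both directions. The function names and the example numbers are illustrative, and the relationship holds for steady-state averages only.

```python
def outstanding_ios(iops: float, latency_s: float) -> float:
    """Average requests in flight: rate multiplied by time in system."""
    return iops * latency_s


def iops_ceiling(queue_depth: int, latency_s: float) -> float:
    """Highest sustained IOPS a queue-depth limit allows at a given latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return queue_depth / latency_s


# Hypothetical device: 5000 IOPS at 4 ms average latency.
in_flight = outstanding_ios(5000, 0.004)
# Hypothetical adapter setting: queue depth of 32 at the same latency.
ceiling = iops_ceiling(32, 0.004)
print(f"{in_flight:.0f} requests in flight, ceiling {ceiling:.0f} IOPS")
```

If the measured queue length is well above what the rate and latency require, requests are waiting somewhere in the path rather than being serviced.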

(20)

“Short-Stroking” to reduce latency

Forcing the use of a smaller area of a rotating disk to reduce seek distance, thus latency

Also a result of “areal density”

Data is written more densely on outer tracks

Outer edge of disk may get 150 MB/s while inner tracks get 80 MB/s

Less latency means more IOPS

Penalty is under-utilized storage space

(21)

“Advanced Format” (AF) Technology

AF refers to physical disk sector size and/or block architecture

Previous limits

Physical disk sector size: 512 bytes

Master Boot Record (MBR) structure sizes

Approximately 2 Terabytes maximum disk size

New Capabilities:

Physical sector size: <currently> 4096 bytes (4 KB)

512e is a 4 KB block presented as 512-byte block

More space for error checking (CRC)

More storage space available in same or less physical space

No corresponding increase in performance capability
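The "approximately 2 Terabytes" MBR limit follows from arithmetic: an MBR partition entry stores its starting sector and length as 32-bit values, so the addressable size is 2^32 sectors multiplied by the logical sector size. The sketch below works that out for 512-byte and 4096-byte logical sectors. The function and constant names are illustrative.

```python
# MBR partition entries hold start and length as 32-bit sector counts.
MBR_LBA_BITS = 32
TIB = 2 ** 40


def mbr_max_bytes(logical_sector_bytes: int) -> int:
    """Largest size addressable through a 32-bit sector count."""
    return (2 ** MBR_LBA_BITS) * logical_sector_bytes


print(mbr_max_bytes(512) / TIB)   # 2.0
print(mbr_max_bytes(4096) / TIB)  # 16.0
```

Because a 512e drive still presents 512-byte logical sectors, it stays under the 2 TiB figure when partitioned with MBR, regardless of its 4 KB physical sectors.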

(22)

Partition Alignment

Previously a problem, manual steps to mitigate

Current OS align by default

Check partition starting sector to confirm

Using management interface (Ex. WMI)

Look for starting offset of 2048 blocks

Cannot easily change

Can automate during OS installation

Affects legacy and AF drive technology

512e AF blocks can suffer from misalignment
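Checking alignment comes down to one modulo operation: the partition's starting byte offset must be a whole multiple of the physical sector size. The sketch below assumes the starting sector has already been read from the OS; the function name is illustrative, and sector 63 is used as an example of a start commonly seen on older partition layouts.

```python
def is_aligned(start_sector: int, logical_sector_bytes: int = 512,
               physical_sector_bytes: int = 4096) -> bool:
    """True when the partition's byte offset falls on a physical sector boundary."""
    offset_bytes = start_sector * logical_sector_bytes
    return offset_bytes % physical_sector_bytes == 0


print(is_aligned(2048))  # 2048 * 512 = 1 MiB offset
print(is_aligned(63))    # 63 * 512 = 32256 bytes
```

A start of 2048 blocks gives a 1 MiB offset, which divides evenly by 4096. A start of 63 blocks does not, so each 4 KB logical write on a 512e drive would straddle two physical sectors.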

(23)

Understanding the workload

Request size

Burstiness

“Hot” data

Concurrency

Inter-arrival time

(time of arrival from one request to the next)

Locality (matters more on rotational than SSD)

Few tools can faithfully reproduce a “live” workload
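Several of the workload properties listed above can be computed directly from a captured trace. The sketch below derives mean request size, mean inter-arrival time, and a simple burstiness measure (the coefficient of variation of inter-arrival times). The trace format, a list of (arrival time in seconds, size in bytes) pairs, is a hypothetical one chosen for illustration and is not the output of any particular tool.

```python
from statistics import mean, pstdev


def characterize(trace):
    """Summarize a trace of (arrival_time_s, size_bytes), sorted by time."""
    if len(trace) < 2:
        raise ValueError("need at least two requests")
    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = mean(gaps)
    # Coefficient of variation: 0 for evenly spaced arrivals, larger when bursty.
    cv = pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    return {
        "mean_request_bytes": mean(sizes),
        "mean_inter_arrival_s": mean_gap,
        "burstiness_cv": cv,
    }


steady = characterize([(0.0, 4096), (1.0, 4096), (2.0, 8192), (3.0, 8192)])
bursty = characterize([(0.0, 4096), (0.25, 4096), (0.5, 4096), (4.0, 4096)])
print(steady)
print(bursty)
```

Two workloads with the same average rate can behave very differently at the storage device if one of them arrives in bursts, which is why the average alone is not enough for sizing.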

(24)

Performance Counters

I/O Transfer Time (Latency)

Avg. disk sec/read

Avg. disk sec/write

Queuing

Avg. disk read queue length

Avg. disk write queue length

Throughput

Avg. disk bytes/read

Avg. disk bytes/write

Network

Output queue length

Transfers / sec (IOPS)

Disk transfers/sec

Disk reads/sec

Disk writes/sec

%Idle Time

Can be misleading

Split I/O

Fragmentation

Large Requests

OS CPU

OS Memory
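The counters above are most useful in combination. The sketch below takes one sample of the per-second and per-transfer counters and derives total IOPS, bytes per second, read fraction, and a rate-weighted average latency. The dataclass, its field names, and the sample values are hypothetical; it does not read live counters from any operating system.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One sample of disk counters (field names are illustrative)."""
    reads_per_sec: float
    writes_per_sec: float
    avg_bytes_per_read: float
    avg_bytes_per_write: float
    avg_sec_per_read: float
    avg_sec_per_write: float


def summarize(s: DiskSample) -> dict:
    """Derive combined metrics from one counter sample."""
    iops = s.reads_per_sec + s.writes_per_sec
    bytes_per_sec = (s.reads_per_sec * s.avg_bytes_per_read
                     + s.writes_per_sec * s.avg_bytes_per_write)
    busy_time = (s.reads_per_sec * s.avg_sec_per_read
                 + s.writes_per_sec * s.avg_sec_per_write)
    return {
        "iops": iops,
        "bytes_per_sec": bytes_per_sec,
        "read_fraction": s.reads_per_sec / iops if iops else 0.0,
        "avg_latency_s": busy_time / iops if iops else 0.0,
    }


summary = summarize(DiskSample(300, 100, 8192, 65536, 0.005, 0.010))
print(summary)
```

In this made-up sample, writes are only a quarter of the requests but carry most of the bytes, which a single IOPS figure would hide.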

(25)

Performance Analysis Tools

Sampling Tools

Samples may be instantaneous or counters

Good for long-term analysis

Real-time Tools

Software tracing

Kernel Drivers

Hardware tracing

Nothing abstracted

Can be difficult to see everything in between initiator and target

Transport security may be a factor

– IPsec

(26)

Vendor Provided Tools (1)

Vendor Provided Tools

Provide information about devices that may not all be reported up to OS

Provide adapter-wide performance statistics

Allow for adapter test

Settings changes for tuning

Fabric Software

End-to-end visibility

Sometimes bundled with devices

Ability to easily view fabric devices, including stats

Help identify “hot spots”

May require <all> device clock sync for accuracy

(27)

Vendor Provided Tools (2)

(28)

Vendor Provided Tools (3)

Some common FC error counters

Link Failure

Link down, zoning change (isolation)

Sync Loss

Can be caused by OS reboot

Signal Loss

Can be caused by OS reboot

Invalid CRC

Not normal

Primitive Sequence

(29)

Other Error Factors

iSCSI

CRC Digest

TCP/IP

CRC

Checksum

Fibre Channel

Primitive Sequence

Buffer_0

ED_TOV

RA_TOV

(30)

Virtualization Factors: Hosts

Measure overall workload over time

Try to provision storage to meet workload

Stripe-Unit size

Number of disks per storage pool or LUN

If latency becomes apparent, monitor queue depth

If queue depth is too low, disks may not be fully utilized

If queue depth is too high, disks might be queuing, or I/O might be delayed in transit

Adapter (FC, iSCSI, CNA, etc.)

Consult with vendor for recommendations

Queue depth

– Determine if a change is needed based on performance

– Too high and could saturate link or cause stalling in transit

Onboard: Add disks, add controllers and disks, spread load

Keep up with host software updates and firmware

(31)

Virtualization: General

Fixed size disks for intensive performance needs

Over-provisioned disks; SSD or hybrid if possible

Pass-Through Disks: Very little overhead, good perf

Additions/Integrations

Emulated SCSI or FC controllers may yield better perf

Add additional emulated controllers with fewer disks per controller

Monitor memory within VM

Low free memory could lead to excessive paging or trimming

Patch guests as you would physicals:

Proactively look for and apply performance and stability updates

(32)

Performance Recommendations

Update software and drivers running in storage stack

Anti-Virus

Firewall

Other Security

File Screening

HBA, CNA, NIC

Multipath (MPIO) software

Teaming software

Discover all software in storage stack

Trace Tools

Remove any non-vital software in storage stack

Utilize appropriate tier of storage per workload

(33)

Performance Recommendations (2)

Tune cache on storage controllers

Based on observed workload over time

Based on cache effectiveness counters (cache hits, etc.)

Look for hot spots

Can be hard to find

Visual trace tools may help

Symptom: Optimal storage performing poorly for no other reason

Be proactive with alerting

SMI-S

SNMP

Start with a baseline, periodic snapshot
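A baseline is only useful if new measurements are compared against it. The sketch below flags a latency reading that sits well above the baseline's normal spread. The three-standard-deviation threshold, the function name, and the sample values are assumptions for illustration; real alerting would normally come from the monitoring product in use.

```python
from statistics import mean, pstdev


def latency_alert(baseline_ms, current_ms, sigmas=3.0):
    """Return (alert, threshold_ms) for a reading compared with a baseline."""
    threshold = mean(baseline_ms) + sigmas * pstdev(baseline_ms)
    return current_ms > threshold, threshold


# Hypothetical baseline latencies in milliseconds.
baseline = [4.0, 5.0, 6.0, 5.0]
print(latency_alert(baseline, 6.0))
print(latency_alert(baseline, 12.0))
```

Taking the snapshot periodically keeps the threshold meaningful as the workload grows.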

(34)

Performance Recommendations (3)

Optimize FAN-IN and/or FAN-OUT ratios

Avoid congestion points

Monitor fabric for Buffer_0 and other errors (set alerts; automate as much as possible)

Follow best practices for iSCSI

VLAN or dedicated hardware

Limit protocols in use

Limit or remove sharing

Optimize hardware per vendor recommendations

Avoid unplanned changes and track in detail if made

Snapshot before and after if possible, and keep logs

Chart all storage related tasks, look for overlap
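The fan-in recommendation at the top of this slide can be checked with simple arithmetic: add up the bandwidth of the host ports that share a set of storage ports (or an ISL) and divide by the bandwidth available on the shared side. The sketch below does that. The function name and the port counts and speeds are illustrative, and acceptable ratios should come from the storage and switch vendors.

```python
def oversubscription(host_port_gbps, shared_port_gbps) -> float:
    """Ratio of total host-side bandwidth to total shared-side bandwidth."""
    shared = sum(shared_port_gbps)
    if shared <= 0:
        raise ValueError("shared bandwidth must be positive")
    return sum(host_port_gbps) / shared


# Hypothetical layout: twelve 8 Gb host ports sharing two 16 Gb storage ports.
ratio = oversubscription([8] * 12, [16] * 2)
print(f"{ratio:.1f}:1")
```

A ratio above 1:1 is normal because hosts rarely drive their links at full rate together, but the higher it goes, the more likely congestion becomes when workloads overlap.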

(35)

Performance Recommendations (4)

Keep historical data about workload

Take traces periodically (automate if possible)

Provides for trending and lifecycle planning

Use monitoring software and keep data for a year or two

Have data readily available for engineering and vendor staff

Plan the workload as much as possible

Keep charts, graphs, spreadsheets, databases

Exercise new storage layouts before production

Ask vendors for help if needed with load simulation tools

Also ask for help if needed with performance tools

Simulate failure(s) in test environment

Familiarize yourself with support model

(36)

Q&A / Feedback

Many thanks to the following individuals for their contributions to this tutorial.

- SNIA Education Committee

- Chris Lionetti

- Flavio Muratore

- Bruce Worthington

- Joseph White, Juniper