Operating System, Storage Performance Analysis
Robert M. Smith, Microsoft Corporation
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
Member companies and individual members may use this material in presentations and literature under the following conditions:
Any slide or slides used must be reproduced in their entirety without modification
The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion, please contact your attorney.
The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Abstract
OS Storage Performance Analysis
Analyzing and dealing with storage performance at the OS level can be challenging in many respects. This tutorial covers aspects of performance with respect to storage.
This tutorial will also cover tools that can be used to assist in the analysis of operating system performance.
This presentation will include the following:
Factors affecting storage performance
Examples of tools to monitor storage performance
Recommendations to improve storage performance
SAN I/O Path, 1000 ft. view
OS I/O: Closer View
[Diagram: I/O stack layers. User Mode: Application. Kernel Mode: File System, Volume / Partition, Device Class, Command Port. Below the stack: Storage.]
Rotational Drives
“Capacity Optimized” drives
TB Size: 0.5, 1, 2, 3
*IOPS: >= 120 (worst case, random “full-stroke” workloads)
SAS or SATA
Regardless of size, same performance, same IOPS
~8.5 ms latency (½ platter seek); worst case 16 to 19 ms (on average across manufacturers)
“Performance Optimized” drives
GB Size: 72, 144, 450, 600, 900
*IOPS: 200-400 (worst case)
SAS, FC (some SATA)
2-4 ms latency (on average across manufacturers)
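The IOPS figures above follow roughly from service time: a drive that serves one request at a time completes about 1 / latency requests per second. A minimal sketch, assuming a single outstanding request and ignoring transfer time and caching (the function name is illustrative, not from any tool):

```python
def iops_from_latency(latency_ms: float) -> float:
    """Approximate IOPS for a drive serving one request at a time."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # 1000 ms per second divided by the time one request takes.
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for label, ms in [("capacity optimized, 8.5 ms", 8.5),
                      ("performance optimized, 4 ms", 4.0),
                      ("performance optimized, 2 ms", 2.0)]:
        print(f"{label}: ~{iops_from_latency(ms):.0f} IOPS")
```

At 8.5 ms this gives roughly 118 IOPS, in line with the ~120 figure quoted for capacity-optimized drives; 2 to 4 ms gives 250 to 500, the same order as the quoted 200-400.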
Disk Drive Factors
SSD & Hybrid Storage
Cost: Dollars per GB
SSD: Solid-State Drive
No moving parts
Less power consumption
75, 150, 300, 500, 600 GB
OS likely has native SSD support (TRIM, etc.)
Microsecond latency
Flash block erase before write
Undersized: provides spare cells for wear-leveling and bad-block mapping
(ex. 150 GB drive might be sold as a 100 GB drive)
Hybrid Storage Solutions
Solid-State and rotational disks in same chassis
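The undersizing example above (a 150 GB drive sold as 100 GB) can be expressed as a spare-capacity fraction. A small sketch; the function name is an illustrative assumption, and real drives may reserve space differently:

```python
def spare_fraction(raw_gb: float, sold_gb: float) -> float:
    """Fraction of raw flash held back for wear-leveling and bad-block mapping."""
    if not 0 < sold_gb <= raw_gb:
        raise ValueError("expected 0 < sold_gb <= raw_gb")
    return (raw_gb - sold_gb) / raw_gb


if __name__ == "__main__":
    # The slide's example: 150 GB of flash sold as a 100 GB drive.
    print(f"{spare_fraction(150, 100):.1%} of raw capacity is spare")
```

For the 150 GB / 100 GB example, one third of the raw flash is spare.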
Disk Drive Factors (2)
Storage Hardware Factors
Controller Cache Configurations
How much cache?
What is read/write ratio of cache?
How effective is cache?
Enterprise storage usually has performance measuring capability onboard
What happens when a threshold is reached (i.e., a flush)?
Idle flushing: does not interrupt, I/O continues
Low and high watermark flushing: triggers flushing, minor performance impact
Forced flushing: to free cache pages, all I/O temporarily halted
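The three flushing behaviors above can be sketched as a simple decision on how full the write cache is. This is a hypothetical model only: the watermark percentages, the function name, and the `watermark_active` flag are assumptions for illustration, and real controllers differ by vendor.

```python
def flush_mode(dirty_pct: float, io_idle: bool, watermark_active: bool,
               low_wm: float = 60.0, high_wm: float = 80.0) -> str:
    """Classify cache flushing for a given percentage of dirty cache pages.

    low_wm and high_wm are hypothetical watermark settings (percent).
    watermark_active is True if a watermark flush is already in progress.
    """
    if dirty_pct >= 100.0:
        # No free cache pages left: host I/O stalls until pages are freed.
        return "forced"
    if dirty_pct >= high_wm or (watermark_active and dirty_pct > low_wm):
        # Starts at the high watermark and continues down to the low one.
        return "watermark"
    if io_idle and dirty_pct > 0:
        # Background flushing while the array is otherwise idle.
        return "idle"
    return "none"
```

A monitoring script could log how often each mode occurs; frequent forced flushes suggest the cache is undersized or misconfigured for the write workload.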
Storage Hardware Factors (2)
Is cache “mirroring” involved?
If so, is there a performance impact?
Are there other workloads on the storage device?
What hardware is between initiator and target?
If SAN, how many and what types of switches?
Virtualization Appliances
Some take the “LUNs” presented and virtualize those
Some have onboard storage
Storage Hardware Virtualization
Virtual Disks (AKA LUN)
Composed of a group, or a “chunk” of a group, of physical disks, then presented by a storage device
Possibly indicated by:
Non-standard size
Device interrogations returning storage vendor vs. drive vendor
Virtualize to consolidate
Aggregation of underlying LUNs (virtualization appliance)
Adds complexity
Troubleshooting more difficult
(example, very tough to find “hot spots”)
Storage Layout Factors: Disk Configuration
RAID level
ex. 1, 5, 6, 1+0, 0+1, 5+0, 0+5, 6+0 etc.
Number of physical disk drives backing
Levels of virtualization between server(s) and disks?
Any storage pool sharing involved?
Dedicated disks or shared storage pools?
What is the backup schedule for ALL connected hosts?
LUN snapshots, database table scans, etc.
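RAID level and the number of backing disks together bound the random I/O a LUN can deliver. A rough sketch using the commonly cited rule-of-thumb write penalties for small random writes (these values and the function name are assumptions for illustration; controller cache and full-stripe writes change the picture):

```python
# Back-end disk operations per host write (rule-of-thumb values).
WRITE_PENALTY = {"0": 1, "1": 2, "1+0": 2, "5": 4, "6": 6}


def host_iops(iops_per_disk: float, disks: int,
              raid_level: str, read_fraction: float) -> float:
    """Estimate host-visible random IOPS for a RAID group."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[raid_level]
    raw = iops_per_disk * disks
    write_fraction = 1.0 - read_fraction
    # Each host read costs one disk I/O; each host write costs `penalty`.
    return raw / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    for level in ("5", "1+0"):
        print(f"RAID {level}: ~{host_iops(200, 8, level, 0.7):.0f} host IOPS")
```

With eight 200-IOPS disks and a 70% read mix, this estimates about 842 host IOPS for RAID 5 and about 1,231 for RAID 1+0.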
What decisions affected design?
Cost
Consolidation
Migration
Risk
RAID types versus performance
Power and cooling
Expansion
Manageability
Storage Layout Factors: Design Decisions
Storage Layout Factors (2)
What happens to a storage group if a disk drive fails?
What is the performance impact?
How long to rebuild?
Data could be vulnerable during rebuild
Is anyone notified of a failure?
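The "how long to rebuild?" question can be bounded with simple arithmetic: drive capacity divided by the sustained rebuild rate. A minimal sketch; the 50 MB/s rate in the example is an assumed figure, and real rebuilds slow down under host load:

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower-bound rebuild time in hours for one failed drive."""
    if capacity_tb <= 0 or rebuild_mb_per_s <= 0:
        raise ValueError("capacity and rate must be positive")
    capacity_mb = capacity_tb * 1_000_000  # decimal TB to MB
    return capacity_mb / rebuild_mb_per_s / 3600


if __name__ == "__main__":
    # A 2 TB drive rebuilt at an assumed 50 MB/s.
    print(f"~{rebuild_hours(2, 50):.1f} hours")
```

A 2 TB drive at 50 MB/s takes about 11 hours at best, which is the window during which data could be vulnerable.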
Storage Path Factors (FC & iSCSI)
Path is usually a “mesh”
Multiple paths may be meshed, but not physically connected
Redundant paths on separate fabrics are common
Multipath I/O (MPIO) software can load balance
Designed Path Capacity
Oversubscription
Fan In, Fan Out
Inter-switch links (ISLs)
Intermediate devices
Core Switches (Fan-In / Fan-Out Ratio)
Routing across disparate fabrics
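Oversubscription on a designed path is the ratio of bandwidth that initiators could demand to the bandwidth available on the shared link or target ports. A short sketch; the function name and the example port counts are illustrative assumptions:

```python
def oversubscription_ratio(initiator_gbps: list[float],
                           target_gbps: list[float]) -> float:
    """Total initiator-side bandwidth divided by total target-side bandwidth."""
    total_target = sum(target_gbps)
    if total_target <= 0:
        raise ValueError("target bandwidth must be positive")
    return sum(initiator_gbps) / total_target


if __name__ == "__main__":
    # Twelve 8 Gb host ports fanning in to two 16 Gb storage ports.
    ratio = oversubscription_ratio([8] * 12, [16] * 2)
    print(f"{ratio:.0f}:1 fan-in")
```

Twelve 8 Gb host ports sharing two 16 Gb storage ports gives a 3:1 ratio; whether that is acceptable depends on how many hosts are busy at once.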
MPIO load-balance policies: Failover Only, Round Robin, Round Robin with Subset, Least Queue Depth, Weighted Paths, Least Blocks
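Two of the MPIO policies named above, Round Robin and Least Queue Depth, can be sketched in a few lines. This is a toy model of path selection, not any vendor's implementation; the `Path` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator


@dataclass
class Path:
    name: str
    outstanding: int = 0  # I/Os currently in flight on this path


def round_robin(paths: list[Path]) -> Iterator[Path]:
    """Yield paths in a fixed rotation, ignoring current load."""
    return cycle(paths)


def least_queue_depth(paths: list[Path]) -> Path:
    """Pick the path with the fewest outstanding I/Os (first one on a tie)."""
    return min(paths, key=lambda p: p.outstanding)


if __name__ == "__main__":
    paths = [Path("A", outstanding=3), Path("B", outstanding=1)]
    rr = round_robin(paths)
    print([next(rr).name for _ in range(3)])
    print(least_queue_depth(paths).name)
```

Round Robin spreads requests evenly regardless of load, while Least Queue Depth steers new I/O away from a path that is already backed up.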
Storage Controller Factors
Mass-Storage Controllers
Range from on-board to add-in
Some have battery backup ability in either case
Basic controllers report limited diagnostic information
Advanced controllers have diagnostics available
Vendor supplied tools
Capable of sending events to operating system through extended logging
Enterprise storage may have multiple controllers with shared cache
Fibre Channel or SAS HBAs
Host-Bus Adapter (HBA)
8 Gb and 16 Gb available today
SCSI command interface to OS
Often synonymous with Fibre Channel SAN
Offload packet assembly and disassembly
Provides OS a view into the SAN
(though most activity is abstracted by default)
Vendor provided diagnostics and performance tools
No software capture tools
Multiple HBAs, or multiple-port HBAs enable Multiple Path I/O (MPIO)
Most operating systems have native support for MPIO
Ethernet Adapters
Ethernet Network Interface Card (NIC)
10 GbE
TCP/IP and Chimney Offloads
Hardware parity, CRC, ECC
Converged Network Adapter (CNA)
Combines functionality of HBA and NIC
Fibre Channel over Ethernet (FCoE)
CPU offloads for FCoE and iSCSI
Can present NIC, FCoE, or iSCSI function to host
Teaming software for throughput and availability
Software analyzers likely unable to capture all traffic
Latency
Rotational Disks
Millisecond latency
Sequential writing to rotational drives is the most efficient
Sequential and/or “full-stripe” writes to RAID disks are most efficient
Latency occurs as heads have to move position across rotating platter
Operating system logical address may be different from physical location on disk device
SSD
Microsecond latency
Small random writes slowest (Flash block)
Flushing
Firmware
Keeps improving performance and availability
Queuing
The art of keeping the I/O pipeline populated, but not congested
Can happen at many levels
Operating system can build up thousands of I/O requests
Can build up at switch ports (buffer credits)
Can build up at backend storage ports (inbound queue)
Can build up in storage controllers (HBA, NIC, etc.)
I/O throttling via queue depth setting
Individual disk devices
Native command queuing (NCQ) for SATA AHCI
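The trade-off between a populated and a congested pipeline can be reasoned about with Little's Law: average outstanding I/Os = IOPS × average latency. A minimal sketch with illustrative function names, assuming steady-state averages:

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average number of I/Os in flight (Little's Law)."""
    return iops * latency_ms / 1000.0


def max_iops(queue_depth: int, latency_ms: float) -> float:
    """IOPS ceiling imposed by a queue-depth setting at a given latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return queue_depth * 1000.0 / latency_ms


if __name__ == "__main__":
    print(outstanding_ios(5000, 4))  # in-flight I/Os at 5000 IOPS, 4 ms
    print(max_iops(32, 4))           # ceiling for queue depth 32 at 4 ms
```

Sustaining 5,000 IOPS at 4 ms needs about 20 I/Os in flight, and a queue depth of 32 at the same latency caps a device at 8,000 IOPS. A setting below what the workload needs throttles throughput; a setting far above it only moves the queue further down the path.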
“Short-Stroking” to reduce latency
Forcing the use of a smaller area of a rotating disk to reduce seek distance, thus latency
Also a result of “areal density”
Data is written more densely on outer tracks
Outer edge of disk may get 150 MB/s while inner tracks get 80 MB/s
Less latency means more IOPS
Penalty is under-utilized storage space
“Advanced Format” (AF) Technology
AF refers to physical disk sector size and/or block architecture
Previous limits
Physical disk sector size: 512 bytes
Master Boot Record (MBR) structure sizes
Approximately 2 Terabytes maximum disk size
New Capabilities:
Physical sector size: <currently> 4096 bytes (4 KB)
512e is a 4 KB physical sector presented as 512-byte logical blocks
More space for error checking (CRC)
More storage space available in same or less physical space
No corresponding increase in performance capability
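The roughly 2 TB limit listed above comes from the MBR partition table, which stores sector addresses in 32-bit fields. A quick worked check, with an illustrative function name:

```python
MBR_LBA_BITS = 32  # MBR partition entries use 32-bit sector addresses


def max_mbr_bytes(logical_sector_bytes: int) -> int:
    """Largest capacity addressable through MBR for a given logical sector size."""
    return (2 ** MBR_LBA_BITS) * logical_sector_bytes


if __name__ == "__main__":
    for size in (512, 4096):
        tib = max_mbr_bytes(size) / 2 ** 40
        print(f"{size}-byte sectors: {tib:.0f} TiB")
```

With 512-byte sectors the ceiling is 2 TiB (about 2.2 TB). Because 512e drives still present 512-byte logical sectors, the MBR ceiling stays the same on them; only a larger logical sector size raises it.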
Partition Alignment
Previously a problem, manual steps to mitigate
Current OS align by default
Check partition starting sector to confirm
Using management interface (Ex. WMI)
Look for starting offset of 2048 blocks
Cannot easily change
Can automate during OS installation
Affects legacy and AF drive technology
512e AF blocks can suffer from misalignment
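Once the starting sector is known, confirming alignment is a modulo check. A small sketch, assuming 512-byte logical sectors and a 4096-byte physical boundary; how the starting sector is obtained (WMI or another interface) is outside this example, and the function name is illustrative:

```python
def is_aligned(start_sector: int, logical_sector_bytes: int = 512,
               boundary_bytes: int = 4096) -> bool:
    """True if the partition's starting byte offset falls on the boundary."""
    return (start_sector * logical_sector_bytes) % boundary_bytes == 0


if __name__ == "__main__":
    for start in (2048, 63):
        offset = start * 512
        state = "aligned" if is_aligned(start) else "misaligned"
        print(f"start sector {start} (offset {offset} bytes): {state}")
```

A starting offset of 2048 blocks is 1 MiB, which is aligned to 4 KB; the legacy starting sector of 63 (32,256 bytes) is not.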
Understanding the workload
Request size
Burstiness
“Hot” data
Concurrency
Inter-arrival time (time of arrival from one request to the next)
Locality (matters more on rotational than SSD)
Few tools can faithfully reproduce a “live” workload
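Inter-arrival time and burstiness can be computed directly from a list of request timestamps taken from a trace. In this sketch, burstiness is measured as the coefficient of variation of the gaps, which is one common choice and an assumption here rather than a definition from the tutorial:

```python
from statistics import mean, pstdev


def inter_arrival_times(arrivals: list[float]) -> list[float]:
    """Gaps between consecutive request arrival timestamps (seconds)."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def burstiness(arrivals: list[float]) -> float:
    """Coefficient of variation of the gaps; 0 means perfectly regular."""
    gaps = inter_arrival_times(arrivals)
    if len(gaps) < 2:
        raise ValueError("need at least three arrival timestamps")
    return pstdev(gaps) / mean(gaps)


if __name__ == "__main__":
    steady = [0.0, 1.0, 2.0, 3.0]
    bursty = [0.0, 0.1, 0.2, 5.0]
    print(burstiness(steady), burstiness(bursty))
```

Evenly spaced requests score 0, while requests that arrive in clumps separated by long gaps score above 1.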
Performance Counters
I/O Transfer Time (Latency)
Avg. disk sec/read
Avg. disk sec/write
Queuing
Avg. disk read queue length
Avg. disk write queue length
Throughput
Avg. disk bytes/read
Avg. disk bytes/write
Network
Output queue length
Transfers / sec (IOPS)
Disk transfers/sec
Disk reads/sec
Disk writes/sec
%Idle Time
Can be misleading
Split I/O
Fragmentation
Large Requests
OS CPU
OS Memory
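The counters above can be combined: transfers per second multiplied by average bytes per transfer gives throughput. A minimal sketch, with sample values that are made up for illustration rather than measured:

```python
def throughput_mb_per_s(transfers_per_sec: float,
                        avg_bytes_per_transfer: float) -> float:
    """Throughput in decimal MB/s from IOPS and average transfer size."""
    return transfers_per_sec * avg_bytes_per_transfer / 1_000_000


if __name__ == "__main__":
    # Hypothetical sample: 2000 transfers/sec at 64 KiB each.
    print(f"{throughput_mb_per_s(2000, 65536):.1f} MB/s")
```

Reading the counters together matters: 2,000 transfers/sec is about 131 MB/s at 64 KiB per transfer but only about 8 MB/s at 4 KiB.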
Performance Analysis Tools
Sampling Tools
Samples may be instantaneous or counters
Good for long-term analysis
Real-time Tools
Software tracing
Kernel Drivers
Hardware tracing
Nothing abstracted
Can be difficult to see everything in between initiator and target
Transport security may be a factor
– IPsec
Vendor Provided Tools (1)
Vendor Provided Tools
Provide information about devices that may not all be reported up to OS
Provide adapter-wide performance statistics
Allow for adapter test
Settings changes for tuning
Fabric Software
End-to-end visibility
Sometimes bundled with devices
Ability to easily view fabric devices, including stats
Help identify “hot spots”
May require <all> device clock sync for accuracy
Vendor Provided Tools (2)
Vendor Provided Tools (3)
Some common FC error counters
Link Failure
Link down, zoning change (isolation)
Sync Loss
Can be caused by OS reboot
Signal Loss
Can be caused by OS reboot
Invalid CRC
Not normal
Primitive Sequence
Other Error Factors
iSCSI
CRC Digest
TCP/IP
CRC
Checksum
Fibre Channel
Primitive Sequence
Buffer_0
ED_TOV
RA_TOV
Virtualization Factors: Hosts
Measure overall workload over time
Try to provision storage to meet workload
Stripe-Unit size
Number of disks per storage pool or LUN
If latency becomes apparent, monitor queue depth
If queue depth is too low, disks may not be fully utilized
If queue depth is too high, disks might be queuing, or I/O might be delayed in transit
Adapter (FC, iSCSI, CNA, etc.)
Consult with vendor for recommendations
Queue depth
– Determine if a change is needed based on performance
– Too high and it could saturate the link or cause stalling in transit
Onboard: Add disks, add controllers and disks, spread load
Keep up with host software updates and firmware
Virtualization: General
Fixed size disks for intensive performance needs
Over-provisioned disks; SSD or hybrid if possible
Pass-Through Disks: Very little overhead, good perf
Additions/Integrations
Emulated SCSI or FC controllers may yield better perf
Add additional emulated controllers with fewer disks per
Monitor memory within VM
Low free memory could lead to excessive paging or trimming
Patch guests as you would physicals:
Proactively look for and apply performance and stability
Performance Recommendations
Update software and drivers running in storage stack
Anti-Virus
Firewall
Other Security
File Screening
HBA, CNA, NIC
Multipath (MPIO) software
Teaming software
Discover all software in storage stack
Trace Tools
Remove any non-vital software in storage stack
Utilize appropriate tier of storage per workload
Performance Recommendations (2)
Tune cache on storage controllers
Based on observed workload over time
Based on cache effectiveness counters (cache hits, etc.)
Look for hot spots
Can be hard to find
Visual trace tools may help
Symptom: Optimal storage performing poorly for no other reason
Be proactive with alerting
SMI-S
SNMP
Start with a baseline, periodic snapshot
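A baseline is only useful if later snapshots are compared against it. A minimal sketch of that comparison; the 25% tolerance and the function names are assumptions for illustration, and a real alerting setup would use whatever thresholds suit the workload:

```python
def deviation_pct(baseline: float, current: float) -> float:
    """Percentage change of the current value relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0


def exceeds_baseline(baseline: float, current: float,
                     tolerance_pct: float = 25.0) -> bool:
    """True if the current value is above baseline by more than the tolerance."""
    return deviation_pct(baseline, current) > tolerance_pct


if __name__ == "__main__":
    # Hypothetical Avg. disk sec/read samples: 8 ms baseline, 12 ms now.
    print(f"{deviation_pct(0.008, 0.012):.0f}% above baseline")
    print(exceeds_baseline(0.008, 0.012))
```

Run against periodic snapshots, a check like this turns the baseline into an alert condition instead of a one-time measurement.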
Performance Recommendations (3)
Optimize fan-in and/or fan-out ratios
Avoid congestion points
Monitor fabric for Buffer_0 and other errors (set alerts; automate as much as possible)
Follow best practices for iSCSI
VLAN or dedicated hardware
Limit protocols in use
Limit or remove sharing
Optimize hardware per vendor recommendations
Avoid unplanned changes and track in detail if made
Snapshot before and after if possible, and keep logs
Chart all storage related tasks, look for overlap
Performance Recommendations (4)
Keep historical data about workload
Take traces periodically (automate if possible)
Provides for trending and lifecycle planning
Use monitoring software and keep data for a year or two
Have data readily available for engineering and vendor staff
Plan the workload as much as possible
Keep charts, graphs, spreadsheets, databases
Exercise new storage layouts before production
Ask vendors for help if needed with load simulation tools
Also ask for help if needed with performance tools
Simulate failure(s) in test environment
Familiarize yourself with support model
Q&A / Feedback
Many thanks to the following individuals for their contributions to this tutorial.
- SNIA Education Committee Chris Lionetti,
Flavio Muratore Bruce Worthington, Joseph White, Juniper