Disk Storage Shortfall
Introduction
Many data centers have performance bottlenecks that impact application performance and service delivery to users. These bottlenecks exist across data center locations including servers (application, web, file, email and database), networks, application software, and storage systems as shown in Figure 1. Resolving performance problems is challenging and requires the analysis and understanding of complex interdependent system environments.
Server bottlenecks due to a lack of CPU processing power, insufficient memory, or undersized I/O interfaces can result in poor performance or, in worst-case scenarios, application instability.
Figure 1: Data center performance bottleneck locations
contention, and lack of available storage system I/O throughput and response time.
Impact On Application Performance
These areas, in particular I/O performance bottlenecks, impact most enterprise applications. There are many applications across different industries that are sensitive to timely data access and impacted by common data center performance bottlenecks. For example, as more users access a popular file, database table, or other stored data item, resource contention will increase.
One way resource contention manifests itself is in the form of database “deadlock”, which translates into slower response times and lost productivity. Given the rise and popularity of internet search engines and online price shopping, some businesses have been forced to create expensive read-only copies of databases. These read-only copies are used to support more queries and to prevent extra workloads from impacting time-sensitive transaction databases.
The direct impact of data center performance bottlenecks includes:
• Additional IT staff attention to troubleshoot, analyze, reconfigure and react to application delays and service disruptions
• Poor quality of service (QoS) causing missed service level agreements (SLAs)
• Premature infrastructure upgrades combined with increased management and operating costs
• Inability to meet peak and seasonal workload demands resulting in lost business opportunities
The indirect impact of data center I/O performance issues includes:
• General slowing of the systems and applications
I/O Performance Metrics: Response Time And Throughput
There are two main I/O performance metrics: I/O response time and I/O throughput. I/O response time is the time from initiating an I/O operation through its completion. I/O throughput is the amount of data (number of bytes) processed per unit of time. There are many applications across different industries that are sensitive to timely data access and impacted by common I/O performance bottlenecks. For example, as more users access a popular file, database table, or other stored data item, resource contention will increase. Depending on the application profile, one of these metrics becomes more relevant and causes the main bottleneck. Applications such as databases, with many small I/O transactions, are sensitive to response time issues, while applications with large I/O operations are prone to suffer from an I/O throughput bottleneck, as shown in Figure 2.
Figure 2: I/O performance metrics and impact on typical applications
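The relationship between the two metrics can be sketched with a simple model in which each I/O pays a fixed access latency plus a size-dependent transfer time. This is only an illustration: the function names, the 5 ms latency, the 100 MB/s transfer rate, and the two I/O sizes are assumptions chosen for the example, not figures from this paper.

```python
def io_metrics(io_size_bytes, latency_s, bandwidth_bytes_per_s):
    """Return (response_time_s, throughput_bytes_per_s) for a single I/O.

    Response time = fixed access latency + data transfer time.
    Throughput    = bytes moved per second when such I/Os run back to back.
    """
    transfer_s = io_size_bytes / bandwidth_bytes_per_s
    response_time_s = latency_s + transfer_s
    return response_time_s, io_size_bytes / response_time_s


def latency_share(io_size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the response time spent on latency rather than transfer."""
    response_time_s, _ = io_metrics(io_size_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / response_time_s


# Hypothetical device parameters (assumptions, not taken from the paper).
LATENCY_S = 0.005          # 5 ms access latency
BANDWIDTH = 100e6          # 100 MB/s sustained transfer rate

SMALL_IO = 8 * 1024        # 8 KiB, database-style transaction
LARGE_IO = 64 * 1024**2    # 64 MiB, streaming-style transfer

if __name__ == "__main__":
    for name, size in (("small", SMALL_IO), ("large", LARGE_IO)):
        rt, tput = io_metrics(size, LATENCY_S, BANDWIDTH)
        share = latency_share(size, LATENCY_S, BANDWIDTH)
        print(f"{name}: response {rt * 1e3:.2f} ms, "
              f"throughput {tput / 1e6:.2f} MB/s, latency share {share:.1%}")
```

Under these assumed numbers, the small I/O spends roughly 98% of its response time on latency, while for the large I/O latency is under 1% and the transfer rate dominates. That is the distinction between response-time-bound and throughput-bound applications.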
Server-Storage Performance Gap
In the future this problem will worsen exponentially because of the Server-Storage Performance Gap. Historically, different computer system components have advanced at different relative rates. Although disk capacity has improved somewhat, disk performance ranks at the bottom with no significant improvement compared to million-fold boosts by other system components. Figure 3 outlines the varying growth rates between CPU and disk performance.
Figure 3: Server-Storage Performance Gap
For example, CPU performance has progressed at an impressive clip, driven by Moore’s law, multi-core processors, and threading technology, increasing 2,000,000 times since 1987. In comparison, disk performance improved by only 11 times. The net impact is that bottlenecks associated with the server-to-I/O performance gap result in lost productivity for IT personnel and customers who must wait for
The Root Cause: Disk Drive Shortfall
The root cause of the server-storage performance gap is the mechanical process of accessing disk data. Moving physical parts (rotating the magnetic platter and positioning the actuator) introduces a significant delay, or latency. As application workload increases, subsequent I/O requests are put on hold, forming the I/O request queue shown in Figure 4.
Figure 4: Disk Drive Shortfall: Disk Latency and Queue Wait Time
There are two primary disk access problems: disk latency and queue wait time.
1. Disk latency: for each disk access, the magnetic platter has to rotate and the actuator has to seek to the requested data block
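The mechanical components of disk latency can be estimated with simple arithmetic. On average the platter must turn half a revolution before the requested block passes under the head, so the average rotational delay follows directly from the spindle speed. The sketch below assumes this half-revolution rule and adds a seek time; the function names and the example spindle speed and seek time are hypothetical values for illustration, not measurements from this paper.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def disk_access_time_ms(rpm, avg_seek_ms):
    """Approximate time to reach a random block: seek plus rotational delay."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


def max_random_iops(rpm, avg_seek_ms):
    """Upper bound on random I/Os per second for one drive serving
    one request at a time (ignores transfer time and caching)."""
    return 1000.0 / disk_access_time_ms(rpm, avg_seek_ms)


if __name__ == "__main__":
    # Hypothetical drive: 15,000 RPM with an assumed 3.5 ms average seek.
    rpm, seek_ms = 15_000, 3.5
    print(f"rotational latency: {avg_rotational_latency_ms(rpm):.2f} ms")
    print(f"access time:        {disk_access_time_ms(rpm, seek_ms):.2f} ms")
    print(f"max random IOPS:    {max_random_iops(rpm, seek_ms):.0f}")
```

With these assumed values a single drive needs about 5.5 ms per random access, which caps it at roughly 180 random I/Os per second regardless of how fast the attached server is.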
As more workload is added to a system with existing I/O issues, response time will correspondingly increase as shown in Figure 5. The more severe the bottleneck, the faster response time will deteriorate (i.e., increase) from acceptable levels.
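The way queue wait time grows with workload can be illustrated with a textbook single-server queueing model (M/M/1), in which response time equals service time divided by one minus utilization. This model is an assumption introduced here for illustration and is not the method used in this paper; the 5 ms service time is likewise a hypothetical value.

```python
def response_time_ms(service_time_ms, utilization):
    """M/M/1 response time: service time plus time spent waiting in the queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


def queue_wait_ms(service_time_ms, utilization):
    """Portion of the response time spent waiting behind other requests."""
    return response_time_ms(service_time_ms, utilization) - service_time_ms


if __name__ == "__main__":
    SERVICE_MS = 5.0  # hypothetical per-I/O disk service time
    for utilization in (0.10, 0.50, 0.80, 0.90, 0.95):
        total = response_time_ms(SERVICE_MS, utilization)
        wait = queue_wait_ms(SERVICE_MS, utilization)
        print(f"utilization {utilization:.0%}: "
              f"response {total:6.1f} ms (queue wait {wait:6.1f} ms)")
```

In this model the curve is gentle at low utilization and steepens sharply as the disk approaches saturation, which matches the qualitative behavior described above: the closer a system already is to its limit, the more each additional unit of workload costs in response time.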
The Disk Drive Shortfall Creates The I/O Bottleneck
With most performance metrics more is better; however, in the case of response time or latency, less is better. Figure 6 shows the impact of additional workload resulting in I/O bottlenecks that negatively impact performance by increasing response times (grey curve) above acceptable levels. The specific acceptable response time threshold will vary by application and SLA requirements. The acceptable threshold level, based on performance plans, testing, SLAs and real-world experience, serves as a guideline between acceptable and poor application performance.
Figure 6: Response Times Compared to Throughput
As more workload is added to a system with existing I/O issues, response times correspondingly increase. The more severe the bottleneck, the faster response times will deteriorate (i.e., increase) from acceptable levels. The elimination of bottlenecks enables more work to be performed while maintaining response times below
A Makeshift Approach Is Insufficient
The various approaches to addressing I/O bottlenecks range from doing nothing (incurring and dealing with the service disruptions) to over-provisioning by throwing more hardware and software at the problem. A makeshift approach to compensate for a lack of I/O performance, and to counter the resulting negative impact on IT users, is to add more hardware to mask or move the problem.
The simple idea of cutting the I/O queue in half by adding another disk doesn’t work, because it doesn’t change the response time, which is the root cause.
However, it often leads to extra storage capacity being added to make up for a shortfall in I/O performance. By over-configuring to support peak workloads and prevent loss of business revenue, excess storage capacity must be managed throughout the non-peak periods, adding to data center and management costs. The resulting ripple effect is that more storage now needs to be managed, including allocating storage network ports, configuring, tuning, and backing up data.
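One way to read the argument about adding disks is that extra spindles shorten the queue in front of each disk but leave the mechanical service time of every individual I/O untouched. The sketch below illustrates this under stated assumptions: requests are spread evenly across identical disks, each disk behaves as an independent single-server (M/M/1) queue, and the 5 ms service time and 180 requests per second are hypothetical values. None of this comes from the paper itself.

```python
def striped_response_time_ms(service_time_ms, arrivals_per_s, disks):
    """Response time when load is split evenly over `disks` M/M/1 queues."""
    per_disk_utilization = arrivals_per_s * (service_time_ms / 1000.0) / disks
    if per_disk_utilization >= 1.0:
        raise ValueError("each disk would be saturated at this load")
    return service_time_ms / (1.0 - per_disk_utilization)


if __name__ == "__main__":
    SERVICE_MS = 5.0      # hypothetical mechanical service time per I/O
    ARRIVALS = 180.0      # hypothetical total requests per second
    for disks in (1, 2, 4, 8, 100):
        rt = striped_response_time_ms(SERVICE_MS, ARRIVALS, disks)
        print(f"{disks:3d} disk(s): response {rt:6.2f} ms "
              f"(floor {SERVICE_MS:.2f} ms)")
```

However many disks are added in this model, the response time never drops below the 5 ms service time, which is the part set by disk mechanics. Each extra disk buys less than the previous one while adding capacity that still has to be managed.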
Conclusions
The Server-Storage Performance Gap stems from the disk drive shortfall, which worsens exponentially every year. Specifically, I/O operations per unit of capacity are decreasing, a bad sign compared to the massive performance improvements elsewhere in the data center.
Today, however, there are many makeshift approaches based on adding more hardware or addressing bandwidth or throughput issues. These approaches do not address the Server-Storage Performance Gap but rather move and hide the bottleneck elsewhere. They do not improve applications that depend on low response times as workload, including throughput, increases.
Violin Memory accelerates storage and delivers real time application performance with vCACHE NFS caching. Deployed in the data center, Violin Memory vCACHE caching systems provide scalable and transparent acceleration for existing storage infrastructures to speed up applications, eliminate peak load disruptions, and simplify enterprise configurations.
© 2010 Violin Memory. All rights reserved. All other trademarks and copyrights are property of their respective owners. Information provided in this paper may be subject to change. For more information, visit www.violin-memory.com
Contact Violin
Violin Memory, Inc. USA
2700 Garcia Ave, Suite 100, Mountain View, CA 94043
33 Wood Ave South, 3rd Floor, Iselin, NJ 08830
(888) 9-VIOLIN Ext 10 or (888) 984-6546 Ext 10
Email: sales@violin-memory.com