Optimizing Virtual Infrastructure Storage Systems with Xangati
Virtualized infrastructures are composed of servers, switches, storage systems and client devices. Of the four, storage systems are the most challenging to size, configure and operate correctly. That’s why IT administrators managing large, complex virtual and cloud environments rely on Xangati to provide actionable insights into the storage systems in their environment and the entire end-to-end infrastructure.
Traditional rotating media storage systems handle large sequential access patterns well, such as backup/restore or bulk data storage in 128KB or 1MB blocks, but they are poor at delivering the random 8KB read or write IOPS (Input/Output Operations Per Second) that hypervisors require. Flash storage delivers far more IOPS, at a higher cost per GB, but that sometimes moves the choke point to the SAN (Storage Area Network) itself, be it one or more Fibre Channel, GbE or 10GbE links. Hybrid storage promises the best of both worlds, IOPS and cost-effective storage capacity, but the algorithms that govern the caching policy from disk to flash vary in their effects. This variability and complexity drive the need for a tool that helps IT administrators quickly identify the areas requiring attention.
The hypervisors are the customers of the storage systems. Therefore, if IOPS, latency or throughput is not sufficient, the next place to look is the storage system itself. Each storage system vendor provides tools to report on its metrics, but these are separate tools that may be available only to storage administrators.
To aid the IT administrator, who is in charge of keeping the virtual machines, desktops and applications running smoothly, Xangati has implemented deep integration with NetApp storage systems in 7-mode that shows the IOPS, throughput and latency that NetApp is delivering to the hypervisors. Additionally, data on NFS or CIFS (Common Internet File System) shares, iSCSI or Fibre Channel LUNs (Logical Unit Numbers), CPU utilization and overall network bitrates are collected.
IOPS, throughput and latency are measured at the data store (reported by the hypervisor) or at the storage system, and there are some guidelines that you can use to optimize storage devices in the virtualized environment. The choke point for a storage system can be the network interfaces through which storage talks to the controllers that perform the read/write transactions, the disks’ ability to deliver I/O, and/or the flash’s ability to cache and deliver I/O.
Here Are Some Best Practices to Keep in Mind:
THROUGHPUT
Throughput (MB/s) becomes the limiting factor in backups, restores and other bulk data transfers. Large sequential accesses, such as 128KB to 1MB reads or writes, usually yield higher throughput than small random 8KB reads or writes. If throughput is insufficient, the bottleneck could be the controllers in the storage system or the network interfaces into the storage system. The CPU utilization metrics on the storage system’s controllers are a good indicator of the possible problem area(s). If CPU utilization is low, then the network interfaces should be upgraded. For example, you may choose to put in 10GbE or 8Gb FC, or to trunk additional 1/10GbE lines.
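As a rough illustration of the guidance above, the following Python sketch estimates the raw ceiling of one or more trunked Ethernet links and applies the CPU rule of thumb. The function names and the 80% busy threshold are assumptions made for this example, not Xangati or NetApp values, and the link figures ignore protocol overhead.

```python
def link_ceiling_mb_s(gbit_per_s, links=1):
    """Raw ceiling in MB/s for one or more trunked links (no protocol overhead)."""
    return gbit_per_s * 1000 / 8 * links


def likely_bottleneck(controller_cpu_pct, busy_threshold_pct=80.0):
    """Apply the rule of thumb: busy controllers point at the controllers,
    idle controllers point at the network interfaces.

    busy_threshold_pct is an assumed cut-off chosen for illustration.
    """
    if controller_cpu_pct >= busy_threshold_pct:
        return "controllers"
    return "network interfaces"


# Compare a single 1GbE link, four trunked 1GbE links and one 10GbE link.
print(link_ceiling_mb_s(1))
print(link_ceiling_mb_s(1, links=4))
print(link_ceiling_mb_s(10))

# A lightly loaded controller suggests looking at the network interfaces.
print(likely_bottleneck(25.0))
```

Real links deliver somewhat less than these ceilings, so treat the numbers as upper bounds when deciding whether trunking or a faster interface is worthwhile.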
IOPS
In a highly virtualized environment, the random 8KB read and write operations per second (IOPS) required by the hypervisors often stress the storage system, especially if it uses rotating media. If IOPS are insufficient, it is usually not a networking issue, though it could be. Teaming additional 1GbE lines will not improve IOPS, only throughput. Consider placing some flash into the storage system, which will cache the “hot blocks” on demand, dramatically and cost-effectively increasing IOPS.
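A back-of-the-envelope calculation shows why extra 1GbE lines rarely help with IOPS. The Python sketch below converts 8KB IOPS into MB/s and compares a small disk array with a single 1GbE link. The disk count and the figure of 150 random IOPS per rotating disk are assumed values for illustration only, and the 125 MB/s link ceiling ignores protocol overhead.

```python
BLOCK_BYTES = 8 * 1024  # 8KB random I/O size discussed in the text


def iops_to_mb_s(iops, block_bytes=BLOCK_BYTES):
    """Bandwidth in MB/s consumed by a given IOPS rate at a fixed block size."""
    return iops * block_bytes / 1_000_000


def iops_to_fill_link(link_mb_s, block_bytes=BLOCK_BYTES):
    """IOPS needed to saturate a link of the given MB/s at a fixed block size."""
    return link_mb_s * 1_000_000 / block_bytes


disks = 12            # assumed array size
iops_per_disk = 150   # assumed random IOPS per rotating disk

array_iops = disks * iops_per_disk
array_mb_s = iops_to_mb_s(array_iops)
link_iops = iops_to_fill_link(125)  # raw ceiling of one 1GbE link

print(f"Array delivers about {array_iops} IOPS, or {array_mb_s:.1f} MB/s")
print(f"One 1GbE link could carry about {link_iops:.0f} 8KB IOPS")
```

Under these assumptions the disks run out of IOPS long before the link fills, which is consistent with adding flash rather than network capacity.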
LATENCY
Latency generally has an inverse relationship to IOPS – the lower the latencies, the higher the IOPS. Latencies of 10ms are usually considered fine, but as the storage system becomes more heavily loaded or a RAID (Redundant Array of Independent Disks) rebuild is in progress, these times can increase by several seconds, which is never acceptable. Usually a flash caching layer does the trick to decrease latencies. Caching hot blocks is far better than caching entire LUNs or sub-LUNs for a hybrid storage system, as it makes better use of the flash – an expensive resource.
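The inverse relationship between latency and IOPS can be approximated with Little's Law, which relates the number of outstanding I/Os to throughput and latency. The Python sketch below assumes a steady state and a fixed number of outstanding I/Os. The value of 32 is an assumption chosen for illustration, not a recommended setting.

```python
def iops_from_latency(outstanding_ios, latency_ms):
    """Little's Law rearranged: IOPS = outstanding I/Os / latency in seconds."""
    return outstanding_ios * 1000.0 / latency_ms


OUTSTANDING_IOS = 32  # assumed number of I/Os in flight

# Flash-like, acceptable and degraded (for example, RAID rebuild) latencies.
for latency_ms in (1, 10, 2000):
    iops = iops_from_latency(OUTSTANDING_IOS, latency_ms)
    print(f"{latency_ms} ms latency -> about {iops:.0f} IOPS")
```

With the same number of I/Os in flight, the achievable IOPS falls in direct proportion as latency rises from milliseconds to seconds.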
CONCLUSION
Xangati Difference
Xangati’s technology has been developed over a decade with over 10 patents granted or in process.
Xangati is focused on delivering service assurance to hybrid cloud workloads and applications in the following fundamentally unique ways:
• Agentless and extensible data collection from disparate compute, network, storage and application sources
• Live and continuous visualization of any resource consumption and interaction with second-by-second recording functionality
• Cross-silo dependency metrics in addition to per-silo consumptive metrics
• Self-learning and best practice driven predictive analytics for performance contention triage and remediation, and performance-conditioned capacity planning