Overview
Introduction
This study examines the performance characteristics of the following block protocols that VNX platforms support in a 10 Gb environment:
FCoE
iSCSI
In a 10 Gb FCoE environment, there are two ways to connect to the storage array. FCoE-capable switches generally include native Fibre Channel ports, so they can be connected either to traditional Fibre Channel SANs or directly to EMC FCoE interfaces on the VNX array. Both methods are examined in this chapter.
Contents
This chapter contains the following topics:
Overview
Testing tools
Methodology
Test results summary
Results analysis
Testing tools
Introduction
Protocol testing does not require the complete SQL Server environment, because the workload characteristics remain constant until the I/O reaches the storage layer. The Microsoft SQLIO load generation tool was therefore used instead of a database transaction load.
SQLIO
SQLIO is a disk subsystem benchmark tool provided by Microsoft. It tests a variety of I/O types and sizes to determine the I/O performance and capabilities of a given configuration. SQLIO does not simulate the I/O pattern of SQL Server.
The tool generates results in terms of I/Os per second (IOPS), bandwidth (MB/s), and latency (ms).
Methodology
Introduction
This section explains the testing methodology used.
Testing methodology
The following three connection methods with 10 Gb Ethernet links on the Cisco Nexus 5020 switch were examined:
10 Gb iSCSI
10 Gb FCoE (FCoE SLIC on VNX)
10 Gb FCoE from host to switch and 8 Gb FC from switch to VNX
Two test loads were defined: one to examine the bandwidth and the other to achieve the maximum throughput on the link. In all cases, the dataset was sized to remain in the storage processor (SP) cache, so that the network interconnect was the limiting factor.
Maximum bandwidth test
Sequential read I/Os of 64 KB were used to determine the maximum bandwidth (MB/s) achieved with the link. From a SQL Server perspective, this type of workload is representative of large table scans or backup activity.
Maximum throughput test
Random read I/Os of 8 KB were used to determine the maximum throughput (IOPS) with the link. In a disk-bound scenario, sequential I/O achieves higher IOPS than random I/O, but with a cache-resident dataset this difference is minimal. This type of I/O is representative of OLTP environments, which are dominated by random read activity.
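The two test loads relate to the link ceiling through simple arithmetic. The following Python sketch is illustrative only and is not part of the test harness described here. It assumes a nominal 10 Gb/s line rate with no protocol overhead, decimal megabytes, and 1 KB = 1024 bytes; the function names are hypothetical.

```python
# Illustrative arithmetic only. Assumptions: nominal line rate,
# no framing or protocol overhead, 1 MB = 10**6 bytes, 1 KB = 1024 bytes.
LINK_GBPS = 10.0  # nominal rate of one 10 Gb Ethernet link


def link_ceiling_mb_per_s(link_gbps: float) -> float:
    """Nominal link ceiling in MB/s."""
    return link_gbps * 1e9 / 8 / 1e6


def iops_at_ceiling(link_gbps: float, io_size_kb: int) -> float:
    """IOPS required to fill the link at a given I/O size."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second / (io_size_kb * 1024)


if __name__ == "__main__":
    ceiling = link_ceiling_mb_per_s(LINK_GBPS)
    for label, size_kb in (("bandwidth test", 64), ("throughput test", 8)):
        iops = iops_at_ceiling(LINK_GBPS, size_kb)
        print(f"{label}: {size_kb} KB I/Os, ceiling {ceiling:.0f} MB/s, "
              f"about {iops:,.0f} IOPS to fill the link")
```

Measured results fall below these ceilings because framing and protocol overhead consume part of the line rate.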
Test results summary
Storage Setup
Twenty-six SAS drives (15k rpm) configured in a (13+13) RAID10 storage pool were used for all three scenarios. Two fully provisioned LUNs were created (one LUN for each SP) and were made available to the host through the test network.
The VNX5700 has a factory setting of 512 MB read cache for each SP. The test file size for each LUN was set to 400 MB so that the entire workload fit within the cache.
Network Setup
The server and storage configuration was common to all scenarios. Only the storage network configuration differed.
The following figure shows a test scenario with a 10 Gb iSCSI network configuration.
The following figure shows a test scenario with a network configuration of 10 Gb FCoE from host to switch and 8 Gb FC from switch to VNX.
The following figure shows a test scenario with a 10 Gb FCoE (FCoE SLIC on VNX) network configuration.
Results analysis
Testing results
The results show that further testing is needed to fully understand the performance maximums for this environment. The results were achieved with the default options in Windows and on the storage array.
The following graph shows the maximum throughput (IOPS) results for the three connection methods.
All three links were expected to perform at approximately the same level, with the array as the limiting factor. The test results clearly indicate that the network connection influenced the limit. Further testing is required to determine whether the limit can be increased.
For the bandwidth tests, a similar pattern was observed. The following graph shows the maximum bandwidth achieved for all three connections.
However, the maximum network utilization was different in each case. The following graph shows the network utilization for each connection.
The initial iSCSI test scenario yielded poor results, achieving only 26 percent network utilization. Investigation showed that modifying the network parameters in the following steps raised the network utilization to 93 percent:
1. The initial configuration used two active ports and normal-sized frames. Enabling Jumbo Frames improved the network utilization to 42 percent.
2. Using a single active port improved the utilization to 73 percent. This improvement is less than the change in theoretical maximum would predict, which indicates that the second link had been active.
3. A hardware iSCSI initiator achieved 89 percent network utilization in the 10 GbE environment. However, the same type of hardware in a 1 GbE context showed no significant value.
4. Finally, enabling Jumbo Frames on the hardware iSCSI initiator achieved 93 percent utilization.
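The remark in step 2 about the change in theoretical maximum can be made concrete with a short calculation. The following Python sketch is illustrative only. It assumes that utilization is reported as a percentage of the combined nominal rate of all active 10 GbE ports, and it takes 73 percent as the single-port utilization; neither assumption is confirmed by the test data, and the function names are hypothetical.

```python
# Illustrative arithmetic only. Assumption: utilization is a percentage
# of the combined nominal rate of all active ports.
LINK_GBPS = 10.0  # nominal rate of one 10 GbE port


def achieved_gbps(utilization_pct: float, active_ports: int) -> float:
    """Throughput in Gb/s implied by a utilization figure."""
    return utilization_pct / 100.0 * active_ports * LINK_GBPS


def rescaled_utilization(utilization_pct: float,
                         ports_before: int, ports_after: int) -> float:
    """Utilization expected after a port-count change if the absolute
    throughput stayed the same."""
    return utilization_pct * ports_before / ports_after


# Step 1 result: 42 percent with two active ports.
two_port_gbps = achieved_gbps(42, active_ports=2)
# Utilization a single port would show at that same throughput.
expected_pct = rescaled_utilization(42, ports_before=2, ports_after=1)
# Step 2 result: 73 percent with one active port (assumed reading).
one_port_gbps = achieved_gbps(73, active_ports=1)

if __name__ == "__main__":
    print(f"two ports at 42%: {two_port_gbps:.1f} Gb/s")
    print(f"same throughput on one port: {expected_pct:.0f}% utilization")
    print(f"one port at 73%: {one_port_gbps:.1f} Gb/s")
```

Under these assumptions, halving the theoretical maximum alone would raise 42 percent to 84 percent, so a measured 73 percent is a smaller gain than the change in denominator predicts.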
The following graph shows the network utilization for all the iSCSI scenarios mentioned earlier.
These test results indicate that best practices differ between 1 GbE and 10 GbE environments.
Conclusions
All three protocols can generate good performance
The test scenarios documented here are specifically designed to achieve the highest performance from a connection. All three protocols are capable of generating good performance in a wide variety of workloads; the differences discussed occur at the edge of the performance envelope.
Enable Jumbo Frames for high-bandwidth applications
The results indicate that Jumbo Frames improve link utilization. To have the intended effect, Jumbo Frames must be enabled at every connection level: servers, switches, and arrays.
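Part of the benefit of Jumbo Frames comes from reduced per-frame overhead. The following Python sketch is illustrative only and compares a standard 1500-byte MTU with a 9000-byte MTU for TCP traffic such as iSCSI. It assumes 20-byte IPv4 and 20-byte TCP headers with no options, no VLAN tag, and 38 bytes of Ethernet overhead per frame (preamble, header, frame check sequence, and inter-frame gap); iSCSI PDU headers are ignored, and the function names are hypothetical.

```python
# Illustrative arithmetic only. Assumptions: IPv4 and TCP headers without
# options, no VLAN tag, iSCSI PDU headers ignored.
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20             # IPv4 header + TCP header


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that carry TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


def frames_per_io(io_bytes: int, mtu: int) -> int:
    """Number of full-sized frames needed to carry one I/O."""
    payload = mtu - IP_TCP_HEADERS
    return -(-io_bytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.1%} payload efficiency, "
              f"{frames_per_io(64 * 1024, mtu)} frames per 64 KB I/O")
```

The wire-level gain is only a few percentage points, so it does not by itself account for the improvements measured here. Fewer frames per I/O also means less per-frame processing on the host, which is a likely contributor when a software initiator is used.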
Use iSCSI hardware initiators for maximum performance
iSCSI hardware initiators yielded a significant performance improvement in the 10 GbE environment, whereas the same type of hardware showed no significant value in a 1 GbE environment.
These test scenarios are not considered fully optimized, and efforts to expand the performance envelope are ongoing.