Virtual testing environment

To simulate performance in a real deployment, some tests were run in a virtual machine (VM) with the following configuration:

CPU: 1x kvm64 CPU on a KVM-enabled host

Memory: 2048 MB

System: Fedora 18 x86_64, freshly booted from LiveCD

Disk: attached using the virtio driver with data caching disabled

The tests were conducted with SELinux in enforcing mode. Tests in VMs were run from a non-graphical terminal to avoid the performance drop caused by graphics emulation.

Chapter 6 Storage performance testing

To obtain comparable results, we chose hard drives with similar physical characteristics, such as size and linear read and write speed. We compared these values with a simple test that read and wrote 2000 blocks of 1 MB each to the disk. The test was conducted using the dd program with the direct flag set, to ensure that the page cache and disk cache are avoided for both reading and writing. The difference between the disks was less than 3% throughout the testing.
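The disk comparison described above can be sketched as follows. This is a minimal illustration, not the commands actually used: the device path /dev/sdX, the use of /dev/zero as the data source and the throughput numbers are assumptions, and only the block size, block count and direct flag come from the text. Note that the write command overwrites the target device.

```python
BLOCK_SIZE = "1M"    # 1 MB blocks, as stated in the text
BLOCK_COUNT = 2000   # 2000 blocks, as stated in the text


def dd_write_cmd(device):
    """Build a dd command that writes zeros to `device` with direct I/O.

    WARNING: running this destroys the data on `device`.
    """
    return ["dd", "if=/dev/zero", f"of={device}",
            f"bs={BLOCK_SIZE}", f"count={BLOCK_COUNT}", "oflag=direct"]


def dd_read_cmd(device):
    """Build a dd command that reads from `device` with direct I/O."""
    return ["dd", f"if={device}", "of=/dev/null",
            f"bs={BLOCK_SIZE}", f"count={BLOCK_COUNT}", "iflag=direct"]


def relative_difference(a, b):
    """Difference between two throughputs as a fraction of the larger one."""
    return abs(a - b) / max(a, b)


if __name__ == "__main__":
    # /dev/sdX is a placeholder, not a device named in this work.
    print(" ".join(dd_write_cmd("/dev/sdX")))
    print(" ".join(dd_read_cmd("/dev/sdX")))
    # Hypothetical throughputs in MB/s for two disks.
    print(relative_difference(100.0, 98.0) < 0.03)
```

The commands are only built and printed here; executing them would require root privileges and a disk that may be overwritten.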

Because most DFSs use an existing underlying file system on disk, we also tested the performance of two file systems (FSs). The first was ext4, which is widely used as the default FS on many Linux distributions and represents a conservative approach to FS design. The second was btrfs, which is gaining popularity by rethinking how modern FSs work. Both FSs were tested using the IOzone Filesystem Benchmark¹. All raw results of the following tests can be found in the attachment of this work. The tests were run in a way that avoids the page cache and disk cache. For our work we consider the following results from IOzone relevant, as they represent extremes that are crucial for a wide range of DFS deployments:

– Linear read performance
– Random read performance
– Linear write performance
– Random write performance

On both file systems, ext4 and btrfs, we ran the test twice, once with a 200 MB and once with a 400 MB test file. As record sizes we tested powers of 2 from 128 bytes to 8192 bytes (8 kB). We consider 512 bytes, 1 kB, 4 kB and 8 kB to be the relevant record sizes: 512 bytes and 4 kB are common disk block sizes, and 1 kB and 8 kB are their multiples. The results of the IOzone benchmark on these two FSs are shown in figure 6.1 and figure 6.2. Please note that these graphs are scaled to show small differences between the FSs.

Figure 6.1: Linear read/write performance of ext4 and btrfs on physical disks

Figure 6.2: Random read/write performance of ext4 and btrfs on physical disks
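The combinations of file system, file size and record size described above can be enumerated with a short sketch. It assumes that the upper record size is 8 kB = 8192 bytes and does not reproduce the IOzone invocation itself; all function and constant names are illustrative.

```python
from itertools import product

FILE_SYSTEMS = ("ext4", "btrfs")
FILE_SIZES_MB = (200, 400)
# Record sizes singled out as relevant in the text, in bytes.
RELEVANT_RECORD_SIZES = (512, 1024, 4096, 8192)


def record_sizes(lo=128, hi=8192):
    """Powers of two from `lo` to `hi` bytes, inclusive."""
    sizes = []
    size = lo
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes


def test_matrix():
    """Every (file system, file size in MB, record size in bytes) combination."""
    return list(product(FILE_SYSTEMS, FILE_SIZES_MB, record_sizes()))


if __name__ == "__main__":
    for fs, file_mb, rec in test_matrix():
        marker = "*" if rec in RELEVANT_RECORD_SIZES else " "
        print(f"{marker} {fs:5s} {file_mb:4d} MB  record {rec:5d} B")
```

Rows marked with an asterisk correspond to the record sizes treated as relevant in the evaluation.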

6.1 DFS capabilities requirements

For our deployment we decided on the following requirements for the DFS:

hardware independent: it is essential that the DFS is independent of the hardware on which it runs, as most of it will be commodity HW with possibly different configurations

distribution independent: the file system must be officially supported on at least 2 Linux distributions

redundancy of data: each piece of user data must be stored on at least two physical nodes to provide redundancy and fault tolerance

random read/write performance: the DFS should provide high performance in random reads and writes of small blocks (512 bytes, 1 kB, 4 kB, 8 kB)

on-line configuration: the DFS should provide the ability to change the number of data nodes without the need to shut down the whole DFS

self-healing: the DFS should be able to heal itself after a node failure. Data with reduced redundancy must always be available; it is not feasible to deny access to data because of ongoing replication.

maintenance on multiple data nodes at once without great impact on the performance of the DFS, i.e. the DFS should not re-balance abruptly
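The redundancy and self-healing requirements can be made concrete with a small model. This is only a sketch under stated assumptions: data is represented as a hypothetical mapping from chunk names to the nodes that hold a copy, which is not how any of the tested DFSs expose their placement.

```python
def under_replicated(placement, min_copies=2):
    """Chunks stored on fewer than `min_copies` distinct nodes."""
    return sorted(chunk for chunk, nodes in placement.items()
                  if len(set(nodes)) < min_copies)


def unavailable_after_failure(placement, failed_nodes):
    """Chunks whose every copy lives on a failed node."""
    failed = set(failed_nodes)
    return sorted(chunk for chunk, nodes in placement.items()
                  if set(nodes) <= failed)


if __name__ == "__main__":
    # Hypothetical placement: chunk name -> nodes holding a copy.
    placement = {
        "a": ["node1", "node2"],
        "b": ["node2", "node3"],
        "c": ["node3"],
    }
    print(under_replicated(placement))
    print(unavailable_after_failure(placement, ["node2"]))
    print(unavailable_after_failure(placement, ["node2", "node3"]))
```

In this model a single node failure leaves every properly replicated chunk readable, which is the behaviour the requirements ask for while replication is being restored.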

6.2 DRBD

Before moving to a "truly distributed" DFS, we tested the performance of DRBD in order to compare it with the performance of the other DFSs. DRBD is architected to be most efficient in the scenario of disk mirroring between two nodes. We expected write and read performance to be very close to the native performance of the physical disks, as if they were standalone.

In the test results we were interested in the amount of overhead this networked file system introduces, since the nodes must synchronize over the network. We also wanted to compare this performance on a physical (bare metal) machine and a virtual machine (VM). The results of these tests can be seen in figure 6.3 and figure 6.4. As we can see, the overhead of the virtualization layer and its impact on the I/O (input/output) performance of the disks is marginal.

Figure 6.3: Linear read/write performance of ext4 and btrfs on DRBD device in virtualized (virt) and physical (bare) environment

Figure 6.4: Random read/write performance of ext4 and btrfs on DRBD device in virtualized (virt) and physical (bare) environment

6.3 GlusterFS

The first tested DFS that meets most of our requirements was GlusterFS. Performance tests were conducted using version 3.4-alpha, which was stable enough for testing. In general, it was very easy to start using GlusterFS: it can be deployed rapidly, and only a few commands are needed to have the DFS up and running. There was no need to edit configuration files; everything was configured via the command line interface (CLI). Since version 3.2 the CLI is the only supported way of configuring GlusterFS, which makes some manuals found on the Internet obsolete.

We tested GlusterFS with a distributed, replicated and striped volume to meet the requirements of high performance and redundancy. The exact commands used for creating the volume can be found in the attachment of this work. Because GlusterFS doesn't provide block device support yet, we had to use files to simulate a block device instead, so the virtual disk for the VM was in fact a RAW non-preallocated image file. The graphs below show the performance of GlusterFS attached to a VM as storage, depending on the FS used. Figure 6.5 shows the tests of linear read and write operations; figure 6.6 shows the results for random read and write.
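As an illustration of what creating such a volume involves, the sketch below builds a gluster volume create command for a distributed, striped and replicated volume. It is not the command set from the attachment: the volume name, host names, brick paths and the stripe and replica counts of 2 are assumptions chosen for the example. It relies on the general rule that the number of bricks should be a multiple of stripe count times replica count.

```python
def gluster_volume_create_cmd(name, bricks, stripe=2, replica=2):
    """Build a `gluster volume create` command line as an argument list.

    `bricks` are "host:/path" strings. Their count must be a multiple of
    stripe * replica; each additional multiple adds a distribution group.
    """
    group = stripe * replica
    if not bricks or len(bricks) % group != 0:
        raise ValueError(
            f"number of bricks must be a non-zero multiple of {group}")
    return ["gluster", "volume", "create", name,
            "stripe", str(stripe), "replica", str(replica),
            "transport", "tcp", *bricks]


if __name__ == "__main__":
    # Hypothetical hosts and brick paths, not taken from this work.
    bricks = [f"node{i}:/export/brick" for i in range(1, 9)]
    print(" ".join(gluster_volume_create_cmd("vmstore", bricks)))
```

With eight bricks and stripe and replica counts of 2, the volume consists of two distribution groups of four bricks each.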

Figure 6.5: Linear read/write performance of ext4 and btrfs on GlusterFS in virtualized environment

The graphs show a measurable difference in the read and write performance of GlusterFS depending on the underlying FS and the FS used in the VM. In most scenarios, however, this difference is too small to bring any practical benefit.


Figure 6.6: Random read/write performance of ext4 and btrfs on GlusterFS in virtualized environment

Due to the lack of block device support, the previous tests might put GlusterFS at a disadvantage compared to other DFSs that have this functionality. Work on implementing this functionality is planned for the final version 3.4 of GlusterFS. However, it would be restricted to a cluster with one node only, which is still not acceptable for deployment in a distributed environment. Even so, we believe that the performance of a block device over the DFS would in that case be better than that of the image file over the DFS that we tested.

Another downside we encountered was the limited ability to control the fail domain, which is essential when planning an outage of several data nodes for maintenance. This is to some extent possible for replicated volumes, where there is a concept named Replica-Set. A Replica-Set holds a list of nodes which can be safely taken down without disrupting the whole DFS. However, this list of nodes is defined statically when the volume is created or extended. GlusterFS doesn't allow it to be changed dynamically, which might be needed in some cases. If we want to turn off machines in one rack that are not in the same Replica-Set, there is no way to do that, or to change the Replica-Set, without taking down the DFS.

6.4 CEPH

The second DFS we chose to test was CEPH, in version 0.56. Deployment was not as straightforward as for the previous DFS and required several stages of copying files over the network to establish a working cluster. The learning curve for finding the right configuration was a bit steep.

The tests here used the RBD block device provided by CEPH, which was striped and replicated over the cluster to meet the requirements. The block device was attached to the system via a Linux kernel module, which should bring better performance in some cases but doesn't support some features, such as copy-on-write clones. The scripts used to configure the testing environment can be found in the attachment of this work. The graphs in figure 6.7 and figure 6.8 show the performance of the CEPH RBD device used by a VM as a storage device. The good read performance of ext4 used in the VM is notable.

Figure 6.7: Linear read/write performance of ext4 and btrfs on CEPH RBD in virtualized environment

Figure 6.8: Random read/write performance of ext4 and btrfs on CEPH RBD in virtualized environment

Despite its quite complex configuration, CEPH gives us more options to work with. One example, which is kept separate from the main configuration file, is the CRUSH map. This map describes what the cluster looks like and allows us to create a logical view of it. We can thus organize OSDs into racks, storage rooms, datacenters or any other grouping we come up with, arranged in a hierarchical structure. The CRUSH map is not only about organization, however; it also assigns a weight to each part of the tree hierarchy, which determines where data is placed. This is especially useful when we need to change the physical topology of the cluster and then rebalance the data distribution in the cluster. It can also be used for maintenance, when we want to mark some parts of the tree as offline for a while without rebalancing the data stored there. This is the case when we just want to reboot one rack of computers for maintenance.
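The role of weights in such a hierarchy can be illustrated with a simplified model. The sketch below is not the CRUSH algorithm, which places data pseudo-randomly according to placement rules; it only shows how the weight of each subtree adds up and what share of data a subtree would receive if placement were purely proportional to weight. The rack names, OSD names and weights are hypothetical.

```python
def subtree_weight(node):
    """Weight of a node: its own weight for an OSD, the sum of its children for a bucket."""
    if "children" in node:
        return sum(subtree_weight(child) for child in node["children"])
    return node["weight"]


def data_share(bucket):
    """Expected fraction of the bucket's data per child, proportional to weight."""
    total = subtree_weight(bucket)
    return {child["name"]: subtree_weight(child) / total
            for child in bucket["children"]}


# Hypothetical cluster: two racks with differently sized OSDs.
CLUSTER = {
    "name": "root",
    "children": [
        {"name": "rack1", "children": [
            {"name": "osd.0", "weight": 1.0},
            {"name": "osd.1", "weight": 1.0},
        ]},
        {"name": "rack2", "children": [
            {"name": "osd.2", "weight": 1.0},
            {"name": "osd.3", "weight": 1.0},
            {"name": "osd.4", "weight": 2.0},
        ]},
    ],
}

if __name__ == "__main__":
    for name, share in data_share(CLUSTER).items():
        print(f"{name}: {share:.2f}")
```

Changing a weight in one part of the tree changes the shares of all its siblings, which is why a topology change is followed by a redistribution of data.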
