Technology Insight Series
Manageability for Big Data Storage
Evaluator Group
John Webster
March, 2012
Introduction
Attitudes toward IT storage within the corporate board room are changing. It wasn’t long ago that
corporate executives valued storage technologies that could mitigate corporate risk. The storage
environment could be made to deal with the demands of the SEC, Sarbanes‐Oxley, industry regulatory
organizations, and avaricious attorneys holding subpoenas ordering them to hand over thousands of
potentially sensitive email messages.
A new view of storage is now emerging—the Big Data view. Corporate executives have known for years
that they could create new business opportunities and respond more effectively to competitive
challenges when armed with the right information. The challenge was gathering all of the data necessary
into one place and doing something with it—not just the data sitting in the corporate electronic
warehouse. The new view is that the corporate storage environment can now serve as a
repository for all the data needed to fuel new business opportunities. This shifts the perception of
storage from a place for corporate data preservation to an environment for opportunity creation.
This change in attitude is being driven by accelerated growth of data captured within the data center
and an understanding that data generated outside the data center could be leveraged as well for
competitive advantage.
Accelerated data growth within the data center
Data growth within the data center now averages 50% on a compound annual growth rate (CAGR)
basis. This growth will accelerate, driven by a number of factors we can identify now:
• The extent to which enterprises embrace mobile devices as computing platforms and
leverage virtual desktop infrastructure (VDI).
• The growing use of data analytics applications that add to and augment existing data
warehousing processes.
• The growth of rich media (images, audio, video) in the enterprise, coming from both within and
outside the enterprise.
• The growth of machine‐generated data (log files, sensor data, RFID, GPS, etc.).
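To make the 50% CAGR figure concrete, the Python sketch below projects capacity forward and computes the doubling time implied by a constant compound growth rate. The function names and the 100 TB starting point are illustrative assumptions, not figures from our research.

```python
import math

def project_capacity(current_tb: float, cagr: float, years: float) -> float:
    """Capacity after `years` of growth at compound annual rate `cagr` (0.5 = 50%)."""
    return current_tb * (1 + cagr) ** years

def doubling_time_years(cagr: float) -> float:
    """Years for stored data to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + cagr)

if __name__ == "__main__":
    # Hypothetical 100 TB starting point, not a figure from this paper.
    for year in range(6):
        print(year, round(project_capacity(100, 0.5, year), 1))
    print("doubling time at 50% CAGR:", round(doubling_time_years(0.5), 2), "years")
```

At 50% per year, stored data doubles roughly every 1.7 years, so a hypothetical 100 TB environment would pass 750 TB within five years.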
Data outside the data center that will eventually wind up in the data center
Data generated by individuals is growing at three to five times the rate of growth within the data center,
driven largely by social media usage—Facebook, Twitter, YouTube, and other web‐based sources with
more to come. We believe that a percentage of this data, perhaps a large percentage, will eventually
wind up in the data center.
What is the Path to Sustainable Data Management?
Given the drivers noted above, we anticipate that data growth within enterprise data center
environments will double or even triple relative to what IT managers are currently experiencing, and
that this will happen within the next three to five years. The rate of growth will vary from one data
center to the next, depending in part on the extent to which external (public) cloud resources are used.
Nevertheless, we believe that all enterprise data center managers should prepare, if they have not
already done so, for a new data wave: the phenomenon now commonly referred to as “Big Data.”
Traditional data center RAID‐based storage architectures and data protection regimes will likely not be
able to effectively and efficiently sustain these anticipated growth rates. New architectures featuring
virtualization, platform federation, forward error correction coding, and scale‐out growth capabilities
will be needed. Here we look at EMC Isilon’s OneFS‐based storage platform as a new storage
architecture that exemplifies the use of scale‐out NAS as a way to put the data center on a sustainable
path toward management of the coming Big Data wave.
Storage Efficiency Redefined
Enterprise IT managers typically define efficiency in terms of budgetary impact: efficiency equates to
operational and capital expense factors. An efficient storage array, for example, is one that maximizes the
return on capital investment and minimizes operational management requirements, measured as cost
per unit of storage (dollars per TB, for example). However, we believe that coming growth will
require an expanded view of storage efficiency. This view includes:
• Creating a storage environment where a petabyte of storage can be managed by a single
administrator
• Using scale‐out storage architectures to manage capacity growth while maintaining application
performance
• Rebuilding data protection processes that can actually protect a Big Data environment
Redefining management efficiency
It was once common to think of storage management efficiency in terms of TB per administrator or,
stated otherwise, how many TBs a single administrator can effectively manage. In a petabyte world, however,
this measure becomes meaningless. The only way IT can manage data at petabyte scale is to automate,
i.e., to shift as much of the management burden as possible from the shoulders of IT administrators to
management intelligence embedded within the storage environment. Management tasks that can now
be automated by scale‐out storage systems include:
• Rebalancing internal workloads to accomplish storage system performance tuning
• Generation of both virtual (snapshot) and physical (clone) data copies for VM and VDI
deployment, data protection and disaster recovery purposes
• Content distribution
• Provisioning of storage capacity and storage‐related services
• Meeting internal governance and external compliance requirements through automated data
identification and archiving processes
Using a scale‐out storage platform to minimize forklift replacement
The useful life of the storage system should also become a measure of management sustainability.
Replacement of a storage array can be prompted by a number of factors, some under the control of IT
and others outside it that nevertheless force IT to respond. An inefficient and increasingly unsustainable
way to manage the kind of growth that we foresee is to replace storage systems as they run out of capacity.
Efficiency measured as cost per unit of capacity drops, because the cost of migrating data from the old
system to the new one must now be added to the cost of the replacement system. Data migrations are
costly in terms of system resources, staff resources, and exposure to disruption and error. They should
therefore be minimized through the use of storage platforms that offer significant single‐image
scalability, the ability to perform upgrades via software revision, and the non‐disruptive replacement of
hardware modules rather than entire storage systems.
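The effect of migration on cost per TB can be sketched as simple arithmetic. In this hypothetical Python helper, all migration costs (system resources, staff time, exposure to disruption) are folded into a single lump sum, which is an assumption; the dollar figures are placeholders, not survey data.

```python
def cost_per_tb(capex: float, opex: float, usable_tb: float, migration_cost: float = 0.0) -> float:
    """Dollars per usable TB, optionally loading in the one-time cost of a data migration."""
    if usable_tb <= 0:
        raise ValueError("usable_tb must be positive")
    return (capex + opex + migration_cost) / usable_tb

if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    add_nodes = cost_per_tb(capex=200_000, opex=50_000, usable_tb=250)
    forklift = cost_per_tb(capex=200_000, opex=50_000, usable_tb=250, migration_cost=60_000)
    print(f"add nodes: ${add_nodes:,.0f}/TB   forklift replacement: ${forklift:,.0f}/TB")
```

With these placeholder numbers the forklift replacement costs $1,240 per usable TB versus $1,000 when the same capacity is added to an existing single‐image system.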
Sustaining performance while sustaining growth
It is important to understand performance as defined by measures such as access density (I/Os per
second per GB) and bandwidth (data transferred in MB per second) when evaluating a scale‐out
storage platform. However, another important measure, consistent with the idea of management
sustainability, is how performance holds up as the system is scaled upward to meet capacity demands as
well as the demands emanating from the application environment. When evaluating a scale‐out system, one
must determine the degree to which capacity and performance scale linearly. Scaling capacity upward
over time should not result in performance degradation as experienced by the application user.
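Both checks can be expressed as ratios. The sketch below, using hypothetical function names and illustrative numbers rather than benchmark results, computes access density and a scaling efficiency that compares measured IOPS against a perfectly linear extrapolation from a smaller baseline configuration.

```python
def access_density(iops: float, capacity_gb: float) -> float:
    """I/O operations per second per GB of capacity."""
    return iops / capacity_gb

def scaling_efficiency(base_nodes: int, base_iops: float, nodes: int, iops: float) -> float:
    """Measured IOPS as a fraction of a perfectly linear extrapolation from the baseline;
    1.0 means performance grew exactly in proportion to node count."""
    linear_iops = base_iops * nodes / base_nodes
    return iops / linear_iops

if __name__ == "__main__":
    # Illustrative inputs, not benchmark data: 3 nodes / 90,000 GB grown to 6 nodes / 180,000 GB.
    print(f"baseline density:   {access_density(60_000, 90_000):.3f} IOPS/GB")
    print(f"scaled density:     {access_density(114_000, 180_000):.3f} IOPS/GB")
    print(f"scaling efficiency: {scaling_efficiency(3, 60_000, 6, 114_000):.2f}")
```

An efficiency that stays near 1.0 as nodes are added indicates near‐linear scaling; a steadily falling value is the degradation application users would eventually feel.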
Confronting rising energy costs
As demands for storage gather increasing momentum, so will the rising cost of energy play an increasing
role in storage‐related decision making and platform selection. The need to reduce or at least hold the
line on the cost of energy per unit of storage will be seen by IT administrators as an increasingly important
aspect of the IT operational budgeting process. Therefore, the ability to add capacity to a single image
storage platform without duplicating ancillary, power consuming hardware (power supplies, cooling
fans, etc.) will become more desirable as time goes on.
Making the provisioning process more efficient
The need to provision additional storage resources quickly will become more critical over the next few
years. This will be particularly true when supporting virtual desktops, mobile device users, and users of
data analytics applications. There are also costs associated with provisioning that go beyond the cost of
the additional storage capacity, although they are difficult to quantify objectively. These include costs resulting
from administrative management time and business opportunity lost due to delay caused by approval
cycles and other administrative red tape. The process of capacity planning and provisioning should be
streamlined to the greatest extent possible to enable sustainable capacity management in an
accelerated growth environment.
Impact of compression, deduplication and thin provisioning
It is important to note that the implementation of technologies such as compression, deduplication and
thin provisioning at the storage array level will be critical to managing accelerated data growth. In
addition, they will have a decided impact on a commonly used measure of storage array efficiency:
capacity utilization as defined by the ratio of usable capacity to raw capacity. In this case, data
compression, deduplication and thin provisioning can dramatically increase usable capacity compared with
what IT administrators are used to seeing from traditional storage architectures.
While we are not saying that these features make this particular measure of efficiency invalid, we are
saying that the effects of compression, deduplication and thin provisioning must now be considered,
both at the time of product comparison and selection, and after the storage system has been in use for a
period of time. Implementing them will also help put storage administration on a path to sustainability.
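The sketch below illustrates why the usable‐to‐raw ratio needs reinterpreting once data reduction and thin provisioning are in play. It assumes compression and deduplication ratios simply multiply, which is a simplification (actual savings depend heavily on the data), and all inputs and function names are hypothetical.

```python
def utilization(usable_tb: float, raw_tb: float) -> float:
    """Traditional measure: usable capacity as a fraction of raw capacity."""
    return usable_tb / raw_tb

def effective_capacity(usable_tb: float, compression: float = 1.0, dedup: float = 1.0) -> float:
    """Logical data that fits in the usable capacity, assuming the two reduction
    ratios simply multiply (a simplification)."""
    return usable_tb * compression * dedup

def overcommit(provisioned_tb: float, usable_tb: float) -> float:
    """Thin provisioning: capacity promised to hosts per TB actually usable."""
    return provisioned_tb / usable_tb

if __name__ == "__main__":
    raw, usable = 100, 80  # hypothetical array
    print(f"usable/raw:             {utilization(usable, raw):.0%}")
    logical = effective_capacity(usable, compression=2.0, dedup=1.5)
    print(f"effective/raw:          {logical / raw:.0%}")
    print(f"thin-provisioned ratio: {overcommit(160, usable):.1f}x")
```

With 2:1 compression and 1.5:1 deduplication, 80 TB usable on 100 TB raw would hold 240 TB of logical data, so an effective utilization above 100% is possible and a raw‐capacity comparison alone can mislead.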
Sustainable Management using EMC Isilon Scale‐Out NAS
Prior to its acquisition by EMC, Isilon had established a production‐quality, scale‐out NAS platform for the
data center that conforms to many of the requirements for sustainable storage management noted above.
Since that time, revenues generated from the sale of Isilon OneFS‐based storage platforms have more
than doubled.
Here we compare the attributes of Isilon storage platforms to our requirements for sustainable data and
storage management. We have included the results from two of the Isilon customer interviews that
were conducted to validate our findings.
Isilon Attributes
Single‐volume file system presented by multiple nodes
Isilon’s scale‐out storage implementation differs from a number of competing architectures in the
way it achieves scalability. We have seen NAS systems where scaling is implemented through
the addition of fixed‐scale file systems that are aggregated together under a single namespace. With
Isilon, both the file system (OneFS, along with its single volume) and the namespace are allowed to
scale together.
OneFS allows the IT administrator to add capacity to a single file system and volume by adding nodes.
Each node consists of processing power and memory, network I/O bandwidth, and storage capacity.
Therefore with each additional node, more processing, I/O bandwidth, globally coherent cache and
storage resources are added to the storage cluster.
Symmetric Multi‐parallelism and AutoBalance
OneFS uses a symmetric, multi‐parallel processing technology to stripe data across all participating
storage nodes in the cluster. Any of the processors in any of the OneFS nodes can be applied to work on
any I/O task regardless of where that data is located—in cluster cache or disk storage. AutoBalance
moves task processing and data among cluster nodes to maximize application performance. As a result
of this combination of technologies, capacity, I/O and storage system throughput scale out and are
automatically and persistently balanced as more nodes are added to the cluster.
FlexProtect rather than RAID
We note that, as storage systems scale to meet the demands we have outlined above, the use of RAID
groups to manage through disk failure events is becoming increasingly less tenable. As individual drives
with multiple TB capacities are used as building blocks to scale overall system capacity, it becomes
increasingly difficult to rebuild a failed drive while maintaining storage system availability. Yes, a
failed drive can be rebuilt non‐disruptively (albeit in degraded performance mode), but the system could
be exposed to data loss if additional drives fail before the rebuild of the first failed drive completes.
This exposure could exist for hours and even days depending on how quickly the rebuild process
achieves completion. The period of data loss exposure only lengthens with the use of
denser, higher capacity drives. As a result, more scalable alternatives to RAID are now emerging within
enterprise data centers.
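The link between drive capacity and exposure can be sketched numerically. The Python below computes the rebuild window for a single spare drive and, under the common but simplifying assumption of independent, exponentially distributed drive lifetimes, the chance that another drive in the group fails during that window. The rebuild rate, group size and MTTF are placeholder inputs, not measured values, and the model ignores unrecoverable read errors and the slowdown of rebuilds under application load.

```python
import math

def rebuild_hours(drive_tb: float, rebuild_mb_s: float) -> float:
    """Hours to rewrite one failed drive onto a single spare at a sustained rate
    (decimal units: 1 TB = 10**6 MB)."""
    return drive_tb * 1_000_000 / rebuild_mb_s / 3600

def p_failure_during_rebuild(drive_tb: float, rebuild_mb_s: float,
                             surviving_drives: int, mttf_hours: float) -> float:
    """Probability that at least one surviving drive fails before the rebuild
    finishes, assuming independent, exponentially distributed lifetimes."""
    window = rebuild_hours(drive_tb, rebuild_mb_s)
    return 1 - math.exp(-surviving_drives * window / mttf_hours)

if __name__ == "__main__":
    # Placeholder inputs: 50 MB/s rebuild, 11 surviving drives, 1,000,000-hour MTTF.
    for tb in (1, 2, 3):
        hours = rebuild_hours(tb, 50)
        risk = p_failure_during_rebuild(tb, 50, 11, 1_000_000)
        print(f"{tb} TB drive: {hours:.1f} h exposure, {risk:.4%} chance of another failure")
```

Because the window grows in direct proportion to drive capacity, so does the exposure, which is the trend the paragraph above describes.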
FlexProtect is an implementation of Reed‐Solomon forward error correction encoding that has been
applied to the Isilon OneFS multi‐node storage architecture. FlexProtect operates across all nodes in a
OneFS cluster, no matter how small or large the cluster may be. It provides n‐way protection across the
cluster over Isilon’s redundant internal communication fabric that interconnects the distributed storage
nodes, and it scales in its ability to provide data protection as nodes are added to the cluster.
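To illustrate how forward error correction differs from mirroring or single parity, the following Python sketch implements a small systematic Reed‐Solomon erasure code. It is not FlexProtect's implementation: production codes typically work over GF(2^8), whereas this sketch uses the prime field GF(257) so that field arithmetic is plain modular integer math (a side effect is that a parity symbol can take the value 256 and does not fit in one byte). Data symbols are treated as values of a polynomial at x = 0..k‐1, parity symbols as its values at further points, and any k surviving shards recover the data by Lagrange interpolation. The function names are hypothetical.

```python
P = 257  # prime modulus; every byte value 0..255 is an element of GF(257)

def _lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total

def encode(data, parity):
    """Systematic encoding: the k data symbols are the polynomial's values at x = 0..k-1;
    `parity` extra symbols are its values at x = k..k+parity-1. Returns k+parity shards."""
    k = len(data)
    if k + parity > P:
        raise ValueError("stripe too wide for GF(257)")
    points = list(enumerate(data))
    return list(data) + [_lagrange_eval(points, x) for x in range(k, k + parity)]

def decode(surviving, k):
    """Rebuild the k data symbols from any k surviving shards (dict: position -> symbol)."""
    if len(surviving) < k:
        raise ValueError("more erasures than parity symbols; data is unrecoverable")
    points = list(surviving.items())[:k]
    return [_lagrange_eval(points, x) for x in range(k)]

if __name__ == "__main__":
    data = list(b"Isilon")            # k = 6 data symbols
    stripe = encode(data, parity=4)   # 6 + 4 shards
    lost = {1, 3, 7, 9}               # any four shards may disappear
    surviving = {pos: sym for pos, sym in enumerate(stripe) if pos not in lost}
    print(bytes(decode(surviving, k=6)))
```

Placing each of the ten shards on a different node would let this 6+4 layout tolerate any four node failures, analogous to the four‐failure protection level described below.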
With OneFS, a cluster can be composed of multiple storage pools creating failure zones. FlexProtect
offers protection for up to four simultaneous node failures per pool. Failure modes include anything
from complete node failure down to individual disk drive failure. Data access is maintained during failure
recovery.
Recovery from failure is also a decidedly different process from that of traditional RAID. Protection
information for each file is stored independently of file data that is striped and dispersed among all
OneFS nodes participating in the cluster (see section above). When file reconstruction is required,
FlexProtect identifies the parts of a file that are affected by the failure and reconstructs them using the
distributed protection information. This form of data reconstruction differs from traditional RAID in the
following ways:
1. Multiple node processors and tens to hundreds of disk spindles participate simultaneously in the
rebuild of a file rather than confining the rebuild to a single RAID group of ten to twenty drives or to a
single controller pair.
2. Reconstruction occurs within the free, distributed storage space available in the cluster rather
than within a single RAID group that is working to rebuild a single, multiple TB drive. Isilon uses a
“virtual hot spare.” An IT administrator can choose to reserve space for multiple drive failures
and/or node failures. The system then spreads these reservations automatically across the nodes,
rather than setting aside a single drive’s worth of space on a single node.
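The benefit of rebuilding into distributed free space rather than onto one spare drive is largely parallelism. The hypothetical helper below assumes reconstruction is limited only by write bandwidth and that the work divides evenly across participating drives, which real systems only approximate; the 3 TB and 50 MB/s inputs are placeholders.

```python
def distributed_rebuild_hours(failed_tb: float, per_drive_mb_s: float,
                              participating_drives: int) -> float:
    """Hours to re-create `failed_tb` of lost data when reconstruction writes are spread
    evenly across `participating_drives`, each sustaining `per_drive_mb_s`.
    Decimal units (1 TB = 10**6 MB); assumes write bandwidth is the only bottleneck."""
    if participating_drives < 1:
        raise ValueError("need at least one participating drive")
    return failed_tb * 1_000_000 / (per_drive_mb_s * participating_drives) / 3600

if __name__ == "__main__":
    # Placeholder inputs: a 3 TB failed drive and a 50 MB/s per-drive rebuild rate.
    for drives in (1, 10, 100):
        minutes = distributed_rebuild_hours(3, 50, drives) * 60
        print(f"{drives:>3} drives: {minutes:.0f} minutes")
```

Under those assumptions a rebuild that takes about 1,000 minutes onto a single spare finishes in about 10 minutes when 100 drives share the writes.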
Current Isilon users we interviewed (see below) admitted they were a bit skeptical about replacing an
established and understood RAID environment with Isilon’s FlexProtect. One reported that he had to
work to win over senior IT management before he could replace his SAN environment. However, these
users now report that they are more confident of the data protection capabilities of forward error
correction coding than they were of traditional RAID and appreciate the ability to scale capacity upward
while maintaining performance and data integrity.
A single, scalable, logical volume
As its name implies, the Isilon OneFS storage cluster presents a single, scalable storage volume to the
application environment. This differs significantly from scale‐out storage cluster architectures that
aggregate multiple fixed‐volume storage units under a single namespace. Overall storage efficiency is
reduced because, over time, some fixed volumes will grow to their capacity limits while others remain
underutilized. Storage administrators must be cognizant of the problems this imbalance creates and make
ongoing corrections in order to rebalance the cluster.
OneFS frees the storage administrator from having to track fixed storage volume growth and capacity
limitations on an application by application basis, a management process that engenders multiple
problem areas:
• The process of managing across fixed volumes is manual and error‐prone, and therefore must
be assigned to experienced IT staff members.
• Data often has to be moved from a volume that is outgrowing its usable capacity to another one
with available capacity in order to balance workload performance. Again, this
process opens up opportunities for errors to occur, which could have unexpected consequences
or result in system outages.
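The stranded‐capacity problem with fixed volumes can be shown in a few lines. In this hypothetical Python model a request must fit inside one fixed volume, while a single scalable volume can draw on all free space; protection overhead and metadata are ignored.

```python
from dataclasses import dataclass

@dataclass
class Volume:
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def free_tb(self) -> float:
        return self.capacity_tb - self.used_tb

def fits_fixed(volumes, request_tb: float) -> bool:
    """Fixed volumes: the request must fit entirely inside one volume."""
    return any(v.free_tb >= request_tb for v in volumes)

def fits_single_volume(volumes, request_tb: float) -> bool:
    """Single scalable volume: all free space behaves as one pool."""
    return sum(v.free_tb for v in volumes) >= request_tb

if __name__ == "__main__":
    vols = [Volume("vol1", 100, 95), Volume("vol2", 100, 60), Volume("vol3", 100, 70)]
    print("fixed volumes accept 50 TB:", fits_fixed(vols, 50))
    print("single volume accepts 50 TB:", fits_single_volume(vols, 50))
```

Three 100 TB volumes with 5, 40 and 30 TB free cannot accept a 50 TB data set as fixed volumes, even though 75 TB is free in aggregate; that gap is what administrators otherwise close by hand with data moves.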
Storage and Data Management Applications under a Single Management GUI
A single management GUI is used to control the entire cluster as it scales into the petabyte range in
capacity. It is also the interface to OneFS‐based storage and data management applications—a number
of which have been called out as significantly valuable by Isilon users we have interviewed:
SmartPools – supports the creation of multiple tiers (pools) of disk storage within a single Isilon OneFS
file system that vary by performance and storage density characteristics. As an example, highly active
data sets can reside on SAS or solid state disk when required for maximum performance while inactive
data can be moved to high density disk. Automated data movement is policy‐based (user defined) and
occurs within the confines of the single namespace file system. No links among the tiers or “stubs” are
required.
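Policy‐based tiering of this kind reduces to a rule evaluated per file. The Python sketch below is a hypothetical policy, not the SmartPools configuration syntax: files idle for more than a set period are planned for the high‐density tier and recently used files for the performance tier, while each file's path stays unchanged, reflecting movement within a single namespace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileInfo:
    path: str
    tier: str
    last_access: datetime

def target_tier(f: FileInfo, now: datetime, inactive_after: timedelta) -> str:
    """Hypothetical rule: files idle longer than `inactive_after` belong on dense disk."""
    return "high-density" if now - f.last_access > inactive_after else "performance"

def plan_moves(files, now: datetime, inactive_after: timedelta = timedelta(days=30)):
    """List (path, from_tier, to_tier) for every file whose tier should change.
    The path itself never changes, so no stubs or links are needed."""
    moves = []
    for f in files:
        dest = target_tier(f, now, inactive_after)
        if dest != f.tier:
            moves.append((f.path, f.tier, dest))
    return moves
```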
SmartQuotas – allows IT administrators to partition an entire Isilon cluster into managed quota‐related
segments. All quota segments can be thinly provisioned and managed from a single management
interface. Quota segments can be assigned to specific users and user groups with each segment having
its own provisioning policies.
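Thin‐provisioned quotas can be modeled as limits that may together exceed physical capacity, with each write checked against both the segment's quota and the shared pool. The `QuotaPool` class below is a hypothetical illustration, not the SmartQuotas interface.

```python
class QuotaPool:
    """Toy model of thinly provisioned quota segments over one shared physical pool."""

    def __init__(self, physical_tb: float):
        self.physical_tb = physical_tb
        self.limits = {}  # segment -> quota limit (TB)
        self.used = {}    # segment -> TB written

    def set_quota(self, segment: str, limit_tb: float) -> None:
        # Thin provisioning: limits may sum to more than the physical capacity.
        self.limits[segment] = limit_tb
        self.used.setdefault(segment, 0.0)

    def write(self, segment: str, tb: float) -> bool:
        """Accept the write only if it fits the segment's quota and the physical pool."""
        if self.used[segment] + tb > self.limits[segment]:
            return False
        if sum(self.used.values()) + tb > self.physical_tb:
            return False
        self.used[segment] += tb
        return True

    def overcommit_ratio(self) -> float:
        return sum(self.limits.values()) / self.physical_tb
```

For example, quotas of 80 TB and 60 TB on a 100 TB pool give an overcommit ratio of 1.4; a write that is still within its segment's quota is refused once the physical pool is exhausted, which is the point at which capacity must be added.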
SnapshotIQ – creates locally retained, read‐only data snapshots. OneFS supports an unlimited number of
snapshots within a single cluster and up to 1,024 snapshots within a single directory. Snapshots are
maintained by recording only the blocks that change in the originating file. Snapshots can be used for data
protection (see user interview #2 below) and can be managed on a cluster‐wide basis from a single
management interface.
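Snapshots that record only changed blocks are commonly implemented with copy‐on‐write, sketched below in Python. Whether SnapshotIQ works exactly this way is an assumption on our part; the class and method names are hypothetical, and the model ignores files that grow or shrink.

```python
class SnapshottedFile:
    """Toy copy-on-write snapshots over a file's blocks; snapshots are read-only."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # One dict per snapshot: block index -> block content when the snapshot was taken.
        self.snapshots = []

    def snapshot(self) -> int:
        self.snapshots.append({})  # nothing is copied until a block changes
        return len(self.snapshots) - 1

    def write(self, index: int, data) -> None:
        # Preserve the old block in every snapshot that has not yet saved its own copy.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int) -> list:
        saved = self.snapshots[snap_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]
```

A snapshot consumes space only for blocks overwritten after it was taken, which is why large numbers of snapshots can be retained.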
InsightIQ – implements analytics processes aimed at the Isilon cluster and attached IT resources. Using
InsightIQ, an IT administrator can:
• Identify data set growth and forecast additional capacity requirements on a file type basis
• Diagnose real time events as well as historic events and identify performance bottlenecks
• Establish past performance trends in order to predict future results from configuration changes
• Track file access patterns and identify “heavy hitters”
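Capacity forecasting of the kind described in the first bullet can be approximated with a least‐squares trend line. The Python function below is a hypothetical sketch, not InsightIQ's method; it assumes samples arrive in chronological order as (day, TB used) pairs and that growth is roughly linear over the sampled period.

```python
def forecast_days_until_full(samples, capacity_tb: float):
    """Fit used capacity (TB) against day number by ordinary least squares and
    extrapolate to the day the fitted line reaches `capacity_tb`.
    Returns days remaining after the last sample, or None if usage is not growing."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(tb for _, tb in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    sxy = sum((day - mean_x) * (tb - mean_y) for day, tb in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_tb - intercept) / slope
    return full_day - samples[-1][0]
```

For samples of 100, 130 and 160 TB on days 0, 30 and 60 against 200 TB of capacity, it forecasts 40 days of headroom.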
SyncIQ – schedules and creates storage replication jobs between physically separate Isilon storage
clusters over LAN and WAN communications links. The IT administrator can set replication policies at the
cluster, directory, or file levels. Jobs can be run on‐demand under administrative control or scheduled
for a future time when the cost of communications bandwidth is reduced. Replication jobs can be
parallelized for performance and evenly distributed across cluster nodes. SyncIQ use cases include
disaster recovery, disk‐to‐disk backup, and content distribution.
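One simple way to spread a replication job evenly across nodes is greedy longest‐processing‐time assignment: sort files by size and give each to the currently least‐loaded node. The sketch below uses that heuristic for illustration; how SyncIQ actually divides work is not described here, and the function name is hypothetical.

```python
import heapq

def assign_replication_work(file_sizes: dict, nodes: int) -> dict:
    """Greedy longest-processing-time assignment: largest files first, each to the
    currently least-loaded node. Returns node id -> list of file paths."""
    heap = [(0, node) for node in range(nodes)]  # (bytes assigned, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(nodes)}
    for path, size in sorted(file_sizes.items(), key=lambda item: item[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(path)
        heapq.heappush(heap, (load + size, node))
    return assignment
```

The heuristic keeps the gap between the busiest and least busy node no larger than the biggest single file.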
SmartConnect – automatically manages host connection load balancing across storage nodes to
optimize performance. SmartConnect also provides dynamic NFS failover and failback of host
connections without the use of application host‐side drivers.
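Connection balancing with transparent failover can be illustrated with a least‐connections policy. The `ConnectionBalancer` class below is a hypothetical toy; SmartConnect's actual balancing policies are not modeled.

```python
class ConnectionBalancer:
    """Toy least-connections balancer with failover of clients from a failed node."""

    def __init__(self, nodes):
        self.connections = {n: set() for n in nodes}

    def connect(self, client: str) -> str:
        # Fewest current connections wins; ties broken by node name for determinism.
        node = min(self.connections, key=lambda n: (len(self.connections[n]), n))
        self.connections[node].add(client)
        return node

    def fail_node(self, node: str) -> dict:
        """Remove a failed node and reassign its clients; returns client -> new node."""
        orphans = self.connections.pop(node)
        return {client: self.connect(client) for client in sorted(orphans)}
```

Clients on a failed node are redistributed across the survivors rather than all landing on one node.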
SmartLock – adds WORM (write once, read many) file capability to Isilon OneFS. Write‐protected data is stored along with other
data types, allowing the IT administrator to apply the same storage and data management capabilities of
OneFS including tiering and data protection to WORM files.
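WORM semantics amount to refusing modification and deletion until a retention date passes. The hypothetical `WormFile` class below sketches those rules in Python; it does not model SmartLock's commit or retention settings.

```python
from datetime import datetime

class WormFile:
    """Toy write-once, read-many file: content is fixed at creation and cannot be
    modified, and the file cannot be deleted until its retention date passes."""

    def __init__(self, content: bytes, retain_until: datetime):
        self._content = content
        self.retain_until = retain_until

    def read(self) -> bytes:
        return self._content

    def write(self, data: bytes) -> None:
        raise PermissionError("WORM file: content cannot be modified")

    def can_delete(self, now: datetime) -> bool:
        return now >= self.retain_until
```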
User Experiences
We spoke to two IT systems administrators with hands‐on experience with Isilon and OneFS. Both
requested anonymity but were nevertheless comfortable with reporting in detail their experiences to us.
Healthcare Insurance Processing
We spoke to a manager of engineering and system administration for a company that does back office
processing for healthcare insurers. This company touches nearly half of all healthcare
insurance claims processed annually in the US.
In this company’s data center, IT supports 3,000 users of many small applications that have been
acquired over the years through the purchase of smaller service providers. The Isilon system,
purchased two and one half years ago, now supports 400 physical servers and 800 virtual servers, 300 of
which are in production. Storage capacity is approaching 1 PB and is growing at a nominal rate of 200%
per year. As this manager says, “We have an insatiable appetite for storage. Users always want more
space.”
It is also interesting to note that this company’s IT infrastructure has been supporting virtual desktops
for more than six years. Currently the number of virtual desktops exceeds 1000 and is growing.
Prior to Isilon, all server storage consisted of directly attached RAID arrays (DAS). The most dramatic
result from the consolidation of DAS to Isilon was a drastic reduction in the amount of staff time
devoted to storage administration. In the DAS environment, capacity provisioning and file cleanup were
the two most time consuming tasks. In addition, recovering from data loss due to drive failures and
other events was a regular occurrence.
The size of the Isilon environment was tripled last year and will likely double again in 2012 due to
company acquisitions. This user had previous experience with a SAN environment and reported that
Isilon was much easier to manage due to the fact that it presents a single file system to the application
host environment. The entire near‐PB system can be managed by one administrator on a part‐time basis.
Isilon’s quota management application (SmartQuotas, see above) was singled out as being particularly
useful. Business and IT users are given a storage quota that they manage individually, saving the storage
administrator a number of manual provisioning and clean‐up tasks. Isilon’s InsightIQ was also being
used effectively to reduce the time required to manage the cluster and maintain performance levels.
Book Publishing
We interviewed the systems manager of a large book publisher with twenty‐seven data centers spread
globally. In this case, the data center was located in the US. This company publishes 600‐800 book titles
annually.
A single Isilon image supports over 250 TB of mostly unstructured data including rich media files. The
multi‐host environment is 100% virtualized. The predominant application is book design and pre‐
publishing with users on a mix of Windows and Apple desktops.
Isilon scale‐out NAS has been installed for approximately one year. It replaced a Pillar‐based SAN
environment that was growing at 50‐70% per year. This growth rate continued over the past year.
Virtual desktops are now being deployed. So far, no performance issues have been encountered. In fact,
response time for virtual desktops is much improved over early VDI deployments on the previous SAN
infrastructure that was more expensive on a cost per GB basis.
During the interview, two major improvements over the previous storage environment were noted:
1. Isilon allowed this data center’s data protection process to be completely revamped. The
previous generation infrastructure (backup apps and servers with backups to tape) has been
replaced by local snapshots taken every five minutes and hourly file replication using
Isilon SyncIQ to one of this publisher’s remote data centers for off‐site disaster recovery
capability. The previous backup environment took up approximately 50% of this data center’s
operating budget. That expense has been eliminated. Recovery time under a disaster recovery
scenario has gone from five days to minutes.
2. Data and storage management processes have been greatly simplified. SAN storage within a
majority of the publisher’s more than twenty data centers has been replaced by Isilon.
Whereas with the SAN environment, one person within each data center was assigned SAN
management responsibilities, now only one person is required to manage all of the current
Isilon systems. In addition, storage efficiency has been greatly improved. Storage utilization has
gone from 25% to 60% with room to grow for another year under the current configuration.
Terabytes of wasted capacity were reclaimed in the conversion from SAN to Isilon.
Isilon is now the global storage standard for this publisher’s data centers.
Conclusion
Some major findings stood out from our interviews with current Isilon users. First, the ability to manage
a large‐scale Isilon environment with a minimum of hands‐on administrative effort was most notable.
At the healthcare insurance processor, managing the near‐PB Isilon system has become a
“part‐time” job. The book publishing IT administrator believed that a single administrator could manage
twenty‐seven Isilon instances distributed worldwide, once all of the Isilon systems they were in the
process of acquiring were installed. We believe that the high degree of management efficiency
demonstrated by these users is a direct result of the OneFS architecture that scales into the petabyte
range as a single, scalable storage volume.
Second is the versatility of the Isilon platform. One environment supported virtual desktop users—a very
demanding environment for storage. As reported, the Isilon platform offered an improvement in
performance for these users over the Pillar‐based SAN it replaced. The other was heavily dedicated to
unstructured file storage services. However, in both cases, performance scaled linearly along with
capacity. We believe this linear scaling capability results from Isilon’s Symmetric Multi‐parallelism and
AutoBalance features.
Finally, it is interesting to note that while there were initial reservations expressed toward replacing well
known and understood RAID architectures with Isilon’s FlexProtect implementation of forward error
correction encoding for array‐level data protection, those reservations have been dismissed. As noted
by one user, the level of data protection under FlexProtect is actually an improvement over the RAID‐
based storage in use prior to replacement.
As we have noted, a new attitude—the Big Data attitude—is now emerging from corporate board
rooms. Data now being generated by users interacting with the web and interacting with each other
using mobile devices can be leveraged to create new business opportunities and enhance our daily lives
when applied to healthcare and governmental services. Big Data processing will require IT to stand up a
sustainable storage environment. We believe that EMC Isilon demonstrates the required attributes for
Big Data storage management sustainability.
About Evaluator Group
Evaluator Group Inc. is dedicated to helping IT professionals and vendors create and implement strategies that make the most of
the value of their storage and digital information. Evaluator Group services deliver in‐depth, unbiased analysis on storage
architectures, infrastructures and management for IT professionals. Since 1997 Evaluator Group has provided services for
thousands of end users and vendor professionals through product and market evaluations, competitive analysis and education.
www.evaluatorgroup.com Follow us on Twitter @evaluator_group