Technology Insight Series


Manageability for Big Data Storage

Evaluator Group

John Webster

March, 2012


Introduction

Attitudes toward IT storage within the corporate board room are changing. It wasn’t long ago that corporate executives valued storage technologies for their ability to mitigate corporate risk. The storage environment could be made to meet the demands of the SEC, Sarbanes‐Oxley, industry regulatory organizations, and avaricious attorneys holding subpoenas that ordered the company to hand over thousands of potentially sensitive email messages.

A new view of storage is now emerging: the Big Data view. Corporate executives have known for years that they could create new business opportunities and respond more effectively to competitive challenges when armed with the right information. The challenge was gathering all of the necessary data into one place and doing something with it, including data beyond what sits in the corporate data warehouse. The new view holds that the corporate storage environment can now serve as a repository for all the data needed to fuel new business opportunities. This shifts the role of storage from a place for preserving corporate data to an environment for creating opportunity.

This change in attitude is being driven by accelerated growth of data captured within the data center

and an understanding that data generated outside the data center could be leveraged as well for

competitive advantage.

Accelerated data growth within the data center

Data growth within the data center now averages 50% on a compounded annual growth rate (CAGR)

basis. Acceleration will be driven by a number of factors we can identify now:

• The extent to which enterprises embrace mobile devices as computing platforms and

leverage virtual desktop infrastructure (VDI).

• The growing use of data analytics applications that add to and augment existing data

warehousing processes.

• The growth of rich media (images, audio, video) in the enterprise, coming from both within and outside the enterprise.

• The growth of machine‐generated data (log files, sensor data, RFID, GPS, etc.).
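To make the compounding in a 50% CAGR concrete, the short Python sketch below projects capacity forward year by year and computes the implied doubling time. The 100 TB starting point is a hypothetical figure chosen for illustration, not a number from this report.

```python
import math


def project_capacity(current_tb: float, cagr: float, years: int) -> float:
    """Capacity after `years` of compound growth at rate `cagr` (0.50 means 50%)."""
    return current_tb * (1.0 + cagr) ** years


def doubling_time(cagr: float) -> float:
    """Years required for capacity to double at a constant compound growth rate."""
    return math.log(2.0) / math.log(1.0 + cagr)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting capacity, for illustration only
    for year in range(6):
        print(f"year {year}: {project_capacity(start_tb, 0.50, year):7.1f} TB")
    print(f"doubling time at 50% CAGR: {doubling_time(0.50):.2f} years")
```

At 50% CAGR, stored capacity roughly doubles every 1.7 years and grows about 7.6 times over five years.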

Data outside the data center that will eventually wind up in the data center

Data generated by individuals is growing at three to five times the rate of growth within the data center,

driven largely by social media usage—Facebook, Twitter, YouTube, and other web‐based sources with

more to come. We believe that a percentage of this data, perhaps a large percentage, will eventually

wind up in the data center.

What is the Path to Sustainable Data Management?

Given the drivers noted above, we anticipate data growth within enterprise data center environments to double and even triple over what IT managers are currently experiencing. This growth will come within the next three to five years. The rate of growth will vary from one data center to the next, depending in part on the extent to which external (public) cloud resources are used. Nevertheless, we believe that all enterprise data center managers should prepare, if they have not already done so, for a new data wave: the phenomenon that is now commonly referred to as “Big Data.”

Traditional data center RAID‐based storage architectures and data protection regimes will likely not be

able to effectively and efficiently sustain these anticipated growth rates. New architectures featuring

virtualization, platform federation, forward error correction coding, and scale‐out growth capabilities

will be needed. Here we look at EMC Isilon’s OneFS‐based storage platform as a new storage

architecture that exemplifies the use of scale‐out NAS as a way to put the data center on a sustainable

path toward management of the coming Big Data wave.

Storage Efficiency Redefined

Enterprise IT managers typically define efficiency in terms of budgetary impact, so efficiency equates to operational and capital expense factors. An efficient storage array, for example, is one in which the value derived from the capital investment is maximized and operational management requirements are minimized on a cost per unit of storage basis (dollars per TB, for example). However, we believe that coming growth will require an expanded view of storage efficiency. This view includes:

• Creating a storage environment where a petabyte of storage can be managed by a single

administrator

• Using scale‐out storage architectures to manage capacity growth while maintaining application

performance

• Rebuilding data protection processes that can actually protect a Big Data environment

Redefining management efficiency

It was once common to think of storage management efficiency in terms of TB per administrator or, stated otherwise, how many TBs a single administrator can effectively manage. In a petabyte world, however, this measure becomes meaningless. The only way IT can manage data at petabyte scale is to automate, i.e., to shift as much of the management burden as possible from the shoulders of IT administrators to management intelligence embedded within the storage environment. Management tasks that can now be automated by scale‐out storage systems include:

• Rebalancing internal workloads to accomplish storage system performance tuning

• Generation of both virtual (snapshot) and physical (clone) data copies for VM and VDI

deployment, data protection and disaster recovery purposes

• Content distribution

• Provisioning of storage capacity and storage‐related services

• Meeting internal governance and external compliance requirements through automated data

identification and archiving processes

Using a scale‐out storage platform to minimize forklift replacement

The useful life of the storage system should also become a measure of management sustainability. Replacement of a storage array can be prompted by a number of factors, some under the control of IT and others driven by the business as a whole, forcing IT to respond. An inefficient and increasingly unsustainable way to manage the kind of growth that we foresee is to replace storage systems as they run out of capacity. Replacement lowers efficiency as measured by cost per unit of capacity, because the costs of migrating data from the old system to the new one must be added to the cost of the replacement system. Data migrations are costly in terms of system resources, staff resources, and exposure to disruption and error. They should therefore be minimized through the use of storage platforms that offer significant single‐image scalability, the ability to perform upgrades via software revision, and the non‐disruptive replacement of hardware modules rather than entire storage systems.
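The cost argument above can be written as simple arithmetic: migration cost is added to the hardware cost before dividing by the capacity gained. The sketch below uses hypothetical figures and assumes, as a simplification, that adding nodes to a single‐image system incurs no migration cost at all.

```python
from dataclasses import dataclass


@dataclass
class CapacityAddition:
    """One way of adding usable capacity, with all costs in the same currency."""
    hardware_cost: float   # purchase price of the new capacity
    migration_cost: float  # staff time, tooling, and disruption risk of moving data
    added_tb: float        # usable TB gained

    def cost_per_tb(self) -> float:
        """Effective cost per usable TB once migration costs are included."""
        return (self.hardware_cost + self.migration_cost) / self.added_tb


# Hypothetical figures: identical hardware spend, but the forklift upgrade
# must also pay to migrate data off the old array.
forklift = CapacityAddition(hardware_cost=200_000, migration_cost=60_000, added_tb=200)
scale_out = CapacityAddition(hardware_cost=200_000, migration_cost=0, added_tb=200)

print(f"forklift replacement: {forklift.cost_per_tb():.0f} per TB")
print(f"add nodes in place:   {scale_out.cost_per_tb():.0f} per TB")
```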

Sustaining performance while sustaining growth

When evaluating a scale‐out storage platform, it is important to understand performance as defined by measures such as access density (I/Os per second per GB) and bandwidth (MB of data transferred per second). However, another important measure, consistent with the idea of management sustainability, is how performance holds up as the system is scaled upward to meet capacity demands as well as the demands emanating from the application environment. When evaluating a scale‐out system, one must determine the degree to which capacity and performance scale linearly. Scaling capacity upward over time must not result in performance degradation as experienced by the application user.
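One way to quantify "scales linearly" during an evaluation is to compare measured throughput at each cluster size with an ideal line extrapolated from the smallest configuration. The sketch below does that and also reports access density. The benchmark numbers and the 36,000 GB added per node are hypothetical.

```python
from typing import List, Tuple


def scaling_efficiency(samples: List[Tuple[int, float]]) -> List[float]:
    """Compare measured IOPS with ideal linear scaling from the first sample.

    Each sample is (node_count, measured_iops). 1.0 means perfectly linear;
    values below 1.0 show performance lost as the cluster grows.
    """
    base_nodes, base_iops = samples[0]
    per_node = base_iops / base_nodes
    return [iops / (per_node * nodes) for nodes, iops in samples]


def access_density(iops: float, capacity_gb: float) -> float:
    """I/Os per second per GB of capacity."""
    return iops / capacity_gb


GB_PER_NODE = 36_000.0  # hypothetical usable capacity added by each node
measurements = [(3, 30_000.0), (6, 58_000.0), (12, 110_000.0)]  # hypothetical results
for (nodes, iops), eff in zip(measurements, scaling_efficiency(measurements)):
    density = access_density(iops, nodes * GB_PER_NODE)
    print(f"{nodes:2d} nodes: {iops:8.0f} IOPS, {density:.3f} IOPS/GB, {eff:.0%} of linear")
```

If capacity grows in step with node count, a roughly constant access density is another sign that performance is keeping pace with capacity.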

Confronting rising energy costs

As demands for storage gather increasing momentum, the rising cost of energy will play an increasingly important role in storage‐related decision making and platform selection. IT administrators will see the need to reduce, or at least hold the line on, the cost of energy per unit of storage as an increasingly important aspect of the IT operational budgeting process. Therefore, the ability to add capacity to a single‐image storage platform without duplicating ancillary, power‐consuming hardware (power supplies, cooling fans, etc.) will become more desirable as time goes on.

Making the provisioning process more efficient

The need to provision additional storage resources quickly will become more critical over the next few years. This will be particularly true when supporting virtual desktops, mobile device users, and users of data analytics applications. Although they are difficult to quantify objectively, there are costs associated with provisioning that go beyond the cost of the additional storage capacity. These include the cost of administrative management time and of business opportunities lost to delays caused by approval cycles and other administrative red tape. The process of capacity planning and provisioning should be streamlined to the greatest extent possible to enable sustainable capacity management in an accelerated growth environment.

Impact of compression, deduplication and thin provisioning

It is important to note that implementing technologies such as compression, deduplication and thin provisioning at the storage array level will be critical to managing accelerated data growth. In addition, they will have a decided impact on a commonly used measure of storage array efficiency: capacity utilization, defined as the ratio of usable capacity to raw capacity. In this case, data compression, deduplication and thin provisioning can dramatically increase usable capacity compared with what IT administrators are used to seeing from traditional storage architectures that normally have

While we are not saying that these features make this particular measure of efficiency invalid, we are saying that the effects of compression, deduplication and thin provisioning must now be considered, both at the time of product comparison and selection and after the storage system has been in use for a period of time. Implementing them will also help put storage administration on a path to sustainability.
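The ratios discussed above can be computed side by side. The sketch below shows the traditional usable‐to‐raw ratio, the effective (post‐reduction) capacity per raw TB, and the oversubscription that thin provisioning allows. All capacities and reduction ratios are hypothetical, and treating compression and deduplication savings as independent and multiplicative is a simplifying assumption.

```python
def usable_to_raw(usable_tb: float, raw_tb: float) -> float:
    """Traditional capacity utilization: usable capacity divided by raw capacity."""
    return usable_tb / raw_tb


def effective_to_raw(usable_tb: float, raw_tb: float,
                     compression_ratio: float, dedupe_ratio: float) -> float:
    """Logical data stored per raw TB once data reduction is applied.

    Assumes the two ratios are independent and multiply, which is a
    simplification; real savings are workload dependent.
    """
    return usable_tb * compression_ratio * dedupe_ratio / raw_tb


def thin_oversubscription(provisioned_tb: float, usable_tb: float) -> float:
    """How far thin-provisioned allocations exceed physical usable capacity."""
    return provisioned_tb / usable_tb


# Hypothetical array: 100 TB raw, 80 TB usable after protection overhead.
print(usable_to_raw(80, 100))               # 0.8
print(effective_to_raw(80, 100, 1.5, 2.0))  # 2.4
print(thin_oversubscription(200, 80))       # 2.5
```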

Sustainable Management using EMC Isilon Scale‐Out NAS

Prior to its acquisition by EMC, Isilon had established a production‐quality, data center scale‐out NAS platform that conforms to many of the requirements for sustainable storage management noted above. Since the acquisition, revenues generated from the sale of Isilon OneFS‐based storage platforms have more than doubled.

Here we compare the attributes of Isilon storage platforms to our requirements for sustainable data and

storage management. We have included the results from two of the Isilon customer interviews that

were conducted to validate our findings.

Isilon Attributes

Single‐volume file system presented by multiple nodes

Isilon’s scale‐out storage implementation differs from a number of competing architectures in the way scalability is achieved. We have seen NAS systems where scaling is implemented by adding fixed‐scale file systems that are aggregated together under a single namespace. With Isilon, both the file system (OneFS, along with its single volume) and the namespace are allowed to scale together.

OneFS allows the IT administrator to add capacity to a single file system and volume by adding nodes.

Each node consists of processing power and memory, network I/O bandwidth, and storage capacity.

Therefore with each additional node, more processing, I/O bandwidth, globally coherent cache and

storage resources are added to the storage cluster.

Symmetric Multi‐parallelism and AutoBalance

OneFS uses a symmetric, multi‐parallel processing technology to stripe data across all participating

storage nodes in the cluster. Any of the processors in any of the OneFS nodes can be applied to work on

any I/O task regardless of where that data is located—in cluster cache or disk storage. AutoBalance

moves task processing and data among cluster nodes to maximize application performance. As a result

of this combination of technologies, capacity, I/O and storage system throughput scale out and are

automatically and persistently balanced as more nodes are added to the cluster.
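The report does not describe OneFS's actual layout or AutoBalance algorithms. The sketch below uses generic round‐robin striping only to show the basic idea: a file's stripe units spread across every node, and re‐striping after a node is added evens out per‐node load. The file size and stripe unit are hypothetical. A real system would move only the units that need to move, not recompute the whole layout.

```python
from typing import Dict, List


def stripe_layout(file_size: int, stripe_unit: int, nodes: int) -> Dict[int, List[int]]:
    """Map each stripe unit of a file to a node in round-robin order.

    Returns {node_id: [stripe unit indexes stored on that node]}.
    """
    units = (file_size + stripe_unit - 1) // stripe_unit  # ceiling division
    layout: Dict[int, List[int]] = {n: [] for n in range(nodes)}
    for unit in range(units):
        layout[unit % nodes].append(unit)
    return layout


# A hypothetical 1 MB file in 128 KB stripe units, on 3 nodes and then on 4.
for n in (3, 4):
    layout = stripe_layout(1_048_576, 131_072, n)
    print(n, "nodes:", {node: len(units) for node, units in layout.items()})
```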

FlexProtect rather than RAID

We note that, as storage systems scale to meet the demands we have outlined above, the use of RAID groups to manage through disk failure events is becoming increasingly less tenable. As individual drives with multi‐TB capacities are used as building blocks to scale overall system capacity, it becomes increasingly difficult to rebuild a failed drive while maintaining storage system availability. Yes, a failed drive can be rebuilt non‐disruptively (albeit in degraded performance mode), but the system could be exposed to data loss should additional drives fail during the rebuild of the first failed drive. This exposure could exist for hours and even days, depending on how quickly the rebuild process completes. The period of data loss exposure only lengthens with the use of denser, higher capacity drives. As a result, more scalable alternatives to RAID are now emerging within enterprise data centers.
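The exposure window can be estimated as drive capacity divided by the sustained rebuild rate. The sketch below tabulates this for a few drive sizes. The rebuild rates are hypothetical, since real rates depend on the array, the RAID level, and the competing application load.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rebuild one failed drive at a sustained rebuild rate.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    """
    seconds = (drive_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600.0


# Hypothetical rates: rebuilds slow down while the array also serves application I/O.
for drive_tb in (1, 2, 3):
    for rate in (100, 50):
        print(f"{drive_tb} TB drive at {rate} MB/s: {rebuild_hours(drive_tb, rate):5.1f} h")
```

The window grows in direct proportion to drive capacity. At a fixed rebuild rate, each step up in drive density lengthens the period of exposure.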

FlexProtect is an implementation of Reed‐Solomon forward error correction encoding that has been applied to the Isilon OneFS multi‐node storage architecture. FlexProtect operates across all nodes in a OneFS cluster, no matter how small or large the cluster may be. It provides n‐way protection across Isilon’s redundant internal communication fabric, which interconnects the distributed storage nodes, and its ability to protect data scales as nodes are added to the cluster.
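Reed‐Solomon arithmetic over finite fields is too long for a short example. The sketch below therefore swaps in the simplest erasure code, a single XOR parity chunk, to show the principle that protection information stored on one node can rebuild a chunk lost on another. This is not FlexProtect's implementation. Reed‐Solomon generalizes the idea to m parity chunks that tolerate m simultaneous losses, whereas this sketch tolerates only one.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(chunks: List[bytes]) -> bytes:
    """Compute one parity chunk over equal-length data chunks (one per node)."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must have the same length")
    return reduce(xor_bytes, chunks)


def recover(chunks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing chunk (marked None) from survivors plus parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    restored = list(chunks)
    if missing:
        survivors = [c for c in chunks if c is not None]
        restored[missing[0]] = reduce(xor_bytes, survivors, parity)
    return restored  # type: ignore[return-value]


data = [b"node", b"one!", b"two!"]
parity = encode_parity(data)
print(recover([b"node", None, b"two!"], parity))
```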

With OneFS, a cluster can be composed of multiple storage pools creating failure zones. FlexProtect

offers protection for up to four simultaneous node failures per pool. Failure modes include anything

from complete node failure down to individual disk drive failure. Data access is maintained during failure

recovery.

Recovery from failure is also a decidedly different process from that of traditional RAID. Protection

information for each file is stored independently of file data that is striped and dispersed among all

OneFS nodes participating in the cluster (see section above). When file reconstruction is required,

FlexProtect identifies the parts of a file that are affected by the failure and reconstructs them using the

distributed protection information. This form of data reconstruction differs from traditional RAID in the

following ways:

1. Multiple node processors and tens to hundreds of disk spindles participate simultaneously in the rebuild of a file, rather than confining the rebuild to a single RAID group of ten to twenty drives behind a controller pair.

2. Reconstruction occurs within the free, distributed storage space available in the cluster rather than within a single RAID group that is working to rebuild a single, multi‐TB drive. Isilon uses a “virtual hot spare”: an IT administrator can choose to reserve space for multiple drive failures and/or node failures. The system then spreads these reservations automatically across the nodes, rather than setting aside a single drive’s worth of space on a single node.
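The benefit of spreading reconstruction across many drives can be approximated by dividing the lost data by the aggregate write rate of all participating drives. The sketch below compares that estimate with the single‐spare‐drive case. The drive count and rates are hypothetical. The `parallel_efficiency` parameter is an assumed discount for network, CPU and coordination overhead, with 1.0 as an idealized upper bound.

```python
def single_drive_rebuild_hours(lost_tb: float, drive_mb_per_s: float) -> float:
    """Traditional RAID: every rebuilt byte is written to one spare drive."""
    return lost_tb * 1e12 / (drive_mb_per_s * 1e6) / 3600.0


def distributed_rebuild_hours(lost_tb: float, drive_mb_per_s: float,
                              participating_drives: int,
                              parallel_efficiency: float = 1.0) -> float:
    """Reconstruction spread over free space on many drives at once."""
    aggregate_rate = drive_mb_per_s * 1e6 * participating_drives * parallel_efficiency
    return lost_tb * 1e12 / aggregate_rate / 3600.0


# Hypothetical: 3 TB of lost data, 50 MB/s per drive, 60 drives in the cluster.
print(f"RAID spare drive:       {single_drive_rebuild_hours(3, 50):.1f} h")
print(f"distributed (ideal):    {distributed_rebuild_hours(3, 50, 60):.2f} h")
print(f"distributed (50% eff.): {distributed_rebuild_hours(3, 50, 60, 0.5):.2f} h")
```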

Current Isilon users we interviewed (see below) admitted they were a bit skeptical about replacing an

established and understood RAID environment with Isilon’s FlexProtect. One reported that he had to

work to win over senior IT management before he could replace his SAN environment. However, these

users now report that they are more confident of the data protection capabilities of forward error

correction coding than they were of traditional RAID and appreciate the ability to scale capacity upward

while maintaining performance and data integrity.

A single, scalable, logical volume

As its name implies, the Isilon OneFS storage cluster presents a single, scalable storage volume to the application environment. This differs significantly from scale‐out storage cluster architectures that aggregate multiple fixed‐volume storage units under a single namespace. In those architectures, overall storage efficiency is reduced because, over time, some fixed volumes will grow to their capacity limits while others remain underutilized. The storage administrator must be cognizant of the problems this imbalance creates and make ongoing corrections in order to rebalance the cluster.

OneFS frees the storage administrator from having to track fixed storage volume growth and capacity limitations on an application‐by‐application basis, a management process that creates multiple problem areas:

• The process of managing across fixed volumes is manual and error‐prone, and therefore must be assigned to experienced IT staff members.

• Data must often be moved from a volume that is outgrowing its usable capacity to another volume with available capacity in order to balance workload performance. Again, this process opens up opportunities for errors that could have unexpected consequences or result in system outages.
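The imbalance problem can be illustrated with a few fixed‐size volumes behind one namespace. The sketch below flags volumes near their limit and totals the free space stranded in other volumes, which the full volume cannot use without a manual data move. It then compares this with the utilization the same data would show in one pooled volume. Volume names, sizes and the 90% threshold are hypothetical.

```python
from typing import Dict, List


def volumes_needing_attention(used_tb: Dict[str, float], size_tb: float,
                              threshold: float = 0.9) -> List[str]:
    """Fixed-size volumes whose utilization has crossed the threshold."""
    return [name for name, used in used_tb.items() if used / size_tb >= threshold]


def stranded_free_tb(used_tb: Dict[str, float], size_tb: float,
                     threshold: float = 0.9) -> float:
    """Free space in other volumes that a full volume cannot use
    without a manual data move."""
    full = set(volumes_needing_attention(used_tb, size_tb, threshold))
    return sum(size_tb - used for name, used in used_tb.items() if name not in full)


# Hypothetical: three 100 TB fixed volumes behind one namespace.
volumes = {"vol_a": 95.0, "vol_b": 40.0, "vol_c": 30.0}
print(volumes_needing_attention(volumes, 100.0))  # ['vol_a']
print(stranded_free_tb(volumes, 100.0))           # 130.0
pooled_utilization = sum(volumes.values()) / (100.0 * len(volumes))
print(f"same data in one pooled volume: {pooled_utilization:.0%} used")
```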

Storage and Data Management Applications under a Single Management GUI

A single management GUI is used to control the entire cluster as it scales into the petabyte range in

capacity. It is also the interface to OneFS‐based storage and data management applications—a number

of which have been called out as significantly valuable by Isilon users we have interviewed:

SmartPools – supports the creation of multiple tiers (pools) of disk storage within a single Isilon OneFS

file system that vary by performance and storage density characteristics. As an example, highly active

data sets can reside on SAS or solid state disk when required for maximum performance while inactive

data can be moved to high density disk. Automated data movement is policy‐based (user defined) and

occurs within the confines of the single namespace file system. No links among the tiers or “stubs” are

required.
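A policy of the kind SmartPools applies can be pictured as a function from file attributes to a target tier. The sketch below is not the SmartPools policy engine or its API. The tier names, the day thresholds and the file paths are all hypothetical. It shows only a user‐defined, access‐age policy choosing among tiers while each file keeps its path in one namespace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime


def target_tier(f: FileRecord, now: datetime,
                hot_days: int = 7, cold_days: int = 90) -> str:
    """Pick a storage tier from how recently a file was accessed.

    Tier names and thresholds are hypothetical policy settings.
    """
    idle = now - f.last_access
    if idle <= timedelta(days=hot_days):
        return "ssd"        # highly active data: performance tier
    if idle >= timedelta(days=cold_days):
        return "archive"    # inactive data: high-density disk
    return "sas"            # everything in between


now = datetime(2012, 3, 1)
files = [
    FileRecord("/data/projects/current_layout.indd", now - timedelta(days=1)),
    FileRecord("/data/projects/q3_review.pdf", now - timedelta(days=30)),
    FileRecord("/data/archive/2009_catalog.tif", now - timedelta(days=400)),
]
for f in files:
    # The path is unchanged: tiers live inside one namespace, so no stubs are needed.
    print(f.path, "->", target_tier(f, now))
```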

SmartQuotas – allows IT administrators to partition an entire Isilon cluster into managed quota‐related

segments. All quota segments can be thinly provisioned and managed from a single management

interface. Quota segments can be assigned to specific users and user groups with each segment having

its own provisioning policies.
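Thinly provisioned quotas mean the sum of all quotas may exceed physical capacity, because space is consumed only when data is actually written. The sketch below models that behavior with per‐owner hard limits plus a shared physical ceiling. The class, owner names and sizes are hypothetical and do not reflect the SmartQuotas interface.

```python
from typing import Dict


class ThinQuotaPool:
    """Per-owner hard quotas carved from one shared pool of physical capacity."""

    def __init__(self, physical_tb: float) -> None:
        self.physical_tb = physical_tb
        self.limits: Dict[str, float] = {}
        self.used: Dict[str, float] = {}

    def set_quota(self, owner: str, limit_tb: float) -> None:
        self.limits[owner] = limit_tb
        self.used.setdefault(owner, 0.0)

    def write(self, owner: str, tb: float) -> bool:
        """Accept the write only if both the owner's quota and the pool have room."""
        if self.used[owner] + tb > self.limits[owner]:
            return False  # owner's quota would be exceeded
        if sum(self.used.values()) + tb > self.physical_tb:
            return False  # shared physical capacity exhausted
        self.used[owner] += tb
        return True

    def oversubscription(self) -> float:
        """Total quota handed out relative to physical capacity."""
        return sum(self.limits.values()) / self.physical_tb


pool = ThinQuotaPool(physical_tb=100.0)
pool.set_quota("finance", 60.0)
pool.set_quota("engineering", 60.0)
print(pool.oversubscription())      # 1.2
print(pool.write("finance", 50.0))  # True
print(pool.write("finance", 20.0))  # False: quota would be exceeded
```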

SnapshotIQ – creates locally retained, read‐only data snapshots. OneFS supports an unlimited number of snapshots within a single cluster and up to 1,024 snapshots within a single directory. Snapshots are maintained by storing only the blocks of the originating file that have changed. Snapshots can be used for data protection (see user interview #2 below) and can be managed on a cluster‐wide basis from a single management interface.
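Storing only changed blocks is commonly done with copy‐on‐write: when a block is first overwritten after a snapshot, the old contents are preserved for that snapshot, and unchanged blocks stay shared with the live file. The sketch below assumes that technique for illustration. The report does not detail how SnapshotIQ is implemented, and the class and block contents are hypothetical.

```python
from typing import Dict, List


class SnapshottedFile:
    """Block-level copy-on-write snapshots of a single file."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)
        self.snapshots: List[Dict[int, bytes]] = []  # per snapshot: preserved blocks only

    def take_snapshot(self) -> int:
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write_block(self, index: int, data: bytes) -> None:
        for saved in self.snapshots:
            # Preserve the pre-write contents the first time this block changes.
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int) -> List[bytes]:
        saved = self.snapshots[snap_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]


f = SnapshottedFile([b"A0", b"B0", b"C0"])
s0 = f.take_snapshot()
f.write_block(1, b"B1")
s1 = f.take_snapshot()
f.write_block(1, b"B2")
print(f.read_snapshot(s0))  # [b'A0', b'B0', b'C0']
print(f.read_snapshot(s1))  # [b'A0', b'B1', b'C0']
print(f.blocks)             # [b'A0', b'B2', b'C0']
```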

InsightIQ – implements analytics processes aimed at the Isilon cluster and attached IT resources. Using

InsightIQ, an IT administrator can:

• Identify data set growth and forecast additional capacity requirements on a file type basis

• Diagnose real time events as well as historic events and identify performance bottlenecks

• Establish past performance trends in order to predict future results from configuration changes

• Track file access patterns and identify “heavy hitters”
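Capacity forecasting of the kind listed above can be as simple as fitting a trend line to usage samples and extrapolating to the capacity limit. The sketch below uses an ordinary least‐squares fit. It is a generic technique, not InsightIQ's forecasting model, and the monthly samples for one file type are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(months: List[float], used_tb: List[float],
                      capacity_tb: float) -> Optional[float]:
    """Months after the last sample until the trend line reaches capacity."""
    slope, intercept = fit_line(months, used_tb)
    if slope <= 0:
        return None  # usage flat or shrinking: no exhaustion date
    return (capacity_tb - intercept) / slope - months[-1]


# Hypothetical monthly samples of capacity used by video files.
months = [0, 1, 2, 3, 4, 5]
video_tb = [120, 131, 139, 152, 160, 171]
print(f"{months_until_full(months, video_tb, 250):.1f} months of headroom left")
```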


SyncIQ – schedules and creates storage replication jobs between physically separate Isilon storage

clusters over LAN and WAN communications links. The IT administrator can set replication policies at the

cluster, directory, or file levels. Jobs can be run on‐demand under administrative control or scheduled

for a future time when the cost of communications bandwidth is reduced. Replication jobs can

be parallelized for performance and evenly distributed across cluster nodes. SyncIQ use cases include

disaster recovery, disk‐to‐disk backup, and content distribution.
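One standard way to distribute replication work evenly is the greedy longest‐job‐first heuristic: sort jobs by size and give each one to the node with the least work so far. The sketch below is an assumption chosen for illustration, not a description of how SyncIQ schedules jobs. The directory names and sizes are hypothetical.

```python
import heapq
from typing import Dict, List


def distribute_jobs(job_sizes_gb: Dict[str, float], nodes: int) -> List[List[str]]:
    """Spread replication jobs across nodes so per-node work stays even.

    Greedy longest-job-first: assign each job, largest first, to whichever
    node currently has the least total work.
    """
    heap = [(0.0, node) for node in range(nodes)]  # (assigned GB, node id)
    assignment: List[List[str]] = [[] for _ in range(nodes)]
    for name, size in sorted(job_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + size, node))
    return assignment


# Hypothetical directory-level replication jobs (GB changed since last sync).
jobs = {"/projects": 800.0, "/home": 700.0, "/media": 600.0, "/finance": 500.0}
print(distribute_jobs(jobs, nodes=2))
```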

SmartConnect – automatically manages host connection load balancing across storage nodes to

optimize performance. SmartConnect also provides the dynamic NFS failover and failback of host

connections without the use of application host‐side drivers.
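The report does not say how SmartConnect chooses a node for each client. The sketch below uses a generic least‐connections policy with failover and failback to show the behavior described: new clients go to the least‐loaded healthy node, and clients of a failed node reconnect elsewhere. The class and node names are hypothetical.

```python
from typing import Dict, List, Set


class LeastConnectionsBalancer:
    """Send each new client to the healthy node with the fewest connections."""

    def __init__(self, nodes: List[str]) -> None:
        self.connections: Dict[str, int] = {node: 0 for node in nodes}
        self.down: Set[str] = set()

    def connect(self) -> str:
        healthy = [n for n in self.connections if n not in self.down]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        node = min(healthy, key=lambda n: self.connections[n])  # ties: first listed
        self.connections[node] += 1
        return node

    def fail_node(self, node: str) -> List[str]:
        """Mark a node down and reconnect its clients to surviving nodes."""
        self.down.add(node)
        orphaned, self.connections[node] = self.connections[node], 0
        return [self.connect() for _ in range(orphaned)]

    def recover_node(self, node: str) -> None:
        """Failback: the node becomes eligible for new connections again."""
        self.down.discard(node)


lb = LeastConnectionsBalancer(["node1", "node2", "node3"])
print([lb.connect() for _ in range(6)])  # equal loads, so clients rotate across nodes
print(lb.fail_node("node2"))             # node2's two clients move to other nodes
print(lb.connections)
```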

SmartLock – adds WORM file capability to Isilon OneFS. Write‐protected data is stored along with other

data types allowing the IT administrator to apply the same storage and data management capabilities of

OneFS including tiering and data protection to WORM files.

User Experiences

We spoke to two IT systems administrators with hands‐on experience with Isilon and OneFS. Both

requested anonymity but were nevertheless comfortable with reporting in detail their experiences to us.

Healthcare Insurance Processing

We spoke to a manager of engineering and system administration for a company that does back office

processing for healthcare insurers. This company touches nearly half of all healthcare

insurance claims processed annually in the US.

In this company’s data center, IT supports 3,000 users of many small applications that have been

acquired through the years via the acquisition of smaller service providers. The Isilon system,

purchased two and one half years ago, now supports 400 physical servers and 800 virtual servers, 300 of

which are in production. Storage capacity is approaching 1 PB and is growing at a nominal rate of 200%

per year. As this manager says, “We have an insatiable appetite for storage. Users always want more

space.”

It is also interesting to note that this company’s IT infrastructure has been supporting virtual desktops

for more than six years. Currently the number of virtual desktops exceeds 1000 and is growing.

Prior to Isilon, all server storage consisted of directly attached RAID arrays (DAS). The most dramatic

result from the consolidation of DAS to Isilon was a drastic reduction in the amount of staff time

devoted to storage administration. In the DAS environment, capacity provisioning and file cleanup were

the two most time consuming tasks. In addition, recovering from data loss due to drive failures and

other events was a regular occurrence.

The size of the Isilon environment was tripled last year and will likely double again in 2012 due to

company acquisitions. This user had previous experience with a SAN environment and reported that

Isilon was much easier to manage due to the fact that it presents a single file system to the application

host environment. The entire near‐PB system can be managed by one administrator on a part‐time basis.

Isilon’s quota management application (SmartQuotas, see above) was singled out as being particularly useful. Business and IT users are given a storage quota that they manage individually, saving the storage administrator a number of manual provisioning and clean‐up tasks. Isilon’s InsightIQ was also being used effectively to reduce the time required to manage the cluster and maintain performance levels.

Book Publishing

We interviewed the systems manager of a large book publisher with twenty‐seven data centers spread

globally. In this case, the data center was located in the US. This company publishes 600‐800 book titles

annually.

A single Isilon image supports over 250 TB of mostly unstructured data including rich media files. The

multi‐host environment is 100% virtualized. The predominant application is book design and pre‐

publishing with users on a mix of Windows and Apple desktops.

Isilon scale‐out NAS has been installed for approximately one year. It replaced a Pillar‐based SAN

environment that was growing at 50‐70% per year. This growth rate continued over the past year.

Virtual desktops are now being deployed. So far, no performance issues have been encountered. In fact,

response time for virtual desktops is much improved over early VDI deployments on the previous SAN

infrastructure that was more expensive on a cost per GB basis.

During the interview, two major improvements over the previous storage environment were noted:

1. Isilon allowed this data center’s data protection process to be completely revamped. The

previous‐generation infrastructure (backup applications and servers, with backups to tape) has been replaced by local snapshots taken every five minutes and hourly file replication using Isilon SyncIQ to one of this publisher’s remote data centers for off‐site disaster recovery

capability. The previous backup environment took up approximately 50% of this data center’s

operating budget. That expense has been eliminated. Recovery time under a disaster recovery

scenario has gone from five days to minutes.

2. Data and storage management processes have been greatly simplified. SAN storage within a majority of the publisher’s more than twenty data centers has been replaced by Isilon.

Whereas with the SAN environment, one person within each data center was assigned SAN

management responsibilities, now only one person is required to manage all of the current

Isilon systems. In addition, storage efficiency has been greatly improved. Storage utilization has

gone from 25% to 60% with room to grow for another year under the current configuration.

Terabytes of wasted capacity were reclaimed in the conversion from SAN to Isilon.

Isilon is now the global storage standard for this publisher’s data centers.

Conclusion

Some major findings stood out from our interviews with current Isilon users. First, and most notable, was the ability to manage a large‐scale Isilon environment with a minimum of hands‐on administrative effort. For the healthcare insurance processor, managing a near‐PB Isilon system was a “part‐time” job. The book publishing IT administrator believed that a single administrator could manage twenty‐seven Isilon instances distributed worldwide, once all of the Isilon systems they were in the process of acquiring were installed. We believe that the high degree of management efficiency demonstrated by these users is a direct result of the OneFS architecture, which scales into the petabyte range as a single, scalable storage volume.

Second is the versatility of the Isilon platform. One environment supported virtual desktop users—a very

demanding environment for storage. As reported, the Isilon platform offered an improvement in

performance for these users over the Pillar‐based SAN it replaced. The other was heavily dedicated to

unstructured file storage services. However, in both cases, performance scaled linearly along with

capacity. We believe this linear scaling capability results from Isilon’s Symmetric Multi‐parallelism and

AutoBalance features.

Finally, it is interesting to note that while there were initial reservations expressed toward replacing well

known and understood RAID architectures with Isilon’s FlexProtect implementation of forward error

correction encoding for array‐level data protection, those reservations have been dismissed. As noted

by one user, the level of data protection under FlexProtect is actually an improvement over the RAID‐

based storage in use prior to replacement.

As we have noted, a new attitude—the Big Data attitude—is now emerging from corporate board

rooms. Data now being generated by users interacting with the web and interacting with each other

using mobile devices can be leveraged to create new business opportunities and enhance our daily lives

when applied to healthcare and governmental services. Big Data processing will require IT to stand up a

sustainable storage environment. We believe that EMC Isilon demonstrates the required attributes for

Big Data storage management sustainability.

About Evaluator Group

Evaluator Group Inc. is dedicated to helping IT professionals and vendors create and implement strategies that make the most of

the value of their storage and digital information. Evaluator Group services deliver in‐depth, unbiased analysis on storage

architectures, infrastructures and management for IT professionals. Since 1997 Evaluator Group has provided services for

thousands of end users and vendor professionals through product and market evaluations, competitive analysis and education.

www.evaluatorgroup.com Follow us on Twitter @evaluator_group
