The Evolution of Modern Data Protection
By Jason Buffington, Senior Analyst; and Monya Keane, Research Analyst
This ESG White Paper was commissioned by CommVault
and is distributed under license from ESG.
Overview
The Convergence of Data Protection, Data Management, Archive, and Backup
A Unified Experience for Management and Policies
A Common Store on the Back-end
CommVault Simpana: A Solution to Consider
Simpana OnePass
The IntelliSnap Ecosystem
Understanding the Simpana Architecture and Its ContentStore
The Bigger Truth
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
Organizations have many business reasons to create multiple copies of the same data, reasons that are all justifiably important to achieving operational effectiveness. For every “production” copy of data, organizations will often have other “operational” copies for:
• Application development and testing.
• Simulating scenarios during sales demonstrations or recreating issues as part of customer-support efforts.
In addition to the copies being used for business purposes:
• One should assume that multiple copies are being created as part of the organization’s backup procedures, including possible iterations across daily, weekly, and monthly retention rotation mechanisms.
• The company’s storage team probably has implemented snapshots, thus creating additional copies.
• The archive administrator, auditor, or compliance facilitator also would need a copy of the same data store.
And those are just the copies created by IT and business stakeholders. This partial listing does not include whatever versioning is occurring or any copies that users are creating within their home directories, devices, or other locations.
The means may differ, but the result is the same. Before long, the same data set resides in what can be a mind-numbing and incalculable number of places. With the right discovery tools, it would not be surprising to find 15 to 20 copies of some data across the protection, production, and other operational storage silos that many environments maintain. And what is key to remember is that all of the IT processes that created those copies were enacted for the right reasons. The business needs multiple copies and iterations, but does it need 20 of them? Think about how much extra IT infrastructure footprint that approach requires; now, imagine the footprint
enlarging even further because the organization also is dealing with the more macro-level issue of persistent data growth in general. Companies of all sizes, across all geographies, affiliated with all industries continue to struggle with data growth. The business requirement to manage all that data has consistently appeared as one of the top-five most important IT priorities reported by respondents to ESG’s annual spending intentions surveys.1
Couple that priority with the fact that rising costs also remain a top storage challenge reported by respondents, and it’s clear why organizations want to optimize the use of their disk-based IT infrastructures.2 Information must continue to be immediately and consistently available for business, legal, and regulatory purposes.
For all of those reasons, data management has to be pursued as a
comprehensive, strategic, and deliberately thought-out initiative instead of being treated as the haphazard and often parallel/disconnected set of data usage and protection tasks that many organizations suffer from today.
1 Source: ESG Research Report, 2013 IT Spending Intentions Survey, January 2013.
2 Source: ESG Research Brief, Key Challenges in Data Storage, December 2012.
The Convergence of Data Protection, Data Management, Archive, and Backup
Thankfully, a growing number of IT professionals are finally beginning to understand that “backup” is not
synonymous with “data protection.” Rather, backup is simply one facet of a comprehensive data protection strategy (see Figure 1).
Figure 1. The Spectrum of Data Protection
Source: Enterprise Strategy Group, 2013.
And arguably, “data protection” itself is a subset of what IT should be focused on with data management.3
Obviously, it is important to develop and follow data protection and data management approaches that conform to business requirements and corporate culture. But is it really necessary to maintain separate stores for all of those copies that various business units and IT teams require?
Perhaps it’s time to explore an approach to protection, replication, and distribution based mostly on how the organization wants to leverage its data, rather than on what the organization’s current data protection and data management toolsets are technically capable of providing. Similarly, it appears to be well past time to balance the need for efficiency with the desire of different groups to store everything separately “on their own.”
Fortunately, the timing for such a reevaluation is right because the data protection segment of the IT market is evolving rapidly. Specifically, the industry is moving toward combining backups, archives, snapshots, and replicated copies in a more holistic manner to achieve truly efficient data protection. And because almost everyone at this point agrees with the notion that data protection has evolved beyond just backup, the natural next step is efficient data management—an approach even more inclusive and broad.
But to achieve this efficiency, IT organizations should consider a couple of criteria:
• The SLA/requirements of the primary server, its application(s), and its primary storage device(s).
• The desirability of avoiding isolated silos of information on the back-end.
In other words, instead of just thinking about the middle of the process—the activities of copying/moving data for the purposes of backup, replication, snapshotting, and archiving—start by thinking about what kinds of recovery you need to enable, and then think about which products to continue using or begin evaluating.
And don’t forget: Your “what” goals should be based on foundational, business-centric “why” objectives. Thus, using the spectrum shown in Figure 1, start with your business goal, and then consider the “color” or IT process that will help you achieve that purpose:
• If you need to preserve data for content-specific purposes, implement archiving.
• If you need to recover data selectively (or en masse) according to a range of previous timeframes, use traditional backups.
• If you need to recover to near-current points in time, utilize snapshot technologies.
• If you need data to be in more than one location, replicate it.
• If that remote data needs to be accessible, leverage the replicated copies for BC/DR.
• And if the data must be accessible all of the time, safeguard that continuous usability through high-availability mechanisms.
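The goal-to-mechanism pairing in the list above can be sketched as a simple lookup. This is purely an illustrative sketch; the goal phrases and the `recommend` function are hypothetical names for this example, not part of any vendor's product or API:

```python
# Illustrative mapping from business recovery goals to data protection
# mechanisms, following the spectrum described above. All names here
# are hypothetical examples, not a vendor API.
GOAL_TO_MECHANISM = {
    "preserve content long-term": "archive",
    "recover from a range of past points in time": "traditional backup",
    "recover to a near-current point in time": "snapshot",
    "keep data in more than one location": "replication",
    "access data at the remote location": "replicated copies for BC/DR",
    "keep data accessible at all times": "high availability",
}

def recommend(goal: str) -> str:
    """Return the protection mechanism that matches a business goal."""
    try:
        return GOAL_TO_MECHANISM[goal]
    except KeyError:
        raise ValueError(f"No mechanism defined for goal: {goal!r}")

print(recommend("recover to a near-current point in time"))  # snapshot
```

The point of starting from the "why" is exactly this direction of lookup: the business goal selects the mechanism, not the other way around.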
If the business unit leaders and the technical leaders can agree on the why (i.e., the reasons to protect/preserve data) and the how (i.e., the general technology approaches to employ), then everyone can come to an agreement on which products to use based on what features, capabilities, and value propositions each technology solution would offer.
With that being said, there are even a few more key ideas to consider:
• Unified management experience(s) and policies.
• Common storage mechanisms.
After all, just because you need to do several things to manage your organization’s data properly, you don’t need to do each one in an isolated vacuum.
A Unified Experience for Management and Policies
Unfortunately, the most common way for IT organizations to achieve agility with those diverse data protection and recovery capabilities listed earlier is for all IT function owners to implement part of the solution on their own. For instance:
• The storage administrator configures snapshots and possibly replicated snaps.
• The backup administrator configures backups and more replication.
• The application owner configures application-centric replication, data dumps, and data mirroring (yet more replication).
• Someone working outside of IT configures archiving of the primary storage data, such as a regulatory/compliance team or the legal department.
But it doesn’t have to be that way.
Storage, backup, and application vendors maintain well-established alliances in order to deliver cohesive, comprehensive data protection experiences across their technical portfolios. In the best scenarios, a single management experience is provided through API-level connectivity with the storage arrays and the data protection functions of the applications so that IT can then establish a unified set of management policies.
In those cases, a single set of schedules can ensure that the primary and secondary storage layers create snapshots when appropriate, while the backup disk/tape/cloud repository retains previous versions in time, all while the application is in a coordinated/protectable state (by using Volume Shadow Copy Service for SQL or Exchange, for example). Ideally, the unified policies act upon the data by way of intelligent processes such as software agents that can perform more than one data protection task, instead of via parallel data operations that all touch the same data objects but with different outcomes.
A Common Store on the Back-end
Unifying the management experience reduces many of the operational expenses related to modern data protection. But to reduce capital expenses and the specific operational expense of managing IT assets, it’s necessary to unify the storage container(s) themselves.
This does not mean that everything must fit into one vendor’s storage appliance or other device. In fact, to deliver the cost-effective and agile recovery capability that modern enterprises demand, an IT organization almost certainly will use multiple storage devices and storage media, including deduplicated secondary storage, intelligently managed primary storage, tape for long-term retention, and cloud/remote repositories for BC/DR.
With that in mind, the objective should be to manage all that storage in as few logical data containers as possible, or even in just one data container. That’s how an IT manager can ensure that the right copies and iterations of data are in the appropriate containers to support maximum recovery agility— without the redundancy that comes with continuing to store too many copies of data in the wrong places.
Achieving logically coordinated data iterations and instances across data
repositories demands a cohesive storage architecture orchestrated by a unified management experience. When it’s done right, data can be deduplicated and/or compressed (based on efficiency and data type), then copied or moved between storage containers in that optimized state.
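The "deduplicate first, then move in optimized form" idea can be illustrated with a minimal content-addressed store: identical blocks are kept once and referenced thereafter, so copies between containers carry only unique data. This is a toy sketch of the general technique, not any product's actual deduplication engine:

```python
import hashlib

# Minimal content-addressed store: each unique block is kept once,
# and files are recorded as lists of block fingerprints. A toy
# illustration of deduplication, not a real storage engine.
class DedupStore:
    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}       # fingerprint -> raw block (stored once)
        self.manifests = {}    # file name -> ordered list of fingerprints

    def ingest(self, name: str, data: bytes):
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)   # duplicate blocks cost nothing
            refs.append(fp)
        self.manifests[name] = refs

    def restore(self, name: str) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.manifests[name])

store = DedupStore()
store.ingest("copy1", b"AAAABBBBAAAA")
store.ingest("copy2", b"AAAABBBBCCCC")   # shares two blocks with copy1
print(len(store.blocks))                 # 3 unique blocks, not 6
```

Six blocks were ingested but only three are stored; replicating `store` between tiers would move the optimized form rather than every redundant copy.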
Again, the goal is to avoid creating separate data silos for each of the separate data protection activities—evolving beyond having storage administrators make a snapshot (thus creating one data store), while archive administrators own a similar store of their own, and the backup team has its own store, too.
Identical or near-identical data in three or four places equals three or four times the footprint.
CommVault Simpana: A Solution to Consider
As companies search for ways to move past the problems of isolated data silos and inflexible recovery capabilities, some are discovering what other IT organizations have known for years: The CommVault Simpana software platform addresses those problem scenarios.
CommVault, which is currently shipping Simpana version 10, offers one of a few technologies built as a new generation of software-centric solutions for enabling efficient data management. One point squarely in Simpana’s favor is the software’s ability to manage a wide variety of disk arrays, tape devices, and cloud repositories. That capability can help an organization to establish and maintain one data repository for all end-users, regardless of how heterogeneous existing infrastructure components may be.
Again, the key to improving data protection capabilities is unifying both the storage containers and the
management experience—preferably with a consolidated set of agent-type behaviors to minimize any effects on the production server.
To minimize effects on the production servers while continuing to provide a range of backup and archival capabilities, the Simpana agent’s OnePass mechanisms are designed to perform several data protection and management tasks in a single workflow instead of having three separate products each running its own agent on each production server (see Figure 2).
Figure 2. Comparing Three Traditional Data Protection Workflows to “OnePass” within CommVault Simpana
Source: Enterprise Strategy Group, 2013.
Figure 2 shows what three separate data protection agent workflows would do on a single production server (left), compared with the Simpana OnePass workflow (right), which was examined during an ESG Lab validation.4 In this consolidated workflow, Simpana’s agent executes backup, archiving, and reporting activities, then sends the deduplicated data stream to the Simpana server and its ContentStore repositories.
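The operational difference between the two sides of Figure 2 can be approximated in a sketch: three separate agents would each traverse the data set once (three scans of the production server), whereas a consolidated agent makes its backup, archiving, and reporting decisions in a single traversal. This is a conceptual illustration with made-up thresholds, and it does not reflect OnePass internals:

```python
# Conceptual sketch: one traversal of the data set feeds backup,
# archive, and reporting decisions at once, instead of three separate
# agents each scanning the same files. The 90-day threshold is made up.
def one_pass(files):
    """files: dict of name -> (size_bytes, days_since_access)."""
    backup, archive_candidates, total_size = [], [], 0
    for name, (size, idle_days) in files.items():   # the single scan
        backup.append(name)                # every file goes to backup
        if idle_days > 90:                 # stale data is also archived
            archive_candidates.append(name)
        total_size += size                 # reporting tally
    report = {"files": len(files), "bytes": total_size}
    return backup, archive_candidates, report

files = {"db.bak": (4096, 2), "old_deck.ppt": (2048, 200)}
backup, archive, report = one_pass(files)
print(archive)  # ['old_deck.ppt']
```

The savings on the production side come from the scan itself: one pass over the file system, not one per protection product.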
The IntelliSnap Ecosystem
In addition to the backup/archiving functionality of its agent and its ContentStore repositories, the Simpana platform integrates with a wide variety of primary storage devices (see Table 1) to embed snapshot recoverability alongside the backup and archive restoration capabilities of the software.
Table 1. Some of the Hardware Partners Whose Snapshots Can Be Integrated with Simpana
• Dell: EqualLogic, Compellent, PowerVault MD
• EMC: Symmetrix, CLARiiON, Celerra, VNX
• Fujitsu: Eternus DX
• Hitachi: USP/USPv, AMS, HUS, VSP
• HP: EVA, XP, 3PAR
• IBM: XIV, SVC, DS, N series
• NetApp: FAS, SnapVault, SnapMirror, LSI
• Nimble: CS Series
Source: CommVault, 2013.
IntelliSnap, CommVault’s snapshot management functionality that ESG has previously analyzed, enables IT organizations that rely on diverse primary storage solutions to enjoy a consistent snapshot experience under Simpana management—leveraging recoverability options for snapshots, backups, and archives from a single administration console.5
4 Source: ESG Lab Review, CommVault Simpana 9 "OnePass" Including Integration with HP X9000 Scale-out NAS, February 2012.
5 Source: ESG Brief, IntelliSnap Keeps On Snapping, October 2012.
Understanding the Simpana Architecture and Its ContentStore
As mentioned, a key feature of a modern data protection infrastructure is a logically unified data repository— possibly including deduplicated disk supporting storage optimization and rapid recovery, plus tape for long-term retention and cloud extensibility. The Simpana technology has been augmented incrementally with new capabilities for nearly 20 years now, making it unusual among industry-leading backup software products: It boasts a single software code base that serves as the foundation of its entire diverse data protection capability set.
The ever-broadening Simpana architecture is manifested not only in its capability set, but also in a consolidated repository that powers all of its recovery capabilities (see Figure 3).
Figure 3. Simpana Architecture Including OnePass, IntelliSnap, and ContentStore
Source: Enterprise Strategy Group, 2013.
The top-left portion of Figure 3 breaks out the source-side capabilities of OnePass and IntelliSnap. The remainder of the figure depicts the spectrum of data protection capabilities Simpana offers through its unified ContentStore representing disk, tape, and cloud repositories. Within the ContentStore, data is stored and tiered automatically. And a single federated-search-capable index encompasses all snapshot, backup, and archived data—again, facilitating shared access by multiple business units.
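A single index spanning every repository tier can be sketched as one lookup structure in which each stored item is registered with its tier, so one search returns matches whether the copy lives in a snapshot, a backup, or an archive. All names below are hypothetical; this illustrates the federated-index concept, not the ContentStore's actual index design:

```python
from collections import defaultdict

# Sketch of a single index over all protection tiers: every stored
# item is registered with its tier, and one query spans them all.
# Illustrative only; not the ContentStore's actual index design.
class UnifiedIndex:
    def __init__(self):
        self.entries = defaultdict(list)   # keyword -> [(tier, item), ...]

    def register(self, tier: str, item: str, keywords):
        for kw in keywords:
            self.entries[kw.lower()].append((tier, item))

    def search(self, keyword: str):
        """One query covers snapshots, backups, and archives alike."""
        return self.entries.get(keyword.lower(), [])

idx = UnifiedIndex()
idx.register("snapshot", "vol1@02:00", ["finance", "q2"])
idx.register("backup", "fileserver-2013-06-01", ["finance"])
idx.register("archive", "mailbox-2009", ["finance", "legal"])
print(idx.search("finance"))   # hits from all three tiers at once
```

Contrast this with siloed stores, where the same question would require three separate searches in three separate consoles, with no guarantee the results could even be compared.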
The Bigger Truth
The time has come to refine the “efficient data management” story by connecting efficient operation to modern data protection—and in the process, solve the problems caused when legacy backup technology extends into a modern virtual infrastructure.
Modern data protection includes not only the traditional processes of backup and archiving, but also snapshotting, replication, comprehensive monitoring/management, automated data recovery, and other fairly sophisticated IT activities. Technically, of course, those data protection activities are still separate processes. But that doesn’t mean they should be deployed and managed from isolated storage silos.
Data sets should be able to interoperate—at least to the extent that the interoperation benefits the organization by reducing its storage footprint, reducing its operational activity windows on production servers (due to
smaller/fewer redundant production data sets), and enabling common/shared data management.
Those are the elements that help organizations become more agile in their recovery capabilities, while minimizing the impact that data protection operations can have on the front-end IT environment and the overall infrastructure footprint. It’s not just about backup or data protection; it’s about data management overall.
Think about what IT outcomes you need, then envision the smartest way to achieve those outcomes. This exercise is about more than modernizing data protection; it is about being more efficient with storage to better deliver data to users at any point in the data storage lifecycle. And efficiencies that make production storage more agile also enable better data protection by enhancing other data-movement functions or creating additional data recovery capabilities … all of it truly being better together.