WHITE PAPER
Implementing Tiered Storage: The Heart of an Optimized Data Environment
Sponsored by: HP
Brad Nisbet Matt Healey
IN THIS WHITE PAPER
This white paper investigates the pressures facing many IT organizations today and the growing importance of optimizing the storage environment. The paper focuses on the benefits of a tiered storage architecture to address an emerging set of requirements around cost, performance, protection, and long-term data management in the datacenter. The paper then takes a closer look at services and solutions offered by HP aimed at optimizing data through a tiered storage approach.
SITUATION OVERVIEW
Digital information is the backbone of today's fast-paced, competitive business environment. Increasingly, enterprise organizations must rely on and leverage the value of their digital assets to remain competitive and ensure profitability. For years, the main charter of a storage administrator was to store data as efficiently as possible, which often simply translated into effective storage utilization. Gone are the days when simply managing utilization was paramount.
In addition to managing utilization, the modern CIO and storage administrator face a whirlwind of challenges associated with the emerging pressures to reduce spending and manage floor space, energy usage, performance, long-term archiving, compliance, and other business governance issues. Not only are organizations creating and storing tremendous volumes of data, but the reliance on this data to maximize business results among a broad set of applications and users is rapidly increasing. An organization's data is more frequently subject to a growing set of requirements, such as business analytics, testing, or scrutiny under the umbrella of ediscovery and other compliance-related mandates. The requirements related to such activity are forcing administrators to store, protect, and manage their digital information for longer periods of time.
The Importance of Data Optimization
Many enterprises have built a storage architecture comprising high-performance systems to accommodate mission-critical applications. Over time, the role of these systems has expanded to include less mission-critical applications or to support a role in data protection or other secondary data activities. For many organizations, multiple storage pools spanning mission-critical and non-mission-critical applications share a single storage architecture or tier.
Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com
Often, to accommodate the needs of the mission-critical applications, the single storage architecture is built on high-performance drives. While this satisfies the needs of the more demanding applications, the resources can be unnecessary for non-mission-critical or secondary data needs. Further, to satisfy data protection or archive activity aimed at addressing business analytics, ediscovery, or other information management requirements, the long-term storage of information on a single, high-performance tier of storage can be costly.
Within the complex task of managing an organization's vast amount of information, it is extremely important to recognize that different pools of data can hold varying degrees of business value. An efficient storage environment is one that aligns the value of the data with a particular service level. Optimizing the storage infrastructure to allow for this alignment enables increased efficiency, greater control, ease of management, and ultimately reduced capital and administrative cost.
Technologies and Architecture
An optimized data environment enables the flexibility to move and manage data among different service levels, depending on the application requirements associated with that data set. For organizations to accomplish this in a way that does not adversely impact cost and organization resources, the confluence of the following technologies and architecture becomes a core consideration:
Virtualization. This enables a dynamic environment and the ability to separate the deployment and management of the application/data from the physical devices on which it is maintained. An optimized data environment must be dynamic to easily manage the alignment of different data sets with the correct service levels at different stages of the data life cycle.
Management software and policies. The ability to dynamically and automatically migrate and manage data among different tiers of storage can reduce administrative cost and strain on resources. As enterprise storage capacities continue to grow into the petabyte level, the ability to manage data associated with different applications among different tiers of storage will prove to be cumbersome, if not impractical, for IT and storage administrators. Automating the migration of data, along with instituting policy management, will be a necessary component to maximize an optimized data environment.
Advanced data management. To further optimize the environment, increase utilization, reduce cost, and meet growing information management requirements, a data environment will need to incorporate an emerging set of technologies. Such technologies include deduplication, encryption, and advanced file systems providing global namespace capabilities. In addition, data protection and archive solutions will need to be integrated to ensure the appropriate levels of protection and long-term retention requirements within the framework of the optimized environment.
Storage tiering. This enables the ability to distribute different categories of data among different classes of storage media in order to reduce cost. The ability to align data value with the appropriate tier of storage can improve utilization, cut waste, and reduce unnecessary spending. The tiers of storage, comprising various performance and capacity levels among solid state drives (SSDs), hard disk drives (HDDs), and tape, allow for a spectrum of choices to fine-tune this alignment and reduce the overall cost per capacity of storage. Tiering can be based on required levels of data protection, performance or capacity considerations, age of data, frequency of use, or other criteria.
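To illustrate the cost argument for storage tiering, the following Python sketch computes a capacity-weighted cost per GB for a single-tier layout and for a three-tier layout. The `Tier` class, the `blended_cost_per_gb` function, and every price and capacity figure are hypothetical values chosen for illustration; they are not drawn from any vendor price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float  # hypothetical price, in dollars per GB
    capacity_gb: float  # capacity placed on this tier, in GB


def blended_cost_per_gb(tiers):
    """Return the capacity-weighted average cost per GB across all tiers."""
    total_gb = sum(t.capacity_gb for t in tiers)
    if total_gb == 0:
        raise ValueError("no capacity allocated")
    total_cost = sum(t.cost_per_gb * t.capacity_gb for t in tiers)
    return total_cost / total_gb


# Everything on one high-performance tier (illustrative figures only).
single_tier = [Tier("performance HDD", 10.0, 100_000)]

# The same 100,000 GB spread across three classes of media.
tiered = [
    Tier("SSD", 20.0, 5_000),
    Tier("performance HDD", 10.0, 25_000),
    Tier("capacity HDD", 2.0, 70_000),
]

print(blended_cost_per_gb(single_tier))
print(blended_cost_per_gb(tiered))
```

Under these assumed figures, the blended cost falls from $10.00 to $4.90 per GB even though a small share of the data now sits on more expensive SSD capacity. Actual results depend entirely on real media prices and on how much data can be placed on each tier.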
Pulling It All Together with Services
To realize the full potential of an optimized data environment, one must expand on the technology considerations described above and consider a variety of activities, including:
Data classification. An essential first step is to understand what applications and data types are in the environment today and will be in the future. Identifying the business value of each data set is the critical foundation for the entire data optimization process.
Plan and design. Once the value of the data sets is established, the planning process establishes the policies and architectures required to satisfy the desired classification. The environment can then be designed with the appropriate technologies to accommodate the desired outcome.
Implement and manage. Upon completing the planning and design process, organizations can implement the solution and establish ongoing management activity. Aided by technologies that automate, such as virtualization and other policy management schemes, the alignment of data with the appropriate tier of storage, data protection, long-term archive, and reassessment of data policies will be an ongoing activity. Organizations that deploy a successful solution will periodically review existing data classification, processes, and policies to adjust for evolving needs over time.
Professional services that incorporate a framework of technologies combined with the planning, implementation, and management activities described above can dramatically improve the agility of an organization's storage environment while reducing spending on equipment and administrative costs.
At the heart of an effective optimized data environment is the ability to design, implement, and manage a tiered storage architecture. Implementing any one technology such as virtualization or data deduplication might be manageable for an administrator, but implementing a comprehensive solution that incorporates the benefits of tiered storage can be more complex. The root of being able to manage data effectively across storage tiers is the ability to identify the data correctly, establish and manage policies against the data, and ensure that the targeted storage tier is appropriate and available based on the business objective. Professional services that focus on tiered storage solutions can help organizations achieve these goals.
Implementing Tiered Storage: The Heart of an Optimized Data Environment
Typically, no single storage platform will efficiently support all data value classes. Instead, a tiered architecture can enable a customer to store data based on business value in a cost-optimized manner.
At a fundamental level, the objective of the tiered storage architecture is to ensure the business data receives the appropriate service level while optimizing the use of the infrastructure through appropriate policy. This ensures ROI by balancing business requirements with the cost per GB of data.
In the past, the most basic tiered storage environment leveraged a tier of primary HDD storage that was backed up and archived to a tier of tape storage. This could exist within a single server or in a networked storage environment. Today's storage environments, especially for larger companies, are much more complex and involve a variety of disk-based tiers. Increasingly, the architectures will include tiers of SSDs. Although the use of tape will have a less expansive role, it is still an important component for many companies today.
As the deployment of tiered architectures evolves, specific classification of tiers will generally fall into three main categories. These types will not be the traditional primary (tier 1), secondary (tier 2), and tertiary (tier 3); rather, they will be a reflection of the main function of the particular storage pool.
The performance tier includes storage pools designed to deliver high levels of performance, though the definition of performance will be more variable (random I/O versus sequential, as well as primarily reads, primarily writes, or random read/writes). The performance tier will also require the highest levels of reliability, though reliability will have a different meaning in different contexts. For example, reliable storage for random reads/writes in a database application is very different from the reliable streaming of a movie in an IPTV application.
This is the segment where technologies such as 2.5 in SAS, SSDs, high-speed network links, and dynamic use of multiple active copies will be deployed most quickly. However, performance requirements will vary across a spectrum of levels depending on application, business requirements, and access patterns. As such, this segment is increasingly incorporating the use of high-capacity hard drives into the mix.
The replication tier includes storage pools designed to store copies of data/images for reuse cost-effectively. The most obvious use case for this tier is to support the shift to disk-based, rapid data and application recovery. Others include rapid provisioning in test/development, data mining/analytics, and ediscovery repositories.
In this tier, the focus will be on developing storage solutions that ingest, replicate, and store the maximum amount of replicated data at the lowest cost. Storage solutions in this tier may have very specific performance requirements (e.g., rapid ingest of backups), but most of the focus will be on adding processing performance to enable capabilities such as compression, data deduplication, or space-efficient snapshots.
Archive Tier
The archive tier includes storage pools designed to provide active access (though not necessarily high performance) to large amounts of data/information for very long periods (years and/or decades). "Active" is the key to distinguishing this archive tier from the "archiving" most companies historically did with tape libraries. This tier is almost exclusively file based (e.g., database archive files, telemetry data, medical records, and personal photos).
Any single file may never be accessed, but all files must be accessible in terms of being readable and findable. Not surprisingly, data density and operational efficiency (e.g., reduced power consumption) will be key considerations in these solutions. This tier will also be considered for public cloud storage services. Of equal importance, however, will be support for advanced information management capabilities, ranging from massive, clustered file systems to integrated data classification, search, and analytics.
Tiering Criteria
Traditionally, tiering criteria were based simply on the age of the data. As data aged, it generally moved to a tier of lower-performing, less expensive media, such as tape. Over time, the list of criteria has grown to satisfy the increasingly complex data environments, spanning a variety of applications, usage patterns, and business values. Expanding on the metric of data age, we note that a more comprehensive list of tiering metrics includes:
Age of data
Inherent business value of data (and how this changes with age)
Performance considerations such as response time and throughput
System and application availability
Disk capacity and interconnection type
Access and usage patterns (e.g., 24 x 7 or infrequent, random/serial)
Protection level (e.g., frequency of backup, RAID level)
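As a rough illustration of how several of the metrics above might be combined, the following Python sketch assigns a data set to one of the three tier categories described earlier (performance, replication, archive). The `DataSet` fields, the `choose_tier` function, and all thresholds are assumptions made for this example; a real policy would be derived from an organization's own data classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    age_days: int
    business_value: int      # 1 (low) to 5 (high); hypothetical scale
    accesses_per_day: float
    is_copy: bool            # backup, snapshot, or test/development clone


def choose_tier(ds: DataSet) -> str:
    """Pick a tier from several criteria rather than from age alone.

    All thresholds below are hypothetical placeholders.
    """
    if ds.is_copy:
        return "replication"
    cold = ds.age_days > 365 and ds.accesses_per_day < 1
    # Old, rarely accessed data moves to the archive tier only if its
    # business value is not high.
    if cold and ds.business_value < 4:
        return "archive"
    return "performance"


examples = [
    DataSet("orders_db", 30, 5, 5000.0, False),
    DataSet("nightly_backup", 1, 3, 0.1, True),
    DataSet("scanned_invoices", 1800, 2, 0.01, False),
    DataSet("signed_contracts", 1800, 5, 0.01, False),
]

for ds in examples:
    print(ds.name, "->", choose_tier(ds))
```

In this sketch, `scanned_invoices` and `signed_contracts` are the same age and are accessed equally rarely, yet only the lower-value set is archived. An age-only rule would have treated both identically, which is the limitation the expanded list of metrics is meant to address.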
To date, a majority of deployed tiered storage architectures are limited to two tiers of disk-based storage: performance-optimized drives (e.g., 10,000/15,000rpm Fibre Channel/SCSI/SAS) and capacity-optimized drives (e.g., 7,200rpm SATA). In today's more diverse IT environments, this simplistic approach to tiering is insufficient. IT architects need to consider six major factors:
Performance. By leveraging techniques such as overprovisioning and short-stroking performance-optimized drives, an organization is able to create a tier of storage that provides superlative performance metrics (compared with the rest of the storage infrastructure) to satisfy high I/O applications.
Cost. Conversely, by using (typically) lower-cost capacity-optimized drives, an organization can create a tier of storage that is focused primarily on, for example, long-term archiving of data. While the preservation of the data is critical to the organization, instantaneous access is secondary to cost-effective, relatively quick access to the data for future needs or requests.
Function. As the number of regulatory and legislative requirements increases for the preservation of the actual data, as well as integrity and security, organizations may be required to create one or more tiers of storage in which data is not only preserved in its original stored state but also tagged with audit trails to track how the data has been accessed and by whom or by which applications.
Environment. With predictable power, cooling, and real estate costs, an organization may consider creating a tier of storage focused on energy efficiency (e.g., through the use of solid state storage technologies).
Reliability. Depending on the perceived or real criticality of the data being stored, an organization may also manage its data based on reliability factors. For example, one organization may consider a mirrored and replicated approach as the appropriate strategy for mission-critical data and the use of RAID 3, 5, or 6 as appropriate for less critical data. On the other hand, another organization may consider an array of disks protected by the properties of RAID 6 suitable for its mission-critical data.
Efficiency. An additional motivation to move to tiered storage can be the desire for a more efficient storage infrastructure in terms of the simplicity of the management. Storage consolidation, virtualization, and information management can all individually increase the complexity of a storage infrastructure, something that IT administrators wish to avoid. Integrating such features not only improves the efficiency of storage but also can drive up storage utilization.
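The reliability factor above involves a trade-off between protection and usable capacity. As a minimal sketch, the Python function below computes how many disks' worth of capacity remain for data under two-way mirroring, RAID 3, RAID 5, and RAID 6, assuming equal-size disks in a single group. The function name `usable_disks` and the eight-disk example are hypothetical, and the calculation ignores hot spares, replication copies, and formatting overhead.

```python
def usable_disks(scheme: str, disks: int) -> int:
    """Return the number of disks' worth of capacity available for data.

    Assumes equal-size disks in one group and two-way mirroring.
    """
    if scheme == "mirror":
        if disks < 2 or disks % 2:
            raise ValueError("mirroring needs an even number of disks (2 or more)")
        return disks // 2

    parity_disks = {"raid3": 1, "raid5": 1, "raid6": 2}
    minimum_disks = {"raid3": 3, "raid5": 3, "raid6": 4}
    if scheme not in parity_disks:
        raise ValueError(f"unknown scheme: {scheme}")
    if disks < minimum_disks[scheme]:
        raise ValueError(f"{scheme} needs at least {minimum_disks[scheme]} disks")
    return disks - parity_disks[scheme]


for scheme in ("mirror", "raid3", "raid5", "raid6"):
    print(scheme, usable_disks(scheme, 8))
```

With eight disks, mirroring leaves four disks of usable capacity, RAID 3 and RAID 5 leave seven, and RAID 6 leaves six. This is one reason organizations may reserve mirroring for the data they consider most critical.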
The Need to Implement
The information presented above describes the importance of deploying a tiered storage architecture to address the increasingly stringent data requirements facing an IT organization. Simply managing storage utilization is no longer sufficient. The demands placed on today's storage environment, coupled with a new set of aggressive business economics in the datacenter, are forcing many organizations to seek an alternative approach based on data optimization and tiered storage. While a few organizations are able to tackle this complex task on their own, a majority are seeking solutions and services from storage providers and partners that can help organizations realize their full potential.
One such IT provider is HP. With a breadth of technologies and services centered on a tiered-storage approach, HP is helping to pave the road to the optimized datacenter.
THE HP SOLUTION
Keeping in mind the focus on virtualization and advanced data management that spans the life cycle of business information, we note that several key technologies within HP's broad portfolio of solutions enable organizations to maximize the benefits of tiered storage within the context of data optimization. These technologies include:
Advanced File Systems. These technologies include the HP StorageWorks X9000 scale-out NAS and the tiered file system based on IBRIX technology. As unstructured data becomes the majority of data and the center of concern for many enterprise organizations, HP's file system technologies serve as the foundation for tiering unstructured data among HP and heterogeneous storage systems.
Data Deduplication. Reducing capacity demands on storage systems is integral to optimizing and reducing costs across a tiered architecture. Technologies such as HP's Virtual Library Systems, which help to reduce capacity strains associated with disk-based data protection, and HP's partnership with Ocarina Networks, which helps to reduce capacities associated with HP's file-based solutions, are addressing the capability to optimize capacity.
Storage Virtualization. Technology such as HP StorageWorks SAN Virtualization Services Platform (SVSP) enables a virtualized environment among a heterogeneous storage environment, greatly improving storage efficiency while easing management pain and streamlining operations.
Archive Platform for Content. This technology includes the HP Integrated Archive Platform (IAP). Organizations seek to leverage business information and comply with evolving regulatory demands over increasing retention periods. The emphasis and focus on solutions that address the efficient storage and retrieval of data over the long-term will be a critical component of a fully optimized data environment.
HP Storage Arrays. HP's portfolio of disk arrays forms the foundation of hardware upon which the tiering approach is applied. The spectrum of choice among disk arrays spans HP XP, EVA, P4000, P2000, various NAS arrays, and other disk enclosures.
HP Services for Tiered Storage
To maximize the benefits and mitigate costs associated with designing, implementing, and providing ongoing management and support of the technologies discussed above, HP offers enterprises a range of services around tiered storage architectures. Specifically, the company takes a life-cycle approach to deploying a tiered storage environment, starting with assessing and planning for a deployment and continuing all the way through to the ongoing support and management of that environment. HP has a continuum of customer services representing the life cycle of storage initiatives, including deploying, designing, maintaining, and managing the storage environment. HP services to implement tiered storage and data optimization are offered as part of HP's larger portfolio of services for storage, shown in Figure 1.
FIGURE 1
HP's Services Portfolio for Storage
Source: HP, 2010
In addition, HP takes a holistic approach to the entire environment, bridging storage, servers, blades, virtualization and other software, and network infrastructure. When an organization deploys an optimized data storage environment, it is not sufficient to focus solely on the storage environment. The design needs to be optimized to work with other elements in the datacenter that are currently deployed or will be deployed in coming years. Specifically, the storage environment needs to be flexible enough to adapt to a migration toward a fully virtualized server environment as virtualization deployment continues to spread throughout enterprises. The HP Converged Infrastructure is a good example of an integrated virtualized environment.
Finally, many customers are investigating moving some applications to the cloud. As this migration continues, any data architecture will need to be able to adjust to these developments and remain efficient in terms of performance, utilization, and cost.
The initial phase of the engagement is typically data profiling. During this phase, HP works with customers to understand their current storage pain points and identify how the migration to a tiered storage environment can help alleviate current storage issues. Critical to this phase is an investigation into the business requirements for various data. Once HP has captured the critical attributes for the data and the storage environments, the direction of the data optimization plan is set.
Planning and Policy
After having profiled the data, HP is then able to develop a plan to deploy a tiered storage environment. To develop the plan, HP works with critical stakeholders within the organization. These stakeholders often include business units outside the storage or IT organization. For example, HP will conduct a workshop that includes the business units, compliance offices, and legal departments to ensure that the new tiered architecture will meet the requirements of the enterprise and stay in compliance.
In addition to developing a plan for how to deploy and support a tiered storage environment, HP will assist customers in developing policies to manage their data. These policies include:
Data Value. By determining the real business value of data, customers will be in a better position to be able to cost-effectively optimize how that data is stored. However, without a policy that can objectively determine that value, enterprises will often misclassify data, resulting in a suboptimized storage environment.
Capacity Planning. In recent years, with the dramatic increase in data storage requirements, capacity planning has become more difficult for enterprises. By developing policies that can address data management and reporting schedules, enterprises will be in a better position to be able to anticipate their data requirements and plan for increases in storage requirements.
Tiering. In deploying a tiered environment, enterprises must understand what tier to assign data to. By understanding the value of the data and how it relates to business needs, storage administrators are in a better position to correctly assign the data to various tiers.
Archiving. Archiving policy is critical to all enterprises because they need to develop archiving schedules that are in compliance with industry regulations. Often these schedules vary based on the type of data being archived, and as such, developing solid archiving policies is important to ensure continued compliance.
Data Protection. Finally, data protection for backup and recovery is critical because most enterprises cannot tolerate the loss of critical data. Further, even the loss of what has traditionally been considered less critical data can impact an organization because the organization cannot use that information to make informed business decisions. Therefore, enterprises need to establish a well-developed data protection policy.
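To make the capacity planning policy above more concrete, the following Python sketch projects storage demand under a simple compound annual growth model and estimates how many years remain before demand exceeds installed capacity. The function names, the 50% growth rate, and the capacity figures are hypothetical; real planning would draw on measured growth from an organization's own reporting.

```python
def projected_demand_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Project demand assuming constant compound annual growth."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(current_tb: float, installed_tb: float, annual_growth: float) -> int:
    """Return the number of whole years until demand first exceeds installed capacity."""
    if current_tb <= 0 or annual_growth <= 0:
        raise ValueError("current demand and growth rate must be positive")
    years = 0
    demand = current_tb
    while demand <= installed_tb:
        demand *= 1 + annual_growth
        years += 1
    return years


# Illustrative figures only: 100 TB in use today, 400 TB installed,
# demand assumed to grow 50% per year.
print(projected_demand_tb(100.0, 0.5, 3))
print(years_until_full(100.0, 400.0, 0.5))
```

Under these assumptions, demand reaches 337.5 TB after three years and exceeds the 400 TB installed capacity during the fourth year, which indicates how far ahead additional capacity would need to be planned.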
Solution Design and Implementation
After all of the data has been profiled, the business stakeholders have been consulted, and the policies for data optimization have been created, HP begins to design and implement a solution that incorporates all of the business requirements. During this phase, the detailed solution design is developed and implemented in accordance with requirements that were developed in previous phases. After plan development, the solution is implemented in accordance with the ITIL/ITSM framework.
Ongoing Management and Support
The ongoing support for solutions is, in many cases, one of the most critical services that a vendor can offer. Storage solutions are rarely, if ever, static solutions. For systems to meet the design requirements and continue to address the business' growing data needs, they must be properly supported. Further, properly supported does not mean that all systems need the highest level of support. Systems that are not supporting critical data should be supported at a lower level than mission-critical systems. To address this, HP has a wide range of support options for customers to choose from, ranging from basic attached HP Care Pack support offerings all the way up to HP Mission Critical Partnership.
HP Care Pack Services help maintain and manage all aspects of storage solutions, including technology, processes, and staff education. An organization can boost performance, security, and availability while increasing return on investments. HP Care Pack Services offer levels of service that make it easy to select the right services for business needs and realities. The full range of HP Care Pack Services includes:
Support Plus Service and Support Plus 24 Service provide a single source for integrated hardware and software services and software updates for selected HP and third-party products.
Proactive Select Service offers a choice of precisely targeted proactive services to address a particular environment and situation, with the flexibility to modify service activity choice anytime during the term of agreement.
Proactive 24 Service provides integrated proactive and reactive support to complement internal resources. Proactive 24 Service for SANs offers SAN-specific problem prevention and resolution. It's an ideal service if the datacenter is in need of improved SAN availability and increased ROI on SAN investments.
Critical Service reduces risk and improves efficiency by proactively managing changes across the total environment, with complete support for enhanced service levels and proactive risk avoidance.
HP Insight Remote Support provides automated solutions to better manage the customer's IT environment. HP Insight Remote Support software provides simple, reliable, secure, and constant server and storage monitoring, along with problem resolution. It supports up to 2,000 devices, provides multivendor support, and is integrated with HP Systems Insight Manager. HP Insight Remote Support can also be implemented with Mission Critical Services support.
Education and training provide flexible and comprehensive training courses online and in a traditional classroom setting to help IT and user communities master storage networking, disk storage systems, and storage software.
In addition to the HP Care Pack solutions, HP offers its Mission Critical Partnership for organizations with mission-critical environments. This is the highest level of support that HP offers and is intended for systems that cannot fail or experience unplanned downtime. Under this agreement, the customer and HP develop an ongoing support plan for the systems. HP can remotely monitor the systems to ensure they are continuing to operate efficiently.
CHALLENGES/OPPORTUNITIES
IDC believes that HP faces two main challenges in the delivery of the services around tiered storage solutions. The first is a challenge faced by all IT services providers: the ability to execute on their plans on a continuous basis. The very nature of services makes execution the largest challenge because services depend on the knowledge and skill of the people delivering them. A hardware or software product can be replaced or repaired, but it is harder to recover from a negative services experience. However, HP has a very strong reputation in the delivery of services, and IDC expects that it will not encounter significant difficulty in being able to successfully meet its customers' expectations in the quality and timeliness of services delivery.
The second challenge is around the rapid growth of data requirements for customers. Given the accelerated growth of data, and the increasing number of new technologies that are being developed to help manage that growth, HP services will need to stay at the forefront of these technologies. This can be difficult for any large organization because services personnel must be continually trained on the latest storage equipment and technologies. However, once again, HP has demonstrated that it is able to accomplish this. Further, HP has a very strong partner ecosystem that it can tap into in cases where it may not have the required internal skill sets.
CONCLUSION
Enterprises that are investigating deploying a tiered storage architecture should consider using an external services provider to assist with the migration. Of critical importance when investigating these providers is the ability to offer a range of services that span the life cycle of the deployment from the initial planning all the way through to ongoing support and management.
Further, IDC believes that enterprises should investigate a provider that not only understands storage but also can ensure that data optimization is designed to work with other technologies that are present in the datacenter. As the industry takes an increasingly holistic approach to delivering IT in the datacenter, enterprises are seeking providers that share that view, such as HP with its Converged Infrastructure. HP can provide this level of service and has a strong reputation, which has been built over many years, of delivering storage and broader IT services to enterprise customers.
Copyright Notice
External Publication of IDC Information and Data — Any IDC information that is to be used in advertising, press releases, or promotional materials requires prior written approval from the appropriate IDC Vice President or Country Manager. A draft of the proposed document should accompany any such request. IDC reserves the right to deny approval of external usage for any reason.