A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise


NOTICE

This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice and does not represent a commitment on the part of Quantum. Although using sources deemed to be reliable, Quantum assumes no liability for any inaccuracies that may be contained in this White Paper. Quantum makes no commitment to update or keep current the information in this White Paper, and reserves the right to make changes to or discontinue this White Paper and/or products without notice. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or information storage and retrieval systems, for any person other than the purchaser’s personal use, without the express written permission of Quantum.

Introduction
Data types and the “Big Data” Challenge
Media and Entertainment
Life Sciences
Utilities/Oil & Gas
Why Tiered Storage is the Only Realistic Solution
The Role of Data Management Software As Part of an Archive
The Role of Tape for Archiving Data
Considerations of Using Disk vs Tape for Archive Storage
Operational Costs
About Quantum’s Archiving Solutions
About Quantum StorNext
Key Features and Benefits
About Quantum Scalar Tape Libraries
About StorNext AEL Archives

Today’s enterprises are experiencing greater storage growth than ever before. The growth comes from structured data from enterprise databases, or unstructured data from a variety of applications. Wherever it comes from, it must be preserved for business continuity, data retention laws, and to meet compliance requirements.

The data center needs to reduce the total cost of ownership (TCO) of its data retention and protection infrastructure. It must contain costs, manage data growth, and make its data management processes more efficient. Unstructured data is often the core data asset in an organization’s workflow and is inherent in revenue-generating operations.

This white paper discusses how to manage unstructured data growth in the most cost-effective manner. It also discusses the distinct differences between backup and archive, and the best policy for each.

INTRODUCTION

When developing a strategy for managing unstructured data growth, there are many considerations to take into account, including the type of data and its value, the cost of data growth, the correct platform for the data to reside on, and data security. But, to start with, what is archiving, and what is the difference between archive and backup?

To (over) simplify:

DATA TYPES AND THE “BIG DATA” CHALLENGE

Not all data is created equal. Consider the difference between an online trading system, processing multiple transactions in milliseconds, and the company payroll, which only has to issue funds once per month: data should be prioritized in terms of its currency, life cycle and value. As another example, compare a SAS Grid business analytics environment, where the resulting data is not very important six months later, to a corporate executive video communication or shareholder video, where you may need to save the content for several years. Later in the document we will discuss value and give some real-world examples of how data is dynamic. In general, data can be categorized as live, persistent or backup: it is in use, it is not in use but can be recalled at any point, or it is a copy of the primary data to be used only when the system needs to be restored after a failure.

IT projects are fast moving and dynamic, but in most cases they rely on reusable assets such as historic data. For years, the way to handle data growth was simply to throw raw storage capacity at the problem. That approach no longer works, because organizations must deal not only with capacity challenges but also with the performance, management and running costs of the systems. In fact, data is growing so fast in some industries that a new industry term, Big Data, is being used to refer to the storage and management of this massive data challenge.

The challenge is compounded because many extra copies of data are being made: a snapshot is taken, some internal applications back the data up, and the enterprise backup application will probably make a further copy of its own backup. This does not even count the replication processes going on: applications are replicating, storage is replicating and backup devices are replicating. What is needed is an automated system that can remove duplicate data and archive just the data you need. This is where a good archiving strategy and architecture come in, using archiving software to manage how and where the archived data is stored, and some combination of disk and tape storage to hold it.
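As a rough illustration of the "remove duplicate data" step, the sketch below groups files by a SHA-256 content hash so that only one copy of each distinct file would be passed on to the archive. It is a minimal file-level example, not a description of how StorNext or any other product deduplicates; the function names `file_digest` and `unique_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_files(paths):
    """Keep the first path seen for each distinct content hash."""
    seen = {}
    for path in paths:
        # setdefault leaves the earlier path in place when the hash repeats
        seen.setdefault(file_digest(Path(path)), Path(path))
    return list(seen.values())
```

Real deduplication engines usually work on blocks or variable-length segments rather than whole files, so this only conveys the basic idea of identifying repeated content before it is stored again.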

[Figure: frequency of access against cost of storage for primary storage (high performance) and secondary storage (Tier 2 or 3 storage), with a region labeled “Paying too much”.]

Here are some examples of how different industries are being driven to archive Big Data.

Media and Entertainment

In the film industry it is common practice to store raw and edited content on high-performance arrays while work is in progress. Once the project is completed, the content is placed in a working archive or a long-term archive, depending on the time it takes to create intermediaries, certain special effects and other content to develop the final cut. Preservation of source media containing the original content is extremely common, since it is difficult, if not impossible, to recreate. But this is often insufficient, since raw material does not capture any edits or metadata generated while processing raw content into a finished product. As a result, raw content as well as final cuts must be archived.

Further reading: Archiving in the Media and Entertainment Market – http://qntm.co/ohk0Af

Life Sciences

DNA sequencing and the use of imaging technology are producing new volumes of data that must be analyzed, stored and managed. Research centers need to access, share and manage hundreds of terabytes of DNA sequencing data for analysis at any time. Each new generation of sequencers, mass spectrometers, microscopes and other lab equipment produces a richer, more detailed set of data. When the data is part of a workflow it must be on the highest performance systems accessible to researchers for analysis and discovery. The data should then be archived on more cost-effective systems for additional review and retrieval, and backed up off site.

Further reading: Next Gen Data Management for Next Gen Life Sciences – http://qntm.co/rnE0VM

Utilities/Oil & Gas

Speeding the processing of seismic data is vital to increasing oil and gas exploration. This involves massively powerful 3D processing software, fast high-capacity Ethernet networks and SAN-based storage. Daqing Oil Field Petroleum Exploration and Development Research Institute (EDRI) performs seismic data archival, retrieval, data protection and vaulting through a high-performance tape library. Based on parameters such as schedules, work areas, users and key processing criteria, its Geophysics Service Center can migrate data from online RAID systems to tape, thereby releasing disk space for other jobs. When archived files are needed, they can be retrieved automatically from tape back to disk. Additionally, a clone of the final version of processed data can be replicated to the tape library to allow offsite vaulting and data protection for final data.

WHY TIERED STORAGE IS THE ONLY REALISTIC SOLUTION

In order to be more energy efficient, various business requirements need to be matched with the right data storage technology. In most cases, this results in a multi-tiered storage architecture that includes a mix of disk and tape hardware together with replication, deduplication, data management and archive software.

As mentioned in the data types section at the start of the document, data can generally be categorized as either live, persistent, or backup. This means that data is either in use, not in use but can be recalled at any point, or it is part of a copy of the primary data to only be used when the system needs to be restored after a failure. With this in mind, you need to prioritize where the data resides to ensure that live data is on fast, high-performance systems, persistent data is archived but easily accessible and that backup data is not only on lower cost systems but ideally powered down unless called upon and, if part of a disaster recovery strategy, copied to a different location.

For example, you could set a policy that moves data that has not been accessed for 30 days to a secondary storage array and then archives it after 90 days. In this case, fast primary storage is used for the live data, SATA disk or NAS disk arrays for the secondary data and tape libraries for the archive. The reason for this structure is to maintain the most cost-effective system. You could put all data on primary storage, but the capital expenditure for the hardware, the management time needed and the power usage would be excessive. Therefore, a tiered storage solution is the best approach, and a critical part of this tiered storage archive is software that can manage where data is stored, and do this in a way that is transparent to users accessing the data. This is the role of a data management application such as StorNext Storage Manager.
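The 30-day and 90-day policy described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, assuming the time since last access is the sole criterion; the function name `choose_tier` and the tier labels are hypothetical and do not correspond to any StorNext interface.

```python
from datetime import datetime, timedelta

SECONDARY_AFTER = timedelta(days=30)  # threshold from the example policy
ARCHIVE_AFTER = timedelta(days=90)    # threshold from the example policy


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the tier a file belongs on, based on time since last access."""
    idle = now - last_accessed
    if idle >= ARCHIVE_AFTER:
        return "archive"    # tape library
    if idle >= SECONDARY_AFTER:
        return "secondary"  # SATA or NAS disk array
    return "primary"        # fast primary storage
```

A real data management application would also weigh factors such as file size, project status or retention rules, and would move the data transparently rather than just report a tier.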

THE ROLE OF DATA MANAGEMENT SOFTWARE AS PART OF AN ARCHIVE

Good data management software should give you high-speed content sharing combined with cost-effective data archiving. It’s all about helping you build an infrastructure that consolidates your resources, so workflow runs faster and operations cost less. Data sharing and retention should be combined in a single solution, so you don’t have to piece together multiple products that may not integrate well. Even in heterogeneous environments, all data should be easily accessible to all hosts.

Further reading: StorNext Technical Product Brief – http://qntm.co/qAYUOJ

to allow scientists to analyze the structure of matter. The system generates approximately one Gigabyte of new data per second and must be sustained day and night for at least one month of an experiment. This is the equivalent of more than a Petabyte of data being accumulated during the month. All these billions of bits of data generated every second are acquired by the ‘A Large Ion Collider Experiment’ (ALICE) data acquisition system before being selected, transferred and stored in the main computer center three kilometers away.

This requires high-speed, shared workflow operations and large-scale, multi-tier archiving.
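The volume quoted for the experiment can be sanity-checked with a short calculation. The sketch below assumes a 30-day month and decimal units (1 petabyte = 1,000,000 gigabytes); the function name `petabytes_accumulated` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 10**6 GB


def petabytes_accumulated(rate_gb_per_s: float, days: int) -> float:
    """Total data written at a sustained rate, in petabytes."""
    return rate_gb_per_s * SECONDS_PER_DAY * days / GB_PER_PB


print(petabytes_accumulated(1.0, 30))  # 2.592
```

At one gigabyte per second, a 30-day run therefore accumulates roughly 2.6 petabytes, which is consistent with the "more than a Petabyte" figure.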

THE ROLE OF TAPE FOR ARCHIVING DATA

Tape has historically been the primary media for backup and archive support for the data center. It continues to be pervasive in data centers of all sizes. According to the Clipper Group, “for backup, 20% of all enterprises use only tape, while another 65% use both tape and disk, with tape usually sitting behind disk. This means that 85% of all enterprises use tape in some capacity for their data protection need.”

The primary role of tape is evolving to long-term archive and data retention, with many enterprises using disk systems for short-term backup and recovery in order to take advantage of the quick access speeds from disk for an individual file. Tape continues to be the primary storage media for most disaster recovery plans.

Further reading:

Clipper Notes - In Search of the Long-Term Archiving Solution—Tape Delivers Significant TCO Advantage over Disk – http://qntm.co/o7iiuD

CONSIDERATIONS OF USING DISK VS TAPE FOR ARCHIVE STORAGE

Operational Costs

There is often a misconception that disk-based arrays are faster than tape. If you want fast access to an individual file then disk is the correct choice but if you need sustained access to multiple files or need to restore files from a backup then tape, when used in conjunction with intelligent file management and archive software, will be your best choice.

In addition, new tape technologies like Linear Tape File System, and solutions from different vendors for policy-based data integrity checking, are making tape a better fit for archive storage.

Disk arrays used for archiving typically use SATA hard drives since they provide high storage capacity for a given price and are reliable when accessed infrequently. Data movement between tiers in an archive can be a manual process, but this is cumbersome and susceptible to error, potentially resulting in data loss. Good data management software can be used to simplify this task. This software should include the ability to protect content by copying files and placing them on archive media. It should also work hand in hand with content asset managers and provide other efficiency features such as replication and deduplication for storage tiers. These features will greatly reduce storage requirements while enabling data to be retained longer.

Archiving should not be regarded as a static process. Data volumes will always grow, and when an archive load becomes too large, decisions will have to be made about which content to transfer and preserve on new media. The media format should always be selected with backward compatibility in mind; otherwise data transfer could become an almost constant process.

any data on suspect media to new media, transparently to the user, also improve efficiency. In reality a combination of enterprise data management and protection software and a high performance LTO tape library will give the most cost-effective archive performance.

Further reading: Tape: Protecting Data to the Final [fron]Tier – http://qntm.co/oUxzfr

Quantum LTO: http://www.quantum.com/Products/TapeDrives/LTOUltrium/LTO-5/Index.aspx

Data center power, cooling, and space requirements are becoming a challenge. At the same time, the demands for data protection, improved restore performance, longer data retention times, and technology integration such as deduplication are growing rapidly.

Only the original data needs to be backed up and retained for long periods of time. Keeping it on spinning media for years on end will eat away at the energy portion of your IT infrastructure budget. Moving long-term data retention to tape largely removes the electricity costs to store that data, and enables the enterprise to demonstrate sustainability via green initiatives that seek to reduce energy consumption.

The accompanying diagram shows the acquisition and running costs of an LTO tape library compared with primary and secondary disk storage. If you are only accessing data occasionally, it makes sense to ensure it is stored on the most cost-effective and efficient platform. As you can see, power consumption is a vital consideration for a cost-effective system. With primary data continuing to grow, doubling every 12 to 18 months, powering and managing that growth has moved into the top five CIO concerns. Overall, 15% of office electricity use is attributable to IT, according to the UK-based Carbon Trust, which forecasts this will rise to 30% by 2020.
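To see what "doubling every 12 to 18 months" implies for planning, the sketch below projects capacity under both doubling periods. The starting figure of 100 TB and the three-year horizon are hypothetical values chosen for illustration, as is the function name `projected_capacity`.

```python
def projected_capacity(current_tb: float, months: float, doubling_months: float) -> float:
    """Capacity after `months` if data doubles every `doubling_months`."""
    return current_tb * 2 ** (months / doubling_months)


# Hypothetical starting point: 100 TB of primary data, projected 3 years out.
fast = projected_capacity(100, 36, 12)  # doubling every 12 months
slow = projected_capacity(100, 36, 18)  # doubling every 18 months
print(fast, slow)  # 800.0 400.0
```

Under these assumptions, 100 TB grows to somewhere between 400 TB and 800 TB in three years, which is why the power and management cost of keeping everything on primary disk becomes a concern.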

[Figure: relative cost of acquisition and power % used, compared across primary storage (high performance), secondary storage (SATA) and Tier 3 storage (tape).]

ABOUT QUANTUM’S ARCHIVING SOLUTIONS

About Quantum StorNext

With StorNext data management software, you get high-speed content sharing combined with cost-effective data archiving and content protection for large, unstructured data. It’s all about helping you build an infrastructure that consolidates your resources, so your workflow runs faster and operations cost less. StorNext offers data sharing and retention in a single solution, so you don’t have to piece together multiple products that may not integrate well. Even in heterogeneous environments, all data is easily accessible to all hosts.

Key Features and Benefits

• File system deduplication optimizes the capacity and cost of primary storage.

• Distributed Data Movers (DDMs) increase the performance and scalability of storage tiers.

• Replication enables powerful data protection and data distribution solutions.

• Management console greatly simplifies data management complexities.

• Virtualization of storage tiers greatly reduces future storage requirements while enabling data to be retained longer.

• Self-protecting architecture leverages integrated data protection, and integrity checks safeguard data both on-site and off-site.

Further reading: Quantum StorNext: http://www.quantum.com/products/software/index.aspx

Quantum’s StorNext AEL Archive appliance combines StorNext data management software with Quantum’s Scalar tape libraries containing LTO-5 drives to provide a solution that enables you to establish policies to automatically archive data to a tape library based on the time since the file was last accessed. When a user or application tries to access a file that has already been moved to the archive tier, the file is automatically copied to your high-performing primary disk tier. This can save many hours, sometimes even months, compared to a conventional recovery.

Quantum’s Scalar i6000 tape library with LTO-5 tape drives, which is included in the enterprise version of the StorNext AEL Archive appliance, includes innovative new features such as policy-based data integrity checking via the Extended Data Life Management (EDLM) feature. In addition, EDLM can be combined with StorNext user-defined policies to check the integrity of tapes, say, every six months, move data from suspect tapes to new tapes, and automatically retire the suspect tapes, without operator intervention.
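The idea of a periodic, policy-driven integrity check can be sketched generically as follows. This is not the EDLM or StorNext interface: the `Cartridge` class, the function names, the 182-day interval and the use of a SHA-256 checksum are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta

CHECK_INTERVAL = timedelta(days=182)  # assumption: roughly "every six months"


@dataclass
class Cartridge:
    label: str
    last_checked: date
    expected_sha256: str  # checksum recorded when the data was written


def due_for_check(cartridge: Cartridge, today: date) -> bool:
    """True when the cartridge has not been verified within the interval."""
    return today - cartridge.last_checked >= CHECK_INTERVAL


def is_suspect(cartridge: Cartridge, data_read_back: bytes) -> bool:
    """True when the data read back no longer matches the recorded checksum."""
    return hashlib.sha256(data_read_back).hexdigest() != cartridge.expected_sha256
```

A policy engine built on checks like these would then copy data from any suspect cartridge to new media and retire the old one, as the paragraph above describes.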

and data security capabilities with embedded management software called the Quantum iLayer. This software uses detailed information to automatically evaluate the integrity of drives and media within the library, so you can increase backup reliability while decreasing the total cost of ownership. The Scalar family of tape libraries easily integrates into your existing infrastructure and works seamlessly with disk for a complete data protection solution.

Further reading: Quantum Scalar tape libraries: http://www.quantum.com/Products/TapeLibraries/Index.aspx

About StorNext AEL Archives

The StorNext AEL Archive appliance delivers a policy-based data migration engine that provides the powerful data movement capabilities required for archiving and the preservation of digital assets. The StorNext AEL Archive uses a pricing model that guarantees investment protection over the long-term life of the data: as tape media capacities increase, the capacity of the StorNext AEL Archive increases, with no additional software license fees. This is an extremely high-confidence tape solution. Not only are tapes validated for accuracy and readability, but when a suspect tape cartridge is identified, StorNext’s Storage Manager software automatically copies the data on the suspect cartridge to a new cartridge through EDLM policies. And because the solution is based on tape media technology and uses slot-based pricing, its initial investment is affordable and its long-term TCO is unbeatable.

CONCLUSION

Remember – archiving is not backup, but having a good archive strategy is still a vital part of your corporate IT policy that can realize massive storage cost savings. The combination of intelligent archiving and data preservation software coupled with the latest high speed tape libraries will give you the best value, protection, operation cost savings and disaster recovery plan available.

www.quantum.com/stornext • 1.866.809.5230

Preserving the World’s Most Important Data. Yours.™

©2012 Quantum Corporation. All rights reserved. Quantum, the Quantum logo, DXi and StorNext are either registered trademarks or trademarks of Quantum Corporation and its affiliates in the United States and/or other countries. All other trademarks are the property of their respective owners.

WP00148-v03A Jan 2012
