HP Data Protector software and HP StoreOnce backup systems for federated deduplication and flexible deployment


Brochure

Maximize storage efficiency across the enterprise


As data volumes double every 12 to 18 months, managing information growth continues to be a top priority for IT departments.¹ Data deduplication is one of the most important and fastest growing storage optimization techniques to appear in recent years.² It can help you reduce stored data by up to 95% in just a few short months, significantly reducing your secondary storage footprint as well as backup and restore times.

However, first-generation deduplication technologies have significant drawbacks. Most of the older deduplication technologies are inefficient and difficult to scale, requiring high CPU and memory usage that drives up the cost of deployment.

Many vendors offer different point solutions for source-side and target-side deduplication. But because these solutions use different deduplication algorithms, data must be “rehydrated” before it is sent across the wire (see Figure 1). This can lead to huge inefficiencies and longer restore times. Additionally, the deduplication agents are made “application aware” to achieve better deduplication ratios on live systems. As a result, applying source deduplication across the enterprise means buying, deploying, and managing different deduplication agents. This puts a strain on your IT budget and makes your infrastructure more complex and harder to manage.

Figure 1: Limitations and inefficiencies of the first-generation deduplication technologies

Heavy resource requirements

First-generation deduplication engines segment data into large block sizes. This coarse-grained chunking can be sufficient for backup servers, but it does not achieve useful deduplication ratios on application servers or clients.

Deduplication is a resource-intensive process; it can place a tremendous load on the server where it is running. Many deduplication technologies use an inefficient process that reads each entire data chunk on disk to determine if a new chunk is a match. Such a laborious process taxes the CPU and slows down hardware and other applications. This can greatly impact the performance of backup or application servers, depending on where deduplication is being executed, to the point where deduplication makes them virtually unusable, or prevents them from scaling to back up large volumes of data.

[Figure 1 callouts: data rehydrated before being sent across the wire; data deduplicated again using a different algorithm; high CPU/memory required; highly fragmented data stored on disk; full data sent, high bandwidth required]

Longer restore times


Reconstituting highly fragmented data chunks can dramatically slow the process of reading and rehydrating data from a deduplicated backup. Recovery performance is a critical criterion for businesses trying to quantify their recovery time objectives in the event of system or site failure.³

Deduplication explained

Data deduplication compares chunks of information to detect duplicates, and stores each unique data segment only once. For this to happen, a deduplication engine assigns a unique identifier to each chunk of data using mathematical hash functions. Once it has identified two chunks of data as identical, the system will replace the duplicate with a link to the original chunk.
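The idea can be sketched in a few lines of Python. This is a minimal illustration, not how any HP product is implemented: the `DedupStore` class, the fixed 4 KB chunk size, and the choice of SHA-256 as the hash function are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps each unique chunk only once (illustrative)."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes, stored once

    def write(self, data, chunk_size=4096):
        """Split data into chunks and return the list of fingerprints (the "recipe")."""
        recipe = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # A duplicate chunk is not stored again; the recipe simply links to it.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Rebuild ("rehydrate") the original data from its recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)


store = DedupStore()
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096  # only the last chunk changed
recipe_monday = store.write(monday)
recipe_tuesday = store.write(tuesday)
stored_bytes = sum(len(c) for c in store.chunks.values())
```

Here the two backups total 24,576 logical bytes, but only three unique chunks (12,288 bytes) are kept, and each backup can still be rebuilt from its recipe.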

There are two architectural approaches to chunking. Fixed chunking breaks data into blocks of a fixed size. Variable chunking divides the data into blocks based on patterns in the data itself. The advantage of variable chunking is that it can still recognize duplicates when small changes merely shift the data from one backup to the next. Variable chunking, the technique most commonly used today, leads to higher deduplication ratios.
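The difference is easiest to see when a single byte is inserted at the front of a file. The sketch below is a simplified illustration under stated assumptions: the window size, the divisor, and the use of CRC-32 as the boundary test are arbitrary choices for the example, and production engines use faster rolling hashes plus minimum and maximum chunk sizes.

```python
import hashlib
import random
import zlib


def fixed_chunks(data, size=64):
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, window=16, divisor=64):
    """Cut wherever a checksum of the last `window` bytes hits a target value."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def shared(old, new):
    """Count chunks of `new` whose fingerprint already exists among `old`."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in known for c in new)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = b"!" + original  # one byte inserted at the very front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(edited))
variable_shared = shared(variable_chunks(original), variable_chunks(edited))
```

With fixed chunking, the inserted byte shifts every block boundary, so the edited copy shares essentially no chunks with the original. With variable chunking, the boundaries move with the content, so every chunk after the first boundary is recognized as a duplicate.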

Deduplication involves a combination of three elements:

• The deduplication engine

• The deduplication store

• Backup agents

The deduplication engine is where the majority of processing takes place. It manages the logic and processing of the backup stream by calculating segments and hash values, identifying unique and repeated segments, and maintaining the hash lookup table.

The deduplication store is the disk storage location managed by the deduplication engine. It stores the unique (deduplicated) segments, and is often physically coupled with the deduplication engine.

Deduplication-enabled backup agents (for example, media agents, disk agents, and application agents) manage some of the deduplication processes. Agents can be deployed separately from the deduplication engine to offload some of the performance impact. Agents can perform tasks such as segmenting the data, calculating the hash value of segments, and sending new data to the engine and the store. The deduplication agent communicates with the deduplication engine to determine which segments are unique.
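The division of labor between agent and engine can be sketched as follows. The `Engine` class, its `missing` and `put` methods, and the `agent_backup` function are hypothetical names invented for this illustration; they do not correspond to any HP Data Protector or StoreOnce interface.

```python
import hashlib


class Engine:
    """Stands in for the deduplication engine plus its store (hypothetical)."""

    def __init__(self):
        self.store = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints the store has not seen yet."""
        return [fp for fp in fingerprints if fp not in self.store]

    def put(self, fingerprint, segment):
        self.store[fingerprint] = segment


def agent_backup(data, engine, segment_size=4096):
    """Segment and hash on the agent, then send only new segments.

    Returns the number of segment bytes sent (fingerprint traffic is ignored).
    """
    segments = {}
    for i in range(0, len(data), segment_size):
        seg = data[i:i + segment_size]
        segments[hashlib.sha256(seg).hexdigest()] = seg
    sent = 0
    for fp in engine.missing(list(segments)):
        engine.put(fp, segments[fp])
        sent += len(segments[fp])
    return sent


engine = Engine()
first_run = agent_backup(b"x" * 4096 + b"y" * 4096, engine)
second_run = agent_backup(b"x" * 4096 + b"z" * 4096, engine)
```

The first run sends both segments (8,192 bytes). The second run sends only the one segment the engine has not seen (4,096 bytes), because the agent did the segmenting and hashing locally.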

Deduplication can take place at the application source, backup server, or target device.

Application source deduplication removes redundant data before it is transmitted to the backup target. This type of deduplication reduces storage and bandwidth requirements, as only unique data is transmitted over the wire. However, it can be slower than target deduplication and increase the workload on servers.

3 Source: “HP StoreOnce: The Next Wave of Data Deduplication,” Enterprise Strategy Group, November 2011.


Backup server deduplication shifts the deduplication execution onto a separate dedicated server to maximize the performance of the target device and minimize the impact on the application server.

Target deduplication runs deduplication processing at the backup target and removes redundant data from a backup stream before storing it on the local store. This method can use any backup application the device supports, and the deduplication process is transparent to the backup application.

Backup applications can also deploy and manage target deduplication onto a variety of hardware targets such as disk arrays, tape libraries, and network-attached storage devices.

Target deduplication reduces the volume of storage required for the backup, but it does not reduce the amount of data that must be sent across a LAN or WAN during the backup. With target deduplication, backup agents are not aware of the deduplication process. In backup server or application source deduplication, the backup agents have deduplication technology built in. They are deployed on the backup server or application server as appropriate, and they send a backup data stream that contains only the unique segments plus references to duplicate segments, which reduces the network bandwidth required. With replicated deduplication, only the unique backup data is sent to a replication target, which enables efficient replication over low-bandwidth links.

Figure 2: The HP StoreOnce deduplication engine can be deployed at an application source, a backup server, or a target system such as the B6200 appliance or an HP Data Protector software store.

HP StoreOnce

HP StoreOnce, a patented technology developed at HP Labs, is the industry’s most advanced deduplication engine available today. HP StoreOnce implements smart techniques such as variable data chunking, sparse indexing, and container matching to deliver a highly efficient deduplication solution that requires 99% less bandwidth and significantly less memory and CPU than traditional solutions. Designed using a modular approach for flexibility, HP StoreOnce provides a common deduplication algorithm that can be deployed as software-only solutions or as dedicated physical or virtual appliances.

HP StoreOnce product family

The HP StoreOnce product family offers a broad array of deduplication deployment options: the Data Protector StoreOnce store, a pure software solution that can be deployed on industry-standard hardware; a virtual appliance (StoreOnce VSA); and a series of physical appliances that range from cost-effective, single-node appliances to highly available, multi-node appliances.

The target appliances support standard Virtual Tape Library (VTL) and CIFS/NFS interfaces and HP’s own StoreOnce Catalyst interface. The HP StoreOnce Catalyst API is designed to improve the backup and recovery speed, reduce network bandwidth, and maximize storage efficiency.

HP Data Protector software fully supports writing backup data through all three interfaces.

[Figure 2 labels]

Application Source

• Dedupe where data is created

• Eliminate need for extra hardware at ROBO sites

• Reduces network bandwidth consumption

Backup Server

• Offload application server from dedupe processing

• Higher total target appliance throughput

• Efficient utilization of backup infrastructure

Target System

• Software only implementation can utilize generic disk storage

• Purpose built appliance optimized for backup data

• Simplest implementation, with no operational changes


HP Data Protector and HP StoreOnce

HP Data Protector software and HP StoreOnce backup systems solve the challenges associated with traditional deduplication solutions. HP Data Protector is designed to seamlessly work with HP StoreOnce targets—software only, and both physical and virtual appliances—to deliver the most efficient and highly optimized backup and recovery strategies for enterprise and ROBO environments. Together, this solution delivers the industry’s first and only federated deduplication solution (Figure 2) with the flexibility of performing deduplication at the source, backup server, or target system within the environment, depending on performance requirements and business needs.

A federated approach supports the notion that deduplication should be performed only once, anywhere, with efficient data movement, and all managed through a single pane of glass. This unique capability provides maximum flexibility in deployment and maximizes storage efficiency.

Flexible, anywhere deployment

The HP StoreOnce federated deduplication capability provides a common modular architecture that can be deployed across a wide range of hardware, on both physical and virtual devices, from the edge of an enterprise to the data center. The StoreOnce technology is application-independent, supporting any type of application data or files. HP Data Protector leverages the flexible architecture of the HP StoreOnce deduplication engine to provide a software-based StoreOnce library that allows customers to deploy deduplicated target stores on any industry-standard hardware. A single software deduplication store can be shared among multiple clients across a LAN or SAN. Software deduplication can be remotely managed and deployed in remote offices without requiring onsite IT expertise.

Highly efficient and intelligent deduplication technology

Data Protector software deduplication is powered by the HP StoreOnce engine. This patented technology, developed by HP Labs, is portable, flexible, and scalable. Unlike competing solutions, the HP StoreOnce engine can be integrated into hardware or software to deliver greater flexibility and faster backup and recovery performance.

HP StoreOnce offers a thin, efficient footprint that minimizes the load on CPU processing and maximizes application availability. It uses as little as one-tenth of the memory of other available solutions, which means it can be deployed on application or backup servers, and even virtual machines, without crippling performance.⁴

The StoreOnce deduplication engine uses an extremely efficient Adaptive Micro-Chunking technique to segment data into very small blocks, ranging from one kilobyte to 10 kilobytes in size, with an average of four kilobytes. These four-kilobyte chunks are up to one-sixteenth the size of the blocks used by other solutions. This increases HP Data Protector’s ability to find commonality in the data stream during deduplication and, thus, store less data on disk.
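The "one-sixteenth" comparison implies a competing block size of about 64 KB (16 × 4 KB). A rough back-of-the-envelope sketch shows why the smaller chunk stores less. It assumes that each small edit lands in a different chunk and that a changed chunk is rewritten in full; the count of 100 edits is a hypothetical figure chosen only for the arithmetic.

```python
KB = 1024


def new_data_after_edits(edit_count, chunk_size):
    """Upper bound on bytes rewritten if each small edit lands in a different chunk."""
    return edit_count * chunk_size


small_chunks = new_data_after_edits(100, 4 * KB)   # 4 KB average chunk
large_chunks = new_data_after_edits(100, 64 * KB)  # 64 KB block, 16x larger
ratio = large_chunks // small_chunks
```

Under these assumptions, 100 scattered edits cost roughly 400 KB of new data with 4 KB chunks and roughly 6.25 MB with 64 KB blocks, a factor of 16.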

StoreOnce also uses algorithms called Sparse Indexing and Container Matching to reduce the number of times the deduplication engine has to read the data to determine if chunks match.

Instead of reading an entire data chunk, these algorithms preview parts of it and compare them to a table of existing chunks stored in memory. This greatly improves throughput and reduces processing requirements.
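The general idea behind a sampled in-memory index can be sketched as follows. This is a loose illustration of sampling-based indexing, not HP's Sparse Indexing or Container Matching implementation: the class name, the sampling rule, the `SAMPLE_RATE` value, and the one-container-per-segment layout are all assumptions made for the example.

```python
import hashlib
from collections import Counter

SAMPLE_RATE = 4  # keep roughly 1 in 4 fingerprints in memory (assumed value)


def fingerprint(chunk):
    return hashlib.sha256(chunk).digest()


def is_hook(fp):
    """Sample by fingerprint value, so the same chunk is always sampled."""
    return fp[0] % SAMPLE_RATE == 0


class SparseIndexStore:
    def __init__(self):
        self.containers = []    # "on disk": one dict of fingerprint -> chunk each
        self.sparse_index = {}  # "in memory": sampled fingerprint -> container number

    def add_segment(self, chunks):
        """Store one segment (a list of chunks); return how many chunks were new."""
        fps = [fingerprint(c) for c in chunks]
        votes = Counter(self.sparse_index[fp] for fp in fps if fp in self.sparse_index)
        # Load only the best-matching container instead of searching everything.
        known = self.containers[votes.most_common(1)[0][0]] if votes else {}
        new = {fp: c for fp, c in zip(fps, chunks) if fp not in known}
        if new:
            self.containers.append(new)
            container_id = len(self.containers) - 1
            for fp in new:
                if is_hook(fp):
                    self.sparse_index.setdefault(fp, container_id)
        return len(new)


store = SparseIndexStore()
chunks = [b"chunk-%d" % i for i in range(200)]
first = store.add_segment(chunks)                     # everything is new
second = store.add_segment(chunks)                    # identical segment
third = store.add_segment([b"edited"] + chunks[1:])   # one chunk changed
```

The in-memory table holds only the sampled fingerprints, yet the repeated segment is still fully deduplicated because its samples point to the container that holds the rest of its chunks. The trade-off in a scheme like this is that a duplicate stored in a container that was not selected can occasionally be written again.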

4 Benchmarking and testing done with equivalent deduplication solutions in a controlled environment in HP R&D lab, Boeblingen, Germany, November 2011.


Figure 3: Single pane of glass management for backup and recovery, deduplication, and replication across the enterprise.

Efficient data movement

HP StoreOnce enables the most efficient use of system resources and network bandwidth.

Since the entire backup stack (the software and backup appliance) uses a single algorithm, data is deduplicated only once and moved from site to site without the need for rehydration. This includes the ability to set different retention times at different sites based on business needs.

Centralized management and control

HP Data Protector software manages and controls the entire backup and recovery process, from edge to data center, through a single pane of glass. Centralized management enables IT to deploy, manage, and monitor backup agents at remote and branch office locations, eliminating the need for specialized IT staff at these locations. With StoreOnce Catalyst integration, HP Data Protector manages and controls deduplication-enabled, multi-site replication for locally or geographically distributed environments.

Geographically distributed organizations can take control of the data at its furthest outposts and bring it to the data center in a cost-effective way.

The Data Protector GUI enables you to create both regular and encrypted StoreOnce stores on any StoreOnce target, and proactively manage the storage capacity of any StoreOnce target by configuring a quota threshold to avoid service interruption due to capacity issues.

[Figure 3 labels: small branches; regional offices; DR sites; primary data center; HP Data Protector]


Federated deduplication use cases

Today, much of an organization’s critical information is created and consumed at remote office/branch office (ROBO) locations. As there is often little to no IT expertise at these small remote locations, they are exposed to data loss, and the subsequent business fallout, because they are not adequately protected. Additionally, the traditional tape-based approach for a remote office is cumbersome, expensive, and labor intensive.

HP Data Protector’s federated deduplication capability can be deployed across a range of different scenarios, particularly in a global, remote office environment, which can include hundreds of small, medium, and large remote offices with differing backup and recovery needs.

Small remote office protection

Small offices generally have limited space, IT infrastructure, and IT staff. These standalone offices generally have a small number of applications and servers (fewer than five servers) that need to be protected. The volume of data to protect is typically small as well. In many cases, the network connectivity between remote sites and the data center is also poor (T1 lines at 1.54 Mbps, or worse).

The best backup option in these scenarios is application source deduplication with HP Data Protector and Catalyst, backing up data directly to a StoreOnce backup appliance in the central data center, such as the HP StoreOnce B6200. This option eliminates the need for a backup appliance on site. Because the data is deduplicated before it leaves the remote office, only unique data is transferred across the wire. This reduces backup windows, especially on high-latency networks. When multiple remote sites store to the same B6200 store in the primary data center, cross-site deduplication improves efficiency even further. For example, if all the remote sites have the same file, it is stored only once.
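A rough calculation shows what this means for a backup window on a T1 link. The 20 GB backup size is a hypothetical figure, and the 5% unique fraction simply reuses the "up to 95%" reduction quoted earlier as a best case. Protocol overhead, latency, and fingerprint traffic are ignored.

```python
T1_BITS_PER_SECOND = 1.54e6  # link speed quoted in the text


def transfer_hours(payload_bytes, unique_fraction=1.0):
    """Hours needed to push the unique part of a backup over the link."""
    bits = payload_bytes * unique_fraction * 8
    return bits / T1_BITS_PER_SECOND / 3600


backup = 20 * 10**9  # hypothetical 20 GB backup
full_hours = transfer_hours(backup)           # every byte crosses the wire
deduped_hours = transfer_hours(backup, 0.05)  # assumes only 5% of the data is new
```

Under these assumptions, sending the full backup would take roughly 29 hours, which does not fit in a nightly window, while sending only the unique data would take about an hour and a half.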

With HP StoreOnce Catalyst, HP Data Protector enables IT staff to manage the entire backup and recovery operation from the central site, including deployment of backup agents on the remote site. The extremely lean and efficient HP StoreOnce engine allows applications and software deduplication to coexist on the same server without crippling performance. The HP StoreOnce algorithm delivers a higher deduplication ratio through the use of a smaller chunk size, improving overall storage efficiency. HP StoreOnce deduplication also works for all applications and doesn’t require any customization.

With HP Data Protector, you can deploy a centrally managed, high-performing, and highly efficient backup solution for a very large number of remote offices.


Figure 4: Application source deduplication reduces network bandwidth and eliminates the need for storage and a server at the remote site, providing a cost-effective deduplication solution for small environments.

Figure 5: Backup server deduplication minimizes impact on application performance and maximizes performance of the target device.

Medium-sized regional offices with local recovery requirements

To support more complex configurations in regional office settings that have a relatively large number of servers and large data sets with local recovery needs, HP Data Protector offers deduplication at the backup server level. A backup server is essentially a backup client with an HP Data Protector media agent installed and running the deduplication task and other standard media management tasks, such as mirroring using object copy. Running deduplication tasks on a dedicated server minimizes impact on application performance and maximizes performance of the target device.

The HP Data Protector media agent can run on all leading operating systems: Windows, Linux, and Unix. A server-side deduplication strategy is very useful in medium-sized (5-15 servers) remote offices that have local recovery requirements. An HP Data Protector StoreOnce store can easily be created on the backup server itself, so that data is backed up locally and then replicated to the primary data center for disaster recovery purposes. The entire data backup and replication is done using the same algorithm and centrally managed via the HP Data Protector console. This approach reduces the load on the application servers and provides local recovery on the remote sites. Since only unique data is transferred from the remote site to the primary data center, you gain efficient use of network bandwidth and a reduction in backup windows, especially in high-latency networks.

[Figure 4 and Figure 5 labels: application servers (UNIX/HP-UX, Microsoft Windows); backup server with optional local storage (UNIX/Linux, Microsoft Windows); Catalyst or Data Protector B2D software store; deduplicated data sent offsite]


Large enterprises with multiple remote offices and data centers

Large data centers that are connected to several remote sites are more complex than a single remote office location. The data centers typically have a large number of applications, different platforms and storage arrays, and physical and virtual server environments, and they generally have the IT expertise to support this infrastructure. In these environments, the HP StoreOnce backup appliance store can be deployed to back up data center applications and remote site data. The HP StoreOnce B6200 appliance is a multi-node, highly scalable appliance with built-in high availability. Where there is a replica data center, data can be replicated to another B6200 device deployed there.

HP Data Protector enables IT to centrally manage the entire backup, recovery, and replication process. Through HP StoreOnce Catalyst, Data Protector can trigger the replication process on the B6200 and catalog this information. The single point of management provides visibility and control and improves IT staff productivity. As illustrated in Figure 6, HP Data Protector delivers centralized management, rapid data recovery, and maximum storage efficiency in large enterprise environments.

Figure 6: Data Protection in large enterprise environment with multiple remote sites and datacenters

Figure 7: StoreOnce deduplication significantly reduces the amount of storage required for backing up of virtual server environments.

[Figure 7 labels: VMware ESX; VMs; snapshots; mount]

[Figure 6 labels: Data Protector centralized management; primary data center; DR site; large ROBO; medium ROBO; small ROBO; HP StoreOnce B6200 Backup System; HP StoreOnce Backup System; HP Data Protector backup server; app servers; app server with HP StoreOnce Catalyst; local backup to HP StoreOnce Backup System; local backup to HP Data Protector software store; no backup server hardware needed; tape; low-bandwidth replication; WAN. Use cases shown: remote office protection, critical application protection, required industry compliance, cloud-based recovery, rapid data recovery, backup]


Virtual environment protection

As virtualization technologies become more mature and reliable, IT organizations are increasingly deploying mission-critical applications in virtual environments, and these environments require data protection. A large virtual environment can have thousands of virtual machines running the same operating system (for example, Microsoft® Windows or Linux). The duplication of information within virtualized data stores is driving enormous consumption of backup storage resources and the associated capital expenditures.

HP Data Protector provides many advanced options to protect virtual environments. Its policy-based protection for applications and virtual environments automates and simplifies virtual environment protection and frees up IT staff for high-priority projects that drive business growth. HP Data Protector’s deduplication capabilities offer significant cost savings through storage efficiency by eliminating the redundant operating system information across backup images and guest profiles, while still providing fast recovery of any data within the backup image.

HP Data Protector provides application-aware, array-based snapshots for virtual environments for a wide variety of storage arrays and applications, ensuring business continuity for 24x7 global operations. Through a single pane of glass, HP Data Protector can manage the entire backup and recovery process across any hypervisor, including snapshots and replication in VMware, Microsoft Hyper-V, and Citrix Xen environments.


Conclusion

The HP StoreOnce backup solution offers highly flexible, centrally managed, and highly efficient data protection for any enterprise. HP StoreOnce technology provides a common architecture across software and hardware, at remote sites and in the data center, enabling deduplicated data movement from edge to core without having to rehydrate at multiple deployments. Powered by HP StoreOnce, the industry’s most advanced deduplication engine, HP Data Protector and the HP StoreOnce appliance deliver federated deduplication that enables deduplication of data at any location in the backup stack. HP Data Protector software provides a single point of management for the entire data movement: backup, replication, and data recovery. HP Data Protector and HP StoreOnce help organizations maximize their critical storage resources through the most efficient deduplication available, while meeting stringent business SLAs and minimizing backup infrastructure costs.


Sign up for updates: hp.com/go/getupdated

About HP Autonomy

HP Autonomy is a global leader in software that processes unstructured human information, including social media, email, video, audio, text and web pages. Using HP Autonomy’s information management and analytics technologies, organizations can extract meaning in real time from data in virtually any format or language, including structured data. A range of purpose-built market offerings helps organizations drive greater value through information analytics, unified information access, archiving, eDiscovery, enterprise content management, data protection and marketing optimization.

Additional information is available at autonomy.com.

Learn more at hp.com/go/dataprotector.