Understanding Connected DataProtector

Version 7.1

Includes Data Center Information

©2003 Connected Corporation. All Rights Reserved.

Connected, Connected DataProtector, Connected EmailOptimizer, iRoam, Delta Block, SendOnce, the Connected design, and the Connected logo are trademarks and/or registered marks of Connected Corporation.

All other brand or product names are trademarks or registered trademarks of their respective owners.

The information in this publication is subject to change without notice and should not be considered a commitment by Connected Corporation. While Connected has made every effort to ensure the accuracy and completeness of this publication, it assumes no responsibility for the consequences to users of any errors that may be contained herein.

Connected Corporation
100 Pennsylvania Avenue
Framingham, MA 01701
508.808.7300 Main Voice
1.800.675.5971 Technical Support
[email protected]
www.connected.com

Understanding Connected DataProtector, 1st Edition
Connected DataProtector Version 7.1

September 10, 2003
Printed in USA

Table of Contents

About This Manual . . . ix

Overview of the Connected DataProtector Architecture . . . x

Related Documentation . . . xii

Part I: Data Center . . . 1

Chapter 1: Data Center Overview. . . 3

Data Center Server . . . 3

Hierarchical Storage Manager. . . 4

Support Center . . . 4

iRoam . . . 4

Chapter 2: Data Center Configurations . . . 5

Standalone Server . . . 6

Mirrored Pair . . . 7

Cluster. . . 8

HSM Versus Disk-Only . . . 9

Chapter 3: Archive Sets. . . 11

The Agent’s Role in Archiving . . . 12

What Happens to Archive Sets on the Data Center . . . 13

Naming and Identification Conventions . . . 13

Sizing and Splitting Archive Sets . . . 14

Structure and Contents of an Archive Set . . . 14

File Expiration Dates and Rebasing . . . 15

Chapter 4: Data Center Services . . . 17

BackupServer . . . 17

IndexServer . . . 18

ReplicationServer . . . 19

PoolServer . . . 19

HSMServer . . . 20

Compactor . . . 21

Chapter 5: Hierarchical Storage Manager . . . 23

Migration and Purge. . . 24

Tape Groups and Tape Account Groups. . . 24

Tape Sets. . . 25

Multiple Tape Libraries . . . 29

Chapter 6: Compactor . . . 33

Check for Necessary Disk Space . . . 34

Select Accounts or a Tape Account Group . . . 34

Perform a System Analysis and Repair . . . 35

Mark Files as Expired . . . 35

Repackage Archive Sets . . . 35

Delete Archive Sets and Database Entries. . . 36

Migrate New Archive Sets to Tape. . . 36

Inform the Agent of Changes . . . 36

File Expiration Rules. . . 37

EMC Centera and Garbage Collection . . . 38

Chapter 7: Databases . . . 39

Directory Database . . . 39

Registry Database . . . 40

Asset Database. . . 40

Database Protection . . . 40

Chapter 8: Data Center and Enterprise Directory . . . 43

Validate Support Center Technicians. . . 44

Maintain End-User Personal Data . . . 44

Enterprise Directory Management . . . 44

Chapter 9: Licensing . . . 45

Chapter 10: Data Center Logging . . . 49

Event Logging . . . 50

Event Log Maintenance . . . 51

Trace Logging . . . 52

Chapter 11: Support Center Overview. . . 53

Support Center Uses . . . 53

Support Center Interface . . . 55

Chapter 12: Accounts and Communities . . . 59

Account Management. . . 59

Communities . . . 64

Chapter 13: Agent Configurations . . . 67

Configuration Components . . . 67

The Default Agent Configuration. . . 68

Updating Agents Automatically Using Central Administration. . . 69

Chapter 14: Technicians . . . 71

The Admin Account . . . 71

Access Permissions for Technicians . . . 72

Enterprise Directory with Technician Accounts. . . 73

Chapter 15: File Selection Rules . . . 75

Data and FSR Rules. . . 75

Base and User Rules . . . 76

How Rules Affect File Selection . . . 77

Types of Rules . . . 77

Rule Precedence. . . 81

Chapter 16: Support Center Reports . . . 83

Default Reports . . . 83

Create and View Reports . . . 84

Viewing and Displaying Charts . . . 85

Saving Report Results in XML . . . 85

Account Groups. . . 86

Chapter 17: iRoam . . . 87

iRoam User Interface . . . 87

File Retrieval Process . . . 89

Cleaning Files Off of the iRoam Server . . . 90

Part II: Agent . . . 93

Chapter 18: Agent Overview . . . 95

Agent Interface . . . 96

Deployment . . . 96

Backup . . . 96

Retrieve . . . 96

Security . . . 97

Logging. . . 97

Linking to an External URL . . . 97

Chapter 19: Agent Deployment. . . 99

How the Agent is Created and Deployed . . . 99

Recovering Agent Accounts . . . 102

Chapter 20: Agent Interface . . . 103

Agent Tabs . . . 103

Hot Key Commands . . . 105

Command Line Interface . . . 105

Chapter 21: E-mail Storage and Optimization . . . 113

Standard E-Mail Backup . . . 113

Optimized E-mail Backup . . . 114

Chapter 22: File Backup . . . 117

Scan of the Computer Hard Disk . . . 117

Analysis of Files Identified in the Scan . . . 118

Connection to the Data Center . . . 120

File Transmission . . . 120

Backup Results. . . 121

Backup Settings and Configurations . . . 121

Audit . . . 125

Chapter 23: File Retrieval . . . 127

Selecting Files. . . 127

Selecting a Destination Location . . . 128

Repackaging Files . . . 128

Downloading Files . . . 128

Optional Retrieve Features . . . 129

Chapter 24: Agent to Data Center Connections . . . 131

Agent Connection Properties . . . 131

Firewalls . . . 132

Network Interrupts . . . 133

Chapter 25: Heal . . . 135

Heal Requirements . . . 135

Using Heal . . . 137

Healing from CD . . . 138

Chapter 26: Agent Security Features . . . 139

Encryption Keys . . . 139

Account Password. . . 141

Access Control List Management . . . 141

Unauthorized Access Prevention . . . 142

Chapter 27: Agent Logs and Messages. . . 143

Viewing Agent Logs . . . 143

Creating Agent Messages. . . 144

Chapter 28: Agent Information Web Pages. . . 145

Agent Interface Links . . . 146

Agent Link Query Variables . . . 146

Part III: Data Center Tools . . . 149

Chapter 29: Data Center Tools Overview . . . 151

Data Center Management Console. . . 151

CD Maker . . . 151

Other Tools and Utilities . . . 152

Chapter 30: Data Center Management Console. . . 153

Starting the DCMC . . . 155

DCMC User Interface . . . 155

Chapter 31: CD Maker. . . 159

CD Maker Process . . . 160

Chapter 32: Other Tools and Utilities . . . 163

Data Center Toolkit. . . 163

DataCopier . . . 164

Remote Diagnostic Tool . . . 165

Compress . . . 166

DSPing . . . 166

Dump . . . 167

FileDater . . . 169

HostID . . . 170

ReplCheck . . . 171

Retrieve . . . 173

Tdate Converter . . . 174

HSM Disk Status . . . 175

HSM Library Status . . . 177

Index . . . 181

About This Manual

Your organization’s data is your most critical asset. Connected DataProtector™ is a client-server solution for safeguarding the data on your organization's computers. As such, Connected DataProtector offers the following features:

• Data protection client software that automatically or manually backs up data on all computers on your network

• The ability to access and retrieve backed-up data from the actual computer or remotely from any computer via a Web interface

• System protection and repair of operating system state, registry key settings, and applications for damaged systems

• A migration feature for setting up an end user on a new computer by moving their data from the old system on to the new system

• The ability to audit the software that's installed on each computer in your network and the hardware on which it is running

• Other tools for managing and facilitating Data Center administration

Additionally, Connected DataProtector with Connected EmailOptimizer™ facilitates efficient e-mail backup and restoration of your organization’s e-mail files.

This introductory chapter offers a general overview of the Connected DataProtector solution. It provides a high-level look at the Connected DataProtector architecture, which will be expanded upon in later chapters. This chapter also acquaints you with the different components in a Connected DataProtector deployment and describes the basic processes during backup and restore operations.

Overview of the Connected DataProtector Architecture

The Connected DataProtector architecture includes several components. In the simplest terms, it is a client-server application. The client side software is called the Agent. The server side software, and the hardware on which it is installed, is called the Data Center. A Web interface, called Support Center, provides access to Agent management tools that you can use to create and deploy custom Agents to all of the end users in your organization. Connected DataProtector also includes other components, databases, and technologies that help you manage backups and restoration of critical data.

The illustration below provides a diagram of Connected DataProtector’s basic architecture. You might want to refer to this illustration while reading the rest of this section.

Data Center

The Data Center provides the “backbone” operations for the Connected DataProtector system. The Data Center processes all requests (such as to back up or retrieve data) from all of your deployed Agents.

Depending on your organization's preferences and the size of your deployment, your Data Center can have one of several different configurations ranging from one server to a cluster of many servers.

The Data Center can also include Hierarchical Storage Manager (HSM) for migrating backed-up data to an archive storage device, such as a tape library or EMC Centera. HSM augments the Data Center by providing more storage space as backup data on your Data Center grows beyond your allocated disk storage capacity. It also provides extra protection by enabling you to create secondary sources of your backup data.

Support Center

Support Center, covered in Part I: Data Center, is a Web-based application that enables you to manage the Agents deployed throughout your organization. Through Support Center, you can create the Agent Setup program, modify the Agent configuration, view end-user account information, set up and print reports about your accounts and Data Center, and provide troubleshooting assistance to your end users.

You can also use Support Center to manage communities, which are groups of user accounts that are related in some way, allowing for easier management of groups as a whole. For example, you might have a community for all user accounts that belong to people in your marketing department and another for all accounts within your finance department. Or you might have a community for all users who work with laptop computers (and are therefore not always connected to your network) and another for all users who work with desktop computers.

Agent

The Agent, covered in Part II: Agent, is deployed on all of the computers that you are protecting in your organization. It provides an interface for end users to work with the system, as well as internal functionality to initiate contact with the Data Center server for such activities as backing up data, retrieving data, and performing a Heal on the client. The Agent initiates all of these activities; the Data Center does not contact an Agent unless the Agent has first initiated the connection.

When you first set up the Agent on a computer, the system creates a user account for that particular computer. The account is called a user account because the end user can transfer all of the data associated with the account to another computer, so that the account in effect stays with the end user. However, a user account can exist on only one computer at any given time.

Other Tools

Other tools, covered in Part III: Data Center Tools, are used to assist you with Data Center administration. These tools include:

• Data Center Management Console (DCMC), for you to monitor and control Data Center operations and settings

• The CD Maker application, for burning CDs of backed-up data for account archival or if end users need to retrieve data via a CD

• Other miscellaneous Data Center tools and utilities

Related Documentation

For detailed installation and configuration procedures, refer to the Setting Up Connected DataProtector manual. It contains information for both first-time installations and subsequent upgrades. The Maintaining Your Data Center manual contains recommended procedures and useful checklists that facilitate ongoing maintenance of your Data Center.

Part I: Data Center

Chapter 1: Data Center Overview

Chapter 2: Data Center Configurations

Chapter 3: Archive Sets

Chapter 4: Data Center Services

Chapter 5: Hierarchical Storage Manager

Chapter 6: Compactor

Chapter 7: Databases

Chapter 8: Data Center and Enterprise Directory

Chapter 9: Licensing

Chapter 10: Data Center Logging

Chapter 11: Support Center Overview

Chapter 12: Accounts and Communities

Chapter 13: Agent Configurations

Chapter 14: Technicians

Chapter 15: File Selection Rules

Chapter 16: Support Center Reports

Chapter 17: iRoam

Chapter 1: Data Center Overview

The Data Center consists of several components. Each component plays a role in managing or storing the data backed up by your end users. The chapters that follow discuss each of the components in detail. The Data Center components include:

• Data Center Server

• Hierarchical Storage Manager (HSM)

• Support Center

• iRoam™

Data Center Server

The Data Center server processes all data backed up by Agents deployed to your organization. The Data Center server employs several services and processes to manage the data. A chapter follows on each of the following items to assist you in understanding the concepts behind each service and process:

• Data Center configurations

• Data Center services

• SQL databases

• Compactor

• Licensing

• Logging

Hierarchical Storage Manager

HSM is used by the Data Center as extended storage for data backed up by end users. HSM copies data to the archive storage device connected to the Data Center server. It uses three main components to perform most of its tasks. Refer to Chapter 4: Data Center Services, beginning on page 17, for more information on the HSM services. Refer to Chapter 5: Hierarchical Storage Manager, beginning on page 23, for more information on how HSM works.

Support Center

Support Center is a Web-based application used to monitor and manage end-user accounts. A chapter follows on each of the following items to assist you in understanding the concepts and uses of Support Center:

• Accounts and communities

• Agent configurations

• Managing file selection rules

• Technicians

• Reports

iRoam

iRoam is an optional Web-based application that your end users can use to retrieve files. Since iRoam is Web-based, files are retrievable from any computer with a Web browser and an Internet connection. Refer to Chapter 17: iRoam, beginning on page 87 for more information about using iRoam.

Chapter 2: Data Center Configurations

A Data Center is one or more servers running various software components to manage data. There are several types of server configurations, depending on the hardware available in the Data Center (these configurations are explained in more detail later in this chapter):

• A Data Center with only one server is said to have a standalone server.

• A mirrored pair is a pair of Data Center servers containing identical data. Having two identical servers protects against data loss in the event of a disaster to either mirrored server. It also enables Agents to access a secondary server in the event that the Agents’ primary server is not available.

• A clustered Data Center is composed of two or more pairs of mirrored servers. The servers in a clustered Data Center are referred to collectively as a cluster.

Any of these configurations can use Hierarchical Storage Manager (HSM) for archive storage. An HSM configuration provides greater protection against data loss. A configuration without HSM is referred to as a disk-only configuration.

The Data Center servers provide the functionality of a registration server and a backup server (both roles are performed by the same Data Center server). The registration server processes requests for new accounts, while the backup server receives and stores backup data from Agents (end users) and enables Agents to retrieve backed-up data upon request.

A Data Center also includes a Support Center server and an iRoam server to run these two Web-based applications. Support Center is a Data Center management tool and iRoam provides a Web interface for end users to retrieve files via the Web.

You can run these two applications on the same server or on separate servers, but run them on a Web server rather than on your Data Center servers. These concepts are explained in more detail later in this manual.

Standalone Server

The standalone configuration consists of one Data Center server for accepting Agent backups and retrieve requests, a Web server to host Support Center and optionally iRoam, and an optional archive storage device for HSM. Using HSM with a standalone Data Center is not required, but it is highly recommended in case of disk failure. Without HSM, if a disk of a standalone Data Center fails, your end users’ data is lost because there is no copy or secondary storage.

A standalone Data Center has all of the same software and functionality as a mirrored Data Center but lacks fail-over security. The lack of fail-over redundancy affects the Data Center in the following ways:

• Potential loss of data

• Agent fail-over

• Data Center availability

Data backed up to the Data Center server is stored on local disks. In a standalone configuration, the data on the local disk is the only copy of the data. If a local disk fails, the data is lost. If you use HSM, the data is moved to the archive storage device after a disk space threshold is reached. Even if you use HSM, the data on disk is vulnerable because the data is not immediately moved to the archive storage device. For this reason, using a Secondary Tape Set is highly recommended. A Secondary Tape Set provides a layer of security in that it holds a second copy of archive sets backed up by end users. You can remove this second set of tapes from the library and store them in a different location on a scheduled basis, or as they become full. For more information on Secondary Tape Sets, refer to Tape Sets, on page 25.

Agent fail-over is another limitation of a standalone Data Center. If the Data Center server is unavailable for backups or retrieves, the Agent must wait for it to become available. In a mirrored Data Center configuration, if one server is unavailable, the Agent automatically connects to the mirrored server. End users are not aware of server downtime with a mirrored configuration.

A service outage is necessary when performing maintenance on a standalone Data Center server that requires shut-down or restart. With a mirrored Data Center, there is no disruption of service to the end user during a shut-down or restart, as discussed in the previous paragraph.

Mirrored Pair

For maximum protection against data loss due to hardware or other failure, you can operate two identical Data Centers, ideally at physically separate sites. Each Data Center uses the same software, stores the same archive sets, and services the same user communities. For this reason, they are referred to as mirrored Data Centers. Mirrored Data Centers provide redundant storage of data and fail-safe availability of file backup and retrieval service.

The two mirrors are redundant peers; neither is dominant over the other. Each is fully capable of performing any backup or retrieval operation.

When a new user account is created, one of the mirrored Data Center servers is assigned as the primary Data Center for that account. When an Agent needs to contact a Data Center, it contacts its primary Data Center. However, if the primary Data Center is unavailable, the Agent contacts the mirror instead, and all operations proceed normally. The alternating assignment of Data Centers as primary to individual accounts accomplishes load balancing, so that an approximately equal number of sessions connect with each of the two Data Centers.
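The assignment and fail-over behavior described above can be sketched as follows. This is only an illustration, not Connected DataProtector code: the server names, the functions `assign_primary` and `choose_server`, and the idea of alternating by account number are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# Hypothetical names for the two mirrored Data Center servers.
MIRRORS: Tuple[str, str] = ("dc-a", "dc-b")


def assign_primary(account_number: int) -> str:
    """Alternate the primary server between the mirrors (assumed scheme)."""
    return MIRRORS[account_number % 2]


def mirror_of(server: str) -> str:
    """Return the other member of the mirrored pair."""
    return MIRRORS[1] if server == MIRRORS[0] else MIRRORS[0]


def choose_server(primary: str, available: Set[str]) -> str:
    """Contact the primary first; fall back to its mirror if the primary is down."""
    if primary in available:
        return primary
    fallback = mirror_of(primary)
    if fallback in available:
        return fallback
    raise ConnectionError("neither mirrored server is available")


# Four consecutively created accounts split evenly across the two mirrors.
primaries: Dict[int, str] = {n: assign_primary(n) for n in range(1, 5)}
```

Because assignment alternates, roughly half of the accounts treat each mirror as primary, which is the load-balancing effect the text describes.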

Either Data Center can create new accounts or modify account information. Any change to the Registry database is dynamically replicated (that is, duplicated) across mirrored Data Centers so that the Registry databases on the two mirrors have identical content (for more information on the Registry database, as well as other Data Center databases, refer to Chapter 7: Databases, beginning on page 39).

During a backup session, backed-up data is received at one Data Center, stored as an archive set on its disk, and recorded in its Directory database. To protect the data against failure of the Data Center's disk, the data is automatically replicated to the mirror server, and is recorded into the mirror's Directory database. This means that every archive set is stored redundantly on both Data Centers: initially on the disk and then migrated by the HSM to that server's archive storage device, if used. Therefore, either Data Center server can retrieve any file that was backed up, even if the backup was originally sent to the other Data Center server.

It is possible that an archive set will be received at a Data Center when its mirror is down (or while communication between the mirrors is down) and it is not possible to immediately replicate the archive set. When this occurs, the replication is deferred, and Data Center software automatically performs all deferred replications when the mirror, or communication to it, is restored to operation. This process of reestablishing equivalency of data between the mirrors is called recovery of replication or resynchronization.
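Deferred replication and resynchronization can be pictured as a simple queue. The sketch below is a toy model under stated assumptions: the `MirrorLink` class, its attributes, and the archive set names are invented for illustration and do not reflect how ReplicationServer is implemented.

```python
from collections import deque
from typing import Deque, Set


class MirrorLink:
    """Hypothetical model of one server replicating archive sets to its mirror."""

    def __init__(self) -> None:
        self.mirror_up: bool = True
        self.mirror_store: Set[str] = set()  # archive sets held by the mirror
        self.deferred: Deque[str] = deque()  # replications waiting for the mirror

    def replicate(self, archive_set: str) -> None:
        """Replicate immediately if the mirror is reachable; otherwise defer."""
        if self.mirror_up:
            self.mirror_store.add(archive_set)
        else:
            self.deferred.append(archive_set)

    def resynchronize(self) -> int:
        """Perform all deferred replications once the mirror is reachable again."""
        self.mirror_up = True
        count = 0
        while self.deferred:
            self.mirror_store.add(self.deferred.popleft())
            count += 1
        return count


link = MirrorLink()
link.replicate("set-001.arc")
link.mirror_up = False              # the mirror, or the link to it, goes down
link.replicate("set-002.arc")
link.replicate("set-003.arc")
pending = len(link.deferred)        # replications deferred during the outage
restored = link.resynchronize()     # replications performed after recovery
```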

The following illustration shows Connected DataProtector configured at the highest level of data protection: a mirrored RAID disk with an attached archive storage device, and each side of the mirror located in a different locale.

Cluster

For Data Centers that serve a large number of users and/or receive a large amount of data, a simple mirrored setup (in which each site has a single server dedicated to receiving backups) might not have sufficient capacity to handle the load of backup and retrieval activity. In this situation, you might prefer to set up clusters at each site (not to be confused with Windows Clustering).

In a clustered setup, each Data Center (physical location) has two or more Data Center servers. Each mirrored pair of Data Center servers provides backup services to a subset of the user community. The Data Center servers collectively are referred to as a cluster. They share the same Registry database in SQL, but each mirrored pair of servers has its own Directory database.

The following illustration shows a cluster:

In this illustration, Data Center servers A1 and B1 form a mirrored pair. They share the same set of users and maintain identical Directory databases. The same is true of Data Center servers A2 and B2, which are mirrors of each other. Servers A1 and A2 serve the entire user community. The same is true of servers B1 and B2. Servers A2 and B2 use the Registry databases that reside on A1 and B1 respectively. In this configuration, each individual user account is assigned to a server pair at the time the account is created. Thereafter, the end user can perform backups to, and retrieves from, the Data Center server on which the account was assigned or that server's mirror. For example, if an account is assigned to server A2, it can back up to A2 or B2, but never to A1 or B1.
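The pairing rule in this example can be expressed as a small lookup. The server names A1, B1, A2, and B2 come from the illustration; the `PAIRS` table and the function names are hypothetical and serve only to show the rule.

```python
from typing import Dict, Tuple

# Mirrored pairs from the illustration: A1/B1 and A2/B2.
PAIRS: Dict[str, Tuple[str, str]] = {
    "pair1": ("A1", "B1"),
    "pair2": ("A2", "B2"),
}


def pair_for(server: str) -> Tuple[str, str]:
    """Return the mirrored pair that contains the given server."""
    for members in PAIRS.values():
        if server in members:
            return members
    raise KeyError(f"unknown server: {server}")


def can_back_up_to(assigned_server: str, target_server: str) -> bool:
    """An account may use only its assigned server or that server's mirror."""
    return target_server in pair_for(assigned_server)
```

For an account assigned to A2, `can_back_up_to("A2", "B2")` is true and `can_back_up_to("A2", "A1")` is false, matching the example in the text.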

HSM Versus Disk-Only

A Data Center that uses HSM has the advantage of archive storage off of the local disks. HSM provides an extra level of security in that the archive sets are not all in one location (on the server’s local disk). With a disk-only configuration, a local disk failure can cause greater data loss.

The Data Center supports the use of multiple disk volumes for archive set storage. You can set up your Data Center to use multiple volumes to store archive sets during Data Center Setup. You can also add and manage disk volumes using the DCMC after the Data Center has been installed. Multiple volume support is available only to disk-only Data Centers.

For more information on HSM refer to Chapter 5: Hierarchical Storage Manager, beginning on page 23.

Chapter 3: Archive Sets

Optimal, reliable, and secure storage begins with consistent, efficient, and well ordered organization of backed-up data. Connected DataProtector employs a process for archiving files that greatly reduces the amount of overhead. Central to this is the process of combining new and modified client files into size-constrained archive sets.

Archive sets are files that contain compressed and encrypted files transmitted from a client by the Agent. The size of archive sets is generally limited to 5 MB to facilitate transmission and conserve Data Center disk space. Archive sets are stored on the Data Center server. In mirrored or clustered Data Center environments, archive sets are replicated to each server’s mirror.

To help you understand how archive sets are created, stored, and managed, this chapter discusses the following:

• The role the Agent plays in compressing and securing data for storage on the Data Center

• What happens to the archive set in various types of Data Center configurations

• Conventions used to name, identify, and store archive sets

• Sizing strategies used to regulate the size of archive sets

• How archive sets are structured

The Agent’s Role in Archiving

The Agent is responsible for determining which files to back up, preparing the file data for transmission, and transmitting the file data to the Data Center. Preparing the files for transmission is a two-step process that involves compressing and encrypting the client’s file data. Each Agent represents a single account, or client computer. All of the data in a single archive set originates from one client, and its account number is included in the archive set’s header for identification.

The term account is frequently used throughout this manual to refer to an end-user computer or laptop. Account data is used to describe the body of data that originated from a specific end-user account.

The Agent relies on a set of rules that defines which types of files to include and exclude from backups. When a backup session begins, the Agent scans the client hard drive to identify new and modified files that are eligible for backup. For the first system backup, the Agent sends entire files to establish a base for each file. For subsequent backups, it sends just the changes that have been made to the file since the last good backup. This change is called a delta. After scanning the client’s hard drive, the Agent determines which files to send and begins processing them for transmission to the Data Center.

Prior to transmission, the Agent compresses each file using the ZLIB compression library. ZLIB provides lossless compression that can significantly reduce the size of a file. The Agent also encrypts each compressed file using one of several levels of encryption available in Connected DataProtector, ranging from no encryption to the 128-bit Advanced Encryption Standard (AES), the strongest level that Connected DataProtector offers.
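Python's standard `zlib` module wraps the ZLIB library, so the compression step can be illustrated directly. This is only a sketch of the idea: the sample data is invented, the manual does not state which compression level the Agent uses (the default is assumed here), and the encryption step is omitted.

```python
import zlib

# Invented sample data: repetitive text compresses well.
original = b"quarterly report, draft 3\n" * 200

compressed = zlib.compress(original)    # default compression level (assumption)
restored = zlib.decompress(compressed)  # lossless: restores the exact bytes

print(f"{len(original)} bytes -> {len(compressed)} bytes")
```

The amount of reduction depends heavily on the content; already-compressed formats such as JPEG or ZIP files typically shrink very little.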

Using the Agent Configuration Editor (ACE) in Support Center, you can configure the level of encryption that the Agent uses. The available levels are no encryption, 40-bit DES, 56-bit DES, 112-bit DES, and 128-bit AES. Refer to Support Center Help for more information about configuring Agent security settings.

The Agent transmits the compressed, encrypted files to the Agent’s designated primary server on the Data Center using TCP/IP protocol via a local area network (LAN), wide area network (WAN), or a dial-up modem. In response, the Data Center creates a new empty file called an archive set and begins receiving the files from the Agent.

What Happens to Archive Sets on the Data Center

The life cycle of an archive set varies depending on the configuration of the Data Center on which it resides. From the first transmission of account data to the Data Center, the life cycle of an archive set proceeds as follows:

1. The Data Center BackupServer service apportions the client data into one or more archive sets on the volume with the highest amount of available disk space.

2. The Data Center IndexServer service indexes the archive sets and their individual contents in the Directory database to keep track of each revision of each backed up file.

3. In mirrored or clustered Data Centers, the ReplicationServer service replicates the archive sets to the mirror (if applicable).

4. In Data Centers with an optional auxiliary storage device, such as a tape library, the HSMServer service migrates archive sets from the server to the auxiliary device.

5. The Compactor service routinely checks each account’s archive sets for any expired revisions, and freshens (repackages) the archive sets as needed.

The Data Center services that are responsible for archiving, indexing and (where applicable) replicating data to a mirror, and the individual roles they play, are described in further detail in Chapter 4: Data Center Services, beginning on page 17.

Naming and Identification Conventions

Archive set files have .arc file extensions and are written to a Customers directory on one of the Data Center server volumes. The contents of a single archive set always originate from the same user account. In disk-only Data Centers that contain multiple volumes, archive sets from one account may reside on different volumes. Writing archive sets to the volume with the most available free space facilitates even distribution of disk load.
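The volume-selection rule (write to the volume with the most free space) is easy to sketch. The volume names and sizes below are hypothetical, and the function `pick_volume` is not part of the product; a real check could read free space with `shutil.disk_usage(path).free` instead of a hard-coded table.

```python
from typing import Dict


def pick_volume(free_bytes: Dict[str, int]) -> str:
    """Return the volume with the most available free space."""
    if not free_bytes:
        raise ValueError("no volumes configured")
    return max(free_bytes, key=lambda volume: free_bytes[volume])


# Hypothetical volumes and their free space, in bytes.
volumes = {"D:": 40 * 1024**3, "E:": 120 * 1024**3, "F:": 75 * 1024**3}
chosen = pick_volume(volumes)
```

Repeating this choice for every new archive set is what spreads one account's archive sets across several volumes over time.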

Never delete .arc files. Doing so results in a permanent loss of data and makes it impossible to restore the deleted information to the affected client.

Sizing and Splitting Archive Sets

When an Agent transmits account data to the Data Center, the BackupServer service is responsible for grouping it into archive sets. As it does so, BackupServer regulates the size of the archive sets and creates new archive sets for the client data, if necessary. A new archive set is automatically created under the following conditions:

• The archive set reaches 5 MB in size.

• The number of files in the archive set totals 2,000.

• The next file en route from the client is greater than 50 MB.

The BackupServer service creates new archive sets as needed to efficiently group the data into the smallest possible units. For example, if files A, B, and C together represent 3 MB, but file D equals 60 MB, then A, B, and C are combined together in one archive set and a new archive set is created for file D.
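The three conditions above can be sketched as a grouping loop. This is an illustration under assumptions, not BackupServer's actual logic: the function name and data shapes are invented, and the manual does not say how an oversized file is handled when it arrives first, so the sketch simply places it in the current (empty) set.

```python
from typing import List, Tuple

MB = 1024 * 1024
MAX_SET_BYTES = 5 * MB       # a set is closed once it reaches 5 MB
MAX_SET_FILES = 2000         # a set is closed once it holds 2,000 files
LARGE_FILE_BYTES = 50 * MB   # a file larger than this starts a new set

FileEntry = Tuple[str, int]  # (name, size in bytes)


def split_into_archive_sets(files: List[FileEntry]) -> List[List[str]]:
    """Group incoming files into archive sets using the three stated conditions."""
    sets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in files:
        needs_new_set = bool(current) and (
            current_bytes >= MAX_SET_BYTES
            or len(current) >= MAX_SET_FILES
            or size > LARGE_FILE_BYTES
        )
        if needs_new_set:
            sets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        sets.append(current)
    return sets


# The example from the text: A, B, and C total 3 MB; D is 60 MB.
result = split_into_archive_sets(
    [("A", 1 * MB), ("B", 1 * MB), ("C", 1 * MB), ("D", 60 * MB)]
)
```

With this input, A, B, and C share one archive set and D is placed in a second one, as in the manual's example.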

Archive sets can include information from many individual files. Baselines of new files and deltas of older files can reside together in the same archive set. However, multiple versions of a single file exist in multiple archive sets. This is because new archive sets of account data are created during every backup session. A large file of 10 MB that undergoes significant revisions between each of five backup sessions will span five archive sets. For example, the baseline of one file is written to one archive set, and each of its deltas to additional archive sets for a total of five sets.

All of the data contained in a single archive set comes from one backup session from a single account.

Structure and Contents of an Archive Set

Archive sets are structured to contain three basic elements:

• An archive header

• File headers

• File data

The archive header contains information about the archive file itself, including its name, the account from which the data originated, and the current version of the Agent in use when the archive set was created.


The file header or headers contain detailed information about the files in the archive set. Specifically, each file header identifies whether the file is a baseline or a delta, its current revision level (if it is a delta), and its path and filename, among other details. When a file has been deleted from the client, or has been excluded from the backup list, the Agent transmits just a file header that signals to the Data Center that the file has been deleted or excluded. The third element is the file data itself in its compressed and encrypted form.

It is possible for an archive set to contain only file headers and no file data if many files are deleted or excluded from backup between backup sessions.
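
As a rough mental model, the three elements can be pictured as the Python data classes below. All class and field names are hypothetical and chosen only for illustration; the actual on-disk layout of an .arc file is not described in this manual.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchiveHeader:
    archive_name: str     # name of the archive file
    account: str          # account from which the data originated
    agent_version: str    # Agent version in use when the set was created


@dataclass
class FileHeader:
    path: str
    kind: str                        # "baseline", "delta", "deleted", or "excluded"
    revision: Optional[int] = None   # revision level, meaningful for deltas


@dataclass
class FileEntry:
    header: FileHeader
    data: Optional[bytes] = None     # compressed, encrypted payload; None if header only


@dataclass
class ArchiveSet:
    header: ArchiveHeader
    entries: List[FileEntry] = field(default_factory=list)

    def has_file_data(self) -> bool:
        """Return False for a set that holds only file headers."""
        return any(entry.data is not None for entry in self.entries)
```

An archive set that records only deletions or exclusions would hold entries whose data is None, so has_file_data() returns False.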

File Expiration Dates and Rebasing

As files are backed up, the Agent maintains a complete copy of the base file, and then captures the changes that occur between backups in delta files. The delta files are not complete pictures of the original file. They are only the changes that have occurred to the file since its last backup. After some time and several backups, that one file may have accumulated many delta files, and reconstitution of that file for a retrieval request requires the Agent to get the baseline and all the accumulated deltas.

At some point, the original base file and some of its deltas expire, depending on the expiration settings set for the Data Center (for example, some organizations, such as law firms or medical offices, might turn off expiration rules for legal reasons).

Compactor searches for expired files. All the base and delta files are contained in multiple archive sets. Compactor evaluates each file and, when it determines that the base file or the base file and several delta files are expired, it creates a new baseline or rebases the file.

Rebasing is simply the process of rolling up the original base file and any expired deltas to create a new base. Compactor must rebase a file before its expired base and deltas can be deleted. It extracts the base file and deltas from the archive set or sets and rolls them up into a new file. After creating the new base file, Compactor deletes the expired files and then repackages the remaining unexpired files into one or more new archive sets.
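
The roll-up can be illustrated with a toy model in which a file is a set of numbered blocks and each delta holds only the blocks that changed. This block-level model and the function name are assumptions made for the sketch; the manual does not describe the delta format that Compactor actually processes.

```python
def rebase(base, deltas, expired_count):
    """Roll the base and the first `expired_count` deltas into a new base.

    `base` and each delta are dicts mapping a block number to its bytes; a
    delta holds only the blocks that changed since the previous backup.
    Returns (new_base, remaining_deltas); the inputs are left unchanged.
    """
    new_base = dict(base)
    for delta in deltas[:expired_count]:
        new_base.update(delta)   # apply expired deltas oldest first
    return new_base, deltas[expired_count:]
```

After the call, the expired base and deltas are no longer needed to rebuild any unexpired version, which is why they can then be deleted.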


For example, a new file is created and backed up from the account on 3/17. Subsequent modifications and backups occur over the next two days and then cease for that file.

archive set1    base1     created and backed up on 3/17
archive set2    delta1    modifications backed up on 3/18
archive set3    delta2    modifications backed up on 3/19

The next time Compactor reviews this account, it rolls up the data in “base1”, “delta1”, and “delta2” to create a new baseline, “base2”:

archive set1    base1     expired on 5/17
archive set2    delta1    expired on 5/18
archive set3    delta2    expired on 5/19
archive set6    base2     rolled up file created and archived on 5/31

For more information about Compactor, refer to Chapter 6: Compactor, beginning on page 33.

Chapter 4: Data Center Services

The Data Center comprises several services that perform the tasks necessary to run the Data Center server. The following services are included in the Data Center and are discussed in detail in the following sections:

• BackupServer

• IndexServer

• ReplicationServer

• PoolServer

• HSMServer

• Compactor

BackupServer

BackupServer is the Data Center service that processes requests from the Agent for data backup and retrieval. BackupServer gathers together all of the backed-up data into an archive set. The archive set is a file stored on the Data Center server’s disk in a directory called Customers. It contains file backup data transmitted from a client during a single backup session. (If the backup data from a single backup session is large, the BackupServer service uses more than one archive set, each representing a portion of the backup session. This helps to optimize data recovery performance.) For more information on archive sets, refer to Chapter 3: Archive Sets, beginning on page 11.

When the Agent requests BackupServer to retrieve a file to a client, BackupServer must find the first backup of the file (called the baseline) and all of that file’s changes (called deltas) necessary to recreate the specific version of the file that the end user has requested.


For example, if the end user has requested to retrieve the third backed-up version of a file, BackupServer must retrieve the baseline (version 1), the delta that represents the differences between version 1 and version 2, and the delta that represents the differences between version 2 and version 3. Since the baseline and the deltas were backed up in different backup sessions, they are in different archive sets. Therefore, BackupServer typically uses multiple archive sets in order to retrieve a file.
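
The example above amounts to taking the baseline plus every delta up to the requested version. The sketch below shows that selection; the function name and the list-of-pairs history format are hypothetical.

```python
def pieces_for_version(history, version):
    """Return the (piece, archive_set) pairs needed to rebuild `version`.

    `history` is ordered oldest first: the baseline (version 1) followed by
    one delta per later version. `version` is 1-based.
    """
    if not 1 <= version <= len(history):
        raise ValueError(f"version {version} has not been backed up")
    return history[:version]


history = [
    ("baseline", "set1"),
    ("delta 1-2", "set2"),
    ("delta 2-3", "set3"),
    ("delta 3-4", "set4"),
]
needed = pieces_for_version(history, 3)
```

Here needed holds the baseline and the first two deltas, which live in three different archive sets.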

In addition to processing requests for data backup and retrieval, the BackupServer service manages the list of authorized user accounts and registers new accounts. There is one user account for each client that is being backed up.

Backed-up archive sets are stored on the Data Center server in a folder called Customers. Archive sets are saved as files with the extension .arc. Under no circumstances should you ever delete an .arc file from the Customers directory. Doing so would mean deleting end users’ data and rendering it unrecoverable.

BackupServer starts automatically with Windows Server. Status and statistics for BackupServer are found in the DCMC. To view the service in the DCMC, expand the Data Center server name and click BackupServer.

IndexServer

IndexServer is the Data Center service that indexes file and archive set information to database tables. As end users back up archive sets to the Data Center server, information about each file within the archive set must be stored in the Directory database. The IndexServer writes this information to the Directory database once the archive set is fully written to the Data Center server from the Agent. When the indexing process is finished, the archive set is queued for replication to the mirrored server, if a mirrored configuration is used.

If the Data Center is mirrored or clustered, the IndexServer also writes information to the database for all archive sets that have been replicated from the mirrored server.

IndexServer starts automatically with Windows Server. Status and statistics for IndexServer are found in the DCMC. To view the service in the DCMC, expand the Data Center server name and click IndexServer.


ReplicationServer

The ReplicationServer service only runs on mirrored and clustered configurations. This service replicates the following content between the servers in a mirrored pair:

• Archive sets

• Database table rows

• Agent configurations

After an archive set has been backed up by the Agent to the Data Center server and indexed to the database, it is put into a queue to be replicated to the mirror. The archive set is replicated as a whole to the mirror rather than bit by bit as it is backed up by the Agent.

Most, but not all, of the database table rows in the schema are replicated between the servers in a mirrored pair. When a row is either inserted, deleted, or modified, it is queued for replication between the mirrored servers.

When you use either Support Center or Agent Configuration Editor (ACE) to create files to be downloaded to Agents, the files created must be replicated between the mirrored servers. ReplicationServer queues both the Agent configuration file(s) and the corresponding database table rows for replication to the mirror. In order for file downloads to Agents to be successful, the files and database rows must be on both servers because Agents can connect to either Data Center server. Which server an Agent contacts first depends on how the Agent is configured. Therefore, it is necessary for Agent configuration files to be available on all servers in the Data Center.

ReplicationServer starts automatically with Windows Server. Archive sets and database entries are replicated continuously when ReplicationServer is running. If it becomes necessary to pause or stop replication, you can pause or stop the service in the DCMC. You can view the status and progress of the replication service in the DCMC by expanding the Data Center server name and clicking ReplicationServer.

PoolServer

PoolServer is the Data Center service that maintains the shared file pool used to implement Connected SendOnce® technology. SendOnce provides a method for identical files from multiple Agents to be backed up once. This method reduces the storage needed on the Data Center server since multiple copies of the same file are not stored on the server.


Application, operating system, and common organizational files take the greatest advantage of this feature. The PoolServer service performs a process called Copy On Reference and cleans the pool of uncommon files.

The Copy On Reference process makes copies of files that have been backed up by more than one Agent. When an identical file is backed up by two Agents, SendOnce places it in a queue for Copy On Reference. Copy On Reference makes a copy of the file and places it in a special account known as the Pool Account, account number 999999999. Any Agent that backs up the same file references the copy instead of sending another full copy of the file to the server. Also, any Agent that has backed up the file that now wants to retrieve the file retrieves the copy from the Pool Account.
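
The following Python sketch models the Copy On Reference idea. It assumes identical files are recognized by a SHA-256 digest of their content and that the first two Agents both send the full file; neither detail is stated in this manual, and the class and method names are hypothetical.

```python
import hashlib

POOL_ACCOUNT = "999999999"   # the special Pool Account number


class SendOncePool:
    def __init__(self):
        self.owners = {}   # content digest -> accounts that backed up the file
        self.pool = {}     # content digest -> copy held in the Pool Account

    def back_up(self, account, content):
        """Record a backup; return True if the full file had to be sent."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.pool:
            self.owners[digest].add(account)
            return False                  # reference the pooled copy instead
        owners = self.owners.setdefault(digest, set())
        owners.add(account)
        if len(owners) >= 2:
            self.pool[digest] = content   # Copy On Reference into the pool
        return True
```

In this model, a third account that backs up the same content gets False from back_up(), meaning it references the pooled copy rather than sending the file again.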

PoolServer cleans the pool of uncommon files every 14 days. An uncommon file is a file that has not been backed up by another account within 14 days. These files are removed to keep the Directory database from growing too large and to keep the performance of the SendOnce operation as efficient as possible. You can change the number of days uncommon files are kept in the pool in the DCMC.

PoolServer starts automatically with Windows Server. Use the DCMC to view the status and statistics of the PoolServer. In the DCMC, expand the Data Center server name and click PoolServer.
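
A minimal sketch of how uncommon files might be selected, assuming the pool tracks the date each file was last backed up by another account. The function name and the dictionary input are hypothetical; only the 14-day default comes from the text.

```python
from datetime import date, timedelta


def uncommon_files(last_backup_by_another_account, today, keep_days=14):
    """Return the names of pooled files not backed up by another account
    within the last `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(
        name
        for name, last_seen in last_backup_by_another_account.items()
        if last_seen < cutoff
    )
```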

HSMServer

HSMServer is the Data Center service that processes the copying of archive sets between the local server’s disk and the archive storage device. There are three components to HSMServer:

• HSMClient

• BackupHSM

• HSMPurge

HSMClient is invoked by BackupServer to pass archive set copy requests to the BackupHSM service. The HSMClient monitors the processing of the requests and mediates between BackupServer (the Windows service) and BackupHSM. BackupHSM handles the operations for archive storage devices. HSMServer supports tape libraries and EMC Centera archive storage devices.

It is not recommended that you pause the BackupHSM service. When BackupHSM is paused, you cannot cancel requests or view the status in the DCMC. You can unmount a tape manually from a tape library while BackupHSM is paused.


The library audits its contents and then BackupHSM audits the library. If it is necessary to stop HSM activities, stopping BackupHSM alerts the service to complete the current request and then stop.

It is the job of HSMPurge to migrate (copy) archive sets from the disk to the archive storage device and, when necessary, purge (delete) archive sets from the disk in order to create free disk space.

When the end user wants to retrieve files, BackupServer sends a request to HSMClient to retrieve the appropriate archive set(s). BackupHSM copies the archive sets from the archive storage device back onto the server’s disk where BackupServer can process them. Refer to Chapter 5: Hierarchical Storage Manager, beginning on page 23 for further information on HSM.

Compactor

The Compactor service removes old data from the Data Center. Compactor checks for synchronization between mirrored servers, applies expiration rules to backed-up data, and deletes data that is deemed expired. The goal of Compactor is to speed up the end-user retrieve process and to reduce the amount of data stored long term on the Data Center. For more details about the Compactor process, refer to Chapter 6: Compactor, beginning on page 33.

Chapter 5: Hierarchical Storage Manager

Over time, the Agents on many computers perform many backups, and the number of archive sets on the Data Center server’s disk grows. When free space on the disk drops below a preconfigured threshold, BackupServer requests Hierarchical Storage Manager (HSM) to migrate archive sets from disk to the archive storage device, if one has been installed. If no archive storage device is installed, archive sets are kept only on the Data Center server’s disks. The Compactor service, discussed in Chapter 6: Compactor, beginning on page 33, removes old data and recycles disk space as needed.

Connected DataProtector supports the following types of archive storage devices:

• Tape libraries (SCSI and DAS)

• EMC Centera

Visit the Resource Center for an updated list of hardware solutions that are currently supported.

This chapter discusses the following concepts to help you understand overall archive set storage and management on the Data Center:

• Migration and purge

• Tape Groups and Tape Account Groups

• Tape sets


Migration and Purge

If your Data Center is configured with HSM, the HSMPurge service migrates archive sets from disk to an archive storage device when free disk space is reduced to a preset threshold. Upon reaching another free disk space threshold, the migrated archive sets are purged from disk, freeing disk space for newer backups. You can see the process graphically through the DCMC. For more information refer to Chapter 30: Data Center Management Console, beginning on page 153.

If there are unmigrated archive sets and the free space drops below a specified percentage of disk space, HSMPurge begins migrating the archive sets, while keeping the original archive sets on disk. As archive sets are continually backed up to the server and occupy more disk space, free disk space continues to drop. When free disk space drops to a second specified percentage, HSMPurge starts purging migrated archive sets from disk. The purging continues until free disk space grows to a third specified percentage. You can specify the disk space percentages for the migration and purge processes in the DCMC.

Archive sets are not immediately purged from disk after migration to the archive storage device. The reason for this is to keep as many archive sets as possible available on disk for possible file retrieval requests.
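
The interplay of the three percentages can be sketched as a small decision function. The default values of 30, 15, and 25 percent are invented for illustration, as are the function and parameter names; the real percentages are whatever you configure in the DCMC.

```python
def hsm_purge_step(free_pct, has_unmigrated, purging,
                   migrate_below=30, purge_at=15, purge_until=25):
    """Decide what HSMPurge does at the current free-space percentage.

    Returns (actions, purging), where `purging` carries over to the next call
    so that purging continues until free space recovers to `purge_until`.
    """
    actions = []
    if has_unmigrated and free_pct < migrate_below:
        actions.append("migrate")   # copy to archive storage, keep on disk
    if free_pct <= purge_at:
        purging = True              # second threshold reached: start purging
    elif free_pct >= purge_until:
        purging = False             # third threshold reached: stop purging
    if purging:
        actions.append("purge")     # delete already-migrated sets from disk
    return actions, purging
```

Because purging stops only at the third percentage, a disk that has recovered to 20 percent free space in this example is still being purged.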

Tape Groups and Tape Account Groups

Tape Groups provide a method of keeping data from different communities on separate tapes. A community is the basic organizational unit for accounts on the Data Center server. You might find Tape Groups useful if you have a community whose data you want to keep on separate tapes in the tape library.

Tape Group 0 (zero) is the default Tape Group created by Data Center Setup. The default community is assigned to Tape Group 0. Unless specified in Support Center, all new communities are also assigned to Tape Group 0.

Tape Account Groups provide a way for HSM to group accounts together for assignment to tape. Tape Account Groups are groupings of accounts within a Tape Group. The purpose of Tape Account Groups is to fully utilize tape space. Tape Account Groups have a predetermined maximum number of accounts and quantity of data that are assigned. HSM creates a new Tape Account Group when the current Tape Account Group’s limits are reached.
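
As an illustration of the limits described above, the sketch below places accounts into the current Tape Account Group until either limit would be exceeded, and then opens a new group. The function name, the input format, and the order in which accounts are considered are assumptions; the actual limits are predetermined by HSM.

```python
def assign_tape_account_groups(accounts, max_accounts, max_bytes):
    """Group (account_id, data_bytes) pairs into Tape Account Groups.

    A new group is started when the current group already holds
    `max_accounts` accounts or adding the account would exceed `max_bytes`.
    """
    groups = []
    group_bytes = 0
    for account_id, size in accounts:
        if (not groups
                or len(groups[-1]) >= max_accounts
                or group_bytes + size > max_bytes):
            groups.append([])
            group_bytes = 0
        groups[-1].append(account_id)
        group_bytes += size
    return groups
```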

The Primary Tape Set is organized using Tape Groups and Tape Account Groups by default. The Secondary Tape Set is not organized using these groupings. Refer to Tape Sets, on page 25 for more information about Primary and Secondary Tape Sets.


Tape Sets

The Data Center software offers a feature that provides redundant protection of backed-up data in both standalone and mirrored server environments. This includes creating additional copies of archive sets, referred to as Secondary Tape Sets, and taking them off-site as needed. If you install Secondary Tape Sets, there are some concepts that you must understand in order to maintain this setup. This section explains the basic functionality of Secondary Tape Sets and how to best take advantage of this feature. This feature is also available to Data Centers with EMC Centera. In this situation, a tape library is attached to the server for the purpose of creating and using the Secondary Tape Set.

Overview

In both a standalone Data Center configuration and a mirrored server environment, there is some risk of losing backed-up data due to various kinds of failures, such as:

• Disk failure on a standalone Data Center or on one of the servers of a mirrored pair

• Loss of tape cartridge

• Total system loss due to fire or similar disaster

The amount of risk decreases in a mirrored server environment, where all backed-up data is stored redundantly on two identical Data Centers, so that if one Data Center of a mirrored pair experiences some technical problems, data is still available on its mirror.

Unlike a mirrored pair, a standalone Data Center only stores a single copy of data on disk or archive storage device (if applicable). In the event of hardware or software malfunction, service outage, a fire, or similar disaster, backed-up data, both on disk and on the archive storage device, will be completely lost if no extra protective measures have been taken. To take such protective measures, you can configure your Data Center to use one or more additional tape sets (refer to the Setting Up Connected DataProtector manual for installation information). During the migration process, HSM copies data from disk to tapes that belong to the tape sets referred to as the Primary and Secondary Tape Sets.

The Primary and Secondary Tape Sets serve different functions within the Data Center. Therefore, the methods by which they are created differ as well.


Primary Tape Set

There is only one Primary Tape Set in the tape library, and tapes that belong to it remain permanently in the library to ensure prompt recovery of archive sets at the end user’s request. The main purpose of the Primary Tape Set is to optimize the recovery process for end users if they must retrieve some or all of their data. To maximize the speed and efficiency of file retrieval, data for each individual account is kept together in a Tape Account Group (refer to Tape Groups and Tape Account Groups, on page 24 for more information).

In order to enable maximum amounts of data to accumulate on disk before each migration, data is migrated to tape infrequently. When the Data Center disk space usage parameters have been reached, HSMPurge migrates data to the Primary Tape Set with the goal of consolidating data for each account. In order for an account to be assigned to a particular tape, the amount of data that is already on that tape must be under a specific threshold. Imposing a data threshold provides space for future migrations for accounts that have already been assigned to the tape. Therefore, when an end user initiates a retrieve, the requested data is quickly located on the tape the account is assigned to and copied back to disk. Data is migrated from the Data Center disk to the Primary Tape Set as needed, based on registry settings. If the archive disk is properly sized, migration should occur once a week.

Secondary Tape Sets

The purpose of Secondary Tape Sets is to create and maintain a valid copy of all backed-up data in restorable form so that, if a major data loss occurs at the Data Center, archive sets are still recoverable using disaster recovery tools and procedures. Therefore, instead of consolidating data for each account on a particular tape, HSM tries to migrate archive sets to the Secondary Tape Set tapes as quickly as possible.

There are two kinds of Secondary Tape Sets:

• The SendOnce account tape set stores a backup of the SendOnce account (you can create only one copy of a SendOnce account tape set). This tape set usually remains on-site and is especially helpful in a standalone Data Center configuration, enabling fast recovery of backed-up data lost due to a bad tape or a disk failure. When the SendOnce account tape set tape becomes full, you can remove it from the library and store it on the shelf at the same location.


• Off-site Secondary Tape Sets contain a complete copy, with the exception of the SendOnce account, of archive sets and are intended for off-site storage. Depending on your organization’s needs, you can set up the system to create one or more off-site Secondary Tape Sets. For maximum data protection, tapes in these tape sets are filled and removed from the library as often as possible. Once removed from the library, they must be stored in a safe location, preferably in a different building. Therefore, in the event of a full system crash, users’ most recent data would still be available on the off-site Secondary Tape Set tapes.

Deciding to Use Secondary Tape Sets

In order to decide whether or not to use Secondary Tape Sets, you should consider the following:

• The amount of risk involved in your Data Center operations

If you are running a standalone Data Center, the risk of losing some or all of your backed up data is much higher than in a mirrored environment. If you run a mirrored Data Center, data is still at risk if one of the mirrors is completely destroyed.

• The advantages and disadvantages of this setup and how it can affect your Data Center operations

The primary advantage of having Secondary Tape Sets is in having an ultimate degree of protection against loss or damage of backup data. It is particularly valuable in a standalone server environment, where the risk of losing data due to a disk or tape failure is especially high. In the event of an entire system crash, the off-site tapes from the Secondary Tape Set remain the only source of end-user data, which would otherwise be lost forever.

Although a mirrored server configuration provides an extra degree of data protection against all possible failures by storing data redundantly at the two identical Data Centers, Secondary Tape Sets are still very helpful in the following situations:

• You must quickly restore archive sets that are lost or damaged due to a tape failure.

• One of the servers of a mirrored pair is completely destroyed, and you must quickly move backed-up data to a new mirror.


The primary disadvantage of using Secondary Tape Sets is the increased cost of media (you must provide additional tapes to maintain this setup) and of operational maintenance. Your decision is therefore a trade-off of cost against the level of risk you are ready to accept.

Taking Secondary Tape Set Tapes Off-Site

To minimize the vulnerability of data in case of disk failure, fire, or other disaster, two schedules have been defined for the Secondary Tape Sets: the migration schedule and the extraction schedule.

Frequency of data migration to the Secondary Tape Set is determined by the migration schedule. In a single server environment, the risk of losing data due to disk failure is much higher than in a mirrored server configuration. To reduce this risk, data must be migrated to the Secondary Tape Set as frequently as possible. Instead of being demand driven, migration is scheduled to run daily or several times per day using the daily automatic procedure. The greater the frequency of migration, the less the data loss if the disk were to fail. Migration to the Secondary Tape Set can also be performed with the DCMC.

To ensure data safety in case of fire or other disaster that might result in loss of the entire Data Center, Secondary Tape Set tapes must be removed from the library and taken off-site as often as possible. The extraction schedule defines how often the Secondary Tape Set tapes are removed from the library.

The frequency of tape extraction is determined by the following factors:

• The amount of data that the Data Center receives daily (if the Data Center has a large user community, tape removal should be performed more frequently)

• The number of blank tapes that the user provides to support the Secondary Tape Set configuration

You can set the extraction interval to less than, equal to, or greater than a day. You should remove Secondary Tape Set tapes from the library every other day or as soon as the tape gets full (waiting until the tape gets full reduces the cost of media, but increases the risk of losing backed-up data due to complete disk loss). Once removed, the tapes should be stored in a safe location, preferably in a different building. Therefore, in the event of a full system crash, the most recent data can still be retrieved from the off-site Secondary Tape Sets.

If you host your own Data Center and would like details on how to remove Secondary Tape Set tapes from a tape library using the DCMC, refer to the


Multiple Tape Libraries

The Data Center is capable of running with two tape libraries attached to each server. You might use multiple tape libraries for any of the following situations:

• You have an existing tape library and would like to replace it by transitioning to a new tape library (for example, if you are replacing an older tape library with one that uses newer technology).

• You want to keep your existing tape library, but you must use an additional “reliever” library temporarily until you can free up tape space on the original library.

• You want to permanently use multiple tape libraries to expand your total available tape capacity.

Each of these situations poses its own unique considerations and procedures. For information on installing two tape libraries on your Data Center or adding a second tape library refer to the Setting Up Connected DataProtector manual.

Transition to a New Tape Library

If you want to replace your original tape library with a new one, you must make the transition over a period of time during which you copy the data from the old library to the new library. A likely example of this situation is if you are replacing an older tape library with one that uses newer technology.

When you replace a library, your goal is to stop using the old tape library, start using the new library, and copy the data from the tapes in the old library to the tapes in the new library. Refer to the Setting Up Connected DataProtector manual for a procedure to transition to a new library.

Temporary Reliever Library

There might be times when you must use an additional tape library for temporary extended storage until Compactor is able to free sufficient space in your original library. Your original library would remain your permanent library, and the additional temporary library would remain in use only for as long as needed. In this situation, you would simply connect the additional library and let the Compactor service run until it has freed up enough tape space to warrant removing the additional library.


The previous process requires you to transfer tapes back and forth between tape libraries. Therefore, the two libraries must be of compatible tape and barcode technologies.

Permanent Expansion Library

If you are using multiple tape libraries because you want to permanently expand your available tape capacity, then you must plan to keep the multiple libraries in use for an indefinite amount of time. Unlike the previous situations, your goal in this situation is not to work toward using only one library again. Instead your goal is to continually use the multiple libraries as efficiently as possible. Doing so means balancing tape utilization among all libraries in use.

To balance tape utilization, you should understand the following concepts:

• How tape utilization works in HSM

• How to balance tape utilization across multiple libraries

• How to work with libraries of different technologies

Understanding Tape Utilization in HSM

When HSM migrates data to tape, it accesses the tapes in the alphabetical and numerical order of their labels. Regardless of where or when the tapes are inserted, HSM looks for the next tape labeled alphabetically (or numerically) when the previous tape is full.

For example, assume you have multiple tape libraries with 100 tapes that are labeled ABK001, ABK002,..., ABK100 (you could have inserted these tapes at any time, in any order, or in any library). When ABK001 is full, HSM then migrates data to ABK002. When ABK002 is full, HSM migrates data to ABK003, and so forth. It does not matter which library the tapes are in.
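
The selection rule can be sketched as follows, assuming labels are zero-padded so that plain string sorting matches the alphabetical and numerical order described above. The function name and the set of full tapes are hypothetical.

```python
def next_tape(labels, full):
    """Return the first label, in sorted order, that is not yet full."""
    for label in sorted(labels):
        if label not in full:
            return label
    return None   # every tape is full


labels = ["ABK003", "ABK001", "ABK002"]   # insertion order does not matter
chosen = next_tape(labels, full={"ABK001"})
```

In this example chosen is ABK002, regardless of which library holds that tape.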

If you have more than one Tape Group, you can split the tapes for the Tape Group between the two libraries. This is not a concern if the libraries and tapes are of the same technology. The same holds true for Tape Account Groups. It is not a concern if a Tape Account Group is split across two libraries. For more information on Tape Groups and Tape Account Groups refer to Tape Groups and Tape Account Groups


Balancing Tape Utilization

To balance the workload across tape libraries, you should insert the tapes into the tape libraries so that their labels span the libraries evenly.

For example, assume you have two libraries, each with a 50-tape capacity (a total of 100 tapes). Assume the barcode labels that you attached to the tapes are ABK001, ABK002,..., ABK100. When you insert the tapes into the two libraries, you should insert ABK001 into the first library, ABK002 into the second library, ABK003 into the first library, ABK004 into the second library, and so forth. Then, when one tape is full and HSM accesses the next tape, it alternates between each tape library.
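
The alternating placement in this example can be written as a short helper. It is only a planning aid under the assumption that labels sort correctly as strings; the function name is hypothetical and nothing here talks to a tape library.

```python
def distribute_tapes(labels, library_count=2):
    """Assign sorted tape labels to libraries in round-robin order."""
    libraries = [[] for _ in range(library_count)]
    for index, label in enumerate(sorted(labels)):
        libraries[index % library_count].append(label)
    return libraries


labels = [f"ABK{n:03d}" for n in range(1, 7)]   # ABK001 through ABK006
first, second = distribute_tapes(labels)
```

The first library receives the odd-numbered labels and the second the even-numbered ones, so consecutive tapes in label order always sit in different libraries.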

Working with Libraries of Different Technologies

Balancing tape utilization is easy if you use libraries that are of compatible tape and barcode technologies because you can simply move tapes between the libraries to get the order that yields optimum load balancing. However, this process is not as easy if you use libraries of different tape and barcode technologies because you cannot simply move tapes between such libraries.

If you use libraries of different tape and barcode technologies, you must prepare in advance of setting up the new tapes. When you order barcode labels for new tapes, order labels with the same barcode labels as your other libraries. For example, if one library uses ABK001-ABK200, order labels with ABK001-ABK200 for the additional library. That way you can attach the barcodes, alternating numbers for each library. For example, use the ABK001 label for the first library, the ABK002 label for the second library, the ABK003 label for the first library, the ABK004 label for the second library, and so forth. Then HSM alternates libraries when migrating data to a new tape.

Chapter 6: Compactor

Compactor is one of the Data Center services. As a service, Compactor runs automatically and continuously based on Data Center activity. Compactor has several purposes:

• Reduce overall storage requirement for the Data Center.
• Improve Agent file retrieval performance.
• Limit the number of tapes needed for account recovery.
• Free tape and disk space by removing expired data.
• Reduce the size of the databases.
• Improve data integrity.

Compactor runs on all Data Center configurations but runs differently on a standalone server than it does in a mirrored configuration. It also works differently with HSM as opposed to a disk-only configuration.

For mirrored Data Centers, the Compactor service runs on both servers but only one of the servers in the pair controls the workload of the compaction process. This server is referred to as the primary server. If you are running a clustered Data Center, there is one primary server for every mirrored pair in the cluster. For example, a clustered Data Center with three mirrored pairs has three primary servers. You can check the status of the primary server(s) in the Compactor view of the DCMC.

The Compactor service removes older, unnecessary data from the Data Center. It accomplishes this task through the following process:

1. Check for necessary disk space.
2. Select accounts or a Tape Account Group.
3. Perform a system analysis and repair.
4. Mark files as expired.
5. Repackage archive sets.
6. Delete expired archive sets and database entries.
7. Migrate new archive sets to tape.
8. Inform the Agent of changes.

These steps represent a standard compaction cycle on a mirrored Data Center using HSM. Some steps are different or excluded for other configurations as noted in the descriptions below.

Check for Necessary Disk Space

Before Compactor begins processing accounts, it checks for necessary disk space on all servers where HSM is installed. It compares the DiskCache value in the Windows registry to the sum of free disk space on the archive partitions and the amount of space taken up by customer archive sets. If there is available space, the compaction process proceeds. If there is not enough available space, Compactor writes an error message to the Application log and then stops. A certain amount of disk space is necessary because all archive sets for an account must be on disk for Compactor to process the account. Compactor also checks for available disk space before each account is processed.
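The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Compactor's actual code: the function and parameter names are hypothetical, and the sketch assumes the check passes when free space plus the space already held by customer archive sets is at least the DiskCache value. The text does not state the exact direction of the comparison.

```python
def has_required_disk_space(disk_cache_bytes, free_bytes, archive_set_bytes):
    """Return True if the compaction cycle may proceed.

    Assumption: the DiskCache registry value is a required amount of
    space, and it is satisfied by the sum of free space on the archive
    partitions and space occupied by customer archive sets.
    """
    return free_bytes + archive_set_bytes >= disk_cache_bytes


GB = 1024 ** 3

# 20 GB free plus 40 GB of archive sets covers a 50 GB DiskCache value.
enough = has_required_disk_space(50 * GB, 20 * GB, 40 * GB)

# 5 GB free plus 10 GB of archive sets does not.
too_little = has_required_disk_space(50 * GB, 5 * GB, 10 * GB)

print(enough, too_little)
```

In the failing case, Compactor would write an error message to the Application log and stop, as described above.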

On disk-only Data Centers, all of an account's archive sets are already on disk, so the disk cache check is not necessary. If the free disk space on a disk-only Data Center server drops below 10% of the total disk space, Compactor attempts to compact all accounts on the server to free up disk space.
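The 10% rule for disk-only servers amounts to a simple threshold test. The sketch below is illustrative only; the function name and the threshold_percent parameter are hypothetical, and integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
def should_compact_all_accounts(free_bytes, total_bytes, threshold_percent=10):
    """Return True when free space is below the given percentage of total.

    Hypothetical helper mirroring the rule described in the text for
    disk-only Data Center servers.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # free / total < threshold / 100, rewritten without division.
    return free_bytes * 100 < total_bytes * threshold_percent


print(should_compact_all_accounts(90, 1000))   # True: 9% free
print(should_compact_all_accounts(100, 1000))  # False: exactly 10% free
```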

Select Accounts or a Tape Account Group

Compactor must determine which accounts to work on per session. For a Data Center using a tape library, Compactor selects the oldest Tape Account Group that has not been compacted in a set number of days. The default number of days is 90, but you can adjust this number in the DCMC. For more information on Tape Account Groups refer to Tape Groups and Tape Account Groups, on page 24. If a Data Center does not use a tape library for the Primary Tape Set (if it is disk-only or uses Centera), Compactor begins working on accounts that have not been compacted in a set number of days. The default number of days is 30, but you can adjust this number in the DCMC.
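The selection rule can be sketched as follows. This is a rough model, not the product's implementation: the TapeAccountGroup class, its fields, and the function names are hypothetical, and the sketch assumes that "oldest" refers to the creation date of the Tape Account Group, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TapeAccountGroup:
    # Hypothetical record; the field names are illustrative only.
    name: str
    created: date
    last_compacted: date


def is_due(last_compacted, today, min_days):
    """True if at least min_days have passed since the last compaction."""
    return last_compacted <= today - timedelta(days=min_days)


def select_tape_account_group(groups, today, min_days=90):
    """Pick the oldest group not compacted in min_days, or None.

    Assumption: "oldest" means earliest creation date.
    """
    eligible = [g for g in groups if is_due(g.last_compacted, today, min_days)]
    return min(eligible, key=lambda g: g.created, default=None)


today = date(2003, 9, 10)
groups = [
    TapeAccountGroup("TAG-1", date(2002, 1, 1), date(2003, 8, 1)),   # compacted recently
    TapeAccountGroup("TAG-2", date(2002, 6, 1), date(2003, 3, 1)),   # due
    TapeAccountGroup("TAG-3", date(2002, 3, 1), date(2003, 2, 1)),   # due and older
]

chosen = select_tape_account_group(groups, today)
print(chosen.name)  # TAG-3
```

For a Data Center that does not use a tape library for the Primary Tape Set, the same is_due test would apply per account with min_days=30.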
