Implementation & Maintenance
Copyright
Information in this document, including URL and other website references, represents the current view of CommVault Systems, Inc. as of the date of publication and is subject to change without notice to you. Descriptions or references to third party products, services or websites are provided only as a convenience to you and should not be considered an endorsement by CommVault. CommVault makes no representations or warranties, express or implied, as to any third party products, services or websites.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious.
Complying with all applicable copyright laws is the responsibility of the user. This document is intended for distribution to and use only by CommVault customers. Use or distribution of this document by any other persons is prohibited without the express written permission of CommVault. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of CommVault Systems, Inc.
CommVault may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from CommVault, this document does not give you any license to CommVault’s intellectual property.
COMMVAULT MAKES NO WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, AS TO THE INFORMATION CONTAINED IN THIS DOCUMENT.
©1999-2014 CommVault Systems, Inc. All rights reserved
CommVault, CommVault and logo, the “CV” logo, CommVault Systems, Solving Forward, SIM, Singular Information Management, Simpana, CommVault Galaxy, Unified Data Management, QiNetix, Quick Recovery, QR, CommNet, GridStor, Vault Tracker, InnerVault, QuickSnap, QSnap, Recovery Director, CommServe, CommCell, IntelliSnap, ROMS, Simpana OnePass, CommVault Edge and CommValue, are trademarks or registered trademarks of CommVault Systems, Inc. All other third party brands, products, service names, trademarks, or registered service marks are the property of and used to identify the products or services of their respective owners. All specifications are subject to change without notice. All right, title and intellectual property rights in and to the Manual is owned by CommVault. No rights are granted to you other than a license to use the Manual for your personal use and information. You may not make a copy or derivative work of this Manual. You may not sell, resell, sublicense, rent, loan or lease the Manual to another party, transfer or assign your rights to use the Manual or otherwise exploit or use the Manual for any purpose other than for your personal use and reference. The Manual is provided "AS IS" without a warranty of any kind and the information provided herein is subject to change without notice.
Table of Contents
(R10.4) Introduction
Preliminaries
Education Advantage
Customer Education Lifecycle
CommVault Certification
CommVault Advantage
Course Building Blocks
Course Objective
Common Technology Engine
Training Environment
Module 1 – Planning a CommCell Architecture
Topics
Common Technology Engine Architecture
CommCell Architecture Overview
CommServe Server
Indexing Structure
Common Technology Engine Best Practices
Architecting a Storage Solution
Simpana Deduplication
Understanding Simpana Deduplication
Deduplication Building Block Guidelines
Deduplication Storage Options
Partitioned Deduplication Database
Enterprise Building Block Guidelines
SILO Storage
Advanced Deduplication Configurations
Deduplication Best Practices
Disaster Recovery Concepts
Business Continuity Concepts
Protection Methods
Data Description
Data Availability
Protected Storage Requirements
Designing a Sound Data Protection Strategy
Understanding Client Agents
Protecting Virtual Environments
The VSA Backup Process
Protecting Applications
Snapshot Management
Data Protection Best Practices
Module 2 – CommCell Environment Deployment
Topics
CommCell Deployment Process
New CommCell Deployment Process
Existing CommCell Upgrade Process
CommCell Disaster Recovery Process
Environment Requirements
Installing CommServe Software
Installing MediaAgent Software
Index Cache Configuration
Library Detection and Configuration
Module 3 – Advanced Configurations
Topics
Storage Policy Design
Storage-Based Design Strategy
Business Based Design Strategy
Deduplication’s Impact on Policy Design
Advanced Storage Policy Features
Storage Policy Design Best Practices
Advanced Job Control
Controlling Data Protection and Recovery Jobs
Understanding Ports and Services
Firewall Configuration
Network Control
Data Interface Pairs
Network Throttling
Configuring Data Encryption
Base Folder and Resource Pack Tools
Module 4 – Performance Tuning
Topics
Performance
Establishing Benchmarks
Storage Performance
Performance Parameters
Stream Management
Data Streams
Deduplication and Stream Management
Introduction
Preliminaries
The value of this course comes from three distinct areas: first, the content of the material, which guides your exploration and understanding of the product; second, the skill of the instructor in expanding on areas of interest and adding value from their own experience with the product; and lastly, you, the student, whose questions and experiences help not only yourself but others understand how Simpana® software can meet your data management requirements.
No unauthorized use, copy or distribution.
• Who am I?
• Who are you?
• Why are we here?
• How will this course be conducted?
Education Advantage
The CommVault Education Advantage product training portal contains a set of powerful tools to enable CommVault customers and partners to better educate themselves on the use of the CommVault software suite. The portal includes:
• Training Self-Assessment Tools
• Curriculum Guidance based on your Role in your CommVault Enterprise
• Management of your CommVault Certifications
• Access to Practice Exams and Certification Preparatory Tools
• And more!
• Track Career Paths
• Self Assessment
• Register for Courses
• Track Certification Progress
• Leave feedback
Customer Education Lifecycle
Before customers install CommVault® Simpana® software, they should have a basic understanding of the product. This learning timeline illustrates the role of product education over the early years of owning CommVault Simpana software. The lifecycle ranges from the pre-installation review of the "Introduction to Simpana Software" eLearning module to the pursuit of Masters Program certifications.
• Available at: http://services.commvault.com/education
• Provides recommended training pre- and post-deployment and at various levels of Simpana® software expertise
CommVault Certification
CommVault's Certification Program validates expertise and advanced knowledge in topics, including: CommVault Core Fundamentals, Implementation and Maintenance, Preparing for Disaster Recovery and more advanced Specialist and Master technologies. Certification is a valuable investment for both a company and the IT professional. Certified personnel can increase a company's productivity, reduce operating costs, and increase potential for personal career advancement.
CommVault's Certification Program has been re-designed to now offer Professional-level, Specialist-level and Master-level certifications. This new Program provides certification based on a career path, and enables advancement through the program based on an individual’s previous experience and desired area of focus. It also distinguishes higher-level certification from lower-level certification as a verified proof of expertise.
Key Points
• Certification is integrated with and managed through CommVault's online registration in the Education Advantage Customer Portal.
• Cost of certification registration is included in the associated training course.
• Practice assessments are given in class at the end of each module.
• Students may take the online certification exam(s) any time after completing the course.
• Previous training course students (as validated by the registrar) can also request an opportunity to take the online assessment exam at no charge.
• For those who feel they do not require training, an online assessment opportunity for each certification level may be purchased separately from the training course.
• Certified Professional requires passing:
  • Core Fundamentals
  • Implementation and Maintenance
• Certified Specialist
  • Disaster Recovery
  • Virtualization
• Certified Master
CommVault Advantage
CommVault® Advantage is your profile as a CommVault consumer and expert. The CommVault Advantage system captures your certifications, participation in learning events and courses, your Forum participation, Support interaction and much more. Profile Points awarded for your CommVault interactions are collected and compared with those of other CommVault consumers worldwide. These Profile Points allow users to demonstrate their Simpana® software expertise for personal and professional growth. Log in to CommVault Advantage to check your progress and compare yourself to the global CommVault community, or create an account today.
• Point system based on:
  • ILT and eLearning courses
  • Certification
  • Maintenance Advantage
Common Technology Engine
The CommCell® environment is the logical management boundary for all components that protect, move, store and manage data and information. All activity within the CommCell environment is centrally managed through the CommServe® server. Users log on to the CommCell® Console Graphical User Interface (GUI), which is used to manage and monitor the environment. Agents are deployed to clients to protect production data by communicating with the file system or application requiring protection. The data is processed by the agents and protected through MediaAgents to disk, tape or cloud storage. Clients, MediaAgents and libraries can be in local or remote locations. All local and remote resources can be centrally configured and managed through the CommCell console. This allows both centralized and decentralized organizations to manage all data movement activities through a single interface. All production data protected by agents, together with all MediaAgents and libraries controlled by a CommServe server, is referred to as the CommCell environment.
Physical Architecture
A physical CommCell® environment is made up of one CommServe® server, one or more MediaAgents and one or more Clients. The CommServe server is the central component of a CommCell environment. It hosts the CommServe database, which contains all metadata for the CommCell environment. All operations are executed and managed through the CommServe server. MediaAgents are the workhorses which move data from source to destination. Sources can be production data or protected data, and destinations can be disk, cloud or removable media libraries. Clients are production systems requiring protection and will have one or more Agents installed directly on them or on a proxy server to protect the production data.
Figure: Common Technology Engine (Physical View: CommServe, MediaAgent, Client, Libraries; Logical View: Agent, Data Set, Storage Policy, Policy Copies)
Logical Architecture
CommVault’s logical architecture is defined in two main areas. The first area depicts the logical management of production data, which is designed as a hierarchical tree structure. Production data is managed using Agents. These Agents interface natively with the file system or application and can be configured based on the specific functionality of the data being protected. Data within these Agents is grouped into a data set (backup set, replication set, or archive set). These data sets represent all data the Agent is designed to protect. Within the data set, one or more subclients can be used to map to specific data. Subclients add flexibility: data can be grouped into logical containers, which can then be managed independently in the CommVault protected environment.
The second area depicts managing data in CommVault protected storage. This is facilitated through the use of storage policies. Storage policies are policy containers which contain one or more rule sets for managing one or more copies of protected data. The first rule set is the primary copy. This copy manages data being protected from the production environment. Additional secondary copies can be created with their own rule sets. These rule sets manage additional copies of data which are generated from existing copies within the CommVault protected environment. The rule sets define what data will be protected (subclients), where it will reside (data path), how long it will be kept (retention), encryption options, and media management options.
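The two halves of the logical architecture can be modeled as plain data structures. The Python sketch below is illustrative only and is not CommVault's internal schema or API: every class name, field name and example value (`SP-Files`, `MA1`, `DiskLib1`, the subclient paths) is hypothetical. It shows an Agent holding a data set of subclients, a storage policy holding a primary copy plus a secondary copy generated from it, and how the subclient-to-policy mapping decides which data each policy manages.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subclient:
    """Logical container mapping to specific production data."""
    name: str
    content: list[str]       # paths or application objects (illustrative)
    storage_policy: str      # storage policy that manages this subclient's data


@dataclass
class DataSet:
    """Backup set, replication set or archive set within an Agent."""
    name: str
    subclients: list[Subclient] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    data_sets: list[DataSet] = field(default_factory=list)


@dataclass
class PolicyCopy:
    """One rule set within a storage policy (hypothetical fields)."""
    name: str
    data_path: str                     # where the copy resides
    retention_days: int                # how long the copy is kept
    encrypt: bool = False              # encryption option
    source_copy: Optional[str] = None  # None for the primary copy


@dataclass
class StoragePolicy:
    name: str
    copies: list[PolicyCopy] = field(default_factory=list)

    @property
    def primary(self) -> PolicyCopy:
        # The first rule set is the primary copy.
        return self.copies[0]


def subclients_for_policy(agent: Agent, policy: StoragePolicy) -> list[str]:
    """Names of the agent's subclients whose data the policy manages."""
    return [sc.name
            for ds in agent.data_sets
            for sc in ds.subclients
            if sc.storage_policy == policy.name]


# Example with hypothetical names.
fs_agent = Agent("File System", [DataSet("defaultBackupSet", [
    Subclient("default", ["C:\\"], "SP-Files"),
    Subclient("UserShares", ["D:\\Users"], "SP-Files"),
    Subclient("SQLDumps", ["E:\\Dumps"], "SP-DB"),
])])
sp_files = StoragePolicy("SP-Files", [
    PolicyCopy("Primary", "MA1 -> DiskLib1", retention_days=30),
    PolicyCopy("Offsite", "MA1 -> TapeLib1", retention_days=365,
               encrypt=True, source_copy="Primary"),
])
print(subclients_for_policy(fs_agent, sp_files))  # ['default', 'UserShares']
```

Keeping the "what" (subclients) separate from the "where and how long" (policy copies) is what lets a single storage policy govern subclients from many agents, and lets one agent's subclients go to different policies.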
Training Environment
The CommVault Virtual Training environment, when available, can be used by students to perform course activities or explore the product’s user interface. The training environment is NOT fully resourced, nor are all components installed or available. All course activities are supported, but due to host resource (RAM and disk space) constraints, only a limited number of virtual machines can be operational at the same time, and few tasks beyond the activities listed in the course manual can be performed. Please discuss with your instructor what other activities or tasks you can do.
Module 1 – Planning a CommCell Architecture
Topics
• Common Technology Engine Architecture
• CommCell® Architecture Overview
• CommServe® Server
• Indexing Structure
• Common Technology Engine Best Practices
• Architecting a Storage Solution
• Simpana® Deduplication
• Understanding Simpana Deduplication
• Deduplication Building Block Guidelines
• SILO Storage
• Deduplication Best Practices
• Planning a Sound Data Protection Strategy
• Disaster Recovery Concepts
• Business Continuity Concepts
• Protection Methods
• Data Description
• Data Availability
• Protected Storage Requirements
• Designing a Sound Data Protection Strategy
• Understanding Client Agents
• Protecting Virtual Environments
• VSA Backup Process
• Protecting Applications
• Snapshot Management
• Data Protection Best Practices
Common Technology Engine Architecture
CommCell Architecture Overview
The heart of any Simpana deployment is the CommServe® server. All activity is managed from this central point and all backup and restore activity must be initiated from the CommServe server. A Microsoft SQL metadata database is used to store all CommServe configuration and job history data.
Data movement is conducted from source to destination using MediaAgents. One or more MediaAgents can be used to move data providing greater flexibility and scalability.
Production data is managed by installing iDataAgents on physical hosts, virtual hosts or on proxy hosts. The iDataAgent communicates with the file system or application being protected and uses native APIs and / or scripting to conduct data protection operations. Physical and virtual hosts with iDataAgents installed are referred to as clients.
Libraries are used to store protected data. CommVault software supports a wide range of library configurations.
The CommServe server, MediaAgents, libraries and clients that communicate with one another make up the CommCell architecture.
• CommServe® Server
• MediaAgent
• Indexing
• Library
• Client
Figure: CommCell® Architecture Overview (CommServe, MediaAgent and Client; data protection by snap, backup and archive; revert and restore / recall; compression, deduplication and encryption; index cache create / update, browse / retrieve and archive / prune; protected storage)
CommServe Server
Within a CommCell environment there can only be one active CommServe server. For high availability and failover there are several methods that can be implemented. The following information explains each of these methods.
Hot / Cold Standby
A hot or cold standby CommServe server consists of a physical or virtual machine with the CommServe software pre-installed. The DR backup Export process directs metadata exports to the standby CommServe server. In the event that the production CommServe server is not available the standby CommServe server can quickly be brought online.
Virtualization
Some customers with virtual environments are choosing to virtualize the production CommServe server. A virtualized CommServe server has the advantage of using the hypervisor's high availability functionality (when multiple hypervisors are configured in a cluster) and reduces costs since separate CommServe hardware is not required. Although this method can be beneficial, it should be properly planned and implemented. If the virtual environment is not properly scaled, the CommServe server could become a bottleneck when conducting data protection jobs. In larger environments where jobs run throughout the business day, CommServe server activity could have a negative performance impact on production servers. When virtualizing the CommServe server, it is still critical to run the CommServe DR backup. In the event of a disaster, the CommServe server may still have to be reconstructed on a physical server. Do not rely on the availability of a virtual environment in the case of a disaster. Follow normal CommVault best practices in protecting the CommServe metadata.
Clustering
The CommServe server can be deployed in a clustered configuration. This will provide high availability for environments where CommCell operations run 24/7. A clustered CommServe server is not a DR solution and a standby CommServe server must be planned for at a DR site. Clustering the CommServe server is a good solution in large environments where performance and availability are critical.
Another benefit of using a clustered CommServe server arises with Simpana data archiving. Archiving operations can be configured to create stub files, which allow end users to initiate recall operations. For an end-user recall to complete successfully, the CommServe server must be available.
SQL Database Replication
The CommServe database typically does not grow very large, so running periodic DR backups may be adequate to protect the CommServe metadata. In larger environments where the database grows larger and minimal CommServe server data loss (a short RPO) is required, SQL log shipping or Simpana’s Continuous Data Replicator (CDR) can be used. These protection methods require additional Simpana agents and additional Microsoft SQL Server licenses.
CommServe DR Backup Process
By default, the CommServe DR backup process runs every day at 10:00 AM. This process first dumps the CommServe SQL database and the registry hive to the <install path>\CommVault\Simpana\CommServeDR folder. An Export process then copies the folder contents to a user-defined drive letter or UNC path. A Backup phase then backs up the DR metadata, registry hive and user-defined log files to a location based on the storage policy associated with the backup phase of the DR process. All processes, schedules and export/backup locations are customizable in the DR Backup Settings applet in Control Panel.
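The Export step amounts to copying the dump folder's contents to one or more destinations. The Python sketch below models that step under stated assumptions; it is not the Simpana implementation. The function name `export_dr_folder` and the timestamped `DR_<date>_<time>` subfolder naming are hypothetical choices made so that earlier exports are never overwritten.

```python
from __future__ import annotations

import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def export_dr_folder(dr_folder: Path, destinations: list[Path],
                     now: Optional[datetime] = None) -> list[Path]:
    """Copy the DR dump folder's contents to each destination.

    Each export lands in its own timestamped subfolder so earlier exports
    are kept. Returns the folders that were written.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    written = []
    for dest in destinations:
        target = Path(dest) / f"DR_{stamp}"
        # copytree creates missing parent folders and raises if target exists.
        shutil.copytree(dr_folder, target)
        written.append(target)
    return written
```

Called with the CommServeDR folder under the install path and a list of destinations, such as a hypothetical standby-server share `\\standby-cs\DR`, the same kind of function could also serve as a post-process script that copies the metadata to additional locations.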
CommServe DR IP Address
A CommCell license is bound to the IP address of the CommServe server. If a standby CommServe server uses a different IP address, that address must be included in the CommCell license information.
Indexing Structure
Simpana software uses a distributed indexing structure that provides for enterprise level scalability and automated index management. This works by using the CommServe database to only retain job based metadata which will keep the database relatively small. Job and detailed index information will be kept on the MediaAgent protecting the job, automatically copied to media containing the job and optionally copied to an Index Cache Server.
Job summary data maintained in the CommServe database keeps track of all data chunks being written to media. As each chunk completes, it is logged in the CommServe database. This information also records the identity of the media the job was written to, which can be used when recalling off-site media for restores. This data is held in the database for as long as the job exists. This means that even if the data has exceeded defined retention rules, the summary information remains in the database until the job has been overwritten. An option to browse aged data can be used to browse and recover data on media that has exceeded retention but has not been overwritten.
The detailed index information for jobs is maintained in the MediaAgent’s Index Cache. This information contains each object protected, the chunk the object is in, and the chunk offset defining the exact location of the data within the chunk. The index files are stored in the index cache, and after the data is protected to media, an archive index operation is conducted to write the index to the media. This method automatically protects the index information, eliminating the need to perform separate index backup operations. The archived index can also be used if the index cache is not available, when restoring the data at alternate locations, or if the indexes have been pruned from the index cache location.
Figure: Indexing Structure (dedicated index cache with index file log shipping to an Index Cache Server; shared index cache; index files copied to media in disk and tape libraries)
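The two tiers described in this section can be sketched as two lookups: the MediaAgent index resolves an object to a chunk and offset, and the CommServe job summary resolves that chunk to the media it was written to. This is a simplified Python model, not the real index file format; `JobSummary`, `JobIndex`, `locate` and the `aged` flag are hypothetical names, and treating aged jobs as visible only through a browse-aged-data option is an interpretation of the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JobSummary:
    """CommServe-side job metadata: the media each chunk was written to."""
    job_id: int
    chunk_media: dict[int, str] = field(default_factory=dict)  # chunk -> media ID
    aged: bool = False  # retention exceeded but media not yet overwritten


@dataclass
class JobIndex:
    """MediaAgent index cache entry: object -> (chunk, offset within chunk)."""
    job_id: int
    objects: dict[str, tuple[int, int]] = field(default_factory=dict)


def locate(obj: str, summary: JobSummary, index: JobIndex,
           browse_aged: bool = False) -> tuple[str, int, int]:
    """Resolve a protected object to (media ID, chunk number, chunk offset)."""
    if summary.job_id != index.job_id:
        raise ValueError("summary and index belong to different jobs")
    if summary.aged and not browse_aged:
        # Aged jobs remain in the database until overwritten, but are only
        # reachable through the browse-aged-data option in this model.
        raise LookupError("job has exceeded retention; browse aged data instead")
    chunk, offset = index.objects[obj]                 # detailed index (MediaAgent)
    return summary.chunk_media[chunk], chunk, offset   # media identity (CommServe)
```

For example, with `JobSummary(101, {2: "TAPE002"})` and an index entry mapping `/data/b.txt` to `(2, 65536)`, `locate` returns `("TAPE002", 2, 65536)`: the tape to recall and where to seek on it. Keeping only the small chunk-to-media map in the CommServe database is what keeps that database relatively small.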
Index Cache Server
Index Cache Server is an index cache sharing mechanism that saves an additional copy of the index cache for sharing purposes. This additional copy, the Index Cache Server, is located on one of the MediaAgent computers participating in the share. The Index Cache Server can be accessed by all participating MediaAgents.
Index Cache Server provides the following advantages:
• Index cache restores for data protection operations.
• Job restartability in GridStor™ Technology scenarios (when used with transaction logging).
• Index cache rebuilding in failover scenarios (when used with transaction logging).
• Maintaining a local cache prevents network disruptions from affecting the data protection operations.
Transaction logging
The index copy on the Index Cache Server is created either by copying the original index during the Archive Index phase of the data protection job, or dynamically through transactional log replay. Transactional logs are sent at the completion of each storage chunk. In the event the local cache is lost while indexing a job, the job can be restarted at the last transaction successfully entered on the Index Cache Server.
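The restart behavior can be illustrated with a small simulation: one transaction is shipped to the Index Cache Server as each chunk completes, and a restarted job resumes after the last transaction the server recorded. This Python sketch only models the idea described above; `IndexCacheServer`, `run_job` and the `fail_after` parameter (which simulates losing the local cache) are hypothetical.

```python
from __future__ import annotations


class IndexCacheServer:
    """Holds a replayed copy of each job's index, one transaction per chunk."""

    def __init__(self) -> None:
        self.entries: dict[int, list[str]] = {}  # job ID -> committed chunk records

    def replay(self, job_id: int, chunk_record: str) -> None:
        self.entries.setdefault(job_id, []).append(chunk_record)

    def last_committed_chunk(self, job_id: int) -> int:
        return len(self.entries.get(job_id, []))


def run_job(job_id: int, chunks: list[str], ics: IndexCacheServer,
            fail_after: int | None = None) -> int:
    """Write chunks, shipping a transaction log to the ICS as each completes.

    Resumes after the last chunk the ICS has recorded. If fail_after is set,
    simulates losing the local index cache once that chunk number is reached.
    Returns the number of chunks committed on the ICS.
    """
    start = ics.last_committed_chunk(job_id)
    for n, record in enumerate(chunks[start:], start=start + 1):
        ics.replay(job_id, record)  # log shipped at chunk completion
        if fail_after is not None and n >= fail_after:
            raise RuntimeError(f"local index cache lost after chunk {n}")
    return ics.last_committed_chunk(job_id)
```

If a four-chunk job fails after chunk 2, a second `run_job` call writes only chunks 3 and 4, because the Index Cache Server already holds the first two transactions.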
Shared Index Cache (Network Share)
A Network Share is a designated location on the network where one or more MediaAgents store their index cache. The Index Cache stored in a network share can be accessed from all participating MediaAgents. You might use a network share if you have a dedicated partition created exclusively for Index Cache and you wish to use this partition for index cache sharing.
Ensure that you have enough space to accommodate the index cache from all participating MediaAgents.
Note: When using a network share, the local index and the shared index are one and the same.
A network disruption might corrupt the index and jobs might have to be restarted due to index cache failure.
Intermediate Index Cache
It is highly recommended that the option Enable Intermediate Index Cache Directory be used when configuring Index Cache on a network share. With this option turned on the index is written to the local disk first and at commit points uploaded to the Network share. This will avoid failures due to network disruptions/failures writing to the index on the network share.
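The write-locally-then-upload pattern behind the intermediate index cache can be sketched as follows. This is a hypothetical Python illustration of the pattern only, not how Simpana implements the option; the class name, file names and the copy-then-rename step are assumptions. Updates go to local disk, and only a commit touches the share.

```python
from __future__ import annotations

import shutil
from pathlib import Path


class IntermediateIndexWriter:
    """Write index records locally; upload to the network share at commit points."""

    def __init__(self, local_dir: Path, share_dir: Path, name: str) -> None:
        self.local_file = Path(local_dir) / name
        self.share_file = Path(share_dir) / name
        self.local_file.parent.mkdir(parents=True, exist_ok=True)

    def write(self, record: str) -> None:
        # Every update goes to local disk only, so a share outage cannot
        # interrupt or corrupt the in-progress index.
        with self.local_file.open("a", encoding="utf-8") as f:
            f.write(record + "\n")

    def commit(self) -> None:
        # Copy to a temporary name, then rename over the target, so the share
        # holds either the previous or the new index (on file systems where
        # rename is atomic), never a partially written one.
        self.share_file.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.share_file.with_name(self.share_file.name + ".tmp")
        shutil.copy2(self.local_file, tmp)
        tmp.replace(self.share_file)
```

Between commits the share still holds the last committed state, which is why a network failure during writing does not fail the job in this model.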
Common Technology Engine Best Practices
CommServe Server
The CommServe metadata database is the most critical component within the CommCell infrastructure. If the data becomes corrupt, the CommServe server disk crashes or you are faced with a full site disaster, having the metadata backup readily accessible is critical.
Consider the following key points for proper metadata protection:
• Wherever you send protected data, include a copy of the DR metadata.
• If you have a standby CommServe server or a dedicated DR site with network accessibility, set the Export phase of the metadata backup to write to that location.
• Consider using post-process scripts to copy the metadata to additional locations as needed.
• Make sure you properly secure the metadata backup, since all configuration, security, licensing and encryption key information is kept in the database.
• If data protection copies to tape are typically performed during the day and finish after 10 AM, consider adding a second DR backup schedule to ensure the most up-to-date metadata is sent off-site with the data tapes.
• If tapes are sent off-site before 10 AM, consider changing the default DR backup schedule to ensure the most up-to-date metadata is sent off-site with the backup tapes.
• If your environment uses the Erase Data feature, make sure the metadata backup goes to a dedicated DR storage policy or to a storage policy with the Erase Data option deselected.
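The two scheduling bullets above can be expressed as a small decision function. This Python sketch is a planning aid under stated assumptions, not a Simpana feature: `dr_backup_schedule` is a hypothetical name, all times are on the same day, and it ignores how long the DR backup itself takes, so in practice allow a margin before the tape pickup.

```python
from __future__ import annotations

from datetime import time

DEFAULT_DR_BACKUP = time(10, 0)  # default daily DR backup start (10:00 AM)


def dr_backup_schedule(tape_copy_finish: time, tape_pickup: time) -> list[time]:
    """Suggest same-day DR backup start times so current metadata leaves with the tapes."""
    if tape_copy_finish >= tape_pickup:
        raise ValueError("tape copies must finish before the tapes are picked up")
    if tape_pickup <= DEFAULT_DR_BACKUP:
        # Tapes leave before the default run: move the run to when copies finish.
        return [tape_copy_finish]
    if tape_copy_finish > DEFAULT_DR_BACKUP:
        # Copies finish after the default run: keep it and add a second schedule.
        return [DEFAULT_DR_BACKUP, tape_copy_finish]
    # Default run already falls between the last tape copy and the pickup.
    return [DEFAULT_DR_BACKUP]
```

For instance, tape copies finishing at 1 PM with a 4 PM pickup yield a 10:00 run plus a second run at 13:00, while a 9 AM pickup after copies end at 7:30 moves the single run to 07:30.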
• CommServe® Server
• Index Cache Settings
Index Cache Settings
All object level data protection jobs will use indexes for all operations. These indexes are maintained in the index cache. Improper configuration of the index cache can result in job failures and long delays in browse and recovery operations.
Consider the following when designing and configuring the index cache:
• Do NOT put the index cache on the system drive. Use a dedicated drive (recommended) or a dedicated partition (for smaller environments). During MediaAgent installation the default path for the index cache is on the system drive. To change the location of the cache, right-click the MediaAgent, select Properties, and then select the Catalog tab.
• Size the index cache appropriately based on the size of your environment and the estimated number of objects that will be protected. It is much better to overestimate than to underestimate index cache size. Sizing guidelines are available on CommVault’s documentation site.
• If you will be running many concurrent jobs protecting millions of objects, locate the index cache on high-speed dedicated disks; otherwise backup performance can suffer during index update operations.
• The default retention time for the index cache is 15 days. If you will frequently browse for data older than 15 days, increase this setting and allocate enough disk space for the index cache.
• Index files are automatically backed up to media after each data protection job, so there is no need to back up the index cache location. If you are concerned about having fast access to indexes in the event that the cache is lost, consider using an Index Cache Server.
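Two of these guidelines are simple enough to check mechanically. The Python sketch below is a hypothetical pre-flight check, not a CommVault tool: `check_index_cache` and its parameters are assumptions, it only understands Windows-style drive letters, and it does not attempt sizing, since the sizing guidelines live on the documentation site.

```python
from __future__ import annotations

from pathlib import PureWindowsPath

DEFAULT_RETENTION_DAYS = 15  # default index cache retention


def check_index_cache(cache_path: str, system_drive: str = "C:",
                      retention_days: int = DEFAULT_RETENTION_DAYS,
                      typical_browse_age_days: int = 0) -> list[str]:
    """Return warnings for an index cache configuration (Windows-style paths)."""
    warnings = []
    if PureWindowsPath(cache_path).drive.upper() == system_drive.upper():
        warnings.append("index cache is on the system drive; "
                        "move it to a dedicated drive")
    if typical_browse_age_days > retention_days:
        warnings.append(
            f"browses reach back {typical_browse_age_days} days but index cache "
            f"retention is {retention_days} days; raise retention and add disk space")
    return warnings
```

With a hypothetical cache folder `E:\IndexCache` and browses that routinely reach back 30 days, the check returns one warning about the 15-day default retention.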
Architecting a Storage Solution
All data storage devices configured with Simpana® software are referred to as libraries. All data moving to and from a library must pass through a MediaAgent (or a NAS Filer). Libraries can be shared between MediaAgents running different operating systems. Data written by Simpana software is OS independent: data written by a UNIX MediaAgent can be restored through a Windows MediaAgent and vice versa.
The most common supported library types are listed below. For a list of specific vendor devices consult the Hardware Compatibility List on CommVault's Maintenance Advantage website. For a list of all supported library types consult CommVault's Books Online website.
Disk library - A disk library is a virtual library that uses disk media configured for read/write access as one or more mount paths. The disk library is a software entity and does not represent a specific hardware entity. The storage capacity of a disk library is determined by the total storage space in its mount paths.
Tape library - Tape libraries are made up of one or more tape devices with a library controller and internal media storage. A tape library can have mixed media and shared access with one or more MediaAgents (or NAS Filers) in the same CommCell® group. Tape libraries can also be configured to use WORM media.
• Library Types
  • Disk Library
  • Removable Media Library
  • NDMP Library
  • Plug and Play Library
  • Cloud Storage
• Storage Connections
  • Direct Attached Storage (DAS)
  • Network Attached Storage (NAS)
  • Storage Area Network (SAN)
Figure: Architecting a Storage Solution (DAS, NAS, SAN over Fibre / iSCSI, and cloud storage connecting clients, client / MediaAgent hosts and MediaAgents)
Blind Library - A blind library is a tape library without a barcode reader, and is the opposite of a sighted library, which has a barcode reader. A blind library must have all its drives (and media) of the same type. Once configured, a blind library cannot be configured as a sighted library.
IP Library - An IP Library provides LAN-based media management for multiple applications. Media inventory, pools, and device loading/unloading are all managed by the library software. IP Libraries are the only libraries that can be shared between CommCell entities. Example: STK ACSLS or ADICS SDLC libraries.
Stand-alone Tape Library - A single tape device with no library controller or internal storage that is accessible from a MediaAgent. Stand-alone tape drives can be pooled together for a multi-stream job or single multi-stream failover configuration.
NAS NDMP Library - A tape library attached to a NAS Filer for NDMP data storage. The library control and drives in a NAS NDMP library can be dynamically shared between multiple devices (NAS file servers and MediaAgents) if these devices are connected to the library in a SAN environment. The device initially having library control (media changer) is the first configured device.
Virtual Tape Library - A software representation of a tape library using disk storage. Virtual tape libraries are supported, but not recommended, because a normal disk library provides many more features and capabilities.
Plug & Play Library - Plug and Play (PnP) storage devices (e.g., FireWire, USB, SATA storage devices) can be used for storage instead of tapes. Once configured, PnP disks are treated like tapes in a stand-alone drive. PnP libraries are useful in locations where it is hard to configure and manage tapes due to operational issues. Only one PnP library can be configured per MediaAgent. Although multiple drives can be configured, only single-streamed jobs are supported. (Multiple drives provide the ability to span across multiple media for a single-streamed job.)
Cloud Library - A cloud library uses online (cloud) storage devices as storage targets. Cloud libraries provide pay-as-you-go network storage. Data is transferred through secure channels using the HTTPS protocol.
Removable Disk Drives
Removable disk drives can be configured as stand-alone drives, and all operations supported by stand-alone drives are supported by these devices. Removable disks differ from PnP disks in that they are drive enclosure devices that keep a persistent drive letter in the operating system whether or not disk media is loaded into the enclosure.
Storage Connections
Direct Attached Storage (DAS)
Direct Attached Storage (DAS) means the production storage is attached directly (not through a SAN) to the production server. When many production servers use DAS, there is no single point of failure. The primary disadvantages are higher administrative overhead and, depending on budget limitations, the use of lower quality storage instead of the high quality enterprise class disks typically found in SAN/NAS storage.
For some applications, such as Exchange 2010 using Database Availability Groups (DAG), DAS may be a valid solution. The main point is that although the storage trend over the past several years has been toward consolidation, DAS should still be considered for certain production applications.
One key disadvantage regarding DAS protection is that backup operations will likely require data to be moved over a network. This problem can be reduced by using dedicated backup networks. Another disadvantage is that DAS is not as efficient as SAN or NAS when moving large amounts of data.
Network Attached Storage (NAS)
Network Attached Storage (NAS) has made a strong comeback over the past few years by taking advantage of its versatility. Where NAS was once used only for file stores, it is now considered a good option for databases and virtual machines. NAS versatility includes the ability to provide Fibre Channel or iSCSI connections alongside traditional NFS/CIFS shares. Its primary advantage is device intelligence: purpose-built operating systems control and manage the disks and disk access. For high availability and disaster recovery, disk cloning or mirroring and replication provide sound solutions. Simpana's IntelliSnap™ integration with supported hardware provides simple yet powerful snapshot management capabilities.
One key disadvantage of NAS is that it typically requires network protocols when performing data protection operations. This disadvantage can be greatly reduced through the use of snapshots and proxy-based backup operations.
Storage Area Network (SAN)
Storage Area Networks (SAN) are very commonly implemented for the most mission critical systems in an environment. The ability to consolidate storage and move data with efficient protocols such as Fibre Channel and iSCSI provides flexibility and performance.
One key disadvantage of SAN is the complexity of configuring and managing SAN networks. Specialized training is typically required, and all hardware must be fully compatible for proper operation. Because SAN storage lacks the operating system that NAS storage has, it relies on a host system for data movement. Depending on the configuration, data movement can be offloaded to a proxy, and adding Host Bus Adapters (HBAs) connected to a dedicated backup SAN allows data to be backed up more efficiently.
Simpana Deduplication
Understanding Simpana Deduplication
The deduplication process contains the following key components:
• Storage Policy
• Deduplication Blocks
• Signature Hash
• MediaAgent
• Deduplication Database (DDB)
• Disk storage
• Optional client side signature cache
Storage Policy
All deduplication activity is centrally managed through a storage policy. Configuration settings are defined in the policy, the location of the deduplication database is set through the policy, and the disk library which will be used is also defined in the policy.
Deduplication Blocks and Signature Generation
When data protection jobs are executed, data is sent to the Simpana® agent from the file system or application the agent is responsible for protecting. Even though the data may be files or application data, it is processed as deduplication blocks. The deduplication process starts by calculating a signature hash for each block. This is a 512 bit value that uniquely represents the data within the block, and it is used to determine whether the block already exists in storage.
[Figure: Understanding Simpana® Deduplication. MediaAgent, data storage policies, global deduplication policy, deduplication database (DDB) and deduplication store]
The block size is set in the Storage Policy Properties, on the Advanced tab. CommVault® recommends using the default value of 128 KB, but the value can range from 32 KB to 512 KB. Higher block sizes are recommended for large databases.
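To make the block and signature steps concrete, here is a minimal Python sketch. It assumes fixed-size blocks and uses SHA-512 from `hashlib` only as a stand-in 512-bit hash, since the algorithm Simpana actually uses is not stated here; `split_blocks` and `block_signatures` are hypothetical helper names, not Simpana code.

```python
import hashlib

DEFAULT_BLOCK_SIZE = 128 * 1024  # 128 KB default recommended in the text
MIN_BLOCK_SIZE, MAX_BLOCK_SIZE = 32 * 1024, 512 * 1024  # allowed range from the text


def split_blocks(data: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> list[bytes]:
    """Cut a data stream into deduplication blocks; the last block may be shorter."""
    if not MIN_BLOCK_SIZE <= block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block size must be between 32 KB and 512 KB")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_signatures(data: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> list[bytes]:
    """Return one 512-bit (64-byte) signature per block, using SHA-512 as a stand-in hash."""
    return [hashlib.sha512(block).digest() for block in split_blocks(data, block_size)]


if __name__ == "__main__":
    # Three identical 128 KB blocks followed by a short, different tail block.
    stream = b"A" * DEFAULT_BLOCK_SIZE * 3 + b"B" * 1000
    sigs = block_signatures(stream)
    print(len(sigs), len(set(sigs)), len(sigs[0]) * 8)  # prints: 4 2 512
```

Only two of the four signatures are distinct, so only two blocks would need to be stored.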
Signature Hash Comparison
The block signature hash is used to determine if the block exists in storage by comparing the hash against other hashes in the Deduplication Database. By default, signature hashes are generated on the Client. This is preferred since the processing of block signatures can be distributed to many different systems. This is required when using Simpana Client Side Deduplication. For underpowered Clients that will not be using Client Side Deduplication, a subclient can be optionally configured to generate signatures on the MediaAgent.
Deduplication can be configured for Storage Side Deduplication or Client (source) Side
Deduplication. Depending on how deduplication is configured, the process will work as follows:
Storage Side Deduplication - Once the signature hash is generated for a block, both the block and the hash are sent to the MediaAgent. The MediaAgent, with a locally or remotely hosted deduplication database (DDB), looks up the hash in the database. If the hash does not exist, the block is unique: the block is written to disk storage and the hash is logged in the database. If the hash already exists, the block already exists on disk: the block and hash are discarded, and only the metadata of the data being protected is written to the disk library.
Client Side Deduplication - Once the signature is generated for a block, only the hash is sent to the MediaAgent. The MediaAgent, with a locally or remotely hosted deduplication database, looks up the hash in the database. If the hash does not exist, the block is unique, and the MediaAgent requests that the client send the block, which the MediaAgent then writes to disk. If the hash already exists, the block already exists on disk, so the MediaAgent tells the client to discard the block and only metadata is written to the disk library.
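The practical difference between the two modes is what crosses the network. The sketch below is a simplified model, not Simpana code: the deduplication database is a Python set of signatures, SHA-512 stands in for the real hash, metadata writes are ignored, and the function names are hypothetical.

```python
import hashlib

SIG_BYTES = 64  # size of a 512-bit signature


def signature(block: bytes) -> bytes:
    return hashlib.sha512(block).digest()  # stand-in for the real signature algorithm


def storage_side_bytes(blocks: list[bytes], ddb: set[bytes]) -> int:
    """Storage side: every block and its signature cross the network to the MediaAgent."""
    sent = 0
    for block in blocks:
        sent += len(block) + SIG_BYTES
        sig = signature(block)
        if sig not in ddb:  # unique block: written to disk, signature logged in the DDB
            ddb.add(sig)
        # duplicate block: discarded on the MediaAgent (metadata writes are not modeled)
    return sent


def client_side_bytes(blocks: list[bytes], ddb: set[bytes]) -> int:
    """Client side: only signatures are sent, plus blocks the MediaAgent requests."""
    sent = 0
    for block in blocks:
        sent += SIG_BYTES
        sig = signature(block)
        if sig not in ddb:  # unique block: the MediaAgent asks the client to send it
            sent += len(block)
            ddb.add(sig)
    return sent


if __name__ == "__main__":
    blocks = [b"a" * 1024] * 9 + [b"b" * 1024]  # ten 1 KB blocks, only two distinct
    print(storage_side_bytes(blocks, set()))  # prints: 10880
    print(client_side_bytes(blocks, set()))   # prints: 2688
```

With nine of the ten blocks duplicated, client side deduplication sends about a quarter of the bytes, and a repeat backup against the same database would send signatures only.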
Client Side Disk Cache - An optional configuration for low bandwidth environments is the client side disk cache, which maintains a local cache of signatures for deduplicated data. Each subclient maintains its own cache. The signature is first looked up in the local cache; if it exists there, the block is discarded. If it does not exist in the local cache, the signature is sent to the MediaAgent. If the signature does not exist in the DDB, the MediaAgent requests the block, and both the local cache and the deduplication database are updated with the new signature. If the block does exist in the DDB, the MediaAgent tells the client to discard the block.
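The lookup order for the client side disk cache can be sketched as follows. The sets standing in for the cache and the DDB, and the function name, are assumptions for illustration. The text does not say whether the local cache is updated on a DDB hit, so the sketch leaves it unchanged in that case.

```python
import hashlib


def send_with_cache(block: bytes, cache: set[bytes], ddb: set[bytes]) -> str:
    """Follow one block through a client side disk cache and the MediaAgent's DDB."""
    sig = hashlib.sha512(block).digest()  # stand-in 512-bit signature
    if sig in cache:
        return "discarded: local cache hit"  # nothing is sent to the MediaAgent
    if sig in ddb:
        return "discarded: DDB hit"  # signature was sent; the block was not
    # New block: the MediaAgent requests it, and both signature stores are updated.
    ddb.add(sig)
    cache.add(sig)
    return "sent and written"


if __name__ == "__main__":
    ddb = set()
    cache_a, cache_b = set(), set()  # each subclient keeps its own cache
    print(send_with_cache(b"block", cache_a, ddb))  # sent and written
    print(send_with_cache(b"block", cache_a, ddb))  # discarded: local cache hit
    print(send_with_cache(b"block", cache_b, ddb))  # discarded: DDB hit
```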
Deduplication Database
The deduplication database is the primary component of Simpana’s deduplication process. It maintains all signature hash records for a deduplicated storage policy. Each storage policy will have its own deduplication database. Optionally, a global deduplication storage policy can be used to link multiple storage policies to a single deduplication database by associating storage policy copies to a global deduplication storage policy.
The deduplication database can currently scale to between 500 and 750 million records. This allows up to 90 TB of data stored in the disk library and up to 900 TB of protected production data. Note that the 900 TB is not the source size but the amount of data backed up over time. For example, if 200 TB of data is protected and retained for 28 days using weekly full and daily incremental backups, the total amount of protected data is 800 TB (200 TB per cycle multiplied by 4 cycles, since a full is performed every seven days). These estimates are based on a 128 KB block size and may be higher or lower depending on the number of unique blocks and the deduplication ratio attained.
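The 800 TB example and the record counts can be checked with a few lines of arithmetic. This sketch mirrors the simplification used above (only full cycles are counted; incrementals are ignored) and assumes binary units (1 TB = 1024^4 bytes, 1 KB = 1024 bytes); the function names are hypothetical.

```python
KB = 1024
TB = 1024 ** 4  # binary units assumed


def protected_data_tb(source_tb: float, retention_days: int, full_interval_days: int) -> float:
    """Protected data as estimated above: source size times retained full cycles."""
    cycles = retention_days // full_interval_days
    return source_tb * cycles


def records_for_store(store_tb: float, block_kb: int = 128) -> int:
    """Unique-block records needed for a given amount of deduplicated data on disk."""
    return int(store_tb * TB // (block_kb * KB))


if __name__ == "__main__":
    print(protected_data_tb(200, 28, 7))  # prints: 800 (4 weekly cycles of 200 TB)
    print(records_for_store(90))          # prints: 754974720 (about 755 million)
```

At a 128 KB block size, 90 TB of unique data works out to roughly 755 million records, which lines up with the upper end of the record range given for the deduplication database.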
A deduplication database can handle up to 50 concurrent connections with up to 10 active threads at any given time. The database structure uses a primary and secondary table. Unique blocks are committed to the primary table and use 152 bytes per entry. Duplicate entries are registered in the secondary table and use 48 bytes per entry.
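The per-entry sizes give a rough lower bound on DDB table size. The sketch below counts only the raw entry bytes stated above and ignores index structures, logs and free space; the function name and the workload figures in the example are hypothetical.

```python
PRIMARY_ENTRY_BYTES = 152   # per unique block, primary table
SECONDARY_ENTRY_BYTES = 48  # per duplicate entry, secondary table


def ddb_table_bytes(unique_blocks: int, duplicate_entries: int) -> int:
    """Raw table entry size only; indexes, logs and free space are not included."""
    return unique_blocks * PRIMARY_ENTRY_BYTES + duplicate_entries * SECONDARY_ENTRY_BYTES


if __name__ == "__main__":
    # Hypothetical workload: 750 million unique blocks, each seen three more times.
    unique = 750_000_000
    total = ddb_table_bytes(unique, 3 * unique)
    print(f"{total / 10**9:.0f} GB")  # prints: 222 GB
```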
Deduplication Store
Each storage policy copy configured with a deduplication database has its own deduplication store. Quite simply, a deduplication store is a group of folders used to write deduplicated data to disk. Each store is completely self-contained: data blocks from one store cannot be written to another store, and data blocks in one store cannot be referenced by the deduplication database of a different store. This means that the more independent deduplication storage policies you have, the more duplicate data will exist in disk storage.
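The effect of self-contained stores is easy to see in a toy model. In this sketch each store is a set of signatures keyed by a hypothetical policy name, SHA-512 stands in for the real hash, and lookups never cross stores.

```python
import hashlib


def write_to_store(stores: dict[str, set[bytes]], store: str, blocks: list[bytes]) -> int:
    """Write blocks into one store and return how many new blocks it had to keep.

    Lookups only consult the named store, mirroring the rule that one store
    cannot reference blocks held by another.
    """
    signatures = stores.setdefault(store, set())
    written = 0
    for block in blocks:
        sig = hashlib.sha512(block).digest()  # stand-in signature
        if sig not in signatures:
            signatures.add(sig)
            written += 1
    return written


if __name__ == "__main__":
    blocks = [bytes([i]) * 1024 for i in range(100)]  # 100 distinct blocks
    separate: dict[str, set[bytes]] = {}
    print(write_to_store(separate, "policy_a", blocks)
          + write_to_store(separate, "policy_b", blocks))  # prints: 200
    shared: dict[str, set[bytes]] = {}
    print(write_to_store(shared, "global", blocks)
          + write_to_store(shared, "global", blocks))  # prints: 100
```

Backing up the same 100 blocks through two independent stores keeps 200 blocks on disk; routing both through one store keeps 100.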
Deduplication Building Block Guidelines
CommVault recommends using building block guidelines for scalability in large environments. There are two layers to a building block: the physical layer and the logical layer.
For the physical layer, each building block will consist of one or more MediaAgents, one disk library and one deduplication database.
For the logical layer, each building block will contain one or more storage policies. If multiple storage policies are going to be used they should all be linked to a single global deduplication policy for the building block.
A building block using a deduplication block size of 128 KB can scale to retain up to 96 TB of deduplicated data. This could retain approximately 40 – 60 TB of production data with retention of 30 – 90 days. The actual size of data will vary depending on the uniqueness of production data and the incremental block rate of change.
It is critical to provide adequate hardware to achieve maximum performance for a building block.
Performance starts with properly scaling the MediaAgent. There should be a minimum of 32 GB of RAM on each MediaAgent hosting the deduplication database.
[Figure: Deduplication Building Block Guidelines]
• MediaAgent: Windows or Linux 64-bit OS, 2 CPUs (quad core), minimum 32 GB RAM
• Deduplication Database (DDB): dedicated high speed SSD disks that must meet IOPS requirements (see Iometer.org), 300 – 500 GB capacity
• Disk Library: up to 100 TB usable capacity, 2 – 8 TB mount paths, spill & fill configuration, up to 50 concurrent write streams
• Up to 96 TB per building block, based on a 128 KB block size
The disk library can be sized up to 100 TB for a single building block. Mount paths should be configured between 2 – 8 TB.
To meet deduplication database IOPS requirements, high performance disks in a RAID array must be used. Enterprise class solid state disks or high speed SCSI disks are recommended, configured as RAID 0 or RAID 10. RAID 0 provides the best read/write performance, but every disk in the array becomes a single point of failure. If RAID 0 is used, make sure you protect the deduplication database frequently. Deduplication database backup and recovery are covered later in this eLearning course.
Best practices for the deduplication database:
• Put the DDB on the same server as the MediaAgent
• Place the DDB volume on dedicated high performance disks
• DDB disk volume performance must meet the IOPS required to qualify a disk for DDB use
• Use dedicated storage adapters, and make sure the MediaAgent (MA) can see the drives
• Use the Simpana CvDiskPerf tool (found in the software base directory) to check read/write rates
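CvDiskPerf is the tool to use for qualifying a DDB volume. Purely to illustrate what a write-rate check involves, the sketch below times sequential 128 KB writes, flushed with `os.fsync`, to a scratch file in a target directory. It is not CvDiskPerf, does not measure IOPS the way CvDiskPerf or Iometer do, and its results vary with controller caching; the function and file names are hypothetical.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, block_size: int = 128 * 1024, blocks: int = 256) -> float:
    """Write blocks * block_size bytes to a scratch file, fsync it, and return MB/s."""
    payload = os.urandom(block_size)  # random data, so compression cannot inflate the result
    path = os.path.join(directory, "disk_probe.tmp")  # hypothetical scratch file name
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # wait until the data reaches the device
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return block_size * blocks / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Point this at the intended DDB volume; a temporary directory is used here.
    with tempfile.TemporaryDirectory() as target:
        print(f"{write_throughput_mb_s(target):.0f} MB/s")
```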
Deduplication Storage Options
There are three ways disk library data paths can be configured when using deduplication: Direct Attached Storage (DAS), Storage Area Network (SAN) and Network Attached Storage (NAS).
Direct attached storage is when the disk library is physically attached to the MediaAgent. In this case each building block will be completely self-contained. This provides for high performance but limits resiliency. If the MediaAgent controlling the building block fails, data stored in the disk library cannot be recovered until the MediaAgent is repaired or replaced.
Keep in mind that, in this case, all the data in the disk library is still completely indexed and recoverable, even if the index cache is lost. Once the MediaAgent is reconstructed, data from the disk library can be restored.
Storage Area Networks (SANs) are very common in data centers. SAN storage can be zoned and presented to MediaAgents using either Fibre Channel or iSCSI. In this case the zoned storage is presented directly to the MediaAgent, providing read/write access to the disks. When using SAN storage, each building block should use a dedicated MediaAgent, deduplication database and disk library. Although the back-end disk storage for two building blocks can reside on the same SAN disk array, it should be configured logically in the Simpana software as two separate libraries.
This provides fast, protocol-efficient data movement, but, as with Direct Attached Storage, if the building block MediaAgent fails, data cannot be restored until the MediaAgent is rebuilt or the disk library is re-zoned to a different MediaAgent. If the disk library is re-zoned, it must be reconfigured in the Simpana software to the MediaAgent that has access to the LUN.
[Figure: Deduplication Storage Options. Dedicated building block: one MediaAgent with read/write access to its disk library. Shared building blocks: two MediaAgents on a NAS (CIFS / NFS) shared disk library, each with read/write access to its own building block and read-only access to the other]
Network Attached Storage has the advantage that the path to the storage runs directly through the NAS hardware. This means that, by using CIFS or NFS, UNC paths can be configured so a disk library reads and writes directly to storage. When using NAS storage as a disk library, it is still recommended to configure two separate disk libraries in the Simpana software. In this case the library can be configured as a shared library, where both MediaAgents can see all storage. Separate building blocks should still be used, with each MediaAgent having read/write access to its own disk library, but read-only access can also be granted to all libraries on the NAS storage. If a MediaAgent then fails, any other MediaAgent with access to the library can conduct restore operations.
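The value of read-only access on shared NAS storage can be modeled as a simple path selection. This is a hypothetical sketch, not Simpana's data path logic: each MediaAgent lists the libraries it can write to and read from, and a restore falls back to any online MediaAgent with read access. Library and MediaAgent names are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MediaAgent:
    name: str
    online: bool = True
    read_write: set[str] = field(default_factory=set)  # libraries in its own building block
    read_only: set[str] = field(default_factory=set)   # other NAS libraries it can read


def restore_agent(library: str, agents: list[MediaAgent]) -> Optional[MediaAgent]:
    """Prefer an online read/write MediaAgent; otherwise any online MediaAgent with read access."""
    for agent in agents:
        if agent.online and library in agent.read_write:
            return agent
    for agent in agents:
        if agent.online and library in agent.read_only:
            return agent
    return None  # no path: wait for the MediaAgent to be repaired


if __name__ == "__main__":
    ma1 = MediaAgent("MA1", read_write={"NAS_Lib1"}, read_only={"NAS_Lib2"})
    ma2 = MediaAgent("MA2", read_write={"NAS_Lib2"}, read_only={"NAS_Lib1"})
    ma1.online = False  # MA1 fails
    print(restore_agent("NAS_Lib1", [ma1, ma2]).name)  # prints: MA2
```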
Partitioned Deduplication Database
Parallel deduplication is a highly scalable and resilient solution that allows the deduplication database to be partitioned. It works by dividing signatures between multiple databases to increase the capacity of a single building block. If two dedupe partitions are used, it effectively doubles the size of the deduplication store.
In this example, two dedupe partitions have been configured, each on a separate MediaAgent. Signatures are generated on the client, and each signature is directed to one of the two partitions for processing based on its value. Although either MediaAgent can process signature lookups, the client's data always uses its default MediaAgent path. This allows all unique deduplication blocks to be protected through a single MediaAgent, although duplicate blocks may have been protected by either MediaAgent.
Since deduplicated data can exist on either of the partitions, the disk library should be configured using NAS storage. UNC paths should be used for the NAS disk library so either MediaAgent will be able to access data even if the other MediaAgent is unavailable.
Parallel deduplication is an advanced feature for large enterprise environments and CommVault Professional Services should be consulted when designing deduplication building blocks using this solution.
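The text says only that each signature is directed to a partition based on the signature generated; the routing rule itself is not described. The sketch below assumes a simple rule (the first 8 bytes of a SHA-512 stand-in signature, modulo the partition count) to show why a given block always lands on the same partition, whichever client sends it. Function names are hypothetical.

```python
import hashlib

PARTITIONS = 2  # the text says up to two partitions are currently supported


def partition_for(block: bytes, partitions: int = PARTITIONS) -> int:
    """Assumed routing rule: first 8 bytes of the signature modulo the partition count."""
    sig = hashlib.sha512(block).digest()  # stand-in 512-bit signature
    return int.from_bytes(sig[:8], "big") % partitions


def lookups_per_partition(blocks: list[bytes], partitions: int = PARTITIONS) -> list[int]:
    """Count how many signature lookups each DDB partition would receive."""
    counts = [0] * partitions
    for block in blocks:
        counts[partition_for(block, partitions)] += 1
    return counts


if __name__ == "__main__":
    blocks = [i.to_bytes(4, "big") * 256 for i in range(1000)]  # 1,000 distinct 1 KB blocks
    print(lookups_per_partition(blocks))  # roughly even split between partitions 0 and 1
    # The same block always maps to the same partition, whichever client sends it.
    print(partition_for(blocks[0]) == partition_for(bytes(blocks[0])))  # prints: True
```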
[Figure: Partitioned Deduplication Database. A client sends signature lookups to DDB partition 1 and partition 2, each hosted on its own MediaAgent, and writes data over its default data path to a NAS disk library, with a failover path through the other MediaAgent]
• Provides data path resiliency
• Up to two partitions currently supported
• Must be configured during storage policy copy creation
Enterprise Building Block Guidelines
When designing storage policy and building block architecture, another consideration is that certain data types do not deduplicate well against other data types. A prime example would be file system data and database data. In this case, different building blocks and storage policies can be configured to manage different data types. In this example a global deduplication storage policy has been configured with a block size of 128 KB. Two data management storage policies have been configured, one with a 30 day retention and the other with a 90 day
retention. All deduplication blocks from both storage policies will deduplicate based on the global deduplication policy setting, but will be retained based on the data management storage policy retention.
A second building block using a dedicated storage policy has been configured for database backups. In this example a 256 KB block size has been configured and the storage policy has retention of 14 days.
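The relationship between the global deduplication policy and the data management policies can be summarized as a small data model. This is an illustrative sketch with hypothetical class and policy names, not a Simpana configuration format: block size comes from the global policy when one is linked, while retention always comes from the individual storage policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlobalDedupPolicy:
    name: str
    block_size_kb: int


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    retention_days: int
    global_policy: Optional[GlobalDedupPolicy] = None
    block_size_kb: int = 128  # only used when no global policy is linked

    def effective_block_size_kb(self) -> int:
        """Linked policies deduplicate with the global policy's block size."""
        if self.global_policy is not None:
            return self.global_policy.block_size_kb
        return self.block_size_kb


if __name__ == "__main__":
    gdsp = GlobalDedupPolicy("GDSP_FileSystem", 128)
    building_block_1 = [
        StoragePolicy("FS_30_Day", 30, gdsp),
        StoragePolicy("FS_90_Day", 90, gdsp),
    ]
    building_block_2 = [StoragePolicy("DB_14_Day", 14, block_size_kb=256)]
    for policy in building_block_1 + building_block_2:
        print(policy.name, policy.effective_block_size_kb(), "KB,", policy.retention_days, "days")
```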
[Figure: Enterprise Building Block Guidelines. One building block (MediaAgent and disk library) uses a dedicated 256 KB storage policy with 14 day retention; another (MediaAgent and disk library) uses a 128 KB global deduplication storage policy serving storage policies with 30 day and 90 day retention. Production data sizes shown: 800 GB, 4 TB and 24 TB]
SILO Storage
Consider all the data that is protected within one fiscal quarter in an organization. Traditionally, a quarter-end backup would be preserved for long term retention. Let’s assume that the quarter-end backup of all data requires 10 LTO 5 tapes. Unfortunately, with this strategy the only data that could be recovered is what existed at the time of the quarter-end backup. Anything deleted within the quarter before the backup would be unrecoverable unless it existed in a prior quarter-end backup. This results in a single point in time from which data can be recovered. Now consider those same 10 tapes containing every backup that existed within the entire quarter: any point in time within the quarter can be recovered. That is what SILO storage can do.
SILO storage allows deduplicated data to be copied to tape without rehydrating the data. This means the same deduplication ratio that is achieved on disk can also be achieved to tape. As data on disk storage gets older the data can be pruned to make space available for new data. This allows disk retention to be extended out for very long periods of time by moving older data to tape.
How SILO works
Data blocks are written to volume folders in disk storage, and these folders make up the deduplication store. Each folder has a maximum size; once that size is reached, the folder is marked closed and a new folder is created for new blocks. The default volume folder size for a SILO enabled copy is 512 MB. This value can be set in the Control Panel, in the Media Management applet, using the SILO Archive Configuration setting Approximate Dedup disk volume size in MB for SILO enabled copy. It is strongly recommended to use the default 512 MB value. For a SILO enabled storage policy, when a folder is marked full it can be copied to tape. In effect, this backs up the backup.
[Figure: SILO Storage. Client, storage policy primary copy, secondary copy and SILO copy; volume folders holding block data, metadata and index data are closed when the size limit is reached]
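Folder rollover and SILO eligibility can be sketched as below. The classes are hypothetical, block sizes are in MB (a 128 KB block is 0.125 MB), and the On Demand Backup Set's bookkeeping is reduced to a set of folder numbers already copied to tape.

```python
from dataclasses import dataclass, field

DEFAULT_FOLDER_MB = 512  # default SILO volume folder size from the text


@dataclass
class VolumeFolder:
    number: int
    size_mb: float = 0.0
    closed: bool = False


@dataclass
class DedupStore:
    limit_mb: float = DEFAULT_FOLDER_MB
    folders: list[VolumeFolder] = field(default_factory=lambda: [VolumeFolder(1)])

    def write_block(self, block_mb: float) -> None:
        """Add a block to the open folder; close it and open a new one at the size limit."""
        current = self.folders[-1]
        current.size_mb += block_mb
        if current.size_mb >= self.limit_mb:
            current.closed = True  # closed (full) folders become eligible for SILO
            self.folders.append(VolumeFolder(current.number + 1))

    def silo_candidates(self, already_on_tape: set[int]) -> list[int]:
        """Closed folders that the next SILO backup would copy to tape."""
        return [f.number for f in self.folders if f.closed and f.number not in already_on_tape]


if __name__ == "__main__":
    store = DedupStore()
    for _ in range(10_000):        # 10,000 blocks of 128 KB = 1,250 MB
        store.write_block(0.125)
    print(store.silo_candidates(set()))  # prints: [1, 2]
    print(store.silo_candidates({1}))    # prints: [2]
```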
How volume folders are moved to SILO Storage
When a storage policy is enabled for SILO storage, an On Demand Backup Set is created in the File System iDataAgent on the CommServe server. Each time a SILO operation runs, the On Demand Backup Set determines which volume folders have been marked full and backs them up to tape. Within the backup set, the Default Subclient is used to schedule SILO operations. To run a SILO operation manually, just as with an ordinary data protection operation, right-click the subclient and select Backup. The SILO backup is always a full backup and uses the On Demand Backup Set to determine which folders are copied to SILO storage.
SILO storage recovery process
In traditional recovery from tape, the tape is mounted in a drive and the data is recovered directly to the recovery location. With SILO to tape, the data must first be staged to disk before it can be recovered. Each volume folder that contains data blocks for the restore must be staged to the disk library for the recovery operation to complete. Because block level deduplication results in data referencing blocks stored in different locations, a single recovery operation may need multiple volume folders, which can slow restore performance.
SILO storage is intended as a compliance solution that stores data with long retention in deduplicated form. Recovering SILO data takes longer than recovering from traditional tape or disk storage, because it must be pre-staged to disk first. SILO storage is not meant for recovering data from last week, but rather for recovering data from last year or five years ago. Understanding this places SILO storage in proper perspective: the feature is for long term preservation of data, allowing point-in-time restores within a time period with considerably less storage than traditional tape storage methods.
How the Process works
Let’s assume we are using deduplication and SILO storage. Our primary storage policy copy has a retention of two years, and we choose to seal the deduplication store every quarter. We will have one active store and at least one cached store on disk, which means we can perform point-in-time recovery of data from disk for a period of six months. We will also use space management, with disk thresholds configured so that cached volumes are pruned when disk usage reaches 85% of capacity. If there is enough disk storage available, we might be able to keep 9 – 12 months of data on disk. Beyond that point, data must be pulled from the tape SILO.
We could define our SLA for up to 6 months to be 2 hours. From 6 months to 1 year the SLA will be 2-4 hours. Beyond that point the SLA will be 4+ hours.
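The example SLA tiers amount to a lookup on data age. This sketch encodes only the numbers in the scenario above (two-year retention, six months guaranteed on disk, up to 12 months possibly on disk); the function name and location labels are hypothetical.

```python
RETENTION_MONTHS = 24  # two-year retention on the primary copy in this example


def recovery_tier(age_months: float) -> tuple[str, str]:
    """Return (likely data location, example SLA) for data of a given age."""
    if age_months > RETENTION_MONTHS:
        raise ValueError("data is older than the retention period")
    if age_months <= 6:
        return "disk: active or cached store", "2 hours"
    if age_months <= 12:
        return "disk if not yet pruned, otherwise tape SILO", "2-4 hours"
    return "tape SILO", "4+ hours"


if __name__ == "__main__":
    for months in (3, 8, 18):
        print(months, *recovery_tier(months))
```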
The recovery process will work as follows:
• The CommVault administrator performs a browse operation to restore a folder from eight months ago.
• If the volume folders are still on disk, the recovery operation proceeds normally.
• If the volume folders are not on disk, the recovery operation goes into a waiting state.
• A SILO recovery operation starts, and all volume folders required for the restore are staged back to the disk library.
• Once all volume folders have been staged, the recovery operation runs.
• To ensure adequate space for SILO staging operations, a disk library mount path can optionally be dedicated to SILO restore operations. To do this, select the option Reserve space for SILO restores on the General tab of the Mount Path Properties.
• The procedure is straightforward: as long as SILO tapes are available, the recovery operation is fully automated and requires no special intervention by the CommVault administrator.
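The recovery sequence above can be expressed as a short sketch. Volume folders are represented by hypothetical integer IDs, and the function only returns the ordered steps; it does not model the SILO job itself or mount path space reservation.

```python
def silo_restore(needed: set[int], on_disk: set[int], on_tape: set[int]) -> list[str]:
    """Return the ordered steps for a restore that needs the given volume folders.

    on_disk is updated in place as folders are staged back from SILO tape.
    """
    steps = ["browse and submit restore"]
    missing = needed - on_disk
    if missing:
        unavailable = missing - on_tape
        if unavailable:
            raise LookupError(f"volume folders not on disk or SILO tape: {sorted(unavailable)}")
        steps.append("restore job waits")
        for folder in sorted(missing):
            steps.append(f"stage volume folder {folder} from SILO tape to disk library")
            on_disk.add(folder)
    steps.append("run restore")
    return steps


if __name__ == "__main__":
    disk = {1, 2}
    for step in silo_restore({1, 5, 7}, disk, on_tape={5, 6, 7}):
        print(step)
```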
Advanced Deduplication Configurations
Compression
It is recommended for most data types to enable compression during the deduplication process. Compression can be enabled in the storage policy primary copy or in the subclient properties. By default compression is enabled for a deduplication storage policy. You can turn compression off in the storage policy copy or you can override the use of compression in the subclient properties.
For application types such as Oracle and SQL, which may perform application level compression, use a dedicated deduplication storage policy with compression turned off. In some cases application compression can cause deduplication rates to suffer, so experiment with application compression and CommVault compression to determine which yields better deduplication ratios. For large databases, consult CommVault on best practices.
Client Side Disk Cache
Along with Client Side Deduplication, a Client Side Disk Cache can be configured. Each subclient has its own disk cache, which holds signatures for the data blocks related to that subclient. The default cache size is 4 GB. The Client Side Disk Cache is recommended for slow networks, such as WAN backups; on networks of 1 Gbps or faster, this option will not improve backup performance.
In this example, a signature is generated for a deduplication block and first compared against the local client disk cache. If the signature is not in the disk cache, it is sent to the MediaAgent and compared against the deduplication database. If the block does not exist, both the client disk cache and the deduplication database are updated, and the block is written to the library.
Variable Content Alignment
Variable Content Alignment can be used in some situations to improve deduplication ratios for large data files such as database dumps. Enabling this option will read block data and align the blocks to correspond to prior data blocks that have been deduplicated. By aligning the content prior to performing the hash process, better deduplication ratios may be attained. This will however require more processing power on the Client. Since Simpana deduplication is content aware, enabling this option will not provide better deduplication for average file data. This option is only recommended for large file system data such as database dumps or PST files with low incremental rates of change.
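The text does not describe how Simpana aligns content, so the sketch below substitutes a well-known technique, content-defined chunking with a gear rolling hash, to illustrate why alignment helps. Inserting one byte at the front of a file shifts every fixed-size block, while content-defined boundaries re-align within a few bytes. The table seed, the 12-bit mask and the 4 KB block size are arbitrary choices, and Simpana's actual mechanism may differ.

```python
import random

BOUNDARY_MASK = (1 << 12) - 1  # cut when the low 12 bits are zero: about 4 KB chunks
_table_rng = random.Random(2014)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # arbitrary per-byte values


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Fixed-size blocks: boundaries depend only on byte offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Cut chunks where a gear rolling hash of the most recent bytes matches a pattern.

    The low 12 bits of the hash depend only on the last 12 bytes, so boundaries
    follow the content and re-align after an insertion or deletion.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & BOUNDARY_MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def reused(old: list[bytes], new: list[bytes]) -> int:
    """How many chunks of the new version already exist in the old version."""
    known = set(old)
    return sum(1 for chunk in new if chunk in known)


if __name__ == "__main__":
    dump = random.Random(7).randbytes(256 * 1024)  # stand-in for a database dump
    edited = b"X" + dump  # one byte inserted at the front shifts every offset
    f_old, f_new = fixed_chunks(dump), fixed_chunks(edited)
    c_old, c_new = content_defined_chunks(dump), content_defined_chunks(edited)
    print("fixed:", reused(f_old, f_new), "of", len(f_new), "blocks reused")
    print("content-defined:", reused(c_old, c_new), "of", len(c_new), "chunks reused")
```

With random data, none of the fixed blocks survive the insertion, while all but the first one or two content-defined chunks are reused.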
Fragmentation Considerations
Since CommVault stores data in the disk library in chunks, when blocks are deleted from disk it causes empty spaces within the chunk. For Windows MediaAgents, the sparse file attribute is used to allow empty spaces within the chunk to be used to store new blocks. Since Windows uses a write next mechanism when writing data to disk, the empty spaces will only be allocated to new data when the disk starts to reach full capacity. If new data is written to the empty spaces, fragmentation could occur. This could negatively affect performance for auxiliary copy and restore operations. Scheduled fragmentation analysis operations can be configured for the disk library. This will analyze each mount path to determine the level of chunk fragmentation that exists. If fragmentation levels are too high, defragmentation operations can be run by using third party file level defrag tools. When performing defragmentation operations on a mount path, the mount path should be placed in an offline state.
Deduplication Best Practices
General Guidelines
• Carefully plan your environment before implementing deduplication policies.
• Factor current protection needs and future growth into your storage policy design, and scale your deduplication solution so the deduplication infrastructure can grow with your environment.
• Once a storage policy has been created, the option to use a global dedupe policy cannot be modified.
• When using encryption, use dedicated policies for encrypted data and other policies for non-encrypted data.
• Not all data should be deduplicated. Consider a non-deduplicated policy for certain data types.
• Store non-deduplicated data in a separate disk library. This ensures accurate deduplication statistics, which can assist in estimating future disk requirements.
Deduplication Database
• Ensure there is adequate disk space for the deduplication database.
• Use dedicated dedupe databases with local disk access on each MediaAgent.
• Use high speed SCSI disks in a RAID 0, 5, 10, or 50 configuration.
• Ensure the deduplication database is properly protected.
• Do NOT back up the deduplication database to the same location where the active database resides.
Disk Library Considerations
• It is recommended to use dedicated disk libraries for each MediaAgent.
• If using a shared disk library with multiple MediaAgents, use NAS disk storage as opposed to SAN.
• Disk libraries should be divided into 2-4 TB mount paths.
• Use network paths as opposed to drive letters. Drive letters will limit the total number of mount paths that can be added.
GridStor Technology Considerations
• For backup and restore performance in large environments, GridStor Round Robin load balancing is not recommended.
• If you choose to use the GridStor feature for data protection resiliency, configure GridStor in a shared disk library configuration to Failover as opposed to Round Robin.
• Do NOT use the GridStor Round Robin option when using a shared disk library in a SAN environment.
Deduplication Store
• Only seal deduplication stores when databases grow too large or when using SILO storage.
• When using SILO storage, consider sealing stores at specific time intervals (for example, monthly or quarterly) to consolidate each time period onto tape media.
• For WAN backups, you can seed active stores to reduce the data blocks that must be retransmitted when a store is sealed. Select the option Use Store Priming option with Source-Side Deduplication to seed new active stores with data blocks from sealed stores.
Block Size & Block Processing
• Use the recommended 128 KB block size for all object level and virtual machine data protection jobs.
• For large databases, use a block size of 256 KB or higher. For very large databases, consult Professional Services on the best approach to data protection.
• Use compression for object level and virtual machine data protection jobs.
• For database applications that perform their own compression, do NOT use CommVault compression.
• Use the Variable Content Alignment option when backing up large database dump files using the Simpana File System iDataAgent.
Performance
• Use DASH Full backup operations to greatly increase performance for full data protection operations.
• Use DASH Copy for auxiliary copy jobs to greatly increase auxiliary copy performance.
• Ensure the deduplication database is on high speed SCSI disks.
• Ensure MediaAgents hosting a dedupe database have enough memory (at least 32 GB).
Global Deduplication
• Global deduplication is not a be-all and end-all solution and should not be used all the time.
• Consider using global dedupe policies as a base for other object level policy copies. This provides greater flexibility in defining retention policies when protecting object data.
• Use global deduplication storage policies to consolidate remote office backup data in one location.
• Use this feature when like data types (file data and/or virtual machine data) need to be managed by different storage policies but in the same disk library.
SILO storage
• SILO storage is for long term data preservation and not short term disaster recovery.
• Recovery time will be longer if data is in tape SILO so for short term fast data recovery use
Designing a Sound Data Protection Strategy
PLANNING A SOUND DATA PROTECTION STRATEGY