NetBackup enterprise lifecycle best practices

SAN-related problems generally involve the use of Shared Storage Option (SSO). The two types of NetBackup users generally are as follows:

■ Operators who have limited access to hosts and to the fabric of the SAN

■ System administrators who have administrator privileges, but no access to the fabric

The SAN administrator generally operates outside the NetBackup domain entirely. Troubleshooting NetBackup is difficult when it involves the SAN because administrative responsibility tends to be spread out. No one person has a clear picture of the overall backup structure.

CommandCentral Storage provides a consistent view of the entire SAN against which to measure performance. It gives NetBackup administrators the data they need to request changes from, and collaborate with, the SAN administrators. It helps NetBackup administrators when they design, configure, implement, or modify solutions in response to changes in backup environments (hardware, applications, demand).

CommandCentral Storage can help those responsible for managing a backup system in a SAN environment by integrating SAN management and backup operation information.

93 Troubleshooting procedures

CommandCentral Storage can provide support during the following backup lifecycle stages:

■ Design

Use CommandCentral Storage during the design phase to determine the following:

■ Where to deploy a backup system on the SAN

■ If SAN redesign is required to meet backup windows at minimum hardware cost and application impact

For example, a backup design may not require the purchase of additional switches if it takes into account the performance trending reports that CommandCentral Storage keeps to determine the pattern of fabric utilization.

Alternatively, re-zoning the fabric through CommandCentral Storage may provide sufficient bandwidth to meet backup window requirements. In addition, CommandCentral Storage can provide visibility into recovery designs and fabric performance in the event of large restores that critical business operations require.

■ Configuration, testing

Generally, backup systems are tested before implementation to obtain benchmarks and adjust (tune) the system for maximum efficiency.

CommandCentral Storage can provide the performance metrics for end-to-end I/O capabilities for all elements in the backup path. Additionally, CommandCentral Storage can provide valuable environmental information for qualifying the backup environment, as well as a baseline for future troubleshooting and configuration management.

■ Implementation, reconfiguration, production

CommandCentral Storage can help to determine whether a host can see through the entire I/O path to the target backup device by pinpointing connectivity issues.

Using CommandCentral Storage to troubleshoot NetBackup in a SAN environment

CommandCentral Storage provides centralized visibility and control across physical and virtual heterogeneous storage environments. It helps you optimize your data center by providing you a single view of the full storage stack from application to spindle. By enabling storage capacity management, centralized monitoring, and mapping, CommandCentral Storage software helps improve storage utilization, optimizes resources, increases data availability, and reduces capital and operational costs.

You can use CommandCentral Storage in the following ways to troubleshoot NetBackup in a SAN environment:

■ In-context launch

The ability to launch CommandCentral Storage and access an overview of the SAN from NetBackup in context is valuable for quickly identifying root problems. Also, because NetBackup and SAN administrators are often in different groups, you can avoid the fragmented operations that lead to resolution delays. With CommandCentral Storage, the NetBackup administrator has a view of the overall health of the SAN as part of the initial troubleshooting process.

■ Connectivity and device check

The CommandCentral Storage view of the SAN environment can help you detect any failure in the topology. An environment inventory provides valuable information for the support process.

■ General troubleshooting tools

To investigate a backup failure:

■ Launch CommandCentral Storage in context from NetBackup to check fabric health.

■ Check reports for fabric events that occurred around the time NetBackup generated the error log.
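
Checking reports for fabric events that occurred around the time of a NetBackup error amounts to a time-window comparison. The following is a minimal sketch of that idea only. It assumes the failure time and the fabric events have already been exported by hand into Python values; the function name, the 15-minute window, and the sample events are all hypothetical and do not come from either product.

```python
from datetime import datetime, timedelta


def events_near(failure_time, events, window_minutes=15):
    """Return the (timestamp, message) events within the window around failure_time."""
    window = timedelta(minutes=window_minutes)
    return [event for event in events if abs(event[0] - failure_time) <= window]


# Hypothetical sample data: one job failure and a few fabric events.
failure = datetime(2024, 1, 10, 2, 30)
fabric_events = [
    (datetime(2024, 1, 10, 1, 0), "switch01 port 4 offline"),
    (datetime(2024, 1, 10, 2, 25), "switch02 port 7 link down"),
    (datetime(2024, 1, 10, 2, 40), "switch02 port 7 link up"),
    (datetime(2024, 1, 10, 5, 0), "zone configuration changed"),
]

suspects = events_near(failure, fabric_events)
for when, message in suspects:
    print(when.isoformat(), message)
```

Events that fall inside the window are only candidates for the cause; a wider or narrower window may suit environments where clocks on the hosts and switches are not closely synchronized.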

The following use cases demonstrate how CommandCentral Storage can be integrated into a NetBackup troubleshooting procedure to investigate the SAN context of a backup system. Most common NetBackup problems on SANs are associated with connectivity issues.

Table 2-9 Troubleshooting NetBackup using CommandCentral Storage

Symptom: Cannot access drives or robots. Backup jobs fail.

Troubleshooting:

This problem represents a loss of connectivity and typically generates status code 213 (no storage units available for use). NetBackup freezes tapes with two write failures even when SAN problems cause the failures.

Do the following in the order listed:

■ In the NetBackup Administration Console, check the Device Monitor for a device that is down. If a device is down, try to bring it back up.

■ If the drive is still down, check the syslog, device logs, and NetBackup logs for status 219 (the required storage unit is unavailable) and 213 (no storage units available for use) on the media server. Check the NetBackup logs for status codes 83, 84, 85, or 86. These codes relate to open, write, read, and position failures when accessing the drive.

■ Try a robtest to check connectivity. If no connectivity exists, the likely problem is with hardware.

■ From the master server, select the robot or device that the storage unit is associated with.

■ Launch CommandCentral Storage for a view of the media server and devices. Check the fabric connectivity (whether any I/O path devices are down).
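
The log check in the steps above can be partly automated. The sketch below scans text lines for the status codes named in this procedure. The log line layout and the `find_status_codes` helper are hypothetical, since real NetBackup log formats vary by process and release. The short meanings for codes 83 through 86 are given as commonly documented and should be confirmed against the status code reference for your release.

```python
import re

# Status codes named in this procedure, with short meanings.
STATUS_MEANINGS = {
    83: "media open error",
    84: "media write error",
    85: "media read error",
    86: "media position error",
    213: "no storage units available for use",
    219: "the required storage unit is unavailable",
}

# Assumed pattern: the word "status", optionally followed by "code", then a number.
STATUS_PATTERN = re.compile(r"\bstatus(?: code)?[ :=]+(\d+)\b", re.IGNORECASE)


def find_status_codes(lines):
    """Return (line_number, code, meaning) for each status code of interest."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in STATUS_PATTERN.finditer(line):
            code = int(match.group(1))
            if code in STATUS_MEANINGS:
                hits.append((number, code, STATUS_MEANINGS[code]))
    return hits


# Hypothetical log excerpt, not real NetBackup output.
sample_log = [
    "02:30:11 bptm: write failed, exit status 84",
    "02:30:12 job 1234 ended with status 0",
    "02:31:40 nbjm: job 1235 status code: 213",
]

hits = find_status_codes(sample_log)
for line_number, code, meaning in hits:
    print(f"line {line_number}: status {code} ({meaning})")
```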

Symptom: After you run the Device Configuration Wizard, the new device does not appear in the discovered devices list.

Troubleshooting:

CommandCentral Storage topology is a good visual tool to check connectivity between the hosts and the devices. Use it to find a dislodged network cable or other hardware problems.

You may not be able to discover a drive or robot when you configure off-host backups, which require the media server to detect all devices involved in the backup: disk array, disk cache, data mover, library, and drive. Connectivity must be correct. In addition, the bptpcinfo command in the NetBackup Snapshot Client generates a 3pc.conf configuration file for running the backup. The WWN (world wide name) for some devices is often incorrect. Use CommandCentral Storage to verify that the contents of the 3pc.conf file correlate to the actual fabric configuration.

For a description of an off-host backup, the bptpcinfo command, and the 3pc.conf file, refer to the NetBackup Snapshot Client Configuration document.

Do the following in the order listed:

■ Run the device discovery again. If you still do not detect the new device, the likely problem is with hardware.

■ Launch CommandCentral Storage. If the new device does not appear in the CommandCentral Storage topology, check the SAN hardware connections to determine if the device is connected. If the new device shows up as disconnected or offline, contact the SAN administrator and check switch configuration.

Compare this troubleshooting procedure to a similar problem without the benefit of CommandCentral Storage, such as status code 214: robot number does not exist.

■ Rerun the Device Configuration Wizard.
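
Verifying that the WWNs recorded in the 3pc.conf file match the fabric is a set comparison once both lists are in hand. The sketch below shows only that comparison. It does not parse 3pc.conf or query CommandCentral Storage; it assumes you have copied the WWNs from each source by hand, and that they are 64-bit WWNs written as 16 hexadecimal digits with optional separators. The function names and the sample values are hypothetical.

```python
import re


def normalize_wwn(raw):
    """Reduce a WWN to 16 lowercase hex digits, dropping separators and a 0x prefix."""
    text = raw.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    digits = re.sub(r"[:\-\s]", "", text)
    if not re.fullmatch(r"[0-9a-f]{16}", digits):
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return digits


def compare_wwns(configured, observed):
    """Return (configured but not observed, observed but not configured)."""
    conf = {normalize_wwn(wwn) for wwn in configured}
    seen = {normalize_wwn(wwn) for wwn in observed}
    return sorted(conf - seen), sorted(seen - conf)


# Hypothetical values: WWNs copied from 3pc.conf and from the fabric view.
configured = ["50:06:0B:00:00:12:34:56", "500104f000abcdef"]
observed = ["50060b0000123456", "50:01:04:f0:00:ab:cd:e0"]

missing, unexpected = compare_wwns(configured, observed)
print("In 3pc.conf but not on the fabric:", missing)
print("On the fabric but not in 3pc.conf:", unexpected)
```

A WWN that appears only in the first list is a candidate for an incorrect entry in the configuration file; one that appears only in the second list may be a device the media server has not detected.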

Symptom: The backup job fails intermittently and the drive is down intermittently. No errors appear in the error log other than that the job failed.

Troubleshooting:

Sometimes a problem with a switch or bridge either before or during the backup job causes the job to fail and take down the drive. This problem is very difficult to diagnose. By the time the NetBackup administrator checks the SAN, everything may be fine again. Another possibility is that another application reserved the device. A SCSI device monitoring utility is required to resolve this issue, which neither CommandCentral Storage nor NetBackup currently supplies.

Do the following in the order listed:

■ Select a drive inside the NetBackup Device Monitor. Launch CommandCentral Storage in the drive context to see if the drive is connected to the SAN.

■ Check for alerts around the time of the job failure and see if a SAN problem could have caused the job to fail.

Using NetBackup utilities