MN-EHADMPM-001 October 2006
eHealth Element and Poller Management Guide
This documentation (the "Documentation") and related computer software program (the "Software") (hereinafter collectively referred to as the "Product") is for the end user's informational purposes only and is subject to change or withdrawal by CA at any time.
This Product may not be copied, transferred, reproduced, disclosed, modified or duplicated, in whole or in part, without the prior written consent of CA. This Product is confidential and proprietary information of CA and protected by the copyright laws of the United States and international treaties.
Notwithstanding the foregoing, licensed users may print a reasonable number of copies of the Documentation for their own internal use, and may make one copy of the Software as reasonably required for back-up and disaster recovery purposes, provided that all CA copyright notices and legends are affixed to each reproduced copy. Only authorized employees, consultants, or agents of the user who are bound by the provisions of the license for the Software are permitted to have access to such copies.
The right to print copies of the Documentation and to make a copy of the Software is limited to the period during which the license for the Product remains in full force and effect. Should the license terminate for any reason, it shall be the user's responsibility to certify in writing to CA that all copies and partial copies of the Product have been returned to CA or destroyed.
EXCEPT AS OTHERWISE STATED IN THE APPLICABLE LICENSE AGREEMENT, TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS PRODUCT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. IN NO EVENT WILL CA BE LIABLE TO THE END USER OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS PRODUCT, INCLUDING WITHOUT LIMITATION, LOST PROFITS, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF CA IS EXPRESSLY ADVISED OF SUCH LOSS OR DAMAGE.
The use of this Product and any product referenced in the Documentation is governed by the end user's applicable license agreement. The manufacturer of this Product is CA.
This Product is provided with "Restricted Rights." Use, duplication or disclosure by the United States Government is subject to the restrictions set forth in FAR Sections 12.212, 52.227-14, and 52.227-19(c)(1) - (2) and DFARS Section 252.227-7013(c)(1)(ii), as applicable, or their successors.
All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies. Copyright © 2006 CA. All rights reserved.
Table of Contents
Preface
  Audience
  About This Guide
  Reading Path
  Revision Information
  Documentation Conventions
  Technical Support
Managing the Data Collection Process
  Understanding the Polling Process: Important Concepts
  After You Save Discover Results
  Controlling the Polling Rates and Intervals
  Stopping and Starting the Poller
  Changing the Statistics Poll Rates
  Changing the Conversation Poll Rate
  Viewing the Poller Configuration
  Using OneClickEH
  Using the eHealth Console
  Managing Poller License Consumption
  Freeing Licenses
  Using DCI to Modify Element Information
  Your Resource Management Roadmap
Resolving Polling Errors
  OneClickEH Status Summary
  Understanding the Cause of Common Polling Errors
  Identifying Polling Problems
  Investigating the Problems
Modifying Your Element Configuration
  Updating Element Properties
  Agent Type
  SNMP Index
  Interface Speed
  Element Values (Discovered Information)
  Specifying User Strings to Use as a Filter in OneClickEH
  Making Element Names More Intuitive
  Element Names
  Aliases
  Excluding an Element from Live Exceptions Monitoring
  Recording Statistics Data for an Element
  Tracking Changes to the Poller Configuration
  Monitoring Administration Changes
Adding New Elements to Your Configuration
  Adding a Statistics Element
  Creating a Modem Pool Element
  Adding a Permanent Virtual Circuit Element for a Frame Relay Element
  Managing Alternate Latency
  Understanding Alternate Latency Collection
  Configuring the Alternate Latency Ping Process
  Disabling Alternate Latency Data Collection
Organizing Your Elements by Grouping
  The Purpose of Grouping
  Creating Groups and Group Lists
  Controlling Access to Groups and Group Lists
  Editing and Copying Groups and Group Lists
  Deleting Groups and Group Lists of Elements
  Focusing eHealth Console Administration on One Group
Troubleshooting Common Problems
  eHealth Misses a Scheduled Poll
  Unable to Report on Router Interface Elements
  eHealth Reports the Speed of Elements as Zero
  Cannot Save Data for an Element in the Database
This guide describes the eHealth element and poller management process—collecting, modifying, and maintaining data on the health of the resources that eHealth is monitoring. Element and poller management is one of several primary tasks that an eHealth administrator performs. This guide supports eHealth Release 6.0 and later.
Audience
This guide is intended for anyone who must perform element and poller management or is responsible for managing one or more aspects of this administrative function. Before you use this guide, you should become familiar with network terminology, general eHealth concepts, and the resource discovery process.
About This Guide
This section describes the reading path that you should follow, as well as the revision history of this guide. It also includes the documentation conventions used in this guide.
Reading Path
Prior to reading this guide, you should review the Introduction to eHealth guide and the eHealth Administration Overview Guide. You can refer to the eHealth Resource Discovery Guide for detailed information about the process that eHealth uses to find the resources to monitor. These guides are available in PDF format in the eHealth Web Help and on the Support web site.
Documentation Conventions
Table 1 lists the conventions used in this document.
Table 1. Documentation Conventions
Convention | Description
File or Directory Name | Text that refers to file or directory names.
code | Text that refers to system, code, or operating system command lines.
emphasis | Text that refers to guide titles or text that is emphasized.
enter | Text that you must type exactly as shown.
Name | Text that refers to menus, fields in dialogs, or keyboard keys.
New Term | Text that refers to a new term, that is, one that is being introduced.
Variable | Text that refers to variable values that you substitute.
→ | A sequence of menus or menu options. For example, File→Exit means “Choose Exit from the File menu.”
NOTE | Important information, tips, or other noteworthy details.
CAUTION | Information that helps you avoid data corruption or system
Technical Support
If you have a software maintenance contract, you can obtain assistance with eHealth. For online technical assistance and a complete list of primary service hours and telephone numbers, contact Technical Support at http://support.concord.com.
If you have a Support Contract ID and password, you can access our Support Express knowledgebase at the following URL: http://search.support.concord.com.
Managing the Data Collection Process
Managing the element data that eHealth collects on your resources is a critical administration task. Your infrastructure changes over time, so it is important to proactively maintain your element configuration to ensure that eHealth can continue to collect data on the health of your resources. This chapter begins by describing the important concepts associated with managing the data collection process (the polling process) and explains how to control the collection rate. It then provides procedures for viewing your element data and managing your poller license consumption.
Understanding the Polling Process: Important Concepts
To monitor and manage the performance of networks, systems, and applications, the eHealth software locates resources, or elements, within your infrastructure through a process referred to as discovery. To find the resources, eHealth queries the Simple Network Management Protocol (SNMP) agents at the IP addresses that you specify. It then obtains information from the management information base (MIB) of each device and creates elements based on that data.
After You Save Discover Results
When you save the discover process results, eHealth stores the element information in its database and its poller configuration. The eHealth poller automatically collects performance and availability statistics data from the network, system, and application elements in the eHealth poller configuration through a process referred to as polling. The poller configuration defines the information that is specific to each element, such as its name, any configuration information that eHealth obtained during discovery, the polling rate (how often eHealth polls the element), and the agent type (the type of element that eHealth discovered).
While discover can keep the majority of your poller configuration up-to-date automatically, the discover process works most effectively when you maintain your configuration by correcting polling problems, deleting resources that have been removed from the infrastructure, and adding new ones. When resources change so drastically that discover cannot match any of their attributes to existing elements, you need to manually update the element information to prevent or resolve discover errors and duplicate elements. If you do not actively manage and maintain your poller configuration, eHealth will not be able to successfully collect data on your resources.
Managing Discovered Elements
eHealth provides two administrative interfaces that you can use to manage the poller and your elements: the eHealth console and the OneClick for eHealth console (OneClickEH). Not all functions are available through both interfaces; some are only available in one. This document provides
Controlling the Polling Rates and Intervals
The eHealth poller runs continuously to collect data from elements at regular intervals. You can stop the poller to troubleshoot problems, and you can control how it runs. If you do not want to collect data all day or every day of the week, you can configure the poller to run only during certain hours of the day and also control the
Stopping and Starting the Poller
Using the Poller Controls option in the Setup menu of the eHealth console, you can turn the poller on or off for all of your resources, set it to run continuously, or set it to run between two specific times of the day. You can also control the amount of data that you collect by customizing the polling interval (how often eHealth collects statistics data and conversation data). When you turn off polling, you turn off the statistics, import, and conversation pollers, so eHealth does not poll any elements. As a result, reports show a gap for the periods when the poller is off.
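To illustrate the option to run the poller between two specific times of the day, the following Python sketch checks whether a given time of day falls inside a polling window, including a window that spans midnight. The function name `poller_active` and the treatment of overnight windows are assumptions for illustration; this is not how eHealth implements the Poller Controls option.

```python
from datetime import time


def poller_active(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the polling window [start, end).

    Hypothetical helper: a window whose end is earlier than its start
    (for example, 22:00 to 06:00) is treated as spanning midnight. With
    this rule, equal start and end times mean the window covers the whole day.
    """
    if start < end:
        # Same-day window, such as 08:00 to 18:00.
        return start <= now < end
    # Window wraps past midnight, or covers the whole day when start == end.
    return now >= start or now < end


# Usage: an overnight window from 22:00 to 06:00.
print(poller_active(time(23, 30), time(22, 0), time(6, 0)))  # True
print(poller_active(time(12, 0), time(22, 0), time(6, 0)))   # False
```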
Changing the Statistics Poll Rates
By default, eHealth polls all elements at the Normal rate (every 5 minutes). With the exception of alternate latency elements, the statistics poller can also poll statistics elements at two other rates: Slow (every 30 minutes) and Fast (every minute). When eHealth polls an element at the Slow or Normal rates, it saves the data in the database during each poll. For those elements that are polling at the Fast rate, eHealth collects one-minute samples and saves the aggregate 5-minute sample (the normal poll interval) as a single sample in the database.
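The Fast-rate behavior can be sketched as a simple reduction: each group of consecutive Fast samples that fits in one Normal interval becomes a single stored sample. This sketch assumes that the aggregate is an arithmetic mean of per-poll values (the guide also describes it as a 5-minute average sample) and simply drops an incomplete trailing group; `aggregate_fast_samples` is a hypothetical name, not an eHealth API.

```python
from statistics import mean


def aggregate_fast_samples(samples, per_normal=5):
    """Collapse Fast-rate samples into one averaged sample per Normal interval.

    samples: per-poll values (for example, percent utilization) in time order.
    per_normal: Fast polls per Normal interval; 5 for a 1-minute Fast rate
    and a 5-minute Normal rate, 10 for a 30-second Fast rate.
    An incomplete group at the end is dropped in this sketch.
    """
    usable = len(samples) - len(samples) % per_normal
    return [mean(samples[i:i + per_normal]) for i in range(0, usable, per_normal)]


# Ten 1-minute samples become two stored 5-minute samples.
print(aggregate_fast_samples([10, 20, 30, 40, 50, 5, 5, 5, 5, 5]))
```

Here the first five samples average to 30 and the last five to 5, so two samples reach the database instead of ten.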
For individual elements, you can also set a fourth rate, Fast Store, to collect data with a higher granularity. When you poll elements at the Fast Store rate, eHealth collects data at the Fast rate and stores the samples in the database without aggregating them. For example, if you have set the Fast poll interval to 30 seconds (using the Poller Controls dialog), eHealth collects 30-second samples for those elements and saves the individual samples every 5 minutes (at the Normal poll interval) without aggregating the data.
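Because Fast Store keeps every Fast sample, it multiplies the number of samples stored per element. The arithmetic below uses the default intervals from this section (Slow 30 minutes, Normal 5 minutes, Fast 1 minute). The helper name is hypothetical, and it counts samples only, not the actual size of a database row.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def stored_samples_per_day(rate, normal_s=300, slow_s=1800, fast_s=60):
    """Samples stored per element per day for a given poll rate (sketch)."""
    if rate == "Slow":
        return SECONDS_PER_DAY // slow_s
    if rate in ("Normal", "Fast"):
        # Fast samples are aggregated into one sample per Normal interval.
        return SECONDS_PER_DAY // normal_s
    if rate == "Fast Store":
        # Every Fast sample is kept without aggregation.
        return SECONDS_PER_DAY // fast_s
    raise ValueError(f"unknown poll rate: {rate}")


for rate in ("Slow", "Normal", "Fast", "Fast Store"):
    print(rate, stored_samples_per_day(rate))
# A 30-second Fast interval doubles the Fast Store figure.
print("Fast Store (30 s)", stored_samples_per_day("Fast Store", fast_s=30))
```

With the defaults, this prints 48, 288, 288, and 1440 samples per day, and 2880 for Fast Store with a 30-second Fast interval.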
To set the poll rate for individual elements, use the Edit Element window in OneClickEH. To set global controls that impact all elements that you are polling, use the Poller Controls dialog in the eHealth console.
Modifying the Polling Interval. You can use the Poller Controls dialog to change the Fast rate to 30 seconds or 2.5 minutes, the Normal rate to 10, 15, or 30 minutes, and the Slow rate to 60 minutes. Increasing the Normal and Slow polling intervals slows the growth of your database, but the data in the reports has less granularity. As a result, you might not observe peaks because eHealth averages them over a longer sample time. If poll intervals are too long or you have a complex polling environment, devices may lose or discard data before eHealth can collect it. As your eHealth element configuration grows, you may need to tune your statistics poller so that it operates more efficiently, reduces the SNMP polling impact on shared system resources, and allows the eHealth installation to maximize its element configuration size. For instructions, refer to the Tuning the eHealth Statistics Poller White Paper, which is available on the Support web site. For additional guidelines on changing the polling interval, refer to “Guidelines for Changing to the Fast and Fast Store Rates.”
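The loss of peaks follows from how a longer interval is computed: a single counter delta over 30 minutes yields the same rate as the average of the six 5-minute rates it spans. The short Python sketch below uses made-up utilization values to show a burst that is obvious at 5 minutes but diluted at 30 minutes; the `resample` helper is hypothetical.

```python
from statistics import mean


def resample(samples, factor):
    """Average each run of `factor` consecutive samples into one longer sample."""
    return [mean(samples[i:i + factor])
            for i in range(0, len(samples) - factor + 1, factor)]


# Hypothetical 5-minute utilization samples (percent) with one short burst.
five_minute = [10, 10, 70, 10, 10, 10]
thirty_minute = resample(five_minute, 6)

print(max(five_minute))    # peak as seen at a 5-minute interval
print(max(thirty_minute))  # the same burst averaged over 30 minutes
```

The first line shows the 70% burst; the second shows a value of 20, which is all a 30-minute report would reveal.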
eHealth Element and Poller Management Guide
Changing the Conversation Poll Rate
The default polling interval for conversation data (data collected by Traffic Accountant) is 30 minutes, but you can set it to 15, 45, or 60 minutes. This interval is longer than the statistics polling interval because eHealth polls each Traffic Accountant probe for data on every conversation that the probe detected, which can produce a very large amount of data at each poll.
The amount of memory varies from probe to probe, and it determines how much conversation data eHealth can collect from the probe. Use a polling interval that allows you to retrieve data from the probe before it resets its counters or drops data. The number of elements in your database and the amount of disk space available for the database might also require you to use a polling interval that is longer than the default.
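One way to reason about the conversation interval is to compare how many conversations a probe accumulates per interval with how many it can hold. The sketch below picks the longest of the allowed intervals (15, 30, 45, or 60 minutes) that stays under a safety margin of the probe's capacity. The probe capacity, conversation rate, and 80% headroom are assumptions that you would estimate for your own probes; eHealth does not expose them under these names.

```python
ALLOWED_INTERVALS_MIN = (15, 30, 45, 60)  # conversation poll intervals from the text


def longest_safe_interval(probe_capacity, conversations_per_min, headroom=0.8):
    """Longest allowed interval (minutes) whose expected conversation count
    stays within `headroom` of the probe's capacity, or None if none fits.

    probe_capacity: conversation records the probe can hold (estimated).
    conversations_per_min: new conversations the probe sees per minute (estimated).
    """
    safe = [minutes for minutes in ALLOWED_INTERVALS_MIN
            if conversations_per_min * minutes <= headroom * probe_capacity]
    return max(safe) if safe else None


print(longest_safe_interval(10_000, 200))  # 30
```

Preferring the longest safe interval reflects the point above that database size and disk space can push you toward longer intervals, while probe memory sets the upper limit. If the function returns None, even the shortest interval risks losing data.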
Viewing the Poller Configuration
To view all elements that you are monitoring and their associated configuration information, you can use the eHealth console or OneClickEH. Each interface provides unique capabilities.
Using OneClickEH
To view and modify the poller configuration through OneClickEH, click Find Elements in the Managed Resources folder and select the Element Chooser tab.
Filtering the Table Display
By default, OneClickEH immediately shows all of your elements in the table, and when you specify matching criteria in the Filter table by field at the top of the console screen, it filters the list as you type. If you have a very large configuration, OneClickEH can take several minutes to show the entire list. To filter your element list before OneClickEH displays the entire element configuration, you can disable one or both of these behaviors.
To specify matching criteria before OneClickEH displays all elements:
1. Select Tools→Options→Advanced at the top of the console to display the Advanced Settings dialog.
2. Deselect Show all elements immediately when displaying element tables.
3. Optionally, if you want to finish specifying all matching criteria before OneClickEH filters the list, also deselect Filter element tables as you specify the filtering criteria.
4. Click OK to close the Advanced Settings dialog.
5. Select the Element Chooser tab.
6. Specify matching criteria in the Filter table by field, and then click Go.
Searching for Elements
Additionally, you can search for elements by a specific name, alias, or IP address. On the Element Chooser page, you can select Name or Alias from the list and specify a portion of the element’s name using wildcards (for example, *boston*). You can also specify the digits of one or more IP address octets in the IP Address field. You can use the * symbol as a wildcard to match any value of an octet (0-255), but you cannot use the wildcard to match a single digit within an octet. All of the sample formats shown in the following table are valid:
Type | Examples
Range | 111.*   111.*.11.*   *.11   111.11.23-30.*   111.11.*.50-100   111.11.11.*-30
List | 111.11.13,15,17.*   20,27,32.11.*.11
You can also exclude the element’s subcomponents from your search. After OneClickEH finds the elements that match your search criteria, it displays their properties in the element table. In each view, you can sort any column to quickly find elements of interest, and you can move, reorder, and resize the columns. If you want to save the data, you can copy, export, or print it.
Using the eHealth Console
To view and modify the poller configuration through the eHealth console, select Setup→Poller Configuration, as shown in Figure 1. By default, the dialog displays all elements in your configuration sorted alphabetically by name. Using various options, you can reorder and filter the list to show specific elements and manage all of your element types. If you enable the global setting in the Options dialog to display alias names, the poller configuration shows aliases in the list. To find an element, enter a string in the Search for Name field. You can use wildcards, such as an asterisk (*) to match zero or more characters or a question mark (?) to match any single character. If you enter a string without any wildcards, the filter displays the elements that contain that string anywhere in the name. If the Search for Name field is empty, the filter displays all elements.
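The matching rules described above can be made concrete with a small Python sketch. `name_matches` follows the eHealth console rules for the Search for Name field (* for zero or more characters, ? for one character, a plain string matched anywhere in the name), and `ip_matches` handles four-octet IP Address patterns built from *, single numbers, ranges (23-30), and lists (13,15,17). Case-sensitive matching and anchoring wildcard patterns to the whole name are assumptions. The shorter forms such as 111.* and *.11, and the *-30 range form, are not modeled because the guide does not spell out their exact semantics. Both function names are hypothetical.

```python
import re


def name_matches(pattern: str, name: str) -> bool:
    """Console-style name search (case-sensitive in this sketch)."""
    if not pattern:
        return True  # an empty Search for Name field shows all elements
    if "*" not in pattern and "?" not in pattern:
        return pattern in name  # no wildcards: match anywhere in the name
    # Translate wildcards to a regular expression, escaping everything else.
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, name) is not None


def octet_matches(spec: str, value: int) -> bool:
    """Match one octet against '*', 'N', 'A-B', or 'A,B,C'."""
    if spec == "*":
        return 0 <= value <= 255
    if "," in spec:
        return value in {int(part) for part in spec.split(",")}
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-"))
        return low <= value <= high
    return value == int(spec)


def ip_matches(pattern: str, address: str) -> bool:
    """Match a four-octet pattern such as '111.11.23-30.*' against an address."""
    specs = pattern.split(".")
    octets = [int(part) for part in address.split(".")]
    if len(specs) != 4 or len(octets) != 4:
        raise ValueError("this sketch handles four-octet patterns only")
    return all(octet_matches(s, o) for s, o in zip(specs, octets))


print(name_matches("*boston*", "rtr-boston-01"))      # True
print(ip_matches("111.11.23-30.*", "111.11.25.9"))     # True
print(ip_matches("20,27,32.11.*.11", "21.11.200.11"))  # False
```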
Saving the Poller Configuration Information
To manage the poller configuration, you can save some or all of the information that appears in the Poller Configuration dialog to a file. The resulting ASCII file can be useful for inventory checking and for reviewing the elements that you are monitoring; for example, you could use it to devise a list of alias names. The Save List To File option (located in the lower-right corner of the dialog) saves all configuration data that is currently displayed in the dialog to an ASCII file named poller.cfg.log in the /ehealth/log directory. If the file already exists, eHealth overwrites it. To output only a list of the element names in your poller configuration, run the nhListElements command.
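For inventory checking, two saved element lists can be compared to see which elements were added or removed between runs. This sketch assumes that each list was saved as a text file with one element name per line (for example, by redirecting nhListElements output to a file). That format is an assumption, so check the actual output and adjust `load_names` if needed. Both function names are hypothetical.

```python
from pathlib import Path


def load_names(path):
    """Read element names from a text file, one per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def compare_inventories(before, after):
    """Return (added, removed) element names as sorted lists."""
    return sorted(after - before), sorted(before - after)


# Usage with in-memory sets; in practice, pass load_names(...) for each saved file.
added, removed = compare_inventories({"rtr1-if1", "srv2"}, {"srv2", "srv3"})
print(added, removed)  # ['srv3'] ['rtr1-if1']
```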
Managing Poller License Consumption
You can discover any number of elements; however, eHealth polls only elements that have a poller license. eHealth uses poller licenses to control the number of elements that you can poll. Although most elements consume only one license, it is important to track the number of licenses that eHealth requires for all of the elements in your poller configuration.
You can manage your poller license consumption from the OneClickEH Status Summary window, which appears immediately after you log in to your eHealth system. From that window, you can identify the number of poller licenses that are available for use, the total number of licenses that you have, and the total number that you need to poll all of your elements. For instructions on purchasing additional licenses, refer to the Technical Support web site.
Freeing Licenses
If you do not need to monitor all of your elements, you can free some licenses by turning off polling for them or deleting them. When you turn off polling for an element, it remains in the database, but it does not require a license because eHealth is no longer collecting data for it. These procedures are discussed in detail in “Disabling Polling” on page 17.
In some cases, two elements may share a license. To identify those elements that are the primary consumers of a license, and those that share a license with another element, use the nhListElementLicenses command. To free a license for use by another element, you need to delete or disable all primary elements that share the same license. The nhListElementLicenses command also identifies those elements that do not need a license and those that are not consuming a license.
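The effect of license sharing on the counts in the Status Summary window can be sketched with a simple model: each licensed element maps to a license ID, elements that share a license map to the same ID, and a license stays consumed while any element that uses it is still polled. This is a simplifying assumption for illustration (in practice, nhListElementLicenses reports which elements are primary consumers and which share a license); the element names, license IDs, and functions are hypothetical.

```python
def licenses_in_use(assignments, polled):
    """Count distinct licenses held by at least one polled element.

    assignments: element name -> license ID; elements that need no license
    are simply omitted.
    polled: set of element names that are currently being polled.
    """
    return len({lic for elem, lic in assignments.items() if elem in polled})


def licenses_available(total, assignments, polled):
    """Licenses left for new elements."""
    return total - licenses_in_use(assignments, polled)


assignments = {"rtr1-if1": "L1", "rtr1-if1-sub": "L1", "srv2": "L2"}
polled = {"rtr1-if1", "rtr1-if1-sub", "srv2"}
print(licenses_available(10, assignments, polled))  # 8

# Disabling only one of the two elements that share L1 frees nothing.
polled.discard("rtr1-if1")
print(licenses_available(10, assignments, polled))  # still 8
```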
Using DCI to Modify Element Information
If you have a network management system (NMS) or other source at your site that collects configuration information and data for your resources, you can use the eHealth DataSync application programming interface to import element information into eHealth. When you need to modify element configuration information in the eHealth database, you can export the information to a Database Configuration Information (DCI) file, modify the information, and then import it into the eHealth database. To use DataSync, you must be comfortable creating programs, scripts, and files that use complex syntax and have a working knowledge of
Your Resource Management Roadmap
This chapter provides an overview of the eHealth polling process and how to access the controls. Managing the process of collecting, modifying, and maintaining the data is a critical administrative function that involves several primary tasks (outlined in Table 1) that are discussed in the remaining chapters.
Table 1. Resource Management Tasks
Task | Description | Chapter
Resolving Polling Errors | Use the OneClickEH Statistics Polling Management interface to identify and resolve polling problems to ensure that eHealth can continue to collect data. | Resolving Polling Errors
Updating Element Information | Use the eHealth console Poller Configuration dialog and OneClickEH to manually modify element properties and make element names easily recognizable. | Modifying Your Element Configuration
Adding New Elements | Use the eHealth console Poller Configuration dialog to manually add elements that discover is unable to find. | Adding New Elements to Your Configuration
Grouping Elements | Use the OneClickEH Managed Resources interface to organize related elements into groups for easier reporting and management. | Organizing Your Elements by Grouping
Resolving Polling Errors
To ensure that eHealth can continue to collect data on your resources, it is important to monitor polling, identify errors, and quickly resolve any polling problems that occur. This chapter provides you with guidelines for using the OneClickEH Statistics Polling Management interface to perform those tasks.
OneClickEH Status Summary
The OneClickEH Status Summary window (Figure 2) summarizes all polling activity that takes place on your eHealth system and updates it every minute. If your web user account has permission to manage statistics polling, you can click All Errors to drill down to the Statistics Polling Management interface from this window and review polling problems and errors. To resolve an error, you can right-click any element and select an option from the pop-up menu.
Understanding the Cause of Common Polling Errors
After successfully discovering your elements, you could encounter some common polling problems such as those described in Table 2. Errors can result from temporary or prolonged network delays or connection problems, element index shifts, and traffic congestion. You can also receive errors if you are polling devices that have not been certified for use with eHealth.
Table 2. Typical Polling Errors
Device Exceeded Allowable Timeouts
When devices with multiple elements, such as routers or servers, do not respond to SNMP requests, the information that the poller collects is incomplete. In addition, if those devices are consistently unresponsive, they degrade the performance of the poller. To ensure that polled data is accurate and to limit the amount of time that eHealth spends polling one device, eHealth tracks the number of SNMP request timeouts for each device. If the number of timeouts exceeds the value of NH_SNMP_DEVICE_TIMEOUTS, eHealth stops polling any elements associated with that device and proceeds to poll the next device.
Resolution: If the device has not responded for one or two polling cycles, the problem could be a temporary one. If you wait another polling cycle or two, the problem may resolve itself. If the device has never responded to polls, the element information could be incorrect or the SNMP agent could be down. Contact the device owner to investigate problems.
Received Large Delta Error
A counter wrap occurs when a MIB counter reaches its maximum value, resets to zero, and begins counting again within one polling cycle. When the delta (the difference between the counter values for the previous poll and the current poll) reaches or exceeds 50% of the counter's maximum value, eHealth identifies this as a large delta error. It discards the data for that element for that poll because the counter wrap can cause unusual results in reports and performance monitoring.
Resolution: Most often, delta errors occur with high-speed links or devices, whose MIB counters increment so quickly that they can wrap within the Normal five-minute polling interval. Use OneClickEH to change the statistics polling rate: right-click the element and select Fast Poll to Resolve Large Deltas.
If the index shifted as a result of a reboot or other configuration change, right-click the element and select Rediscover or Rediscover with Rules to rediscover the device and update the poller configuration. If the error persists, verify that the device is certified for use with eHealth. Search for the device at http://support.concord.com/devices/html/search.html.
No Response (to SNMP)
If the polled device does not respond to an eHealth poll within a specified amount of time, eHealth generates an error and does not collect any data for that element for that poll interval.
Resolution: If the agent is running, the problem may be due to temporary network delays. Wait a few poll cycles to see whether the error persists. If the element has never responded to SNMP requests, increase the time that eHealth waits for an SNMP response and the number of times that it retries by modifying the NH_SNMP_TIMEOUT and NH_SNMP_RETRIES environment variables. To avoid needlessly increasing the overall poll cycle time, increase the timeout by one second at a time and observe the impact before changing it again. If rediscovery is not successful, refer to the Poller Tuning Guide that is available on the Support web site.
Received an SNMP Error
If a device is unable to provide an object identifier (OID) in response to an SNMP request from eHealth, eHealth generates a generic SNMP error. This can occur when the polling information is out-of-date because the element had an index shift, or when the device has not been certified by CA.
Resolution: Using OneClickEH, right-click and select Rediscover. If the SNMP error persists, ensure that the device responds to SNMP V1 “get” commands, and then verify that the device is certified for use with eHealth. Search for the device at http://support.concord.com/devices/html/search.html.
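The Device Exceeded Allowable Timeouts behavior can be sketched as a per-device counter that abandons the rest of a device's elements once its timeouts exceed a threshold. In this simplified, sequential Python sketch, `max_timeouts` plays the role of NH_SNMP_DEVICE_TIMEOUTS, and `query` is a hypothetical callable standing in for an SNMP request that returns None on timeout. The real poller's ordering and concurrency are not modeled.

```python
def poll_device(elements, query, max_timeouts):
    """Poll a device's elements in order, giving up on the device once the
    number of timeouts exceeds max_timeouts.

    query(element) returns a value, or None to indicate a timeout.
    Returns (results, skipped), where skipped lists elements never polled.
    """
    results, timeouts = {}, 0
    for i, element in enumerate(elements):
        value = query(element)
        if value is None:
            timeouts += 1
            if timeouts > max_timeouts:
                # Stop polling this device; the remaining elements are skipped.
                return results, list(elements[i + 1:])
            continue
        results[element] = value
    return results, []


# Usage with a stand-in query function: elements "a" and "b" always time out.
results, skipped = poll_device(["a", "b", "c", "d"],
                               lambda e: None if e in ("a", "b") else 42,
                               max_timeouts=1)
print(results, skipped)  # {} ['c', 'd']
```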
Identifying Polling Problems
To access all eHealth elements that have reported polling errors during the last Normal polling period, you can use the OneClickEH Statistics Polling Management interface. The element table displays the name and properties of every element that has an error, and the time period for which eHealth has not collected data, as shown in Figure 3. You can easily sort the data and reorder the columns to find specific problems.
Figure 3. Element Table
Investigating the Problems
If your web user account has permission to manage elements, you can right-click any element (except those that are remotely polled) to access the pop-up menu, as shown in Figure 4. To investigate why eHealth was unable to poll an element, first ping the element from the server to confirm that the element exists and is active within your infrastructure, and that the network connection from eHealth to the device is working. If the element responds to ping, eHealth should be able to collect data from it the next time that a poll occurs.
Figure 4. Element Actions
Table 2. Typical Polling Errors (continued)
No Response to Ping
eHealth pings each IP address before it sends an SNMP request. If the device does not respond to the ping request, eHealth considers it a missed poll and does not collect SNMP data from the device. Ping failures can occur because of network connection problems that either prevent the ping request from reaching the polled device or prevent the ping response from reaching the eHealth system. Ping failures also occur if the device is powered off. In addition, when the network is congested, routers and switches may discard ping requests in favor of higher-priority traffic.
Resolution: If the ping failure is limited to one or a few devices, use OneClickEH to ping each device: right-click the element and select Ping from Server. If a device does not respond, it may be powered off. If you determine that the device has been removed from the network, right-click the element and delete or retire it.
If ping errors are occurring for many elements in your configuration and you have confirmed that the cause is not temporary network conditions, you can increase the value of the NH_POLL_PING_TIMEOUT environment variable. If network restrictions would always prevent a successful ping, you can use the NH_POLL_PING_DISABLED environment variable to disable the ping that precedes each poll. eHealth then still gathers SNMP responses, but it calculates reachability from those SNMP responses rather than from true device reachability measured by ping.
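The ping-before-SNMP sequence and the effect of NH_POLL_PING_DISABLED can be summarized as a small decision function. This Python sketch is a simplified model of the behavior described in this entry, not eHealth code; the function name and its return values are hypothetical.

```python
def poll_outcome(ping_ok, snmp_ok, ping_disabled=False):
    """Return (snmp_data_collected, reachable) for one poll of one device.

    ping_ok / snmp_ok: whether the device answered the ping and the SNMP request.
    ping_disabled: models setting NH_POLL_PING_DISABLED.
    """
    if ping_disabled:
        # No ping is sent; reachability is inferred from the SNMP response.
        return snmp_ok, snmp_ok
    if not ping_ok:
        # Missed poll: the SNMP request is not sent, so no data is collected.
        return False, False
    # Ping succeeded: reachability comes from ping, data from SNMP.
    return snmp_ok, True


print(poll_outcome(ping_ok=False, snmp_ok=True))                      # (False, False)
print(poll_outcome(ping_ok=False, snmp_ok=True, ping_disabled=True))  # (True, True)
```

The second call shows the trade-off described above: with ping disabled, a device that blocks ping but answers SNMP is polled normally, at the cost of reachability being based on SNMP alone.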
If the element does not respond to ping, you can drill down to an At-a-Glance report or a Trend report to troubleshoot problems. You can also generate an Element Configuration report to review the element’s configuration details and identify its relationships, or associations, to other elements. If an element repeatedly does not respond to ping, but you know that it exists, modify the number of times that eHealth retries polling it and change the timeout value. To do so, double-click the element in the table. In the Edit Element window, select Polling, specify the timeout and retries values, and click OK.
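Raising the timeout or the retry count makes each unanswered request more expensive, which is why small increments are recommended. Assuming that every attempt (the first try plus each retry) waits the full timeout, the worst-case wait per unanswered request is timeout × (retries + 1). Some SNMP implementations back off between retries instead, and the values below are illustrative, not eHealth defaults.

```python
def worst_case_wait(timeout_s, retries):
    """Seconds spent on one request that never gets a response, assuming
    the first attempt and each retry all wait the full timeout."""
    return timeout_s * (retries + 1)


# Each extra second of timeout costs (retries + 1) seconds per unanswered request.
for timeout in (2, 3, 5):
    print(timeout, worst_case_wait(timeout, retries=2))
```

With two retries, this prints 6, 9, and 15 seconds for timeouts of 2, 3, and 5 seconds. The cost is paid for every unresponsive element in the poll cycle.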
If you are not able to resume polling for the element by manually pinging it, take one of the actions described in Table 3 to resolve the problem. These methods are described in detail in the following sections.
Any changes that you make to statistics or conversation elements take effect during the next poll. The poller does not restart; the changes take effect without interrupting the polling cycle. The following sections provide guidelines. For detailed instructions, refer to the OneClickEH Web Help.
Table 3. Element Actions Available
Element Action | Result
Change the polling rate. | Enable eHealth to collect data less frequently to accommodate a device that is experiencing heavy traffic.
Disable polling for the element. | Stop collecting data from the element temporarily, or allow another element to use the poller license.
Modify the community string. | Update the SNMP password that eHealth uses to control read-write access to data.
Rediscover. | Update the element in your poller configuration.
Delete the element. | Stop polling the element and stop reporting on it.
Retire an element. | Stop polling the element, but continue to include element data in reports.
Changing the Polling Rate
By default, eHealth assigns the Normal polling rate to newly discovered statistics elements. At this rate, eHealth polls elements every 5 minutes and saves the data to the database during the poll. This is the rate that you should typically use to collect data from statistics elements, but eHealth can also poll at these three other rates:
Rate | Data Collection Process
Slow | eHealth polls elements every 30 minutes and saves the data in the database during the poll. You can change the default interval to 60 minutes.
Fast | eHealth aggregates the data from five 1-minute polls into one 5-minute average sample before saving it to the database. This process ensures that the data samples are consistent. You can change the default interval to 30 seconds or 2.5 minutes.
Fast Store | eHealth collects data at the Fast rate and stores the samples in the database without aggregating them. If you poll parent elements at the Fast Store rate, you must also change the poll rate of each child element to Fast Store. Otherwise, eHealth polls the child elements at the Fast rate and does not store their individual samples.
Identifying Polling Problems • 17
eHealth Element and Poller Management Guide
If an element that is being polled at the Fast rate consistently shows missed polls in the OneClickEH Status Summary window, the element agent might not be able to respond to the SNMP polls during the fast interval. To enable the agent to respond, you should assign the Normal rate to the element.
In contrast, the Slow polling rate allows you to poll elements that require even more time to respond, as well as to poll those elements that you want to poll less frequently because you do not expect their utilization rates to change. This rate is also effective for elements that are not located close to the eHealth system and cannot respond during the Normal poll interval.
Guidelines for Changing to the Fast and Fast Store Rates. To resolve a polling problem, OneClickEH allows you to quickly change any element to the Fast or Fast Store rate: double-click the element name, select the Polling tab in the Edit Element window, and then select a poll rate. Typically, you should use the Fast rate only to collect data more frequently from high-speed devices that do not support 64-bit counters, such as FDDI or ATM interfaces. High-speed element agents collect a significant amount of data during a Normal (five-minute) polling period. If a high-speed interface indicates significantly less volume than you expect, or it generates a large delta error, polling it at the Fast rate can help ensure that eHealth does not miss data. The Fast rate is also effective for collecting data from modems or ISDN connections that have short-duration connections.
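The 64-bit counter guideline can be checked with simple arithmetic. The following Python sketch is illustrative only: it assumes the interface reports traffic in a 32-bit SNMP Counter32 octet counter (such as MIB-II ifInOctets) and that the link runs at full line rate. The helper names wrap_seconds and interval_is_safe are hypothetical and are not part of eHealth.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 value wraps back to 0 after 2^32 - 1


def wrap_seconds(speed_bps: float) -> float:
    """Seconds for a 32-bit octet counter to wrap at full line rate."""
    octets_per_second = speed_bps / 8
    return COUNTER32_MODULUS / octets_per_second


def interval_is_safe(speed_bps: float, poll_interval_s: float) -> bool:
    """True if the counter advances by less than one full wrap between polls.

    In that case (current - previous) mod 2^32 gives the true delta;
    otherwise the number of wraps, and so the traffic volume, is ambiguous.
    """
    return poll_interval_s < wrap_seconds(speed_bps)


if __name__ == "__main__":
    for label, speed in [("FDDI, 100 Mbps", 100e6), ("ATM OC-3, 155.52 Mbps", 155.52e6)]:
        print(f"{label}: wraps in {wrap_seconds(speed):.1f} s; "
              f"5-minute poll safe: {interval_is_safe(speed, 300)}; "
              f"1-minute poll safe: {interval_is_safe(speed, 60)}")
```

At full load, a 100 Mbps FDDI counter wraps in about 343.6 seconds, only slightly longer than the 300-second Normal interval, while a 155.52 Mbps ATM OC-3 counter wraps in about 220.9 seconds. In the second case a 5-minute poll cannot tell how many times the counter wrapped, but a 1-minute Fast poll can.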
You can poll alternate latency elements at the Normal rate only.
To ensure that fast-polled data samples are consistent, eHealth aggregates the data from five 1-minute polls into one 5-minute average sample before saving it to the database. When it polls at the Fast Store rate, eHealth saves the 1-minute samples and does not aggregate them. Therefore, when you use the Fast or Fast Store rate, your system performance requirements increase.
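The difference between the two fast rates can be sketched in Python. This is a simplified model, not eHealth code: it assumes each 1-minute sample is a rate value and that the 5-minute sample is a plain arithmetic mean, since the exact aggregation math is not described here. The function names are hypothetical.

```python
from statistics import mean


def fast_rate_samples(one_minute_samples: list[float]) -> list[float]:
    """Fast rate: average each run of five 1-minute polls into one 5-minute sample."""
    if len(one_minute_samples) % 5:
        raise ValueError("expected complete groups of five 1-minute samples")
    return [
        mean(one_minute_samples[start:start + 5])
        for start in range(0, len(one_minute_samples), 5)
    ]


def fast_store_samples(one_minute_samples: list[float]) -> list[float]:
    """Fast Store rate: keep every 1-minute sample as-is."""
    return list(one_minute_samples)


if __name__ == "__main__":
    hour = [float(minute % 7) for minute in range(60)]  # one hour of made-up readings
    print("Fast samples stored per hour:", len(fast_rate_samples(hour)))        # 12
    print("Fast Store samples stored per hour:", len(fast_store_samples(hour)))  # 60
```

The sketch makes the storage cost visible: for the same hour of polling, Fast Store writes five times as many samples as Fast.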
Depending on the size of your eHealth system, if you poll too many elements at the Fast rate, the poll might not finish before the next Fast poll is scheduled to begin. To determine the equivalent polling load, multiply the number of Fast-polled elements by the ratio of the Normal poll interval to the Fast poll interval (5 minutes / 1 minute = 5 by default), and add the result to the number of elements that you poll at the Normal rate. For example, if you have 5000 elements, and 200 are polled at the Fast rate, the poller is actually performing 4800 + (200 × 5) = 5800 polls during a Normal 5-minute polling interval.
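The equivalent-load calculation can be written as a small Python function. The default intervals are the 5-minute Normal and 1-minute Fast values described in this section; the function name is hypothetical.

```python
def equivalent_polls(total_elements: int, fast_elements: int,
                     normal_interval_s: float = 300.0,
                     fast_interval_s: float = 60.0) -> float:
    """Polls performed per Normal interval when some elements are Fast-polled."""
    if not 0 <= fast_elements <= total_elements:
        raise ValueError("fast_elements must be between 0 and total_elements")
    polls_per_normal_interval = normal_interval_s / fast_interval_s  # 5 for 5 min / 1 min
    normal_elements = total_elements - fast_elements
    return normal_elements + fast_elements * polls_per_normal_interval


print(equivalent_polls(5000, 200))                        # 5800.0
print(equivalent_polls(5000, 200, fast_interval_s=30.0))  # 6800.0
```

Because the Fast interval can be shortened to 30 seconds, each Fast-polled element then counts as ten polls, and the same 200 elements raise the equivalent load to 6800.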
Using OneClickEH, you can easily disable polling for one or more elements by right-clicking them in the element table. When you disable polling for an element, it remains in the database and in the poller configuration, but eHealth no longer collects data for it. You may want to disable an element to exclude it from reports temporarily or permanently:
• If you are unable to determine why an element is generating an error after being polled, you should disable polling for it while you try to resolve the error.
• If you attempt to poll an element that is not certified, it will generate a “Received an SNMP Error” or “Received Large Delta Error.” While your request for certification is being processed, you should disable polling for the element to prevent subsequent errors.
If you disable polling for a router or a system, keep in mind that eHealth also disables polling for all elements that belong to that router or system. However, it continues to poll any router or system interfaces that record detail data unless you specifically disable polling for those interfaces.
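The parent-and-child rule above can be modeled in a few lines of Python. This is a hypothetical sketch, not an eHealth interface: the Element class and disable_device_polling function are illustrative, and the sketch assumes the device has only one level of child elements.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    records_detail_data: bool = False  # "Record detail data" selected for an interface
    polling_enabled: bool = True
    children: list["Element"] = field(default_factory=list)


def disable_device_polling(device: Element) -> None:
    """Disable a router or system and the elements that belong to it."""
    device.polling_enabled = False
    for child in device.children:
        if child.records_detail_data:
            continue  # still polled until you disable it explicitly
        child.polling_enabled = False


router = Element("rtr1", children=[
    Element("rtr1-if1", records_detail_data=True),
    Element("rtr1-cpu1"),
])
disable_device_polling(router)
print([(e.name, e.polling_enabled) for e in [router, *router.children]])
# [('rtr1', False), ('rtr1-if1', True), ('rtr1-cpu1', False)]
```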
Redistributing Polling Licenses. You can only poll elements for which you have available poller licenses. If you do not need to poll one or more elements, you can free the licenses by disabling polling for the elements. When you disable an element temporarily, it remains in the database, but it does not consume a license. In some cases, you may just want to free the license temporarily for another element.
To free a license for use by another element:
1. Right-click the element in the OneClickEH table and select Disable Polling.
2. Double-click the element that needs the license.
3. In the Edit Element window, select the Polling tab.
4. Select Yes from the Polling Enabled list to enable polling. eHealth automatically assigns the free license to that element and begins polling it.
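The license swap in the steps above amounts to returning a license to a pool and reusing it. This toy Python model is hypothetical and simplified: it assumes every polled element consumes exactly one license, although some child elements do not consume licenses of their own.

```python
class PollerLicensePool:
    """Toy model: each polled element holds one poller license."""

    def __init__(self, licenses: int) -> None:
        self.free = licenses
        self.polled: set[str] = set()

    def enable_polling(self, name: str) -> None:
        if name in self.polled:
            return
        if self.free == 0:
            raise RuntimeError(f"no free poller license for {name}")
        self.free -= 1
        self.polled.add(name)

    def disable_polling(self, name: str) -> None:
        if name in self.polled:
            self.polled.remove(name)
            self.free += 1  # the license becomes available to another element


pool = PollerLicensePool(licenses=1)
pool.enable_polling("old-link")
pool.disable_polling("old-link")  # step 1: free the license
pool.enable_polling("new-link")   # steps 2-4: the freed license is assigned
print(pool.polled, pool.free)     # {'new-link'} 0
```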
Changing the Community String
The community string is used by administrators to grant read and write access to various device MIBs.
eHealth typically uses a read-only community string to poll devices. If you change the read-write community string or the read-only community string of a device, you must change the community strings that eHealth uses for the element that represents the device. Otherwise, eHealth cannot poll the device. If your web user account has permission to manage community strings from the OneClickEH interface, you can easily change both the read-write and read-only community strings for an element that you are monitoring.
• To quickly change the read-write community string for an element, right-click the element name and select Modify Read-Write Community. Specify a maximum of 64 single-byte or 32 double-byte characters using the letters A through Z and a through z, backslashes, and the numbers 0 through 9. Do not use the word All, spaces, or commas. If you include a backslash (\) in a community string at the command line on a UNIX system, you must supply an additional backslash as an escape character. On a Windows system, do not supply the extra backslash escape character.
• To change both the read-write and the read-only community strings for an element, double-click it to display the Edit Element window; then select the General tab. Specify a read-write community string for SNMP Get and Set requests, which include creating, modifying, or deleting MIB monitoring rows and automatic licensing; then specify a read-only community string for SNMP Get requests.
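The character rules for community strings can be expressed as a validation check. This Python sketch is a hypothetical helper, not an eHealth tool. It treats the reserved word All as a case-insensitive match for the whole string, which is an assumption, and its UNIX helper covers only the backslash rule stated above.

```python
import re

# Characters the text allows: A-Z, a-z, 0-9, and backslash.
_COMMUNITY_CHARS = re.compile(r"[A-Za-z0-9\\]+")


def is_valid_community(value: str) -> bool:
    """Check a community string against the rules listed above."""
    if len(value) > 64:  # allowed characters are single-byte, so 64 is the maximum
        return False
    if value.lower() == "all":  # assumption: reserved word matched case-insensitively
        return False
    return _COMMUNITY_CHARS.fullmatch(value) is not None  # also rejects spaces and commas


def for_unix_command_line(value: str) -> str:
    """Double each backslash, as required at a UNIX command line (not on Windows)."""
    return value.replace("\\", "\\\\")


print(is_valid_community("NetOps\\ro2"), is_valid_community("read only"))  # True False
print(for_unix_command_line("NetOps\\ro2"))  # NetOps\\ro2
```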
Rediscovering an Element
If an element generates a “Received Large Delta Error,” the index may have shifted as a result of a reboot or other configuration change. To resolve the problem, you can right-click the device name in the OneClickEH element table and select Rediscover to try to update the poller configuration. (If you are using DCI rules files to filter your discoveries, you can select Rediscover with Rules to specify the rules for your rediscovery.) Large delta errors may also occur if you have not applied the latest device certification patch. After you apply it, rediscover the element using the same method.
When you rediscover an element using OneClickEH to correct polling problems, OneClickEH runs the same discover process that is available from the eHealth console based on some default guidelines for determining the type of discovery to run, and the level of changes to make. For those devices that may have more than one element, it rediscovers the entire device, not just the one element that is having polling problems. As a best practice, always review the Discover log carefully—before saving the results—to confirm the changes that OneClickEH identifies.
Changing the Retries and Timeout Rate
If you receive a “No Response (to SNMP)” error for an element that has never responded to SNMP requests, you can increase the time that eHealth waits for an SNMP response: double-click the element in the OneClickEH table; in the Edit Element window, select Polling, specify the poll timeout value, and click OK. By default, eHealth waits four seconds (4000000 microseconds) before timing out, based on the value of the NH_SNMP_TIMEOUT environment variable. To prevent wasteful increases in the overall poll cycle time, increase the value in one-second increments and observe the impact after each change.
By default, eHealth attempts to retry polling three times (based on the NH_SNMP_RETRIES environment variable setting) before skipping the element and recording the poll as a missed poll. You can specify a value in the Poll Retries field to increase or decrease the number of times that eHealth will attempt to retry polling an element before giving up on the poll.
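The effect of the two settings on the poll cycle can be estimated with a short Python sketch. The environment variable names and defaults come from this section; the assumption that every attempt (the first request plus each retry) waits the full timeout is made for illustration only, and the function names are hypothetical.

```python
import os

DEFAULT_TIMEOUT_US = 4_000_000  # 4 seconds, the NH_SNMP_TIMEOUT default described above
DEFAULT_RETRIES = 3             # the NH_SNMP_RETRIES default described above


def current_settings() -> tuple[int, int]:
    """Read the two environment variables, falling back to the defaults."""
    timeout_us = int(os.environ.get("NH_SNMP_TIMEOUT", DEFAULT_TIMEOUT_US))
    retries = int(os.environ.get("NH_SNMP_RETRIES", DEFAULT_RETRIES))
    return timeout_us, retries


def worst_case_wait_s(timeout_us: int, retries: int) -> float:
    """Upper bound on time spent on one unresponsive element.

    Assumes one initial request plus `retries` retries, each waiting
    the full timeout before the poll is recorded as missed.
    """
    return (1 + retries) * timeout_us / 1_000_000


def next_timeout_us(timeout_us: int) -> int:
    """Raise the timeout by one second, as the text recommends."""
    return timeout_us + 1_000_000


print(worst_case_wait_s(DEFAULT_TIMEOUT_US, DEFAULT_RETRIES))                   # 16.0
print(worst_case_wait_s(next_timeout_us(DEFAULT_TIMEOUT_US), DEFAULT_RETRIES))  # 20.0
```

Under that assumption, raising the timeout by one second adds four seconds to the worst-case time that one unresponsive element can spend in the poll, which is why small increments are safer.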
Removing Elements That You Do Not Want to Monitor
Over time, you may remove resources from the infrastructure or determine that you no longer want to monitor them. If you continue to poll elements that do not exist, eHealth will generate “No Response to Ping” polling errors, and future discoveries will list those resources as Missing Elements in the discover log. To keep your poller configuration up-to-date, and prevent eHealth from polling these elements, you must delete or retire them from your poller configuration.
Deleting an element is a permanent action. You will not be able to resume polling for that element in the future, and eHealth will no longer include that element in its reports. When you retire an element, you remove it from polling, but eHealth will continue to include it in reports. In the future, you can unretire the element if you want to begin collecting data for it again. The following sections explain why you might want to retire an element rather than delete it.
Deleting Elements. A deleted element is one that you no longer want to poll and that you do not want to include in reports. When you delete an element, eHealth removes all data associated with that element from the database. However, eHealth could attempt to rediscover it the next time that you run the discover process for the same IP address. As a best practice, if you do not want to rediscover elements that you have deleted, add their IP addresses to an IP exclusion file to filter them out. Follow the instructions provided in the eHealth Resource Discovery Guide.
When deleting elements from your poller configuration, keep in mind the following:
• If you delete a router element, eHealth also deletes all elements associated with that router, such as CPUs, interfaces, and so on.
• If you delete a system element, eHealth also deletes all elements associated with that system, such as CPUs, disks, partitions, interfaces, process sets, and processes. If the Record detail data option is set in the Modify Element dialog for a system element, you can delete the interface as a subcomponent of the parent as well as an interface element, or retain one of the elements.
• If you delete a process set, eHealth deletes all of its associated processes as well. If you rediscover the system and choose to find processes, eHealth rediscovers the process set.
• If you delete an application service element, eHealth also deletes all process sets that are associated with that element.
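The cascading deletes listed above follow parent-to-child relationships, so the set of elements that a delete removes can be previewed with a simple graph walk. This Python sketch uses hypothetical element names and a hypothetical children map; it does not read the real poller configuration.

```python
def cascade_delete(children: dict[str, list[str]], name: str) -> set[str]:
    """Return the element and everything that belongs to it, directly or indirectly."""
    removed: set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in removed:
            continue  # an element can be reachable from two parents
        removed.add(current)
        stack.extend(children.get(current, []))
    return removed


# Hypothetical poller configuration, following the relationships listed above.
children = {
    "sys1": ["sys1-cpu", "sys1-disk", "sys1-procset-web"],
    "sys1-procset-web": ["sys1-proc-httpd", "sys1-proc-java"],
    "app-web": ["sys1-procset-web"],
}
print(sorted(cascade_delete(children, "app-web")))
# ['app-web', 'sys1-proc-httpd', 'sys1-proc-java', 'sys1-procset-web']
```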
Before you delete an element, use OneClickEH to right-click it and select Ping from Server. If the element does not respond, confirm that the network has not experienced an outage that could be preventing the element from responding. Ping problems can result from temporary connectivity problems. The element will stop returning ping errors once the network problem is resolved. Once you are certain that the element does not exist and you are sure that you do not want to monitor it, right-click and select Delete; then click
Retiring Elements. A retired element is one that you no longer want to poll, but that you want to include in reports until its data ages out or you delete the element. When you retire an element, you retain the old element but do not collect new data for it. You may want to retire an element when you upgrade to a new technology so that you can compare the improvement for the new technology type.
The discover process ignores retired elements when it compares newly discovered elements to those in the poller configuration. When you no longer need the element or when data is no longer available for it, you can retire it by right-clicking it and selecting Retire from the OneClickEH menu. Once you have retired an element, eHealth grays out its entry in all OneClickEH element tables. If you want to return a retired element to an active status, right-click the element and select Unretire from the menu.
When retiring elements, follow these guidelines and practices:
• If retired elements still exist in your network and you discover them again, eHealth creates new elements. If you do not want to rediscover them, add their IP addresses to an IP exclusion file for your scheduled Discover jobs.
• When retiring a parent element, keep in mind that eHealth automatically retires all related child elements that do not consume their own polling licenses. It does not retire child elements for which you have specified Record detail data in the Modify Element dialog.
• When retiring response source elements, you must first disassociate or remove any Service Availability tests associated with the agent and retire the response source element associated with the agent. Otherwise, you will need to manually edit the svcrsp.cf file to prevent active tests from running on the Service Availability agent.
• If a different element has been retired with the same agent type and index values, the status bar indicates that an element already exists with these properties. To resolve this problem, change the IP address of one of the retired elements. The information for retired elements must be unique.
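The uniqueness rule for retired elements can be pictured as a duplicate-key check. In this hypothetical Python sketch, the key is the element's IP address, agent type, and index values; treating exactly those three properties as the key is an assumption drawn from the guideline above.

```python
from collections import Counter

RetiredKey = tuple[str, str, tuple[int, ...]]  # (IP address, agent type, index values)


def duplicate_retired(retired: list[RetiredKey]) -> list[RetiredKey]:
    """Keys that more than one retired element shares."""
    return [key for key, count in Counter(retired).items() if count > 1]


retired = [
    ("10.1.1.1", "MIB2 (port)", (3,)),
    ("10.1.1.1", "MIB2 (port)", (3,)),  # same IP, agent type, and index
    ("10.1.1.2", "MIB2 (port)", (3,)),
]
print(duplicate_retired(retired))  # [('10.1.1.1', 'MIB2 (port)', (3,))]
retired[1] = ("10.1.1.9",) + retired[1][1:]  # change the IP address of one element
print(duplicate_retired(retired))  # []
```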
Modifying Your Element Configuration
In most cases, interactive and scheduled discovery keeps your element information up-to-date with the latest information from the devices. However, you can manually change the configuration to perform the following tasks:
• Update element properties.
• Specify user-defined data to use as an element filter.
• Make element names more intuitive.
• Exclude an element from Live Exceptions monitoring.
This chapter provides you with guidelines for modifying element information manually and tracking the changes that you and other administrators make to the poller configuration.
Updating Element Properties
Although discover is able to update elements automatically, you may need to manually update element properties to prevent or resolve discover errors and duplicate elements. Typically, you perform manual updates when the element has changed so drastically that discover cannot match any of its attributes to an existing element and therefore treats it as a new element. You can also perform these changes manually when you need to override the device settings; for example, when the device configuration is incorrect but you do not have permission to change the device, or when you want to monitor the device in a special way. Any changes that you make to statistics or conversation elements take effect during the next poll. The poller does not need to restart; the changes take effect without interrupting the polling cycle.
You can quickly examine the primary properties of an element by selecting Properties from the right-click menu to display the Properties window, as shown in Figure 5. To change a subset of the element’s properties, you can double-click the element to display the Edit Element window, as shown in Figure 6. However, to change most element properties, such as agent type, speed, and index, you must use the eHealth console.
Figure 5. Element Properties
22 • Chapter3 Modifying Your Element Configuration
To change an element, select Setup→Poller Configuration; select the element that you want to modify and click Modify to display the Modify Element dialog (Figure 7). When you make changes to the poller configuration, it is important to follow these guidelines:
• Do not modify the poller configuration while a scheduled discover job or a configuration import is running. If the poller configuration changes after you open the dialog, an error message appears, and you cannot save your changes. You must close the window, reopen it, and make the changes again.
• If you change the information for an existing device, make the corresponding changes to the information for any component element. For example, if you edit the system name setting for a router, edit the setting for the interfaces and CPUs on that router, as well.
If you change the polling status, agent type, or identification information of a parent element such as a router, system, RAS, or modem pool, eHealth also modifies the information for the child elements of that parent. You do not need to make these changes to the child elements yourself.
The discover process uses the information that is supplied by the agent at the polled device to select an agent type for each element that it creates. The agent type specifies the type of element, and thus, the data that eHealth collects from it.
You can change the agent type of an element to tailor the way that eHealth identifies and manages the element. However, use care when doing so; if you select a type that the device cannot support, it could prevent eHealth from polling the element.
Typically, you can modify the agent type to change system user partitions to system partitions, or to change interfaces to collect MIB2 rather than RMON data:
• eHealth uses different thresholds for system and user partitions because the utilizations and problems are usually very different. System partitions are not expected to change size, and they are often stable even with a high utilization percentage (80% or greater). User partitions often change and grow quickly, and problems can occur if the user partitions are more than 60-70% utilized.
• If interface elements discovered as RMON (Ethernet) agents support MIB2 variables, you can change the agent type to MIB2 (port) to take advantage of the additional information that is collected for MIB2 interfaces.
The SNMP index is a unique identifier for similar elements of a device (for example, all interfaces of a router). A device index can change, or shift, as a result of adding or removing interface cards in a router, upgrading router firmware, changing system elements, or rebooting a device. Index changes can result in an SNMP polling error. If they occur, you can usually rediscover to resolve the problem and update the elements. However, if the changes are too complex for discover to resolve, you may need to manually update them.
Index changes can affect all elements of a device. For example, if you remove an interface card from the first card slot of a router, the indexes for the remaining interface elements could all decrease by one. To ensure that eHealth can successfully poll these devices, manually change the index for these elements. If the element has more than one index, you can change all of them using the Modify Element dialog.
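For the card-removal example above, the new indexes can be predicted before you edit them. This Python sketch is hypothetical and assumes the agent renumbers the remaining elements contiguously; real agents can renumber differently, so confirm the indexes at the device before changing them in the Modify Element dialog.

```python
from bisect import bisect_left


def renumber_after_removal(indexes: dict[str, int], removed: set[int]) -> dict[str, int]:
    """Predict new SNMP indexes after some index positions disappear.

    Assumes the agent renumbers the remaining elements contiguously,
    as in the card-removal example above; real agents vary.
    """
    gone = sorted(removed)
    return {
        name: index - bisect_left(gone, index)  # shift down by removed indexes below it
        for name, index in indexes.items()
        if index not in removed
    }


before = {"rtr1-slot1-port1": 1, "rtr1-slot2-port1": 2, "rtr1-slot2-port2": 3}
print(renumber_after_removal(before, {1}))
# {'rtr1-slot2-port1': 1, 'rtr1-slot2-port2': 2}
```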
During the first poll, eHealth obtains the speed of LAN elements, and both the incoming and outgoing speeds of full-duplex interface elements as configured at the device. If the device’s configuration does not report the correct speed, you may need to manually modify the element speed to ensure that eHealth can produce meaningful reports. You may also need to modify it if eHealth sets the speed of Frame Relay devices to 0 because they did not have any speeds configured.
You can specify a number, which sets the rate in bits per second, or a number and the letter k (to specify kilobits per second) such as 56 k for 56,000 bits per second. You could also specify m (to specify megabits per second) such as 16 m for 16 Mbits per second (Mbps). eHealth does not accept a speed of zero (0).
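The speed formats described above can be converted to bits per second as follows. This Python sketch is a hypothetical helper; it accepts only whole numbers with an optional lowercase k or m, which are the forms the text shows.

```python
import re

_SPEED = re.compile(r"\s*(\d+)\s*([km]?)\s*")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_speed(text: str) -> int:
    """Convert an element speed entry to bits per second."""
    match = _SPEED.fullmatch(text)
    if match is None:
        raise ValueError(f"not a speed: {text!r}")
    number, suffix = match.groups()
    bps = int(number) * _MULTIPLIER[suffix]
    if bps == 0:
        raise ValueError("eHealth does not accept a speed of zero")
    return bps


print(parse_speed("56 k"), parse_speed("16 m"), parse_speed("1544000"))
# 56000 16000000 1544000
```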
Element Values (Discovered Information)
The Discovered Information values listed in Table 4 are the values that eHealth obtains from the element agent during a discover. You can use the Modify Element dialog in the eHealth console to change these values.
Use caution when changing these values—a subsequent discover process may reset them to the device values or create duplicate elements. As a best practice, only change these values to prevent or resolve duplicate elements that resulted from a discover process, or to update the information to match the associated attributes that have changed at the device level.
Before you change the Discovered Information values, carefully consider the following:
• If you change the values of System Name, Discover Key, Interface, or Hardware ID without also changing them at the agent level, a subsequent discover process could reset them to the agent values or create duplicate elements.
• If you change the values of System Description, Location, Type, or Contact, a subsequent discover will reset the values to the values at the agent.
• If you change the Interface value, and it is used in the discover key, eHealth also changes the discover key. You must confirm this change when you click OK or Apply/Next.
Specifying User Strings to Use as a Filter in OneClickEH
The User String column displays a custom string that you can specify to uniquely describe your elements. Typically, the string is created in the Element section of the DCI file using the DCI element configuration input tools, but with OneClickEH you can view, specify, and filter elements based on those strings. The Element section specifies configuration information for the elements that you want to import into the eHealth poller configuration and database using the DataSync programming interface. You can use the fields in an element definition to filter or group elements, and to modify element information during importing. To add the User String column to your element tables, right-click a column name and select Select Fields. If you have not used DataSync to import your elements, you can specify a user-defined string to use as a filter in OneClickEH. You can also modify or append to a string that you have already defined.
To define a userString value to use as a filter in OneClickEH tables:
1. Double-click the element name in the element table.
2. In the Edit Element window, select the General tab.
3. Specify text in the User String text box or append additional text to the existing string.
4. Click OK.
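The filter-and-append behavior can be sketched in Python. The helper names are hypothetical, and case-insensitive substring matching with a space separator are assumptions, not documented OneClickEH behavior.

```python
def append_user_string(current: str, extra: str) -> str:
    """Append text to an existing user string, separated by a space."""
    return f"{current} {extra}".strip()


def filter_by_user_string(user_strings: dict[str, str], text: str) -> list[str]:
    """Names of elements whose user string contains `text` (case-insensitive)."""
    needle = text.lower()
    return sorted(name for name, value in user_strings.items() if needle in value.lower())


user_strings = {
    "rtr1-Serial0/0": "leased-line chicago",
    "rtr1-Serial0/1": "leased-line denver",
    "sys1": "",
}
user_strings["sys1"] = append_user_string(user_strings["sys1"], "datacenter")
print(filter_by_user_string(user_strings, "LEASED"))
# ['rtr1-Serial0/0', 'rtr1-Serial0/1']
```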
Table 4. Discovered Information Fields
Field Name | Description
System Name | sysName variable obtained from the device
System Description | sysDescr variable obtained from the device
Location | sysLocation variable obtained from the device
Discover Key | Unique value that the discover process assigns to the element
Interface | ifDescr variable obtained from the device (for CPUs, disks, and partitions, it is the device or partition name)
Type | ifType variable obtained from the device
Contact | sysContact variable obtained from the device
Making Element Names More Intuitive
When you perform a discovery, eHealth automatically assigns a name to each element that it creates. It follows the naming conventions described in the eHealth Resource Discovery Guide. eHealth reports often truncate element names that are longer than 30 characters. In other cases, the names are not very meaningful, so it is difficult for users to quickly identify the elements. Once you have saved elements in the poller configuration, you can make the element names more intuitive by editing the names or creating alias names using the OneClickEH Edit Element window.
You might want to change the name of an interface to be more meaningful to administrators or report consumers. If you leave the Element Alias field blank when you modify the element name, eHealth saves the old element name in that field to help you match the old element name with the new one.
When renaming an element, follow these guidelines:
• Do not duplicate another element name in your database. Element names must be unique.
• Specify a maximum of 64 single-byte or 32 double-byte characters using the letters A through Z and a through z, the numbers 0 through 9, dashes (-), periods (.), underscores (_), colons (:), and slashes (/). Do not specify a name that exceeds 64 bytes or is composed entirely of numerical characters.
• If you change the name of a router or a system, eHealth automatically updates the elements of that router or system with the new name of the device to which they belong. You do not have to make those changes yourself.
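The renaming rules above can be combined into a single check. This hypothetical Python sketch models only the single-byte character set listed in the guidelines, so it does not cover double-byte names.

```python
import re

# Letters, digits, dash, period, underscore, colon, and slash.
_NAME_CHARS = re.compile(r"[A-Za-z0-9._:/-]+")


def is_valid_element_name(name: str, existing_names: set[str]) -> bool:
    """Check a new element name against the renaming guidelines above."""
    if _NAME_CHARS.fullmatch(name) is None:
        return False  # empty, or contains a character outside the allowed set
    if len(name) > 64:  # every allowed character is one byte, so this is the 64-byte limit
        return False
    if name.isdigit():  # composed entirely of numerical characters
        return False
    return name not in existing_names  # element names must be unique


existing = {"rtr1-Serial0/0"}
print(is_valid_element_name("Chicago-T1:0/0", existing))  # True
print(is_valid_element_name("10045", existing))           # False
```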
To quickly identify elements in your report, you can assign shorter and more meaningful alias names to them after you discover them. For example, you might want to change the name of one or more interfaces to indicate the names of the cities to which the interfaces connect or to show that the interfaces are leased lines.
• To change the alias name of a single element, double-click the name in the OneClickEH table and select the General tab. Specify the alias name by following the character limitations for element names and click OK. Alias names do not have to be unique.
• To change or assign an alias substring for multiple elements, select the elements in the Poller Configuration dialog and click Modify; then specify a regular expression with which to replace all of them.
If you do not assign an alias name to an element, eHealth displays the element name in the Alias Name field of the OneClickEH element tables and the Poller Configuration dialog.
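Bulk alias assignment and the alias display rule can be illustrated in Python. Python's re module stands in for whatever expression syntax the Modify dialog accepts, and the function names are hypothetical; how eHealth applies the replacement is not described here.

```python
import re
from typing import Optional


def aliases_from_pattern(names: list[str], pattern: str, replacement: str) -> dict[str, str]:
    """Build an alias for each selected element by regex substitution on its name."""
    regex = re.compile(pattern)
    return {name: regex.sub(replacement, name) for name in names}


def displayed_alias(name: str, alias: Optional[str]) -> str:
    """What the Alias Name column shows: the alias, or the element name if none is set."""
    return alias if alias else name


selected = ["rtr1-Serial0/0", "rtr1-Serial0/1"]
print(aliases_from_pattern(selected, r"^rtr1-", "Chicago-"))
# {'rtr1-Serial0/0': 'Chicago-Serial0/0', 'rtr1-Serial0/1': 'Chicago-Serial0/1'}
```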
Parents. A parent is a device (such as a router, system, modem pool, or remote access server) to which an element belongs. For example, a modem can belong to a modem pool. If you move an element from one device to another, you should modify its parent setting in the poller configuration. If the element is on a router, system, or remote access server (or is part of a modem pool), double-click the name in the Poller Configuration dialog and update the name that appears in the Parent Element Name field of the Modify Element dialog.
When you modify a parent element, eHealth modifies certain properties of the child elements to match. If you attempt to modify properties of the child element that cause inconsistencies with the parent element,
Excluding an Element from Live Exceptions Monitoring
If you have a license for Live Health, the Live Exceptions field appears in the Modify Element dialog. Live Exceptions monitors eHealth elements to detect conditions defined by alarm rules. When Live Exceptions monitors a group or group list, it monitors all elements in the group or group list; however, you can exclude an element from Live Exceptions monitoring by deselecting Monitor in the Modify Element dialog. You can also select the time zone within which Live Exceptions should monitor the element.
Recording Statistics Data for an Element
To ensure that eHealth reports on individual statistics in LAN/WAN reports and aggregate statistics in Router or System reports, you can modify a router interface or system interface element by selecting Record detail data in the Modify Element dialog. This option appears for all router or system interface
Tracking Changes to the Poller Configuration
You can review changes that you and other administrators make to the poller configuration by examining the pollerAudit.date.time.log in the log directory of the eHealth installation. Table 5 describes the information that is provided for each change:
Table 5. pollerAudit.date.time.log File
Field | Description
date-time-timezone | Specifies the date and time of the poller configuration change, and the time zone in which it was made.
username | Specifies the name of the eHealth user who made the change.
src | Specifies the source of the change:
    i indicates that a user made the change from the Poller Configuration dialog.
    i indicates that an interactive discover process made the change.
    s indicates that a scheduled discover process made the change.
    u indicates that an unspecified process made the change (this is based on the value of the nmsSource field in the GlobalInfo section of the Element DCI file).
DBID | Specifies a unique numerical identifier that eHealth assigns to the element.
action | Specifies that the process or user performed one of the following actions:
    add: Added an element to the poller configuration.
    delete: Removed an element from the poller configuration.
    modify: Replaced the value in a field.
    disable: Turned polling off for this element.
element | Specifies the name of the element in the poller configuration.
You can access this log by selecting Server Files in the System Information folder of the OneClickEH console window. The file contains any changes that have been made to the poller configuration through OneClickEH and the eHealth console. OneClickEH also provides an activity log that includes all activity generated by OneClickEH administrators.
Monitoring Administration Changes
The eHealth OneClick Activity option (in the eHealth History folder of the OneClickEH console) identifies the administrators who recently made changes to the poller configuration using OneClickEH, the tasks that they performed, when the activity took place, and the IP addresses of the clients on which the tasks were performed. The table distinguishes each message based on type: informational messages appear in white, warnings appear in orange, and errors appear in red. By default, the log displays all activity that has occurred within the last hour, but you can obtain a full log of activity that has occurred within specific time periods.
This log itemizes all actions performed by users through OneClickEH; it does not report any actions performed through the eHealth console.
The eHealth OneClick activity log also allows you to manage the size of the OneClickEH log file. Click the Log Administration tab to determine the amount of disk space that the OneClickEH activity log is consuming. If the log file is becoming too large and you do not need to retain the data, you can clear all data or reset the default.
Adding New Elements to Your
Under most circumstances, the discover process will find all of your resources and add them to your poller configuration automatically. In a typical eHealth environment, you should not have to use the eHealth console to manually add elements. However, when discover is unable to add an element, you can add the element manually to avoid generating multiple “no response to ping” errors and ensure that you can poll it. You also need to use the manual process to do the following:
• Create a modem pool element for a discovered RAS device.
• Add a permanent virtual circuit (PVC) element for a discovered Frame Relay circuit element.
This chapter provides guidelines for performing these tasks using the Poller Configuration user interface through the eHealth console. It also discusses the process that eHealth uses to collect alternate latency, configure the ping process, and specify latency partners for discovered devices.
Adding a Statistics Element
If you are unable to discover an element that you want to monitor, you can add it to your poller configuration manually using the Add option in the Poller Configuration dialog. The element name cannot duplicate another element name in the database and should not be composed entirely of numerical characters. If you are adding an element that is on a router, system, or remote access server (or part of a modem pool), specify the router, system, remote access server, or modem pool element name as the parent name. After you create the element, at the next scheduled poll, eHealth polls it and updates the Poller Licenses Required information.
Creating a Modem Pool Element
To report on modem pools that might span one or more RAS devices, you can discover the devices using the Modem Pool mode. eHealth searches the specified IP addresses for RAS devices and creates an element for each of the following:
• Each modem pool configured at the device
• Each modem in the device
• Each ISDN interface in the device
If the RAS device agent does not have a modem pool definition, eHealth creates a single modem pool element for the RAS device and assigns all modem and ISDN elements in the device to it. If the default modem pool does not reflect your modem pool configuration, you can create modem pool elements and assign the modem and ISDN elements to the correct modem pools. When you run reports for the