IBM Tivoli Netcool Performance Manager Wireline Component Document Revision R2E2. Troubleshooting Guide

IBM Tivoli Netcool Performance Manager 1.3.2 Wireline Component

Document Revision R2E2

Troubleshooting Guide

Note

Before using this information and the product it supports, read the information in “Notices” on page 177.

Chapter 1. Troubleshooting Tivoli Netcool Performance Manager
  Troubleshooting a problem
  Troubleshooting checklist for Tivoli Netcool Performance Manager
  Known problems and solutions
  Troubleshooting tasks
  Real-time charts do not work as expected
  Error: ORA-00001:unique constraint (PV_ADMIN.PK_SEGM) violated
  MDE memory constraint
  Incomplete SNMPv3 metric collection
  After upgrading Tivoli Common Reporting, it is not possible to log into Tivoli Integrated Portal
  Collectors swapping from idle to running at startup
  Searching knowledge bases

Chapter 2. Logs (Wireline Component)
  Overview
  Logs by component
  Installation log files
  DataChannel logs
  DataLoad logs
  DataMart logs
  DataView logs
  Database log
  Logs messages format
  Logging configuration and information utilities
  DataChannel logs configuration
  DataView logs configuration
  statGet utility
  Configuring trace and logging
  Default logging level
  Trace logging for DataView
  The configure command
  Troubleshooting
  Event IDs

Chapter 3. Contacting IBM support
  Exchanging information with IBM
  Sending information to IBM Support

Chapter 4. Introduction SNMP Inventory
  Overview
  Discovery
  Metrics and Properties
  Inventory Synchronization and Change Management
  Change Management for Elements
  Change Management for Sub-Elements
  Grouping Sub-Elements
  Where to Go From Here

Chapter 5. SNMP inventory troubleshooting
  Overview
  Discovery Troubleshooting
  Discovery Does Not Start
  Discovery Starts But Issues Warning Messages
  Discovery Seems to Hang or Never Finishes
  Synchronization Troubleshooting
  Synchronization (Elements)
  Synchronization (Sub-elements)
  Grouping Troubleshooting
  Monitoring the Tivoli Netcool Performance Manager Log File
  Tivoli Netcool Performance Manager Log Messages
  Burned subelements
  Scenario 1 - Instance Shift Causes Disconnect
  Scenario 2 - Instance Shift Causes Burn
  Where to Go From Here

Chapter 6. SNMP inventory management
  Regular monitoring
  Routine SNMP inventory management tasks
  Finding Elements and subelements about to reach their retry limit
  Finding Elements and Sub-elements That Have Been Retired
  Where to go from here

Chapter 7. Messages
  DataChannel error messages
  DataLoad error messages
  DataView operational messages

Notices

Chapter 1. Troubleshooting Tivoli Netcool Performance Manager

You can use this troubleshooting and support information to troubleshoot problems with Tivoli Netcool Performance Manager.

This information assumes a working installation of Tivoli Netcool Performance Manager. For installation or upgrade problems, refer to the installation and upgrade information.

Troubleshooting a problem

Troubleshooting is a systematic approach to solving a problem. The goal of troubleshooting is to determine why something does not work as expected and how to resolve the problem.

The first step in the troubleshooting process is to describe the problem completely.

Problem descriptions help you and the IBM technical-support representative know where to start to find the cause of the problem. This step includes asking yourself basic questions:

• What are the symptoms of the problem?

• Where does the problem occur?

• When does the problem occur?

• Under which conditions does the problem occur?

• Can the problem be reproduced?

The answers to these questions typically lead to a good description of the problem, which can then lead you to a problem resolution.

What are the symptoms of the problem?

When starting to describe a problem, the most obvious question is “What is the problem?” This question might seem straightforward; however, you can break it down into several more-focused questions that create a more descriptive picture of the problem. These questions can include:

• Who, or what, is reporting the problem?

• What are the error codes and messages?

• How does the system fail? For example, is it a loop, hang, crash, performance degradation, or incorrect result?

Where does the problem occur?

Determining where the problem originates is not always easy, but it is one of the most important steps in resolving a problem. Many layers of technology can exist between the reporting and failing components. Networks, disks, and drivers are only a few of the components to consider when you are investigating problems.

The following questions help you to focus on where the problem occurs to isolate the problem layer:

• Is the problem specific to one platform or operating system, or is it common across multiple platforms or operating systems?

• Is the current environment and configuration supported?

If one layer reports the problem, the problem does not necessarily originate in that layer. Part of identifying where a problem originates is understanding the environment in which it exists. Take some time to completely describe the problem environment, including the operating system and version, all corresponding software and versions, and hardware information. Confirm that you are running within an environment that is a supported configuration; many problems can be traced back to incompatible levels of software that are not intended to run together or have not been fully tested together.

When does the problem occur?

Develop a detailed timeline of events leading up to a failure, especially for those cases that are one-time occurrences. You can most easily develop a timeline by working backward: Start at the time an error was reported (as precisely as possible, even down to the millisecond), and work backward through the available logs and information. Typically, you need to look only as far as the first suspicious event that you find in a diagnostic log.

To develop a detailed timeline of events, answer these questions:

• Does the problem happen only at a certain time of day or night?

• How often does the problem happen?

• What sequence of events leads up to the time that the problem is reported?

• Does the problem happen after an environment change, such as upgrading or installing software or hardware?

Responding to these types of questions can give you a frame of reference in which to investigate the problem.

Under which conditions does the problem occur?

Knowing which systems and applications are running at the time that a problem occurs is an important part of troubleshooting. These questions about your environment can help you to identify the root cause of the problem:

• Does the problem always occur when the same task is being performed?

• Does a certain sequence of events need to occur for the problem to surface?

• Do any other applications fail at the same time?

Answering these types of questions can help you explain the environment in which the problem occurs and correlate any dependencies. Remember that just because multiple problems might have occurred around the same time, the problems are not necessarily related.

Can the problem be reproduced?

From a troubleshooting standpoint, the ideal problem is one that can be reproduced. Typically, when a problem can be reproduced you have a larger set of tools or procedures at your disposal to help you investigate. Consequently, problems that you can reproduce are often easier to debug and solve. However, problems that you can reproduce can have a disadvantage: If the problem is of significant business impact, you do not want it to recur. If possible, re-create the problem in a test or development environment, which typically offers you more flexibility and control during your investigation.

• Can the problem be re-created on a test system?

• Are multiple users or applications encountering the same type of problem?

• Can the problem be re-created by running a single command, a set of commands, or a particular application?

Related task: “Searching knowledge bases” on page 7. You can often find solutions to problems by searching IBM knowledge bases, and you can optimize your results by using available resources, support tools, and search methods.

Troubleshooting checklist for Tivoli Netcool Performance Manager

By answering a set of questions that are structured into a checklist, you can sometimes identify the cause of a problem and find a resolution to the problem on your own.

Answering the following questions can help you to identify the source of a problem that is occurring with Tivoli Netcool Performance Manager:

1. Is your issue a known problem?

2. Is the configuration supported?

3. What are you doing when the problem occurs?

   • Installing, upgrading, or migrating the product

   • Doing administration tasks

   • Doing authorization tasks

   • Networking

   • Using the product

4. What, if any, error messages or error codes were issued?

5. If the checklist does not guide you to a resolution, collect additional diagnostic data. This data is necessary for an IBM® technical-support representative to effectively troubleshoot and assist you in resolving the problem.

Known problems and solutions

Known problems and their solutions are described here.

For a list of known problems, visit the following Web site:

Known Issues with Tivoli Netcool Performance Manager 1.3 - Wireline Component

Troubleshooting tasks

Some troubleshooting tasks in Tivoli Netcool Performance Manager are described here.

Real-time charts do not work as expected

Symptoms

Sometimes when you restart the Tivoli Netcool Performance Manager Channel Name Server (CNS) or Channel Manager (CMGR) component, your real-time charts no longer work as expected. Real-time charts require a valid reference to the real-time subscriber object. A restart of these components can cause the Channel Name Server (CNS) to no longer provide a valid reference to the real-time subscriber object.

Resolving the problem

To restore the proper operation of your real-time charts in this situation:

User response:

1. On the DataChannel host, change to the $DCHOME/bin directory. For example:

cd /opt/datachannel/bin

2. Find the PID of the DataChannel CNS component.

./findvisual | grep CNS_visual

The output is similar to the following:

pvuser 653 648 0 Apr 14 ? 129:46 /opt/datachannel/bin/CNS_visual -nologo /opt/datachannel/bin/dc.im -a CNS -f /o

Note: The PID is the first number after your login ID. For example, in the previous output, the PID is 653.

3. Stop CNS by using the following command:

kill -9 CNS_pid

4. Find the PID of the DataChannel CMGR component by using the following command:

./findvisual | grep CMGR_visual

5. Stop CMGR by using the following command:

kill -9 CMGR_pid

6. On the DataView host, stop the Tivoli Integrated Portal server by using the following command:

stopServer server1 -user tip_user -password tip_password

7. On the DataChannel host, change to the $DCHOME/bin directory. For example:

cd /opt/datachannel/bin

8. Restart CNS and CMGR by running the following commands in the following order:

./cnsw
./cmgrw

9. On the DataView host, restart the Tivoli Integrated Portal server by using the following command:

startServer server1 -user tip_user -password tip_password
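
The PID lookup in steps 2 and 4 can also be scripted. The following Python sketch is an illustration only: it assumes that the findvisual output uses the column layout shown in step 2 (login ID first, PID second), and the function name extract_pid is hypothetical and not part of the product.

```python
def extract_pid(findvisual_line):
    """Return the PID from one line of findvisual output.

    Assumes the layout shown in step 2: the login ID is the first
    whitespace-separated field and the PID is the second.
    """
    fields = findvisual_line.split()
    if len(fields) < 2 or not fields[1].isdigit():
        raise ValueError("unexpected output line: %r" % findvisual_line)
    return int(fields[1])


# Sample line copied from the output in step 2 (arguments shortened).
sample = "pvuser 653 648 0 Apr 14 ? 129:46 /opt/datachannel/bin/CNS_visual"
print(extract_pid(sample))
```

The returned number is the value to substitute for CNS_pid or CMGR_pid in the kill commands.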

Error: ORA-00001:unique constraint (PV_ADMIN.PK_SEGM) violated

A unique constraint error is generated during inventory and grouping.

Symptoms

The following message is received in inventory and grouping output:

-Error: ORA-00001:unique constraint (PV_ADMIN.PK_SEGM) violated

-Reason: a static group membership link exists between a sub-elements and a group

Causes

A user uses the resmgr command to insert a resource into a group and then runs inventory grouping. If inventory grouping tries to place the same resource into the group, a unique constraint error is generated.

Resolving the problem

Using the resmgr command, delete the grouping link.
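
The cause can be reproduced in miniature. The following Python sketch uses an in-memory SQLite table as a stand-in for the Oracle table behind PK_SEGM. The table name, column names, and identifiers are hypothetical and do not reflect the real PV_ADMIN schema; the sketch shows only why inserting the same group membership link twice violates a primary key.

```python
import sqlite3

# Hypothetical stand-in for the group membership table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_member ("
    " group_id INTEGER, resource_id INTEGER,"
    " PRIMARY KEY (group_id, resource_id))"
)


def add_member(group_id, resource_id):
    """Insert a membership link; return False if the link already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO group_member VALUES (?, ?)",
                (group_id, resource_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Comparable in spirit to ORA-00001: the key is already present.
        return False


first = add_member(10, 200023263)   # link created manually
second = add_member(10, 200023263)  # grouping attempts the same link
print(first, second)
```

Deleting the manually created link before grouping runs corresponds to removing the first row, after which the second insert succeeds.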

MDE memory constraint

An error caused by insufficient temporary tablespace is generated when running an MDE query.

Symptoms

The following message is received when you attempt to run an MDE query:

GYMVD0002E: Unable to retrieve information from the data source

or

Mon Feb 1 08:53:51 2010

ORA-1652: unable to extend temp segment by 64 in tablespace PV_LOIS_TEMP

Causes

You are constrained by the available space in PV_LOIS_TEMP. This is a TEMP tablespace that is shared across all MDE sessions.

Resolving the problem

This error can be addressed with the following suggestions:

• Reduce the time period used in the MDE query. If the time range is halved then the space required should be halved.

• Reduce the number of metrics requested. If the original request was for 5 metrics and you can reduce this by 1 metric, it should require approximately 20% less space.

• If neither option is possible, then you can increase the size of the PV_LOIS_TEMP tablespace.
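
The arithmetic behind the first two suggestions can be sketched as follows. This is a rough estimate only: it assumes that temporary space scales linearly with both the time range and the number of metrics, as the suggestions imply, and that the two effects multiply, which is an assumption. The function name is hypothetical.

```python
def relative_temp_space(time_fraction, metrics_before, metrics_after):
    """Estimate the temp space of a reduced query relative to the original.

    time_fraction  -- new time range divided by the original time range
    metrics_before -- number of metrics in the original query
    metrics_after  -- number of metrics in the reduced query
    Assumes linear scaling in both dimensions.
    """
    if not 0 < time_fraction <= 1:
        raise ValueError("time_fraction must be in the range (0, 1]")
    if not 0 < metrics_after <= metrics_before:
        raise ValueError("metrics_after must be between 1 and metrics_before")
    return time_fraction * metrics_after / metrics_before


# Halving the time range, and separately dropping 1 of 5 metrics.
print(relative_temp_space(0.5, 5, 5))
print(relative_temp_space(1.0, 5, 4))
```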

Incomplete SNMPv3 metric collection

Some data loss can be observed from the “Expected measures” and “Produced measures” in the SNMP collector log message.

Symptoms

Some data loss can be observed from the “Expected measures” and “Produced measures” in the SNMP collector log message, especially when there is a sudden drop in “Produced measures” when compared with past hours of collection.

For example:

DL31066 I DL_PERF_SUMMARY Hour: <1s:hour> subElmts: <2s:ses> metrics: <3s:metrics> requests: <4s:requests> expMeas: <5s:expmeas> prodMeas: <6s:prodmeas>

GYMDL30287W: SNMPJob jobdescr can not start in a dedicated thread. Internal error is err.
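
A drop in produced measures can be spotted by comparing the two counters in each hourly summary. The following Python sketch assumes that the placeholders in the DL_PERF_SUMMARY template are rendered as plain integers in the actual log; the sample lines, their values, and the 0.9 threshold are hypothetical.

```python
import re

# Matches the two counters at the end of a DL_PERF_SUMMARY message.
_MEASURES = re.compile(r"expMeas:\s*(\d+)\s+prodMeas:\s*(\d+)")


def production_ratio(summary_line):
    """Return prodMeas / expMeas for one summary line, or None."""
    match = _MEASURES.search(summary_line)
    if match is None:
        return None
    expected, produced = int(match.group(1)), int(match.group(2))
    return produced / expected if expected else None


def hours_with_drop(summary_lines, min_ratio=0.9):
    """Return the indexes of lines whose ratio falls below min_ratio."""
    flagged = []
    for index, line in enumerate(summary_lines):
        ratio = production_ratio(line)
        if ratio is not None and ratio < min_ratio:
            flagged.append(index)
    return flagged


# Hypothetical rendered summaries for two consecutive hours.
samples = [
    "DL31066 I DL_PERF_SUMMARY Hour: 08 subElmts: 100 metrics: 400 "
    "requests: 50 expMeas: 400 prodMeas: 400",
    "DL31066 I DL_PERF_SUMMARY Hour: 09 subElmts: 100 metrics: 400 "
    "requests: 50 expMeas: 400 prodMeas: 180",
]
print(hours_with_drop(samples))
```

A flagged hour is only a hint. As described in the resolution, the thread-related cause applies only when the SNMPJOB_THREAD_ERR warning is also present.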

Causes

This is a thread-related issue: either the system has insufficient resources to create another thread, or a system-imposed limit on the number of threads has been reached.

Resolving the problem

This resolution applies only if the following warning message is displayed first. The warning indicates a thread-specific issue; many other configuration issues can also cause data loss.

DL30287 W SNMPJOB_THREAD_ERR SNMPJob <1s:jobdescr> can not start in a dedicated thread. Internal error is ’<2s:err>’.

If the system resources cannot be increased, decrease the concurrency in the collector.

Note: This decreases the overall performance of the collector.

Concurrency is controlled by the GLOBAL.SNMP.MAXASYNC parameter, which has a default value of 256. To change the value, use the Topology Editor to add a custom DataChannel parameter to the Tivoli Netcool Performance Manager topology:

GLOBAL.SNMP.MAXASYNC=<value>

where <value> is a number less than 256.
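
As a small illustration, the following Python sketch builds the parameter line and rejects values that would not reduce concurrency. The helper name is hypothetical, and the lower bound of 1 is an assumption; the text above states only that the value must be less than 256.

```python
DEFAULT_MAXASYNC = 256  # default value stated in the text above


def maxasync_setting(value):
    """Return the custom parameter line for a reduced concurrency value."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if not is_int or not 0 < value < DEFAULT_MAXASYNC:
        raise ValueError("value must be an integer from 1 to 255")
    return "GLOBAL.SNMP.MAXASYNC=%d" % value


print(maxasync_setting(128))
```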

After upgrading Tivoli Common Reporting, it is not possible to log into Tivoli Integrated Portal

Symptoms

After upgrading Tivoli Common Reporting, it is not possible to log into Tivoli Integrated Portal.

Resolving the problem

If you have problems logging in to Tivoli Integrated Portal after upgrading Tivoli Common Reporting, do the following:

User response:

1. Clear your browser cache for the Tivoli Integrated Portal server. For example:

• In Firefox (v3.6):

a. Click Tools > Options > Privacy.

b. Remove individual cookies.

c. Search using your Tivoli Integrated Portal server name and remove those cookies.

2. Restart your browser.

3. Log in to Tivoli Integrated Portal.

Collectors swapping from idle to running at startup

Symptoms

At startup, a collector might wait for a period of time and then transition itself to the running state, only to be swapped to an idle state by the High Availability Manager (HAM).

Causes

At startup, the Collector sits in the 'Idle' state, waiting to determine what it should do next (if anything). The HAM (High Availability Manager) should probe the Collector, discover its state, and instruct it to 'start', to 'stay in idle', or to 'load'.

'Start' leads to running (active collection), 'load' leads to ready (a sort of hot-spare mode), and staying in 'idle' means keep waiting.

If a network disconnect occurred between the Collector and the HAM, the Collector may have a configuration (channel and collector number) from its previous run state.

As a result, the Collector waits for a period of time, then transitions itself to the running state, only to be swapped to an idle state by the HAM. The symptom results from the timing of the HAM's probing and the Collector's idle timeout.

User response:

If you consider this behavior a problem, update either the polling interval of the HAM or the idle timeout of the collector:

• Increase the Collector's idle timeout by editing DataChannels > Global DataChannel Properties > Advanced Properties > IDLETIMEOUT, or

• Decrease the HAM's probe interval by editing DataChannels > Administrative Components > High Availability Managers > <HAM identifier> > Properties > POLL_INTERVAL
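
The timing relationship described in this topic can be sketched as a simplified model. The following Python code is not product code: the function names are hypothetical, both values are assumed to be in the same unit, and the model assumes that the Collector starts on its own whenever its idle timeout can expire before the next HAM probe.

```python
# State reached after each HAM instruction, as described under Causes.
NEXT_STATE = {
    "start": "running",      # active collection
    "load": "ready",         # hot-spare mode
    "stay in idle": "idle",  # keep waiting
}


def state_after_instruction(instruction):
    """Return the Collector state that follows a HAM instruction."""
    return NEXT_STATE[instruction]


def may_self_start(idle_timeout, poll_interval):
    """Return True if the idle timeout can expire before the HAM probes.

    Simplified model: both arguments use the same unit, and the HAM
    probes once per poll_interval.
    """
    return idle_timeout < poll_interval


print(state_after_instruction("load"))
print(may_self_start(idle_timeout=30, poll_interval=60))
```

In this model, both suggested changes (a longer IDLETIMEOUT or a shorter POLL_INTERVAL) work by making may_self_start return False.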

Searching knowledge bases

You can often find solutions to problems by searching IBM knowledge bases. You can optimize your results by using available resources, support tools, and search methods.

About this task

You can find useful information by searching the information center for Tivoli Netcool Performance Manager, but sometimes you need to look beyond the information center to answer your questions or resolve problems.

Procedure

To search knowledge bases for information that you need, use one or more of the following approaches:

• Search for content by using the IBM Support Assistant (ISA).

ISA is a no-charge software serviceability workbench that helps you answer questions and resolve problems with IBM software products. You can find instructions for downloading and installing ISA on the ISA website.

• Find the content that you need by using the IBM Support Portal.

The IBM Support Portal is a unified, centralized view of all technical support tools and information for all IBM systems, software, and services. The IBM Support Portal lets you access the IBM electronic support portfolio from one place. You can tailor the pages to focus on the information and resources that you need for problem prevention and faster problem resolution. Familiarize yourself with the IBM Support Portal by viewing the demo videos (https://www.ibm.com/blogs/SPNA/entry/the_ibm_support_portal_videos) about this tool. These videos introduce you to the IBM Support Portal, explore troubleshooting and other resources, and demonstrate how you can tailor the page by moving, adding, and deleting portlets.

• Search for content about Tivoli Netcool Performance Manager by using one of the following additional technical resources:

– Tivoli Netcool Performance Manager technotes
– Tivoli Netcool Performance Manager Support Website
– Tivoli® support communities (forums and newsgroups)

• Search for content by using the IBM masthead search. You can use the IBM masthead search by typing your search string into the Search field at the top of any ibm.com® page.

• Search for content by using any external search engine, such as Google, Yahoo, or Bing. If you use an external search engine, your results are more likely to include information that is outside the ibm.com domain. However, sometimes you can find useful problem-solving information about IBM products in newsgroups, forums, and blogs that are not on ibm.com.

Tip: Include “IBM” and the name of the product in your search if you are looking for information about an IBM product.

Chapter 2. Logs (Wireline Component)

Tivoli Netcool Performance Manager has various logs that can be used to examine processing results and problems.

Overview

The following table provides a high-level description of Tivoli Netcool Performance Manager logs.

Table 1. The Tivoli Netcool Performance Manager logs by component

Component    Description

DataChannel  DataChannel manages the primary Tivoli Netcool Performance Manager proviso.log log file. This log file collects data from each DataChannel component, and from the SNMP and BCOL collectors. DataChannel also creates the tnpmlog.log log file that contains all logs of interest to the user.

DataLoad     Separate log files capture events about the daemon start-stop sequence, SNMP activity, and watchdog queries to determine the status of the daemon. Log data is also written to the primary DataChannel log file.

DataView Records data about web transactions and database calls.

DataMart A number of log files containing messages related to inventory, internal Tivoli Netcool Performance Manager communications, and database status.

Database The standard Oracle database log file.

Logs by component

A description of Tivoli Netcool Performance Manager logs organized by component.

Installation log files

A list of log files created by the Tivoli Netcool Performance Manager Wireline Components during installation. Where relevant, recommendations regarding the deletion of log files are provided.

Database

• /opt/Proviso/* (Do not delete these files, because they are used during upgrades and for maintenance. Delete them only on uninstallation.)

• /var/tmp/PvInstall/install.cfg

• /var/tmp/PvInstall/install.log

DataChannel

The DataChannel installer writes to stdout and stderr, so the installation process produces no log files.

Tivoli Integrated Portal

$home represents the root location for the Tivoli Integrated Portal.

• Install logs

– $home/IA-TIPInstall-XX.log
– $home/TCR13InstallTrace00.log
– $home/TCR13InstallMessage00.log

• Uninstall logs

– $home/IA-TIPUninstall-XX.log

Logs are also created for the Deployment Engine; these logs are not entirely Tivoli Integrated Portal or Tivoli Common Reporting related.

DataView

• $home/DataView_InstallLog.log

DataLoad

• /tmp/dlSetup_install_`date +%Y.%m.%d`
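
The backquoted date command expands to the installation date, so the log name differs from day to day. The following Python sketch builds the expected name for a given day; it assumes that the file name is exactly as listed above, and the function name is hypothetical.

```python
from datetime import date


def dataload_install_log(day=None):
    """Return the DataLoad install log path for the given day.

    Mirrors the shell expansion `date +%Y.%m.%d`; defaults to today.
    """
    day = date.today() if day is None else day
    return "/tmp/dlSetup_install_" + day.strftime("%Y.%m.%d")


print(dataload_install_log(date(2010, 4, 27)))
```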

DataMart

• /var/tmp/PvInstall/install.log

Installation

The log resulting from the installation of the Topology Editor is located at:

<topology Editor install location>/Topology_Editor_InstallLog.log.

The logs resulting from any run of the Deployer are created in /tmp/ProvisoConsumer.

The main log file is:

• /tmp/ProvisoConsumer/log.txt

Deployment plan logs:

• /tmp/ProvisoConsumer/Plan/logs/[INSTALL_<TIME_STAMP>

Deployer Ant logs:

• /tmp/ProvisoConsumer/Plan/MachinePlan_<MACHINE_NAME>/logs

The best cleanup method for installation logs is to remove the /tmp/ProvisoConsumer directory when the system has been running long enough to validate all aspects of a working installation. A ProvisoConsumer directory is created in the /tmp directory of each server in the installation, not just for the primary host.

Topology Editor logs are located at:

<topology Editor install location>/topologyEditor/topologyEditorTrace.log

Note: A new log file is created each time the Topology Editor is run.

DataChannel logs

The DataChannel logs include proviso.log, tnpmlog.log, walkback logs, and logs for individual DataChannel components. The default location of these logs is /DCHOME/log, where DCHOME is the location of DataChannel on your system.

Note: DataChannel component logs are generated only when you enable dual-logging for the component. When enabled, dual-logging writes information to proviso.log and tnpmlog.log, and to the individual log for the DataChannel component. For information about dual-logging, see “DataChannel logs configuration” on page 15.

Proviso.log

proviso.log is the core Tivoli Netcool Performance Manager log file. It is always running and continuously processing data.

The location of this log file is determined by the LOG.FILE configuration setting. For more information, see “Topology Editor log settings” on page 16.

At the end of the day, a timestamp is added to the file name (for example, 2007.08.24_1188000000_proviso.log) and a new proviso.log file is created for current processing.
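
Rotated file names can be split into their parts when you need to locate the log for a particular day. The following Python sketch is based only on the example name above. Treating the numeric field as a UNIX time in seconds is an assumption; under that assumption, the example value corresponds to 2007-08-25 00:00:00 UTC, the end of the day named in the file. The function names are hypothetical.

```python
import re
from datetime import date, datetime, timezone

# <YYYY.MM.DD>_<number>_proviso.log, as in the example name.
_ROTATED = re.compile(r"^(\d{4})\.(\d{2})\.(\d{2})_(\d+)_proviso\.log$")


def parse_rotated_name(file_name):
    """Return (date, number) for a rotated proviso.log name, or None."""
    match = _ROTATED.match(file_name)
    if match is None:
        return None
    year, month, day, number = (int(group) for group in match.groups())
    return date(year, month, day), number


def as_utc(seconds):
    """Interpret the numeric field as UNIX seconds (an assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


log_day, number = parse_rotated_name("2007.08.24_1188000000_proviso.log")
print(log_day, as_utc(number))
```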

The key performance indicators of Tivoli Netcool Performance Manager are derived from Proviso log files.

The statistics are analyzed and an hourly system summary is provided per Tivoli Netcool Performance Manager component. For example,

2013.04.09-08.32.42 UTC UBA.8.200-8664:12997 PERF_INPUT_PROCESSING GYMDC10021I SAM_INVENTORY-sam_inventory_2013-04-09-08-30.csv: processed 3145 records in 5.002 sec (37724.9 records/min), rejected 0 records.

This log file contains information about all DataChannel components and SNMP collectors. The types of events that are logged are determined by the LOG_FILTER setting, as described in “Topology Editor log settings” on page 16.
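
The throughput figure in a PERF_INPUT_PROCESSING message can be recomputed from the record count and the elapsed time. The following Python sketch extracts those values from the example message shown earlier in this topic; the function name is hypothetical, and the pattern is derived from that single example, so other message variants might not match.

```python
import re

# Derived from the single PERF_INPUT_PROCESSING example in this topic.
_PERF = re.compile(
    r"processed (\d+) records in ([\d.]+) sec .*rejected (\d+) records"
)


def input_throughput(line):
    """Return counts and records per minute for one message, or None."""
    match = _PERF.search(line)
    if match is None:
        return None
    processed = int(match.group(1))
    seconds = float(match.group(2))
    rejected = int(match.group(3))
    rate = processed / seconds * 60 if seconds else None
    return {
        "processed": processed,
        "seconds": seconds,
        "rejected": rejected,
        "records_per_min": rate,
    }


example = (
    "2013.04.09-08.32.42 UTC UBA.8.200-8664:12997 PERF_INPUT_PROCESSING "
    "GYMDC10021I SAM_INVENTORY-sam_inventory_2013-04-09-08-30.csv: "
    "processed 3145 records in 5.002 sec (37724.9 records/min), "
    "rejected 0 records."
)
print(input_throughput(example))
```

For the example, 3145 records in 5.002 seconds gives about 37724.9 records per minute, which agrees with the value printed in the message.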

tnpmlog.log

tnpmlog.log contains a subset of the messages written to the proviso.log file: only those messages that are of interest to the user.

The location of this log is determined by the LOG.LOG.FILE configuration setting (for more information, see “Topology Editor log settings” on page 16).

DataChannel log format

The current format of log messages written to proviso.log and tnpmlog.log is:

<version and sequence number> <UTC time stamp> <Component>-PID:TID <GYMXXXXXSeverity> <Category> <Message Text>

Message Components

Each component of the log format is explained below, with values taken from the log message example that follows:

• <version and sequence number>: V1:1234

• <time stamp>: 2007.01.16-06.02.03 UTC

• <Component>-PID:TID: CME.1.1-25201:3456

• <GYMXXXXXSeverity>: Unique ID format, for example GYMDC0412W, which consists of:

– TNPM product identifier: "GYM"
– Component: DC
– Message identifier: 0412
– Severity level character: W=warning (can also be I=information, E=error)

• Category: DISCARDED_RECORDS

• Message Text: Got 4 duplicate records, Discarded 0, Total of 4 for mid: 2206 rid: 200023263

If log messages do not have a message ID, then only the severity is shown.

Log message example:

V1:1234 2007.01.16-06.02.03 UTC CME.1.1-25201:3456 GYMDC0412W DISCARDED_RECORDS Got 4 duplicate records, Discarded 0, Total of 4 for mid: 2206 rid: 200023263 4|4|2006| 200023263
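
The message components can be separated programmatically. The following Python sketch is written against the example above only: the pattern and field names are assumptions rather than a product-defined grammar, and messages without a message ID (where only the severity is shown) are not handled and return None.

```python
import re

# Pattern derived from the documented example; field names are illustrative.
_LOG_LINE = re.compile(
    r"^(?P<version>V\d+):(?P<sequence>\d+) "
    r"(?P<timestamp>\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2}) UTC "
    r"(?P<component>\S+)-(?P<pid>\d+):(?P<tid>\d+) "
    r"(?P<message_id>GYM(?P<subsystem>[A-Z]{2})(?P<number>\d+)"
    r"(?P<severity>[IWE])) "
    r"(?P<category>\S+) "
    r"(?P<text>.*)$"
)


def parse_log_line(line):
    """Return the message components as a dict, or None if no match."""
    match = _LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


example = (
    "V1:1234 2007.01.16-06.02.03 UTC CME.1.1-25201:3456 GYMDC0412W "
    "DISCARDED_RECORDS Got 4 duplicate records, Discarded 0, "
    "Total of 4 for mid: 2206 rid: 200023263"
)
fields = parse_log_line(example)
print(fields["component"], fields["message_id"], fields["severity"])
```

Filtering on the severity or category field is then a matter of comparing dictionary values, for example to keep only messages whose severity is E.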

Walkback logs

Walkback logs are generated when a Tivoli Netcool Performance Manager application encounters serious problems. In most cases, walkback logs are produced just before the application shuts down because of the error.

These logs are crucial for problem determination by IBM Technical Support. The name of the file begins with walkback- and includes the DataChannel component and timestamp (for example, walkback-UBA.1.2-18032-2007.08.21-16.51.32.log).

Unless you have advanced knowledge of the DataChannel, only the first few lines of a walkback log are useful. The following log entry is an example:

EXCEPTION: ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory
FACILITY_NAME: LDR.1-11827
Release: 4.4.1 R2E2 Build: Guam.156
ORIGINATOR: 'an OracleThreadedConnection( hsvcctx = 245F374 )'
PARAMETER: OrderedCollection (an OracleError)
TEXT: ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory

Note: Walkback files must be manually deleted. Due to the DataChannel cron settings, a process that fails is repeated every 5 minutes. Since a new walkback log is generated upon every failure, the result can be many unneeded files.
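
Before deleting walkback files, it can help to see which components produced them. The following Python sketch parses the file name format of the example above and counts files per component. Reading the number between the component and the timestamp as a process ID is an assumption, and the function names and the second sample name are hypothetical.

```python
import re
from collections import Counter

# walkback-<component>-<number>-<YYYY.MM.DD-hh.mm.ss>.log
_WALKBACK = re.compile(
    r"^walkback-(?P<component>.+)-(?P<pid>\d+)-"
    r"(?P<timestamp>\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2})\.log$"
)


def parse_walkback_name(file_name):
    """Return component, pid, and timestamp as strings, or None."""
    match = _WALKBACK.match(file_name)
    return match.groupdict() if match else None


def walkbacks_per_component(file_names):
    """Count walkback files per DataChannel component."""
    counts = Counter()
    for name in file_names:
        parsed = parse_walkback_name(name)
        if parsed is not None:
            counts[parsed["component"]] += 1
    return dict(counts)


names = [
    "walkback-UBA.1.2-18032-2007.08.21-16.51.32.log",
    "walkback-UBA.1.2-18040-2007.08.21-16.56.32.log",  # hypothetical
    "proviso.log",
]
print(walkbacks_per_component(names))
```

A component with many walkback files a few minutes apart matches the repeated-failure pattern described in the note.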

DataLoad logs

DataLoad logs include the SNMP.log, pvmdmgr.log, and WatchDog.log logs. The default location of these logs is the /DLHOME/log directory, where DLHOME is the location of DataLoad on your system.

SNMP log

The SNMP log file contains detailed messages about all SNMP requests. All SNMP log files are created with the date in their name. The name of the local SNMP log file also includes the collector number, for example: 2010.04.27SNMP.1.1.log

By default, SNMP logs include the following recurring events:

• Close hour events

• Debug level changes

Note: Events reported in this log are also reported to the DataChannel log.

Only the current debug level can be set by using the Collector Information Tool as described in the IBM Tivoli Netcool Performance Manager: DataMart Operation Guide. On restart, the collector switches back to the permanent debug level (which by default is Fatal + Warning + Info messages). The permanent debug level is set through the Topology Editor. Changing the debug level is useful when a specific network device or device group does not respond correctly to SNMP requests.

Pvmdmgr.log

The pvmdmgr.log file stores events about the start and stop sequence of the daemon.

Occasionally, the "PVM Collecting daemon is running" message is displayed before the process is fully running.

WatchDog logs

The daily DataLoad WatchDog.log file contains entries about pings sent to the collector to ensure that the daemon is still running. The name of this log begins with the date for which events are recorded, for example 2007.08.16WatchDog.log.

DataMart logs

DataMart does not have a central log. Instead, DataMart information is written to log files, such as logFile.PVM or TraceInventory.log, that are associated with individual components or actions, such as Inventory. The default location of these logs is the /DMHOME/log directory, where DMHOME is the location of DataMart on your system.

Note: The inventory process does NOT automatically create a log. If inventory is run as a cron entry, you must redirect the data to a specified log file, otherwise no log data is stored. To show time stamps in inventory log initiated by cron, you must add the following lines to the dataMart.env file:

PVM_LOG_DATE=1
export PVM_LOG_DATE

TraceInventory.log

This log is created when the SNMP Inventory GUI is used. The log file contains messages that sequentially indicate the processing status of an inventory. The following example shows a typical Discover_Analyze entry in this log.

logFile.POLLPROFILE.{collector ID}

This log contains messages related to bulk inventory, and one file is produced for each bulk collector profile. The file suffix follows the format .bulk_n, where n is the number of the collector.

logFile.*

This type of log file records minor events related to GUI or module function and is generated as required. Examples of this type of log file include logFile.POLLINVENTORY and logFile.RESMGR.

provisoinfod*.log

This log contains messages about internal communications and is of limited use for troubleshooting. At the end of each day, a UNIX timestamp is appended to the file name (for example, provisoinfod1187726401534.log).

NotifyDBSpace*.log

The NotifyDBSpace*.log file is a daily automatic file containing messages about the status of the database. At the end of each day, a UNIX timestamp is appended to the file name (for example, NotifyDBSpace1190102314443.log).
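
The timestamps in the two example names have 13 digits, which suggests a UNIX time in milliseconds rather than seconds; that reading is an assumption. The following Python sketch, with a hypothetical function name, converts the suffix to a UTC date and time under that assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# Prefixes taken from the two examples in this topic.
_SUFFIX = re.compile(
    r"^(?P<prefix>provisoinfod|NotifyDBSpace)(?P<millis>\d{13})\.log$"
)


def rotation_time(file_name):
    """Return the UTC time encoded in the file name, or None.

    Assumes the 13-digit suffix is a UNIX time in milliseconds.
    """
    match = _SUFFIX.match(file_name)
    if match is None:
        return None
    millis = int(match.group("millis"))
    # Integer arithmetic avoids floating-point rounding of the fraction.
    whole = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return whole + timedelta(milliseconds=millis % 1000)


print(rotation_time("provisoinfod1187726401534.log"))
print(rotation_time("NotifyDBSpace1190102314443.log"))
```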

DataView logs

DataView does not have a central log. Instead, DataView writes DataView traces to the Tivoli Integrated Portal.

Note: DataView log messages and configuration options are covered in the IBM Tivoli Netcool Performance Manager: DataView User and Administrator Guide, under Configuring trace and logging in Chapter 4: Administration tasks.

Database log

The Tivoli Netcool Performance Manager database uses the Oracle-supplied log.

The default location of this log file is /ORACLEHOME/admin/PV/bdump/alert_PV.log, where ORACLEHOME is the location of Oracle on your system.

Logs messages format

In general, each log message indicates the date, time, Tivoli Netcool Performance Manager component, severity code, event ID, and event description.

The following table describes log message elements.

Table 2. Log message elements

| Field | Description |
|---|---|
| Date and Time | Date and time using the following format: |
| Timezone | Always UTC. |
| Component | Name of the Tivoli Netcool Performance Manager component and its process ID, separated by a dash (-). Component names are defined privately for each subsystem. DataChannel uses channel-based naming conventions (for example, CME.1.1); other subsystems can develop their own conventions. Some components include both the process ID and thread ID, separated by a colon (for example, CME.1.1-5638:415). |
| Severity Code | Event severity code. For more information, see the description of the LOG_FILTER setting in “Topology Editor log settings” on page 16. |
| Event ID | Event identifier. For more information, see “Event IDs” on page 22. |
| Description | Description corresponding to the Event ID. |
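As an illustration of the Component field layout described above, the following sketch splits a value such as CME.1.1-5638:415 into its name, process ID, and optional thread ID. The function name parse_component is hypothetical, and the sketch assumes that the last dash in the field separates the component name from the IDs.

```shell
#!/bin/sh
# Hypothetical helper: split a Component field into name, process ID,
# and thread ID. Assumes the last dash separates the name from the IDs
# and that a colon, when present, separates process ID from thread ID.
parse_component() {
    field=$1
    name=${field%-*}          # text before the last dash
    ids=${field##*-}          # text after the last dash
    pid=${ids%%:*}            # process ID (before the optional colon)
    case $ids in
        *:*) tid=${ids#*:} ;; # thread ID, when present
        *)   tid="" ;;
    esac
    printf 'name=%s pid=%s tid=%s\n' "$name" "$pid" "$tid"
}

parse_component "CME.1.1-5638:415"
parse_component "CME.1.1-5638"
```

A helper like this can be useful when filtering a log for all messages from one process or thread.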

Logging configuration and information utilities

You can configure logging behavior or use information utilities to help troubleshoot Tivoli Netcool Performance Manager components.

DataChannel logs configuration

Configuration options that govern DataChannel logging behavior are set by using the Topology Editor and are maintained in the database. Logging behavior can be set at three levels: DataChannel, specific DataChannel components, and all DataChannel components (global). Logging settings are controlled by the log configuration.

Table 3. DataChannel log configuration components

| Level | Description | Syntax | Example |
|---|---|---|---|
| DataChannel | Specifies logging behavior for DataChannel, including the Channel Manager (CMGR), Channel Name Server (CNS), and Application Manager (AMGR). These settings override any conflicting options set using GLOBAL. | LOG.<option>=<value> | LOG.ROOT_DIRECTORY=/opt/datachannel |
| DataChannel Component | Specifies logging behavior for DataChannel components (UBA, FTE, CME, LDR, DLDR). These settings override any conflicting options set using GLOBAL. | <component>.<channel>.<collector>.<option>=<value> | CME.2.500.DUAL_LOGGING=true |
| Global | Specifies logging behavior for all DataChannel components. | GLOBAL.<option>=<value> | GLOBAL.FC_RETENTION_HOURS=48 |

where

<component> = 3- or 4-character string for the component (UBA, CME, FTE, LDR, DLDR)

<channel> = DataChannel number

<collector> = Collector number

<option> = Log configuration option (see “Topology Editor log settings”)

<value> = Value for the configuration option

Logging configuration changes in Tivoli Netcool Performance Manager can be made only by the Installer. All components must be restarted to apply the changes. The settings include enabling or disabling central and local logging and tracing, and changing the log levels for local and remote logging and tracing. This behavior is consistent with previous releases.
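To make the three setting levels concrete, the following sketch classifies a setting line by its prefix. It is only an illustration: the function name classify_setting is hypothetical, and the component form <component>.<channel>.<collector>.<option>=<value> is inferred from the example CME.2.500.DUAL_LOGGING=true rather than taken from a formal grammar.

```shell
#!/bin/sh
# Hypothetical helper: report which level a log setting applies to.
# Assumes LOG.* is DataChannel level, GLOBAL.* is global level, and
# anything else follows component.channel.collector.option=value.
classify_setting() {
    key=${1%%=*}       # text before the first "="
    value=${1#*=}      # text after the first "="
    prefix=${key%%.*}  # first dot-separated token
    case $prefix in
        LOG)
            printf 'level=DataChannel option=%s value=%s\n' "${key#LOG.}" "$value" ;;
        GLOBAL)
            printf 'level=Global option=%s value=%s\n' "${key#GLOBAL.}" "$value" ;;
        *)
            rest=${key#*.};  channel=${rest%%.*}
            rest=${rest#*.}; collector=${rest%%.*}
            option=${rest#*.}
            printf 'level=Component component=%s channel=%s collector=%s option=%s value=%s\n' \
                "$prefix" "$channel" "$collector" "$option" "$value" ;;
    esac
}

classify_setting "LOG.ROOT_DIRECTORY=/opt/datachannel"
classify_setting "CME.2.500.DUAL_LOGGING=true"
classify_setting "GLOBAL.FC_RETENTION_HOURS=48"
```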

Topology Editor log settings

Log options that can be specified in the Topology Editor are described in the following table.

Table 4. Log options in Topology Editor

| Option | Levels | Description | Example |
|---|---|---|---|
| DUAL_LOGGING | GLOBAL and Component | Use true or false to turn dual logging on or off. When set to false at the GLOBAL level, only DataChannel logs are generated. When set to true at the GLOBAL level, individual logs for all DataChannel components are generated, in addition to the DataChannel log. Default is false. | UBA.2.500.DUAL_LOGGING=true |
| LOG_PORT | GLOBAL | Port number of the log server for common log and trace files. | GLOBAL.LOG_PORT=25000 |
| LOG_SERVER | GLOBAL | Host name of the log server for common log and trace log files. | GLOBAL.LOG_SERVER=cme4 |
| MAX_LOGS | GLOBAL | Retention period for local trace files in days. | CME.1.1.MAX_LOGS=7 |
| LOG_PORT | GLOBAL | Port number to use for logging. | GLOBAL.LOG_PORT=25000 |
| LOG_SERVER | GLOBAL | Host name of the log server. | GLOBAL.LOG_SERVER=burlington.acme.com |
| RENDER_MESSAGE_ARGUMENTS | GLOBAL | | CME.1.1.RENDER_MESSAGE_ARGUMENTS=false |
| SYSLOG_FACILITY | GLOBAL | Enter 128 to set the syslog facility to localhost. Note: The syslog daemon must be running locally to have access to the log host on the network. | GLOBAL.SYSLOG_FACILITY=128 |
| FILTER or LOG_FILTER | ALL | The event types to log: F = Failure (a hard process error); E = Error (process termination or disk space problem); W = Warning (frequent messages that require no specific action); I = Information only messages; 1 = Level 1 debugging information; 2 = Level 2 debugging information; 3 = Level 3 debugging information. | UBA.2.500.LOG_FILTER=FE |
| LOG_FILE | LOG | Name of the common log file. | LOG.LOG_FILE=/opt/datachannel/log/tnpmlog.log |
| LOG_MAX_LOGS | LOG | Retention period for common log and trace files in days. | LOG.LOG_MAX_LOGS=7 |
| LOG_RENDER_MESSAGE_ARGUMENTS | LOG | Write message parameters to log file. | LOG.LOG_RENDER_MESSAGE_ARGUMENTS=FALSE |
| FILE | LOG | Name and location of the DataChannel log file. | LOG.FILE=/opt/datachannel/log/proviso.log |
| TRAP_HOST | LOG | Host to send traps generated from log rules. | LOG.TRAP_HOST=127.0.0.1 |
| TRAP_PORT | LOG | Port to send traps generated from log rules. | LOG.TRAP_PORT=162 |
| SMTP_HOST | LOG | Host to send emails generated from log rules. | LOG.SMTP_HOST=127.0.0.1 |
| SMTP_PORT | LOG | Port to send emails generated from log rules. | LOG.SMTP_PORT=162 |
| SMTP_TO | LOG | To address for emails generated from log rules. | |
| LOG_FORWARD | LOG | Use true or false to enable or disable syslog forwarding. | LOG.LOG_FORWARD=false |
| LOG_FORWARD_FILTER | LOG | Filter to use for forwarded log messages. | LOG.LOG_FORWARD_FILTER=FEWI 123 |
| LOG_FORWARD_PORT | LOG | UDP port used by the host defined in LOG_FORWARD_SERVER. | LOG.LOG_FORWARD_PORT=514 |
| LOG_FORWARD_SERVER | LOG | Host name of the syslog server to which log messages are forwarded. | LOG.LOG_FORWARD_SERVER=localhost |
| LOG_TRAP_HOST | LOG | Host where SNMP traps for specific types of log messages are sent. Note: The rules file defining the message types must be installed and loaded. | LOG.LOG_TRAP_HOST=localhost |
| LOG_TRAP_PORT | LOG | Port used by the host defined in LOG_TRAP_HOST. Note: The rules file defining the message types must be installed and loaded. | LOG.LOG_TRAP_PORT=162 |
| LOG_TRAPS | | If set to TRUE, traps sent by the CME as the result of threshold violations are added to the CME log file. | CME.1.1.LOG_TRAPS=TRUE |
| MAX_LOGS | ALL | Maximum number of days to retain log files. | UBA.2.500.MAX_LOGS=3 |
| ROOT_DIRECTORY | LOG | Root directory where DataChannel logs are generated. The logs are located in the log directory directly under this root. | LOG.ROOT_DIRECTORY=/opt/datachannel |
| SUPPRESS_TIMESTAMP_ON_FORWARD | LOG | Suppresses the timestamp. | LOG.SUPPRESS_TIMESTAMP_ON_FORWARD=true |

Note: GLOBAL settings are used by all applications. LOG settings are application-specific settings that affect only that application. ALL settings can be used as both global and application-specific settings.
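A LOG_FILTER value is a string of single-character event codes, so a value such as FE enables Failure and Error events. The following sketch expands a filter string into the event types listed for FILTER or LOG_FILTER. The function name describe_filter and the output wording are hypothetical; only the code letters and their meanings come from this guide.

```shell
#!/bin/sh
# Hypothetical helper: expand a LOG_FILTER string (for example, FE)
# into one line per event code, using the code meanings from Table 4.
describe_filter() {
    filter=$1
    while [ -n "$filter" ]; do
        rest=${filter#?}          # everything after the first character
        code=${filter%"$rest"}    # the first character
        case $code in
            F) echo "F Failure" ;;
            E) echo "E Error" ;;
            W) echo "W Warning" ;;
            I) echo "I Information" ;;
            1) echo "1 Debug level 1" ;;
            2) echo "2 Debug level 2" ;;
            3) echo "3 Debug level 3" ;;
            *) echo "$code unknown" ;;
        esac
        filter=$rest
    done
}

describe_filter FE
```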

DataView logs configuration

statGet utility

statGet is a utility that is located at each collector and provides DataLoad statistics like the statistics that are accessible from the Collector Information Tool GUI.

statGet can be run on any local server, and is located in the following default locations:

• ~/opt/dataload/bin/

• ~/opt/datamart/bin/

Note: For remote systems, you must use the Collector Information Tool. For more information, see the IBM Tivoli Netcool Performance Manager: DataMart Operation Guide.

Syntax

statGet [-l {objects|instances|counters|stats|requests}] [-o <object>]
[-i <instance>] [-c <counter>] [-D <debugLevel>] [-S <serverName>]
[-P <portNumber>] [-T <connectTimeout>] [-T2 <dialogTimeout>] [-?] [-v]

Options

Table 5. Options for statGet utility

| Option | Description |
|---|---|
| [-l <objectType>] | <objectType> is one of the following items: objects (lists all main classes of statistics counters); instances (lists all instances of a specific statistic class); counters (lists all counter names of a specific statistic class); stats (lists all counter values for a specific class, instance, and so on); requests (lists all requests configured inside the scheduler). The requests objectType is the default objectType if no -l flag is specified. |
| [-o <object>] | <object> filters the result set for -l stats and -l instances. |
| -i <instance> | <instance> filters the result set for -l stats and -l counters. |
| -c <counter> | <counter> filters the result set for -l stats. |
| [-D <debugLevel>] | <debugLevel> is a number from 0 - 6, where 0 specifies no debugging, and 6 specifies verbose debugging. |
| [-S <serverName>] | <serverName> is the name of the server that hosts a specific SNMP collector. If the flag is undefined, the PVM_SSDADDRESS environment variable is used. If the environment variable is also undefined, the value 'localhost' is used. |
| [-P <portNumber>] | <portNumber> is the number of the listening port for the collector. If the flag is undefined, the PVM_SSDPORT environment variable is used. If the environment variable is also undefined, the value 3002 is used. |
| [-T <connectTimeout>] | <connectTimeout> is the amount of time permitted to establish a connection before a timeout. If the flag is undefined, a default value of 20 seconds is used. |
| [-T2 <dialogTimeout>] | <dialogTimeout> is the amount of time permitted for a connection response. If the flag is undefined, a default value of 7200 seconds (2 hours) is used. |
| -? | Displays the statGet command reference page. |
| -v | Displays the build version string. |

Examples

Table 6. Examples of statGet usage

| Use | Syntax |
|---|---|
| Dump all pending and current SNMP requests. | statGet |
| Get all classes of statistics counters. | statGet -l objects |
| Get all possible instances of the class of counters 'Targets'. | statGet -l instances -o Targets |
| Get values of all statistics counters of class Targets, for instance '_Total'. | statGet -l stats -o Targets -i _Total |
| Get all currently configured requests in DataLoad. | statGet -l requests |
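The -S and -P options fall back to environment variables and then to built-in defaults. The following sketch reproduces that precedence so that you can check which collector address a statGet call would target before you run it. The function name resolve_target is hypothetical, and the default port is taken to be 3002, matching the collector address dev19.quallaby.com:3002 that appears in this guide's error message examples.

```shell
#!/bin/sh
# Hypothetical helper: work out which server and port statGet would use.
# Precedence (as described for -S and -P): explicit value first, then
# PVM_SSDADDRESS / PVM_SSDPORT, then localhost and 3002 (assumed default).
resolve_target() {
    server=${1:-${PVM_SSDADDRESS:-localhost}}
    port=${2:-${PVM_SSDPORT:-3002}}
    printf '%s:%s\n' "$server" "$port"
}

# Show the target that a call without -S or -P would use.
resolve_target
```

For example, if resolve_target prints dev19:3002, then statGet -l stats -o Targets -i _Total is expected to query the collector on host dev19, port 3002.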

Configuring trace and logging

The default logging level can be set by using the configure command. You can manage logs and traces for the Tivoli Integrated Portal in the WebSphere Administrative Console, which you open from the Settings > WebSphere Administrative Console option.

Default logging level

You use the configure command to configure the Tivoli Integrated Portal logging level for Tivoli Netcool Performance Manager packages or components.

You must restart the application server for your changes to take effect. When you have restarted the server, the logging level you have selected becomes the default logging level.

Trace logging for DataView

The configure command

Configures the database connection information and the Tivoli Integrated Portal logging level for the Tivoli Netcool Performance Manager installation.

You must restart the application server for your changes to take effect. When you have restarted the server, the logging level you have selected becomes the default logging level.

Location

<tip_location>/products/tnpm/dataview/bin

Syntax

configure.sh -tipuser <tip_username> -tippassword <tip_password> -type jdbc [-driverhome <driver_home>] [-jdbcurl <jdbc_url>] [-jdbcuser <jdbc_username>] [-jdbcpassword <jdbc_password>]

configure.sh -tipuser <tip_username> -tippassword <tip_password> -type logging [-level <level>] [-package <package_name>] [-module <module_name>]

configure.sh -tipuser <tip_username> -tippassword <tip_password> -type debug [-state <on|off>]

Parameters

<tip_location>

The Tivoli Integrated Portal installation directory, by default /opt/IBM/tivoli/tipv2.

<tip_username>

A Tivoli Integrated Portal user name for the local Tivoli Integrated Portal.

<tip_password>

The Tivoli Integrated Portal user password for the local Tivoli Integrated Portal.

<jdbc|logging|debug>

The three types of configuration options.

jdbc Configures the JDBC database connection information.

logging

Configures the Tivoli Integrated Portal logging level for the Tivoli Netcool Performance Manager installation.

debug Configures the remote debugging.

Optional parameters

<driver_home>

The JDBC driver location.

<jdbc_url>

The database URL.

<jdbc_username>

The database user name.

<jdbc_password>

The database password.

<level>

Set the level of logging detail: fatal, severe, warning, audit, info, config, detail, fine, finer, finest, or all.

<package_name>

Set logging for this software package. Wildcards * and ? are supported.

<module_name>

Set logging for this software component. Wildcards * and ? are supported.

<on|off>

This is the remote debugging state.

on The remote debugging state is on.

off The remote debugging state is off.

Examples

The following command sets a Tivoli Integrated Portal logging level of detail. A wildcard selects all of the com.ibm.tivoli.tnpm.dal packages.

configure.sh -tipuser <tip_username> -tippassword <tip_password> -type logging -level detail -package com.ibm.tivoli.tnpm.dal.*

The following command sets the JDBC URL to jdbc:oracle:thin:@host1.company.com:1521:PV:

configure.sh -tipuser <tip_username> -tippassword <tip_password> -type jdbc -driverhome "/root/directory/tnpm.dataview" -jdbcurl jdbc:oracle:thin:@host1.company.com:1521:PV -jdbcuser <jdbc_username> -jdbcpassword <jdbc_password>

Troubleshooting

You can use logs for a number of troubleshooting tasks.

• Use DUAL_LOGGING to write component-specific log files in addition to proviso.log.

• CME “BUILD_TREE” messages indicate whether Formula Requests are deployed and for how many sub-elements (Debug Level 2).

• UBA “SCANNED_INPUT”, “PERF_ACQUIRE_ALL”, “START_INPUT”, and “METRIC_STREAM_INFO” messages indicate whether metric input is being retrieved and processed.

• UBA PERF_INVFLUSH messages indicate when discovered elements and sub-elements are written to the database.

• Use statGet to obtain SNMP DataLoad information.

Event IDs

For information about Event IDs that are used in Tivoli Netcool Performance Manager logs, see the Error Messages section in this guide.

Chapter 3. Contacting IBM support

IBM Support assists with product defects, answers FAQs, and performs rediscovery.

Before you begin

After trying to find your answer or solution by using other self-help options such as technical notes, you can contact IBM Support. Before contacting IBM Support, your company must have an active IBM maintenance contract, and you must be authorized to submit problems to IBM. For information about the types of available support, see the Support portfolio topic in the Software Support Handbook.

Procedure

Complete the following steps to contact IBM Support with a problem:

1. Define the problem, gather background information, and determine the severity of the problem. For more information, see the Getting IBM support topic in the Software Support Handbook.

2. Gather diagnostic information.

3. Submit the problem to IBM Support in one of the following ways:

• Using IBM Support Assistant (ISA).

• Online through the IBM Support Portal: You can open, update, and view all your Service Requests from the Service Request portlet on the Service Request page.

• By phone: For the phone number to call in your country, see the Directory of worldwide contacts web page.

Results

If the problem that you submit is for a software defect or for missing or inaccurate documentation, IBM Support creates an Authorized Program Analysis Report (APAR). The APAR describes the problem in detail. Whenever possible, IBM Support provides a workaround that you can implement until the APAR is resolved and a fix is delivered. IBM publishes resolved APARs on the IBM Support website daily, so that other users who experience the same problem can benefit from the same resolution.

Exchanging information with IBM

To diagnose or identify a problem, you might need to provide IBM Support with data and information from your system. In other cases, IBM Support might provide you with tools or utilities to use for problem determination.

Sending information to IBM Support

To reduce the time that it takes to resolve your problem, you can send trace and diagnostic information to IBM Support.

Procedure

To submit diagnostic information to IBM Support:

1. Open a problem management record (PMR).

2. Collect the diagnostic data that you need. Diagnostic data helps reduce the time that it takes to resolve your PMR. You can collect the diagnostic data manually or automatically:

• Collect the data manually.

• Collect the data automatically.

3. Compress the files by using the ZIP or TAR format.

4. Transfer the files to IBM. You can use one of the following methods to transfer the files to IBM:

• IBM Support Assistant

• The Service Request tool

• Standard data upload methods: FTP, HTTP

• Secure data upload methods: FTPS, SFTP, HTTPS

• Email

All of these data exchange methods are explained on the IBM Support site.

Chapter 4. Introduction SNMP Inventory

This chapter provides an introduction to the Tivoli Netcool Performance Manager SNMP Inventory.

Overview

Tivoli Netcool Performance Manager allows the operator to decide how much the Tivoli Netcool Performance Manager DataMart will rely upon the OSS Inventory system.

The Inventory system can be virtually anything from a full-featured commercial Inventory package, to an EMS or Node Manager like HP OpenView, to a flat file like /etc/hosts. The minimum required is a list of the IP addresses of resources to monitor.

Tivoli Netcool Performance Manager can discover both elements (resources that have an IP address, such as a router or a switch) and the sub-elements associated with or contained in them, such as an interface or a port.

Tivoli Netcool Performance Manager supports the following three modes of element and sub-element discovery:

| Mode | Inventory Contains | Tivoli Netcool Performance Manager Discovers |
|---|---|---|
| 1 | Nothing | Elements, sub-elements |
| 2 | Elements | Sub-elements |
| 3 | Elements, sub-elements | Nothing |

Most Tivoli Netcool Performance Manager deployments are in mode two. In this mode, Tivoli Netcool Performance Manager imports a list of elements and then walks through the MIB to discover the sub-elements. In the first mode, Tivoli Netcool Performance Manager sweeps the network to discover the elements and their associated sub-elements.

Discovery

Tivoli Netcool Performance Manager's Discovery capabilities include some powerful and flexible tools that allow you to determine exactly what Tivoli Netcool Performance Manager will monitor, and how the sub-elements will be labeled and grouped.

These capabilities make it possible to automatically initiate data collection, threshold monitoring, and reporting on discovered elements.

Using a formula language, Tivoli Netcool Performance Manager can be configured to walk through an element's MIBs to discover particular MIBs representing users, tunnels, protocols, service classes or other sub-elements. Particular OIDs can be used to automatically create a label for the sub-element.

For example, the sub-element label could be a combination of the element name, the interface, the port and the customer name, all taken from the MIB.

Metrics and Properties

In addition to the identifier of the sub-element and the metrics collected for it, Tivoli Netcool Performance Manager allows the operator to create any number of user-defined properties.

There are two main differences between metrics and properties. Metrics come from a monitored resource and are used to calculate statistics that are the basis of performance reports and alarm thresholds. Metrics are generally numeric values that change frequently, like the number of packets transmitted or a resource's availability.

Properties, by contrast, are values that change less frequently, such as the CIR (committed information rate) or the location of the element. Properties consist of metadata-like identifiers or labels for such things as the customer and/or the services using a particular sub-element.

The values for properties can be discovered automatically from the monitored resource, or they can be imported from Inventory, provisioning or from another OSS component.

Inventory Synchronization and Change Management

Sub-element properties such as the CIR or customer name can change. Tivoli Netcool Performance Manager tracks the change and the time of the change, so that reports are displayed correctly.

For example, utilization may be calculated against CIR. After the CIR is updated, reports must reflect the new value for utilization calculations. But reports that show dates prior to the CIR update must use the old CIR value. Tivoli Netcool Performance Manager manages this without error.

If a sub-element is assigned to a new customer, the customer property will change.

If the sub-element is in a particular customer's group, this can cause the sub-element to move to a new group. This can change the collection, alarm thresholds and reporting for that sub-element, automatically.

Change Management for Elements

The Inventory must track changes so that continuity of meta-data associated with the elements can be maintained.

Unfortunately, Inventory is not as simple as sweeping a range of IP addresses to identify the network elements. That is just the beginning of the process. The Inventory must track changes so that continuity of meta-data associated with the elements (such as associations to customers, VPNs and services) can be maintained. Keeping the element Inventory accurate poses further challenges, as the following two problem statements show:

• IP address changes

Problem: If you are tracking a router by its IP address, and you discover a router at a new IP address, how do you know if it is a new router, or an existing router with a changed IP address?

Tivoli Netcool Performance Manager solves this problem by associating additional properties with each element, which provide additional continuity and traceability in the face of IP address changes. These additional properties can be discovered from the device itself, like SNMP sysName, or gathered externally, like the name resolved from the IP address of the element's management interface.

• Name changes

Problem: If you are tracking a router by its name, and you discover that the name has changed, how do you know if it is a new router with that IP address, or an existing router with a changed name?

Tivoli Netcool Performance Manager does not track elements by their name or any other single property. Instead, by tracking a combination of properties, Tivoli Netcool Performance Manager is able to provide continuity to inventory even when any of these properties change.

By automatically tracking changes to an element, rather than discovering it as a new element or forcing the operator to manually update the database, Tivoli Netcool Performance Manager helps reduce operating costs as follows:

• Performance and trend reports for the element show the entire history of the element, without interruption.

• Changes to the element are shown in historical reports so they can be correlated to problems or changes in performance.

• Meta-data, such as location, community string, or other properties, remains associated with the element, saving the operator from having to re-enter it.

• Inventory accuracy is improved because the update operation is automatic, not manual, eliminating errors.

• Inventory accuracy is improved because synchronization is automated, eliminating manual delays.

Change Management for Sub-Elements

In addition to the challenge of detecting and correctly managing changes on sub-elements, it is important to display this information correctly on reports.

From an external (customer) point of view, sub-element changes should be invisible.

From an internal (network operations) perspective, the change must be visible.

Tivoli Netcool Performance Manager manages all of this automatically.

There are many reasons why the identifier (in SNMP, the Object Identifier, or OID) might change for a particular sub-element. Assuming that the sub-element is a port or virtual circuit residing on an interface, some of the changes will be due to failure and recovery scenarios, or network reconfigurations due to growth:

• Adding or removing an interface card can cause the SNMP indexes to shift for other sub-elements.

• The interface the sub-element resides upon might fail, forcing the service associated with the sub-element to be moved to another interface.

• The service may be moved to a currently unused sub-element.

• The service may be moved to a sub-element in use, and the service currently on the sub-element is moved to another sub-element.

Most network changes should be invisible to customers. Their reports should reflect the quality of their service, and moves and changes to the network to preserve their service should be invisible to them. This is particularly important for SLA reporting. You certainly want to avoid forcing the customer to view two reports, one for the original NIC and a second report for the replacement NIC.

Throughout the network changes, network operations and engineering staff must have an accurate view of the actual sub-elements. For troubleshooting and capacity planning purposes, they should have a historical view of performance and traffic on a particular port, with information on changes that have occurred.

Grouping Sub-Elements

Properties can be used to automatically group sub-elements.

For example, sub-elements can be grouped according to technology, customer, service or site. Groups can be hierarchical, so it is possible to create structures like the following:

• Site/Technology, to see all ATM SVCs in the New York POP.

• Customer/Service, to show all of the services a particular customer has subscribed to.

• Technology/Site, to see which sites are generating the most Frame Relay activity.

Sub-elements can exist in multiple groups simultaneously. For example, a sub-element might be part of a network operations group and a particular customer's group.

Where to Go From Here

Relevant information.

For information on troubleshooting tasks to perform after a new SNMP Inventory has been run, see SNMP Inventory Troubleshooting.

For information on periodic administrative tasks to perform, see SNMP Inventory Management.

Chapter 5. SNMP inventory troubleshooting

This section discusses SNMP Inventory troubleshooting for Tivoli Netcool Performance Manager.

Overview

The major phases of SNMP inventory.

The Tivoli Netcool Performance Manager SNMP Inventory consists of the following three major phases, which usually happen sequentially:

• SNMP Discovery

Detects all resources on a target network and creates a virtual image of the network.

• Synchronization

Compares the virtual network image generated by the Discovery with the records in the Tivoli Netcool Performance Manager database that were created by the previous Inventory run. Any modifications (new, missing, or renamed resources, for example) are then synchronized through the application of various algorithms, and the new network image is written to the database.

• Grouping

Updates the grouping structure in the database, which determines the kind of information that is to be collected on each resource, element, sub-element, and so forth.

In almost all cases, Tivoli Netcool Performance Manager's SNMP Inventory requires virtually no operator intervention.

However, under certain circumstances, problems arise that you will need to address. The following sections discuss the more common problems you are likely to encounter and, where possible, provide suggestions for remedial actions.

It is strongly advised that you monitor the logs for potential error messages by doing one of the following:

• Running a Discovery from the command line.

If you run a Discovery from the command line, redirect STDERR to a log file, as follows:

inventory -noX -action discovery -name lowell >output 2>error_log

For a complete list of error messages written to the Tivoli Netcool Performance Manager log file, see the Messages section of this guide. For more information on using the Tivoli Netcool Performance Manager log file, see Monitoring the Log File.

• Running a Discovery from the DataMart GUI.

If you use the DataMart GUI to initiate a Discovery, error messages appear on the DataMart GUI > Resource tab > Inventory Tool icon > Live Information tab.

The Inventory Tool prints out messages like the following every five seconds:

2005/12/09 13:46:52 [PL2DBS1, 238 sec, IP done.1/ SNMP done.1/ Elmt 0.1.0/ SubElmt 0.0.0]

These messages explain the progress of the discovery as follows:

– IP done.1

Indicates that the IP phase of the discovery process has completed.

– SNMP done.1

Indicates that the SNMP phase of the discovery process has completed.

– Elmt 0.1.0

Indicates the progress of discovered elements, using the following syntax:

numberOfObjectsInInputQueue.numberOfThreadsRunning.numberOfElementsDiscovered

– SubElmt 0.0.0

Indicates the progress of discovered sub-elements, using the following syntax:

numberOfObjectsInInputQueue.numberOfThreadsRunning.numberOfSubElementsDiscovered

If after two minutes there is no change in these messages, the Inventory Tool displays a more detailed message like the following:

2005/12/09 13:46:57 Current activity @ 2005.12.09-18.46.54
2005/12/09 13:46:57 Stage: IP done.1
2005/12/09 13:46:57 Stage: SNMP done.1
2005/12/09 13:46:57 Stage: Elmt 0.1.0
2005/12/09 13:46:57 W: R00004/192.168.80.2
2005/12/09 13:46:57 Stage: SubElmt 0.0.0

The line that includes the run number and IP address (2005/12/09 13:46:57 W: R00004/192.168.80.2, for example) can be used to troubleshoot possible problems, as explained in Discovery Seems to Hang or Never Finishes.
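If you capture the Inventory Tool progress messages in a file, the Elmt and SubElmt counters can be extracted with a small script. The following is a minimal sketch, assuming the message layout shown above; the function name parse_stage and its output labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull the three counters out of an "Elmt q.t.d" or
# "SubElmt q.t.d" stage in a progress line. Assumes the stage name is
# preceded and followed by a space, as in the sample message.
parse_stage() {
    line=$1     # full progress line
    stage=$2    # Elmt or SubElmt
    case $line in
        *" $stage "*) ;;
        *) return 1 ;;                 # stage not present in this line
    esac
    rest=${line#*" $stage "}           # text after the stage name
    counters=${rest%%/*}               # cut at the next "/" separator
    counters=${counters%%]*}           # cut at the closing bracket
    queue=${counters%%.*}
    tail=${counters#*.}
    threads=${tail%%.*}
    found=${tail#*.}
    printf 'queue=%s threads=%s discovered=%s\n' "$queue" "$threads" "$found"
}

parse_stage '2005/12/09 13:46:52 [PL2DBS1, 238 sec, IP done.1/ SNMP done.1/ Elmt 0.1.0/ SubElmt 0.0.0]' Elmt
```

Comparing the discovered count between two successive messages is one way to tell whether a Discovery is still making progress.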

Figure 1. Errors Displayed in the DataMart GUI

Discovery Troubleshooting

The following sections address the more common problems that arise during Discovery.

Discovery Does Not Start

The following sections offer the most common solutions to problems with Discovery not starting.

Discovery Fails Because Discovery Server Does Not Run

What to do if the Discovery fails.

About this task

If the Discovery server fails to start, an error message like the following is returned:

IIOP: couldn’t connect to 192.168.68.251:34024: couldn’t open socket: connection refused

Error: StartInventory Failed for Discovery Server : IDL:omg.org/CORBA/INTF_REPOS:1.0 {minor 0 completion_status COMPLETED_NO}

To troubleshoot this problem, do the following:

Procedure

1. Log in as pvuser on the system where the channel manager and log server are installed.

2. Change your working directory to the $DC_HOME/bin directory, by entering the following command.

Note that $DC_HOME is defined as /opt/datachannel by default.

cd $DC_HOME/bin

3. Verify that the Discovery server is not running by entering the following command:

dccmd -action status -pattern DISC.*.*

If the Discovery server is not running, the dccmd command returns output like the following:

NUMBER FACILITY HOST STATUS ES DURATION EXTENDED STATUS

1 DISC unresponsive

What to do next

ACTION: If the Discovery server is not running, do the following:

1. Restart the Discovery server by entering a command like the following, specifying the Discovery server for your deployment (this example uses DISC.DEV19.1):

dccmd -action bounce -pattern DISC.DEV19.1

2. Verify that the Discovery server is running by entering the following command:

dccmd -action status -pattern DISC.*.*

If the Discovery server is running, the dccmd command returns output like the following:

NUMBER FACILITY HOST STATUS ES DURATION EXTENDED STATUS

1 DISC DEV19.QUALLA running 1 running

For more information on using the dccmd command, see the Netcool/Proviso Command Line Interface Guide.
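If you check the Discovery server status from a script rather than by eye, the two sample outputs above can be told apart with a few lines of Python. This is a minimal sketch, not an IBM-supplied tool: it assumes the output is whitespace-separated text shaped like the samples in this section, and the names disc_rows and disc_is_running are hypothetical.

```python
from typing import List

# Sample outputs copied from this section.
STOPPED = (
    "NUMBER FACILITY HOST STATUS ES DURATION EXTENDED STATUS\n"
    "1 DISC unresponsive\n"
)
STARTED = (
    "NUMBER FACILITY HOST STATUS ES DURATION EXTENDED STATUS\n"
    "1 DISC DEV19.QUALLA running 1 running\n"
)


def disc_rows(status_output: str) -> List[List[str]]:
    """Return the whitespace-split rows whose FACILITY column is DISC."""
    rows = []
    for line in status_output.splitlines():
        fields = line.split()
        # Data rows start with a row number followed by the facility name;
        # the header row fails the isdigit() check and is skipped.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "DISC":
            rows.append(fields)
    return rows


def disc_is_running(status_output: str) -> bool:
    """True only if a DISC row exists and every DISC row reports 'running'."""
    rows = disc_rows(status_output)
    return bool(rows) and all("running" in row[2:] for row in rows)


print(disc_is_running(STOPPED))
print(disc_is_running(STARTED))
```

The sketch treats empty output as "not running" because a missing DISC row gives no evidence that the server is up. In practice the text would come from the dccmd status command shown above, captured for example with Python's subprocess.run.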

Discovery Fails Because Collector Stops During Discovery

About this task

If the collector stops during a Discovery, several different error messages are logged. The most common error messages are the following:

• Error: Aborted at March 14, 2005 10:21:58 pm

• Error: Connection refused

• Error: Discovery Server : Status of lowell : invalid CLIENTERR [DC1] R00015 Connection refused (-I 682 -D 0 -profil lowell -collector dev19.quallaby.com:3002 -nbGetIfAddress 100 -invFileTxt /opt/datamart/conf/inventory_subelements.txt -vname {} -intcollector 1)
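The longest of these messages names the run number, the profile, and the collector host and port that refused the connection. The following Python sketch shows one way to extract those values so that you know which collector to check. It assumes the message follows the layout of the sample above; the function name parse_collector_error is hypothetical.

```python
import re
from typing import Dict, Optional

# Sample message copied from the list of error messages above.
SAMPLE = (
    "Error: Discovery Server : Status of lowell : invalid CLIENTERR [DC1] "
    "R00015 Connection refused (-I 682 -D 0 -profil lowell "
    "-collector dev19.quallaby.com:3002 -nbGetIfAddress 100 "
    "-invFileTxt /opt/datamart/conf/inventory_subelements.txt -vname {} "
    "-intcollector 1)"
)


def parse_collector_error(message: str) -> Optional[Dict[str, Optional[str]]]:
    """Return run, profile, host, and port, or None if no collector is named."""
    collector = re.search(r"-collector\s+([^\s:]+):(\d+)", message)
    if collector is None:
        return None
    # Assumption: the run number is the letter R followed by digits.
    run = re.search(r"\bR\d+\b", message)
    profile = re.search(r"-profil\s+(\S+)", message)
    return {
        "run": run.group(0) if run else None,
        "profile": profile.group(1) if profile else None,
        "host": collector.group(1),
        "port": collector.group(2),
    }


print(parse_collector_error(SAMPLE))
```

The shorter messages (Aborted, Connection refused) carry no collector details, so the function returns None for them.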

To troubleshoot this problem, do the following:

Procedure

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.

2. (Optional) Ensure that the Oracle database and Listener are running. For more information, see the Tivoli Netcool Performance Manager Installation Guide.

3. Enter the following command, replacing DATAMART_ROOT with the root DataMart directory ( /opt/datamart by default):

DATAMART_ROOT/bin/pvm

The DataMart GUI appears.

Figure 2. DataMart GUI

