FAULT MANAGEMENT SERVICE IN ATM NETWORKS USING TINA NETWORK RESOURCE ARCHITECTURE


Chetan P. Chiba, Setumo Mohapi, Hu Hanrahan

Centre for Telecommunications Access and Services¹

Department of Electrical Engineering

University of the Witwatersrand, Johannesburg

{c.chiba; s.mohapi; h.hanrahan}@ee.wits.ac.za

¹ This work was supported by Telkom SA Limited, Siemens Telecommunications and the THRIP Programme of the Department of Trade and Industry. Authors’ address: Department of Electrical Engineering, Private Bag 3, Wits 2050, South Africa. C.P. Chiba is with the ITAS Division of Telkom SA Limited.

ABSTRACT

The TINA Network Resource Architecture (NRA) provides a technology-independent abstraction of potentially heterogeneous underlying networks. TINA defines an NRA that covers the management areas of Fault, Configuration, Accounting, Performance and Security Management (FCAPS). The combination of the TINA NRA and the management architecture allows a technology-independent abstraction of network management functionality. This paper addresses work in progress that aims to design and develop distributed Fault Management functionality for an ATM network that is defined, represented and implemented according to the TINA NRA specifications.

Keywords: TINA, Fault Management, Network Resource Architecture (NRA), TMN and ATM.

I. INTRODUCTION

The purpose of network management is the assignment and control of proper network resources, both hardware and software, to address service performance needs and the network’s objectives. With the ever-increasing size and complexity of underlying networks and services, it has become impossible to carry out these functions without the support of automated tools.

With the advent of new softswitch architectures, such as TINA, JAIN and Parlay, and of service architectures that aim to separate service provision from the underlying networks, there is a need for an appropriate management framework to support these architectures. The TINA (Telecommunications Information Networking Architecture) NRA and Management Architecture offer a generic structure that may be applied across heterogeneous networks to provide this management functionality.

The TINA architecture is decomposed into four architectures: Computing, Service, Network and Management. TINA’s Management Architecture covers the principles and concepts for managing TINA systems and networks and draws heavily on the ITU’s TMN architecture. The TINA-C Management Architecture follows the functional area organization defined in the OSI Management Framework, namely fault, configuration, accounting, performance, and security management (FCAPS). Although TINA-C embraces all the areas, the work done so far has focused on selected management functional areas.

This paper reports on work in progress that aims to design and develop a Fault Management service using the generic components of the TINA NRA, specialising them for an ATM network. The proposed solution will define Fault Management requirements for an ATM NRA implementation, and design and develop ATM NRA management based on TINA specifications, to be deployed on the SATINA (South African TINA) [6] trial environment.

Section II of this paper discusses the origins of faults. It also provides a fault management flow diagram and describes the types of fault alarms as specified by the OSI Alarm Reporting Function. Section III examines fault management for the TINA environment and highlights the requirements for a fault management system in that environment. Section IV provides a view of how the fault management service fits into the TINA NRA. Section V explains the fault management information model, using the TINA NRIM as reference. Finally, Section VI provides a flow of events (FOE) of the fault management service, explained by means of a fault scenario.

II. FAULT MANAGEMENT

A. Origins of Faults

Network faults can be classified into hardware and software faults, which cause elements to produce incorrect outputs, which in turn can cause overall failure effects in the network such as congestion.

Examples of hardware faults are failures of an element due to physical failure, malfunctions due to a failing or weakness in its logical design, and malfunctions due to simple wear and tear or to external forces such as accidents, acts of nature, mishandling, or improper installation. Examples of software faults include failure of elements due to incorrect or incomplete design of their software, failure of the network due to software bugs (e.g., incorrect packet header processing), and slow or faulty service by the network due to incorrect information (e.g., incorrect routing tables).

B. Fault Management Flow

The flow of fault management, shown in Figure 1, can be described as follows:

[Flow diagram: physical and logical alarms from the network feed Collect Alarms, followed by Filter and Correlate Alarms, Diagnose Faults, Develop and Implement Corrective Plan, Verify Fault is Eliminated, and Record Events and Analyse Fault Management. The verification step has two outcomes: "Fault is Eliminated" and "Fault is not Eliminated". The steps are numbered 1 to 6.]

Figure 1: The Fault Management Process [1]

1. The first step in fault management is to collect monitoring and performance alarms. Alarms can be classified into two categories, physical and logical. Physical alarms are hard errors (e.g., a link is down), typically reported through an element manager. Logical alarms are statistical errors (e.g., performance degradation due to congestion). Once the alarms have been reported and collected, adequate service must be maintained through immediate action.

2. The next step is to filter and correlate the alarms. Alarm filtering is a process that analyzes the multitude of alarms received and eliminates the redundant alarms (e.g., multiple occurrences of the same alarm). Alarm correlation is the interpretation of multiple alarms such that new conceptual meanings can be assigned to the alarms, creating derived alarms.

3. Faults are identified by analyzing the filtered and correlated alarms and by requesting tests and status updates from the element managers, which provide additional information for diagnosis.

4. Once a fault has been diagnosed, corrective procedures are undertaken by the network to eliminate the cause of the fault. The fault management system’s role in correction is to develop a plan or series of actions, and to initiate this plan with other functions within the network.

5. The correction must be verified through testing requests sent to the element managers. If the fault has not disappeared, more data is analyzed and the diagnostic process is repeated.
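The filtering and correlation of step 2 can be illustrated with a minimal Python sketch. It is not taken from the TINA specifications: the `Alarm` fields, the resource names and the rule "several distinct alarms on one source yield one derived alarm" are assumptions chosen only to make the two operations concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str  # hypothetical resource identifier, e.g. "link-A-B"
    kind: str    # "physical", "logical" or "derived"
    cause: str   # e.g. "loss_of_signal"


def filter_alarms(alarms):
    """Drop repeated occurrences of the same alarm, keeping arrival order."""
    seen = set()
    unique = []
    for alarm in alarms:
        if alarm not in seen:
            seen.add(alarm)
            unique.append(alarm)
    return unique


def correlate_alarms(alarms):
    """Assumed rule: distinct alarms on one source become one derived alarm."""
    by_source = {}
    for alarm in alarms:
        by_source.setdefault(alarm.source, []).append(alarm)
    result = []
    for source, group in by_source.items():
        if len(group) > 1:
            causes = "+".join(sorted(a.cause for a in group))
            result.append(Alarm(source, "derived", causes))
        else:
            result.append(group[0])
    return result


raw = [
    Alarm("link-A-B", "physical", "loss_of_signal"),
    Alarm("link-A-B", "physical", "loss_of_signal"),  # redundant repeat
    Alarm("link-A-B", "logical", "congestion"),
    Alarm("switch-C", "logical", "queue_size_exceeded"),
]
filtered = filter_alarms(raw)
derived = correlate_alarms(filtered)
```

A real correlation function would use topology and timing information rather than a simple grouping by source; the sketch only separates the two roles (removing redundancy, then assigning a new meaning to a set of alarms).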

C. Fault Alarms and Faults relevant to TINA Management Service

The following are the types of fault alarms a fault management service detects [4]:

Communication Alarms: Associated with general communication failures. They may be reported by the NE level or may be detected at the resource management level. Examples are loss of signal, loss of frame, framing error, local node transmission error, remote node transmission error, call establishment error, degraded signal, communications subsystem failure, communication protocol error, and LAN error.

QoS Alarms: Associated with degradation in the quality of service. They may be reported by the NE level or may be detected at the resource management level. Examples are response time exceeded, queue size exceeded, bandwidth reduced, retransmission rate exceeded, threshold crossed, performance degraded, congestion, and resource at or nearing capacity.

Processing Alarms: Associated with a software or processing fault. Examples are parameter out of range and underlying resource unavailable.

Equipment Alarms: Associated with an equipment fault, for example a fault in an ATM switch.
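As an illustration only, the four alarm types and a few of the example causes listed above can be captured in a small lookup table. The names `AlarmType`, `PROBABLE_CAUSES` and `classify` are hypothetical and are not defined by the OSI Alarm Reporting Function or by TINA.

```python
from enum import Enum
from typing import Optional


class AlarmType(Enum):
    COMMUNICATION = "communication"
    QOS = "quality of service"
    PROCESSING = "processing"
    EQUIPMENT = "equipment"


# A subset of the example causes from the text, keyed in lower case.
PROBABLE_CAUSES = {
    "loss of signal": AlarmType.COMMUNICATION,
    "loss of frame": AlarmType.COMMUNICATION,
    "framing error": AlarmType.COMMUNICATION,
    "response time exceeded": AlarmType.QOS,
    "queue size exceeded": AlarmType.QOS,
    "congestion": AlarmType.QOS,
    "parameter out of range": AlarmType.PROCESSING,
    "underlying resource unavailable": AlarmType.PROCESSING,
    "fault in atm switch": AlarmType.EQUIPMENT,
}


def classify(cause: str) -> Optional[AlarmType]:
    """Return the alarm type for a cause, or None if the cause is unknown."""
    return PROBABLE_CAUSES.get(cause.strip().lower())


link_down = classify("Loss of Signal")
unknown = classify("power surge")
```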

III. FAULT MANAGEMENT FOR THE TINA ENVIRONMENT

Within the context of TINA, fault management is related to the service management, network resource management and DPE management areas. This paper is concerned with the TINA network resource management area.

A. Functional Requirements

The functional requirements of Fault Management concern the information and computational aspects of telecommunication management. The functional requirements of fault management are [2]:

Alarm Surveillance: Includes collection and logging of alarm notifications from the network resources, and monitoring and retrieval of alarm data from them.

Fault Localisation: Analyses the collected alarm information, detects the root cause of the alarm, and notifies the clients of the alarm surveillance of the result.

Fault Correction: Is responsible for dealing with the computational objects that represent the resources in which a root cause alarm is detected, in order to restore or recover them from the fault condition.

Testing Function: Invokes a test capability of a resource object upon a request from the clients of the service. It may also support a test of a series of resource objects.

Trouble Administration: Enables the reporting of troubles due to fault conditions and the tracking of their status.

IV. APPLICATION OF FAULT MANAGEMENT IN THE TINA NRA

The TINA NRA provides a model of a transport network that is capable of transporting multimedia information over end-to-end connections and deals with heterogeneous types of traffic. It is a complex and broad architecture dealing with aspects such as connection, fault, accounting and network topology management. The TMN functional layers [M3010] relevant in Network Architecture management are the Network Management Layer (NML) and the Network Element Management Layer (EML), since both networks and network elements are the resources being considered in the Network Architecture. The ATM Connection Management Architecture is composed of five computational object classes [3] (see Figure 2), namely Connection Coordinator (CC), Layer Network Coordinator (LNC), Network Management Level Connection Performer (NML-CP), Element Management Level Connection Performer (EML-CP) and Resource Adapter (RA). In this diagram, fault management (FM) computational objects (COs) are also shown attached to the EML-CP and the NML-CP computational objects. This is where the fault management service fits into the TINA NRA. The FM COs are the EML-FM, NML-FM and RA-FM.

[Figure 2 diagram: a CC and an LNC; NML-CPs, each paired with an NML-FM; EML-CPs, each paired with an EML-FM; and an RA paired with an RA-FM, next to the ATM switch. Key: CC = Connection Coordinator, CP = Connection Performer, NE = Network Element, LNC = Layer Network Coordinator, NML = Network Management Layer, EML = Element Management Layer.]

Figure 2: ATM Connection-Fault Management Architecture Components

All fault management services perform the five fault management activities, i.e. surveillance, localisation, correction, testing and trouble administration. The NML-FM and the EML-FM COs are each further sub-divided into three fault management computational objects, i.e. the Alarm Manager, the Fault Coordinator and the Testing/Diagnostic Server. These COs are described further below. The fault management COs of Figure 2 are shown expanded in Figure 3.

[Figure 3 diagram: Fault Management within a Network. At the NML, an NML-FM containing NML-AM, NML-FC and NML-TDS; at the EML, EML-FMs each containing EML-AM, EML-FC and EML-TDS; a federation link is also shown. Key: AM = Alarm Manager, FC = Fault Coordinator, TDS = Testing/Diagnostic Server.]

Figure 3: Basic computational model [3]

The network resource fault management services are provided by the interaction of COs inside and outside the fault management area. The COs identified for the network resource fault manager are [3]:

1. Alarm Manager (AM)

The Alarm Manager (AM) receives fault-related alarms from Managed Objects (MOs) and performs the relevant procedures for forwarding alarms to the Fault Coordinator or to the fault management service user, and for alarm record management. Each AM has its own discriminating criteria through which incoming alarms are logged and forwarded to the relevant computational objects in the system.

2. Fault Coordinator (FC)

The Fault Coordinator (FC) includes capabilities to internally analyze alarms received from multiple MOs to determine the next possible step for fault localization or correction. For this purpose, the FC correlates all available information to refine what is known about the root cause of the event in question. During the analysis, the TDS can be invoked to run tests as appropriate.

3. Testing/Diagnostic Server (TDS)

The Testing/Diagnostic Server (TDS) is concerned with testing MOs in order to verify their service and function. From the fault management point of view, the TDS is invoked by either the Fault Coordinator or the fault management service user. However, the TDS can also be invoked by other computational objects in the system, e.g., resource configuration and connection management objects, or a scheduler.

The diagram below, Figure 4, shows the interactions among the COs in the fault management functions [3]. The dotted rectangle shows the functions of fault management and the interfaces for fault management services and activities.

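[Figure 4 diagram. Components: Alarm Manager, Fault Coordinator, Testing/Diagnostic Server, notification server, managed system (MOs), and support data sources (NTCM, RC, PM, CM). Labelled interactions: alarm register, alarm access, alarm report, alarm summary, fault localise request and reply, fault localisation interaction, testing request and report, testing/diagnostic interaction. Support data: equipment hierarchy, connection topology, performance data.]

Figure 4: CO interactions in Fault Management [3]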

Figure 4 can be read as the set of interactions provided by the fault management COs that manage the network at the various levels.
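The division of work among the three COs can be sketched in Python as follows. This is a toy model under stated assumptions, not the TINA computational specification: the class and method names (`TDS.test`, `FaultCoordinator.localise`, `AlarmManager.notify`), the dictionary-shaped alarms and the severity-based discriminating criterion are all hypothetical.

```python
class TDS:
    """Testing/Diagnostic Server: verifies managed objects on request."""

    def __init__(self, faulty):
        self.faulty = set(faulty)  # stand-in for real test results

    def test(self, mo):
        """Return True if the managed object passes its test."""
        return mo not in self.faulty


class FaultCoordinator:
    """Analyses alarms from several MOs and invokes the TDS as needed."""

    def __init__(self, tds):
        self.tds = tds
        self.received = []

    def receive(self, alarm):
        self.received.append(alarm)

    def localise(self):
        """Test every MO named in a received alarm; return those that fail."""
        suspects = sorted({alarm["mo"] for alarm in self.received})
        return [mo for mo in suspects if not self.tds.test(mo)]


class AlarmManager:
    """Logs and forwards alarms that satisfy its discriminating criteria."""

    def __init__(self, fc, criteria):
        self.fc = fc
        self.criteria = criteria  # predicate over an alarm (assumed form)
        self.log = []

    def notify(self, alarm):
        if self.criteria(alarm):
            self.log.append(alarm)   # alarm record management
            self.fc.receive(alarm)   # forward to the Fault Coordinator


tds = TDS(faulty={"link-1"})
fc = FaultCoordinator(tds)
am = AlarmManager(fc, criteria=lambda a: a["severity"] in {"critical", "major"})

am.notify({"mo": "link-1", "severity": "critical"})
am.notify({"mo": "switch-2", "severity": "major"})
am.notify({"mo": "switch-3", "severity": "warning"})  # fails the criteria

root_causes = fc.localise()
```

Keeping the TDS behind its own interface reflects the point made above that clients other than the Fault Coordinator may also invoke it.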

V. FAULT MANAGEMENT INFORMATION MODEL

The information model defined in TINA-C for the Network Architecture is the Network Resource Information Model (NRIM). The NRIM contains the object classes needed for the representation of network resources. The information model is presented in a number of fragments. Each fragment shows the related object classes that deal with a particular subject; grouping a limited number of object class definitions in each fragment makes the information model easier to understand.

The fault management fragment specifies the management support information objects for fault management. The TINA fault management functional area addresses the five fault management activities discussed in Section III. Some of the object types specified in the FM fragment and shown in Figure 5 are [4,5]:

1. FaultManageable - Represents the management information that a network resource has to provide so that it can be subject to fault management. This is a subtype of Manageable.

2. FaultManagementDomain - Represents a set of FaultManageable objects that is controlled by a fault management function. Associated with a fault management domain is a set of policies that govern the fault management of all objects in the domain. This is a subtype of ManagementDomain.

3. AlarmRecord - Represents the alarm information stored in a Log. This is a subtype of LogRecord.

4. CurrentAlarmSummaryControl - Specifies criteria for the generation of a current alarm summary report.

5. AlarmSeverityAssignmentProfile - Specifies the assignment of alarm severity to different types of alarms. Each profile object may specify different severity assignments.
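The subtype relationships in this list can be mirrored in a short Python sketch. Only the class names and the subtype links come from the fragment described above; every attribute and method (`operational`, `members`, `policies`, `assign`, `severity_for`) is an assumption added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Manageable:
    name: str


@dataclass
class FaultManageable(Manageable):
    operational: bool = True  # assumed attribute, not from the NRIM


@dataclass
class ManagementDomain:
    name: str


@dataclass
class FaultManagementDomain(ManagementDomain):
    members: list = field(default_factory=list)
    policies: dict = field(default_factory=dict)

    def assign(self, resource):
        """Accept only FaultManageable objects into the domain."""
        if not isinstance(resource, FaultManageable):
            raise TypeError("only FaultManageable objects can be assigned")
        self.members.append(resource)


@dataclass
class LogRecord:
    timestamp: float


@dataclass
class AlarmRecord(LogRecord):
    resource: str
    alarm_type: str


@dataclass
class AlarmSeverityAssignmentProfile:
    assignments: dict = field(default_factory=dict)

    def severity_for(self, alarm_type, default="warning"):
        return self.assignments.get(alarm_type, default)


domain = FaultManagementDomain("atm-fm-domain")
link = FaultManageable("link-A-B")
domain.assign(link)

profile = AlarmSeverityAssignmentProfile({"communication": "critical"})
record = AlarmRecord(timestamp=0.0, resource=link.name, alarm_type="communication")
```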

The fault management information model can be seen in Figure 5, together with the fault management COs and how they relate to each other.

Figure 5 can be divided into three sections. Section 1 contains the following objects: domain, manageable resources, management domain and administrative domain. The domain object represents a group of information object instances. Two types of domain are identified in the TINA management architecture, i.e. the management domain and the administrative domain. An administrative domain also contains a number of management domains. The manageable resources share an assignTo relationship with the management domain object.


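[Figure 5 diagram. Object labels: Entity, Domain, Management Domain, Admin. Domain, Manageable Resource, Configurable Resource, FaultManageable Resource, Alarmable Resource, Communications Alarmable Resource, Processing Error Alarmable Resource, Fault Management Domain, RCM Domain, Alarm Severity Assignment Profile, Current Alarm Summary Control, Log, Alarm Record. Relationship labels: assignTo, AlarmsurveyedBy, report Communication Error Alarms To, report Processing Error Alarms To.]

Figure 5: OMT Diagram for Fault Management [4,5]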

Section 2 contains specialised manageable resource object types, e.g. fault manageable resources, configurable resources, etc. The management domain object from section 1 contains specialised management domains, i.e. fault management domain, configuration management domain, etc. A number of manageable resources are assigned to the fault management domain.

Section 3 contains objects specific to the fault management service. As an example, the alarmable resource object is shown in detail. Associated with this object are the different types of alarms detected, as described in Section II-C. It also has specialised relationships with other objects in section 3 that reside in the fault management domain.

VI. FLOW OF EVENTS OF A TYPICAL FAULT MANAGEMENT SERVICE

To explain a typical fault management scenario in the TINA environment, an example of a damaged link between two ATM switches will be used.

Figure 6 illustrates the flow graph of the events that take place when a link between two switches is damaged. The figure shows the interaction between the NML-FM and the EML-FM from the time the alarm is received to the time the fault is corrected.

When a link between two switches becomes inoperable, a communication alarm is sent to the EML-FM component of the EML-CP in which the link is contained. The severity of the alarm may range from critical to major [4]. This means that the condition is service affecting and that immediate or urgent corrective action is required (e.g., the resource is out of service or degraded).

[Figure 6 diagram. NML level: NML Alarm Manager (alarm correlation filtering, alarm record, alarm log, notification forwarding), NML Fault Coordinator (alarm analysis, alarm record access, federation with NML-FCs in other networks), NML Testing/Diagnostic Server (test analysis, testing on MOs, testing/diagnostic requests from FM clients), and NML-NTCM support data (equipment hierarchy, connection topology). EML level: EML Alarm Manager (alarm forwarding, alarm correlation filtering, alarm record, alarm log, event log), EML Fault Coordinator (alarm analysis, fault correction, alarm access, alternate resource setup request/reply, fault corrective interaction with NML), EML Testing/Diagnostic Server (test analysis, testing on MOs), and EML-NTCM support data (equipment hierarchy, connection topology). The managed objects sit below the EML. The components are numbered 1 to 6.]

Figure 6: Flow of Events of a Fault Management Service on a Damaged Link [4]

The flow of events of a typical fault correction process can be described in terms of the six components involved:

1. EML-AM - The communication alarm is first received by the EML Alarm Manager (EML-AM). The EML-AM first makes the corresponding event log record and filters the alarm. If the alarm passes filtering, the EML-AM prepares the corresponding alarm record, which can be referred to for further alarm analysis and reporting purposes. The filtered alarm then passes through the alarm correlation function, which removes redundant alarms and initiates the fault localisation procedure. In the event that the EML-FC cannot localise the fault, the alarm is passed to the NML Alarm Manager (NML-AM) through the alarm forwarding functions. The AM interacts with the EML-Network Topology Configuration Manager (NTCM) to get the equipment hierarchy and connection topology data.

2. NML-AM - The NML-AM receives alarm reports from the corresponding EML-AM. The alarm reports from an EML-AM do not specify the root cause of the alarm report. The NML-AM interacts with the NML-FC to identify the root cause of alarms. The fault localisation results are then forwarded to users. The NML-AM makes use of the NML-Network Topology Configuration Manager (NTCM) to get the equipment hierarchy and the connection topology information between subnetworks as supporting data.

3. EML-FC - The alarm analysis function of the EML-FC receives the fault localisation request from the EML-AM and performs analysis of current alarm records to determine the root cause of a set of related alarms. During this phase, the EML-FC interacts with the EML-TDS for testing and diagnostics over a set of related resources. The EML-FC performs automatic restoration by activating a back-up resource for the faulty resource. This is done by re-routing the information onto a different link, for example by using the same VP but a different VC. To obtain the possible re-configuration data for re-routing, the EML-FC interacts with the EML-Network Topology Configuration Manager (NTCM) to get the equipment hierarchy and connection topology data. A report is then submitted to the EML-NTCM to record the changes in configuration that result from fault correction.

4. NML-FC - In the event that the EML-FC cannot locate or correct the fault, the alarm is sent to the NML-AM. The NML-AM in turn interacts with the NML Fault Coordinator (NML-FC). The NML-FC performs alarm analysis for the current alarm records and interacts with the NML-TDS to determine the root cause of a set of related alarms. For the localisation of faults that span multiple networks, the NML-FC interacts with NML-FCs in other networks through federation. During alarm analysis, the connection topology between subnetworks is used as supporting data. If the information flow is re-routed, the NML-FC interacts with the NML-Network Topology Configuration Manager (NTCM) to report the changes in configuration that result from fault correction.

5. EML-TDS - The EML Testing/Diagnostic Server (EML-TDS) provides capabilities for testing and diagnosing a set of resources. Testing is a function verification of a set of resources, and diagnosis involves analysing the results of testing to find the main cause of abnormal behaviour within the network. The EML-TDS is activated by the EML-FC, NML fault management COs, and other FM clients.

6. NML-TDS - The NML Testing/Diagnostic Server (NML-TDS) provides capabilities for testing and diagnosing a set of resources that span multiple subnetworks or multiple networks. The NML-TDS is activated by the FC, NML fault management users, NML-TDSs in other networks, and other FM clients, including resource configuration and connection management.
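The escalation from the EML to the NML described in these steps can be traced with a minimal Python sketch. The function name `fault_flow`, its boolean parameters and the severity-based filter are assumptions made for illustration; the sketch records which component acts at each stage and does not model the TINA interfaces themselves.

```python
SERVICE_AFFECTING = {"critical", "major"}  # assumed filtering criterion


def fault_flow(severity, eml_localises, backup_available):
    """Return the ordered list of actions taken for one alarm."""
    trace = ["EML-AM: make event log record"]
    if severity not in SERVICE_AFFECTING:
        trace.append("EML-AM: alarm filtered out")
        return trace

    trace.append("EML-AM: make alarm record, request localisation from EML-FC")
    trace.append("EML-FC: analyse alarms with EML-TDS tests")
    if eml_localises and backup_available:
        # Fault handled at the element management level.
        trace.append("EML-FC: activate back-up resource, report to EML-NTCM")
        return trace

    # The EML-FC could not locate or correct the fault: escalate.
    trace.append("EML-AM: forward alarm to NML-AM")
    trace.append("NML-FC: analyse alarms with NML-TDS tests")
    return trace


local = fault_flow("critical", eml_localises=True, backup_available=True)
escalated = fault_flow("critical", eml_localises=False, backup_available=False)
ignored = fault_flow("warning", eml_localises=True, backup_available=True)
```

In the damaged-link scenario, `local` corresponds to re-routing handled entirely at the EML, while `escalated` corresponds to the case in which the NML-FC must take over.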

VII. CONCLUSION

The paper addressed the work in progress on the development of a fault management system for an ATM network that is defined, represented and implemented according to the TINA NRA specifications. The proposed solution defines a distributed management functionality that is capable of providing management support across heterogeneous networks. The paper describes the Fault Management requirements for an ATM NRA implementation. The design approach implements the TINA NRIM and shows how the fault management service can be implemented in the TINA NRA. The FM computational objects, i.e. the Alarm Manager, the Fault Coordinator and the Testing/Diagnostic Server, have been defined, and their interactions in providing a fault management service have been discussed.

VIII. REFERENCES

[1]. Gurer DW, Khan I, et al. “An Artificial Intelligence Approach to Network Fault Management.” http://www.sce.carleton.ca/netmanage/docs/An_AI_Approach.pdf

[2]. Fuente LA, Walles T. “Management Architecture Version: 2.0” Document No. TB_GN.010_2.0_94. December 1994, TINA Consortium.

[3]. C. Abarca, J. Forslow, T. Hanada, et al., "Network Resource Architecture Version 3.0," Document No. NRA_v3.0_97_02_10, 10 February 1997, TINA Consortium.

[4]. Natarajan N, Flinck H, Rosli RM, “Network Resource Information Model Specification” Document No. NRIM_v2.2_97_10_31, TINA Consortium, October 31, 1997.

[5]. Kawnaoshi M, “’94 Report on Fault Management and Resource Configuration Management” Document No. TR_MK.006_1.0_94. TINA Consortium. January 20, 1995.

[6]. F. Scholtz, H.E. Hanrahan, R.A. Achterberg, "The South African TINA Trial: SATINA," Proceedings of SATNAC98, 6th-8th September 1999, University of Durban Westville.
