RF Troubleshooting Guide Radio


Troubleshooting Guide

Issue 04

Date 2013-08-30


No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base

Bantian, Longgang Shenzhen 518129

People's Republic of China

Website: http://www.huawei.com


About This Document

Purpose

This document describes how to diagnose and handle eRAN faults. Maintenance engineers can troubleshoot the following faults by referring to this document:

• Faults reflected in user complaints
• Faults found during routine maintenance
• Sudden faults
• Faults indicated by alarms

Intended Audience

This document is intended for:

• System engineers
• Site maintenance engineers

Product Versions

The following table lists the product versions related to this document.

Product Name Product Version
DBS3900 LTE V100R005C00
DBS3900 LTE TDD V100R005C00
BTS3900 LTE V100R005C00
BTS3900A LTE V100R005C00
BTS3900L LTE V100R005C00
BTS3900AL LTE V100R005C00


Change History

For details about the changes in this document, see 1 Changes in eRAN Troubleshooting Guide.

Organization

1 Changes in eRAN Troubleshooting Guide

2 Troubleshooting Process and Methods

This chapter describes the general troubleshooting process and methods.

3 Common Maintenance Functions

This chapter describes common maintenance functions that are used to analyze and handle faults. It also explains or provides references on how to use the functions.

4 Troubleshooting Access Faults

This chapter describes how to diagnose and handle access faults.

5 Troubleshooting Intra-RAT Handover Faults

This chapter describes how to diagnose and handle intra-RAT handover faults. RAT is short for radio access technology.

6 Troubleshooting Service Drops

This chapter describes the method and procedure for troubleshooting service drops in the Long Term Evolution (LTE) system. It also provides the definitions of service drops and related key performance indicator (KPI) formulas.

7 Troubleshooting Inter-RAT Handover Faults

This chapter defines inter-RAT handover faults, describes handover principles, and provides the fault handling method and procedure.

8 Troubleshooting Rate Faults

This chapter provides definitions of faults related to traffic rates and describes how to troubleshoot low uplink/downlink UDP/TCP rates and rate fluctuations. UDP is short for User Datagram Protocol, and TCP is short for Transmission Control Protocol.

9 Troubleshooting Cell Unavailability Faults

This chapter defines cell unavailability faults and provides a troubleshooting method.

10 Troubleshooting IP Transmission Faults

This chapter defines IP transmission faults and describes how to troubleshoot them.

11 Troubleshooting Application Layer Faults


12 Troubleshooting Transmission Synchronization Faults

This chapter describes how to troubleshoot transmission synchronization faults. Faults of this type include clock reference problems, IP clock link faults, system clock unlocked faults, base station synchronization frame number errors, and time synchronization failures.

13 Troubleshooting Transmission Security Faults

This chapter describes how to troubleshoot transmission security faults.

14 Troubleshooting RF Unit Faults

This chapter describes the method and procedure for troubleshooting radio frequency (RF) unit faults in the Long Term Evolution (LTE) system.

15 Troubleshooting License Faults

This chapter describes how to diagnose and handle license faults.

16 Fault Information Collection

If a fault cannot be rectified by referring to this document, collect fault information so that Huawei technical support can troubleshoot the fault quickly. This chapter describes how to collect fault information.

Conventions

Symbol Conventions

The symbols that may be found in this document are defined as follows.

Symbol Description
DANGER Indicates an imminently hazardous situation which, if not avoided, will result in death or serious injury.
WARNING Indicates a potentially hazardous situation which, if not avoided, could result in death or serious injury.
CAUTION Indicates a potentially hazardous situation which, if not avoided, may result in minor or moderate injury.
NOTICE Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results. NOTICE is used to address practices not related to personal injury.
NOTE Calls attention to important information, best practices and tips. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration.


General Conventions

The general conventions that may be found in this document are defined as follows.

Convention Description

Times New Roman Normal paragraphs are in Times New Roman.

Boldface Names of files, directories, folders, and users are in boldface. For example, log in as user root.

Italic Book titles are in italics.

Courier New Examples of information displayed on the screen are in Courier New.

Command Conventions

The command conventions that may be found in this document are defined as follows.

Boldface The keywords of a command line are in boldface.
Italic Command arguments are in italics.
[ ] Items (keywords or arguments) in brackets [ ] are optional.
{ x | y | ... } Optional items are grouped in braces and separated by vertical bars. One item is selected.
[ x | y | ... ] Optional items are grouped in brackets and separated by vertical bars. One item is selected or no item is selected.
{ x | y | ... }* Optional items are grouped in braces and separated by vertical bars. A minimum of one item or a maximum of all items can be selected.
[ x | y | ... ]* Optional items are grouped in brackets and separated by vertical bars. Several items or no item can be selected.

GUI Conventions

The GUI conventions that may be found in this document are defined as follows.

Boldface Buttons, menus, parameters, tabs, windows, and dialog titles are in boldface. For example, click OK.

> Multi-level menus are in boldface and separated by the ">" signs. For example, choose File > Create > Folder.


Keyboard Operations

The keyboard operations that may be found in this document are defined as follows.

Format Description

Key Press the key. For example, press Enter and press Tab.

Key 1+Key 2 Press the keys concurrently. For example, pressing Ctrl+Alt+A means the three keys should be pressed concurrently.
Key 1, Key 2 Press the keys in turn. For example, pressing Alt, A means the two keys should be pressed in turn.

Mouse Operations

The mouse operations that may be found in this document are defined as follows.

Action Description
Click Select and release the primary mouse button without moving the pointer.
Double-click Press the primary mouse button twice continuously and quickly without moving the pointer.
Drag Press and hold the primary mouse button and move the pointer.

About This Document

1 Changes in eRAN Troubleshooting Guide

2 Troubleshooting Process and Methods

2.1 General Troubleshooting Process

2.2 General Troubleshooting Steps

2.2.1 Backing Up Data

2.2.2 Collecting Fault Information

2.2.3 Determining the Fault Scope and Type

2.2.4 Identifying Fault Causes

2.2.5 Rectifying the Fault

2.2.6 Checking Whether Faults Have Been Rectified

2.2.7 Contacting Huawei Technical Support

3 Common Maintenance Functions

3.1 User Tracing

3.2 Interface Tracing

3.3 Comparison/Interchange

3.4 Switchover/Reset

4 Troubleshooting Access Faults

4.1 Definitions of Access Faults

4.2 Background Information

4.3 Troubleshooting Method

4.4 Troubleshooting Access Faults Due to Incorrect Parameter Configurations

4.5 Troubleshooting Access Faults Due to Radio Environment Abnormalities

5 Troubleshooting Intra-RAT Handover Faults

5.1 Definitions of Intra-RAT Handover Faults

5.4 Troubleshooting Intra-RAT Handover Faults Due to Hardware Faults

5.5 Troubleshooting Intra-RAT Handover Faults Due to Incorrect Data Configurations

5.7 Troubleshooting Intra-RAT Handover Faults Due to Poor Uu Quality

6 Troubleshooting Service Drops

6.1 Definitions of Service Drops

6.4 Troubleshooting Service Drops Due to Radio Faults

6.5 Troubleshooting Service Drops Due to Transmission Faults

6.6 Troubleshooting Service Drops Due to Congestion

6.7 Troubleshooting Service Drops Due to Handover Failures

6.8 Troubleshooting Service Drops Due to MME Faults

7 Troubleshooting Inter-RAT Handover Faults

7.1 Definitions of Inter-RAT Handover Faults

7.3 Troubleshooting Inter-RAT Handovers

8 Troubleshooting Rate Faults

8.1 Definitions of Rate Faults

8.3 Troubleshooting Abnormal Single-UE Rates

8.4 Troubleshooting Abnormal Multi-UE Rates

9 Troubleshooting Cell Unavailability Faults

9.1 Definitions of Cell Unavailability Faults

9.4 Troubleshooting Cell Unavailability Faults Due to Incorrect Data Configuration

9.5 Troubleshooting Cell Unavailability Faults Due to Abnormal Transport Resources

9.6 Troubleshooting Cell Unavailability Faults Due to Abnormal RF Resources

9.7 Troubleshooting Cell Unavailability Faults Due to Limited Capacity or Capability

9.8 Troubleshooting Cell Unavailability Faults Due to Faulty Hardware

10 Troubleshooting IP Transmission Faults

10.1 Definitions of IP Transmission Faults

10.4 Troubleshooting IP Physical Layer Faults

10.5 Troubleshooting IP Link Layer Faults

10.6 Troubleshooting IP Layer Faults

11 Troubleshooting Application Layer Faults

11.1 Definitions of Application Layer Faults

11.4 Troubleshooting SCTP Link Faults

11.5 Troubleshooting IP Path Faults

11.6 Troubleshooting OM Channel Faults

12 Troubleshooting Transmission Synchronization Faults

12.1 Definitions of Transmission Synchronization Faults

12.3 Troubleshooting Specific Transmission Synchronization Faults

13 Troubleshooting Transmission Security Faults

13.1 Definitions of Transmission Security Faults

13.3 Troubleshooting Specific Transmission Security Faults

14 Troubleshooting RF Unit Faults

14.1 Definitions of RF Unit Faults

14.4 Troubleshooting VSWR Faults

14.5 Troubleshooting RTWP Faults

14.6 Troubleshooting ALD Link Faults

15 Troubleshooting License Faults

15.1 Definitions of License Faults

15.4 Troubleshooting License Faults That Occur During License Installation

15.5 Troubleshooting License Faults That Occur During Network Running

15.6 Troubleshooting License Faults That Occur During Network Adjustment


1 Changes in eRAN Troubleshooting Guide

This chapter describes the changes in eRAN Troubleshooting Guide.

04 (2013-08-30)

This is the fourth official release.

Compared with issue 03 (2012-12-29), this issue includes the following new information.

Topic Change Description

16 Fault Information Collection Added the common fault information and collection methods.

Compared with issue 03 (2012-12-29), this issue includes the following changes.

4.3 Troubleshooting Method Modified the description of possible causes and flowcharts.

No information in issue 03 (2012-12-29) is deleted from this issue.

03 (2012-12-29)

This is the third official release.

Compared with issue 02 (2012-07-30), this issue does not include any new information. Compared with issue 02 (2012-07-30), this issue includes the following changes.


9.5 Troubleshooting Cell Unavailability Faults Due to Abnormal Transport Resources

• Deleted the possible cause: the IP path is faulty or not configured.
• Added a possible cause: the S1 interface is faulty or not configured.

13.3 Troubleshooting Specific Transmission Security Faults

Updated figure 9.

02 (2012-07-30)

This is the second official release.

Compared with issue 01 (2012-06-29), this issue does not include any new information. Compared with issue 01 (2012-06-29), this issue includes the following changes.

Whole document Updated descriptions.

01 (2012-06-29)

This is the first official release.

Compared with draft A (2012-05-11), this issue does not include any new information. Compared with draft A (2012-05-11), this issue includes the following changes.

14.5 Troubleshooting RTWP Faults Added troubleshooting steps, including the step for diagnosing and handling cross-connected antennas.

No information in draft A (2012-05-11) is deleted from this issue.

Draft A (2012-05-11)


2 Troubleshooting Process and Methods

About This Chapter

This chapter describes the general troubleshooting process and methods.

2.1 General Troubleshooting Process

This section describes the general troubleshooting process.

2.2 General Troubleshooting Steps


2.1 General Troubleshooting Process

This section describes the general troubleshooting process.

Figure 2-1 shows the general troubleshooting process.

Figure 2-1 General troubleshooting process

Table 2-1 details each step of the general troubleshooting process.

Table 2-1 Steps in the general troubleshooting process

No. | Step | Remarks
1 | 2.2.1 Backing Up Data | Data to be backed up includes the database, alarm information, and log files.
2 | 2.2.2 Collecting Fault Information | Fault information is essential to troubleshooting. Therefore, maintenance personnel are advised to collect as much fault information as possible.
3 | 2.2.3 Determining the Fault Scope and Type | Determine the fault scope and type based on the symptoms.
4 | 2.2.4 Identifying Fault Causes | Identify the fault causes based on the fault information and symptoms.
5 | 2.2.5 Rectifying the Fault | Take appropriate measures or steps to rectify the fault.
6 | 2.2.6 Checking Whether Faults Have Been Rectified | Verify whether the fault is rectified. If the fault is rectified, the troubleshooting process ends. If the fault persists, check whether this fault falls in another fault scope or type.
7 | 2.2.7 Contacting Huawei Technical Support | If the fault scope or type cannot be determined, or the fault cannot be rectified, contact Huawei technical support.
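The flow in Table 2-1 can be read as a loop: back up data, collect information, then try each candidate fault scope or type until one is verified as rectified, and escalate when none is. The following is a minimal Python sketch of that loop, not a Huawei tool; the function `run_troubleshooting` and the `rectify` callback (which stands in for steps 4 to 6) are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Sequence


class Outcome(Enum):
    RECTIFIED = "rectified"
    ESCALATED = "escalated"


def run_troubleshooting(
    candidate_types: Sequence[str],
    rectify: Callable[[str], bool],
) -> tuple[Outcome, list[str]]:
    """Follow Table 2-1 for each candidate fault scope or type in turn.

    `rectify(fault_type)` is a hypothetical callback standing in for steps 4-6:
    identify the cause, fix it, and return True only if verification passes.
    """
    log = ["1 back up data", "2 collect fault information"]
    for fault_type in candidate_types:
        log.append(f"3 determine scope/type: {fault_type}")
        if rectify(fault_type):
            log.append(f"6 verified as rectified: {fault_type}")
            return Outcome.RECTIFIED, log
        # The fault persists, so check whether it falls in another scope or type.
    # No scope or type could be determined or rectified: escalate (step 7).
    log.append("7 contact Huawei technical support")
    return Outcome.ESCALATED, log
```

For example, `run_troubleshooting(["access", "rate"], lambda t: t == "rate")` returns `Outcome.RECTIFIED` after the "access" attempt fails, while an empty candidate list goes straight to step 7, matching the remark for that step.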

2.2 General Troubleshooting Steps

This section describes each step in the general troubleshooting process in detail.

2.2.1 Backing Up Data

To ensure data security, save the onsite data and back up the related databases, alarm information, and log files before you start troubleshooting.

For details about data to be backed up and how to back up data, see eNodeB Routine Maintenance Guide.

2.2.2 Collecting Fault Information

Fault information is essential to troubleshooting. Therefore, maintenance personnel should collect as much fault information as possible.

Fault Information to Be Collected

Before rectifying a fault, collect the following information:

• Fault symptom
• Time, location, and frequency
• Scope and impact
• Operations performed on the equipment before the fault occurs, and the results of these operations
• Measures taken to deal with the fault, and the results
• Alarms and correlated alarms when the fault occurs
• Board indicator status when the fault occurs
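To make sure no item on the list above is missed before troubleshooting starts, the items can be kept in a simple record and checked for gaps. The sketch below is an illustrative assumption: `FaultRecord` and `missing_items` are hypothetical names, not part of any eNodeB or M2000 interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class FaultRecord:
    """Hypothetical record of the information to collect before rectifying a fault."""

    symptom: str = ""
    time_location_frequency: str = ""
    scope_and_impact: str = ""
    prior_operations: str = ""  # operations before the fault, and their results
    measures_taken: str = ""  # measures taken to deal with the fault, and results
    alarms: list[str] = field(default_factory=list)  # alarms and correlated alarms
    indicator_status: str = ""  # board indicator status when the fault occurs


def missing_items(record: FaultRecord) -> list[str]:
    """Return the names of the items that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

An empty string or an empty alarm list counts as missing, so a freshly created record reports all seven items until they are filled in.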

Fault Information Collection Methods

The methods for collecting fault information are as follows:

• Consult the person who reports the fault about the symptom, time, location, and frequency of the fault.

• Consult maintenance personnel about the equipment running status, fault symptom, operations performed before the fault occurs, and measures taken after the fault occurs and the effect of these measures.

• Observe the board indicator, operation and maintenance (OM) system, and alarm management system to obtain the software and hardware running status.

• Estimate the scope and impact of the fault by means of service demonstration, performance measurement, and interface or signaling tracing.

Fault Information Collection Skills

The following are skills in collecting fault information:

• Do not handle a fault hastily. Collect as much information as possible before rectifying the fault.

• Maintain good communication with maintenance personnel at other sites, and ask them for technical support if necessary.

Fault Information Classification

Table 2-2 Fault information types

Type | Attribute | Description
Original information | Definition | Original information includes the fault information reflected in user complaints, fault notifications from other offices, exceptions detected in maintenance, and the information collected by maintenance personnel through different channels in the early period when the fault is found. Original information is important for fault locating and analysis.
Original information | Function | Original information is used to determine the fault scope and fault category. Original information helps narrow the fault scope and locate the faults in the initial stage of troubleshooting. Original information can also help troubleshoot other faults, especially trunk faults.
Alarm information | Definition | Alarm information is the output of the eNodeB alarm system. It relates to the hardware, links, trunk, and CPU load of the eNodeB, and includes the description of faults or exceptions, fault causes, and handling suggestions. Alarm information is a key element for fault locating and analysis.
Alarm information | Function | Alarm information is specific and complete; therefore, it is directly used to locate the faulty component or find the fault cause. In addition, alarm information can also be used with other methods to locate a fault.
Alarm information | Reference | For details about how to use the alarm system, see M2000 Online Help. For detailed information about each alarm, see eNodeB Alarm Reference.
Indicator status | Definition | Board indicators indicate the running status of boards, circuits, links, optical channels, and nodes. Indicator status information is also a key element for fault locating and analysis.
Indicator status | Function | By analyzing indicator status, you can roughly locate faulty components or fault causes that facilitate subsequent operations. Generally, indicator status information is combined with alarm information for locating faults.
Indicator status | Reference | For the description of indicator status, see associated hardware description manuals.
Performance counter | Definition | Performance counters are statistics about service performance, such as statistics about service drops and handovers. They help find out causes of service faults so that measures can be taken in a timely manner to prevent such faults.
Performance counter | Function | Performance counters are used with signaling tracing and signaling analysis to locate causes of service faults such as a high service drop rate, low handover success rate, and service exception. They are generally used for the key performance indicator (KPI) analysis and performance monitoring of the entire network.
Performance counter | Reference | For details about the usage of performance counters, see M2000 Online Help. For the definitions of each performance counter, see eNodeB Performance Counter Reference.

2.2.3 Determining the Fault Scope and Type

Based on the fault symptom, determine the fault scope and type.

In this document, faults are classified according to symptoms. eRAN faults are classified into service faults and equipment faults.


Service Faults

Service faults are further classified into the following types:

• Access faults
– User access fails.
– The access success rate is low.
• Handover faults
– The intra-frequency handover success rate is low.
– The inter-frequency handover success rate is low.
• Service drop faults
– Service drops occur during handovers.
– Services are unexpectedly released.
• Inter-RAT interoperability faults
– Inter-RAT handovers cannot be performed normally.
• Rate faults
– Data rates are low.
– There is no data rate.
– Data rates fluctuate.

Equipment Faults

Equipment faults are further classified into the following types:

• Cell faults
– Cell setup fails.
– Cell activation fails.
• Operation and maintenance channel (OMCH) faults
– The OMCH is interrupted or fails intermittently.
– The CPRI link does not work properly.
– The S1/X2/SCTP/IPPATH links do not work properly.
– IP transport is abnormal.
• Clock faults
– The clock source is faulty.
– The IP clock link is faulty.
– The system clock is out of lock.
• Security faults
– The IPSec tunnel is abnormal.
– SSL negotiation is abnormal.
– Digital certificate processing is abnormal.
• Radio frequency faults
– The received total wideband power (RTWP) on the RX channel is abnormal.
– The antenna line device (ALD) link does not work properly.
• License faults
– License installation fails.
– License modification fails.
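Once the fault type is determined, it points to the chapter of this guide to follow. The mapping below is an assumed reading of the classification above against the chapter list in Organization (for example, OMCH faults are mapped to chapter 11, which covers OM channel faults); the names `FAULT_TYPE_TO_CHAPTER` and `chapter_for` are hypothetical, and unknown types fall through to contacting Huawei technical support, as in step 7 of Table 2-1.

```python
# Assumed mapping from the fault types above to the chapters of this guide.
FAULT_TYPE_TO_CHAPTER = {
    "access": "4 Troubleshooting Access Faults",
    "handover": "5 Troubleshooting Intra-RAT Handover Faults",
    "service drop": "6 Troubleshooting Service Drops",
    "inter-rat interoperability": "7 Troubleshooting Inter-RAT Handover Faults",
    "rate": "8 Troubleshooting Rate Faults",
    "cell": "9 Troubleshooting Cell Unavailability Faults",
    "omch": "11 Troubleshooting Application Layer Faults",
    "clock": "12 Troubleshooting Transmission Synchronization Faults",
    "security": "13 Troubleshooting Transmission Security Faults",
    "radio frequency": "14 Troubleshooting RF Unit Faults",
    "license": "15 Troubleshooting License Faults",
}

# Escalation step used when the fault type cannot be matched.
FALLBACK = "2.2.7 Contacting Huawei Technical Support"


def chapter_for(fault_type: str) -> str:
    """Return the chapter to follow, or the escalation step for unknown types."""
    return FAULT_TYPE_TO_CHAPTER.get(fault_type.strip().lower(), FALLBACK)
```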

2.2.4 Identifying Fault Causes

Fault locating is the process of finding the actual cause of a fault among many possible causes. By analyzing and comparing all possible causes and then excluding those that do not apply, you can determine the specific fault cause.
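The elimination approach can be expressed as filtering a set of candidate causes through checks that rule them out. In this hypothetical sketch, `remaining_causes` is an illustrative name and each check is a callable that returns True once its cause has been excluded; the cause names in the example are borrowed from the chapter titles of this guide.

```python
from __future__ import annotations

from typing import Callable, Mapping


def remaining_causes(checks: Mapping[str, Callable[[], bool]]) -> list[str]:
    """Return the candidate causes that could not be excluded.

    Each check returns True when its cause has been ruled out, for example
    because the related alarm is absent or the related parameter is correct.
    """
    return [cause for cause, ruled_out in checks.items() if not ruled_out()]


# Hypothetical example: only the radio environment could not be ruled out.
candidates = {
    "incorrect parameter configuration": lambda: True,
    "radio environment abnormality": lambda: False,
    "hardware fault": lambda: True,
}
print(remaining_causes(candidates))  # ['radio environment abnormality']
```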

Locating Equipment Faults

Equipment faults are easier to locate than service faults. Although there are many types of equipment faults, the scope of each fault is relatively narrow. Equipment faults are generally indicated by indicator status, alarms, and error messages, so users can rectify most equipment faults by following the indicator status information, alarm handling suggestions, or error messages.

Locating Service Faults

The methods for locating different types of service faults are as follows:

• Access faults: Check the S1 interface and Uu interface, and locate transmission faults segment by segment. Then, determine from the interface conditions whether the faults occur in the eRAN. If so, proceed to locate the specific faults.

• Rate faults: Check whether there are access faults. If there are, locate them by using the method for access faults. Then, check the traffic on the IP path to determine the fault points.

• Handover faults: Start signaling tracing and determine the fault points according to the signaling flow.
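Segment-by-segment locating amounts to checking each segment of the path in order and stopping at the first one that fails. The sketch assumes that each segment has a probe (for example, a trace or a ping) wrapped as a callable that returns True when the segment works; `first_faulty_segment` and any segment names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence


def first_faulty_segment(
    segments: Sequence[tuple[str, Callable[[], bool]]],
) -> str | None:
    """Probe segments in order and return the name of the first one that fails.

    Each probe returns True when its segment works. None means every
    segment passed, so the fault is probably not on this path.
    """
    for name, works in segments:
        if not works():
            return name  # stop at the first failure; later segments are untested
    return None
```

Ordering the list from the Uu interface toward the S1 side narrows the search from the UE toward the core network; because checking stops at the first failure, a later segment may also be faulty and should be rechecked once the first fault is rectified.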

For instructions on fault locating and analysis, see 3 Common Maintenance Functions.

2.2.5 Rectifying the Fault

To rectify a fault, take proper measures to eliminate the fault and restore the system, such as checking and repairing cables, replacing boards, modifying configuration data, switching over the system, and resetting boards. Choose the method that suits the specific fault.

After the fault is rectified, be sure to perform the following:

• Perform testing to confirm that the fault has been rectified.
• Record the troubleshooting process and key points.
• Summarize measures for preventing or reducing such faults. This helps prevent similar faults in the future.


2.2.6 Checking Whether Faults Have Been Rectified

Check the equipment running status, observe the board indicator status, and query alarm information to verify that the system is running properly. Perform testing to confirm that faults have been rectified and that services return to normal.

2.2.7 Contacting Huawei Technical Support

If the fault scope or type cannot be determined, or the fault cannot be rectified, contact Huawei technical support.

If you need to contact Huawei technical support during troubleshooting, collect necessary information in advance.

Collecting General Fault Information

General fault information includes the following:

• Name of the office
• Name and phone number of the contact person
• Time when the fault occurs
• Detailed description of the fault symptoms
• Host software version of the equipment
• Measures taken after the fault occurs and the result
• Severity level of the fault and the time required for rectifying the fault

Collecting Fault Location Information

When a fault occurs, collect the following information:

• One-click logs of the main control board
• One-click logs of baseband boards
• One-click logs of RRUs
• Alarm information
• KPI data of the entire network
• Intelligent field test system (IFTS) tracing
• Cell drive test (DT) tracing
• SCTP link tracing
• Signaling tracing on interfaces
• eNodeB configuration information
• M2000 self-organizing network (SON) logs
• M2000 adaptation logs
• M2000 software module management logs

For details about how to collect fault information, see eNodeB LMT User Guide, eNodeB Performance Monitoring Reference, eNodeB Routine Maintenance Guide, and M2000 Online Help.


Contacting Huawei Technical Support

The following lists the contact information of Huawei technical support:

• If you are in mainland China, dial 4008302118.
• If you are outside mainland China, contact the technical support personnel in the local Huawei office.
• Email: [email protected]


3 Common Maintenance Functions

About This Chapter

This chapter describes common maintenance functions that are used to analyze and handle faults. It also explains or provides references on how to use the functions.

3.1 User Tracing

User tracing is a function that traces all messages of a user in sequence over standard and internal interfaces, traces internal status of the user equipment (UE), and displays the tracing results on the screen.

3.2 Interface Tracing

Interface tracing is a function that traces all messages within a period in sequence on a standard or internal interface and displays them on the screen.

3.3 Comparison/Interchange

Comparison and interchange are used to locate faults in a piece or pieces of equipment.

3.4 Switchover/Reset

Switchover helps identify whether the originally active equipment is faulty or whether the active/standby relationship is normal. Reset is used to identify whether software running errors exist.


3.1 User Tracing

User tracing is a function that traces all messages of a user in sequence over standard and internal interfaces, traces internal status of the user equipment (UE), and displays the tracing results on the screen.

User tracing has the following advantages:

• Real-time
• Able to trace the user over all standard interfaces
• Usable when traffic is heavy
• Applicable in various scenarios, for example, call procedure analysis and VIP user tracing

User tracing is usually used to diagnose call faults that can be reproduced. For details about how to perform user tracing, see the online help for the operation and maintenance system.

3.2 Interface Tracing

Interface tracing is a function that traces all messages within a period in sequence on a standard or internal interface and displays them on the screen.

Interface tracing has the following advantages:

• Real-time
• Complete: All messages within a period on an interface can be traced.
• Able to trace link management messages

Interface tracing applies in scenarios where the UEs involved are not known in advance. For example, this function can be used to diagnose the cause of a low radio resource control (RRC) connection setup success rate at a site. For details about how to perform interface tracing, see the online help for the operation and maintenance system.

3.3 Comparison/Interchange

Comparison and interchange are used to locate faults in a piece or pieces of equipment. Comparison is a function used to locate a fault by comparing the faulty component or fault symptom with a functional component or normal condition, respectively. Interchange is a function used to locate a fault by interchanging a possibly faulty component with a functional component and comparing the running status before and after the interchange.

Comparison usually applies in scenarios with a single fault. Interchange usually applies in scenarios with complicated faults.
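The conclusion drawn from an interchange can be summarized as a small decision function. This sketch assumes that the suspect component and a known-good component have swapped positions, and that the fault is then observed at both positions; `InterchangeResult` and `interpret_interchange` are hypothetical names.

```python
from enum import Enum


class InterchangeResult(Enum):
    COMPONENT_FAULTY = "fault moved with the suspect component"
    POSITION_FAULTY = "fault stayed at the original position"
    INCONCLUSIVE = "fault disappeared or appears at both positions"


def interpret_interchange(fault_at_original_pos: bool, fault_at_new_pos: bool) -> InterchangeResult:
    """Interpret the state observed *after* the interchange.

    original_pos: where the suspect component was (now holds the known-good one).
    new_pos: where the suspect component now sits.
    """
    if fault_at_new_pos and not fault_at_original_pos:
        return InterchangeResult.COMPONENT_FAULTY  # the fault followed the component
    if fault_at_original_pos and not fault_at_new_pos:
        return InterchangeResult.POSITION_FAULTY  # slot, cable, or configuration at that position
    return InterchangeResult.INCONCLUSIVE  # compare further or use other functions
```

An inconclusive result is where comparison or tracing usually takes over, since the interchange alone could not separate the component from its surroundings.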

3.4 Switchover/Reset

Switchover helps identify whether the originally active equipment is faulty or whether the active/standby relationship is normal. Reset is used to identify whether software running errors exist.


Switchover is the switching of the active and standby roles of equipment so that the standby equipment takes over services. Comparing the running status before and after the switchover helps identify whether the originally active equipment is faulty or whether the active/standby relationship is normal. Reset is a means to manually restart part or all of the equipment. It is used to identify whether software running errors exist.

Switchover and reset should be used only as emergency measures. Exercise caution when using them, because:

• Compared with other functions, switchover and reset are only auxiliary means for fault locating.

• Because software faults often occur randomly, a fault is usually not reproduced within a short period after a switchover or reset. This hides the fault and puts the secure and stable running of the equipment at risk.

• Resets might interrupt services, and improper operations may even cause a system breakdown. Such interruptions and breakdowns severely affect system operation.


4 Troubleshooting Access Faults

About This Chapter

This chapter describes how to diagnose and handle access faults.

4.1 Definitions of Access Faults

If an access fault occurs, UEs have difficulty accessing the network due to radio resource control (RRC) connection setup failures or E-UTRAN radio access bearer (E-RAB) setup failures.

4.2 Background Information

This section provides counters and alarms related to access faults, and methods for analyzing TopN cells.

4.3 Troubleshooting Method

This section describes how to identify and troubleshoot the possible causes of access faults.

4.4 Troubleshooting Access Faults Due to Incorrect Parameter Configurations

This section provides information required to troubleshoot access faults due to incorrect parameter configurations. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

4.5 Troubleshooting Access Faults Due to Radio Environment Abnormalities

This section provides information required to troubleshoot access faults due to radio environment abnormalities. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.


4.1 Definitions of Access Faults

If an access fault occurs, UEs have difficulty accessing the network due to radio resource control (RRC) connection setup failures or E-UTRAN radio access bearer (E-RAB) setup failures.

4.2 Background Information

This section provides counters and alarms related to access faults, and methods for analyzing TopN cells.

In Long Term Evolution (LTE) networks, access faults occur either during radio resource control (RRC) connection setup or during E-UTRAN radio access bearer (E-RAB) setup. The access success rate is a key performance indicator (KPI) that quantifies end user experience. An excessively low access success rate indicates that end users have difficulty making mobile-originated or mobile-terminated calls.

Related Counters

l RRC Connection Setup Measurement (Cell) (RRC.Setup.Cell)

l RRC Connection Setup Failure Measurement (Cell) (RRC.SetupFail.Cell)

l E-RAB Setup Measurement (Cell) (E-RAB.Est.Cell)

l E-RAB Setup Failure Measurement (Cell) (E-RAB.EstFail.Cell)

For details, see eNodeB Performance Counter Reference.

Related Alarms

l Hardware-related alarms

– ALM-26104 Board Temperature Unacceptable
– ALM-26106 Board Clock Input Unavailable
– ALM-26107 Board Input Voltage Out of Range
– ALM-26200 Board Hardware Fault
– ALM-26202 Board Overload
– ALM-26203 Board Software Program Error
– ALM-26208 Board File System Damaged

l Temperature-related alarms

– ALM-25650 Ambient Temperature Unacceptable
– ALM-25651 Ambient Humidity Unacceptable
– ALM-25652 Cabinet Temperature Unacceptable
– ALM-25653 Cabinet Humidity Unacceptable
– ALM-25655 Cabinet Air Outlet Temperature Unacceptable
– ALM-25656 Cabinet Air Inlet Temperature Unacceptable

l Link-related alarms

– ALM-25880 Ethernet Link Fault
– ALM-25886 IP Path Fault
– ALM-25888 SCTP Link Fault
– ALM-25889 SCTP Link Congestion
– ALM-26233 BBU CPRI Optical Interface Performance Degraded
– ALM-26234 BBU CPRI Interface Error
– ALM-29201 S1 Interface Fault
– ALM-29211 Excessive Packet Loss Rate in the Transmission Network
– ALM-29212 Excessive Delay in the Transmission Network
– ALM-29213 Excessive Jitter in the Transmission Network

l RF-related alarms

– ALM-26239 RX Channel RTWP/RSSI Unbalanced Between RF Units
– ALM-26520 RF Unit TX Channel Gain Out of Range
– ALM-26521 RF Unit RX Channel RTWP/RSSI Too Low
– ALM-26522 RF Unit RX Channel RTWP/RSSI Unbalanced

l Configuration-related alarms

– ALM-26245 Configuration Data Inconsistency
– ALM-26243 Board Configuration Data Ineffective
– ALM-26812 System Dynamic Traffic Exceeding Licensed Limit
– ALM-26815 Licensed Feature Entering Keep-Alive Period
– ALM-26818 No License Running in System
– ALM-26819 Data Configuration Exceeding Licensed Limit
– ALM-29243 Cell Capability Degraded
– ALM-29247 Cell PCI Conflict

For details, see eNodeB Alarm Reference.

TopN Cell Selection

TopN cells can be selected by analyzing the daily KPI file exported by the M2000. Select the following:

l Top3 cells with the largest numbers of failed RRC connection setups (L.RRC.ConnReq.Att - L.RRC.ConnReq.Succ) and the lowest RRC connection setup success rates

l Top3 cells with the largest numbers of failed E-RAB setups and the lowest E-RAB setup success rates
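The selection above can be scripted against the exported daily KPI file. The following Python sketch assumes the counters have already been parsed into one record per cell. The `CellKpi` type and its field names are hypothetical and do not represent an M2000 export format. The sketch ranks cells by the number of failed setups and uses the lower success rate as a tie-breaker, which is an assumed ordering. The same code applies to E-RAB setup counters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CellKpi:
    """One cell's daily setup counters from a KPI export (field names are hypothetical)."""

    cell: str
    attempts: int   # e.g. L.RRC.ConnReq.Att for RRC connection setups
    successes: int  # e.g. L.RRC.ConnReq.Succ for RRC connection setups

    @property
    def failures(self) -> int:
        # Failed setups, as in (L.RRC.ConnReq.Att - L.RRC.ConnReq.Succ)
        return self.attempts - self.successes

    @property
    def success_rate(self) -> float:
        # A cell with no attempts is treated as 100% successful to avoid division by zero.
        return self.successes / self.attempts if self.attempts else 1.0


def select_top_n(cells: list[CellKpi], n: int = 3) -> list[CellKpi]:
    """Rank by most failures, breaking ties by lowest success rate (an assumed ordering)."""
    return sorted(cells, key=lambda c: (-c.failures, c.success_rate))[:n]


cells = [
    CellKpi("Cell-A", attempts=1000, successes=990),   # 10 failures, 99.0%
    CellKpi("Cell-B", attempts=500, successes=400),    # 100 failures, 80.0%
    CellKpi("Cell-C", attempts=2000, successes=1900),  # 100 failures, 95.0%
    CellKpi("Cell-D", attempts=300, successes=295),    # 5 failures, 98.3%
]
top3 = select_top_n(cells)
print([c.cell for c in top3])  # ['Cell-B', 'Cell-C', 'Cell-A']
```

Ranking primarily by absolute failures favors busy cells, where fixes benefit the most users. The success-rate tie-breaker then surfaces the worse-performing cell among equally busy ones.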

Tracing TopN Cells

After identifying the TopN cells and the periods in which their success rates are lowest, start Uu, S1, and X2 interface tracing tasks to find the exact point at which the RRC connection setup or E-RAB setup fails.

In addition, after the Evolved Packet Core (EPC) obtains the international mobile subscriber identity (IMSI) of the UE with the lowest success rate from the UE's temporary mobile subscriber identity (TMSI), you can start a task to trace that UE across the whole network.

Analyzing Environmental Interference to TopN Cells

Environmental interference to a cell consists of downlink (DL) interference and uplink (UL) interference to the cell. The following methods can be used to check the environmental interference:

l To check DL interference, use a spectral scanner. If the DL interference may come from both neighboring cells and external systems, locate its exact source.

l To check UL interference, start a cell interference detection task and analyze the result.
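The result of a cell interference detection task can be screened per resource block (RB). This sketch makes a hypothetical assumption: the result is available as one average UL interference-plus-noise level in dBm per RB. The -119 dBm reference floor and the 6 dB margin are illustrative values, not Huawei defaults. Thermal noise in a 180 kHz RB is roughly -121.5 dBm, and a receiver noise figure of a few dB sits on top of that. The function flags RBs that rise clearly above this floor.

```python
from __future__ import annotations


def interfered_rbs(ul_interference_dbm: list[float],
                   noise_floor_dbm: float = -119.0,
                   margin_db: float = 6.0) -> list[int]:
    """Return the indices of UL RBs whose measured interference-plus-noise level
    exceeds the reference floor by more than margin_db."""
    return [rb for rb, level in enumerate(ul_interference_dbm)
            if level - noise_floor_dbm > margin_db]


# Hypothetical per-RB averages (dBm) taken from an interference detection result
levels = [-119.5, -118.0, -108.0, -109.5]
print(interfered_rbs(levels))  # [2, 3]
```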

4.3 Troubleshooting Method

Possible Causes

Scenario: The RRC connection fails to be set up.

Fault description:
l The UE cannot search cells.
l A fault occurs in radio interface processing.

Possible causes:
l Top user problems occur.
l Parameters of the UE or eNodeB are incorrectly configured.
l The radio environment is abnormal.
l The UE is abnormal.

Scenario: The E-RAB fails to be set up.

Fault description:
l Resources are insufficient.
l A fault occurs in radio interface processing.
l The EPC is abnormal.

Possible causes:
l Top user problems occur.
l Parameters of the UE or eNodeB are incorrectly configured.
l The radio environment is abnormal.
l Parameters of the Evolved Packet Core (EPC) are incorrectly configured.
l The UE is abnormal.

Troubleshooting Flowchart

Figure 4-1 shows the troubleshooting flowcharts for handling low RRC connection setup success rates and low E-RAB setup success rates.

Figure 4-1 Troubleshooting flowcharts for handling low RRC connection setup success rates and low E-RAB setup success rates

Troubleshooting Procedure

1. Select TopN cells.

2. Check whether parameters of the UE or eNodeB are incorrectly configured.

l Yes: Correct the parameter configurations. Go to 3.

l No: Go to 4.

3. Check whether the fault is rectified.

l Yes: End.

l No: Go to 4.

4. Check whether the radio environment is abnormal.

l Yes: Handle abnormalities in the radio environment. Go to 5.

l No: Go to 6.

5. Check whether the fault is rectified.

l Yes: End.

l No: Go to 6.

6. Check whether parameters of the EPC are incorrectly configured.

l Yes: Correct the parameter configurations. Go to 7.

l No: Go to 8.

7. Check whether the fault is rectified.

l Yes: End.

l No: Go to 8.

8. Contact Huawei technical support.
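Each branch of the procedure above follows the same check, correct, and verify pattern. The sketch below is a hypothetical Python model of that loop, intended only to make the control flow explicit. The step names, check functions, and state dictionary are placeholders, not eNodeB or M2000 operations.

```python
from __future__ import annotations

from typing import Callable, Tuple

# One troubleshooting step: a label, a check that returns True if the cause is present,
# and an action that corrects it. All names here are illustrative.
Step = Tuple[str, Callable[[], bool], Callable[[], None]]


def run_procedure(steps: list[Step], fault_rectified: Callable[[], bool]) -> str:
    """Walk the steps in order; fix each cause found and stop once the fault is gone."""
    for name, cause_present, fix in steps:
        if cause_present():
            fix()
            if fault_rectified():
                return f"Rectified after: {name}"
    return "Contact Huawei technical support"


# Hypothetical state: the radio environment is abnormal, everything else is fine.
state = {"radio_abnormal": True, "rectified": False}


def handle_radio_environment() -> None:
    state["radio_abnormal"] = False
    state["rectified"] = True


steps = [
    ("UE/eNodeB parameters", lambda: False, lambda: None),
    ("Radio environment", lambda: state["radio_abnormal"], handle_radio_environment),
    ("EPC parameters", lambda: False, lambda: None),
]
result = run_procedure(steps, lambda: state["rectified"])
print(result)  # Rectified after: Radio environment
```

If no check finds a cause, or the fault persists after every correction, the loop falls through to the final step of the procedure.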

4.4 Troubleshooting Access Faults Due to Incorrect Parameter Configurations

This section provides information required to troubleshoot access faults due to incorrect parameter configurations. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

Fault Description

l The UE cannot receive broadcast information from the cell.

l The UE cannot receive signals from the cell.

l The UE cannot camp on the cell.

l The end user complains about an access failure, and the value of the performance counter L.RRC.ConnReq.Att is 0.

l An RRC connection is successfully set up for the UE according to standard interface tracing results, but then the mobility management entity (MME) releases the UE because the authentication procedure fails.

l The end user complains that the UE can receive signals from the cell but is unable to access the cell.

l According to the values of the performance counters on the eNodeB side, the number of RRC connections that are successfully set up is much greater than the number of E-RABs that are successfully set up.

l According to the KPIs, the E-RAB setup success rate is relatively low, and among all failure causes, those counted by L.E-RAB.FailEst.TNL and L.E-RAB.FailEst.RNL account for a large proportion.

Background Information

None

Possible Causes

l Cell parameters are incorrectly configured. Commonly misconfigured parameters include the E-UTRA absolute radio frequency channel number (EARFCN), public land mobile network (PLMN) ID, threshold used in the evaluation of cell camping, pilot strength, and access class.

l The UE has special requirements for authentication and encryption.

l Parameters of the subscriber identity module (SIM) card or registration-related parameters on the home subscriber server (HSS) are incorrectly configured.

l The authentication and encryption algorithms are incorrectly configured on the Evolved Packet Core (EPC).

l The IPPATH or IPRT managed objects (MOs) are incorrectly configured.

Fault Handling Flowchart


Fault Handling Procedure

1. Check whether cell parameters are incorrectly configured. Pay special attention to the following parameters, which are often misconfigured: the EARFCN, PLMN ID, threshold used in the evaluation of cell camping, pilot strength, and access class.

Yes: Correct the cell parameter configurations. Go to 2.

No: Go to 3.

2. Check whether the fault is rectified.

Yes: End.

No: Go to 3.

3. Check the type and version of the UE and determine whether the UE requires the authentication and encryption functions.

Yes: Enable the authentication and encryption functions. Go to 4.

No: Go to 5.

4. Check whether the fault is rectified.

Yes: End.

No: Go to 5.

5. Check whether parameters of the SIM card or registration-related parameters on the HSS are incorrectly configured. The parameters of the SIM card include the K value, the OPc value (the operator variant algorithm configuration value used together with K), the international mobile subscriber identity (IMSI), and whether the card is a universal subscriber identity module (USIM) card.

Yes: Correct the parameter configurations. Go to 6.

No: Go to 7.

6. Check whether the fault is rectified.

Yes: End.

No: Go to 7.

7. Check whether the authentication and encryption algorithms are incorrectly configured on the EPC, for example, whether the switches for the algorithms are turned off.

Yes: Modify the parameter configuration on the EPC. Go to 8.

No: Go to 9.

8. Check whether the fault is rectified.

Yes: End.

No: Go to 9.

9. Check whether the IPPATH or IPRT MOs are incorrectly configured.

Yes: Correct the MO configurations. Go to 10.

No: Go to 11.

10. Check whether the fault is rectified.

Yes: End.

No: Go to 11.

11. Check whether the fault can be diagnosed by tracing the access signaling procedure.

Yes: Handle the fault. Go to 12.

No: Go to 13.

12. Check whether the fault is rectified.

Yes: End.

No: Go to 13.

13. Contact Huawei technical support.

Typical Cases

l Case 1: An E398 UE failed to access the network when the authentication and encryption functions were enabled on the EPC.

Fault Description

During a site test, an E398 UE failed to access a network where the authentication and encryption functions were enabled on the EPC.

Fault Diagnosis

1. The S1 interface was traced. According to the tracing result shown in Figure 4-3, the access attempt was rejected with the cause no-Suitable-Cells-In-tracking-area(15).

Figure 4-3 S1 tracing result

2. The signaling on the EPC side was traced. According to the tracing result shown in Figure 4-4, the HSS rejected the access attempt with diameter-authorization-rejected(5003).

Figure 4-4 Tracing result of the signaling at the EPC side

3. The UE was checked, specifically the configuration, registration information, and category of its SIM card. The cause of the fault was that the E398 UE used a SIM card instead of a USIM card. When a UE using a SIM card requests access, the EPC replies with diameter-authorization-rejected. Figure 4-5 shows a snapshot of the related section in 3GPP TS 29.272.

Figure 4-5 Related section in the protocol

In conclusion, the E398 UE was unable to access the network because the UE used a SIM card. To access an LTE network, the UE must use a USIM card.

Fault Handling

The SIM card in the E398 UE was replaced by a USIM card. Then, the authentication procedure was successful and the UE successfully accessed the network.

l Case 2: The E-RAB setup success rate at a site deteriorated due to incorrect transport resource configurations.

Fault Description

According to the KPIs for a site, the E-RAB setup success rate deteriorated intermittently.

Fault Diagnosis

1. The cause value in the S1AP_INITIAL_CONTEXT_SETUP_FAIL message (that is, the initial context setup failure message) was checked and found to be transport resource unavailable(0), as shown in Figure 4-6.

Figure 4-6 Snapshot of the S1AP_INITIAL_CONTEXT_SETUP_FAIL message

This cause value indicates that the E-RAB failed to be set up due to faults related to transport resources, rather than faults related to radio resources.

2. The IP address in the S1AP_INITIAL_CONTEXT_SETUP_REQ message was checked and found to be 8A:14:05:14 (138.20.5.20). However, this IP address differed from the peer IP address 8A:14:05:13 (138.20.5.19) specified in the IPPATH MO. Figure 4-7 shows the details of the S1AP_INITIAL_CONTEXT_SETUP_REQ message.

Figure 4-7 Snapshot of the S1AP_INITIAL_CONTEXT_SETUP_REQ message

3. This inconsistency was investigated. The EPC maintenance personnel confirmed that multiple logical IP addresses were configured on the interface of the unified gateway (UGW), but only one IPPATH MO was configured on the eNodeB. As a result, E-RAB setups that used a UGW address without a matching IPPATH MO failed.

Fault Handling

New IPPATH MOs were configured on the eNodeB according to the network plan. The E-RAB setup success rate was then monitored for a period and remained normal throughout.
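The address comparison in step 2 can be scripted when many S1 traces need checking. The Python sketch below makes two assumptions: the transport layer addresses have already been copied out of the traces as hex strings, and the IPPATH peer IP addresses are supplied by hand. The function names are hypothetical and are not part of any Huawei tool.

```python
from __future__ import annotations

import ipaddress


def decode_transport_address(hex_str: str) -> ipaddress.IPv4Address:
    """Convert a hex-encoded IPv4 address such as '8A:14:05:14' or '8A 14 05 13'."""
    raw = bytes.fromhex(hex_str.replace(":", "").replace(" ", ""))
    return ipaddress.IPv4Address(raw)  # a 4-byte packed value is accepted directly


def addresses_without_ippath(requested_hex: list[str],
                             ippath_peer_ips: list[str]) -> list[ipaddress.IPv4Address]:
    """Return UGW addresses requested over S1 that no configured IPPATH peer matches."""
    configured = {ipaddress.IPv4Address(ip) for ip in ippath_peer_ips}
    requested = {decode_transport_address(h) for h in requested_hex}
    return sorted(requested - configured)


print(decode_transport_address("8A:14:05:14"))  # 138.20.5.20
print(decode_transport_address("8A 14 05 13"))  # 138.20.5.19
# Hypothetical data modelled on this case: one IPPATH peer, one unmatched UGW address.
print(addresses_without_ippath(["8A:14:05:14"], ["138.20.5.19"]))  # [IPv4Address('138.20.5.20')]
```

Any address the check returns is a candidate for a new IPPATH MO, subject to the network plan.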

4.5 Troubleshooting Access Faults Due to Radio Environment Abnormalities

This section provides information required to troubleshoot access faults due to radio environment abnormalities. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

Fault Description

l During a random access procedure, the UE does not receive any random access response.

l During an RRC connection setup procedure, the eNodeB does not receive an RRC Connection Setup Complete message within the timeout period.

l During an E-RAB setup procedure, the response to the security mode command times out.

l The eNodeB does not receive an RRC Connection Reconfiguration Complete message within the timeout period.

l At the eNodeB side, both the RRC connection setup success rate and the E-RAB setup success rate are low.

Background Information

Radio environment abnormalities include radio interference, imbalance between uplink (UL) and downlink (DL) quality, weak coverage, and eNodeB hardware faults (such as inconsistent antenna settings). The items to investigate and the methods for investigating them are as follows:

l Investigating radio interference

DL interference from neighboring cells, DL interference from external systems, and UL interference need to be investigated. To investigate DL interference, use a spectral scanner. To investigate UL interference, start a cell interference detection task.

l Investigating weak coverage

Investigate the reference signal received power (RSRP) values reported by UEs during access. If most of these values are relatively low, the access difficulties are most likely caused by weak coverage of the cell.

Also investigate the actual coverage radius of the cell and how signal quality varies within it, to determine whether excessively wide coverage or cross-cell coverage occurs.

l Investigating the imbalance between UL and DL quality

Imbalance between UL and DL quality is caused by either UL limitation or DL limitation. Therefore, investigate the transmit power of the remote radio unit (RRU) and of the UE to determine which link is limited.

l Investigating eNodeB hardware

If two antennas are used, check the tilt and azimuth of each antenna. If their tilts or azimuths differ significantly, adjust the antennas so that their tilts and azimuths are the same.

Check the jumper connections by analyzing drive test results. If a jumper is reversely connected, the UL signal level in the cell is much lower than the DL signal level, and UEs far from the eNodeB are prone to access failures. In this case, correct the jumper connection.

Check the physical condition of the feeders. A feeder that is damaged, water-immersed, bent, or not securely connected causes a large number of call drops. A reported voltage standing wave ratio (VSWR) alarm indicates such a problem; in this case, replace the faulty feeder.
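The weak-coverage check described above can be made repeatable by counting how many access-time RSRP reports fall below a threshold. This sketch uses a -110 dBm threshold and treats "most" as more than half of the samples. Both are assumptions chosen for illustration; adjust them to the local link budget. The sample values are hypothetical.

```python
from __future__ import annotations


def weak_coverage_suspected(rsrp_dbm: list[float],
                            low_threshold_dbm: float = -110.0,
                            majority: float = 0.5) -> tuple[bool, float]:
    """Return (suspected, share): suspected is True when more than `majority`
    of the access-time RSRP samples fall below `low_threshold_dbm`."""
    if not rsrp_dbm:
        raise ValueError("no RSRP samples")
    low = sum(1 for r in rsrp_dbm if r < low_threshold_dbm)
    share = low / len(rsrp_dbm)
    return share > majority, share


samples = [-115.0, -118.0, -105.0, -121.0, -98.0]  # hypothetical RSRP reports (dBm)
suspected, share = weak_coverage_suspected(samples)
print(suspected, round(share, 2))  # True 0.6
```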

Figure 4-8 and Figure 4-9 show common causes of random access failures and E-RAB setup failures, respectively.

Figure 4-8 Common causes of random access failures


Possible Causes

l The cell provides weak coverage.

l The UE does not use its maximum transmit power.

l Inter-modulation interference exists.

l The UE is located at the cell edge.

Fault Diagnosis

To diagnose access faults due to radio environment abnormalities effectively, first determine whether the fault is caused by radio interference or by weak coverage. The following procedure is recommended:

Fault Handling Procedure

1. Check whether related alarms are reported.

Yes: Handle these alarms by referring to eNodeB Alarm Reference. Go to 2.

No: Go to 3.

2. Check whether the fault is rectified.

Yes: End.

No: Go to 3.

3. Check whether interference exists. Use a spectral scanner to check whether there is DL interference from neighboring cells or external systems, and analyze the cell interference detection result to check whether there is UL interference.

Yes: Minimize the interference. Go to 4.

No: Go to 5.

4. Check whether the fault is rectified.

Yes: End.

No: Go to 5.

5. Check whether the transmit power of the RRU and UE falls outside the link budget.

Yes: Adjust the UL and DL transmit power. Go to 6.

No: Go to 7.

6. Check whether the fault is rectified.

Yes: End.

No: Go to 7.

7. Check whether cell coverage is abnormal.

Yes: Based on the RSRP distribution of the UEs attempting to access the cell, use drive tests to investigate and handle possible coverage problems, interference, and imbalance between UL and DL quality. Go to 8.

No: Go to 9.

8. Check whether the fault is rectified.

Yes: End.

No: Go to 9.

9. Contact Huawei technical support.

Typical Cases

According to the KPIs for an eNodeB at a site, the RRC connection setup success rate fluctuated significantly within a period.

Fault Diagnosis

1. The KPIs were checked. For local cell 1, the daily RRC connection setup success rate was only 52%.

Figure 4-10 PRS KPI about RRC connection setups

2. The signaling over the Uu interface was traced. The result indicated that all RRC connection setup failures occurred because the UEs did not respond. Figure 4-11 shows a snapshot of the signaling traced over the Uu interface.

Figure 4-11 Signaling traced over the Uu interface

3. Simulated load was added on the LTE side to test the impact of DL LTE signals on DL GSM signals. During the test, the call drop rate on the GSM side rose significantly, which indicated that inter-modulation interference was likely present.

4. An online spectral scan was performed on the LTE side. Interference of 10 dB was found in the high-frequency resource blocks (RBs), which affected signaling transmission.

Figure 4-12 Online precise spectral scan result

5. The site was investigated and the cause of the fault was located: the LTE and GSM sides shared the same antennas, which had aged and induced inter-modulation interference.

Fault Handling

5

Troubleshooting Intra-RAT Handover Faults

About This Chapter

This chapter describes how to diagnose and handle intra-RAT handover faults. RAT is short for radio access technology.

5.1 Definitions of Intra-RAT Handover Faults

If an intra-RAT handover fault occurs, UEs have difficulty performing intra-RAT handovers due to system faults.

5.2 Background Information

This section describes counters and alarms related to intra-RAT handover faults. In addition, this section provides intra-RAT handover procedures.

5.3 Troubleshooting Method

5.4 Troubleshooting Intra-RAT Handover Faults Due to Hardware Faults

This section provides information required to troubleshoot intra-RAT handover faults due to hardware faults. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

5.5 Troubleshooting Intra-RAT Handover Faults Due to Incorrect Data Configurations

This section provides information required to troubleshoot intra-RAT handover faults due to incorrect data configurations. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

5.6 Troubleshooting Intra-RAT Handover Faults Due to Target Cell Congestion

This section provides information required to troubleshoot intra-RAT handover faults due to target cell congestion. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

5.7 Troubleshooting Intra-RAT Handover Faults Due to Poor Uu Quality

This section provides information required to troubleshoot intra-RAT handover faults due to poor Uu quality. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.


5.1 Definitions of Intra-RAT Handover Faults

If an intra-RAT handover fault occurs, UEs have difficulty performing intra-RAT handovers due to system faults.

5.2 Background Information

This section describes counters and alarms related to intra-RAT handover faults. In addition, this section provides intra-RAT handover procedures.

Related Counters

l Outgoing Handover Measurement (Cell) (HO.eRAN.Out.Cell)

l Incoming Handover Measurement (Cell) (HO.eRAN.In.Cell)

For details, see eNodeB Performance Counter Reference.

Related Alarms

l Board overload alarm

– ALM-26202 Board Overload

l Alarms related to RF modules

– ALM-26529 RF Unit VSWR Threshold Crossed
– ALM-26522 RF Unit RX Channel RTWP/RSSI Unbalanced

l Cell capability degraded alarm

– ALM-29243 Cell Capability Degraded

l Alarms related to CPRI links

– ALM-26235 RF Unit Maintenance Link Failure
– ALM-26234 BBU CPRI Interface Error
– ALM-26233 BBU CPRI Optical Interface Performance Degraded
– ALM-26506 RF Unit Optical Interface Performance Degraded

l Alarms related to clock sources

– ALM-26263 IP Clock Link Failure
– ALM-26264 System Clock Unlocked
– ALM-26538 RF Unit Clock Problem
– ALM-26260 System Clock Failure
– ALM-26265 Base Station Frame Number Synchronization Error

Handover Procedures

Handovers are classified as coverage-based, load-based, frequency-priority-based, service-based, and UL-quality-based. For details, see eRAN Mobility Management in Connected Mode Feature Parameter Description.


5.3 Troubleshooting Method

Possible Causes

Handover faults have various causes, such as incorrect data configurations, hardware faults, interference, and poor Uu quality. Therefore, to diagnose a handover fault effectively, analyze it based on the actual situation.

Table 5-1 shows possible causes of handover faults.

Table 5-1 Possible causes of handover faults

Scenario: The whole network experiences abnormalities.

Fault description:
l The performance counters throughout the whole network are abnormal.
l Related alarms are reported.

Possible causes:
l Network parameters are incorrectly configured.
l The signaling exchange procedure is incorrect.

Scenario: A single eNodeB experiences abnormalities.

Fault description:
l The performance counters for the serving cell are abnormal.
l Related alarms are reported.
l Handovers to neighboring cells are seldom initiated.
l Handovers to neighboring cells are frequently initiated.
l The UE cannot receive handover commands from the network.

Possible causes:
l Hardware is faulty.
l Parameters are set to inappropriate values.
l The target cell is congested.
l The Uu quality is poor.

Fault Analysis

The following measures are effective in locating a handover fault:

l Analyzing handover-related performance counters

l Investigating TopN cells

l Checking alarms related to devices or data transmission

l Checking the configurations of neighboring cells

l Checking handover algorithm configurations

l Investigating interference and cell coverage

To locate an intra-RAT handover fault, you are advised to select TopN cells with handover faults and then follow the troubleshooting procedure shown in Figure 5-1.

Figure 5-1 Troubleshooting flowchart for intra-RAT handover faults

Troubleshooting Procedure

1. Check whether the hardware is faulty.

Hardware faults are the most likely cause if handovers suddenly become abnormal although the configurations of the abnormal cell and its neighboring cells have not been modified recently.

Yes: Hardware faults are often accompanied by alarms. Handle the fault by following the instructions on how to troubleshoot handover faults due to hardware faults. Go to 2.

No: Go to 3.

2. Check whether the fault is rectified.

Yes: End.

No: Go to 3.

3. Check whether handover parameters are incorrectly configured.

Specifically, check whether handover thresholds and neighboring cell configurations are incorrect.

Yes: Follow the instructions on how to troubleshoot handover faults due to incorrect data configurations. Go to 4.

No: Go to 5.

4. Check whether the fault is rectified.

Yes: End.

No: Go to 5.

5. Check whether the service channel of the target cell is severely congested.

Check the service satisfaction rates to determine whether the service channel of the target cell is severely congested.

Yes: Follow the instructions on how to troubleshoot handover faults due to target cell congestion. Go to 6.

No: Go to 7.

6. Check whether the fault is rectified.

Yes: End.

No: Go to 7.

7. Check whether the Uu quality is poor.

Poor Uu quality causes abnormal signaling exchanges, which lead to handover failures.

Yes: Follow the instructions on how to troubleshoot handover faults due to poor Uu quality. Go to 8.

No: Go to 9.

8. Check whether the fault is rectified.

Yes: End.

No: Go to 9.

9. Contact Huawei technical support.

5.4 Troubleshooting Intra-RAT Handover Faults Due to Hardware Faults

This section provides information required to troubleshoot intra-RAT handover faults due to hardware faults. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

Fault Description

Typical hardware faults include faulty or overloaded boards and abnormal radio frequency (RF) modules or clock sources. If a hardware fault occurs, the cell capability degrades or the cell even goes out of service, and the following symptoms appear:

l Abnormal cell-level performance counters

– Increased service drop rate
– Decreased handover success rate
– Decreased access success rate

l Related alarms

Background Information

Related Alarms

l Board overload alarm

– ALM-26202 Board Overload

l Alarms related to RF modules

– ALM-26529 RF Unit VSWR Threshold Crossed
– ALM-26522 RF Unit RX Channel RTWP/RSSI Unbalanced

l Cell capability degraded alarm

– ALM-29243 Cell Capability Degraded

l Alarms related to CPRI links

– ALM-26235 RF Unit Maintenance Link Failure
– ALM-26234 BBU CPRI Interface Error
– ALM-26233 BBU CPRI Optical Interface Performance Degraded
– ALM-26506 RF Unit Optical Interface Performance Degraded

l Alarms related to clock sources

– ALM-26263 IP Clock Link Failure
– ALM-26264 System Clock Unlocked
– ALM-26538 RF Unit Clock Problem
– ALM-26260 System Clock Failure
– ALM-26265 Base Station Frame Number Synchronization Error

Possible Causes

Possible hardware faults that cause handover faults are as follows:

l A board is overloaded.

l An RF module is faulty.

l A common public radio interface (CPRI) link is faulty.

l A clock source is faulty.

Fault Handling Flowchart

Figure 5-2 shows the fault handling flowchart for intra-RAT handover faults due to hardware faults.


Figure 5-2 Fault handling flowchart for intra-RAT handover faults due to hardware faults

Fault Handling Procedure

1. Check whether a hardware fault alarm is reported.

Yes: Handle the hardware fault alarm. Go to 2.

No: Go to 3.

2. Check whether the fault is rectified.

Yes: End.

No: Go to 3.

3. Contact Huawei technical support.

Typical Cases

Handovers between cell 0 and cell 2 under an eNodeB were normal, with a high success rate. However, handovers from cell 1 under the same eNodeB to its neighboring cells were abnormal, with a success rate as low as 7% during busy hours.

Fault Diagnosis

1. Alarms about the eNodeB were checked. Cell 1 had reported ALM-26529 RF Unit VSWR Threshold Crossed.

2. The customer's engineers confirmed that the eNodeB had recently been reconstructed. Therefore, the RF connections had most likely become abnormal during the site reconstruction.

3. An on-site inspection found that the jumper was not securely connected to the feeder, which had caused the cell malfunction.

Fault Handling

The jumper was securely connected to the feeder. According to the KPI log, the inter-cell handover success rate was restored.


5.5 Troubleshooting Intra-RAT Handover Faults Due to Incorrect Data Configurations

This section provides information required to troubleshoot intra-RAT handover faults due to incorrect data configurations. The information includes fault descriptions, background information, possible causes, fault handling method and procedure, and typical cases.

Fault Description

l Handovers to neighboring cells are seldom initiated.

According to drive test results or signaling tracing results, the UE experiences relatively low signal quality in its serving cell. The signal level of neighboring cells meets the handover threshold, but handovers are rarely triggered. This leads to a high service drop rate.

l Handovers to neighboring cells are frequently initiated.

The signal level and quality of neighboring cells are almost the same as those of the serving cell, but handovers to the neighboring cells are frequently initiated. This leads to poor quality of voice services and a high probability of service drops.
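The second symptom (handovers initiated too frequently) can be distinguished from normal mobility by screening handover records for ping-pong patterns. This sketch rests on several assumptions. It takes one UE's handover log as (time, source cell, target cell) tuples, which would have to be extracted from signaling traces. It counts A-to-B handovers that return to A within a 5-second window; that window is hypothetical.

```python
from __future__ import annotations


def count_ping_pongs(events: list[tuple[float, str, str]], window_s: float = 5.0) -> int:
    """Count A->B handovers followed by B->A within window_s seconds.

    `events` is one UE's chronological list of (time_s, source_cell, target_cell)."""
    count = 0
    for (t1, src1, dst1), (t2, src2, dst2) in zip(events, events[1:]):
        if src2 == dst1 and dst2 == src1 and t2 - t1 <= window_s:
            count += 1
    return count


# Hypothetical handover log for one UE
log = [(0.0, "A", "B"), (2.0, "B", "A"), (3.5, "A", "B"), (30.0, "B", "C")]
print(count_ping_pongs(log))  # 2
```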

Background Information

None

Possible Causes

l Configurations of neighboring cells are incorrect.

If neighboring cells are not configured or incorrectly configured, handovers cannot be triggered even after the UE reports measurements of these neighboring cells.

l The X2 link is incorrectly configured.

If an X2 interface is incorrectly configured, handovers to some neighboring cells cannot be successfully executed. For example, if the IP path for an X2 interface is incorrectly configured, X2-based inter-eNodeB handovers cannot be executed; or, if the IP path from the target eNodeB to the source serving gateway (S-GW) is not configured, X2-based inter-S-GW handovers cannot be executed.

l Parameters such as handover thresholds, hysteresis, and time-to-trigger are inappropriately configured.

A handover is triggered only when the signal level of a neighboring cell exceeds that of the serving cell by at least a configured margin. Therefore, if handover parameters (such as the threshold, cell individual offsets [CIOs], hysteresis, and time-to-trigger) are inappropriately set, handovers are triggered either too rarely or too often.
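For intra-frequency coverage-based handovers, this margin is commonly expressed with event A3 in 3GPP TS 36.331 (the neighbor becomes offset better than the serving cell). The sketch below assumes that event A3 is the configured trigger and shows how the offset, CIOs, hysteresis, and time-to-trigger interact. The `A3Config` field names and default values are hypothetical, and frequency-specific offsets are set to 0.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class A3Config:
    """Event A3 settings in dB / ms; field names and defaults are illustrative only."""
    a3_offset_db: float = 2.0     # Off
    hysteresis_db: float = 1.0    # Hys
    cio_neighbor_db: float = 0.0  # Ocn, cell individual offset of the neighboring cell
    cio_serving_db: float = 0.0   # Ocp, cell individual offset of the serving cell
    time_to_trigger_ms: int = 320


def a3_entering(mn_dbm: float, mp_dbm: float, cfg: A3Config) -> bool:
    # TS 36.331 entering condition: Mn + Ofn + Ocn - Hys > Mp + Ofp + Ocp + Off.
    # Frequency offsets Ofn/Ofp are taken as 0 (intra-frequency case).
    return (mn_dbm + cfg.cio_neighbor_db - cfg.hysteresis_db
            > mp_dbm + cfg.cio_serving_db + cfg.a3_offset_db)


def a3_trigger_time(samples: list[tuple[int, float, float]], cfg: A3Config) -> int | None:
    """samples: (time_ms, serving RSRP, neighbor RSRP). Return the time at which the
    entering condition has held for time-to-trigger, or None if it never does."""
    start = None
    for t, mp, mn in samples:
        if a3_entering(mn, mp, cfg):
            if start is None:
                start = t
            if t - start >= cfg.time_to_trigger_ms:
                return t
        else:
            start = None  # condition broken: the time-to-trigger restarts
    return None


# Hypothetical measurements every 80 ms: the neighbor becomes 4 dB stronger from t = 80 ms.
samples = [(0, -100.0, -99.0)] + [(t, -100.0, -96.0) for t in range(80, 481, 80)]
print(a3_trigger_time(samples, A3Config()))                   # 400
print(a3_trigger_time(samples, A3Config(hysteresis_db=3.0)))  # None
```

The second call shows how raising the hysteresis by 2 dB suppresses a handover that would otherwise fire. Symmetrically, lowering the margin or the time-to-trigger makes handovers fire earlier and more often.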

Fault Handling Flowchart

Figure 5-3 shows the fault handling flowchart for intra-RAT handover faults due to incorrect data configurations.


Figure 5-3 Fault handling flowchart for intra-RAT handover faults due to incorrect data configurations

Fault Handling Procedure

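1. Check whether the X2 link is incorrectly configured.

Yes: Correct the X2 link configuration. Go to 2.

No: Go to 3.

2. Check whether the fault is rectified.

Yes: End.

No: Go to 3.

3. Check whether any neighboring cell configurations are missing.

Yes: Complete the neighboring cell configurations. Go to 4.

No: Go to 5.

4. Check whether the fault is rectified.

Yes: End.

No: Go to 5.

5. Check whether handover parameters are incorrectly configured.

Yes: Correct their configurations. Go to 6.

No: Go to 7.

6. Check whether the fault is rectified.

Yes: End.

No: Go to 7.

7. Contact Huawei technical support.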
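Step 3 can be supported by comparing the cells that UEs report in measurement reports with the configured neighboring cells. In this sketch, each cell is keyed by a hypothetical (EARFCN, PCI) pair, and the sample data are invented. PCIs are reused across a network, so any cell the function returns is only a candidate. Confirm it before adding a neighbor relationship.

```python
from __future__ import annotations

from typing import Iterable, Tuple

Cell = Tuple[int, int]  # (EARFCN, PCI) identifies a measured cell in this sketch


def missing_neighbors(reported: Iterable[Cell], configured: Iterable[Cell],
                      serving: Cell) -> list[Cell]:
    """Cells that UEs reported in measurement reports but that are not configured
    as neighboring cells of the serving cell."""
    return sorted(set(reported) - set(configured) - {serving})


reported = [(1650, 101), (1650, 205), (1650, 205), (1650, 317)]  # hypothetical reports
configured = [(1650, 101), (1650, 205)]                           # hypothetical neighbor list
print(missing_neighbors(reported, configured, serving=(1650, 12)))  # [(1650, 317)]
```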
