
Integrated Research – The people behind PROGNOSIS www.prognosis.com

Technical white paper

Troubleshooting the top VoIP call quality issues

By Sue Bradshaw

Technology Writer

The top issues affecting VoIP call quality are often not caused by the application itself, but are related to its configuration, conflict with other applications for network resources and misalignment between VoIP and network design.

This whitepaper discusses troubleshooting the top VoIP call quality issues: poor voice quality, dial tone failure and call setup problems. It explains why specialized management is essential for quality VoIP service delivery, reducing operational costs and increasing the return on your VoIP investment.


Executive summary

Troubleshooting the delivery of VoIP as a vital business service is complex. Much of this complexity comes from the way it uses the data network and spans multiple operational domains. Handover between these domains can add significant overhead to troubleshooting times and can make VoIP problem resolution up to four times more complex than in traditional telephony environments¹.

The top issues affecting VoIP call quality are often not caused by the application itself, but are related to its configuration, conflict with other applications for network resources and misalignment between VoIP and network design. Like other distributed network applications, VoIP is affected by configuration errors, insufficient network bandwidth, poor hardware performance and outages.

This whitepaper discusses the potential root causes of the top VoIP call quality issues and examines the role specialized VoIP monitoring, management and reporting plays in troubleshooting them effectively. While the paper is not designed as a step-by-step troubleshooting guide, it explains how problems in other domains can impact the quality of VoIP.

Key findings

• The top issues affecting VoIP are poor voice quality, dial tone failure and call setup problems

• The root causes of VoIP problems frequently span multiple operational domains

• Organizations should consider VoIP monitoring as critical to their business

• Specialized VoIP monitoring and management plays a vital role in helping support staff identify problems where they are experienced – from inside the VoIP domain

• VoIP management must fit with existing management frameworks and processes

The importance of choosing the right tools

Choosing the right tools for the job can significantly reduce operational costs and increase the return on your VoIP investment.

Too often companies expect to solve their VoIP problems with existing network, event and PBX management tools. While these tools enable configuration, network monitoring and event driven alerts, they do not deliver the deep diagnostics, alerting and extensive reporting of components that can impact delivery of VoIP as a business service.

Specialist VoIP monitoring facilitates high quality service delivery by empowering VoIP support personnel to identify issues more quickly and thereby reduce mean-time-to-repair. It also reduces operational impact when integrated with existing management tools, business processes and team structures.


1. Troubleshooting poor voice quality

Voice quality issues are usually experienced by users in the VoIP domain, but are mostly caused by problems that manifest themselves in the LAN or WAN. Understanding the reasons for poor voice quality will help you recognize the root causes and how to tackle them.

Root cause: Insufficient prioritized bandwidth for voice traffic

In the early stages of deployment, the root causes of voice quality problems are usually to be found in the network. Network architects often aren’t aware of calling patterns prior to VoIP deployment, so it’s common practice to take a best guess at what capacity is required. If insufficient WAN bandwidth is available, the addition of voice traffic is likely to result in general network congestion. This means that voice packets may arrive late, out of order or be discarded, all of which will have a noticeable impact on voice quality.

In the early days of the rollout, as a lack of bandwidth is frequently perceived as the primary challenge to voice quality, the attempted remedy is often to purchase more bandwidth in small incremental chunks. However, this approach may still not accommodate the delay-sensitive, real-time demands of voice. A better approach is to accurately measure existing call patterns to determine what capacity may be required before adding more bandwidth. Your carrier may be able to provide this information to assist you with establishing capacity to support busy hour requirements and the use of the WAN to route calls to PSTN gateways and other locations.
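As a rough illustration of turning measured busy-hour call patterns into a capacity figure, the sketch below uses the Erlang B formula, a standard telephony sizing model that this paper does not itself name. The busy-hour traffic, the 1 percent blocking target and the 80 kbps per-call figure are all assumptions chosen for illustration.

```python
def erlang_b(offered_erlangs: float, channels: int) -> float:
    """Blocking probability when `channels` circuits carry `offered_erlangs` of traffic."""
    blocking = 1.0  # B(E, 0) = 1: with no circuits, every call is blocked
    for m in range(1, channels + 1):
        blocking = (offered_erlangs * blocking) / (m + offered_erlangs * blocking)
    return blocking


def channels_needed(offered_erlangs: float, target_blocking: float = 0.01) -> int:
    """Smallest number of simultaneous calls that keeps blocking at or below the target."""
    channels = 0
    while erlang_b(offered_erlangs, channels) > target_blocking:
        channels += 1
    return channels


if __name__ == "__main__":
    # Hypothetical busy hour: 300 calls averaging 3 minutes each = 15 erlangs.
    offered = 300 * 3 / 60
    calls = channels_needed(offered)
    per_call_kbps = 80  # assumed per-call bandwidth, not a figure from the paper
    print(f"{calls} concurrent calls, about {calls * per_call_kbps} kbps of prioritized bandwidth")
```

Sizing from measured traffic in this way gives a defensible starting point, which can then be checked against the carrier's own busy-hour data.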

It’s also worth remembering that even if sufficient capacity is available with a guaranteed on-ramp to a broadband freeway, you cannot assume that enough bandwidth will be there for the entire journey. Once voice packets cross into the carrier’s domain, they will compete for bandwidth with all other traffic. This is where data traffic management methods such as MPLS and Diffserv are critical to ensuring that voice quality is preserved for the entire path of the call.

Root cause: Switch/router (MPLS/Diffserv) configuration errors

Data traffic prioritization techniques are only effective in helping achieve good voice quality if implemented on every switch and router in the voice path. An accidental misconfiguration of even a single router in the voice path, in just one direction, can cause delays that impact voice quality. It’s also vital to ensure that label switching parameters for voice packets are correctly implemented to leverage MPLS infrastructure and prioritize voice traffic. If using Diffserv, the correct configuration of forwarding treatment for particular flows and per-hop behavior is required to prioritize different classes of service and differentiate service level options.

In addition it should be remembered that voice is not the only traffic type that uses priority. Although voice should always be assigned the highest priority, video and call signaling also require priority over various other traffic classes.
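To make the Diffserv marking concrete, the following sketch shows how an application might request a traffic class on a UDP socket. The class-to-code-point table is an assumption: EF (46) for voice and AF41 (34) for video are widely used, while the signaling value differs between vendors and deployments. Marking a packet only requests a treatment; every switch and router on the path must still be configured to honor it, and `IP_TOS` behavior depends on the operating system.

```python
import socket

# Assumed DSCP code points; adjust to the QoS policy actually deployed.
DSCP = {"voice": 46, "video": 34, "signaling": 24}


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IP TOS / traffic class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_marked_udp_socket(traffic_class: str) -> socket.socket:
    """Open a UDP socket whose outgoing packets carry the class's DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(DSCP[traffic_class]))
    return sock
```

Capturing traffic at each hop and comparing the marking seen with the marking sent is one way to locate a router that is re-marking or ignoring voice packets.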

Problem domain: WAN

Most likely when: Early days of rollout and where QoS settings are not in place

Problem domain: LAN/WAN

Most likely when: QoS incorrectly configured


When configuring data traffic prioritization the impact of VoIP on other applications must also be taken into consideration.

Root cause: Device failure on primary route with insufficient bandwidth on alternate route

If a device fails on the primary route and insufficient bandwidth is available on the failover route, the addition of voice traffic is likely to cause route congestion. This will result in a noticeable impact on voice quality and means that other applications using the route may also be affected.

To assess the impact of a failover, active testing (or network assessment) can be used to check the performance of VoIP with other application traffic on both primary and alternate routes. By simulating packets generated from voice devices such as IP phones, voice gateways and voicemail servers, assessing engineers can observe how the network responds to variations in voice traffic patterns. They can then determine what network adjustments should be made to accommodate failover, usually in terms of QoS settings and link capacity.

An alternative to active testing is passive testing, or measuring of actual traffic using a specialized VoIP management tool. This monitors day-to-day operations providing a vital measure of end-to-end call quality and performance on both primary and alternate routes. Reports enable analysis of usage trends, allowing adjustments to be made and facilitating capacity planning for future voice traffic.
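Whether measurements come from active or passive testing, they are often condensed into a single call quality score. The sketch below is one way to do that, under stated assumptions: `r_to_mos` uses the R-factor to MOS mapping from the ITU-T G.107 E-model, while `estimate_r` is a widely circulated rule of thumb rather than the full E-model, and is not taken from this paper.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R-factor to an estimated MOS (G.107 mapping, floored at 1.0)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return max(1.0, 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r))


def estimate_r(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rule-of-thumb R-factor from one-way latency, jitter and packet loss (assumption)."""
    effective = latency_ms + 2 * jitter_ms + 10  # jitter weighted double, plus codec delay
    if effective < 160:
        r = 93.2 - effective / 40
    else:
        r = 93.2 - (effective - 120) / 10
    return r - 2.5 * loss_pct


if __name__ == "__main__":
    # Hypothetical measurements for a primary and an alternate route.
    for name, sample in {"primary": (40, 5, 0.1), "alternate": (180, 30, 2.0)}.items():
        print(name, round(r_to_mos(estimate_r(*sample)), 2))
```

Scoring the primary and alternate routes with the same function makes it easier to see how much quality is lost on failover.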

Root cause: Excessive bandwidth utilization to a particular location

Even if the overall VoIP service is performing well, a variety of factors can cause bandwidth over-subscription for a particular WAN link to a location. VoIP configuration errors, location bandwidth limits or excessive gateway use can all contribute to excessive bandwidth utilization.

a) Codec configuration errors

If calls use a high bandwidth codec, all available capacity to a particular location may quickly be consumed. This will result in poor voice quality for all calls to and from that location and will also negatively impact the performance of other applications.

To prevent this happening, codecs should be configured for each link to suit the available bandwidth. This might also require a resetting of user expectations when placing long distance calls or setting service levels.
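The effect of codec choice on a link can be estimated with simple arithmetic. The sketch below assumes 20 ms packetization and 40 bytes of IPv4, UDP and RTP headers per packet, with no header compression and no layer-2 overhead; the link size and the share reserved for voice are hypothetical.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 40  # 20 IPv4 + 8 UDP + 12 RTP, no header compression


def per_call_kbps(codec_kbps: float, packet_ms: float = 20) -> float:
    """One-way layer-3 bandwidth of a single voice stream."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000


def max_calls(link_kbps: float, voice_share: float, codec_kbps: float) -> int:
    """Number of simultaneous calls that fit in the voice share of a link."""
    return math.floor(link_kbps * voice_share / per_call_kbps(codec_kbps))


if __name__ == "__main__":
    # Hypothetical 1536 kbps link with half reserved for voice.
    for name, rate in {"G.711 (64 kbps)": 64, "G.729 (8 kbps)": 8}.items():
        print(name, per_call_kbps(rate), "kbps per call,", max_calls(1536, 0.5, rate), "calls")
```

Under these assumptions a high bandwidth codec supports roughly a third as many calls on the same link, which is why per-link codec selection matters.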

Problem domain: WAN

Most likely when: Router failure has occurred

Problem domain: VoIP

Most likely when: Codec configurations are unsuited to available bandwidth


b) Jitter buffer settings

The fundamental way packets are sent across an IP network means that not all voice packets for a conversation are guaranteed to take the same route. Network congestion and jitter can cause voice packets to arrive unevenly or out of sequence, disrupting the conversation and causing garbled sentences.

Jitter buffers on routers, gateways and end-user devices need to be configured with the correct delay setting. If this is too short – for example 10 ms or less – packets might be discarded needlessly. Since delays of up to 50 ms aren’t detectable by the human ear, a higher delay setting of 30–80 ms is suitable for the jitter buffer to re-sequence packets.
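The trade-off between buffer delay and discarded packets can be explored with a small simulation. This is a minimal sketch, assuming a fixed (non-adaptive) jitter buffer and a running jitter estimate in the style of RFC 3550; the send and arrival times are made-up sample data, listed in sequence-number order.

```python
def interarrival_jitter(send_ms, arrive_ms):
    """Running jitter estimate: J += (|D| - J) / 16 for each consecutive packet pair."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (arrive_ms[i] - arrive_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def late_packets(send_ms, arrive_ms, buffer_ms):
    """Count packets that arrive after their playout time for a fixed buffer delay."""
    base_transit = arrive_ms[0] - send_ms[0]  # first packet sets the playout reference
    late = 0
    for sent, arrived in zip(send_ms, arrive_ms):
        playout = sent + base_transit + buffer_ms
        if arrived > playout:
            late += 1  # too late to be played, so the packet is discarded
    return late


if __name__ == "__main__":
    send = [0, 20, 40, 60]       # packets sent every 20 ms
    arrive = [50, 80, 90, 110]   # second packet delayed by an extra 10 ms
    print("jitter estimate (ms):", interarrival_jitter(send, arrive))
    for buffer_ms in (5, 10, 40):
        print(buffer_ms, "ms buffer ->", late_packets(send, arrive, buffer_ms), "discarded")
```

Running the same counts against captured traffic for several candidate buffer sizes shows where discards stop falling, which is a reasonable guide to the setting.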

c) Location bandwidth limits

Voice quality will suffer when there is over-subscription of available bandwidth to a particular location. VoIP users will experience poor voice quality as voice streams compete with each other and other types of traffic for bandwidth on the location’s link.

Bandwidth to a specific location can be protected by configuring a destination from within the VoIP application as a discrete location. Call admission controls can then be applied to protect voice traffic from the negative effects of other voice traffic and to reject calls when the volume exceeds specific thresholds.

d) Failure to configure a gateway as part of a location

The WAN link to a gateway can become over-subscribed by callers outside its location. This can happen when a dial plan routes every call for a dialed sequence to a single gateway. As a result, voice quality on all calls will suffer as too many calls are routed over the WAN link to the gateway’s location.

Configuring the gateway as part of a VoIP location allows you to include it in call admission controls for the location. In this way, voice quality on a WAN link to a particular location is protected from the negative effects of additional voice traffic.
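The call admission control behavior described for locations can be sketched as a simple bandwidth budget. The class below is an illustrative model only, not any vendor's implementation; the `Location` name, the budget and the per-call figures are hypothetical.

```python
class Location:
    """A VoIP location with a fixed voice bandwidth budget on its WAN link."""

    def __init__(self, name: str, voice_budget_kbps: float):
        self.name = name
        self.voice_budget_kbps = voice_budget_kbps
        self.in_use_kbps = 0.0

    def admit(self, call_kbps: float) -> bool:
        """Admit the call only if it fits within the remaining budget."""
        if self.in_use_kbps + call_kbps > self.voice_budget_kbps:
            return False  # rejected, which protects calls already in progress
        self.in_use_kbps += call_kbps
        return True

    def release(self, call_kbps: float) -> None:
        """Return bandwidth to the budget when a call ends."""
        self.in_use_kbps = max(0.0, self.in_use_kbps - call_kbps)


if __name__ == "__main__":
    branch = Location("branch-office", voice_budget_kbps=160)
    print([branch.admit(80) for _ in range(3)])  # the third call exceeds the budget
```

A gateway that is not associated with a location sits outside this accounting, so calls routed to it consume link bandwidth without ever being counted against the budget.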

Problem domains: VoIP, device

Most likely when: Jitter buffer settings are incorrect

Problem domains: WAN, device

Most likely when: Insufficient prioritized bandwidth to location

Problem domains: VoIP, device

Most likely when: Gateway is not added as a location’s resource


Voice quality troubleshooting summary

In the majority of cases, voice quality issues occur because of problems or conflicts within VoIP or between VoIP and other applications on the network. Insufficient prioritized bandwidth, router configuration errors, device failures and location bandwidth limits will all have an impact on voice quality.

Specialized VoIP monitoring, management and reporting will help identify the causes underlying poor voice quality. Fine grained telephony-specific metrics, deep diagnostics and voice quality reporting can identify the stream or streams experiencing degraded quality.

Once the voice streams are identified, the granularity of detail provided makes it possible to analyze corresponding network performance on the voice stream path being traversed, in real time or historically. This will then provide the insights necessary to identify, develop work-arounds and resolve issues before they overly impact users or the business.

Such insights are vital to successfully managing the performance of voice as a business service over an IP network, and critical for the role it will play in the future of unified communications.


2. Troubleshooting failure to obtain dial tone

Just as factors external to the VoIP domain can impact voice quality, network delays, poor hardware performance and outages can also contribute to dial tone failure.

Understanding what can cause dial tone failure may help you recognize the root causes and how best to resolve them.

Root cause: Phones de-registered from primary call server

Phones de-registering from the primary call server or IP-PBX can be caused by software, hardware or digital signal processor problems on the phone, network failures, configuration errors or software bugs.

To help identify if the source of the problem lies with the phone, call server or network, it is useful to establish if the phone or the call server supplies dial tone. This allows you to determine a starting point for troubleshooting. If the phone is functioning correctly or a replacement phone exhibits the same problems, then troubleshooting needs to extend to the data network.

Extending troubleshooting to the data network allows you to confirm that critical servers are available to the phones and that their IP addresses are valid. If the phone is able to reach these servers on the LAN or WAN, check that the software in the phone is working properly and that the phone has the correct firmware version.

Failure to register with any call server typically indicates a broader network problem and this can introduce the added complexity of a domain handover. The problem is manifested in the VoIP application, but it is core network configurations and infrastructure that must be checked. The network manager will need to confirm DHCP servers are functioning properly with correct security settings and are able to issue IP addresses that are not blocked by firewall or router settings. Also, they may need to investigate faulty network modules or foreign exchange station cards that may cause silence or absence of dial tone in a voice port.

Specialized VoIP management is very useful in troubleshooting dial tone failure because it clarifies the relationships between individual components and calculates their status based on their inter-dependencies.

Root cause: Failure of backup device

Failure to obtain a dial tone from a backup device may be caused by the device itself, and also by the inability of phones to reach and register with it. If phones are attempting to fail over, it is essential to ensure that backup devices are available and that they are accessible across the LAN or WAN. Troubleshooting should include checking phone and backup device status, failover configurations and network status.

Problem domains: Device, LAN/WAN, VoIP

Most likely when: Hardware, software or network failures have occurred

Problem domains: Device, LAN/WAN, VoIP

Most likely when: Hardware or network failure, or configuration errors have occurred


Root cause: Download requests failing when phone registers

Even phones that supply their own dial tone require access, via TFTP or HTTP/S depending on vendor implementation, to various download servers for firmware, software updates, configuration information and locale data. If requests to provide scripts, applications, images, settings, boot files or firmware are failing, it is advisable to check server availability, capacity, performance and security.

You will need to ensure that the phones are able to reach the required servers, that the correct file versions are being downloaded and that phones requesting the files have sufficient permissions to receive them.

Root cause: Severe PBX overload

A severely overloaded PBX or call server will be unable to service all the requests it receives, including supplying dial tone. The PBX call load should be monitored to identify the busiest hour and the most common daily busy hour. Checking capacity and performance of the call server at these times will identify when it becomes overloaded and unable to supply dial tone. If data is sourced from multiple PBXs, you’ll need to calculate busy hour statistics for each PBX individually, as well as aggregate it for all PBXs. This will let you know how far the telephony system is pushed at its peak.
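The busy hour calculation can be sketched from call records as follows. This assumes records are available as pairs of PBX name and call start time, which is a hypothetical format; real call detail records vary by vendor. It reports the busiest clock hour for each PBX individually and for all PBXs combined.

```python
from collections import Counter
from datetime import datetime


def busy_hours(call_records):
    """Return {pbx: (hour_start, call_count)} plus an "ALL" entry for the aggregate.

    call_records: iterable of (pbx_name, call_start_datetime) pairs.
    """
    per_pbx = {}
    aggregate = Counter()
    for pbx, started in call_records:
        hour = started.replace(minute=0, second=0, microsecond=0)
        per_pbx.setdefault(pbx, Counter())[hour] += 1
        aggregate[hour] += 1
    busiest = {pbx: counts.most_common(1)[0] for pbx, counts in per_pbx.items()}
    busiest["ALL"] = aggregate.most_common(1)[0] if aggregate else None
    return busiest


if __name__ == "__main__":
    # Made-up sample records for two PBXs.
    records = [
        ("pbx-a", datetime(2021, 3, 1, 9, 5)),
        ("pbx-a", datetime(2021, 3, 1, 9, 40)),
        ("pbx-a", datetime(2021, 3, 1, 10, 15)),
        ("pbx-b", datetime(2021, 3, 1, 10, 20)),
        ("pbx-b", datetime(2021, 3, 1, 10, 45)),
    ]
    for name, result in busy_hours(records).items():
        print(name, result)
```

The aggregate busy hour need not coincide with any single PBX's busy hour, which is why both views are needed for consolidation and failover planning.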

Specialized VoIP monitoring and reporting can provide reports to establish call throughput, call attempts, call failure and success rates and CPU utilization. This information provides invaluable data for troubleshooting, capacity planning, PBX consolidation and failover contingency planning.

Dial tone failure troubleshooting summary

Dial tone failure can be caused by phones deregistering from the primary PBX or call server, network or server failures, and overloading of the PBX. Troubleshooting such a variety of causes is likely to involve handovers between operational domains, which introduces additional complexity.

Specialist VoIP monitoring and reporting is invaluable in analyzing PBX usage patterns and status, and categorizing phones by make, model and firmware version. This will help isolate and resolve problems to minimize the occurrence of VoIP issues bouncing back and forth between different domains.

Problem domains: Device, LAN/WAN, VoIP

Most likely when: Problems with availability, accessibility and configuration of download and call servers

Problem domain: PBX

Most likely when:


3. Troubleshooting call setup problems

The complexity of troubleshooting call setup failures is compounded by the many interconnected components and domains the setup process spans. Calls can fail to set up correctly if they cannot obtain a dial tone, cannot access the PBX, voice gateways or the PSTN, or cannot locate sufficient transcoding resources. Calls will also fail to set up if there are long delays that cause the user to abandon the attempt.

Understanding some of the many reasons for call setup failure will help you recognize the root causes and how best to tackle them.

Root cause: Gateway interface going down

A gateway cannot set up calls to the PSTN if there is a hardware failure or insufficient allocation of redundant channels and trunks to a dial plan. This will become a serious issue if redundancy is not configured in dial plans or if all redundant routes are down or busy. If the gateway appears to be down it is advisable to check configuration, capacity and consumption of PSTN resources.

Even if your gateway is functional, users may receive a fast-busy signal due to network congestion somewhere outside your control. If the VoIP call setup relies on local exchange carriers, any congestion within their network may cause your call setup attempts to fail.

Specialized VoIP management provides positive feedback on gateway status during normal operations and generates alerts if gateways degrade or fail. This exposes impending problems, allowing you to investigate and resolve them before users are overly impacted.

Root cause: Insufficient DSP resource for transcoding or conferencing

Increased call setup times and failures can result from insufficient digital signal processor (DSP) resources or mislocated transcoding resources. As resources that handle conferencing and codec conversion are usually located in pools, careful thought needs to be given to their placement. If calls need to be bridged or conferenced, it’s important that DSP resources are in the correct location. This prevents calls traversing the network, consuming bandwidth and decreasing available resources for other users.

Specialized VoIP monitoring and management is invaluable for analyzing resource usage to identify and rectify causes of call setup failure when caused by insufficient DSP resources.

Problem domain: Gateways

Most likely when: Hardware failures and no dial plan redundancy

Problem domains: Device, LAN/WAN, VoIP

Most likely when: Insufficient or mislocated DSP resources


Root cause: Bandwidth limit reached to destination

If call admission controls are in place, new call setup attempts will be rejected when the bandwidth limit is reached to a destination location. This is done intentionally to prevent new voice traffic negatively impacting existing voice traffic. Bandwidth use to a destination location can be controlled based on the number of calls, or by bandwidth thresholds to ensure sufficient capacity is available.

Unless dial plan redundancy is configured, a call setup attempt that exceeds the thresholds set by call admission controls will fail. The call will only be set up successfully once sufficient bandwidth is available to the destination.

VoIP monitoring and analysis allows you to assess the number of calls that are being rejected due to bandwidth limits to particular locations. It will provide a detailed breakdown of calls by origin, routing, PBX or call server so you can make capacity planning decisions to support call setup requirements. This may include increasing the percentage of existing bandwidth for calls, increasing overall bandwidth to the location or routing them somewhere else via dial plan redundancy configuration.

Root cause: Route pattern/dial plan congestion/failure

Dial plan congestion may be caused by insufficient gateway resources, a downed PRI, or call admission controls rejecting new call setup attempts. It can also occur if insufficient channels or trunks are allocated to the dial plan to accommodate the volume of calls.

For example, a dial plan may try to use a remote SIP gateway. The PBX routes the call to the gateway but the gateway is down and does not respond. The specified alternate gateway is down and a third gateway has resources available but is in a location whose call admission controls prevent the request from being serviced.

All these factors result in the user’s call setup being unsuccessful because the route pattern or dial plan cannot access the resources it needs. However, troubleshooting dial plan related call setup failures is particularly complex because the relationships between dial plans, route partition tables, route patterns and trunk groups are not always obvious.
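The failure chain in the example above can be modeled as an ordered route list in which each gateway is tried in turn. This is an illustrative sketch only; the gateway and location names are hypothetical, and real dial plans involve more layers (partitions, route patterns, trunk groups) than are shown here.

```python
def route_call(route_list, gateway_up, location_has_capacity):
    """Try each gateway in dial plan order; return (chosen_gateway, attempt_log).

    route_list: ordered (gateway, location) pairs from the dial plan.
    gateway_up: {gateway: bool}; location_has_capacity: {location: bool}.
    """
    log = []
    for gateway, location in route_list:
        if not gateway_up.get(gateway, False):
            log.append((gateway, "gateway down"))
        elif not location_has_capacity.get(location, False):
            log.append((gateway, "rejected by call admission control"))
        else:
            log.append((gateway, "selected"))
            return gateway, log
    return None, log  # every route exhausted: call setup fails


if __name__ == "__main__":
    routes = [("gw-remote-sip", "site-a"), ("gw-alternate", "site-b"), ("gw-third", "site-c")]
    up = {"gw-remote-sip": False, "gw-alternate": False, "gw-third": True}
    capacity = {"site-a": True, "site-b": True, "site-c": False}
    chosen, attempts = route_call(routes, up, capacity)
    print(chosen, attempts)
```

Recording the reason at each step, as the log does here, is the kind of per-route detail that makes dial plan failures traceable.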

This is where specialized VoIP monitoring and analysis is crucial. It provides route and dial plan information with deep drill-down diagnostics that help determine which trunks or gateways are causing the problems – and can dramatically reduce mean time to repair of service-impacting issues.

Problem domains: LAN/WAN, VoIP

Most likely when: Call admission controls in place or absence of call admission controls and excessive traffic on WAN link


Root cause: Users abandon calls due to long call setup delay

In traditional telephony, users are accustomed to hearing a ring-back tone within five seconds of dialing a number. If they have to wait much longer than that they may lose patience and abandon the call. Default waiting periods for call signaling timers within VoIP introduce the potential for much longer delays. In fact, it’s possible to wait up to 32 seconds for a ring-back tone, but from the application’s perspective, nothing has gone wrong.
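The paper does not say which protocol produces the 32-second figure. One plausible source, assumed here, is SIP over UDP with the default timers in RFC 3261: an unanswered INVITE is retransmitted at intervals that start at T1 (500 ms) and double each time, and the transaction only times out at 64 × T1. The sketch below lists when each transmission would occur.

```python
def invite_transmit_times(t1: float = 0.5):
    """Times (seconds) at which an unanswered SIP INVITE is sent over UDP.

    Assumes RFC 3261 defaults: the retransmit interval starts at T1 and
    doubles each time; the transaction times out at 64 * T1 (Timer B).
    Returns (list_of_send_times, timeout_seconds).
    """
    timer_b = 64 * t1
    times, now, interval = [0.0], 0.0, t1
    while now + interval < timer_b:
        now += interval
        times.append(now)
        interval *= 2
    return times, timer_b


if __name__ == "__main__":
    times, timeout = invite_transmit_times()
    print("INVITE sent at:", times)
    print("transaction times out at:", timeout, "seconds")
```

During the whole of this period the signaling stack is behaving as designed, which is why such calls do not show up as errors in the application.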

As well as long default waiting periods in the signaling protocols, failure to hear a ring-back tone may also be due to unavailability of a DSP resource to play it, or voice quality issues between the DSP resource and the phone. This makes it particularly difficult to identify which calls are actually failing, compared to those that have been abandoned by the caller because they didn’t hear a ring-back tone.

Specialist VoIP management enables root cause analysis by monitoring and reporting on call load and attempted, failed and rejected calls. This helps support staff identify which calls were successfully set up, which failed, and why.

Call setup failure troubleshooting summary

Successful call setup incorporates every component required to establish a path for the voice stream. Dial tone failure, inability to access the PBX, inability to route calls to gateways or the PSTN, and failure to locate transcoding resources are all reasons call setup attempts can fail.

In addition to these root causes, failure to configure alternate routing instructions and paths through the IP network, a gateway PRI going down, device and network failures, and excess bandwidth usage will also contribute to call setup failure.

Specialized VoIP management assists support teams with troubleshooting call setup failures by monitoring and reporting on every component required to successfully set up calls.

Problem domains: LAN/WAN, VoIP

Most likely when: Delayed or absent ring-back tone


Conclusion

The top issues affecting VoIP call quality are often not caused by the VoIP application itself, but are related to its configuration, conflicts with other applications for network resources and misalignment between VoIP and network design. Therefore, to effectively troubleshoot VoIP problems, you must monitor, measure and manage its availability and performance across all the operational domains it spans.

However, cross-domain troubleshooting is highly complex. Much of this complexity is due to the way VoIP uses the network and spans multiple domains of operational responsibility. Finger pointing and handover between operational domains can make VoIP troubleshooting up to four times more complex than in traditional telephony environments.

Incorporating specialized VoIP monitoring as part of an integrated management framework and defining efficient problem handover between operational teams will ease cross-domain problem management. Companies that pay attention to the complexities of VoIP management can lower their annual operational costs by up to 90 percent and in so doing realize a rapid return on investment.

Additional reading

1. Third party VoIP management on the rise (by Nemertes Research)

2. The ROI of VoIP Management (by Nemertes Research)

These documents and additional papers covering VoIP management are available at www.prognosis.com

About PROGNOSIS

As a specialized tool, PROGNOSIS is designed to monitor the performance of VoIP as a business service, providing the insight needed to identify and resolve issues before they impact the business or its customers. PROGNOSIS delivers intelligent alerting, thousands of VoIP specific metrics, deep diagnostics and reporting to help ensure the highest possible call quality and reliability.

Providing a single, unified view across Cisco, Avaya and Nortel VoIP environments, PROGNOSIS has been proven to manage hundreds of IP-PBXs and hundreds of thousands of phones. Its scalability, flexible deployment options and customizable design make it the ideal solution for large enterprises and service providers. Integrating into organizations’
