ECHO CANCELLATION
6.4 TALKER ECHO LEVELS AND DELAY
This section is mainly relevant to line echo cancellation. The relation between the talker echo loudness rating (TELR) and delay is shown in Fig. 6.4, following the G.131 [ITU-T-G.131 (2003)] and G.107 [ITU-T-G.107 (2005)] recommendations. In the PSTN, delays are lower than in VoIP, so TELR requirements are lower even for inter-regional calls. Several examples of delay and TELR combinations are given in G.131. Some of these examples, with echo cancellation requirements, are given in Table 6.2(a). The first four rows of the table are for operation without the echo canceller, marked as "None" in the echo canceller rejection column. The last five rows are with the echo canceller in operation. For the delays of 150, 200, 300, and 500 ms, the corresponding TELR and the echo canceller enhancement are given in Table 6.2.
Figure 6.4. TELR in relation to ITU-T-G.131 and echo cancellation requirements as per ITU-T-G.168. (a) G.131 TELR graph for limiting, acceptable, and for R-factor 90. (b) G.168 linear part echo cancellation. (c) G.168 linear and nonlinear part of echo cancellation. [Courtesy: Reproduced with the kind permission of ITU (International Telecommunication Union, Geneva, www.itu.int).]
[Figure panels: (a) TELR in dB with three curves: TELR for R = 90, limiting case R = 60, and acceptable case R = 74. Panels (b) and (c) plot L_RES (dBm) and L_RET (dBm), respectively, against L_Rin (dBm).]
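The rows of Table 6.2(a) can be checked with simple addition. The sketch below assumes that TELR is the sum of the dB contributions around the echo loop (ERL, SLR, RLR, send and receive path loss, and any echo canceller rejection), which is how the table columns add up; the function name `telr_budget` is illustrative and does not come from G.131.

```python
def telr_budget(erl_db, slr_db, rlr_db, path_loss_db, ec_rejection_db=0):
    """Sum the dB contributions around the talker echo loop."""
    return erl_db + slr_db + rlr_db + path_loss_db + ec_rejection_db

# Rows taken from Table 6.2(a): (ERL, SLR, RLR, path loss, EC rejection)
rows = [
    (17, 7, 3, 6, 0),   # no echo canceller
    (8, 2, 1, 6, 0),    # minimum SLR/RLR, no echo canceller
    (10, 7, 3, 6, 24),  # echo canceller contributing 24 dB
    (10, 7, 3, 6, 48),  # echo canceller contributing 48 dB
]
for row in rows:
    print(row, "-> TELR =", telr_budget(*row), "dB")
```

With a fixed ERL, SLR, RLR, and path loss, any extra TELR needed at longer delays has to come from the echo canceller rejection term.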
In relating TELR from G.131 with the G.107 R-factor, the TELR limiting and acceptable cases are given in G.131 as first-level considerations. The limiting case is the echo level at which 10% of listeners report echo problems. The acceptable case is the level at which 1% of users report echo. Echo complaints are quality degradations captured in the E-model through R-factor estimation. Narrowband digital phones with a G.711 voice call can achieve a maximum R-factor of 93.2. The TELR of the limiting case corresponds to R = 60 and that of the acceptable case to R = 74, both of which degrade voice quality relative to this maximum. Maintaining the acceptable condition as a goal is not sufficient because an R of 74 yields a lower mean opinion score (MOS). Hence, the goal has to be to exceed the acceptable case of the G.131 recommendation.
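The MOS penalty at R = 74 can be made concrete with the R-to-MOS mapping of the E-model. The sketch below assumes the cubic conversion given in G.107, MOS = 1 + 0.035R + 7e-6 * R(R - 60)(100 - R) for R between 0 and 100; the function name `r_to_mos` is illustrative.

```python
def r_to_mos(r):
    """Estimated MOS for an E-model R-factor (G.107 cubic mapping, assumed here)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7.0e-6 * r * (r - 60.0) * (100.0 - r)

for r in (60, 74, 90, 93.2):
    print(f"R = {r}: MOS = {r_to_mos(r):.2f}")
```

With this mapping, R = 74 corresponds to a MOS of about 3.8 and R = 90 to about 4.3, against about 4.4 for the G.711 maximum of 93.2.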
Additional graphs are given in G.131 for different TELR levels and the R-factor. In this section, the TELR specific to an R of 90 from G.131 is considered the recommended value. In this book, the TELR for an R of 90 is referred to as TELR(for R-factor 90). Refer to the latest revisions of the ITU recommendations for any regularized naming convention for TELR exceeding the acceptable case. TELR for the limiting, acceptable, and R-factor 90 cases is given in Fig. 6.4. Refer to Chapter 20 for more details on the other parameters used in arriving at the R-factor. TELR can be read from the graphs given in the G.131 recommendation or computed from the following equations. For a mean one-way delay "t" in ms, TELR for these three categories is calculated as given below [ITU-T-G.131 (2003), ITU-T-G.107 (2005)]. The results from these equations are plotted in Fig. 6.4(a). By visual inspection, a difference of about ±1 dB is observed between the equation estimates and the G.131 graphs. This deviation is insignificant compared with the TELR values of 65 to 75 dB.
$$\mathrm{TELR(limiting)} = 8 + 40\log_{10}\left(\frac{1 + t/10}{1 + t/150}\right) - 6e^{-0.3t^{2}}, \quad \text{gives } R = 60 \tag{6.1}$$

$$\mathrm{TELR(acceptable)} = 6 + \mathrm{TELR(limiting)}, \quad \text{gives } R = 74 \tag{6.2}$$
Table 6.2. Talker Echo Requirements. (a) TELR Requirements for R = 60 to 90, (b) Linear and Nonlinear Part Cancellation Requirements as per G.168

(a)
| ERL (dB) | SLR (dB) | RLR (dB) | Send + Receive Path Loss (dB) | EC Rejection (dB) | TELR (dB) | One-Way Delay for R-factor |
|---|---|---|---|---|---|---|
| 17 | 7 | 3 | 6 | None | 33 | 25 ms for R-74 |
| 14 | 7 | 3 | 6 | None | 30 | 18 ms for R-74 |
| 8 | 7 | 3 | 6 | None | 24 | 9 ms for R-74 |
| 8 | 2 (min) | 1 (min) | 6 | None | 17 | 7 ms for R-60 |
| 10 | 7 | 3 | 6 | 24 | 50 | 150 ms for R-74 |
| 10 | 7 | 3 | 6 | 26 | 52 | 200 ms for R-74 |
| 10 | 7 | 3 | 6 | 28 | 54 | 300 ms for R-74 |
| 10 | 7 | 3 | 6 | 45 | 71 | 300 ms for R-90 |
| 10 | 7 | 3 | 6 | 48 | 74 | 500 ms for R-90 |

(b)
| R_in (dBm) | S_in with Phone ERL of 6 dB | S_RES as per G.168 | ERLE, Minimum Linear Part | S_out as per G.168 Requirement | NLP Improvement (dB) |
|---|---|---|---|---|---|
| 0 | −6 dBm | −30 dBm | 24 dB | −55 dBm | 25 dB |
| −10 | −16 dBm | −38 dBm | 22 dB | −65 dBm | 27 dB |
| −20 | −26 dBm | −47 dBm | 21 dB | −65 dBm | 18 dB |
| −30 | −36 dBm | −55 dBm | 19 dB | −65 dBm | 10 dB |
$$\mathrm{TELR(R\text{-}factor\ 90)} = 16 + \mathrm{TELR(acceptable)} = 30 + 40\log_{10}\left(\frac{1 + t/10}{1 + t/150}\right) - 6e^{-0.3t^{2}} \tag{6.3}$$
Figure 6.4 shows the TELR limiting, acceptable, and recommended (R-factor 90) cases. From the G.131 graphs, it is observed that the TELR for an R of 90 is 15 to 17 dB higher than the TELR acceptable values. A typical value of 16 dB above the acceptable case is used in this book to represent an R of 90. For the R-factor 90 condition, the TELR requirement at a 300-ms one-way delay is 72 dB. In practical deployments, it is recommended to achieve a TELR higher than the values marked for an R-factor of 90.
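The three TELR curves can be evaluated numerically. The sketch below assumes the limiting case is 8 + 40 log10[(1 + t/10)/(1 + t/150)] - 6 exp(-0.3 t^2) with t in ms, that the acceptable case adds 6 dB, and that the R-factor 90 case adds a further 16 dB, as described in this section. The base-10 logarithm is an assumption that agrees with the TELR column of Table 6.2(a); the function names are illustrative.

```python
import math

def telr_limiting(t_ms):
    """TELR in dB for the limiting case (R = 60) at one-way delay t_ms."""
    delay_term = 40.0 * math.log10((1.0 + t_ms / 10.0) / (1.0 + t_ms / 150.0))
    return 8.0 + delay_term - 6.0 * math.exp(-0.3 * t_ms ** 2)

def telr_acceptable(t_ms):
    """TELR in dB for the acceptable case (R = 74): 6 dB above limiting."""
    return telr_limiting(t_ms) + 6.0

def telr_r90(t_ms):
    """TELR in dB for R-factor 90: 16 dB above the acceptable case."""
    return telr_acceptable(t_ms) + 16.0

for t in (150, 200, 300, 500):
    print(f"{t} ms: limiting {telr_limiting(t):.1f} dB, "
          f"acceptable {telr_acceptable(t):.1f} dB, "
          f"R-90 {telr_r90(t):.1f} dB")
```

At 300 ms this gives roughly 54.6 dB for the acceptable case and 70.6 dB for R-factor 90, close to the values read from the G.131 graphs.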
6.4.1 Relating TELR and G.168 Recommendations
An echo canceller based on G.168 uses the electrical input signal power to decide on the required echo cancellation. TELR considers the one-way delay and the various losses from mouth to ear. The mean one-way delay used with TELR is not explicitly connected to the G.168 requirements, and TELR does not take input power into account. It is therefore useful to connect signal power, one-way delay, and echo rejection requirements.
As per the G.168 recommendations, echo canceller requirements are given in Table 6.2(b) through the notation R_in, S_in, S_RES, and S_out. The main goal as per the G.168 recommendations is to keep echo levels below −65 dBm. By considering padding losses (send and receive path loss) of 3 dB in each path, the talker echo level merges with the ideal channel noise of −68 dBm. Details on ideal channel noise are given in Chapter 1. From Table 6.2(b) [ITU-T-G.168 (2004)], it can be noted that the linear part of the EC has to provide 19 to 24 dB of improvement and that the nonlinear part has to provide 10 to 27 dB, depending on the input reference (R_in) level. Many EC implementations exceed this requirement: the linear part cancels 24 to 35 dB, which reduces the NLP requirement to 6 to 18 dB. Meeting the G.168 requirements and reducing echo to −65 to −68 dBm closely matches the R-factor 90 requirements at delays of 300 to 400 ms. At a 300-ms delay, the TELR requirement is 72 dB for the R-factor 90 condition. This requirement ensures that −68 dBm is met even for strong signals of the order of 0 dBm. In summary, to maintain good quality from echo cancellation considerations, it is essential to exceed the requirements of the G.131 TELR acceptable condition. The echo level should be submerged in the ideal channel noise to minimize dependencies on delay and power levels.
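The split between linear and nonlinear cancellation in Table 6.2(b) follows from level differences. The sketch below assumes S_in = R_in - ERL, linear ERLE = S_in - S_RES, and NLP improvement = S_RES - S_out, relations inferred from the table columns rather than quoted from G.168; the names are illustrative.

```python
PHONE_ERL_DB = 6  # phone ERL assumed in Table 6.2(b)

# (R_in, S_RES, S_out) levels in dBm, copied from Table 6.2(b)
G168_LEVELS = [(0, -30, -55), (-10, -38, -65), (-20, -47, -65), (-30, -55, -65)]

def split_requirements(r_in_dbm, s_res_dbm, s_out_dbm, erl_db=PHONE_ERL_DB):
    """Return (S_in in dBm, linear ERLE in dB, NLP improvement in dB)."""
    s_in_dbm = r_in_dbm - erl_db          # echo level entering the canceller
    erle_linear_db = s_in_dbm - s_res_dbm  # removed by the adaptive filter
    nlp_db = s_res_dbm - s_out_dbm         # removed by the nonlinear processor
    return s_in_dbm, erle_linear_db, nlp_db

for r_in, s_res, s_out in G168_LEVELS:
    s_in, erle, nlp = split_requirements(r_in, s_res, s_out)
    print(f"R_in {r_in} dBm: S_in {s_in} dBm, linear {erle} dB, NLP {nlp} dB")
```

If an implementation achieves more linear cancellation than the minimum, S_RES drops and the NLP share shrinks by the same amount.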
6.4.2 Convergence Time
Convergence time [ITU-T-P.340 (2000)] is the time interval between the instant a test signal is applied at the R_in port, after all functions of the echo canceller have been reset and then enabled, and the instant when the returned echo signal at the S_out port is attenuated by at least a predefined amount. During convergence, the local user speech S_gen is inactive.
As per G.168, the residual level L_RES after 1 s is 20 dB (including the phone ERL of 6 dB) below the R_in level. The S_out values listed in Table 6.2(b) are steady state values. Steady state is taken as 10 s after the initial reset conditions. In practice, echo cancellers are designed to converge two to five times faster than the G.168 recommendations require. Fast convergence can also lead to fast divergence in bad conditions. Any improvement in convergence has to be weighed against the robustness of the control function during nonadapting conditions.
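A measured residual-level trace can be checked against the 1-s figure quoted above. The sketch below is a hypothetical helper: `convergence_time` and the sample trace are illustrative and are not taken from G.168 or P.340. It reports the first time at which the residual level is at least 20 dB below the R_in level and stays there.

```python
def convergence_time(times_s, l_res_dbm, l_rin_dbm, required_db=20.0):
    """First time at which L_RES is, and remains, required_db below L_Rin."""
    target_dbm = l_rin_dbm - required_db
    for i, t in enumerate(times_s):
        if all(level <= target_dbm for level in l_res_dbm[i:]):
            return t
    return None  # never converged within the trace

# Hypothetical trace for R_in = -10 dBm (values invented for illustration)
times = [0.0, 0.25, 0.5, 0.75, 1.0]
levels = [-16.0, -22.0, -28.0, -31.0, -34.0]
t_conv = convergence_time(times, levels, l_rin_dbm=-10.0)
print("converged at", t_conv, "s; within 1 s:", t_conv is not None and t_conv <= 1.0)
```

Requiring the level to stay below the target guards against counting a momentary dip as convergence.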
6.5 ECHO CANCELLATION IN VoIP ADAPTERS
Echo canceller operation between two VoIP adapters is represented in Fig. 6.5. For ease of representation, two phones (phone-A and phone-B) connected to the two adapters are marked as A and B. These phones are shown generating two different types of signals. Phone-A voice is a continuous reading of the letters "abcd," simply marked as abcd.., and phone-B voice is also called voice-B or B-voice. In Fig. 6.5, the SLIC, hardware CODEC (ADC, DAC), echo canceller, voice encoder, and decoder are shown as part of each VoIP adapter. The other voice functions of dual-tone multifrequency (DTMF), tones, packet loss concealment (PLC), other packetization, and VoIP signaling are not shown. Refer to Chapter 2 for details on complete VoIP adapter blocks and on an end-to-end VoIP voice call.
Figure 6.5. Echo cancellation between two VoIP adapters.
The VoIP adapter interfaces with the telephone through an FXS TIP-RING interface. The hybrid inside the phone performs the two-to-four-wire conversion that connects voice to the handset microphone and speaker. In the VoIP adapter, the hybrid converts the two-wire TIP-RING interface to the four-wire interface for the ADC and DAC signals. In actual implementations, the hybrid is part of the SLIC. The ADC and DAC are part of the hardware CODEC chip, also called the SLAC. In recent devices, the SLIC and hardware CODEC are manufactured as a single device with a few passive components located outside the chip [URL (Si3015)]. The echo canceller is a software module running on a processor along with DTMF, voice compression, voice activity detection (VAD)/comfort noise generation (CNG), PLC, and call progress tones.
End-to-end echo cancellation steps are given as follows:
• In the echo cancellation example of Fig. 6.5, phone-A is the talker and B is the listener. Echo is generated at B and comes back to A as talker echo.
• When the person at phone-A speaks (say "abcd.."), voice is fed into the phone-A hybrid and is converted onto the two-wire telephone interface between phone-A and the VoIP adapter. In adapter-A, the hybrid converts the two-wire signal into a four-wire interface. Voice from phone-A goes through the ADC, echo canceller, and voice compression path of the VoIP adapter, and finally reaches adapter-B as IP packets. At the destination VoIP adapter-B, voice packets are decompressed and sent to the telephone through the SLAC and SLIC hybrid interfaces.
• In the end-to-end voice call between two phones, voice goes through a total of four hybrids. All hybrids have a certain amount of leakage. This leakage is characterized generically by the ERL. The phone hybrid is the most dominant echo creator. The ERL of a good VoIP adapter SLIC hybrid is of the order of 18 to 24 dB. On a voltage scale, about one sixteenth (for a 24-dB ERL) of the DAC output voltage directly enters the ADC path as leakage without going through the phone. This leakage of one sixteenth of abcd.. is marked in the ADC–SLIC–DAC path in adapter-B.
• The main contributor of echo beyond the adapter-B SLIC is phone-B, connected at the listener end. Phone-B converts the two-wire TIP-RING interface to the four-wire (microphone and speaker) handset. A typical phone with an ERL of 12 dB returns one quarter (25%) of the received signal as echo. In Fig. 6.5, one quarter of the abcd.. voice is shown with dotted lines returning from the phone hybrid.
As shown in Fig. 6.5, a total of one quarter of A's voice from the phone hybrid and one sixteenth of A's voice from the SLIC hybrid is fed back to the ADC of VoIP adapter-B. This return signal will be heard as echo at phone-A. To cancel this echo, VoIP adapter-B needs an echo canceller in the B-to-A path. In general, this type of near-end cancellation in the VoIP adapter benefits the other end. Overall, VoIP calls through VoIP adapters need line echo cancellers. The echo cancellation requirements vary with end-to-end delays, phone characteristics, interfaces, the loss plan of the country, and quality goals. Echo canceller implementations take care of these requirements.
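The fractions quoted for the hybrids follow from converting ERL in dB to a voltage ratio, 10^(-ERL/20). The sketch below shows that conversion; adding the two leakage paths as a straight in-phase sum is a worst-case assumption made here for illustration, and the function name is hypothetical.

```python
import math

def erl_to_voltage_ratio(erl_db):
    """Fraction of the signal voltage returned as echo for a given ERL in dB."""
    return 10.0 ** (-erl_db / 20.0)

slic_leak = erl_to_voltage_ratio(24)   # about one sixteenth
phone_leak = erl_to_voltage_ratio(12)  # about one quarter

# Worst-case (in-phase) sum of both leakage paths, expressed back in dB
combined = slic_leak + phone_leak
combined_erl_db = -20.0 * math.log10(combined)

print(f"SLIC hybrid leak:  {slic_leak:.4f}")
print(f"Phone hybrid leak: {phone_leak:.4f}")
print(f"Combined ERL (worst case): {combined_erl_db:.1f} dB")
```

Under this worst-case assumption the combined return loss is only about 10 dB, which is why the phone hybrid dominates the echo seen by adapter-B.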
6.5.1 Fixed and Nonstationary Delays
In Fig. 6.5, fixed and nonstationary delays are marked on the upper part of the figure. In a voice call between two gateways, the telephone, the telephone interface, and the blocks on the telephone side contribute stationary (fixed) delays. The echo canceller can adapt to these fixed delays. In this path, however, the delay is not fixed on a permanent basis. A user may connect a different telephone or another parallel telephone, or conference in or out of the call, which changes the echo path in the middle of the conversation. These operations produce slowly varying delay/echo path models. In Fig. 6.5, the delay between the two adapters is not stationary because of the IP network and packet impediments. Continuous and sudden delay changes are not suitable for echo cancellation. Hence, far-end echo cancellation is difficult with VoIP solutions. In far-end echo cancellation, adapter-A has to cancel echo in its decoder path after the A-voice completes the round trip, and adapter-B does not cancel the echo. Round-trip delay is not stationary in VoIP. Far-end echo cancellation is therefore avoided in VoIP deployments, but such operation is possible with the PSTN because delays are fixed on an established voice call. To get the best out of echo cancellation, the echo canceller has to be placed close to the echo path and must avoid nonlinearities and nonstationary echo paths.
6.5.2 Automatic Level Control with Echo Cancellers
Automatic level control (ALC), also known as automatic gain control (AGC), is more commonly used with acoustic echo cancellers (AEC). In AEC, level control is applied after S_RES, as given in reference [ITU-T-P.340 (2000)]. The positioning of ALC is very important. In summary, gain control should not appear between R_in and S_RES of the echo canceller. ALC can appear before the R_in input or at the output of S_RES or S_out. The echo canceller, as a four-terminal block, should not see gain variations because of ALC. Otherwise, ALC in the echo path destabilizes the adapted filter coefficients, and steady state echo cancellation degrades significantly. Echo path variations can create signal-level variations, and this is more common with acoustic echo. This echo path variation is unavoidable, and the echo canceller is expected to take care of it. Some characteristics of gain control and its relations can be found in references [ITU-T-G.169 (1999), ITU-T-P.340 (2000)]. As per the G.169 recommendations, the gain range is usually limited to 15 dB, and gain is controlled at a rate of 10 dB per second; typically, slower control is better.
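The rate and range limits mentioned above can be sketched as a slew-limited gain update, applied outside the R_in to S_RES path. This is a minimal sketch, not a G.169 implementation: treating the 15-dB range as a symmetric ±15 dB clamp is an assumption, and `update_alc_gain` and its parameters are hypothetical names.

```python
def update_alc_gain(current_gain_db, desired_gain_db, dt_s,
                    max_rate_db_per_s=10.0, max_gain_db=15.0):
    """Move the gain toward the desired value with rate and range limits."""
    max_step_db = max_rate_db_per_s * dt_s
    step_db = max(-max_step_db, min(max_step_db, desired_gain_db - current_gain_db))
    new_gain_db = current_gain_db + step_db
    # Assumed symmetric clamp on the total gain range
    return max(-max_gain_db, min(max_gain_db, new_gain_db))

# Request 20 dB of gain and update every 10 ms for 3 s
gain_db = 0.0
for _ in range(300):
    gain_db = update_alc_gain(gain_db, 20.0, 0.01)
print("gain after 3 s:", gain_db, "dB")
```

Because each 10-ms update moves the gain by at most 0.1 dB, the adaptive filter downstream or upstream of the ALC never sees an abrupt level jump.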
6.5.3 Linear and Nonlinear Echo with Example
Echo cancellation works by locally estimating a replica of the echo in the adaptive filter. With reference to Fig. 6.5, the filter adapts and provides an equivalent echo of "abcd.." that appears as a delayed and attenuated (reduced in amplitude) version of the original. In the Fig. 6.5 example, about one quarter plus one sixteenth of the original signal arrives at the summing junction in adapter-B. The summing junction reduces the echo by a factor of 10 to 30 (on a dB scale, 20 to 30 dB). This improvement is the ERLE. The summing junction output is reduced in level in comparison with the original signal ("abcd.."). In practice, the SLIC hybrid and phone hybrid have some nonlinear components in their electrical path model. The echo canceller adaptive filter adapts to the linear part of the SLIC and telephone hybrids. The nonlinear part appears as the residue at the output of the summing junction. The NLP block is used to remove the residue and to insert comfort noise in place of the nonlinear echo.
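The adaptive filter and summing junction described above can be illustrated with a normalized LMS (NLMS) update. NLMS is used here only as a common textbook choice; the text does not state which adaptation algorithm is used. The echo path (one sixteenth direct leakage plus one quarter delayed by three samples), the white-noise reference, and all names are assumptions for illustration.

```python
import math
import random

def nlms_echo_cancel(reference, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Adapt an FIR echo replica; return (residual samples, final taps)."""
    taps = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent reference sample first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(taps, history))
        e = d - echo_estimate           # summing junction output
        norm = sum(h * h for h in history) + eps
        taps = [w + mu * e * h / norm for w, h in zip(taps, history)]
        residual.append(e)
    return residual, taps

def power_db(samples):
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-30)

random.seed(0)
reference = [random.uniform(-1.0, 1.0) for _ in range(4000)]
# Hypothetical linear echo path: 1/16 direct + 1/4 delayed by 3 samples
echo = [0.0625 * reference[n] + (0.25 * reference[n - 3] if n >= 3 else 0.0)
        for n in range(len(reference))]

residual, taps = nlms_echo_cancel(reference, echo)
erle_db = power_db(echo[-1000:]) - power_db(residual[-1000:])
print(f"ERLE over the last 1000 samples: {erle_db:.1f} dB")
```

Because this toy echo path is perfectly linear and noise free, the ERLE it reports is far higher than the 20 to 30 dB seen in practice, where hybrid nonlinearity and quantization leave a residue for the NLP.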
For most voice power levels of −10 to −30 dBm, echo has to be reduced to the −65-dBm level [ITU-T-G.131 (2000)]. In practical echo cancellers, linear echo cancellation reduces echo by 24 to 35 dB. The remaining linear echo residue and the nonlinear part have to be reduced by a further 12 to 24 dB to reach a total echo level of −65 dBm. This process calls for nonlinear echo cancellation. For lower end-to-end delays of the order of 50 ms, a linear echo cancellation of 24 to 35 dB would be sufficient, assuming the requirements of TELR(for R-factor 90) are met.
Echo residue is processed in the NLP. The NLP acts when it senses a residue echo that is small compared with the main reference signal. The NLP has to operate only when its input is a small residue. If phone-B voice also appears at the NLP input along with the residue, then the NLP does not operate. The simplest function of the NLP is to replace the residue with zero-valued samples while only echo residue is present. When voice from phone-B appears, the NLP passes the signal without introducing any distortion. When the people at phone-A and phone-B speak simultaneously, the condition is referred to as double talk (DT). During double talk, the NLP is disabled to avoid distorting the phone-B voice, so the echo residue is not removed. When the residue of "abcd.." and voice from B are both present in S_RES, the NLP does not operate. Hence, the echo residue passes to the NLP output unchanged.
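The NLP decision described above can be sketched as a per-frame level comparison. This is a minimal sketch under stated assumptions: the 30-dB margin, the frame length, and the function names are hypothetical, and a practical NLP would add a dedicated double-talk detector and comfort noise rather than plain zeros.

```python
import math

def frame_power_db(frame):
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)

def simple_nlp(residual_frame, reference_frame, margin_db=30.0):
    """Zero the frame only when it is a small residue relative to the reference."""
    ref_db = frame_power_db(reference_frame)
    res_db = frame_power_db(residual_frame)
    if res_db < ref_db - margin_db:
        return [0.0] * len(residual_frame)   # residue only: suppress
    return list(residual_frame)              # near-end voice or double talk: pass

reference = [0.5, -0.5] * 80                 # far-end talker active
small_residue = [0.005, -0.005] * 80         # echo residue only
double_talk = [0.2, -0.2] * 80               # phone-B voice plus residue

print(max(abs(s) for s in simple_nlp(small_residue, reference)))
print(max(abs(s) for s in simple_nlp(double_talk, reference)))
```

When the far end is silent, the reference power is very low, so near-end speech always fails the small-residue test and passes through unmodified.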
6.5.4 Linear Echo Improvement with 16-Bit Samples
As illustrated in Fig. 6.5, the ADC and DAC sit between the telephone and the echo canceller and interface between analog signals and digital samples through a pulse code modulation (PCM) interface. This PCM interface can use 16-bit linear or 8-bit A/µ-law samples. The 8-bit format has a signal-to-quantization-noise ratio of only 38 to 40 dB, and for low-amplitude signals, this reduces to 27 dB. Hence, this quantization effect limits the achievable linear part of the echo cancellation. By making use of the 16-bit linear format in the communication between the ADC/DAC and the processor, the linear part of the echo cancellation can be improved. The linear part of cancellation is also limited by the linear part of the echo at the phone and by the capabilities of the echo cancellation algorithms. In summary, when the ADC/DAC limits the samples to 8 bits, linear echo cancellation levels in dB may be lower. Linear 16-bit samples also improve voice quality compared with 8-bit A/µ-law. As explained in Chapter 9, wideband voice communication makes
use of 16 - bit linear samples to achieve better voice quality. A low - bit - rate codec such as G.729AB is also found to provide slightly better objective mea-