CMC ICT3207 Call Flow

(1)

Speech digitization

The human ear can perceive frequencies in the range of 16 Hz to 20 kHz, known as the audio range, whereas speech occupies a narrower band of about 100 Hz to 10 kHz within this range.

A reduction in bandwidth is desirable because it reduces the cost of the communication system.

An acceptable level of speech intelligibility is obtained by transmitting only the frequencies in the range 300-3400 Hz. Such a band-limited speech signal (a bandwidth of 3.1 kHz) is often called ‘toll’ (telephone) quality speech.

Within this band-limited range, the ear is most sensitive to frequencies around 3 kHz. In a female voice, maximum energy is distributed around this frequency, whereas in a male voice the maximum energy occurs at a much lower frequency. This is the reason traditionally given for preferring women as telephone operators and announcers.

(2)

Speech digitization



A channel in a communication system has a finite transmission loss and is subject to noise impairment.

When the length of the transmission path increases, the signal-to-noise ratio at the receiving end decreases.

In analog voice transmission, the effect of noise and interference is most apparent during speech pauses, when the signal amplitude is near zero. Even relatively low noise levels can be quite annoying to a listener during speech pauses. The same levels of noise may be unnoticeable when speech is present.

Hence, it is the absolute noise level of an idle channel that determines the analog speech quality.

(3)

Speech digitization

• In a digital system, both speech and speech pauses are encoded as data patterns and transmitted at a constant power level.

• Signal regeneration at regular intervals restores the signal to its original level and virtually eliminates all noise due to the transmission medium. Thus, in a digital system the idle channel noise is determined by the encoding process and not by the transmission link.

• The ability of digital transmission to reject crosstalk is also superior to that of an analog system. First, low-level crosstalk is eliminated because the signals have constant amplitude. Second, high-amplitude crosstalk results in detection errors and is therefore unintelligible.

• Other advantages of digital systems include the ability to support nonvoice services, easy data encryption, and performance monitoring.

• Although digital systems require greater bandwidth than analog systems, and transmission media such as wire pairs cause greater attenuation when larger-bandwidth signals are passed through them, the advantages offered by digital systems outweigh these drawbacks.

(4)

Sampling

• The first step in digitizing speech is to establish a set of discrete times at which the input waveform is sampled.

• The discrete sampling instants may be spaced at either regular or irregular intervals.

• The minimum sampling frequency required to reconstruct the original waveform from the sampled sequence is given by the Nyquist criterion, which can be stated as

    f_S ≥ 2H

  where f_S = sampling frequency (its minimum value, 2H, is the Nyquist rate) and H = highest frequency component in the input analog waveform.

• If the input waveform has no lower cut-off frequency, H is simply its bandwidth. In this case, the original waveform is reconstructed by passing the sampled values through a low pass filter, which smooths out or interpolates the signal between sampled values.
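The reconstruction step can be illustrated with a minimal Python sketch. It assumes an ideal low pass filter, modeled here as a truncated sinc interpolation sum; the 1 kHz test tone and the names `sinc`, `reconstruct` and `sample_at` are hypothetical and chosen only for illustration.

```python
import math


def sinc(u: float) -> float:
    """Normalized sinc, the impulse response of an ideal low pass filter."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def reconstruct(sample_at, fs: float, t: float, n_terms: int = 2000) -> float:
    """Interpolate the signal at time t from samples taken at n / fs.

    The infinite interpolation sum is truncated to 2 * n_terms + 1 samples,
    so the result is only an approximation.
    """
    return sum(sample_at(n) * sinc(t * fs - n)
               for n in range(-n_terms, n_terms + 1))


FS = 8000.0    # sampling frequency in Hz
TONE = 1000.0  # hypothetical test tone in Hz, well below FS / 2


def sample_at(n: int) -> float:
    """Value of the test tone at the n-th sampling instant."""
    return math.sin(2.0 * math.pi * TONE * n / FS)


t_between = 0.3 / FS  # an instant lying between two sampling instants
estimate = reconstruct(sample_at, FS, t_between)
exact = math.sin(2.0 * math.pi * TONE * t_between)
print(f"reconstructed={estimate:.4f} exact={exact:.4f}")
```

Because f_S ≥ 2H holds for this tone, the interpolated value should agree closely with the original waveform; the small residual comes from truncating the sum.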

(5)

Sampling

• Sampling is the process of multiplying the input signal by a constant-amplitude impulse train. It is an amplitude modulation process in which the pulse train acts as the carrier.

• Since the amplitude of the pulses is modulated, the scheme is called pulse amplitude modulation (PAM).

• When the carrier is a sine wave, the frequency spectrum of an amplitude-modulated signal extends from f_C-H to f_C+H, where f_C is the carrier frequency.

(6)

Sampling

• If the carrier is a pulse train, as is the case in PAM, the output spectrum contains the fundamental as well as the harmonics of the fundamental, each with its own sidebands.

• If the pulse train is a square wave (50% duty cycle), only the fundamental and odd harmonics are present.

• The low pass filter at the receiver end allows only the baseband component, 0 to H Hz, to pass. If f_S is less than twice H, portions of the PAM signal spectrum overlap.

• This overlapping of the sidebands produces beat frequencies that interfere with the desired signal. Such interference is referred to as aliasing or foldover distortion.

• To avoid aliasing, the minimum sampling frequency required for speech band limited to 3.4 kHz is 6.8 kHz, although in the digital telephone network speech is sampled at 8 kHz.

• The filter used for band limiting the input speech waveform is known as the antialiasing filter.

• Sampling at 8 kHz results in oversampling, which provides a margin for nonideal filter characteristics.
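The foldover effect can be made concrete with a short Python sketch. The helper `alias_frequency` is a hypothetical name; it assumes a single sinusoidal tone and ideal sampling, and returns the frequency at which the tone appears in the baseband after sampling.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency, in the range 0 to fs/2, of a tone at f_hz
    after ideal sampling at fs_hz."""
    folded = f_hz % fs_hz              # position within one spectral period
    return min(folded, fs_hz - folded)  # reflect about fs/2 (foldover)


# A 3.4 kHz tone sampled at 8 kHz satisfies f_S >= 2H and is unchanged.
print(alias_frequency(3400.0, 8000.0))
# The same tone sampled at only 6 kHz folds over to a lower frequency.
print(alias_frequency(3400.0, 6000.0))
# A 5 kHz component that escapes the antialiasing filter lands in the speech band.
print(alias_frequency(5000.0, 8000.0))
```

The last case shows why the input is band limited before sampling: a component above f_S/2 is indistinguishable from a genuine in-band tone once it has been sampled.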

(7)

Quantization & binary coding



PAM systems are not generally useful over long distances, owing to the vulnerability of the individual pulse amplitudes to noise, distortion and crosstalk.

The amplitude susceptibility may be reduced or eliminated by converting the PAM samples into a digital format, thereby allowing the use of regenerative repeaters to remove transmission imperfections before errors result.

With n bits, the number of sample values that can be represented is 2^n. But the PAM sample amplitudes can take on an infinite range of values. Therefore, it is necessary to quantize each PAM sample amplitude to the nearest of a set of discrete amplitude levels.

(8)

Quantization & binary coding

• The signal V is confined to a range from V_L to V_H, and this range is divided into M equal steps. The step size is S = (V_H - V_L)/M.

• The quantization levels V_0, V_1, …, V_{M-1} are located at the center of each of these steps. The quantized signal V_q takes on one of these quantization level values.

• A signal V is quantized to its nearest quantization level.

• The boundary values between the steps are equidistant from two quantization levels, and a convention may be adopted to quantize them to one of the levels.

• Thus, the signal V_q makes quantum jumps of step size S, and at any instant the quantization error V - V_q has a magnitude less than or equal to S/2.

• When the step size is uniform, the scheme is known as linear or uniform quantization.

For example:

    V_q = V_3 if (V_3 - S/2) ≤ V < (V_3 + S/2)
    V_q = V_4 if (V_4 - S/2) ≤ V < (V_4 + S/2)
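A uniform quantizer following these rules can be sketched in Python as shown here. The function name `quantize_uniform`, the 0 V to 1 V range and M = 4 are assumptions chosen for illustration; boundary values are assigned to the upper step, matching the "≤ V <" convention of the example.

```python
import math


def quantize_uniform(v: float, v_low: float, v_high: float, m: int):
    """Return (index, level) of the quantization level nearest to v.

    The range v_low..v_high is split into m equal steps and each level
    sits at the center of its step. Inputs outside the range are clipped.
    """
    step = (v_high - v_low) / m
    index = math.floor((v - v_low) / step)
    index = max(0, min(m - 1, index))       # clip to the valid levels
    level = v_low + (index + 0.5) * step    # center of the selected step
    return index, level


index, v_q = quantize_uniform(0.6, 0.0, 1.0, 4)
error = 0.6 - v_q
print(index, v_q, error)
```

With a step size of 0.25 V the error magnitude stays within S/2 = 0.125 V for any input inside the range.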

(9)

Quantization & Binary Coding



The process of quantization itself brings about a certain amount of noise immunity to the signal.

The quantized signal is an approximation to the original signal. The quality of the approximation may be improved by reducing the size of the steps and thereby increasing the number of allowable levels.

However, reducing the step size makes the PAM signal more susceptible to noise.

So, each quantized level is represented by a code number, which is transmitted instead of the quantized sample value itself. If binary arithmetic is used for coding, the code number is transmitted as a series of pulses. Hence, such a system of transmission is called pulse code modulation (PCM).

(10)

Quantization & Binary Coding



The analog signal is limited to the range -4 V to +4 V.

The step size is one volt.

Eight quantization levels are used, located at -3.5 V, -2.5 V, …, +3.5 V.

Code number 0 is assigned to -3.5 V, code number 1 to -2.5 V, and so on.

Each code number has its equivalent 3-bit binary representation, from 000 for code number 0 to 111 for code number 7.
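The example on this slide can be turned into a small Python sketch. The function names `pcm_encode` and `pcm_decode` are hypothetical, and clipping of out-of-range inputs is an assumption not stated on the slide.

```python
import math


def pcm_encode(v: float) -> str:
    """Map a sample in -4 V..+4 V to its 3-bit code (step size 1 V)."""
    code = math.floor(v + 4.0)      # code 0 covers -4 V..-3 V, and so on
    code = max(0, min(7, code))     # clip inputs outside the range
    return format(code, "03b")


def pcm_decode(bits: str) -> float:
    """Return the quantization level (step center) for a 3-bit code."""
    return int(bits, 2) - 3.5


for sample in (-3.2, 1.3, 3.9):
    bits = pcm_encode(sample)
    print(sample, bits, pcm_decode(bits))
```

Each decoded value differs from the original sample by at most half a step, 0.5 V in this example.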

(11)

A PCM system

• The analog input signal V is band limited to 3.4 kHz to prevent aliasing and is sampled at 8 kHz.

• The quantizer and the encoder together perform the A-D conversion.

• The quantizer and the decoder together perform the D-A conversion at the receiver.

• The quantized PAM levels are then passed through a filter that rejects the frequency components lying outside the baseband and produces a reconstructed waveform of the original band-limited signal.

(12)

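Quantization noise

• The instantaneous error e = V - V_q is randomly distributed within the range ±S/2 and is called the quantization error or noise.

• The average quantization noise power is given by the variance,

    \sigma^2 = \int_{-S/2}^{S/2} e^2 \, p(e) \, de = S^2 / 12

  where p(e) = 1/S is the probability density of the error, taken as uniform over one step.

• The signal to quantization noise ratio (SQR) is a good measure of the performance of a PCM system.

• For a full-scale sinusoidal input and n bits per sample, SQR = 1.76 + 6.02n dB.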
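The SQR expression can be recovered from the noise power in a few steps. The derivation here is a sketch that assumes a full-scale sinusoid of peak amplitude V_m, M = 2^n uniform steps spanning the range -V_m to +V_m, and an error uniformly distributed over one step.

```latex
\begin{aligned}
S &= \frac{2V_m}{M}, \qquad M = 2^n \\
P_{\text{signal}} &= \frac{V_m^2}{2}, \qquad
P_{\text{noise}} = \sigma^2 = \frac{S^2}{12} \\
\mathrm{SQR} &= \frac{V_m^2/2}{S^2/12}
             = \frac{6V_m^2}{S^2}
             = \frac{3}{2}M^2
             = \frac{3}{2}\,2^{2n} \\
\mathrm{SQR}_{\mathrm{dB}} &= 10\log_{10}\frac{3}{2} + 20\,n\log_{10}2
             \approx 1.76 + 6.02\,n \ \mathrm{dB}
\end{aligned}
```

Each additional bit therefore improves the SQR by about 6 dB.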

(13)

Companding

• In linear or uniform quantization, the magnitude of the quantization noise is fixed for a particular system and is independent of the input signal amplitude.

• Therefore, in relative terms, weak, low-level signals suffer more from quantization noise than loud, strong signals. The maximum fractional quantization error is

    e_f = (S/2)/|V|

  For a sinusoidal input of peak amplitude V_m, S = 2V_m/M. Hence, e_f = [V_m/(M|V|)] × 100%.

• The very high percentage error at low input signal levels effectively represents idle channel noise.

• This effect is particularly bothersome during speech pauses. It can be minimized by choosing the 0 V level as a quantization level and avoiding the midpoints of the first intervals on either side of the zero level as quantization levels.

(14)

Companding



The scheme that uses the midpoints of the first intervals on either side of zero as quantization levels is known as the mid-riser scheme, and the scheme that uses the zero level as a quantization level is known as the mid-tread scheme.

The mid-tread scheme uses an odd number of quantization levels, i.e., M = 2^n - 1.

In the mid-tread scheme, very low signals are decoded into a constant, zero-level output.

However, if a d.c. bias exists in the encoder, idle channel noise is still a problem with mid-tread quantization.
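The difference between the two schemes during speech pauses can be sketched in Python. The function names, the 1 V step size and the 0.45 V bias are hypothetical values chosen only to make the effect visible.

```python
import math


def mid_riser(v: float, step: float) -> float:
    """Levels at ±step/2, ±3*step/2, ...; zero is a decision boundary."""
    return (math.floor(v / step) + 0.5) * step


def mid_tread(v: float, step: float) -> float:
    """Levels at 0, ±step, ±2*step, ...; zero is itself a level."""
    return math.floor(v / step + 0.5) * step


STEP = 1.0
idle = [0.1, -0.1, 0.1, -0.1]  # low-level noise during a speech pause

print([mid_riser(v, STEP) for v in idle])   # output toggles between ±step/2
print([mid_tread(v, STEP) for v in idle])   # output stays at zero

BIAS = 0.45  # assumed d.c. bias in the encoder
print([mid_tread(v + BIAS, STEP) for v in idle])  # toggling reappears
```

With the bias applied, the idle input sits close to a decision boundary of the mid-tread quantizer, so the same small noise again produces a toggling output.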

(15)

Companding

• A more efficient method of minimizing large variations in the percentage quantization error over the signal range is to use nonlinear or nonuniform quantization.

• It is interesting to note that uniform quantization intervals result in a nonuniform SQR over the signal range, whereas nonuniform intervals result in a uniform SQR.

• Permitting larger quantization intervals at higher signal amplitudes is equivalent to compressing the input signal and then quantizing it with uniform intervals.

• The input signal is first compressed by a nonlinear device and then a linear quantizer is used. At the receiving end, the quantized signal is expanded by a nonlinear device whose characteristic is the inverse of the compression applied at the sending end.

• The process of first compressing and then expanding is referred to as companding.

(16)

Companding

• A variety of nonlinear compression-expansion functions can be chosen to implement a compandor. The obvious one is a logarithmic law.

• Unfortunately, the function y = ln x does not pass through the origin.

• So, it is necessary to substitute a linear portion for the curve at lower values of x.

• Most practical companding systems are based on a law suggested by K.W. Cattermole:

    Logarithmic section: y = (1 + ln(Ax))/(1 + ln A) for 1/A ≤ x ≤ 1
    Linear section: y = Ax/(1 + ln A) for 0 ≤ x ≤ 1/A
    where A = compression coefficient.

    The expansion function for the logarithmic section is
    x = e^{y(1 + ln A) - 1}/A for 1/(1 + ln A) ≤ y ≤ 1

• These equations are collectively known as the A-law, which is used by India and European countries.

• The U.S.A. and Japan follow a similar logarithmic law known as the µ-law.
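The A-law compression and expansion functions can be sketched in Python as follows. Three things here are assumptions not stated on the slide: the value A = 87.6 (the value commonly used in telephony), the odd-symmetric extension to negative inputs, and the linear part of the expansion, which is obtained by inverting the linear section of the compression curve.

```python
import math

A = 87.6                      # assumed compression coefficient
DENOM = 1.0 + math.log(A)     # the common factor 1 + ln A


def compress(x: float) -> float:
    """A-law compression of a normalized sample, -1 <= x <= 1."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / DENOM                    # linear section
    else:
        y = (1.0 + math.log(A * ax)) / DENOM  # logarithmic section
    return math.copysign(y, x)


def expand(y: float) -> float:
    """Inverse of compress() for a normalized value, -1 <= y <= 1."""
    ay = abs(y)
    if ay < 1.0 / DENOM:
        x = ay * DENOM / A                    # inverse of the linear section
    else:
        x = math.exp(ay * DENOM - 1.0) / A    # inverse of the logarithmic section
    return math.copysign(x, y)


for x in (0.005, 0.01, 0.1, 0.5, 1.0):
    y = compress(x)
    print(f"x={x:<6} compressed={y:.4f} expanded={expand(y):.4f}")
```

Small inputs are boosted considerably before uniform quantization, which is how companding reduces the percentage error at low signal levels.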

(17)

Companding

• In practice, a piecewise linear segment approximation is used.

• A-law companding consists of eight linear segments for each polarity.

• The slope halves from one segment to the next, except for the lowest two segments, which have the same slope.

• The lowest two segments of the positive and negative polarities coalesce into one straight line segment.

• As a result, there are 13 effective segments in the curve, and the law is sometimes referred to as the 13-segment companding law.

• In the µ-law, the slope also halves between the lowest two segments, giving rise to 15 effective segments.

• Each segment is divided into 16 linear steps. Eight bits are required to represent each sample value: a 1-bit sign, a 3-bit segment number and a 4-bit linear step number.

• There are in all 256 defined signal levels.
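The 8-bit sample format can be illustrated with a short Python sketch. The field order (sign in the most significant bit, then segment, then step) and the function names are assumptions made for illustration; the sketch shows only the field layout and does not reproduce the exact bit patterns of any transmission standard.

```python
def pack_sample(sign: int, segment: int, step: int) -> int:
    """Pack a 1-bit sign, 3-bit segment and 4-bit step into one byte."""
    if sign not in (0, 1) or not 0 <= segment < 8 or not 0 <= step < 16:
        raise ValueError("field out of range")
    return (sign << 7) | (segment << 4) | step


def unpack_sample(code: int):
    """Split a byte back into (sign, segment, step)."""
    return (code >> 7) & 0x1, (code >> 4) & 0x7, code & 0xF


code = pack_sample(1, 5, 9)
print(format(code, "08b"), unpack_sample(code))

# 2 polarities x 8 segments x 16 steps give the 256 defined levels.
all_codes = {pack_sample(s, g, k)
             for s in range(2) for g in range(8) for k in range(16)}
print(len(all_codes))
```

At the 8 kHz sampling rate used for telephone speech, eight bits per sample correspond to 64,000 bits per second per voice channel.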

(18)

Differential coding

• PCM is not specifically designed for digitizing speech waveforms.

• Speech waveforms exhibit considerable redundancy, which can be usefully exploited in designing coding schemes.

• The following characteristics of speech signals contribute to the redundancy:

  • Nonuniform amplitude distributions

  • Sample-to-sample correlations

  • Periodicity or cycle-to-cycle correlations

  • Pitch interval-to-pitch interval correlations

  • Speech pauses or inactivity factors

• A sizeable fraction of human speech sounds is produced by the flow of puffs of air from the lungs into the vocal tract. The interval between these puffs of air is known as the pitch interval. There may be as many as 20 to 40 pitch intervals in a single sound.

(19)

Differential coding

• Delta or differential coding systems are designed to take advantage of the sample-to-sample redundancies in speech waveforms.

• Because of the strong correlation between adjacent speech samples, large abrupt changes in level do not occur frequently in speech waveforms.

• In such situations, it is more efficient to transmit only the signal changes, either directly or in encoded form, instead of the absolute values of the samples.

• Delta modulation (DM) is a scheme that transmits only the signal changes, whereas differential pulse code modulation (DPCM) encodes the differences and transmits them.

• A delta modulator may be implemented by simply comparing each new signal sample with the previous sample and transmitting the resulting difference signal.

• At the receiver end, the difference signals are added up by an integrator to reconstruct the absolute signal.

• However, such a system, being open loop, suffers from the possibility of the receiver output diverging from the transmitter input due to system errors or inaccuracies.

(20)

Differential coding

• The system can be converted into a closed loop system by setting up a feedback path with an integrator at the transmitting end.

• When the input is constant, the output of the transmitter is an alternating train of positive and negative pulses. This constitutes the quantization noise in delta modulators and is also known as granular noise.

• If the transmitter input signal changes too rapidly, the receiver output is unable to keep up. This phenomenon is known as slope overload.

• This problem may be overcome by using a variable slope integrator whose output slope is increased or decreased depending on the rate of change of the input signal.
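Granular noise and slope overload can both be reproduced with a minimal closed loop delta modulator in Python. The sketch assumes a fixed step size and one bit per sample; the function names, the 0.125 step and the test inputs are hypothetical.

```python
def delta_modulate(samples, step: float):
    """Closed loop delta modulator: emit 1 if the input is at or above
    the fed-back integrator estimate, else 0."""
    estimate = 0.0
    bits = []
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step   # integrator in the feedback path
        bits.append(bit)
    return bits


def delta_demodulate(bits, step: float):
    """Receiver: integrate the received pulses to rebuild the signal."""
    estimate = 0.0
    output = []
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output


STEP = 0.125

# Constant input: the output alternates, which is the granular noise.
constant_bits = delta_modulate([0.0] * 6, STEP)
print(constant_bits)

# Rapidly rising input: the receiver output cannot keep up (slope overload).
ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
ramp_bits = delta_modulate(ramp, STEP)
ramp_out = delta_demodulate(ramp_bits, STEP)
print(ramp_bits, ramp_out)
```

A variable slope integrator would correspond to adapting `STEP` from one sample to the next, for example enlarging it after a run of identical bits; that refinement is left out of this sketch.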

(21)

Vocoders

• By considering some of the properties that are more or less unique to speech, such as pitch interval and cycle correlations, significant reductions in bit rate can be achieved.

• Coding systems that are specifically designed for voice signals are known as voice coders or vocoders. They typically operate at bit rates in the range 1.2-2.4 kbps.

• Vocoders take into account the physiology of the vocal cords, the larynx, the throat, the mouth, the nasal passages and the ear in their design.

• The basic purpose of a vocoder is to encode only the perceptually important aspects of speech and thereby reduce the bit rate significantly.

• As a result, the reproduced voice sounds synthetic and unnatural.

• Main applications include recorded message announcements, encrypted voice transmission, voice mail, etc.

(22)

Vocoders

• Human speech is generated in two basic ways:

  • Voiced sounds are generated as a result of vibrations in the vocal cords.

  • Unvoiced sounds are formed by expelling air through the lips and teeth (as in the pronunciation of s, p, t and f).

• Human speech can therefore be modeled as a sequence of voiced and unvoiced sounds passed through a filter that represents the effect of the mouth, throat, etc. on the generated sounds.

(23)

Vocoders

• There are three basic types of vocoders:

  • Channel vocoders

  • Formant vocoders

  • Linear predictive coders

• The speech spectrum exhibits sound-specific structures, with energy peaks at some frequencies and energy valleys at others over short periods. Channel vocoders attempt to determine these short-term signal spectra as a function of time and take advantage of them.

• In addition, they determine the nature of the speech excitation and the pitch intervals. The excitation information is used at the receiver end to synthesize speech by switching in the appropriate signal source for the required duration. The filter at the receiver implements a vocal tract transfer function.