HARDWARE IMPLEMENTATION OF THE PCM CODEC FOR VOIP TELEPHONY

(1)

HARDWARE IMPLEMENTATION OF

THE PCM CODEC FOR VOIP

TELEPHONY

LUCIEN NGALAMOU

School of Engineering, Grand Valley State University 301 W. Fulton Street, Grand Rapids, MI 49508, USA

MARCUS GEORGE

Dept. of Electrical and Computer Engineering, University of the West Indies St. Augustine, Trinidad and Tobago

LEARY MYERS

Department of Physics and Engineering, University of the West Indies Mona Campus, Kingston 7, Jamaica

Abstract:

This paper presents the hardware implementation of the pulse code modulation coder and decoder (PCM Codec), that aims at improving the delay of the speech compression of this codec, which is originally implemented in software for Voice-over Internet Protocol (VoIP) Telephony. This design provides a solution known as intellectual property (IP) core ready to be deployed in a reconfigurable architecture with other voice codecs. Such architecture can then be used to improve the Quality-of-Service (QoS) of VoIP Telephony.

Keywords: PCM; Delay; Hardware Implementation; IP Core; VoIP Telephony.

1. Introduction

(2)

The quality of service refers to a measure or guarantee of performance. It is generally accepted that IP data networks are “best effort” networks and therefore transmitted data is not guaranteed to arrive at its destination. Also the time for different packets to arrive is variable. For conventional data, this is not a problem since the delay involved is not important and protocols such as TCP can be used in order to ensure that all data is properly received. But for real-time services such as VoIP, these events will affect system performance.

Quality-of-Service (QoS) factors [Ribadeneira (2007)] which are typically measured for VoIP traffic include delay, jitter and packet loss. Delay is defined as the length of time required to move a packet/signal from source to destination. Jitter is defined as the variation in the sequence of packets received on the receiver side relative to the sequence of packets sent from the sender side. Packet loss is defined as the percentage of packets transmitted from the sender which are not delivered to the receiver. Jitter and packet loss depend entirely on the status of the network. Delay however has some dependence on components of the IP network.

Figure 2 [Arora (1999)] illustrates various types of delays existing on a VoIP network. Algorithmic delay is introduced by the speech compression algorithm and is dependent on the coding algorithm. Packetization delay is the time over which samples are accumulated before they are packeted. Serialization delay is the time required to initially transmit the IP packet and also includes delays incurred whenever a packet passes through a store-and-forward device (such as a router or a switch).

Propagation delay is the time required for the electrical or optical signal to travel along a transmission medium and is a function of the geographic distance. Component delay is caused by the various components within the transmission system. For example, a segment passing through a router has to move from the input port to the output port through the backplane. The focus of this work is on quantifying and potentially reducing algorithmic delay. Processing delay is the time required to process the header of a packet received from the IP network after transmission. Finally, playback buffer delay is the time required to reproduce speech samples from received packets.

Figure 2: Various delays associated with a VoIP system [Arora (1999)]

1.1. Speech Compression

A speech signal is a sound formed by humans in order to communicate. Like all other sound, it is a perturbation of the atmospheric pressure that travels in the form of a sound wave [Juan (1995)]. To transport speech over communication networks, the speech signal must be transformed from a sound wave to an electrical signal. This is usually done with a microphone and the result is an electrical signal where the voltage varies with the pressure of the speech signal.

(3)

vocal chords allow the passage of air into the vocal tract. They can be either opened or closed. When closed (voiced signals) they create a periodic oscillation with a fundamental frequency, known as the pitch. On the other hand when open (unvoiced signals) the pressure signal is not modulated.

1.2. Pulse Code Modulation (PCM)

The PCM algorithm is a waveform based speech compression algorithm. The international standard for PCM is ITU-T G.711 [ITU (1972)]. The G.711 algorithm PCM uses logarithmic scalar quantization and represents each speech sample with 8 bits. Logarithmic scalar quantization places most of the quantization steps at lower amplitudes by using the non-linear function logarithm function. The PCM algorithm converts speech in 16-bit uniform PCM format to 8-bit μ-law or A-law PCM format.

The μ-law/A-law PCM byte comprises of three components: sign bit, segment and interval. The sign of the input PCM sample is represented by bit 7 (MSB) of the uniform PCM byte, the segment to which the input PCM sample belongs to is represented by bits 4 to 6, and the position within the segment that the input PCM sample resides, known as the interval value is represented by the lower nibble. PCM is widely used due to its simplicity. This results in low algorithmic delay and high decoded speech quality but also a relatively high bit rate.

1.3. Objectives and Paper organization

The aim of the work presented in this paper is to demonstrate that PCM speech compression algorithm can be implemented in hardware with lower algorithmic delay than software implementations. To achieve this aim the following objectives were pursued:

a) Determine the latency (algorithmic delay) of software-implemented PCM speech compression algorithm. b) Produce hardware implementations of the PCM speech compression algorithm.

c) Determine the latency for the hardware-implemented PCM speech compression algorithms.

d) Compare the latency of both hardware and software implementations of the PCM speech compression algorithm.

The remainder of the paper is organized in three sections. Section 2 presents the profiling of the PCM’s software implementation, section 3 deals with the IP core design of the PCM, and section 4 concludes the paper.

2. Profiling the Software Implementation of PCM

Profilers are tools which monitor the time each instruction takes to execute in a given program and hence the total time needed for execution, i.e. the frequency with which functions call other functions. Profiling on Linux operating systems can be performed by compiling code with the GNU compiler so that profiling data is recorded during execution [Fenlason (1992)]. GProf [Gprof (2010)] is then run to analyze the profiled data.

Execution times displayed in the flat profile are estimates because they include overhead from other tasks running on the operation system. The execution time in nanoseconds displayed in the flat profile can be converted to machine cycles e.g. A desktop computer with 2.4GHz Intel(R) Pentium Duo Core E2220 processor is capable of executing at a clock speed of 2.4×109_{cycles per second. The processor speed specified corresponds to the speed at}

which the processor operates. A dual core processor with a clock speed of 2.4GHz has two processors, each operating with a clock speed of 2.4GHz. An execution time can be converted to cycles by computing the product of the execution time in seconds and the number of cycles executed per second on the processor eg. 27ns = (27×10-6_{) ×}

(2.4×109_{) = 64.8 cycles.}

Gprof gives execution times in seconds, with a resolution of 0.1 second. Therefore, to determine the time required toexecute a short routine, the routine must be invoked by a program multiple times, after which the average execution time can be computed.

(4)

Sun Microsystems

Name of function end-to-end execution

time / ns cycles

PCM Encoder 33 (33×10-6_{) × (2.4×10}9_{) = 79200}

PCM Decoder 36 (36×10-6_{) × (2.4×10}9_{) = 86400}

Table 1: Latency of software implemented PCM speech compression algorithm from Sun Microsystems

3. Hardware Implementation of the PCM Codec

3.1. PCM Speech Compression Algorithm

The PCM encoder converts data in 16-bit two’s complement format to 8-bit μ-law/A-law PCM format using logarithmic scalar quantization. The PCM decoder converts the μ-law/A-law PCM byte received to 16-bit two’s complement format. The μ-law/A-law PCM byte (Figure 3) comprises of three components: sign bit, segment and interval. The sign of the input sample is represented by bit 7 (MSB) of the μ-law/A-law PCM byte, the segment to which the input sample belongs to is represented by bits 4 to 6, and the position within the segment that the input sample resides, known as the interval value is represented by the lower nibble.

Figure 3: Structure of a μ-law/A-law PCM byte

3.2. PCM Encoder Design

The PCM encoder (Figure 4) converts data in 14/13-bit uniform PCM format to 8-bit μ-law/A-law PCM format, and consists of three units. The input PCM format conversion unit converts speech in 16-bit two’s complement format to 14-bit two’s complement format (14-bit uniform PCM). The component extractor unit along with the XOR operation convert data in 14-bit two’s complement format to 8-bit μ-law/A-law PCM format. Among these three units, only the operation of the component extractor must be initiated and controlled. However a controller unit was not included because the latency of the input PCM format conversion and XOR units were very short and their operation does not need to be initiated or controlled. It can therefore be concluded that the exclusion of a controller unit would not have adversely affected the operation of the PCM encoder.

Component Extractor

start

pcm_data

finish clk reset

RESET

8

XOR Unit

law

μ-law/A-law_PCM

8

8-bit μ-law/A-law PCM

14

Input PCM Format Conversion

data

law

16

16bit_TC _law

16bit TC data 14/13bit_uniform_PCM

pcm_data

Figure 4: Datapath block diagram of the PCM Encoder

(5)

component extractor responsible for the computation of the interval was designed by applying the hardware design technique outlined in Equation 1.

2

1111

_

/

_

int







boundary

lower

boundary

upper

boundary

lower

law

A

law

erval



Equation 1: Interval computation for μ-law/A-law Component Extraction process

An integer multiplier and divider were used instead of floating point units in the computation of the interval because the quantized version of the interval has to be stored in the lower nibble of the μ-law/A-law PCM byte. The integer multiplier was placed before the divider to ensure that output values less than 1 are not obtained, thereby eliminating the need for floating point units to ensure correct computation of the interval. Another reason for choosing integer units is that they require less hardware than floating point units.

(6)

(7)

3.3. PCM Decoder Design

The PCM decoder, consists of three units, (Figure 6) and converts speech in 8-bit μ-law/A-law PCM format to 16-bit uniform PCM format. . The units are: theinverse-component extractor unit along with the XOR operation convert the μ-law/A-law PCM byte to 14-bit two’s complement format, and theoutput PCM format conversion unit converts speech in 14-bit two’s complement format to 16-bit two’s complement format. A controller unit was not included for the same reason given for its exclusion in the PCM encoder.

Figure 6: Datapath block diagram of the PCM Decoder

The inverse-component extractor (Figure 7) first extracts the sign bit (MSB) installed it as the sign bit of the 14-bit two’s complement sample. The segment and interval were used in computation of the least significant 13-bit of the two’s complement output. This section was designed by applying the hardware design technique outlined in Equation 2.

boundary

lower

boundary

upper

B

boundary

lower

A

B

value

erval

A

law

A

law

_

]

)

10000

(

_

int

[

_

/

_

2













Equation 2: μ-law/A-law value computation in the μ-law/A-law Computation process

(8)

Se g m e n t Boundary LUT S ubtract o r (i nteger) M u lti p li er (i nteger) Divi der (int eger) seg m en t up per _bo unda ry low er_ bou nda ry A B di ffe renc e Bi n Bou t A B sta rt star t pr odu ct qu ot ie n t fin ish fini sh clk clk res et rese t 3 0 RE SE T 13 13 RE SET 26 star t_d iv A B st ar t_ m u lt si gn bi t 8 in ter val _val ue 13 26 Ad d e r (i nteger) 13 A B 13 su m 0 Co ut Ci n low er _bou nda ry si gn 14 13 4 13 law 00 0000 000 (0) 9 13 00 0000 0000 0000 0000 0000 1111 (15 ) Co ntr o ll er U n it sta rt_ m u lt m ult_d one cl k rese t div_ don e st art_ div to ‘ sta rt’ o f d iv id er to ‘ sta rt’ o f m u lt ip lie r RE SET sta rt done fr om ‘f in is h ’ of d ivi d er from ‘f inish ’ of m ultipl ier pc m _d ata 14 -b it uni fo rm PC M la w

Figure 7: Datapath block diagram of the Inverse Component Extraction unit

3.4. Hardware Implementation of the PCM Speech Compression Algorithm

(9)

Figure 8. Interface View of the G7.11 Codec generated as a RTL Module

After implementing the PCM speech compression algorithms, they were synthesized in the Xilinx 11.1 ISE environment in preparation for verification and validation (figure 9).

Device Utilization Summary (estimated values) [-]

Logic Utilization Used Available Utilization

Number of Slices 4515 7680 58%

Number of Slice Flip Flops 7126 15360 46%

Number of 4 input LUTs 4737 15360 30%

Number of bonded IOBs 50 173 28%

Number of MULT18X18s 16 24 66%

Number of GCLKs 8 8 100%

Figure 9: Details of the resource utilization for the PCM Hardware Implementation. The timing summary is given below.

Timing Summary: --- Speed Grade: -5

Minimum period: 29.005ns (Maximum Frequency: 34.477MHz) Minimum input arrival time before clock: 23.434ns

(10)

3.5. Verification of the hardware implemented PCM speech compression algorithm

Test cases for the verification tests consisted of input samples belonging to each of the following categories: small, intermediate and large values along withpositive and negative values. The expected outputs for the PCM encoder were obtained from [ITU (1972)]. The actual outputs of the hardware-implemented PCM encoder were obtained through simulation using Modelsim XE 6.4b using the chosen test cases. The actual outputs were compared with the expected outputs to verify that the hardware-implemented PCM encoder correctly compressed the input test sequences.

The expected outputs of the implemented PCM encoder were used as the inputs to the hardware-implemented PCM decoder. The expected outputs for the PCM decoder were obtained from [ITU (1972)]. The actual outputs of the hardware-implemented PCM decoder were obtained through simulation using Modelsim XE 6.4b. The actual outputs of the hardware-implemented PCM decoder were compared to the input samples to the hardware-implemented PCM encoder to verify that the hardware-implemented PCM decoder correctly decompressed selected inputs.

The verification test results (Table 2 and Table 3) indicate that the actual outputs of the PCM encoder and decoder in μ-law and A-law modes corresponded to the expected outputs from [ITU (1972)]. This demonstrated that hardware-implemented PCM encoder and decoder correctly compressed and decompressed the input samples respectively.

u-law PCM Encoder u-law PCM Decoder

Input Sample

Expected Output

Actual Output

Input Sample

Expected Output

Actual Output

14 11111111 11111111 11111111 14 14

198 11010111 11010111 11010111 198 198

-198 01010111 01010111 01010111 -198 -198

710 10111111 10111111 10111111 710 710

2246 10011111 10011111 10011111 2246 2246 -2246 00011111 00011111 00011111 -2246 -2246 4294 10001110 10001110 10001110 4294 4294 7923 10000000 10000000 10000000 7923 7923

Table 2: Verification tests results for PCM Encoder and Decoder in u-law mode

A-law PCM Encoder A-law PCM Decoder

Input

Sample Expected Output Output Actual Sample Input Expected Output Output Actual

14 11010101 11010101 11010101 14 14

198 11111101 11111101 11111101 198 198 -198 01111101 01111101 01111101 -198 -198 710 10010101 10010101 10010101 710 710 2246 10110111 10110111 10110111 2246 2246 -2246 00110111 00110111 00110111 -2246 -2246 3270 10111101 10111101 10111101 3270 3270

Table 3: Verification tests results for PCM Encoder and Decoder in A-law mode

3.6. Validation of the hardware implemented PCM speech compression algorithm

(11)

encoder was 1760 times shorter than the latency of software PCM encoder, while the latency of the hardware PCM decoder was 1878 times shorter than the latency of the software PCM decoder.

PCM Functional Module Average End-to End Execution Time / Cycles Hardware Implementation

(This paper) Software Implementation (Sun Microsystem)

PCM Encoder 45 79,200

PCM Decoder 46 86,400

Table 4: Performance Comparison of Hardware and Software Implementations of PCM

4. Conclusion

The results presented in this paper clearly show that the actual outputs of the PCM encoder and decoder in both μ-law and A-law modes correspond to the expected outputs from (ITU 1972). This is a clear demonstration that hardware-implemented PCM encoder and decoder can correctly compress and decompress their input samples respectively and hence can be considered equivalent implementations of the PCM encoder and decoder. The validation results indicates that the latency (algorithmic delay) of both the hardware-implemented PCM encoder and decoder in both μ-law and A-law modes were 1760 and 1878 times shorter than the latency of their software counter-parts from Sun Microsystem respectively. This indicates that the algorithmic delay of the PCM speech compression algorithm which has been implemented in software can be significantly reduced by implementation in hardware.

Success in analyzing and implementing these speech compression algorithms in hardware was achieved As the PCM encoder and decoder correctly compresses and decompresses their input sequences respectively.

The main advantage of using hardware is speed as these algorithms use a great deal of processing power for computation. Software has the disadvantage of being slower but however has the benefit of flexibility. This implementation is customized to be used as an IP core in a upcoming project relating the application the application of reconfigurable architectures for efficient voice and video deployment over internet protocol.

References

[1] Arora, Rakesh. 1999. Voice over IP: Protocols and Standards. June 23. http://www.cs.wustl.edu/~jain/cis788-99/ftp/voip_protocols.pdf

(accessed April 04 2008)

[2] Bishop, David. 2008. Parameterized Fixed-Point Package. February 27. http://www.vhdl.org/fphdl/vhdl.html (accessed July 17 2009) [3] Grof, Gprof Profiler, http://sourceware.org/binutils/docs/gprof/index.html (accessed October 9, 2010)

[4] Halsall, Fred. 1998. Data Communications, Computer Networks and Open Systems. 4th. ed. Wales: Addison-Wesley Publishing Company.

[5] International Telecommunication Union (ITU). 1972. Pulse Code Modulation (PCM) of Voice Frequencies. G.711. [6] Juan, G. R. 1995. Introduction to the Physics and Psycholophysics of Music. 3rd ed. New York: Springer Verlag.

[7] Montminy, Christian. 2000. A Study of Speech Compression Algorithms for Voice Over IP. Msc Thesis. Ottawa-Carleton Institute for Electrical and Computer Engineering, University of Ottawa, Canada.

[8] Perry, Douglas. 1998. VHDL. 3rd ed. New York: McGraw-Hill.