FPGA Implementation of Hearing Aids using Stationary Wavelet-Packets for Denoising
Nivin Ghamry
Department of Information Technology, Faculty of Computer and Information,
Cairo, Egypt
ABSTRACT
Speech signals are often contaminated with acoustic noise, which is present in a variety of listening environments. This problem is of critical importance because background noise is particularly damaging to speech intelligibility for people with hearing loss and for hearing aid users. A pre-processing step in modern digital hearing aids is denoising, which estimates the signal in the band of interest from the available noisy signal without altering it. Advances in digital signal processing have allowed adding vital features, such as speech enhancement and noise reduction, to improve the performance of assistive listening devices. Furthermore, progress in field programmable gate array (FPGA) technology, especially for miniaturized system applications, has facilitated the implementation of small, portable and high-performance devices. In this paper a signal processing core based on stationary wavelet packets is designed for speech enhancement and denoising in hearing aids. The developed core is implemented on an FPGA and interconnected with an embedded microprocessor and related peripherals to form a complete system on a programmable chip. The embedded software processor is the MicroBlaze soft-core processor. The chip used is the Digilent XUP Virtex-II Pro system FPGA, whose board contains the LM4550 audio codec. The features of the proposed design are intelligibility, small area and low cost.
Keywords
Denoising, hearing aids, FPGA, MicroBlaze, LM4550, stationary wavelet transform, wavelet packets.
1. INTRODUCTION
Digital hearing aids score over their analog counterparts because they use advanced digital signal processing (DSP) algorithms to compensate the speech signal and improve intelligibility for the hearing impaired in noisy environments [6]. The general problem of noise reduction has been addressed in great depth by researchers who have investigated several speech enhancement algorithms and techniques for various noisy environments to reduce the effects of background noise on speech intelligibility and overall sound quality [22-23, 7]. Speech enhancement methods include spectral subtraction, mean-square estimation, formant-based methods, the discrete Fourier transform (DFT) [18] and the discrete wavelet transform (DWT). Wavelet-based approaches have become a popular alternative to the Fourier transform in the field of digital signal processing due to their multiresolution capability [4, 12]. The application of wavelet-transform-based methods in the design of hearing aids is found in the early literature.
The rest of the paper is organized as follows: in section 2 noise reduction in hearing aids is reviewed, in section 3 the principles of the SWT and WPT are outlined, in section 4 the proposed SWPT method for speech enhancement and noise reduction is presented, and in section 5 the software implementation of the SWPT core on the MicroBlaze, its hardware implementation on the FPGA and its integration as an embedded system are explained. Finally, conclusions are given in section 6.
2. NOISE REDUCTION IN HEARING AIDS
Noise is any unwanted signal that interferes with a speech signal in the band of interest. Specialists have distinguished three types of noise that are particularly damaging to speech intelligibility [5]:
1. Random noise with an intensity-frequency spectrum similar to that of speech.
2. Interfering voices. The interference produced by many other voices of roughly equal intensity (known as speech babble) has physical characteristics similar to those of random noise with a speech-shaped intensity-frequency spectrum.
3. Substantial room reverberation. Reverberation is produced by sound being reflected off walls, floors, ceilings and other hard surfaces.
The peripheral auditory system analyzes sound by means of a bank of overlapping, asymmetric, narrowband filters similar to a bank of 1/3-octave-band filters [1]. These filters are known as the critical bands of hearing. Because the filters are asymmetric, a critical band centered on a higher frequency also picks up low-frequency sound. Thus, noise in a critical band masks signals in that critical band and signals in higher-frequency bands. This effect, known as the upward spread of masking, is relatively small at low noise levels but increases with increasing noise level. Most hearing-aid users have sensorineural hearing loss, characterized by an elevation of the hearing threshold which increases with increasing frequency. What makes the noise problem difficult is that the amplification provided for this type of hearing loss must fit within a reduced dynamic range of hearing and a lowered threshold of loudness discomfort. Several approaches are applied to the problem of noise reduction in hearing aids. Some examples are given here for illustration [5]:
1. The application of fixed filters for time-invariant noise which differs in spectral shape from that of speech.
2. The application of adaptive filters (or frequency-dependent amplitude compression) for noise with time-varying spectra similar to that of speech.
3. The reduction of spread-of-masking effects of the high sound levels resulting from high-gain amplification. The filter used must attenuate only those frequency bands in which the noise exceeds the speech.
4. Spatial filtering by which the spatial differences between speech and noise are used to improve speech intelligibility using directional microphones or microphone arrays.
3. THE PRINCIPLES OF SWT AND WPT
The constant relative frequency resolution of the wavelet analysis is known as the constant-Q property, where Q is the quality factor of a filter, defined as its center frequency divided by its bandwidth [15]. The constant-Q property is one of the main reasons wavelet analysis suits auditory signal processing: the human ear is designed to perceive sound on a logarithmic frequency and amplitude scale. Wavelet decomposition results in a logarithmic set of bandwidths, which is very similar to the response of the human ear to frequencies. The resulting wavelet coefficients can be manipulated to achieve speech compression and denoising. Around a frequency f_m there is masking according to a masking threshold T(f_m). For audio compression, the frequency allocation of the DWT approximates the critical bands of the human ear and a masking curve is computed. The neighboring frequencies with magnitudes below T(f_m) are removed in order to compress the signal, while frequencies with magnitudes larger than the masking threshold are kept for reconstructing the signal. In this section a brief overview of the theory of the SWT and WPT is given.
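As a concrete illustration of this masking rule, the following C++ sketch (illustrative only, not the paper's implementation) discards coefficients whose magnitude falls below the masking threshold of their band; it assumes the per-band thresholds T(f_m) have already been computed and aligned with the coefficients.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Keep only coefficients that exceed the masking threshold of their band;
// everything below T(f_m) is inaudible and can be dropped for compression.
std::vector<double> maskCompress(const std::vector<double>& coeffs,
                                 const std::vector<double>& threshold) {
    std::vector<double> out(coeffs.size(), 0.0);
    for (std::size_t i = 0; i < coeffs.size(); ++i) {
        if (std::fabs(coeffs[i]) >= threshold[i]) out[i] = coeffs[i];
    }
    return out;
}
```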
3.1 Stationary Wavelet Transform
The Stationary Wavelet Transform (SWT) was first introduced in [3]. The main feature of the SWT is that it is a time-invariant transform. Shift-invariance is important in many applications such as change detection, denoising and pattern recognition. The SWT algorithm is close to the DWT, but the downsampling operation after filter convolution is suppressed. The decomposition obtained is then a redundant representation of the signal. The benefit of this redundant representation over the memory-efficient decimated DWT is the reduction of artifacts at discontinuities and irregularities in reconstructed signals; such artifacts are caused by unpredictable changes in coefficients under different time shifts. Figure 1(a) shows the general decomposition step and Figure 1(b) shows the stationary wavelet filter calculations. CA_j and CD_j represent the approximation and detail coefficients of level j, respectively, and H_j and G_j represent the low- and high-pass decomposition filters at the same level. There is no unique Inverse Stationary Wavelet Transform (ISWT); therefore the inverses obtained for every non-decimated DWT are averaged. This can be done recursively, starting from level j down to level 1. In this way the main features of the analyzed signal are captured.
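To make the non-decimated filtering step concrete, here is a minimal C++ sketch (not the paper's VHDL core) of one SWT level using circular convolution; h and g stand for the level-j filters H_j and G_j, which for deeper levels are obtained by upsampling the base filters, i.e., inserting zeros between the taps.

```cpp
#include <cstddef>
#include <vector>

struct SwtLevel {
    std::vector<double> ca;  // approximation coefficients CA_j
    std::vector<double> cd;  // detail coefficients CD_j
};

// Insert (factor - 1) zeros between taps: the filter upsampling used to
// build H_j and G_j for deeper decomposition levels.
std::vector<double> upsample(const std::vector<double>& h, std::size_t factor) {
    std::vector<double> out(h.size() * factor, 0.0);
    for (std::size_t k = 0; k < h.size(); ++k) out[k * factor] = h[k];
    return out;
}

// One non-decimated decomposition level: circular convolution with the
// level-j filters; both outputs keep the full input length (no downsampling).
SwtLevel swtStep(const std::vector<double>& x,
                 const std::vector<double>& h,   // low-pass filter H_j
                 const std::vector<double>& g) { // high-pass filter G_j (same length as h)
    const std::size_t n = x.size();
    SwtLevel out{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < h.size(); ++k) {
            const double xi = x[(i + n - (k % n)) % n];  // x[i - k], circular
            out.ca[i] += h[k] * xi;
            out.cd[i] += g[k] * xi;
        }
    }
    return out;
}
```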
3.2 Wavelet Packet Transform
Fig. 1: (a) The general decomposition step and (b) the stationary wavelet filter calculations.

The Wavelet Packet Transform (WPT) is a generalization of wavelet decomposition that offers a richer frequency resolution for signal analysis. The WPT was first introduced by Coifman et al. in [16] for dealing with the non-stationarity of data in a more flexible manner than the DWT does. In WPT analysis, the details as well as the approximations are split, giving a wavelet packet decomposition tree. Figure 2 shows the structure of a typical wavelet packet decomposition tree. The complete WPT tree is not necessary for perfect reconstruction of the analyzed signal; usually a subtree is chosen based on a data-driven cost function. Classical entropy-based criteria are used to select the most suitable decomposition of a given signal, and this selection is called the best tree structure. Each node of the decomposition tree is examined to quantify the information gained by performing each split: in general, a node N is split into two nodes N1 and N2 if the sum of the entropies of N1 and N2 is lower than the entropy of N. This is a local criterion based only on the information available at node N. Several entropy-type criteria can be used to obtain the optimal decomposition; commonly used entropy types in the field of signal processing are the Shannon entropy, the logarithm of the energy entropy and the threshold entropy [17]. Denoising and compression of signals are interesting applications of the WPT. A sketch of the split decision is given after Figure 2.
Fig. 2: Wavelet Packet Decomposition Tree
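The split decision described above can be transcribed directly into C++ (a sketch only; the function names are illustrative). Any entropy-type cost functional can be plugged in, for example the Shannon entropy defined later in Eq. (2).

```cpp
#include <functional>
#include <vector>

// Cost functional over a node's coefficients (e.g., the Shannon entropy of
// Eq. (2) in Section 4.1).
using CostFn = std::function<double(const std::vector<double>&)>;

// Local best-tree rule: split node N into N1 and N2 only when the children
// together carry less entropy than the parent.
bool shouldSplit(const std::vector<double>& parent,
                 const std::vector<double>& child1,
                 const std::vector<double>& child2,
                 const CostFn& entropy) {
    return entropy(child1) + entropy(child2) < entropy(parent);
}
```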
4. THE PROPOSED SWPT METHOD FOR SPEECH ENHANCEMENT AND NOISE REDUCTION
4.1 Compression and Noise Reduction
Denoising of signals using the wavelet transform was first developed by Donoho and Johnstone [2]. Wavelet shrinkage is based on thresholding after decomposing the signal to a certain level; thresholding generally gives a low-pass version of the original signal. There are two types of thresholding, hard and soft. Soft thresholding involves first setting to zero the elements whose absolute values are lower than the threshold, and then shrinking the nonzero coefficients towards zero. The proposed denoising method exploits the features of wavelet packet analysis in the stationary wavelet basis domain for decomposition. To get the optimum representation of the decomposed signal, the best-tree selection algorithm is applied to the non-decimated stationary wavelet coefficients. Denoising of the coefficients is achieved via soft thresholding with the threshold given by Eq. (1) [2]:
$$T_j = \sigma_j \sqrt{2 \ln N} \qquad (1)$$

where N is the length of the input signal and σ_j is the standard deviation of the noise at scale j. T_j is the threshold calculated on a level-dependent basis, rather than by a global calculation in which one threshold is used for all coefficients. Here, different thresholds exist for the different multiresolution levels. The local thresholding criterion increases the capability of the denoising strategy in handling the variance of the noise and improves on the limits of the classical denoising strategies. The local thresholding is based on the work of M. Lavielle on the detection of change points using dynamic programming [11]. In this work the db4 stationary wavelets are chosen for the decomposition of a speech signal corrupted by additive white noise. The power of white noise is mainly concentrated in the first-level detail coefficients, and the noise level decreases as the scale j increases.
The following steps summarize the proposed method:
1. The SWT is performed to obtain non-decimated wavelet coefficients.
2. Successive splitting of the coefficients and upsampling of the decomposition filters is repeated until the best tree structure at a predefined decomposition level is obtained.
3. For each wavelet decomposition level (except for the approximation coefficients), a level dependent soft threshold is determined.
4. Soft thresholding is applied on the wavelet coefficients to suppress the noise (a sketch implementing steps 3 and 4 follows this list).
5. The clean signal is reconstructed from the denoised approximations and details according to the ISWT algorithm.
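The following C++ sketch implements steps 3 and 4: the level-dependent threshold of Eq. (1) and soft thresholding of the detail coefficients. The noise level σ_j is estimated here with the common MAD rule median(|CD_j|)/0.6745; this estimator is an assumption, as the paper does not state how σ_j is obtained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Level-dependent threshold T_j = sigma_j * sqrt(2 ln N) of Eq. (1);
// sigma_j is estimated via the MAD rule median(|CD_j|)/0.6745 (assumed).
double levelThreshold(std::vector<double> cd, std::size_t n) {
    for (double& c : cd) c = std::fabs(c);
    std::nth_element(cd.begin(), cd.begin() + cd.size() / 2, cd.end());
    const double sigma = cd[cd.size() / 2] / 0.6745;
    return sigma * std::sqrt(2.0 * std::log(static_cast<double>(n)));
}

// Soft thresholding: zero coefficients below T_j, shrink the rest toward zero.
void softThreshold(std::vector<double>& cd, double t) {
    for (double& c : cd) {
        const double m = std::fabs(c) - t;
        c = (m > 0.0) ? std::copysign(m, c) : 0.0;
    }
}
```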
The ISWT averages recursively the inverses obtained for every non-decimated DWT. For the WPT analysis, the Shannon entropy is used to compute the best tree after decomposition. Eq. (2) for the Shannon entropy E involves the logarithm of the squared value of each signal sample s_i [17]:

$$E = -\sum_i s_i^2 \log(s_i^2) \qquad (2)$$

Fig. 3: (a) The original noisy signal S, (b) denoised signal with SWT, (c) denoised signal with WPT and (d) denoised signal with SWPT.
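A direct C++ transcription of Eq. (2), usable as the cost function in the split test sketched in Section 3.2; terms with s_i = 0 are skipped, following the usual 0·log(0) = 0 convention.

```cpp
#include <cmath>
#include <vector>

// Shannon entropy E = -sum_i s_i^2 * log(s_i^2) of Eq. (2); zero samples
// contribute nothing (0 * log(0) = 0 convention).
double shannonEntropy(const std::vector<double>& s) {
    double e = 0.0;
    for (double v : s) {
        const double p = v * v;
        if (p > 0.0) e -= p * std::log(p);
    }
    return e;
}
```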
4.2 Speech Enhancement
Figure 4 shows the audiogram of a hearing-impaired listener who suffers from mild hearing loss at low frequencies and severe hearing loss starting at 1000 Hz. In a wavelet decomposition, each wavelet coefficient corresponds to a band of frequencies, so WPT-based speech enhancement is achieved by frequency-dependent amplification. The amplification gain is calculated separately for each wavelet coefficient based on its intensity level according to the time-varying frequency-dependent (TVFD) gain calculation [8]. The gain is calculated such that the ratio of the log intensity above the hearing threshold to the dynamic range is the same for the hearing-impaired listener as the corresponding ratio for the normal-hearing listener. This is given mathematically by Eq. (3):

$$\frac{\delta^*}{\Delta^*} = \frac{\delta}{\Delta} \qquad (3)$$

where δ is the distance (in dB) between the unamplified wavelet coefficient and the normal hearing threshold, δ* is the corresponding distance between the amplified coefficient and the impaired threshold, Δ is the dynamic range of a normal-hearing person (in dB) and Δ* is the dynamic range of a hearing-impaired person (in dB). From this relationship, the compression gain for a coefficient can be derived using logarithmic representations in Eq. (4):
$$C^* = T_{imp} + \frac{\Delta^*}{\Delta}\,(C - T_{nor}) \qquad (4)$$

where T_imp is the threshold of hearing of a hearing-impaired person (in dB), T_nor is the threshold of hearing of a normal-hearing person (in dB), and C* and C are the compensated and non-compensated wavelet coefficients (in dB), respectively; Eq. (4) follows from Eq. (3) with δ = C − T_nor and δ* = C* − T_imp. For sounds of small sound pressure level (SPL) the gain is adjusted to overcome the loss of sensitivity along the whole frequency range according to the different levels. The value of the gain is raised at 1000 Hz to smooth the transition from mild to severe hearing loss. For sounds of larger SPL, the gain is adjusted at high frequencies according to the loudness growth function to avoid discomfort caused by the reduced dynamic range of the listener. To achieve this, a SWPT decomposition tree with more terminal nodes in the frequency range 1000-4000 Hz is employed. Best results were achieved using three levels of decomposition; the best-tree searching algorithm yielded a decomposition subtree with seven nodes. The speech signal is reconstructed from the modified coefficients after applying the amplification gain.
Fig. 4: Precipitous high-frequency hearing loss.
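The following C++ sketch illustrates Eqs. (3)-(4) for a single coefficient expressed in dB; the per-band thresholds and dynamic ranges are assumed inputs taken from an audiogram such as Figure 4, and the function name is hypothetical.

```cpp
#include <algorithm>

// Map a coefficient's distance above the normal hearing threshold into the
// impaired listener's reduced dynamic range (Eqs. (3)-(4)); all values in dB.
double compensateDb(double c,        // unamplified coefficient intensity C
                    double tNor,     // normal hearing threshold T_nor
                    double tImp,     // impaired hearing threshold T_imp
                    double rangeNor, // normal dynamic range Delta
                    double rangeImp) // impaired dynamic range Delta*
{
    const double delta = std::max(0.0, c - tNor);  // delta = C - T_nor
    return tImp + (rangeImp / rangeNor) * delta;   // C* per Eq. (4)
}
```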
4.3 The SWPT Hardware Architecture
Fig. 5: The architecture of the wavelet packet transform.
Figure 5 shows the hardware implementation schematically. The structure follows the regular filter-bank structure with the decimation blocks between successive stages suppressed. Storage units and multiplexers are the decoupling elements between the filter modules. A control unit determines the proper select signals for the multiplexers based on the entropy criterion mentioned before; accordingly, the coefficients are either split for further decomposition or passed on as final outputs. The db4 filter coefficients are floating-point numbers; in order to simplify the digital design, they are scaled up and truncated into integers.
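As an illustration of this scaling step, the standard db4 low-pass decomposition taps can be converted to 16-bit integers as in the sketch below; the Q15 format (scaling by 2^15) is an assumption, since the paper only states that the coefficients are scaled up and truncated.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Standard db4 low-pass decomposition taps (8 coefficients, rounded).
constexpr std::array<double, 8> DB4_LO = {
    -0.010597, 0.032883, 0.030841, -0.187035,
    -0.027984, 0.630881, 0.714847, 0.230378};

// Scale each tap by 2^15 and round to the nearest 16-bit integer (Q15).
std::array<int16_t, 8> toQ15(const std::array<double, 8>& h) {
    std::array<int16_t, 8> q{};
    for (std::size_t i = 0; i < h.size(); ++i)
        q[i] = static_cast<int16_t>(std::lround(h[i] * 32768.0));
    return q;
}
```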
5. FPGA IMPLEMENTATION
Figure 6 shows the building blocks of a digital hearing aid: a microphone, preamplifier, anti-aliasing filter, analog-to-digital converter (ADC), a signal-processing core, digital-to-analog converter (DAC), post-amplifier and a speaker.
Fig. 6: Block diagram of a programmable digital hearing aid.
The proposed hearing aid device is implemented on the Digilent XUP Virtex-II Pro system. The FPGA on the board is a Xilinx XC2VP30 with 30,816 logic cells, 136 18x18-bit multipliers and 2,448 Kb of BRAM. The board contains the LM4550 audio codec chip, which was specially designed to serve as an interface between traditional analog audio components (e.g., headphones and microphones) and the FPGA and to provide a high-quality audio path. The LM4550 uses 18-bit sigma-delta ADCs and DACs and provides 90 dB of dynamic range and a conversion rate of at least 16 kHz. More about the LM4550 can be found in [26]. In the following, the software and hardware implementations are described.
5.1 Software Implementation
The software implementation of the proposed system is carried out using the soft processor MicroBlaze [26]. MicroBlaze is a 32-bit RISC Harvard-architecture soft processor with an instruction set optimized for embedded applications. This virtual microprocessor is built by combining blocks of code called cores inside the FPGA. Xilinx Platform Studio (XPS) is used to configure and build the specification of the embedded system. The aim here is to create a DSP module (the SWPT core) to increase system intelligibility and then integrate it into the processor system by adding it as a peripheral module. A software program named SWPT Driver runs on the MicroBlaze to drive the module; it implements the communication between the decomposition stages in the SWPT core and the control signals of the core. Figure 7 shows the block diagram of the software implementation of the proposed design. The proposed system consists of the software and hardware IPs listed here:
1. MicroBlaze processor;
2. Local Memory Bus (LMB), Block RAM (BRAM, on-chip memory), LMB BRAM controllers;
3. On-chip Peripheral Bus (OPB);
4. UART and Memory Debug Module (OPB_MDM);
5. General-purpose input/output peripheral (GPIO);
6. Host interface;
7. Digital Clock Manager (DCM) module;
8. Stationary wavelet speech enhancement unit (SWPT);
9. SWPT Driver.
Fig. 7: The internal modules on the FPGA.
The DCM module generates the required clocks, including the AC'97 bit clock of 12.288 MHz. The OPB_MDM enables JTAG-based debugging of the MicroBlaze processor. The UART module connects the OPB to the serial port of the board. The software controlling the MicroBlaze processor is written in C++. The MicroBlaze is used to configure the LM4550 through the GPIO, as shown in Figure 7. The SWPT speech enhancement module is described in VHDL. After testing and debugging using the Xilinx Microprocessor Debugger (XMD) and a JTAG cable, it is downloaded to the FPGA and connected to the MicroBlaze processor as a peripheral. To connect the SWPT core to the OPB, an interface with master/slave signals is provided.
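A minimal sketch of the driver idea is given below: memory-mapped register accesses over the OPB. The base address, register offsets and bit masks are hypothetical placeholders; the actual interface is defined by the SWPT core's master/slave signals described above.

```cpp
#include <cstdint>

// Hypothetical memory map of the SWPT peripheral on the OPB.
volatile uint32_t* const SWPT_CTRL =
    reinterpret_cast<volatile uint32_t*>(0x80000000u); // control register
volatile uint32_t* const SWPT_STAT =
    reinterpret_cast<volatile uint32_t*>(0x80000004u); // status register

constexpr uint32_t CTRL_START = 0x1; // start decomposition
constexpr uint32_t STAT_DONE  = 0x2; // core finished decomposition

// Kick off the SWPT core and busy-wait for its done signal.
void runDecomposition() {
    *SWPT_CTRL = CTRL_START;
    while ((*SWPT_STAT & STAT_DONE) == 0) {
        // poll until the core asserts done
    }
}
```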
5.2 Hardware Implementation
The building blocks of the hardware implementation on the FPGA are mainly the SWPT and ISWPT blocks and Xilinx dual-port BRAMs for storing intermediate results (wavelet coefficients), filter coefficients, level-dependent thresholds and values for gain adjustment. The data and coefficient values comprise 16 bits. The implementation flow is as follows. At the beginning, a serial synchronization pulse goes from low to high as a soft reset for the LM4550 codec chip to clear the value in its status register. The AC'97 controller provides the register interface to the AC'97 codec. The OPB interface is clocked with the OPB system clock, OPB_clk = 100 MHz; the AC'97 controller is clocked with the AC'97 bit clock, Bit_clk = 12.288 MHz; and the SWPT module runs at 100 MHz. The input speech signal is converted to an electrical signal by the microphone, boosted by the on-chip amplifier and then transferred to the ADC of the codec through the input of the first channel. The ADC samples the analog waveform at 48 kHz and outputs sequences of 16-bit digitized samples. The samples are packaged along with other status data and transmitted serially at a rate of 12.288 MHz to the BRAMs of the FPGA via the SDATA-IN pin. The data is accessed through the BRAM controller. The received data is transmitted to the SWPT core and processed continuously. Soft thresholding is performed on the wavelet coefficients in the frequency bands of interest to eliminate noise as explained in section 4.1. A control unit coordinates
the operation of the different blocks as well as the interaction between them through control signals. These control signals include the bypass signals for the best-tree selection algorithm and the addresses of the soft thresholds and of the gain values for the frequency bands of interest. The SWPT core asserts an acknowledge signal when receiving data and a done signal as soon as the decomposition is finished. To synthesize the signal back into the time domain, the ISWT block is likewise activated by an activation signal from the control unit. The processed data is transferred to a BRAM block and then, via the SDATA-OUT pin, to the DAC of the AC'97 chip, which converts it to an analog waveform. This waveform is amplified by the amplifier at the output of the board and drives the speaker. The latency of the system is 1.333 ms. Debugging was carried out using ModelSim. To test the SWPT core, the same noisy input signal used in the Matlab simulation is transmitted to the processor through the microphone and the corresponding response is observed on a digital ScopeCorder, as shown in Fig. 8; the results are compared with those of Matlab. The proposed design is comparable with the design presented in [11] as well as with conventional digital hearing aids. The advantage of the present design over previous designs is the use of the Xilinx XC2VP30 board with the LM4550 audio codec chip for ADC and DAC conversion, which shortens the
Fig. 8: Response of one output channel.
processing time. Furthermore, it outperforms conventional hearing aids in portability, ease of maintenance and price, because of the low cost of FPGA exploitation. The main feature added to improve the performance of the device is denoising, as denoising is one of the successful applications of the SWPT and is as yet little exploited in hearing aid devices. On the other hand, the latency of the device is comparable with the latency obtained in [5] (1.15 ms), and is still small compared with listening latency, which allows adding more features in future work.
6. CONCLUSIONS
In this paper a stationary wavelet packet algorithm for noise reduction in hearing aids is presented. The performance of the SWPT method is tested using Matlab simulations. The architecture of the hearing aid device is implemented on the Xilinx XC2VP30 FPGA, whose board contains the LM4550 audio codec chip. The hearing aid device is designed as a system on a programmable chip by combining the MicroBlaze soft processor core with the SWPT hardware module as a peripheral. The SWPT module is described in VHDL and implemented on the FPGA. The software controlling the MicroBlaze is written in C++ and compiled using Xilinx's Embedded Development Kit. Software and hardware implementations are described. The implementation fulfills the requirements of hearing aids such as portability, intelligibility, low price and good performance.
7. REFERENCES
[1] B. C. J. Moore, Basic auditory processes involved in the analysis of speech sounds, Philosophical Transactions of the Royal Society B 363 (2008), 947-963.
[2] D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation via wavelet shrinkage, Biometrika 81 (1994), 425-455.
[3] G. P. Nason and B. W. Silverman, The stationary wavelet transform and some statistical applications, Lecture Notes in Statistics 103 (1995), 281-299.
[4] G. S. Mihov, R. M. Ivanov and A. N. Popov, Denoising speech signals by wavelet transform, Journal of Rehabilitation Research and Development 38 (2001), 111-121.
[5] H. Levitt, Noise reduction in hearing aids: a review, Journal of Rehabilitation Research and Development 38 (2001), 111-121.
[6] H. Levitt, A historical perspective on digital hearing aids: How digital technology has changed modern hearing aids, Trends in Amplification 11 (2007), 7-24.
[7] K. K. Paliwal and A. Basu, A speech enhancement method based on Kalman filtering, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87) 12 (1987), 297-300.
[8] L. A. Drake, J. C. Rutledge and J. Cohen, Wavelet analysis in recruitment of loudness compensation, IEEE Transactions on Signal Processing 41 (1993), 3306-3312.
[9] M. A. Trenas, J. C. Rutledge and N. A. Whitmal, Wavelet-based noise reduction and compression for hearing aids, Proceedings of the First Joint BMES/EMBS Conference 1 (1999), 670-673.
[10] M. A. Trenas, J. López and E. L. Zapata, An architecture for wavelet-packet based speech enhancement for hearing aids, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2 (2000), II849-II852.
[11] M. Lavielle, Detection of multiple changes in a sequence of dependent variables, Stochastic Processes and their Applications 83 (1999), 79-102.
[12] M. S. Chavan, M. N. Chavan and M. S. Gaikwad, Studies on implementation of wavelet for denoising speech signal, International Journal of Computer Applications 3 (2010), 1-7.
[13] N. Whitmal, J. Rutledge and J. Cohen, Denoising speech signals for digital hearing aids: A wavelet based approach, in Wavelets and Multiscale Analysis (Applied and Numerical Harmonic Analysis series) (2011), 299-331.
[14] R. C. Nongpiur and D. J. Shpak, Impulse-noise suppression in speech using the stationary wavelet transform, Journal of the Acoustical Society of America 133 (2013), 866-879.
[15] R. K. Young, Wavelet Theory and Its Applications, Kluwer Academic Publishers, USA, 1993.
[16] R. R. Coifman and M. V. Wickerhauser, Signal processing and compression with wave packets, Numerical Algorithms Research Group, Yale University, New Haven, CT, 1990.
[17] R. R. Coifman and M. V. Wickerhauser, Entropy-based algorithms for best basis selection, IEEE Transactions on Information Theory 38 (1992), 713-718.
[18] S. D. Apte and R. Shahu, Speech enhancement in hearing aids using conjugate symmetry of DFT and SNR-perception models, International Journal of Computer Applications 1 (2010), 44-51.
[19] V. H. Kumar and P. S. Ramaiah, Computerized speech processing in hearing aids using FPGA architecture, International Journal of Advanced Computer Science and Applications (IJACSA) 2 (2011), 106-111.
[20] V. H. Kumar and P. S. Ramaiah, Digital speech processing design for FPGA architecture for auditory prostheses, Journal of Computer Science and Engineering (JCSE) 6 (2011), 33-41.
[21] Y. Min, Design of portable hearing aid based on FPGA, Proceedings of the 4th IEEE Conference on Industrial Electronics and Applications (ICIEA 2009), 1895-1899.
[22] Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing 3 (1995), 251-266.
[23] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-33 (1985), 443-445.
[24] Y. Hao and X. Zhu, A new feature in speech recognition based on wavelet transform, Proceedings of the IEEE 5th International Conference on Signal Processing (WCCC-ICSP 2000), 1526-1529.