**Animal Sound Recognition Based on Double Feature of Spectrogram in Real Environment**

### Ying Li

College of Mathematics and Computer Science, Fuzhou University

Fuzhou, China
fj_liying@fzu.edu.cn

### Zhibin Wu

College of Mathematics and Computer Science, Fuzhou University

Fuzhou, China
n130320070@fzu.edu.cn
**Abstract—In this paper, we propose an animal sound recognition method for various noise environments with different Signal-to-Noise Ratios (SNRs). In the real world, the ability to automatically recognize a wide range of animal sounds allows the habits and distributions of animals to be analyzed, making it possible to monitor and protect them effectively. However, owing to the variety of environments and noises, existing methods struggle to maintain recognition accuracy for animal sounds under low-SNR conditions. To address this problem, this paper proposes a double feature, consisting of a projection feature and a local binary pattern variance (LBPV) feature, combined with random forests for animal sound recognition. In feature extraction, the projection feature is generated by a projection operation on the spectrogram, while the LBPV feature is generated by accumulating the corresponding variances of all pixels for every uniform local binary pattern (ULBP) in the spectrogram. The experimental results show that the proposed method can recognize a wide range of animal sounds and maintains a recognition rate above 80% even at a 10 dB SNR.**

**Index Terms—Animal sound recognition, local binary pattern variance, projection feature, random forests.**

I. INTRODUCTION

The ecological environment is closely related to our lives, and animal sounds carry a large amount of rich information. Through animal sound recognition, we can understand and analyze the life habits and distributions of animals in order to monitor and protect them effectively.

Animal sound recognition is generally based on the spectrogram, time-based audio features, Mel-Frequency Cepstral Coefficients (MFCC), sound database indexing, or wavelet packet decomposition, with classification performed by classifiers such as the Support Vector Machine (SVM). Typical methods include animal sound recognition based on spectrogram correlation [1], right whale sound detection using an 'edge' detector operating on a smoothed spectrogram [2], animal sound recognition based on time-based audio features [3], and bird sound classification combining MFCC with SVM [4]. In addition, drawing on the classic method of text-based database query, Bardeli [5] proposes index-based animal sound retrieval, and Cugler et al. [6] propose an architecture for retrieval of animal sound recordings based on context variables. Recently, Exadaktylos et al. [7] confirm the status of animals by sound recognition for livestock production optimization, and Potamitis et al. [8] present a method of specific bird sound detection in long real-field recordings. In our recent work [9], we propose a bird detection method in which bird sound signals are detected and selected via adaptive energy detection from recordings with background noise; the Mel-scaled Wavelet packet decomposition Sub-band Cepstral Coefficient (MWSCC) and MFCC are then extracted from the detected signals for classification with an SVM classifier.

The variety of noises in the real environment brings a series of challenges for recognizing animal sounds. To improve the accuracy of animal sound recognition in various low-SNR noise environments, this paper proposes an animal sound recognition method based on a double feature of the spectrogram. We extract a projection feature and a local binary pattern variance (LBPV) feature from the spectrogram to form the double feature. The projection feature [10], [11], the first layer of the double feature, is a global feature obtained by eigenvalue decomposition of, and projection on, the entire spectrogram matrix. The second layer is the LBPV feature [12], which captures local features of the image by effectively combining the local binary pattern (LBP) feature [13], [14] with a contrast feature. The two features are complementary: together they not only improve recognition performance but also provide robustness to noise. Finally, we adopt random forests as the classifier, a combination classifier with good performance and fast speed [15].

After a series of designs, experiments, and analyses, we propose a framework for animal sound recognition based on the double feature of the spectrogram. As shown in Fig. 1, the spectrogram of the sound signal is first computed, the double feature is then extracted from it, and finally random forests (RF) are applied for classification.

II. DOUBLE FEATURE OF SPECTROGRAM

Feature extraction is the core of our animal sound recognition method, and the effectiveness of the features directly affects the classification results. Therefore, we propose a double feature based on the time-frequency characteristics of sound signals, namely the projection feature and the LBPV feature.

This work is supported by the National Natural Science Foundation of China (No. 61075022).

Fig. 1. Animal sound recognition framework.

*A. Projection feature*

Different animal sounds have different frequency ranges, so their spectrograms differ. A sound signal can be transformed to its time-frequency spectrum $S(t, f)$ by the Short-Time Fourier Transform (STFT), where $t$ is the frame index and $f$ is the frequency index. $S(t, f)$ can be rendered as a two-dimensional gray-scale image, namely the spectrogram. The $t$th frame can be viewed as a vector $\bar{S}_t = [S(t, 0), \cdots, S(t, N-1)]^T$, which contains $N$ frequency bins. $\bar{S}_t$ is further converted to a log-scale normalized vector:

$$\hat{S}_t = 10 \log_{10}(\bar{S}_t) \tag{1}$$

$$S_t = \frac{\hat{S}_t}{\|\hat{S}_t\|} \tag{2}$$

where $S_t$ denotes the log-scale normalized $t$th frame. These vectors are not suitable for classification because of their high dimension, so dimensionality reduction is necessary.
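The frame-wise preprocessing of (1)-(2) can be sketched as follows; the FFT size and the small floor added before the logarithm are illustrative assumptions, not values specified in the paper.

```python
import numpy as np
from scipy.signal import stft

def log_normalized_frames(x, sr, n_fft=512):
    """Return the matrix of log-scale, L2-normalized spectrogram frames."""
    _, _, Z = stft(x, fs=sr, nperseg=n_fft)   # Z: (freq bins) x (frames)
    S = np.abs(Z).T + 1e-10                   # frames on rows; floor avoids log(0)
    S_hat = 10.0 * np.log10(S)                # Eq. (1): log scale
    norms = np.linalg.norm(S_hat, axis=1, keepdims=True)
    return S_hat / norms                      # Eq. (2): unit-norm frames

x = np.random.randn(44100)                    # 1 s stand-in signal at 44.1 kHz
X = log_normalized_frames(x, 44100)
print(X.shape)                                # (M frames, N = 257 bins)
```

Each row of `X` is one normalized frame, and stacking the rows gives the frame matrix used for the eigenvalue decomposition described next.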

Eigenvalue decomposition is a simple and effective method of dimensionality reduction. Assuming that $S(t, f)$ has $M$ frames, the frame vectors can be stacked into a matrix $X \in \mathbb{R}^{M \times N}$ with $X = [S_1, \cdots, S_t, \cdots, S_M]^T$. Since the target of eigenvalue decomposition is a square matrix, the covariance matrix $C \in \mathbb{R}^{N \times N}$ of $X$ is given by $C = X^T X$. The process of dimensionality reduction using eigenvalue decomposition can be written as

$$C = U \Lambda U^T \tag{3}$$

$$C\,(u_1, u_2, \cdots, u_N) = (u_1, u_2, \cdots, u_N) \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_N \end{pmatrix} \tag{4}$$

$$C = \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T + \cdots + \lambda_N u_N u_N^T \tag{5}$$

$$C \approx \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T + \cdots + \lambda_K u_K u_K^T, \quad K \ll N \tag{6}$$
where $U \in \mathbb{R}^{N \times N}$ is the matrix of all eigenvectors $u_1, \cdots, u_N$ of $C$, and $\Lambda$ is the diagonal matrix of all eigenvalues $\lambda_1, \cdots, \lambda_N$, which represent the weights of the corresponding eigenvectors, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. In this paper, the eigenvalue $\lambda_n$ reflects the importance of the corresponding eigenvector $u_n$ for animal sound: the higher the value, the more important the eigenvector. The original matrix $C$ can be approximately reconstructed from the first $K$ columns of $U$ and $\Lambda$, where $K \ll N$; therefore, eigenvalue decomposition can be used for dimensionality reduction. The contribution ratio $\eta_K$ of the first $K$ eigenvectors is calculated as

$$\eta_K = \sum_{i=1}^{K} \lambda_i \Big/ \sum_{j=1}^{N} \lambda_j \tag{7}$$

where $\eta_K$ shows the significance of the first $K$ eigenvectors in representing the sound. Fig. 2 uses the sound of the white crane as a sample: when $K \le 10$, the contribution ratio of the first $K$ eigenvectors increases rapidly; as $K$ continues to increase, the ratio grows more gently and gradually tends to 100%.

Since matrix $U$ contains the major information of the sound, we select the first $K$ eigenvectors to form the basis matrix $U_K \in \mathbb{R}^{N \times K}$. The projection feature is computed by projecting the spectrogram matrix $X$ onto $U_K$:

$$X_K = X U_K \tag{8}$$

where $X_K \in \mathbb{R}^{M \times K}$ is the matrix of the projection feature; the dimension of each frame thus decreases from $N$ to $K$, with $K \ll N$. The projection feature is used as one component for animal sound recognition in various environments.
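The whole projection-feature pipeline of Eqs. (3)-(8) can be sketched compactly on a stand-in frame matrix; the matrix sizes and the random data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 100, 64, 6                      # K = 6 is the value chosen in Sec. III
X = rng.standard_normal((M, N))           # stand-in for the M x N frame matrix

C = X.T @ X                               # covariance matrix C = X^T X
lam, U = np.linalg.eigh(C)                # Eq. (3): C = U Lambda U^T
order = np.argsort(lam)[::-1]             # reorder eigenvalues descending
lam, U = lam[order], U[:, order]

eta_K = lam[:K].sum() / lam.sum()         # Eq. (7): contribution ratio
U_K = U[:, :K]                            # first K eigenvectors (N x K)
X_K = X @ U_K                             # Eq. (8): projection feature (M x K)
print(X_K.shape)
```

`np.linalg.eigh` is used because `C` is symmetric; its ascending eigenvalues are reordered so that λ1 ≥ λ2 ≥ ... ≥ λN, matching (4).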

*B. LBPV feature*

LBPV feature is formed by accumulating the corresponding variances of all pixels for every ULBP value. The ULBP value characterizes the spatial structure of image texture, and the variance describes the contrast information of image texture. LBPV feature combines the two features.

The texture $T$ in a local neighborhood of a gray-scale image is defined as the joint distribution of the gray levels of $P$ equally spaced pixels on a circle of radius $R$ [13], [14]:

$$T \approx t\big(s(g_0 - g_c),\, s(g_1 - g_c),\, \cdots,\, s(g_{P-1} - g_c)\big) \tag{9}$$

where $g_c$ is the gray value of the center pixel of the local neighborhood, $g_i\ (i = 0, 1, \cdots, P-1)$ are the gray values of the $P$ pixels, and $s$ is the sign function:

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{10}$$

LBP is a gray-scale texture operator, and the LBP value encodes the spatial structure of the image. The LBP operator traverses $T$ in a fixed circular direction, forms the resulting binary sequence, and computes it as an $LBP_{P,R}$ value:
$$LBP_{P,R} = \sum_{i=0}^{P-1} s(g_i - g_c)\, 2^i \tag{11}$$

As shown in the solid line area of Fig. 3(a), a 3×3 image with gray values is taken as an example. The calculation of the LBP value of the middle point $c$ with gray value 80 is shown in Fig. 3(b): (141≥80)→1, (109≥80)→1, (89≥80)→1, (68<80)→0, (48<80)→0, (52<80)→0, (60<80)→0, (89≥80)→1, so $LBP_{P,R} = (11100001)_2 = (225)_{10}$. To calculate the LBP values of edge pixels by (11), the dashed part of the image is extended as shown in Fig. 3(a).

The LBP operator produces $2^P$ different binary patterns, namely LBP values, for $P$ equally spaced pixels on a circle of radius $R$. Ojala et al. [14] propose the uniform pattern based on the fact that the vast majority of binary patterns contain at most 2 bitwise 0/1 transitions: a uniform pattern has at most 2 such transitions in its circular binary representation. The $U$ value is defined as the number of bitwise 0/1 transitions in the pattern and is used to determine whether the pattern is uniform:

$$U(LBP_{P,R}(m,n)) = \big|s(g_{P-1}(m,n) - g_c(m,n)) - s(g_0(m,n) - g_c(m,n))\big| + \sum_{i=1}^{P-1} \big|s(g_i(m,n) - g_c(m,n)) - s(g_{i-1}(m,n) - g_c(m,n))\big| \tag{12}$$

A pattern with $U \le 2$ is a uniform pattern, and its value is called the ULBP value, written as an $LBP_{P,R}^{u2}$ value:

$$LBP_{P,R}^{u2}(m,n) = \begin{cases} \displaystyle\sum_{i=0}^{P-1} s(g_i(m,n) - g_c(m,n))\, 2^i, & U(LBP_{P,R}(m,n)) \le 2 \\ P(P-1)+3, & \text{otherwise} \end{cases} \tag{13}$$

where the superscript "u2" means that the uniform patterns have $U$ values of at most 2.

The uniform patterns reduce the number of patterns from $2^P$ to $P(P-1)+2$; the patterns with $U$ values over 2 are collected into one additional category, namely the $(P(P-1)+3)$th class. Taking Fig. 3(a) as an example, with $P = 8$ and $R = 1$, there are 59 ULBP classes in total, and the 59 ULBP values are obtained according to (13). The mapping between the ULBP values and the serial numbers 1-59 is shown in Table I, where $ULBP(k)$ is the ULBP value corresponding to serial number $k$.
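The uniform-pattern test of (12)-(13) is easy to reproduce; for $P = 8$ the enumeration below recovers the 58 uniform patterns, which together with the single non-uniform class give the 59 serial numbers of Table I.

```python
def u_value(pattern, P=8):
    # Eq. (12): count circular bitwise 0/1 transitions of the P-bit pattern
    bits = [(pattern >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[i - 1] for i in range(P))  # i-1 = -1 wraps around

uniform = sorted(p for p in range(2 ** 8) if u_value(p) <= 2)
print(len(uniform))   # 58 uniform patterns
print(uniform[:8])    # [0, 1, 2, 3, 4, 6, 7, 8], matching Table I
```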

For an $M \times N$ gray-scale image, each pixel $(m, n)$ yields a ULBP value. These ULBP values again form an image, called the ULBP image. Counting the frequency of each value in the ULBP image gives a vector that represents a texture feature of the gray-scale image. Fig. 3(c) shows the ULBP image formed by converting the values in the solid line area of Fig. 3(a) into ULBP values; it can also be regarded as a matrix of ULBP values, namely the ULBP value matrix $u$. Fig. 3(e) shows the histogram of the ULBP image, which also represents the texture feature vector of Fig. 3(a).

For some image tiles, the textures may differ even when the ULBP values in their ULBP images are the same. Therefore, the variance is used to describe the contrast information of the texture [12]: a large variance indicates a large change of the texture in an image region. The LBPV feature is thus formed by using the variances of the pixel gray values as the weights of the ULBP values. The $k$th element $LBPV(k)$ of the LBPV feature can be expressed as

$$LBPV(k) = \sum_{m=1}^{M} \sum_{n=1}^{N} w(m, n, k) \tag{14}$$

$$w(m, n, k) = \begin{cases} VAR(m, n), & LBP_{P,R}^{u2}(m, n) = ULBP(k) \\ 0, & \text{otherwise} \end{cases} \tag{15}$$

where $k$ is an integer with $k \in [1, P(P-1)+3]$. $w(m, n, k)$ denotes the weight, for pixel $(m, n)$ in the spectrogram, of the ULBP value corresponding to the $k$th element of the LBPV feature. When the ULBP value of pixel $(m, n)$ equals $ULBP(k)$ in Table I, $w(m, n, k)$ is the variance of the gray values of the $P$ equally spaced pixels on the circle of radius $R$ around pixel $(m, n)$. $LBPV(k)$ accumulates these weights over every pixel in the spectrogram. $LBPV(1), LBPV(2), \cdots,$

TABLE I

THE MAPPING BETWEEN ULBP VALUES AND THE SERIAL NUMBER

| *k* | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *ULBP*(*k*) | 0 | 1 | 2 | 3 | 4 | 6 | 7 | 8 | 12 | 14 | 15 | 16 | 24 | 28 | 30 |

| *k* | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *ULBP*(*k*) | 31 | 32 | 48 | 56 | 60 | 62 | 63 | 64 | 76 | 92 | 120 | 124 | 126 | 127 | 128 |

| *k* | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *ULBP*(*k*) | 129 | 131 | 135 | 143 | 159 | 191 | 192 | 193 | 195 | 199 | 207 | 223 | 224 | 225 | 227 |

| *k* | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *ULBP*(*k*) | 231 | 239 | 240 | 241 | 243 | 247 | 248 | 249 | 251 | 252 | 253 | 254 | 255 | other |

Fig. 3. The calculation process of LBPV feature. (a) Gray-scale image. (b) LBP value of the central point *c*. (c) ULBP value matrix *u*. (d) Variance matrix *v*. (e) ULBP histogram. (f) LBPV histogram.

*LBPV(P(P-1)+3)* are obtained according to (14), and the LBPV feature vector with dimension *P(P-1)+3* is finally formed.

Fig. 3(d) is the variance matrix *v* of the solid line area in Fig. 3(a). As shown in Fig. 3(f), the LBPV histogram, namely the LBPV feature, is formed by computing *LBPV(k)* according to the ULBP values of Fig. 3(c), the corresponding serial numbers *k* in Table I, and the variances of Fig. 3(d). The process is as follows:

*u*(0, 0) = *u*(0, 1) = 193 = *ULBP*(38) → *v*(0, 0) + *v*(0, 1) = 577 + 653 → *LBPV*(38) = 1230,
*u*(0, 2) = *u*(1, 2) = 241 = *ULBP*(49) → *v*(0, 2) + *v*(1, 2) = 218 + 446 → *LBPV*(49) = 664,
*u*(1, 0) = *u*(1, 1) = 225 = *ULBP*(44) → *v*(1, 0) + *v*(1, 1) = 1111 + 880 → *LBPV*(44) = 1991,
*u*(2, 0) = *u*(2, 1) = 231 = *ULBP*(46) → *v*(2, 0) + *v*(2, 1) = 216 + 197 → *LBPV*(46) = 413,
*u*(2, 2) = 255 = *ULBP*(58) → *v*(2, 2) = 132 → *LBPV*(58) = 132.

Therefore, the corresponding values are put into *LBPV* = {0, …, *LBPV*(38), 0, …, *LBPV*(44), 0, *LBPV*(46), 0, 0, *LBPV*(49), 0, …, *LBPV*(58), 0}, giving *LBPV* = {0, …, 1230, 0, …, 1991, 0, 413, 0, 0, 664, 0, …, 132, 0}, whose histogram is shown in Fig. 3(f).
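The accumulation above can be replayed directly from the matrices of Fig. 3(c)-(d); the dictionary below lists only the Table I entries that occur in this example.

```python
ulbp_serial = {193: 38, 225: 44, 231: 46, 241: 49, 255: 58}  # subset of Table I

u = [[193, 193, 241],     # ULBP value matrix of Fig. 3(c)
     [225, 225, 241],
     [231, 231, 255]]
v = [[577, 653, 218],     # variance matrix of Fig. 3(d)
     [1111, 880, 446],
     [216, 197, 132]]

lbpv = [0] * 59           # P(P-1)+3 = 59 bins for P = 8
for m in range(3):
    for n in range(3):
        k = ulbp_serial[u[m][n]]   # Eq. (15): the bin whose ULBP value matches
        lbpv[k - 1] += v[m][n]     # Eq. (14): accumulate the variance weight

print(lbpv[37], lbpv[43], lbpv[45], lbpv[48], lbpv[57])  # 1230 1991 413 664 132
```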

III. EXPERIMENT DESIGN

*A. Experimental data*

Forty animal sounds are used in our experiment, comprising bird sounds, mammal sounds, and insect sounds, all of which come from Freesound [16]. Each sound is mono, in WAV format, and truncated to short segments of about 2 s. The sampling rate is set uniformly to 44.1 kHz with 16-bit quantization. The three environmental noises used in the experiments are wind noise, traffic noise, and rain noise, recorded in the real world with a SONY ICD-UX512F recorder at a 44.1 kHz sampling rate.

*B. Experiment design*

Two groups of experiments are designed to test the performance of the projection feature, the LBPV feature, and the double feature combined with random forests. The first group is used to decide the parameter *K* of the projection feature and the best scale (*P*, *R*) of the LBPV feature. The second group demonstrates that the double feature better represents animal sounds. Using the best values of the parameter *K* and the scale (*P*, *R*), we extract three features: the projection feature, the LBPV feature, and the double feature. Recognition accuracy experiments are then carried out in the noiseless condition and in different noise environments with different SNRs, in comparison with the classic MFCC feature.

In every experiment there are 30 samples of each class; 10 samples of each kind of sound are randomly selected for training, and the remaining 20 samples are used for testing.
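The classification stage with this split can be sketched as follows; the feature dimensions and the random stand-in features are assumptions (the paper does not state the random forest hyperparameters), with scikit-learn's random forest standing in for the classifier of [15].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_classes, dim = 40, 6 + 243       # e.g. K = 6 projection dims + 243 LBPV bins (P = 16)
X_train = rng.standard_normal((n_classes * 10, dim))   # 10 training samples per class
y_train = np.repeat(np.arange(n_classes), 10)
X_test = rng.standard_normal((n_classes * 20, dim))    # 20 testing samples per class
y_test = np.repeat(np.arange(n_classes), 20)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = (clf.predict(X_test) == y_test).mean()
print(acc)   # near chance on random stand-in features; real features do far better
```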

*C. Experimental results and analysis*

The first group of experiments is conducted without environmental noise. First, we test the relation between the accuracy rate and the parameter *K* of the projection feature, described in (7) and Fig. 2. As shown in Fig. 4, the recognition accuracy of the projection feature increases as *K* increases; when *K* ≥ 6, it tends to flatten. Based on a tradeoff between computational cost and performance, we set *K* = 6.

We then test the recognition accuracies of the LBPV feature at different scales and multi-scales; the results are shown in Table III. Using different (*P*, *R*) values and combinations of multiple (*P*, *R*), we can extract the LBPV feature at different scales and multi-scales. Following previous research [14], we choose 7 groups of (*P*, *R*): (8,1), (16,2), (24,3), (8,1)+(16,2), (8,1)+(24,3), (16,2)+(24,3), and (8,1)+(16,2)+(24,3). We observe from Table III that the LBPV feature performs well at all scales, with all recognition rates above 96%. Taking both performance and computational cost into account, (*P*, *R*) is set to (16, 2).

The second group of experiments compares the different features in the noiseless condition and in different noise environments with different SNRs; the results are shown in Tables IV and V and Fig. 5.

Fig. 5. Recognition rates of four features in three environments with different SNRs. (a) Rain noise. (b) Wind noise. (c) Traffic noise.
Fig. 4. The relation of recognition accuracy and the parameter *K*.

As shown in Table V, all four features (the projection feature, the LBPV feature, the double feature, and the MFCC feature) achieve high accuracy rates in the noiseless condition, with the double feature slightly higher than the other three.

To simulate the real environment, we perform experiments under different noise environments with different SNRs. Wind noise, traffic noise, and rain noise are added to the testing samples at SNRs of 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 30 dB. Table IV shows the average accuracy rates of the four features in the three noise environments: the average accuracy rate of the double feature is 37.86% higher than MFCC, 16.58% higher than the LBPV feature, and 5.71% higher than the projection feature. This illustrates that the combination of the two features effectively improves recognition performance, and also shows that the LBPV feature and the projection feature are complementary.
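Adding noise to a clean test sample at a prescribed SNR can be done with the standard scaling rule; the exact mixing procedure is not specified in the paper, so this is an assumed implementation.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    # Scale the noise so that 10*log10(P_clean / P_noise_scaled) = snr_db
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(44100)
noise = rng.standard_normal(44100)
noisy = mix_at_snr(clean, noise, 10.0)
snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(snr, 2))  # 10.0
```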

Fig. 5 shows the recognition results of the four features in the three environments with different SNRs; different noise environments affect recognition performance differently. Comparing the three environments, traffic noise has the worst influence on recognition performance, while rain noise and wind noise have less influence. The accuracy rates of the double feature are significantly higher than those of the other three methods in the SNR range of 0 dB to 15 dB, which shows that the proposed method is robust to noise. When the SNR is higher than 15 dB, the accuracy rates of the LBPV feature and the projection feature approach that of the double feature, but the double feature still maintains the highest accuracy rate.

IV. CONCLUSION

This paper proposes an animal sound recognition method based on the double feature of the spectrogram for different real-world noise environments. The experimental results indicate that the proposed method not only achieves good recognition performance but is also robust to noise. In the next stage of this study, we will optimize feature extraction to further improve the accuracy rate under low SNR.

References

[1] D. K. Mellinger and W. C. Christopher, “Recognizing transient low-frequency whale sounds by spectrogram correlation,” The Journal of the Acoustical Society of America ,vol. 107, no. 6, pp. 3518-3529, 2000.

[2] D. Gillespie, “Detection and classification of right whale calls using an ‘edge’ detector operating on a smoothed spectrogram,” Canadian Acoustics, vol. 32, no. 2, pp. 39-47, 2004.

[3] D. Mitrovic, M. Zeppelzauer, and C. Breiteneder, "Discrimination and retrieval of animal sounds," in *Proc. IEEE 12th Int. Multi-Media Modelling Conf.*, 2006.

[4] S. Fagerlund, “Bird species recognition using support vector
machines,” *EURASIP Journal on Advances in Signal Processing *, vol.
2007, no. 1, pp. 64-64, May. 2007.

[5] R. Bardeli, “Similarity search in animal sound databases,” *IEEE Trans. *
*Multimedia,*vol. 11, no. 1, pp. 68-76, Jan. 2009.

[6] D. C. Cugler, C. B. Medeiros, and L. F. Toledo, “An architecture for
retrieval of animal sound recordings based on context
variables,”* Concurrency and Computation: Practice and Experience*,
vol. 25, no. 16, pp. 2310-2326, Jun.2013.

[7] V. Exadaktylos, M. Silva, and D. Berckmans, “Automatic
identification and interpretation of animal sounds, applications to
livestock production optimization,” *InTech*, Mar.2014.

[8] I. Potamitis, S. Ntalampiras, O. Jahn, and K. Riede, "Automatic bird sound detection in long real-field recordings: Applications and tools," *Applied Acoustics*, vol. 80, pp. 1-9, Jun. 2014.

[9] X. Zhang and Y. Li, "Adaptive energy detection for bird sound detection in complex environments," *Neurocomputing*, vol. 155, pp. 108-116, 2015.

[10] S. Deng, J. Han, C. Zhang , T. Zheng, and G. Zheng, “ Robust minimum
statistics project coefficients feature for acoustic environment
recognition,” *in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. *
*(ICASSP’14)*, 2014, pp. 8232-8236.

[11] J. Ye, T. Kobayashi, M. Murakawa, and T. Higuchi, "Robust acoustic feature extraction for sound classification based on noise reduction," in *Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP'14)*, 2014, pp. 5944-5948.

[12] Z. Guo, Z. Lei, and D. Zhang, “Rotation invariant texture classification
using LBP variance (LBPV) with global matching,” *Pattern recognition*,
vol. 43, no. 3, pp. 707-719, Mar. 2010.

[13] T. Ojala, M. Pietikäinen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," *Pattern Recognition*, vol. 29, no. 1, pp. 51-59, Jan. 1996.

[14] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," *IEEE Trans. Pattern Analysis and Machine Intelligence*, vol. 24, no. 7, pp. 971-987, Jul. 2002.

[15] L. Breiman. “Random forests,” *Machine Learning*, vol. 45, no. 1, pp.
5-32, 2001.

[16] Universitat Pompeu Fabra. Repository of sound under the creative commons license, Freesound. org [DB/OL]. http://www.freesound.org, 2012-5-14.

TABLE III

THE RECOGNITION RATES (%) OF LBPV FEATURE AT VARIOUS SCALES

| (*P*, *R*) | 8,1 | 16,2 | 24,3 | 8,1+16,2 | 8,1+24,3 | 16,2+24,3 | 8,1+16,2+24,3 |
|---|---|---|---|---|---|---|---|
| Recognition rate | 96.37 | 97.80 | 97.03 | 97.82 | 97.32 | 97.25 | 97.55 |

TABLE IV

THE AVERAGE ACCURACY RATES (%) IN DIFFERENT ENVIRONMENTS

| Noise type | MFCC | LBPV feature | Projection feature | Double feature |
|---|---|---|---|---|
| Rain | 47.22 | 67.48 | 83.85 | 89.21 |
| Wind | 50.25 | 75.87 | 85.68 | 91.92 |
| Traffic | 47.79 | 65.75 | 72.20 | 77.71 |
| Average | 48.42 | 69.70 | 80.58 | 86.28 |

TABLE V

COMPARISON OF DIFFERENT METHODS IN THE NOISELESS CONDITION

| Method | LBPV feature | Projection feature | Double feature | MFCC |
|---|---|---|---|---|
| Recognition rate (%) | 97.80 | 97.32 | 98.02 | 93.74 |