The Fast Path to a Denoising Autoencoder in R

Researchers Huq and Cleland (see note 117), as part of a regular survey on the fertility of women in Bangladesh, collected data on women’s mobility of social freedom. The sub-sample records the responses of 8445 rural women to questions about whether they could engage in certain activities alone (see Table 6).

Name     Description
Item 1   Go to any part of the village.
Item 2   Go outside the village.
Item 3   Talk to a man you do not know.
Item 4   Go to a cinema or cultural show.
Item 5   Go shopping.
Item 6   Go to a cooperative or mothers’ club.
Item 7   Attend a political meeting.
Item 8   Go to a health centre or hospital.

Table 6: Variables used by Huq and Cleland to measure women’s mobility of social freedom

Let’s use this sample to build our autoencoder. The data is contained in the R object Mobility, from the package ltm. We use it and the package RcppDL:

> require(RcppDL)
> require("ltm")
> data(Mobility)
> data <- Mobility

Next, we set up the data; in this example we sample 1,000 observations without replacement from the original 8445 responses. A total of 800 responses are used for the training set, with the remaining 200 observations used for the test set:

> set.seed(17)
> n = nrow(data)
> sample <- sample(1:n, 1000, FALSE)
> data <- as.matrix(Mobility[sample, ])
> n = nrow(data)
> train <- sample(1:n, 800, FALSE)
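The two-stage sampling above can be sketched in base R without the ltm package. This is only an illustration under stated assumptions: `toy` is a hypothetical random binary matrix standing in for Mobility, and the names `keep`, `sub`, `train_idx` and `test_idx` are not from the original text.

```r
# Synthetic stand-in for the Mobility data: 8445 rows, 8 binary items.
# (Hypothetical data; the real object comes from the ltm package.)
set.seed(17)
toy <- matrix(rbinom(8445 * 8, size = 1, prob = 0.5), ncol = 8)

# Stage 1: draw 1,000 row indices without replacement and keep those rows.
keep <- sample(seq_len(nrow(toy)), 1000, replace = FALSE)
sub  <- toy[keep, ]

# Stage 2: of the 1,000 retained rows, 800 go to training, the rest to testing.
train_idx <- sample(seq_len(nrow(sub)), 800, replace = FALSE)
test_idx  <- setdiff(seq_len(nrow(sub)), train_idx)

# The two index sets should be disjoint and together cover every row.
length(train_idx)                       # 800
length(test_idx)                        # 200
length(intersect(train_idx, test_idx))  # 0
```

Indexing with `-train`, as the text does, selects the same rows as `test_idx` here; the explicit `setdiff` simply makes the complement visible.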

Now to create the attributes for the training sample and test sample:

> x_train <- matrix(as.numeric(unlist(data[train, ])), nrow = nrow(data[train, ]))
> x_test <- matrix(as.numeric(unlist(data[-train, ])), nrow = nrow(data[-train, ]))
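To see what the conversion above does, here is a small sketch on a toy matrix. It assumes, purely for illustration, that the source values are stored as integers; the object `m` is hypothetical and not part of the original example.

```r
# Toy integer matrix with row names, standing in for data[train, ].
m <- matrix(c(1L, 0L, 1L, 1L, 0L, 0L), nrow = 3,
            dimnames = list(c("a", "b", "c"), NULL))
typeof(m)            # "integer"

# as.numeric() drops the dimensions and names; matrix() rebuilds the shape
# column by column, so the layout of the values is unchanged.
x <- matrix(as.numeric(unlist(m)), nrow = nrow(m))
typeof(x)            # "double"
all(x == m)          # TRUE
is.null(rownames(x)) # TRUE
```

The round trip keeps every value in place but discards row names, which is consistent with the attribute matrices in the text printing `[1,]`, `[2,]`, ... while the response objects, built directly from `data`, keep the original row labels.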

Next, we check that we have the correct sample sizes. The training set should contain 800 observations and the test set 200:

> nrow(x_train)
[1] 800
> nrow(x_test)
[1] 200

All looks good. Now we can remove the response variable from the attribute objects. In this example we will use item 3 (Talk to a man you do not know) as the response variable. Here is how to remove it from the attribute objects:

> x_train <- x_train[, -3]
> x_test <- x_test[, -3]

The training and test attribute objects should now look something like this:

> head(x_train)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    1    1    0    0    0    0    0
[3,]    1    1    1    0    0    0    0
[4,]    0    0    0    0    0    0    0
[5,]    1    0    0    0    0    0    0
[6,]    1    0    0    0    0    0    0
> head(x_test)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    0    0
[2,]    0    0    0    0    0    0    0
[3,]    0    0    0    0    0    0    0
[4,]    1    0    0    0    0    0    0
[5,]    0    0    0    0    0    0    0
[6,]    1    0    1    0    0    0    0

Next, we prepare the response variable for use with the RcppDL package. The denoising autoencoder is built using the Rsda function from this package. We will pass the response variable to it using two columns. First, create the response variable for the training sample:

> y_train <- data[train, 3]
> temp <- ifelse(y_train == 0, 1, 0)
> y_train <- cbind(y_train, temp)

Take a look at the result:

> head(y_train)
     y_train temp
1405       1    0
3960       0    1
3175       1    0
7073       1    0
7747       1    0
8113       1    0

And check we have the correct number of observations:

> nrow(y_train)
[1] 800

We follow the same procedure for the test sample:

> y_test <- data[-train, 3]
> temp1 <- ifelse(y_test == 0, 1, 0)
> y_test <- cbind(y_test, temp1)
> head(y_test)
     y_test temp1
3954      1     0
1579      0     1
7000      0     1
4435      1     0
7424      1     0
6764      1     0
> nrow(y_test)
[1] 200
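The two-column layout built above for both samples can be wrapped in a small helper. This is a minimal sketch in base R; the function name `two_column` is hypothetical and not part of RcppDL.

```r
# Hypothetical helper: turn a 0/1 vector into the two-column layout used above,
# where column 1 is the original label and column 2 is its complement.
two_column <- function(y) {
  stopifnot(all(y %in% c(0, 1)))
  cbind(y = y, temp = ifelse(y == 0, 1, 0))
}

y   <- c(1, 0, 1, 1)
enc <- two_column(y)
enc

# Each row should contain exactly one 1.
all(rowSums(enc) == 1)   # TRUE
```

Using one function for both the training and test responses avoids the two encodings drifting apart.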

Now we are ready to specify our model. Let’s first build a stacked autoencoder without any noise. We will use two hidden layers each containing ten nodes:

> hidden = c(10, 10)
> fit <- Rsda(x_train, y_train, hidden)

The default noise level for Rsda is 30%. Since we want to begin with a regular stacked autoencoder, we set the noise to 0. Here is how to do that:

> setCorruptionLevel(fit, x = 0.0)
> summary(fit)
$PretrainLearningRate
[1] 0.1

$CorruptionLevel
[1] 0

$PretrainingEpochs
[1] 1000

$FinetuneLearningRate
[1] 0.1

$FinetuneEpochs
[1] 500

NOTE...

You can set a number of the parameters in Rsda. We used something along the lines of:

setCorruptionLevel(model, x)

You can also choose the number of epochs and learning rates for both fine tuning and pretraining:

• setFinetuneEpochs

• setFinetuneLearningRate

• setPretrainLearningRate

• setPretrainEpochs
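To give some intuition for what a corruption level means, the sketch below applies masking noise to an input matrix in base R, in the spirit of the denoising scheme of Vincent et al. (note 112). It is an assumption that Rsda corrupts its inputs in this way; the function `corrupt` is hypothetical, and zeroing each entry independently is a simplification.

```r
# Hypothetical helper: masking noise. Each entry is kept with probability
# 1 - level and set to 0 otherwise. That Rsda does the same internally
# is an assumption, not something confirmed by the text.
corrupt <- function(x, level) {
  stopifnot(level >= 0, level <= 1)
  mask <- matrix(rbinom(length(x), size = 1, prob = 1 - level),
                 nrow = nrow(x), ncol = ncol(x))
  x * mask
}

set.seed(17)
x <- matrix(1, nrow = 1000, ncol = 7)  # all-ones input makes the effect visible

clean <- corrupt(x, 0.0)    # level 0: the input passes through unchanged
noisy <- corrupt(x, 0.25)   # level 0.25: roughly a quarter of entries zeroed

all(clean == x)     # TRUE
mean(noisy == 0)    # close to 0.25
```

With a level of 0 the autoencoder is trained to reconstruct its input exactly as given; with a positive level it has to reconstruct the clean input from a partly erased copy.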

The next step is to pretrain and fine tune the model. This is fairly straightforward:

> pretrain(fit)
> finetune(fit)

Since the sample is small the model converges pretty quickly. Let’s take a look at the predicted probabilities for the response variable using the test sample:

> predProb <- predict(fit, x_test)
> head(predProb, 6)
          [,1]      [,2]
[1,] 0.4481689 0.5518311
[2,] 0.4481689 0.5518311
[3,] 0.4481689 0.5518311
[4,] 0.6124651 0.3875349
[5,] 0.4481689 0.5518311
[6,] 0.8310412 0.1689588

So, we see that for the first three observations the model predicts approximately a 45% probability that they belong to class 1 and 55% that they belong to class 2. Let’s take a peek to see how it did:

> head(y_test, 3)
     y_test temp1
3954      1     0
1579      0     1
7000      0     1

The model missed the first observation! However, it classified the second and third observations correctly. Finally, we construct the confusion matrix:

> pred1 <- ifelse(predProb[, 1] >= 0.5, 1, 0)
> table(pred1, y_test[, 1], dnn = c("Predicted", "Observed"))
         Observed
Predicted   0   1
        0  15  15
        1  36 134
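The counts in the confusion matrix can be turned into summary rates. The following is a minimal base R sketch that re-enters the four counts by hand; the object `cm` and the choice of metrics are illustrative and not part of the original example.

```r
# Confusion matrix copied from the output above (rows = predicted, cols = observed).
cm <- matrix(c(15, 36, 15, 134), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Observed = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)        # share of correct predictions
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # observed 1s predicted as 1
specificity <- cm["0", "0"] / sum(cm[, "0"])  # observed 0s predicted as 0

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity), 3)
# accuracy 0.745, sensitivity 0.899, specificity 0.294
```

Of the 200 test observations, 149 have an observed value of 1, so always predicting 1 would also reach an accuracy of 0.745. Looking at sensitivity and specificity separately shows that the model does much better on the observed 1s than on the observed 0s.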

Next, we rebuild the model, this time adding 25% noise:

> setCorruptionLevel(fit, x = 0.25)
> pretrain(fit)
> finetune(fit)
> predProb <- predict(fit, x_test)
> pred1 <- ifelse(predProb[, 1] >= 0.5, 1, 0)
> table(pred1, y_test[, 1], dnn = c("Predicted", "Observed"))
         Observed
Predicted   0   1
        0  15  15
        1  36 134

It appears to give us the same confusion matrix as a stacked autoencoder without any noise. So in this case, adding noise was not of much benefit.

Notes

101. See Suwicha Jirayucharoensak, Setha Pan-Ngum, and Pasin Israsena, “EEG-Based Emotion Recognition Using Deep Learning Network with Principal Component Based Covariate Shift Adaptation,” The Scientific World Journal, vol. 2014, Article ID 627892, 10 pages, 2014. doi:10.1155/2014/627892

102. See G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

103. See for example:

• Pugh, Justin K., Andrea Soltoggio, and Kenneth O. Stanley. "Real-time hebbian learning from autoencoder features for control tasks." (2014).

• Cireşan, Dan, et al. "Multi-column deep neural network for traffic sign classification." Neural Networks 32 (2012): 333-338.

104. Webb, W. B., and H. W. Agnew Jr. "Are we chronically sleep deprived?." Bulletin of the Psychonomic Society 6.1 (1975): 47-48.

105. Horne, James. Why we sleep: the functions of sleep in humans and other mammals. Oxford University Press, 1988.

106. How Much Sleep Do You Really Need? By Laura Blue Friday, June 06, 2008.

107. Why Seven Hours of Sleep Might Be Better Than Eight by Sumathi Reddy July 21, 2014.

108. Hirshkowitz, Max, et al. "National Sleep Foundation’s sleep time duration recommendations: methodology and results summary." Sleep Health 1.1 (2015): 40-43.

109. See http://www.aasmnet.org/

110. Tsinalis, Orestis, Paul M. Matthews, and Yike Guo. "Automatic Sleep Stage Scoring Using Time-Frequency Analysis and Stacked Sparse Autoencoders." Annals of biomedical engineering (2015): 1-11.

111. See:

• Goldberger, Ary L., et al. "Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals." Circulation 101.23 (2000): e215-e220.

• Also visit https://physionet.org/pn4/sleep-edfx/

112. See:

• Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in neural information processing systems 19 (2007): 153.

• Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th international conference on Machine learning. ACM, 2008.

113. See Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." The Journal of Machine Learning Research 11 (2010): 3371-3408.

114. Möckel, Thomas, et al. "Classification of grassland successional stages using airborne hyperspectral imagery." Remote Sensing 6.8 (2014): 7732-7761.

115. See Chen Xing, Li Ma, and Xiaoquan Yang, “Stacked Denoise Autoencoder Based Feature Extraction and Classification for Hyperspectral Images,” Journal of Sensors, vol. 2016, Article ID 3632943, 10 pages, 2016. doi:10.1155/2016/3632943

116. Bell, Alexander Graham. "Improvement in telegraphy." U.S. Patent No. 174,465. 7 Mar. 1876.

117. Huq, N. and Cleland, J. (1990) Bangladesh Fertility Survey, 1989. Dhaka: National Institute of Population Research and Training (NIPORT).

