Researchers Huq and Cleland117, as part of a regular survey on the fertility of women in Bangladesh, collected data on women's mobility of social freedom. The sub-sample measured the responses of 8445 rural women to questions about whether they could engage in certain activities alone (see Table 6).
Name    Description
Item 1  Go to any part of the village.
Item 2  Go outside the village.
Item 3  Talk to a man you do not know.
Item 4  Go to a cinema or cultural show.
Item 5  Go shopping.
Item 6  Go to a cooperative or mothers’ club.
Item 7  Attend a political meeting.
Item 8  Go to a health centre or hospital.
Table 6: Variables used by Huq and Cleland to measure women’s mobility of social freedom
Let’s use this sample to build our autoencoder. The data is contained in the R object Mobility, from the package ltm. We use it and the package RcppDL:
> require(RcppDL)
> require("ltm")
> data(Mobility)
> data <- Mobility
Next, we set up the data; in this example we sample 1,000 observations without replacement from the original 8445 responses. A total of 800 responses are used for the training set, with the remaining 200 observations used for the test set:
> set.seed(17)
> n = nrow(data)
> sample <- sample(1:n, 1000, FALSE)
> data <- as.matrix(Mobility[sample,])
> n = nrow(data)
> train <- sample(1:n, 800, FALSE)
Now to create the attributes for the training sample and test sample:
> x_train <- matrix(as.numeric(unlist(data[train,])), nrow = nrow(data[train,]))
> x_test <- matrix(as.numeric(unlist(data[-train,])), nrow = nrow(data[-train,]))
We now check that we have the correct sample sizes. The training set should contain 800 observations and the test set 200:
> nrow(x_train)
[1] 800
> nrow(x_test)
[1] 200
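The sampling and splitting steps above can be tried without the ltm package. The following is a minimal sketch, assuming a simulated 0/1 matrix as a stand-in for Mobility; the names sim, keep, sub, train_idx, x_train_sim and x_test_sim are hypothetical and do not appear in the original analysis.

```r
# Simulated stand-in for the Mobility data: 8445 rows of eight 0/1 items.
# (Hypothetical data; the real object comes from the ltm package.)
set.seed(17)
sim <- matrix(rbinom(8445 * 8, size = 1, prob = 0.5), ncol = 8)

# Draw 1,000 rows without replacement, then split them 800 / 200.
keep      <- sample(seq_len(nrow(sim)), 1000, replace = FALSE)
sub       <- sim[keep, ]
train_idx <- sample(seq_len(nrow(sub)), 800, replace = FALSE)

x_train_sim <- sub[train_idx, ]
x_test_sim  <- sub[-train_idx, ]

# Negative indexing guarantees the two sets do not share rows.
dim(x_train_sim)
dim(x_test_sim)
```

Calling the index vector keep rather than sample is a small design choice: it avoids creating an object with the same name as the base function sample(), which makes the listing easier to read.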
All looks good. Now we can remove the response variable from the attribute R objects. In this example we will use item 3 (Talk to a man you do not know) as the response variable. Here is how to remove it from the attribute objects:
> x_train <- x_train[, -3]
> x_test <- x_test[, -3]
The training and test attribute R objects should now look something like this:
> head(x_train)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    1    1    0    0    0    0    0
[3,]    1    1    1    0    0    0    0
[4,]    0    0    0    0    0    0    0
[5,]    1    0    0    0    0    0    0
[6,]    1    0    0    0    0    0    0
> head(x_test)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    0    0
[2,]    0    0    0    0    0    0    0
[3,]    0    0    0    0    0    0    0
[4,]    1    0    0    0    0    0    0
[5,]    0    0    0    0    0    0    0
[6,]    1    0    1    0    0    0    0
Next, we prepare the response variable for use with the RcppDL package. The denoising autoencoder is built using the Rsda function from this package. We will pass the response variable to it using two columns. First, create the response variable for the training sample:
> y_train <- data[train,3]
> temp <- ifelse(y_train==0, 1, 0)
> y_train <- cbind(y_train, temp)
Take a look at the result:
> head(y_train)
     y_train temp
1405       1    0
3960       0    1
3175       1    0
7073       1    0
7747       1    0
8113       1    0
And check we have the correct number of observations:
> nrow(y_train)
[1] 800
We follow the same procedure for the test sample:
> y_test <- data[-train,3]
> temp1 <- ifelse(y_test==0, 1, 0)
> y_test <- cbind(y_test, temp1)
> head(y_test)
     y_test temp1
3954      1     0
1579      0     1
7000      0     1
4435      1     0
7424      1     0
6764      1     0
> nrow(y_test)
[1] 200
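The two-column response built above is an indicator column followed by its complement, so every row sums to one. Since the same steps are repeated for the training and test samples, they could be wrapped in a small function. This is only a sketch; the helper two_column and the column name not_y are hypothetical and are not part of RcppDL.

```r
# Hypothetical helper: turn a 0/1 vector into the two-column layout used above.
# Column 1 is the original indicator; column 2 is its complement.
two_column <- function(y) {
  stopifnot(all(y %in% c(0, 1)))
  cbind(y = y, not_y = 1 - y)
}

# First six training responses as printed by head(y_train) above.
y_demo <- c(1, 0, 1, 1, 1, 1)
enc    <- two_column(y_demo)
enc

# Each row has exactly one 1, so the row sums are all 1.
rowSums(enc)
```

Computing the second column as 1 - y gives the same result as the ifelse() call in the text whenever the response contains only 0 and 1, which the stopifnot() line checks.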
Now we are ready to specify our model. Let’s first build a stacked autoencoder without any noise. We will use two hidden layers each containing ten nodes:
> hidden = c(10,10)
> fit <- Rsda(x_train, y_train, hidden)
The default noise level for Rsda is 30%. Since we want to begin with a regular stacked autoencoder, we set the noise to 0. Here is how to do that:
> setCorruptionLevel(fit, x = 0.0)
> summary(fit)
$PretrainLearningRate
[1] 0.1
$CorruptionLevel
[1] 0
$PretrainingEpochs
[1] 1000
$FinetuneLearningRate
[1] 0.1
$FinetuneEpochs
[1] 500
NOTE...
You can set a number of the parameters in Rsda. We used something along the lines of:
setCorruptionLevel(model, x)
You can also choose the number of epochs and learning rates for both fine tuning and pretraining:
• setFinetuneEpochs
• setFinetuneLearningRate
• setPretrainLearningRate
• setPretrainEpochs
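To make the idea of a corruption level concrete, here is a minimal base R sketch of masking noise, the form of corruption described in the denoising autoencoder papers cited in the notes: each input value is independently set to zero with probability equal to the corruption level. The helper corrupt is hypothetical, and it is an assumption that Rsda applies noise in exactly this way internally.

```r
# Hypothetical helper: masking noise for a numeric input matrix.
# Each entry is kept with probability 1 - level and set to zero otherwise.
corrupt <- function(x, level) {
  stopifnot(level >= 0, level <= 1)
  mask <- matrix(rbinom(length(x), size = 1, prob = 1 - level),
                 nrow = nrow(x), ncol = ncol(x))
  x * mask
}

set.seed(17)
# All ones, so any zero in the output was introduced by the noise.
x_demo <- matrix(1, nrow = 1000, ncol = 7)

x_clean <- corrupt(x_demo, 0)      # level 0: the input comes back unchanged
x_noisy <- corrupt(x_demo, 0.25)   # level 0.25: about a quarter of entries zeroed

# Observed share of zeroed entries; it should be close to the level requested.
mean(x_noisy == 0)
```

With a level of 0 nothing is masked, which is why setting the corruption level to zero corresponds to a stacked autoencoder without noise.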
The next step is to pretrain and fine tune the model. This is fairly straightforward:
> pretrain(fit)
> finetune(fit)
Since the sample is small the model converges pretty quickly. Let’s take a look at the predicted probabilities for the response variable using the test sample:
> predProb <- predict(fit, x_test)
> head(predProb,6)
          [,1]      [,2]
[1,] 0.4481689 0.5518311
[2,] 0.4481689 0.5518311
[3,] 0.4481689 0.5518311
[4,] 0.6124651 0.3875349
[5,] 0.4481689 0.5518311
[6,] 0.8310412 0.1689588
So, we see for the first three observations the model predicts approximately a 45% probability that they belong to class 1 and 55% that they belong to class 2. Let’s take a peek to see how it did:
> head(y_test,3)
     y_test temp1
3954      1     0
1579      0     1
7000      0     1
It missed the first observation! However, it classified the second and third observations correctly. Finally, we construct the confusion matrix:
> pred1 <- ifelse(predProb[,1]>=0.5, 1, 0)
> table(pred1, y_test[,1], dnn = c("Predicted", "Observed"))
         Observed
Predicted   0   1
        0  15  15
        1  36 134
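The thresholding rule can be checked by hand on the six test cases printed earlier. The sketch below copies the first-column probabilities from head(predProb, 6) and the first column of head(y_test); the object names p_class1, observed, predicted and small_tab are introduced here for illustration only.

```r
# First-column probabilities printed by head(predProb, 6) above.
p_class1 <- c(0.4481689, 0.4481689, 0.4481689, 0.6124651, 0.4481689, 0.8310412)
# First column of head(y_test) above.
observed <- c(1, 0, 0, 1, 1, 1)

# Same rule as in the text: predict 1 when the probability is at least 0.5.
predicted <- ifelse(p_class1 >= 0.5, 1, 0)

# Cross-tabulate the six cases in the same layout as the full confusion matrix.
small_tab <- table(predicted, observed, dnn = c("Predicted", "Observed"))
small_tab

# Share of the six cases classified correctly.
mean(predicted == observed)
```

Four of these six cases are classified correctly; the two misses are the first and fifth observations, where the observed value is 1 but the predicted probability is below 0.5.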
Next, we rebuild the model, this time adding 25% noise:
> setCorruptionLevel(fit, x = 0.25)
> pretrain(fit)
> finetune(fit)
> predProb <- predict(fit, x_test)
> pred1 <- ifelse(predProb[,1]>=0.5, 1, 0)
> table(pred1, y_test[,1], dnn = c("Predicted", "Observed"))
         Observed
Predicted   0   1
        0  15  15
        1  36 134
It appears to give us the same confusion matrix as a stacked autoencoder without any noise. So in this case, adding noise was not of much benefit.
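A confusion matrix is easier to compare across models once it is reduced to a few summary rates. The following sketch re-enters the matrix reported above by hand and computes overall accuracy together with a simple baseline, the accuracy obtained by always predicting the more common observed class. The names cm, accuracy, baseline, sensitivity and specificity are introduced here for illustration.

```r
# Confusion matrix as reported above (rows = predicted, columns = observed).
cm <- matrix(c(15, 36, 15, 134), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Observed = c("0", "1")))

# Overall accuracy: correctly classified cases on the diagonal over all cases.
accuracy <- sum(diag(cm)) / sum(cm)        # (15 + 134) / 200 = 0.745

# Baseline: always predict the most frequent observed class.
baseline <- max(colSums(cm)) / sum(cm)     # 149 / 200 = 0.745

# Per-class rates: share of observed 1s and observed 0s that were recovered.
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["0", "0"] / sum(cm[, "0"])

c(accuracy = accuracy, baseline = baseline,
  sensitivity = sensitivity, specificity = specificity)
```

On these reported counts the accuracy equals the majority-class baseline, so the per-class rates are the more informative numbers to watch when changing the noise level or the hidden layers.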
Notes
101 See Suwicha Jirayucharoensak, Setha Pan-Ngum, and Pasin Israsena, “EEG-Based Emotion Recognition Using Deep Learning Network with Principal Component Based Covariate Shift Adaptation,” The Scientific World Journal, vol. 2014, Article ID 627892, 10 pages, 2014. doi:10.1155/2014/627892
102 See G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
103 See for example:
• Pugh, Justin K., Andrea Soltoggio, and Kenneth O. Stanley. "Real-time Hebbian learning from autoencoder features for control tasks." (2014).
• Cireşan, Dan, et al. "Multi-column deep neural network for traffic sign classification." Neural Networks 32 (2012): 333-338.
104 Webb, W. B., and H. W. Agnew Jr. "Are we chronically sleep deprived?" Bulletin of the Psychonomic Society 6.1 (1975): 47-48.
105 Horne, James. Why we sleep: the functions of sleep in humans and other mammals. Oxford University Press, 1988.
106 How Much Sleep Do You Really Need? By Laura Blue, Friday, June 06, 2008.
107 Why Seven Hours of Sleep Might Be Better Than Eight, by Sumathi Reddy, July 21, 2014.
108 Hirshkowitz, Max, et al. "National Sleep Foundation’s sleep time duration recommendations: methodology and results summary." Sleep Health 1.1 (2015): 40-43.
109 See http://www.aasmnet.org/
110 Tsinalis, Orestis, Paul M. Matthews, and Yike Guo. "Automatic Sleep Stage Scoring Using Time-Frequency Analysis and Stacked Sparse Autoencoders." Annals of Biomedical Engineering (2015): 1-11.
111 See:
• Goldberger, Ary L., et al. "Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals." Circulation 101.23 (2000): e215-e220.
• Also visit https://physionet.org/pn4/sleep-edfx/
112 See:
• Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
• Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
113 See Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." The Journal of Machine Learning Research 11 (2010): 3371-3408.
114 Möckel, Thomas, et al. "Classification of grassland successional stages using airborne hyperspectral imagery." Remote Sensing 6.8 (2014): 7732-7761.
115 See Chen Xing, Li Ma, and Xiaoquan Yang, “Stacked Denoise Autoencoder Based Feature Extraction and Classification for Hyperspectral Images,” Journal of Sensors, vol. 2016, Article ID 3632943, 10 pages, 2016. doi:10.1155/2016/3632943
116 Bell, Alexander Graham. "Improvement in telegraphy." U.S. Patent No. 174,465. 7 Mar. 1876.
117 Huq, N. and Cleland, J. (1990) Bangladesh Fertility Survey, 1989. Dhaka: National Institute of Population Research and Training (NIPORT).