DBN is generally regarded as one of the best-known deep learning models [178]. It has proven its ability to discover better discriminative features and, consequently, to improve accuracy [170]. Furthermore, DBN has shown outstanding performance on visual object recognition and image denoising [179]. However, DBN has not yet been applied to feature selection for regression. The novel unsupervised feature selection framework, DFSFR, utilises a deep belief network to select discriminative antibody features and then applies SVR to perform the regression task. DFSFR is therefore a multi-level feature selection framework that incorporates deep learning and SVR in order to select the most discriminative features from high-dimensional data. The proposed unsupervised feature selection framework, DFSFR, is shown in Fig. 5.1. DFSFR takes the input variables and feeds them into the deep belief network; the DBN then uses the weights of its hidden nodes to produce weights for the features. Next, the features are prioritised according to their weights. SVR then takes the ranked features to generate a predictive model and produce the estimated output variables. Finally, evaluation metrics are used to assess the effectiveness of the proposed method.

Figure 5.1: DFSFR Framework (a) multi-output (b) single-output. h represents hidden neurons.
DBN incorporates simple learning modules, Restricted Boltzmann Machines (RBMs), which consist of visible and hidden layers that represent features. These hidden and visible layers are connected by symmetrical weights. The input layer is represented by $h^0$, and the last hidden layer, $h^l$, computes the output by utilising the output of the previous layer, $h^{l-1}$. Therefore, the output can be calculated from the following formula [169]:
\[ h^l = \phi\left(b^l + W^l h^{l-1}\right) \tag{5.1} \]
where $b^l$ is a vector of offsets, $W^l$ is a matrix of weights, and $\phi$ is the activation function. The output layer is used to make predictions. For quantitative prediction or regression tasks the output is:
\[ h^l = \alpha_{0k} + \alpha_k\,\phi\left(b^l_i + W^l_i h^{l-1}\right) \tag{5.2} \]
where $W^l_i$ is the $i$th row of $W^l$, $\alpha_{0k}$ is the bias, and $\alpha_k$ represents a set of weights between the last and next-to-last layers. The probability of the visible and hidden neuron vectors for a DBN can be calculated by:
\[ P(v, h^1, \ldots, h^l) = P(h^{l-1}, h^l) \left( \prod_{k=0}^{l-2} P(h^k \mid h^{k+1}) \right) \tag{5.3} \]
where $v = h^0$ is the vector of visible units, $P(h^{k-1} \mid h^k)$ is the conditional probability of the visible units given the hidden units of the RBM at level $k$, and $P(h^{l-1}, h^l)$ is the joint distribution of the top-level RBM. A general representation of a DBN with an input layer and $l$ hidden layers is shown in Fig. 5.2. The last two layers comprise an RBM. Weight updates for a single RBM are performed with gradient descent or ascent; the only difference is the sign, plus or minus, used to perform the update.
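\[ W_{ij}(t+1) = W_{ij}(t) + \eta\,\frac{\partial \log p(v)}{\partial W_{ij}} \tag{5.4} \]
where $p(v)$ is the probability of a visible vector, $\eta$ is a parameter with a small value, and $\partial \log p(v) / \partial W_{ij}$ is the gradient, which can also be calculated as [180]:
\[ \frac{\partial \log p(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \tag{5.5} \]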
where $\langle \cdot \rangle_p$ represents the average with respect to distribution $p$.
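The update rule in Equations (5.4) and (5.5) can be illustrated on a model small enough to enumerate exactly. The following is a minimal sketch, not the implementation used in this work: it assumes the binary RBM energy given later in Equation (5.8), updates only the weights (biases are held at zero), and computes both expectations by brute force instead of by sampling. The function names, the toy data, and the step size `eta=0.5` are illustrative assumptions.

```python
import itertools
import math

def states(n):
    """All binary vectors of length n."""
    return list(itertools.product([0, 1], repeat=n))

def energy(v, h, W, a, b):
    """Binary RBM energy: -sum_i b_i v_i - sum_j a_j h_j - sum_ij w_ij v_i h_j."""
    D, F = len(v), len(h)
    return -(sum(b[i] * v[i] for i in range(D))
             + sum(a[j] * h[j] for j in range(F))
             + sum(W[i][j] * v[i] * h[j] for i in range(D) for j in range(F)))

def joint(W, a, b):
    """Exact p(v, h) for every state pair (feasible only for tiny models)."""
    D, F = len(b), len(a)
    unnorm = {(v, h): math.exp(-energy(v, h, W, a, b))
              for v in states(D) for h in states(F)}
    Z = sum(unnorm.values())  # partition function
    return {vh: u / Z for vh, u in unnorm.items()}

def log_likelihood(data, W, a, b):
    """Average log p(v) over the data set."""
    p = joint(W, a, b)
    hs = states(len(a))
    return sum(math.log(sum(p[(v, h)] for h in hs)) for v in data) / len(data)

def gradient(data, W, a, b):
    """<v_i h_j>_data - <v_i h_j>_model for every weight w_ij."""
    D, F = len(b), len(a)
    p = joint(W, a, b)
    hs = states(F)
    grad = [[0.0] * F for _ in range(D)]
    # Model term: expectation under the joint p(v, h).
    for (v, h), prob in p.items():
        for i in range(D):
            for j in range(F):
                grad[i][j] -= prob * v[i] * h[j]
    # Data term: expectation under p(h | v), averaged over the data.
    for v in data:
        pv = sum(p[(v, h)] for h in hs)
        for h in hs:
            post = p[(v, h)] / pv
            for i in range(D):
                for j in range(F):
                    grad[i][j] += post * v[i] * h[j] / len(data)
    return grad

def train(data, W, a, b, eta=0.5, steps=20):
    """Gradient ascent on log p(v); only the weights W are updated here."""
    W = [row[:] for row in W]
    for _ in range(steps):
        g = gradient(data, W, a, b)
        for i in range(len(W)):
            for j in range(len(W[0])):
                W[i][j] += eta * g[i][j]
    return W

# Toy data: two repeated binary patterns over three visible units.
data = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1)]
W0 = [[0.1, -0.1], [0.2, -0.2], [-0.1, 0.1]]  # small asymmetric start
a, b = [0.0, 0.0], [0.0, 0.0, 0.0]
W1 = train(data, W0, a, b)
ll_before = log_likelihood(data, W0, a, b)
ll_after = log_likelihood(data, W1, a, b)
```

The plus sign in the update makes this gradient ascent on the log-likelihood, so `ll_after` should exceed `ll_before`. For realistic model sizes the model expectation cannot be enumerated and is typically approximated by sampling.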
In an RBM, weight updates are defined by Equation (5.4). The probability of a visible vector, $p(v)$, can be calculated from:
\[ p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)} \tag{5.6} \]
where $Z$ is the partition function and $E(v,h)$ is the energy function assigned to the state of the network. Therefore, the probability of each pair of hidden and visible vectors can be defined as:
\[ p(v,h;\theta) = \frac{1}{Z(\theta)}\, e^{-E(v,h;\theta)} \tag{5.7} \]
where $Z(\theta) = \sum_h \sum_v \exp(-E(v,h;\theta))$ and the energy function is:
\[ E(v,h) = -a^T h - b^T v - v^T W h = -\sum_i b_i v_i - \sum_j a_j h_j - \sum_{i,j} w_{ij} v_i h_j \tag{5.8} \]
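where $b_i$ and $a_j$ are the biases of the visible inputs, $v_i$, and the hidden variables, $h_j$, respectively, and $w_{ij}$ are the weights between units of the two layers. By substituting Equation (5.8) into Equation (5.7) and summing over the hidden units, the probability of a visible vector can be written as [181]:
\[
\begin{aligned}
p(v;\theta) &= \frac{\sum_h e^{-E(v,h;\theta)}}{\sum_{v,h} e^{-E(v,h;\theta)}} \\
&= \frac{1}{Z(\theta)} \sum_h \exp\left(v^T W h + b^T v + a^T h\right) \\
&= \frac{1}{Z(\theta)} \exp(b^T v) \prod_{j=1}^{F} \sum_{h_j \in \{0,1\}} \exp\Big(a_j h_j + \sum_{i=1}^{D} w_{ij} v_i h_j\Big) \\
&= \frac{1}{Z(\theta)} \exp(b^T v) \prod_{j=1}^{F} \Big(1 + \exp\Big(a_j + \sum_{i=1}^{D} w_{ij} v_i\Big)\Big)
\end{aligned}
\tag{5.9}
\]
where $D$ and $F$ are the numbers of visible and hidden units, respectively.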
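The last step of Equation (5.9), where the sum over all hidden configurations collapses into a product over hidden units, can be checked numerically. The sketch below is illustrative only: it compares the unnormalised marginal obtained by explicit summation with the closed-form product for a small binary RBM. The function names and parameter values are assumptions made for the example, and the common factor $1/Z(\theta)$ is omitted from both sides.

```python
import itertools
import math

def marginal_by_summation(v, W, a, b):
    """Unnormalised p(v): sum over all hidden states h of exp(-E(v, h))."""
    D, F = len(v), len(a)
    total = 0.0
    for h in itertools.product([0, 1], repeat=F):
        neg_energy = (sum(b[i] * v[i] for i in range(D))
                      + sum(a[j] * h[j] for j in range(F))
                      + sum(W[i][j] * v[i] * h[j]
                            for i in range(D) for j in range(F)))
        total += math.exp(neg_energy)
    return total

def marginal_factorised(v, W, a, b):
    """Unnormalised p(v): exp(b^T v) * prod_j (1 + exp(a_j + sum_i w_ij v_i))."""
    D, F = len(v), len(a)
    result = math.exp(sum(b[i] * v[i] for i in range(D)))
    for j in range(F):
        result *= 1.0 + math.exp(a[j] + sum(W[i][j] * v[i] for i in range(D)))
    return result

# Arbitrary illustrative parameters: D = 3 visible units, F = 2 hidden units.
W = [[0.5, -1.0], [1.5, 0.25], [-0.75, 0.4]]
a = [0.1, -0.2]
b = [0.0, 0.3, -0.6]

agreement = all(
    math.isclose(marginal_by_summation(v, W, a, b),
                 marginal_factorised(v, W, a, b))
    for v in itertools.product([0, 1], repeat=len(b))
)
```

The summation costs $2^F$ terms per visible vector, whereas the product costs $F$ terms, which is why the factorised form is the one used in practice.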
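By utilising the energy function, the following conditional distributions can be defined:
\[ p(v \mid h;\theta) = \prod_i P(v_i \mid h) \quad\text{and}\quad P(v_i = 1 \mid h) = \phi\Big(b_i + \sum_j w_{ij} h_j\Big) \tag{5.10} \]
\[ p(h \mid v;\theta) = \prod_j P(h_j \mid v) \quad\text{and}\quad P(h_j = 1 \mid v) = \phi\Big(a_j + \sum_i w_{ij} v_i\Big) \tag{5.11} \]
where $\phi$ is the sigmoid function, which can be calculated from:
\[ \phi(x) = \frac{1}{1 + e^{-x}} \tag{5.12} \]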
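Because the conditionals in Equations (5.10) and (5.11) factorise over units, a whole layer can be sampled in one pass given the other layer. The following minimal sketch shows one such alternating (Gibbs) step for a binary RBM, with bias $b_i$ attached to visible unit $i$ and $a_j$ to hidden unit $j$. The function names, the parameter values, and the fixed random seed are illustrative assumptions, not part of the proposed method.

```python
import math
import random

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def p_hidden_given_visible(v, W, a):
    """P(h_j = 1 | v) for each hidden unit j."""
    return [sigmoid(a[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(a))]

def p_visible_given_hidden(h, W, b):
    """P(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def sample_bernoulli(probs, rng):
    """Draw one independent 0/1 sample per probability."""
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_step(v, W, a, b, rng):
    """Sample h given v, then a new v given that h."""
    h = sample_bernoulli(p_hidden_given_visible(v, W, a), rng)
    v_new = sample_bernoulli(p_visible_given_hidden(h, W, b), rng)
    return v_new, h

# Arbitrary illustrative parameters: 2 visible units, 2 hidden units.
W = [[0.5, -1.0], [1.5, 0.25]]
a = [0.1, -0.2]
b = [0.0, 0.3]
rng = random.Random(0)  # fixed seed so that repeated runs agree
v_next, h_sample = gibbs_step([1, 0], W, a, b, rng)
```

Repeating `gibbs_step` yields samples that can be used to approximate the model expectation in Equation (5.5) when exact enumeration is not feasible.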
However, this energy function is not applicable to regression tasks, where continuous data are used. Therefore, the RBM needs to be modified in order to deal with regression tasks. The energy function can be revised by replacing the binary visible units with linear units with independent Gaussian noise, so that the RBM can handle continuous-valued data [182]. This model is called the Gaussian-Bernoulli Restricted Boltzmann Machine (GBRBM) [183]. The energy function for real-valued data can be calculated from:
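\[ E(v,h;\theta) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_j a_j h_j - \sum_{i,j} w_{ij} \frac{v_i}{\sigma_i} h_j \tag{5.13} \]
where $\theta = \{W, a, b, \sigma^2\}$ is the set of model parameters and $\sigma_i^2$ is the variance of the visible (input) variable $v_i$.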
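As a concrete reading of Equation (5.13), the sketch below evaluates the Gaussian-Bernoulli energy for real-valued visible units and binary hidden units. It also computes the hidden activation probability that follows from this energy, $P(h_j = 1 \mid v) = \phi(a_j + \sum_i w_{ij} v_i / \sigma_i)$. The function names and the numeric values are illustrative assumptions, and `sigma` holds the standard deviations $\sigma_i$, not the variances.

```python
import math

def gbrbm_energy(v, h, W, a, b, sigma):
    """Energy of a Gaussian-Bernoulli RBM state.

    v: real-valued visible units, h: binary hidden units,
    W[i][j]: weight between visible i and hidden j,
    a: hidden biases, b: visible biases, sigma: visible standard deviations.
    """
    D, F = len(v), len(h)
    quadratic = sum((v[i] - b[i]) ** 2 / (2.0 * sigma[i] ** 2) for i in range(D))
    hidden_bias = sum(a[j] * h[j] for j in range(F))
    interaction = sum(W[i][j] * (v[i] / sigma[i]) * h[j]
                      for i in range(D) for j in range(F))
    return quadratic - hidden_bias - interaction

def gbrbm_hidden_prob(v, W, a, sigma):
    """P(h_j = 1 | v): sigmoid of the bias plus the sigma-scaled weighted input."""
    return [1.0 / (1.0 + math.exp(-(a[j] + sum(W[i][j] * v[i] / sigma[i]
                                                for i in range(len(v))))))
            for j in range(len(a))]

# One visible unit, one hidden unit, hand-checkable values.
W = [[0.3]]
a = [0.5]
b = [1.0]
e_unit_sigma = gbrbm_energy([2.0], [1], W, a, b, sigma=[1.0])  # 0.5 - 0.5 - 0.6
e_wide_sigma = gbrbm_energy([2.0], [1], W, a, b, sigma=[2.0])  # 0.125 - 0.5 - 0.3
```

A larger $\sigma_i$ flattens the quadratic term and scales down the contribution of $v_i$ to the hidden units, which is why real-valued inputs are commonly standardised before training.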
After this modification of the RBM, the DBN is capable of handling real-valued data. The proposed model passes the given data as input to the DBN, and the DBN generates the RBM weight matrix, $W$, of dimension (number of hidden units, number of inputs). Then, DFSFR assigns feature weights, $G$, according to the following formula:
\[ G_j = \frac{\sum_{i=1}^{d} W_i}{h} \tag{5.14} \]
where $d$ is the number of features, $h$ is the number of hidden neurons, and $W$ represents a weight vector. Finally, SVR or MSVR takes the vector $G$, performs regression, and the prediction performance of the model is calculated by utilising evaluation metrics, e.g., RMSE.
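The weighting and ranking step can be sketched as follows. This is one plausible reading of Equation (5.14), stated here as an assumption: the weight of feature $j$ is taken as the average, over the $h$ hidden units, of the RBM weights attached to input $j$, with $W$ stored as $h$ rows of $d$ entries as described above. The function names and the toy matrix are hypothetical, and the subsequent SVR step is not shown.

```python
def feature_weights(W):
    """Average each input's weights over the hidden units.

    W has one row per hidden unit and one column per input feature
    (shape: number of hidden units x number of inputs).
    Assumed reading of Eq. (5.14): G_j = (1 / h) * sum_k W[k][j].
    """
    h = len(W)
    d = len(W[0])
    return [sum(W[k][j] for k in range(h)) / h for j in range(d)]

def rank_features(G):
    """Feature indices ordered from the largest weight to the smallest."""
    return sorted(range(len(G)), key=lambda j: G[j], reverse=True)

# Toy weight matrix: 2 hidden units, 3 input features.
W = [[0.2, -0.1, 0.5],
     [0.4, 0.3, 0.3]]
G = feature_weights(W)      # approximately [0.3, 0.1, 0.4]
ranking = rank_features(G)  # feature 2 first, then 0, then 1
top_two = ranking[:2]
```

Whether signed weights or their magnitudes should be averaged is a design choice that the formula does not settle; the sketch uses signed values.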
5.4 A Hybrid Unsupervised Feature Selection Method (DKBFS)
In Chapter 4, the KBFS framework is presented. In this chapter, a novel deep learning based unsupervised feature selection method, DFSFR, is proposed. Experimental results, which are shown in the next section, indicate that the proposed methods produce better results than the state-of-the-art unsupervised feature selection methods. Since the GSE44763 and GSE40279 data sets are considered to be ultra-high dimensional, KBFS is utilised as a pre-processing step of the DFSFR method. Therefore, a hybrid method that combines both KBFS and DFSFR, abbreviated as DKBFS, is proposed. This hybrid method achieved the best results on the GSE40279 data set and the second-best result on the GSE44763 data set (where the best result is achieved by DFSFR). The experiments are conducted on the GSE44763 and GSE40279 data sets and are presented in the next section.

Figure 5.2: General Representation of DBN.
(The top two layers constitute an RBM. The $W$s represent weights between units of layers and the $W'$s are the transposes of the $W$s.)
DKBFS integrates the KBFS and DFSFR methods, where KBFS is used as a pre-filtering step for DFSFR. A user-defined number of features is eliminated by KBFS, and the selected features are used as input variables for DFSFR. Then, DFSFR generates the feature weights. The weighted features are then used as input variables of SVR to construct a predictive model.
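The two-stage structure described above can be sketched as a simple composition of a pre-filter and a weighting step. The code below is only a structural illustration: `variance_scores` is a stand-in for KBFS and `mean_abs_weights` is a stand-in for the DBN-based DFSFR weighting. Neither reproduces the actual methods, and all names and data are hypothetical.

```python
def variance_scores(X):
    """Stand-in pre-filter score (NOT KBFS): population variance per feature."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean = sum(column) / n
        scores.append(sum((x - mean) ** 2 for x in column) / n)
    return scores

def mean_abs_weights(X):
    """Stand-in weighting step (NOT a DBN): mean absolute value per feature."""
    n = len(X)
    return [sum(abs(row[j]) for row in X) / n for j in range(len(X[0]))]

def dkbfs(X, n_keep, prefilter=variance_scores, weigher=mean_abs_weights):
    """Two-stage selection: pre-filter to n_keep features, then weight them.

    Returns the indices of the retained features (in original order)
    and one weight per retained feature.
    """
    scores = prefilter(X)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(order[:n_keep])
    X_reduced = [[row[j] for j in kept] for row in X]
    return kept, weigher(X_reduced)

# Toy data: 3 samples, 3 features; the middle feature is constant.
X = [[1.0, 5.0, 0.0],
     [2.0, 5.0, 1.0],
     [3.0, 5.0, 0.0]]
kept, G = dkbfs(X, n_keep=2)
```

Passing the two stages as functions keeps the composition independent of how each stage is implemented, so the stand-ins can be replaced by the real KBFS and DFSFR steps without changing `dkbfs`.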
Figure 5.3: The Flowchart of DKBFS.
(DKBFS is a hybrid unsupervised feature selection method where KBFS is embedded into DFSFR.)