4.4 DL-MDS Scheme

4.4.2 Topic Model Module Generation

Problem Statement

Patients may ask different questions even if they suffer from the same disease. Human readers can easily understand these questions by picking out the important sentences and the informative keywords they contain. For example, for the question illustrated in Fig 4.5, readers understand that it is a lung cancer related question because of informative keywords such as “ct-scan”, “benign” and “lung cancer” in the 1st and 2nd sentences.

Thus, we want to design a model that performs similarly to a human reader, that is, one that can interpret the contributing factors in the input data, such as important sentences and keywords. Specifically, given the input sentences in a user’s question, the pre-trained model within our topic model module identifies important sentences with their associated score vectors and also identifies informative keywords. In this subsection, we demonstrate how to extract important sentences and informative keywords from patients’ questions via an LSTM model with a convolutional neural network (CNN) based word embedding method (shown in Fig 4.6).

Informative Representations Extraction via CNN based Word Embedding

(1) Sentence Matrix Layer (Layer1)

The inputs to the network are raw sentences that need to be translated into real-valued feature vectors, which will be processed by subsequent layers of the network. Specifically, each input is a sentence $s$ treated as a sequence of words, $s = \{word_1, \ldots, word_{|s|}\}$, where the mapping from words to their word embeddings is performed by using Word2Vec or Skip-gram. Hence, for each input sentence $s$ we build a sentence matrix $X_{1:|s|}$ where each row corresponds to a word embedding vector. For example, a sentence of length $|s|$ can be expressed as:

$$X_{1:|s|} = X_1 \oplus X_2 \oplus \ldots \oplus X_{|s|} \tag{4.3}$$

where $X_i$ is an $n$-dimensional vector of the $i$th word ($word_i$) in the corresponding sentence and $\oplus$ represents the concatenation operator.

(2) Convolutional Layer (Layer2)

The aim of the convolutional layer is to capture patterns, i.e., discriminative word sequences that are common among the training instances of each disease cluster. In order to form a richer representation of the data, our model applies $M$ filters that work in parallel to generate multiple feature maps. For example, in Fig 4.6, we slide 2 filters over the word matrices, where the first one slides over 2 words each time and the second one slides over 3 words each time. For each convolution filter $F_k \in \mathbb{R}^{h \times n}$, we generate a new feature map, where $h$ is the length of the sliding window and $n$ is the size of the input embedding. For each sliding window $X_{i:i+h-1}$, we compute:

$$CF_k^i = f(W_k \ast X_{i:i+h-1} + b_k) \tag{4.4}$$

where $\ast$ denotes the convolution operation between an input matrix and a filter, $f$ is an element-wise nonlinear transformation, $W_k$ is the weight factor, $X_{i:i+h-1}$ is a matrix slice of size $h$ along the rows, and $b_k$ is the bias. Then, filter $F_k$ is applied to all possible windows in the sentence, $\{X_{1:h}, \ldots, X_{|s|-h+1:|s|}\}$, to generate the characteristic mapping $CF_k = [CF_k^1, \ldots, CF_k^{|s|-h+1}]$. Note that each component is the result of computing an element-wise product between a row slice of $X$ and a filter matrix $F_k$. After this step, we obtain $M$ characteristic mappings, one for each filter.
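The sentence matrix of Layer 1 and the sliding-window computation of Layer 2 can be sketched as follows. This is a minimal illustration, not the trained model: the embedding values, the filter weights, the choice $f = \tanh$, and the function names `sentence_matrix` and `feature_map` are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sentence_matrix(words: List[str], embeddings: Dict[str, Vector]) -> Matrix:
    """Stack one n-dimensional embedding per word, row by row (Eq. 4.3)."""
    return [embeddings[word] for word in words]


def feature_map(X: Matrix, W: Matrix, b: float) -> Vector:
    """Apply one h x n filter to every window X[i:i+h] (Eq. 4.4)."""
    h, n = len(W), len(W[0])
    activations = []
    for i in range(len(X) - h + 1):
        window = X[i:i + h]
        # Element-wise product of window and filter, summed to a scalar.
        z = sum(W[r][c] * window[r][c] for r in range(h) for c in range(n))
        activations.append(math.tanh(z + b))  # assumption: f = tanh
    return activations


# Hypothetical 2-dimensional embeddings; real ones would come from Word2Vec.
embeddings = {
    "ct-scan": [1.0, 0.0],
    "benign": [0.0, 1.0],
    "lung": [1.0, 1.0],
    "cancer": [0.5, 0.5],
}
X = sentence_matrix(["ct-scan", "benign", "lung", "cancer"], embeddings)

# M = 2 filters: one spanning 2 words, one spanning 3 words (as in Fig 4.6).
filters: List[Tuple[Matrix, float]] = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),
    ([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], 0.0),
]
feature_maps = [feature_map(X, W, b) for W, b in filters]
```

For a sentence of 4 words, the filter with $h = 2$ yields $|s| - h + 1 = 3$ activations and the filter with $h = 3$ yields 2, which matches the length of each characteristic mapping defined above.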

Figure 4.6: Topic Model Generation

The outputs from the convolutional layer are then passed to the pooling layer, whose goal is to aggregate the information and reduce the representation. Both average and max pooling exhibit certain disadvantages: (i) in average pooling, all elements of the input are treated equally, although we may want to give more weight to certain elements; (ii) max pooling may lead to overfitting on the training set. We therefore use both K-max pooling [84] and average pooling to generate local alignment representations, which contain important information about the corresponding questions.
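The two pooling operations can be illustrated with a short sketch. K-max pooling is taken here in its usual sense (keep the $k$ largest activations in their original order). How the two pooled outputs are combined is not specified in the text, so the concatenation in `pooled_representation` is an assumption, as are the function names.

```python
from typing import List


def k_max_pooling(feature_map: List[float], k: int) -> List[float]:
    """Keep the k largest activations, preserving their original order."""
    if k >= len(feature_map):
        return list(feature_map)
    by_value = sorted(range(len(feature_map)),
                      key=lambda i: feature_map[i], reverse=True)
    kept = sorted(by_value[:k])  # restore positional order
    return [feature_map[i] for i in kept]


def average_pooling(feature_map: List[float]) -> float:
    """Mean of all activations in one feature map."""
    return sum(feature_map) / len(feature_map)


def pooled_representation(feature_maps: List[List[float]], k: int) -> List[float]:
    """Assumed combination: k-max values followed by the average, per map."""
    representation: List[float] = []
    for fm in feature_maps:
        representation.extend(k_max_pooling(fm, k))
        representation.append(average_pooling(fm))
    return representation


example = [0.1, 0.9, 0.3, 0.7]
top_two = k_max_pooling(example, 2)       # the two strongest activations
mean_value = average_pooling(example)     # every activation weighted equally
combined = pooled_representation([example], 2)
```

In this toy input, K-max pooling retains the activations 0.9 and 0.7, while average pooling summarises the whole map with a single value.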

Finally, the output of the pooling layer is fed into an LSTM model.

Important Sentences Extraction via LSTM

Since not every sentence in the input questions is useful for the prediction task, we extract the important ones using an LSTM model, where each time step of the model takes one sentence as input.

Thus, to identify important sentences we need to select influential time steps, which can be evaluated based on their corresponding gradient variables. By studying those gradients in the network, we can gain some insights into the internal mechanisms and identify important sentences. For example, at time step $t$, the model takes a sentence as input $x_t$, and updates the hidden state $h_{t-1}$ to $h_t$ using:

$$h_t = g(W x_t + U h_{t-1} + b) \tag{4.5}$$
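The update in Eq. 4.5 can be sketched for a sequence of sentence vectors as follows. This is only an illustration of the recurrence: the choice $g = \tanh$, the toy weights, and the names `recurrent_step` and `run_sequence` are assumptions, and the gating of a full LSTM cell is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def recurrent_step(W: Matrix, U: Matrix, b: Vector,
                   x_t: Vector, h_prev: Vector) -> Vector:
    """h_t = g(W x_t + U h_{t-1} + b), with g = tanh assumed."""
    Wx, Uh = matvec(W, x_t), matvec(U, h_prev)
    return [math.tanh(wx + uh + bias) for wx, uh, bias in zip(Wx, Uh, b)]


def run_sequence(W: Matrix, U: Matrix, b: Vector,
                 sentences: List[Vector], h0: Vector) -> List[Vector]:
    """Feed one sentence vector per time step and collect every h_t."""
    states, h = [], h0
    for x_t in sentences:
        h = recurrent_step(W, U, b, x_t, h)
        states.append(h)
    return states


# Toy parameters and two hypothetical sentence vectors.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
sentences = [[0.5, -0.5], [1.0, 0.0]]
states = run_sequence(W, U, b, sentences, h0=[0.0, 0.0])
```

Each entry of `states` is the hidden state after one sentence, so a question with two sentences produces two time steps.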

Figure 4.7: Back-propagation (unrolled over time steps 1 to n; inset: an LSTM time step)

In order to infer the importance of a time step, it is necessary to explore the gradient variables such as $\delta W$ and $\delta U$, since high values for these variables indicate that this time step is important for the final prediction task.

Before we describe the computation process, we first give the definitions of various notations we use:

• $a_t$ represents the input activation

• $i_t$ represents the input gate variable

• $f_t$ represents the forget gate variable

• $o_t$ represents the output gate variable

• $CL_t$ represents the cell state

• $\cdot$ represents the inner product

• $\otimes$ represents the outer product

• $\sigma$ represents the sigmoid function: $\sigma(x) = \frac{1}{1+e^{-x}}$

• $\odot$ represents the element-wise product or Hadamard product
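The three products and the sigmoid function in the list above differ only in the shape of their result, which a small sketch makes concrete. The function names are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def inner(u: Vector, v: Vector) -> float:
    """Inner product: two vectors in, one scalar out."""
    return sum(a * b for a, b in zip(u, v))


def outer(u: Vector, v: Vector) -> List[Vector]:
    """Outer product: a len(u) x len(v) matrix."""
    return [[a * b for b in v] for a in u]


def hadamard(u: Vector, v: Vector) -> Vector:
    """Element-wise (Hadamard) product: a vector of the same length."""
    return [a * b for a, b in zip(u, v)]


def sigmoid(x: float) -> float:
    """sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


u, v = [1.0, 2.0], [3.0, 4.0]
scalar = inner(u, v)
matrix = outer(u, v)
vector = hadamard(u, v)
```

The outer product is what turns a vector of gate gradients and an input vector into a gradient with the shape of a weight matrix.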

During a forward propagation process, given the inputs we can compute $a_t$, $i_t$, $f_t$, $o_t$, $CL_t$ and $h_t$ for each time step $t$. In this work, we derive the network gradients analytically based on the back-propagation computation from the cell outputs all the way to the cell inputs (shown in Fig 4.7). Thus, in an LSTM time step $t$, we can compute the weight gradients ($\delta W_t$, $\delta U_t$) as follows:

Table 4.2: Top 10 Informative Keywords for a Liver Disorder Disease

Disease         | Top 10 Informative Keywords
Liver Disorder  | liver, respiratory, kidney, esophagus, pancreas, cardiomyopathy, aorta, bowel, aneurysm, clot

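$$\delta W_t = [\delta a_t \odot (1 - \tanh^2(a_t)),\ \delta i_t \odot i_t \odot (1 - i_t),\ \delta f_t \odot f_t \odot (1 - f_t),\ \delta o_t \odot o_t \odot (1 - o_t)]^T \otimes x_t \tag{4.6}$$

$$\delta U_t = [\delta a_t \odot (1 - \tanh^2(a_t)),\ \delta i_t \odot i_t \odot (1 - i_t),\ \delta f_t \odot f_t \odot (1 - f_t),\ \delta o_t \odot o_t \odot (1 - o_t)]^T \otimes h_{t-1} \tag{4.7}$$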

where all necessary variables can be computed via Algorithm 1: (i) at step1, we compute {δot, δCLt}; (ii) at step2, we compute {δat, δit, δft}.

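Algorithm 1 Gradients Computation

Input: ForwardPass $CL_t = i_t \odot a_t + f_t \odot CL_{t-1}$; $h_t = o_t \odot \tanh(CL_t)$;
(1) step1: Given $\delta h_t$, compute $\delta o_t$, $\delta CL_t$
    (i) $\delta o_t = \delta h_t \cdot \frac{\partial h_t}{\partial o_t} = \delta h_t \cdot \tanh(CL_t)$;
    (ii) $\delta CL_t = \delta h_t \cdot \frac{\partial h_t}{\partial CL_t} = \delta h_t \cdot o_t \cdot (1 - \tanh^2(CL_t))$
(2) step2: Given $\delta CL_t$, compute $\delta i_t$, $\delta f_t$, $\delta a_t$
    (i) $\delta i_t = \delta CL_t \cdot \frac{\partial CL_t}{\partial i_t} = \delta CL_t \cdot a_t \cdot i_t \cdot (1 - i_t)$;
    (ii) $\delta f_t = \delta CL_t \cdot \frac{\partial CL_t}{\partial f_t} = \delta CL_t \cdot CL_{t-1} \cdot f_t \cdot (1 - f_t)$;
    (iii) $\delta a_t = \delta CL_t \cdot \frac{\partial CL_t}{\partial a_t} = \delta CL_t \cdot i_t \cdot (1 - a_t^2)$;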
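The two steps of Algorithm 1, followed by the outer products of Eqs. 4.6 and 4.7, can be sketched in a few lines. The code transcribes the formulas as printed, treating every product between gate vectors as element-wise. The function names and the one-dimensional toy values are assumptions for illustration, and the sketch covers a single time step rather than full back-propagation through time.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def forward(i_t: Vector, a_t: Vector, f_t: Vector, o_t: Vector,
            cl_prev: Vector) -> Tuple[Vector, Vector]:
    """Forward pass: CL_t = i_t*a_t + f_t*CL_{t-1}; h_t = o_t*tanh(CL_t)."""
    cl_t = [i * a + f * c for i, a, f, c in zip(i_t, a_t, f_t, cl_prev)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, cl_t)]
    return cl_t, h_t


def step1(dh_t: Vector, o_t: Vector, cl_t: Vector) -> Tuple[Vector, Vector]:
    """Given dh_t, compute do_t and dCL_t."""
    do_t = [dh * math.tanh(c) for dh, c in zip(dh_t, cl_t)]
    dcl_t = [dh * o * (1.0 - math.tanh(c) ** 2)
             for dh, o, c in zip(dh_t, o_t, cl_t)]
    return do_t, dcl_t


def step2(dcl_t: Vector, i_t: Vector, f_t: Vector, a_t: Vector,
          cl_prev: Vector) -> Tuple[Vector, Vector, Vector]:
    """Given dCL_t, compute di_t, df_t and da_t."""
    di_t = [d * a * i * (1.0 - i) for d, a, i in zip(dcl_t, a_t, i_t)]
    df_t = [d * c * f * (1.0 - f) for d, c, f in zip(dcl_t, cl_prev, f_t)]
    da_t = [d * i * (1.0 - a * a) for d, i, a in zip(dcl_t, i_t, a_t)]
    return di_t, df_t, da_t


def weight_gradient(da_t: Vector, di_t: Vector, df_t: Vector, do_t: Vector,
                    a_t: Vector, i_t: Vector, f_t: Vector, o_t: Vector,
                    v: Vector) -> Matrix:
    """Eq. 4.6 when v = x_t, Eq. 4.7 when v = h_{t-1}."""
    stacked = (
        [d * (1.0 - math.tanh(a) ** 2) for d, a in zip(da_t, a_t)]
        + [d * i * (1.0 - i) for d, i in zip(di_t, i_t)]
        + [d * f * (1.0 - f) for d, f in zip(df_t, f_t)]
        + [d * o * (1.0 - o) for d, o in zip(do_t, o_t)]
    )
    return [[g * x for x in v] for g in stacked]  # outer product with v


# Toy values for one time step with a single hidden unit.
i_t, a_t, f_t, o_t, cl_prev = [0.5], [0.5], [0.5], [0.5], [1.0]
cl_t, h_t = forward(i_t, a_t, f_t, o_t, cl_prev)
do_t, dcl_t = step1([1.0], o_t, cl_t)
di_t, df_t, da_t = step2(dcl_t, i_t, f_t, a_t, cl_prev)
dW_t = weight_gradient(da_t, di_t, df_t, do_t, a_t, i_t, f_t, o_t, [1.0, 2.0])
```

With one hidden unit and a two-dimensional input, `dW_t` has four rows (one per gate block) and two columns, which is the shape produced by the outer product with $x_t$.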

Topic Model Module Construction

To allow our query processing module to better understand users’ questions, we incorporate different question representations to identify key information. Specifically, we find important keywords by considering: (i) the similarity of keywords in the corresponding sentence with the disease labels based on their tf-idf values; (ii) the importance of each sentence, which is calculated based on Algorithm 1. As such, we define an attention weight $aw_i$ for each keyword $word_i$ of a disease $c$ using the following equation:

$$aw_i = \frac{\sum_{j=1}^{N} SC_j \ast tf_j^i \ast sim_{ic}}{TF_i} \tag{4.8}$$

where $SC_j$ is the importance score of the $j$th sentence, $tf_j^i$ is the number of occurrences of keyword $word_i$ in the $j$th sentence, $sim_{ic}$ is the similarity score between $word_i$ and disease label $c$, and $TF_i$ is the total number of occurrences of $word_i$.

Figure 4.8: Query Processing Module

With such attention weights, we can identify the important sentences and keywords associated with each disease cluster. Thus, we can generate a topic model module (TM) with multiple clusters, where each cluster represents a disease type and contains informative keywords. For example, Table 4.2 shows one topic model cluster for a liver disorder disease, which contains 10 identified informative keywords (ordered by their attention weights).
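The attention weight of Eq. 4.8 and the ranking of keywords within one cluster can be sketched as follows. The sentence scores, similarity values, and tokenised sentences are made-up toy data. The function names are hypothetical, and $TF_i$ is assumed to be counted over the same $N$ sentences as the sum.

```python
from typing import Dict, List


def attention_weight(word: str, sentences: List[List[str]],
                     sentence_scores: List[float], sim_to_label: float) -> float:
    """aw_i = sum_j(SC_j * tf_j^i * sim_ic) / TF_i  (Eq. 4.8)."""
    tf = [sentence.count(word) for sentence in sentences]  # tf_j^i per sentence
    total = sum(tf)                                        # TF_i (assumed scope)
    if total == 0:
        return 0.0
    weighted = sum(sc * t * sim_to_label for sc, t in zip(sentence_scores, tf))
    return weighted / total


def rank_keywords(sentences: List[List[str]], sentence_scores: List[float],
                  similarity: Dict[str, float], top_k: int = 10) -> List[str]:
    """Order candidate keywords by descending attention weight."""
    weights = {
        word: attention_weight(word, sentences, sentence_scores, sim)
        for word, sim in similarity.items()
    }
    return sorted(weights, key=lambda w: weights[w], reverse=True)[:top_k]


# Toy question with three tokenised sentences (hypothetical data).
sentences = [
    ["ct-scan", "shows", "benign", "nodule"],
    ["is", "it", "lung", "cancer"],
    ["thanks"],
]
sentence_scores = [0.6, 0.9, 0.1]                              # SC_j, assumed
similarity = {"benign": 0.5, "cancer": 0.8, "thanks": 0.1}     # sim_ic, assumed
ranking = rank_keywords(sentences, sentence_scores, similarity)
```

A keyword that appears in a high-scoring sentence and is similar to the disease label rises to the top of the ranking, while a word from a low-scoring sentence falls to the bottom.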