User Intent Classification - Mathematical Model

6.3 Mathematical Model

6.3.1 User Intent Classification

This Section presents a fast on–line procedure for the user intent classification of a test query as:

f†= arg max

~ f ∈F

P ( ˘Q = ˘q†| ~F = ~f , ˆΛ∗)P ( ~F = ~f |Θ∗_T) (6.3.1)

A naive solution requires the computation of O(Q

i∈KFi) product terms,

each having O(L|K|) multiplications. Further, the computation of Z( ~f ) for each ~

f requires a summation over Vmterms, which is computationally infeasible. Instead, we perform approximate inference using standard Belief Propaga- tion (BP) on a dynamically constructed factor graph. Figure 6.1 shows a possible

Figure 6.1: Example spanning tree (T ) on facets.

tree structure T defined on the facet variables. The resulting factor graph is tree structured having K variable nodes and O(M K) factor nodes, and the inference procedure converges in at most K iterations.

Dynamic Factor Graph

The factor graph for finding the optimal facet assignment ~f† for a test query ˘q† consists of two classes of factors, namely,

• factors which model the latent relationship between facets.

• factors which model the impact of query words on facets, under the simplifying assumption that there are no latent relationships between facets. Figure 6.2 shows the factor graph corresponding to the facet classification. The variable nodes, F1, . . . , FK, denote the facet variables which can take possi-

ble states in F1, . . . , FK, respectively. The factors Ti, Tij model the latent rela-

on facet variables.), while the factors W_`imodel the impact of the `thword of the test query ˘q†affecting the ithfacet variable.

The factors, Ti, i ∈ K and Tij, (i, j) ∈ T∗ correspond to the latent relationships between the facets. Their beliefs correspond to the parameters (Θ∗, T∗) based on the training data, given as

P_Ti(F_i = f ) = Θ∗_i(f )(1−di) (6.3.2)

P_Tij(F_i= f, F_j = f0) = Θ∗_ij(f, f0) (6.3.3)

The factors W_`imodel the conditional distribution ˆP (Fk = f |w ∈ ˘q) under

the assumption that all facets are independent of each other. The factors are specified by the beliefs,

P_Wi `(Fi = f ) = ξ i w;f = #( ~fk= f, w ∈ ˘q) #(w ∈ ˘q) (6.3.4) where ξ_w;fk is an approximation for ˆP (Fk = f |w ∈ ˘q) obtained as

ξ_w;fk ∝ P (w ∈ ˘q|Fk= f, ˆΛ∗) ˆP1k(Fk= f ) (6.3.5)

and ˆP₁k(Fk = f ) = #( ~f_Nk=f ) is the MLE estimate of the probability distribution

of the kthfacet under the simplifying assumption that the each facet is independent of other facets.

The parameter Ξ∗ = {ξ_w;fk : f ∈ Fk, k ∈ K, w ∈ V} models the impact of

the words in a query on the facets. Specifically, ξk_w;f measures the evidence that the kthfacet has value f when the query contains the word w.

We note that the evidence ξk_w;f in (6.3.4) is not defined if the word w is not part of the training vocabulary, i.e., the test query ˘q†contains a word w that is not present in the training data D. The following Section presents a method for incorporating the influence of new words, i.e. words in the test query which are not present in the vocabulary using the semantic relationships between the words and the existing vocabulary.

Figure 6.2: Factor graph for query facet classification.

WordNet Integration

Let w denote a new word not present in the vocabulary VD. Then, the corre-

sponding ξi_w;f are not defined and the previous algorithm cannot be used. How- ever, a semantically related word w0might be present in the vocabulary VD; for

which the corresponding ξi_w0_;f are defined. It is safe to assume that a strong se-

mantic relationship between the words w and w0 will be reflected in ξ_w;fi and ξ_wi0_;f. Therefore, it is of interest to be able to relate new words in the query to

existing words in vocabulary.

Now we present a two-step method for integrating the capabilities of Word- Net [77], a lexical database, into the facet classification for queries containing new words. We first query WordNet for the related word candidates W⊥ with similarity scores S⊥= {sw0 : w0 ∈ W⊥} for a new query word w using Breadth

First Search (BFS) till a maximum search depth L. The second step uses the candidates which are present in the vocabulary W♣= W⊥∩ VDto compute the

Algorithm 6.4 WordNet integration ˜ξw = FWN(w) .

Require: L {Maximum depth for BFS} Require: Ξ∗{Parameter based on D}

Ensure: w /∈ VD{New word not present in vocabulary}

Ensure: τ_fi = #(fi=f )

(W⊥, S⊥) ← BF S(w, L) {Related words in WordNet} W♣← W⊥∩ VD{Related words in vocabulary}

for i = 0 to K do Initialize {µi₁, . . . , µi_|F i|} ∼ Dir(τ i f1, . . . , τ i f_|Fi|) for all f in Fido ˜ ξ_w;fi = µ i f+ P w0∈W ♣sw0ξiw0;f P f ∈Fi(µif+ P w0∈W ♣sw0ξiw0;f) end for end for return ˜ξw= { ˜ξ_w;fi : i ∈ K, f ∈ Fi}

related words obtained from WordNet to augment results for a new word. We note that if the word has no neighbors either in WordNet or in the existing vocabulary, the conditional distribution P (Fi = ·|w ∈ ˘q†) is a Dirichlet

distribution with parameter:

#(Fi = f1) N , . . . , #(Fi = f|Fi|) N (6.3.6)

This means that when the test word w has no semantically related neighbors w0 with known statistics ξ_wk0_;f, then the conditional distribution is randomly dis-

tributed such that, on average, the count statistics for a facet would be same as that found in the training data. However, if there exist neighbors of the word W♣, that information is incorporated into the conditional distribution for the new word.

Algorithm 6.5 FastQ ˆf†= F (˘q†|D). Require: ˘q†{Test query}

Require: Ξ∗, Θ∗, T∗, V {Parameter from training data D} Initialize G(K, []) {factor graph on facet variables} for all (i, j) in T∗ do

Add factor Tij s.t. PTij(F_i = a, F_j = b) = Θ∗_ij(a, b)

end for for all i in K do Add factor Tis.t. P_Ti(F_i = a) = 1 Θ∗ i(a)da−1 end for for l = 1 to L do if q_`†∈ V then for all i in K do Add factor W_`is.t. P_Wi `(Fi= f ) = ξ i q_`†;f∀f ∈ Fi end for else ˜ ξ_q† ` = FW N(q †

`) {Use WordNet for new query word}

for all i in K do Add factor W_`is.t. P_Wi `(Fi= f ) = ˜ξ i q_`†;f∀f ∈ Fi end for end if end for ˆ f†= MAX-PROD(G) {Standard BP} return ˆf†= {f₁†, . . . , f_K†}

The FastQ Algorithm

The overall procedure for facet classification when a new query, ˘q† = {q1, . . . , qL} is given in Algorithm 6.5. The algorithm constructs the dynamic

tribution on facets constructed using the Chow-Liu algorithm. Then, if a query word q`is in the vocabulary V, it adds words to the corresponding word factors,

and constructs the word factor W_ì based on parameter Ξ∗, as in (6.3.4). Other- wise, if query word qìs not in vocabulary V, it constructs the factors W_ì based

on semantically related words in vocabulary V found using WordNet. The facet assignment ˆf†for the test query is computed using a standard implementation of the Max-Product Belief Propagation [63].

The small size of the constructed factor graph allows an implementation with limited resources. The maximum search depth L in WordNet is a user-controlled parameter which can be reduced to speedup the classification.

In document Unsupervised Identification of the User s Query Intent in Web Search (Page 118-124)