Compositional Matrix-Space Models and Compositional VSMs

2.3 Compositional Matrix-Space Models

2.3.1 Compositional Matrix-Space Models and Compositional VSMs

It has been shown theoretically that CMSMs can simulate well-known composition operations in VSMs and therefore subsume compositional VSMs. To show this, let $\psi_{\bowtie} : \mathbb{R}^n \to \mathbb{R}^{m \times m}$ and $\chi_{\bowtie} : \mathbb{R}^{m \times m} \to \mathbb{R}^n$ be the mapping functions between the vector and matrix representations, and let a vector $v_\sigma \in \mathbb{R}^n$ be assigned to each word $\sigma$ of the vocabulary $\Sigma$ of a natural language.

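Consider vector addition as the composition operation. Given a sequence of words $w = \sigma_1 \ldots \sigma_k$, $v_w = \sum_{i=1}^{k} v_{\sigma_i}$. Now, define the function $\psi_+ : \mathbb{R}^n \to \mathbb{R}^{(n+1) \times (n+1)}$ to map the vector $v_\sigma$ of word $\sigma$ to the corresponding matrix representation in the following way:

\[
M_\sigma = \psi_+(v_\sigma) =
\begin{pmatrix}
1 & \cdots & 0 & 0 \\
\vdots & \ddots & & \vdots \\
0 & & 1 & 0 \\
 & v_\sigma & & 1
\end{pmatrix}.
\]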

Multiplying the resulting matrices yields:

\[
M_w = \psi_+(v_{\sigma_1}) \cdots \psi_+(v_{\sigma_k}) = \psi_+(v_w) =
\begin{pmatrix}
1 & \cdots & 0 & 0 \\
\vdots & \ddots & & \vdots \\
0 & & 1 & 0 \\
 & v_w & & 1
\end{pmatrix}.
\]
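The two-word case makes the step explicit. The following is a sketch in block notation, where $I_n$ denotes the $n \times n$ identity and $u$, $v$ stand for two arbitrary word vectors written as rows (this shorthand is ours, not the source's):

```latex
\[
\psi_+(u)\,\psi_+(v)
= \begin{pmatrix} I_n & 0 \\ u & 1 \end{pmatrix}
  \begin{pmatrix} I_n & 0 \\ v & 1 \end{pmatrix}
= \begin{pmatrix} I_n I_n + 0 \cdot v & I_n \cdot 0 + 0 \cdot 1 \\ u I_n + 1 \cdot v & u \cdot 0 + 1 \cdot 1 \end{pmatrix}
= \begin{pmatrix} I_n & 0 \\ u + v & 1 \end{pmatrix}
= \psi_+(u + v).
\]
```

Applying the same identity repeatedly gives the result for a sequence of $k$ words.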

Now define $\chi_+ : \mathbb{R}^{(n+1) \times (n+1)} \to \mathbb{R}^n$ to extract the lowest row, omitting its last element, which results in:

\[
\chi_+(M_w) = v_w = \sum_{i=1}^{k} v_{\sigma_i}.
\]
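The construction can be checked numerically. The following is a minimal sketch in plain Python; the function names (`psi_plus`, `chi_plus`, `matmul`, `compose`) and the example vectors are illustrative assumptions, not part of the original work.

```python
def psi_plus(v):
    """Embed an n-vector as the (n+1)x(n+1) identity with v in the last row."""
    n = len(v)
    m = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
    m[n][:n] = list(v)  # last row becomes (v, 1)
    return m


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def chi_plus(m):
    """Read the composed vector back: last row without its final entry."""
    return m[-1][:-1]


def compose(vectors):
    """Multiply the matrix images of the word vectors in sequence order."""
    result = psi_plus(vectors[0])
    for v in vectors[1:]:
        result = matmul(result, psi_plus(v))
    return result


# Hypothetical 2-dimensional word vectors for a three-word sequence.
word_vectors = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
m_w = compose(word_vectors)
v_w = chi_plus(m_w)  # equals the component-wise sum [4.5, 1.0]
```

The product keeps the identity block intact and only accumulates the word vectors in the last row, which is why the order of the words does not matter for this particular encoding.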

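Element-wise vector multiplication in VSMs is defined as $v_w = v_{\sigma_1} \odot \cdots \odot v_{\sigma_k}$, where $v_w(j) = v_{\sigma_1}(j) \cdot v_{\sigma_2}(j) \cdots v_{\sigma_k}(j)$ for $1 \le j \le n$, given a sequence of words $w = \sigma_1 \ldots \sigma_k$. This operation can also be simulated by CMSMs. This time, let $\psi_\odot : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ encode the vector $v_\sigma$ as a diagonal matrix with the vector elements as its diagonal elements:

\[
M_\sigma = \psi_\odot(v_\sigma) =
\begin{pmatrix}
v_\sigma(1) & 0 & \cdots & 0 \\
0 & v_\sigma(2) & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & v_\sigma(n)
\end{pmatrix}.
\]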

Then, multiplying the matrices corresponding to the words $\sigma_i$ results in:

\[
M_w = \psi_\odot(v_w) =
\begin{pmatrix}
v_w(1) & 0 & \cdots & 0 \\
0 & v_w(2) & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & v_w(n)
\end{pmatrix}.
\]

Now, define $\chi_\odot : \mathbb{R}^{n \times n} \to \mathbb{R}^n$ to extract the main diagonal elements of the output matrix, which gives:

\[
\chi_\odot(M_w) = v_w = v_{\sigma_1} \odot \cdots \odot v_{\sigma_k}.
\]
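As a small illustration of the diagonal encoding, the sketch below multiplies three diagonal matrices and reads the diagonal back. The helper names (`psi_hadamard`, `chi_hadamard`, `matmul`) and the integer vectors are assumptions chosen for the example.

```python
def psi_hadamard(v):
    """Encode a vector as the diagonal matrix diag(v)."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def chi_hadamard(m):
    """Extract the main diagonal of a square matrix."""
    return [m[i][i] for i in range(len(m))]


# Hypothetical 3-dimensional word vectors.
vectors = [[1, 2, 3], [4, 5, 6], [2, 0, -1]]

m_w = psi_hadamard(vectors[0])
for v in vectors[1:]:
    m_w = matmul(m_w, psi_hadamard(v))

v_w = chi_hadamard(m_w)  # element-wise product: [8, 0, -18]
```

Because products of diagonal matrices stay diagonal, every off-diagonal entry of `m_w` remains zero and no information is lost by reading only the diagonal.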

Similarly, the circular convolution operation¹, which was introduced by Plate (1995) as a composition operation in VSMs, can be simulated via CMSMs. Circular convolution is interpreted as a compressed outer product of vectors. Given two words $\sigma_1$ and $\sigma_2$ with associated $n$-dimensional vectors $u$ and $v$, respectively, the tensor (outer) product of the two vectors first yields a matrix $Q$ of dimension $n \times n$ with $Q(i, j) = u(i)v(j)$; a convolution operation then maps this matrix to the $n$-dimensional vector $v_w$ corresponding to $w = \sigma_1\sigma_2$, where

\[
v_w(i) = \sum_{j=0}^{n-1} u(j)\, v(i - j) \quad \text{for } 0 \le i \le n - 1,
\]

with the index $i - j$ taken modulo $n$.
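The two readings of the definition, the summation formula and the compressed outer product, can be compared directly. The sketch below is an illustration only; the function names and the example vectors are assumptions, and indices start from 0 with wrap-around modulo n.

```python
def circular_convolution(u, v):
    """Direct formula: t(i) = sum_j u(j) * v((i - j) mod n)."""
    n = len(u)
    return [sum(u[j] * v[(i - j) % n] for j in range(n)) for i in range(n)]


def compressed_outer_product(u, v):
    """Build Q(i, j) = u(i) * v(j), then sum the wrapped anti-diagonals."""
    n = len(u)
    q = [[u[i] * v[j] for j in range(n)] for i in range(n)]
    t = [0] * n
    for i in range(n):
        for j in range(n):
            # Entry Q(i, j) contributes to output position (i + j) mod n.
            t[(i + j) % n] += q[i][j]
    return t


# Hypothetical 3-dimensional vectors.
u = [1, 2, 3]
v = [4, 5, 6]

direct = circular_convolution(u, v)          # [31, 31, 28]
compressed = compressed_outer_product(u, v)  # same vector
```

Both functions visit the same n² products; they differ only in whether the products are grouped by output position first or collected from the full outer-product matrix afterwards.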

Fig. 2.11 illustrates the computation of the circular convolution operation as a compressed outer product of two vectors.

Figure 2.11: Circular convolution operation on two 3-dimensional vectors $c$ and $x$. $t(i) = \sum_{j=0}^{2} c(j)\, x(i - j)$ for $0 \le i \le 2$. To avoid confusion, indices start from 0. Illustration is taken from (Plate, 1995). Copyright (1995) by IEEE Press.

¹ This composition operation is distinct from the convolution operation in neural networks. The convolution operation in NNs applies a filter to input data to summarize the features in the input.

To simulate this operation by CMSMs, let $\psi_\circledast(v_\sigma)$ be the $n \times n$ matrix $M_\sigma$ associated with word $\sigma$ whose first row is the vector $v_\sigma$ and whose subsequent rows are cyclic shifts of it (shown here for $n = 3$):

\[
M_\sigma = \psi_\circledast(v_\sigma) =
\begin{pmatrix}
v(1) & v(2) & v(3) \\
v(3) & v(1) & v(2) \\
v(2) & v(3) & v(1)
\end{pmatrix}.
\]

Then, multiplying the matrices of the two words $\sigma_1$ and $\sigma_2$ results in:

\[
M_w = \psi_\circledast(v_w) =
\begin{pmatrix}
v_w(1) & v_w(2) & v_w(3) \\
v_w(3) & v_w(1) & v_w(2) \\
v_w(2) & v_w(3) & v_w(1)
\end{pmatrix}.
\]
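A quick numerical check of this claim is sketched below. The helper names (`psi_conv`, `chi_conv`, `matmul`) are assumptions, indices are 0-based, and reading off the first row of the product anticipates the extraction step that the text defines next.

```python
def psi_conv(v):
    """Circulant matrix whose first row is v; row i is v shifted right by i."""
    n = len(v)
    return [[v[(j - i) % n] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def chi_conv(m):
    """Extract the first row of the matrix."""
    return m[0]


# Hypothetical 3-dimensional vectors for the two words.
u = [1, 2, 3]
v = [4, 5, 6]

m_w = matmul(psi_conv(u), psi_conv(v))
v_w = chi_conv(m_w)  # circular convolution of u and v: [31, 31, 28]
```

The product of two such matrices has the same cyclic-shift structure, so `m_w` coincides with `psi_conv(v_w)` and its first row carries the whole result.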

If we define $\chi_\circledast(M_w)$ to extract the first row of the resulting matrix, it outputs $v_w$.

2.3.2 Compositional Matrix-Space Models and Regular Languages

Rudolph and Giesbrecht (2010) showed that symbolic approaches to language (i.e., discrete grammar formalisms) can be embedded in CMSMs. This suggests that CMSMs are compatible with both discrete (e.g., symbolic) and continuous (e.g., numeric) settings.

First, they showed how CMSMs determine whether a sequence of symbols belongs to a given regular language (i.e., given a sequence of symbols determine if it is accepted by a given finite state automaton).

Definition 2.1 (Finite State Automata). A finite automaton is defined as $A = (Q, \Sigma, \delta, Q_I, Q_F)$, where $Q = \{q_1, \ldots, q_m\}$ is a finite set of states, $\Sigma$ is a finite set of input symbols, $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation, which leads from one state to another along a transition labeled by a symbol in $\Sigma$, and $Q_I$ and $Q_F$ are the sets of initial and final states, respectively. The language accepted by $A$ is the set of strings (i.e., sequences of symbols) $w \in \Sigma^*$ accepted by $A$. If we allow zero, one, or more transitions from a state on the same symbol, the automaton is called a nondeterministic finite automaton. In this case, $\delta$ is a map from $Q \times \Sigma$ to the power set of $Q$, $2^Q$ (Hopcroft and Ullman, 1979, pp. 17-20). ♢

Eilenberg (1974) showed that to each symbol $\sigma \in \Sigma$ a transition matrix $[\![\sigma]\!] = M_\sigma$ of size $m \times m$ (where $m$ is the number of states) can be assigned. If we assign to every symbol $\sigma$ a matrix with

\[
M_\sigma(i, j) =
\begin{cases}
1 & \text{if } (q_i, \sigma, q_j) \in \delta, \\
0 & \text{otherwise,}
\end{cases}
\]

for $1 \le i, j \le m$, the matrix $M_\sigma$ encodes all state transitions labeled by the input symbol $\sigma$. Likewise, for a sequence $w = \sigma_1 \ldots \sigma_k \in \Sigma^*$, the matrix $M_w := [\![\sigma_1]\!] \cdots [\![\sigma_k]\!]$ encodes all state transitions labeled $w$. This matrix determines whether $w$ belongs to a given regular language, that is, whether it is accepted by a given finite automaton $A$.
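To make the encoding concrete, the sketch below builds the transition matrices of a small nondeterministic automaton and multiplies them along a word. The automaton (three states over the alphabet {a, b}, with a run reaching q3 exactly when the word ends in "ab") and all names are hypothetical, chosen only for illustration.

```python
# Hypothetical automaton: states q1, q2, q3 and a transition relation delta.
STATES = ["q1", "q2", "q3"]
DELTA = {
    ("q1", "a", "q1"), ("q1", "b", "q1"),  # stay in q1 on any symbol
    ("q1", "a", "q2"),                     # guess that the final "ab" starts
    ("q2", "b", "q3"),                     # complete the final "ab"
}


def transition_matrix(symbol):
    """M_sigma(i, j) = 1 if (q_i, sigma, q_j) is in delta, else 0."""
    return [[1 if (qi, symbol, qj) in DELTA else 0 for qj in STATES]
            for qi in STATES]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def word_matrix(word):
    """M_w as the product of the symbol matrices, in reading order."""
    m = len(STATES)
    result = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    for symbol in word:
        result = matmul(result, transition_matrix(symbol))
    return result


m_ab = word_matrix("ab")
# m_ab == [[1, 0, 1], [0, 0, 0], [0, 0, 0]]: from q1 the word "ab" can end
# in q1 or in q3, and no run labeled "ab" starts in q2 or q3.
```

With ordinary integer arithmetic, entry (i, j) of the product counts the runs from state i to state j labeled by the word, so a positive entry means that at least one such run exists.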

It has been shown that, by selecting an appropriate assignment $[\![\cdot]\!]$ for a CMSM and defining two vectors $v_I$ and $v_F$ as

\[
v_I(i) =
\begin{cases}
1 & \text{if } q_i \in Q_I, \\
0 & \text{otherwise,}
\end{cases}
\qquad
v_F(i) =
\begin{cases}
1 & \text{if } q_i \in Q_F, \\
0 & \text{otherwise,}
\end{cases}
\]

for $1 \le i \le m$, $w$ is accepted by the automaton $A$ exactly if $v_I M_w v_F \ge 1$ (Rudolph and Giesbrecht, 2010). Of course, one can also define the two vectors $v_I$ and $v_F$ differently and choose a threshold value $r$ against which $v_I M_w v_F$ is compared. Based on this idea, Rudolph and Giesbrecht (2010, p. 912) define the notion of matrix grammars as follows:

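Definition 2.2 (Matrix Grammars). Let $\Sigma$ be an alphabet. A matrix grammar $\mathcal{M}$ of degree $n$ is defined as the pair $\langle [\![\cdot]\!], AC \rangle$, where $[\![\cdot]\!]$ is a mapping from $\Sigma$ to $n \times n$ matrices and $AC = \{\langle \alpha_1, \beta_1, r_1 \rangle, \ldots, \langle \alpha_\ell, \beta_\ell, r_\ell \rangle\}$ with $\alpha_1, \beta_1, \ldots, \alpha_\ell, \beta_\ell \in \mathbb{R}^n$ and $r_1, \ldots, r_\ell \in \mathbb{R}$ is a finite set of acceptance conditions. The language generated by $\mathcal{M}$ (denoted by $L(\mathcal{M})$) contains a sequence of symbols $\sigma_1 \ldots \sigma_k \in \Sigma^*$ exactly if $\alpha_i^\top [\![\sigma_1]\!] \cdots [\![\sigma_k]\!] \beta_i \ge r_i$ for all $i \in \{1, \ldots, \ell\}$. We call a language $L$ matricible if $L = L(\mathcal{M})$ for some matrix grammar $\mathcal{M}$. ♢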

Based on the above definition, regular languages are matricible: the automaton encoding described above, with the single acceptance condition $\langle v_I, v_F, 1 \rangle$, is a matrix grammar.
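Membership in the language of a matrix grammar can be tested mechanically. The following sketch assumes a hypothetical degree-2 grammar that encodes a two-state automaton for "even number of a's" over {a, b}; the function names and the data layout (a dictionary for the mapping, a list of triples for the acceptance conditions) are illustrative choices, not taken from Rudolph and Giesbrecht (2010).

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """n x n identity matrix, the product for the empty word."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def bilinear(alpha, m, beta):
    """Compute alpha^T * m * beta for vectors alpha, beta and matrix m."""
    return sum(alpha[i] * m[i][j] * beta[j]
               for i in range(len(alpha)) for j in range(len(beta)))


def accepts(word, mapping, conditions, degree):
    """True if every acceptance condition (alpha, beta, r) is satisfied."""
    product = identity(degree)
    for symbol in word:
        product = matmul(product, mapping[symbol])
    return all(bilinear(alpha, product, beta) >= r
               for alpha, beta, r in conditions)


# Hypothetical grammar: state 1 = even number of a's (initial and final),
# state 2 = odd number of a's. Reading "a" swaps states, "b" keeps them.
MAPPING = {"a": [[0, 1], [1, 0]], "b": [[1, 0], [0, 1]]}
CONDITIONS = [([1, 0], [1, 0], 1)]  # alpha = v_I, beta = v_F, r = 1

print(accepts("abba", MAPPING, CONDITIONS, 2))  # True: two a's
print(accepts("ab", MAPPING, CONDITIONS, 2))    # False: one a
```

Adding further triples to `CONDITIONS` requires all of them to hold simultaneously, which mirrors the "for all i" in the definition.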

Rudolph and Giesbrecht (2010) also studied other formal languages, such as non-regular languages (e.g., $L(A) = \{w \mid w = w^R\}$) and non-context-free languages (e.g., $L(A) = \{a^m b^m c^m \mid m > 0\}$), and showed that some are matricible by appropriate encoding. Moreover, they showed that the intersection of two matricible languages is also a matricible language. However, some formal languages still need further investigation. For instance, they conjectured that not all context-free languages are matricible, as they were not able to show that, for example, the language of all well-formed parenthesis expressions is matricible. Other questions also remain open and require more investigation, such as whether matricible languages are closed under concatenation.

2.3.3 Compositional Matrix-Space Models and Weighted Finite Automata

In document Compositional Matrix-Space Models: Learning Methods and Evaluation (Page 48-51)