While the tRNN model of the last section globally accounts for linguistic structure, by performing a hierarchical forward computation analogous to a parse tree of the input sentence, local composition at the tree nodes remains additive. As discussed previously, this type of function seems inadequate for representing composition that involves function words, such as adjectives, transitive verbs or negation.
The syntactic structure of composition in the tRNN suggests that the composition process is defined by a function of the constituent vectors, in analogy to the functions of formal semantics. In reality, tRNN composition input is a concatenation of constituents, which is equivalent to the addition of two independent mappings over the arguments. This representation seems to contradict our understanding of functional application in compositional models of semantics.
A possible alternative is the recursive matrix-vector model of Socher et al. (2012), where each word is represented by a matrix and a vector, and composition is defined by symmetric functional application of the constituents. The requirement that words can act as functions on other words is satisfied here, but due to the added word representations, model complexity quickly grows with vocabulary size. A preferable solution seems to be to employ higher-order tensors, allowing for an adequate representation of argument structure, while still using a single, global composition function.
Tensor Layer by Slices. With the above considerations in mind, the recursive neural tensor network (tRNTN) is proposed in Socher et al. (2013a,b). The previous tRNN composition function of Eq. (4.3) is augmented by a third-order tensor term, and composition is given by the sum of a linear map and a bilinear map of the layer input. Defined by tensor slices, the tRNTN composition function is then:⁴
$$z(x, y) = f\left(x^{T} V^{(1:d)} y + W[x; y] + b\right) \tag{4.4}$$

where $x, y \in \mathbb{R}^{d}$ is the composition input, and $V \in \mathbb{R}^{d \times d \times d}$ is a full-rank third-order tensor defining a bilinear map from $\mathbb{R}^{d}$ input spaces to an $\mathbb{R}^{d}$ output space. The remaining terms are identical to those of Eq. (4.3). $V^{(i)}$, the $i$-th of $d$ slices of $V$, determines the $i$-th component of $z_{V} \in \mathbb{R}^{d}$, the output vector of the tensor term, as follows:

$$z_{V,i} = x^{T} V^{(i)} y$$
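The slice definition of Eq. (4.4) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the cited work: the function name `trntn_compose_slices`, the choice of `tanh` as default nonlinearity, and all parameter values are assumptions made here for the example.

```python
import math


def trntn_compose_slices(x, y, V, W, b, f=math.tanh):
    """Compute z = f(x^T V^(1:d) y + W[x; y] + b), one tensor slice per output unit.

    x, y: lists of length d; V: d slices, each a d x d nested list;
    W: d x 2d nested list; b: list of length d.
    The default f = tanh is an assumption; the text leaves f generic.
    """
    d = len(x)
    xy = x + y  # concatenation [x; y], a list of length 2d
    z = []
    for i in range(d):
        # Tensor term: z_V,i = x^T V^(i) y, a bilinear form in x and y.
        z_v = sum(x[j] * V[i][j][k] * y[k] for j in range(d) for k in range(d))
        # Linear term: row i of W applied to the concatenated input.
        z_w = sum(W[i][m] * xy[m] for m in range(2 * d))
        z.append(f(z_v + z_w + b[i]))
    return z


# Toy example with d = 2; all values are invented for illustration.
x = [1.0, 2.0]
y = [0.5, -1.0]
V = [
    [[1.0, 0.0], [0.0, 1.0]],   # slice V^(1): x^T V^(1) y is the dot product of x and y
    [[0.0, 1.0], [-1.0, 0.0]],  # slice V^(2): x^T V^(2) y = x1*y2 - x2*y1
]
W = [[0.0] * 4, [0.0] * 4]  # linear term switched off to isolate the tensor term
b = [0.0, 0.0]

# Identity in place of f exposes the pre-activation values of the tensor term.
z = trntn_compose_slices(x, y, V, W, b, f=lambda a: a)
print(z)  # [-1.5, -2.0]
```

With the linear term and bias set to zero, each output component is exactly one bilinear form $x^{T} V^{(i)} y$, which makes the multiplicative interaction between the two arguments visible: every entry of a slice weights one product $x_j y_k$.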
We can think of the tRNTN layer output $z$ as resulting from three $d$-dimensional terms: the tensor output $z_{V}$ defined above; $z_{W}$, the linear combination of concatenated input arguments parametrized by $W$; and the constant term $z_{b} = b$. Summing, and applying the nonlinearity, yields the layer output, $z = f(z_{V} + z_{W} + z_{b})$.

⁴ We define the tensor function of Socher et al. (2013a) and Bowman et al. (2015), which differs from the tensor of Socher et al. (2013b), where input consists of concatenated child vectors, i.e. $[x; y]$ in
Matricized Tensor Layer. Rather than using the definition by slices of Eq. (4.4), we will usually describe the tRNTN in matricized form, which allows for a more intuitive exposition of the multiplicative interaction of arguments. We can then equivalently define the tRNTN composition as follows:
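$$z(x, y) = f\left(V_{m} (x \otimes y)_{v} + W[x; y] + b\right) \tag{4.5}$$

where $V_{m}$ is the matricization of third-order tensor $V \in \mathbb{R}^{d \times d \times d}$, and $(x \otimes y)_{v}$ the vectorization of the tensor (or Kronecker) product of input vectors $x$ and $y$. All other terms are identical to Eq. (4.4).

A more detailed view of the matricized tensor $V$, and the vectorized Kronecker product of input vectors $x, y$, is given by:

$$
V_{m} (x \otimes y)_{v} =
\begin{bmatrix}
v^{(1)}_{1,1} & \cdots & v^{(1)}_{1,d} & \cdots & v^{(1)}_{d,1} & \cdots & v^{(1)}_{d,d} \\
\vdots & & & \ddots & & & \vdots \\
v^{(d)}_{1,1} & \cdots & v^{(d)}_{1,d} & \cdots & v^{(d)}_{d,1} & \cdots & v^{(d)}_{d,d}
\end{bmatrix}
\begin{bmatrix}
x_{1} y_{1} \\ \vdots \\ x_{1} y_{d} \\ \vdots \\ x_{d} y_{1} \\ \vdots \\ x_{d} y_{d}
\end{bmatrix}
$$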
Here, tensor $V$ is represented by its mode-3 matricization, defined in Appendix A.0.3, resulting in the $d \times d^{2}$ matrix $V_{m}$. The Kronecker product of inputs $x, y$ is given in vectorized form, arranged as a $d^{2} \times 1$ column vector. The superscript indices $v^{(1)} \dots v^{(d)}$ denote the slicing index of the mode-3 matricization. These indices also allow us to connect the slice definition with the matricized definition: in the matricized view, the $d$ slices of Eq. (4.4) are vectorized, and stacked vertically as rows to form the matrix $V_{m}$. Composition of $V_{m}$ with the $d^{2} \times 1$ column vector $(x \otimes y)_{v}$ is given by matrix multiplication, and yields a $d$-dimensional vector, the tensor term of Eq. (4.5), which is equivalent to the tensor term of the slice definition.

In the previous definitions we described composition of constituents belonging to $\mathbb{R}^{d}$ spaces, for a composition output in $\mathbb{R}^{d}$. We can in principle define composition over arbitrary dimensions, e.g. by some tensor $V \in \mathbb{R}^{r \times (p \times q)}$.⁵ In order to allow for composition of arbitrary constituents, we will generally demand that their dimensions match, thus constraining the mapping to $\mathbb{R}^{r \times (p \times p)}$. For recursively defined composition, such as in the tree-structured networks described here, we additionally demand that input and output dimensions match, thus constraining the mapping further to the case of our definitions, $V \in \mathbb{R}^{p \times (p \times p)}$.⁶

⁵ Parentheses and index order here mark, purely for convenience, a mapping from $\mathbb{R}^{p} \times \mathbb{R}^{q}$ to $\mathbb{R}^{r}$.

⁶ Note that the models of Chapter 5 also employ the functions of Eqs. (4.3) and (4.5) in a non-recursive
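The equivalence of the slice and matricized definitions, including the general case of a tensor mapping from $\mathbb{R}^{p} \times \mathbb{R}^{q}$ to $\mathbb{R}^{r}$, can be checked numerically. The following is a minimal sketch in plain Python. The helper names (`matricize`, `kron_vec`, `tensor_term_matricized`, `tensor_term_slices`) and the toy values are hypothetical and chosen here for illustration, and the row-by-row flattening of each slice is assumed to match the entry ordering of the matricized display above.

```python
def matricize(V):
    """Flatten each p x q slice of V row by row and stack the results as rows of V_m."""
    return [[v for row in slice_i for v in row] for slice_i in V]


def kron_vec(x, y):
    """Vectorized Kronecker product: (x1*y1, ..., x1*yq, ..., xp*y1, ..., xp*yq)."""
    return [xj * yk for xj in x for yk in y]


def tensor_term_matricized(V_m, x, y):
    """Tensor term of the matricized form: V_m times the vectorized Kronecker product."""
    xy = kron_vec(x, y)
    return [sum(v * t for v, t in zip(row, xy)) for row in V_m]


def tensor_term_slices(V, x, y):
    """Tensor term of the slice form: component i is x^T V^(i) y."""
    return [
        sum(x[j] * slice_i[j][k] * y[k] for j in range(len(x)) for k in range(len(y)))
        for slice_i in V
    ]


# Toy tensor with r = 2 output units and input dimensions p = 2, q = 3.
V = [
    [[1, 2, 3], [4, 5, 6]],
    [[0, -1, 0], [2, 0, -2]],
]
x = [1, -1]
y = [2, 0, 1]

V_m = matricize(V)  # shape r x (p*q) = 2 x 6
z_matricized = tensor_term_matricized(V_m, x, y)
z_slices = tensor_term_slices(V, x, y)
print(z_matricized, z_slices)  # [-9, -2] [-9, -2]
```

Both routes give the same vector because row $i$ of $V_{m}$ lists the entries of slice $i$ in the same order in which `kron_vec` lists the products $x_j y_k$. In the recursive setting, $r = p = q$, so the output vector has the same dimension as each input and can itself serve as a child vector at the next tree node.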
[Figure 4.4: tree diagram. Composition nodes are labelled "V: 3×9"; leaf labels read "not all pets" and "not walk".]

Figure 4.4: Global structure and tensor composition of tRNTN forward pass. The syntactic structure of the tRNTN forward pass is identical to the tRNN one, but local composition is multiplicative rather than additive. Node input in orange is given by the Kronecker product of child vectors. The (matricized) tensor V is applied to this vector by matrix multiplication. The graph only depicts calculations of the tensor term; the full function of Eq. (4.5) corresponds to summing green vectors at each node of Figs. 4.3 and 4.4, and adding the (omitted) bias term.