Connectivity Reasoning Algorithm - : A Question Answering Benchmark for Temporal Common-sense


8.5 Connectivity Reasoning Algorithm

One simple but often effective approach for reasoning is to focus on connectivity (as described in Figure 27). Specifically, we consider reasoning chains as valid if they correspond to a short path in the meaning space, and invalid if they correspond to disconnected nodes.

Given nodes $m, m' \in G_M$, this corresponds to two possible hypotheses:

$$h_1 = m \overset{d}{\leftrightsquigarrow} m', \quad \text{and} \quad h_2 = m \not\leftrightsquigarrow m'.$$

We refer to distinguishing between these two worlds as the d-connectivity reasoning problem. While we consider two extreme hypotheses for our analysis, we find that with a small amount of noise, even these extreme hypotheses can be difficult to distinguish.

For the reasoning algorithm, one natural observation that can be used is the connectivity of the symbol nodes in $G_S$. Existing models of multi-hop reasoning (Khot et al., 2017) use similar features to identify valid reasoning chains. Specifically, we consider the observation that there is a path of length at most $\tilde{d}$ between $s$ and $s'$:

$$X_S^{\tilde{d}}(s, s') = s \overset{\leq \tilde{d}}{\leftrightsquigarrow} s'$$

The corresponding connectivity algorithm is $\mathrm{Separator}_{X_S^{\tilde{d}}}$, which we would like to be γ-accurate for the two hypotheses under consideration. Next, we derive bounds on γ for these specific hypotheses and observation. Note that while the space of possible hypotheses and observations is large, the above natural and simple choices still allow us to derive
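As an illustration of the observation $X_S^{\tilde{d}}$, the following is a minimal sketch of a connectivity-based separator, assuming the symbol graph is given as an undirected adjacency dictionary. The names `within_distance` and `separator`, and the string labels `"h1"`/`"h2"`, are hypothetical choices for this example and do not come from the text.

```python
from collections import deque


def within_distance(adj, s, t, max_len):
    """Return True if t is reachable from s by a path of at most max_len edges.

    adj is an adjacency dict {node: iterable of neighbours}; this
    representation is an assumption made for the sketch.
    """
    if s == t:
        return True
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            # Expanding this node would exceed the length budget.
            continue
        for nxt in adj.get(node, ()):
            if nxt == t:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def separator(adj, s, t, d_tilde):
    """Declare h1 (connected) if a path of length <= d_tilde exists, else h2."""
    return "h1" if within_distance(adj, s, t, d_tilde) else "h2"


# Toy symbol graph: a path a-b-c-d plus an isolated node e.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
print(separator(toy, "a", "d", 3))
print(separator(toy, "a", "d", 2))
print(separator(toy, "a", "e", 10))
```

A breadth-first search is used here because it visits nodes in order of distance, so the length bound can be enforced by simply not expanding nodes at depth `max_len`.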

8.5.1. Possibility of accurate connectivity

We begin by defining the following accuracy threshold, γ∗, as a function of the parameters for sampling a symbol graph:

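Definition 8. Given $n, d \in \mathbb{N}$ and symbol graph sampling parameters $p_+, \varepsilon_+, \lambda$, define $\gamma^*(n, d, p_+, \varepsilon_+, \varepsilon_-, \lambda)$ as

$$\left(1 - \left(1 - (p_+ \oplus \varepsilon_-)\right)^{\lambda^2}\right)^{d} \cdot \left(1 - 2e^3 \varepsilon_+^{\lambda/2}\right)^{d+1} - 2en\left(\lambda B(d)\right)^2 p_-.$$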

This expression is somewhat difficult to follow. Nevertheless, as one might expect, the accuracy threshold γ∗ increases (higher accuracy) as p+ increases (higher edge retention) or ε+ decreases (fewer dropped connections between replicas). The threshold also increases as λ increases (higher replication, which reduces the impact of noise on edges between node clusters) or as d decreases (shorter paths).
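To make the threshold easier to inspect, here is a small numerical sketch of γ∗. It rests on several assumptions that are not stated in this section: the operator ⊕ is taken to be a ⊕ b = a + b − ab, the quantity B(d) is supplied by the caller as a function `ball_size`, and p− is passed explicitly even though it does not appear in the argument list of γ∗. All function and parameter names are hypothetical.

```python
import math


def oplus(a, b):
    # ASSUMPTION: a (+) b = a + b - a*b; the operator is not defined in this section.
    return a + b - a * b


def gamma_star(n, d, p_plus, eps_plus, eps_minus, lam, p_minus, ball_size):
    """Evaluate the accuracy threshold of Definition 8.

    ball_size is a caller-supplied stand-in for B(d) (an assumption).
    """
    # First factor: depends on edge retention p_plus and replication lam.
    recover = (1 - (1 - oplus(p_plus, eps_minus)) ** (lam ** 2)) ** d
    # Second factor: depends on eps_plus, the noise between replicas.
    replicas = (1 - 2 * math.e ** 3 * eps_plus ** (lam / 2)) ** (d + 1)
    # Subtracted term: scales with p_minus, n and B(d).
    spurious = 2 * math.e * n * (lam * ball_size(d)) ** 2 * p_minus
    return recover * replicas - spurious


def gamma(*args, **kwargs):
    """Clip the threshold at zero, as in gamma = max{0, gamma*}."""
    return max(0.0, gamma_star(*args, **kwargs))


# Hypothetical setting: B(d) = 3**d is an arbitrary illustrative choice.
base = dict(n=1000, d=2, p_plus=0.9, eps_plus=0.01, eps_minus=0.0,
            lam=4, p_minus=1e-9, ball_size=lambda d: 3 ** d)
print(gamma(**base))
print(gamma(**{**base, "p_plus": 0.3}))    # lower edge retention
print(gamma(**{**base, "eps_plus": 0.05})) # more noise between replicas
```

Varying one parameter at a time in this way reproduces the qualitative trends described above for p+ and ε+ under the stated assumptions.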

The following theorem (see Appendix for a proof) establishes the possibility of a γ-accurate algorithm for the connectivity problem:

Theorem 1. Let p+, p−, ε+, ε−, λ be parameters of the sampling process in Algorithm 1 on a meaning graph with n nodes. Let $d \in \mathbb{N}$ and $\tilde{d} = d(1 + \lambda)$. If $p_-$ and $d$ satisfy

$$(p_- \oplus \varepsilon_-) \cdot B^2(d) < \frac{1}{2e\lambda^2 n},$$

and $\gamma = \max\{0, \gamma^*(n, d, p_+, \varepsilon_+, \varepsilon_-, \lambda)\}$, then the connectivity algorithm $\mathrm{Separator}_{X_S^{\tilde{d}}}$ is γ-accurate for the d-connectivity problem.

Proof idea. The proof consists of two steps: first, show that for the assumed choice of parameters, connectivity in the meaning space is recoverable in the symbol space with high probability. Then, show that spurious connectivity in the symbol space (with no meaning-space counterpart) has low probability.
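The precondition of Theorem 1 can be checked numerically for a given setting. The sketch below assumes a ⊕ b = a + b − ab and takes B(d) as a caller-supplied function; both are assumptions of this example, as are the names `oplus` and `theorem1_precondition`.

```python
import math


def oplus(a, b):
    # ASSUMPTION: a (+) b = a + b - a*b; not defined in this section.
    return a + b - a * b


def theorem1_precondition(n, d, p_minus, eps_minus, lam, ball_size):
    """Check (p_minus (+) eps_minus) * B(d)^2 < 1 / (2 e lam^2 n)."""
    lhs = oplus(p_minus, eps_minus) * ball_size(d) ** 2
    rhs = 1.0 / (2 * math.e * lam ** 2 * n)
    return lhs < rhs


# Hypothetical values; B(d) = 3**d is an arbitrary illustrative choice.
def ball(d):
    return 3 ** d


print(theorem1_precondition(1000, 2, 1e-7, 0.0, 2, ball))
print(theorem1_precondition(1000, 2, 1e-5, 0.0, 2, ball))
```

Because the bound shrinks as 1/(λ²n) while B(d) grows with d, the condition is only met in this toy setting when the spurious-edge probability is very small.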

[…] algorithm $\mathrm{Separator}_{X_S^{\tilde{d}}}$ with $\tilde{d} = d(1 + \lambda)$ is γ-accurate for the d-connectivity problem.

8.5.2. Limits of connectivity algorithm

We show that as d, the distance between two nodes in the meaning space, increases, it is unlikely that we will be able to make any inference about their connectivity by assessing connectivity of the corresponding symbol-graph nodes. More specifically, if d is at least logarithmic in the number of nodes in the graph, then, even for relatively small amounts of noise, the algorithm will see all node-pairs as connected within distance d; hence any informative inference will be unlikely.

Theorem 2. Let $c > 1$ be a constant and p−, ε−, λ be parameters of the sampling process in Algorithm 1 on a meaning graph $G_M$ with n nodes. Let $d \in \mathbb{N}$ and $\tilde{d} = \lambda d$. If

$$p_- \oplus \varepsilon_- \geq \frac{c}{\lambda n} \quad \text{and} \quad d \in \Omega(\log n),$$

then the connectivity algorithm $\mathrm{Separator}_{X_S^{\tilde{d}}}$ almost-surely infers any node-pair in $G_M$ as connected, and is thus not γ-accurate for any γ > 0 for the d-connectivity problem.

Proof idea. One can show that, for the given choice of parameters, noisy edges would dominate over informative ones and the symbol-graph would be a densely connected graph (i.e., one cannot distinguish actual connectivities from the spurious ones).

This result exposes an inherent limitation to multi-hop reasoning: even for small values of noise, the diameter of the symbol graph becomes very small, namely, logarithmic in n. This has a resemblance to similar observations in various contexts, commonly known as the small-world phenomenon. This principle states that in many real-world graphs, nodes are all linked by short chains of acquaintances, such as “six degrees of separation” (Milgram, 1967; Watts and Strogatz, 1998). Our result affirms that if NLP reasoning algorithms are not designed carefully, such macro behaviors will necessarily become bottlenecks.

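Note that Theorems 1 and 2 do not apply simultaneously. Since $B(\cdot) \geq 1$ and $\lambda \geq 1$, Theorem 1 requires $p_- \oplus \varepsilon_- \leq \frac{1}{2e\lambda^2 n} < \frac{1}{\lambda^2 n}$, whereas Theorem 2 applies when $p_- \oplus \varepsilon_- \geq \frac{c}{\lambda n} > \frac{1}{\lambda^2 n}$.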
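The separation between the two regimes can be written out step by step. The following is a sketch that uses only the facts $c > 1$, $\lambda \geq 1$ and $B(\cdot) \geq 1$ from the statements of Theorems 1 and 2.

```latex
% Upper bound implied by the precondition of Theorem 1 (uses B(d) >= 1):
\[
p_- \oplus \varepsilon_-
  \;\leq\; (p_- \oplus \varepsilon_-)\, B^2(d)
  \;<\; \frac{1}{2e\lambda^2 n}
  \;<\; \frac{1}{\lambda^2 n}.
\]
% Lower bound required by Theorem 2 (uses c > 1 and lambda >= 1):
\[
p_- \oplus \varepsilon_-
  \;\geq\; \frac{c}{\lambda n}
  \;>\; \frac{1}{\lambda n}
  \;\geq\; \frac{1}{\lambda^2 n}.
\]
% The first chain places p_- (+) eps_- strictly below 1/(lambda^2 n) and the
% second strictly above it, so no parameter setting satisfies both.
```

Read this way, the two chains also show that values of $p_- \oplus \varepsilon_-$ between $\frac{1}{2e\lambda^2 n}$ and $\frac{c}{\lambda n}$ satisfy the precondition of neither theorem.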

In document Reasoning-Driven Question-Answering For Natural Language Understanding (Page 147-150)