

Chapter 2: Computational modelling of the incremental processing of a sentence

2.1. Bayesian Belief Updating (BBU)

Incremental speech processing involves using the information available from the context to constrain an upcoming input (a word, a phrase, a sentence, etc.) and, once that input is heard, integrating it into the context so that a subsequent input can be constrained more accurately. This cycle continues until the speaker ends the message. This conceptual description of incremental speech processing fits well within the Bayesian framework of language comprehension. The framework is motivated by Bayes’ theorem, which describes the probability of an event based on prior knowledge related to the event. In its simplest form, Bayes’ theorem states:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (1)$$

where A is a target variable and B is a context variable on which A is conditioned. As a simple application to language processing, suppose that a listener hears an adjective-noun phrase such as “yellow banana”. The goal is to model the listener’s internal beliefs about “banana” given the preceding adjective “yellow”. Substituting A with “banana” and B with “yellow” gives:

$$P(\text{“banana”}_{t} \mid \text{“yellow”}_{t-1}) = \frac{P(\text{“yellow”}_{t-1} \mid \text{“banana”}_{t})\,P(\text{“banana”}_{t})}{P(\text{“yellow”}_{t-1})} \qquad (2)$$

where t and t − 1 indicate the relative positions of the words in the phrase. The goal is to model the posterior P(“banana” | “yellow”), the probability of “banana” given “yellow”. This expression is already useful because it makes explicit the mapping between the goal (the posterior) and the prior. The prior P(“banana”) describes the listener’s beliefs about the target “banana” (i.e. the subjective probability of “banana” alone) before the context “yellow” is known. The likelihood P(“yellow” | “banana”) evaluates the context “yellow” against the hypothesised target, i.e. it expresses how probable “yellow” is if the target is “banana”. The evidence P(“yellow”) works as a context normaliser whose practical role is explained in Footnote 1 in Chapter 1. Belief updating is reflected in the shift from prior to posterior at each cycle, which continues until the posterior converges to a delta distribution (probability 1 for the target and 0 otherwise). From a modelling perspective, this Bayesian approach provides useful insight into how prediction may change and develop as the words of a sentence unfold incrementally.
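To make equation (2) concrete, the short Python sketch below applies Bayes’ theorem to a small, hypothetical set of candidate nouns. The candidate set, the prior P(noun) and the likelihoods P(“yellow” | noun) are invented numbers for illustration only, and the evidence P(“yellow”) is obtained by summing the numerator over the candidates, which assumes the candidate set is exhaustive.

```python
# Toy illustration of equation (2): P(noun_t | "yellow"_{t-1}).
# All probabilities below are HYPOTHETICAL numbers chosen for illustration only.

def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' theorem over a finite set of candidate targets.

    prior[w]      = P(w)            (belief about the target before the context)
    likelihood[w] = P(context | w)  (how probable the context is if the target is w)
    The evidence P(context) = sum_w P(context | w) P(w) normalises the result.
    """
    joint = {w: likelihood[w] * prior[w] for w in prior}
    evidence = sum(joint.values())
    return {w: p / evidence for w, p in joint.items()}

# Hypothetical prior over three candidate nouns (before hearing "yellow").
prior = {"banana": 0.2, "apple": 0.5, "cherry": 0.3}
# Hypothetical likelihoods P("yellow" | noun).
likelihood = {"banana": 0.9, "apple": 0.2, "cherry": 0.05}

post = posterior(prior, likelihood)
for word, p in post.items():
    print(f"P({word} | yellow) = {p:.3f}")
```

With these assumed numbers, the belief in “banana” rises from 0.2 before the context to about 0.61 after it, which is the prior-to-posterior shift described above.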


Another important aspect of this approach is that it models the cyclical development of prediction in sentence and discourse comprehension. Suppose that we are modelling a listener’s syntactic prediction of the complement structure in the sentence “The intrepid child found the picture”. For illustration purposes, I assume that the subject NP “The intrepid child” is independent of the following complement structure, so that the complement is constrained entirely by the preceding verb “found”. Changes in prediction can then be tracked as follows (Figure 2-1):

Figure 2-1: A simplistic visual illustration of belief updating about the complement syntactic structure across different cycles in time. SCF = subcategorization frame.

In Figure 2-1, Cycle 1 describes the process of incorporating the main verb “found” into the prediction. Cycle 2 shows that this verb-incorporated prediction becomes a new prior that constrains the syntactic frames. Because a direct object structure is confirmed by the determiner “the”, the prediction cycle ends at Cycle 2 in this example, and the prior facilitates the integration of the direct object structure into the sentence. Hence, by tailoring the prediction more specifically to the up-to-date context, this Bayesian model promotes more rapid and accurate integration of the target frame (the direct object). It is worth noting that any posterior at the end cycle (Cycle 2 in this example) converges to a delta distribution, so belief updating becomes conceptually equivalent to integrating the target into the context. In practice, the “target” refers to a specific property (e.g. semantic meaning or grammatical category) of a particular linguistic unit (e.g. a word, a phrase or a clause) that appears after the context.
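The two cycles in Figure 2-1 can be sketched as repeated applications of Bayes’ theorem in which each posterior is reused as the next prior. In this hedged Python sketch the SCF labels and all probabilities are hypothetical, and the determiner “the” is assumed, following the simplification in the text, to be compatible only with a direct object frame.

```python
# Sketch of cyclical belief updating over subcategorization frames (SCFs),
# loosely following Figure 2-1. The SCF inventory and every probability here
# are HYPOTHETICAL values chosen only to illustrate the mechanics.

def update(prior, likelihood):
    """One Bayesian update: posterior(s) is proportional to likelihood(s) * prior(s)."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    z = sum(joint.values())  # evidence: P(input | context so far)
    return {s: p / z for s, p in joint.items()}

# Before the verb: beliefs about the complement frame (hypothetical).
beliefs = {"direct_object": 0.4, "sentential_complement": 0.3, "intransitive": 0.3}

# Cycle 1: incorporate the verb "found" via P("found" | SCF) (hypothetical).
beliefs = update(beliefs, {"direct_object": 0.6, "sentential_complement": 0.3, "intransitive": 0.1})
print("after 'found':", {s: round(p, 3) for s, p in beliefs.items()})

# Cycle 2: the Cycle 1 posterior is the new prior. Under the text's simplifying
# assumption, "the" is compatible only with a direct object, so the posterior
# collapses to a delta distribution.
beliefs = update(beliefs, {"direct_object": 1.0, "sentential_complement": 0.0, "intransitive": 0.0})
print("after 'the':", beliefs)
```

Under these assumptions the direct object frame rises from 0.4 to about 0.67 after “found” and reaches 1 after “the”, so the end-cycle posterior is the delta distribution described above.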

As shown in (2) and Figure 2-1, incremental speech comprehension proceeds by updating beliefs each time an input that constrains the target is heard (in Figure 2-1, the verb constraining the SCF).

However, as discussed in Chapter 1, prediction in speech processing is not limited to words but spans a variety of linguistic levels, from perception (phonological-lexical) to cognition (syntactic-semantic). Psycholinguistic accounts based on Fodorian modular theory (Fodor, 1983) claim that the processing streams are organized into separate, autonomous modules (Frazier, 1987), whereas other accounts propose jointly interacting streams (Marslen-Wilson, 1975; Altmann & Steedman, 1988). In this section, I briefly review a recent generative framework proposed by Kuperberg (2016) from a Bayesian perspective. Kuperberg’s framework claims that listeners infer the underlying cause of the observed inputs using a set of hierarchically organized representations (an internal generative model). These are the representations that best explain the statistical properties of the observed inputs, given the listener’s beliefs about the message that the speaker is trying to convey. The beliefs propagate down to lower levels, where they tailor the representations by generating probabilistic predictions before the new input is processed. Predictions at these levels interact hierarchically: for example, predictions about the semantic meanings or syntactic structures of possible continuations can influence predictions about candidate words, which can in turn affect the expected sequences of phonemes. Once the new input is heard, these probabilistic predictions are evaluated against the bottom-up evidence to update the prior beliefs. This top-down prediction scheme facilitates the processing of an input word in a sentence, and the input, in turn, enables flexible updating of the multi-level constraints through bottom-up projections. This process is illustrated in simplified form in Figure 2-2 below.
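The interplay between top-down prediction and bottom-up updating across levels can be illustrated with a two-level toy model, sketched below in Python. This is a hypothetical illustration, not Kuperberg’s (2016) implementation: the event labels, the candidate words and all probabilities are assumptions. The event level predicts the next word by marginalizing over events, and the observed word is then used to update beliefs about the event.

```python
# Two-level toy generative model in the spirit of Kuperberg (2016):
# an event (message) level generates words. All event labels, words and
# probabilities are HYPOTHETICAL illustrations, not values from the framework.

# Beliefs about the event after hearing "The giant crocodile" (hypothetical).
p_event = {"attack": 0.6, "swim": 0.4}

# Generative mapping from events to the next word, P(word | event) (hypothetical).
p_word_given_event = {
    "attack": {"attacked": 0.8, "swam": 0.05, "slept": 0.15},
    "swim":   {"attacked": 0.1, "swam": 0.8,  "slept": 0.1},
}

def predict_words(p_event, p_word_given_event):
    """Top-down prediction: P(word) = sum_e P(word | e) P(e)."""
    words = next(iter(p_word_given_event.values())).keys()
    return {w: sum(p_event[e] * p_word_given_event[e][w] for e in p_event) for w in words}

def update_events(p_event, p_word_given_event, observed):
    """Bottom-up update: P(e | word) is proportional to P(word | e) P(e)."""
    joint = {e: p_event[e] * p_word_given_event[e][observed] for e in p_event}
    z = sum(joint.values())  # equals the top-down predicted probability of `observed`
    return {e: p / z for e, p in joint.items()}

predicted = predict_words(p_event, p_word_given_event)
print("predicted next word:", {w: round(p, 3) for w, p in predicted.items()})

p_event = update_events(p_event, p_word_given_event, "attacked")
print("event beliefs after 'attacked':", {e: round(p, 3) for e, p in p_event.items()})
```

In this toy setting “attacked” is predicted with probability 0.52, and hearing it raises the belief in the attack event from 0.6 to about 0.92; the normalizing constant of the bottom-up update is exactly the top-down prediction for the observed word.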


Figure 2-2: Incremental speech processing of a simple direct object sentence, “The giant crocodile attacked the wildebeest”, in the light of the BBU generative framework (Kuperberg, 2016). The figure shows the role played by each input (i.e. the subject noun phrase, the verb and the complement noun phrase) in constructing the event representation (i.e. the message) in a predictive processing framework. Blue arrows indicate “prediction” and orange arrows indicate “update” or “integration”.


The problem now reduces to characterizing the two kinds of arrows in Figure 2-2: prediction and update. Under the view of prediction as a graded, probabilistic phenomenon (see Kuperberg & Jaeger, 2016), the conditional probability distribution over the upcoming input directly represents the information used to predict that input (i.e. the constraints). It is also important to quantify the certainty of beliefs, because the strength of top-down prediction depends on the certainty with which the beliefs are held (Kuperberg, 2016). Lastly, the difficulty of updating reflects the proportion of probability mass in the constraints that cannot be explained by the bottom-up input (the “pruned probability mass” in Levy, 2008, p. 1131), the so-called “prediction error”. The human language system aims to minimize this prediction error through an iterative process of predicting and updating throughout a sentence, eventually obtaining converged representations at various levels, each of which best explains the observed sentence. The ways to characterize prediction and to quantify certainty and error are described in the following sections.
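As a hedged preview only, the Python sketch below uses two common information-theoretic choices that are assumptions here, not necessarily the measures defined in the following sections: Shannon entropy of the predictive distribution as an index of (un)certainty, and surprisal of the observed input as an index of prediction error. The structures, the prior and the set of structures compatible with the new word are all hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits: lower entropy means beliefs are held with more certainty."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def kl_divergence(q, p):
    """KL(q || p) in bits: size of the shift from prior p to posterior q."""
    return sum(qi * math.log2(qi / p[s]) for s, qi in q.items() if qi > 0)

# Hypothetical prior over structures before the next word is heard.
prior = {"direct_object": 0.6, "sentential_complement": 0.3, "intransitive": 0.1}

# Assume the next word is compatible only with these structures (hypothetical).
compatible = {"direct_object", "sentential_complement"}

# Probability mass of structures ruled out by the input ("pruned" mass).
pruned_mass = sum(p for s, p in prior.items() if s not in compatible)
surprisal = -math.log2(1 - pruned_mass)  # -log2 P(word | context)

# Posterior: discard incompatible structures and renormalise the rest.
posterior = {s: (p / (1 - pruned_mass) if s in compatible else 0.0) for s, p in prior.items()}

print(f"entropy before: {entropy(prior):.3f} bits, after: {entropy(posterior):.3f} bits")
print(f"pruned mass: {pruned_mass:.2f}, surprisal: {surprisal:.3f} bits")
print(f"KL(posterior || prior): {kl_divergence(posterior, prior):.3f} bits")
```

When the posterior is obtained by discarding incompatible structures and renormalizing, surprisal equals −log2(1 − pruned mass) and also equals the KL divergence from prior to posterior, a simple identity linking the pruned probability mass to the size of the belief update.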

Kuperberg’s BBU framework is a variant of the “predictive coding” framework (Friston, 2005, 2008), which has drawn significant attention in cognitive and perceptual neuroscience. As stated in Kuperberg and Jaeger (2016), “Hierarchical predictive coding in the brain takes the principles of the hierarchical generative framework to an extreme by proposing that the flow of bottom-up information from primary sensory cortices to higher level association cortices constitutes only the prediction error, that is, only information that has not already been ‘explained away’ by predictions that have propagated down from higher level cortices…”. This specific neurobiological hypothesis of the predictive coding account has been tested and corroborated in a series of behavioural and neuroimaging studies of speech perception (Sohoglu, Peelle, Carlyon & Davis, 2012, 2014; Sohoglu & Davis, 2016). These studies consistently reported reduced activity in the superior temporal gyrus (STG) when the speech input (target) was more expected, supporting the claim that the brain is sensitive to the mismatch (error) between expected and actual input.
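The core claim of hierarchical predictive coding, that only the prediction error travels upward, can be illustrated with a deliberately minimal scalar loop in Python. This is a toy sketch under strong assumptions (a single higher-level estimate, an identity mapping from estimate to prediction, and a fixed update rate) and is not a model of STG activity; the summed absolute error is used only as a crude stand-in for an error-related signal.

```python
# Minimal scalar predictive-coding loop (HYPOTHETICAL toy, not a neural model).
# A higher level holds an estimate `mu` and sends down a prediction; the lower
# level passes up ONLY the prediction error (input - prediction).

def settle(x, mu, rate=0.2, steps=50):
    """Iteratively revise the higher-level estimate from bottom-up errors.

    Returns the final estimate and the summed absolute error sent upward,
    used here as a crude stand-in for 'error-related activity'.
    """
    total_error = 0.0
    for _ in range(steps):
        prediction = mu          # top-down prediction (identity mapping is an assumption)
        error = x - prediction   # only this residual travels upward
        total_error += abs(error)
        mu += rate * error       # higher level revises its belief to explain the error away
    return mu, total_error

prior_expectation = 1.0  # what the listener expects (hypothetical units)
for label, x in [("expected input", 1.1), ("unexpected input", 3.0)]:
    mu, err = settle(x, prior_expectation)
    print(f"{label}: final estimate {mu:.3f}, summed error {err:.3f}")
```

With these settings the summed error is about twenty times larger for the unexpected input than for the expected one, qualitatively mirroring the reduced response to expected speech input reported above.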