4.6 The Probabilistic Task Content Modeling Approach

Previously in this chapter, task content was described as an instantiation of task structure, informally represented in episodic narratives written in natural language.

Both the task structure and the natural language used in the narratives are probabilistic phenomena: the task structure because it has to adapt to different situations in the real world, and natural language because of its inherent ambiguity and richness of expression. By considering the elements of the task structure as hidden states that generate the natural language phrases that fill the narratives, the process of generating the narratives can be regarded as a “doubly embedded stochastic process” that might be formally represented with an HMM. Indeed, we will use an HMM to model the process of generating the episodic narratives, which will be built by customizing the approach of Barzilay & Lee. However, because our final purpose is to perform TCBR, we need to create a connection between cases and the probabilistic modeling. For this purpose, we compile the following assertions.

4.6.1 Assertions

Recall the previous discussion on the knowledge role observed object, where it was explained that a narrative could contain observations on several observed objects.

Then, for each observed object, there is a set of findings, explanations, or actions.

Based on the event-oriented viewpoint on TCBR, every single event or group of events that is self-contained can be considered as a unique case. Therefore, if a narrative consists of many events, then:

One narrative might contain many cases.

As mentioned earlier in Chapter 2, one of the CBR tenets is that “the world is regular”. Based on this tenet, it is possible to claim that similar situations will have similar outcomes. The other tenet is that “the same situations keep recurring”. As a result of this tenet, we would expect many of the narratives to describe events that have occurred previously, although at a different time and place. Thus:

A collection of narratives will contain redundant cases.

Is redundancy bad? The answer is yes and no, depending on the task. In all retrieval tasks, such as information retrieval or the retrieval step of CBR, redundancy causes problems, because it increases the computational cost without contributing new information. However, in the context of probability, when an event repeats more frequently than the others, such information is not considered redundant. It only reflects the true nature of the process. In fact, if we think of the cases as outcomes of an experiment that follows a normal distribution, the cases that appear frequently will be considered normal outcomes, while the cases that appear rarely will be considered abnormal outcomes. In the context of CBR, one is usually interested in a case base whose cases differ from one another, so that each of them contributes a new problem-solving situation. In this sense, the cases seen as abnormal outcomes are more useful than the ones seen as normal outcomes. However, since it is not possible to know the nature of a case a priori, the following assertions are needed:

Redundant cases can be used to build prototypical cases.

Cases that differ from the prototypical cases in some aspect serve as useful cases in the sense of CBR.

By building a probabilistic model that simulates the process of generating the cases, it will be possible to distinguish between prototypical and unique cases, since a prototypical case will have a high probability (of being generated by the model), while a unique case will have a low probability.
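The idea of separating prototypical from unique cases by their probability can be illustrated with a minimal sketch. It makes strong simplifying assumptions: the knowledge roles are treated as if they were directly observed, emissions are ignored, and the transition probabilities in `TRANSITIONS` (including the dummy `START` and `END` states) are invented numbers for illustration only, not values estimated from any corpus.

```python
import math

# Hypothetical transition probabilities between knowledge roles.
# Illustrative numbers only; "START" and "END" are dummy boundary states.
TRANSITIONS = {
    "START": {"OO": 1.0},
    "OO": {"FI": 0.9, "EX": 0.1},
    "FI": {"EX": 0.6, "END": 0.3, "AC": 0.1},
    "EX": {"EV": 0.5, "AC": 0.3, "END": 0.2},
    "EV": {"END": 1.0},
    "AC": {"END": 1.0},
}


def sequence_log_prob(roles, transitions=TRANSITIONS):
    """Log-probability of a role sequence under a first-order Markov chain."""
    path = ["START", *roles, "END"]
    log_p = 0.0
    for prev, nxt in zip(path, path[1:]):
        p = transitions.get(prev, {}).get(nxt, 0.0)
        if p == 0.0:
            # A transition the model never allows makes the case impossible.
            return float("-inf")
        log_p += math.log(p)
    return log_p


frequent_case = ["OO", "FI", "EX", "EV"]  # follows the most likely transitions
rare_case = ["OO", "EX", "AC"]            # uses low-probability transitions

print(sequence_log_prob(frequent_case) > sequence_log_prob(rare_case))
```

Under these toy numbers the first sequence receives the higher score and would be a candidate prototypical case, while the second would be flagged as unusual. Raw sequence probabilities shrink with length, so a real comparison would probably need some form of length normalization.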

4.6.2 Representations

By adopting the event-oriented perspective, it is possible to have two different sets of elements for representing the task structure: the set of events and the set of event participants. The elements of both these sets can play the role of states in the probabilistic task content model. Actually, we will use events for representing narratives and event participants (i.e., knowledge roles) for representing cases.

To exemplify, consider the structure of the MONITOR-and-DIAGNOSE task described in Section 4.3. There are three events: Observe, Explain, and Recommend.

These events can be regarded as the topics that generate sentences (or clauses) according to their inherent meaning. To comply with Barzilay & Lee, we can add an unknown topic X, which will be responsible for generating those sentences that are not generated from one of the known topics. Using the abbreviations Obs, Exp, and Rec for the original events, we could represent narratives as sequences of the states that generated the text. Depending on the parameters of the probabilistic model, different kinds of sequences will be possible:

[Obs, Obs, Obs, Obs], [Obs, Exp, Obs, Exp, Rec], [X, Obs, Exp, Rec], etc.

In a similar way, a case can be regarded as generated from a sequence of event participants that correspond to knowledge roles in a task. Using the abbreviations OO for observed object, FI for finding, EX for explanation, EV for evaluation, and AC for action, different kinds of cases can be represented:

[OO, FI], [OO, FI, EX, EV], [OO, FI, EX, AC], etc.

Actually, both events and knowledge roles should be considered as the hidden states of the PTCM model, since they are not observed in reality. All we have is a narrative that contains a sequence of natural language sentences, in which neither the event types nor the knowledge roles are apparent. However, by supposing that they exist and that the observed sentences or phrases are generated from them, we can think of a PTCM model consisting of these hidden states. It is possible to build two different models, one for the narratives and one for the cases. Once the number and nature of the states are decided, it remains to estimate the parameters of the models. The procedure is the same for both models.

4.6.3 Estimating Model Parameters

If the number of states and the vocabulary of output symbols (i.e., the words) are known, the situation is that of Problem 3 discussed in Section 4.5.1: estimating the parameters of the model λ = (Π, A, B), namely estimating the initial probabilities Π, the state transition probabilities A, and the emission probabilities B. By inserting a dummy state as the initial state, it is possible to incorporate the values of Π in the A distribution, so that only A and B need to be estimated. Instead of following the parameter estimation approach described in Section 4.5.1 (i.e., the Baum-Welch algorithm), we employ the strategy of Barzilay & Lee, described in Section 4.5.2.
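The dummy-state device can be made concrete with a small sketch. The state name `START`, the helper functions, and all probability values below are assumptions introduced purely for illustration; the point is only that a path starting from the dummy state receives the same probability as the original path scored with Π.

```python
def fold_initial_into_transitions(pi, A, start="START"):
    """Return a copy of A with a dummy start state whose outgoing row is pi."""
    if start in A:
        raise ValueError(f"state name {start!r} is already in use")
    folded = {state: dict(row) for state, row in A.items()}
    folded[start] = dict(pi)
    return folded


def path_probability(path, A, pi=None):
    """Probability of a state path; pi scores the first state when given."""
    prob = 1.0 if pi is None else pi.get(path[0], 0.0)
    for prev, nxt in zip(path, path[1:]):
        prob *= A.get(prev, {}).get(nxt, 0.0)
    return prob


# Hypothetical parameters over the event states Obs, Exp, Rec and X.
pi = {"Obs": 0.7, "Exp": 0.1, "Rec": 0.1, "X": 0.1}
A = {
    "Obs": {"Obs": 0.4, "Exp": 0.4, "Rec": 0.1, "X": 0.1},
    "Exp": {"Obs": 0.3, "Exp": 0.2, "Rec": 0.4, "X": 0.1},
    "Rec": {"Obs": 0.2, "Exp": 0.1, "Rec": 0.6, "X": 0.1},
    "X": {"Obs": 0.7, "Exp": 0.1, "Rec": 0.1, "X": 0.1},
}

folded = fold_initial_into_transitions(pi, A)
path = ["Obs", "Exp", "Rec"]
with_pi = path_probability(path, A, pi)
with_dummy = path_probability(["START", *path], folded)
print(abs(with_pi - with_dummy) < 1e-12)
```

Because no state transitions back into the dummy state, adding it changes nothing else in the model; it only removes Π as a separate parameter set.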

Concretely, Equation 4.21 serves to estimate the state transition probabilities, while Equations 4.19 and 4.20 are used for the emission probabilities.
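Since the equations themselves are not reproduced in this section, the following is only a generic sketch of how transition probabilities can be initialized from annotated state sequences: a relative-frequency estimate with additive smoothing. The function name, the smoothing constant `delta`, and the toy sequences are assumptions, and the estimator is not claimed to be identical to Equation 4.21. Emission probabilities are left out.

```python
from collections import Counter


def estimate_transitions(sequences, states, delta=0.1):
    """Smoothed relative-frequency estimate of P(next state | current state).

    Assumes every label in `sequences` is listed in `states` and delta > 0.
    """
    pair_counts = Counter()   # how often state i is followed by state j
    from_counts = Counter()   # how often state i is followed by anything
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            from_counts[prev] += 1
    m = len(states)
    return {
        i: {
            j: (pair_counts[(i, j)] + delta) / (from_counts[i] + delta * m)
            for j in states
        }
        for i in states
    }


# Toy annotated narratives (hypothetical data).
narratives = [
    ["Obs", "Exp", "Rec"],
    ["Obs", "Obs", "Exp"],
    ["Obs", "Exp", "Obs", "Exp", "Rec"],
]
states = ["Obs", "Exp", "Rec", "X"]
A_hat = estimate_transitions(narratives, states)
print(A_hat["Obs"]["Exp"] > A_hat["Obs"]["Rec"])
```

The smoothing term keeps unseen transitions, such as those into or out of X in this toy data, from receiving zero probability, and each row still sums to one.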

After this initialization, an EM Viterbi-style optimization, as described in Section 4.5.2, can take place.
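The decoding half of such an optimization can be sketched as a standard Viterbi pass in log space. This is a simplified illustration: each observation is a single symbol rather than a whole sentence, and the cue words and all probabilities are invented. In a Viterbi-style EM loop, the decoded state labels would then be used to re-estimate the parameters, and the two steps would alternate until the labels stop changing.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in s at time t
    best = [{
        s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
        for s in states
    }]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: one cue word stands in for each sentence.
states = ["Obs", "Exp", "Rec"]
start_p = {"Obs": 0.8, "Exp": 0.1, "Rec": 0.1}
trans_p = {
    "Obs": {"Obs": 0.4, "Exp": 0.5, "Rec": 0.1},
    "Exp": {"Obs": 0.3, "Exp": 0.2, "Rec": 0.5},
    "Rec": {"Obs": 0.2, "Exp": 0.1, "Rec": 0.7},
}
emit_p = {
    "Obs": {"observed": 0.8, "because": 0.1, "should": 0.1},
    "Exp": {"observed": 0.1, "because": 0.8, "should": 0.1},
    "Rec": {"observed": 0.1, "because": 0.1, "should": 0.8},
}

decoded = viterbi(["observed", "because", "should"], states, start_p, trans_p, emit_p)
print(decoded)
```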

While the learned model is basically an HMM, throughout this thesis we refer to it as the PTCM model, for identification purposes. Actually, the PTCM model differs from the probabilistic content model built by Barzilay & Lee, described in Section 4.5.2, because it uses known states. Indeed, it was shown that the model of Barzilay & Lee performed an initial clustering of sentences in order to have each cluster as a possible state. After the parameter estimation, these were called V-states (Viterbi states), containing similar sentences but having no established, explicit meaning. For the purposes of CBR, we decided to first annotate the narratives with event types and knowledge roles, so that the PTCM model better serves these purposes. In order to build the PTCM model, a process of knowledge extraction is needed. Then, in order to build the case base, a process of knowledge summarization based on the PTCM model is needed. These two processes, to which two separate chapters are dedicated, are sketched briefly in the following section.

In document Knowledge Extraction and Summarization for Textual Case-Based Reasoning: A Probabilistic Task Content Modeling Approach (Page 108-111)