Probabilistic Task Content Modeling
4.4 Analysis of Task Content Narratives
There are two types of narratives that can be considered as describing task content:
1. narratives that describe how to carry out the steps of a task
2. narratives that describe what happened during a concrete situation of task execution
An example narrative of the first type was shown in Figure 3.7 of the previous chapter. However, we do not regard that narrative as a description of task content, but only as a master-plan for the task structure. In our view, task content is captured by the narratives of the second type, which we proceed to analyze more thoroughly in this section.
Because a specific task is always carried out within a known domain, the users communicating via narratives share a large amount of domain knowledge. Such a shared understanding of the domain means that the narratives will not contain information that explains either domain entities or domain processes to which the narrative content refers. Generally, one would expect that the narratives contain only text generated specifically for communicating the results of each task step execution. Thus, our working assumption is that a task content narrative only contains information strictly concerned with the task structure and its elements.
In Figure 4.1, a list of task elements for the MONITOR-and-DIAGNOSE task was defined. However, if we do not aim for a knowledge representation like that of CommonKADS, do we still need to acquire domain knowledge for each of these knowledge roles? Clearly, the larger the number of these roles, the more difficult the knowledge acquisition process becomes. Thus, a way should be found to determine the most important roles. Our assumption is that such a choice depends on the user’s goals for the task at hand: everything that fulfills a user’s need for knowledge is important to acquire. In the concrete context, we must raise the question: What are a user’s needs during the MONITOR-and-DIAGNOSE task?
A way to assess the needs is to learn what a user cannot do well. As mentioned in Section 4.3, novices face the problems of “missing the symptoms” and “not being able to generate explanatory hypotheses”.
Thus, a concrete goal for knowledge acquisition from the narratives would be to extract knowledge that corresponds to symptoms and hypotheses. However, this is not as easy as it sounds. When writing free text, people do not formulate their thoughts in the following way:
We found symptom X.
A possible hypothesis (explanation, cause, etc.) is Y.
Actually, all the terms denoting knowledge roles (symptom, finding, discrepancy, etc.) are abstract terms that do not occur in natural text written by domain experts.
Thus, we need to find ways to recognize verbalizations of these concepts in the narratives. However, a few issues need to be discussed first.
Initially, a symptom was described in Figure 4.1 as a negative finding, and discrepancy seems to be a type of finding, too. Thus, both a symptom and a discrepancy are a kind of finding. Meanwhile, a finding is something that is observed.
In this way, we can expect to identify symptoms, discrepancies, and findings as participants in the Observe event.
Additionally, a hypothesis is regarded as a possible solution. But what kind of solution? We assume that the solution is to find the cause of a finding. Because people are not always sure whether a causal relation exists between two things that co-occur, a more generic term for a hypothesis would be explanation.
An interesting fact related to the process of generating hypotheses can be noticed in practice. While a user might test several hypotheses in the course of problem solving, when writing down the problem solution, the failed hypotheses will not be mentioned at all. There are several reasons for this omission:
• the desire to not overload the audience with information
• the desire to protect domain knowledge from competitors
• the desire to hide the uncertainty related to preferring one hypothesis over others
From the point of view of building a decision support system that assists users with broad information, having access to the failure rates of generated hypotheses would be a great source of information. Unfortunately, this information remains part of the knowledge people keep to themselves, for the reasons mentioned above or for others. What remains is that a hypothesis, as a form of explanation, would generally be a participant in an Explain event.
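The role assignments discussed so far can be summarized in a small data structure. The following is only a minimal sketch under our own assumptions: the class and field names (`Finding`, `ObserveEvent`, `ExplainEvent`, `negative`) are illustrative and are not taken from Figure 4.1 or from CommonKADS. The placeholders X and Y are the ones used in the schematic sentences above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Something that is observed (assumed representation)."""
    text: str
    negative: bool = False  # a symptom is described as a negative finding

    @property
    def is_symptom(self) -> bool:
        return self.negative


@dataclass(frozen=True)
class ObserveEvent:
    """Event in which a finding (symptom, discrepancy, ...) participates."""
    finding: Finding


@dataclass(frozen=True)
class ExplainEvent:
    """Event in which a hypothesis participates as an explanation."""
    finding: Finding
    explanation: str  # stated as an explanation, not necessarily a cause


# "We found symptom X. A possible hypothesis is Y."
x = Finding("X", negative=True)
observe = ObserveEvent(finding=x)
explain = ExplainEvent(finding=x, explanation="Y")
```

The sketch records only the one explanation that is written down; as noted above, failed hypotheses do not appear in the narratives, so there is nothing to store for them.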
Another issue to discuss is that of parameters. We know that a finding is related to a parameter, and that a parameter measures an attribute (property) of an entity that is at the focus of the task. As an example, recall the discussion in Section 3.3, where the entity was the insulation system, a property was its robustness, and a parameter was the leakage current. However, we saw that the findings on that occasion were expressed as qualities of the representatives, such as curves and points. The situation then is the following:
An observed object has properties.
The condition of such properties is measured by parameters.
Parameters are represented in some form.
The representatives are analyzed to detect irregular findings.
From such a description, it follows that it is possible to find occurrences of all the mentioned terms in the narratives. To exemplify, consider the following sentences, where finding is a placeholder (i.e., it stands for something that has been observed):
1. [All the phases] show finding.
2. [The total currents of all measured phases] show finding.
3. [The curves] show finding.
In the first sentence, the observed object ‘phase’ appears; in the second sentence, the parameter ‘total current’ appears; and in the third sentence, the representative ‘curve’ appears.
Such a scenario (the use of different concepts) is possible due to the common speech phenomenon of metonymy. People like to refer to things by terms that somehow have a relation (part-of, participant-in, property-of, etc.) to the true concept being described. The use of metonymy makes an automatic distinction between the semantic categories of the true observed object, its properties, its parameters, and their representatives difficult. For this reason, in this thesis we opt for a functional categorization of these concepts. Because they all appear to play the role of being observed, we refer to all of them collectively as observed objects.
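The effect of the functional categorization can be illustrated with the three example sentences above. The sketch below is an assumption-laden toy, not the annotation procedure of this thesis: it relies on the bracket notation of the examples (which does not occur in real narratives), and the names `SEMANTIC_CATEGORY`, `PATTERN`, and `to_observe_frame` are hypothetical.

```python
import re

# Semantic categories as stated in the text for the three bracketed phrases.
SEMANTIC_CATEGORY = {
    "All the phases": "observed object",
    "The total currents of all measured phases": "parameter",
    "The curves": "representative",
}

# Matches the schematic form "[<phrase>] show <finding>."
PATTERN = re.compile(r"^\[(?P<observed>[^\]]+)\]\s+show\s+(?P<finding>.+?)\.$")


def to_observe_frame(sentence):
    """Map a schematic sentence to an Observe frame, or None if it does not match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    # The bracketed phrase fills the same role whatever its semantic category.
    return {
        "event": "Observe",
        "observed_object": match.group("observed"),
        "finding": match.group("finding"),
    }


sentences = [
    "[All the phases] show finding.",
    "[The total currents of all measured phases] show finding.",
    "[The curves] show finding.",
]
frames = [to_observe_frame(s) for s in sentences]
```

Although the three phrases belong to different semantic categories, each one ends up in the same `observed_object` slot, which is exactly what the functional view asks for.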
The final issue is related to the number of observed objects in one narrative.
If the true observed object has several properties whose condition is measured by some parameters, a narrative will contain several sequences of observations or explanations that are not directly related to one another. This must be taken into consideration during the case base creation process, as will be discussed in Section 6.3.
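One simple way to picture this separation is to group the annotated clauses of a narrative by their observed object. The sketch below assumes that each clause has already been labeled with an event type and an observed object; the tuple layout, the function name `split_by_observed_object`, and the labels "object A" and "object B" are hypothetical, and the procedure is not necessarily the one used for case base creation.

```python
from typing import Dict, List, Tuple

# (event type, observed object, text of the finding or explanation)
Clause = Tuple[str, str, str]


def split_by_observed_object(clauses: List[Clause]) -> Dict[str, List[Clause]]:
    """Group clauses into one sequence per observed object, keeping narrative order."""
    sequences: Dict[str, List[Clause]] = {}
    for clause in clauses:
        _event, observed_object, _text = clause
        sequences.setdefault(observed_object, []).append(clause)
    return sequences


# A narrative with two unrelated observation/explanation sequences.
narrative = [
    ("Observe", "object A", "finding 1"),
    ("Explain", "object A", "explanation 1"),
    ("Observe", "object B", "finding 2"),
    ("Explain", "object B", "explanation 2"),
]
sequences = split_by_observed_object(narrative)
```

Each resulting sequence can then be treated independently of the others, which reflects the observation that sequences about different observed objects are not directly related.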
Having discussed the elements related to task content, we now direct our attention to its probabilistic aspect. However, such an elaboration cannot be understood outside the theoretical framework of probabilistic modeling for natural language, which we proceed to present in the following section.