
Conclusions and Future Work

5. Generic techniques that assist case authoring from text. The techniques developed in this research were applied to the domain of air and marine safety in


Telephone Operation. Mrs. B had a sister living in America who regularly telephoned to check on Mrs. B’s health and wellbeing.

Mrs. B is unable to communicate effectively by telephone because of her hearing difficulties, and in most instances both Mrs. B and her sister relied on Mr. B to relay messages between them. This had become a cause of great frustration and upset for both of them. Mr. B also became upset and frustrated by the lack of communication between his wife and her sister, especially as he had to act as go-between.

Figure 10.1: Deleting Vs. Amending Representative Terms

keep in mind that these changes will propagate to all the nodes that represent sub-concepts of the shared node.

All the changes made to the conceptual model are also reflected in the model’s context table. This is necessary because each newly authored case introduces a new set of attribute-terms and objects that must be attached at appropriate positions in the hierarchy, and it is impractical to expect a human to do this manually. Instead, the new case knowledge is added to the context table and a new conceptual model is generated. Thus, if the context table did not reflect the changes made earlier by the expert’s editing of the conceptual model, the model generated after adding a case would reflect only the newly added case, and all the changes made to the hierarchy over time would be lost.
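To make this dependency concrete, the following is a minimal Python sketch, assuming the context table can be modelled as a mapping from case identifiers to sets of attribute-terms. The function names (`concepts`, `rename_attribute`, `add_case`) and the example terms are hypothetical, and the naive concept enumeration is only a stand-in for whichever Formal Concept Analysis algorithm SmartCAT-T actually uses. Because the expert's amendment is written into the context table, it survives when the conceptual model is regenerated after a new case is added.

```python
from itertools import chain

# Assumed layout: case identifier -> set of attribute-terms
Context = dict[str, frozenset[str]]
Concept = tuple[frozenset[str], frozenset[str]]  # (extent, intent)


def concepts(ctx: Context) -> list[Concept]:
    """Enumerate every formal concept of a small context.

    Each intent is an intersection of object intents (the full attribute
    set being the empty intersection), so the object intents are closed
    under intersection and each extent is derived from its intent.
    Exponential in the worst case; adequate only for illustration.
    """
    all_attrs = frozenset(chain.from_iterable(ctx.values()))
    intents = {all_attrs}
    for attrs in ctx.values():
        # The right-hand side is built in full before the in-place update.
        intents |= {attrs & existing for existing in intents}
    return [
        (frozenset(o for o, attrs in ctx.items() if intent <= attrs), intent)
        for intent in intents
    ]


def rename_attribute(ctx: Context, old: str, new: str) -> Context:
    """Record an expert's amendment of a representative term in the table."""
    return {o: frozenset(new if a == old else a for a in attrs) for o, attrs in ctx.items()}


def add_case(ctx: Context, case_id: str, attrs: set[str]) -> Context:
    """Add a newly authored case as a new row of the context table."""
    updated = dict(ctx)
    updated[case_id] = frozenset(attrs)
    return updated


# Illustrative context; the terms are invented for this example.
ctx = {
    "case1": frozenset({"hearing difficulty", "telephone"}),
    "case2": frozenset({"hearing difficulty", "doorbell"}),
}
edited = rename_attribute(ctx, "telephone", "telephone operation")  # expert edit
lattice = concepts(add_case(edited, "case3", {"hearing difficulty", "telephone operation"}))
for extent, intent in sorted(lattice, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Had the amendment been applied only to the generated concepts rather than to `ctx`, regenerating from the unchanged table would bring back the original term `telephone`, which is exactly the loss described above.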

10.3 Future Work

This section highlights some of the limitations or shortcomings of the approaches used in this research and consequently identifies some directions for future research and improvement of the SmartCAT-T tool.

10.3.1 Pruning of the Conceptual Model

A technique for systematically pruning the conceptual model might be useful, especially as more knowledge is acquired and the model grows. One reason is that even domain experts may disagree about which attributes to delete or change, so a systematic criterion would reduce the reliance on individual judgement. Moreover, a well-chosen subset of the original concepts might be more useful than the full set. It is therefore worth exploring pruning techniques suited to the domain.
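One possible pruning criterion, offered here only as an illustration and not evaluated in this research, is support-based pruning as used for iceberg concept lattices: a concept is kept only if its extent covers at least a chosen fraction of the cases. The sketch below assumes concepts are (extent, intent) pairs; the function name, the threshold and the example concepts are hypothetical.

```python
Concept = tuple[frozenset[str], frozenset[str]]  # (extent: case ids, intent: terms)


def prune_by_support(concepts: list[Concept], n_cases: int, min_support: float) -> list[Concept]:
    """Keep only concepts whose extent covers at least `min_support` of all cases."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return [c for c in concepts if len(c[0]) / n_cases >= min_support]


# Invented concepts over three cases.
lattice = [
    (frozenset(), frozenset({"hearing difficulty", "telephone operation", "doorbell"})),
    (frozenset({"case1", "case3"}), frozenset({"hearing difficulty", "telephone operation"})),
    (frozenset({"case2"}), frozenset({"hearing difficulty", "doorbell"})),
    (frozenset({"case1", "case2", "case3"}), frozenset({"hearing difficulty"})),
]
pruned = prune_by_support(lattice, n_cases=3, min_support=0.5)
for extent, intent in pruned:
    print(sorted(extent), sorted(intent))
```

Since a super-concept's extent contains the extents of its sub-concepts, support never decreases when moving up the IS-A hierarchy; every super-concept of a kept concept is therefore also kept, and the upper part of the hierarchy stays intact.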

10.3.2 Word Sense Disambiguation

The approach presented in Chapter 3 could be extended as follows. Currently, each term is disambiguated independently of the other terms in the text. For specialised domains, this means that Google may provide only limited evidence of the plausibility of a disambiguated phrase. However, if the relationships between terms were taken into account, a term that has been disambiguated with strong evidence from Google might “help” in disambiguating another term to which it is related. In this way, evidence from Google regarding the plausibility of a phrase in which a given term is embedded could be used to determine the plausibility of another phrase in which a related term is embedded, and thus provide enough evidence to disambiguate the related term. This approach is likely to result in more terms being disambiguated, since more evidence would be available to support each decision.
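The proposed extension can be sketched as follows. This is a hypothetical Python illustration, not the Chapter 3 method: the hit counts are invented and returned by a local stub (`fake_hits`) instead of a web search engine, candidate senses are represented as sense phrases, and the evidence threshold `min_evidence` is an assumed parameter. When a term's own sense phrases lack evidence, each candidate is re-scored as a phrase combined with the already-disambiguated sense of a related term.

```python
from typing import Callable, Optional, Sequence

# Invented hit counts standing in for web search results (no real API is called).
FAKE_HITS = {
    "induction loop": 4,
    "program loop": 6,
    "hearing aid induction loop": 120,
    "hearing aid program loop": 1,
}


def fake_hits(phrase: str) -> int:
    return FAKE_HITS.get(phrase, 0)


def disambiguate(
    senses: Sequence[str],
    hits: Callable[[str], int],
    min_evidence: int,
    related_sense: Optional[str] = None,
) -> Optional[str]:
    """Pick the best-supported sense phrase, borrowing evidence from a related term if needed."""
    # 1. Independent evidence: plausibility of each sense phrase on its own.
    own = {s: hits(s) for s in senses}
    best = max(own, key=own.__getitem__)
    if own[best] >= min_evidence:
        return best
    if related_sense is None:
        return None  # too little evidence; leave the term undisambiguated
    # 2. Borrowed evidence: plausibility of each sense phrase alongside the
    #    already-disambiguated sense of a related term.
    joint = {s: hits(f"{related_sense} {s}") for s in senses}
    best = max(joint, key=joint.__getitem__)
    return best if joint[best] >= min_evidence else None


senses = ["induction loop", "program loop"]
print(disambiguate(senses, fake_hits, min_evidence=10))
print(disambiguate(senses, fake_hits, min_evidence=10, related_sense="hearing aid"))
```

With independent evidence alone neither sense reaches the threshold, so the term is left undisambiguated; pairing the candidates with the related term's sense supplies enough evidence to choose one.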

10.3.3 Retrieval Effectiveness

Currently, all the query features are used during retrieval to find matching cases; these are then ranked using the structure, which exploits semantics to complement the overlapping syntactic information between the query and the cases. However, some query features may be more important than others, so it would be useful to identify this knowledge and incorporate it into the retrieval mechanism. This knowledge could be as simple as the importance weights obtained by LSI during key phrase extraction; alternatively, other feature-ranking mechanisms, such as those employed by Wiratunga et al. (2005) or in Sophia (Patterson et al. 2005), could be explored and utilised. These techniques could also make use of solution description texts to assign importance weights to key phrases in the problem text. The weights could then be used to rank retrieved cases, potentially improving the quality of the similarity metric. As an alternative to automatically assigning importance weights to query features, it might be worth exploring whether a graphical representation of the query, allowing the user to choose which parts of the structure to use for retrieval, would improve retrieval effectiveness. A mixed-initiative dialog similar to that employed in Conversational CBR could also be explored as a means of refining queries to improve retrieval effectiveness.
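As a minimal sketch of how importance weights might enter ranking, assume each case is reduced to a set of key phrases and the query to a mapping from key phrase to weight. The weighted-overlap score below is a simple stand-in chosen for illustration, not SmartCAT-T's structure-based similarity, and the weights are hypothetical (they could, for example, come from LSI).

```python
def weighted_overlap(query: dict[str, float], case_features: set[str]) -> float:
    """Share of the total query weight carried by features the case also has."""
    total = sum(query.values())
    if total == 0:
        return 0.0
    return sum(w for f, w in query.items() if f in case_features) / total


def rank(query: dict[str, float], cases: dict[str, set[str]]) -> list[str]:
    """Case identifiers ordered from most to least similar to the query."""
    return sorted(cases, key=lambda cid: weighted_overlap(query, cases[cid]), reverse=True)


cases = {
    "A": {"hearing difficulty", "frustration"},
    "B": {"telephone"},
}
features = ["hearing difficulty", "telephone", "frustration"]
uniform = {f: 1.0 for f in features}
# Hypothetical importance weights, e.g. as might be derived from LSI.
weighted = {"hearing difficulty": 2.0, "telephone": 7.0, "frustration": 1.0}

print(rank(uniform, cases))   # ['A', 'B']
print(rank(weighted, cases))  # ['B', 'A']
```

With uniform weights the case sharing more features ranks first; once "telephone" is marked as the dominant feature, the ranking flips, which is the kind of effect importance weighting is meant to capture.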

Retrieval effectiveness has been evaluated using Precision, Recall and the F-Score. Further research could explore other measures of retrieval effectiveness, such as the Average Distance Measure (Mea & Mizzaro 2004), which takes into account the continuous relevance¹ of the retrieved cases.
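For reference, the set-based measures used in the evaluation, together with the Average Distance Measure, can be written as below. The ADM is coded according to its usual statement (Mea & Mizzaro 2004): one minus the mean absolute difference between the system's relevance estimate (SRE) and the user's relevance estimate (URE) over the documents considered. Treating a missing estimate as 0, the helper names and the example data are assumptions of this sketch.

```python
def precision_recall_f(
    retrieved: set[str], relevant: set[str], beta: float = 1.0
) -> tuple[float, float, float]:
    """Set-based precision, recall and F-score (F1 when beta == 1)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    if p == 0.0 and r == 0.0:
        return p, r, 0.0
    b2 = beta * beta
    return p, r, (1 + b2) * p * r / (b2 * p + r)


def average_distance_measure(sre: dict[str, float], ure: dict[str, float]) -> float:
    """ADM = 1 - mean over documents of |SRE(d) - URE(d)|.

    SRE and URE are relevance estimates in [0, 1]; a document missing
    from either mapping is assumed to have estimate 0.
    """
    docs = sre.keys() | ure.keys()
    if not docs:
        raise ValueError("no documents to evaluate")
    total = sum(abs(sre.get(d, 0.0) - ure.get(d, 0.0)) for d in docs)
    return 1.0 - total / len(docs)


p, r, f = precision_recall_f({"c1", "c2", "c3"}, {"c1", "c2", "c4", "c5"})
adm = average_distance_measure({"c1": 0.9, "c2": 0.4}, {"c1": 1.0, "c2": 0.0, "c3": 0.5})
print(f"P={p:.3f} R={r:.3f} F={f:.3f} ADM={adm:.3f}")
```

Unlike the set-based measures, ADM rewards a system for estimating graded relevance close to the user's own assessment, rather than only for retrieving the right cases.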

10.3.4 Editing the Conceptual Model

If all the terms in an intent are deleted, then the corresponding sub-problem will have no description! Deletion could therefore be restricted so that it is impossible to delete all the terms in an intent. For instance, a term could only be removed from an intent if it does not contribute to the description of the problem and cannot be changed to do so. Currently, each intent is mostly composed of long phrases, so it is easy for the expert to grasp the context. As an extension to the tool, each intent’s corresponding text could be displayed at the concept node to further encourage the expert to view the terms in context and thus make informed decisions when editing a concept node’s intent.
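A guard of this kind could be as simple as the hypothetical sketch below, which assumes an intent is held as a list of terms and refuses any deletion that would leave it empty. Whether a term contributes to the description of the problem requires the expert's judgement and is not modelled; `delete_term` and `IntentEditError` are illustrative names, not part of SmartCAT-T.

```python
class IntentEditError(ValueError):
    """Raised when an edit would leave a concept node's intent unusable."""


def delete_term(intent: list[str], term: str) -> list[str]:
    """Return a copy of `intent` without `term`, refusing to empty the intent."""
    if term not in intent:
        raise IntentEditError(f"{term!r} is not in this intent")
    remaining = [t for t in intent if t != term]
    if not remaining:
        raise IntentEditError("cannot delete the last term: the sub-problem would have no description")
    return remaining


intent = ["hearing difficulty", "telephone operation"]
intent = delete_term(intent, "telephone operation")  # allowed
try:
    delete_term(intent, "hearing difficulty")  # would empty the intent
except IntentEditError as err:
    print("refused:", err)
```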

10.3.5 Exploring Other Domains

It is worth exploring the application of the developed techniques in domains other than SmartHouse, including both sparse and dense domains. For sparse domains, other means of incorporating domain knowledge could be explored to aid the key phrase extraction task. For example, in the medical domain, where ontologies such as MeSH² are available, the ontology can be exploited to provide domain knowledge that can in turn be used to extract key phrases. SmartCAT-T makes use of background domain knowledge to obtain the seeds for anchor term identification. The anchor terms are then used to obtain other key phrases using term-term relationships obtained from LSI. An ontology could replace the use of LSI in SmartCAT-T: an anchor term could be used to obtain other important terms by making use of the relationships between the anchor term and other terms in the ontology. In dense domains, the anchor terms could be obtained using statistical means, as was done in the domain of air and marine safety. Alternatively, a purely statistical means of obtaining key phrases could be explored in dense domains; for example, LSI could be used on its own, without anchor terms guiding the key phrase extraction process. It would be interesting to observe how much knowledge can be extracted from the texts without the use of additional domain knowledge. The reports in the domain of air and marine safety were long and verbose, but the developed techniques still worked satisfactorily. Subjecting the techniques to a domain with longer documents or a much larger corpus could help to determine their scalability limits.

¹ In continuous relevance, the suitability of a retrieved solution is continuously assessed.

² http://www.nlm.nih.gov/mesh/meshhome.html
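The substitution of an ontology for LSI can be illustrated by making the source of term-term relationships a pluggable function. In this Python sketch, which assumes NumPy, one neighbour function relates terms whose rank-k LSI vectors (rows of U_k S_k from a singular value decomposition of a term-by-document matrix) have cosine similarity above a threshold, and the other reads links from an ontology modelled as a plain dictionary. The term-document counts, the toy ontology, `k`, the threshold and all function names are hypothetical; this is not SmartCAT-T's actual LSI configuration or a MeSH interface.

```python
from typing import Callable, Iterable

import numpy as np

Neighbours = Callable[[str], set[str]]


def lsi_neighbours(terms: list[str], term_doc: np.ndarray, k: int, threshold: float) -> Neighbours:
    """Relate terms whose rank-k LSI vectors have cosine similarity >= threshold."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    vectors = u[:, :k] * s[:k]  # one row per term in the reduced space
    norms = np.linalg.norm(vectors, axis=1)
    index = {t: i for i, t in enumerate(terms)}

    def neighbours(term: str) -> set[str]:
        i = index[term]
        sims = vectors @ vectors[i] / (norms * norms[i] + 1e-12)
        return {terms[j] for j in np.flatnonzero(sims >= threshold) if j != i}

    return neighbours


def ontology_neighbours(relations: dict[str, set[str]]) -> Neighbours:
    """Relate terms through explicit ontology links instead of LSI."""
    return lambda term: set(relations.get(term, set()))


def expand(anchors: Iterable[str], neighbours: Neighbours) -> set[str]:
    """Anchor terms plus every term the chosen relation source links to them."""
    key_terms: set[str] = set()
    for anchor in anchors:
        key_terms.add(anchor)
        key_terms |= neighbours(anchor)
    return key_terms


terms = ["hearing", "telephone", "amplifier", "garden"]
term_doc = np.array([  # invented term-by-document counts
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
], dtype=float)
lsi = lsi_neighbours(terms, term_doc, k=2, threshold=0.9)
onto = ontology_neighbours({"hearing": {"hearing aid", "induction loop"}})  # toy ontology

print(sorted(expand(["hearing"], lsi)))
print(sorted(expand(["hearing"], onto)))
```

Because `expand` only depends on the neighbour function, swapping LSI for an ontology (or combining both) leaves the rest of the extraction pipeline unchanged.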

10.3.6 Building the Conceptual Model

Other techniques for obtaining the conceptual model could be explored. Formal Concept Analysis is very good at arranging concepts according to their IS-A relationships and will thus be hard to compete with. However, other techniques, such as that presented by Sanderson & Croft (1999), could be explored. Indeed, documents from the Web could be used to ascertain the validity of the obtained relationships, as is done in the KnowItAll system (Etzioni et al. 2005).

Chuang & Chien (2004) showed the usefulness of the subsumption hierarchies of Sanderson & Croft (1999) for interactive query expansion. These techniques typically take neither word meaning nor word order into account; as a result, relations in which one term is an aspect of another (e.g., disease and outbreak) are extracted alongside hyponym/hypernym relations such as banana and fruit. Thus, although these approaches can be fully automated, many of the extracted relations are not useful in most applications.
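The subsumption test of Sanderson & Croft (1999) is usually stated as: term x subsumes term y if P(x|y) ≥ 0.8 and P(y|x) < 1, with probabilities estimated from document co-occurrence. The Python sketch below applies that test to a few invented documents; the function name and data are illustrative assumptions. Because the test looks only at co-occurrence, it returns the aspect relation disease/outbreak alongside the IS-A relations fruit/banana and fruit/apple, illustrating the point made above.

```python
from itertools import permutations


def subsumptions(docs: list[set[str]], threshold: float = 0.8) -> set[tuple[str, str]]:
    """Pairs (x, y) such that x subsumes y: P(x|y) >= threshold and P(y|x) < 1."""
    df: dict[str, int] = {}
    for doc in docs:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    pairs = set()
    for x, y in permutations(df, 2):
        both = sum(1 for doc in docs if x in doc and y in doc)
        if both == 0:
            continue
        # P(x|y) = both / df[y] and P(y|x) = both / df[x]
        if both / df[y] >= threshold and both / df[x] < 1:
            pairs.add((x, y))
    return pairs


docs = [  # invented documents, each reduced to its set of terms
    {"fruit", "banana"},
    {"fruit", "apple"},
    {"fruit", "banana"},
    {"disease", "outbreak"},
    {"disease"},
]
print(sorted(subsumptions(docs)))
# [('disease', 'outbreak'), ('fruit', 'apple'), ('fruit', 'banana')]
```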

10.3.7 Negation

Negation has not been handled in this research. It is unusual for a person’s complaints to be listed and then negated, so negation is uncommon in the problem parts of the reports; indeed, none was encountered in this research. However, a solution may be considered and then discarded because of the circumstances at the time, so negation can occur in the solution parts of the SmartHouse reports. Fortunately, the inability to handle negation did not have serious repercussions in this project, because only two solution descriptions contained negation.
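Although negation was not handled, a simple approach in the spirit of the NegEx algorithm from clinical text processing would mark a key phrase as negated when a negation trigger appears within a few tokens before it, or a rejection word appears shortly after it. The Python sketch below is hypothetical: the trigger lists, the window size and the example sentences are assumptions, and a real system would need domain-specific triggers and proper scope detection.

```python
import re

# Hypothetical trigger lists (assumptions, not from this research).
PRE_TRIGGERS = {"no", "not", "never", "without"}
POST_TRIGGERS = {"rejected", "discarded", "unsuitable"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def is_negated(sentence: str, phrase: str, window: int = 5) -> bool:
    """True if `phrase` occurs in `sentence` with a negation trigger nearby."""
    tokens, target = tokenize(sentence), tokenize(phrase)
    if not target:
        return False
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != target:
            continue
        before = set(tokens[max(0, i - window):i])
        after = set(tokens[i + n:i + n + window])
        if before & PRE_TRIGGERS or after & POST_TRIGGERS:
            return True
    return False


# Invented solution sentences.
print(is_negated("A text telephone was considered but rejected.", "text telephone"))
print(is_negated("An amplified telephone was installed in the hall.", "amplified telephone"))
```

A fixed token window is crude (a trigger in a neighbouring clause can be picked up by mistake), but for short solution descriptions it would at least flag considered-then-discarded solutions for the expert's attention.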