SmartCAT-T: Exploiting Tree-structured Models
8.5 Overall Analysis
Figure 8.8 compares the F-Scores obtained for all the test regimes. H-Smart-H-Query outperformed all the other data-query combinations on every query except query 8. H-Smart-H-Query had an average F-Score of 0.71, while Lucene was the worst with an average F-Score of 0.44. This is mainly because hierarchically structured case representations allow for effective similarity matching even when cases are activated in a naïve way, such as the bag-of-words approach used in these experiments. Comparing structures the way Smarter does ensures that structures that are different are interpreted as such; a mobility-related case will get a similarity score of 0 when compared to a case regarding hearing problems, since nodes in the two cases will have the root node as their nearest common parent.
Figure 8.8: Comparison of all Combinations
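The zero similarity between a mobility-related case and a hearing-related case can be illustrated with a minimal sketch. It assumes a Wu-Palmer-style node similarity based on the depth of the nearest common parent; this is an illustrative choice and not necessarily the measure implemented in Smarter. The hierarchy, node names and function names below are hypothetical.

```python
# Hypothetical concept hierarchy: child -> parent. The node names are
# illustrative and are not taken from the SmartHouse case base.
PARENT = {
    "mobility": "root",
    "hearing": "root",
    "wheelchair_use": "mobility",
    "door_opening": "mobility",
    "doorbell_inaudible": "hearing",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to and including the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Number of edges between `node` and the root (the root has depth 0)."""
    return len(path_to_root(node)) - 1


def nearest_common_parent(a, b):
    """Deepest node that lies on the root paths of both `a` and `b`."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def node_similarity(a, b):
    """Assumed Wu-Palmer-style score: 2 * depth(ncp) / (depth(a) + depth(b))."""
    if a == b:
        return 1.0
    ncp = nearest_common_parent(a, b)
    return 2 * depth(ncp) / (depth(a) + depth(b))
```

Under these assumptions, node_similarity("door_opening", "doorbell_inaudible") is 0.0 because the nearest common parent is the root, whereas node_similarity("wheelchair_use", "door_opening") is 0.5 because both nodes share the mobility parent.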
Lucene-plus performs better than Lucene for most of the queries. This shows that the case representations generally contain as much useful knowledge as the original documents and, in many instances, are more focused on problem solving than whole documents. This further shows that the feature extraction technique is effective at identifying those features that are important in the domain.
Query 7 did not contain much of the problem detail, so it was a challenge for all test regimes. Generally, better results are obtained with harmonised case knowledge and queries than with unharmonised case knowledge and queries. This can be attributed to the fact that more key phrases were identified when the documents were harmonised than when they were not. The cohesion created by harmonisation allows LSI to identify more key phrases because of the enhanced degree of co-occurrence. The identified key phrases, in turn, lead to the activation of the concepts whose intent they belong to. Thus more useful cases can be retrieved during problem solving.
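The effect of harmonisation on co-occurrence can be sketched as follows. The example only counts document-level term co-occurrences before and after mapping surface forms to a canonical term; it does not perform LSI itself, which would additionally factorise a term-document matrix. The synonym table, documents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical harmonisation table: surface form -> canonical term.
CANONICAL = {"chair": "wheelchair", "entrance": "door"}


def harmonise(tokens):
    """Replace each token by its canonical term, if one is defined."""
    return [CANONICAL.get(token, token) for token in tokens]


def cooccurrence_counts(docs):
    """For each unordered term pair, count the documents containing both terms."""
    counts = Counter()
    for tokens in docs:
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


# Three toy documents that describe the same situation with different words.
docs = [
    ["wheelchair", "door"],
    ["chair", "entrance"],
    ["wheelchair", "entrance"],
]

raw = cooccurrence_counts(docs)
harmonised = cooccurrence_counts([harmonise(doc) for doc in docs])
```

In this toy collection the pair ("door", "wheelchair") co-occurs in one document before harmonisation and in all three afterwards, which is the kind of strengthened pattern a co-occurrence-based technique can exploit.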
Better results are obtained for the Leave-one-out test than for the new problems. This is partly because the test probes used in the Leave-one-out test use vocabulary that is familiar to Smarter’s case retrieval engine, whereas some of the vocabulary used in the queries is unfamiliar. Unfamiliar vocabulary in the new test problems could also be the reason why the difference in results obtained with harmonised queries is sometimes negligibly small; when the bulk of the text is unfamiliar, the use of a few familiar words may not make much difference to retrieval effectiveness. Nevertheless, good results are obtained, which suggests that important terms, such as disability terms or problem-focused terms like door opening, will be repeated in some fashion regardless of when or by whom reports are created. Hence the CBR system is able to find cases with similar knowledge even when some of the query terms are unfamiliar.
8.6 Summary
This chapter has presented an evaluation of the general performance of the Smarter CBR system and benchmarked the system against Lucene, a high-performance IR tool. It has also presented an analysis of the contribution of the various modules to the overall performance of the Smart case authoring tool and of Smarter’s retrieval mechanism.
The results have shown Smarter to perform remarkably well for new problems. Indeed, its performance was shown to be superior to that of Lucene even when concepts were activated in a naïve way where only overlapping words between the queries and case knowledge were considered. Results from further experiments have shown that the key phrase extraction technique was effective in identifying phrases that contain knowledge that is useful for problem solving. Furthermore, better results were observed when case knowledge was obtained after the text was harmonised. This is because LSI employs co-occurrence patterns among terms in the document collection in order to identify key features inherent in the text. Text harmonisation makes those co-occurrences more apparent and consequently results in the identification of more key phrases than when the text is not harmonised.
Retrieval effectiveness depends on the ability of the retrieval engine to match a query term with the terms used as case knowledge. The results show a slight improvement when harmonised query words are used. This can be attributed to the fact that synonyms are interpreted as such, enabling the system to treat terms that are similar in meaning but different as strings as the same. Thus, higher gains could potentially be obtained when the system is in use, as there would be many more query words whose nature cannot be known a priori but which the system is designed to handle through the synonym mapper.
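The role of the synonym mapper in query matching can be sketched as follows, assuming a simple table lookup and a word-overlap score. The table entries, the scoring function and all names are hypothetical and are not taken from Smarter's implementation.

```python
# Hypothetical synonym table; the entries are illustrative only.
SYNONYMS = {"entrance": "door", "deaf": "hearing-impaired"}


def normalise(terms):
    """Map each term to its canonical form, leaving unknown terms unchanged."""
    return {SYNONYMS.get(term, term) for term in terms}


def matched_fraction(query_terms, case_terms, use_mapper):
    """Fraction of distinct query terms that also occur in the case."""
    query = normalise(query_terms) if use_mapper else set(query_terms)
    case = normalise(case_terms) if use_mapper else set(case_terms)
    return len(query & case) / len(query) if query else 0.0


query = ["entrance", "opening", "difficult"]
case = ["door", "opening", "automatic"]

without_mapper = matched_fraction(query, case, use_mapper=False)
with_mapper = matched_fraction(query, case, use_mapper=True)
```

Here one of the three query terms matches without the mapper and two of three match with it, because "entrance" and "door" are then treated as the same term. A query term that is absent from the table, such as "difficult", remains unmatched, which is consistent with the small gains observed when most of the query vocabulary is unfamiliar.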
The experiments also illustrate the superiority of similarity measures that take into account relationships between concepts, over bag-of-words approaches. This is the reason behind Smarter’s ability to recommend more sensible solutions than IR-based engines such as Lucene.
In the next chapter, an experimental evaluation is carried out to establish how generic the developed approaches are by subjecting them to a domain other than the SmartHouse.
The techniques are applied to data in the domain of air and marine safety investigation obtained from the Canadian Transport Safety Board (TSB 2007). Adjustments that are required in the application of the techniques in the domain of air and marine safety investigation are highlighted.