
A Theoretical Framework for Mind Mapping


In addition, sorting normalised paths of node keys will group together paths that employ similar nodes; further eliminating duplicate paths yields a list of unique paths, which will be reassembled automatically into a combined knowledge structure. That, in turn, alleviates the manual task endured by Buckingham and Adams (2006). Such novel mind maps will contain just a single instance of any particular path, with any related paths integrated by a degree of reconfiguration.

The third normal form, 3NF, ensures that any fields in a tuple depend solely on the key of the containing relation. The process of classification suggested a tuple for representing related words in the form concept → concept. That tuple lacks a nodeID field: the meaning of any specific word does not depend on what node contains it. The outcome of applying 3NF is similar to that obtained from meeting 2NF, in that separate relations are indicated for holding data in a more usable form.
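Returning to the sorting and de-duplication of paths, that step can be sketched as follows. This is a minimal illustration only: it assumes that each path is a tuple of already-normalised node keys and that the combined structure can be held as a nested dictionary. The function names and the sample keys are hypothetical and are not taken from the GRiST data.

```python
def unique_sorted_paths(paths):
    """Drop duplicate paths, then sort so that paths sharing leading nodes sit together."""
    return sorted(set(paths))


def merge_paths(paths):
    """Reassemble the unique paths into one nested dict (a simple prefix tree)."""
    tree = {}
    for path in unique_sorted_paths(paths):
        node = tree
        for key in path:
            # Reuse an existing child for this key, or create an empty one.
            node = node.setdefault(key, {})
    return tree


# Hypothetical paths of normalised node keys; the third repeats the first.
paths = [
    ("risk", "self-harm", "history"),
    ("risk", "suicide", "intent"),
    ("risk", "self-harm", "history"),
    ("risk", "self-harm", "method"),
]
combined = merge_paths(paths)
```

In this sketch the duplicate path contributes nothing new, so the combined structure holds a single instance of each path, while paths that share leading nodes are merged under a common branch.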

2.3 Identifying Related Concepts

The process of abstraction, then, refines and optimises intensional knowledge, which regulates what might be stored in any information base, and in what manner. Classification starts that process by identifying any atoms needed to hold such intensional knowledge. Subsequently, successive stages of normalisation determine structures of such atoms. The first stage, 1NF, reorganises repeating groups within any tuple, or record. After that, 2NF and 3NF refine relations that store such tuples. Fields in any tuple, though, must hold just atomic values; mind map nodes, in contrast, often express several concepts. Such nodes might be split into constituent words held as separate nodes. That BOW approach, though, was rejected in Chapter 1 because of the ensuing loss of meaning.

A Normalised View of Concepts from GRiST Mind Maps

Rather than rewriting mind map nodes in that way, tuples of the form nodeID → concept represent key concepts as atoms of intensional knowledge. A further tuple, concept → concept, allows for words that express related meanings. In fact, two types of related words are considered here. The first type concerns morphological variations that nevertheless share a common sub-string, from here on termed a stem. Such word variations arise from appending prefixes and suffixes to stems. Words that lack any common stem, though, might yet be related. Although such words are completely different in terms of text, people recognise that they are synonymous. Machines must emulate that human knowledge in order to discern relationships between such words.

It is important that machines concur with humans when identifying related words. From a morphological viewpoint, over-short stems will group together words that are not truly related. Conversely, stems that are too long will fail to wholly identify related words. Machines, then, must determine stems of optimum length. Although so-called stemmers exist, they suffer disadvantages that must be overcome, such as being designed for handling specific languages. An analogous problem arises from analysing word meanings, in that words might be just remotely related. Given that tools exist to detect the meanings of words, the challenge lies in determining the degree of any reported synonymy. Concepts related by morphology or by meaning will, though, be held as stems rather than as actual words. Records based on the tuple nodeID → concept will group together nodes that express morphological variations of a particular concept. Records from the concept → concept tuple will further be populated by stems to indicate synonymy between entire groups of words.
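As a rough sketch of the two relations just described, the following holds nodeID → concept and concept → concept records as plain Python collections. The node identifiers, the stems, and the helper names are all invented for illustration; in particular, the stems are written by hand here rather than produced by any stemmer or synonym tool.

```python
from collections import defaultdict

# nodeID -> concept records: each concept is held as a stem, not as the actual word.
node_concepts = [
    (1, "depress"),  # hypothetical node containing "depressed"
    (2, "depress"),  # hypothetical node containing "depression"
    (3, "sad"),      # hypothetical node containing "sadness"
    (4, "anxi"),     # hypothetical node containing "anxiety"
]

# concept -> concept records: synonymy between stems, with no nodeID field.
related_concepts = [("depress", "sad")]


def nodes_by_concept(records):
    """Group node identifiers under the stem they express."""
    index = defaultdict(set)
    for node_id, stem in records:
        index[stem].add(node_id)
    return index


def nodes_related_to(stem, records, synonyms):
    """Return nodes expressing the stem itself or any stem recorded as synonymous."""
    stems = {stem}
    for left, right in synonyms:
        if left == stem:
            stems.add(right)
        if right == stem:
            stems.add(left)
    index = nodes_by_concept(records)
    related = set()
    for s in stems:
        related |= index.get(s, set())
    return related
```

Because synonymy is recorded between stems, a single concept → concept record relates entire groups of nodes: under these sample records, a query for the stem "depress" reaches the nodes holding either of its morphological variations as well as the node recorded under "sad".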

Spelling Mistakes in GRiST Mind Maps

A further aspect of GRiST mind maps, though, detracts from identifying the required atoms; misspellings and a sometimes specialised mental-health vocabulary obscure human authors’ intended meanings from machines. Although such uncommon words are absent from digital dictionaries, it would be wrong to treat them as spelling mistakes; indeed, that they are used by specialists makes them particularly important. Even for true mistakes, any spelling checker might suggest inappropriate replacements which, if accepted, would introduce false knowledge. Machines, then, face two problems. The first is whether novel words should be treated as they stand, or corrected by means of a spelling checker. That second option raises the further problem of determining appropriate suggestions from any offered.

A Tool for Refining Spelling Corrections and for Stemming

Having introduced problems that machines face in processing GRiST mind maps, attention now turns to one of the tools chosen to overcome those problems. In fact, acceptable stems and spelling corrections alike will depend on measuring any similarity between words. To that end, the chosen measure is the Levenshtein Distance, referred to as L. Before describing that algorithm in detail, though, the next chapter starts by looking more closely at spelling errors. After explaining the origins of spelling mistakes, an overview of existing research raises various issues concerning approaches to automated correction. Following that comes a proposal for handling spelling errors in GRiST mind maps.

Themes encountered during spelling correction will recur in respect of stemming. Chapter 4, then, continues by describing approaches to deriving stems from isolated words. After that, studies are reviewed that account for context during stemming, before turning to stemming for GRiST mind maps. Both spelling correction and stemming will rely on the Levenshtein Distance, to which various refinements have been made in the past. In particular, the effect of word length on any judged similarity must be taken into account. Such an adjustment for use in analysing GRiST mind maps closes that chapter.
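For orientation, a minimal sketch of the Levenshtein Distance L is given here, using the standard dynamic-programming formulation. The length adjustment shown, which divides L by the length of the longer word, is only one common normalisation and is an assumption of this sketch; it is not necessarily the adjustment proposed for GRiST mind maps. The function names are likewise illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        previous = current
    return previous[-1]


def normalised_distance(a: str, b: str) -> float:
    """L divided by the longer word's length (an assumed adjustment, see above)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

Under this assumed normalisation, a single differing letter weighs more heavily in a short word than in a long one, which is the kind of word-length effect that the adjustment is meant to capture.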


2.4 Chapter Summary

This chapter, then, placed mind maps in the context of semantic networks, and described various approaches that used mind mapping in knowledge engineering, albeit to a limited degree. By dint of being semantic networks, GRiST mind maps were further seen as a potential information base. That led to considering just what characterises information bases; in particular, the process of abstraction was seen to improve any design. Accordingly, abstraction was considered in respect of GRiST mind maps, in an attempt to identify related concepts. This summary, then, closes the chapter, and attention moves to a means of resolving unrecognised words found in GRiST mind maps.

3