Chapter 6 KneeTex: A system for information extraction from knee MRI reports
6.6 Rule-based co-reference and ambiguity resolution
Once recognised, named entities are imported into a relational database and further scrubbed in order to disambiguate them. Semantic ambiguity may arise naturally from linguistic phenomena such as hyponymy, a relationship between a general term (hypernym) and its more specific instances (hyponyms), and polysemy, where a term may have multiple meanings. Multiple related interpretations may also arise from nested occurrences of named entities.
Term Nestedness
During dictionary lookup, PathNER returns the longest possible matches with similarity scores above a certain threshold. As a result, named entities recognised in this manner do not overlap. However, pattern matching used in the second phase of NER may introduce nested annotations of named entities. For example, in the coordinated expression medial and lateral meniscus, PathNER will recognise two terms from the TRAK ontology: medial (TRAK:0000031) and lateral meniscus (TRAK:0001089). Pattern matching will subsequently recognise the coordinated expression as a reference to medial meniscus (TRAK:0001090). The nested occurrence of lateral meniscus should be retained as a valid reference to a named entity. However, the nested occurrence of medial represents an unsuccessful match to another named entity, medial meniscus, and should therefore be removed. The choice between retaining and removing nested occurrences of named entities is based on their semantic types. For example, all nested occurrences of terms descending from the concept quality (TRAK:0000133), defined as "a dependent entity that inheres in a bearer by virtue of how the bearer is related to other entities", are removed. This removes the nested occurrence of medial in the previous example, as well as the references to radial (TRAK:0001531) and vertical (TRAK:0000077) in the example shown in Figure 6-4.
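The filtering rule described above can be illustrated with a minimal sketch. It is not the KneeTex implementation: the Annotation class, the function names and the single-parent is_a table are hypothetical, and the direct link from medial to quality is an assumption made only so that the example is self-contained. Only the concept identifiers are taken from the text.

```python
from dataclasses import dataclass

QUALITY = "TRAK:0000133"

# Hypothetical is_a links (child -> parent). Placing "medial" directly under
# "quality" is an assumption; the real ontology may have intermediate concepts.
IS_A = {"TRAK:0000031": QUALITY}


@dataclass(frozen=True)
class Annotation:
    start: int        # character offset of the first character
    end: int          # character offset one past the last character
    concept_id: str


def descends_from(concept_id, ancestor_id, is_a=IS_A):
    """Walk is_a links upwards and report whether ancestor_id is reached."""
    current = is_a.get(concept_id)
    while current is not None:
        if current == ancestor_id:
            return True
        current = is_a.get(current)
    return False


def is_nested(inner, outer):
    """True if inner lies within outer and the two spans are not identical."""
    same_span = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def drop_nested_qualities(annotations):
    """Remove nested annotations whose concept descends from quality."""
    kept = []
    for ann in annotations:
        nested = any(is_nested(ann, other) for other in annotations if other is not ann)
        if nested and descends_from(ann.concept_id, QUALITY):
            continue  # e.g. "medial" inside the coordinated expression
        kept.append(ann)
    return kept


# Offsets refer to the string "medial and lateral meniscus".
annotations = [
    Annotation(0, 6, "TRAK:0000031"),    # medial (dictionary lookup)
    Annotation(11, 27, "TRAK:0001089"),  # lateral meniscus (dictionary lookup)
    Annotation(0, 27, "TRAK:0001090"),   # coordinated expression -> medial meniscus
]
filtered = drop_nested_qualities(annotations)
```

In this sketch the nested lateral meniscus annotation survives because its concept does not descend from quality, whereas the nested medial annotation is dropped.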
Hyponymy
Hyponymy is a lexical relationship between two terms, where one term (the hyponym) is subordinate to the other (the hypernym) (Stede, 2000). For example, cruciate ligament is a hyponym of ligament, and complete tear is a hyponym of tear. A mention of a hypernym therefore has multiple possible interpretations: the hypernym itself or any one of its hyponyms. Anatomical hypernyms cause a higher level of ambiguity than finding hypernyms. Figure 6-8 shows examples of hyponyms of ligament and tear in the TRAK ontology. When ligament is mentioned on its own, it has 15 possible interpretations, each pointing to a different anatomical location. Although a mention of tear has 16 possible interpretations, they all represent the same finding at different levels of detail. Moreover, finding hypernyms occur much less frequently than anatomical hypernyms: in the training set, standalone mentions of ligament occur 82 times, compared with 14 for tear. Therefore, we focused on resolving anatomical hypernyms.
Figure 6-8 Examples of hyponyms of ligament and tear in the TRAK ontology
In clinical discourse, co-referential terms are often used to maintain text coherence and cohesion, e.g.
In this example, the hypernym ligament co-refers to its hyponym medial collateral ligament, and therefore its interpretation should coincide with that of the hyponym. In other words, the literal interpretation of the hypernym (ligament) obtained originally by dictionary lookup should be corrected using the annotation of the co-referring hyponym (medial collateral ligament), e.g.
This type of ambiguity is resolved systematically by identifying co-referential named entities, i.e. those that refer to the same concept. Co-reference resolution is applied to named entities recognised as one of the following concepts: meniscus (TRAK:0000045), ligament (TRAK:0001027), tendon (TRAK:0000046) or muscle (TRAK:0001088). Once such a hypernym is spotted, the system looks for mentions of its ontological descendants that occur before the start of the hypernym. A distance threshold of 100 is set to avoid mapping to unrelated hyponym concepts. If multiple descendant concepts are found within the distance threshold, the closest one is selected and replaces the concept currently mapped to the hypernym. If no descendant concept is found, the original mapping of the hypernym is retained.
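The resolution rule can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the Mention class and function names are hypothetical, distance is assumed to be measured in characters between the end of the antecedent and the start of the hypernym (the unit of the threshold is not specified here), and EX:mcl is a placeholder identifier standing in for medial collateral ligament.

```python
from dataclasses import dataclass, replace

# Hypernym concepts subject to co-reference resolution (IDs from the text).
HYPERNYMS = {"TRAK:0000045", "TRAK:0001027", "TRAK:0000046", "TRAK:0001088"}
MAX_DISTANCE = 100  # threshold from the text; the unit (characters) is assumed

# Hypothetical is_a fragment (child -> parent); "EX:mcl" is a placeholder ID.
IS_A = {"EX:mcl": "TRAK:0001027"}  # medial collateral ligament -> ligament


@dataclass(frozen=True)
class Mention:
    start: int
    end: int
    concept_id: str


def is_descendant(concept_id, ancestor_id, is_a=IS_A):
    """Walk is_a links upwards and report whether ancestor_id is reached."""
    current = is_a.get(concept_id)
    while current is not None:
        if current == ancestor_id:
            return True
        current = is_a.get(current)
    return False


def resolve_hypernyms(mentions, max_distance=MAX_DISTANCE):
    """Remap each hypernym mention to its closest preceding descendant."""
    resolved = []
    for mention in mentions:
        if mention.concept_id not in HYPERNYMS:
            resolved.append(mention)
            continue
        # Descendant mentions that end before the hypernym starts and lie
        # within the distance threshold.
        candidates = [
            c for c in mentions
            if c.end <= mention.start
            and mention.start - c.end <= max_distance
            and is_descendant(c.concept_id, mention.concept_id)
        ]
        if candidates:
            closest = max(candidates, key=lambda c: c.end)
            mention = replace(mention, concept_id=closest.concept_id)
        resolved.append(mention)  # unchanged if no descendant was found
    return resolved


mentions = [
    Mention(4, 30, "EX:mcl"),           # "medial collateral ligament"
    Mention(60, 68, "TRAK:0001027"),    # "ligament", 30 characters later
    Mention(300, 308, "TRAK:0001027"),  # "ligament", beyond the threshold
]
resolved = resolve_hypernyms(mentions)
```

Here the second mention inherits the interpretation of its antecedent, while the third keeps its literal interpretation because no descendant is found within the threshold.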
Polysemy
Polysemy refers to a linguistic phenomenon where a single word or phrase may be associated with multiple meanings and therefore has the potential to be misinterpreted (Krovetz, 1997).
Sublanguages are restricted to specific semantic domains, which in turn affects word usage. They generally tend to reduce the degree of polysemy. Nonetheless, the problem may still persist. For example, as pointed out by the domain expert, the word rupture in the phrases ligament rupture and cyst rupture would be interpreted differently. In the former case, it should be mapped to the following concept in the TRAK ontology:
id: TRAK:0000211
name: tear
def: "Forcible tearing or disruption of tissue." []
synonym: "rupture" EXACT []
synonym: "tearing" EXACT []
synonym: "disruption" EXACT []
synonym: "split" EXACT []
is_a: TRAK:0000206 ! injury
relationship: occurs_in TRAK:0000045 ! meniscus
relationship: occurs_in TRAK:0000046 ! tendon
relationship: occurs_in TRAK:0001072 ! skeletal muscle
relationship: occurs_in TRAK:0001027 ! ligament organ
In the latter case, it should be mapped to an alternative interpretation, one that represents a morphologic descriptor rather than an injury.
In such cases, co-occurrence information is used to resolve typical ambiguities observed in the training set. For example, when rupture co-occurs with a cyst (i.e. any descendant of the cyst concept), e.g.
it is used to correct its default interpretation as a tear, which represents an injury, to an alternative one, which represents a morphologic descriptor:
Thus, we are able to differentiate between the different uses of the term rupture in this latter example and that of the following example:
By default, rupture is mapped to the TRAK concept with identifier TRAK:0000211, which represents an injury. Here it co-occurs with MCL, the abbreviation of medial collateral ligament, which matches the occurs_in relationship defined for concept TRAK:0000211, so the default interpretation is retained. This resolution of polysemy is restricted to the sublanguage of the knee injury domain. It may not be correct in, or generalisable to, other clinical domains.
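The co-occurrence rule for rupture can be sketched as follows. This is a simplified illustration, not the actual implementation: co-occurrence is assumed to mean occurrence within the same sentence, the cyst concept itself is treated as matching alongside its descendants, and every identifier with the EX: prefix is a placeholder, since the corresponding TRAK identifiers are not given here.

```python
TEAR = "TRAK:0000211"                         # default reading of "rupture" (from the text)
CYST = "EX:cyst"                              # placeholder for the cyst concept
RUPTURE_DESCRIPTOR = "EX:rupture_descriptor"  # placeholder for the alternative reading

# Hypothetical is_a fragment (child -> parent).
IS_A = {"EX:cyst_subtype": CYST}


def is_or_descends_from(concept_id, ancestor_id, is_a=IS_A):
    """True if concept_id equals ancestor_id or reaches it via is_a links."""
    current = concept_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = is_a.get(current)
    return False


def resolve_rupture(sentence_concepts):
    """Re-map "rupture" when a cyst is mentioned in the same sentence.

    sentence_concepts is a list of (surface_text, concept_id) pairs for the
    named entities recognised in one sentence.
    """
    has_cyst = any(is_or_descends_from(cid, CYST) for _, cid in sentence_concepts)
    resolved = []
    for surface, cid in sentence_concepts:
        if surface.lower() == "rupture" and cid == TEAR and has_cyst:
            cid = RUPTURE_DESCRIPTOR  # morphologic descriptor, not an injury
        resolved.append((surface, cid))
    return resolved


cyst_case = resolve_rupture([("cyst", "EX:cyst_subtype"), ("rupture", TEAR)])
ligament_case = resolve_rupture([("MCL", "EX:mcl"), ("rupture", TEAR)])
```

In the first call rupture is re-mapped to the alternative concept, whereas in the second it keeps its default interpretation as a tear.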