Knowledge graph to represent evolving ontology

Figure 5.2 gives a simplified overview of our knowledge graph (KG) representing an evolving ontology. The turquoise and dark blue boxes represent stable and changed concepts, respectively. As illustrated in Figure 5.2, the initial version of our KG, i.e., MeSH 2009, contains concept D009133:Muscular Atrophy. In 2010, a new related concept (D055534:Spinal and Bulbar Muscular Atrophy) was added to specialize the existing one. To represent this evolution, we created a highLvlChg relationship between the two concepts. Thus, documents associated with D009133 and/or D055534 can be retrieved if either of these two concepts appears in the query (e.g., Muscular Atrophy).

[Figure 5.2 diagram. Node labels: D009133 Muscular Atrophy; D055534 Spinal and Bulbar Muscular Atrophy; D020966 Spinal and Bulbar Muscular Atrophy. Version labels: 2009, 2010, 2013. Link labels: highLvlChg, "[…] versions later", "Other possibilities".]

Figure 5.2: The proposed ad hoc history of concepts. The turquoise and dark blue boxes indicate the changed and stable concepts, respectively.

We propose a KG that deals with the multi-versioning and evolution of a KOS. Figure 5.2 shows the evolution of concept D055534 into D020966 between 2010 and 2013: the concept remained stable until 2012 and was moved to another region of the KOS in 2013. To represent this evolution, we created a highLvlChg relationship between the two concepts, which allows us to navigate through past and current versions of a concept. Thus, even though concept D055534 was moved to another region of the KOS, we can still use the KG to retrieve its previous versions (D055534 and D009133). To support this navigation, we extended the features described in [Debatty et al., 2016] with additional ones that cope with the evolutionary aspect of the KOS:

• Edge directions: Considering concepts as the vertices of a graph, edges materialize the relationships between concepts. For example, in Figure 5.3, concepts D009133:Muscular Atrophy and D001284:Atrophy are related in our graph. Since our KG is a digraph [Debatty et al., 2016], one can distinguish between the subject and the object of a relationship. In this work, we treat the structural relationships superClass, subClass, siblings and none as uniquely labelled edges, depicted as black arrows in Figure 5.3. Concepts that emerged from the evolution of the KOS are connected within the KG following the same digraph principle, but their connections may not follow the ontology structure. For instance, concept D055534 in Figure 5.3 is connected to its superClass and shares some similarities with two other concepts, D009136 and D016518, from other regions of the ontology. These connections are drawn as dashed grey arrows in Figure 5.3 but, in our KG, are associated with vertices.

• Similarity value: This indicates the degree of similarity between two concepts (or two versions of the same concept). We used the hybrid measure described in Chapter 4 to compute it. When a new KOS version is added to our KG, the similarity value of each pair of connected vertices is either calculated or updated, as depicted in Figure 5.3.

• Validity periods: Versioning with a limited storage footprint is an important feature of our KG. To reduce the required storage capacity, we used methods like those described in [Caro et al., 2015, Moffitt and Stoyanovich, 2017]. These methods label the graph nodes and edges with the validity periods of concepts and their relationships (see Figure 5.3). Applying this method avoids duplicating the whole KOS in the KG for every new version.

• Relationships: To include more semantics in our KG, we created two types of semantic relationship: evolutionary relationships, associated with vertices, and structural relationships, associated with edges. Evolutionary relationships are highLvlChg, lowLvlChg and none. highLvlChg includes delC, addC, split, move and chgAttValue; lowLvlChg includes delA and addA; none means that the connected vertices had no KOS change between them and are linked only because of their similarity. Structural relationships are superClass, subClass and siblings. Figure 5.2 shows only the evolutionary relationships, while Figure 5.3 shows both. For instance, highLvlChg is an evolutionary relationship indicating that a major change in the KOS was observed from one version to another (in this case, a concept was added). Figure 5.3 uses Super to indicate that concept D001284 subsumes concept D009133. The importance of having these two types of relationship becomes evident when the system needs to define strategies to enrich queries. For instance, a query using the term Spinal and Bulbar Muscular Atrophy, from MeSH 2013, will not find documents created before 2010. Knowing the history of this concept, however, the query can be enriched to also return the documents created before 2010 that contain the term Muscular Atrophy.
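The grouping of change operations and the validity-period labelling described in the list above can be illustrated with a minimal Python sketch. The names `change_level`, `Validity` and `concepts_valid_in` are hypothetical, and the assignment of validity periods to the three concepts is our reading of Figure 5.3, not a statement of the actual implementation.

```python
from dataclasses import dataclass

# Change operations grouped as in the "Relationships" item above.
HIGH_LEVEL_OPS = {"delC", "addC", "split", "move", "chgAttValue"}
LOW_LEVEL_OPS = {"delA", "addA"}


def change_level(op: str) -> str:
    """Return the evolutionary relationship that a change operation belongs to."""
    if op in HIGH_LEVEL_OPS:
        return "highLvlChg"
    if op in LOW_LEVEL_OPS:
        return "lowLvlChg"
    raise ValueError(f"unknown change operation: {op!r}")


@dataclass(frozen=True)
class Validity:
    """Closed interval of KOS versions during which a concept is valid."""
    start: int
    end: int

    def covers(self, year: int) -> bool:
        return self.start <= year <= self.end


# Assumed reading of the "Valid: [...]" labels in Figure 5.3.
VALIDITY = {
    "D009133": Validity(2009, 2016),
    "D055534": Validity(2010, 2012),
    "D020966": Validity(2013, 2016),
}


def concepts_valid_in(year: int) -> list:
    """List the concepts whose validity period covers the given version."""
    return sorted(c for c, v in VALIDITY.items() if v.covers(year))
```

Storing one interval per concept, rather than one copy of the concept per version, is what keeps the KG from duplicating the whole KOS for every release.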

When the query contains outdated concepts, the KG can be used to include additional terms in the query. For instance, consider the situation where one system uses MeSH 2009 to request documents containing the concept D009133 from another system that uses MeSH 2013 to annotate its documents. Using the KG, the query can be enriched with the concepts D055534, D009136 and D020966, connected through the path (D009133, D055534, D009136, D020966) in the period [2009, 2013], to retrieve documents associated with Spinal and Bulbar Muscular Atrophy and Muscular Atrophy. In such cases, the relation and similarity values become very helpful for selecting additional terms (see Figure 5.3).
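As an illustration of this enrichment step, the following Python sketch walks the neighbour lists breadth-first within a given period. It is only a sketch under stated assumptions: the function `enrich_query`, the similarity threshold of 0.7, and the choice to follow only highLvlChg/lowLvlChg links are hypothetical. The toy neighbour lists are a hand-picked subset loosely based on Figure 5.3, and the link from D009136 to D020966 is invented to reproduce the path mentioned in the text.

```python
from collections import deque

# (concept, period) -> list of (neighbour, relation, similarity).
# Hand-picked toy data; the D009136 -> D020966 entry is hypothetical.
NEIGHBOURS = {
    ("D009133", 2010): [("D001286", "none", 0.95),
                        ("D055534", "highLvlChg", 0.76)],
    ("D055534", 2010): [("D009136", "highLvlChg", 0.75),
                        ("D016518", "highLvlChg", 0.66)],
    ("D009136", 2013): [("D020966", "highLvlChg", 0.75)],
}

CHANGE_RELATIONS = {"highLvlChg", "lowLvlChg"}


def enrich_query(concept, start, end, min_sim=0.7):
    """Collect concepts reachable from `concept` through change links
    recorded in [start, end] whose similarity is at least `min_sim`."""
    seen = {concept}
    ordered = [concept]
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for (source, period), entries in NEIGHBOURS.items():
            if source != current or not (start <= period <= end):
                continue
            for neighbour, relation, sim in entries:
                if relation not in CHANGE_RELATIONS:
                    continue  # assumed policy: follow evolution links only
                if sim < min_sim or neighbour in seen:
                    continue
                seen.add(neighbour)
                ordered.append(neighbour)
                queue.append(neighbour)
    return ordered


expanded = enrich_query("D009133", 2009, 2013)
```

With this toy data, the expansion for the period [2009, 2013] follows the path given in the text, while restricting the period to [2009, 2012] stops before D020966.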

[Figure 5.3 diagram, tabular content]

Vertex D009133 (Valid: [2009, 2016])
Period  Neighbor  Relation    Simi
2010    D001286   none        0.95
2010    D055534   highLvlChg  0.76
2012    D001286   none        0.91
2012    D055534   none        0.74
2013    D001286   none        0.91

Vertex D055534 (Valid: [2010, 2012])
Period  Neighbor  Relation    Simi
2010    D009136   highLvlChg  0.75
2010    D020271   highLvlChg  0.96
2010    D040181   highLvlChg  0.94
2010    D016518   highLvlChg  0.66

Vertex D020966 (Valid: [2013, 2016])
Period  Neighbor  Relation    Simi
2013    D009133   highLvlChg  0.75
2013    D009134   highLvlChg  0.96
2013    D009138   highLvlChg  0.74
2013    D062187   highLvlChg  0.66
2014    D009133   none        0.75
2014    D009139   highLvlChg  0.86
2014    D009131   highLvlChg  0.74

Other vertices shown: D001284, D040181, D020271, D030342, D009358, D020763, D013568, D019636, D009422, D009136, D009223.

Figure 5.3: KG proposed for the indirect maintenance of semantic annotations. The black arrows indicate the connection following the ontology structure, while the dashed grey lines indicate the connections created by our approach. Each node contains the validity period and its neighbours. For each triple (node, period, neighbour), one relation and similarity describe their link.

We formalize our KG as a directed graph G = (V, E), where V is the set of vertices and E the set of edges. The set of vertices is denoted by:

$$V = \{(c, p, NL) \mid c \in O,\ p \in \mathbb{N},\ NL = \{(c_i, RE, simiV) \mid c_i \in O,\ RE \in \{highLvl, lowLvl, none\},\ simiV \in \mathbb{R}\}\}$$

where c is a concept from an ontology O in a period p (e.g., 2009), with a neighbour list NL inferred by the KNN graph approach of [Debatty et al., 2016]. Each entry of NL contains a target concept c_i; the relation RE that emerged from the evolution, whose values highLvl and lowLvl denote some KOS change, while none indicates that the two concepts are connected only because they have a high similarity value; and the similarity value simiV itself. The set of edges E is denoted by:

$$E = \{(u, v, p, SR, simiE) \mid u, v \in V,\ p \in \mathbb{N},\ SR \in \{super, sub, sib\},\ simiE \in \mathbb{R}\}$$

where u and v are vertices belonging to V that are connected during a period p (e.g., 2009). Since edges respect the ontology structure, each connection has a structural relationship SR, whose value is one of super (superClass), sub (subClass) and sib (sibling). Finally, simiE represents the similarity between u and v.
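To make the formalization concrete, the definitions of V and E can be mirrored by a few Python data classes. This is a minimal sketch, not the implementation used in this work: the class names `Neighbour`, `Vertex` and `Edge` are hypothetical, and the edge similarity of 0.9 is an invented value for illustration only. The neighbour list of D009133 reuses the values shown for period 2010 in Figure 5.3.

```python
from dataclasses import dataclass
from typing import Tuple

EVOLUTION_RELATIONS = {"highLvl", "lowLvl", "none"}   # values of RE
STRUCTURAL_RELATIONS = {"super", "sub", "sib"}        # values of SR


@dataclass(frozen=True)
class Neighbour:
    """One entry (c_i, RE, simiV) of a neighbour list NL."""
    concept: str
    relation: str
    similarity: float

    def __post_init__(self):
        if self.relation not in EVOLUTION_RELATIONS:
            raise ValueError(f"invalid RE: {self.relation!r}")


@dataclass(frozen=True)
class Vertex:
    """One element (c, p, NL) of V."""
    concept: str
    period: int
    neighbours: Tuple[Neighbour, ...] = ()


@dataclass(frozen=True)
class Edge:
    """One element (u, v, p, SR, simiE) of E."""
    u: Vertex
    v: Vertex
    period: int
    relation: str
    similarity: float

    def __post_init__(self):
        if self.relation not in STRUCTURAL_RELATIONS:
            raise ValueError(f"invalid SR: {self.relation!r}")


atrophy = Vertex("D001284", 2010)
muscular_atrophy = Vertex(
    "D009133",
    2010,
    (Neighbour("D001286", "none", 0.95),
     Neighbour("D055534", "highLvl", 0.76)),
)
# D001284 subsumes D009133; the similarity 0.9 is a made-up placeholder.
edge = Edge(atrophy, muscular_atrophy, 2010, "super", 0.9)
```

Keeping the evolutionary links inside each vertex and the structural links in separate edge records reflects the split between the two relationship types described earlier in this section.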