
4.3 Automatic Discovery of Explanations

4.3.2 Description of the Process

The process that we propose is to repeat the steps shown in Figure 4.2 (we call the whole loop an iteration) until a termination condition is met. This condition is expressed by the user in terms of time spent, number of iterations achieved, or a specific score to be reached by the best explanation. An iteration then includes:

Figure 4.2 Schema of an iteration.

(1) Graph Expansion, where we dereference new entities to reveal new property-value pairs;

(2) Path Collection, where we build new paths that can be followed in the following iterations;

(3) Explanation Discovery, where we create and evaluate new explanations.

At each iteration, the process collects new explanations and outputs the one with the highest score. All the explanations collected up to that point are kept in a queue, in descending order of score. This means that, within a new iteration, two scenarios are possible: either we have found a better explanation, which covers more positive examples than the previous one and therefore has a better score, or no better hypothesis is found, and we keep the last iteration's hypothesis as the best one. In other words, with more iterations, the quality of the result can only increase or stay the same. Therefore, Dedalo can be considered an anytime process.
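The anytime behaviour described above can be sketched as a loop that stops on any of the three user-given conditions. This is a minimal illustration, not Dedalo's implementation: the function name `run_anytime`, the `iterate` callback, and the representation of an explanation as a `(label, score)` tuple are all assumptions made for the example.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical representation of an explanation: (label, score).
Explanation = Tuple[str, float]


def run_anytime(
    iterate: Callable[[int], List[Explanation]],
    max_seconds: Optional[float] = None,
    max_iterations: Optional[int] = None,
    target_score: Optional[float] = None,
) -> List[Explanation]:
    """Repeat `iterate` until one of the given termination conditions holds.

    Returns the best explanation known at the end of each iteration.
    At least one condition should be set, otherwise the loop never ends.
    """
    known: List[Explanation] = []  # every explanation so far, best first
    best_per_iteration: List[Explanation] = []
    start = time.monotonic()
    n = 0
    while True:
        if max_iterations is not None and n >= max_iterations:
            break
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break
        if target_score is not None and known and known[0][1] >= target_score:
            break
        n += 1
        known.extend(iterate(n))  # new explanations found in iteration n
        known.sort(key=lambda e: e[1], reverse=True)  # descending by score
        if known:
            best_per_iteration.append(known[0])
    return best_per_iteration


# Toy run: iteration 2 finds nothing better, iteration 3 does.
found = {1: [("a", 0.4)], 2: [("b", 0.3)], 3: [("c", 0.9)]}
history = run_anytime(lambda n: found.get(n, []), max_iterations=3)
```

Because earlier explanations are never discarded, the best score recorded in `history` cannot decrease from one iteration to the next, which is the property that makes the process anytime.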

Graph Expansion. An iteration starts by dereferencing the set $V_i$ of ending values of a path $\vec{p}_i$ that has been chosen from a queue $Q$ of possible paths to be followed. We call this operation pop($Q$). Initially, the graph $G$ is only composed of elements from $\mathcal{E}$, so $V = \mathcal{E}$ and $E = \emptyset$ (see Figure 4.3). Since $Q$ is empty, we dereference the elements of $\mathcal{E}$. When dereferencing the set of entities corresponding to the values in $V_i$, we obtain property-value pairs such as (ddl:linkedTo, db:GoT-s03e09) for ddl:week1, (ddl:linkedTo, db:GoT-s03e10) for ddl:week2, and so on. The value of each pair is added to $V$, while the property is added to $E$. Figure 4.5 and Figure 4.6 show an example of how $G$ can be further extended throughout the iterations.

Figure 4.3 G at iteration (1).

Figure 4.4 G at iteration (2).

Figure 4.5 G at iteration (3).

As hinted by the namespaces, values might or might not be part of the same dataset, and this only depends on the representation of the entity that has been dereferenced. In this example, we have shown how from a dataset of weeks the search spread to DBpedia simply by following entity links (the property ddl:linkedTo).
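The expansion step can be illustrated with a small sketch. It assumes a toy in-memory dictionary (`TOY_WEB`) in place of real Linked Data lookups, and the helper names `dereference` and `expand` are hypothetical; only the two example pairs come from the text.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (property, value)

# Hypothetical stand-in for dereferencing on the Web: entity -> revealed pairs.
TOY_WEB: Dict[str, List[Pair]] = {
    "ddl:week1": [("ddl:linkedTo", "db:GoT-s03e09")],
    "ddl:week2": [("ddl:linkedTo", "db:GoT-s03e10")],
}


def dereference(entity: str) -> List[Pair]:
    """Return the (property, value) pairs describing `entity` (toy lookup)."""
    return TOY_WEB.get(entity, [])


def expand(values: Set[str], V: Set[str], E: Set[str]) -> Dict[str, List[Pair]]:
    """Dereference every entity in `values`, growing V and E in place."""
    revealed: Dict[str, List[Pair]] = {}
    for entity in sorted(values):
        pairs = dereference(entity)
        revealed[entity] = pairs
        for prop, value in pairs:
            V.add(value)  # the value of each pair is added to V
            E.add(prop)   # the property is added to E
    return revealed


# First iteration: the graph contains only the initial items and no edges.
items = {"ddl:week1", "ddl:week2"}
V: Set[str] = set(items)
E: Set[str] = set()
revealed = expand(items, V, E)
```

After this call, `V` also contains the two DBpedia episodes and `E` contains `ddl:linkedTo`, mirroring the step from Figure 4.3 to Figure 4.4.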

Path Collection. Given a node $v_j \in V_i$, there exists a path $\vec{p}_i$ of length $l$ that has led to it (unless $v_j$ belongs to $\mathcal{E}$, in which case $l = 0$). For each of the (property, value) pairs revealed by the lookup, the property is concatenated to the path $\vec{p}_i$ that led there, to generate new paths of length $l + 1$. Those are added to the queue $Q$ of paths, and the best one will be returned by pop($Q$) at the next iteration to further extend the graph. For instance, consider being at the path $\langle \texttt{ddl:linkedTo.dc:subject} \rangle$. By dereferencing db:GoT-S04, the new property skos:broader is extracted from the discovered pair (skos:broader, db:GameOfThrones-TVseries); thus, a new path $\vec{p}_2 = \langle \texttt{ddl:linkedTo.dc:subject.skos:broader} \rangle$ is built and added to $Q$. When dereferencing the other values of $V_1$, for example db:HIMYM-S08, the path $\vec{p}_2$ already exists in $Q$, so the value of the pair (skos:broader, db:HowIMetYourMother) is added to the set $V_2$ of ending values for $\vec{p}_2$.

Figure 4.6 G at iteration (4).
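The path extension just described can be sketched as follows. The sketch assumes a path is a tuple of property names and that the queue is a dictionary from each path to its set of ending values; both choices, and the name `collect_paths`, are illustrative. How the best path is ranked when popped is not modelled here.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]  # sequence of properties
Pair = Tuple[str, str]  # (property, value)


def collect_paths(
    path: Path,
    revealed: Dict[str, List[Pair]],
    Q: Dict[Path, Set[str]],
) -> None:
    """Extend `path` by one property for every revealed pair.

    A path seen for the first time is created in Q; a path already in Q
    only gains the new ending value.
    """
    for pairs in revealed.values():
        for prop, value in pairs:
            new_path = path + (prop,)  # length l + 1
            Q.setdefault(new_path, set()).add(value)


# Example from the text: two seasons reached through the same path.
current = ("ddl:linkedTo", "dc:subject")
revealed = {
    "db:GoT-S04": [("skos:broader", "db:GameOfThrones-TVseries")],
    "db:HIMYM-S08": [("skos:broader", "db:HowIMetYourMother")],
}
Q: Dict[Path, Set[str]] = {}
collect_paths(current, revealed, Q)
```

Both pairs share the property skos:broader, so `Q` ends up with a single path of length 3 whose ending values are the two TV series.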

Explanation Discovery. Before starting a new iteration, we build from each of the new paths the set of candidate explanations and then evaluate them. Explanations are added to $B$ (the list of possible explanations known at that given iteration), and the one(s) with the highest score is saved as top($B$) for the current iteration.

To generate explanations, the paths $\vec{p}_i$ are chained to each of their ending values $v_j \in V_i$. We can obtain two types of explanations:

(1) $\varepsilon_{i,j} = \langle \vec{p}_i \cdot v_j \rangle$, if the last property $p_l$ of $\vec{p}_i$ is an object property, as in

$\varepsilon_{1,1} = \langle \texttt{ddl:linkedTo.rdf:type} \cdot \texttt{db:seasonFinale} \rangle$

indicating that the popularity increases for those episodes that are at the end of a season;

(2) $\varepsilon_{i,j} = \langle \vec{p}_i \cdot \leq \cdot v_j \rangle$ or $\varepsilon_{i,j} = \langle \vec{p}_i \cdot \geq \cdot v_j \rangle$, if $p_l$ is a numeric datatype property, e.g.

$\varepsilon_{3,1} = \langle \texttt{ddl:linkedTo.airedIn} \cdot \leq \cdot 2014 \rangle$

meaning that the popularity increases for the episodes aired before the year 2014.
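The two explanation types can be illustrated with a short sketch. It rests on several assumptions that are not in the text: an explanation is encoded as a `(path, operator, value)` tuple, the operator `"="` stands for the plain chaining of type (1), a numeric Python value stands in for a numeric datatype property, and both comparisons of type (2) are emitted as candidates to be evaluated later.

```python
from typing import List, Set, Tuple, Union

Path = Tuple[str, ...]
Value = Union[str, int, float]
# Hypothetical encoding of an explanation: (path, operator, ending value).
Explanation = Tuple[Path, str, Value]


def build_explanations(path: Path, values: Set[Value]) -> List[Explanation]:
    """Chain `path` to each of its ending values."""
    out: List[Explanation] = []
    for v in sorted(values, key=str):
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        if is_number:
            # Type (2): numeric value, one candidate per comparison.
            out.append((path, "<=", v))
            out.append((path, ">=", v))
        else:
            # Type (1): the path ends in an entity.
            out.append((path, "=", v))
    return out


type_path = ("ddl:linkedTo", "rdf:type")
aired_path = ("ddl:linkedTo", "airedIn")
by_type = build_explanations(type_path, {"db:seasonFinale"})
by_year = build_explanations(aired_path, {2014})
```

Here `by_type` holds the single candidate corresponding to the season finale example, while `by_year` holds the two comparison candidates for the year 2014.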


Note that the length of the path $\vec{p}_i$ in the explanations gives an insight into how much the graph has been traversed, i.e. how far the search has gone, and possibly how many iterations have been achieved. In our examples, the best explanation for iteration (2) of Figure 4.4 is $\varepsilon_{1,1} = \langle \texttt{ddl:linkedTo.rdf:type} \cdot \texttt{db:seasonFinale} \rangle$ of length $l = 2$. At the following iteration, in Figure 4.5, when new explanations are built and evaluated, the best explanation detected is in fact $\varepsilon_{2,1} = \langle \texttt{ddl:linkedTo.dc:subject.skos:broader} \cdot \texttt{db:GameOfThrones-TVseries} \rangle$ (length $l = 3$), since it represents more items in the pattern and fewer outside of it.

The iterative process described here is executed until a termination condition (time, number of iterations, or a desired explanation score) is met. Next, we present the evaluation measures that we use to assess the validity of the hypotheses.

In document Explaining Data Patterns using Knowledge from the Web of Data (Page 100-104)