7.3 Qualitative Analysis using Manual Validation
7.3.3 Completeness
For the 3cixty KB, we analyzed the 2016-06-06 and 2016-09-09 releases and evaluated the properties attached to lode:Event entities. For DBpedia, the foaf:Person and dbo:Place entity types in the 201510 and 201604 releases have 131 and 437 properties with completeness issues, respectively. For manual validation, we inspected whether these flagged issues are real issues.
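The manual validation protocol used throughout this section (take the first five subjects of every flagged property, have a reviewer confirm or reject each one, report the share confirmed) can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, reviewer verdicts are assumed to be recorded as booleans, and since the text does not state whether precision is counted per inspected subject or per property, the sketch simply computes it over whatever items were inspected.

```python
from typing import Dict, List, Tuple

def select_for_review(flagged: Dict[str, List[str]], k: int = 5) -> List[Tuple[str, str]]:
    """Take the first k subjects of every flagged property for manual inspection."""
    return [(prop, subject) for prop, subjects in flagged.items() for subject in subjects[:k]]

def precision(verdicts: Dict[Tuple[str, str], bool]) -> float:
    """Share of inspected (property, subject) pairs the reviewer confirmed as real issues."""
    if not verdicts:
        raise ValueError("nothing was inspected")
    return sum(verdicts.values()) / len(verdicts)

# Hypothetical example: 8 flagged properties with 7 candidate subjects each.
flagged = {f"prop_{i}": [f"subject_{i}_{j}" for j in range(7)] for i in range(8)}
to_check = select_for_review(flagged)
print(len(to_check))  # 40

# Hypothetical verdicts: the reviewer confirms every pair except the first one.
verdicts = {pair: True for pair in to_check}
verdicts[to_check[0]] = False
print(precision(verdicts))  # 0.975
```

With eight flagged lode:Event properties this yields 40 items to inspect, and with 50 flagged DBpedia properties it yields 250, matching the counts reported in this section.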
3cixty. From the analysis of the 2016-06-06 and 2016-09-09 releases of the 3cixty KB, we found eight lode:Event properties showing completeness issues. For these eight properties, we investigated all entities and their attached properties. We first inspected five instances for each property, i.e., 40 different entities in total. We observed that entities present in the 2016-06-06 release are missing from the 2016-09-09 release, which leads to a completeness issue. These entities are missing from the 2016-09-09 release due to an error of the reconciliation algorithm. Based on this manual investigation, the output of the completeness measure has a precision of 95%.
DBpedia. We randomly selected 50 of the foaf:Person properties identified as incomplete in the quantitative experiment. In our manual inspection, we investigated a small number of the subjects of each property; more specifically, we checked the first five subjects of each property, 250 entities in total. For example, we identified that the property dbo:bnfId has a completeness issue. We extracted all its subjects for the 201510 and 201604 releases.
In detail, the property dbo:bnfId has 217 instances in version 201510 but only 16 instances in version 201604. We compared the entities of these two releases to identify the instances of dbo:bnfId missing from the 201604 release, and found 204 distinct instances missing in the 201604 version of DBpedia. We then manually investigated these instances to verify the result.
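The comparison between releases amounts to a set difference over the subjects that carry the property. A minimal sketch, assuming each release has already been loaded as (subject, predicate, object) tuples (loading from the dumps or a SPARQL endpoint is not shown); the function names and the toy data are hypothetical:

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def subjects_with_property(triples: Iterable[Triple], prop: str) -> Set[str]:
    """Distinct subjects that have at least one value for `prop` in one release."""
    return {s for s, p, _ in triples if p == prop}

def missing_subjects(old: Iterable[Triple], new: Iterable[Triple], prop: str) -> Set[str]:
    """Subjects that carry `prop` in the older release but no longer in the newer one."""
    return subjects_with_property(old, prop) - subjects_with_property(new, prop)

# Hypothetical toy releases (not real DBpedia data).
release_old = [
    ("dbr:A", "dbo:bnfId", "id-1"),
    ("dbr:B", "dbo:bnfId", "id-2"),
    ("dbr:C", "dbo:bnfId", "id-3"),
]
release_new = [
    ("dbr:B", "dbo:bnfId", "id-2"),
    ("dbr:D", "dbo:bnfId", "id-4"),  # a subject that only appears in the newer release
]

print(sorted(missing_subjects(release_old, release_new, "dbo:bnfId")))  # ['dbr:A', 'dbr:C']
```

The difference is asymmetric: subjects that appear only in the newer release (dbr:D above) are not reported, which is why the number of missing subjects can exceed the drop in the raw instance count.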
One of the results of the analysis is John_Hartley_(academic)⁴, which is available in the 201510 release but not in the 201604 release of DBpedia. To further validate this output, we checked the source Wikipedia page about John Hartley (academic)⁵ using foaf:primaryTopic. On the Wikipedia page, the BNF ID is present as a link to an external source. However, in the DBpedia update from the 201510 to the 201604 version, this entity was removed from the property dbo:bnfId. This example shows a completeness issue present in the 201604 release of DBpedia for the property dbo:bnfId.
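The check against Wikipedia follows a simple decision rule: a subject that lost the property between releases counts as a real completeness issue when its source page still carries the corresponding information. A minimal sketch, assuming the keys or links found on each source page have already been collected into a set; the key name "bnf" and the helper names are hypothetical, and treating a subject whose source page also lost the key as a non-issue is an assumption of this sketch rather than a rule stated in the text.

```python
from typing import Dict, Set

def is_real_issue(source_keys: Set[str], expected_key: str) -> bool:
    """Real completeness issue if the source page still has the key DBpedia dropped.
    (Assumption: if the source lost it too, the removal is treated as legitimate.)"""
    return expected_key in source_keys

def label_missing(missing: Dict[str, Set[str]], expected_key: str) -> Dict[str, bool]:
    """Label every missing subject, given the keys found on its source page."""
    return {subject: is_real_issue(keys, expected_key) for subject, keys in missing.items()}

# Hypothetical inspection results: subject -> keys/links seen on its Wikipedia page.
inspected = {
    "dbr:A": {"bnf", "viaf"},  # source still links a BNF ID -> real issue
    "dbr:C": {"viaf"},         # source no longer has it -> not counted as an issue
}
print(label_missing(inspected, "bnf"))  # {'dbr:A': True, 'dbr:C': False}
```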
Further examples of foaf:Person properties with completeness issues are dbo:firstRace and dbo:lastRace. We extracted all the subjects present in the last two releases (201510 and 201604) and performed a set difference operation to identify the missing subjects. For manual validation, we first checked five subjects each for dbo:firstRace and dbo:lastRace, as part of the 250 entities checked in total. In the 201510 release, dbo:firstRace has 777 instances, and in the 201604 release it has 769 instances. After the set difference operation between the two releases, we found 9 distinct instances missing from the 201604 release of DBpedia EN. We then manually inspected each instance to identify the causes of the incompleteness. One of these instances, dbr:Bob_Said, has a dbo:firstRace value in the 201510 release but not in the 201604 release. We further explored the corresponding Wikipedia page using foaf:primaryTopic; on that page, first race is present as an infobox key. Due to the DBpedia update from the 201510 to the 201604 version, this entity is missing from the property dbo:firstRace. Similarly, we found that this entity is missing for the dbo:lastRace property. These examples illustrate clear completeness issues in the 201604 release of the English version of DBpedia. Based on the manual inspection of 50 properties, we observed that the completeness measure has a precision of 94% for foaf:Person.
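The dbo:firstRace counts also show why the number of missing subjects (9) can exceed the drop in the instance count (777 − 769 = 8). Assuming each reported count is a number of distinct subjects, and writing S_{201510} and S_{201604} for the subject sets of the two releases:

```latex
\begin{aligned}
|S_{201510} \setminus S_{201604}| &= |S_{201510}| - |S_{201510} \cap S_{201604}|
  \;\Rightarrow\; 9 = 777 - |S_{201510} \cap S_{201604}|
  \;\Rightarrow\; |S_{201510} \cap S_{201604}| = 768 \\
|S_{201604} \setminus S_{201510}| &= |S_{201604}| - |S_{201510} \cap S_{201604}| = 769 - 768 = 1
\end{aligned}
```

So 9 subjects were lost and 1 was gained. The same reasoning for dbo:bnfId (217 and 16 instances, 204 missing) gives 13 shared subjects and 3 subjects that appear only in 201604.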
4 http://dbpedia.org/page/John_Hartley_(academic)
5 https://en.wikipedia.org/wiki/John_Hartley_(academic)
From the list of incomplete dbo:Place properties, we randomly selected 50 properties and checked the first five entities of each for manual evaluation, 250 entities in total. For example, we identified that the property dbo:parish has a completeness issue. We extracted all its instances for the 201510 and 201604 releases, then manually inspected each entity and compared it with its Wikipedia source to identify the causes of the quality issues.
For example, the property dbo:parish has 26 entities in 201510 and 20 entities in 201604. We collected the missing resources by performing a set difference operation. One of the results of this operation is Maughold_(parish), which is missing in the 201604 version. To further validate this output, we checked the source Wikipedia page wikipedia-en:Maughold_(parish) using foaf:primaryTopic. On the Wikipedia page, Parish is presented in the title definition of the captain of the parish militia. Nevertheless, in the DBpedia update from the 201510 to the 201604 version, this entity was removed from the property dbo:parish. Based on the investigation of these properties, the completeness measure has a precision of 86% for dbo:Place.
For the dbo:Place entity type in the Spanish version of DBpedia, the completeness measure found 3606 properties with a completeness value of 0, which indicates a potential completeness issue for these properties. From these 3606 properties, we randomly selected dbo:prefijoTelefónicoNombre for manual validation. We collected all its subjects from the two releases: 56109 in 201604 and 55387 in 201610. We then performed a set difference operation between the two sets of triples to identify the triples missing from the 201610 release, and found a total of 1982 subjects missing from the 201610 version. To keep the manual work at a feasible level, we randomly selected a subset of 200 of these subjects for evaluation. One of the results of the analysis is the location Morante⁶, which is available in the 201604 release but missing in the 201610 release of DBpedia. To further validate this output, we checked the source Wikipedia page about Morante⁷ using foaf:primaryTopic. On the Wikipedia page, prefijoTelefónicoNombre is present as a key in the infobox. However, in the DBpedia ES update from the 201604 to the 201610 version, this subject went missing from the property dbo:prefijoTelefónicoNombre. This example shows a completeness issue present in the 201610 release of DBpedia ES for the property dbo:prefijoTelefónicoNombre. Based on the investigation of this subset, the completeness measure has a precision of 89%.
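Drawing the 200-subject subset can be made reproducible with a seeded random generator. A minimal sketch, assuming the 1982 missing subjects are held in a set of identifiers; the function name, the placeholder identifiers, and the seed value are hypothetical, since the text does not say how the random selection was performed.

```python
import random
from typing import List, Set

def sample_for_review(missing: Set[str], n: int = 200, seed: int = 42) -> List[str]:
    """Reproducible random subset of at most n missing subjects for manual checking."""
    population = sorted(missing)  # fix the order so the draw does not depend on set iteration
    rng = random.Random(seed)     # the same seed always yields the same subset
    return rng.sample(population, min(n, len(population)))

# Hypothetical placeholder identifiers standing in for the 1982 missing subjects.
missing = {f"subject_{i}" for i in range(1982)}
subset = sample_for_review(missing)
print(len(subset))  # 200
```

Sorting before sampling matters because set iteration order for strings can vary between interpreter runs; without it, the same seed could still produce a different subset.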
6 http://es.dbpedia.org/page/Morante
7 https://es.wikipedia.org/wiki/Morante