Large-Scale Experiments on SW Datasets - Automatic refinement of large-scale cross-domain knowl

5.2 Preliminaries

5.5.5 Large-Scale Experiments on SW Datasets

In this section, we perform large-scale experiments on whole RDF datasets. Table 5.4 shows the results of 5-fold cross validation on the RDF datasets presented ear- lier. Because of time limitation, we do not report the results for classifiers which require more than a week for training. HMC, HOMER and MLC4.5 were able to finish only on NELL, therefore for the other datasets in Table 5.4 we report the results only for SDType and SLCN, which were able to finish for all datasets, showing the effectiveness of the proposed approach in improving scalability.

On the NELL dataset, the HMC, HOMER and MLC4.5 perform better than SDType and SLCN. However, the runtime of the first three methods are notably longer than the others. When comparing SLCN against SDType, the former per- forms consistently better with respect to all evaluation measures, but longer runtime. Note that the results of SDType differ from those reported in (150) because the latter includesowl:Thingand classes in other ontologies, such as FOAF and

schema.org, in the evaluation, while we exclude them. On all the other datasets, which are significantly larger than NELL (c.f. Table 5.1), SLCN is the best overall performer as HMC, HOMER and MLC4.5 were not able to finish in less than one week.

64 CHAPTER 5. TYPE PREDICTION USING HMC

Dataset Method hF h-loss hamm rt(ms)

DBpedia F=Rin SDType 0.7647_±0.0021 0.7729_±0.008 0.0027_±0.0000 16 080 553 SLCN 0.8470±0.0009 0.4632±0.0051 0.0016±0.0000 7 024 255 DBpedia F=Rin∪Qin SDType 0.7702±0.0002 0.7501±0.0014 0.0026±0.0000 54 511 659 SLCN 0.8462_±0.0005 0.4610_±0.0048 0.0016_±0.0000 10 154 987 DBp(YAGO) F=Rout∪Rin SDType 0.6663±0.0001 2.6724±0.0019 0.0159±0.0000 6 744 282 SLCN 0.7029±0.0066 2.0965±0.0891 0.0133±0.0007 7 635 499 DBp(YAGO) F=Rout∪Rin∪Qout∪Qin SDType 0.6705±0.0002 2.6477±0.0019 0.0157±0.0000 213 904 335 SLCN 0.7022_±0.0064 2.1057_±0.0903 0.0134_±0.0007 48 374 257 Wikidata F=Rout∪Rin SDType 0.7529±0.0001 0.5749±0.0002 0.0017±0.0000 208 957 224 SLCN 0.8116_±0.0106 0.3748_±0.0091 0.0014_±0.0000 44 807 901 Wikidata F=Rout∪Rin∪Qout∪Qin SDType 0.7759±0.0001 0.5189±0.0003 0.0016±0.0000 272 206 437 SLCN 0.8680_±0.0028 0.2712_±0.0060 0.0011_±0.0000 64 413 619 NELL F=Rout∪Rin SDType 0.9025±0.0026 0.4842±0.0129 0.0041±0.0001 1 917 946 SLCN 0.8956±0.0042 0.5851±0.0148 0.0045±0.0002 2 547 871 HMC 0.9165±0.0041 0.5014±0.0132 0.0036±0.0002 6 336 452 HOMER 0.9251±0.0036 0.4799±0.0136 0.0032±0.0001 10 957 195 MLC4.5 0.9548_±0.0014 0.3305_±0.0091 0.0020_±0.0001 16 991 440 NELL F=Qout∪Qin SDType 0.8922_±0.0025 0.4447_±0.0091 0.0045_±0.0001 9 289 373 SLCN 0.7608±0.0041 1.0809±0.0175 0.0101±0.0002 18 252 489 HMC 0.8695±0.0019 0.6859±0.0767 0.0055±0.0001 102 815 535 HOMER 0.8678±0.0155 0.6866±0.1168 0.0056±0.0008 156 444 313 MLC4.5 0.9119_±0.0016 0.4840_±0.0044 0.0037_±0.0001 166 477 106 Table 5.4: Evaluation of different classification methods on large cross-domain SW datasets

dimensionality of the feature space, as it can observed in Table 5.1, and therefore the runtime is also increased. SDType is able to improve its results when consid- ering the greater set of features for DBpedia and DBpedia with YAGO types, but SLCN actually yields slightly worse results. This may be because of a possibly higher level of dependency between the features inQout andQin. Since the filter feature selection method does not take dependencies between features into account, the selected feature set could contain several redundant features.

The results in Table 5.5 can illustrate the importance of the high scalability of SLCN. Training SLCN on AIFB withF = Qis faster than training HOMER or

HMC onF =R, and the prediction quality is significantly higher. On Mutagene-

sis, the training time is a bit slower, but again, the prediction quality is significantly higher. That shows that, when time and computing resources are limited, using SLCN allows you to work on a higher number of features in comparison to less scalable methods, and ultimately achieve better results.

5.6. CONCLUSION 65

Dataset Method hF h-loss hamm rt(ms)

AIFB F=R SDType 0.7373_±0.0040 0.8844_±0.0062 0.0164_±0.0001 80 892 SLCN 0.9380±0.0032 0.2296±0.0064 0.0046±0.0002 88 997 HMC 0.9495±0.0005 0.1975±0.0044 0.0038±0.0000 442 235 HOMER 0.9493±0.0003 0.1983±0.0034 0.0038±0.0000 387 434 MLC4.5 0.9497±0.0007 0.1988±0.0059 0.0038_±0.0001 389 573 AIFB F=Q SDType 0.8032±0.0040 0.6902±0.0065 0.0128±0.0001 207 402 SLCN 0.9764_±0.0043 0.0675_±0.0080 0.0018_±0.0003 399 530 HMC 0.9870_±0.0008 0.0360_±0.0028 0.0010_±0.0001 1 609 020 HOMER 0.9872_±0.0007 0.0356_±0.0027 0.0010_±0.0001 3 739 919 MLC4.5 0.9880±0.0011 0.0337±0.0028 0.0009±0.0001 1 484 945 Mutagenesis F=R SDType 0.7261±0.0056 1.2770±0.0324 0.0164±0.0005 56 511 SLCN 0.7621±0.0077 0.8650±0.0064 0.0118±0.0002 1679 HMC 0.7607_±0.0034 0.8541_±0.0060 0.0118_±0.0002 3505 HOMER 0.7778_±0.0043 0.8805±0.0122 0.0121_±0.0003 29 643 MLC4.5 0.6726±0.0023 1.3274±0.0076 0.0177_±0.0001 55 289 Mutagenesis F=Q SDType 0.7372±0.0016 0.9357±0.0041 0.0135±0.0001 71 213 SLCN 0.9262±0.0053 0.2835±0.0121 0.0040±0.0003 10 304 HMC 0.9343_±0.0020 0.2540_±0.0032 0.0036_±0.0001 29 171 HOMER 0.9339_±0.0020 0.2557±0.0033 0.0036_±0.0001 109 943 MLC4.5 0.8991±0.0037 0.3728±0.0135 0.0055_±0.0002 67 920

Table 5.5: Evaluation of different classification methods on smaller SW datasets

5.6 Conclusion

In this chapter, we have modeled the type prediction problem in Semantic Web knowledge bases as a hierarchical multilabel classification problem. We propose SLCN, and compare it to both popular hierarchical multilabel classifiers and the state-of-the-art type prediction approach SDType (which is currently one of the strongest and best scalable algorithms for the task at hand). The experiments indicate that the local feature selection and local sampling can significantly improve scalability without sacrificing the quality of the prediction. The results also show that SLCN can perform better than SDType, while scaling better than the other multilabel classifiers evaluated in this chapter. With enough computing power avail- able, a state-of-the-art hierarchical multilabel classifier, such MLC4.5, is the best choice, while SLCN is a good trade-off when scalability is a major issue. The use of entity embeddings as features in our proposed type prediction approach and the results indicate that, under the experimental settings used in this chapter, the simple graph-features yield significantly better results.

Chapter 6

Detection of Relation Assertion

Errors

6.1 Introduction

Many of the knowledge graphs published as Linked Open Data have been created from semi-structured or unstructured sources. The magnitude of many of these knowledge graphs, e.g.: DBpedia, NELL, Wikidata, YAGO, does not allow for manual curation, and, instead, require the use of heuristics. Such heuristics, however, do not guarantee that the resulting graphs are free from errors. Wikipedia, which serves as source for DBpedia and YAGO, is estimated to have 2.8% of its statements wrong (209), which add up to the error caused by the extraction heuristics. Therefore, automatic approaches to automatically detect wrong statements are an important tool for the improvement of knowledge graph quality.

Incompleteness is another major problem of most knowledge graphs. Auto- matic knowledge graph completion has been widely researched (137), with a va- riety of methods proposed, including embedding models. Although such methods can also be trivially employed for error detection, their performance has not yet been extensively evaluated on the task.

Many existing large-scale error detection methods rely exclusively on the types of subject and object of a relation (48; 151; 153), and try to spot violations of the underlying ontology and/or typical usage patterns. While types can be a valuable feature, some knowledge graphs lack this kind of information, have only incom- plete type information, or have types which are not very informative. Moreover, some errors might contain wrong instances of correct types. For example, if some- one adds the fact playedFor(Ronaldo, Manchester_United), which would be

wrong becauseRonaldorefers to Ronaldo Nazário instead of Cristiano Ronaldo,

such an approach would not be able to detect the error.

In knowledge graph completion, paths in the graph have been proven to be valuable features (98; 69). For instance, in order to predict whether a personalives

in a placeb(livesIn(a,b)), one important path feature is whether the person has a

In document Automatic refinement of large-scale cross-domain knowledge graphs (Page 80-84)