Outlook - Automating the Fact-Checking Task: Challenges and Directions

and engineers tend to have large output data in their experiments, which is difficult both to analyze and to archive properly without provenance metadata. To answer RQ4, we studied existing paradigms and approaches to enable 1) interoperability, 2) scalability and 3) interpretability. These three pillars are the cornerstone of reproducible research. We first surveyed existing attempts to bridge this gap, which were predominantly built on workflow systems. Due to the complexity of the scenario, these attempts impose several constraints, which restricts their implementation to specific domains (e.g., Bioinformatics). To tackle this problem, ontologies were proposed. However, they adopt a high level of granularity, which naturally comes at the price of more complexity in representing the experiments. To enable reproducibility of machine learning experiments in general, we proposed MEX. MEX is a lightweight specification based on Linked Data for interchanging machine learning metadata across different architectures to achieve a higher level of interoperability. To further foster the dissemination of the protocol, we designed an extensive list of tools and frameworks.

1. We define the first lightweight standard to represent machine learning experiments, a vocabulary dubbed MEX;

2. We design LOG4MEX - a library that allows exporting configurations and experiment outcomes;

3. Furthermore, we propose WEB4MEX, a REST interface to export configurations through web calls;

4. Also, MEX-Interfaces is part of the project and defines a set of class interfaces that allow machine learning metadata generation without explicit code implementations;

5. WASOTA has been proposed as a prototype to store experimental metadata online;

6. Finally, we propose ML-Schema, an upper-level ontology which maps state-of-the-art ontologies and adopts a high level of provenance in a single representation, achieving the maximum possible level of interoperability and interpretability.

Through the successful conclusion of these projects, we are able to answer RQ4 and conclude that linked data technologies make it possible to enable reproducibility of scientific experiments. The proposed methodology and frameworks provide a prompt method to describe experiments with a special focus on data provenance and fulfill the requirements for long-term maintenance.

7.2 Outlook

In this final section, we describe what we envisage as the future directions for this work. In the scope of this thesis, we focused on different underlying challenges in automating the validation task in the fact-checking context. Although very interesting findings have been presented, there is still room to improve the results for each of the problems addressed in this thesis. In the following items, we summarize the future directions for each of the main contributions of this thesis.

Regarding the challenge of recognizing named entities on noisy data (RQ1: HORUS), we see the following directions to continue and improve the integration approach:

1. Extend the HORUS approach to classify other common entities beyond PER, LOC and ORG (e.g., MISC); although image and text mining techniques have been extensively explored in the scope of the thesis, the inclusion of new target classes - especially specific emerging entities [82] - has not yet been explored.

2. Our proposed approach has a notable drawback w.r.t. performance. Currently, it takes about 2-3 seconds per token, which is considerably slow and possibly a significant issue in a production environment. The bottleneck is the image feature extraction pipeline. A distributed solution to cache the extracted features would considerably speed up the model's response.

3. The metadata provided by HORUS has the potential to become an essential asset to several other applications in NLP, such as entity linking and question answering, for instance. This integration should be validated and may push the state of the art, especially in noisy text.

4. Finally, to improve performance and reduce the cost of search engine calls, we propose integration with open Knowledge Bases, such as DBpedia and YAGO. The trade-off might be worthwhile and should be validated in terms of costs and performance.
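The feature cache suggested in item 2 can be illustrated with a minimal sketch. It assumes that extracted features depend only on the image content, so a hash of the image bytes can serve as the cache key. The class `FeatureCache`, the function `toy_extractor` and the in-memory dictionary are hypothetical placeholders and not part of HORUS; in a distributed setting, the dictionary would be replaced by a shared key-value store.

```python
import hashlib
from typing import Callable, Dict, List


class FeatureCache:
    """Content-addressed cache for image features (illustrative only).

    The dict is a stand-in for a shared store; a distributed deployment
    would swap it for a networked key-value service.
    """

    def __init__(self, extractor: Callable[[bytes], List[float]]) -> None:
        self._extractor = extractor
        self._store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes: bytes) -> List[float]:
        # Identical images hash to the same key, so they are extracted once.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        features = self._extractor(image_bytes)
        self._store[key] = features
        return features


def toy_extractor(image_bytes: bytes) -> List[float]:
    """Hypothetical placeholder for the expensive feature extraction step."""
    return [len(image_bytes) / 10.0, (sum(image_bytes) % 256) / 255.0]


if __name__ == "__main__":
    cache = FeatureCache(toy_extractor)
    cache.get(b"image-a")
    cache.get(b"image-a")  # served from the cache
    print(cache.hits, cache.misses)
```

Whether such a cache pays off depends on how often the same images recur across tokens, which would have to be measured on real workloads.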

Regarding the trustworthiness model (RQ2: WebCred), it can be extended with the following ideas:

1. Integrate our newly proposed method with graph-based solutions [23] to assess credibility in order to improve user experience and give more insights to the end-user.

2. The concept of credibility strongly relies on the user's feedback. An interesting approach would be to extend the framework to an open, crowdsourcing-based architecture where users could interact with the system and give feedback on the model's response. This information could be used either as a feature for a supervised model or to design a reinforcement learning-based strategy to tackle the problem.

3. Documents which have a certain number of false statements should be automatically labeled as non-credible sources. The idea of performing macro fact-checking over all statements existing in a given information source can be an exciting approach to derive a final credibility score.
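As a minimal sketch of the macro fact-checking idea in item 3, the per-statement verdicts of a document can be aggregated into a single score. The function name `macro_credibility`, the threshold `max_false` and the scoring rule (fraction of statements checked as true) are assumptions made for illustration; the thesis does not prescribe a particular formula.

```python
from typing import List, Tuple


def macro_credibility(verdicts: List[bool], max_false: int = 2) -> Tuple[float, bool]:
    """Aggregate per-statement fact-checking verdicts for one document.

    verdicts: True if a statement was checked as true, False otherwise.
    max_false: hypothetical threshold; a document with at least this many
        false statements is labeled non-credible.
    Returns (score, is_credible), where score is the fraction of true statements.
    """
    if not verdicts:
        raise ValueError("at least one checked statement is required")
    false_count = verdicts.count(False)
    score = 1.0 - false_count / len(verdicts)
    return score, false_count < max_false


if __name__ == "__main__":
    print(macro_credibility([True, True, False, True]))
    print(macro_credibility([False, False, True, True]))
```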

Regarding our automated fact-checking approach (RQ3: DeFacto), we plan the following for the future:

1. All new models proposed (HORUS and WebCred) have a high potential to improve the performance of our fact-checking framework and thus should be validated in DeFacto.

2. An exciting line of research would be to study visual fake-news. Generative Adversarial

With the adoption of this technology, videos and photos can be used to spread fake news at a different level of difficulty for existing automated fact-checking models.

3. The use of word embeddings to enhance the evidence extraction phase is a promising research line. In this case, we suggest continuing and extending the evidence extraction method we proposed in this thesis. Possible extensions could consider not only the explicit knowledge available in related documents but also the implicit knowledge, which can be acquired through common-sense frameworks. Transformers also have great potential to boost the task.

4. In the document retrieval component, we encourage the study of entity-linking methods to potentially improve recall. Our current architecture does not perform disambiguation of entities, which is an important step in the pipeline.

5. Ultimately, we also suggest integrating a feedback module into DeFacto, shifting the architecture to a human-in-the-loop paradigm. Positive and negative feedback from specialists (e.g., journalists) can be a great asset to improve the model's overall performance.

The future of reproducible research (RQ4: MEX and ML-Schema) should encompass more transparent methods to represent data and metadata:

1. The most challenging issue regarding this topic is indirectly related to time management. Scientists face a never-ending competition to solve complex problems in the minimum amount of time. They are thus constantly pushed to deliver more and faster, which results in a lack of proper representation of scientific experiments. On the engineering side, one exciting solution would be to integrate ML vocabularies and ontologies, through the proposed canonical format ML-Schema, into popular ML frameworks such as scikit-learn.

2. A very ambitious solution lies in constructing machines that automatically read source code and generate a metadata representation out of it, without human interference.
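To make the idea in item 2 concrete, the following is a rough sketch, under strong simplifying assumptions, of how metadata could be derived from source code by static analysis. It uses Python's standard `ast` module to find constructor-like calls (names starting with an uppercase letter) and their keyword arguments. The function `extract_model_metadata` and the output field names are hypothetical; they are not MEX or ML-Schema terms, and mapping the result to such a vocabulary would be a separate step.

```python
import ast
from typing import Any, Dict, List


def extract_model_metadata(source: str) -> List[Dict[str, Any]]:
    """Collect constructor-like calls and their keyword arguments.

    Heuristic: a call whose name starts with an uppercase letter is treated
    as an algorithm instantiation. The source is parsed, never executed.
    """
    records: List[Dict[str, Any]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if not name or not name[0].isupper():
            continue
        params: Dict[str, Any] = {}
        for kw in node.keywords:
            if kw.arg is None:  # skip **kwargs expansions
                continue
            try:
                params[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError):
                # Non-literal argument: keep its source text (Python 3.9+).
                params[kw.arg] = ast.unparse(kw.value)
        records.append(
            {"algorithm": name, "hyperparameters": params, "line": node.lineno}
        )
    # ast.walk gives no ordering guarantee, so sort by source position.
    return sorted(records, key=lambda r: r["line"])


EXAMPLE = """
from sklearn.svm import SVC
clf = SVC(C=1.0, kernel="rbf")
clf.fit(X, y)
"""

if __name__ == "__main__":
    for record in extract_model_metadata(EXAMPLE):
        print(record)
```

A heuristic of this kind misses positional arguments, default values and configuration loaded at run time, which is part of why the fully automatic solution is described as ambitious.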

In terms of the applicability of the approaches, we see many opportunities to solve integration problems in the following domains:

In the health domain, HORUS can be adopted as a novel solution to detect unusual entities for NER models, e.g., products and ingredients. Due to its language-agnostic characteristic, it may be applied to several languages without restrictions.

Furthermore, DeFacto can be applied to verify knowledge graphs containing dietary information, for instance by checking relationships between foods and diseases (e.g., “carrots are a weight loss friendly food and have been linked to lower cholesterol levels and improved eye health”).

Since most well-known web credibility tools have been shut down to the public, WebCred can stand as the only free solution to bring credibility information to internet users if deployed at large scale.

a general controlled environment to foster reproducibility.
