
7.5 Knowledge Graph Governance

7.5.3 Lessons Learned

Technology awareness within the company. Overall, the majority of the stakeholders were enthusiastic about and committed to developing an integrated knowledge graph and applications on top of it. Nevertheless, reservations about the fitness of the technology and methodology existed from the start. A few stakeholders preferred a bottom-up approach: first gathering the existing schemas and models internally and compiling an overview of them before involving external parties, such as our research institute. However, the management preferred an outside view and focused on quick results. Speed, rather than reaching agreement on how to proceed, was the main driving force. Thus, the management chose to try out a “new” technology and methodology that does not yet have a reputation for strong industrial maturity.


Table 7.1: Questions posed to the stakeholders and their answers. Questions 1 to 5 were answered on a Likert scale from 1 to 5; questions 6 and 7 were free-text questions. The mean (M) and standard deviation (SD) of the answers are reported for each Likert-scale question.

Question (Likert scale from 1 to 5; 1 = not at all, 5 = very much)                        M     SD
1. Did the developed knowledge graph meet your expectations?                              2.4   0.9
2. Do you think investing in semantic technologies can result in a fast ROI?              3.0   1.4
3. Do you consider semantic technologies fit for usage in the manufacturing domain?       3.6   0.9
4. Are you satisfied with the software for semantic technologies available on the market? 2.8   1.3
5. Is it easy to hire personnel with knowledge in semantic technologies?                  1.8   0.4

Free-text questions:

6. What do you expect from semantic technologies in manufacturing contexts?

7. What is the biggest bottleneck in using semantic technologies in manufacturing contexts?

Perceived maturity of semantic technologies. While semantic technologies are already widely used in some domains, e.g., life sciences, e-commerce or cultural heritage, there is a lack of success stories, technology readiness and showcase applications in most industrial areas. For smaller and innovative products, the penetration of semantic technologies is still relatively low. A typical question when pitching semantic technologies within companies is “Who else in our domain is already using them?”. Therefore, it is important to point to successful business projects, even if details about them are usually scarce.

Lack of semantic web professionals on the job market. Enabling the manufacturer's employees to extend the knowledge graph by themselves is crucial for the success of the project. Consequently, selected stakeholders need to be taught the relevant concepts and semantic technologies. Hiring new staff experienced with semantic technologies is not necessarily an easy alternative (cf. question 5 in Table 7.1, M = 1.8): compared to relational data management and XML technologies, there is still a gap between the supply of staff skilled in semantic technologies and the demand of the market.⁵

Importance of knowledge graph governance. Of major importance for the company is a clear governance concept for the knowledge graph, answering questions such as which person or department is allowed to access, modify and delete which parts of it. An RDF-based knowledge graph has advantages in this regard: i) it enables people across all sites of the company to obtain a holistic view of company data; ii) the schemas of current data sources are enriched with further semantic information, enabling the creation of mappings between similar concepts; and iii) developers can follow a defined and documented process for further evolving and maintaining the knowledge graph.
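The governance question of who may access, modify or delete which parts of the knowledge graph can be made concrete with a small access-control table. The following Python sketch is illustrative only: it assumes, hypothetically, that the knowledge graph is partitioned into named graphs identified by URIs and that permissions are granted per department. The `GovernancePolicy` class, the department names and the graph URIs are invented for this example and do not describe the company's actual setup.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Permission(Enum):
    """Operations a department may perform on a part of the knowledge graph."""
    READ = auto()
    MODIFY = auto()
    DELETE = auto()


@dataclass
class GovernancePolicy:
    """Hypothetical policy: (department, named graph URI) -> granted permissions."""
    grants: dict[tuple[str, str], set[Permission]] = field(default_factory=dict)

    def grant(self, department: str, graph: str, *perms: Permission) -> None:
        # Add permissions without removing ones granted earlier.
        self.grants.setdefault((department, graph), set()).update(perms)

    def is_allowed(self, department: str, graph: str, perm: Permission) -> bool:
        # Deny by default: anything not explicitly granted is refused.
        return perm in self.grants.get((department, graph), set())


# Illustrative usage with made-up departments and graph URIs.
policy = GovernancePolicy()
policy.grant("production", "urn:example:kg:tools", Permission.READ, Permission.MODIFY)
policy.grant("energy-management", "urn:example:kg:energy", Permission.READ)
print(policy.is_allowed("production", "urn:example:kg:tools", Permission.MODIFY))        # True
print(policy.is_allowed("energy-management", "urn:example:kg:energy", Permission.DELETE))  # False
```

Denying by default keeps such a policy auditable, since every permission that exists was granted explicitly. In practice, the access-control features of the chosen triple store or database would usually enforce these rules.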

Building on top of existing systems. Accessing data from the existing infrastructure as a virtual RDF graph is an important requirement for the manufacturing company. It avoids the cost of materializing the data as RDF triples and maintaining them redundantly in a triple store, while benefiting from the mature querying, transaction-processing and security mechanisms of the relational database systems. Three data access strategies are considered:

⁵ For the related field of data science, the European Data Science Academy has conducted extensive studies highlighting such a skill/demand gap all over Europe; cf. Deliverables D1.2 and D1.4 (“Study Evaluation Report

Chapter 7 Applications of Semantic Data Integration to Industry 4.0 Scenarios

DB in Dumps. Relational data to be analyzed is dumped in an isolated place away from the production systems, so as not to affect their safety and performance. This strategy is used when the amount of data is small and the data is static or updated only rarely.

DB in Replication. All data is replicated, allowing direct access from both the production systems and new analytic platforms. This solution is considered when data changes frequently and the amount of data is relatively large. It requires allocating additional resources to achieve “real-time” synchronization and to avoid degrading the performance of the systems in production. We used this strategy to implement our solution since it allows accessing the data sources as a virtual RDF graph while benefiting from the maturity of relational database systems.

DB in Production. Accessing the data directly in the real-time production systems does not require allocating additional resources, such as investment in new hardware or software. However, this strategy carries a high risk of degrading the performance of the real-time systems. Since sensitive information requires high availability, and failing to provide it on time can have hazardous consequences, we did not apply this strategy in our scenario.
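The idea of a virtual RDF graph, in which triples are produced from the relational data at query time rather than materialized in a triple store, can be sketched in a few lines of Python using the standard `sqlite3` module. The `tool` table, its columns and the `http://example.org/factory#` vocabulary are hypothetical. Answering a single triple pattern by filtering generated triples is also a simplification: ontology-based data access systems, for instance those driven by W3C R2RML mappings, instead rewrite SPARQL queries into SQL so that the database does the work.

```python
import sqlite3
from collections.abc import Iterator

# Hypothetical vocabulary and table layout, used only for illustration.
EX = "http://example.org/factory#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def virtual_triples(conn: sqlite3.Connection) -> Iterator[Triple]:
    """Map each row of the `tool` table to triples at query time.

    Nothing is copied into a triple store: every call reads the current rows,
    so the relational database stays the single source of truth.
    """
    for tool_id, name, machine in conn.execute("SELECT id, name, machine FROM tool"):
        subject = f"{EX}tool/{tool_id}"
        yield (subject, RDF_TYPE, f"{EX}Tool")
        yield (subject, f"{EX}name", name)
        yield (subject, f"{EX}mountedOn", f"{EX}machine/{machine}")


def match(conn: sqlite3.Connection, predicate: str) -> list[Triple]:
    """Answer the triple pattern (?s, predicate, ?o) over the virtual graph."""
    return [t for t in virtual_triples(conn) if t[1] == predicate]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool (id INTEGER PRIMARY KEY, name TEXT, machine TEXT)")
conn.execute("INSERT INTO tool VALUES (1, 'drill-6mm', 'M42')")
print(match(conn, f"{EX}mountedOn"))
conn.execute("INSERT INTO tool VALUES (2, 'mill-10mm', 'M7')")
print(len(match(conn, RDF_TYPE)))  # 2: the new row is visible without reloading
```

Because `virtual_triples` re-reads the table on every call, the second query sees the newly inserted row without any reloading step, and no redundant copy of the data has to be kept synchronized.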

7.6 Concluding Remarks

In this chapter, we investigated the applicability of a knowledge graph-based approach to I40 scenarios. We addressed the problem of integrating data from different data sources in a manufacturing company by applying a knowledge graph-based approach. We regarded the data sources to be integrated as standards in the I40 domain; thus, this chapter addressed the level of standards in the general architecture of the thesis. Two use cases of core importance for the company were developed, i.e., tool availability and energy consumption. Then, the data sources related to the use cases were semantically described, and the existing semantic interoperability conflicts among them were analyzed.

To execute the use cases, we developed a knowledge graph approach for semantically integrating the data of the data sources, which also resolves the semantic interoperability conflicts among them. To achieve this, we defined an architecture for executing the knowledge graph approach. A set of ontologies was developed to describe the semantics of the data sources, and a set of mappings was defined to link the data sources with the ontologies. The architecture enables the integration of data considering the data sources, ontologies, mappings and applications.

To test the proposed solution, we conducted a user study. The stakeholders were interviewed about the application of the knowledge graph to the use cases and were asked about the use of the solution and the perceived benefits of knowledge graph-based approaches for manufacturing. In general, the stakeholders considered semantic technologies fit for usage in the manufacturing domain (M = 3.6 in Table 7.1), although the developed knowledge graph only partially met their expectations (M = 2.4).

Chapter 8

Conclusions and Future Direction

In this thesis, we investigated the problem of enhancing semantic interoperability in Industry 4.0. We proposed a knowledge graph approach for integrating data in these scenarios. The knowledge graph approach enables the integration of data as well as the identification and resolution of semantic interoperability conflicts among Industry 4.0 entities, one of the key challenges in this application domain. The research problem, the research questions and the contributions are discussed in Chapter 1. Necessary background concepts are examined in Chapter 2. An overview of state-of-the-art approaches related to the main problem tackled in this thesis is presented in Chapter 3. The three core chapters that follow (Chapters 4 to 6) describe and evaluate three key aspects of the proposed knowledge graph integration approach. Further, a real-world case study performed in a manufacturing company is presented in Chapter 7. The case study provides practical evidence of the applicability of the knowledge graph approach to the problem of data integration in Industry 4.0 scenarios. Finally, this chapter concludes the thesis by revisiting the research questions. To this end, the achieved results are examined in Section 8.1, some limitations of the work are highlighted in Section 8.2, and Section 8.3 outlines possible avenues for future work.

8.1 Revisiting the Research Questions

To conduct the work of this thesis, the research problem was divided into four research questions. The objective of the first research question is to investigate whether knowledge graphs are capable of encoding the meaning of entities in Industry 4.0 scenarios, particularly those that belong to the specification of standards related to the domain.

RQ1: How can a knowledge graph approach define mappings of standards and standardization frameworks and resolve existing semantic interoperability conflicts among them?

This research question is addressed in Chapter 4. Existing state-of-the-art approaches are limited to classifying and relating Industry 4.0 entities without considering the semantics encoded in them. We tackle this by proposing a novel methodology for building knowledge graphs of Industry 4.0 entities. In particular, this methodology concentrates on semantically describing standards and standardization frameworks. We applied the proposed methodology and built the knowledge graph of Industry 4.0 standards (I40KG). The I40KG comprises semantic descriptions of more than 220 standards, 25 standardization organizations and 10 standardization frameworks. The I40KG semantically describes and annotates standards as well as the relations among them. Furthermore, categorizations of standards with respect to the standardization frameworks are also semantically encoded in the I40KG. These semantic descriptions and annotations help to discover new relations between standards based on existing ones, thus reducing interoperability conflicts. The internal reasoning step of the knowledge graph reveals implicit relations among standards. Further, the performed graph analysis is capable of discovering the most relevant standards, i.e., the standards with the largest number of connections in the graph. The interlinking step of the knowledge graph permits the discovery of new knowledge about standards and standardization frameworks. We analyzed the number of discovered relations among standards and the accuracy of these relations. The observed results indicate that the reasoning and linking processes increase the connectivity of the knowledge graph by up to 80%, while up to 96% of the relations can be validated. The evidence from this study supports the advantages of a knowledge graph approach for semantically describing and interlinking the knowledge from standards and standardization frameworks.
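The reasoning and graph-analysis steps can be illustrated with a deliberately small Python sketch. It assumes that relations between standards are stored with a symmetric property (in the spirit of `owl:SymmetricProperty`), so the reasoning step adds the inverse of every asserted relation, and it measures the relevance of a standard as its number of distinct neighbours. The identifiers S1 to S4 and their relations are hypothetical; the sketch does not reproduce the actual reasoning rules, interlinking step or figures of the I40KG.

```python
from collections import defaultdict

Edge = tuple[str, str]


def symmetric_closure(edges: set[Edge]) -> set[Edge]:
    """Reasoning step: add (b, a) for every asserted (a, b) of a symmetric relation."""
    return edges | {(b, a) for a, b in edges}


def connectivity_gain(before: set[Edge], after: set[Edge]) -> float:
    """Relative increase in the number of relations, in percent."""
    return 100.0 * (len(after) - len(before)) / len(before)


def most_connected(edges: set[Edge], k: int = 1) -> list[tuple[str, int]]:
    """Graph analysis: rank standards by their number of distinct neighbours."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Sort by descending degree, breaking ties alphabetically for stable output.
    ranking = sorted(neighbours.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(node, len(ns)) for node, ns in ranking[:k]]


# Hypothetical relatedTo assertions between standards S1..S4.
asserted = {("S1", "S2"), ("S2", "S1"), ("S1", "S3"), ("S4", "S1"), ("S3", "S4")}
inferred = symmetric_closure(asserted)
print(len(asserted), len(inferred))           # 5 8
print(connectivity_gain(asserted, inferred))  # 60.0
print(most_connected(inferred))               # [('S1', 3)]
```

Only the three missing inverses are added, since (S2, S1) was already asserted; this is why the gain is measured against the asserted relations rather than simply doubling them.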

RQ2: How can knowledge graphs represent semantics encoded in Industry 4.0 entities?

In Chapter 5, this research question is answered positively by demonstrating that knowledge graphs are capable of providing a solid knowledge representation for entities in Industry 4.0 scenarios. We interpret the ontologies as a key part of the knowledge graphs, recording the structure of, in this case, the Industry 4.0 domain. In this regard, we proposed a methodology based on best practices for ontology building. The methodology is applied to construct three ontologies capturing the structure of standards of core importance for Industry 4.0 scenarios. First, an ontology captures the RAMI4.0 model, which provides a reference architecture for I40 solutions, and the Administration Shell concept, which enables the digital representation of physical assets. Second, the AML ontology covers the AutomationML standard. This standard is crucial in industrial solutions for designing CPS from distinct discipline perspectives, such as mechanical, electrical and software engineering. Finally, SCORVoc represents the Supply Chain Operations Reference (SCOR) model of the APICS industry association. We demonstrate the benefits of the semantic representation of Industry 4.0 entities. Then, knowledge graphs are created for each of the designed ontologies. Common use cases for the semantic representation in Industry 4.0 scenarios are developed, e.g., units of measurement. The representation and discovery of semantic interoperability conflicts among entities in these scenarios are introduced. Furthermore, the conflicts are resolved by considering and applying the semantics of the ontologies. The knowledge graph approach for representing and linking entities offers many advantages for the realization of the Industry 4.0 vision. The flexible schema representation, the creation of globally unique identifiers for entities and the ease of creating multilingual representations are some of the advantages we observe in the proposed approach.
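A unit-of-measurement conflict, where two sources describe the same quantity with different units, can be resolved by normalizing both values to a common base unit before comparing them. The Python sketch below assumes a simple lookup table of conversion factors keyed by quantity kind and unit; in an ontology-based setting these factors could instead come from the unit descriptions in the knowledge graph. The `Measurement` class, the helper functions and the example values are hypothetical.

```python
import math
from dataclasses import dataclass

# Factor converting one unit into the base unit of its quantity kind
# (metre for length, joule for energy). The table is illustrative only.
TO_BASE: dict[tuple[str, str], float] = {
    ("length", "m"): 1.0,
    ("length", "mm"): 0.001,
    ("length", "in"): 0.0254,
    ("energy", "J"): 1.0,
    ("energy", "kWh"): 3.6e6,
}


@dataclass(frozen=True)
class Measurement:
    quantity: str  # quantity kind, e.g. "length"
    value: float
    unit: str


def factor(quantity: str, unit: str) -> float:
    """Look up the conversion factor, failing loudly for unknown units."""
    try:
        return TO_BASE[(quantity, unit)]
    except KeyError:
        raise ValueError(f"unknown unit {unit!r} for quantity {quantity!r}") from None


def convert(m: Measurement, target: str) -> Measurement:
    """Resolve a unit conflict by expressing m in the target unit."""
    base = m.value * factor(m.quantity, m.unit)
    return Measurement(m.quantity, base / factor(m.quantity, target), target)


def same_quantity_value(a: Measurement, b: Measurement) -> bool:
    """True if two descriptions denote the same value once units are normalized."""
    if a.quantity != b.quantity:
        return False
    return math.isclose(a.value * factor(a.quantity, a.unit),
                        b.value * factor(b.quantity, b.unit))


# Two hypothetical data sources describe the same tool length in different units.
source_a = Measurement("length", 25.4, "mm")
source_b = Measurement("length", 1.0, "in")
print(same_quantity_value(source_a, source_b))          # True
print(convert(Measurement("energy", 2.0, "kWh"), "J"))  # Measurement(quantity='energy', value=7200000.0, unit='J')
```

Comparing with `math.isclose` rather than `==` avoids reporting a spurious conflict caused only by floating-point rounding during conversion.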

RQ3: How can existing rule-based approaches be utilized to resolve semantic interoperability conflicts over knowledge graphs?

In Chapter 6, this research question is addressed by proposing Deductive Databases and Probabilistic Soft Logic approaches for creating and exploiting knowledge graphs. Following these two approaches, we formalize the problem of identifying and resolving conflicts among I40 entities from different CPS perspectives. Knowledge graphs are created for representing the information from different perspectives of CPS design, i.e., mechanical, electrical or software views. We implemented these formalizations in Alligator and SemCPS, respectively. First, we presented Alligator, a deductive approach for the identification and resolution of semantic interoperability conflicts between CPS documents. Alligator relies on Datalog to represent knowledge that characterizes different types of semantic interoperability conflicts in CPS documents, and it uses a knowledge graph to encode the knowledge of the CPS perspectives. Second, we introduced SemCPS, an approach for integrating CPS descriptions into knowledge graphs. SemCPS uses Probabilistic Soft Logic to capture the knowledge that characterizes different types of semantic interoperability conflicts in CPS documents. As part of the SemCPS approach, we defined the concept of uncertain knowledge graphs. Uncertain knowledge graphs are capable of representing the uncertainty that is typical in CPS design; they comprise hard and soft knowledge facts for representing the entities of the CPS perspectives. An empirical evaluation is performed to compare the proposed approaches with existing ones, such as EDOAL and SILK. In general, SemCPS exhibits better performance than Alligator, EDOAL and SILK when types of semantic interoperability conflicts are added cumulatively and when the number of entities increases. However, in certain cases Alligator outperforms the compared approaches with regard to precision. Furthermore, we analyzed the behavior of Alligator and SemCPS when dealing with uncertainty. In the first case, without considering uncertainty, i.e., with only hard knowledge facts, both approaches behave similarly when identifying and resolving the semantic interoperability conflicts among the perspectives.
In the second case, i.e., with a combination of hard and soft knowledge facts, the SemCPS approach allowed us to represent the uncertainty that is typical in CPS design. Relying on this representation, SemCPS is capable of computing many alternative CPS designs and choosing the most probable one. The most probable alternative matches the final integrated design of the studied CPS. Taken together, these results suggest that rule-based approaches are capable of identifying and resolving semantic interoperability conflicts in CPS design.
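The contrast between hard and soft knowledge facts can be sketched with the Łukasiewicz semantics on which Probabilistic Soft Logic is built: a conjunction of n truth values is max(0, sum - (n - 1)), and a weighted rule is penalized by its distance to satisfaction, max(0, body - head). With truth values restricted to 0 and 1, this reduces to ordinary Boolean rule evaluation, as in a Datalog program. The predicates `similarName`, `sameType` and `differentSupplier`, the weights and the truth values below are hypothetical and are not the rules used by Alligator or SemCPS. Moreover, PSL computes the most probable assignment by convex optimization over continuous truth values, whereas this sketch simply scores two candidate alternatives.

```python
from collections.abc import Callable
from dataclasses import dataclass


def luk_and(*values: float) -> float:
    """Lukasiewicz t-norm, the soft conjunction used by Probabilistic Soft Logic."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body: tuple[float, ...], head: float) -> float:
    """0 if the rule body -> head is satisfied, otherwise how far it is violated."""
    return max(0.0, luk_and(*body) - head)


@dataclass(frozen=True)
class SoftRule:
    weight: float
    body: tuple[float, ...]         # observed truth values of the body atoms
    head: Callable[[float], float]  # head truth value for a candidate sameAs value


def penalty(rules: list[SoftRule], candidate: float) -> float:
    """Weighted sum of rule violations for one candidate alternative."""
    return sum(r.weight * distance_to_satisfaction(r.body, r.head(candidate)) for r in rules)


def most_probable(rules: list[SoftRule], candidates: list[float]) -> float:
    """Pick the alternative with the lowest total penalty."""
    return min(candidates, key=lambda c: penalty(rules, c))


# Hard facts only (truth values 0 or 1): the rule behaves like a Datalog rule.
print(distance_to_satisfaction((1.0, 1.0), 0.0))  # 1.0 -> violated
print(distance_to_satisfaction((1.0, 0.0), 0.0))  # 0.0 -> body false, satisfied

# Soft facts about whether two hypothetical entities e1 and e2 are the same.
rules = [
    # similarName(e1, e2) AND sameType(e1, e2) -> sameAs(e1, e2)
    SoftRule(2.0, (0.75, 0.75), lambda same: same),
    # differentSupplier(e1, e2) -> NOT sameAs(e1, e2)
    SoftRule(1.0, (0.25,), lambda same: 1.0 - same),
]
print([penalty(rules, c) for c in (0.0, 1.0)])  # [1.0, 0.25]
print(most_probable(rules, [0.0, 1.0]))         # 1.0
```

Under the hard-fact reading, a rule is either satisfied or violated; with soft facts, the penalties rank the alternatives, and the lowest-penalty alternative (here, treating the two entities as the same) is selected.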

RQ4: How can a knowledge graph-based integration of entities be applied in Industry 4.0 real-world scenarios?

Chapter 7 reports on the results of the semantic-based approach for data integration in real-world industrial scenarios. A case study from an actual manufacturing company is presented, showing the applicability of the knowledge graph approach for integrating heterogeneous data sources in Industry 4.0 scenarios. Two use cases of core importance for the efficiency of factory production are developed, i.e., tool availability and energy consumption. We investigated the data sources of the manufacturing company that are related to the use cases, and we identified and analyzed the existing semantic interoperability conflicts among them. Furthermore, to execute the use cases, we applied the knowledge graph approach for resolving the semantic interoperability conflicts. A set of ontologies was developed to describe the semantics of the data sources, i.e., bill of materials, manufacturing execution systems and sensor data. In addition, a set of mappings was defined to link the data sources to the respective ontologies. An architecture for implementing the knowledge graph approach was defined; it enables the integration of data considering the data sources, ontologies, mappings and applications. The implemented solution was evaluated by interviewing the stakeholders in the manufacturing company about the use of the solution and the perceived benefits of a knowledge graph approach for manufacturing. We observed that most of the assessment questions received good evaluation results. In conclusion, a knowledge graph approach appears to be beneficial for integrating data in real-world Industry 4.0 scenarios.