5.5 Experiments

5.5.8 Discussion and perspectives

Beyond the specific case of smart grids described in this chapter, we believe that MWGs can find applications in a wide range of domains, including social networks [198], digital marketing, smart cities, healthcare, sales, and biology [167]. For example, in the case of smart cities, an MWG can store and learn the mobility models of citizens and then explore the impact of closing or opening roads on the traffic. Another domain of application for such what-if analysis is weather forecasting. As weather forecasts are built on complex models, anticipating the impact of certain effects (e.g., air pollution) requires simulating what would happen in such cases. Additionally, in the domain of software engineering, MWGs can be used to trace the evolution of mobile apps [180] and thus identify the sequence of refactoring actions to be performed in order to improve the software quality. MWGs can also be used to monitor the execution of deployed software and explore future states, thus predicting the impact of changing parameters or executing specific actions. Aside from potential applications of this approach, our perspectives also include the extension of MWGs to consider different laws of evolution for the stored graphs, thus going beyond the application of machine learning [179]. We are also looking at the integration of our solution with existing graph processing systems, like Giraph [5]. Finally, beyond the support of what-if analysis, the coverage of alternative prescriptive analytics based on MWGs is another research direction we are aiming for.

5.6 Conclusion

We proposed a novel graph data model, called many-world graph (MWG), which makes it possible to efficiently explore a large number of independent actions, both in time and across many worlds, even on a massive amount of data. We validated that our MWG implementation follows the theoretical time complexity of O(log(n)) for the temporal resolution and O(m) for the world resolution, where m is the maximum number of nested worlds. Our experimental evaluation showed that even when used as a base graph, without time and many worlds, our MWG implementation outperforms a state-of-the-art graph database, Neo4j, for both mass and single inserts. A direct comparison with a state-of-the-art time series database, InfluxDB, showed that although the MWG is not just a simple time series but a fully temporal graph, its temporal resolution performance is comparable to, and in some cases even better than, that of time series databases. The experimental validation showed that the MWG is very well suited for what-if analysis, especially when only a small percentage of nodes changes. Regarding the support for prescriptive analytics, we showed that the MWG implementation is able to efficiently handle hundreds of millions of nodes and timepoints, and hundreds of thousands of independent worlds.
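To make the two complexity bounds more concrete, the following is a minimal sketch of how such a resolution could look. It is not the MWG implementation: the class `ManyWorldStore`, its methods, and the rule of capping the query time at the fork time when ascending to a parent world are simplifying assumptions made for illustration. Within one world, the lookup is a binary search over a sorted timeline (logarithmic in the number of timepoints), and the walk up the chain of parent worlds visits at most m worlds.

```python
from bisect import bisect_right


class ManyWorldStore:
    """Toy store (hypothetical): one sorted timeline per (world, node) pair."""

    def __init__(self):
        # world id -> (parent world id, fork time); world 0 is the root.
        self.parent = {0: (None, None)}
        # (world, node) -> (sorted list of times, list of values)
        self.timelines = {}

    def fork(self, parent, fork_time):
        """Create a child world that diverges from `parent` at `fork_time`."""
        world = len(self.parent)
        self.parent[world] = (parent, fork_time)
        return world

    def write(self, world, node, time, value):
        """Store a value; only the given world is touched (no copying)."""
        times, values = self.timelines.setdefault((world, node), ([], []))
        i = bisect_right(times, time)
        if i > 0 and times[i - 1] == time:
            values[i - 1] = value          # overwrite an existing timepoint
        else:
            times.insert(i, time)          # keep the timeline sorted
            values.insert(i, value)

    def read(self, world, node, time):
        """Resolve the value of `node` as seen from `world` at `time`."""
        while world is not None:           # at most m iterations
            times, values = self.timelines.get((world, node), ([], []))
            i = bisect_right(times, time)  # binary search: O(log n)
            if i > 0:
                return values[i - 1]       # latest timepoint <= time
            parent, fork_time = self.parent[world]
            if fork_time is not None:
                # Assumption: parent writes after the fork are invisible here.
                time = min(time, fork_time)
            world = parent
        return None                        # nothing known at or before `time`


store = ManyWorldStore()
store.write(0, "meter", 0, 10.0)
store.write(0, "meter", 5, 12.0)
what_if = store.fork(0, fork_time=6)
store.write(what_if, "meter", 8, 99.0)     # change only in the child world
store.write(0, "meter", 7, 13.0)           # later change in the root world
```

In this sketch, forking a world copies no data, which is why a what-if scenario that changes only a few nodes stays cheap: unchanged nodes are resolved through the parent chain.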

Reasoning over distributed data and combining domain knowledge

6 A peer-to-peer distribution and stream processing model

The models@run.time paradigm promotes the use of models during the execution of cyber-physical systems to represent their context and to reason about their runtime behaviour. In the previous chapters, we introduced a scalable multi-dimensional graph data model that can cope with the complexity and the diversity of representing and exploring many different alternatives, combined with temporal data. However, the recent trend towards highly interconnected cyber-physical systems with distributed control and decision-making abilities, such as smart grids, makes it increasingly crucial to reason efficiently over distributed data. Coping at the same time with the large-scale, distributed, and constantly changing nature of these systems constitutes a major challenge for analytic processes and their underlying data models. This chapter presents a peer-to-peer distribution mechanism for the data model introduced in the previous chapters. A stream processing model on top of this mechanism enables efficient reasoning over distributed and frequently changing data.

This chapter is based on the work that has been presented in the following paper:

• Thomas Hartmann, Assaad Moawad, François Fouquet, Grégory Nain, Jacques Klein, and Yves Le Traon. Stream my models: Reactive peer-to-peer distributed models@run.time. In 18th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MoDELS 2015, Ottawa, ON, Canada, September 30 - October 2, 2015, pages 80–89, 2015

Contents

6.1 Introduction
6.2 Reactive distributed models@run.time
6.3 Evaluation
6.4 Discussion: distribution and asynchronicity
6.5 Conclusion

6.1 Introduction

Over the past few years, the models@run.time paradigm has proven the potential of models to be used not only at design time but also at runtime to represent the context of cyber-physical systems (CPSs), to monitor their runtime behaviour and reason about it, and to react to state changes [96], [88]. Reasoning on the state of a cyber-physical system is a complex task, since it relies on the aggregation and processing of various constantly evolving data such as sensor values. This requires scalable data models that can cope with the complexity and the diversity of representing and exploring many different alternatives, combined with temporal data. To this end, we introduced a scalable multi-dimensional graph data model (cf. Chapter 4 and Chapter 5) to represent the context of CPSs. However, the recent trend towards highly interconnected cyber-physical systems with distributed control and decision-making abilities makes it necessary to efficiently reason over distributed data.

To fulfil their tasks, these systems typically need to share context and state information between computational nodes. Unlike in the previous chapter, where a node denoted a node in the context of a graph data model, in this chapter a node refers to a computational node, i.e., any computer system reading, writing, or processing data in the context of a cyber-physical system. Given the fact that our approach promotes the use of runtime data models to represent the state and context information of CPSs, the runtime models of distributed CPSs must also be distributed. Moreover, as shown in the previous chapters, runtime models of complex CPSs can get very large and the underlying data can change very frequently. This makes it difficult to share this information efficiently.

Let us consider the smart grid case study as a concrete example. Smart grids are characterised as very complex and highly distributed CPSs [303], where various sensor data and information from the electric topology must be aggregated and analysed. To support reasoning and decision-making processes, we use the smart grid model presented in Section 1.2.2. The state of the smart grid, i.e., its runtime model, is continuously updated with a high frequency from various sensor measurements (like consumption or quality of power supply) and other internal or external events (e.g., overload warnings). In reaction to these state changes, different actions can be triggered. However, reasoning and decision-making processes are not centralised but distributed over smart meters, data concentrators, and a central system [142], making it necessary to share context information between these nodes. The fact that runtime models of smart grids, depending on the size of a city or country, can reach millions of elements and thousands of distributed nodes, challenges the efficiency of sharing context information.

These challenges are not specific to the smart grid but also arise in many other large-scale, distributed cyber-physical systems where state and context information change frequently. Examples include advanced automotive systems, process control, environmental control, avionics, and medical systems [221].

tems during runtime, to the best of our knowledge, there is no approach tackling the i) large-scale, ii) distributed, and iii) constantly changing nature of these systems at the same time [149], [301]. This chapter introduces a distributed models@run.time approach combining ideas from asynchronous, reactive programming, peer-to-peer distribution, and large-scale models@run.time. The introduced distribution and stream processing model makes it possible to distribute our previously proposed multi-dimensional graph data model (cf. Chapter 4 and Chapter 5) in a peer-to-peer manner and to efficiently reason over distributed, frequently changing data.

First of all, since models@run.time are continuously updated during the execution of a system, they cannot be considered as bounded but can change and grow indefinitely [119]. Therefore, we define models as observable streams of model chunks, where every chunk contains data related to one model element (e.g., a meter). This stream-based interpretation of models makes it possible to process models chunk by chunk, regardless of their global size. Secondly, we distribute and exchange these model chunks between computational nodes in a peer-to-peer manner and on demand, to avoid the exchange of full runtime models. That peer-to-peer distribution can lead to highly scalable implementations has, for example, also been discussed in [197]. Moreover, the use of a lazy loading strategy makes it possible to transparently access the complete virtual model from every node, although chunks are actually distributed across nodes. Thirdly, we leverage observers, an automatic reloading mechanism of model chunks (in case of changes), and asynchronous operations to enable a reactive programming style, allowing a system to dynamically react to context changes.
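The interplay of these three ideas can be illustrated with a small sketch. It is not the KMF implementation: the classes `Network` and `Peer`, the central registry used to locate chunk owners, and the synchronous callbacks are all assumptions chosen to keep the example short. A real peer-to-peer deployment would locate chunks without a central registry and would deliver notifications asynchronously.

```python
class Network:
    """Hypothetical registry mapping each chunk id to the peer that owns it."""

    def __init__(self):
        self.owner_of = {}


class Peer:
    """Hypothetical computational node holding some chunks and caching others."""

    def __init__(self, name, network):
        self.name = name
        self.network = network
        self.owned = {}          # chunk id -> data stored on this peer
        self.cache = {}          # chunk id -> copy fetched from another peer
        self.subscribers = {}    # owner side: chunk id -> peers to notify
        self.callbacks = {}      # observer side: chunk id -> local callbacks
        self.remote_fetches = 0  # counts chunk transfers, for illustration

    def put(self, chunk_id, data):
        """Create or update a chunk owned by this peer and notify observers."""
        self.owned[chunk_id] = dict(data)
        self.network.owner_of[chunk_id] = self
        for peer in self.subscribers.get(chunk_id, []):
            peer._on_changed(chunk_id)

    def get(self, chunk_id):
        """Lazy loading: fetch a remote chunk only on first access."""
        if chunk_id in self.owned:
            return self.owned[chunk_id]
        if chunk_id not in self.cache:
            owner = self.network.owner_of[chunk_id]
            self.cache[chunk_id] = dict(owner.owned[chunk_id])
            self.remote_fetches += 1
        return self.cache[chunk_id]

    def observe(self, chunk_id, callback):
        """Register a callback that fires whenever the chunk changes."""
        self.callbacks.setdefault(chunk_id, []).append(callback)
        owner = self.network.owner_of[chunk_id]
        peers = owner.subscribers.setdefault(chunk_id, [])
        if self not in peers:
            peers.append(self)

    def _on_changed(self, chunk_id):
        self.cache.pop(chunk_id, None)  # drop the stale copy
        fresh = self.get(chunk_id)      # automatic reload of the chunk
        for callback in self.callbacks.get(chunk_id, []):
            callback(chunk_id, fresh)


net = Network()
meter = Peer("meter-1", net)
concentrator = Peer("concentrator", net)

meter.put("meter-1", {"load_kw": 1.2})
seen = []
concentrator.observe("meter-1", lambda cid, chunk: seen.append(chunk["load_kw"]))

first = concentrator.get("meter-1")["load_kw"]  # one remote fetch
again = concentrator.get("meter-1")["load_kw"]  # served from the local cache
meter.put("meter-1", {"load_kw": 3.4})          # triggers reload and callback
```

Only the chunk for one meter travels between the two peers; the rest of the model is never transferred unless it is accessed, which is the property the on-demand exchange of chunks relies on.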

We integrated our approach into the KMF [147], [151] by entirely rewriting its core to apply a thoroughly reactive and asynchronous programming model. Evaluated on an industrial-scale smart grid case study, inspired by the Creos project, we demonstrate that our approach enables frequently changing, reactive distributed models and can scale to millions of elements distributed over thousands of nodes, while the distribution and model access remain fast enough to enable reactive systems.

The remainder of this chapter is structured as follows. Section 6.2 presents our approach of reactive distributed models at runtime, which we evaluate in Section 6.3. In Section 6.4, we discuss the need for asynchronicity when distributing models, before we conclude in Section 6.5.