
Modelling approaches, such as models@run.time, provide semantically rich reflection layers, which enable cyber-physical systems to reason about their context. As these systems evolve over time, reasoning processes typically need to analyse and compare the current context with its history. Using models to organise and store such dynamic data, also called data in motion, suffers from a lack of sustainable mechanisms to efficiently handle historical data. Although time as a crosscutting concern of data modelling has been discussed for quite some time, today's modelling approaches still mostly rely on a discrete representation of time. Therefore, a common approach consists of a temporal discretisation, which regularly samples the context (snapshots) at specific timestamps to keep track of its history. Analysing these data then requires mining a huge number of snapshots, extracting a relevant view, and finally analysing it. This requires a lot of computational power and is time-consuming, which conflicts with the near real-time response requirements these systems usually face. In this chapter, we presented a novel temporal data model, which considers time as a first-class property crosscutting any model element, allowing context representations to be organised as temporal views dedicated to reasoning processes, rather than as a mere stack of snapshots. By introducing a temporal validity independently for each model element, we allowed each model element to evolve independently and at different paces, making a full sampling of the context model unnecessary. Finally, we added a time-relative navigation, which enables efficient navigation between model elements coming from different timestamps. This allows us to assemble a temporal data model for reasoning purposes and to seamlessly and efficiently navigate along the time dimension of data, without the need to manually mine the necessary data from different context models. The proposed temporal data model has been implemented and integrated into the open source modelling framework KMF and evaluated on a smart grid reasoning engine for electric load prediction. We showed that our approach supports temporal reasoning processes, outperforms a full context sampling by far, and can be compatible with near real-time requirements. To sum up, we demonstrated that the proposed temporal data model is able to efficiently analyse data in motion.
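To make per-element temporal validity and time-relative navigation more concrete, the following Python sketch gives each model element its own sorted list of versions and resolves a read at a given timestamp to the version valid at that time. It is a minimal illustration under our own assumptions, not the KMF implementation; the names `TemporalNode`, `TimeView`, and `loads_at`, as well as the meter and concentrator data, are hypothetical.

```python
from bisect import bisect_right


class TemporalNode:
    """Hypothetical model element whose versions carry their own validity.

    Each version is valid from its start timestamp until the next version
    of the *same* element, so elements evolve independently of each other.
    """

    def __init__(self, name):
        self.name = name
        self._times = []   # sorted start timestamps of the versions
        self._states = []  # _states[i] is valid from _times[i] onwards

    def set(self, time, state):
        # Record a new version starting at `time` (overwrite if one exists).
        i = bisect_right(self._times, time)
        if i > 0 and self._times[i - 1] == time:
            self._states[i - 1] = state
        else:
            self._times.insert(i, time)
            self._states.insert(i, state)

    def resolve(self, time):
        # Latest version whose start is <= time, or None before the first one.
        i = bisect_right(self._times, time)
        return self._states[i - 1] if i > 0 else None


class TimeView:
    """Time-relative navigation: every read resolves at the view's timestamp."""

    def __init__(self, time):
        self.time = time

    def read(self, node):
        return node.resolve(self.time)


# Hypothetical example: a meter changes at t=100, its concentrator never does.
meter = TemporalNode("meter")
meter.set(0, {"load": 10})
meter.set(100, {"load": 12})

concentrator = TemporalNode("concentrator")
concentrator.set(0, {"meters": [meter]})


def loads_at(time):
    view = TimeView(time)
    return [view.read(m)["load"] for m in view.read(concentrator)["meters"]]


print(loads_at(50))   # [10]
print(loads_at(150))  # [12]
```

The concentrator keeps a single version while the meter has two, so nothing needs to be sampled for the concentrator at t=100; navigating from it at t=150 still reaches the meter's newer version.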

5 A multi-dimensional graph data model to support what-if analysis

Over the last few years, the cross-fertilisation of big data and cyber-physical systems (or the Internet of Things) has boosted data analytics from a descriptive era, mostly confined to the explanation of past events, to the emergence of new predictive techniques. Nevertheless, existing predictive techniques still fail to envision alternative futures, which inevitably diverge when exploring the impact of what-if decisions. What-if analysis calls for the design of scalable data models that can cope with the complexity and diversity of representing and exploring many different alternatives. This chapter introduces a multi-dimensional graph data model, called many-world graph, which combines multi-dimensional graphs and temporal data to organise massive amounts of unstructured and continuously changing data. The proposed data model is an extension of the temporal data model presented in the previous chapter.

This chapter is based on the work that has been presented in the following paper:

• under submission at ACM/USENIX EuroSys 2017: Thomas Hartmann, Assaad Moawad, Francois Fouquet, Gregory Nain, Romain Rouvoy, Yves Le Traon, and Jacques Klein. PIXEL: A Graph Storage to Support Large Scale What-If Analysis

Contents

5.1 Introduction
5.2 Motivating example
5.3 Many-world graphs
5.4 MWG implementation
5.5 Experiments
5.6 Conclusion

5.1 Introduction

In their “2013 Hype Cycle of Emerging Technologies” report, Gartner considers prescriptive analytics to be one of the “innovation triggers” of the next five to ten years [1]. For instance, the emerging domains of cyber-physical systems and the Internet of Things are expected to increasingly control larger and larger parts of our critical infrastructures, such as electric grids, (semi-)autonomously [194]. This requires advanced data analytics to turn the huge amount of collected data into valuable insights in order to identify suitable decisions [277]. However, technologies for prescriptive analytics are still in their infancy. Prescriptive analytics heavily relies on exploring what might happen if this or that action were applied, which is referred to as what-if analysis [167]. What-if analysis therefore plays a crucial part in decision-making.

Every action induces some side-effects, which potentially lead to an alternative state from which a set of other actions can be applied, and so forth. When considering complex systems, such as CPSs or IoT, hundreds or thousands of alternative actions must be explored simultaneously. As in the many-worlds interpretation [139], every action can be interpreted as a divergent point leading to an alternative, independent world. This means that every data variable can have alternative values in different worlds. What-if analysis therefore tries to establish the sequence of actions that leads to the desired values of all variables, i.e., the desired world.
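The interpretation of actions as divergent points can be illustrated with a deliberately naive Python sketch: a breadth-first search in which every applied action forks its parent world by a full deep copy, and which returns the shortest action sequence that reaches a desired world. The function `explore`, the two grid actions, and the load and capacity numbers are hypothetical and serve only as an illustration.

```python
import copy
from collections import deque


def explore(initial, actions, is_desired, max_depth):
    """Breadth-first what-if search (naive, hypothetical sketch).

    Every applied action is a divergent point: it forks its parent world
    with a full deep copy and mutates only the fork. Returns the shortest
    sequence of action names leading to a desired world, or None.
    """
    queue = deque([(initial, [])])
    while queue:
        world, path = queue.popleft()
        if is_desired(world):
            return path
        if len(path) == max_depth:
            continue
        for name, action in actions.items():
            fork = copy.deepcopy(world)  # alternative, independent world
            action(fork)
            queue.append((fork, path + [name]))
    return None


# Hypothetical grid scenario: demand must not exceed generation capacity.
actions = {
    "shed_load": lambda w: w.update(load=w["load"] - 20),
    "start_generator": lambda w: w.update(capacity=w["capacity"] + 30),
}
initial = {"load": 120, "capacity": 80}


def balanced(w):
    return w["load"] <= w["capacity"]


print(explore(initial, actions, balanced, max_depth=3))
# ['shed_load', 'shed_load']
```

The number of explored worlds grows exponentially with the depth of the action sequence, and each of them is a full copy of the data; this copy-per-fork cost is what makes such a naive exploration impractical on large graphs.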

In addition, actions and values have a temporal dimension. As discussed in detail in Chapter 4, it is usually not enough to consider just the current state of a system; it is often necessary to also consider and reason about historical data, using, for instance, approaches like sliding window analytics [93]. Therefore, within a given world, variables can have different values at different points in time. This can lead to different histories for the values of variables in different worlds. Moreover, every world and its variables can evolve independently and at different paces. This leads to a huge combinatorial complexity of world and timepoint alternatives. Therefore, what-if analysis requires a data model that can represent, at the same time:

• Temporal (i.e., evolving) data: Most of today's data is temporal in nature, from social networks, financial transactions, and medical records to self-driving cars. What-if analysis typically needs to process not only current but also historical data (cf. Chapter 4).

• Several alternative worlds: To independently explore different actions, it is necessary to “fork” or “snapshot” the underlying data, so that every action can be simulated on its own dataset.

The fast-growing area of graph analytics (for example GraphX [323]) suggests organising the massive amounts of unstructured, constantly changing data that such analytics have to deal with as graphs. Graphs and associated computation models have been shown to be especially suitable for depicting complex data and their relationships [232], [240]. An increasing number of works discuss the challenges of temporal aspects of graph data [82], [107], [201]; however, an efficient exploration of many independently evolving worlds remains an open issue. As discussed in Chapter 2, models@run.time can be thought of as object graphs, where every node corresponds to one model element of the runtime model. Correspondingly, every edge in the graph maps to a relationship of the runtime model. In this chapter, we follow the terminology of graph analytics and speak about nodes and edges rather than model elements and relationships.

To address the combinatorial complexity of world and timepoint alternatives, we propose in this chapter a novel graph data model, called many-world graph (MWG), where the values of each node are resolved on demand, based on the viewpoint (defined by a world and a timepoint) from which we read. As in the famous example of Schrödinger's cat [282], where the cat is “dead” and “alive” at the same time (in different worlds) and its actual state is only revealed at the moment the cat is observed, in our approach nodes and edges can have many different values at the same time (depending on the world and time), which are only revealed at the moment the graph is observed. In other words, every node can have alternative values depending on the current viewpoint (time and world). Let us now suppose we could influence the state of the cat. If we want to save the cat, the goal would then be to select the sequence of actions that leads to the world where the graph represents the cat as alive. Based on this concept, our MWG implements an efficient on-demand fork concept for nodes and edges and, at the same time, supports temporal nodes and edges. We show that this allows us to efficiently explore a large number of independent actions (in time and across many worlds), even on massive amounts of data (hundreds of millions of nodes and timepoints, and hundreds of thousands of worlds). We believe that this model can prepare the ground for efficient what-if analysis.
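The on-demand resolution by viewpoint can be sketched in Python as follows. Instead of copying data, a fork only records its parent world and the time at which it diverges; a read at a given (world, time) first looks for a local value and otherwise falls back to the parent world, never looking past the fork time. This is a simplified in-memory illustration under our own assumptions (in particular, that a forked world shares its parent's history only up to the fork time), not the KMF implementation presented in Section 5.4; `ManyWorldStore` and its methods are hypothetical.

```python
from bisect import bisect_right


class ManyWorldStore:
    """Hypothetical sketch: node values resolved by viewpoint (world, time).

    Assumption: a forked world shares its parent's history up to the fork
    time and evolves independently afterwards.
    """

    ROOT = 0

    def __init__(self):
        self._parent = {self.ROOT: None}
        self._fork_time = {self.ROOT: None}
        self._timelines = {}  # (world, node) -> ([times], [values])

    def fork(self, world, time):
        # Constant-time fork: no node is copied, only the lineage is recorded.
        child = len(self._parent)
        self._parent[child] = world
        self._fork_time[child] = time
        return child

    def set(self, world, time, node, value):
        # Only the modified node gets a local timeline in this world.
        times, values = self._timelines.setdefault((world, node), ([], []))
        i = bisect_right(times, time)
        if i > 0 and times[i - 1] == time:
            values[i - 1] = value
        else:
            times.insert(i, time)
            values.insert(i, value)

    def get(self, world, time, node):
        # Resolve on demand, walking up the world hierarchy if necessary.
        while world is not None:
            times, values = self._timelines.get((world, node), ((), ()))
            i = bisect_right(times, time)
            if i > 0:
                return values[i - 1]
            fork_time = self._fork_time[world]
            world = self._parent[world]
            if fork_time is not None:
                # Changes made in the parent after the fork stay invisible.
                time = min(time, fork_time)
        return None


store = ManyWorldStore()
root = ManyWorldStore.ROOT
store.set(root, 0, "cat", "alive")

poisoned = store.fork(root, 10)        # divergent point at t=10
store.set(poisoned, 10, "cat", "dead")

print(store.get(root, 20, "cat"))      # alive
print(store.get(poisoned, 20, "cat"))  # dead
print(store.get(poisoned, 5, "cat"))   # alive (shared history before the fork)
```

In this sketch, forking costs a constant amount of work regardless of the graph size, and only nodes that are actually modified in a world get their own timeline; every other read is resolved through the world hierarchy.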

We integrated this data model into the Kevoree Modeling Framework1 to evaluate its capabilities and limits. First, we evaluate its performance when used as a base graph storage and compare our approach with a state-of-the-art graph storage. Second, we focus on evaluating the temporal aspects of our approach. Besides raw performance tests for different scenarios, we compare our approach to a state-of-the-art time series database. Third, we evaluate the performance of inserting into and reading from many different worlds for different scenarios. Finally, we validate our approach with a scenario from the smart grid case study. For all cases, we discuss results, limits, and best and worst cases.

The remainder of this chapter is organised as follows. First, Section 5.2 motivates the research behind this contribution, based on the smart grid case study. Sections 5.3 and 5.4 introduce the main concepts of MWG and their implementation in KMF, respectively. We thoroughly evaluate our approach in Section 5.5. The chapter concludes in Section 5.6.

1 The source code of our many-world graph implementation is available under