Evaluation - Enabling Model-Driven Live Analytics For Cyber-Physical Systems: The Case of Smart Grids

In this section, we evaluate whether the proposed temporal data model is able to efficiently analyse data in motion. To this end, we apply it to an industrial case study and evaluate its impact. The case study is taken from our cooperation with Creos Luxembourg S.A. and initially led to the research behind this approach. In a nutshell, we evaluate the performance of a reasoning engine that needs to analyse temporal smart grid data: it has to aggregate and navigate temporal data and, if necessary, take corrective actions. The case study is based on the smart grid model presented in Figure 1.4, which is periodically filled with live data from smart meters and sensors. Based on the electric consumption, smart meters can derive the electric load in a region. The idea behind the reasoning engine is to predict whether the load in a certain region is likely to exceed a critical value. To do so, a linear regression of the values of the meters in this region, over a certain period of time, has to be computed.

The case study has been implemented twice: once with a traditional full sampling strategy, and once using our temporal data model, which we implemented in the KMF framework (cf. Section 4.6). Both implementations use Google's LevelDB as a storage backend and both are executed using JDK 8. All experiments are conducted on a MacBook Pro with an Intel Core i7 CPU, 16 GB RAM, and an SSD. Each experiment has been executed 100 times and the presented results are average values.

Figure 4.7: Memory usage for model update operations using the full sampling strategy
[Plot not reproduced. x-axis: number of model elements (0 to 1·10^5); y-axis: main memory (in MB, 0 to 100); series: Full Sampling LU, Full Sampling MU.]

The following validation is based on three key performance indicators (KPIs): 1) time and memory requirements to update the context model, 2) performance when navigating the context model in time, and 3) space requirements for persisting the temporal data. For each KPI, we compare our approach with the classic sampling strategy, which takes a snapshot of the entire model for each modification (or periodically). The measured memory value is main memory (RAM) for KPI-1 and disk space for KPI-3. The measured time is the time required to complete the respective operation (model update or reasoning process, depending on the KPI). Main memory is measured in terms of used heap memory, queried using Java's runtime API.

4.7.1 KPI-1: Model updates

First, we evaluate the time and memory requirements to update the proposed temporal data model and compare them to a full sampling approach. We analyse modifications of two magnitudes: 1) a large update (LU) that consists of creating a new concentrator and a smart meter subtree (1,000 units) and 2) a minor update (MU) that consists of updating the consumption value of a specific smart meter, which is already present in the context model. For this experiment, we keep the size of each update constant but vary the size of the context model and the history. We grow the context model from 0 to 100,000 elements, which approximately corresponds to the actual size of our Luxembourg smart grid model. The results of KPI-1 in terms of memory usage are depicted in Figure 4.7 for the full sampling approach and in Figure 4.8 for the temporal data model. The results of KPI-1 with respect to the time required to update the context models are shown in Figure 4.9 for full sampling and in Figure 4.10 for the temporal data model.

Figure 4.8: Memory usage for model update operations using the temporal data model
[Plot not reproduced. x-axis: number of model elements (0 to 1·10^5); y-axis: main memory (in MB, 1 to 2.5); series: Temporal Data Model LU, Temporal Data Model MU.]

Figure 4.9: Update time for model manipulations using the full sampling strategy
[Plot not reproduced. x-axis: number of model elements (0 to 1·10^5); y-axis: update time (in ms, 0 to 2,000); series: Full Sampling LU, Full Sampling MU.]

[Figure 4.10, caption lost in extraction. Plot not reproduced. x-axis: number of model elements (0 to 1·10^5); y-axis: update time (in ms, 20 to 100); series: Temporal Data Model LU, Temporal Data Model MU.]

Let us first consider main memory. With the full sampling strategy, the main memory required to perform the updates depends on the size of the model, as reflected by its linear progression. In contrast, our approach results in two flat curves for LU and MU updates, showing that the required memory only depends on the size of the update, not on the size of the model. This is confirmed by the fact that LU requires more memory than MU, but both are constant (less than 2.5 MB, compared to up to 100 MB for the full sampling strategy). This is due to our lazy loading approach, i.e., only the elements that need to be updated are loaded into main memory.
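The reason why the cost of an update can be independent of the model size can be illustrated with a minimal sketch. It assumes that every model element keeps its own timeline of versions, so that an update touches only the element concerned. The class `TemporalStore` and its methods are hypothetical names chosen for illustration; this is not the KMF API and not the implementation evaluated here.

```python
import bisect
from collections import defaultdict


class TemporalStore:
    """Hypothetical per-element timeline store (illustration only, not KMF)."""

    def __init__(self):
        # element id -> sorted timestamps, and a parallel list of values
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def update(self, element_id, timestamp, value):
        """Record a new version of ONE element; all other elements are untouched."""
        times = self._times[element_id]
        idx = bisect.bisect_right(times, timestamp)
        times.insert(idx, timestamp)
        self._values[element_id].insert(idx, value)

    def resolve(self, element_id, timestamp):
        """Return the latest version of the element that is valid at `timestamp`."""
        times = self._times.get(element_id, [])
        idx = bisect.bisect_right(times, timestamp)
        if idx == 0:
            return None  # the element did not exist yet at that time
        return self._values[element_id][idx - 1]

    def version_count(self):
        """Total number of stored element versions."""
        return sum(len(times) for times in self._times.values())


store = TemporalStore()
for meter in range(1000):          # initial model: 1,000 meters at time 0
    store.update(meter, 0, 0.0)
store.update(42, 10, 3.5)          # minor update: one meter changes at time 10

print(store.version_count())       # versions stored after the minor update
print(store.resolve(42, 5), store.resolve(42, 10), store.resolve(7, 10))
```

In this sketch the minor update adds a single version, whereas a full sampling strategy would copy all 1,000 elements again to create the snapshot for time 10. Unchanged elements are resolved to their most recent earlier version.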

Next, we look at the time required to update context models. The time to insert new elements using the full sampling approach depends on the size of the model, but is nearly constant with the proposed temporal data model. This behaviour is similar to what we observed for the required main memory. The fact that the update time is lower for MU than for LU confirms that our approach reduces the time needed to modify elements. Looking at the results of the experiments, KPI-1 demonstrates that even in the worst case scenario, where all elements evolve at the same pace, our approach offers a major improvement for model update operations (a factor of 33 for time and between 50 and 100 for memory).

Finally, we analyse the capability of our temporal data model to handle batch insertions. To this end, we additionally performed a batch insert, once using the full sampling strategy and once using our approach. The batch insert consists of 10,000 historical values for each smart meter, resulting in a model of 1 million elements. The insertion takes 267 seconds with the full sampling strategy and 16 seconds with our approach. This means that even in the worst case, we still have an improvement of a factor of 17 for the insertion time.

4.7.2 KPI-2: Navigating the context model in time

For the following experiment, we consider an already existing smart grid model containing a history of consumption values. We evaluate the time required to execute a complex computation over the historical consumption data. We run several prediction algorithms over the model, which correlate historical data in order to predict the future state of the grid and, for example, raise an alert in case of a potential overload risk. We define two prediction categories, each at two different scales, resulting in four different reasoning processes: 1) small deep prediction (SDP), 2) small wide prediction (SWP), 3) large deep prediction (LDP), and 4) large wide prediction (LWP). Wide prediction means that the algorithm uses a correlation of data from neighbouring smart meters in order to predict the future consumption. This means that the algorithm needs to explore, i.e., navigate, the model in breadth. The underlying idea is that the electric consumption within a region (a number of geographically close smart meters) remains comparable over time for similar contexts (weather conditions, time of the year, etc.). The deep prediction strategy uses the history of customers to predict their consumption habits. In this case, the algorithm needs to navigate the model in depth, i.e., it needs to navigate the history of a model element. For both strategies, we perform a linear regression to predict the future consumption at two scales: large (100 meters) and small (10 meters).
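The difference between the two navigation patterns can be sketched as follows. This is a minimal illustration under stated assumptions, not the prediction algorithm used in the experiment: the `history` dictionary, the function names, the choice to pool the neighbours' samples for the wide prediction, and the threshold `CRITICAL_LOAD` are all hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return 0.0, mean_y  # all samples share one timestamp: no trend
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def deep_prediction(history, meter, t_future):
    """Navigate ONE meter's own history (in depth) and extrapolate it."""
    slope, intercept = fit_line(history[meter])
    return slope * t_future + intercept


def wide_prediction(history, neighbours, t_future):
    """Navigate across neighbouring meters (in breadth): pool their samples
    and extrapolate the regional trend (an assumption made for this sketch)."""
    pooled = [p for meter in neighbours for p in history[meter]]
    slope, intercept = fit_line(pooled)
    return slope * t_future + intercept


# Hypothetical (time, consumption) samples for two neighbouring meters.
history = {
    "m1": [(0, 1.0), (1, 2.0), (2, 3.0)],
    "m2": [(0, 2.0), (1, 3.0), (2, 4.0)],
}
CRITICAL_LOAD = 4.2  # hypothetical threshold

deep = deep_prediction(history, "m1", t_future=3)
wide = wide_prediction(history, ["m1", "m2"], t_future=3)
alert = wide > CRITICAL_LOAD
print(deep, wide, alert)
```

The deep variant reads many versions of a single element, while the wide variant reads the histories of many elements. Both therefore benefit from being able to resolve individual element versions without loading full snapshots.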

Table 4.1: Reasoning time to predict the electric consumption (in milliseconds)

Type      SDP         SWP         LDP           LWP
Full      1,147.75    1,131.13    192,271.19    188,985.69
Lazy      2.06        0.85        189.03        160.03
Factor    557         1,330       1,017         1,180

The results are presented in Table 4.1. The gain factor of using the temporal data model, compared to full sampling, is defined as Factor = (Full Sampling time / Native Versioning time), where the native versioning time corresponds to the "Lazy" row of the table. As can be seen in Table 4.1, the gain factor lies between 557 and 1,330. For the large-scale predictions, using the proposed temporal data model instead of a full sampling approach reduces the processing time from more than three minutes to less than 200 ms. This experiment showed that the usage of a temporal data model can significantly reduce the time to analyse historical data. This can enable reasoning processes to react in near real-time.
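As a worked check of the gain factor definition, the following short sketch recomputes the factors from the reasoning times reported in Table 4.1. The variable names are chosen for illustration only.

```python
# Reasoning times in milliseconds, copied from Table 4.1.
full_ms = {"SDP": 1147.75, "SWP": 1131.13, "LDP": 192271.19, "LWP": 188985.69}
lazy_ms = {"SDP": 2.06, "SWP": 0.85, "LDP": 189.03, "LWP": 160.03}

# Factor = full sampling time / temporal data model ("Lazy") time.
factors = {name: full_ms[name] / lazy_ms[name] for name in full_ms}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}")
```

Truncated to whole numbers, the computed ratios match the factors listed in the table.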

4.7.3 KPI-3: Storing temporal data

In this section, we evaluate the overhead introduced by our approach for storing temporal data. As in the previous experiments, we compare the results with a full sampling strategy. Our goal is to determine how much of a model must be changed in one step for the storage of temporal models to become more costly, in terms of disk space, than a full sampling approach. Intuitively, the more changes are made, the higher the overhead will be compared to full sampling. In other words, we want to investigate beyond which percentage of modifications our solution becomes less efficient in terms of storage space than the full sampling approach. It is important to note that the navigation gains remain valid in either case.

For this evaluation, we load an existing model (containing 100 smart meters), update the consumption value of several meters, serialise the model again, and store it. By varying the percentage of smart meters updated per version (period), we can compare the storage space required by our approach with that required by the full sampling approach. To ensure a fair comparison, we use a compact JSON serialisation format in both cases. Results are depicted in Figure 4.11.
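The experimental procedure can be mimicked with a small sketch that compares the size of a full JSON snapshot with the size of a record containing only the modified elements. The model layout, the per-element timestamp field `t`, and the function names are assumptions made for illustration. The byte counts this sketch produces are not the measurements reported in this section.

```python
import json


def make_model(n_meters):
    """Hypothetical flat model: one record per smart meter."""
    return {f"meter-{i:03d}": {"consumption": 0.0} for i in range(n_meters)}


def compact_size(obj):
    """Size in bytes of a compact JSON serialisation."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def snapshot_size(model):
    """Full sampling: every version stores the entire model."""
    return compact_size(model)


def delta_size(model, changed_ids, timestamp):
    """Temporal storage (sketch): only changed elements are stored,
    each stamped with the time of its new version."""
    delta = {mid: {"t": timestamp, **model[mid]} for mid in changed_ids}
    return compact_size(delta)


model = make_model(100)
ids = sorted(model)
full = snapshot_size(model)
# With 100 meters, updating pct percent of the model means pct meters.
sizes = {pct: delta_size(model, ids[:pct], timestamp=1) for pct in (1, 50, 100)}

print("full snapshot:", full, "bytes")
for pct, size in sizes.items():
    print(f"{pct}% modified:", size, "bytes")
```

In this sketch the per-version cost grows linearly with the number of modified elements and exceeds the snapshot size only when (almost) the whole model changes, because each stored element carries its own time metadata.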

Regardless of the amount of modifications, the full sampling approach requires 39.1 KB to store one version (snapshot) of the model. This is a serious overhead for small modifications. In contrast, the temporal data model requires a variable amount of storage space, i.e., the required storage space depends on the amount of modifications. It varies from 393 bytes for 1% of changes to 39.8 KB for 100% of changes (the complete model changes). A linear increase in model changes leads to a linear increase in required storage space. This confirms that our storage strategy for model elements has no unexpected side effects.

Figure 4.11: Required storage space to save temporal data
[Plot not reproduced. x-axis: percentage of modifications per version (0 to 100); y-axis: storage size (in bytes, 0 to 4·10^4); series: Full sampling, Temporal data model; inset: zoom on 96% to 100% of modifications (same legend).]

To put the observed results into perspective, our proposed temporal data model reduces the required storage space by 99.5% for 1% of changes. On the other hand, it increases the required storage space by 1.02% for 100% of modifications. This means that as long as no more than 98.5% of a model is modified, our approach needs less storage space than a full sampling approach. Also, the overhead of 1.02% for a change of the full model has to be weighed against the features enabled by this overhead (navigation, insertion time gains, comparison time gains).

Besides the presented runtime improvements, this validation shows that the temporal data model offers nearly constant time and memory behaviour, which makes it possible to handle massive amounts of historical data and large-scale context models. This validation demonstrates that the proposed temporal data model is able to efficiently analyse data in motion.

In document Enabling Model-Driven Live Analytics For Cyber-Physical Systems: The Case of Smart Grids (Page 113-118)