Data Conditioning - Online disturbance prediction for enhanced availability in smart grids

ers and phase angles on all 14 buses. The total number of features is 56, since four features are measured on each of the 14 buses. The simulation parameters are summarized in Table 7.4.

In each simulation run, one fault is injected, but zero, one, or more sags may occur on different buses, depending on the topology and state of the grid.

Table 7.4. Simulation parameters summary.

Total number of simulations 100

Duration of an individual simulation in seconds from 20 to 60

Duration of an individual simulation in cycles from 1000 to 3000

Time for execution of each simulation « 1 min

Fault injection time random

Clearance time in cycles random between 1 and 15

Load demand variation 20%

Renewable generation variation 20%

Total number of features 56 (14*4)

Sampling period 20 ms (1 cycle)

Total number of measurements 3 000 000

7.3 Data Conditioning

Data output from DyPSyFI is represented as a sequence of time-tagged values with a sampling period of 20 ms (one cycle). This is also a typical sampling period of the PMUs that are expected to be installed in most ADNs in the future.

To use these data for training and evaluating classification algorithms, they must first be arranged in matrix form, and each data instance must be assigned a class. We process the data stream of each individual feature separately. Sag detection on each bus is performed simply by comparing the current voltage value to the sag threshold of 0.9 p.u. (per unit). Then, a subset of the data stream of a predefined length L, taken backward from the point at which the sag is detected and including the entire set of measurements during the sag, is removed from the stream and placed in a table as one data instance of the class "sag". This process is illustrated in Figure 7.3, and an example of a data instance is given in Figure 7.4.
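The detection and extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's actual implementation: the function names are hypothetical, each window is assumed to end at the last sample of the sag (so that it contains the whole sag), and sags with fewer than L samples of history, or whose window would overlap an already extracted one, are simply skipped.

```python
SAG_THRESHOLD_PU = 0.9  # sag threshold stated in the text (per unit)
L = 100                 # instance length in cycles, one sample per cycle


def find_sag_intervals(voltage, threshold=SAG_THRESHOLD_PU):
    """Return (start, end) pairs, end exclusive, for each run below threshold."""
    intervals, start = [], None
    for i, v in enumerate(voltage):
        if v < threshold and start is None:
            start = i                      # sag begins at sample i
        elif v >= threshold and start is not None:
            intervals.append((start, i))   # sag ended at sample i - 1
            start = None
    if start is not None:                  # stream ends while still in a sag
        intervals.append((start, len(voltage)))
    return intervals


def extract_sag_instances(feature, voltage, length=L):
    """Cut one window of `length` samples ending at the end of each sag.

    `voltage` is the stream used for sag detection; `feature` is the stream
    the instances are cut from (it may belong to a different bus).
    """
    instances, spans = [], []
    for _, end in find_sag_intervals(voltage):
        begin = end - length
        if begin < 0 or (spans and begin < spans[-1][1]):
            continue  # too little history, or overlap with the previous window
        instances.append(list(feature[begin:end]))
        spans.append((begin, end))
    return instances, spans
```

For a 300-sample stream with a single sag covering samples 150 to 159, `extract_sag_instances` returns one instance spanning samples 60 to 159 together with the removed span `(60, 160)`.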

Figure 7.3. An example of a tagged data output sequence from DyPSyFI.

Figure 7.4. An example of a data instance.

Once all the sag-related data are extracted from the stream, data related to the sag-free system state are extracted in a similar way. Subsets of consecutive data of length L are extracted from what is left of the original data stream, and each subset is placed in the matrix as one data instance of the class "no sag". Put simply, we have conditioned the data so that each instance represents a stream of data that either leads to a sag (instances of the "sag" class) or does not lead to a sag (instances of the "no sag" class). The constructed data matrix is used for training the prediction model.
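A possible way to collect the "no sag" instances and assemble the labeled matrix is sketched below. The helper names are hypothetical, and the sketch assumes that the gaps left after removing the sag windows are not stitched together: each gap is chopped into consecutive full-length chunks, and any remainder shorter than L is discarded.

```python
def extract_no_sag_instances(feature, removed_spans, length=100):
    """Chop the gaps between removed spans into consecutive full windows.

    `removed_spans` holds the (begin, end) index pairs, end exclusive, of the
    windows already taken out of the stream as "sag" instances.
    """
    instances, cursor = [], 0
    sentinel = (len(feature), len(feature))  # closes the gap after the last span
    for begin, end in sorted(removed_spans) + [sentinel]:
        for s in range(cursor, begin - length + 1, length):
            instances.append(list(feature[s:s + length]))
        cursor = max(cursor, end)
    return instances


def build_matrix(sag_instances, no_sag_instances):
    """Stack both groups into rows X with one class label per row in y."""
    X = list(sag_instances) + list(no_sag_instances)
    y = ["sag"] * len(sag_instances) + ["no sag"] * len(no_sag_instances)
    return X, y
```

With a 400-sample stream and one removed span `(60, 160)`, the first gap (samples 0 to 59) is too short to yield an instance, while the second gap yields two instances covering samples 160 to 259 and 260 to 359.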

We observe that none of the simulated sags lasted more than 30 cycles. Since, to improve availability, we aim to use OLTCs that can be activated with a short delay in the range of 10 ms (less than one cycle), it is reasonable to use a relatively short lead time in the range of one second (50 cycles). Following the results of predicting failures in computer systems [67], we may expect that a longer lead time will decrease prediction quality. For these reasons, we set L to 100 cycles. This is sufficient to perform proactive actions and also makes it possible to evaluate how increasing the lead time affects the quality of prediction. Moreover, setting L to 100 is a good compromise between the length of one data instance and the total number of instances in this specific case. Setting L to a higher value would reduce the total number of instances to a level that would not be sufficient for training the model.

It is important to observe that, in this interpretation, lead time is defined with respect to the sag classification time, which occurs at the end of the sag. That is, lead time expresses how far in advance the class of the sag (which depends on its duration) can be predicted. This follows from the structure of the data and the varying duration of sags: only the end of the sag can be fixed in time. However, as sags never last more than 30 cycles (in fact, 99% last less than 25 cycles), the start of the sag is also predicted with sufficient lead time. For example, if the sag class (end) is predicted with a lead time of 50 cycles and the sag cannot last more than 30 cycles, the start of the sag is predicted at least 20 cycles in advance.
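The worst-case bound in this example is a simple subtraction, which the following sketch makes explicit and converts to milliseconds. The function name is hypothetical; the 30-cycle maximum sag duration and the 20 ms cycle are the values reported in the text.

```python
CYCLE_MS = 20  # sampling period from Table 7.4: one cycle = 20 ms


def worst_case_start_lead(end_lead_cycles, max_sag_cycles=30):
    """Lower bound, in cycles, on the lead time to the start of the sag."""
    return end_lead_cycles - max_sag_cycles


lead = worst_case_start_lead(50)
print(lead, "cycles =", lead * CYCLE_MS, "ms")  # 20 cycles = 400 ms
```

A lead time of 400 ms to the start of the sag remains well above the OLTC activation delay of about 10 ms mentioned earlier.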

We inject faults on one bus at a time and perform the data conditioning procedure. The procedure is performed on the data stream of every feature, detecting sags on all the buses. For each combination of a feature and a bus on which a sag is detected, we generate a matrix that we use for training the models. For example, when faults are injected on Bus 2, the matrix has 1400 instances. An extract of the table generated from the voltage measurements on Bus 14 when sag detection is performed on Bus 6 is presented in Figure 7.5.

Figure 7.5. An example of a part of a data matrix.

To obtain good classification results, it is preferable that the classes are balanced, meaning that the number of instances of one class is similar to the number of instances of the other. The numbers of instances of the two classes for different buses when a fault is injected on Bus 2 are given in Table 7.5. In this case, Bus 6 has a good balance between the "sag" and "no sag" classes (687 vs. 713). For this reason, in the rest of the chapter we focus on predicting sags on Bus 6 when faults are injected on Bus 2. For comparison, we also provide results for other buses at different stages of the methodology for the sag-predictor design.


Table 7.5. Summary of the number of "sag" and "no sag" instances per bus.

Bus ID   # "no sag" instances   # "sag" instances
1        205                    1195
2        201                    1199
3        98                     1302
4        237                    1163
5        316                    1084
6        713                    687
7        1103                   297
8        1200                   200
9        1098                   302
10       965                    435
11       934                    466
12       1200                   185
13       1300                   100
14       570                    830
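The choice of Bus 6 can be checked directly from the counts in Table 7.5. The sketch below uses a simple imbalance measure, the absolute difference between the class counts divided by their sum; this measure is an assumption chosen for illustration and is not taken from the source.

```python
# (no_sag, sag) instance counts per bus, copied from Table 7.5
COUNTS = {
    1: (205, 1195), 2: (201, 1199), 3: (98, 1302), 4: (237, 1163),
    5: (316, 1084), 6: (713, 687), 7: (1103, 297), 8: (1200, 200),
    9: (1098, 302), 10: (965, 435), 11: (934, 466), 12: (1200, 185),
    13: (1300, 100), 14: (570, 830),
}


def imbalance(no_sag, sag):
    """Return 0.0 for perfectly balanced classes, 1.0 if one class is empty."""
    return abs(no_sag - sag) / (no_sag + sag)


# Bus with the smallest imbalance between the two classes
best_bus = min(COUNTS, key=lambda bus: imbalance(*COUNTS[bus]))
print(best_bus)  # 6
```

Under this measure Bus 6 is the most balanced by a wide margin, with Bus 14 (570 vs. 830) a distant second.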

To describe a predictor, besides the parameters introduced in Subsection 4.1.1, namely precision, recall, and lead time, we introduce two additional parameters relevant for online prediction with time series: sampling period and prediction window. The sampling period is the time between two consecutive measurements. The prediction window is the time frame whose data are considered when making a prediction. For clarification, Figure 7.6 depicts the case in which the lead time is 30 cycles and the prediction window size is 50 cycles. The lightning symbol indicates the time when a sag is detected (and classified), whereas the bell symbol marks the time of the sag prediction. For convenience, we also indicate the time span whose data are considered when creating one data instance with a length of 100 cycles.
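One way to read the relationship between instance length, lead time, and prediction window is as a slice of the stored instance. The sketch below assumes that the last sample of an instance coincides with the sag classification time and that the prediction window ends exactly one lead time before it; the function name and the boundary convention are illustrative assumptions, not details taken from the source.

```python
def prediction_window(instance, lead_time, window):
    """Return the samples a predictor would see for a given lead time.

    The prediction is made `lead_time` samples before the end of the
    instance, using the `window` samples that precede that moment.
    """
    stop = len(instance) - lead_time   # prediction time, counted from the start
    start = stop - window
    if lead_time < 0 or window < 0 or start < 0:
        raise ValueError("lead_time + window must fit within the instance")
    return instance[start:stop]
```

For an instance of 100 cycles, a lead time of 30 cycles, and a prediction window of 50 cycles, the slice covers samples 20 to 69; the first 20 samples are unused and the last 30 lie in the future at prediction time.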
