Methodology - Solar Power Forecasting


3.1.1 Methodology

DWkNN extends the traditional kNN algorithm by assigning weights to the different data sources based on the importance of these sources for the prediction. These weights are learned from previous data. In contrast to traditional weighted nearest neighbor algorithms, DWkNN assigns weights to the features from the different sources, not to the nearest neighbors when combining their predictions.

CHAPTER 3. INSTANCE-BASED METHODS

[Figure 3.1 flowchart. Step 1, grid parameter optimization on the historical data $[PV_1, \ldots, PV_d]$, $[W_1, \ldots, W_d]$, $[WF_1, \ldots, WF_d]$: initialise the weights $w_1$, $w_2$, $w_3$ and $k$; transform the data using the weights, $[w_1 PV, w_2 W, w_3 WF]$; develop kNN and evaluate its performance on the validation set; update the weights and $k$; select the best weights and $k$. Step 2, with inputs $PV_d$, $W_d$ and $WF_{d+1}$ (PV and weather data for day $d$, weather forecast for day $d+1$): find the nearest neighbors, then average the PV power output of the days after the nearest neighbors to obtain the prediction $PV_{d+1}$.]

Figure 3.1: The DWkNN algorithm

DWkNN addresses a limitation of the traditional kNN, which does not consider the importance of the different data sources. In this study, we focus on the three data sources that are most commonly used for PV power output prediction: historical PV data (PV), historical weather data (W) and weather forecast (WF).

The algorithm is summarized in Fig. 3.1 and consists of two main steps:

1. Finding the best weights for the features of the data sources and the best number of neighbors by applying a grid search to the historical data, and

2. Predicting the PV power output for the new day using the parameters selected in the previous step.

3.1.1.1 Finding the Weights

The parameters optimized in the first step are the weights $w_1$, $w_2$ and $w_3$, which reflect the importance of each of the three data sources, and the number of neighbors $k$. To find the best parameters, a grid search is applied. The value of $k$ is varied from 1 to 10 with an increment of 1, and the values of the weights are varied from 1 to 100%, with an increment of 1%. For each set of weights, the training and validation sets are transformed by multiplying the features from each data source by the corresponding weight:

\[ \text{new dataset} = [\, w_1 PV,\; w_2 W,\; w_3 WF \,] \]
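To illustrate why this scaling changes which days are selected as neighbors, consider the Euclidean distance. This is an assumption made only for the example, since the text asks only for a suitable distance measure. For two days $a$ and $b$ in the transformed dataset, the squared distance separates into one term per data source:

\[ \lVert x_a - x_b \rVert^2 = w_1^2 \lVert PV_a - PV_b \rVert^2 + w_2^2 \lVert W_a - W_b \rVert^2 + w_3^2 \lVert WF_a - WF_b \rVert^2 \]

Under this assumption, each source contributes to the distance in proportion to the square of its weight, so a source with a small weight has little influence on which days are ranked as nearest.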

Then kNN, trained on the training data, is applied to predict all instances from the validation set and the accuracy is calculated. The parameters resulting in the highest accuracy on the validation set are selected.
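The search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the dictionary keys (`pv`, `w`, `wf_next`, `pv_next`), the function names, the Euclidean distance and the use of mean absolute error as the accuracy measure (lower is better) are all hypothetical choices made for the example.

```python
import itertools
import math


def weighted_features(day, weights):
    """Concatenate [w1*PV, w2*W, w3*WF] for one day (keys are hypothetical)."""
    w1, w2, w3 = weights
    return ([w1 * v for v in day["pv"]]
            + [w2 * v for v in day["w"]]
            + [w3 * v for v in day["wf_next"]])


def knn_predict(train, query, weights, k):
    """Average the next-day PV profiles of the k training days nearest to query."""
    q = weighted_features(query, weights)
    # Assumption: Euclidean distance on the weighted feature vectors.
    ranked = sorted(train, key=lambda day: math.dist(weighted_features(day, weights), q))
    neighbors = ranked[:k]
    dims = len(neighbors[0]["pv_next"])
    return [sum(day["pv_next"][j] for day in neighbors) / len(neighbors)
            for j in range(dims)]


def mean_absolute_error(pred, actual):
    """Assumed error measure; the text only speaks of 'accuracy'."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def grid_search(train, valid, weight_grid, k_grid):
    """Return (error, weights, k) with the lowest average validation error."""
    best = None
    for weights in itertools.product(weight_grid, repeat=3):
        for k in k_grid:
            err = sum(mean_absolute_error(knn_predict(train, day, weights, k),
                                          day["pv_next"])
                      for day in valid) / len(valid)
            if best is None or err < best[0]:
                best = (err, weights, k)
    return best
```

With the grid given in the text (100 values for each of the three weights and 10 values of $k$), an exhaustive search evaluates 100 × 100 × 100 × 10 = 10,000,000 parameter combinations, so a coarser grid may be preferable when experimenting with a sketch like this one.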

3.1.1.2 Predicting the New Day

In the second step, the PV power output for the next day $d+1$ is predicted. First, the $k$ nearest neighbors of the previous day (day $d$) are selected using the data weights and the number of neighbors $k$ chosen in the previous step. Then the PV power data of the days following the nearest neighbors is averaged to generate the prediction for the next day. Specifically, if $S = \{s_1, s_2, \ldots, s_k\}$ is the set of the $k$ selected days, then the prediction for $PV_{d+1}$ is given by:

\[ PV_{d+1} = \frac{1}{k} \sum_{i \in S} PV_{i+1} \]

where $PV_i$ denotes the 20-dimensional vector of half-hourly power outputs for day $i$.
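The averaging step can be sketched in a few lines. This is an illustrative sketch only: the function name, the argument layout (one weighted feature vector and one PV profile per historical day, in chronological order) and the Euclidean distance are assumptions made for the example.

```python
import math


def predict_next_day(history_features, history_pv, query_features, k):
    """Average the PV profiles of the days that follow the k nearest neighbors.

    history_features[i] is the (already weighted) feature vector of day i,
    history_pv[i] is the PV profile of day i, and query_features describes day d.
    """
    # Only days that have a following day in the history can act as neighbors.
    candidates = range(len(history_features) - 1)
    # Assumption: Euclidean distance as the "suitable distance measure".
    nearest = sorted(candidates,
                     key=lambda i: math.dist(history_features[i], query_features))[:k]
    dims = len(history_pv[0])
    return [sum(history_pv[i + 1][j] for i in nearest) / len(nearest)
            for j in range(dims)]
```

Note that the last historical day is excluded from the candidates because its following day is not available, which is a design choice of this sketch rather than something stated in the text.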

To find the nearest neighbors for day $d$, four different representations of day $d$ are used, depending on the data sources used, as shown in Figs. 3.2 to 3.5:

1. As shown in Fig. 3.2, when only the PV and W data sources are used (PV+W), day $d$ is represented as a feature vector consisting of its PV power and weather data, $[PV_d, W_d]$, and is compared with the previous days $i$ represented as $[PV_i, W_i]$.

2. As shown in Fig. 3.3, when only the PV and WF data sources are used (PV+WF), day $d$ is represented as a feature vector including the PV power data for day $d$ and also the weather forecast for the next day $d+1$, $[PV_d, WF_{d+1}]$, and is compared with the previous days $i$ represented as $[PV_i, WF_{i+1}]$.

3. As shown in Fig. 3.4, when only the W and WF data sources are used (W+WF), day $d$ is represented as a feature vector including the weather data for day $d$ and also the weather forecast for the next day $d+1$, $[W_d, WF_{d+1}]$, and is compared with the previous days $i$ represented as $[W_i, WF_{i+1}]$.

4. As shown in Fig. 3.5, when all three data sources are used (PV+W+WF), day $d$ is represented as a feature vector including the PV power and weather data for day $d$ and also the weather forecast for the next day $d+1$, $[PV_d, W_d, WF_{d+1}]$, and is compared with the previous days $i$ represented as $[PV_i, W_i, WF_{i+1}]$.

The nearest neighbors are found by using a suitable distance measure.

[Figure 3.2 diagram (PV+W): PV and W timelines for days 1, ..., d; the nearest neighbors of $[PV_d, W_d]$ are selected and the PV data of the days after them gives the prediction $PV_{d+1}$.]

Figure 3.2: Representing days using historical PV and weather data

[Figure 3.3 diagram (PV+WF): PV timeline for days 1, ..., d and forecast $WF_{d+1}$; the PV data of the days after the nearest neighbors of $[PV_d, WF_{d+1}]$ is averaged into the prediction $PV_{d+1}$.]

Figure 3.3: Representing days using historical PV data and weather forecast

[Figure 3.4 diagram (W+WF): W timeline for days 1, ..., d and forecast $WF_{d+1}$; the PV data of the days after the nearest neighbors of $[W_d, WF_{d+1}]$ is averaged into the prediction $PV_{d+1}$.]

Figure 3.4: Representing days using historical weather data and weather forecast

[Figure 3.5 diagram (PV+W+WF): PV and W timelines for days 1, ..., d and forecast $WF_{d+1}$; the nearest neighbors of $[PV_d, W_d, WF_{d+1}]$ are selected and the PV data of the days after them gives the prediction $PV_{d+1}$.]

Figure 3.5: Representing days using historical PV and weather data, and weather forecast
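The four representations differ only in which sources are concatenated into the feature vector, which can be sketched as below. The dictionary keys (`pv`, `w`, `wf_next`) and the names `SOURCE_SETS` and `represent_day` are hypothetical and introduced only for this illustration; `wf_next` stands for the weather forecast for the day after the one being represented.

```python
# Hypothetical mapping from each source combination to the fields it uses.
SOURCE_SETS = {
    "PV+W": ("pv", "w"),
    "PV+WF": ("pv", "wf_next"),
    "W+WF": ("w", "wf_next"),
    "PV+W+WF": ("pv", "w", "wf_next"),
}


def represent_day(day, source_set):
    """Concatenate the features of the selected sources into one vector."""
    features = []
    for key in SOURCE_SETS[source_set]:
        features.extend(day[key])
    return features


# Toy day with a 2-value PV profile, one weather value and one forecast value.
day_d = {"pv": [1.0, 2.0], "w": [20.0], "wf_next": [22.0]}
print(represent_day(day_d, "PV+W+WF"))
```

The same function would be applied to day $d$ and to every previous day $i$, so that the compared vectors always have the same layout.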
