3.1.1 Methodology
DWkNN extends the traditional kNN algorithm by assigning weights to the different data sources according to their importance for the prediction. These weights are learned from previous data. Traditional weighted nearest neighbor algorithms weight the nearest neighbors when combining their predictions. DWkNN instead assigns weights to the features from the different data sources.
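The distinction can be illustrated with a minimal Python sketch. It assumes a Euclidean distance (the text only states that a suitable distance measure is used) and stores each day as a list of per-source feature blocks such as [PV, W, WF]. The function name weighted_distance is hypothetical. Each source weight scales that source's features before the distance is computed, so it changes which days become neighbors. Distance-weighted kNN, by contrast, leaves the neighbor search unchanged and only weights the neighbors' contributions afterwards.

```python
import math


def weighted_distance(day_a, day_b, weights):
    """Euclidean distance between two days whose features are grouped by source.

    day_a, day_b: lists of per-source feature lists, e.g. [pv, w, wf].
    weights: one weight per source (illustrative values, not from the thesis).
    Every feature of a source is multiplied by that source's weight before the
    difference is taken, so a source with a larger weight contributes more.
    """
    total = 0.0
    for block_a, block_b, weight in zip(day_a, day_b, weights):
        for a, b in zip(block_a, block_b):
            total += (weight * a - weight * b) ** 2
    return math.sqrt(total)
```

Setting a source's weight to 0 removes it from the neighbor search entirely, while equal weights reduce the measure to the ordinary Euclidean distance over all features.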
CHAPTER 3. INSTANCE-BASED METHODS 29
[Figure 3.1, flowchart. Step 1 (grid parameter optimization): starting from the historical PV, W and WF data, initialise the weights w1, w2, w3 and k; transform the data using the weights [w1 PV, w2 W, w3 WF]; develop kNN and evaluate its performance on the validation set; update the weights and k; select the best weights and k. Step 2: using the PV and weather data for day d and the weather forecast for day d+1, [PV_d, W_d, WF_{d+1}], select the nearest neighbors and average the PV power output of the days after them to obtain the prediction for day d+1, the day to be predicted.]
Figure 3.1: The DWkNN algorithm
DWkNN addresses a limitation of the traditional kNN algorithm, which does not consider the importance of the different data sources. In this study, we focus on the three data sources most commonly used for PV power output prediction: historical PV data (PV), historical weather data (W) and weather forecast (WF).
The algorithm is summarized in Fig. 3.1 and consists of two main steps:
1. Finding the best weights for the features of the data sources and the best number of neighbors, using the historical data and a grid search method.
2. Predicting the PV power output for the new day using the parameters selected in the previous step.
3.1.1.1 Finding the Weights
The parameters optimized in the first step are the weights w1, w2 and w3, which reflect the importance of each of the three data sources, and the number of neighbors
k. To find the best parameters, a grid search is applied first. The value of k is varied from 1 to 10 in increments of 1, and each weight is varied from 1% to 100% in increments of 1%. For each set of weights, the training and validation sets are transformed by multiplying the features of each data source by the corresponding weight:
$$\text{new dataset} = [w_1 PV,\; w_2 W,\; w_3 WF]$$
Then kNN, trained on the training data, is applied to predict all instances in the validation set, and the accuracy is calculated. The parameters that yield the highest accuracy on the validation set are selected.
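The grid search can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the names transform, knn_predict and grid_search are hypothetical, Euclidean distance is assumed, and the mean absolute error (MAE) stands in for the unspecified accuracy measure, so "highest accuracy" becomes "lowest MAE". Each training or validation row is assumed to be a (pv, w, wf, target) tuple of float lists.

```python
import itertools
import math


def transform(pv, w, wf, weights):
    """Weighted feature vector [w1*PV, w2*W, w3*WF] for one day."""
    w1, w2, w3 = weights
    return [w1 * x for x in pv] + [w2 * x for x in w] + [w3 * x for x in wf]


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training rows closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return [sum(train_y[i][t] for i in nearest) / k for t in range(len(train_y[0]))]


def grid_search(train, valid, k_values, weight_values):
    """Return (validation MAE, (w1, w2, w3), k) for the setting with the lowest MAE.

    train, valid: lists of (pv, w, wf, target) tuples of float lists.
    MAE is an assumed stand-in for the accuracy measure used in the text.
    """
    targets = [y for _, _, _, y in train]
    best = None
    for k in k_values:
        for weights in itertools.product(weight_values, repeat=3):
            # Transform the training and validation sets with the same weights.
            train_x = [transform(pv, w, wf, weights) for pv, w, wf, _ in train]
            abs_err, count = 0.0, 0
            for pv, w, wf, y in valid:
                pred = knn_predict(train_x, targets, transform(pv, w, wf, weights), k)
                abs_err += sum(abs(p - t) for p, t in zip(pred, y))
                count += len(y)
            mae = abs_err / count
            # Keep the first setting reaching the lowest error seen so far.
            if best is None or mae < best[0]:
                best = (mae, weights, k)
    return best
```

With the grid described in the text, k_values = range(1, 11) and weight_values = [i / 100 for i in range(1, 101)], the loops evaluate 10 × 100³ = 10,000,000 parameter settings, each requiring a full kNN pass over the validation set; a coarser grid is convenient for experimenting with the sketch.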
3.1.1.2 Predicting the New Day
In the second step, the PV power output for the next day d+1 is predicted. First, the k nearest neighbors of the previous day (day d) are selected using the data weights and the number of neighbors k chosen in the previous step. Then the PV power data of the days following the nearest neighbors are averaged to generate the prediction for the next day. Specifically, if S = {s_1, s_2, ..., s_k} is the set of the k selected days, the prediction for PV_{d+1} is given by:
$$PV_{d+1} = \frac{1}{k} \sum_{i=1}^{k} PV_{s_i+1}$$
where each PV_i is the 20-dimensional vector of half-hourly power outputs for day i.
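The averaging rule for PV_{d+1} can be sketched in Python as follows, assuming the day representations have already been weighted and a Euclidean distance is used. The function name predict_next_day and the list-based data layout (days indexed from 0, the last entry being day d) are hypothetical. Only days whose following day is already known are eligible as neighbors, because it is the PV output of that following day that gets averaged.

```python
import math


def predict_next_day(features, pv, k):
    """Predict PV_{d+1} as the mean PV vector of the days after the k nearest neighbors of day d.

    features: weighted feature vectors, one per day 0..d (the last one is day d).
    pv: PV output vectors (e.g. 20 half-hourly values), one per day 0..d.
    """
    d = len(features) - 1
    query = features[d]
    # Candidate neighbors are days 0..d-1, so that day i + 1 always exists.
    nearest = sorted(range(d), key=lambda i: math.dist(features[i], query))[:k]
    n_slots = len(pv[0])
    return [sum(pv[i + 1][t] for i in nearest) / k for t in range(n_slots)]
```

For example, with one-dimensional features [0.0], [5.0], [1.0] for the earlier days and [0.2] for day d, the two nearest neighbors are the first and third days, so the prediction is the mean of the PV vectors of the second and fourth days.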
To find the nearest neighbors of day d, four different representations of day d are used, depending on which data sources are included, as shown in Figs. 3.2 to 3.5:
1. As shown in Fig. 3.2, when only the PV and W data sources are used (PV + W), day d is represented as a feature vector consisting of its PV power and weather data, [PV_d, W_d], and is compared with the previous days i, represented as [PV_i, W_i].
2. As shown in Fig. 3.3, when only the PV and WF data sources are used (PV + WF), day d is represented as a feature vector including the PV power data for day d and also the weather forecast for the next day d+1, [PV_d, WF_{d+1}], and is compared with the previous days i, represented as [PV_i, WF_{i+1}].
[Figure: W and PV series over days 1, ..., d; the nearest neighbors of [PV_d, W_d] are selected and the PV output of the days after them is averaged to predict day d+1 (PV + W).]
Figure 3.2: Representing days using historical PV and weather data
[Figure: W and PV series over days 1, ..., d; the nearest neighbors of [PV_d, WF_{d+1}] are selected and the PV output of the days after them is averaged to predict day d+1 (PV + WF).]
Figure 3.3: Representing days using historical PV data and weather forecast
[Figure: W and PV series over days 1, ..., d; the nearest neighbors of [W_d, WF_{d+1}] are selected and the PV output of the days after them is averaged to predict day d+1 (W + WF).]
Figure 3.4: Representing days using historical weather data and weather forecast
[Figure: W and PV series over days 1, ..., d; the nearest neighbors of [PV_d, W_d, WF_{d+1}] are selected and the PV output of the days after them is averaged to predict day d+1 (PV + W + WF).]
Figure 3.5: Representing days using historical PV and weather data, and weather forecast
3. As shown in Fig. 3.4, when only the W and WF data sources are used (W + WF), day d is represented as a feature vector including the weather data for day d and also the weather forecast for the next day d+1, [W_d, WF_{d+1}], and is compared with the previous days i, represented as [W_i, WF_{i+1}].
4. As shown in Fig. 3.5, when all three data sources are used (PV + W + WF), day d is represented as a feature vector including the PV power and weather data for day d and also the weather forecast for the next day d+1, [PV_d, W_d, WF_{d+1}], and is compared with the previous days i, represented as [PV_i, W_i, WF_{i+1}].
The nearest neighbors are found using a suitable distance measure.
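The four representations differ only in which per-source blocks are concatenated. The minimal Python sketch below builds them; the names CONFIGURATIONS and represent are hypothetical, and it assumes that wf[i] holds the weather forecast for day i, so the representation of day i uses wf[i + 1]. The optional per-source weights show where the weights selected by the grid search would be applied.

```python
# Hypothetical labels for the four source combinations described above.
CONFIGURATIONS = {
    "PV+W": ("PV", "W"),
    "PV+WF": ("PV", "WF"),
    "W+WF": ("W", "WF"),
    "PV+W+WF": ("PV", "W", "WF"),
}


def represent(day, pv, w, wf, sources, weights=None):
    """Feature vector of `day` built from the chosen data sources.

    'PV' adds PV_day, 'W' adds W_day and 'WF' adds WF_{day+1}, the forecast
    for the following day. pv, w and wf are per-day lists of feature lists,
    with wf[i] assumed to hold the forecast for day i. `weights` optionally
    maps a source name to its weight; missing sources default to 1.0.
    """
    weights = weights or {}
    vec = []
    for source in sources:
        if source == "PV":
            block = pv[day]
        elif source == "W":
            block = w[day]
        elif source == "WF":
            block = wf[day + 1]  # forecast for the next day
        else:
            raise ValueError(f"unknown source: {source}")
        scale = weights.get(source, 1.0)
        vec += [scale * x for x in block]
    return vec
```

Because the forecast block is only read when WF is among the chosen sources, the PV + W representation of the most recent day works even if no forecast beyond that day is available.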