Wind Power Forecasting
Ricardo Bessa
Senior Researcher, Center for Power and Energy Systems, INESC TEC, Portugal
Joint work with Laura Cavalcante and Marisa Reis
EWEA Technology Workshop: Wind Power Forecasting 2015, 1-2 October 2015, Leuven, Belgium
Introduction
Vector Autoregression (VAR) models can be applied to combine spatially distributed wind power time series
Two important requirements for a practical implementation:
Reduce the number of non-null coefficients
Low computational time in large datasets
This work provides the following original contributions:
Explores a set of sparse structures for the VAR model
Applies the alternating direction method of multipliers (ADMM) to estimate the VAR coefficients
Explores parallel computing
Autoregressive Model
Univariate model: uses past observations from the same time series
AR(p) - Autoregressive Model of order p
→ forecasts the variable $y_t$ given the past $p$ values:
$$y_t = c + b_1 y_{t-1} + b_2 y_{t-2} + \cdots + b_p y_{t-p} + \varepsilon_t$$
VAR(p) - Vector Autoregressive Model of order p
→ forecasts the vector of $k$ variables $Y_t = (Y_{1,t}, Y_{2,t}, \ldots, Y_{k,t})$:
$$Y_t = c + B_1 Y_{t-1} + B_2 Y_{t-2} + \cdots + B_p Y_{t-p} + u_t$$
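A minimal sketch of how the VAR(p) equation above could be arranged in matrix form and fitted by ordinary least squares (no sparsity yet). The function names, the (T, k) data layout, and the choice of storing the constant c as the first column of B are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def build_var_matrices(series, p):
    """Arrange a (T, k) array into Y (k, T-p) and Z (k*p + 1, T-p).

    Each column of Z stacks a constant 1 followed by the p most recent
    lagged observation vectors, so that Y is approximated by B @ Z with
    B = [c, B_1, ..., B_p].
    """
    T, k = series.shape
    Y = series[p:].T
    # Lag l contributes the rows series[t - l] for t = p, ..., T - 1.
    lags = [series[p - l:T - l].T for l in range(1, p + 1)]
    Z = np.vstack([np.ones((1, T - p))] + lags)
    return Y, Z

def fit_var_ols(series, p):
    """Least-squares estimate of B = [c, B_1, ..., B_p]."""
    Y, Z = build_var_matrices(series, p)
    # lstsq solves Z.T @ B.T = Y.T in the least-squares sense.
    B_transposed, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    return B_transposed.T

def forecast_one_step(B, history, p):
    """One-step-ahead forecast from the last p rows of history (T, k)."""
    z = np.concatenate([[1.0]] + [history[-l] for l in range(1, p + 1)])
    return B @ z
```

With k series and p lags, B has k * (k * p + 1) coefficients, which is why the slides stress reducing the number of non-null coefficients.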
Least Absolute Shrinkage and Selection Operator (LASSO)-VAR Model
The Lasso-VAR estimation minimizes the residual sum of squares subject to an L1 constraint:
$$\min_{B} \; \frac{1}{2}\|Y - BZ\|_F^2 \quad \text{s.t.} \quad \|B\|_1 \le t$$
Equivalently, it can be defined in the Lagrangian form as
$$\frac{1}{2}\|Y - BZ\|_F^2 + \lambda \|B\|_1,$$
where $\|X\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, $\|X\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |x_{ij}|^2$ is the squared Frobenius norm, and the regularization parameter $\lambda \ge 0$ is inversely related to $t$
Fits the regression model and simultaneously performs variable selection by shrinking regression coefficients to zero
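A small illustration of the Lagrangian objective and of why the L1 penalty sets coefficients exactly to zero. It relies on one special case chosen here for clarity (an assumption, not something in the slides): when Z is the identity matrix, the objective separates entry by entry and its minimizer is the soft-thresholding of Y. The function names are hypothetical.

```python
import numpy as np

def lasso_var_objective(Y, Z, B, lam):
    """Lagrangian form: 0.5 * ||Y - B Z||_F^2 + lam * ||B||_1 (entrywise L1)."""
    residual = Y - B @ Z
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(B))

def soft_threshold(A, kappa):
    """Entrywise minimizer of 0.5 * (b - a)^2 + kappa * |b|."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

# Special case Z = I: entries of Y with magnitude below lam become exactly 0,
# the remaining entries are shrunk towards 0 by lam.
Y = np.array([[3.0, -0.5], [0.2, -2.0]])
B_star = soft_threshold(Y, 1.0)
```

Larger values of lam (equivalently, smaller t) zero out more entries, which is the variable-selection effect described above.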
Lasso-VAR Model: Extensions and Generalizations
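Lasso extension       Penalty
Row Lasso             $\lambda \|B_i\|_1$
Matricial Lasso       $\lambda \|B\|_1$
Lag Lasso             $\lambda \sum_{l=1}^{p} \|B_l\|_1$
Group Lasso           $\lambda \sum_{i \ne j} \|((B_1)_{ij}, \ldots, (B_p)_{ij})\|_2$
Sparse Group Lasso    $(1 - \alpha)\lambda \sum_{l=1}^{p} \|B_l\|_F + \alpha\lambda \|B\|_1$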
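To make the penalties concrete, the sketch below evaluates three of them for a given set of lag matrices. It assumes the coefficients are stored as an array of shape (p, k, k), with B[l - 1] holding the lag matrix B_l and no constant term; that layout and the function names are illustrative choices only.

```python
import numpy as np

def matricial_lasso_penalty(B, lam):
    """lam * ||B||_1: sum of absolute values over all lags and entries."""
    return lam * np.sum(np.abs(B))

def group_lasso_penalty(B, lam):
    """lam * sum over i != j of the L2 norm of ((B_1)_ij, ..., (B_p)_ij)."""
    group_norms = np.sqrt(np.sum(B ** 2, axis=0))    # (k, k): one norm per pair (i, j)
    off_diagonal = ~np.eye(B.shape[1], dtype=bool)   # exclude the i == j groups
    return lam * np.sum(group_norms[off_diagonal])

def sparse_group_lasso_penalty(B, lam, alpha):
    """(1 - alpha) * lam * sum_l ||B_l||_F + alpha * lam * ||B||_1."""
    frobenius_sum = sum(np.linalg.norm(B_l, "fro") for B_l in B)
    return (1.0 - alpha) * lam * frobenius_sum + alpha * lam * np.sum(np.abs(B))
```

In the group penalty each pair (i, j) forms one group across all lags, so a group is either kept or dropped as a whole: series j is excluded from the equation of series i at every lag simultaneously.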
Parameter Estimation and the ADMM Algorithm
The goal is to estimate the sparse matrix of coefficients with a simple and powerful algorithm
The ADMM framework has several advantages:
Combines the problem separability offered by the dual ascent method with the convergence properties of the method of multipliers
Convex problems with nondifferentiable constraints (such as LASSO) can be easily addressed
Parallel Optimization: break up large datasets into blocks and carry out the optimization over each block
ADMM Algorithm
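Lasso-VAR:
$$\text{minimize} \quad \frac{1}{2}\|Y - BZ\|_F^2 + \lambda \|B\|_1$$
ADMM problem form:
$$\text{minimize} \quad \underbrace{\frac{1}{2}\|Y - BZ\|_F^2}_{f(B)} + \underbrace{\lambda \|H\|_1}_{g(H)} \quad \text{s.t.} \quad B - H = 0$$
Augmented Lagrangian:
$$L_\rho(B, H, W) = \frac{1}{2}\|Y - BZ\|_F^2 + \lambda \|H\|_1 + \operatorname{tr}\left(W^T (B - H)\right) + \frac{\rho}{2}\|B - H\|_F^2$$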
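The ADMM iterations implied by this augmented Lagrangian can be sketched as follows, in the scaled form with U = W / rho. This is an illustrative implementation under stated assumptions, not the authors' code: the stopping rule based on primal and dual residuals, the default values, and all function names are choices made here. The B-update is a ridge-type linear solve, the H-update is an entrywise soft-thresholding, and the U-update accumulates the constraint violation B - H.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Proximal operator of kappa * ||.||_1, applied entrywise."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def admm_lasso_var(Y, Z, lam, rho=1.0, tol=1e-3, max_iter=1000):
    """Minimize 0.5 * ||Y - B Z||_F^2 + lam * ||H||_1 subject to B - H = 0."""
    k, m = Y.shape[0], Z.shape[0]
    H = np.zeros((k, m))
    U = np.zeros((k, m))                  # scaled dual variable, U = W / rho
    YZt = Y @ Z.T
    G = Z @ Z.T + rho * np.eye(m)         # symmetric positive definite
    for _ in range(max_iter):
        # B-update: solve B G = Y Z^T + rho (H - U), i.e. G B^T = (...)^T.
        B = np.linalg.solve(G, (YZt + rho * (H - U)).T).T
        # H-update: soft-thresholding with threshold lam / rho.
        H_old = H
        H = soft_threshold(B + U, lam / rho)
        # Dual update.
        U = U + B - H
        primal_residual = np.linalg.norm(B - H)
        dual_residual = rho * np.linalg.norm(H - H_old)
        if primal_residual < tol and dual_residual < tol:
            break
    return H                              # H carries the exact zeros
```

Returning H rather than B is a design choice: both agree at convergence, but only H is exactly sparse after a finite number of iterations.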
Parallel Computing
The goal is to split the data and use ADMM to solve the problem in a distributed manner (with N objective terms)
Blocks Z1, Z2, ..., ZN placed side by side → Split data across features and use the ADMM sharing problem
Blocks Z1, Z2, ..., ZN stacked on top of each other → Split data across examples and use ADMM consensus optimization
ADMM and Parallel Computing
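Splitting Across Examples
$$\min \sum_{i=1}^{N} \underbrace{\frac{1}{2}\|Y_i - B_i Z_i\|_F^2}_{f_i(B_i)} + \underbrace{\lambda \|B_i\|_1}_{g(B_i)}$$
Consensus form:
$$\min \sum_{i=1}^{N} f_i(B_i) + g(H) \quad \text{s.t.} \quad B_i - H = 0$$
ADMM updates:
$$B_i^{k+1} := \arg\min_{B_i} \; f_i(B_i) + \frac{\rho}{2}\|B_i - H^k + U_i^k\|_F^2$$
$$H^{k+1} := \arg\min_{H} \; g(H) + \frac{N\rho}{2}\|H - \bar{B}^{k+1} - \bar{U}^k\|_F^2$$
$$U_i^{k+1} := U_i^k + B_i^{k+1} - H^{k+1}$$
where $\bar{B}$ and $\bar{U}$ denote the averages of $B_i$ and $U_i$ over the $N$ blocks
Splitting Across Features
$$\min \underbrace{\frac{1}{2}\Big\|Y - \sum_{i=1}^{N} B_i Z_i\Big\|_F^2}_{g\left(\sum_{i=1}^{N} B_i Z_i\right)} + \sum_{i=1}^{N} \underbrace{\lambda \|B_i\|_1}_{f_i(B_i)}$$
Sharing form:
$$\min \sum_{i=1}^{N} f_i(B_i) + g\Big(\sum_{i=1}^{N} H_i\Big) \quad \text{s.t.} \quad B_i Z_i - H_i = 0$$
ADMM updates:
$$B_i^{k+1} := \arg\min_{B_i} \; f_i(B_i) + \frac{\rho}{2}\|B_i Z_i - H_i^k + U_i^k\|_F^2$$
$$H^{k+1} := \arg\min_{H} \; g\Big(\sum_{i=1}^{N} H_i\Big) + \frac{\rho}{2}\sum_{i=1}^{N}\|H_i - U_i^k - B_i^{k+1} Z_i\|_F^2$$
$$U_i^{k+1} := U_i^k + B_i^{k+1} Z_i - H_i^{k+1}$$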
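The consensus updates for splitting across examples can be sketched in a few lines. This is a serial illustration under stated assumptions: the blocks Y_i and Z_i are taken to be column (time) slices of Y and Z, the loop over blocks stands in for what would run on separate workers, and the function names, stopping rule, and defaults are hypothetical. With f_i the least-squares term and g the L1 penalty, the B_i-update is a linear solve per block and the H-update is a soft-thresholding of the averaged B_i + U_i with threshold lam / (N rho).

```python
import numpy as np

def soft_threshold(A, kappa):
    """Proximal operator of kappa * ||.||_1, applied entrywise."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def consensus_admm_lasso_var(Y_blocks, Z_blocks, lam, rho=1.0, tol=1e-3, max_iter=2000):
    """Consensus ADMM for sum_i 0.5 * ||Y_i - B_i Z_i||_F^2 + lam * ||H||_1, B_i = H."""
    N = len(Y_blocks)
    k, m = Y_blocks[0].shape[0], Z_blocks[0].shape[0]
    H = np.zeros((k, m))
    B = [np.zeros((k, m)) for _ in range(N)]
    U = [np.zeros((k, m)) for _ in range(N)]
    # Per-block quantities; each worker would hold only its own pair.
    YZt = [Y_i @ Z_i.T for Y_i, Z_i in zip(Y_blocks, Z_blocks)]
    G = [Z_i @ Z_i.T + rho * np.eye(m) for Z_i in Z_blocks]
    for _ in range(max_iter):
        # B_i-updates are independent across blocks (the parallel step).
        for i in range(N):
            rhs = YZt[i] + rho * (H - U[i])
            B[i] = np.linalg.solve(G[i], rhs.T).T
        # H-update uses only the block averages (the gather step).
        B_bar = sum(B) / N
        U_bar = sum(U) / N
        H_old = H
        H = soft_threshold(B_bar + U_bar, lam / (N * rho))
        # Dual updates, again independent across blocks.
        for i in range(N):
            U[i] = U[i] + B[i] - H
        primal_residual = np.sqrt(sum(np.linalg.norm(B_i - H) ** 2 for B_i in B))
        dual_residual = rho * np.sqrt(N) * np.linalg.norm(H - H_old)
        if primal_residual < tol and dual_residual < tol:
            break
    return H
```

A possible usage, assuming Y and Z are already built: `consensus_admm_lasso_var(np.array_split(Y, 8, axis=1), np.array_split(Z, 8, axis=1), lam=1.0)`. Only the averages of B_i and U_i need to be communicated at each iteration, which is what makes the scheme attractive for large datasets.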
Case Study description
Apply the ADMM algorithm to several LASSO-VAR(2) variants in order to produce wind power forecasts from 1 to 6 hours ahead
Dataset:
68 wind farms (same control area)
Training period: 9 months
Test period: 3 months
Time resolution: 1 hour
LASSO and ADMM parameters estimated by 5-fold cross-validation
Calculate the improvement in terms of Root Mean Squared Error (RMSE) compared to an autoregressive model, AR(2)
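The slides do not spell out the improvement formula. The sketch below assumes the usual definition, the relative RMSE reduction with respect to the reference model expressed in percent; the function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observations and forecasts."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def improvement_over_reference(y_true, y_model, y_reference):
    """Assumed definition: 100 * (RMSE_ref - RMSE_model) / RMSE_ref."""
    rmse_reference = rmse(y_true, y_reference)
    return 100.0 * (rmse_reference - rmse(y_true, y_model)) / rmse_reference
```

Under this definition a positive value means the model beats the AR reference, and a negative value means it is worse; the metric would be computed separately for each wind farm and each lead time.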
RMSE Improvement over AR results
[Figure: Wind farm with best improvement. Improvement over AR (%) versus time horizon (1 to 6 h); y-axis ticks from 7 to 13. Curves: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity.]
[Figure: Wind farm with intermediate improvement. Improvement over AR (%) versus time horizon (1 to 6 h); y-axis ticks from 4 to 9. Curves: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity.]
[Figure: Wind farm with worst improvement. Improvement over AR (%) versus time horizon (1 to 6 h); y-axis ticks from -8 to 2. Curves: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity.]
Number of wind farms with negative improvement (average over the time horizon): 3
Number of wind farms with negative improvement in at least one lead time: 13
Group LASSO does not show negative improvement in the first two lead times
[Figure: Global. Improvement over AR (%) versus time horizon (1 to 6 h); y-axis ticks from 2 to 7. Curves: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity.]
Running Time
Lasso extension    Not distributed    Distributed over examples
Row Lasso          5.3                1.6
Matricial Lasso    1.6                0.5
Lag Lasso          1.1                0.4
Group Lasso        7.8                1.1
Sparse Lasso       11                 5.5
Table: Running time (in seconds), with the data divided across the cores of an 8-core i7 processor
The same ADMM tolerance (1e-3) was used in all runs
The error results for each LASSO extension are very similar
Final Remarks and Future Work
The adequate choice of a sparse structure can improve the forecast skill of the VAR model
The case-study results indicate that:
Information from selected distributed time series can reduce the forecast error compared to an AR model
The Group LASSO-VAR model achieves the highest global improvement and the Lag LASSO-VAR model provides the lowest improvement (mainly for the first lead times)
Future Work
Explore more complex sparse structures
Extend the statistical model to the probabilistic forecast framework
Apply this framework to other smart grid related problems
Acknowledgements
This work was carried out within the framework of the SusCity project (MITP-TB/CS/0026/2013), financed by national funds through Fundação para a Ciência e a Tecnologia (FCT), Portugal.