Big Data Techniques Applied to Very Short-term Wind Power Forecasting

(1)

Ricardo Bessa

Senior Researcher, Center for Power and Energy Systems, INESC TEC, Portugal

Joint work with Laura Cavalcante and Marisa Reis

EWEA Technology Workshop: Wind Power Forecasting 2015, 1-2 October 2015, Leuven, Belgium

(2)

Introduction

Vector Autoregression (VAR) models can be applied to combine wind power time series distributed in space

Two important requirements for a practical implementation:

Reduce the number of non-null coefficients

Low computational time in large datasets

This work provides the following original contributions:

Explores a set of sparse structures for the VAR model

Applies the alternating direction method of multipliers (ADMM) to estimate the VAR coefficients

Explores parallel computing

(3)

Autoregressive Model

Univariate model: uses past observations from the same time series

AR(p) - Autoregressive Model of order p

→ forecasts the variable $y_t$ given the past $p$ values: $y_t = c + b_1 y_{t-1} + b_2 y_{t-2} + \cdots + b_p y_{t-p} + \varepsilon_t$

VAR(p) - Vector Autoregressive Model of order p

→ forecasts the vector of $k$ variables $Y_t = (Y_{1,t}, Y_{2,t}, \ldots, Y_{k,t})$:

$Y_t = c + B_1 Y_{t-1} + B_2 Y_{t-2} + \cdots + B_p Y_{t-p} + u_t$
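As a minimal sketch of the VAR(p) model in the stacked form Y ~= B Z used on the following slides, the Python code below builds the lagged matrix Z and fits B by ordinary least squares (no sparsity yet). The function names, the synthetic two-series data and the coefficient values are hypothetical and serve only as an illustration.

```python
import numpy as np

def build_var_matrices(series, p):
    """Turn a (T, k) array of observations into Y (k, T-p) and Z (k*p+1, T-p)
    so that the VAR(p) model reads Y ~= B Z with B = [c, B_1, ..., B_p]."""
    T, k = series.shape
    Y = series[p:].T                                         # columns are Y_t, t = p..T-1
    lags = [series[p - l:T - l].T for l in range(1, p + 1)]  # Y_{t-1}, ..., Y_{t-p}
    Z = np.vstack([np.ones((1, T - p))] + lags)              # first row carries the intercept c
    return Y, Z

def fit_var_ols(series, p):
    """Least-squares estimate of B (dense, every coefficient is non-null)."""
    Y, Z = build_var_matrices(series, p)
    coef, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)         # solves Z^T B^T ~= Y^T
    return coef.T

# Synthetic two-series VAR(1) with made-up coefficients (assumption, not case-study data)
rng = np.random.default_rng(0)
B1_true = np.array([[0.5, 0.2],
                    [0.0, 0.4]])
series = np.zeros((2000, 2))
for t in range(1, 2000):
    series[t] = B1_true @ series[t - 1] + rng.normal(size=2)

B_hat = fit_var_ols(series, p=1)
c_hat, B1_hat = B_hat[:, 0], B_hat[:, 1:]
next_forecast = c_hat + B1_hat @ series[-1]                  # one-step-ahead forecast
```

With k series and p lags, B has k(kp + 1) entries, which is why the number of non-null coefficients becomes a concern when many wind farms are combined.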

(4)

Least Absolute Shrinkage and Selection Operator (LASSO)-VAR Model

The Lasso-VAR estimation minimizes the residual sum of squares subject to an $L_1$ constraint:

$\frac{1}{2}\|Y - BZ\|_F^2 \quad \text{s.t.} \quad \|B\|_1 \le t$

Equivalently, it can be defined in the Lagrangian form as

$\frac{1}{2}\|Y - BZ\|_F^2 + \lambda\|B\|_1$,

where $\|X\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, $\|X\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |x_{ij}|^2$ is the Frobenius norm and the regularization parameter $\lambda \ge 0$ is inversely related to $t$

Fits the regression model and simultaneously performs variable selection by shrinking regression coefficients to zero
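One way to see how coefficients end up exactly at zero is the soft-thresholding operator, which is the proximal operator of the L1 norm and a standard building block of Lasso solvers. The sketch below is illustrative only; the coefficient values and the threshold are made up.

```python
import numpy as np

def soft_threshold(X, kappa):
    """Shrink every entry of X towards zero by kappa; entries with |x| <= kappa become zero."""
    return np.sign(X) * np.maximum(np.abs(X) - kappa, 0.0)

# Hypothetical coefficient matrix: small entries vanish, large ones shrink by kappa
coeffs = np.array([[0.90, -0.05, 0.30],
                   [0.02, -0.60, 0.10]])
shrunk = soft_threshold(coeffs, kappa=0.1)
n_nonzero = np.count_nonzero(shrunk)
```

Raising the threshold (a larger regularization parameter) zeroes more entries, which is the variable-selection effect described above.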

(5)

Lasso-VAR Model: Extensions and Generalizations

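Lasso Extensions       Penalty

Row Lasso              $\lambda \|B^i\|_1$

Matricial Lasso        $\lambda \|B\|_1$

Lag Lasso              $\lambda \sum_{l=1}^{p} \|B_l\|_1$

Group Lasso            $\lambda \sum_{i \neq j} \|((B_1)_{ij}, \ldots, (B_p)_{ij})\|_2$

Sparse Group Lasso     $(1 - \alpha)\lambda \sum_{l=1}^{p} \|B_l\|_F + \alpha\lambda \|B\|_1$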
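The penalties listed on this slide can be evaluated numerically as in the sketch below. It assumes the lag matrices are stored as an array B of shape (p, k, k), with B[l] holding the coefficients of one lag; this layout and the function names are assumptions made for illustration.

```python
import numpy as np

def row_lasso(B, lam, i):
    """L1 norm of row i (equation of series i) across all lags."""
    return lam * np.abs(B[:, i, :]).sum()

def matricial_lasso(B, lam):
    """L1 norm of the whole coefficient matrix."""
    return lam * np.abs(B).sum()

def lag_lasso(B, lam):
    """Sum over lags of the L1 norm of each lag matrix.
    With a single lam this has the same value as the matricial penalty;
    it is written lag by lag to mirror the formula."""
    return lam * sum(np.abs(B[l]).sum() for l in range(B.shape[0]))

def group_lasso(B, lam):
    """Each off-diagonal pair (i, j) forms one group containing its p lag coefficients."""
    norms = np.sqrt((B ** 2).sum(axis=0))        # (k, k) matrix of group L2 norms
    return lam * (norms.sum() - np.trace(norms)) # drop the i == j groups

def sparse_group_lasso(B, lam, alpha):
    """Frobenius norm per lag matrix plus an L1 term, mixed by alpha."""
    frob = sum(np.linalg.norm(B[l]) for l in range(B.shape[0]))
    return (1 - alpha) * lam * frob + alpha * lam * np.abs(B).sum()

# Tiny hypothetical example with p = 2 lags and k = 2 series
B = np.array([[[1.0, 3.0], [0.0, 1.0]],
              [[1.0, 4.0], [0.0, 1.0]]])
penalties = {
    "row (i=0)": row_lasso(B, 1.0, 0),
    "matricial": matricial_lasso(B, 1.0),
    "lag": lag_lasso(B, 1.0),
    "group": group_lasso(B, 1.0),
    "sparse group": sparse_group_lasso(B, 2.0, 0.5),
}
```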

(6)

Parameter Estimation and the ADMM Algorithm

The goal is to estimate the sparse matrix of coefficients with a simple and powerful algorithm

The ADMM framework has several advantages:

Combines the problem separability offered by the dual ascent method with the convergence properties of the method of multipliers

Convex problems with nondifferentiable terms (such as LASSO) can be easily addressed

Parallel Optimization: break up large datasets into blocks and carry out the optimization over each block

(7)

ADMM Algorithm

Lasso-VAR:

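minimize $\frac{1}{2}\|Y - BZ\|_F^2 + \lambda\|B\|_1$

ADMM problem form:

minimize $\underbrace{\frac{1}{2}\|Y - BZ\|_F^2}_{f(B)} + \underbrace{\lambda\|H\|_1}_{g(H)}$ s.t. $B - H = 0$

Augmented Lagrangian

$L_\rho(B, H, W) = \frac{1}{2}\|Y - BZ\|_F^2 + \lambda\|H\|_1 + W^T(B - H) + \frac{\rho}{2}\|B - H\|_F^2$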
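A minimal sketch of the resulting iterations, assuming the scaled-dual form of ADMM (U = W / rho): the B-update is a ridge-like linear solve, the H-update is soft-thresholding, and U accumulates the constraint residual B - H. The stopping rule, the value of rho and the synthetic data are assumptions for illustration, not the settings of the case study.

```python
import numpy as np

def soft_threshold(X, kappa):
    return np.sign(X) * np.maximum(np.abs(X) - kappa, 0.0)

def lasso_var_admm(Y, Z, lam, rho=1.0, tol=1e-3, max_iter=2000):
    """Minimize 0.5 * ||Y - B Z||_F^2 + lam * ||H||_1 subject to B - H = 0."""
    k, m = Y.shape[0], Z.shape[0]
    H = np.zeros((k, m))
    U = np.zeros((k, m))                      # scaled dual variable, U = W / rho
    YZt = Y @ Z.T
    G = Z @ Z.T + rho * np.eye(m)             # symmetric positive definite
    for _ in range(max_iter):
        # B-update: solve B G = Y Z^T + rho (H - U), using the symmetry of G
        B = np.linalg.solve(G, (YZt + rho * (H - U)).T).T
        H_old = H
        H = soft_threshold(B + U, lam / rho)  # H-update: proximal step of the L1 term
        U = U + B - H                         # dual update
        primal = np.linalg.norm(B - H)
        dual = rho * np.linalg.norm(H - H_old)
        if primal < tol and dual < tol:
            break
    return H

# Synthetic sparse problem (hypothetical values)
rng = np.random.default_rng(1)
B_true = np.array([[1.0, 0.0, 0.0, 0.5, 0.0, 0.0],
                   [0.0, -0.8, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.6, 0.0, 0.0, -0.5]])
Z = rng.normal(size=(6, 400))
Y = B_true @ Z + 0.1 * rng.normal(size=(3, 400))
B_admm = lasso_var_admm(Y, Z, lam=20.0, rho=50.0)
```

Returning H rather than B gives a solution whose zeros are exact, since H is the output of the thresholding step.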

(8)

Parallel Computing

The goal is to split data and use ADMM to solve the problem in a distributed manner (with N objective terms)

Blocks $Z_1, Z_2, \ldots, Z_N$ side by side → Split data across features and use the ADMM sharing problem

Blocks $Z_1, Z_2, \ldots, Z_N$ stacked vertically → Split data across examples and use ADMM consensus optimization

(9)

ADMM and Parallel Computing

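Splitting Across Examples

$\min \sum_{i=1}^{N} \underbrace{\tfrac{1}{2}\|Y_i - B_i Z_i\|_F^2}_{f_i(B_i)} + \underbrace{\lambda\|B_i\|_1}_{g(B_i)}$

$\min \sum_{i=1}^{N} f_i(B_i) + g(H)$ s.t. $B_i - H = 0$

$B_i^{k+1} := \arg\min_{B_i} \; f_i(B_i) + \frac{\rho}{2}\|B_i - H^k + U_i^k\|_F^2$

$H^{k+1} := \arg\min_{H} \; g(H) + \frac{N\rho}{2}\|H - \bar{B}^{k+1} - \bar{U}^k\|_F^2$

$U_i^{k+1} := U_i^k + B_i^{k+1} - H^{k+1}$

Here $\bar{B}^{k+1}$ and $\bar{U}^k$ denote averages over the $N$ blocks.

Splitting Across Features

$\min \underbrace{\tfrac{1}{2}\left\|Y - \sum_{i=1}^{N} B_i Z_i\right\|_F^2}_{g\left(\sum_{i=1}^{N} B_i Z_i\right)} + \underbrace{\sum_{i=1}^{N} \lambda\|B_i\|_1}_{f_i(B_i)}$

$\min \sum_{i=1}^{N} f_i(B_i) + g\left(\sum_{i=1}^{N} H_i\right)$ s.t. $B_i Z_i - H_i = 0$

$B_i^{k+1} := \arg\min_{B_i} \; f_i(B_i) + \frac{\rho}{2}\|B_i Z_i - H_i^k + U_i^k\|_F^2$

$H^{k+1} := \arg\min_{H = (H_1, \ldots, H_N)} \; g\left(\sum_{i=1}^{N} H_i\right) + \frac{\rho}{2}\sum_{i=1}^{N}\|H_i - U_i^k - B_i^{k+1} Z_i\|_F^2$

$U_i^{k+1} := U_i^k + B_i^{k+1} Z_i - H_i^{k+1}$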
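The consensus updates for splitting across examples can be sketched as follows. The blocks are processed in a plain loop here; because each B_i-update only uses its own (Y_i, Z_i), the loop body is the part that could be sent to separate workers. The block count, rho, the stopping rule and the synthetic data are assumptions for illustration.

```python
import numpy as np

def soft_threshold(X, kappa):
    return np.sign(X) * np.maximum(np.abs(X) - kappa, 0.0)

def consensus_lasso_var(Y, Z, lam, n_blocks, rho=1.0, tol=1e-3, max_iter=500):
    """Consensus ADMM: each block of examples keeps a local B_i tied to a shared H."""
    Y_blocks = np.array_split(Y, n_blocks, axis=1)   # split the columns (time samples)
    Z_blocks = np.array_split(Z, n_blocks, axis=1)
    k, m = Y.shape[0], Z.shape[0]
    H = np.zeros((k, m))
    U = [np.zeros((k, m)) for _ in range(n_blocks)]
    # Per-block quantities reused at every iteration
    grams = [Zi @ Zi.T + rho * np.eye(m) for Zi in Z_blocks]
    cross = [Yi @ Zi.T for Yi, Zi in zip(Y_blocks, Z_blocks)]
    for _ in range(max_iter):
        # B_i-updates: independent across blocks (the parallelizable step)
        B = [np.linalg.solve(G, (C + rho * (H - Ui)).T).T
             for G, C, Ui in zip(grams, cross, U)]
        H_old = H
        # H-update: soft-threshold the average of B_i + U_i
        H = soft_threshold(np.mean(B, axis=0) + np.mean(U, axis=0),
                           lam / (n_blocks * rho))
        U = [Ui + Bi - H for Ui, Bi in zip(U, B)]
        primal = np.sqrt(sum(np.linalg.norm(Bi - H) ** 2 for Bi in B))
        dual = rho * np.sqrt(n_blocks) * np.linalg.norm(H - H_old)
        if primal < tol and dual < tol:
            break
    return H

# Synthetic sparse problem (hypothetical values), split into 4 blocks of examples
rng = np.random.default_rng(2)
B_true = np.array([[1.0, 0.0, 0.0, 0.5, 0.0, 0.0],
                   [0.0, -0.8, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.6, 0.0, 0.0, -0.5]])
Z = rng.normal(size=(6, 400))
Y = B_true @ Z + 0.1 * rng.normal(size=(3, 400))
H_consensus = consensus_lasso_var(Y, Z, lam=20.0, n_blocks=4, rho=50.0)
```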

(10)

Case Study description

Apply ADMM algorithm to several LASSO-VAR(2) variants in order to produce wind power forecasts from 1 to 6 hours ahead

Dataset:

68 wind farms (same control area)

Training period: 9 months

Test period: 3 months

Time resolution: 1 hour

LASSO and ADMM parameters estimated by 5-fold cross-validation

Calculate the improvement in terms of Root Mean Squared Error (RMSE) compared to an Autoregression model - AR(2)
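The improvement metric can be sketched as below, assuming the usual definition: the relative RMSE reduction with respect to the reference model, in percent. The slides do not spell out the formula, so this definition and the toy numbers are assumptions.

```python
import numpy as np

def rmse(observed, forecast):
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean((observed - forecast) ** 2))

def improvement_over_reference(observed, model_forecast, reference_forecast):
    """Relative RMSE reduction (%) of the model with respect to the reference (e.g. AR(2)).
    Positive values mean the model beats the reference; negative values mean it is worse."""
    ref = rmse(observed, reference_forecast)
    return 100.0 * (ref - rmse(observed, model_forecast)) / ref

# Toy numbers: the reference errs by 2 everywhere, the model by 1
observed = np.zeros(4)
ar_forecast = np.full(4, 2.0)
var_forecast = np.full(4, 1.0)
imp = improvement_over_reference(observed, var_forecast, ar_forecast)  # 50.0
```

In a setting like the case study, this would be evaluated separately for each wind farm and each lead time from 1 to 6 hours.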

(11)

RMSE Improvement over AR results

[Figure: wind farm with best improvement. Improvement over AR (%) versus time horizon (1 to 6 h); vertical axis from 7 to 13 %. Series: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity.]

(12)

RMSE Improvement over AR result

[Figure: wind farm with intermediate improvement. Improvement over AR (%) versus time horizon (1 to 6 h); vertical axis from 4 to 9 %.]

(13)

RMSE Improvement over AR result

[Figure: wind farm with worst improvement. Improvement over AR (%) versus time horizon (1 to 6 h); vertical axis from -8 to 2 %.]

Number of wind farms with negative improvement (average over the time horizon): 3

Number of wind farms with negative improvement in at least one lead time: 13

Group LASSO does not have negative improvement in the first two lead times

(14)

RMSE Improvement over AR result

[Figure: global results. Improvement over AR (%) versus time horizon (1 to 6 h); vertical axis from 2 to 7 %.]

(15)

Running Time

Lasso Extensions     Not distributed     Distributed over Examples

Row Lasso            5.3                 1.6

Matricial Lasso      1.6                 0.5

Lag Lasso            1.1                 0.4

Group Lasso          7.8                 1.1

Sparse Lasso         11                  5.5

Table: Time (in seconds) to run, with the data divided across an 8-core i7 processor

The same tolerance (1e-3) was used for the ADMM

The error results for each LASSO extension are very similar
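The running times reported in the table can be turned into a speed-up factor per Lasso extension (non-distributed time divided by distributed time). The short sketch below only restates the table values; the variable names are arbitrary.

```python
# Running times in seconds, copied from the table: (not distributed, distributed over examples)
times = {
    "Row Lasso": (5.3, 1.6),
    "Matricial Lasso": (1.6, 0.5),
    "Lag Lasso": (1.1, 0.4),
    "Group Lasso": (7.8, 1.1),
    "Sparse Lasso": (11.0, 5.5),
}

# Speed-up = non-distributed time / distributed time
speedup = {name: serial / distributed for name, (serial, distributed) in times.items()}
largest = max(speedup, key=speedup.get)

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

On these figures the speed-up ranges from 2 (Sparse Lasso) to about 7 (Group Lasso) on the 8-core machine.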

(16)

Final Remarks and Future Work

The adequate choice of a sparse structure can improve the forecast skill of the VAR model

The case-study results indicate that

Information from selected distributed time series can improve the forecast error compared to an AR model

The Group LASSO-VAR model achieves the highest global improvement and the Lag LASSO-VAR model provides the lowest improvement (mainly for the first lead times)

Future Work

Explore more complex sparse structures

Extend the statistical model to the probabilistic forecast framework

Apply this framework to other smart grid related problems

(17)

Acknowledgements

This work was carried out in the framework of the SusCity project (“MITP-TB/CS/0026/2013”), financed by national funds through Fundação para a Ciência e a Tecnologia (FCT), Portugal.