Combining Forecasts for Short-Term Electricity Load Forecasting
[Figure: weights over steps (axes: Step, Weight)]
Motivation of Electricity Load Forecasting
Electricity cannot be stored, so electricity consumption must be forecast:
to avoid blackouts on the grid
to avoid financial penalties
to optimize the management of production units and electricity trading
Managing a wide variety of production units:
nuclear plants
fuel, coal and gas plants
renewable energy: water dams, wind farms, solar panels...
Application to Electricity Load Data: Electricity Data
[Figure: Trend, load from 1/9/2002 to 31/8/2009]
[Figure: Yearly pattern, load over 2006]
[Figure: Weekly pattern, load from 1/6/2006 to 30/6/2006]
[Figure: Daily pattern, load by instant for each day of the week (Mo to Su)]
[Figure: Special days, load (MW) by instant for normal days, special days and special tariff days, from 20/12/2007 to 4/1/2008]
[Figure: Load versus cloud cover, load (MW) and cloud cover (oktas) by instant]
Application to Electricity Load Data: Parametric Models
Operational forecasts: a high-dimensional non-linear regression model, see [Bruhns et al. (2005)]
Metehore Model
Separate the weather-dependent and the weather-independent load:
$L_t = L^{WD}_t + L^{WI}_t + \varepsilon_t$
$L^{WD}_t$:
Cooling and heating effects
Felt temperature (exponential smoothing of the real temperature, etc.)
Trend
$L^{WI}_t$:
Daily, weekly and yearly cycles (regression, Fourier basis)
Trend
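A minimal Python sketch of the two weather-dependent ingredients listed above, under loudly stated assumptions: felt temperature is modelled as first-order exponential smoothing with a hypothetical factor `alpha`, and the heating/cooling effect as a piecewise-linear function of felt temperature with hypothetical thresholds and slopes. Neither the functional forms nor any parameter value is taken from the Metehore model; see [Bruhns et al. (2005)] for the actual specification.

```python
import numpy as np


def felt_temperature(temp, alpha):
    """Exponential smoothing: Tf_t = alpha * T_t + (1 - alpha) * Tf_{t-1}.

    `alpha` in (0, 1] is a hypothetical smoothing factor, not a Metehore value.
    """
    temp = np.asarray(temp, dtype=float)
    felt = np.empty_like(temp)
    felt[0] = temp[0]  # initialise with the first observed temperature
    for t in range(1, len(temp)):
        felt[t] = alpha * temp[t] + (1.0 - alpha) * felt[t - 1]
    return felt


def weather_dependent_load(felt, heat_threshold, cool_threshold, heat_slope, cool_slope):
    """Hypothetical piecewise-linear heating/cooling effect (MW).

    Load rises linearly as felt temperature drops below `heat_threshold`
    (heating) and as it rises above `cool_threshold` (cooling).
    """
    felt = np.asarray(felt, dtype=float)
    heating = heat_slope * np.maximum(heat_threshold - felt, 0.0)
    cooling = cool_slope * np.maximum(felt - cool_threshold, 0.0)
    return heating + cooling


# Usage: a cold snap after a mild day, with illustrative parameters only.
felt = felt_temperature([10.0, 0.0, 0.0], alpha=0.5)
extra = weather_dependent_load(felt, 15.0, 20.0, 100.0, 50.0)
```

The smoothing makes the load react to a cold snap with a lag, which is the intuition behind using a felt rather than a raw temperature.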
[Figure: Metehore model components: temperature effect f(T°) versus T°; normalized daily load shapes by hour for each day of the week; Saturday shape; Monday shape]
Application to Electricity Load Data: Semi-Parametric Models
In use at EDF R&D, see [Pierrot and Goude (2011)], and [Wood (2006)] for an in-depth presentation of the statistical method.
GAM Model
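$L_t = \sum_{j=1}^{6} f_j(\mathrm{Hour}_t)\,\mathbb{1}_{\{\mathrm{DayType}_t = j\}} + f_7(\mathrm{Toy}_t, I_t) + f_8(t) + g_1(T_t, \mathrm{Time}_t) + g_2(T_{t-48}, \mathrm{Time}_t) + g_3(\mathrm{Cloud}_t) + h(L_{t-24h}) + \varepsilon_t$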
$f_j$'s: weather-independent load (daily shapes, yearly cycle, trend)
$g_j$'s: weather-dependent load
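The additive structure above is estimated one component at a time. The method presented in [Wood (2006)] relies on penalized regression splines; the sketch below swaps in the simpler classic backfitting algorithm with a crude running-mean smoother, only to show how each component is re-estimated from the partial residuals of the others. The smoother, `k` and `n_iter` are illustrative assumptions, and the synthetic data are not load data.

```python
import numpy as np


def running_mean_smoother(x, r, k):
    """Fit r ~ s(x) with a running mean over the k nearest points in x order."""
    order = np.argsort(x)
    r_sorted = r[order]
    n, half = len(x), k // 2
    fitted_sorted = np.array(
        [r_sorted[max(0, i - half):min(n, i + half + 1)].mean() for i in range(n)]
    )
    fitted = np.empty(n)
    fitted[order] = fitted_sorted  # undo the sort
    return fitted


def backfit(X, y, k=15, n_iter=20):
    """Backfitting for y = b + sum_j s_j(X[:, j]) + noise.

    Returns the intercept b and an (n, d) array of fitted components s_j.
    """
    n, d = X.shape
    intercept = y.mean()
    comps = np.zeros((n, d))
    for _ in range(n_iter):
        for j in range(d):
            # partial residual: remove every component except the j-th
            partial = y - intercept - comps.sum(axis=1) + comps[:, j]
            comps[:, j] = running_mean_smoother(X[:, j], partial, k)
            comps[:, j] -= comps[:, j].mean()  # centre for identifiability
    return intercept, comps


# Usage on synthetic data: y = sin(x1) + x2^2 + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
b, comps = backfit(X, y)
residual_rmse = np.sqrt(np.mean((y - b - comps.sum(axis=1)) ** 2))
```

Centring each component keeps the decomposition identifiable: any constant is absorbed by the intercept.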
[Figure: GAM model components: temperature effect (surface over week.temp and week.ind); daily shapes by hour for each day type (load, MW); yearly cycle (surface over Posan and instant); trend over t]
Application to Electricity Load Data: Non-Parametric Models
Similarity models based on wavelet decomposition, proposed in [Antoniadis et al. (2006)] and [Antoniadis et al. (2010)]; results are presented in [Cugliari (2011)].
Functional Model
Partition the load into blocks of load curves $Z_i(t)$
Classify these curves into clusters according to calendar information
In each cluster, compute a "similarity" $W_{i,j} \in [0,1]$ between curves $i$ and $j$ using a wavelet-based distance
Forecast tomorrow's curve $Z_{n+1}(t)$:
$\hat Z_{n+1}(t) = \sum_{m=1}^{n-1} W_{n,m} Z_{m+1}(t)$
Tomorrow will look like the days that, in the past, followed days similar to today
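The forecasting formula can be sketched as follows. Assumptions: the curves of one cluster are stored as rows of an array, and the similarity is a Gaussian kernel on the Euclidean distance between curves, normalised to sum to one, with a hypothetical `bandwidth`. This is a plain substitute for the wavelet-based distance of the original method, and the calendar clustering step is left out.

```python
import numpy as np


def similarity_weights(curves, bandwidth):
    """Hypothetical similarity W_{n,m}, m = 1..n-1, between today's curve Z_n and
    each past curve Z_m: a Gaussian kernel on the Euclidean distance, normalised
    to sum to 1. This replaces the wavelet-based distance of the original method."""
    today, past = curves[-1], curves[:-1]
    dist = np.linalg.norm(past - today, axis=1)
    kernel = np.exp(-((dist / bandwidth) ** 2))
    return kernel / kernel.sum()


def forecast_next_curve(curves, weights):
    """Z_hat_{n+1}(t) = sum_{m=1}^{n-1} W_{n,m} Z_{m+1}(t).

    `curves` has shape (n, p): row m-1 holds day m sampled at p instants.
    """
    successors = curves[1:]  # Z_2, ..., Z_n: the day following each Z_1, ..., Z_{n-1}
    return weights @ successors


# Usage: alternating "low" and "high" days; today is low, so tomorrow should be high.
curves = np.array([[1.0, 1.0], [5.0, 5.0], [1.0, 1.0], [5.0, 5.0], [1.0, 1.0]])
W = similarity_weights(curves, bandwidth=1.0)
Z_next = forecast_next_curve(curves, W)
```

The key indexing detail is the shift by one day: the weights compare past days with today, but they are applied to the days that followed them.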
[Figure: Functional model: load (MW) versus time]
Sequential Combination of Specialized Experts: Framework and Algorithms
This framework was introduced in [Blum (1997)] and further studied in [Freund et al. (1997)].
On-line Sequential Aggregation
At each time $t = 1, \dots, T$, we have access to $Y_1^{t-1} = (y_1, \dots, y_{t-1})$, with $y_i \in [0, B]$, and to the past forecasts of the experts (e.g. GAM, Metehore or functional models); then
the "environment" generates $y_t$ and the individual predictors (experts) output $(\hat f_{j,t})_{1 \le j \le N}$;
the forecaster builds its combined forecast $\hat y_t$;
the "environment" reveals $y_t$ to the forecaster, and the experts incur the loss $\ell(\hat f_{j,t}, y_t)$, where $\ell : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$.
Individual sequences: worst-case bounds
no assumption on an underlying stochastic process
a general framework to embed all kinds of base forecasters
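The order of events in this protocol can be sketched as a loop. The `predict`/`update` interface and the `UniformAggregator` placeholder are hypothetical; any of the aggregation rules described next could be plugged in as `rule`.

```python
import numpy as np


class UniformAggregator:
    """Placeholder aggregation rule: the plain average of the experts' forecasts."""

    def predict(self, forecasts):
        return float(np.mean(forecasts))

    def update(self, forecasts, y):
        pass  # a real rule would update its weights here


def run_online(expert_forecasts, observations, rule):
    """Online protocol: at each t, predict from the experts' forecasts, then
    observe y_t and let the rule update. `rule` is any object with the
    hypothetical predict/update methods above."""
    combined = np.empty(len(observations))
    for t, y_t in enumerate(observations):
        f_t = expert_forecasts[t]        # experts' forecasts for time t
        combined[t] = rule.predict(f_t)  # forecaster commits before seeing y_t
        rule.update(f_t, y_t)            # y_t is revealed, losses can be computed
    return combined


combined = run_online(np.array([[1.0, 3.0], [2.0, 4.0]]), np.array([2.0, 3.5]), UniformAggregator())
```

Because `update` is only called after `predict`, the combined forecast at time $t$ never uses $y_t$, which is what makes the worst-case regret bounds meaningful.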
At each time $t$, each expert can be active (produces a forecast) or inactive (does not produce any forecast).
We denote by $E_t \subset \{1, \dots, N\}$ the set of active experts at time index $t$.
Aggregation consists of a convex aggregation rule:
$\hat y_t = \sum_{j=1}^{N} p_{j,t} \hat f_{j,t}, \qquad p_t = (p_{1,t}, \dots, p_{N,t}) \in \mathcal{X}$
$\mathcal{X} = \{p_t \in \mathbb{R}^N : p_{j,t} \ge 0,\ p_{1,t} + \dots + p_{N,t} = 1\}$ is the set of convex weight vectors over $N$ elements;
the weights are produced sequentially by an algorithm based on the concept of regret.
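A sketch of the convex aggregation step with specialized experts, assuming, as in the forecast step of $\mathcal{E}_\eta$, that the weights are restricted to the active set $E_t$ and renormalised. Encoding inactive experts with `NaN` forecasts and a boolean `active` mask is an illustrative choice, not part of the original method.

```python
import numpy as np


def convex_aggregate(weights, forecasts, active):
    """Combined forecast y_hat_t = sum_j p_{j,t} f_{j,t}, where p_t is `weights`
    restricted to the active set E_t and renormalised, so p_t lies in the simplex X.
    Inactive experts may carry NaN forecasts; they get zero weight."""
    active = np.asarray(active, dtype=bool)
    w = np.where(active, weights, 0.0)
    p = w / w.sum()  # assumes at least one active expert with positive weight
    f = np.where(active, forecasts, 0.0)  # discard forecasts of inactive experts
    return float(p @ f), p


y_hat, p = convex_aggregate([0.5, 0.25, 0.25], [10.0, np.nan, 20.0], [True, False, True])
```

Here the inactive second expert is ignored and the remaining weights 0.5 and 0.25 become 2/3 and 1/3.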
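Supposing the weights $p_t$ are produced by the algorithm $\mathcal{A}$, the regret with respect to expert $j$ up to time $T$ is:
$R_T(\mathcal{A}, j) = \sum_{t=1}^{T} \big(\ell_t(p_t) - \ell_t(\delta_j)\big)$
where $\ell_t(p)$ is the loss at time $t$ of the combined forecast based on the weights $p$, and $\delta_j$ is the Dirac mass on expert $j$. Similarly, for $q \in \mathcal{X}$: $R_T(\mathcal{A}, q) = \sum_{t=1}^{T} \big(\ell_t(p_t) - \ell_t(q)\big)$.
Goal: find an algorithm $\mathcal{A}$ whose regret is sublinear, i.e. in $o(T)$:
$\max_j R_T(\mathcal{A}, j) = o(T)$: as good as or better than the best expert $\Rightarrow \mathcal{E}_\eta$
$\sup_{q \in \mathcal{X}} R_T(\mathcal{A}, q) = o(T)$: as good as or better than the best convex combination $\Rightarrow \mathcal{E}^{\mathrm{grad}}_\eta$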
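The regrets can be computed directly from their definitions. This sketch assumes the square loss $\ell_t(p) = (p \cdot \hat f_t - y_t)^2$, which the slides do not fix at this point.

```python
import numpy as np


def regrets(combined, expert_forecasts, y):
    """Regrets of a sequence of combined forecasts under the square loss.

    expert_forecasts has shape (T, N). Returns R_T(A, j) for every expert j and
    the regret against the best expert, max_j R_T(A, j).
    """
    combined_loss = np.sum((combined - y) ** 2)
    expert_losses = np.sum((expert_forecasts - y[:, None]) ** 2, axis=0)
    r = combined_loss - expert_losses
    return r, float(r.max())


def regret_vs_convex(combined, expert_forecasts, y, q):
    """R_T(A, q) against a fixed convex weight vector q in X."""
    q_forecasts = expert_forecasts @ q
    return float(np.sum((combined - y) ** 2) - np.sum((q_forecasts - y) ** 2))


y = np.array([1.0, 2.0])
experts = np.array([[1.0, 0.0], [2.0, 0.0]])
combined = np.array([0.5, 1.0])
r, best = regrets(combined, experts, y)
r_q = regret_vs_convex(combined, experts, y, np.array([0.5, 0.5]))
```

The regret against the best expert is the largest $R_T(\mathcal{A}, j)$, since the best expert is the one with the smallest cumulative loss.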
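$\mathcal{E}_\eta$, Exponential Weight Aggregation
Input: $\eta > 0$
Initialisation: $w_1 = (1/N, \dots, 1/N)$
For $t$ from 1 to $T$ do:
- Forecast $\hat y_t = \dfrac{1}{\sum_{i \in E_t} w_{i,t}} \sum_{j \in E_t} w_{j,t} \hat f_{j,t}$
- Observe $y_t$
- For expert $j$ from 1 to $N$, update the weights:
$w_{j,t+1} = \dfrac{e^{\eta R_t(\mathcal{E}_\eta, j)}}{\sum_{k=1}^{N} e^{\eta R_t(\mathcal{E}_\eta, k)}}$
End For
End Do
$\mathcal{E}^{\mathrm{grad}}_\eta$: same algorithm, replacing the loss $\ell_t$ by the linearized loss $\tilde\ell_t$ such that $\tilde\ell_t(q) = \nabla \ell_t(p_t) \cdot q$ for $q \in \mathcal{X}$.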
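A compact sketch of $\mathcal{E}_\eta$ and $\mathcal{E}^{\mathrm{grad}}_\eta$ with specialized experts. Assumptions: square loss; each expert's regret only accumulates over rounds in which it is active (a choice in the spirit of the specialized-experts framework, not taken from the slides); the weights used at time $t$ are the softmax of $\eta$ times these regrets over the active set $E_t$, which matches the renormalisation in the forecast step. The daily update at noon used in the application is not modelled.

```python
import numpy as np


def exp_weights(expert_forecasts, active, y, eta, grad=False):
    """Exponentially weighted aggregation of specialized experts (square loss).

    expert_forecasts, active: arrays of shape (T, N); active[t, j] says whether
    j is in E_t. R[j] accumulates l_t(p_t) - l_t(delta_j) only on rounds where
    expert j is active. With grad=True, the loss is linearized at p_t.
    """
    T, N = expert_forecasts.shape
    R = np.zeros(N)
    preds = np.empty(T)
    for t in range(T):
        a = active[t]
        f = np.where(a, expert_forecasts[t], 0.0)
        logits = np.where(a, eta * R, -np.inf)
        w = np.exp(logits - logits[a].max())  # softmax over E_t, numerically stable
        p = w / w.sum()
        preds[t] = p @ f
        if grad:
            g = 2.0 * (preds[t] - y[t]) * f  # gradient of (p . f - y_t)^2 w.r.t. p
            R += np.where(a, g @ p - g, 0.0)
        else:
            R += np.where(a, (preds[t] - y[t]) ** 2 - (f - y[t]) ** 2, 0.0)
    return preds


T = 200
experts = np.column_stack([np.full(T, 1.0), np.full(T, 3.0)])
active = np.ones((T, 2), dtype=bool)
active[0, 0] = False  # expert 0 is inactive at the first step
y = np.full(T, 1.0)
preds = exp_weights(experts, active, y, eta=1.0)
preds_grad = exp_weights(experts, active, y, eta=1.0, grad=True)
```

At the first step only the second expert is active, so the forecast is its own; afterwards the weight moves to the accurate first expert, more slowly with the linearized loss because its gradient shrinks as the forecast improves.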
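Compound experts: $j_1^T = (j_1, \dots, j_T)$, with
$\mathrm{size}(j_1^T) = \sum_{t=2}^{T} \mathbb{1}_{\{j_{t-1} \neq j_t\}}$ and $\mathrm{size}(q_1^T) = \sum_{t=2}^{T} \mathbb{1}_{\{q_{t-1} \neq q_t\}}$
The regrets are
$R_T(\mathcal{A}, j_1^T) = \sum_{t=1}^{T} \big(\ell_t(p_t) - \ell_t(\delta_{j_t})\big) \Rightarrow \mathcal{F}_{\eta,\alpha}$
$R_T(\mathcal{A}, q_1^T) = \sum_{t=1}^{T} \big(\ell_t(p_t) - \ell_t(q_t)\big) \Rightarrow \mathcal{F}^{\mathrm{grad}}_{\eta,\alpha}$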
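Compound experts and their sizes are easy to evaluate for a given path $(j_1, \dots, j_T)$. The sketch assumes 0-indexed experts and uses the rmse criterion of the benchmark table in the application.

```python
import numpy as np


def compound_size(path):
    """size(j_1^T): number of times t >= 2 with j_{t-1} != j_t."""
    path = np.asarray(path)
    return int(np.count_nonzero(path[1:] != path[:-1]))


def compound_rmse(expert_forecasts, y, path):
    """rmse of the compound expert that follows expert j_t at each time t.

    expert_forecasts has shape (T, N) and path holds (j_1, ..., j_T), 0-indexed.
    """
    picked = expert_forecasts[np.arange(len(y)), path]
    return float(np.sqrt(np.mean((picked - y) ** 2)))


experts = np.array([[1.0, 5.0], [5.0, 2.0], [3.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
size = compound_size([0, 1, 1])  # one switch, at t = 2
err = compound_rmse(experts, y, [0, 1, 1])
```

Allowing more switches can only lower the best achievable rmse, which is why the benchmark values decrease as the size bound grows.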
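$\mathcal{F}_{\eta,\alpha}$, Fixed-Share
Input: $\eta > 0$, $\alpha \in [0, 1]$
Initialisation: $w_1 = (1/N, \dots, 1/N)$
For $t$ from 1 to $T$ do:
- Forecast $\hat y_t = \dfrac{1}{\sum_i w_{i,t}} \sum_j w_{j,t} \hat f_{j,t}$
- Observe $y_t$
- For expert $i$ from 1 to $N$, update the weights:
$v_{i,t} = w_{i,t}\, e^{-\eta \ell_t(\delta_i)}$
$w_{i,t+1} = (1 - \alpha)\, v_{i,t} + \dfrac{\alpha}{N - 1} \sum_{j \neq i} v_{j,t}$
End For
End Do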
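A sketch of $\mathcal{F}_{\eta,\alpha}$ under the square loss (an assumption). Unlike the pseudocode, it rescales $v_t$ to sum to one; since the share update is linear in $v_t$ and the forecast divides by $\sum_i w_{i,t}$, this does not change the forecasts. The toy regime switch shows why sharing weight helps: with $\alpha = 0$ the rule reduces to plain exponential weighting and recovers slowly after the switch.

```python
import numpy as np


def fixed_share(expert_forecasts, y, eta, alpha):
    """Fixed-Share F_{eta,alpha} with the square loss l_t(delta_i) = (f_{i,t} - y_t)^2.

    expert_forecasts has shape (T, N) with N >= 2. Returns the combined forecasts.
    """
    T, N = expert_forecasts.shape
    w = np.full(N, 1.0 / N)
    preds = np.empty(T)
    for t in range(T):
        f = expert_forecasts[t]
        preds[t] = w @ f / w.sum()
        v = w * np.exp(-eta * (f - y[t]) ** 2)  # loss update
        v /= v.sum()  # rescaling leaves the forecasts unchanged; keeps weights on a stable scale
        # share update: each expert keeps (1 - alpha) of its mass and receives
        # alpha / (N - 1) of every other expert's mass
        w = (1.0 - alpha) * v + alpha / (N - 1) * (v.sum() - v)
    return preds


T = 100
experts = np.column_stack([np.full(T, 1.0), np.full(T, 3.0)])
y = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])  # regime switch at t = 50
preds_fs = fixed_share(experts, y, eta=1.0, alpha=0.05)
preds_ew = fixed_share(experts, y, eta=1.0, alpha=0.0)  # alpha = 0: plain exponential weights
```

The share step keeps every expert's weight bounded away from zero, so a formerly poor expert can take over within a few steps once it becomes the best one.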
Sequential Combination of Specialized Experts: Application
Context:
produce day-ahead forecasts of the French electricity load every day at noon (the weights are updated under this constraint, which requires a modification of the algorithms)
base forecasters are obtained from R&D models:
Metehore model: 15 experts
GAM model: 8 experts
Functional model: 1 expert
these experts are specialized on winter/summer periods; some are inactive on bank holidays
| Characteristic | Value |
|---|---|
| Time intervals | Every 30 minutes |
| Number of days $D$ | 320 |
| Time indexes $T$ | 15 360 |
| Number of experts $N$ | 24 (= 15 + 8 + 1) |
| Median of the $y_t$ | 56.33 (GW) |
| Bound $B$ on the $y_t$ | 92.76 (GW) |

Table: Some characteristics of the observations $y_t$ of the French data set of
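| Name of the benchmark procedure | Formula | Value |
|---|---|---|
| Uniform convex weight vector | $\mathrm{rmse}\big((1/24, \dots, 1/24)\big)$ | 0.748 |
| Best single expert | $\min_{j=1,\dots,24} \mathrm{rmse}(j)$ | 0.782 |
| Best convex weight vector | $\min_{q \in \mathcal{X}} \mathrm{rmse}(q)$ | 0.683 |
| Best compound expert, size at most $m = 50$ | $\min_{j_1^T \in \mathcal{C}_{50}} \mathrm{rmse}(j_1^T)$ | 0.534 |
| Best compound expert, size at most $m = 100$ | $\min_{j_1^T \in \mathcal{C}_{100}} \mathrm{rmse}(j_1^T)$ | 0.474 |
| Best compound expert, size at most $m = T - 1 = 15\,359$ | $\min_{j_1^T \in E_1 \times E_2 \times \dots \times E_T} \mathrm{rmse}(j_1^T)$ | 0.223 |

Table: Definition and performance of several (possibly off-line) benchmarking procedures on the French data set (GW) of operational forecasting.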
Optimisation of the aggregation rule parameters: fixed values
Values of $\eta$: $10^{-6}$, $10^{-5}$, $10^{-4}$, $2 \times 10^{-4}$, $10^{-3}$, $5 \times 10^{-3}$, $10^{-2}$
rmse of $\mathcal{E}_\eta$ (u): 0.724, 0.722, 0.718, 0.731, 0.788
rmse of $\mathcal{E}^{\mathrm{grad}}_\eta$ (u): 0.724, 0.722, 0.712, 0.683, 0.650, 0.668
Table: Performance obtained by the sequential aggregation rules for various choices of $\eta$.
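| $\eta$ | $\alpha$ | rmse of $\mathcal{F}_{\eta,\alpha}$ | rmse of $\mathcal{F}^{\mathrm{grad}}_{\eta,\alpha}$ |
|---|---|---|---|
| 0.01 | 0.001 | 0.678 | 0.646 |
| 0.01 | 0.01 | 0.683 | 0.669 |
| 0.01 | 0.05 | 0.704 | 0.700 |
| 1 | 0.001 | 0.711 | 0.622 |
| 1 | 0.01 | 0.659 | 0.598 |
| 1 | 0.05 | 0.652 | 0.637 |
| 500 | 0.001 | 0.674 | 0.683 |
| 500 | 0.01 | 0.633 | 0.675 |
| 500 | 0.05 | 0.632 | 0.671 |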
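Optimisation of the aggregation rule parameters: online calibration

| | Best constant pair $(\eta, \alpha)$ | Grid |
|---|---|---|
| rmse of $\mathcal{F}_{\eta,\alpha}$ | 0.632 | 0.644 |
| rmse of $\mathcal{F}^{\mathrm{grad}}_{\eta,\alpha}$ | 0.598 | 0.599 |

Table: Performance obtained by the rules $\mathcal{F}_{\eta,\alpha}$ and $\mathcal{F}^{\mathrm{grad}}_{\eta,\alpha}$ for the best constant choices of $\eta$ and $\alpha$, and with the meta-rule selecting the values of $\eta$ and $\alpha$ sequentially.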
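The slides do not detail the meta-rule. One simple possibility, sketched here purely as an assumption, is to run the rule for every $(\eta, \alpha)$ pair of a grid in parallel and, at each time, follow the pair with the smallest cumulative square loss so far (a "follow the leader" choice over parameters).

```python
import numpy as np


def follow_the_leader(candidate_preds, y):
    """Meta-rule sketch: candidate_preds[t, k] is the forecast at time t of the
    aggregation rule run with the k-th (eta, alpha) pair of the grid.
    At each t, output the forecast of the pair with the smallest cumulative
    square loss over the previous rounds (ties go to the first pair)."""
    T, K = candidate_preds.shape
    cum_loss = np.zeros(K)
    out = np.empty(T)
    chosen = np.empty(T, dtype=int)
    for t in range(T):
        k = int(np.argmin(cum_loss))
        chosen[t] = k
        out[t] = candidate_preds[t, k]
        cum_loss += (candidate_preds[t] - y[t]) ** 2  # y_t revealed after forecasting
    return out, chosen


# Two hypothetical parameter settings: the second one is always right.
cands = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
out, chosen = follow_the_leader(cands, np.array([1.0, 1.0, 1.0]))
```

Selecting among candidates only costs one wrong pick here; in general such a meta-rule trades a small calibration overhead for not having to fix $(\eta, \alpha)$ in advance.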
We obtain a significant improvement of about 20% in RMSE over the best expert. The performance of the fixed-share rule is comparable to that of the best compound expert with 50 shifts.
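The order of magnitude of this improvement can be checked against the tables, taking the best single expert (rmse 0.782) as reference and the best constant parameters for $\mathcal{F}_{\eta,\alpha}$ (0.632) and $\mathcal{F}^{\mathrm{grad}}_{\eta,\alpha}$ (0.598):

```latex
\frac{0.782 - 0.632}{0.782} = \frac{0.150}{0.782} \approx 0.192,
\qquad
\frac{0.782 - 0.598}{0.782} = \frac{0.184}{0.782} \approx 0.235
```

that is, a relative RMSE reduction of about 19.2% and 23.5% respectively.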
[Figure: Example of weights: two panels, weight (0 to 1) of the experts versus time in half hours (0 to 15 000)]
Conclusion and Future Work
Building specialized experts: for extreme weather conditions, holidays, etc.
Intraday forecasts
Algorithms based on exogenous information: weather, calendar data, etc.
Density forecasts based on expert advice
References
A. Antoniadis, E. Paparoditis, and T. Sapatinas. A functional wavelet kernel approach for time series prediction. Journal of the Royal Statistical Society: Series B, 68(5):837-857, 2006.
A. Antoniadis, X. Brossat, J. Cugliari, and J.M. Poggi. Clustering functional data using wavelets. In Proceedings of the Nineteenth International Conference on Computational Statistics (COMPSTAT), 2010.
A. Blum. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26:5-23, 1997.
A. Bruhns, G. Deurveilher, and J.S. Roy. A non-linear regression model for mid-term load forecasting and improvements in seasonality. Presented at the 15th Power Systems Computation Conference, Liège, Belgium, August 22–26, 2005.
J. Cugliari. Prévision non paramétrique de processus à valeurs fonctionnelles. Application à la consommation d'électricité. PhD thesis, 2011.
Y. Freund, R. Schapire, Y. Singer, and M. Warmuth. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing (STOC), pages 334-343, 1997.
A. Pierrot and Y. Goude. Short-term electricity load forecasting with generalized additive models. In Proceedings of ISAP Power 2011, 2011.