Our aim is to develop statistical models that can explain the relationship between a response variable, $Y_m$, and predictor variables, $X_{1,m}, \ldots, X_{P,m}$. These models may take the form
$$Y_m = f(X_{1,m}, \ldots, X_{P,m}) + \eta. \tag{2.3.1}$$
Here, $f$ denotes some function and $\eta$ denotes variation in $Y_m$ that is not attributed to the predictors $X_{1,m}, \ldots, X_{P,m}$. In Chapter 1 we discussed that the response variables in the telecommunications event dataset exhibit weekly seasonality. Also, bank holidays appear to adversely affect the variation in the responses. This behaviour of the response variables is not thought to be attributable to the weather predictors, which are of primary interest to our industrial collaborator in the telecommunications event dataset. The current approach estimates the variation in the response variables caused by weekly seasonality and bank holiday effects and removes it from the response variables. This is seen as a data pre-processing step. The procedure for doing this follows. For ease of notation we drop the response index, $m$, as the procedure is an individual regression procedure that is applied to each response variable separately.
It is possible to decompose the response variable into a sum of components. Hyndman and Athanasopoulos (2019) present an additive decomposition of a time series as the sum of three components: a seasonal component, $S_t$, a trend-cycle component, $T_t$, and a remainder component, $R_t$, such that
$$Y_t = S_t + T_t + R_t.$$
They discuss a classical method using moving averages; however, our industrial collaborator uses simple averages, as follows.
First, we identify the seasons. Figure 2.3.1 shows a seasonal sub-series plot, in which the events are plotted separately for each day of the week. It is clear that the levels of events on Saturdays and Sundays are distinct from, and lower than, the level of each weekday. There is also slight variation between the levels of events for the weekdays. As the level of events appears to vary by day of the week, it may be argued that we should estimate a seasonal component for each day of the week. Further, we observed in Figure 1.1.5 that events are typically lower on bank holidays. In Chapter 1 we discussed that the events appear to deviate much further from past values on Christmas Day and Boxing Day than on all other bank holidays. This suggests that a single seasonal component for Christmas Day and Boxing Day, and a single seasonal component for all other bank holidays, may be suitable.
Figure 2.3.1: A seasonal sub-series plot highlighting the weekday levels of the telecommunication event data. (Vertical axis: $y_t$; one panel per day of the week, Monday to Sunday.)
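The seasonal components for each season are estimated in the following way. Let the sets of indices be defined as
$$S_1 = \{t : t \text{ corresponds to Christmas Day, Boxing Day or substitute}\},$$
$$S_2 = \{t : t \notin S_1 \text{ and } t \text{ corresponds to a bank holiday}\},$$
$$S_3 = \{t : t \notin S_1 \cup S_2 \text{ and } t \text{ corresponds to a Monday}\},$$
$$\vdots$$
$$S_9 = \{t : t \notin S_1 \cup S_2 \text{ and } t \text{ corresponds to a Sunday}\}.$$
Here, a substitute is the bank holiday given in lieu when Christmas Day or Boxing Day falls on a Saturday or Sunday. Then, the estimates of the seasonal components corresponding to $S_i$ are calculated as
$$\hat{S}_i = \frac{1}{|S_i|} \sum_{t \in S_i} Y_t.$$
For a time point $t \in S_i$ we write $\hat{S}_t = \hat{S}_i$ for its estimated seasonal component.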
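The seasonal averaging step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the collaborator's implementation: the function names `season_label` and `seasonal_estimates` are hypothetical, the holiday dates are supplied by hand, and the toy series is invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def season_label(day, christmas_days, other_holidays):
    """Map a date to one of the nine seasons S1, ..., S9 (returned as 1..9)."""
    if day in christmas_days:      # S1: Christmas Day, Boxing Day or substitute
        return 1
    if day in other_holidays:      # S2: all other bank holidays
        return 2
    return 3 + day.weekday()       # S3 (Monday) through S9 (Sunday)


def seasonal_estimates(days, y, christmas_days, other_holidays):
    """Return (mean of y within each season, seasonal estimate for each day)."""
    labels = [season_label(d, christmas_days, other_holidays) for d in days]
    groups = defaultdict(list)
    for label, value in zip(labels, y):
        groups[label].append(value)
    means = {label: sum(values) / len(values) for label, values in groups.items()}
    s_hat = [means[label] for label in labels]
    return means, s_hat


# Toy data (invented): two weeks of daily counts starting Monday 2018-12-17.
days = [date(2018, 12, 17) + timedelta(days=k) for k in range(14)]
y = [10.0] * 14
y[8] = 2.0   # 2018-12-25, Christmas Day
y[9] = 4.0   # 2018-12-26, Boxing Day
christmas = {date(2018, 12, 25), date(2018, 12, 26)}
means, s_hat = seasonal_estimates(days, y, christmas, set())
```

Because the holiday checks come first, a Christmas Day that falls on a Tuesday contributes to the first season only and is excluded from the Tuesday average, matching the definition of the index sets.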
The next step is to estimate the trend component. When there are long-term increases or decreases in a time series, we say that the time series exhibits trend. Our industrial partner estimates the trend component by applying a 365 day centered moving average to the de-seasonalised data as follows,
$$\hat{T}_t = \frac{1}{\min\{t + 183, T\} - \max\{1, t - 183\}} \sum_{s = \max\{1, t - 183\}}^{\min\{T, t + 183\}} \left( Y_s - \hat{S}_s \right).$$
Note that for $t \in [1, 183]$ and $t \in [T - 182, T]$ the averaging window used for $\hat{T}_t$ is truncated at the ends of the series and so is not symmetric about $t$.
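A short Python sketch of this truncated moving average is given below. It is illustrative only: the function name `centred_moving_average_trend` and its `half_window` parameter are hypothetical, and the normaliser is taken literally from the expression in the text (upper summation limit minus lower summation limit), which is an assumption about how that expression is meant to be read.

```python
def centred_moving_average_trend(y, s_hat, half_window=183):
    """Trend estimate from de-seasonalised data, using 1-based time index t.

    Assumes len(y) >= 2 so that the normaliser is never zero.
    """
    T = len(y)
    deseasonalised = [y_t - s_t for y_t, s_t in zip(y, s_hat)]
    trend = []
    for t in range(1, T + 1):
        lo = max(1, t - half_window)   # window is cut off at the start of the series
        hi = min(T, t + half_window)   # and at the end of the series
        window_sum = sum(deseasonalised[lo - 1:hi])   # terms s = lo, ..., hi
        # Normaliser as written in the text: hi - lo. This is one less than the
        # number of terms in the window; a plain mean would use hi - lo + 1.
        trend.append(window_sum / (hi - lo))
    return trend
```

Near either end of the series `lo` or `hi` is clipped, so fewer observations enter the sum on one side of `t` than on the other.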
Once the trend and seasonal components have been estimated, they can be removed and an estimate of the remainder component obtained as
$$\hat{R}_t = Y_t - \hat{S}_t - \hat{T}_t.$$
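We let $\tilde{Y}_t = \hat{R}_t$ denote our pre-processed response data. It is possible that the predictor variables also have long-term increases or decreases. Therefore, a centered moving average, computed over the same window as the trend estimate, is subtracted from each predictor variable to obtain the pre-processed predictor variables,
$$\tilde{X}_{t,p} = X_{t,p} - \frac{1}{\min\{t + 183, T\} - \max\{1, t - 183\}} \sum_{s = \max\{1, t - 183\}}^{\min\{T, t + 183\}} X_{s,p}.$$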
Relating back to the model given in (2.3.1), we now seek a model of the form
$$\tilde{Y}_t = \tilde{f}(\tilde{X}_1, \ldots, \tilde{X}_P) + \tilde{\eta}, \tag{2.3.2}$$
for some error $\tilde{\eta}$ and some function $\tilde{f}$. Our industrial partner assumes that $\tilde{f}$ is a linear function in the predictors.
Following the pre-processing of data, our industrial collaborator applies a stepwise search algorithm to select predictors. A number of undesirable properties of the resulting models are often observed, some of which we have already discussed. Typically, combinations of highly correlated predictors are selected for the models, where the coefficients of the associated predictors have conflicting signs. This leads one to question the validity of such a model, as one would expect strongly correlated predictors to affect the response variable in either a positive or a negative way, but not in opposing ways. Hastie et al. (2008) note that this problem is often observed with least squares estimates and that it motivates the application of ridge regression (Hoerl and Kennard, 1970).
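The sign conflict can be reproduced on a toy example. The sketch below is an illustration with invented numbers, not the telecommunications data: two centred predictors have correlation of about 0.99, and the response has been constructed so that its noise happens to align with the small difference between them. The helper `fit_two_predictors` is hypothetical and solves the two-predictor normal equations in closed form, with an optional ridge penalty added to the diagonal.

```python
def fit_two_predictors(x1, x2, y, ridge_penalty=0.0):
    """Solve (X'X + penalty * I) b = X'y for two centred predictors, no intercept."""
    a11 = sum(u * u for u in x1) + ridge_penalty
    a22 = sum(v * v for v in x2) + ridge_penalty
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * w for u, w in zip(x1, y))
    c2 = sum(v * w for v, w in zip(x2, y))
    det = a11 * a22 - a12 * a12
    # Inverse of a symmetric 2x2 matrix applied to the right-hand side (c1, c2).
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Toy data (invented): x2 is x1 plus a small perturbation.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.8, -1.2, 0.0, 0.8, 2.2]
y = [-5.8, -0.2, 0.0, 3.8, 2.2]

b_ols = fit_two_predictors(x1, x2, y)                        # approximately (11.0, -9.0)
b_ridge = fit_two_predictors(x1, x2, y, ridge_penalty=1.0)   # approximately (1.65, 0.18)
```

In this example the least squares coefficients are large and of opposite sign, while a modest ridge penalty shrinks them to small values of the same sign. That outcome is specific to these numbers; ridge regression does not guarantee agreeing signs in general.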
The stepwise algorithm used by our industrial collaborator is implemented using the stats::step function in R (R Core Team, 2018). This procedure iteratively adds the predictor that produces the model with the lowest AIC, until the AIC of the model cannot be reduced further by adding another predictor.
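The forward search described above can be sketched as follows. This is a plain-Python illustration of the idea and not a reimplementation of stats::step: the function names are hypothetical, the AIC is taken (as an assumption) to be $n \log(\mathrm{RSS}/n) + 2k$ for a Gaussian linear model with $k$ fitted coefficients, and the toy data are invented.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def aic_of_subset(columns, y, subset):
    """n*log(RSS/n) + 2*k for a least squares fit of an intercept plus `subset`."""
    n = len(y)
    design = [[1.0] + [columns[j][i] for j in subset] for i in range(n)]
    k = len(subset) + 1
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y_i for row, y_i in zip(design, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    rss = sum((y_i - sum(b_j * x_j for b_j, x_j in zip(beta, row))) ** 2
              for row, y_i in zip(design, y))
    return n * math.log(rss / n) + 2 * k   # assumes rss > 0 (no perfect fit)


def forward_stepwise(columns, y):
    """Repeatedly add the predictor giving the lowest AIC; stop when none lowers it."""
    selected = []
    best_aic = aic_of_subset(columns, y, selected)
    remaining = set(range(len(columns)))
    while remaining:
        scores = {j: aic_of_subset(columns, y, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_aic:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_aic = scores[j_best]
    return selected, best_aic


# Toy data (invented): y depends on the first predictor only.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
y = [2.1, 3.9, 6.1, 7.9, 9.9, 12.1, 13.9, 16.1]
selected, best_aic = forward_stepwise([x0, x1], y)
```

Each pass refits every candidate model, so the cost grows with both the number of predictors and the number of steps taken. The search is greedy: a predictor added early is never removed in this forward-only sketch.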
In this chapter we have introduced linear regression models and a number of methods used to estimate them. In particular, we focused on procedures that can produce sparse models, in which a number of the regression coefficients are estimated to be zero. Often these procedures use a tuning parameter, and we discussed methods that can be used to determine it. We introduced literature for predictor selection in multi-response models and described the procedure that our industrial collaborator uses to model telecommunications data. In the next chapter we describe the procedure that we have developed to model telecommunications data.
Semi-automated simultaneous predictor selection for Regression-SARIMA models: An application to telecommunications events
Abstract: Deciding which predictors to use plays an integral role in deriving statistical models in a
wide range of applications. Motivated by challenges of predicting events across a telecommunications network, we propose a semi-automated, joint model fitting procedure for linear regression models. Our approach can model and account for serial correlation in the regression residuals, produce sparse and interpretable models and can be used to jointly select models for a group of related response variables. We achieve this by fitting linear models under constraints on the number of non-zero coefficients using a generalisation of the Mixed Integer Quadratic Optimisation approach developed by Bertsimas and King (2016). Our approach can produce models with better predictive performance on the telecommunications data than methods currently used by industry.
This chapter is structured as follows. In Section 3.1 we start with an introduction to the industrial setting that motivated our methodology. In Section 3.2 we state our problem formally and review the existing literature for predictor selection in linear regression. We then discuss how to use the MIQO program presented by Bertsimas and King (2016) to develop a semi-automated modelling procedure. In Section 3.3 we introduce our MIQO program and extensions that can improve the performance of the models. Section 3.4 highlights the advantages of our approach over standard methods in the literature through a simulation study. We apply our approach to a motivating data application in Section 3.5 before concluding this chapter in Section 3.6.