Forecasting Future Arrival Demand - Statistical Forecasting and Queueing Models for Virtual Com

CHAPTER 2: Statistical Forecasting and Queueing Models for Virtual Com-

2.5 Forecasting Future Arrival Demand

To implement the dynamic SAP in practice, we need to forecast future arrival demand, denoted as λn,i in the above sections. Here we introduce two forecasting methods that will be compared in our numerical study section. The baseline method is the moving average (MA) method, and the more sophisticated one is the singular value decomposition (SVD) method introduced by Shen and Huang (2008).

We introduce the common notation used in this section. LetT be the number of days. Each day is divided into P periods. For example, we use P = 24 hourly periods in our numerical experiments. Let xt,i be the number of arrivals during the i-th period of day t, t= 1,· · ·, T,

i= 1,· · ·, P. The vectorxt= [xt1,· · · , xtP] records the hourly arrival volumes on dayt. Given the historical data x1,· · · , xT, the following sections discuss the two methods to forecast the next-day demandsxT+1.

2.5.1 Moving Average Forecasting

Let w be the rolling horizon, which is the number of historical days used for forecasting. Specifically, considering the daily patterns in our data, we forecast the arrival demand over each ith period of day T+ 1 as the average of the same time periods of the previous w days. That is ˆ x(1)_T₊₁_,i= 1 w T X t=T−w+1 xt,i, i= 1,· · ·, P, T > w.

To accommodate the variation in the demand and achieve a higher service quality, we also propose an inflated estimate using the sample standard deviation:

ˆ x(2)_T₊₁_,i= ˆx(1)_T₊₁_,i+ 1.96 v u u t 1 w−1 T X t=T−w+1 (xt,i−xˆ(1)T+1,i)2.

The above estimate will be an approximation of the upper bound of the 95% prediction interval of the mean arrival rate forecast.

One important issue in the MA method is to decide the size of the rolling horizonw. The larger the window, the less influence the short term daily fluctuation will have, and more clearly we can see the long term effects. However, a largerwwould make the MA method less sensitive to the non-stationary phenomenon. Our numerical studies usew= 30 days.

2.5.2 SVD Forecasting

Here we apply the inter-day forecasting method introduced in Shen and Huang (2008). To be consistent with the MA method, we also usewhistorical days. LetX= [x>_T₋_w₊₁,· · · , x>_T]> be the historical data matrix used in the forecasting of arrivals on dayT + 1 (here the sign>

stands for transpose). The singular value decomposition (SVD) of theX can be expressed as

X=U SV>, whereU records the row information (the daily (inter-day) pattern of the original

X), andV records the column information (the time of day (intra-day) pattern of the arrivals). We write the three decomposition matrices in the form of column vectors and diagonal elements:

U = (u1,· · · , uP),V = (v1,· · ·, vP), S=diag(s1,· · ·, sP), where s1 ≥s2 ≥ · · · ≥sP. Then it follows

xt= (s1ut,1)v>1 +· · ·+ (sPut,P)v>P.

To summarize the inter-day features while reducing the dimension of data profiles, we extract the first K singular vectors from V, which is similar to the techniques used in the principle component analysis. By setting skut,k =βt,k, we get the following approximation

xt∼= (s1ut,1)v1>+· · ·+ (sKut,K)vK> =βt,1v1>+· · ·+βt,Kv>K.

The last step is to forecast arrivals on day T + 1. Since there are obvious day of week patterns shown in the data, we include zi = 1,· · · ,7 as a categorical covariate controlling for the day of week effects in the AR(1) time series model to obtainβT+1,k, 1≤k≤K:

βT+1,k=ak(zT+1) +bkβT ,k+T+1,k. (2.17)

Next, the arrival vector on day T+ 1 is modeled as

xT+1=βT+1,1v1>+· · ·+βT+1,Kv>K+T+1, (2.18)

whose mean can be estimated by

x(3)_T₊₁ =βT+1,1v1>+· · ·+βT+1,KvK>.

We use the first semester arrival rates of application 1, 3 and 4 as test data to obtain the scree plots (Shen and Huang, 2008) in Figure 2.6, and find that K = 2 or 3 gives good

forecasting results.

Figure 2.6: Scree Plots

Because the SVD forecasting approach assumes no probabilistic models, Shen and Huang (2008) propose a nonparametric bootstrap method to derive the prediction interval of the forecast. Using our data, we apply the same method by first bootstraping the errors from time series model (2.17), and obtain the series {βˆ_Tb₊₁_,₁,· · · ,βˆb_T₊₁_,K}, 1 ≤ b ≤B, where B is the number of bootstrap samples. Then by bootstrapping the errors in the forecasting model (2.18), we obtain B forecasts of xT+1. For an upper bound of the 95% prediction interval,

we use the end point of the 97.5% empirical percentiles of {x1_T₊₁,· · ·, xB_T₊₁}, and denote that estimate by ˆx(4)_T₊₁.

Note that the SVD method involves regression on the decomposed data matrix, which is not efficient when the historical arrival volumes are close to zero. This happens to be the case for applications ranked 100 and lower in terms of total requests, whose hourly arrival rates are usually less than one. We therefore pool the lower ranked applications into few groups, and forecast the aggregated arrival rates within each group. We then split the forecasted group total arrival rates using the Bernoulli splitting rule, and obtain the hourly arrival rates for each individual application.

In the rest of the chapter, we useλ(_n,ij) to denote the forecasted arrival rate of applicationn

over the ith period using the jth forecasting method illustrated above (j = 1,2,3,4 as in the notations ˆx(_Tj₊₁) ).

In document Yu_unc_0153D_17443.pdf (Page 38-42)