5.2 Modelling
5.2.2 Model Stochastics
This subsection discusses the stochastic modelling of demand that underpins the construction of our two-stage model.
Before we discuss stochastic demand, we note that uncertainty surrounding supply remains at the tactical and aggregate allocation planning stages (due to absenteeism, holidays and variable efficiency levels affecting completion times). However, we omit any stochastic modelling of supply in this work, on three grounds:
1. We view demand uncertainty as the main source of the uncertainty that makes Tactical and Operational Planning challenging;
2. Our interests lie in how cross-training, a property of supply, is influenced by uncertainties in demand. This makes demand our primary input and supply our primary output;
3. Existing papers have found that cross-training alleviates the need to model anticipated worker absence (Easton, 2011).
Our goal is to find a training solution that provides a workforce well placed to cope with a range of demand outcomes. Here we consider the form that these demand outcomes should take. Training decisions influence our flexibility in meeting demand in every period of the planning horizon after the training action is taken. The performance of the resulting updated workforce should therefore be evaluated by aggregate allocation over time series realisations of demand, say of length $|T|$. To aid further discussion, we provide a brief aside on time series modelling.
Time Series Modelling
A traditional approach to time series modelling involves decomposing demand for skill $j \in J$ at time $t \in T$ into a cyclical contribution $C_{jt}$, a seasonal contribution $S_{jt}$, a trend component $T_{jt}$ and a random ‘noise’ component $\varepsilon_{jt}$ (Shumway and Stoffer, 2006). For example, demand might be summarised by the following additive model:

$$d_{jt} = C_{jt} + S_{jt} + T_{jt} + \varepsilon_{jt}. \qquad (5.2.1)$$
Note that demand measures $d_{jt}$ may be based on historical data, some forecast based
The deterministic cyclical, seasonal and trend components can be estimated using filters or parametric regression models. Assuming these components capture all of the serial dependence in the data, the remaining set $\{\varepsilon_{jt}\}_{t \in T}$ will have the properties of independent white noise: zero mean and, critically, independent and identically distributed (i.i.d.) values. Let $F_j(\cdot)$ denote the cumulative distribution function fitted to the data set $\{\varepsilon_{jt}\}_{t \in T}$.
To construct a univariate time series realisation $d'_j = (d'_{j1}, \dots, d'_{j|T|})$ for skill $j$, we sample $\varepsilon'_{jt}$ from the univariate distribution fitted to $\{\varepsilon_{jt}\}_{t \in T}$ and combine it with the deterministic components $C_{jt}$, $S_{jt}$ and $T_{jt}$. As the utilisation of a cross-trained workforce is affected by the cross-correlation between demand for different skills, we construct multivariate time series realisations (for all skills in $J$) by instead sampling $\varepsilon'_t = (\varepsilon'_{1t}, \dots, \varepsilon'_{|J|t})$ from the joint distribution of $\{(\varepsilon_{1t}, \dots, \varepsilon_{|J|t})\}_{t \in T}$.
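The difference between univariate and joint residual sampling can be made concrete with a minimal Python sketch. It assumes the residuals have already been extracted and takes the "fitted" distribution to be simply the empirical one (an assumption; a parametric fit could be used instead). The function names `additive_demand`, `sample_independent_residuals` and `sample_joint_residuals`, and the synthetic data, are hypothetical illustrations.

```python
import numpy as np

def additive_demand(cyclic, seasonal, trend, noise):
    """Additive model (5.2.1): d_jt = C_jt + S_jt + T_jt + eps_jt.
    Arrays broadcast to shape (|J|, number of periods)."""
    return cyclic + seasonal + trend + noise

def sample_independent_residuals(residuals, horizon, rng):
    """Univariate sampling: each skill's residual is drawn from its own empirical
    distribution, independently of the other skills (cross-correlation is lost)."""
    n_skills, n_obs = residuals.shape
    idx = rng.integers(0, n_obs, size=(n_skills, horizon))
    return np.take_along_axis(residuals, idx, axis=1)

def sample_joint_residuals(residuals, horizon, rng):
    """Multivariate sampling: whole residual vectors (eps_1t, ..., eps_|J|t) are drawn
    from their empirical joint distribution, preserving within-period dependence."""
    cols = rng.integers(0, residuals.shape[1], size=horizon)
    return residuals[:, cols]

rng = np.random.default_rng(0)
# Synthetic, hypothetical residuals for two strongly correlated skills.
common = rng.normal(size=500)
residuals = np.vstack([common + 0.3 * rng.normal(size=500),
                       common + 0.3 * rng.normal(size=500)])

joint = sample_joint_residuals(residuals, 2000, rng)
indep = sample_independent_residuals(residuals, 2000, rng)
corr_joint = np.corrcoef(joint)[0, 1]
corr_indep = np.corrcoef(indep)[0, 1]

# Add the sampled residuals to (here constant) deterministic components.
demand = additive_demand(np.array([[10.0], [20.0]]), 0.0, 0.0, joint)
```

Resampling whole columns keeps the high residual correlation between the two skills, whereas sampling each skill separately drives it towards zero, which is precisely the information a cross-training model would lose.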
Serial Dependence
Suppose that demand features no systematic source of serial dependence, so that $C_{jt}$, $S_{jt}$ and $T_{jt}$ do not feature in time series model (5.2.1). In this case, demand on one day is independent of demand on any other day, and assessing the value of a cross-training policy over a time series representing the planning horizon reduces to measuring demand coverage for each possible daily demand realisation in isolation. That is, when demand has no serial dependence, it is enough to solve multiple single-period aggregate allocation problems to measure the performance of a training solution.
A more likely case is that daily demand observations do have some underlying serial dependence. Considering daily demands in isolation would then discard this characteristic. This could lead our training model to underestimate the cross-correlation between skills (if demand streams share similar patterns over time, their cross-correlation will inevitably be higher) and hence to undervalue the benefits of cross-training. To capture such serial dependence, we need to perform aggregate allocations over time series realisations of demand that span the duration of the serial dependence. The performance of a training configuration is therefore measured over multiple days at once via a multi-period aggregate allocation.
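The effect of a shared serial pattern on cross-correlation can be seen in a small synthetic example. This Python sketch assumes two skills that follow the same hypothetical weekday profile (the numbers are invented) with independent residuals, and compares the correlation of the full demand series with that of the residuals left once the pattern is removed.

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = 52
# Hypothetical weekday profile shared by both skills (Monday..Sunday).
profile = np.array([12.0, 15.0, 14.0, 13.0, 18.0, 6.0, 4.0])
cycle = np.tile(profile, weeks)  # cyclic component repeated over the year

# Skill 2 follows a scaled version of the same pattern; residuals are independent.
d1 = cycle + rng.normal(0.0, 2.0, cycle.size)
d2 = 0.5 * cycle + rng.normal(0.0, 2.0, cycle.size)

# Correlation of the full series includes the shared serial pattern.
corr_series = np.corrcoef(d1, d2)[0, 1]

# Correlation of the residuals around each day's cyclic mean (daily variation only).
corr_resid = np.corrcoef(d1 - cycle, d2 - 0.5 * cycle)[0, 1]

print(f"series: {corr_series:.2f}, residuals: {corr_resid:.2f}")
```

Here the full series are clearly correlated while the residuals are not, so a model that sees only isolated daily variation would miss the co-movement induced by the common weekly pattern.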
To provide an example, suppose that demand for a skill $j$ has some weekly cyclic pattern but is otherwise stationary across the year, so that the following model is representative of its time series:

$$d_{jt} = C_{jt_7} + \varepsilon_{jt},$$

with weekly cyclic variation $C_{jt_7}$ measured by taking day-of-the-week averages over all full weeks in the planning horizon $T$. That is,

$$C_{jt_7} := \frac{1}{\lfloor |T|/7 \rfloor} \sum_{u=1}^{\lfloor |T|/7 \rfloor} d_{j,7(u-1)+t_7},$$

where $u$ is a week number index and $t_7 = t - 7(u-1)$ is an index on the day of the week.
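The day-of-the-week averages $C_{jt_7}$ can be computed directly from a daily series. This is a minimal Python sketch, assuming a single skill's demand is held in a 1-D array whose first entry is day $t_7 = 1$ of week $u = 1$; `weekly_cycle` and `weekly_residuals` are hypothetical names. Only the $\lfloor |T|/7 \rfloor$ full weeks enter the average, while residuals are returned for every period.

```python
import numpy as np

def weekly_cycle(d):
    """Day-of-week averages C_{j t7} over all full weeks of a 1-D series d.
    Entry k (0-based) averages d over the same weekday in every full week."""
    n_weeks = len(d) // 7  # floor(|T| / 7): a trailing partial week is ignored
    full = np.asarray(d[: 7 * n_weeks], dtype=float).reshape(n_weeks, 7)
    return full.mean(axis=0)

def weekly_residuals(d, cycle):
    """eps_jt = d_jt - C_{j t7}, where t7 is the weekday of period t."""
    d = np.asarray(d, dtype=float)
    return d - cycle[np.arange(len(d)) % 7]

# Two full weeks plus one extra day (15 periods).
d = np.array([1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 8, 9, 2], dtype=float)
cycle = weekly_cycle(d)           # [2, 3, 4, 5, 6, 7, 8]
eps = weekly_residuals(d, cycle)  # -1 throughout week 1, +1 throughout week 2, 0 on day 15
```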
This model says there is no relationship between the demand level from one week to the next, meaning we can capture the randomness of demand over a whole year with a collection of week-long scenarios. Assessing the performance of a training scheme over the whole year then simply requires aggregate allocation over week-long time series scenarios. This follows from our modelling assumption that carryover of incomplete work need not be accounted for in this particular model. For brevity of argument and relevance to our case study application, we continue with this weekday variation case. Note, however, that the further discussion and, ultimately, the stochastic model are not limited to this case. For example, dependence between demand at the weekly level but independence at the monthly level would require scenarios of $|T| = 28$ days to capture variation over the year. In general, time series scenarios for demand need not run the full length of the planning horizon: if we observe independence in the variation around neighbouring cyclical subsets of the planning horizon, we can assess performance over such subsets independently. Were the carryover of incomplete work through time included in our demand count $d_{jt}$ in period $t$, there would clearly be autocorrelation in demand lasting the full length of the planning horizon. Without this non-carryover assumption, we would therefore need to test training solutions against aggregate allocations over time series lasting the full duration of the planning horizon. It is in our interests to limit the length of the time series making up the demand scenarios so that the scale of the resulting stochastic program remains manageable.
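One simple way to picture the "collection of sub-horizon scenarios" view is to cut a series into non-overlapping blocks whose length matches the extent of the serial dependence (7 days for the weekday case, 28 for the monthly example). This Python sketch assumes a (skills × periods) array, such as a year of historical demands or residuals; `block_scenarios` is a hypothetical helper, and dropping a trailing partial block is a design choice rather than part of the model.

```python
import numpy as np

def block_scenarios(series, block_len):
    """Split a (|J|, n) series into non-overlapping scenarios of block_len periods.
    Returns shape (n_blocks, |J|, block_len); a trailing partial block is dropped."""
    series = np.asarray(series)
    n_skills, n = series.shape
    n_blocks = n // block_len
    trimmed = series[:, : n_blocks * block_len]
    # Row-major reshape keeps consecutive periods together within each block.
    return trimmed.reshape(n_skills, n_blocks, block_len).transpose(1, 0, 2)

series = np.arange(60).reshape(2, 30)  # 2 skills, 30 days
blocks = block_scenarios(series, 7)    # 4 full weeks; days 29 and 30 are dropped
```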
The non-carryover assumption is of further value when searching for solutions to the resulting stochastic program. By rendering the second-stage sub-problem separable by period and scenario, it creates further opportunities to reduce computation time through parallelisation.
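Because the second stage separates by period and scenario, each (scenario, period) pair can be handed to a worker independently and the results combined afterwards. The Python sketch below assumes a hypothetical single-period cost, `period_subproblem`, which simply counts unmet demand per skill against a fixed capacity and ignores cross-skill allocation; in the actual model this call would be a single-period aggregate allocation solve. A thread pool is used for simplicity; for CPU-bound solver calls a process pool or the solver's own parallelism may be more appropriate.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

import numpy as np

def period_subproblem(capacity, demand_t):
    """Hypothetical stand-in for a single-period aggregate allocation:
    total unmet demand when skill j can cover at most capacity[j] units."""
    return float(np.maximum(demand_t - capacity, 0.0).sum())

def expected_second_stage(capacity, scenarios, probs, workers=4):
    """scenarios has shape (n_scenarios, |J|, periods per scenario).
    With no carryover, every (scenario, period) sub-problem is independent."""
    n_scen, _, n_per = scenarios.shape
    pairs = list(product(range(n_scen), range(n_per)))

    def cost(pair):
        s, t = pair
        return period_subproblem(capacity, scenarios[s, :, t])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(cost, pairs))

    # Sum period costs within each scenario, then weight by scenario probability.
    per_scenario = np.zeros(n_scen)
    for (s, _), c in zip(pairs, costs):
        per_scenario[s] += c
    return float(np.dot(probs, per_scenario))

capacity = np.array([3.0, 3.0])
scenarios = np.array([[[5.0, 1.0], [2.0, 3.0]],
                      [[0.0, 4.0], [6.0, 2.0]]])
value = expected_second_stage(capacity, scenarios, np.array([0.5, 0.5]))  # 3.0
```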
Having established the nature of the stochastics underlying the training problem, we now provide a method for generating the time series scenarios required.
Scenario Generation Process
Let us assume that we have a continuous $|J|$-dimensional multivariate distribution $F(\cdot)$ capturing the joint residual variation $(\varepsilon_1, \dots, \varepsilon_{|J|})$ in demand for skills $j \in J$. Ideally, we would base training decisions on an expectation (of performance in aggregate allocation) taken over this continuous distribution. In reality, calculating an expectation over a continuous distribution of uncertain parameters, which forms the second-stage sub-problem of a stochastic program (King and Wallace, 2012), renders the majority of stochastic programs intractable. To find a solution to such a model, we must find a discrete version of this probability distribution; that is, we must approximate the distribution with a finite set of scenarios.
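In generic two-stage notation, assumed here purely for illustration ($g(x)$ a first-stage training cost and $Q(x, \varepsilon)$ the optimal second-stage aggregate allocation cost given residual outcome $\varepsilon$), the discretisation replaces the expectation by a probability-weighted sum over a finite scenario set $S$:

```latex
\min_{x}\; g(x) + \mathbb{E}_{\varepsilon \sim F}\big[\,Q(x, \varepsilon)\,\big]
\;\approx\;
\min_{x}\; g(x) + \sum_{s \in S} p_s \, Q(x, \varepsilon^{s}),
\qquad p_s \ge 0, \quad \sum_{s \in S} p_s = 1.
```

With $|S|$ scenarios, the second stage becomes $|S|$ deterministic copies of the allocation problem, linked only through the first-stage decision $x$.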
The discretisation process is not trivial; indeed, it merits its own body of research under the term scenario generation. The procedure we use from this literature draws on the theory of copulas. Note that, by Sklar’s theorem (Sklar, 1996), the joint distribution $F(\cdot)$ can be fully specified using a copula dependence function $C$ and the marginal distribution functions $F_j$ as follows:
$$F(x_1, \dots, x_{|J|}) = C\{F_1(x_1), \dots, F_{|J|}(x_{|J|})\}. \qquad (5.2.2)$$
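Sklar's theorem separates the dependence structure from the margins. A common empirical illustration replaces each margin by its empirical CDF (ranks scaled into $(0,1)$), giving so-called pseudo-observations that estimate the copula $C$. The Python sketch below illustrates that idea on synthetic data; it is not the procedure of Kaut (2011), and `pseudo_observations` is a hypothetical name. It also checks that strictly increasing transformations of the margins leave the pseudo-observations, and hence the copula, unchanged.

```python
import numpy as np

def pseudo_observations(x):
    """Rows of x are skills, columns are periods. Each row is mapped through its
    empirical CDF, u = rank / (n + 1), giving approximately uniform margins."""
    x = np.asarray(x, dtype=float)
    n = x.shape[1]
    ranks = x.argsort(axis=1).argsort(axis=1) + 1  # ranks 1..n within each row
    return ranks / (n + 1)

rng = np.random.default_rng(3)
z = rng.normal(size=(2, 300))
eps = np.vstack([z[0], 0.7 * z[0] + 0.3 * z[1]])  # two dependent residual series

u = pseudo_observations(eps)
# Strictly increasing changes of margin: exp for skill 1, cube for skill 2.
u_transformed = pseudo_observations(np.vstack([np.exp(eps[0]), eps[1] ** 3]))
```

Because `u` is unaffected by the change of margins, the dependence captured by $C$ in (5.2.2) does not depend on the marginal distributions $F_j$, which is what allows dependence and margins to be sampled separately.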
Generating time series scenarios for multivariate demand can then be broken down into the following process. For each period $t \in T$:
1. Sample $(u_1, \dots, u_{|J|})$ from copula $C$ (on uniform margins) using the copula-based scenario generation method of Kaut (2011);
2. Transform the resulting samples to the correct scale:

$$\varepsilon^s_{jt} = F_j^{-1}(u_j),$$

where $F_j^{-1}(\cdot)$ is the inverse of the marginal cumulative distribution function $F_j(\cdot)$ fitted to $\{\varepsilon_{jt}\}_{t \in T}$. This gives a multivariate scenario $(\varepsilon^s_{1t}, \dots, \varepsilon^s_{|J|t})$ for random variation in period $t$;
3. Add the resulting scenario $\varepsilon^s_{jt}$ onto the cyclic component $C_{jt_7}$ (and the seasonal and trend components, if they exist) to obtain the multivariate demand sample $d^s_t = (d^s_{1t}, \dots, d^s_{|J|t})$ for period $t$.
Concatenating the values $d^s_t$ by time index $t$, we obtain a multivariate time series scenario with the desired variance, cyclic serial dependence and cross-correlation properties. For an introduction to multivariate dependence sampling using copulas, see Nelsen (2007).
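The three steps can be sketched in Python as follows. This is not the method of Kaut (2011). As a simpler, swapped-in stand-in for step 1, it samples from a Gaussian copula whose correlation matrix is set from the sample Spearman rank correlations via the Gaussian-copula relation $\rho = 2\sin(\pi\rho_S/6)$. For step 2 it takes $F_j$ to be the empirical distribution of skill $j$'s residuals. The function names and synthetic inputs are hypothetical.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)

def norm_cdf(z):
    """Standard normal CDF, Phi(z), applied elementwise."""
    return 0.5 * (1.0 + _erf(z / sqrt(2.0)))

def spearman_matrix(x):
    """Spearman rank correlation between the rows of x."""
    ranks = x.argsort(axis=1).argsort(axis=1).astype(float)
    return np.corrcoef(ranks)

def generate_scenario(residuals, cycle, horizon, rng):
    """One multivariate demand scenario of `horizon` periods.
    residuals: (|J|, n) fitted residuals; cycle: (|J|, 7) day-of-week components."""
    n_skills = residuals.shape[0]
    # Step 1 (stand-in): Gaussian copula matched to the sample Spearman correlations.
    rho = 2.0 * np.sin(np.pi * spearman_matrix(residuals) / 6.0)
    chol = np.linalg.cholesky(rho)  # assumes rho is positive definite
    u = norm_cdf(chol @ rng.standard_normal((n_skills, horizon)))
    # Step 2: eps^s_jt = F_j^{-1}(u_jt) via empirical quantiles of skill j's residuals.
    eps = np.vstack([np.quantile(residuals[j], u[j]) for j in range(n_skills)])
    # Step 3: add the cyclic component C_{j t7} (period 0 is weekday 1).
    return cycle[:, np.arange(horizon) % 7] + eps

rng = np.random.default_rng(42)
base = rng.normal(size=400)
# Synthetic residuals: one roughly normal, one right-skewed, positively dependent.
residuals = np.vstack([base + 0.5 * rng.normal(size=400),
                       np.exp(base + 0.5 * rng.normal(size=400)) - 1.0])
cycle = np.array([[10.0, 12.0, 11.0, 13.0, 15.0, 5.0, 3.0],
                  [4.0, 5.0, 5.0, 6.0, 8.0, 2.0, 1.0]])
scenario = generate_scenario(residuals, cycle, 28, rng)  # four weeks, shape (2, 28)
```

Using empirical quantiles keeps each skill's skewness, while the copula step alone controls the cross-skill dependence; a multivariate Gaussian on the original scale could not reproduce the skewed second margin.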
Note that the copula-based scenario generation method of Kaut (2011) is favoured for its flexibility in capturing non-elliptical distributions. There may exist more efficient or otherwise more suitable scenario generation methods for this model but, given that the primary focus of this work lies in the modelling process rather than in the field of scenario generation, we proceed with these methods on the basis that they fulfil the requirements of an effective scenario generation technique. Those requirements, as discussed by King and Wallace (2012), are:
• In-sample stability: a test of the robustness of the discretisation procedure, which ensures that the optimal objective function value is roughly the same for any scenario set generated by the (random) scenario generation procedure; and
• Out-of-sample stability: ensures that the true objective function values corresponding to solutions obtained from different scenario sets are roughly equal.
Let $S_p$ and $S_q$ represent two scenario sets resulting from two different runs of a scenario generation procedure. Then let $f(x; S_p)$ denote the objective function (in terms of decision variable $x$) associated with scenario set $S_p$, and let $\hat{x}_p$ denote the optimal solution of the corresponding minimisation problem $\min_x f(x; S_p)$. With $\hat{x}_q$ similarly defined, if the optimal objective function values are (approximately) the same in all cases, i.e.

$$f(\hat{x}_p; S_p) \approx f(\hat{x}_q; S_q),$$

then we have in-sample stability.
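To test out-of-sample stability, we would ideally verify that

$$f(\hat{x}_p; \xi) \approx f(\hat{x}_q; \xi),$$

where $\xi$ denotes the true distribution of the uncertain parameters.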
Evaluating $f(\hat{x}_p; \xi)$ equates to fixing the first-stage solution and solving a large number
We will therefore perform a weaker out-of-sample stability test here:

$$f(\hat{x}_p; S_q) \approx f(\hat{x}_q; S_p).$$
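Both tests can be illustrated on a deliberately simple stochastic program. The Python sketch below assumes a hypothetical single-skill capacity problem, $f(x; S) = c\,x + p \cdot \frac{1}{|S|}\sum_{s \in S} \max(d^s - x, 0)$, with invented costs $c$ and $p$ and gamma-distributed demand standing in for the scenario generator; it is not the training model of this chapter. It draws two scenario sets, solves each, and reports the in-sample pair $f(\hat{x}_p; S_p), f(\hat{x}_q; S_q)$ and the weaker out-of-sample pair $f(\hat{x}_p; S_q), f(\hat{x}_q; S_p)$.

```python
import numpy as np

C_UNIT, P_SHORT = 1.0, 4.0  # hypothetical unit costs: capacity and unmet demand

def f(x, scenarios):
    """Toy objective f(x; S): capacity cost plus average shortfall cost over S."""
    return C_UNIT * x + P_SHORT * np.mean(np.maximum(scenarios - x, 0.0))

def solve(scenarios):
    """x_hat = argmin_x f(x; S). f is convex and piecewise linear with breakpoints
    at the scenario values, so the minimum is attained at one of them."""
    values = [f(x, scenarios) for x in scenarios]
    return float(scenarios[int(np.argmin(values))])

def stability_report(generate, n_scen, rng):
    """Solve on two independently generated scenario sets and cross-evaluate."""
    s_p, s_q = generate(n_scen, rng), generate(n_scen, rng)
    x_p, x_q = solve(s_p), solve(s_q)
    in_sample = (f(x_p, s_p), f(x_q, s_q))
    out_of_sample = (f(x_p, s_q), f(x_q, s_p))  # the weaker cross-evaluation test
    return x_p, x_q, in_sample, out_of_sample

rng = np.random.default_rng(7)
gen = lambda n, r: r.gamma(shape=5.0, scale=2.0, size=n)  # stand-in generator
x_p, x_q, ins, oos = stability_report(gen, 2000, rng)
```

If either pair differed materially, that would suggest increasing the number of scenarios or revisiting the generation method before trusting the resulting solutions.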