CHAPTER 4 : Data-Driven Robust Resource Allocation
4.3 Constructing Uncertainty Sets
4.3.3 Uncertainty Modeling
In this section, we briefly review the theory for constructing uncertainty demand models from the spatial-temporal dataset considered in this work. Since we do not assume that the marginal distributions of the elements of the vector $r_c$ are independent of each other, we select two approaches from the literature [12, 27, 79] that make no assumptions about the true distribution $P^*(r_c)$. Both approaches first perform a hypothesis test on $P^*(r_c)$ for the given dataset and a required probabilistic guarantee level, and then construct an uncertainty set based on that hypothesis test.
Uncertainty demand sets built from marginal samples
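One intuitive way to describe a random vector is to define a range for each of its elements. For instance, David and Nagaraja [27] considered the following multiple hypothesis test with given thresholds $\bar{q}_{i,0}, \underline{q}_{i,0} \in \mathbb{R}$, $i = 1, 2, \ldots, \tau n$:
$$H_{0,i}:\ \inf\Big\{t : P(r_{c,i} \le t) \ge 1 - \frac{\epsilon}{\tau n}\Big\} \ge \bar{q}_{i,0} \quad \text{and} \quad \inf\Big\{t : P(-r_{c,i} \le t) \ge 1 - \frac{\epsilon}{\tau n}\Big\} \ge -\underline{q}_{i,0}. \tag{4.16}$$
Each left-hand side is the value at risk of $r_{c,i}$ (respectively $-r_{c,i}$) at level $\epsilon/(\tau n)$, so (4.16) is a hypothesis about the upper and lower $\epsilon/(\tau n)$ tail quantiles of each component. We divide $\epsilon$ by $\tau n$ because $r_c$ is a $\tau n$-dimensional random vector, and the hypothesis tests for all components $r_{c,i}$ must hold simultaneously to provide the probabilistic guarantee described in (4.15).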
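Assume that we have $N$ random samples of each component $r_{c,i}$ of $r_c$, sorted in increasing order as $r^{(1)}_{c,i} \le r^{(2)}_{c,i} \le \cdots \le r^{(N)}_{c,i}$ regardless of the original sample order. This order also defines the order statistics of the estimate $\hat{r}_{c,i}$, i.e., $\hat{r}^{(1)}_{c,i} = r^{(1)}_{c,i}, \ldots, \hat{r}^{(N)}_{c,i} = r^{(N)}_{c,i}$. We define the index $s$ by
$$s = \min\Big\{k \in \mathbb{N} : \sum_{j=k}^{N} \binom{N}{j} \Big(\frac{\epsilon}{\tau n}\Big)^{N-j} \Big(1 - \frac{\epsilon}{\tau n}\Big)^{j} \le \frac{\alpha_h}{2\tau n}\Big\}, \tag{4.17}$$
and let $s = N + 1$ if the corresponding set is empty. The test $H_{0,i}$ is rejected if $\hat{r}^{(s)}_{c,i} < \bar{q}_{i,0}$ or $\hat{r}^{(N-s+1)}_{c,i} > \underline{q}_{i,0}$.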
To construct an uncertainty set, we need a hypothesis test that is accepted. Hence, we set $\bar{q}_{i,0} = \hat{r}^{(s)}_{c,i}$ and $\underline{q}_{i,0} = \hat{r}^{(N-s+1)}_{c,i}$ using the order statistics of the sampled dataset, so that $H_{0,i}$ is always accepted. Based on the range hypothesis test (4.16), we then apply the following uncertainty set in this work.

Proposition 1 ([12], [27]) If $s$ defined by (4.17) satisfies $N - s + 1 < s$, then, with probability at least $1 - \alpha_h$ over the sample, the set
$$U^{M}(r_c) = \Big\{ r_c \in \mathbb{R}^{\tau n} : \hat{r}^{(N-s+1)}_{c,i} \le r_{c,i} \le \hat{r}^{(s)}_{c,i},\ i = 1, 2, \ldots, \tau n \Big\} \tag{4.18}$$
implies a probabilistic guarantee for $P^*(r_c)$ at level $\epsilon$.
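The construction in (4.17) and (4.18) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the samples of $r_c$ are assumed to be stacked as rows of an $N \times \tau n$ array, and the function names are hypothetical. The sketch relies on the fact that the sum in (4.17) is the upper tail $P(X \ge k)$ of a binomial random variable $X \sim \mathrm{Bin}(N, 1 - \epsilon/(\tau n))$, and it treats $s = N + 1$ (an unbounded box) as an error.

```python
import numpy as np
from scipy.stats import binom


def order_statistic_index(N: int, eps: float, alpha_h: float, d: int) -> int:
    """Index s of (4.17) for a d-dimensional vector (d = tau * n).

    The sum in (4.17) equals P(X >= k) for X ~ Binomial(N, 1 - eps/d),
    so s is the smallest k whose binomial upper tail is <= alpha_h / (2d).
    Returns N + 1 when no k in {1, ..., N} qualifies.
    """
    ks = np.arange(1, N + 1)
    tails = binom.sf(ks - 1, N, 1.0 - eps / d)  # sf(k - 1) = P(X >= k)
    hits = np.nonzero(tails <= alpha_h / (2 * d))[0]
    return int(ks[hits[0]]) if hits.size else N + 1


def box_uncertainty_set(samples: np.ndarray, eps: float, alpha_h: float):
    """Lower/upper bounds of U^M in (4.18) from an (N, d) array of samples of r_c."""
    N, d = samples.shape
    s = order_statistic_index(N, eps, alpha_h, d)
    # Hypothetical policy for this sketch: refuse to return an unbounded box.
    if s > N or not (N - s + 1 < s):
        raise ValueError("too few samples: need s <= N and N - s + 1 < s")
    ordered = np.sort(samples, axis=0)  # column-wise order statistics
    lower = ordered[N - s]              # r_hat^{(N-s+1)} (1-based) -> row N - s
    upper = ordered[s - 1]              # r_hat^{(s)} (1-based)     -> row s - 1
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic nonnegative "demand" data, for illustration only.
    demand = rng.gamma(shape=2.0, scale=5.0, size=(1000, 4))
    lo, hi = box_uncertainty_set(demand, eps=0.1, alpha_h=0.2)
    print("lower:", lo)
    print("upper:", hi)
```

Sorting each column independently is exactly what makes (4.18) a box: the bounds of one component never depend on the values of the others.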
The hypothesis (4.16) is tested for each component $r_{c,i}$ separately, and the uncertainty demand model given by Proposition 1 likewise describes the range of each $r_{c,i}$, $i = 1, 2, \ldots, \tau n$, separately. We do not assume that the marginal distributions of $P^*$ are independent. Their correlations are reflected in the box uncertainty set only indirectly, in the sense that changing the values of $n$ and $\tau$ results in a different index value $s$ in (4.17), and hence in different order statistics $\hat{r}^{(N-s+1)}_{c,i}$ and $\hat{r}^{(s)}_{c,i}$. However, the box uncertainty set (4.18) does not directly describe the spatial-temporal correlations among the components of $r_c$.
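The dependence of $s$ on the dimension $\tau n$ can be made concrete. Because the binomial tail in (4.17) is nonincreasing in $k$, a finite index $s \le N$ exists exactly when the $k = N$ term already satisfies the bound, i.e., $(1 - \epsilon/(\tau n))^N \le \alpha_h/(2\tau n)$. This gives the smallest sample size $N_{\min} = \lceil \ln(\alpha_h/(2\tau n)) / \ln(1 - \epsilon/(\tau n)) \rceil$ for which the upper bound $\hat{r}^{(s)}_{c,i}$ is finite; it is a necessary condition only, and $N - s + 1 < s$ must still be checked. The sketch below uses hypothetical helper names and parameter values chosen purely for illustration.

```python
import math

import numpy as np
from scipy.stats import binom


def order_statistic_index(N, eps, alpha_h, d):
    """Index s of (4.17) for dimension d = tau * n; returns N + 1 if no k qualifies."""
    ks = np.arange(1, N + 1)
    tails = binom.sf(ks - 1, N, 1.0 - eps / d)  # P(X >= k), X ~ Bin(N, 1 - eps/d)
    hits = np.nonzero(tails <= alpha_h / (2 * d))[0]
    return int(ks[hits[0]]) if hits.size else N + 1


def min_samples_for_finite_s(eps, alpha_h, d):
    """Smallest N with (1 - eps/d)^N <= alpha_h / (2d), i.e. with s <= N."""
    return math.ceil(math.log(alpha_h / (2 * d)) / math.log(1.0 - eps / d))


if __name__ == "__main__":
    eps, alpha_h, N = 0.1, 0.05, 2000
    for d in (1, 10, 100):
        s = order_statistic_index(N, eps, alpha_h, d)
        n_min = min_samples_for_finite_s(eps, alpha_h, d)
        print(f"d={d:4d}  s={s}  N_min={n_min}")
```

With a fixed $N$, $s$ moves toward $N + 1$ as $\tau n$ grows, so the box widens and eventually becomes unbounded; $N_{\min}$ grows roughly like $\tau n \ln(2\tau n/\alpha_h)/\epsilon$.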
Uncertainty set motivated by moment hypothesis testing
Although the box uncertainty set reflects the spatial-temporal correlations through range values that vary with the dimension of $r_c$, it is not easy to tell directly from (4.18) how the other components are affected when the range of one component changes. To construct an uncertainty set that shows the spatial-temporal correlations of the demand model directly, we apply a hypothesis test on the first and second moments of the random vector. The following null hypothesis concerns the mean and covariance of the true distribution $P^*(r_c)$ of the random vector $r_c$ [79]:
$$H_0:\ \mathbb{E}_{P^*}[r_c] = r_0 \quad \text{and} \quad \mathbb{E}_{P^*}[r_c r_c^T] - \mathbb{E}_{P^*}[r_c]\,\mathbb{E}_{P^*}[r_c^T] = \Sigma_0,$$
with test statistics $\|\hat{r}_c - r_0\|_2$ and $\|\hat{\Sigma} - \Sigma_0\|_F$. Given thresholds $\Gamma^B_1$ and $\Gamma^B_2$, $H_0$ is rejected when the deviation of the mean or covariance estimated in one experiment from the estimate over multiple experiments exceeds the corresponding threshold, i.e.,
$$\big\|\mathbb{E}_P[\tilde{r}_c] - \hat{r}_c\big\|_2 > \Gamma^B_1 \quad \text{or} \quad \big\|\mathbb{E}_P[\tilde{r}_c \tilde{r}_c^T] - \mathbb{E}_P[\tilde{r}_c]\,\mathbb{E}_P[\tilde{r}_c^T] - \hat{\Sigma}\big\|_F > \Gamma^B_2,$$
where $\mathbb{E}_P[\tilde{r}_c]$ is the mean estimated in one experiment, and $\hat{r}_c$ and $\hat{\Sigma}$ are the mean and covariance estimated over multiple experiments. The remaining problem is to select thresholds such that the above hypothesis test is accepted for the given dataset. In Section ??, we describe in detail how to compute the thresholds $\Gamma^B_1$ and $\Gamma^B_2$ at a desired significance level $\alpha_h$ and probabilistic guarantee level $\epsilon$ from the given dataset.²
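Because the exact threshold procedure is deferred to a later section, the following is only a hedged sketch of the bootstrap idea mentioned in footnote 2, built on our own assumptions: each bootstrap resample of the $N$ observations plays the role of one experiment, and $\Gamma^B_1$ and $\Gamma^B_2$ are taken as the $(1 - \alpha_h/2)$ empirical quantiles of the bootstrapped mean and covariance deviations. Splitting $\alpha_h$ equally between the two statistics, the parameter `n_boot`, and all function names are assumptions for illustration.

```python
import numpy as np


def moment_estimates(samples: np.ndarray):
    """Plug-in mean and covariance E[r r^T] - E[r] E[r]^T of the rows of samples."""
    r_hat = samples.mean(axis=0)
    sigma_hat = np.atleast_2d(np.cov(samples, rowvar=False, bias=True))
    return r_hat, sigma_hat


def bootstrap_thresholds(samples, alpha_h, n_boot=1000, seed=None):
    """Hypothetical bootstrap estimate of (Gamma_1^B, Gamma_2^B).

    Assumption: each resample is one 'experiment', and each threshold is the
    (1 - alpha_h / 2) empirical quantile of the corresponding deviation.
    """
    rng = np.random.default_rng(seed)
    N = samples.shape[0]
    r_hat, sigma_hat = moment_estimates(samples)
    dev_mean = np.empty(n_boot)
    dev_cov = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, N, size=N)  # resample N rows with replacement
        r_b, sigma_b = moment_estimates(samples[idx])
        dev_mean[b] = np.linalg.norm(r_b - r_hat)                     # Euclidean norm
        dev_cov[b] = np.linalg.norm(sigma_b - sigma_hat, ord="fro")  # Frobenius norm
    q = 1.0 - alpha_h / 2
    return float(np.quantile(dev_mean, q)), float(np.quantile(dev_cov, q))


def reject_moment_hypothesis(r0, sigma0, r_hat, sigma_hat, gamma1, gamma2):
    """Reject H_0 if either test statistic exceeds its threshold."""
    return bool(np.linalg.norm(r_hat - r0) > gamma1
                or np.linalg.norm(sigma_hat - sigma0, ord="fro") > gamma2)
```

By construction, the hypothesis with $r_0 = \hat{r}_c$ and $\Sigma_0 = \hat{\Sigma}$ is always accepted, mirroring how the thresholds are chosen for the box set.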
The uncertainty set derived from the moment hypothesis test is given in the following proposition.

Proposition 2 ([12], [79]) With probability at least $1 - \alpha_h$ with respect to the sampling, the following uncertainty set $U^{CS}(r_c)$ implies a probabilistic guarantee at level $\epsilon$ for $P^*(r_c)$:
$$U^{CS}(r_c) = \Big\{ r_c = \hat{r}_c + y + C^T w \ge 0 : \exists\, y, w \in \mathbb{R}^{\tau n} \ \text{s.t.}\ \|y\|_2 \le \Gamma^B_1,\ \|w\|_2 \le \sqrt{\frac{1-\epsilon}{\epsilon}} \Big\}, \tag{4.19}$$
where $C^T C = \hat{\Sigma} + \Gamma^B_2 I$ is a Cholesky decomposition.
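To see how (4.19) can be used, note that for a fixed weight vector $a$ (a hypothetical linear constraint on demand), maximizing $a^T r_c$ over $\|y\|_2 \le \Gamma^B_1$ and $\|w\|_2 \le \rho$, with $\rho = \sqrt{(1-\epsilon)/\epsilon}$, gives $a^T \hat{r}_c + \Gamma^B_1 \|a\|_2 + \rho \|C a\|_2$ by the Cauchy-Schwarz inequality, since $a^T C^T w = (Ca)^T w$. Dropping the constraint $r_c \ge 0$ only enlarges the set, so this value upper-bounds the worst case over $U^{CS}(r_c)$. In the sketch below (hypothetical names), `numpy.linalg.cholesky` returns a lower-triangular $L$ with $LL^T = A$, so we take $C = L^T$; the $\Gamma^B_2 I$ term keeps $A$ positive definite even when $\hat{\Sigma}$ is rank-deficient (for example when $N < \tau n$), provided $\Gamma^B_2 > 0$.

```python
import numpy as np


def cs_factor(sigma_hat: np.ndarray, gamma2: float) -> np.ndarray:
    """Upper-triangular C with C^T C = Sigma_hat + gamma2 * I."""
    d = sigma_hat.shape[0]
    L = np.linalg.cholesky(sigma_hat + gamma2 * np.eye(d))  # lower L, L @ L.T = A
    return L.T


def cs_point(r_hat, C, y, w):
    """Candidate element r_hat + y + C^T w of U^CS (r_c >= 0 is not checked)."""
    return r_hat + y + C.T @ w


def worst_case_linear(a, r_hat, C, gamma1, eps):
    """max a^T r over r = r_hat + y + C^T w, ||y||_2 <= gamma1, ||w||_2 <= sqrt((1-eps)/eps).

    Ignores r >= 0, so the result upper-bounds the worst case over (4.19).
    """
    rho = np.sqrt((1.0 - eps) / eps)
    return float(a @ r_hat + gamma1 * np.linalg.norm(a) + rho * np.linalg.norm(C @ a))
```

The term $\rho\|Ca\|_2 = \rho\sqrt{a^T(\hat{\Sigma} + \Gamma^B_2 I)a}$ is where the estimated covariance couples the components: a direction $a$ aligned with strongly positively correlated components receives a larger safety margin than the same weights applied to uncorrelated ones.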
By testing both the first and second moments of the dataset, the uncertainty set (4.19) reflects the spatial-temporal correlations of the demand model more directly than the box set (4.18): through the term $C^T w$, expression (4.19) gives an intuition for how increasing or decreasing one component of $r_c$ affects the other components. Further properties of each type of uncertainty set and application-level problems, such as how to choose the number of samples $N$ for the hypothesis test when $r_c$ is high-dimensional, will be discussed in the evaluations of Section 3.6.
² Bootstrapped thresholds and the theoretical bounds proposed in [48] are compared in [12]. The bootstrapped thresholds generally result in a smaller uncertainty set and hence reduce the ambiguity in $P^*$. In this work, we apply the bootstrapped thresholds $\Gamma^B_1$ and $\Gamma^B_2$