Uncertainty Modeling - Constructing Uncertainty Sets

CHAPTER 4 : Data-Driven Robust Resource Allocation

4.3 Constructing Uncertainty Sets

4.3.3 Uncertainty Modeling

In this section, we briefly review the theory behind constructing uncertainty demand models from the spatial-temporal dataset considered in this work. Since we do not assume that the marginal distributions of the elements of the vector $r_c$ are independent of each other, we select two approaches from the literature [12, 27, 79] that make no assumptions about the true distribution $P^*(r_c)$. Both approaches first design a hypothesis test from the given dataset and a required probabilistic guarantee level $\epsilon$, and then construct an uncertainty set based on that hypothesis test.

Uncertainty demand sets built from marginal samples

One intuitive way to describe a random vector is to define a range for each of its elements.

For instance, David and Nagaraja [27] considered the following multivariate hypothesis with given thresholds $\bar{q}_{i,0}, \underline{q}_{i,0} \in \mathbb{R}$, $i = 1, 2, \dots, \tau n$:

$$
H_{0,i}: \quad \inf\Big\{t : P(r_{c,i} \le t) \ge 1 - \frac{\epsilon}{\tau n}\Big\} \ge \bar{q}_{i,0}, \qquad \inf\Big\{t : P(-r_{c,i} \le t) \ge 1 - \frac{\epsilon}{\tau n}\Big\} \ge -\underline{q}_{i,0}. \tag{4.16}
$$

This hypothesis bounds the $1 - \epsilon/(\tau n)$ quantiles of $r_{c,i}$ and $-r_{c,i}$, where $\epsilon$ is the probabilistic guarantee level. We divide $\epsilon$ by $\tau n$ because $r_c$ is a multivariate random vector, and the hypothesis tests for all components $r_{c,i}$ must hold simultaneously to provide the probabilistic guarantee described in (4.15).

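Assume that we have $N$ random samples of each component $r_{c,i}$ of $r_c$, sorted in increasing order as $r_{c,i}^{(1)}, r_{c,i}^{(2)}, \dots, r_{c,i}^{(N)}$ regardless of the original sample order. This is also the order of the estimated values $\hat{r}_{c,i}$, i.e., $\hat{r}_{c,i}^{(1)} = r_{c,i}^{(1)}, \dots, \hat{r}_{c,i}^{(N)} = r_{c,i}^{(N)}$. With $\epsilon$ denoting the probabilistic guarantee level, we define the index $s$ by

$$
s = \min\left\{ k \in \mathbb{N} : \sum_{j=k}^{N} \binom{N}{j} \left(\frac{\epsilon}{\tau n}\right)^{N-j} \left(1 - \frac{\epsilon}{\tau n}\right)^{j} \le \frac{\alpha_h}{2 \tau n} \right\}, \tag{4.17}
$$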

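and let $s = N + 1$ if the corresponding set is empty. The test $H_{0,i}$ is rejected if $\bar{q}_{i,0} > \hat{r}_{c,i}^{(s)}$ or $\underline{q}_{i,0} < \hat{r}_{c,i}^{(N-s+1)}$.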

To construct an uncertainty set, we need an accepted hypothesis test. Hence, we set $\bar{q}_{i,0} = \hat{r}_{c,i}^{(s)}$ and $\underline{q}_{i,0} = \hat{r}_{c,i}^{(N-s+1)}$, with $\hat{r}_{c,i}^{(s)}$ and $\hat{r}_{c,i}^{(N-s+1)}$ taken from the sampled dataset, so that $H_{0,i}$ is always accepted. The following uncertainty set, based on the range hypothesis test (4.16), is then applied in this work.

Proposition 1 ([12], [27]) If $s$ defined by equation (4.17) satisfies $N - s + 1 < s$, then, with probability at least $1 - \alpha_h$ over the sample, the set

$$
U^{M}(r_c) = \left\{ r_c \in \mathbb{R}^{\tau n} : \hat{r}_{c,i}^{(N-s+1)} \le r_{c,i} \le \hat{r}_{c,i}^{(s)},\ i = 1, \dots, \tau n \right\} \tag{4.18}
$$

implies a probabilistic guarantee for $P^*(r_c)$ at level $\epsilon$.
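As an illustration of how (4.17) and (4.18) turn samples into a box, the following Python sketch computes the index $s$ and the per-component bounds. The function names, the use of NumPy, the synthetic Poisson data, and the choice to raise an error when $s = N + 1$ (no order statistic of that rank exists in the sample) are assumptions of this sketch, not part of the original method description. Here `eps` stands for the probabilistic guarantee level and `alpha` for $\alpha_h$.

```python
import math
import numpy as np


def order_stat_index(n_samples, dim, eps, alpha):
    """Index s of (4.17): smallest k whose binomial tail is at most alpha / (2 * dim).

    Returns n_samples + 1 when no k satisfies the inequality.
    """
    p = eps / dim                      # per-component level eps / (tau * n)
    target = alpha / (2.0 * dim)       # per-component significance
    tail = 0.0
    s = n_samples + 1
    # The tail sum grows as k decreases, so scan k = N, N-1, ... and stop
    # at the first k for which the inequality fails.
    for k in range(n_samples, 0, -1):
        log_term = (math.lgamma(n_samples + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_samples - k + 1)
                    + (n_samples - k) * math.log(p) + k * math.log1p(-p))
        tail += math.exp(log_term)
        if tail > target:
            break
        s = k
    return s


def box_uncertainty_set(samples, eps, alpha):
    """Lower and upper bounds of the box (4.18) from an (N, dim) sample array."""
    samples = np.asarray(samples, dtype=float)
    n_samples, dim = samples.shape
    s = order_stat_index(n_samples, dim, eps, alpha)
    # Sketch-level choice: refuse to build a box when s = N + 1 or N - s + 1 >= s.
    if s > n_samples or n_samples - s + 1 >= s:
        raise ValueError("need more samples for the requested eps and alpha")
    ordered = np.sort(samples, axis=0)   # column-wise order statistics
    lower = ordered[n_samples - s]       # (N - s + 1)-th smallest value per component
    upper = ordered[s - 1]               # s-th smallest value per component
    return lower, upper, s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demand = rng.poisson(lam=[20.0, 35.0, 50.0], size=(2000, 3))  # synthetic data
    lower, upper, s = box_uncertainty_set(demand, eps=0.1, alpha=0.1)
    print("s =", s, "lower =", lower, "upper =", upper)
```

The tail sum is accumulated in log space so that large binomial coefficients do not overflow for sample sizes in the thousands.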

The hypothesis (4.16) is tested for each component $r_{c,i}$ separately, and the uncertainty demand model given by Proposition 1 likewise describes the range of each $r_{c,i}$, $i = 1, 2, \dots, \tau n$, separately. We do not assume that the marginal distributions of $P^*$ are independent. Their correlations are reflected in the box uncertainty set only in the sense that changing the values of $n$ and $\tau$ results in a different index $s$ in (4.17), and hence in different order statistics $\hat{r}_{c,i}^{(N-s+1)}$ and $\hat{r}_{c,i}^{(s)}$. However, the box-type uncertainty set does not directly describe the spatial-temporal correlations among the components of $r_c$.
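To see how the dimension $\tau n$ enters the box only through the index $s$, the short Python sketch below evaluates (4.17) for a fixed sample size and several dimensions. The helper name and the numerical values are hypothetical choices for illustration; `eps` stands for the probabilistic guarantee level and `alpha` for $\alpha_h$. Because both `eps / dim` and `alpha / (2 * dim)` shrink as the dimension grows, $s$ cannot decrease, so each component's range widens or stays the same, and for a large enough dimension the set in (4.17) is empty and $s = N + 1$.

```python
import math


def index_s(n_samples, dim, eps, alpha):
    """Evaluate (4.17) directly; returns n_samples + 1 if no k qualifies."""
    p = eps / dim
    target = alpha / (2 * dim)
    tail, s = 0.0, n_samples + 1
    for k in range(n_samples, 0, -1):          # tail sum grows as k decreases
        tail += math.comb(n_samples, k) * p ** (n_samples - k) * (1 - p) ** k
        if tail > target:
            break
        s = k
    return s


if __name__ == "__main__":
    N = 1000
    for dim in (1, 2, 5, 10, 20):
        s = index_s(N, dim, eps=0.1, alpha=0.1)
        if s > N:
            print(f"tau*n = {dim:2d}: set in (4.17) is empty, s = N + 1")
        else:
            print(f"tau*n = {dim:2d}: s = {s}, bounds use order statistics "
                  f"{N - s + 1} and {s}")
```

With these illustrative values, the largest dimension already exhausts the 1000 samples, which is one way to see why the number of samples has to grow with the dimension of $r_c$.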

Uncertainty set motivated by moment hypothesis testing

Although the box-type uncertainty set reflects the spatial-temporal correlations through range values that vary with the dimension of $r_c$, it is not easy to tell directly from the uncertainty set (4.18) how the other components are affected when the range of one component changes. To construct an uncertainty set that directly shows the spatial-temporal correlations of the demand model, we apply hypothesis testing on the first and second moments of the random vector. The following null hypothesis concerns the mean and covariance of the true distribution $P^*(r_c)$ of the random vector $r_c$ [79]:

$$
H_0: \quad \mathbb{E}_{P^*}[r_c] = r_0 \quad \text{and} \quad \mathbb{E}_{P^*}[r_c r_c^T] - \mathbb{E}_{P^*}[r_c]\,\mathbb{E}_{P^*}[r_c^T] = \Sigma_0,
$$

with test statistics $T$ defined as $\|\hat{r}_c - r_0\|$ and $\|\hat{\Sigma} - \Sigma_0\|$. Given thresholds $\Gamma^B_1$ and $\Gamma^B_2$, $H_0$ is rejected when the mean or covariance estimated from one experiment deviates from the estimate over multiple experiments by more than the corresponding threshold, i.e.,

$$
\|\mathbb{E}_P[\tilde{r}_c] - \hat{r}_c\|_2 > \Gamma^B_1 \quad \text{or} \quad \|\mathbb{E}_P[\tilde{r}_c \tilde{r}_c^T] - \mathbb{E}_P[\tilde{r}_c]\,\mathbb{E}_P[\tilde{r}_c^T] - \hat{\Sigma}\|_F > \Gamma^B_2,
$$

where $\mathbb{E}_P[\tilde{r}_c]$ is the estimated mean of one experiment, and $\hat{r}_c$ and $\hat{\Sigma}$ are the estimated mean and covariance over multiple experiments. The remaining problem is to select the thresholds such that the above hypothesis test holds for the given dataset. The detailed steps for calculating the thresholds $\Gamma^B_1$ and $\Gamma^B_2$ at a desired significance value $\alpha_h$ and probabilistic guarantee level $\epsilon$ from the given dataset are described in the following Section ??.$^{2}$
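The threshold computation itself is deferred to a later section. As a rough sketch only, one common bootstrap recipe resamples the dataset with replacement, recomputes the mean and covariance of each resample, and takes an upper quantile of the deviations from the full-sample estimates. The function name, the number of resamples, and the use of the $(1 - \alpha_h)$ empirical quantile for both statistics are assumptions of this sketch and may differ from the procedure actually used in this work.

```python
import numpy as np


def bootstrap_moment_thresholds(samples, alpha, n_boot=1000, seed=0):
    """Estimate (r_hat, sigma_hat, gamma1, gamma2) from an (N, dim) sample array.

    gamma1 bounds the 2-norm deviation of resampled means from r_hat, and
    gamma2 bounds the Frobenius-norm deviation of resampled covariances from
    sigma_hat, both at the empirical (1 - alpha) quantile (assumed here).
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n_samples = samples.shape[0]

    r_hat = samples.mean(axis=0)
    # bias=True divides by N, matching E[r r^T] - E[r] E[r]^T on the sample.
    sigma_hat = np.atleast_2d(np.cov(samples, rowvar=False, bias=True))

    mean_dev = np.empty(n_boot)
    cov_dev = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_samples, size=n_samples)   # resample with replacement
        resample = samples[idx]
        mean_dev[b] = np.linalg.norm(resample.mean(axis=0) - r_hat)
        cov_b = np.atleast_2d(np.cov(resample, rowvar=False, bias=True))
        cov_dev[b] = np.linalg.norm(cov_b - sigma_hat, ord="fro")

    gamma1 = float(np.quantile(mean_dev, 1.0 - alpha))
    gamma2 = float(np.quantile(cov_dev, 1.0 - alpha))
    return r_hat, sigma_hat, gamma1, gamma2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demand = rng.normal(loc=[20.0, 35.0], scale=[3.0, 5.0], size=(500, 2))  # synthetic
    r_hat, sigma_hat, gamma1, gamma2 = bootstrap_moment_thresholds(demand, alpha=0.1)
    print("gamma1 =", gamma1, "gamma2 =", gamma2)
```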

The uncertainty set derived based on the moment hypothesis testing is defined in the following proposition.

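Proposition 2 ([12], [79]) With probability at least $1 - \alpha_h$ with respect to the sampling, the following uncertainty set $U^{CS}(r_c)$ implies a probabilistic guarantee at level $\epsilon$ for $P^*(r_c)$:

$$
U^{CS}(r_c) = \left\{ r_c \ge 0,\ r_c = \hat{r}_c + y + C^T w : \exists\, y, w \in \mathbb{R}^{n\tau} \ \text{s.t.}\ \|y\|_2 \le \Gamma^B_1,\ \|w\|_2 \le \sqrt{\frac{1-\epsilon}{\epsilon}} \right\}, \tag{4.19}
$$

where $C^T C = \hat{\Sigma} + \Gamma^B_2 I$ is a Cholesky decomposition.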
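To make the role of the Cholesky factor concrete, the Python sketch below builds $C$ with $C^T C = \hat{\Sigma} + \Gamma^B_2 I$ and generates a few demand vectors of the form $\hat{r}_c + y + C^T w$. All function names and numerical values are hypothetical, and the sketch assumes that the bound on $\|w\|_2$ is $\sqrt{(1-\epsilon)/\epsilon}$, as in the moment-based set of [12]. The sampler only produces points that lie in the set; it does not sample the set uniformly.

```python
import numpy as np


def cholesky_factor(sigma_hat, gamma2):
    """Return C with C.T @ C = sigma_hat + gamma2 * I."""
    dim = sigma_hat.shape[0]
    # np.linalg.cholesky returns a lower-triangular factor with lower @ lower.T = A.
    lower = np.linalg.cholesky(sigma_hat + gamma2 * np.eye(dim))
    return lower.T


def sample_ball(rng, dim, radius):
    """Draw one point uniformly from the Euclidean ball of the given radius."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    return radius * rng.uniform() ** (1.0 / dim) * direction


def sample_moment_set(r_hat, sigma_hat, gamma1, gamma2, eps,
                      n_points=5, seed=0, max_tries=10_000):
    """Generate nonnegative points r = r_hat + y + C.T @ w that lie in the set."""
    rng = np.random.default_rng(seed)
    r_hat = np.asarray(r_hat, dtype=float)
    C = cholesky_factor(np.asarray(sigma_hat, dtype=float), gamma2)
    radius_w = np.sqrt((1.0 - eps) / eps)      # assumed bound on ||w||_2
    points = []
    for _ in range(max_tries):
        y = sample_ball(rng, r_hat.size, gamma1)
        w = sample_ball(rng, r_hat.size, radius_w)
        r = r_hat + y + C.T @ w
        if np.all(r >= 0):                     # keep only nonnegative demand vectors
            points.append(r)
            if len(points) == n_points:
                break
    return np.array(points)


if __name__ == "__main__":
    sigma_hat = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative covariance
    pts = sample_moment_set([10.0, 12.0], sigma_hat,
                            gamma1=0.5, gamma2=0.1, eps=0.1)
    print(pts)
```

Because $C^T$ mixes the entries of $w$ across components, a positive off-diagonal entry in $\hat{\Sigma}$ makes the generated components tend to move together, which is the coupling that the box set (4.18) cannot express.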

By testing properties of both the first and second moments of the dataset, the uncertainty set (4.19) reflects the spatial-temporal correlations of the demand model directly, in contrast to the box type (4.18). When one component of $r_c$ increases or decreases, the expression (4.19) gives an intuition of how the values of the other components of $r_c$ are affected. Further properties of each type of uncertainty set and application-level issues, such as how to choose the number of samples $N$ for hypothesis testing with a high-dimensional $r_c$, are discussed in the evaluations of Section 3.6.

$^{2}$ Bootstrapped thresholds and the theoretical bounds proposed in [48] are compared in [12]. The bootstrapped thresholds generally result in a smaller uncertainty set and hence reduce the ambiguity in $P^*$. In this work, we apply the bootstrapped thresholds $\Gamma^B_1$ and $\Gamma^B_2$.
