Random Sets: CDFs, Intervals and Probability Boxes

Random sets can be used to model uncertainty by means of cumulative distribution functions (CDFs), intervals, probability boxes, normalised fuzzy sets and Dempster-Shafer structures. In this work, CDFs, intervals and probability boxes are considered as uncertainty representations.

2.2.1 Cumulative Distribution Functions

A CDF, $F_X$, is used to express the probability distribution of a random variable, $X$, as already shown in Eq. (2.5), where $F_X(x) = P_X(X \le x)$ for $x \in X \subseteq \mathbb{R}$, and fully characterises the random variable, $X$. CDFs can be represented as random sets $\Gamma : \Omega \to \mathcal{F}$, where $\mathcal{F}$ is the collection of focal elements $\Gamma(\alpha) := F_X^{-1}(\alpha)$, $\forall \alpha \in \Omega$.

The inverse of the CDF, $F_X$, is defined by $F_X^{-1}(\alpha) := \inf\{x : F_X(x) \ge \alpha\}$, $\alpha \in (0,1]$.

Note that the representation of the CDF as a random set only contains the aleatory component, which is given either by $\alpha$ or by its corresponding sample $x = F_X^{-1}(\alpha)$.
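As a minimal sketch of this representation, the following Python fragment draws focal elements $\Gamma(\alpha) = F_X^{-1}(\alpha)$ by sampling $\alpha$ uniformly on $(0,1]$. The standard normal choice of $F_X$, the helper name `sample_focal_elements` and the sample size are assumptions made for illustration only; they are not taken from this work.

```python
import random
from statistics import NormalDist


def sample_focal_elements(inv_cdf, n, rng):
    """Draw n focal elements Gamma(alpha) = F_X^{-1}(alpha) of a precise CDF."""
    samples = []
    for _ in range(n):
        # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
        alpha = 1.0 - rng.random()
        # Guard the endpoint alpha = 1, where the normal quantile is infinite.
        alpha = min(alpha, 1.0 - 1e-12)
        # For a precise CDF every focal element is a single point.
        samples.append(inv_cdf(alpha))
    return samples


dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical choice of F_X
rng = random.Random(0)
xs = sample_focal_elements(dist.inv_cdf, 10_000, rng)
```

Each entry of `xs` is a degenerate (point-valued) focal element, which reflects the absence of an epistemic component in this representation.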

2.2.2 Intervals

An interval, $x = [\underline{x}, \overline{x}]$, can be represented as the random set $\Gamma : \Omega \to \mathcal{F}, \alpha \mapsto x$, defined on $\mathbb{R}$, where the focal set contains the unique focal element $[\underline{x}, \overline{x}]$, that is, $\mathcal{F} = \{x\}$ and $\alpha \in (0,1] \equiv \Omega$; in this case, $P_\Gamma$ is specified by Eq. (2.2). In other words, all the samplings of $\alpha \in \Omega$ draw the same interval $x$. Note that the interval representation as a random set does not contain an aleatory component, as all values of $\alpha$ map to the same focal element $x$. In this case, the epistemic component is given by the interval itself. An alternative, equivalent way to denote an interval is by means of the symbol $I_x = [l_x, u_x] = [\underline{x}, \overline{x}]$.
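The constant mapping described above can be sketched in a few lines of Python. The function name `interval_random_set` and the numerical endpoints are hypothetical and serve only to illustrate that every $\alpha$ returns the same focal element.

```python
import random


def interval_random_set(lower, upper):
    """Return Gamma for the interval [lower, upper]: alpha -> (lower, upper)."""
    if lower > upper:
        raise ValueError("lower endpoint must not exceed upper endpoint")

    def gamma(alpha):
        # alpha is ignored on purpose: there is no aleatory component.
        return (lower, upper)

    return gamma


gamma = interval_random_set(2.0, 3.5)  # hypothetical interval
rng = random.Random(1)
# Collect the distinct focal elements drawn from 1000 samples of alpha.
focal_elements = {gamma(1.0 - rng.random()) for _ in range(1000)}
```

The set `focal_elements` contains a single tuple, the interval itself, however many samples of $\alpha$ are drawn.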

2.2.3 Probability boxes

A probability box or p-box (term coined by Ferson et al. [29]), $[\underline{F}, \overline{F}]$, is the set of CDFs $\{F_X : \underline{F}(x) \le F_X(x) \le \overline{F}(x), \forall x \in \mathbb{R}\}$, delimited by a lower CDF bound $\underline{F}$ and an upper CDF bound $\overline{F}$. The lower and upper CDF bounds define the set of distribution functions that collectively represents the epistemic uncertainty about the variable. Note that the left bound $\overline{F}$ is an upper bound on the probabilities and a lower bound on the quantiles (i.e. the $x$-values), while the right bound $\underline{F}$ is a lower bound on the probabilities and an upper bound on the quantiles. The class of functions contained in the p-box may be unrestricted or, alternatively, may belong to a reduced class of CDFs; using this distinction, probability boxes can naturally be grouped into distribution-free and parametric p-boxes.
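The defining inequality $\underline{F}(x) \le F_X(x) \le \overline{F}(x)$ can be checked numerically on a grid of $x$-values. The sketch below assumes two unit-variance normal CDFs as bounds and a helper named `inside_pbox`; both are illustrative choices, not part of this work.

```python
from statistics import NormalDist


def inside_pbox(cdf, lower_cdf, upper_cdf, grid, tol=1e-12):
    """Check lower_cdf(x) <= cdf(x) <= upper_cdf(x) at every grid point."""
    return all(
        lower_cdf(x) - tol <= cdf(x) <= upper_cdf(x) + tol for x in grid
    )


# Hypothetical bounds: unit-variance normals shifted left and right.
upper_cdf = NormalDist(-1.0, 1.0).cdf  # left bound: larger probabilities
lower_cdf = NormalDist(1.0, 1.0).cdf   # right bound: smaller probabilities
grid = [i / 10.0 for i in range(-60, 61)]

candidate_in = NormalDist(0.0, 1.0).cdf   # mean between the two bounds
candidate_out = NormalDist(2.0, 1.0).cdf  # mean to the right of both bounds

is_in = inside_pbox(candidate_in, lower_cdf, upper_cdf, grid)
is_out = inside_pbox(candidate_out, lower_cdf, upper_cdf, grid)
```

A grid check of this kind only verifies the inequality at the sampled points, so it should be read as a necessary check rather than a proof of membership.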

2.2.3.1 Distribution-free p-boxes

Distribution-free p-boxes (also known as non-parametric p-boxes, free p-boxes or simply p-boxes) appear when the CDF of a random variable cannot be precisely specified because the CDF family is unknown; nonetheless, it is possible to define the upper, $\overline{F}$, and lower, $\underline{F}$, CDF bounds. These bounds can either be defined in advance or be estimated using, for example, the methods listed in Zhang et al. [78]. Note that distribution-free p-boxes do not make any assumption about the family or shape of the CDFs that belong to the p-box.

Since only the upper and lower CDF bounds matter in a free p-box, two different ways to define a p-box can be identified. One way consists in specifying the upper and lower CDFs, which may have unknown or inhomogeneous parental distribution models. In this case, the random set $\Gamma : \Omega \to \mathcal{F}, \alpha \mapsto \Gamma(\alpha)$, on $\mathbb{R}$, is the ensemble of focal elements $\Gamma(\alpha) := [\overline{F}^{-1}(\alpha), \underline{F}^{-1}(\alpha)]$, for $\alpha \in (0,1] \equiv \Omega$, with $\overline{F}^{-1}$ and $\underline{F}^{-1}$ denoting the inverse upper and lower CDF bounds, respectively. With this representation, a focal element can be obtained as

An alternative definition, widely used in this work, consists in defining the upper and lower CDFs by means of interval (or set-valued) hyperparameters. As a result, the bounding CDFs are obtained from the envelope of known distribution functions. So, for example, a p-box can be defined as the collection of all distribution functions whose CDFs are bounded by normal distributions, $F \sim N(\mu, \sigma)$, whose mean and standard deviation belong to the intervals $\mu = [\underline{\mu}, \overline{\mu}]$ and $\sigma = [\underline{\sigma}, \overline{\sigma}]$. In general, a CDF with interval hyperparameters, $\theta_i$, $i = 1, \dots, m$, denoted by $F(\cdot\,; \theta_1, \dots, \theta_m)$, can be given a random set representation as the image through the function, $F^{-1}$, of the input intervals $\{\theta_i : i = 1, \dots, m\}$ together with the uniform $\alpha$-sample obtained from $\Omega$. In consequence, it can be represented as the random set $\Gamma : \Omega \to \mathcal{F}, \alpha \mapsto \Gamma(\alpha)$, defined on $\mathbb{R}$, where $\mathcal{F}$ is the collection of focal elements $\mathcal{F} = \{F^{-1}(\alpha; \theta_1, \dots, \theta_m) : \alpha \in \Omega\}$. In this way, the aleatory component of the focal element, $\alpha$, can be separated from the epistemic component, which is obtained as the Cartesian product $\boldsymbol{\theta} = \times_{i=1}^{m} \theta_i := \theta_1 \times \cdots \times \theta_m$. This representation of distribution-free p-boxes shows that, for a single realization of the aleatory component, $\alpha$, a focal element contains the image through $F^{-1}$ of all the possible combinations of values within the intervals of the hyperparameters of the parental CDF, $F$. Given this definition, the upper and lower CDFs are

$$\overline{F}^{-1}(\alpha) = \inf_{\theta \in \boldsymbol{\theta}} F^{-1}(\alpha; \theta_1, \dots, \theta_m); \qquad \underline{F}^{-1}(\alpha) = \sup_{\theta \in \boldsymbol{\theta}} F^{-1}(\alpha; \theta_1, \dots, \theta_m). \tag{2.7}$$

Note that with this definition, the bounding CDFs may not entirely belong to the same parental distribution function, as they are often the envelope of two or more parental distributions.
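For the normal example, the infimum and supremum in Eq. (2.7) can be evaluated in closed form, because the normal quantile $\mu + \sigma z_\alpha$ is affine in $(\mu, \sigma)$ and therefore attains its extrema at the corners of the hyperparameter box. The following Python sketch relies on that property; the function name `normal_focal_element` and the numerical intervals are assumptions made for illustration.

```python
from itertools import product
from statistics import NormalDist


def normal_focal_element(alpha, mu_interval, sigma_interval):
    """Focal element [inf, sup] of F^{-1}(alpha; mu, sigma) over the box.

    alpha must lie strictly between 0 and 1, as required by inv_cdf.
    """
    z = NormalDist().inv_cdf(alpha)  # standard normal quantile z_alpha
    # The quantile mu + sigma * z is affine in (mu, sigma), so its extrema
    # over the box are attained at the four corner combinations.
    corners = [
        mu + sigma * z for mu, sigma in product(mu_interval, sigma_interval)
    ]
    return min(corners), max(corners)


mu_interval = (0.0, 1.0)     # hypothetical interval for the mean
sigma_interval = (1.0, 2.0)  # hypothetical interval for the std. deviation

lo_hi_upper_tail = normal_focal_element(0.9, mu_interval, sigma_interval)
lo_hi_lower_tail = normal_focal_element(0.1, mu_interval, sigma_interval)
```

In this sketch, for $\alpha = 0.9$ the left endpoint comes from $N(0, 1)$ and the right endpoint from $N(1, 2)$, whereas for $\alpha = 0.1$ they come from $N(0, 2)$ and $N(1, 1)$. This illustrates why the bounding CDFs are envelopes of several parental distributions rather than single members of the family. For distribution families whose quantile is not monotone in each hyperparameter, a corner evaluation would not suffice and an optimisation over the box would be needed.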

2.2.3.2 Parametric p-boxes

Parametric p-boxes (also known as distributional p-boxes) appear when there is uncertainty in the hyperparameters of a given distribution function, which are provided as intervals. For instance, let again $F \sim N(\mu, \sigma)$ be a normal distribution function with interval mean, $\mu$, and interval standard deviation, $\sigma$. All normal distribution functions that have mean and standard deviation inside the specified intervals belong to the probability box. Although the lower and upper CDF bounds enclose infinitely many non-normal distributions, only distributions from the original normal parental model are considered. This constrains parametric p-boxes to a smaller set of distributions than the distribution-free ones. This representation does not look at the CDF bounds, but is only concerned with the distributions responsible for the lower and upper probability bounds.

Parametric p-boxes cannot be treated using a random set representation, because only one distribution at a time is selected and, therefore, it is not possible to separate the aleatory from the epistemic component. Distributional p-boxes can be treated using a double-loop Monte Carlo-optimisation strategy, in which the inner loop samples the $\alpha$-values from a copula function on $(0,1]$ and the outer loop picks $\theta$-values to search for the extrema in $\boldsymbol{\theta} = \times_{i=1}^{m} \theta_i$. If the dimension of the epistemic space, $\Theta \equiv \boldsymbol{\theta}$, is not too high (indicatively $\le 5$), a double-loop Monte Carlo strategy can be adopted, where the outer Monte Carlo loop is used to perform a heuristic search in the epistemic space, $\Theta$.
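The double-loop Monte Carlo variant can be sketched as follows for a single normally distributed variable. Several choices here are assumptions and are not taken from this work: the event of interest ($X$ exceeding a threshold), uniform random sampling of $\theta = (\mu, \sigma)$ in the outer loop, the reuse of one set of $\alpha$-samples for every $\theta$, and the function name `double_loop_bounds`. With one variable the copula reduces to a uniform sample on $(0,1]$.

```python
import random
from statistics import NormalDist


def double_loop_bounds(threshold, mu_interval, sigma_interval,
                       n_outer=200, n_inner=2000, seed=0):
    """Estimate bounds on P(X > threshold) for a parametric normal p-box."""
    rng = random.Random(seed)
    # Aleatory samples: alpha in (0, 1], clipped just below 1 because the
    # normal quantile is infinite at alpha = 1. They are drawn once and
    # reused for every theta so that outer-loop comparisons are less noisy.
    alphas = [min(1.0 - rng.random(), 1.0 - 1e-12) for _ in range(n_inner)]
    p_min, p_max = 1.0, 0.0
    for _ in range(n_outer):
        # Outer loop: pick one point theta = (mu, sigma) in the epistemic box.
        mu = rng.uniform(*mu_interval)
        sigma = rng.uniform(*sigma_interval)
        dist = NormalDist(mu, sigma)
        # Inner loop: propagate the aleatory samples through F^{-1}(.; theta).
        exceed = sum(dist.inv_cdf(a) > threshold for a in alphas)
        p = exceed / n_inner
        p_min, p_max = min(p_min, p), max(p_max, p)
    return p_min, p_max


# Hypothetical threshold and hyperparameter intervals.
p_lo, p_hi = double_loop_bounds(3.0, (0.0, 1.0), (1.0, 2.0))
```

Because the outer loop is a random search, the estimated interval `[p_lo, p_hi]` tends to lie inside the true probability interval and approaches it only as `n_outer` grows; replacing the random search with an optimiser corresponds to the Monte Carlo-optimisation strategy mentioned above.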

Note that the use of parametric p-boxes never results in wider intervals of probability than the distribution-free case, and typically results in narrower ones, as a consequence of searching within a smaller set of distribution functions.

In document Efficient random set uncertainty quantification by means of advanced sampling techniques (Page 32-35)