
els

In short, we are faced with a random sample $x_1, \dots, x_n$ from a precise distribution $P'_\theta$ which is unknown. It is only known that $P'_\theta$ is contained in a credal set $\mathcal{M}'_\theta$. The parameter $\theta$ is also unknown and should be estimated.

The idea of the presented minimum distance estimator is very simple: The data $x_1, \dots, x_n$ are used to build the empirical measure

$$P^{(n)} = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$$

Then, the minimum distance estimator is that $\hat\theta \in \Theta$ for which $P^{(n)}$ lies closest to $\mathcal{M}'_{\hat\theta}$. That is, we calculate the distance between $P^{(n)}$ and $\mathcal{M}'_\theta$ for every $\theta \in \Theta$ and pick that $\hat\theta$ where the distance is minimal.

This estimator will not be optimal in the general decision theoretic setup, and the present section does not prove any optimality result; it does not even attempt to derive one. Admittedly, this is open to criticism since, as a rule, every promoted statistical procedure should be justified by an appropriate optimality criterion.

On the other hand, even small numbers of observations (e.g. n = 10) usually lead to models which are so extensive that calculating optimal estimators is excluded because of exceedingly high computational efforts – at least as measured by the present state of research. So, the best that we can hope for at the moment are optimal estimators which cannot be calculated or estimators which can be calculated and behave reasonably well. The purpose of the present section is to develop such an estimator which can be calculated in real applications. The proposed minimum distance estimator fulfills this practical need in many situations. Furthermore, the asymptotic results of Section 6.4 confirm that the estimator behaves reasonably well in terms of asymptotic statistics, and the simulation study in Section 6.6 demonstrates its applicability.

In order to define the estimator in a mathematically rigorous way, the setup developed in Section 6.2 is used:

Let $\Omega$ be a set with σ-algebra $\mathcal{F}$ and $\mathcal{X}$ be a set with σ-algebra⁹ $\mathcal{A}'$. Let $\Theta$ be any index set. There is no need to assume finiteness of $\Theta$ at the moment; such an assumption will only be used for concrete computations in Section 6.5.

Let $(\overline{U}_\theta)_{\theta\in\Theta}$ be an imprecise model on $(\Omega, \mathcal{F})$ with corresponding family of credal sets $(\mathcal{U}_\theta)_{\theta\in\Theta}$. The observations $x_1, \dots, x_n$ are modeled via random variables

$$X_i : \Omega \to \mathcal{X}, \quad i \in \{1, \dots, n\}$$

It is assumed that $X_1, \dots, X_n$ are independent and identically distributed with respect to an unknown probability charge $U_\theta \in \mathcal{U}_\theta$.

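Therefore, we have an imprecise model $(\overline{P}'_\theta)_{\theta\in\Theta}$ on $(\mathcal{X}, \mathcal{A}')$ with corresponding credal sets

$$\mathcal{M}'_\theta = \{ X(U_\theta) \mid U_\theta \in \mathcal{U}_\theta \}, \quad \theta \in \Theta;$$

and the random variables

$$X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P'_\theta$$

are independent and identically distributed according to some precise distribution $P'_\theta$, which may be any element of the credal set of $\overline{P}'_\theta$. The task is to estimate the unknown parameter $\theta \in \Theta$.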

The following fundamental assumption is made:

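Assumption 6.3 There is a finite subset $K = \{f_1, \dots, f_s\} \subset \mathcal{L}_\infty(\mathcal{X}, \mathcal{A}')$ such that

$$\mathcal{M}'_\theta = \{ P'_\theta \in \mathrm{ba}^+_1(\mathcal{X}, \mathcal{A}') \mid P'_\theta[f] \le \overline{P}'_\theta[f] \ \ \forall f \in K \} \qquad (6.7)$$

for every $\theta \in \Theta$. Furthermore, it is assumed that

$$\overline{P}'_\theta[f] - \underline{P}'_\theta[f] > 0 \quad \forall f \in K \qquad (6.8)$$

where $\underline{P}'_\theta$ is the corresponding lower coherent prevision.¹⁰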
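To make Assumption 6.3 concrete, the following toy example may help. It is an illustration only and is not taken from the text; the binary sample space and the bounds $l_\theta$ and $u_\theta$ are hypothetical choices introduced here.

```latex
% Toy example (assumed): binary sample space, K consists of two indicator functions
\mathcal{X} = \{0, 1\}, \qquad K = \{f_1, f_2\}, \qquad f_1 = I_{\{1\}}, \quad f_2 = I_{\{0\}}

% Upper previsions specified on K, with hypothetical bounds 0 <= l_theta <= u_theta <= 1
\overline{P}'_\theta[f_1] = u_\theta, \qquad \overline{P}'_\theta[f_2] = 1 - l_\theta

% (6.7) then describes an interval for the probability of the outcome 1
\mathcal{M}'_\theta = \{ P'_\theta \in \mathrm{ba}^+_1(\mathcal{X}, \mathcal{A}') \mid l_\theta \le P'_\theta[f_1] \le u_\theta \}

% Lower previsions are the infima over the credal set
\underline{P}'_\theta[f_1] = l_\theta, \qquad \underline{P}'_\theta[f_2] = 1 - u_\theta

% (6.8) for both elements of K
\overline{P}'_\theta[f_j] - \underline{P}'_\theta[f_j] = u_\theta - l_\theta > 0, \qquad j = 1, 2
```

In this example, (6.8) holds exactly when $l_\theta < u_\theta$. A precise model with $l_\theta = u_\theta$ still has the form (6.7) but violates (6.8).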

Such assumptions have also been made in Subsection 5.4.2. As already stated there, they can be justified as follows: Practitioners will very often only be able to specify concrete upper previsions for a finite number of functions, and this directly leads to models satisfying (6.7). In particular, this will often be true for expert systems. There, it is a natural procedure to ask some experts about their prevision (or expectation) on some specific events, experiments, gambles, assets etc., and this can only be done for a finite number of such objects.

Furthermore, Section 5.2 tells us that using models of form (6.7) which violate (6.8) is dangerous because these models are potentially most unstable. Therefore, models which violate (6.8) should generally be avoided anyway.

⁹ In order to derive asymptotic results later on, some parts of the investigations are concerned with σ-additive probability measures and, therefore, we have to consider σ-algebras. This does not cause difficulties because an imprecise model on an algebra can always be extended to an imprecise model on a σ-algebra by means of a natural extension.

¹⁰ That is, $\underline{P}'_\theta[f'] = \inf_{P'_\theta \in \mathcal{M}'_\theta} P'_\theta[f']$.

Note that these assumptions rule out classical probability measures. One of the main goals K. Weichselberger had when he developed his theory of imprecise probabilities (F-probabilities) was: “As a special case, classical probability must fit into this theory.”¹¹ This means that, as a fundamental property, every probability measure is also an F-probability (and a coherent upper prevision). However, F-probabilities and coherent upper previsions which fulfill the above assumptions cannot coincide with probability measures. Accordingly, the following investigations do not apply to classical probability theory as a special case. That is, we deal with a strictly imprecise setup. As will be seen, this turns out to be an advantage here because the minimum distance estimator is based on the total variation distance. While working with total variation distances causes some difficulties in classical probability theory, these difficulties cannot occur in our strictly imprecise setup; cf. Section 6.4.

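Now, it is possible to define the empirical measure in this setup. The empirical measure $P^{(n)}$ is the map

$$P^{(n)} : \Omega \to \mathrm{ba}^+_1(\mathcal{X}, \mathcal{A}'), \quad \omega \mapsto P^{(n)}_\omega = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i(\omega)}$$

where $\delta_{x_i}$ denotes the Dirac measure in $x_i \in \mathcal{X}$. Note that

$$P^{(n)}[f'] : \Omega \to \mathbb{R}, \quad \omega \mapsto \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i(\omega)}[f'] = \frac{1}{n} \sum_{i=1}^{n} f'\big(X_i(\omega)\big)$$

is a (bounded) random variable for every $f' \in \mathcal{L}_\infty(\mathcal{X}, \mathcal{A}')$ and

$$\Omega \times \mathcal{A}' \to \mathbb{R}, \quad (\omega, A') \mapsto P^{(n)}_\omega[I_{A'}]$$

is a Markov kernel.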

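The following notation will also be used: For every $x = (x_1, \dots, x_n) \in \mathcal{X}^n$, the probability measure on $(\mathcal{X}, \mathcal{A}')$ defined by

$$P^{(n)}_x[f'] := \frac{1}{n} \sum_{i=1}^{n} f'(x_i) \quad \forall f' \in \mathcal{L}_\infty(\mathcal{X}, \mathcal{A}')$$

is denoted by $P^{(n)}_x$.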
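The empirical measure and its evaluation on a function can be sketched in a few lines. This is a minimal illustration for a finite sample, not code from the text; the function names `empirical_measure` and `empirical_prevision` and the example data are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def empirical_measure(sample):
    """Return the empirical measure as a dict: point -> relative frequency."""
    n = len(sample)
    return {point: Fraction(count, n) for point, count in Counter(sample).items()}


def empirical_prevision(sample, f):
    """Evaluate the empirical measure on f, i.e. (1/n) * sum_i f(x_i)."""
    return sum(Fraction(f(x)) for x in sample) / len(sample)


# Hypothetical data: five observations from the sample space {0, 1, 2}
sample = [0, 1, 1, 2, 1]
p_n = empirical_measure(sample)


def indicator_of_1(x):
    """Indicator function of the event {1}."""
    return 1 if x == 1 else 0


prob_of_1 = empirical_prevision(sample, indicator_of_1)
```

Evaluating the empirical measure on an indicator function returns the relative frequency of the corresponding event, so `prob_of_1` agrees with `p_n[1]`.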

In order to define a minimum distance estimator, we now have to choose a suitable notion of “distance” between a measure $P'_0$ and a coherent upper prevision $\overline{P}'$ on $(\mathcal{X}, \mathcal{A}')$. In keeping with the sensitivity analyst’s point of view, the distance will be defined as

$$\inf_{P' \in \mathcal{M}'} d(P'_0, P')$$

where $\mathcal{M}'$ is the credal set of $\overline{P}'$ and $d$ is a suitable metric on $\mathrm{ba}^+_1(\mathcal{X}, \mathcal{A}')$.

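Since bounded charges $\mu' \in \mathrm{ba}(\mathcal{X}, \mathcal{A}')$ are mainly regarded as bounded linear operators on $\mathcal{L}_\infty(\mathcal{X}, \mathcal{A}')$ within the theory of imprecise probabilities, it seems most natural to choose the operator norm for $d$; that is,

$$d(P'_0, P') = \|P'_0 - P'\| = \sup_{f' \in \mathcal{L}_\infty(\mathcal{X}, \mathcal{A}')} \frac{|P'_0[f'] - P'[f']|}{\|f'\|}$$

and we put

$$\|P'_0 - \overline{P}'\| := \inf_{P' \in \mathcal{M}'} \|P'_0 - P'\| \qquad (6.9)$$

¹¹ (Weichselberger, 2000, p. 149f)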

Though this is not a norm (because of the different roles of $P'_0$ and $\overline{P}'$), this notation is sensible. In particular, in the special case that $\overline{P}'$ is a precise prevision (i.e. a probability charge), the definition in (6.9) reduces to the usual operator norm in $\mathrm{ba}(\mathcal{X}, \mathcal{A}')$.

Next, the minimum distance estimator can be defined: The minimum distance estimator $\hat\theta'_n$ is

$$\hat\theta'_n : \mathcal{X}^n \to \Theta, \quad x \mapsto \arg\min_{\theta \in \Theta} \|P^{(n)}_x - \overline{P}'_\theta\|$$
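The definition can be tried out on a deliberately simple case. The sketch below assumes a binary sample space {0, 1} and a hypothetical family of credal sets in which, for each θ, the probability of the outcome 1 is only known to lie in [θ − eps, θ + eps]. Neither this family nor the names `distance_to_credal_set` and `minimum_distance_estimate` come from the text. On a binary space the operator norm of the difference of two probability measures is twice the difference of their probabilities of the outcome 1, so the infimum over the interval has a closed form.

```python
from fractions import Fraction


def distance_to_credal_set(p1, lower, upper):
    """Infimum over q1 in [lower, upper] of the total variation 2 * |p1 - q1|."""
    return 2 * max(Fraction(0), lower - p1, p1 - upper)


def minimum_distance_estimate(sample, thetas, eps):
    """Pick a theta whose (assumed) interval credal set is closest to the data."""
    p1 = Fraction(sum(sample), len(sample))  # empirical probability of outcome 1

    def dist(theta):
        # Assumed credal set for theta: lower <= P({1}) <= upper, clipped to [0, 1]
        lower = max(Fraction(0), theta - eps)
        upper = min(Fraction(1), theta + eps)
        return distance_to_credal_set(p1, lower, upper)

    # min() returns the first minimizer in case of ties
    return min(thetas, key=dist)


# Hypothetical finite parameter set and data (seven ones in ten observations)
thetas = [Fraction(k, 10) for k in range(1, 10)]
sample = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
theta_hat = minimum_distance_estimate(sample, thetas, Fraction(1, 20))
```

Exact fractions are used to avoid floating point ties. The arg min need not be unique in general; this sketch simply returns the first minimizer in the order of `thetas`.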

Recall from Section 2.3 that the operator norm in $\mathrm{ba}(\mathcal{X}, \mathcal{A}')$ is equal to the total variation. Therefore, the minimum distance estimator is based on the total variation norm. As shown in the following section, the undesirable properties of the total variation norm with respect to the empirical measure in classical statistics completely disappear in the setup developed above, which is based on imprecise probabilities.
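On a finite sample space, the equality of operator norm and total variation can be checked numerically. The following sketch is an illustration under that finiteness assumption, with hypothetical function names and example vectors. It uses the fact that the supremum over bounded functions is attained at a function taking only the values −1 and +1, and the convention that the total variation is the sum of absolute differences (without a factor 1/2).

```python
from itertools import product


def operator_norm_bruteforce(p, q):
    """Maximum of |P[f] - Q[f]| over all f with values in {-1, +1} (sup norm 1)."""
    best = 0.0
    for f in product((-1.0, 1.0), repeat=len(p)):
        diff = sum(fi * (pi - qi) for fi, pi, qi in zip(f, p, q))
        best = max(best, abs(diff))
    return best


def total_variation(p, q):
    """Total variation of the signed measure p - q on a finite space."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical probability vectors on a three-point sample space
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
norm_value = operator_norm_bruteforce(p, q)
tv_value = total_variation(p, q)
```

Both quantities coincide up to rounding. The brute-force search grows exponentially with the number of points and is meant only as a check, not as a practical way to compute the norm.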