A minimum distance estimator for imprecise models


In short, we are faced with a random sample $x_1, \dots, x_n$ from a precise distribution $P'_\theta$ which is unknown. It is only known that $P'_\theta$ is contained in a credal set $\mathcal{M}'_\theta$. The parameter $\theta$ is also unknown and should be estimated.

The idea of the presented minimum distance estimator is very simple: the data $x_1, \dots, x_n$ are used to build the empirical measure

$$P^{(n)} = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}.$$

Then, the minimum distance estimator is that $\hat\theta \in \Theta$ for which $P^{(n)}$ lies closest to $\mathcal{M}'_{\hat\theta}$. That is, we calculate the distance between $P^{(n)}$ and $\mathcal{M}'_\theta$ for every $\theta \in \Theta$ and pick that $\hat\theta$ where the distance is minimal.

This estimator will not be optimal in the general decision-theoretic setup, and the present section does not prove any optimality result; it does not even attempt to derive one. Admittedly, this is open to criticism since, as a rule, every promoted statistical procedure should be justified by an appropriate optimality criterion.

On the other hand, even small numbers of observations (e.g. n = 10) usually lead to models which are so extensive that calculating optimal estimators is ruled out by the exceedingly high computational effort, at least given the present state of research. So, the best that we can hope for at the moment are optimal estimators which cannot be calculated, or estimators which can be calculated and behave reasonably well. The purpose of the present section is to develop an estimator of the latter kind, one which can be calculated in real applications. The proposed minimum distance estimator fulfills this practical need in many situations. Furthermore, the asymptotic results of Section 6.4 confirm that the estimator behaves reasonably well in terms of asymptotic statistics, and the simulation study in Section 6.6 demonstrates its applicability.

In order to define the estimator in a mathematically rigorous way, the setup developed in Section 6.2 is used:

Let $\Omega$ be a set with $\sigma$-algebra $\mathcal{F}$ and $\mathcal{X}$ be a set with $\sigma$-algebra⁹ $\mathcal{A}'$. Let $\Theta$ be any index set. There is no need to assume finiteness of $\Theta$ at the moment; such an assumption will only be used for concrete computations in Section 6.5.

Let $(\overline{U}_\theta)_{\theta\in\Theta}$ be an imprecise model on $(\Omega,\mathcal{F})$ with corresponding family of credal sets $(\mathcal{U}_\theta)_{\theta\in\Theta}$. The observations $x_1, \dots, x_n$ are modeled via random variables

$$X_i : \Omega \to \mathcal{X}, \qquad i \in \{1, \dots, n\}.$$

It is assumed that $X_1, \dots, X_n$ are independent and identically distributed with respect to an unknown probability charge $U_\theta \in \mathcal{U}_\theta$.

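Therefore, we have an imprecise model $(\overline{P}'_\theta)_{\theta\in\Theta}$ on $(\mathcal{X},\mathcal{A}')$ with corresponding credal sets

$$\mathcal{M}'_\theta = \{\, X(U_\theta) \mid U_\theta \in \mathcal{U}_\theta \,\}, \qquad \theta \in \Theta;$$

and the random variables

$$X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P'_\theta$$

are independent and identically distributed according to some precise distribution $P'_\theta$, which may be any element of the credal set of $\overline{P}'_\theta$. The task is to estimate the unknown parameter $\theta \in \Theta$.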

The following fundamental assumption is made:

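Assumption 6.3 There is a finite subset $K = \{f_1, \dots, f_s\} \subset L_\infty(\mathcal{X},\mathcal{A}')$ such that

$$\mathcal{M}'_\theta = \{\, P'_\theta \in \mathrm{ba}^+_1(\mathcal{X},\mathcal{A}') \mid P'_\theta[f] \le \overline{P}'_\theta[f] \ \ \forall f \in K \,\} \tag{6.7}$$

for every $\theta \in \Theta$. Furthermore, it is assumed that

$$\overline{P}'_\theta[f] - \underline{P}'_\theta[f] > 0 \qquad \forall f \in K \tag{6.8}$$

where $\underline{P}'_\theta$ is the corresponding lower coherent prevision.¹⁰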
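To make Assumption 6.3 concrete, the following minimal sketch assumes the simplest nontrivial case: a two-point sample space $\mathcal{X} = \{0, 1\}$, so that a probability charge is determined by the single number $p = P'[I_{\{1\}}]$. The finite space, the function names and the numerical bounds are illustrative assumptions and not part of the setup above. With $K = \{I_{\{1\}}, -I_{\{1\}}\}$ and upper previsions $u$ and $-l$ (where $0 \le l \le u \le 1$), the credal set in (6.7) is $\{p : l \le p \le u\}$, the lower previsions are $l$ and $-u$, and (6.8) reduces to $u - l > 0$.

```python
# Illustrative sketch on the two-point space X = {0, 1} (an assumption).
# A probability charge is identified with p = P'[I_{1}].
# K = {I_{1}, -I_{1}} with upper previsions u and -l gives M' = {p : l <= p <= u}.

def in_credal_set(p, l, u):
    """Check the finitely many constraints P'[f] <= upper[f], f in K, as in (6.7)."""
    return p <= u and -p <= -l


def satisfies_6_8(l, u):
    """Check upper[f] - lower[f] > 0 for both f in K, as in (6.8).

    Here lower[I_{1}] = l and lower[-I_{1}] = -u, so both differences equal u - l.
    """
    return (u - l) > 0 and (-l - (-u)) > 0


print(in_credal_set(0.35, 0.25, 0.5))  # True
print(satisfies_6_8(0.25, 0.5))        # True
print(satisfies_6_8(0.5, 0.5))         # False: l == u is a precise model
```

In this toy case, a precise model ($l = u$) satisfies (6.7) but violates (6.8).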

Such assumptions have also been made in Subsection 5.4.2. As has already been stated there, these assumptions can be justified as follows: Practitioners will very often only be able to specify concrete upper previsions for a finite number of functions, and this directly leads to models satisfying (6.7). In particular, this will often be true for expert systems. There, it is natural to ask some experts about their prevision (or expectation) on some specific events, experiments, gambles, assets etc., and this can only be done for a finite number of such objects.

Furthermore, Section 5.2 tells us that using models of form (6.7) which violate (6.8) is dangerous because these models are potentially very unstable. Therefore, models which violate (6.8) should generally be avoided anyway.

⁹ In order to derive asymptotic results later on, some parts of the investigations are concerned with $\sigma$-additive probability measures and, therefore, we have to consider $\sigma$-algebras. This does not cause difficulties because an imprecise model on an algebra can always be extended to an imprecise model on a $\sigma$-algebra by means of a natural extension.

¹⁰ That is, $\underline{P}'_\theta[f'] = \inf_{P'_\theta \in \mathcal{M}'_\theta} P'_\theta[f']$.

Note that these assumptions rule out classical probability measures. One of the main goals K. Weichselberger had when he developed his theory of imprecise probabilities (F-probabilities) was: “As a special case, classical probability must fit into this theory.”¹¹ This means that, as a fundamental property, every probability measure is also an F-probability (and a coherent upper prevision). However, F-probabilities and coherent upper previsions which fulfill the above assumptions cannot coincide with probability measures. Accordingly, the following investigations do not apply to classical probability theory as a special case. That is, we deal with a strictly imprecise setup. As will be seen, this turns out to be an advantage here because the minimum distance estimator is based on the total variation distance. While working with total variation distances poses some difficulties in classical probability theory, these difficulties cannot occur in our strictly imprecise setup; cf. Section 6.4.

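Now, it is possible to define the empirical measure in this setup. The empirical measure $P^{(n)}$ is the map

$$P^{(n)} : \Omega \to \mathrm{ba}^+_1(\mathcal{X},\mathcal{A}'), \qquad \omega \mapsto P^{(n)}_\omega = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i(\omega)}$$

where $\delta_{x_i}$ denotes the Dirac measure in $x_i \in \mathcal{X}$. Note that

$$P^{(n)}[f'] : \Omega \to \mathbb{R}, \qquad \omega \mapsto \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i(\omega)}[f'] = \frac{1}{n} \sum_{i=1}^{n} f'\big(X_i(\omega)\big)$$

is a (bounded) random variable for every $f' \in L_\infty(\mathcal{X},\mathcal{A}')$ and

$$\Omega \times \mathcal{A}' \to \mathbb{R}, \qquad (\omega, A') \mapsto P^{(n)}_\omega[I_{A'}]$$

is a Markov kernel.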

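The following notation will also be used: For every $x = (x_1, \dots, x_n) \in \mathcal{X}^n$, the probability measure on $(\mathcal{X},\mathcal{A}')$ defined by

$$P^{(n)}_x[f'] := \frac{1}{n} \sum_{i=1}^{n} f'(x_i) \qquad \forall f' \in L_\infty(\mathcal{X},\mathcal{A}')$$

is denoted by $P^{(n)}_x$.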
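The defining formula for the empirical measure can be sketched in a few lines of Python. The function name `empirical_prevision`, the sample values and the two test functions are illustrative assumptions; the sketch only evaluates $\frac{1}{n}\sum_i f'(x_i)$ for a given bounded function $f'$.

```python
def empirical_prevision(xs, f):
    """Return (1/n) * sum_i f(x_i) for a sample xs and a bounded function f."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return sum(f(x) for x in xs) / n


sample = [0.25, 1.5, 0.75, 2.0]

# Indicator function of the set A = (-inf, 1]: yields the relative frequency of A.
indicator = lambda x: 1.0 if x <= 1.0 else 0.0
print(empirical_prevision(sample, indicator))               # 0.5

# A bounded function that is not an indicator.
print(empirical_prevision(sample, lambda x: min(x, 1.0)))   # 0.75
```

Applied to an indicator function, the empirical measure returns the relative frequency of the corresponding set in the sample.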

In order to define a minimum distance estimator, we now have to choose a suitable notion of “distance” between a measure $P'_0$ and a coherent upper prevision $\overline{P}'$ on $(\mathcal{X},\mathcal{A}')$. In accordance with the sensitivity analyst's point of view, the distance will be defined as

$$\inf_{P' \in \mathcal{M}'} d(P'_0, P')$$

where $\mathcal{M}'$ denotes the credal set of $\overline{P}'$ and $d$ is a suitable metric on $\mathrm{ba}^+_1(\mathcal{X},\mathcal{A}')$.

Since bounded charges $\mu' \in \mathrm{ba}(\mathcal{X},\mathcal{A}')$ are mainly regarded as bounded linear operators on $L_\infty(\mathcal{X},\mathcal{A}')$ within the theory of imprecise probabilities, it seems to be most natural to choose the operator norm for $d$; that is,

$$d(P'_0, P') = \|P'_0 - P'\| = \sup_{f' \in L_\infty(\mathcal{X},\mathcal{A}')} \frac{P'_0[f'] - P'[f']}{\|f'\|}$$

and we put

$$\|P'_0 - \overline{P}'\| := \inf_{P' \in \mathcal{M}'} \|P'_0 - P'\| \tag{6.9}$$

¹¹ (Weichselberger, 2000, p. 149f)

Though this is not a norm (because of the different roles of $P'_0$ and $\overline{P}'$), this notation is sensible. In particular, in the special case that $\overline{P}'$ is a precise prevision (i.e. a probability charge), the definition in (6.9) reduces to the usual operator norm in $\mathrm{ba}(\mathcal{X},\mathcal{A}')$.

Next, the minimum distance estimator can be defined: The minimum distance estimator $\hat\theta'_n$ is

$$\hat\theta'_n : \mathcal{X}^n \to \Theta, \qquad x \mapsto \arg\min_{\theta \in \Theta} \|P^{(n)}_x - \overline{P}'_\theta\|$$
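The definition can be illustrated by a minimal sketch under strong simplifying assumptions that are not part of the text: a binary sample space $\mathcal{X} = \{0, 1\}$, a finite parameter set, and credal sets of the form $\{p : l_\theta \le p \le u_\theta\}$ for $p = P'[I_{\{1\}}]$. On this space the operator norm of the difference of two probability charges is $2\,|p_0 - p|$, so the infimum in (6.9) has the closed form used below. The function names and the example model are hypothetical.

```python
def distance_to_credal_set(p0, l, u):
    """(6.9) on X = {0, 1}: inf over p in [l, u] of ||P0 - P|| = 2 * |p0 - p|."""
    return 2.0 * max(l - p0, p0 - u, 0.0)


def minimum_distance_estimate(xs, model):
    """Return a theta minimising the distance between the empirical measure and M'_theta.

    xs    : binary observations (each 0 or 1)
    model : dict mapping theta -> (l, u), the credal set {p : l <= p <= u}
    Ties are broken by dictionary order (an arbitrary choice of this sketch).
    """
    p_hat = sum(xs) / len(xs)  # mass the empirical measure puts on the point 1
    return min(model, key=lambda th: distance_to_credal_set(p_hat, *model[th]))


# Hypothetical imprecise model with three parameter values.
model = {"a": (0.0, 0.25), "b": (0.375, 0.625), "c": (0.75, 1.0)}
xs = [1, 0, 1, 1, 0, 1, 1, 1]                 # p_hat = 6/8 = 0.75
print(minimum_distance_estimate(xs, model))   # c
```

Because the arg min need not be unique, any practical implementation has to fix a tie-breaking rule; the one used here is only a convenience. For general finite sample spaces the infimum in (6.9) no longer has this closed form and would have to be computed numerically.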

Recall from Section 2.3 that the operator norm in $\mathrm{ba}(\mathcal{X},\mathcal{A}')$ is equal to the total variation. Therefore, the minimum distance estimator is based on the total variation norm. As shown in the following section, the annoying properties of the total variation norm with respect to the empirical measure in classical statistics completely disappear in the setup developed above, which is based on imprecise probabilities.
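The identity between operator norm and total variation can be checked numerically on a finite sample space. The sketch below assumes that "total variation" means the total variation norm $\sum_i |p_i - q_i|$ of the signed measure $P - Q$ (some texts use half of this value); the function names and the example vectors are illustrative. Since $f \mapsto P[f] - Q[f]$ is linear, its supremum over the unit ball of the sup-norm is attained at a sign vector, so a brute-force search over $\{-1, 1\}^k$ suffices.

```python
from itertools import product


def operator_norm_distance(p, q):
    """sup of P[f] - Q[f] over f with ||f|| <= 1, searched over sign vectors in {-1, 1}^k.

    The functional is linear in f, so the supremum over the sup-norm unit ball
    is attained at a vertex of that ball, i.e. at a sign vector.
    """
    return max(
        sum(fi * (pi - qi) for fi, pi, qi in zip(f, p, q))
        for f in product((-1.0, 1.0), repeat=len(p))
    )


def total_variation_norm(p, q):
    """Total variation norm of the signed measure P - Q: sum_i |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(operator_norm_distance(p, q), total_variation_norm(p, q))  # 0.5 0.5
```

The brute-force search is exponential in the number of points and is meant only as a check of the identity, not as a way to compute the norm.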

In document Hable, Robert (2009): Data-Based Decisions under Complex Uncertainty. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 172-175)