Types of Convergence - Sequences of Spaces, Events, and Random Variables

1.3 Sequences of Spaces, Events, and Random Variables

1.3.3 Types of Convergence

The first important point to understand about asymptotic theory is that there are different kinds of convergence of a sequence of random variables, $\{X_n\}$. Three of these kinds of convergence have analogues in convergence of general measurable functions (see Appendix 0.1) and a fourth type applies to convergence of the measures themselves. Different types of convergence apply to

• a function, that is, directly to the random variable (Definition 1.35). This is the convergence that is ordinarily called “strong convergence”.

• expected values of powers of the random variable (Definition 1.36). This is also a type of strong convergence.

• probabilities of the random variable being within a range of another random variable (Definition 1.37). This is a weak convergence.

• the distribution of the random variable (Definition 1.39, stated in terms of weak convergence of probability measures, Definition 1.38). This is the convergence that is ordinarily called “weak convergence”.

In statistics, we are interested in various types of convergence of procedures of statistical inference. Depending on the kind of inference, one type of convergence may be more relevant than another. We will discuss these in later chapters. At this point, however, it is appropriate to point out that an important property of point estimators is consistency, and the various types of consistency of point estimators, which we will discuss in Section 3.8.1, correspond directly to the types of convergence of sequences of random variables we discuss below.

Almost Sure Convergence

Definition 1.35 (almost sure (a.s.) convergence) We say that $\{X_n\}$ converges almost surely to $X$ if

$$\lim_{n\to\infty} X_n = X \quad \text{a.s.} \tag{1.155}$$

We write $X_n \xrightarrow{\text{a.s.}} X$.

Writing this definition in the form of Definition 0.1.38 on page 726, with $X_n$ and $X$ defined on the probability space $(\Omega, \mathcal{F}, P)$, we have

$$P\left(\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\right) = 1. \tag{1.156}$$

This expression provides a very useful heuristic for distinguishing a.s. convergence from other types of convergence.

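Almost sure convergence is equivalent to

$$\lim_{n\to\infty} \Pr\left(\bigcup_{m=n}^{\infty} \{\|X_m - X\| > \epsilon\}\right) = 0, \tag{1.157}$$

for every $\epsilon > 0$ (exercise).

Almost sure convergence is also called “almost certain” convergence, and written as $X_n \xrightarrow{\text{a.c.}} X$.

The condition (1.155) can also be written as

$$\Pr\left(\lim_{n\to\infty} \|X_n - X\| < \epsilon\right) = 1, \tag{1.158}$$

for every $\epsilon > 0$. For this reason, almost sure convergence is also called convergence with probability 1, and may be indicated by writing $X_n \xrightarrow{\text{wp1}} X$. Hence, we may encounter three equivalent expressions:

$$\xrightarrow{\text{a.s.}} \;\equiv\; \xrightarrow{\text{a.c.}} \;\equiv\; \xrightarrow{\text{wp1}}.$$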

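Almost sure convergence of a sequence of random variables $\{X_n\}$ to a constant $c$ implies $\limsup_n X_n = \liminf_n X_n = c$ a.s., and implies that, for every $\epsilon > 0$, $\{\|X_n - c\| < \epsilon \text{ i.o.}\}$ occurs with probability 1. It does not imply $\{X_n = c \text{ i.o.}\}$ (consider the constants $X_n = c + 1/n$), and $\{X_n = c \text{ i.o.}\}$ by itself does not imply any kind of convergence of $\{X_n\}$.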

Convergence in rth Moment

Definition 1.36 (convergence in rth moment (convergence in $L_r$)) For fixed $r > 0$, we say that $\{X_n\}$ converges in rth moment to $X$ if

$$\lim_{n\to\infty} \mathrm{E}\left(\|X_n - X\|_r^r\right) = 0. \tag{1.159}$$

We write $X_n \xrightarrow{L_r} X$. (Compare Definition 0.1.50 on page 748.)

Convergence in rth moment requires that $\mathrm{E}(\|X_n\|_r^r) < \infty$ for each $n$. Convergence in rth moment implies convergence in sth moment for $s \le r$ (and, of course, it implies that $\mathrm{E}(\|X_n\|_s^s) < \infty$ for each $n$). (See Theorem 1.16, which was stated only for scalar random variables.)

For $r = 1$, convergence in rth moment is called convergence in absolute mean. For $r = 2$, it is called convergence in mean square or convergence in second moment, and of course, it implies convergence in mean. (Recall our notational convention: $\|X_n - X\| = \|X_n - X\|_2$.)

The Cauchy criterion (see Exercise 0.0.6d on page 689) is often useful for proving convergence in mean or convergence in mean square, without specifying the limit of the sequence. The sequence $\{X_n\}$ converges in mean square (to some random variable) iff

$$\lim_{n,m\to\infty} \mathrm{E}\left(\|X_n - X_m\|^2\right) = 0. \tag{1.160}$$

Convergence in Probability

Definition 1.37 (convergence in probability)

We say that $\{X_n\}$ converges in probability to $X$ if for every $\epsilon > 0$,

$$\lim_{n\to\infty} \Pr(\|X_n - X\| > \epsilon) = 0. \tag{1.161}$$

We write $X_n \xrightarrow{p} X$.

(Compare Definition 0.1.51 on page 748 for general measures.)
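As a small numerical illustration of Definition 1.37, consider a sequence that is not taken from the text: let $X_n$ be the maximum of $n$ independent U(0,1) random variables. Then $\Pr(|X_n - 1| > \epsilon) = (1 - \epsilon)^n$ for $0 < \epsilon < 1$, so the probabilities in (1.161) can be evaluated exactly. The function name is hypothetical; this is only a sketch under those assumptions.

```python
def prob_far_from_limit(n: int, eps: float) -> float:
    """Exact Pr(|X_n - 1| > eps) when X_n = max of n iid U(0,1) variables.

    Assumed example (not from the text): X_n <= 1 always, so the event is
    {X_n < 1 - eps}, which has probability (1 - eps)**n for 0 < eps < 1.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return (1.0 - eps) ** n


if __name__ == "__main__":
    # For each fixed eps the probabilities shrink toward 0 as n grows.
    for eps in (0.5, 0.1, 0.01):
        probs = [prob_far_from_limit(n, eps) for n in (1, 10, 100, 1000)]
        print(eps, probs)
```

For every fixed $\epsilon$ the printed probabilities decrease geometrically in $n$, which is the limit required by the definition; smaller $\epsilon$ only slows the decay.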

Notice the difference between convergence in probability and convergence in rth moment. Convergence in probability together with uniform integrability implies convergence in mean, but not in higher rth moments. It is easy to construct examples of sequences that converge in probability but that do not converge in second moment (exercise).

Notice the difference between convergence in probability and almost sure convergence; in the former case the limit of probabilities is taken, in the latter case a probability of a limit is evaluated; compare equations (1.157) and (1.161). It is easy to construct examples of sequences that converge in probability but that do not converge almost surely (exercise).

Although convergence in probability does not imply almost sure convergence, it does imply the existence of a subsequence that does converge almost surely, as stated in the following theorem.

Theorem 1.31

Suppose $\{X_n\}$ converges in probability to $X$. Then there exists a subsequence $\{X_{n_i}\}$ that converges almost surely to $X$.

Stated another way, this theorem says that if $\{X_n\}$ converges in probability to $X$, then there is an increasing sequence $\{n_i\}$ of positive integers such that

$$\lim_{i\to\infty} X_{n_i} \overset{\text{a.s.}}{=} X.$$

Proof. The proof is an exercise. You could first show that there is an increasing sequence $\{n_i\}$ such that

$$\sum_{i=1}^{\infty} \Pr(|X_{n_i} - X| > 1/i) < \infty,$$

and from this conclude that $X_{n_i} \xrightarrow{\text{a.s.}} X$.
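One way to carry out the suggested argument is sketched below. The particular bound $2^{-i}$ and the appeal to the first Borel-Cantelli lemma are assumptions of this sketch, not steps given in the text.

```latex
% Sketch only; the bound 2^{-i} is one convenient (assumed) choice.
% Step 1: since X_n -> X in probability, for each i pick n_i > n_{i-1} with
\Pr\left(|X_{n_i} - X| > 1/i\right) \le 2^{-i},
\qquad\text{so that}\qquad
\sum_{i=1}^{\infty} \Pr\left(|X_{n_i} - X| > 1/i\right)
  \le \sum_{i=1}^{\infty} 2^{-i} = 1 < \infty .
% Step 2: by the first Borel-Cantelli lemma,
\Pr\left(|X_{n_i} - X| > 1/i \ \text{i.o.}\right) = 0 .
% Step 3: hence for almost every \omega there is an I(\omega) such that
|X_{n_i}(\omega) - X(\omega)| \le 1/i \quad \text{for all } i \ge I(\omega),
% which gives X_{n_i}(\omega) -> X(\omega), that is, X_{n_i} -> X a.s.
```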

Weak Convergence

There is another type of convergence that is very important in statistical applications; in fact, it is the basis for asymptotic statistical inference. This convergence is defined in terms of pointwise convergence of the sequence of CDFs; hence it is a weak convergence. We will give the definition in terms of the sequence of CDFs or, equivalently, of probability measures, and then state the definition in terms of a sequence of random variables.

Definition 1.38 (weak convergence of probability measures)

Let $\{P_n\}$ be a sequence of probability measures and $\{F_n\}$ be the sequence of corresponding CDFs, and let $F$ be a CDF with corresponding probability measure $P$. If at each point of continuity $t$ of $F$,

$$\lim_{n\to\infty} F_n(t) = F(t), \tag{1.162}$$

we say that the sequence of CDFs $\{F_n\}$ converges weakly to $F$, and, equivalently, we say that the sequence of probability measures $\{P_n\}$ converges weakly to $P$. We write

$$F_n \xrightarrow{w} F$$

or

$$P_n \xrightarrow{w} P.$$
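The restriction to continuity points in (1.162) matters. A minimal sketch, using an assumed example that is not in the text: let $P_n$ put mass one at $1/n$ and let $P$ put mass one at 0. Then $F_n(t) \to F(t)$ at every $t \ne 0$, but $F_n(0) = 0$ for all $n$ while $F(0) = 1$. Since 0 is not a continuity point of $F$, the sequence still converges weakly.

```python
def cdf_point_mass(location: float, t: float) -> float:
    """CDF at t of the probability measure with mass one at `location`."""
    return 1.0 if t >= location else 0.0


def F_n(n: int, t: float) -> float:
    """CDF of the (assumed) measure P_n with mass one at 1/n."""
    return cdf_point_mass(1.0 / n, t)


def F(t: float) -> float:
    """CDF of the (assumed) limit P with mass one at 0."""
    return cdf_point_mass(0.0, t)


if __name__ == "__main__":
    # Only t = 0, the single discontinuity of F, fails to match in the limit.
    for t in (-0.5, 0.0, 0.01, 0.5):
        values = [F_n(n, t) for n in (1, 10, 100, 1000)]
        print(f"t={t:5.2f}  F_n(t)={values}  F(t)={F(t)}")
```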

Definition 1.39 (convergence in distribution (in law))

If $\{X_n\}$ have CDFs $\{F_n\}$ and $X$ has CDF $F$, we say that $\{X_n\}$ converges in distribution or in law to $X$ iff $F_n \xrightarrow{w} F$. We write $X_n \xrightarrow{d} X$.

Because convergence in distribution is not precisely a convergence of the random variables themselves, it may be preferable to use a notation of the form

$$\mathcal{L}(X_n) \to \mathcal{L}(X),$$

where the symbol $\mathcal{L}(\cdot)$ refers to the distribution or the “law” of the random variable.

When a random variable converges in distribution to a distribution for which we have adopted a symbol such as $\mathrm{N}(\mu, \sigma^2)$, for example, we may use notation of the form

$$X_n \to\, \sim \mathrm{N}(\mu, \sigma^2).$$

Because this notation only applies in this kind of situation, we often write it more simply as just

$$X_n \to \mathrm{N}(\mu, \sigma^2),$$

or in the “law” notation, $\mathcal{L}(X_n) \to \mathrm{N}(\mu, \sigma^2)$.

For certain distributions we have special symbols to represent a random variable. In such cases, we may use notation of the form

$$X_n \xrightarrow{d} \chi^2_\nu,$$

which in this case indicates that the sequence $\{X_n\}$ converges in distribution to a random variable with a chi-squared distribution with $\nu$ degrees of freedom. The “law” notation for this would be $\mathcal{L}(X_n) \to \mathcal{L}(\chi^2_\nu)$.

Determining Classes

In the case of multiple probability measures over a measurable space, we may be interested in how these measures behave over different sub-σ-fields, in particular, whether there is a determining class smaller than the σ-field of the given measurable space. For convergent sequences of probability measures, the determining classes of interest are those that preserve convergence of the measures for all sets in the σ-field of the given measurable space.

Definition 1.40 (convergence-determining class)

Let $\{P_n\}$ be a sequence of probability measures defined on the measurable space $(\Omega, \mathcal{F})$ that converges (weakly) to $P$, also a probability measure defined on $(\Omega, \mathcal{F})$. A collection of subsets $\mathcal{C} \subseteq \mathcal{F}$ is called a convergence-determining class of the sequence, iff

$$P_n(A) \to P(A) \;\; \forall A \in \mathcal{C} \ni P(\partial A) = 0 \;\Longrightarrow\; P_n(B) \to P(B) \;\; \forall B \in \mathcal{F}.$$

It is easy to see that a convergence-determining class is a determining class (exercise), but the converse is not true, as the following example from Romano and Siegel (1986) shows.

Example 1.20 a determining class that is not a convergence-determining class

For this example, we use the familiar measurable space $(\mathrm{IR}, \mathcal{B})$, and construct a determining class $\mathcal{C}$ whose sets exclude exactly one point, and then define a probability measure $P$ that puts mass one at that point. All that is then required is to define a sequence $\{P_n\}$ that converges to $P$. The example given by Romano and Siegel (1986) is the collection $\mathcal{C}$ of all finite open intervals that do not include the single mass point of $P$. (It is an exercise to show that this is a determining class.) For definiteness, let that special point be 0, and let $P_n$ be the probability measure that puts mass one at $n$. Then, for any $A \in \mathcal{C}$, $P_n(A) \to 0 = P(A)$, but for any interval $(a, b)$ where $a < 0$ and $0 < b < 1$, $P_n((a, b)) = 0$ but $P((a, b)) = 1$.
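The two claims at the end of Example 1.20 can be checked mechanically. The sketch below evaluates the point-mass measures on open intervals only; the function names and the particular intervals are assumptions chosen for illustration.

```python
def point_mass(location: float, a: float, b: float) -> float:
    """Measure of the open interval (a, b) under a unit point mass at `location`."""
    return 1.0 if a < location < b else 0.0


def P_n(n: int, a: float, b: float) -> float:
    """P_n((a, b)) for the measure that puts mass one at n."""
    return point_mass(float(n), a, b)


def P(a: float, b: float) -> float:
    """P((a, b)) for the measure that puts mass one at 0."""
    return point_mass(0.0, a, b)


if __name__ == "__main__":
    # An interval in the class C (it excludes 0): P_n is eventually 0 = P.
    print([P_n(n, 2.0, 7.5) for n in range(1, 11)], P(2.0, 7.5))
    # An interval around 0 with b < 1: P_n is always 0 but P is 1.
    print([P_n(n, -0.5, 0.5) for n in range(1, 11)], P(-0.5, 0.5))
```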

Both convergence in probability and convergence in distribution are weak types of convergence. Convergence in probability, however, means that the probability is high that the two random variables are close to each other, while convergence in distribution means only that the distribution of $X_n$ is close to that of $X$. That does not mean that the random variables themselves are very close to each other.

The term “weak convergence” is often used specifically for convergence in distribution because this type of convergence has so many applications in asymptotic statistical inference. In many interesting cases the limiting distribution of a sequence $\{X_n\}$ may be degenerate, but for some sequence of constants $a_n$, the limiting distribution of $\{a_n X_n\}$ may not be degenerate and in fact may be very useful in statistical applications. The limiting distribution of $\{a_n X_n\}$ for a reasonable choice of a sequence of normalizing constants $\{a_n\}$ is called the asymptotic distribution of $\{X_n\}$. After some consideration of the relationships among the various types of convergence, in Section 1.3.7, we will consider the “reasonable” choice of normalizing constants and other properties of weak convergence in distribution in more detail. The relevance of the limiting distribution of $\{a_n X_n\}$ will become more apparent in the statistical applications in Section 3.8.2 and later sections.
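A concrete instance of a degenerate limit with a useful asymptotic distribution, offered as an assumed example rather than one from the text: let $X_n$ be the minimum of $n$ independent U(0,1) variables. Then $X_n \to 0$ in probability, so its limiting distribution is degenerate, but with $a_n = n$ the CDF of $n X_n$ is $1 - (1 - t/n)^n$ on $[0, n]$, which converges to the standard exponential CDF $1 - e^{-t}$. The sketch compares the two CDFs on a grid; all function names are hypothetical.

```python
import math


def cdf_scaled_min(n: int, t: float) -> float:
    """Exact CDF of n * X_n at t, where X_n = min of n iid U(0,1) variables."""
    if t <= 0.0:
        return 0.0
    if t >= n:
        return 1.0
    return 1.0 - (1.0 - t / n) ** n


def cdf_exponential(t: float) -> float:
    """CDF of the standard exponential distribution, the candidate limit."""
    return 1.0 - math.exp(-t) if t > 0.0 else 0.0


def max_gap(n: int, grid_size: int = 2000, upper: float = 20.0) -> float:
    """Largest |F_n(t) - F(t)| over an evenly spaced grid on [0, upper]."""
    points = (upper * i / grid_size for i in range(grid_size + 1))
    return max(abs(cdf_scaled_min(n, t) - cdf_exponential(t)) for t in points)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_gap(n))
```

The gap shrinks roughly like $1/n$, which is what makes $a_n = n$ a “reasonable” normalization here; a slower-growing $a_n$ would again give a degenerate limit.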

Relationships among Types of Convergence

Almost sure convergence and convergence in rth moment are both strong types of convergence, but they are not closely related to each other. We have the logical relations shown in Figure 1.3.

The directions of the arrows in Figure 1.3 correspond to theorems with straightforward proofs. Where there are no arrows, as between $L_r$ and a.s., we can find examples that satisfy one condition but not the other (see Examples 1.21 and 1.22 below). For relations in the opposite direction of the arrows, we can construct counterexamples, as for example, the reader is asked to do in Exercises 1.54a and 1.54b.

[Diagram not reproduced: arrows $L_r \to L_1$, $L_1 \to p$, a.s. $\to p$, and $p \to d$ (or $w$), together with the conditional arrows $p \to L_1$ (uniformly integrable) and $p \to$ a.s. (subsequence).]

Figure 1.3. Relationships of Convergence Types

Useful Sequences for Studying Types of Convergence

Just as for working with limits of unions and intersections of sets where we find it useful to identify sequences of sets that behave in some simple way (such as the intervals $[a + 1/n, b - 1/n]$ on page 646), it is also useful to identify sequences of random variables that behave in interesting but simple ways.

One useful sequence begins with $U \sim \mathrm{U}(0, 1)$. We define

$$X_n = n\, \mathrm{I}_{]0, 1/n[}(U). \tag{1.163}$$

This sequence can be used to show that an a.s. convergent sequence may not converge in $L_1$.

Example 1.21 converges a.s. but not in mean

Let $\{X_n\}$ be the sequence defined in equation (1.163). Since $\Pr(\lim_{n\to\infty} X_n = 0) = 1$, $X_n \xrightarrow{\text{a.s.}} 0$. The mean, and in fact the rth moment (for $r > 0$), of the limit is 0. However,

$$\mathrm{E}(|X_n - 0|^r) = \int_0^{1/n} n^r \, \mathrm{d}u = n^{r-1}.$$

For $r = 1$, this does not converge to the mean of 0, and for $r > 1$, it diverges; hence $\{X_n\}$ does not converge to 0 in rth moment for any $r \ge 1$. (It does converge to the correct rth moment for $0 < r < 1$, however.)

This example is also an example of a sequence that converges in probability (since a.s. convergence implies that), but does not converge in rth moment.
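The moment calculation in Example 1.21 can be tabulated directly. The sketch assumes the reading that is consistent with the integral above, namely that $X_n$ equals $n$ on an event of probability $1/n$ and 0 otherwise; the function names are hypothetical.

```python
from fractions import Fraction


def prob_nonzero(n: int) -> Fraction:
    """Pr(X_n != 0), assuming X_n = n on an event of probability 1/n."""
    return Fraction(1, n)


def rth_absolute_moment(n: int, r: float) -> float:
    """E(|X_n - 0|^r) = n^r * Pr(X_n = n) = n^(r-1)."""
    return float(n) ** r * float(prob_nonzero(n))


if __name__ == "__main__":
    # r < 1: moments shrink; r = 1: stuck at 1; r > 1: moments grow.
    for r in (0.5, 1.0, 2.0):
        print(r, [rth_absolute_moment(n, r) for n in (1, 10, 100, 1000)])
    # Meanwhile the probability of being away from the limit vanishes.
    print([prob_nonzero(n) for n in (1, 10, 100, 1000)])
```

The tension between the two printed rows is the point of the example: the event on which $X_n$ is nonzero becomes negligible, but the value taken there grows fast enough to keep the first moment from vanishing.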

Other kinds of interesting sequences can be constructed as indicators of events; that is, 0-1 random variables. One such simple sequence is the Bernoulli random variables $\{X_n\}$ with probability that $X_n = 1$ being $1/n$. This sequence can be used to show that a sequence that converges to $X$ in probability does not necessarily converge to $X$ a.s.
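To see why the Bernoulli sequence behaves this way, it helps to compute two probabilities exactly. The sketch below assumes, in addition to what the text states, that the $X_n$ are independent; under that assumption $\Pr(X_n = \cdots = X_m = 0) = \prod_{k=n}^{m}(1 - 1/k) = (n-1)/m$, which tends to 0 as $m \to \infty$, so a 1 appears after every index $n$ with probability 1. The function names are hypothetical.

```python
from fractions import Fraction


def prob_one(n: int) -> Fraction:
    """Pr(X_n = 1) = 1/n, which is also Pr(|X_n - 0| > eps) for 0 < eps < 1."""
    return Fraction(1, n)


def prob_all_zero(n: int, m: int) -> Fraction:
    """Pr(X_n = ... = X_m = 0), assuming the X_k are independent (n <= m)."""
    result = Fraction(1)
    for k in range(n, m + 1):
        result *= 1 - prob_one(k)
    return result


if __name__ == "__main__":
    # Convergence in probability to 0: Pr(X_n = 1) -> 0.
    print([prob_one(n) for n in (10, 100, 1000)])
    # No a.s. convergence: a run of zeros starting at n = 5 cannot last.
    print([prob_all_zero(5, m) for m in (10, 100, 1000)])
```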

Other ways of defining 0-1 random variables involve breaking a U(0,1) distribution into uniform distributions on partitions of $]0,1[$. For example, for a positive integer $k$, we may form $2^k$ subintervals of $]0,1[$ for $j = 1, \ldots, 2^k$ as

$$\left] \frac{j-1}{2^k}, \frac{j}{2^k} \right[.$$

As $k$ gets larger, the Lebesgue measure of these subintervals approaches 0 rapidly. Romano and Siegel (1986) build an indicator sequence using random variables on these subintervals for various counterexamples. This sequence can be used to show that an $L_2$ convergent sequence may not converge a.s., as in the following example.

Example 1.22 converges in second moment but not a.s.

Let $U \sim \mathrm{U}(0,1)$ and define

$$X_n = \begin{cases} 1 & \text{if } \dfrac{j_n - 1}{2^{k_n}} < U < \dfrac{j_n}{2^{k_n}} \\ 0 & \text{otherwise,} \end{cases}$$

where $j_n = 1, \ldots, 2^{k_n}$ and $k_n \to \infty$ as $n \to \infty$. We see that $\mathrm{E}((X_n - 0)^2) = 1/2^{k_n}$, hence $\{X_n\}$ converges in quadratic mean (or in mean square) to 0. We see, however, that $\lim_{n\to\infty} X_n$ does not exist (since for any value of $U$, $X_n$ takes on each of the values 0 and 1 infinitely often). Therefore, $\{X_n\}$ cannot converge a.s. (to anything!).

This is another example of a sequence that converges in probability (since convergence in rth moment implies that), but does not converge a.s.
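The sequence in Example 1.22 can be made concrete by fixing an enumeration of the pairs $(k_n, j_n)$. The sketch below assumes one particular enumeration, which the text does not specify: sweep $j = 1, \ldots, 2^k$ for $k = 1$, then $k = 2$, and so on. It then evaluates $X_n$ at a fixed value of $U$ using exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def level_and_position(n: int) -> tuple[int, int]:
    """Map n = 1, 2, ... to (k_n, j_n), sweeping j = 1..2**k for k = 1, 2, ..."""
    k = (n + 1).bit_length() - 1      # floor(log2(n + 1))
    j = n - 2**k + 2                  # position within level k
    return k, j


def x_n(n: int, u: Fraction) -> int:
    """Indicator that (j_n - 1)/2**k_n < u < j_n/2**k_n."""
    k, j = level_and_position(n)
    return 1 if Fraction(j - 1, 2**k) < u < Fraction(j, 2**k) else 0


def second_moment(n: int) -> Fraction:
    """E((X_n - 0)^2) = Pr(X_n = 1) = 2**(-k_n) for U ~ U(0,1)."""
    k, _ = level_and_position(n)
    return Fraction(1, 2**k)


if __name__ == "__main__":
    u = Fraction(1, 3)  # a fixed, non-dyadic value of U
    hits = [n for n in range(1, 63) if x_n(n, u) == 1]
    print("indices with X_n = 1:", hits)
    print("second moments at those indices:", [second_moment(n) for n in hits])
```

For this value of $U$ the sequence equals 1 exactly once in every sweep and 0 otherwise, so it keeps returning to both values while the second moments halve from one sweep to the next.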

Convergence of PDFs

The weak convergence of a sequence of CDFs $\{F_n\}$ is the basis for most asymptotic statistical inference. The convergence of a sequence of PDFs $\{f_n\}$ is a stronger form of convergence because it implies uniform convergence of probability on any given Borel set.

Theorem 1.32 (Scheffé)

Let $\{f_n\}$ be a sequence of PDFs that converge pointwise to a PDF $f$; that is, at each $x$,

$$\lim_{n\to\infty} f_n(x) = f(x).$$

Then, for any Borel set $B$,

$$\lim_{n\to\infty} \int_B |f_n(x) - f(x)| \, \mathrm{d}x = 0. \tag{1.164}$$

For a proof see Scheffé (1947).
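A numerical check of (1.164) on one assumed example, not taken from the text: let $f_n$ be the N($1/n$, 1) PDF and $f$ the N(0, 1) PDF, so that $f_n \to f$ pointwise. Taking $B$ to be the whole real line gives the largest value of the integral over any Borel set, and for this pair it has the closed form $2(2\Phi(1/(2n)) - 1)$. The sketch compares a midpoint-rule approximation with that closed form; the function names and the truncation to $[-12, 12]$ are assumptions.

```python
import math


def normal_pdf(x: float, mean: float) -> float:
    """PDF of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def l1_distance(n: int, lower: float = -12.0, upper: float = 12.0,
                cells: int = 24000) -> float:
    """Midpoint-rule approximation of the integral of |f_n - f| over [lower, upper].

    Assumed example: f_n is the N(1/n, 1) PDF and f is the N(0, 1) PDF.
    """
    width = (upper - lower) / cells
    total = 0.0
    for i in range(cells):
        x = lower + (i + 0.5) * width
        total += abs(normal_pdf(x, 1.0 / n) - normal_pdf(x, 0.0))
    return total * width


def l1_distance_closed_form(n: int) -> float:
    """Closed form 2 * (2 * Phi(1/(2n)) - 1), with Phi the standard normal CDF."""
    return 2.0 * math.erf(1.0 / (2.0 * n * math.sqrt(2.0)))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, l1_distance(n), l1_distance_closed_form(n))
```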

Hettmansperger and Klimko (1974) showed that if a weakly convergent sequence of CDFs $\{F_n\}$ has an associated sequence of PDFs $\{f_n\}$, and if these PDFs are unimodal at a given point, then on any closed interval that does not contain the modal point the sequence of PDFs converges uniformly to a PDF.

Big O and Little o Almost Surely

We are often interested in the nature of the convergence or the rate of convergence of a sequence of random variables to another sequence of random variables. As in general spaces of real numbers that we consider in Section 0.0.5 on page 652, we distinguish two types of limiting behavior by big O and little o. These involve the asymptotic ratio of the elements of one sequence to the elements of a given sequence $\{a_n\}$. We defined two order classes, $O(a_n)$ and $o(a_n)$. In this section we begin with a given sequence of random variables $\{Y_n\}$ and define four different order classes, $O(Y_n)$ a.s., $o(Y_n)$ a.s., $O_P(Y_n)$, and $o_P(Y_n)$, based on whether or not the ratio is approaching 0 (that is, big O or little o) and on whether the convergence is almost sure or in probability.

For sequences of random variables $\{X_n\}$ and $\{Y_n\}$ defined on a common probability space, we identify different types of convergence, either almost sure or in probability.

• Big O almost surely, written $O(Y_n)$ a.s.

$X_n \in O(Y_n)$ a.s. iff $\Pr(\|X_n\| \in O(\|Y_n\|)) = 1$.

• Little o almost surely, written $o(Y_n)$ a.s.

$X_n \in o(Y_n)$ a.s. iff $\|X_n\| / \|Y_n\| \xrightarrow{\text{a.s.}} 0$. Compare $X_n / Y_n \xrightarrow{\text{a.s.}} 0$ for $X_n \in \mathrm{IR}^m$ and $Y_n \in \mathrm{IR}$.

Big O and Little o Weakly

We also have relationships in which one sequence converges to another in probability.

• Big O in probability, written OP(Yn).

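$X_n \in O_P(Y_n)$ iff $\forall \epsilon > 0$ $\exists$ constant $C_\epsilon > 0$ $\ni$ $\sup_n \Pr(\|X_n\| \ge C_\epsilon \|Y_n\|) < \epsilon$.

If $X_n \in O_P(1)$, $X_n$ is said to be bounded in probability.

• Little o in probability, written $o_P(Y_n)$.

$X_n \in o_P(Y_n)$ iff $\|X_n\| / \|Y_n\| \xrightarrow{p} 0$.

If $X_n \in o_P(1)$, then $X_n$ converges in probability to 0, and conversely. If $X_n \in o_P(1)$, then also $X_n \in O_P(1)$. (Exercise.)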
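Both definitions can be checked exactly on a familiar case. As an assumed example, not taken from the text, let $X_n \sim \mathrm{N}(0, 1/n)$, as for the mean of $n$ independent standard normal variables. Then $\Pr(|X_n| \ge C n^{-1/2}) = \Pr(|Z| \ge C)$ does not depend on $n$, so $X_n \in O_P(n^{-1/2})$, while $\Pr(|X_n| > \delta) = \Pr(|Z| > \delta\sqrt{n}) \to 0$, so $X_n \in o_P(1)$. The function names and the grid search for $C_\epsilon$ are hypothetical conveniences.

```python
import math


def tail_prob_scaled(c: float) -> float:
    """Pr(|X_n| >= c / sqrt(n)) = Pr(|Z| >= c); the same for every n."""
    return math.erfc(c / math.sqrt(2.0))


def tail_prob_fixed(n: int, delta: float) -> float:
    """Pr(|X_n| > delta) = Pr(|Z| > delta * sqrt(n))."""
    return math.erfc(delta * math.sqrt(n) / math.sqrt(2.0))


def bounding_constant(eps: float, step: float = 0.01) -> float:
    """Smallest multiple C of `step` with Pr(|Z| >= C) < eps (simple search)."""
    i = 0
    while tail_prob_scaled(i * step) >= eps:
        i += 1
    return i * step


if __name__ == "__main__":
    # O_P(n**-0.5): one constant C_eps works for all n at once.
    for eps in (0.1, 0.05, 0.01):
        print(eps, bounding_constant(eps))
    # o_P(1): for fixed delta the tail probability tends to 0.
    print([tail_prob_fixed(n, 0.1) for n in (1, 100, 10000)])
```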

Instead of a defining sequence $\{Y_n\}$ of random variables, the sequence of interest may be a sequence of constants $\{a_n\}$.

Some useful properties are the following, in which $\{X_n\}$, $\{Y_n\}$, and $\{Z_n\}$ are random variables defined on a common probability space, and $\{a_n\}$ and $\{b_n\}$ are sequences of constants.

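$$X_n \in o_P(a_n) \Longrightarrow X_n \in O_P(a_n) \tag{1.165}$$

$$X_n \in o_P(1) \Longleftrightarrow X_n \xrightarrow{p} 0 \tag{1.166}$$

$$X_n \in O_P(1/a_n),\ \lim b_n/a_n < \infty \Longrightarrow X_n \in O_P(m_n) \tag{1.167}$$

$$X_n \in O_P(a_n) \Longrightarrow X_n Y_n \in O_P(a_n Y_n) \tag{1.168}$$

$$X_n \in O_P(a_n),\ Y_n \in O_P(b_n) \Longrightarrow X_n Y_n \in O_P(a_n b_n) \tag{1.169}$$

$$X_n \in O_P(a_n),\ Y_n \in O_P(b_n) \Longrightarrow X_n + Y_n \in O_P(\|a_n\| + \|b_n\|) \tag{1.170}$$

$$X_n \in O_P(Z_n),\ Y_n \in O_P(Z_n) \Longrightarrow X_n + Y_n \in O_P(Z_n) \tag{1.171}$$

$$X_n \in o_P(a_n),\ Y_n \in o_P(b_n) \Longrightarrow X_n Y_n \in o_P(a_n b_n) \tag{1.172}$$

$$X_n \in o_P(a_n),\ Y_n \in o_P(b_n) \Longrightarrow X_n + Y_n \in o_P(\|a_n\| + \|b_n\|) \tag{1.173}$$

$$X_n \in o_P(a_n),\ Y_n \in O_P(b_n) \Longrightarrow X_n Y_n \in o_P(a_n b_n) \tag{1.174}$$

You are asked to prove these statements in Exercise 1.61. There are, of course, other variations on these relationships.

The order of convergence of a sequence of absolute expectations can be related to order of convergence in probability:

$$a_n \in \mathrm{IR}_+,\ \mathrm{E}(|X_n|) \in O(a_n) \Longrightarrow X_n \in O_P(a_n). \tag{1.175}$$

Almost sure convergence implies that the sup is bounded in probability. For any random variable $X$ (recall that a random variable is finite a.s.),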
