BROWNIAN MOTION 1. INTRODUCTION

(1)

BROWNIAN MOTION

1. INTRODUCTION 1.1. Wiener Process: Definition.

Definition 1. A standard (one-dimensional) Wiener process (also called Brownian motion) is a stochastic process{Wt}t≥0+indexed by nonnegative real numberstwith the following

properties: (1) W0 = 0.

(2) With probability1, the functiont→Wtis continuous int.

(3) The process{Wt}t≥0hasstationary, independent increments.

(4) The incrementWt+s−Wshas the NORMAL(0, t)distribution.

A Wiener process with initial valueW0 = x is gotten by addingxto a standard Wiener

process. As is customary in the land of Markov processes, the initial valuexis indicated (when appropriate) by putting a superscriptx on the probability and expectation oper-ators. The term independent increments means that for every choice of nonnegative real numbers0≤s1 < t1≤s2 < t2 ≤ · · · ≤sn< tn<∞, theincrementrandom variables

Wt1−Ws1, Wt2 −Ws2, . . . , Wtn−Wsn

are jointly independent; the termstationary increments means that for any0 < s, t < ∞ the distribution of the incrementWt+s−Wshas the same distribution asWt−W0 =Wt.

In general, a stochastic process with stationary, independent increments is called aL´evy process; more on these later. The Wiener process is the intersection of the class ofGaussian processeswith theL´evy processes.

It should not be obvious that properties (1)–(4) in the definition of a standard Brownian motion are mutually consistent, so it is nota prioriclear that a standard Brownian motion exists. (The main issue is to show that properties (3)–(4) do not preclude the possibility of continuous paths.) That it does exist was first proved by N. WIENER in about 1920. His proof was simplified by P. L ´EVY; we shall outline L´evy’s construction in section?? below. But notice that properties (3) and (4) are compatible. This follows from the follow-ing elementary property of the normal distributions: IfX, Y are independent, normally distributed random variables with meansµX, µY and variancesσX2, σY2, then the random

variableX+Y is normally distributed with meanµX +µY and varianceσX2 +σ2Y.

1.2. Brownian Motion as a Limit of Random Walks. One of the many reasons that Brow-nian motion is important in probability theory is that it is, in a certain sense, a limit of rescaled simple random walks. Let ξ1, ξ2, . . . be a sequence of independent, identically

distributed random variables with mean 0 and variance 1. For each n ≥ 1 define a continuous–time stochastic process{W_n(t)}_t≥0by

(1) Wn(t) = 1 √ n X 1≤j≤bntc ξj 1

(2)

This is a random step function with jumps of size ±1/√n at timesk/n, where k ∈ Z+.

Since the random variablesξj are independent, the increments ofWn(t)are independent.

Moreover, for largenthe distribution ofWn(t+s)−Wn(s)is close to the NORMAL(0, t)

distribution, by the Central Limit theorem. Thus, it requires only a small leap of faith to believe that, as n → ∞, the distribution of the random function Wn(t)approaches (in a

sense made precise below) that of a standard Brownian motion.

Why is this important? First, it explains, at least in part, why the Wiener process arises so commonly in nature. Many stochastic processes behave, at least for long stretches of time, like random walks with small but frequent jumps. The argument above suggests that such processes will look, at least approximately, and on the appropriate time scale, like Brownian motion.

Second, it suggests that many important “statistics” of the random walk will have lim-iting distributions, and that the limlim-iting distributions will be the distributions of the cor-responding statistics of Brownian motion. The simplest instance of this principle is the central limit theorem: the distribution of Wn(1)is, for large nclose to that of W(1) (the

gaussian distribution with mean0and variance1). Other important instances do not fol-low so easily from the central limit theorem. For example, the distribution of

(2) Mn(t) := max 0≤s≤tWn(t) = max0≤k≤nt 1 √ n X 1≤j≤k ξj converges, asn→ ∞, to that of (3) M(t) := max 0≤s≤tW(t).

The distribution ofM(t)will be calculated explicitly below, along with the distributions of several related random variables connected with the Brownian path.

1.3. Transition Probabilities. The mathematical study of Brownian motion arose out of the recognition by Einstein that the random motion of molecules was responsible for the macroscopic phenomenon ofdiffusion. Thus, it should be no surprise that there are deep connections between the theory of Brownian motion and parabolic partial differential equations such as the heat and diffusion equations. At the root of the connection is the Gauss kernel, which is the transition probability function for Brownian motion:

(4) P(Wt+s∈dy|Ws=x) ∆ =pt(x, y)dy= 1 √ 2πtexp{−(y−x) 2_/₂_t}dy.

This equation follows directly from properties (3)–(4) in the definition of a standard Brow-nian motion, and the definition of the normal distribution. The functionpt(y|x) =pt(x, y)

is called theGauss kernel, or sometimes theheat kernel. (In the parlance of the PDE folks, it is thefundamental solutionof the heat equation). Here is why:

Theorem 1. Letf : R → Rbe a continuous, bounded function. Then the unique (continuous)

solutionut(x)to the initial value problem ∂u ∂t = 1 2 ∂2u ∂x2 (5) u0(x) =f(x) (6)

(3)

is given by

(7) ut(x) =Ef(Wtx) =

Z ∞

y=−∞

pt(x, y)f(y)dy.

HereW_txis a Brownian motion started atx.

The equation (??) is called theheat equation. That the PDE (??) has only one solution that satisfies the initial condition (??) follows from themaximum principle: see a PDE text if you are interested. The more important thing is that the solution is given by the expectation formula (??). To see that the right side of (??) actually does solve (??), take the partial derivatives in the PDE (??) under the integral in (??). You then see that the issue boils down to showing that

(8) ∂pt(x, y) ∂t = 1 2 ∂2_p t(x, y) ∂x2 .

Exercise:Verify this.

1.4. Symmetries and Scaling Laws.

Proposition 1. Let{W(t)}t≥0 be a standard Brownian motion. Then each of the following

pro-cesses is also a standard Brownian motion:

{−W(t)}t≥0 (9) {W(t+s)−W(s)}t≥0 (10) {aW(t/a2)}_t≥0 (11) {tW(1/t)}t≥0. (12)

Exercise:Prove this.

The scaling law, in particular, has all manner of important ramifications. It is advisable, when confronted with a problem about Wiener processes, to begin by reflecting on how scaling might affect the answer. Consider,as a first example, themaximum andminimum random variables

M(t) := max{W(s) : 0≤s≤t} and (13)

M−(t) := min{W(s) : 0≤s≤t}. (14)

These are well-defined, because the Wiener process has continuous paths, and continuous functions always attain their maximal and minimal values on compact intervals. Now observe that if the pathW(s) is replaced by its reflection−W(s) then the maximum and the minimum are interchanged and negated. But since−W(s)is again a Wiener process, it follows thatM(t)and−M−(t)have the same distribution:

(4)

Next, consider the implications of Brownian scaling. Fixa >0, and define W∗(t) =aW(t/a2) and M∗(t) = max 0≤s≤tW ∗ (s) = max 0≤s≤taW(s/a 2₎ =aM(t/a2).

By the Brownian scaling property,W∗(s)is a standard Brownian motion, and so the ran-dom variableM∗(t)has the same distribution asM(t). Therefore,

(16) M(t)=D aM(t/a2).

On first sight, this relation appears rather harmless. However, as we shall see in section ??, it implies that the sample pathsW(s) of the Wiener process are, with probability one, nondifferentiable ats= 0.

Exercise: Use Brownian scaling to deduce a scaling law for the first-passage timerandom variablesτ(a)defined as follows:

(17) τ(a) = min{t : W(t) =a}

orτ(a) =∞on the event that the processW(t)never attains the valuea. 2. THEBROWNIAN FILTRATION AND THEMARKOV PROPERTY

Property (??) is a rudimentary form of the Markov propertyof Brownian motion. The Markov property asserts something more: not only is the process{W(t+s)−W(s)}_t≥0

a standard Brownian motion, but it is independent of the path{W(r)}0≤r≤sup to times.

This may be stated more precisely using the language ofσ−algebras. Define (18) F_tW :=Ft:=σ({W(s)}0≤s≤t)

to be the smallestσ−algebra containing all events of the form{W_s ∈B}where0≤s≤t and B ⊂ R is a Borel set. The indexed collection of σ−algebra {FtW}t≥0 is called the

standard filtrationassociated with the Brownian motion.

Example: For eacht >0and for everya∈_R, the event{M(t)> a}is an element ofFW t .

To see this, observe that by path-continuity, (19) {M(t)> a}= [

s∈_Q:0≤s≤t

{W(s)> a}.

HereQdenotes the set of rational numbers. BecauseQis a countable set, the union in (??)

is a countable union. Since each of the events{W(s)> a}in the union is an element of the σ−algebraFW

t , the event{M(t)> a}must also be an element ofFtW.

In general, afiltrationis an increasing family ofσ−algebras{G_t}_t≥0 indexed by timet.

A stochastic processX(t)is said to beadaptedto a filtration{Gt}t≥0 if the random variable

X(t) is measurable relative toG_t for everyt ≥ 0. It is sometimes necessary to consider filtrations other than the standard one, because in some situations there are several sources of randomness (additional independent Brownian motions, Poisson processes, coin tosses, etc.).

(5)

Definition 2. Let{Wt}t≥0 be a Wiener process and{Gt}t≥0 a filtration of the probability

space on which the Wiener process is defined. The filtration is said to beadmissiblefor the Wiener process if (a) the Wiener process is adapted to the filtration, and (b) for everyt≥0, the post-tprocess{Wt+s−Wt}s≥0is independent of theσ−algebraGt.

Note that the standard filtration is embedded in any admissible filtration{G_t}_t≥0, that

is, for eacht,

F_tW ⊂ Gt.

Proposition 2. (Markov Property) If{W(t)}_t≥0is a standard Brownian motion, then the standard

filtration{FW

t }t≥0 is admissible.

Proof of the Markov Property. This is nothing more than a sophisticated restatement of the independent increments property of Brownian motion. Fixs≥0, and consider two events of the form

A=∩n

j=1{W(sj)−W(sj−1)≤xj} ∈ Fs and

B =∩m_j=1{W(tj+s)−W(tj−1+s)≤yj}.

By the independent increments hypothesis, eventsA andB are independent. Events of typeAgenerate theσ−algebraFs, and events of typeB generate the smallestσ−algebra

with respect to which the post-sBrownian motionW(t+s)−W(s)is measurable. Conse-quently, the post-sBrownian motion is independent of theσ−algebraFs.

The Markov property has an important generalization called theStrong Markov Property. This generalization involves the notion of astopping time for a filtration{F_t}_t≥0. A

stop-ping time is defined to be a nonnegative random variableτ such that for each (nonrandom) t≥0the event{τ ≤t}is an element of theσ−algebraFt.

Example:τ(a) := min{t : W(t) = a}is a stopping time relative to the standard filtration. To see this, observe that, because the paths of the Wiener process are continuous, the event {τ(a) ≤t}is identical to the event{M(t) ≥a}. We have already shown that this event is an element ofF_t.

Associated with any stopping timeτ is aσ−algebraFτ, defined to be the collection of

all eventsB such thatB∩ {τ ≤t} ∈ Ftfor every nonrandomt. Informally,Fτ consists of

all events that are “observable” by timeτ. Observe that the random variablesτ,Wτ, and

Mτ are measurable relative toFτ. (Exercise: Prove this.)

Example: Letτ =τa as above, and letB be the event that the Brownian pathW(t)hitsb

before it hitsa. ThenB ∈ F_τ.

Theorem 2. (Strong Markov Property) Let{W(t)}t≥0be a standard Brownian motion, and letτ

be a stopping time relative to the standard filtration, with associated stoppingσ−algebraF_τ. For t≥0, define thepost-τ process

(20) W∗(t) =W(t+τ)−W(τ),

and let{F_t∗}t≥0 be the standard filtration for this process. Then

(a) {W∗(t)}t≥0is a standard Brownian motion; and

(6)

The Strong Markov property holds more generally for arbitrary L´evy processes — see the notes on L´evy processes for a formal statement and complete proof, and also a proof of the following useful corollary:

Corollary 1. Let{Wt}t≥0 be a Brownian motion with admissible filtration{F }t, and let τ be a

stopping time for this filtration. Let{W∗

s}s≥0be a second Brownian motion on the same probability

space that is independent of the stopping fieldFτ. Then the spliced process ˜

Wt=Wt fort≤τ,

(21)

=Wτ+Wt∗−τ fort≥τ

is also a Brownian motion.

The hypothesis thatτ be a stopping time is essential for the truth of the Strong Markov Property. Mistaken application of the Strong Markov Property may lead to intricate and sometimes subtle contradictions. Here is an example: LetTbe the first time that the Wiener path reaches its maximum value up to time1, that is,

T = min{t : W(t) =M(1)}.

Observe thatTis well-defined, by path-continuity, which assures that the set of timest≤1

such thatW(t) =M(1)is closed and nonempty. SinceM(1)is the maximum value attained by the Wiener path up to time1, the post-T pathW∗(s) =W(T +s)−W(T)cannot enter the positive half-line(0,∞)fors ≤1−T. Later we will show thatT < 1almost surely; thus, almost surely,W∗(s)does not immediately enter(0,∞). Now if the Strong Markov Property were true for the random timeT, then it would follow that, almost surely,W(s)

does not immediately enter(0,∞). Since−W(s)is also a Wiener process, we may infer that, almost surely, W(s) does not immediately enter(−∞,0), and so W(s) = 0 for all sin a (random) time interval of positive duration beginning at0. But this is impossible, because with probability one,

W(s)6= 0 for all rational timess >0. 3. EMBEDDEDSIMPLERANDOMWALKS

Lemma 1. Defineτ = min{t >0 : |W_t|= 1}. Then with probability1,τ <∞.

Proof. This should be a familiar argument: I’ll define a sequence of independent Bernoulli trialsGnin such a way that if any of them results in a success, then the pathWtmust escape

from the interval[−1,1]. SetGn={Wn+1−Wn >2}. These events are independent, and

each has probability p := 1−Φ(2) > 0. Since p > 0, infinitely many of the events Gn

will occur (and in fact the numberN of trials until the first success will have the geometric distribution with parameterp). Clearly, ifGnoccurs, thenτ ≤n+ 1.

The lemma guarantees that there will be a first timeτ1 = τ when the Wiener process

has traveled±1from its initial point. Since this time is a stopping time, the post-τ process Wt+τ−Wτ is anindependentWiener process, by the strong Markov property, and so there

will be a first time when it has traveled±1from its starting point, and so on. Because the post-τ process is independent of the stopping fieldFτ, it is in particular independent of

W(τ) =±1, and so the sequence of future±1jumps is independent of the first. By an easy induction argument, the sequence of ±1 jumps made in this sequence are independent

(7)

and identically distributed. Similarly, the sequence of elapsed times are i.i.d. copies ofτ. Formally, defineτ0 = 0and

(22) τn+1 := min{t > τn : |Wt+τn−Wτn|= 1}. The arguments above imply the following.

Proposition 3. The sequenceYn := W(τn)is a simple random walk started atY0 = W0 = 0.

Furthermore, the sequence of random vectors

(W(τn+1)−W(τn), τn+1−τn)

is independent and identically distributed.

Corollary 2. With probability one, the Wiener process visits every real number.

Proof. The recurrence of simple random walk implies thatWt must visit everyinteger, in

fact infinitely many times. Path-continuity and the intermediate value theorem therefore

imply that the path must travel through every real number.

There isn’t anything special about the values±1for the Wiener process — in fact, Brow-nian scaling implies that there is an embedded simple random walk on each discrete lattice (i.e., discrete additive subgroup) ofR. It isn’t hard to see (or to prove, for that matter) that

the embedded simple random walks on the lattices m−1Z “fill out” the Brownian path

in such a way that asm → ∞the polygonal paths gotten by connecting the dots in the embedded simple random walks converge uniformly (on compact time intervals) to the pathWt. This can be used to provide a precise meaning for the assertion made earlier that

Brownian motion is, in some sense, a continuum limit of random walks. We’ll come back to this later in the course (maybe).

The embedding of simple random walks in Brownian motion has other, more subtle ramifications that have to do with Brownian local time. We’ll discuss this when we have a few more tools (in particular, the It ˆo formula) available. For now I’ll just remark that the key is the way that the embedded simple random walks on the nested lattices2−kZfit

together. It is clear that the embedded SRW on2−k−1Zis a subsequence of the embedded

SRW on 2−k

Z. Furthermore, theway that it fits in as a subsequence is exactly the same

(statistically speaking) as the way that the embedded SRW on2−1Z fits into the

embed-ded SRW onZ, by Brownian scaling. Thus, there is an infinite sequence of nested simple

random walks on the lattices2−kZ, fork∈Z, that fill out (and hence, by path-continuity,

determine) the Wiener path. OK, enough for now.

One last remark in connection with Proposition??: There is a more general — and less obvious — theorem of Skorohod to the effect thateverymean zero, finite variance random walk onRis embedded in standard Brownian motion. See sec.??below for more.

4. THEREFLECTIONPRINCIPLE

Denote byMt=M(t)the maximum of the Wiener process up to timet, and byτa=τ(a)

the first passage time to the valuea. Proposition 4.

(23) P{M(t)≥a}=P{τa≤t}= 2P{W(t)> a}= 2−2Φ(a/

√ t).

(8)

The argument will be based on a symmetry principle that may be traced back to the French mathematician D. ANDRE. This is often referred to as the´ reflection principle. The essential point of the argument is this: ifτ(a)< t, thenW(t)is just as likely to beabovethe levelaas to be belowthe levela. Justification of this claim requires the use of the Strong Markov Property. Writeτ =τ(a). By Corollary??above,τ <∞almost surely. Sinceτ is a stopping time, the post-τ process

(24) W∗(t) :=W(τ+t)−W(τ)

is a Wiener process, and is independent of the stopping field F_τ. Consequently, the re-flection{−W∗(t)}t≥0is also a Wiener process, and is independent of the stopping fieldFτ.

Thus, if we were to run the original Wiener processW(s)until the timeτ of first passage to the valueaand then attach notW∗but instead its reflection−W∗, we would again obtain a Wiener process. This new process is formally defined as follows:

˜

W(s) =W(s) fors≤τ, (25)

= 2a−W(s) fors≥τ.

Proposition 5. (Reflection Principle) If{W(t)}_t≥0 is a Wiener process, then so is{W˜(t)}t≥0.

Proof. This is just a special case of Corollary??.

Proof of Proposition??. The reflected processW˜ is a Brownian motion that agrees with the original Brownian motionW up until the first timeτ = τ(a) that the path(s) reaches the levela. In particular,τ is the first passage time to the levelafor the Brownian motionW˜. Hence,

P{τ < t and W(t)< a}=P{τ < t and W˜(t)< a}.

After timeτ, the pathW˜ is gotten by reflecting the pathW in the linew=a. Consequently, on the eventτ < t,W(t)< aif and only ifW˜(t)> a, and so

P{τ < t and W˜(t)< a}=P{τ < t and W(t)> a}.

Combining the last two displayed equalities, and using the fact thatP{W(t) =a}= 0, we obtain

P{τ < a}= 2P{τ < t and W(t)> a}= 2P{W(t)> a}.

Corollary 3. The first-passage time random variableτ(a)is almost surely finite, and has the one-sided stableprobability density function of index1/2:

(26) f(t) = ae

−a2_/2t √

2πt3 .

Essentially the same arguments prove the following. Corollary 4.

(27) P{M(t)∈daandW(t)∈a−db}= 2(a+b) exp{−(a+b) 2_/₂_t} (2π)1/2_t3/2 dadb

It follows, by an easy calculation, that for everytthe random variables|Wt|andMt−Wt

have the same distribution. In fact, the processes |W_t| andMt−Wt have the same joint

(9)

Proposition 6. (P. L´evy) The processes{Mt−Wt}t≥0and{|Wt|}t≥0have the same distributions.

Exercise 1. Prove this. Hints: (A) It is enough to show that the two processes have the same finite-dimensional distributions, that is, that for any finite set of time pointst1, t2, . . . , tkthe

joint distributions of the two processes at the time pointstiare the same. (B) By the Markov

property for the Wiener process, to prove equality of finite-dimensional distributions it is enough to show that the two-dimensional distributions are the same. (C) For this, use the Reflection Principle.

Remark1. The reflection principle and its use in determining the distributions of the max Mtand the first-passage timeτ(a)are really no different from their analogues for simple

random walks, about which you learned in 312. In fact, we could have obtained the results for Brownian motion directly from the corresponding results for simple random walk, by using embedding.

Exercise 2. Brownian motion with absorption.

(A) Define Brownian motion with absorption at0 byYt = Wt∧τ(0), that is, Ytis the

pro-cess that follows the Brownian path until the first visit to0, then sticks at 0forever after. Calculate the transition probability densitiesp0_t(x, y)ofYt.

(B) Define Brownian motion with absorption on[0,1]byZt =Wt∧T, whereT = min{t :

Wt= 0or1}. Calculate the transition probability densitiesqt(x, y)forx, y∈(0,1).

5. WALDIDENTITIES FORBROWNIANMOTION

Proposition 7. Let{W(t)}_t≥0be a standard Wiener process and letG={Gt}t≥0be an admissible

filtration. Then each of the following is a continuous martingale relative toG(withθ∈R):

(a) {W_t}_t≥0

(b) {W2

t −t}t≥0

(c) {exp{θWt−θ2t/2}}t≥0

(d) {exp{iθW_t+θ2t/2}}_t≥0

Consequently, for any bounded stopping timeτ, each fo the following holds: EW(τ) = 0; (28) EW(τ)2=Eτ; (29) Eexp{θW(τ)−θ2τ /2}= 1 ∀θ∈R; and (30) Eexp{iθW(τ) +θ2τ /2}= 1 ∀θ∈_R. (31)

Observe that fornonrandomtimesτ =t, these identities follow from elementary proper-ties of the normal distribution. Notice also that ifτ is anunboundedstopping time, then the identities may fail to be true: for example, ifτ =τ(1)is the first passage time to the value

1, thenW(τ) = 1, and soEW(τ)6= 0. Finally, it is crucial thatτshould be a stopping time: if, for instance,τ = min{t≤1 :W(t) =M(1)}, thenEW(τ) =EM(1)>0.

Proof. The martingale property follows immediately from the independent increments prop-erty. In particular, the definition of an admissible filtration implies thatW(t+s)−W(s)is

(10)

independent ofGs; hence (for instance),

E(exp{θW_t+s−θ2(t+s)/2} | G_s) = exp{θW_s−θ2s/2}E(exp{θW_t+s−Ws−θ2t/2} | Gs) = exp{θWs−θ2s/2}Eexp{θWt+s−Ws−θ2t/2} = exp{θWs−θ2s/2}.

The Wald identities follow immediately from Doob’s optional sampling formula.

Example 1: Fix constantsa, b > 0, and defineT = T−a,b to be the first time tsuch that

W(t) = −aor+b. The random variableT is a finite, but unbounded, stopping time, and so the Wald identities may not be applied directly. However, for each integern ≥ 1, the random variableT ∧nisa bounded stopping time. Consequently,

EW(T∧n) = 0 and EW(T∧n)2=ET∧n.

Now until timeT, the Wiener path remains between the values−aand+b, so the random variables |W(T ∧n)|are uniformly bounded bya+b. Furthermore, by path-continuity, W(T ∧n)→W(T)asn→ ∞. Therefore, by the dominated convergence theorem,

EW(T) =−aP{W(T) =−a}+bP{W(T) =b}= 0. SinceP{W(T) =−a}+P{W(T) =b}= 1, it follows that

(32) P{W(T) =b}= a

a+b.

The dominated convergence theorem also guarantees thatEW(T ∧n)2 → EW(T)2, and the monotone convergence theorem thatET∧n↑ET. Thus,

EW(T)2 =ET. Using (??), one may now easily obtain

(33) ET =ab.

Example 2: Letτ = τ(a)be the first passage time to the value a >0by the Wiener path W(t). As we have seen,τ is a stopping time andτ <∞with probability one, butτ is not bounded. Nevertheless, for anyn <∞, the truncationτ ∧nis a bounded stopping time, and so by the third Wald identity, for anyθ >0,

(34) Eexp{θW(τ ∧n)−θ2(τ ∧n)}= 1.

Because the pathW(t)does not assume a value larger thanauntil after timeτ, the random variablesW(τ ∧n)are uniformly bounded bya, and so the random variables in equation (??) are dominated by the constantexp{θa}. Sinceτ <∞with probability one,τ ∧n→τ asn→ ∞, and by path-continuity, the random variablesW(τ∧n)converge toaasn→ ∞. Therefore, by the dominated convergence theorem,

Eexp{θa−θ2(τ)}= 1. Thus, settingλ=θ2/2, we have

(11)

The only density with this Laplace transform1 is the one–sided stable density given in equation (??). Thus, the Optional Sampling Formula gives us a second proof of (??). Exercise 3. First Passage to a Tilted Line.LetWtbe a standard Wiener process, and define

τ = min{t > 0 : W(t) = a−bt}wherea, b > 0are positive constants. Find the Laplace transform and/or the probability density function ofτ.

Exercise 4. Two-dimensional Brownian motion: First-passage distribution. Let Zt = (Xt, Yt)be a two-dimensional Brownian motion started at the origin(0,0)(that is, the

coor-dinate processesXtandYtare independent standard one-dimensional Wiener processes).

(A) Prove that for each realθ, the processexp{θX_t+iθYt} is a martingale relative to an

admissible filtration.

(B) Deduce the corresponding Wald identity for the stopping timeτ(a) = min{t : Wt=a},

fora >0.

(C) What does this tell you about the distribution ofYτ(a)?

Exercise 5. Eigenfunction expansions. These exercises show how to use Wald identi-ties to obtain eigenfunction expansions (in this case, Fourier expansions) of the transition probability densities of Brownian motion with absorption on the unit interval(0,1). You will need to know that the functions {√2 sinkπx}_k≥1 constitute an orthonormal basis of

L2[0,1]. LetWtbe a Brownian motion started atx ∈[0,1]underPx, and letT = T[0,1]be

the first time thatWt= 0or1.

(A) Use the appropriate martingale (Wald) identity to check that Exsin(kπWt)ek

2_π2_t/2

1{T > t}= sin(kπx).

(B) Deduce that for everyC∞functionuwhich vanishes at the endpointsx = 0,1of the interval, Exu(Wt∧T) = ∞ X k=1 e−k2π2t/2(√2 sin(kπx))ˆu(k)

whereuˆ(k) =√2R₀1u(y) sin(kπy)dyis thekth Fourier coefficient ofu. (C) Conclude that the sub-probability measurePx_{W

t∈dy;T > t}has density

qt(x, y) = ∞ X

k=1

e−k2π2t/22 sin(kπx) sin(kπy). 6. BROWNIANPATHS

In the latter half of the nineteenth century, mathematicians began to encounter (and invent) some rather strange objects. Weierstrass produced a continuous function that is nowhere differentiable. Cantor constructed a subsetC(the “Cantor set”) of the unit inter-val with zero area (Lebesgue measure) that is nevertheless in one-to-one correspondence with the unit interval, and has the further disconcerting property that between any two points of C lies an interval of positive length totally contained in the complement of C. Not all mathematicians were pleased by these new objects. Hermite, for one, remarked that he was “revolted” by this plethora of nondifferentiable functions and bizarre sets.

(12)

With Brownian motion, the strange becomes commonplace. With probability one, the sample paths are nowhere differentiable, and the zero set Z = {t ≤ 1 : W(t) = 0}) is a homeomorphic image of the Cantor set. These facts may be established using only the formula (??), Brownian scaling, the strong Markov property, and elementary arguments. 6.1. Zero Set of a Brownian Path. The zero set is

(36) Z ={t≥0 : W(t) = 0}.

Because the pathW(t)is continuous int, the setZis closed. Furthermore, with probability one the Lebesgue measure of Z is 0, because Fubini’s theorem implies that theexpected Lebesgue measure ofZ is0: E|Z|=E Z ∞ 0 1{0}(Wt)dt = Z ∞ 0 E1{0}(Wt)dt = Z ∞ 0 P{Wt= 0}dt = 0,

where|Z| denotes the Lebesgue measure ofZ. Observe that for any fixed (nonrandom) t > 0, the probability thatt ∈ Z is0, becauseP{W(t) = 0} = 0. Hence, becauseQ+(the

set of positive rationals) is countable,

(37) P{_Q₊∩ Z 6=∅}= 0.

Proposition 8. With probability one, the Brownian pathW(t)has infinitely many zeros in every time interval(0, ε), whereε >0.

Proof. First we show that for everyε >0there is, with probability one, at least one zero in the time interval(0, ε). Recall (equation (??)) that the distribution ofM−(t), the minimum up to timet, is the same as that of−M(t). By formula (??), the probability thatM(ε)>0is one; consequently, the probability thatM−(ε)<0is also one. Thus, with probability one, W(t)assumes both negative and positive values in the time interval(0, ε). Since the path W(t)is continuous, it follows, by the Intermediate Value theorem, that it must assume the value0at some time between the times it takes on its minimum and maximum values in

(0, ε].

We now show that, almost surely, W(t) hasinfinitely many zeros in the time interval

(0, ε). By the preceding paragraph, for each k ∈ _N the probability that there is at least one zero in(0,1/k)is one, and so with probability one there is at least one zero in every

(0,1/k). This implies that, with probability one, there is an infinite sequencetn of zeros

converging to zero: Take any zerot1 ∈(0,1); choosekso large that1/k < t1; take any zero

t2 ∈(0,1/k); and so on.

Proposition 9. With probability one, the zero set Z of a Brownian path is a perfect set, that is, Z is closed, and for every t ∈ Z there is a sequence ofdistinct elements tn ∈ Z such that limn→∞tn=t.

(13)

Proof. ThatZis closed follows from path-continuity, as noted earlier. Fix a rational number q >0(nonrandom), and defineν =νqto be the first timet≥such thatW(t) = 0. Because

W(q)6= 0almost surely, the random variableνqis well-defined and is almost surely strictly

greater thanq. By the Strong Markov Property, the post-νq processW(νq+t)−W(νq)is,

conditional on the stopping fieldF_ν, a Wiener process, and consequently, by Proposition ??, it has infinitely many zeros in every time interval(0, ε), with probability one. Since W(νq) = 0, and since the set of rationals is countable, it follows that, almost surely, the

Wiener pathW(t)has infinitely many zeros in every interval(νq, νq+ε), whereq∈Qand

ε >0.

Now lettbe any zero of the path. Then either there is an increasing sequencetnof zeros

such thattn → t, or there is a real number ε > 0 such that the interval(t−ε, t) is free

of zeros. In the latter case, there is a rational numberq ∈ (t−ε, t), andt = νq. In this

case, by the preceding paragraph, there must be adecreasingsequencetnof zeros such that

tn→t.

It can be shown (this is not especially difficult) that every compact perfect set of Lebesgue measure zero is homeomorphic to the Cantor set. Thus, with probability one, the set of ze-ros of the Brownian pathW(t)in the unit interval is a homeomorphic image of the Cantor set.

6.2. Nondifferentiability of Paths.

Proposition 10. With probability one, the Brownian pathWtis nowhere differentiable.

Proof. This is an adaptation of an argument of DVORETSKY, ERDOS, & KAKUTANI¨ 1961. The theorem itself was first proved by PALEY, WIENER & ZYGMUND in 1931. It suffices to prove that the path Wt is not differentiable at any t ∈ (0,1) (why?). Suppose to the

contrary that for somet∗ ∈(0,1)the path were differentiable att=t∗; then for someε >0

and someC <∞it would be the case that

(38) |W_t−Wt∗| ≤C|t−t∗| for all t∈(t∗−ε, t∗+ε),

that is, the graph ofWtwould lie between two intersecting lines of finite slope in some

neighborhood of their intersection. This in turn would imply, by the triangle inequality, that for infinitely manyk∈_Nthere would be some0≤m≤4ksuch that2

(39) |W((m+i+ 1)/4k)−W((m+i)/4k)| ≤16C/4k for each i= 0,1,2.

I’ll show that the probability of this event is0. LetBm,k = Bk,m(C)be the event that (??)

holds, and set Bk = ∪m≤4kB_m,k; then by the Borel-Cantelli lemma it is enough to show that (for eachC <∞)

(40)

∞ X

k=1

P(Bm)<∞.

The trick isBrownian scaling: in particular, for all s, t ≥ 0the increment Wt+s −Wtis

Gaussian with mean0 and standard deviation √s. Consequently, since the three incre-ments in (??) are independent, each with standard deviation 2−k_{, and since the standard}

normal density is bounded above by1/√2π,

P(Bm,k) =P{|Z| ≤16C/2k}3≤(32C/2k

√

2π)3.

(14)

This implies that

P(Bk)≤4k(32C/2k

√

2π)3)≤(32C/√2π)3/2k.

This is obviously summable ink.

Exercise 6. Local Maxima of the Brownian Path. A continuous functionf(t) is said to have alocal maximumatt=sif there existsε >0such that

f(t)≤f(s) for allt∈(s−ε, s+ε).

(A) Prove that if the Brownian pathW(t)has a local maximumwat some times >0then, with probability one, it cannot have a local maximum at some later times∗with the same valuew. HINT: Use the Strong Markov Property and the fact that the rational numbers are countable and dense in[0,∞).

(B) Prove that, with probability one, the times of local maxima of the Brownian pathW(t)

are dense in[0,∞)

(C) Prove that, with probability one, the set of local maxima of the Brownian pathW(t)is countable. HINT: Use the result of part (A) to show that for each local maximum(s, Ws)

there is an interval(s−ε, s+ε)such that

Wt< Ws for allt∈(s−ε, s+ε), t6=s.

7. QUADRATICVARIATION

Fix t > 0, and let Π = {t0, t1, t2, . . . , tn} be a partition of the interval[0, t], that is, an

increasing sequence0 = t0 < t1 < t2 < · · · < tn = t. The meshof a partitionΠ is the

length of its longest intervalti −ti−1. IfΠ is a partition of [0, t]and if 0 < s < t, then

therestriction of Πto [0, s](or the restriction to [s, t]) is defined in the obvious way: just terminate the sequencetjat the largest entry befores, and appends. Say that a partitionΠ0

is arefinementof the partitionΠif the sequence of pointstithat definesΠis a subsequence

of the sequencet0_j that defines Π0. A nested sequence of partitions is a sequence Πn such

that each is a refinement of its predecessor. For any partitionΠand any continuous-time stochastic processXt, define thequadratic variationofXrelative toΠby

(41) QV(X; Π) =

n X

j=1

(X(tj)−X(tj−1))2.

Theorem 3. LetΠnbe a nested sequence of partitions of the unit interval[0,1]with mesh→0as

n→ ∞. LetWtbe a standard Wiener process. Then with probability one,

(42) lim

n→∞QV(W; Πn) = 1.

Note 1. It can be shown, without too much additional difficulty, that ifΠt_nis the restriction ofΠnto[0, t]then with probability one, for allt∈[0,1],

lim

n→∞QV(W; Π t n) =t.

(15)

Before giving the proof of Theorem??, I’ll discuss a much simpler special case3, where the reason for the convergence is more transparent. For each natural numbern, define the nthdyadic partitionDn[0, t]to be the partition consisting of thedyadic rationalsk/2nof depth

n(herekis an integer) that are between0andt(withtadded if it is not a dyadic rational of depthn). LetX(s)be any process indexed bys.

Proposition 11. Let{W(t)}t≥0be a standard Brownian motion. For eacht >0, with probability

one,

(43) lim

n→∞QV(W;Dn[0, t]) =t.

Proof. Proof of Proposition ??. First let’s prove convergence in probability. To simplify things, assume thatt= 1. Then for eachn≥1, the random variables

ξn,k ∆

= 2n(W(k/2n)−W((k−1)/2n))2, k= 1,2, . . . ,2n

are independent, identically distributedχ2 _{with one degree of freedom (that is, they are}

distributed as the square of a standard normal random variable). Observe thatEξn,k = 1.

Now QV(W;Dn[0,1]) = 2−n 2n X k=1 ξn,k.

The right side of this equation is the average of2nindependent, identically distributed ran-dom variables, and so the Weak Law of Large Numbers implies convergence in probability to the mean of theχ2distribution with one degree of freedom, which equals1.

The stronger statement that the convergence holds with probability one can easily be deduced from the Chebyshev inequality and the Borel–Cantelli lemma. The Chebyshev inequality and Brownian scaling implies that

P{|QV(W;Dn[0,1])−1| ≥ε}=P{| 2n X k=1 (ξn,k−1)| ≥2nε} ≤ Eξ2_1,1 4n_ε2 .

Since P∞_n=11/4n < ∞, the Borel–Cantelli lemma implies that, with probability one, the event|QV(W;D_n[0,1])−1| ≥ εoccurs for at most finitely manyn. Since ε > 0 can be chosen arbitrarily small, it follows thatlimn→∞QV(W;Dn[0,1]) = 1 almost surely. The

same argument shows that for any dyadic rationalt∈[0,1], the convergence (??) holds a.s. Exercise 7. Prove that if (??) holds a.s. for each dyadic rational in the unit interval, then with probability one it holds for allt.

In general, when the partitionΠis not a dyadic partition, the summands in the formula (??) for the quadratic variation are (whenX = W is a Wiener process) still independent χ2 random variables, but they are no longer identically distributed, and so Chebyshev’s inequality by itself won’t always be good enough to prove a.s. convergence. The route we’ll take is completely different: we’ll show that for nested partitionsΠn the sequence

QV(W; Πn)is areverse martingalerelative to an appropriate filtrationGn.

3_{Only the special case will be needed for the It ˆo calculus. However, it will be of crucial importance — it is,}

(16)

Lemma 2. Letξ, ζbe independent Gaussian random variables with means0and variancesσ2 ξ, σζ2,

respectively. LetGbe anyσ−algebra such that the random variablesξ2andζ2 areG−measurable, but such thatsgn(ζ)andsgn(ζ)are independent ofG. Then

(44) E((ξ+ζ)2| G) =ξ2+ζ2.

Proof. Expand the square, and use the fact thatξ2andζ2areG−measurable to extract them from the conditional expectation. What’s left is

2E(ξζ| G) = 2E(sgn(ξ)sgn(ζ)|ξ| |ζ| | G) = 2|ξ| |ζ|E(sgn(ξ)sgn(ζ)| G) = 0,

because sgn(ξ)and sgn(ζ)are independent ofG.

Proof of Theorem??. Without loss of generality, we may assume that each partitionΠn+1is

gotten by splitting one interval ofΠn, and thatΠ0is the trivial partition of[0,1](consisting of the single interval [0,1]). Thus, Πn consists of n+ 1 nonoverlapping intervals Jkn = [tn_k₋₁, tn_k]. Define

ξn,k =W(tnk)−W(tnk−1),

and for eachn≥0letGnbe theσ−algebra generated by the random variablesξ_m,k2 , where

m≥nandk∈[m+ 1]. Theσ−algebrasGnare decreasing inn, so they form areverse

filtra-tion. By Lemma??, the random variablesQV(W; Πn)form a reverse martingale relative to

the reverse filtrationGn, that is, for eachn,

E(QV(W; Πn)| Gn+1) =QV(W; Πn+1). By the reverse martingale convergence theorem,

lim

n→∞QV(W; Πn) =E(QV(W; Π0)| ∩n≥0Gn) =E(W 2

1 | ∩n≥0Gn) almost surely.

Exercise 8. Prove that the limit is constant a.s., and that the constant is1.

8. SKOROHOD’STHEOREM

In section??we showed that there are simple random walks embedded in the Wiener path. Skorohod discovered thatanymean zero, finite variance random walk is also em-bedded. To prove this it suffices (in view of the strong Markov property) to show that for any mean0, finite variance distributionF there is a stopping timeT such thatWT has

distributionF.

Theorem 4. LetF be any probability distribution on the real line with mean0and varianceσ2< ∞, and let W(t) be a standard Wiener process. There is a stopping time T (for the standard filtration) with expectationET =σ2such that the random variableW(T)has distributionF. Proof for finitely supported distributions. Suppose first thatFis supported by two pointsa <

0 < b, to which it attaches positive probabilitiesp, q. SinceF has mean0, it must be that pa+qb = 0. DefineT to be the first time thatW reaches either aor b; then Wald’s first identity implies thatp = P{W_T = a}andq = P{W_T = b} (Exercise!). Thus, two-point distributions are embedded.

The general case of a probability distributionF with finite support can now be proved by induction on the number of points in the support. To see how the induction step goes,

(17)

consider the case of a three-point distributionF which puts probabilitiespa, pb, pcon the

pointsa <0≤b < c. (Note: Since the distributionFhas mean0, there must be at least one point in the support on either side of0.) Definedto be the weighted average ofbandc:

d= bpb+cpc

pb+pc

.

Now define two stopping times: (A) Letνbe the first time thatW reaches eitheraord. (B) IfWν =a, letT =ν; but ifWν =d, then letT be the first time afterνthatW reaches either

borc.

Exercise 9. (A) Check that the distribution ofWT isF.

(B) Check thatET =σ2.

(C) Complete the induction step.

Proof for the uniform distribution on (-1,1). The general case is proved using the special case of finitely supported distributions by taking a limit. I’ll do only the special case of the uniform distribution on[−1,1]. Define a sequence of stopping timesτnas follows:

τ1= min{t >0 : W(t) =±1/2}

τn+1= min{t > τn : W(t)−W(τn) =±1/2n+1}.

By symmetry, the random variable W(τ1) takes the values ±1/2 with probabilities 1/2

each. Similarly, by the Strong Markov Property and induction onn, the random variable W(τn)takes each of the valuesk/2n, wherekis anoddnumber between−2nand+2n, with

probability1/2n. Notice that these values are equally spaced in the interval[−1,1], and that as→ ∞the values fill the interval. Consequently, the distribution ofW(τn)converges to

the uniform distribution on[−1,1]asn→ ∞.

The stopping timesτnare clearly increasing withn. Do they converge to a finite value?

Yes, because they are all bounded byT−1,1, the first passage time to one of the values±1.

(Exercise: Why?) Consequently, τ := limτn = supτn is finite with probability one. By

path-continuity,W(τn) → W(τ) almost surely. As we have seen, the distributions of the

random variables W(τn) approach the uniform distribution on [−1,1] as n → ∞, so it follows that the random variableW(τ)is uniformly distributed on[−1,1]. Exercise 10. Show that ifτnis an increasing sequence of stopping times such thatτ = limτn

is finite with probability one, thenτ is a stopping time. 9. L ´EVY’SCONSTRUCTION

Each of the three principal figures in the development of the mathematical theory of Brownian motion – Wiener, Kolmogorov, and L´evy – gave a construction of the process. Of the three, Levy’s is the shortest, and requires least in the way of preparation. It is based on a very simple property of normal distributions:

Lemma 3. Let X, Y be independent random variables, each normally distributed, with mean 0

and variancess > 0andt > 0, respectively. Then the conditional distribution ofX given that X+Y =zis

(45) D(X|X+Y =z) =Normal(zs/(s+t), st/(s+t)).

(18)

Levy’s idea was that one could use this fact to build a sequence of successive approxi-mationsWn(t) to the Wiener processW(t), each a (random) polygonal function linear in each of the time intervals[k/2n,(k+ 1)/2n]. This would be done in such a way that the val-ues ofWn(t)at the dyadic rationalsk/2mof levelm, wherek= 0,1, . . . ,2mwould remain the same for alln≥m. Thus, the joint distribution of the random variables

Wm(0), Wm(1/2m), Wm(2/2m), . . . , Wm(1)

should be chosen in such a way that there joint distribution is the same as that for the corresponding points on the Wiener path. The trick is that one can arrange for this joint distribution by choosing eachWm(k/2m), wherekisodd, from theconditionaldistribution

ofW(k/2m)given the valuesW((k−1)/2m) =Wm−1((k−1)/2m)andW((k+ 1)/2m) =

Wm−1((k+ 1)/2m). This is where Lemma??comes in. It guarantees that the conditional

variance does not depend on the valuesWm−1((k−1)/2m) andWm−1((k+ 1)/2m), and

that the conditional mean is just the average of these values; in particular, sinceWm−1(t)is

linear between times(k−1)/2mand(k+ 1)/2m, the conditional mean ofWm(k/2m)given the values Wm−1((k−1)/2m) andWm−1((k+ 1)/2m) is Wm−1(k/2m)! Hence, to obtain

Wm+1(t)fromWm(t): (46) Wm+1(t) =Wm(t) + 2m X k=1 Zm+1,kGm,k(t)/2(m+2)/2

where the random variablesZm,kare independent, with the unit normal distribution, and

the functionsGm,k(t)are theSchauderfunctions:

Gm,k(t) = 2m+1t−(k−1) for(k−1)/2m+1 ≤t≤k/2m+1;

(47)

=k+ 1−2m+1t fork/2m+1 ≤t≤(k+ 1)/2m+1;

= 0 otherwise.

The construction of the initial approximationW0(t)is trivial: since the distribution ofW(t)

is Normal(0,1), merely set

(48) W0(t) =Z0,1t,

whereZ0,1is a unit normal.

Theorem 5. (L´evy) If the random variables Zm,k are independent, identically distributed with

common distributionN(0,1), then with probability one, the infinite series

(49) W(t) :=Z0,1t+ ∞ X m=0 2m X k=1 Zm+1,kGm,k(t)/2(m+2)/2