The Markowitz inconsistency - Transition probabilities, or what can break will eventually break


3.5 The Markowitz inconsistency

Assume that someone tells you that the probability of an event is exactly zero. You ask him where he got this from. "Baal told me" is the answer. In such a case, the person is coherent, but would be deemed unrealistic by non-Baalists. But if, on the other hand, the person tells you "I estimated it to be zero," we have a problem. The person is both unrealistic and inconsistent. Something estimated needs to have an estimation error. So probability cannot be zero if it is estimated; its lower bound is linked to the estimation error: the higher the estimation error, the higher the probability, up to a point. As with Laplace’s argument of total ignorance, an infinite estimation error pushes the probability toward ½. We will return to the implication of the mistake; take for now that estimating a parameter and then putting it into an equation is different from estimating the equation across parameters. And Markowitz was inconsistent by starting his "seminal" paper with "Assume you know E and V" (that is, the expectation and the variance). At the end of the paper he accepts that they need to be estimated, and what is worse, with a combination of statistical techniques and the "judgment of practical men."

Well, if these parameters need to be estimated, with an error, then the derivations need to be written differently and, of course, we would have no such model. Economic models are extremely fragile to assumptions, in the sense that a slight alteration in these assumptions can lead to extremely consequential differences in the results.

4 Large Numbers and Convergence in the Real World

The Law of Large Numbers and the Central Limit Theorem are the foundations of modern statistics: the behavior of the sum of random variables allows us to get to the asymptote and use handy asymptotic properties, that is, Platonic distributions. But the problem is that in the real world we never get to the asymptote; we just get “close.” Some distributions get close quickly, others very slowly (even if they have finite variance). Recall from Chapter 1 that the quality of an estimator is tied to its replicability outside the set in which it was derived: this is the basis of the law of large numbers.

4.1 The Law of Large Numbers Under Fat Tails

How do you reach the limit?

The common interpretation of the weak law of large numbers is as follows.

By the weak law of large numbers, consider a sum of random variables $X_1, X_2, \ldots, X_N$, independent and identically distributed with finite mean $m$, that is, $E[X_i] < \infty$; then $\frac{1}{N}\sum_{1 \le i \le N} X_i$ converges to $m$ in probability as $N \to \infty$. But the problem of convergence in probability, as we will see later, is that it does not take place in the tails of the distribution (different parts of the distribution have different speeds). This point is quite central and will be examined later with a deeper mathematical discussion of limits in Chapter x. We limit it here to intuitive presentations of turkey surprises.

(Hint: we will need to look at the limit without the common route of Chebychev’s inequality, which requires $E[X_i^2] < \infty$. Chebychev’s inequality and similar ones eliminate the probabilities of some tail events.)

So long as there is a mean, observations should at some point reveal it.

Figure 4.1: How thin tails (Gaussian) and fat tails ($1 < \alpha \le 2$) converge to the mean.
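The contrast between thin and fat tails can be illustrated with a minimal simulation sketch, which is not part of the original text. It tracks the absolute error of the running mean for a Gaussian and for a Pareto with minimum value 1. The exponent `ALPHA = 1.5`, the seed, the sample sizes and the helper name `running_mean_error` are all illustrative assumptions.

```python
import random

def running_mean_error(draw, true_mean, n, checkpoints):
    """Return |running mean - true mean| at each checkpoint sample size."""
    total = 0.0
    errors = {}
    for i in range(1, n + 1):
        total += draw()
        if i in checkpoints:
            errors[i] = abs(total / i - true_mean)
    return errors

rng = random.Random(42)   # arbitrary seed, for reproducibility only
ALPHA = 1.5               # assumed tail exponent inside (1, 2]
N = 100_000
CHECKPOINTS = {100, 1_000, 10_000, 100_000}

# Gaussian with mean 0 and unit variance.
gauss_err = running_mean_error(lambda: rng.gauss(0.0, 1.0), 0.0, N, CHECKPOINTS)

# Pareto with minimum 1 and exponent ALPHA; its mean is ALPHA / (ALPHA - 1).
pareto_mean = ALPHA / (ALPHA - 1.0)
pareto_err = running_mean_error(
    lambda: rng.paretovariate(ALPHA), pareto_mean, N, CHECKPOINTS
)

for n in sorted(CHECKPOINTS):
    print(f"n={n:>7}  Gaussian error={gauss_err[n]:.4f}  Pareto error={pareto_err[n]:.4f}")
```

Rerunning with other seeds or exponents closer to 1 should show how erratic the Pareto error remains at sample sizes where the Gaussian error is already small.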

The law of iterated logarithms. For the “thin-tailed” conditions, we can see in Figure x how, by the law of the iterated logarithm, for $x_i$ i.i.d. with mean 0 and unit variance, $\limsup_{n \to \infty} \frac{\sum_{i=1}^{n} x_i}{\sqrt{2 n \log\log n}} = 1$ almost surely, which gives an acceptably narrow cone limiting the fluctuation of the sum.

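Speed of convergence. Let us examine the speed of convergence of the average $\frac{1}{N}\sum_{1 \le i \le N} X_i$. For a Gaussian distribution $(m, \sigma)$, the characteristic function for the convolution is

$$\varphi(t/N)^N = \left(e^{\frac{i m t}{N} - \frac{\sigma^2 t^2}{2N^2}}\right)^N,$$

which, differentiated twice at 0, yields the second moment $(-i)^2 \frac{\partial^2 c}{\partial t^2}$ and, together with the first moment $-i \frac{\partial c}{\partial t}$ (both taken as $t \to 0$, with $c$ denoting this characteristic function), produces the standard deviation $\sigma(N) = \frac{\sigma(1)}{\sqrt{N}}$. So one can say that the sum “converges” at a speed $\sqrt{N}$.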
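As a quick numerical check of the Gaussian scaling $\sigma(N) = \sigma(1)/\sqrt{N}$, the following sketch (an illustration, not the author's code) compares the empirical standard deviation of the $N$-average with the theoretical value. The function name `std_of_average`, the seed and the trial count are assumptions chosen for the example.

```python
import math
import random
import statistics

def std_of_average(rng, n, trials, m=0.0, sigma=1.0):
    """Empirical standard deviation of the average of n Gaussian draws."""
    averages = [
        sum(rng.gauss(m, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)

rng = random.Random(7)   # arbitrary seed, for reproducibility only
SIGMA = 1.0              # sigma(1), the standard deviation of one observation
results = {}
for n in (1, 4, 16, 64):
    empirical = std_of_average(rng, n, trials=5_000, sigma=SIGMA)
    theoretical = SIGMA / math.sqrt(n)
    results[n] = (empirical, theoretical)
    print(f"N={n:>2}  empirical={empirical:.4f}  sigma(1)/sqrt(N)={theoretical:.4f}")
```

With thin tails the two columns should agree closely even at a modest number of trials, which is the benign case the rest of the section contrasts with power laws.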

Another approach consists in expanding $\varphi$ and letting $N$ go to infinity:

$$\lim_{N \to \infty} \left(e^{\frac{i m t}{N} - \frac{\sigma^2 t^2}{2N^2}}\right)^N = e^{i m t}.$$

Now $e^{i m t}$ is the characteristic function of the degenerate distribution at $m$, with density $p(x) = \delta(m - x)$, where $\delta$ is the Dirac delta, with values zero except at the point where $m - x = 0$. (Note that the strong law of large numbers implies that convergence takes place almost everywhere, except for a set of probability 0; for that, the same result should be obtained for all values of $t$.)

But things are far more complicated with power laws.

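Let us repeat the exercise for a Pareto distribution with density $\alpha L^{\alpha} x^{-1-\alpha}$, $x > L$:

$$\varphi(t/N)^N = \alpha^N E_{\alpha+1}\!\left(-\frac{i L t}{N}\right)^N,$$

where $E$ is the exponential integral, $E_n(z) = \int_1^\infty e^{-z u} u^{-n} \, du$. At the limit,

$$\lim_{N \to \infty} \varphi(t/N)^N = e^{\frac{\alpha}{\alpha-1} i L t},$$

which is the degenerate Dirac at $\frac{\alpha}{\alpha-1} L$; the limit only exists for $\alpha > 1$.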

Setting $L = 1$ to scale, the standard deviation $\sigma_\alpha(N)$ for the $N$-average becomes, for $\alpha > 2$,

$$\sigma_\alpha(N) = \frac{\sigma_\alpha(1)}{\sqrt{N}},$$

as with the Gaussian, which is a sucker’s trap. For we should be careful in interpreting $\sigma_\alpha(N)$, which will be very volatile, since $\sigma_\alpha(1)$ is already very volatile and does not reveal itself easily in realizations of the process. In fact, let $p(.)$ be the PDF of a Pareto distribution with mean $m$, standard deviation $\sigma$, minimum value $L$ and exponent $\alpha$; the quantity to watch is the expected mean deviation of the variance for a given $\alpha$.

Figure 4.2: The distribution (histogram) of the standard deviation of the sum of N=100, α=13/6. The second graph shows the entire span of realizations.
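A rough sketch of the experiment behind such a histogram, under stated assumptions, follows. It is not the author's code: it reads the figure as the sample standard deviation computed from N = 100 Pareto observations with minimum value 1 and exponent 13/6, repeated over many independent samples. The run count and seed are arbitrary.

```python
import math
import random
import statistics

ALPHA = 13 / 6   # exponent quoted in the caption of Figure 4.2
N = 100          # observations per sample
RUNS = 2_000     # number of independent samples (arbitrary choice)

# Standard deviation of a Pareto with minimum L = 1; finite only for ALPHA > 2.
true_std = math.sqrt(ALPHA / ((ALPHA - 1) ** 2 * (ALPHA - 2)))

rng = random.Random(123)  # arbitrary seed, for reproducibility only
sample_stds = [
    statistics.pstdev([rng.paretovariate(ALPHA) for _ in range(N)])
    for _ in range(RUNS)
]

median_std = statistics.median(sample_stds)
share_below = sum(s < true_std for s in sample_stds) / RUNS

print(f"theoretical std      : {true_std:.3f}")
print(f"median sample std    : {median_std:.3f}")
print(f"largest sample std   : {max(sample_stds):.3f}")
print(f"share below the true : {share_below:.2%}")
```

The gap between the median and the largest realization is the point: most samples understate the dispersion, and a few overshoot it by a wide margin.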

Absence of Useful Theory. As to situations, central situations, where $1 < \alpha < 2$, we are left hanging analytically (but we can do something about it in the next section).

We will return to the problem in our treatment of the preasymptotics of the central limit theorem.

But we saw in ??.?? that the volatility of the mean is $\frac{\alpha}{\alpha-1} s$ and the mean deviation of the mean deviation, that is, the volatility of the volatility of the mean, is $2(\alpha-1)^{\alpha-2}\alpha^{1-\alpha} s$, where $s$ is the scale of the distribution. As we get close to $\alpha = 1$ the mean becomes more and more volatile in realizations for a given scale. This is not trivial, since we are not interested in the speed of convergence per se given a variance, but rather in the ability of a sample to deliver a meaningful estimate of some total properties.

Intuitively, the law of large numbers needs an infinite number of observations to converge at $\alpha = 1$. So, if it ever works, it would operate at a >20 times slower rate for an “observed” $\alpha$ of 1.15 than for an exponent of 3. To make up for measurement errors on the $\alpha$, as a rough heuristic, just assume that one needs >400 times the observations. Indeed, 400 times! (The point of what we mean by “rate” will be revisited with the discussion of the Large Deviation Principle and the Cramér rate function in X.x; we need a bit more refinement of the idea of tail exposure for the sum of random variables.)

Comparing N = 1 to N = 2 for a symmetric power law with $1 < \alpha \le 2$.

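Let $\varphi(t)$ be the characteristic function of the symmetric Student T with $\alpha$ degrees of freedom. After two-fold convolution of the average we get:

$$\varphi(t/2)^2 = \frac{4^{1-\alpha}\,\alpha^{\alpha/2}\,|t|^{\alpha}\,K_{\alpha/2}\!\left(\frac{\sqrt{\alpha}\,|t|}{2}\right)^2}{\Gamma\!\left(\frac{\alpha}{2}\right)^2},$$

where $K$ is the modified Bessel function of the second kind.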

We can get an explicit density by inverse Fourier transformation of $\varphi$, which yields a density $p_{2,\alpha}(x)$ expressed through the hypergeometric function ${}_2F_1$:

$${}_2F_1(a, b; c; z) = \sum_{k=0}^{\infty} \frac{(a)_k (b)_k}{(c)_k} \frac{z^k}{k!}.$$

We can compare the twice-summed density to the initial one, writing $p_{N,\alpha}(x)$ for the density of the $N$-fold average at exponent $\alpha$.

From there, we see that in the Cauchy case ($\alpha = 1$) the sum conserves the density, so

$$p_{1,1}(x) = p_{2,1}(x) = \frac{1}{\pi (1 + x^2)}.$$
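The Cauchy invariance can be checked numerically with a short sketch (an illustration only, with assumed helper names `cauchy` and `share_within`). It draws standard Cauchy variables by the inverse CDF, averages them in pairs, and compares the share of mass inside a few symmetric intervals with the exact value $2\arctan(b)/\pi$ for a standard Cauchy.

```python
import math
import random

def cauchy(rng):
    """One standard Cauchy draw via the inverse CDF, tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def share_within(values, bound):
    """Fraction of values whose absolute value is at most bound."""
    return sum(abs(v) <= bound for v in values) / len(values)

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
TRIALS = 100_000

singles = [cauchy(rng) for _ in range(TRIALS)]
averages = [(cauchy(rng) + cauchy(rng)) / 2 for _ in range(TRIALS)]

for bound in (0.5, 1.0, 5.0):
    exact = 2 * math.atan(bound) / math.pi  # P(|X| <= bound), standard Cauchy
    print(
        f"|x|<={bound:<4} exact={exact:.4f} "
        f"single={share_within(singles, bound):.4f} "
        f"average of two={share_within(averages, bound):.4f}"
    )
```

Averaging buys nothing here: the single draw and the average of two should fill each interval with the same frequency, up to sampling noise.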

Let us use the ratio of mean deviations; since the mean is 0, the mean deviation of each density is the expectation of $|x|$ under it.

Figure 4.3: Preasymptotics of the ratio of mean deviations.

But one should note that mean deviations themselves are extremely high in the neighborhood of $\alpha \downarrow 1$. So we have a “sort of” double convergence to $\sqrt{n}$: convergence at higher $n$ and convergence at higher $\alpha$.
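The ratio of mean deviations between the average of two observations and a single observation can be approximated by simulation. The sketch below is an assumption-laden illustration, not the author's computation: it generates Student T variables as a Gaussian divided by the square root of a scaled chi-square, and the helper names, exponents, seed and trial count are arbitrary. For reference, the ratio is $1/\sqrt{2} \approx 0.707$ in the Gaussian limit and 1 in the Cauchy case.

```python
import math
import random

def student_t(rng, alpha):
    """One Student T draw with alpha degrees of freedom: Z / sqrt(chi2 / alpha)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(alpha / 2.0, 2.0)  # chi-square with alpha degrees of freedom
    return z / math.sqrt(chi2 / alpha)

def mean_deviation_ratio(rng, alpha, trials):
    """Mean |average of two draws| divided by mean |single draw|."""
    abs_single = 0.0
    abs_average = 0.0
    for _ in range(trials):
        x1 = student_t(rng, alpha)
        x2 = student_t(rng, alpha)
        abs_single += abs(x1)
        abs_average += abs((x1 + x2) / 2.0)
    return abs_average / abs_single

rng = random.Random(11)  # arbitrary seed, for reproducibility only
ratios = {
    alpha: mean_deviation_ratio(rng, alpha, trials=100_000)
    for alpha in (1.5, 3.0, 30.0)
}
for alpha, ratio in ratios.items():
    print(f"alpha={alpha:>4}  ratio of mean deviations={ratio:.3f}")
```

The estimate at the lowest exponent is itself unstable across seeds, which echoes the remark that mean deviations are extremely high near $\alpha = 1$.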

The double effect of summing fat-tailed random variables: The summation of random variables performs two simultaneous actions: one, the “thinning” of the tails by the CLT for a finite variance distribution (or convergence to some basin of attraction for infinite variance classes); and the other, the lowering of the dispersion by the LLN. Both effects are fast under thinner tails, and slow under fat tails. But there is a third effect: the dispersion of observations for n=1 is itself much higher under fat tails. Fatter tails for power laws come with higher expected mean deviation.

4.2 Preasymptotics and Central Limit

In document Probability, Fat Tails and Antifragility (Pages 53-57)