Nonparametric Inference
Maura Mezzetti
Department of Economics and Finance, Università Tor Vergata
Outline
1. Inverse distribution function (quantile function)
2. Nonparametric Inference
3. Robust Statistics
Inverse distribution function
Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable with cumulative distribution function (cdf) F(x). Let Y be defined by $Y = F^{-1}(U)$, where $F^{-1}(u) = \inf\{x : F(x) \ge u\}$ is the inverse distribution (quantile) function. Then Y has cdf equal to F.
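The theorem is the basis of inverse transform sampling: uniform draws pushed through $F^{-1}$ become draws from F. A minimal Python sketch, assuming an exponential target with rate λ = 2 chosen only for illustration, for which $F(x) = 1 - e^{-\lambda x}$ and $F^{-1}(u) = -\ln(1-u)/\lambda$; the function names are hypothetical:

```python
import math
import random
from typing import Callable


def exponential_quantile(u: float, rate: float) -> float:
    """F^{-1}(u) for the Exponential(rate) cdf F(x) = 1 - exp(-rate * x)."""
    return -math.log(1.0 - u) / rate


def inverse_transform_sample(
    quantile: Callable[[float], float], n: int, rng: random.Random
) -> list[float]:
    """Draw n values Y = F^{-1}(U) with U ~ Uniform(0, 1)."""
    # rng.random() returns values in [0, 1), so 1 - u is never 0 and log is defined
    return [quantile(rng.random()) for _ in range(n)]


rng = random.Random(0)  # fixed seed for reproducibility
sample = inverse_transform_sample(lambda u: exponential_quantile(u, rate=2.0), 100_000, rng)

mean = sum(sample) / len(sample)
share_below = sum(y <= 0.5 for y in sample) / len(sample)
print(f"sample mean = {mean:.3f} (theory 0.5)")
print(f"P(Y <= 0.5) ~ {share_below:.3f} (theory {1 - math.exp(-1.0):.3f})")
```

The sample mean should be close to 1/λ = 0.5, and the share of draws below 0.5 close to $F(0.5) = 1 - e^{-1} \approx 0.632$.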
Why nonparametric statistics?
While in many situations parametric assumptions are reasonable (e.g. assumption of Normal distribution for the background noise), we often have no prior knowledge of the underlying distributions.
In such situations, the use of parametric statistics can give
misleading or even wrong results. We need statistical procedures
which are insensitive to the model assumptions in the sense that
the procedures retain their properties in the neighborhood of the
model assumptions.
What is nonparametric inference?
The basic idea of nonparametric inference is to use data to infer an unknown quantity while making as few assumptions as possible.
Usually, this means using statistical models that are
infinite-dimensional. Indeed, a better name for nonparametric
inference might be infinite-dimensional inference. But it is difficult
to give a precise definition of nonparametric inference. For the
purposes of this course, we will use the phrase nonparametric
inference to refer to a set of modern statistical methods that aim
to keep the underlying assumptions as weak as possible.
What is the advantage of nonparametric statistics?
The rapid and continuous development of nonparametric statistical procedures over the past six decades is due to the following advantages enjoyed by nonparametric techniques:
They require few assumptions about the underlying populations from which the data are obtained.
They enable the user to obtain exact p-values for tests, exact coverage probabilities for confidence regions, and exact experimentwise error rates for multiple comparison procedures.
They are often easy to understand.
They are usually only slightly less efficient than their normal-theory competitors when the underlying populations are normal, and they can be mildly or wildly more efficient than these competitors when the underlying populations are not normal.
They are insensitive to outliers.
Because many nonparametric approaches require just the ranks of the observations, rather than the actual magnitude of the
observations, they are applicable in many situations where normal-theory
procedures cannot be used.
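A small illustration of why ranks are so convenient, as a sketch with made-up data and a hypothetical `ranks` helper that assumes no ties (ties occur with probability zero for continuous data): ranks are unchanged by any strictly increasing transformation such as the logarithm, and a gross outlier simply keeps the top rank.

```python
import math


def ranks(values: list[float]) -> list[int]:
    """1-based ranks of the observations (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


x = [3.2, 0.5, 7.9, 1.4, 120.0]          # made-up data with one large value
print(ranks(x))                          # [3, 1, 4, 2, 5]
print(ranks([math.log(v) for v in x]))   # [3, 1, 4, 2, 5]: log is strictly increasing
print(ranks([3.2, 0.5, 7.9, 1.4, 1e9]))  # [3, 1, 4, 2, 5]: the outlier keeps rank 5
```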
The empirical distribution function
We will begin with the problem of estimating a CDF (cumulative distribution function).
Suppose $X_1, \dots, X_n$ are i.i.d. with $X_i \sim F$, where $F(x) = P(X \le x)$ is a distribution function.
The empirical distribution function, $\hat F$, is the CDF that puts mass 1/n at each data point $x_i$:
$$\hat F(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$
where I is the indicator function.
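A direct Python transcription of the definition, as a sketch (`ecdf` is a hypothetical helper name). Sorting once lets each evaluation count the observations ≤ x by binary search instead of summing n indicators:

```python
import bisect
from typing import Callable


def ecdf(sample: list[float]) -> Callable[[float], float]:
    """Return F_hat(x) = (1/n) * sum_i I(x_i <= x)."""
    xs = sorted(sample)
    n = len(xs)

    def f_hat(x: float) -> float:
        # bisect_right returns how many sorted observations are <= x
        return bisect.bisect_right(xs, x) / n

    return f_hat


F_hat = ecdf([2.0, 5.0, 1.0, 5.0])
print(F_hat(0.0), F_hat(1.0), F_hat(4.9), F_hat(5.0))  # 0.0 0.25 0.5 1.0
```

Note the jump of 2/4 at x = 5, where two observations coincide: each data point carries mass 1/n.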
Properties of $\hat F$
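At any fixed value of x, $E(\hat F(x)) = F(x)$ and $\mathrm{Var}(\hat F(x)) = \frac{1}{n} F(x)(1 - F(x))$, since $n\hat F(x)$ is a Binomial$(n, F(x))$ count.
Note that these two facts imply, by Chebyshev's inequality, that $\hat F(x) \xrightarrow{P} F(x)$.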
An even stronger convergence result is given by the Glivenko-Cantelli theorem:
$$\sup_x |\hat F(x) - F(x)| \xrightarrow{a.s.} 0$$
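The uniform convergence can be observed numerically. A sketch, assuming a Uniform(0, 1) population so that F(x) = x; the sample sizes, the seed and the helper name are arbitrary choices for illustration. The printed distances should shrink roughly like $1/\sqrt{n}$:

```python
import random


def sup_distance_uniform(sample: list[float]) -> float:
    """sup_x |F_hat(x) - F(x)| when F is the Uniform(0, 1) cdf, F(x) = x on [0, 1].

    F_hat is a step function and F is increasing, so the supremum is reached
    at an order statistic x_(i), comparing F(x_(i)) with the ECDF level just
    after the jump (i/n) and just before it ((i - 1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


rng = random.Random(1)
for n in (10, 100, 1_000, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance_uniform(sample), 4))
```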
Nonparametric test
In order to employ the test proposed below, we have to make the supplementary (but mild) assumption that F is continuous. Thus the hypothesis to be tested here is
$$H_0: F(x) = F_0(x) \text{ for all } x,$$
where $F_0$ is a given continuous d.f., against the alternative
$$H_1: F(x) \neq F_0(x)$$
(in the sense that $F(x) \neq F_0(x)$ for at least one x). Define the random variable $D_n$ as
$$D_n = \sup_x |\hat F(x) - F_0(x)|$$
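Because $\hat F$ is a step function and $F_0$ is continuous and nondecreasing, the supremum only has to be checked on either side of each jump of $\hat F$. A Python sketch, assuming for illustration that $F_0$ is the standard normal cdf; `normal_cdf` and `ks_statistic` are hypothetical helper names, and the simulated samples are arbitrary:

```python
import math
import random
from typing import Callable


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cdf of N(mu, sigma^2), written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample: list[float], cdf0: Callable[[float], float]) -> float:
    """D_n = sup_x |F_hat(x) - F0(x)| for a continuous hypothesised cdf F0."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf0(x)
        # ECDF is i/n just after the jump at x_(i) and (i - 1)/n just before it
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


rng = random.Random(42)
null_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]     # H0 true
shifted_sample = [rng.gauss(2.0, 1.0) for _ in range(200)]  # H0 false
print(round(ks_statistic(null_sample, normal_cdf), 3))
print(round(ks_statistic(shifted_sample, normal_cdf), 3))
```

The shifted sample should give a much larger $D_n$ than the sample drawn under $H_0$. For real analyses, `scipy.stats.kstest` computes this statistic together with a p-value.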
Kolmogorov test
Idea: If the difference between the sample and the theoretical distribution functions is large, the null hypothesis $H_0$ is rejected.
Statistic: The probability distribution of $D_n$ is not one of the well-known models. Its critical values are tabulated for small n, while an asymptotic result is used for large n.
Rule: The critical region has the form $D_n \ge k$.
Kolmogorov One-sample test
To determine the critical value k, we need to know the distribution of $D_n$ under $H_0$, or of some known multiple of it. It has been shown in the literature that
$$P(\sqrt{n}\, D_n \le x \mid H_0) \xrightarrow[n \to \infty]{} \sum_{j=-\infty}^{\infty} (-1)^j e^{-2 j^2 x^2}, \quad x > 0.$$
Thus, for large n, the right-hand side of the previous equation may be
used to determine the critical region. The test
described above is known as the Kolmogorov one-sample test.
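The series converges very fast for moderate x, so truncating it is harmless. A Python sketch (function names and the truncation at 100 terms are illustrative assumptions) that evaluates the limiting cdf, inverts it by bisection to obtain asymptotic critical values k for $\sqrt{n}\,D_n$, and gives an approximate p-value:

```python
import math


def kolmogorov_cdf(x: float, terms: int = 100) -> float:
    """Limit K(x) = sum_{j=-inf}^{inf} (-1)^j exp(-2 j^2 x^2) for x > 0."""
    if x <= 0.0:
        return 0.0
    # The j and -j terms are equal and j = 0 contributes 1, hence
    # K(x) = 1 + 2 * sum_{j>=1} (-1)^j exp(-2 j^2 x^2).
    tail = sum((-1) ** j * math.exp(-2.0 * j * j * x * x) for j in range(1, terms + 1))
    return 1.0 + 2.0 * tail


def kolmogorov_critical_value(alpha: float) -> float:
    """Solve K(k) = 1 - alpha by bisection; reject H0 when sqrt(n) * D_n >= k."""
    lo, hi = 0.1, 5.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_p_value(d_n: float, n: int) -> float:
    """Large-n approximation of P(sqrt(n) * D_n >= observed value | H0)."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d_n)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(kolmogorov_critical_value(alpha), 4))
```

The printed critical values should be close to the familiar asymptotic values 1.224, 1.358 and 1.628 for α = 0.10, 0.05 and 0.01.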
Kolmogorov-Smirnov Two sample test
The hypothesis-testing problem just described is of limited practical importance. Problems of the following type arise more naturally in practice: let $X_i$, $i = 1, \dots, m$, be i.i.d. r.v. with continuous but unknown d.f. F, and let $Y_j$, $j = 1, \dots, n$, be i.i.d. r.v. with continuous but unknown d.f. G. The two random samples are assumed to be independent, and the hypothesis of interest is
$$H_0: F = G.$$
One possible alternative is
$$H_1: F \neq G.$$
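In its standard form, the two-sample test compares the two empirical distribution functions through $D_{m,n} = \sup_x |\hat F_m(x) - \hat G_n(x)|$ and rejects $H_0$ for large values; under $H_0$, $\sqrt{mn/(m+n)}\,D_{m,n}$ has the same Kolmogorov limiting distribution as in the one-sample case. A Python sketch of the statistic (`ks_two_sample` is a hypothetical name; the data are made up):

```python
import bisect


def ks_two_sample(x: list[float], y: list[float]) -> float:
    """D_{m,n} = sup_t |F_hat_m(t) - G_hat_n(t)| for independent samples x and y."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions that jump only at observed
    # points, so their difference is constant between pooled observations and
    # the supremum is attained at one of them.
    for t in xs + ys:
        f_hat = bisect.bisect_right(xs, t) / m
        g_hat = bisect.bisect_right(ys, t) / n
        d = max(d, abs(f_hat - g_hat))
    return d


print(ks_two_sample([1.0, 3.0], [2.0, 4.0]))  # 0.5
print(ks_two_sample([1.0, 2.0], [3.0, 4.0]))  # 1.0: the samples do not overlap
```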
Robustness
Any statistical procedure should possess the following desirable features:
It has reasonably good relative efficiency under the assumed model.
It is robust in the sense that small deviations from the model assumptions should impair the performance only slightly.
Somewhat larger deviations from the model should not cause a catastrophe.
In addition to the classical concept of efficiency, new concepts are introduced to describe:
the local stability of a statistical procedure (the influence function and derived quantities);
its global reliability or safety (the breakdown point).
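A small numerical illustration of the breakdown point, as a sketch with made-up data; the contamination scheme and the helper name `contaminate` are assumptions for illustration, and `statistics.median` from the Python standard library uses the usual odd/even definition of the sample median. The k largest of n = 10 observations are replaced by an absurd value: a single bad point carries the mean away, while the median is unchanged for k ≤ 4 and breaks down only once half of the sample is contaminated.

```python
import statistics


def contaminate(data: list[float], k: int, value: float = 1e6) -> list[float]:
    """Replace the k largest observations with a gross error (hypothetical scheme)."""
    s = sorted(data)
    return s[: len(s) - k] + [value] * k


clean = [2.1, 2.4, 2.5, 2.7, 3.0, 3.2, 3.3, 3.6, 3.8, 4.1]  # made-up data, n = 10

for k in range(6):
    x = contaminate(clean, k)
    print(k, round(statistics.mean(x), 2), round(statistics.median(x), 2))
```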
Sample median
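Let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ denote a sample in ascending order.
Definition. The (sample or empirical) median, denoted by Me, is given by
$$Me = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even} \end{cases}$$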