BASIC TERMS OF STATISTICAL INFERENCE - Statistics for Mining EngineerinG 1

An essential task of mathematical statistics is statistical inference. It can only be applied in relation to the outcomes of random experiments. Two cardinal parts of this field of science are the theory of estimation and the theory of the verification of statistical hypotheses.

29 Semi-Markov processes were introduced by Levy (1954) and Smith (1955). Takács (1954, 1955) investigated similar processes. The foundations of the theory of semi-Markov processes were mainly laid by Pyke (1961a, b), Pyke and Schaufele (1964), and Korolyuk and Turbin (1976).


The theory of estimation deals with assessing unknown parameters of the general population on the basis of a random experiment.

1.3.1 Estimator

A basic term of the theory of estimation is an estimator. This is understood as a clearly defined function of the outcomes of a random trial, constructed to assess an unknown parameter θ of the general population. Since the outcomes are random, the estimator is a random variable.

Notice that if the point of interest is an unknown parameter θ of the general population and a random sample of size n is taken to estimate its value, then by applying the estimator T_n we have the following situation:

T_n = f(X₁, X₂, …, X_n | θ) (1.88)

because every element of the sample can be treated as a random variable.

A value t_n of the estimator, calculated from the random sample actually taken, is called an estimate of the unknown parameter θ. This estimate is given by the equation:

t_n = f(x₁, x₂, …, x_n) (1.89)

It is a number, i.e. a deterministic value.
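The difference between the estimator T_n (a random variable) and an estimate t_n (a number) can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration only: the exponential population with mean θ = 2, the choice of the arithmetic mean as the function f, and the helper names `estimator` and `draw_sample`.

```python
import random
import statistics

def estimator(sample):
    """The function f of (1.88)/(1.89); here, as an assumption, the arithmetic mean."""
    return statistics.fmean(sample)

def draw_sample(rng, n, theta):
    """Draw n observations from a hypothetical exponential population with mean theta."""
    return [rng.expovariate(1.0 / theta) for _ in range(n)]

rng = random.Random(42)   # fixed seed so that the run is repeatable
theta = 2.0               # the "unknown" parameter, known here only because we simulate
n = 25

# Each realised sample yields one deterministic estimate t_n ...
estimates = [estimator(draw_sample(rng, n, theta)) for _ in range(5)]

# ... but the estimates differ from sample to sample, because T_n is a random variable.
for t_n in estimates:
    print(f"t_n = {t_n:.3f}")
```

Each printed value is a fixed number once its sample is drawn; the variation between the five values reflects the randomness of T_n.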

1.3.2 Properties of estimators and methods of their construction

Theoretically, an infinite number of estimators can be constructed. Some of them will give a better assessment of the unknown parameter, some a worse one. Thus, we must be able to evaluate which estimator is better and which is worse or, to be more precise, which estimator has better properties and will therefore usually yield better estimates, because the estimator is a function and not a single value. This gives us a tool with which to select a good estimator.

There are two important items connected with each estimator, namely:

• the error of estimation

• the properties that characterise the selected estimator.

It is obvious that one can be almost certain that when making an estimation an error will occur because a statistical inference is made based on a trial only. This error is called the error of estimation.

It can be defined as:

d = T_n − t_n (1.90)

The right side of the equation determines a certain random variable because a function of a random variable is also a random variable. Therefore, the left side of the equation, i.e. the estimation error, is also a random variable. It has its own probability distribution and we are able to analyse its basic parameters.

To achieve a precise estimation, i.e. to ensure a small error of estimation, it is necessary to pay attention to both the correct sampling and the selection of an estimator with good statistical properties.

A good estimator should be, among other things, characterised by the following properties:

• unbiasedness

• consistency

• efficiency

• sufficiency.


It is said that an estimator T_n of parameter θ is unbiased if its expected value is the same as the value of the estimated parameter³⁰, i.e.

E(T_n) = θ (1.91)

Notice that the expected value of a random variable is a deterministic value³¹.

An unbiased estimator allows the unknown parameter to be estimated without systematic errors.

The difference:

Δ = E(T_n) − θ (1.92)

is called the bias of the estimator T_n. Obviously, the bias of an unbiased estimator is zero.

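Let us consider the estimation of an unknown expected value m in a certain general population. Let us use the arithmetic mean formula:

x̄ = (1/n) Σ_{i=1}^{n} x_i (1.93)

This estimator is an unbiased one since:

E(x̄) = E[(1/n) Σ_{i=1}^{n} X_i] = (1/n) Σ_{i=1}^{n} E(X_i) = (1/n) · n · m = m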

The second statistical parameter used most frequently, besides the expected value, is the standard deviation σ or its square—the variance. Presume now that for the estimation of the unknown variance σ² of the general population, the following estimator was applied:

s² = (1/n) Σ_{i=1}^{n} (x_i − x̄)²

Let us test the bias of the estimator above. Here we have:

E[s²(X)] = E[(1/n) Σ_{i=1}^{n} (X_i − X̄)²] = ((n − 1)/n) σ²

Thus, the above estimator is a biased one. Its bias is:

E[s²(X)] − σ² = ((n − 1)/n) σ² − σ² = −(1/n) σ²

30 There are more general notions of bias and unbiasedness. Here ‘bias’ is de facto ‘mean-bias’, a term used to distinguish it from the other notions, the most noteworthy being that of ‘median-unbiased’ estimators. See for example Rojo (2012).

31 In some statistical investigations, a randomisation of the expected value is made, i.e. we treat that value as a random variable although this is a targeted exception from the rule.


Notice that when the sample size is large, the bias tends to zero:

lim_{n→∞} [−(1/n) σ²] = 0

Therefore, this estimator is asymptotically unbiased. It can be applied when the sample size is large; in practice, n > 30.

By analysing the bias of the investigated estimator, a conclusion can be formulated that it gives estimates of the variance that are too low.
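The size of this bias can be checked exactly on a small example. The sketch below is only an illustration under stated assumptions: the population is a hypothetical fair die, samples are drawn independently with replacement, and the helper names `biased_variance` and `expected_value` are made up for this example. The expected value is computed by enumerating every possible sample, so no simulation error is involved.

```python
from fractions import Fraction
from itertools import product

def biased_variance(sample):
    """s^2 = (1/n) * sum((x_i - mean)^2), the estimator whose bias is examined above."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n

def expected_value(statistic, population, n):
    """Exact E[statistic] over all equally likely i.i.d. samples of size n."""
    samples = list(product(population, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: outcomes of a fair die
m = Fraction(sum(population), len(population))
sigma2 = sum((x - m) ** 2 for x in population) / len(population)

for n in (2, 3, 4):
    bias = expected_value(biased_variance, population, n) - sigma2
    # The exact bias and -sigma^2 / n are printed side by side for comparison.
    print(n, bias, -sigma2 / n)
```

For each n the two printed fractions coincide, and the bias is negative, which is why the estimates of the variance come out too low on average.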

It is easy to prove that the estimator:

ŝ² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²

of the unknown variance of the general population is an unbiased one. It can be applied for any sample size.

where:

s_i²: the estimate of variance within the i-th group

x̄: the arithmetic mean of the whole sample

x̄_i: the arithmetic mean of the i-th group

s²(x̄_i): the variance between groups.

By looking more carefully at formula (1.96), a simple conclusion can be drawn: the more differentiated the population, the greater the value of its variance. The relationship (1.96) is called the ‘variation identity’ (Sobczyk 1996 p. 46). This formula is useful in the calculation of some combined machinery systems (Czaplicki 2010a p. 230) as well as in calculations connected with the homogeneity of shovel-truck systems (ibidem p. 266).
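A decomposition of this kind can be verified numerically. The sketch below assumes one common form of the identity, which may differ in notation from formula (1.96): the variance of the whole sample equals the weighted mean of the within-group variances plus the variance between groups, with weights n_i/n and all variances computed with the divisor n. The data and the helper names are hypothetical.

```python
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Variance with divisor n (not n - 1), so that the identity holds exactly."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

# Hypothetical data, e.g. cycle times (minutes) recorded for three groups of trucks.
groups = [[12, 14, 13], [20, 22, 21, 23], [15, 15, 18]]
whole = [x for g in groups for x in g]
n = len(whole)

# Weighted mean of the within-group variances.
within = sum(Fraction(len(g), n) * variance(g) for g in groups)
# Variance between groups: spread of the group means around the overall mean.
between = sum(Fraction(len(g), n) * (mean(g) - mean(whole)) ** 2 for g in groups)

print(variance(whole), within + between)   # both sides of the assumed identity
```

The more the group means differ from one another, the larger the between-group term and hence the total variance.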

The second important property that must be investigated before the application of a given estimator is its consistency.

It is said that an estimator T_n of the parameter θ is consistent if it converges in probability (stochastic convergence) to the true value of the parameter, i.e. the following equation holds:

lim_{n→∞} P{|T_n − θ| < ε} = 1 for every ε > 0 (1.97)

Looking at this relationship, one may easily come to the conclusion that enlarging the sample size leads to a situation in which the estimates that are obtained will be closer and closer to the real value of the unknown parameter θ.

Suppose one has a sequence of observations {x₁, x₂, …} from a normal N(μ, σ) distribution. In order to estimate the unknown expected value μ, one uses the sample mean determined by formula (1.93). Now assume that every element of the sample is a random variable.

If so, the estimator (1.93) becomes a random variable. Denote it by T_n. From the properties of the normal distribution, we know that T_n is itself normally distributed with the mean μ and the variance σ²/n. Equivalently, the random variable (T_n − μ)/(σ/√n) has a standard normal distribution. Therefore, the following relationship holds:

P(|T_n − μ| ≥ ε) = P(|T_n − μ| √n/σ ≥ ε√n/σ) = 2[1 − Φ_N(ε√n/σ)] → 0


as n tends to infinity for any fixed ε > 0. Thus, the sample mean T_n is a consistent estimator of the population mean μ.
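The rate at which this probability vanishes can be seen by evaluating the expression 2[1 − Φ_N(ε√n/σ)] for growing n. The following is a minimal sketch; the values σ = 2 and ε = 0.5 and the function names are assumptions chosen only for illustration.

```python
import math

def phi(x):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tail_probability(n, eps, sigma):
    """P(|T_n - mu| >= eps) = 2 * [1 - Phi(eps * sqrt(n) / sigma)] for the sample mean."""
    return 2.0 * (1.0 - phi(eps * math.sqrt(n) / sigma))

sigma, eps = 2.0, 0.5     # hypothetical values chosen for illustration
for n in (1, 10, 100, 1000):
    print(n, tail_probability(n, eps, sigma))
```

The printed probabilities decrease monotonically towards zero, which is the convergence in probability required by (1.97).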

An estimator can be unbiased but not consistent. For example, for independent and identically distributed³² random variables that are components of a sample {X₁, …, X_n}, one can use T_n = X₁ as the estimator of the mean E(X). It is unbiased because E(X₁) = E(X), but it ignores all observations except the first, so its distribution does not concentrate around E(X) as n grows.

Alternatively, an estimator can be biased but consistent. For example, if the mean is estimated by the formula (1/n) + (1/n) Σ x_i, this estimator is biased (its bias is 1/n), but as the sample size n tends to infinity, it approaches the correct value and so it is consistent.

A significant property of an estimator is its efficiency.

A given estimator is the most efficient if it is unbiased and has the lowest variance among all of the possible unbiased estimators constructed on samples of the same size.

A small value of the variance of a given estimator ensures a small dispersion of the estimates of the unknown value of the parameter that we can obtain from it.

When investigating the efficiency of a given estimator, we compare the variances of at least two estimators of the same parameter. The estimator with the lower variance is more efficient than the other.

Compare, for instance, the efficiency of two estimators of the expected value, the mean and the median, on the basis of their variances. For a normal population and a large sample we have:

Ef = D²(x̄) / D²(Me) = (σ²/n) / (πσ²/(2n)) = 2/π ≅ 0.64
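The value 2/π can be reproduced approximately by simulation. The sketch below is an illustration under assumptions that are not part of the text: a standard normal population, samples of size n = 101, 4000 repetitions and a fixed random seed. Because the result for the median is asymptotic and the simulation is finite, the ratio obtained is only close to 0.64, not equal to it.

```python
import math
import random
import statistics

def variance_of_estimator(estimator, rng, n, trials):
    """Empirical variance of an estimator over repeated N(0, 1) samples of size n."""
    values = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
    return statistics.pvariance(values)

rng = random.Random(2013)   # fixed seed for repeatability
n, trials = 101, 4000       # hypothetical simulation settings

var_mean = variance_of_estimator(statistics.fmean, rng, n, trials)
var_median = variance_of_estimator(statistics.median, rng, n, trials)

efficiency = var_mean / var_median   # expected to be near 2/pi for large n
print(f"simulated ratio: {efficiency:.2f}, 2/pi = {2 / math.pi:.2f}")
```

A ratio below 1 means that, for a normal population, the median is a less efficient estimator of the expected value than the arithmetic mean.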

The final property of estimators to consider is sufficiency.

An estimator of an unknown parameter is sufficient if it contains all of the information comprised in a sample taken and there is no other estimator that gives more information on the parameter being estimated.

Consider, for example, two unbiased estimators of the expected value E(X): one defined by formula (1.93) and the second determined by the formula:

x̃ = (1/2)(x_min + x_max),

where x_min and x_max are the lowest and the highest value in the sample taken, respectively.

The estimator x̃ is insufficient because it uses only two values from the sample. The arithmetic mean takes all of the sample elements into account.

Further considerations concerning the theory and practice of estimation will be conducted in Chapter 4 where a synthesis of the information obtained from a statistical investigation is analysed.

1.3.3 Statistical hypotheses and their types

The basic terms in the theory of verification of statistical hypotheses are: statistical hypothesis³³ and statistical test.

A statistical hypothesis is any conjecture (supposition) concerning the general population.

32 Sometimes the abbreviation ‘i.i.d.’ is used for the term ‘independent and identically distributed’.

33 A hypothesis (from Greek ὑποτιθέναι, hypotithenai, Latin hypothesis, meaning both ‘to put under’ and ‘to suppose’) is a proposed explanation for a phenomenon; a statement requiring verification.

In practice, we almost always have some information on the population of interest, e.g. that the investigated random variable is a continuous one, what its physical limits are, what values it takes and so on. This information determines a certain set of admissible (possible) hypotheses. Denote this set by Ω. This set determines a set of probability distributions, and we know that these distributions may characterise the population. These distributions can differ both in the formulas that indicate the class of distribution and in the values of the parameters (differences within a given class).

Each formulated statistical hypothesis separates a certain subset ω from the set Ω, i.e. it states that the distribution of the population belongs to ω.

If the subset contains only one element (i.e. determines one distribution only), then such a hypothesis is called a simple hypothesis. Otherwise, the hypothesis is a composite (complex) one.

Let us divide the statistical hypotheses remaining in a given class of distribution.

A supposition that is formulated can concern:

• the parameter of the population

• the class of the population.

If a hypothesis is formulated in relation to a parameter of the random variable and the class of its distribution is known, we say that the hypothesis is a parametric one. If a hypothesis is formulated without such information on the population, for instance when it concerns the class of the distribution itself, we say that the hypothesis is a nonparametric one.

Let us presume that the random variable of interest is a discrete one (e.g. the number of failed machines, the number of occupied service stands, the number of spare parts, etc.). Then the set of admissible hypotheses comprises all of the possible distributions of nonnegative discrete random variables. If we have grounds to suppose that the random variable of interest may be described by a binomial distribution, our hypothesis is both a parametric one and a composite (complex) one. The subset contains all of the binomial distributions with different values of parameters. If, in turn, a hypothesis was formulated that the binomial distribution has the parameter p = 0.1, then the hypothesis is simple.

The formulation of a statistical hypothesis is a very important part of statistical analysis; however, it sets the challenge of verifying this supposition.

1.3.4 Statistical tests and critical region

A statistical test is any rule of conduct used to decide whether the verified statistical hypothesis should be rejected or whether there is no basis to do so. The statement that there are no grounds to reject the hypothesis is not the same as stating that the hypothesis is a true one.

It may happen that based on the result of a different sample taken for the verification of this supposition, an inference may be altered—in which case the hypothesis should be rejected.

The division of statistical hypotheses into parametric and nonparametric ones means that all tests in statistics are divided into parametric tests and nonparametric tests.

A statistical test is constructed according to some rules.

Firstly, a hypothesis is formulated that will be the subject of verification. This hypothesis is called the null hypothesis and is denoted by H₀.

In addition, an alternative hypothesis is formulated which differs from the null one. Often, it is the negation of the verified hypothesis. It is denoted by H₁, and this hypothesis is accepted as the true one when the null hypothesis is rejected.


Notice that a sample W_n = (x₁, x₂, …, x_n) can be treated as a certain point in the n-dimensional space of trials. Denote the set of all possible results of trials by W. A statistical test relies on the determination of such a region w ⊂ W that if W_n ∈ w, then the verified hypothesis should be rejected. This region is the critical one. If W_n ∈ W − w, then the verified null hypothesis can be accepted.

The region w is the area of rejection of the verified hypothesis, i.e. the critical region of the test. The area of acceptance of the hypothesis is obviously determined by W − w.

Because inference about the properties of the investigated population is conducted based on a sample, there is a real possibility that the deduction will produce an incorrect result. The information contained in the sample may be such that we recognise the verified hypothesis as false and reject it, although the hypothesis is true. Similarly, we may make a mistake by accepting a hypothesis which is untrue. This means that two possible errors can be made during statistical inference. The relationship between the property of the hypothesis (true or false) and the decisions made during statistical inference is presented in the table below.

Decision    Hypothesis H₀ true    Hypothesis H₀ false

Reject      Type I error          √

Accept      √                     Type II error

The probability that an error of the first type will be made is given by the formula:

P(W_n ∈ w | H₀) = α(w) (1.98)

whereas the probability that an error of the second type will be made is given by the formula:

P(W_n ∈ (W − w) | H₁) = β(w) (1.99)

The best test would be one which minimises the probabilities of both errors. Unfortunately, there is no way to attain the simultaneous minimisation of both probabilities. If the probability of making an error of the first type is zero, then the rejection region w is an empty set. Thus, the acceptance region coincides with the whole set W and, for this reason, the probability of the relation W_n ∈ W is 1 under every hypothesis. This also means that β(w) = 1 for hypothesis H₁.
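The trade-off between the two error probabilities can be made concrete with a small exact calculation. The sketch below is only an illustration: it assumes a binomial model with n = 20 trials, a simple null hypothesis p = 0.1, a hypothetical simple alternative p = 0.3, and critical regions of the form "reject when the number of failures is at least c". None of these specifics come from the text.

```python
from math import comb

def binom_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

n = 20               # hypothetical sample size
p0, p1 = 0.1, 0.3    # H0: p = 0.1 against a hypothetical simple alternative H1: p = 0.3

# Critical region: reject H0 when at least c failures are observed.
for c in range(0, n + 2, 3):
    alpha = binom_tail(n, p0, c)        # P(reject | H0 true), as in (1.98)
    beta = 1.0 - binom_tail(n, p1, c)   # P(accept | H1 true), as in (1.99)
    print(f"c = {c:2d}  alpha = {alpha:.4f}  beta = {beta:.4f}")
```

As c grows, the critical region shrinks, so the type I error probability falls while the type II error probability rises; for c = 21 the critical region is empty, giving a type I error probability of 0 and a type II error probability of 1.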

In the theory of the verification of statistical hypotheses, tests are constructed in such a way as to minimise the probability of making a type II error presuming that the probability of making a type I error is constant and appropriately low. Such tests are called the most powerful ones. A certain probabilistic measure is associated with these tests—the probability that a false hypothesis will be rejected and the alternative hypothesis which states the truth will be accepted. This measure is called the power of a statistical test.

Therefore it can be written as:

P(W_n ∈ w | H₁) = M(w) (1.100)

where M(w) is the power of the test.

The relationship between the power of the test and the probability that a type II error will be made is given by:

β(w) = 1 − M(w) (1.101)
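How the power behaves when the probability of a type I error is held constant can be sketched for one simple case. The example assumes, purely for illustration, a one-sided z-test for the mean of a normal population with known σ, and hypothetical values μ₀ = 10, μ₁ = 11, σ = 2 and α = 0.05; the function name is likewise made up.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of the one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # reject H0 when z > z_crit
    shift = (mu1 - mu0) * n ** 0.5 / sigma     # mean of the z statistic under H1
    return 1.0 - std_normal.cdf(z_crit - shift)

mu0, mu1, sigma, alpha = 10.0, 11.0, 2.0, 0.05   # hypothetical values
for n in (5, 10, 20, 40):
    power = power_one_sided_z(mu0, mu1, sigma, n, alpha)
    print(f"n = {n:2d}  power = {power:.3f}  beta = {1.0 - power:.3f}")
```

With the level of the type I error fixed, a larger sample raises the power and lowers the probability of a type II error by the same amount, in line with (1.101).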


The task of the general theory of testing statistical hypotheses is to formulate methods for the construction of the best tests, i.e. the most powerful tests. However, in some cases such tests do not exist. Thus, the further task of the theory is to indicate what to do when there is no most powerful test.

Look more carefully at formula (1.98), which provides information about the probability of making a type I error. If the verified hypothesis is rejected due to the information contained in the sample, then, were the hypothesis true, a rare event would have occurred, because the probability α is small. Since the event W_n ∈ w did occur, it is concluded that the assumption that the null hypothesis is true was wrong. When this event does not happen, we can only say that we have no grounds to discard the hypothesis. Notice that we have no basis for evaluating the hypothesis as a true one, because the true hypothesis may be a different one. Such a property characterises tests of significance. These tests allow the verified hypothesis to be rejected with a high probability when it is false. However, they do not allow the question of whether the null hypothesis is true to be resolved. The probability α is called the level of significance. Thus, if this level is assumed to be 0.05 (the most frequently presumed level of significance in engineering investigations), it means that in 5 out of 100 cases on average the verified hypothesis will be rejected, based on the sample taken, despite the fact that the hypothesis is true. The reason for such a rejection lies in the information contained in the samples and not in the statistical procedure conducted.
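The interpretation of the level of significance as a long-run rejection rate of true hypotheses can be illustrated by simulation. The sketch assumes a two-sided z-test for the mean of a normal population with known σ, a null hypothesis that is true by construction, and hypothetical values μ₀ = 50, σ = 5, n = 30 and a fixed random seed; none of these come from the text.

```python
import random
from statistics import NormalDist, fmean

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mu = mu0 with known sigma; True means 'reject H0'."""
    z = (fmean(sample) - mu0) * len(sample) ** 0.5 / sigma
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)

rng = random.Random(7)                       # fixed seed for repeatability
mu0, sigma, n, alpha = 50.0, 5.0, 30, 0.05   # hypothetical values; H0 is true here
trials = 5000

rejections = sum(
    rejects_h0([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma, alpha)
    for _ in range(trials)
)
rejection_rate = rejections / trials         # expected to be close to alpha
print(f"a true H0 was rejected in {rejection_rate:.1%} of the samples")
```

Every rejection counted here is a type I error, since the samples are drawn from a population for which the null hypothesis holds; the rate is close to 5 in 100, although the exact figure depends on the seed and the number of trials.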

Tests of significance are the tests most frequently used in practice, not only in the engineering field, and they are very simple to apply.

Nonetheless, it should be noted that there is a certain freedom in the construction of tests of significance and this freedom is not only connected with the arbitrarily presumed level of significance. Basically, there is freedom in the selection of the statistic (estimator) which will be used to estimate the unknown parameter of the general population in parametric tests or in a different type of inference (in nonparametric tests of significance). In practice, both the level of significance and the statistic are usually chosen according to what is found in the literature on the subject. As was stated, the level of significance is usually assumed to be α = 0.05 and it is rare for this level to be higher or lower than that.

There is also a rule in tests of significance that only one hypothesis is articulated, with the silent presumption that the alternative hypothesis is its negation. In addition, nothing is stated about the level of the probability of a type II error.

When a parametric test of significance is applied, its procedure is as follows.

1. Formulate the basic hypothesis (null one) H₀ stating that parameter q = q₀, which means it is suspected that the population of interest has a parameter q of q₀ value

2. Take a sample of size n from the general population
