Lifetime statistics

In document Spatio temporal dynamics in pipe flow (Page 129-132)

5.1.1 Lifetime statistics

Lifetime problems arise in many real-world applications, such as quality control in manufacturing, drug trials and medical studies. The statistical analysis of lifetimes is therefore a well-studied topic, and many techniques exist to obtain the best estimate of the mean lifetime and the accuracy of this estimate. In this section, we briefly outline the relevant techniques from Lawless (2003), a comprehensive reference covering a wide variety of statistical methods which can be used to analyse lifetime problems.

A lifetime is modelled as a random variable $T$ which lies in $[0,\infty)$ and is distributed according to a probability density function $f(t)$. Lifetime distributions are most naturally described via the survival function $S(t)$.

Definition 5.1.1 (Survival function). The survival function $S(t)$ is defined by

$$S(t) = P(T > t) = \int_t^\infty f(x)\,dx.$$
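As a brief worked example, take the exponential density used later in this section, $f(t) = \tau^{-1}\exp(-t/\tau)$ with mean lifetime $\tau$. Applying the definition directly gives

$$S(t) = \int_t^\infty \frac{1}{\tau}\,e^{-x/\tau}\,dx = \Big[-e^{-x/\tau}\Big]_t^\infty = e^{-t/\tau},$$

so that $S(0) = 1$, $S(t) \to 0$ as $t \to \infty$, and the density is recovered as $f(t) = -S'(t)$.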

In a general lifetime problem, it is often the case that, by observing a finite number of lifetimes $\{t_i\}_{i=1}^n$ which are distributed according to $f$, we surmise that $f$ belongs to a family of possible density functions $\{g(x;\theta) \mid \theta \in \Theta\}$. For example, in pipe flow each $g$ represents an exponential distribution, and $\theta$ corresponds to the mean lifetime $\tau$. The problem therefore is to find:

an approximation of the true value of $\theta$ from the given lifetime data, denoted by $\hat\theta$;

how accurate this value is, given the sample size.

The best statistical approach for the first question is to apply the method of maximum likelihood, which provides an optimal estimate of the true value given any observed data.

Definition 5.1.2 (Maximum likelihood estimator). Given i.i.d. lifetimes $\{t_i\}_{i=1}^n$ distributed as $f(x;\theta)$, we define the likelihood function $J(\theta)$ as

$$J(\theta \mid t_1, \ldots, t_n) = \prod_{i=1}^n f(t_i;\theta).$$

If $\hat\theta$ maximizes $J$, then it signifies the most likely parameter and is known as the maximum likelihood estimator.

When the survival function is exponential, so that $S(t;\tau) = \exp(-t/\tau)$ and $f(t;\tau) = \tau^{-1}\exp(-t/\tau)$, it can be shown that $\hat\tau$ is given by the usual sample mean

$$\hat\tau = \frac{1}{n}\sum_{i=1}^n t_i.$$
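The step "it can be shown" can be sketched as follows, working with the logarithm of the likelihood, which has the same maximizer. For the exponential density,

$$\log J(\tau \mid t_1,\ldots,t_n) = \sum_{i=1}^n \log\!\left(\frac{1}{\tau}\,e^{-t_i/\tau}\right) = -n\log\tau - \frac{1}{\tau}\sum_{i=1}^n t_i.$$

Setting the derivative with respect to $\tau$ to zero,

$$\frac{d}{d\tau}\log J = -\frac{n}{\tau} + \frac{1}{\tau^2}\sum_{i=1}^n t_i = 0 \quad\Longrightarrow\quad \hat\tau = \frac{1}{n}\sum_{i=1}^n t_i,$$

and the second derivative is negative at this point, so it is indeed a maximum.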

However, this analysis assumes that we are always able to observe the full lifetime of every sample, which is often not the case. For example, in pipe flow, puffs have lifetimes exceeding $10^6$ time units for even moderately large Reynolds numbers, which is often inaccessible.

A natural way to avoid this problem is to censor the data by imposing some restrictions on the observation time. Lawless (2003) examines two common cases. Type 1 censoring imposes an upper limit on all observation times, so that $t_i \le t_{\max}$ for all $i$, in which case we define the number of uncensored lifetimes $r$ to be the number of samples which have $t_i < t_{\max}$ (i.e. those which do not reach the upper limit on the observation time). In type 2 censoring, prior to the start of the simulation we fix the number of uncensored lifetimes $r$ which will be observed, leaving $n - r$ censored cases. Simulations are then performed until $r$ lifetimes have been observed. Both methods have their merits: type 1 is often easier and more natural to implement, but it is difficult to obtain estimates for the accuracy of the maximum likelihood estimator (MLE). On the other hand, the accuracy of the MLE for type 2 may be derived exactly, but the total observation time is not known in advance.

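In either case, the MLE is obtained through a slight variation of the sample mean, with

$$\hat\tau = \frac{1}{r}\sum_{i=1}^n t_i,$$

where the sum runs over all $n$ observation times, censored or not, but is divided by the number of uncensored lifetimes $r$.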
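The censored estimator can be illustrated with a minimal Python sketch for the type 1 case. It assumes each recorded time is the smaller of the true lifetime and `t_max`; the function names `simulate_type1` and `censored_exponential_mle` and the example values are hypothetical and do not come from the source.

```python
import random


def simulate_type1(n, tau, t_max, seed=0):
    """Draw n exponential lifetimes with mean tau, censored at t_max.

    Illustrative helper: a censored sample is recorded as exactly t_max.
    """
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. 1 / (mean lifetime).
    return [min(rng.expovariate(1.0 / tau), t_max) for _ in range(n)]


def censored_exponential_mle(observed_times, t_max):
    """Return (tau_hat, r) for type 1 censored exponential lifetimes.

    tau_hat is the total observation time divided by the number r of
    uncensored lifetimes (those strictly below t_max).
    """
    r = sum(1 for t in observed_times if t < t_max)
    if r == 0:
        raise ValueError("no uncensored lifetimes; the estimate is undefined")
    return sum(observed_times) / r, r


# Two lifetimes observed in full, two censored at t_max = 10:
# total time 2 + 5 + 10 + 10 = 27, divided by r = 2.
tau_hat, r = censored_exponential_mle([2.0, 5.0, 10.0, 10.0], t_max=10.0)
```

Dividing by `r` rather than `n` is what compensates for the censored samples, whose true lifetimes are only known to exceed `t_max`.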

The reliability of this MLE may be determined through the standard statistical method of confidence intervals. Given a confidence level $1-\alpha$ with $0 < \alpha < 1$, we determine an interval $[L, U]$ such that $P(L < \tau < U) = 1 - \alpha$; i.e. a range of values which contains the true value $\tau$ with probability $1-\alpha$. In the case of type 2 censoring, exact confidence intervals may be derived as

$$[L, U] = \hat\tau \left[ \frac{2r}{\chi^2_{2r,\,1-\alpha/2}},\; \frac{2r}{\chi^2_{2r,\,\alpha/2}} \right],$$

where $\chi^2_{r,p}$ denotes the $p$-th quantile of the $\chi^2$ distribution with $r$ degrees of freedom; i.e. the value $x$ such that if $X \sim \chi^2_r$ then

$$P(X \le x) = p.$$
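A possible way to evaluate this interval numerically is sketched below, using only the Python standard library. It relies on the closed form of the $\chi^2$ distribution function for an even number of degrees of freedom and inverts it by bisection; the function names are hypothetical, and the quantile convention is $P(X \le x) = p$ as defined above. In practice a library quantile routine could be substituted.

```python
import math


def chi2_cdf_even(x, r):
    """CDF of the chi-squared distribution with 2r degrees of freedom.

    For even degrees of freedom the CDF has the closed form
    1 - exp(-x/2) * sum_{k=0}^{r-1} (x/2)^k / k!.
    """
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, r, tol=1e-12):
    """p-th quantile of chi-squared with 2r degrees of freedom, by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, r) < p:   # expand until the quantile is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, r) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def type2_confidence_interval(tau_hat, r, alpha=0.05):
    """Exact interval [L, U] for the mean lifetime under type 2 censoring."""
    lower = 2 * r * tau_hat / chi2_quantile_even(1.0 - alpha / 2.0, r)
    upper = 2 * r * tau_hat / chi2_quantile_even(alpha / 2.0, r)
    return lower, upper
```

Note that the upper quantile produces the lower limit and vice versa, since the quantiles appear in the denominator.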

In the case of type 1 censoring, however, exact intervals cannot be determined and so must be estimated numerically. One approach is to appeal to the central limit theorem, in which case intervals at the 95% level may be estimated by $\hat\tau\,(1 \pm 1.96\,n^{-1/2})$. However, for small sample sizes this approximation is not accurate and is easily affected by non-symmetric distributions. Instead, we use the relatively simple (and accurate) technique of bootstrapping, as presented in DiCiccio & Efron (1996), amongst others.

1. Generate a pseudorandom sample $t_1^*, t_2^*, \ldots, t_n^*$ by sampling from the set of available lifetimes $\{t_i\}_{i=1}^n$ with replacement. For example, when $n = 3$, a possible pseudorandom sample may be $\{t_1, t_1, t_2\}$.

2. Calculate the maximum likelihood estimator $\hat\tau_1$ of this data as

$$\hat\tau_1 = \frac{1}{r_1}\sum_{k=1}^n t_k^*,$$

where $r_1$ is the number of uncensored lifetimes in the pseudorandom sample.

3. Repeat the previous two steps $B$ times to obtain bootstrap samples $\{\hat\tau_1, \hat\tau_2, \ldots, \hat\tau_B\}$.

4. Sort this data so that $\hat\tau_b \le \hat\tau_{b+1}$.

Assuming $B = 10^k$ where $k > 2$, confidence intervals at level $1-\alpha$ may then be read off as $[\hat\tau_{B\alpha/2},\, \hat\tau_{B(1-\alpha/2)}]$.
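The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function name `bootstrap_interval`, the fixed seed, and the choice to redraw any resample that contains no uncensored lifetimes are all assumptions made here.

```python
import random


def bootstrap_interval(times, t_max, alpha=0.05, B=1000, seed=0):
    """Percentile bootstrap interval for the mean lifetime (type 1 censoring).

    times: recorded observation times, with censored samples equal to t_max.
    """
    if not any(t < t_max for t in times):
        raise ValueError("no uncensored lifetimes in the data")
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    while len(estimates) < B:
        # Step 1: resample n observation times with replacement.
        resample = [times[rng.randrange(n)] for _ in range(n)]
        # Step 2: censored MLE of the resample.
        r_b = sum(1 for t in resample if t < t_max)
        if r_b == 0:
            continue  # assumption: redraw, since the estimate is undefined
        estimates.append(sum(resample) / r_b)
    # Step 4: sort the B estimates collected by repeating steps 1 and 2.
    estimates.sort()
    # Read off the (B*alpha/2)-th and (B*(1-alpha/2))-th sorted values,
    # converting from the 1-based indices in the text to 0-based indices.
    lo_idx = max(round(B * alpha / 2) - 1, 0)
    hi_idx = min(round(B * (1 - alpha / 2)) - 1, B - 1)
    return estimates[lo_idx], estimates[hi_idx]
```

With `B = 1000` and `alpha = 0.05` this returns the 25th and 975th sorted estimates, matching the indices $B\alpha/2$ and $B(1-\alpha/2)$ in the text.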
