
2.4 Application of AIC, TIC, and GIC – simple examples

2.4.2 Examples of using TIC

In this subsection, we will fit some simple probability distribution models to some simulated sample data to investigate how differently AIC and TIC evaluate a fitted model.

It is more difficult to calculate TIC than AIC because of the trace term in Equation (2.6). To evaluate this trace term, two k × k matrices, $K(\hat{\theta})$ and $J(\hat{\theta})^{-1}$, need to be evaluated based on Equations (A.8) and (A.9). It seems that, so far, no built-in R function is available to calculate TIC. Therefore, some special R programs were written to do the job.

The major built-in R functions used in this study are optimize (for the univariate case) and optim as the minimizer functions, and the function D for evaluating derivatives.
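As an illustration of the univariate case, the following is a minimal sketch of how optimize might be used to find the maximum likelihood estimate of an exponential rate. The data-generating commands are the ones given for Scenario 1; the search interval, the tolerance, and the names negloglik_exp, fit_exp and rate_hat are assumptions made for this example and are not taken from the thesis programs.

```r
# Scenario 1 data, generated with the commands given in the text
set.seed(101)
x <- rexp(1000)

# Negative log-likelihood of the exponential model as a function of the rate
negloglik_exp <- function(rate) {
  -sum(dexp(x, rate = rate, log = TRUE))
}

# One-dimensional minimization; interval and tolerance are illustrative choices
fit_exp <- optimize(negloglik_exp, interval = c(0.01, 10), tol = 1e-8)
rate_hat <- fit_exp$minimum

# The numerical minimizer should agree with the closed-form MLE 1 / mean(x)
print(c(numerical = rate_hat, closed_form = 1 / mean(x)))

# AIC for the fitted exponential model (k = 1)
aic_exp <- 2 * fit_exp$objective + 2 * 1
print(aic_exp)
```

For the exponential model the closed-form estimate makes the numerical search unnecessary; it is used here only because it gives an easy check on the optimizer.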

optim is a general-purpose optimization function in R, and we have used its default method throughout this research. The default method in optim is an implementation of the Nelder and Mead (1965) algorithm, which uses only function values and is robust but relatively slow. According to the R manual, it works reasonably well for non-differentiable functions. After version 2.7.1 of R, the function D can be used to evaluate the second derivative of a gamma function. This makes programming in R much easier for calculating TIC and GIC, because several important probability distributions, e.g. the gamma distribution and the Weibull distribution, involve the gamma function.
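The two tools described above can be sketched as follows: a gamma model is fitted to exponential data with the default Nelder-Mead method of optim, and D is used to differentiate the log-gamma function symbolically. Optimizing on the log scale of the parameters (to keep them positive), the starting values, and the names negloglik_gamma, fit_gam, d1 and d2 are assumptions of this sketch rather than details confirmed by the thesis.

```r
# Exponential data with unit mean
set.seed(101)
x <- rexp(1000)

# Negative log-likelihood of a gamma model; par holds log(shape) and log(rate)
# so that the unconstrained Nelder-Mead search stays in the valid region
negloglik_gamma <- function(par) {
  -sum(dgamma(x, shape = exp(par[1]), rate = exp(par[2]), log = TRUE))
}

# method is left at its default, which is Nelder-Mead
fit_gam <- optim(par = c(0, 0), fn = negloglik_gamma)
est <- exp(fit_gam$par)
names(est) <- c("shape", "rate")
print(est)

# AIC for the fitted gamma model (k = 2)
aic_gam <- 2 * fit_gam$value + 2 * 2
print(aic_gam)

# Symbolic first and second derivatives of log Gamma(a) with D()
d1 <- D(quote(lgamma(a)), "a")
d2 <- D(d1, "a")
print(d1)
print(d2)

# The symbolic second derivative can be evaluated at any shape value
print(eval(d2, list(a = est[["shape"]])))
```

Because the data are exponential, the fitted shape should be close to 1, which is the value at which the gamma model reduces to the exponential model.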

We have designed two scenarios to compare AIC and TIC in the evaluation of different models.

Scenario 1: Data (of size n = 1000) are generated from a unit mean exponential distribution using the R commands: set.seed(101); rexp(1000).

Three distribution models, denoted by ‘fitexp’ for exponential, ‘fitwei’ for Weibull, and ‘fitgam’ for gamma, are fitted to the data. Since the exponential distribution is a special case of both the Weibull and the gamma distributions, this is a situation in which we should expect AIC and TIC to behave almost the same (except for differences due to sampling variation). The model evaluation results are summarized in Table 2.3.

Scenario 2: Data (of size n = 1000) are generated from a lognormal distribution using the R commands: set.seed(101); rlnorm(1000, meanlog=0, sdlog=2).

Four distribution models are fitted to the data: ‘fitexp’ for exponential; ‘fitwei’ for Weibull; ‘fitgam’ for gamma; and ‘fitlnorm’ for lognormal distribution. This is a different situation from Scenario 1, although both scenarios satisfy the three AIC assumption conditions (beginning of Section 2.3.1). We would like to see how different AIC and TIC are; the results are summarized in Table 2.4.
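To make the TIC calculation concrete for the simplest candidate, here is a minimal sketch for the exponential model fitted to the Scenario 1 data. It assumes that TIC has the usual form −2 log L + 2 tr(K J⁻¹), that K is the average of the squared per-observation scores, and that J is the negative average of the per-observation second derivatives; these are taken to correspond to Equations (2.6), (A.8) and (A.9), which are not reproduced here. All object names are hypothetical.

```r
# Scenario 1 data, generated with the commands given in the text
set.seed(101)
x <- rexp(1000)

# Closed-form MLE of the exponential rate and the maximized log-likelihood
rate_hat <- 1 / mean(x)
loglik <- sum(dexp(x, rate = rate_hat, log = TRUE))

# Symbolic derivatives of the per-observation log-density log(rate) - rate * x
logf <- quote(log(rate) - rate * x)
score_expr <- D(logf, "rate")
hess_expr <- D(score_expr, "rate")

# Evaluate the derivatives at the MLE for every observation
at_mle <- list(rate = rate_hat, x = x)
score <- eval(score_expr, at_mle)
hess <- eval(hess_expr, at_mle)

# With one parameter, K and J are scalars and the trace is a simple ratio
K_hat <- mean(score^2)
J_hat <- -mean(hess)
emd <- K_hat / J_hat

aic <- -2 * loglik + 2 * 1
tic <- -2 * loglik + 2 * emd
print(c(AIC = aic, TIC = tic, EMD = emd))
```

Under these assumptions the trace term for the exponential model reduces to the squared sample coefficient of variation, mean((x − x̄)²) / x̄², which is close to 1 when the data really are exponential.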

In Tables 2.3 and 2.4, the notations Waic and Wtic stand for the ‘Akaike weights’ calculated from ∆AIC and ∆TIC, respectively, based on formula (2.21). The notation ‘EMD’ stands for ‘Effective Model Dimension’ as defined in Equation (2.11) and explained in the paragraph after the definition. Since, for AIC, EMD = k is a constant for a fitted model, it makes more sense to use the Effective Model Dimension notion for TIC or GIC only. Here we have $\mathrm{EMD} = \mathrm{tr}\big(K(\hat{\theta})J(\hat{\theta})^{-1}\big)$ for TIC.
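The weights can be illustrated with a short sketch. It assumes that formula (2.21) is the usual Akaike-weight expression, w_i = exp(−∆_i / 2) / Σ_r exp(−∆_r / 2), and applies it to the rounded ∆AIC and ∆TIC values reported in Table 2.3. The function name akaike_weights is hypothetical.

```r
# Akaike weights from information-criterion differences (assumed standard form)
akaike_weights <- function(delta) {
  rel_lik <- exp(-delta / 2)   # relative likelihood of each model
  rel_lik / sum(rel_lik)       # normalize so that the weights sum to 1
}

# Rounded differences as reported in Table 2.3
delta_aic <- c(fitexp = 0.51, fitwei = 0.22, fitgam = 0)
delta_tic <- c(fitexp = 0.70, fitwei = 0.23, fitgam = 0)

w_aic <- akaike_weights(delta_aic)
w_tic <- akaike_weights(delta_tic)

print(round(w_aic, 3))
print(round(w_tic, 3))
```

Because the inputs are already rounded to two decimals, the resulting weights can differ from the tabulated Waic and Wtic in the third decimal place.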

Table 2.3: Comparison of AIC versus TIC for model selection (1)

Fitted models     AIC      ∆AICi    Waic     TIC      ∆TICi    Wtic     EMD
fitexp (k=1)      1935.8   0.51     0.290    1936.0   0.70     0.271    1.07
fitwei (k=2)      1935.6   0.22     0.335    1935.5   0.23     0.344    1.99
fitgam (k=2)      1935.3   0        0.375    1935.3   0        0.385    1.98

Table 2.4: Comparison of AIC versus TIC for model selection (2)

Fitted models     AIC      ∆AICi    Waic     TIC      ∆TICi    Wtic     EMD
fitexp (k=1)      5364.3   1360.1   0.000    5362.3   1358.2   0.000    0.020
fitwei (k=2)      4152.0   147.8    0.000    4154.3   150.2    0.000    3.14
fitgam (k=2)      4389.4   385.2    0.000    4403.1   399.0    0.000    8.86
fitlnorm (k=2)    4004.2   0        1.000    4004.1   0        1.000    1.94

In Table 2.3, we observe that EMD ≈ k, as expected. The reason is explained in Equation (A.7) and in the paragraph following Equation (2.6). The model evaluation results for AIC and TIC in Scenario 1 show no real difference. Note that both AIC and TIC have selected the gamma distribution as the ‘best’ model (∆AIC = ∆TIC = 0), which is NOT the ‘true’ model. From the Waic and Wtic results, however, we notice that all three candidate models are strongly supported by the data in terms of the bias-variance trade-off. Again, we should not adopt the ‘single best-fit model’ strategy in this situation, as explained in Section 2.4.1.

Table 2.4 differs from Table 2.3 in several respects. The EMD is no longer close to k for the models fitexp, fitwei, and fitgam. EMD ≈ k (i.e. 1.94 ≈ 2) for the model fitlnorm, because the lognormal distribution is, by design, the true model that generated the data. Both the Waic and the Wtic results indicate that fitlnorm is dominantly supported by the data, so this is a single best-fit model situation. We notice that although EMD is substantially different from k when a model is ‘misspecified’ (i.e. the AIC score differs from the TIC score beyond the two-unit warning range), ∆AIC and ∆TIC end up picking the same ‘best’ model, and this is not a coincidence. We will discuss this issue further in Section 2.5.1.
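The statement that EMD ≈ k for the correctly specified model can be checked with a small sketch for the lognormal fit in Scenario 2. It uses deriv with hessian = TRUE to obtain per-observation gradients and Hessians of the log-density, and it assumes the same definitions of K (average outer product of scores) and J (negative average Hessian) as conventional for TIC; the thesis programs may be organized differently, and all object names are hypothetical.

```r
# Scenario 2 data, generated with the commands given in the text
set.seed(101)
x <- rlnorm(1000, meanlog = 0, sdlog = 2)
n <- length(x)

# Closed-form MLEs of the lognormal parameters
y <- log(x)
m_hat <- mean(y)
s_hat <- sqrt(mean((y - m_hat)^2))

# Lognormal log-density with symbolic gradient and Hessian in (m, s)
logf <- deriv(
  ~ -log(x) - log(s) - 0.5 * log(2 * pi) - (log(x) - m)^2 / (2 * s^2),
  namevec = c("m", "s"),
  function.arg = c("x", "m", "s"),
  hessian = TRUE
)

val <- logf(x, m_hat, s_hat)
g <- attr(val, "gradient")   # n x 2 matrix of per-observation scores
h <- attr(val, "hessian")    # n x 2 x 2 array of per-observation Hessians

K_hat <- crossprod(g) / n            # average outer product of the scores
J_hat <- -apply(h, c(2, 3), mean)    # negative average Hessian
emd <- sum(diag(K_hat %*% solve(J_hat)))

aic <- -2 * sum(val) + 2 * 2
tic <- -2 * sum(val) + 2 * emd
print(c(AIC = aic, TIC = tic, EMD = emd))
```

For this model the trace has a closed form, 1/2 + κ/2, where κ is the sample kurtosis of log(x); since κ is close to 3 for normally distributed log(x), the trace is close to k = 2.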

Table 2.4 shows a substantial advantage of information-theoretic criteria: they are valid for non-nested models. The likelihood ratio method of hypothesis testing is as widely applicable as maximum likelihood estimation (Casella and Berger, 2002). However, traditional likelihood ratio tests are defined only for nested models. This represents another substantial limitation of hypothesis testing in model selection.

In document Further developments of two point process models for fine scale time series : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Albany (Auckland), New Zealand (Page 58-60)