
6.4 Empirical Analysis

6.4.5 Model parametrisation and fitting

Recall that a BLP model describes the rainfall pulse process through three Poisson processes, and that a single-process BLP model has six parameters: the generation rate of storm origins, the rate of rain cell origins, and the pulse arrival rate, denoted by λ, β, and ξ, respectively; and the mean storm lifetime, mean cell duration, and mean pulse depth, denoted by γ⁻¹, η⁻¹, and θ, respectively. In the model specifications of fit-o and fit-c, two distinct, independent storm processes are superimposed and a common mean pulse depth is assumed. Hence, each model has 11 parameters in total (five for each storm process plus the common θ), as shown in Table 6.2.

The choice of the objective function (6.8) follows the fitting strategy of Cowpertwait et al. (1996a), which holds that it is preferable to fit a larger set of sample moments approximately rather than a smaller set exactly. In addition, we expect, intuitively and qualitatively, that if the statistical properties at the 5-min and 24-hour levels are fitted well, the properties at intermediate time scales (e.g. 1 hour, 6 hours) should also be fitted well. This is how (6.8) was constructed: it includes 12 properties for estimating 10 parameters, {λᵢ, βᵢ, ξᵢ, γᵢ, ηᵢ : i = 1, 2} (θ is estimated separately in the final step). Two problems were found with this fitting procedure: (1) the minimization is not robust (e.g. it readily hits the specified parameter bounds or converges to a badly misfitting solution); and (2) many different, but possibly equivalent, optimal solutions may be obtained. The optimization results are also sensitive to the specified parameter bounds.
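A minimal sketch of the kind of moment-matching objective described above may help fix ideas. It assumes a generic weighted sum of squared relative differences between model and sample properties, minimized over a bounded parameter space; the exact terms and weights of (6.8) are defined earlier in the chapter and may differ. The function `moment_objective`, the toy two-parameter model `toy_model` and all numbers are hypothetical, and SciPy's L-BFGS-B optimizer stands in for whatever optimizer was actually used.

```python
import numpy as np
from scipy.optimize import minimize


def moment_objective(params, sample_props, model_props, weights=None):
    """Weighted sum of squared relative differences (1 - model/sample)^2.

    Assumed generic form only; the terms and weights of (6.8) may differ.
    """
    model = np.asarray(model_props(params), dtype=float)
    sample = np.asarray(sample_props, dtype=float)
    w = np.ones_like(sample) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * (1.0 - model / sample) ** 2))


# Hypothetical toy "model": three properties driven by two parameters.
def toy_model(p):
    a, b = p
    return [a, b, a * b]


sample = [2.0, 3.0, 6.0]              # hypothetical sample properties
bounds = [(0.1, 10.0), (0.1, 10.0)]   # hypothetical parameter space bounds

res = minimize(moment_objective, x0=[1.0, 1.0], args=(sample, toy_model),
               method="L-BFGS-B", bounds=bounds)
print(res.x, res.fun)
```

In the real problem `model_props` would return the 12 BLP moment properties in (6.8) for the 10 storm-process parameters; the bounds passed to the optimizer are where the sensitivity to parameter bounds noted above enters.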

As defined in Section 6.3, the 12 sample properties included in (6.8) consist of three sample statistics (coefficient of variation, lag-1 autocorrelation, and coefficient of skewness) at four aggregation levels (5-min, 1, 6, and 24 hours). Values of the same statistic, e.g. the CV, at different aggregation levels tend to be highly correlated with one another. As a result, the objective function contains fewer effectively independent constraints than its 12 terms suggest. If the number of independent constraints is less than the number of parameters to be estimated, the model is overparametrised, and this is most likely the reason for the unstable and non-unique minimization results. To look for evidence of overparametrisation, a basic principal components analysis (PCA) is performed on the statistical properties included in (6.8).

Table 6.3: Basic Principal Components Analysis result: importance of components

Component      sd      prop of var   cumu prop
PC1          2.412     0.485         0.485
PC2          1.567     0.205         0.689
PC3          1.121     0.105         0.794
PC4          0.9542    0.0759        0.8699
PC5          0.7325    0.0447        0.9146
PC6          0.6737    0.0378        0.9525
PC7          0.5303    0.0234        0.9759
PC8          0.3867    0.0125        0.9884
PC9          0.2835    0.0067        0.9951
PC10         0.2038    0.00346       0.9985
PC11         0.1083    0.00098       0.9995
PC12         0.07706   0.00049       1.0000

The 60 years of 5-min Kelburn data are divided into 12 consecutive 5-year subsamples, and each of these is further divided into 12 subsamples by pooling the data for each calendar month (144 subsamples in total, i.e. 12 subsamples for each month, one from each 5-year group). In this way, each of the 12 properties has 12 replicates. The PCA ranks the principal components by importance; the results are presented in Table 6.3, where 'sd' stands for standard deviation, 'prop of var' for proportion of variance, and 'cumu prop' for cumulative proportion.
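An importance table of this kind can be produced, in principle, by a principal components analysis on the correlation matrix of the properties. The sketch below assumes standardised columns (the squared sd values in Table 6.3 sum to about 12, the number of properties, which is consistent with this assumption). The function name `pca_importance` and the synthetic data are hypothetical; rows stand for subsamples and columns for the 12 properties.

```python
import numpy as np


def pca_importance(X):
    """Return (sd, proportion of variance, cumulative proportion) per component.

    Rows of X are replicates (subsamples), columns are properties. Columns are
    standardised first, so this is a correlation-matrix PCA (an assumption).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular values of the standardised data, in decreasing order.
    s = np.linalg.svd(Z, compute_uv=False)
    sd = s / np.sqrt(X.shape[0] - 1)   # component standard deviations
    prop = sd**2 / np.sum(sd**2)       # proportion of variance
    return sd, prop, np.cumsum(prop)


# Synthetic stand-in: 12 properties driven by 3 latent factors plus small noise,
# so that the properties are highly correlated with one another.
rng = np.random.default_rng(1)
latent = rng.normal(size=(144, 3))
loadings = rng.normal(size=(3, 12))
X = latent @ loadings + 0.1 * rng.normal(size=(144, 12))

sd, prop, cum = pca_importance(X)
print(np.round(cum, 4))
```

With strongly correlated columns, a few leading components carry almost all of the variance, which is the pattern Table 6.3 shows for the properties in (6.8).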

The results show that the statistical properties used in (6.8) are highly correlated with one another. Six principal components explain more than 95% of the total variance of the 12 statistical properties, and nine explain more than 99%. This indicates that the number of independent constraints is less than the number of model parameters to be estimated (10), which implies that the model is overparametrised and the minimization procedure is therefore not robust. Consequently, more than one 'optimal' parameter set may exist. One direct remedy for overparametrisation is to reduce the number of model parameters; this issue is investigated further in the next chapter.
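As a quick check of the counts quoted above, the proportions of variance can be recomputed from the component standard deviations in Table 6.3, using proportion = sd² / Σ sd². The helper `n_components` is a hypothetical name introduced for this illustration, and the 10 parameters are {λᵢ, βᵢ, ξᵢ, γᵢ, ηᵢ : i = 1, 2}.

```python
import numpy as np

# Component standard deviations reported in Table 6.3.
sd = np.array([2.412, 1.567, 1.121, 0.9542, 0.7325, 0.6737,
               0.5303, 0.3867, 0.2835, 0.2038, 0.1083, 0.07706])

prop = sd**2 / np.sum(sd**2)   # proportion of variance
cum = np.cumsum(prop)          # cumulative proportion


def n_components(cum, threshold):
    """Smallest number of leading components whose cumulative proportion exceeds threshold."""
    return int(np.argmax(cum > threshold)) + 1


n_params = 10  # lambda_i, beta_i, xi_i, gamma_i, eta_i for i = 1, 2
print(n_components(cum, 0.95), n_components(cum, 0.99), n_params)
```

With these values the script prints `6 9 10`: nine components are needed to pass 99% of the variance, one fewer than the number of parameters being estimated.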

Table 6.4 shows how the parameter estimates of model fit-c are correlated with one another. The correlation coefficients are calculated from the fit-c parameter estimates given in Table 6.2. We concentrate on the correlations between corresponding parameters of the two storm processes (λ̂₁ and λ̂₂, β̂₁ and β̂₂, and so on). The most striking is the correlation between ξ̂₁ and ξ̂₂, which is as high as 0.982. One obvious way to reduce the number of parameters is to assume a deterministic linear regression relationship between ξ̂₁ and ξ̂₂, so that one can be expressed in terms of the other through a fitted linear regression function. However, a simulation study shows that this treatment causes some lack of fit to the sample properties. Therefore, a more elaborate but less efficient way of specifying a deterministic relationship between ξ̂₁ and ξ̂₂ is adopted in fitting model fit-c-pd.

Table 6.4: Linear correlation between model parameters: fit-c

          λ̂₁      β̂₁      ξ̂₁      γ̂₁      η̂₁      λ̂₂      β̂₂      ξ̂₂      γ̂₂      η̂₂
λ̂₁     1.000   0.419  −0.391   0.728  −0.006  −0.324  −0.318  −0.385  −0.545  −0.420
β̂₁         −   1.000  −0.026   0.877   0.851  −0.486   0.568  −0.112  −0.278   0.431
ξ̂₁         −       −   1.000  −0.084   0.035   0.712  −0.068   0.982   0.768   0.071
γ̂₁         −       −       −   1.000   0.588  −0.469   0.205  −0.150  −0.413   0.116
η̂₁         −       −       −       −   1.000  −0.572   0.789  −0.066  −0.146   0.710
λ̂₂         −       −       −       −       −   1.000  −0.412   0.772   0.804  −0.347
β̂₂         −       −       −       −       −       −   1.000  −0.152   0.044   0.943
ξ̂₂         −       −       −       −       −       −       −   1.000   0.818  −0.020
γ̂₂         −       −       −       −       −       −       −       −   1.000   0.094
η̂₂         −       −       −       −       −       −       −       −       −   1.000

Table 6.5: Ratio of pulse arrival rates: ξ̂₂/ξ̂₁

Month                  1      2      3      4      5      6      7      8      9     10     11     12
fit-o              10.10   7.17   7.46   8.69   9.75  14.21   7.88   7.84   6.19   6.22  11.87   8.10
fit-c               6.41   4.85   5.33   5.50   6.50  10.63   5.38   5.52   4.46   5.48   7.63   5.69
fit-c-pd (fixed)     8.2    5.0    7.6    5.5    6.5   15.0    7.2    6.0    5.6    5.5    8.3    5.8

A constant ratio vector, given in the third row of Table 6.5, is used so that ξ̂₂ can be determined by ξ̂₂[i] = ξ̂₁[i] × ratio[i], i = 1, ..., 12 (month). The ξ̂₂/ξ̂₁ ratios in rows 1 and 2 are derived from the parameter estimates of fit-o and fit-c, respectively, given in Table 6.2. The averages of the row 1 and row 2 values are used as the initial ratio values to start the model fitting process, and the final ratio vector is obtained by manual fine adjustment during fitting. In this way, the negative impact on the fit of fit-c-pd is insignificant. However, such a treatment for reducing the number of model parameters is subjective and inefficient. Alternatively, we may assume that the pair of parameters with the least correlation is common to both storm processes, e.g. a common storm generation rate or a common mean storm lifetime. These possibilities are investigated in the next two chapters.
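The ratio-based tie between the two pulse arrival rates can be sketched as follows, using the ratios in Table 6.5. The parameter ordering in the hypothetical helper `expand_fit_c_pd` and the January parameter values are made up, chosen only to show how ξ̂₂ drops out of the set of free parameters (9 instead of 10 per month); the manual fine adjustment from the initial to the fixed ratios is not reproduced.

```python
import numpy as np

# Monthly ratios xi2/xi1 (January to December) from Table 6.5.
ratio_fit_o = np.array([10.10, 7.17, 7.46, 8.69, 9.75, 14.21,
                        7.88, 7.84, 6.19, 6.22, 11.87, 8.10])
ratio_fit_c = np.array([6.41, 4.85, 5.33, 5.50, 6.50, 10.63,
                        5.38, 5.52, 4.46, 5.48, 7.63, 5.69])
# Final fixed ratios used in fit-c-pd (after manual fine adjustment).
ratio_fixed = np.array([8.2, 5.0, 7.6, 5.5, 6.5, 15.0,
                        7.2, 6.0, 5.6, 5.5, 8.3, 5.8])

# Starting values for fitting: average of the fit-o and fit-c ratios.
ratio_init = (ratio_fit_o + ratio_fit_c) / 2.0


def expand_fit_c_pd(free, ratio_month):
    """Map 9 free parameters to the 10 storm-process parameters of one month.

    Hypothetical ordering: (lam1, beta1, xi1, gamma1, eta1,
    lam2, beta2, gamma2, eta2); xi2 is not estimated but set to xi1 * ratio.
    """
    lam1, beta1, xi1, gamma1, eta1, lam2, beta2, gamma2, eta2 = free
    xi2 = xi1 * ratio_month
    return (lam1, beta1, xi1, gamma1, eta1, lam2, beta2, xi2, gamma2, eta2)


# Usage with made-up parameter values for January (month index 0).
free_jan = (0.01, 0.1, 2.0, 0.2, 5.0, 0.02, 0.3, 0.1, 4.0)
full_jan = expand_fit_c_pd(free_jan, ratio_fixed[0])
print(ratio_init[0], full_jan[7])
```

Because the ratio vector is fixed before fitting, the optimizer only sees nine free parameters per month; the price, as noted above, is that the ratios themselves are chosen by hand.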

In the process of selecting the best model specification and fitting procedure, we tested a range of possible schemes and tried to identify the impact of each individual change. We found that the increased dependence between pulse depths within a cell, introduced by assuming a conditional depth distribution, has a major impact in improving the fit to annual maxima over the lower return period range (T < 3 years). However, the further inclusion of a proportion dry constraint slightly compromises the fit to 5-min extreme values (causing a slight underfit) while improving the fit to the proportion dry. The PD constraint (6.10) imposes an idealised condition on the parameter estimation, whereas the observed rainfall data in fact follow a threshold definition of PD, as discussed in detail in Section 6.4.4. This could explain the slight compromise in the extreme value fit of fit-c-pd at the 5-min level (Figure 6.9).
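To make the distinction between an ideal and a threshold definition of proportion dry concrete, the sketch below aggregates a 5-min depth series to a given level and counts an interval as dry when its depth does not exceed a threshold. The function name, the `<=` convention and the toy series are assumptions for illustration; the threshold definition actually applied to the observed data is the one discussed in Section 6.4.4.

```python
import numpy as np


def proportion_dry(depths_5min, level_hours, threshold=0.0):
    """Fraction of aggregated intervals that are dry (depth <= threshold).

    threshold=0.0 gives the ideal zero-rainfall definition; a positive value
    gives a threshold definition. Assumes 12 five-minute values per hour.
    """
    x = np.asarray(depths_5min, dtype=float)
    k = int(round(level_hours * 12))   # 5-min intervals per aggregated interval
    n = (len(x) // k) * k              # drop an incomplete final interval
    agg = x[:n].reshape(-1, k).sum(axis=1)
    return float(np.mean(agg <= threshold))


# Toy series: one day of 5-min depths (mm) with a single 0.2 mm tip.
x = np.zeros(288)
x[5] = 0.2
print(proportion_dry(x, 1 / 12), proportion_dry(x, 1),
      proportion_dry(x, 1, threshold=0.2))
```

The same series is "wet" in one hour under the ideal definition but entirely dry under a 0.2 mm threshold, which is the kind of mismatch that can arise when a zero-threshold model quantity is matched to threshold-based observations.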

6.5 Summary and conclusion

Motivated by the need to fit fine-scale (i.e. sub-hourly) rainfall series with more realistic rain cell profiles, CIO2007 developed the BLP model from the original Bartlett-Lewis rainfall model. In this chapter we have presented the results of an empirical study of model specification and fitting procedures, and the resulting performance of BLP models in fitting a 5-min rainfall time series. Two alternative BLP model specifications are proposed and compared with the original BLP model, fit-o: fit-c, which assumes a conditional mean depth distribution, and fit-c-pd, which additionally includes a proportion dry constraint at the 24-hour level. fit-c has the same number of estimated parameters as fit-o. By assuming ξ̂₂ = constant × ξ̂₁, fit-c-pd has one estimated parameter fewer than fit-o. Both fit-c and fit-c-pd retain the good fit to moment properties obtained with fit-o. A satisfactory fit to moment properties not used in fitting provides additional evidence for the adequacy of the BLP models.

Simulation shows that fit-o slightly underfits the 5-min extreme values and slightly overfits at the 12-hour level. Otherwise, fit-o fits extreme values very well.

fit-c improves upon fit-o. The annual maxima fitting results using fit-c are adequate. A significant improvement is obtained in fitting the annual maxima of individual months using fit-c, although some cases of lack of fit remain. The improvement in fitting extreme values is due to the introduction of the conditional mean depth distribution.

Proportion dry is a key feature in applications of stochastic rainfall models. Previous work has shown that the basic NSRP and BLRP models tend to overestimate the observed proportion dry. For example, Cowpertwait et al. (1996a) included proportion dry directly in the objective function for parameter estimation to improve the model's proportion dry performance; according to Frost et al. (2004), a basic NSRP model generally overestimates proportion dry even when it is directly included in the fitting procedure. fit-o also overestimates the observed proportion dry. In contrast, both fit-c and fit-c-pd significantly improve the proportion dry performance at all time scales examined, with or without a threshold. fit-c-pd further improves on fit-c in fitting the proportion dry at coarser time scales, e.g. the 12- and 24-hour levels. However, this gain comes at the cost of slightly underfitting 5-min extreme values. If the 5-min extreme value pattern is not a major concern in the application, further improvement in reproducing proportion dry patterns may be achieved by imposing two proportion dry constraints in the fitting procedure. The ability of fit-c and fit-c-pd to reproduce proportion dry patterns is likely to satisfy the needs of many practical applications.

BLP models with continuous distributions of storm types

7.1 Introduction

In Chapter 6, based on the empirical study results, we concluded that a different BLP model specification, fit-c, significantly improved on the performance of the original BLP model, fit-o, in fitting the annual maxima and proportion dry values. The improvements are due to the successful implementation of a dependence structure between pulse depths within a cell, achieved by assuming a conditional mean exponential distribution. On the other hand, the discussion of the fitting procedure revealed that further research is needed on optimizing the model parametrisation. In this chapter, we investigate an alternative way of characterizing the rainfall process within the BLP framework and aim to improve upon fit-c by optimizing the model parametrisation.

Cowpertwait (2010) proposed a Neyman-Scott model with continuous distributions of storm types. In this approach, individual storms are characterized by a continuum of storm types z, where z is a random variable with a continuous probability distribution, and model parameters are taken to be functions of z. This setting allows different types of storms to be superposed within the same storm process, and the parametrisation makes it possible to explore how selected model parameters are associated. In this chapter, BLP models with continuous distributions of storm types are formulated, and their statistical properties up to third order are derived for several typical model specifications. Based on the derived properties of the fitted models, the continuous-storm-types BLP models are compared with the original BLP models in fitting the Kelburn data.
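A minimal simulation sketch of the storm-type idea follows. It assumes, for illustration only, a gamma distribution for z and storm-level parameters proportional to z; both are hypothetical choices, not the specifications derived in this chapter, and the function `simulate_storm_types` is likewise a hypothetical name. Only the storm origins and their types are generated; rain cells and pulses within each storm are not simulated.

```python
import numpy as np


def simulate_storm_types(lam, T, draw_z, param_fns, rng):
    """Storm origins on [0, T) with a continuous storm type z per storm.

    Storm origins follow a Poisson process of rate lam; each storm draws its
    own type z, and each storm-level parameter is a function of z.
    """
    n = rng.poisson(lam * T)                         # number of storms in [0, T)
    origins = np.sort(rng.uniform(0.0, T, size=n))   # given n, origins are uniform
    z = draw_z(rng, n)                               # one storm type per storm
    params = {name: fn(z) for name, fn in param_fns.items()}
    return origins, z, params


# Hypothetical choices: z ~ Gamma(shape=2, scale=1); storm-specific eta and xi
# proportional to z.
rng = np.random.default_rng(42)
origins, z, params = simulate_storm_types(
    lam=0.02, T=1000.0,
    draw_z=lambda r, n: r.gamma(shape=2.0, scale=1.0, size=n),
    param_fns={"eta": lambda z: 1.5 * z, "xi": lambda z: 4.0 * z},
    rng=rng,
)
print(len(origins), params["eta"][:3])
```

Because each storm carries its own z, storms with different parameter values are superposed within a single storm process, rather than in separate processes as in fit-o and fit-c.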

In what follows, Section 7.2 gives the details of the formulation of a BLP model with continuous distributions of storm types; moment properties up to third order are derived in the general case and for several typical model specifications. In Section 7.3, comparison results between different single-process BLP models are presented. BLP models with two superposed storm processes are compared in Section 7.4. Section 7.5 concludes the chapter.

Source: Further developments of two point process models for fine scale time series: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Albany (Auckland), New Zealand (pages 182-189).