4.2 Modelling Daily Rainfall: The Basic Model
4.2.1 General structure
In this section, we describe the construction of a general daily rainfall model for a single site. The aim is to provide the methodology required for modelling daily rainfall within the Bayesian framework before it is extended to multiple sites. Modelling daily rainfall is complex because the distribution is a mixture of discrete and continuous components. Daily rainfall models commonly distinguish between the discrete component, which represents rainfall occurrence, and the continuous component, which corresponds to the rainfall amount when rain occurs. Hence, daily rainfall can be modelled in two parts: the occurrence process and the amount process. The occurrence process models the probability that rain occurs on a given day; the data take the value zero when rain is absent and a positive value when rain occurs. The amount process models the amount of rain that falls on a rainy day. It is therefore important to represent the rainfall amount with an appropriate distribution that allows only non-negative values.
Over the past few decades, most research using mixed distributions has advocated a two-stage approach. Stern & Coe (1984) showed that a two-stage approach to daily rainfall modelling is relatively straightforward in hydrological applications. Tooze et al. (2002) employed the same approach for the analysis of medical expenditure in the United States. Both examples demonstrate that the two-stage approach is an appropriate way to model mixed distributions.
Generalized Linear Models (GLMs) have also been employed to model daily rainfall. GLMs are an effective class of probability models which can accommodate various types of data, such as climatological and meteorological data. The response variable in a GLM need not be Gaussian, as long as its distribution is a member of the exponential family. Therefore, GLMs can handle non-normal responses, with well-defined characteristics for modelling both the continuous, strictly positive amount process and the discrete occurrence process. The mean of the response variable is related to the linear predictor through a specific link function. The early works by Coe & Stern (1982) and Stern & Coe (1984) showed that it is quite straightforward to model daily rainfall using GLMs in a hydrological context. They used a simple linear regression with a Fourier series function as the covariate for both the amount and occurrence processes. Grunwald & Jones (2000) used a similar approach in an Australian daily rainfall model, incorporating more complex covariates in the GLM. Moreover, recent literature on rainfall modelling indicates that the covariates in the GLM may also include more sophisticated weather variables such as temperature, wind speed and atmospheric circulation pattern (Chandler & Wheater, 2002; Furrer & Katz, 2007). The GLM approach can also be extended to spatio-temporal models. Chandler & Wheater (2002), Yang et al. (2005) and Fernandes et al. (2009) proposed GLM-based frameworks for spatio-temporal structure in daily rainfall models, an idea which we will use in the next chapter. However, the majority of this literature does not use the Bayesian approach. We therefore aim to contribute to the small Bayesian literature by using the ideas of Coe & Stern (1982), Stern & Coe (1984) and Grunwald & Jones (2000), taking a Fourier series function as a covariate for both the amount and occurrence processes within the Bayesian framework. Particular attention is also given to the relationship between these two processes. For the case of British daily rainfall, we will extend the model by incorporating Lamb weather types (LWTs) directly into the amount and occurrence processes.
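As an illustration of how a Fourier series can act as the covariate for both processes, the following Python sketch builds a design matrix of annual harmonics and maps it to an occurrence probability and a mean amount. The function name `fourier_design`, the number of harmonics, the logit and log links and all coefficient values are assumptions made here for illustration only; they are not taken from Coe & Stern (1982), Stern & Coe (1984) or Grunwald & Jones (2000).

```python
import numpy as np

def fourier_design(day_of_year, n_harmonics=2, period=365.25):
    """Design matrix with columns 1, cos(k*w*t), sin(k*w*t) for k = 1..n_harmonics."""
    t = np.asarray(day_of_year, dtype=float)
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * omega * t))
        cols.append(np.sin(k * omega * t))
    return np.column_stack(cols)

days = np.arange(1, 366)
X = fourier_design(days, n_harmonics=2)

# Hypothetical coefficients, chosen only for illustration.
beta_occ = np.array([-0.5, 0.8, 0.2, 0.1, -0.1])  # occurrence, logit scale
beta_amt = np.array([1.0, 0.4, -0.3, 0.05, 0.0])  # amount, log scale

p = 1.0 / (1.0 + np.exp(-(X @ beta_occ)))  # assumed logit link for occurrence
mu = np.exp(X @ beta_amt)                  # assumed log link for the mean amount
```

Using the same harmonic columns for both linear predictors keeps the seasonal structure of the two processes comparable, while the separate coefficient vectors allow the seasonal cycles of occurrence and amount to differ.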
Let $W_t \in \mathbb{R}^{+}$ be a random variable for the daily rainfall amount at a single site at time $t$ (measured in days), with observed value $w_t$, where $t = 1, \ldots, T$. Suppose that each observation in the daily rainfall data is generated from a random process in which the distribution of $W_t$, given quantities $p_t$ and $\mu_t$ (where $0 \le p_t \le 1$), is defined as follows:
\[ F_W(w_t \mid p_t, \mu_t) = \Pr(W_t \le w_t \mid p_t, \mu_t) = \begin{cases} 0, & w_t < 0 \\ 1 - p_t, & w_t = 0 \\ 1 - p_t + p_t F_A(w_t \mid \mu_t), & w_t > 0 \end{cases} \]
where $F_A(w_t \mid \mu_t)$ is the distribution function of the amount distribution. Hence $\Pr(W_t = 0) = 1 - p_t$. Given that $W_t > 0$, the conditional pdf of $W_t$ is $f(w_t \mid \mu_t)$. Let $R_t$ represent the rainfall occurrence and serve as an indicator function for $W_t$. Then we have:
\[ R_t = \begin{cases} 1, & W_t > 0 \\ 0, & W_t = 0 \end{cases} \]
Thus, the random variable $W_t$ can be re-expressed as:
\[ W_t = I(W_t > 0)\, Y_t = R_t Y_t \tag{4.2} \]
where $Y_t = g(Z_t)$ is a continuous random variable, $Z_t$ is the transformed value, and $g(\cdot)$ is a monotonic function defining a suitable transformation (e.g. exponential). Here $W_t$ is the actual rainfall amount, whose value may be zero but which is always observed. According to Stern & Coe (1984) and Grunwald & Jones (2000), $Y_t$ can be regarded as the intensity process, which can also be viewed as the potential rainfall amount. The value of $Y_t$ is always positive but is not always observed: it is observed only on days when $R_t = 1$.
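The mixed distribution and the product form $W_t = R_t Y_t$ can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the amount distribution $F_A$ is taken to be lognormal with $g = \exp$, the parameter `mu` is used as the mean of $Z_t = \log Y_t$, and the values of `p`, `mu` and `sigma` are hypothetical. The distribution function is evaluated as the point mass $1 - p_t$ at zero plus $p_t F_A$ for positive values, and the function names are ours.

```python
import math
import numpy as np

def amount_cdf(w, mu, sigma=1.0):
    """F_A: lognormal CDF with log-scale mean mu (an illustrative choice)."""
    return 0.5 * (1.0 + math.erf((math.log(w) - mu) / (sigma * math.sqrt(2.0))))

def rainfall_cdf(w, p, mu, sigma=1.0):
    """Pr(W <= w): zero below 0, a jump of 1 - p at 0, then the continuous part."""
    if w < 0:
        return 0.0
    if w == 0:
        return 1.0 - p
    return (1.0 - p) + p * amount_cdf(w, mu, sigma)

rng = np.random.default_rng(0)
p, mu, sigma, n = 0.3, 1.0, 0.8, 200_000   # hypothetical values

r = rng.random(n) < p                      # occurrence indicator R_t
y = np.exp(rng.normal(mu, sigma, size=n))  # potential amount Y_t = exp(Z_t)
w = r * y                                  # observed rainfall W_t = R_t * Y_t

frac_dry = np.mean(w == 0)                 # should be close to 1 - p
emp_cdf_at_3 = np.mean(w <= 3.0)           # should be close to rainfall_cdf(3.0, ...)
```

Here $Y_t$ is drawn on every day, including dry days, which mirrors its interpretation as a potential amount that is only observed when $R_t = 1$.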
In order to describe the serial structure of daily rainfall, it is necessary to consider a model that can accommodate the relationship between the amount and occurrence processes. Stern & Coe (1984) do not explicitly take into account any relationship between the probability of rainfall and the rainfall amount. One of our primary objectives is therefore to relate these two processes in our modelling strategy, so that no important information about rainfall amount and occurrence is lost. This goal is reinforced by Tooze et al. (2002), who stressed that it is critical to elucidate the relationship between the probability and the level of nonzero observations so that the accuracy and adequacy of the analysis might be improved. There are several approaches that we can employ to describe the relationship between rainfall amount and occurrence. In the first approach, we can fit the occurrence process and then use that information to analyse the amount process as follows:
\[ \Pr(R_t = 1) = p_t \quad \text{and} \quad f(w_t \mid R_t = 1) \tag{4.3} \]
where $f(w_t \mid R_t = 1)$ is the conditional pdf of the rainfall amount when $r_t = 1$. Alternatively, we can evaluate the model for the potential rainfall amount first, and then compute the rainfall probability as:
\[ f(y_t) \quad \text{and} \quad \Pr(R_t = 1 \mid Y_t = y_t) = \hat{h}(y_t) \tag{4.4} \]
where $\hat{h}(y)$ is some function for the rainfall probability which incorporates the information contained in the amount process. A third possibility is to use a property of the amount distribution, such as the mean, $\mu$, as a covariate for the rainfall probability, thereby linking the amount and occurrence processes.
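To make the second approach concrete, the sketch below computes the marginal rain probability implied by a given $f(y_t)$ and $\hat{h}(y_t)$, together with the implied wet-day density, which by Bayes' theorem is $f(y_t \mid R_t = 1) = f(y_t)\,\hat{h}(y_t) / \Pr(R_t = 1)$. Both the lognormal form of $f(y_t)$ and the logistic-in-$\log y$ form of the function `h` are assumptions made purely for illustration, as are all parameter values.

```python
import numpy as np

def h(y, a=-1.0, b=1.5):
    """Hypothetical link: rain probability as a logistic function of log(y)."""
    return 1.0 / (1.0 + np.exp(-(a + b * np.log(y))))

m, s = 0.5, 1.0   # assumed mean and sd of Z_t = log(Y_t)

# Marginal rain probability Pr(R = 1) = E[h(Y)], by a Riemann sum over z = log(y).
z = np.linspace(m - 8.0 * s, m + 8.0 * s, 4001)
dz = z[1] - z[0]
phi = np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
p_marginal = np.sum(phi * h(np.exp(z))) * dz

# Density of the log amount on wet days: f(z | R = 1) = f(z) h(exp(z)) / Pr(R = 1).
f_wet = phi * h(np.exp(z)) / p_marginal
mean_log_wet = np.sum(z * f_wet) * dz

# Monte Carlo check: draw Y first, then R given Y.
rng = np.random.default_rng(1)
y = np.exp(rng.normal(m, s, size=200_000))
r = rng.random(y.size) < h(y)
```

With a positive slope `b`, days with larger potential amounts are more likely to be wet, so the wet-day mean of $\log Y_t$ exceeds the unconditional mean `m`. This is the kind of dependence between amount and occurrence that the second approach is designed to capture.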
Heaps et al. (2015) demonstrated the first approach by introducing two normal variables, $\{Z_0, Z_1\}$, with $Z_{0,t}$ as the occurrence function and $Z_{1,t}$ as the amount function. In this case, $f(w_t \mid R_t = 1)$ is a lognormal density and the authors included $Z_{0,t}$ as a covariate of $Z_{1,t}$ for the log-rainfall amount. On the other hand, Sofia (2007) used the third approach, conditioning the occurrence process on the mean of the log potential rainfall amount, $\vartheta_t$, using logistic regression for monthly rainfall data. Thus, the probability of rainfall is given by
\[ \Pr(W_t > 0 \mid \vartheta_t) = p_t = \frac{\exp(\zeta_0 + \zeta_1 \vartheta_t)}{1 + \exp(\zeta_0 + \zeta_1 \vartheta_t)}. \]
In this case, $R_t$ and $Y_t$ are conditionally independent given $\vartheta_t$, so $f(w_t \mid R_t = 1, \vartheta_t) = f(y_t \mid \vartheta_t)$. For example, we can have $Z_t = \log(Y_t)$ where, given $\vartheta_t$, $Z_t$ has a normal distribution with mean $\vartheta_t$ and $Y_t$ has a lognormal distribution.
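The third approach can be simulated directly. The following sketch draws occurrence from the logistic regression on $\vartheta_t$ and, independently given $\vartheta_t$, draws $Z_t = \log(Y_t)$ from a normal distribution with mean $\vartheta_t$. The seasonal form of `theta`, the daily time index and the values of `zeta0`, `zeta1` and `sigma` are hypothetical choices for illustration, not estimates from Sofia (2007).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical values, chosen only for illustration.
zeta0, zeta1 = -0.8, 1.2   # logistic regression coefficients
sigma = 0.9                # sd of Z_t = log(Y_t) given theta_t

t = np.arange(365 * 50)                              # 50 years of days
theta = 0.5 + 0.6 * np.cos(2.0 * np.pi * t / 365.0)  # assumed seasonal mean of log amount

# Occurrence: Pr(W_t > 0 | theta_t) from the logistic regression.
eta = zeta0 + zeta1 * theta
p = np.exp(eta) / (1.0 + np.exp(eta))
r = rng.random(t.size) < p

# Amount: Z_t | theta_t ~ N(theta_t, sigma^2), drawn independently of R_t.
z = rng.normal(theta, sigma)
w = np.where(r, np.exp(z), 0.0)   # W_t = R_t * Y_t with Y_t = exp(Z_t)

wet_fraction = r.mean()
wet_log_residual = np.mean(np.log(w[r]) - theta[r])  # near 0 under conditional independence
```

Because $R_t$ and $Y_t$ are conditionally independent given $\vartheta_t$, the wet-day log amounts are centred on $\vartheta_t$, so the average wet-day residual should be close to zero even though wet days are more frequent when $\vartheta_t$ is large.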
While Germain (2010) and Heaps et al. (2015) used the first approach, in this thesis we propose to investigate the other two approaches as alternative models for daily rainfall within the Bayesian framework. In particular, we will use the third approach, which was used by Sofia (2007), for the Italian daily rainfall and the second approach for the British daily rainfall. We will also model daily rainfall data over the whole year, instead of focusing only on winter daily rainfall as in Germain (2010) and Heaps et al. (2015), or on monthly data as in Sofia (2007). Figure 4.1 shows an example of a directed acyclic graph (DAG) for the general structure of the univariate daily rainfall model. This model is the first proposal that we will use for univariate daily rainfall in the context of the Italian data. In this model, $\ldots, Y_{t-1}, Y_t, Y_{t+1}, \ldots$ are assumed to be independent and to depend only on $\mu_t$, which changes over time. This means that the $Y_t$ variables are independent given the model parameters, and not just conditionally independent given $R_t$. The details of this model are discussed in Section 4.3.2, and we also examine the assumption of conditional independence in the amount process through an analysis of residuals in Section 4.3.3.5. There are other possibilities which we might consider, such as direct dependence among $\ldots, Y_{t-1}, Y_t, Y_{t+1}, \ldots$ as well as among $\ldots, R_{t-1}, R_t, R_{t+1}, \ldots$, but in a different rainfall data context.
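[Figure 4.1: Directed acyclic graph (DAG) for the general structure of the univariate daily rainfall model, with nodes $\mu_t$, $R_t$, $Y_t$ and $W_t$ shown at times $t-1$, $t$ and $t+1$.]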