Statistical Methods - Statistical Analysis and Modeling of PM2.5 Speciation Metals and Their Mixtures

Traditional time series models are useful for describing temporal variation, including seasonality and trend, in ambient air PM2.5 metals. However, preliminary examination of the PM2.5 metals (Figures 2.2-2.4) suggested variations that are less regular and less suited to time series modeling. As a result, we will use additive models, a class of nonparametric methods, to describe the temporal variations.

Being time series, PM2.5 metals speciation data are likely to be correlated. Several data values are also recorded at each site, giving the data a clustered structure with site as the cluster. A model that accounts for this structure is needed. In general, when dealing with correlated data such as those from clustered, hierarchical and spatial designs, a model that accounts for the correlation must be used, and random effects models are good candidates. For these reasons we will use the Generalized Additive Mixed Model (GAMM) described below.

Suppose that $y_i$ is the $i$th observation of the random variable $y$, with $p$ covariates $x_i = (1, x_{i1}, \ldots, x_{ip})^T$ associated with fixed effects and a $q \times 1$ vector of covariates $z_i$ associated with random effects. Given a $q \times 1$ vector $b$ of random effects, the observations $y_i$ are assumed to be independent with means $E(y_i \mid b) = \mu_i^b$ and variances $\mathrm{var}(y_i \mid b) = \phi\, m_i^{-1} v(\mu_i^b)$, where $v(\cdot)$ is a specified variance function, $m_i$ is a prior weight (e.g. a binomial denominator) and $\phi$ is a scale parameter. A generalized additive mixed model is given by

$$ g(\mu_i^b) = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + z_i^T b, \tag{2.3.1} $$

where $g(\cdot)$ is a monotonic differentiable link function, $f_j(\cdot)$ is a centered twice-differentiable smooth function, the random effects $b$ are assumed to be distributed as $N(0, D(\theta))$ and $\theta$ is a $c \times 1$ vector of variance components.

A key feature of the GAMM (2.3.1) is that additive nonparametric functions are used to model covariate effects and random effects are used to model correlation between observations. If each $f_j(\cdot)$ is a linear function, the GAMM (2.3.1) reduces to the Generalized Linear Mixed Model (GLMM). Separately, if the link function is taken to be the identity, the GAMM reduces to an additive mixed model.
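Written out, the two special cases take roughly the following form. This is only a sketch: the coefficient names $\beta_j$ and the Gaussian error term $\epsilon_i$ are notation assumed here for illustration and are not taken from the source.

```latex
% GLMM: every smooth term is linear, f_j(x_{ij}) = \beta_j x_{ij},
% with \beta = (\beta_0, \beta_1, \ldots, \beta_p)^T
g(\mu_i^b) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + z_i^T b
           = x_i^T \beta + z_i^T b

% Additive mixed model: identity link, g(\mu) = \mu
% (Gaussian errors are an assumption made for this illustration)
y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + z_i^T b + \epsilon_i,
\qquad \epsilon_i \sim N(0, \sigma^2)
```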

Model (2.3.1) encompasses various study designs, including clustered, hierarchical and spatial designs, because a flexible covariance structure of the random effects $b$ can be specified. For longitudinal data, the random effects $b$ can be decomposed into a random intercept and a stochastic process (Zeger and Diggle, 1994; Zhang et al., 1998). For hierarchical (multilevel) data, they can be partitioned to represent different levels of a hierarchy, e.g. a centre, physician and patient in a multicentre clinical trial (Lin and Breslow, 1996). For spatial data, which are common in disease mapping and ecological studies, they can be used to model spatial correlation (Cressie, 1993; Breslow and Clayton, 1993).

The mgcv package (multiple smoothing parameter estimation by generalized cross validation) is part of the recommended suite that comes with the default installation of R and is used here to fit the GAMM. It is based on methods described in Wood (2000). Several packages are available in R for fitting additive models in general. The gam package allows more choice in the smoothers used, while the mgcv package chooses the amount of smoothing automatically and offers wider functionality. The gss package of Gu (2002) takes a spline-based approach. The fitting algorithm depends on the package used. The penalized smoothing spline approach is used in the mgcv package and works as follows:

Suppose we represent $f_j(x_j) = \sum_i \beta_{ij}\,\phi_i(x_j)$ for a family of spline basis functions $\phi_i$. We impose a penalty $\int [f_j''(x)]^2\,dx$, which can be shown to be of the form $\beta_j^T S_j \beta_j$ for a suitable matrix $S_j$ that depends on the choice of basis. The model is fitted by minimizing

$$ \|y - X\beta\|^2 + \sum_j \lambda_j\, \beta_j^T S_j \beta_j . $$

The problem of estimating the degree of smoothness for the model is now the problem of estimating the smoothing parameters $\lambda_j$. The generalized cross-validation (GCV) method is used to select the $\lambda_j$.

In GCV, $\lambda$ is chosen to minimize

$$ V_g = \frac{n \sum_{i=1}^{n} (y_i - \hat{f}_i)^2}{[\mathrm{tr}(I - A)]^2}, $$

where $\hat{f}$ is the estimate of $f$ from fitting all the data, and $A$ is the corresponding influence matrix.
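The GCV criterion can be illustrated with a small self-contained sketch. It does not reproduce the mgcv algorithm: in place of a spline basis it uses a discrete (Whittaker-type) smoother with a single smooth term, $X = I$ and $S = D^T D$ for the second-difference matrix $D$, so that the influence matrix is $A = (I + \lambda S)^{-1}$. The function names, the simulated data and the grid of candidate $\lambda$ values are all hypothetical choices made for illustration.

```python
import math
import random


def second_difference_penalty(n):
    """Return S = D^T D, where D is the (n-2) x n second-difference matrix."""
    S = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        stencil = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, ci in stencil:
            for j, cj in stencil:
                S[i][j] += ci * cj
    return S


def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def influence_matrix(n, lam):
    """A = (I + lam * S)^{-1}; the fitted values are A y."""
    S = second_difference_penalty(n)
    M = [[(1.0 if i == j else 0.0) + lam * S[i][j] for j in range(n)]
         for i in range(n)]
    return invert(M)


def gcv_score(y, lam):
    """Return (V_g, fitted values) for smoothing parameter lam > 0."""
    n = len(y)
    A = influence_matrix(n, lam)
    fitted = [sum(a * v for a, v in zip(row, y)) for row in A]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    trace_resid = n - sum(A[i][i] for i in range(n))  # tr(I - A)
    return n * rss / trace_resid ** 2, fitted


def select_lambda(y, grid):
    """Pick the candidate lambda with the smallest GCV score."""
    return min(grid, key=lambda lam: gcv_score(y, lam)[0])


# Simulated series (assumption): a sine wave plus Gaussian noise.
rng = random.Random(0)
y_demo = [math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.3) for t in range(24)]
grid = [0.1, 1.0, 10.0, 100.0]  # lam = 0 is excluded: tr(I - A) would be 0
best_lam = select_lambda(y_demo, grid)
print("lambda chosen by GCV:", best_lam)
```

A grid search is used only for clarity. The second-difference penalty leaves straight lines unpenalized, which mirrors the behaviour of the integrated squared second derivative penalty in the text.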

For this special case of PM2.5 speciation data, a possible model is

$$ y_{is} = \beta_0 + f_{1s}(\mathrm{year}_{is}) + f_{2s}(\mathrm{month}_{is}) + f_{3s}(\mathrm{temp}_{is}) + f_{4s}(\mathrm{press}_{is}) + \mathrm{Site}_i + \epsilon_{is} $$

The problem with this model is that, as the number of sites increases, we pay the price of having to estimate many parameters for the site effect. Another difficulty is that we can only make statements about the stations present in the data and cannot generalize to a neighboring station in the region. A better approach is to use a random intercept for the site, and the following model is proposed:

$$ y_{is} = \beta_0 + f_{1s}(\mathrm{year}_{is}) + f_{2s}(\mathrm{month}_{is}) + f_{3s}(\mathrm{temp}_{is}) + f_{4s}(\mathrm{press}_{is}) + b_s + \epsilon_{is} $$

where $y_{is}$ is the $i$th observation of the random variable $y$ and $s$ denotes the metropolitan statistical area (MSA). $f_{js}$ is a smooth function of the covariate $x_{js}$ for a given MSA $s$, with $i$th observation $x_{jis}$. $b_s$ is the random effect representing sites in a given MSA $s$. Within the random effect, different correlation structures can be specified. In the present situation an autoregressive structure of order one (AR1) was found to be suitable. $\epsilon_{is}$ is the noise and is assumed to be independently distributed with mean 0 and variance $\sigma^2_\epsilon$.
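To make the AR1 structure concrete, the sketch below builds the correlation matrix it implies for equally spaced time points, where the correlation between observations $k$ steps apart is $\rho^k$. The function name, the equal spacing and the value $\rho = 0.5$ are assumptions made for illustration; the source does not report an estimated $\rho$.

```python
def ar1_correlation(n_times, rho):
    """Correlation matrix of an AR(1) process at n_times equally spaced points.

    Entry (i, j) is rho ** |i - j|, so correlation decays geometrically with lag.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


# Hypothetical value of rho, chosen only to show the pattern.
R = ar1_correlation(4, 0.5)
for row in R:
    print(row)
```

Each row shows the correlation halving with every additional time step, which is the pattern an AR1 structure imposes on successive measurements.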

sidered part of a mixture based on several criteria, including factor analysis, knowledge of association with health effects and source origin. All metals coming from the same leading factors will be grouped together to form a mixture. Metals from significant factors with high loading coefficients (> 0.05) could also be grouped to constitute a mixture. Additionally, some mixtures are formed based on knowledge of their significant association with health outcomes such as low birth weight and preterm birth, either during the first trimester of pregnancy or over the entire pregnancy period. A summary of the mixtures analyzed in this report is given in Tables 2.8 and 2.9.

In document Statistical Analysis and Modeling of PM2.5 Speciation Metals and Their Mixtures (Page 40-43)