Spatial Statistics Chapter 3
Basics of areal data and areal data modeling
• Recall that areal data, also known as lattice data, are data Y(s), s ∈ D, where D is a discrete index set.
• This usually corresponds to data Y1, ..., Yn observed on a set of geographical units (over a map), the pixels of an image, or a regular arrangement of points on a lattice.
• Models for areal data are also sometimes employed for irregularly arranged point-referenced data sets when the number of spatial units is very large → computational considerations.
• As we shall see in Chapter 5, certain types of areal models are computationally easier to work with and are ideal for use with the Gibbs sampler.
• In this setting, unlike the geostatistical one, we are typically not interested in prediction and have observed data at all spatial sites.
• What is of interest in this setting?
• Spatial pattern evident? Are there clusters of high/low values?
• Smoothing: Filter out some of the noise in the data → help elucidate spatial pattern.
• Deciding how much to smooth the data is not always clear. Smoother maps are easier to interpret but will generally not represent the data well and vice versa.
• Example: No smoothing at all is equivalent to presenting a raw map of the data. Extreme smoothing would involve associating the same value $\bar{Y}$ with all units. Optimal smoothing lies somewhere between these two extremes.
• Also of interest in this setting is relating the response to covariates through regression models → need to account for spatial dependence in such regression models.
• Also in the regression setting, we would be interested in examining the residual spatial structure after accounting for covariates.
Exploratory methods for areal data
• Recall that the primary source of spatial information in the areal setting consists of adjacencies → knowing, for each region, all the 'neighboring' regions (for some appropriate definition of neighbor), i.e. the arrangement of the regions across the map.
• This adjacency structure is quantified through the neighborhood (or proximity) matrix W:
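$$
W_{ij} =
\begin{cases}
0 & \text{if } i = j \\
0 & \text{if } i \text{ and } j \text{ are not neighbors} \\
c_{ij} > 0 & \text{if } i \text{ and } j \text{ are neighbors}
\end{cases}
$$
where $c_{ij}$ quantifies the strength of the neighbor relationship.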
• Most often cij = 1 for all neighbor pairs and two regions are considered neighbors if they share a common boundary.
• It is instructive to think of this spatial structure as a graph, where nodes correspond to regions and two nodes on the graph are connected if the associated regions are neighbors.
• The neighborhood matrix W can be used for exploratory analysis and will also be used when we discuss models for areal data.
• Note that it is also possible to define 2nd order neighbors and to have a corresponding 2nd order neighborhood matrix.
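As a rough illustration of the definition of W, the following sketch builds the binary neighborhood matrix for a small regular grid, treating two cells as neighbors when they share an edge. The function name `rook_adjacency`, the 3 × 3 grid, and the "neighbor of a neighbor" reading of 2nd order neighbors are assumptions made for the example, not definitions taken from these notes.

```python
import numpy as np

def rook_adjacency(nrow, ncol):
    """Binary neighborhood matrix W for an nrow x ncol grid of cells.

    Two cells are neighbors (w_ij = 1) if they share an edge; w_ii = 0.
    Cells are numbered row by row.
    """
    n = nrow * ncol
    W = np.zeros((n, n), dtype=int)
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:                      # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrow:                      # neighbor below
                W[i, i + ncol] = W[i + ncol, i] = 1
    return W

W = rook_adjacency(3, 3)
w_plus = W.sum(axis=1)            # number of neighbors of each cell, w_{i+}

# Assumed definition of 2nd order neighbors: cells reachable in two steps
# on the graph that are not already first-order neighbors.
W2 = ((W @ W > 0) & (W == 0)).astype(int)
np.fill_diagonal(W2, 0)           # a cell is not its own neighbor
```

For an irregular map the same matrix would normally be derived from shared polygon boundaries rather than grid positions.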
• After simply plotting data (usually on a map in this case) an exploratory analysis usually proceeds with an attempt to quantify the strength of spatial association in the data.
• For this, two statistics can be employed:
1. Moran's I:
$$
I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}
$$
where
– I ≈ 0 → no spatial dependence
– I > 0 → positive spatial dependence
– I < 0 → negative spatial dependence
Can be thought of as an areal 'correlation coefficient'.
2. Geary's C:
$$
C = \frac{(n - 1) \sum_i \sum_j w_{ij} (Y_i - Y_j)^2}{2 \left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}
$$
where C ≥ 0 and
– C ≈ 1 → no spatial dependence
– C < 1 → positive spatial dependence
– C > 1 → negative spatial dependence
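The two statistics are straightforward to compute from a data vector and W. The sketch below is one possible implementation; the function names, the six-region "path" map, and the two toy data vectors are assumptions for illustration. Geary's C is computed with the conventional factor of 2 in the denominator, which is what makes C ≈ 1 when there is no spatial dependence.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for data vector y and neighborhood matrix W (w_ii = 0)."""
    z = y - y.mean()
    return len(y) * (z @ W @ z) / (W.sum() * (z @ z))

def gearys_c(y, W):
    """Geary's C for data vector y and neighborhood matrix W (w_ii = 0)."""
    z = y - y.mean()
    sq_diff = (y[:, None] - y[None, :]) ** 2      # (Y_i - Y_j)^2 for all i, j
    return (len(y) - 1) * (W * sq_diff).sum() / (2.0 * W.sum() * (z @ z))

# Hypothetical map: six regions in a row, each adjacent to the next.
n = 6
W = np.diag(np.ones(n - 1), 1)
W = W + W.T

y_trend = np.arange(1.0, 7.0)                     # smooth trend across the map
y_alt = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # alternating pattern

i_trend, c_trend = morans_i(y_trend, W), gearys_c(y_trend, W)
i_alt, c_alt = morans_i(y_alt, W), gearys_c(y_alt, W)
```

The smooth trend gives I > 0 and C < 1 (positive dependence), while the alternating pattern gives I < 0 and C > 1 (negative dependence).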
• Under the hypothesis that the Yi's are iid, one can show that the asymptotic distributions of both statistics are normal and that
E[I] = 0; E[C] = 1
• Using these asymptotic distributions one can easily construct a hypothesis test of
H0 : E[I] = 0
against either a one-sided or a two-sided alternative.
• Another, perhaps preferable, way to test for association is to use a Monte Carlo test for independence.
• Idea: Under the assumption that the Yi's are iid, the distribution of I (and C) is invariant to permutations of the Yi's.
• What does this mean?
• The distribution of I clearly depends on W; however, if the spatial structure has no role to play then permuting the rows of W will not change the distribution of I.
• So [I|W] ≡ [I|W∗] where W∗ is any row permutation of W.
• To carry out a Monte Carlo test for spatial association, we randomly permute the data vector Y (equivalent to permuting the rows of W) and calculate the new value of the statistic, say I(1).
• Repeat this procedure many times, say 999 times, to obtain I(1), I(2), ..., I(999), and plot the histogram of these values.
• We then locate the original observed value I(obs) on this histogram.
• Under the assumption that the Yi’s are iid, the observed value I(obs) comes from the same distribution as I(1), I(2), ..., I(999) → I(obs) should lie somewhere in the main body of the histogram.
• If I(obs) lies in the tails of the histogram, we have evidence against the hypothesis that the Yi’s are iid.
• Can quantify this by calculating an empirical p-value.
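The permutation procedure can be sketched as follows. The function names, the 20-region path map, the trend data, and the one-sided p-value convention (1 + count)/(number of permutations + 1) are assumptions chosen for the example; other conventions for the empirical p-value exist.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for data vector y and neighborhood matrix W (w_ii = 0)."""
    z = y - y.mean()
    return len(y) * (z @ W @ z) / (W.sum() * (z @ z))

def permutation_test(y, W, n_perm=999, seed=0):
    """Monte Carlo test of independence based on Moran's I."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(y, W)
    # Recompute I after randomly reassigning the observed values to regions.
    i_perm = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    # One-sided empirical p-value for positive association; the observed
    # arrangement is counted as one of the n_perm + 1 arrangements.
    p_value = (1 + np.sum(i_perm >= i_obs)) / (n_perm + 1)
    return i_obs, i_perm, p_value

# Hypothetical map: 20 regions in a row, with a smooth trend in the data.
n = 20
W = np.diag(np.ones(n - 1), 1)
W = W + W.T
y = np.arange(n, dtype=float)

i_obs, i_perm, p_value = permutation_test(y, W)
```

A histogram of `i_perm` with `i_obs` marked on it gives the picture described in the bullets; here `i_obs` falls far in the upper tail.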
• If associated with each Yi is a vector of covariates xi, then even if the Yi's are spatially dependent they may not be identically distributed.
• As in the point-referenced setting, this suggests applying these techniques to the estimated residuals from standard regression models.
Simple Smoothing
• To filter out noise in the data and produce a smooth map we can use the W matrix and replace each Yi with
$$
\hat{Y}_i = \sum_j \frac{w_{ij}}{w_{i+}} Y_j, \qquad w_{i+} = \sum_j w_{ij},
$$
a weighted average that will encourage the smoothed $Y_i$ to be similar to its neighbors.
Problems with this?
• A possible remedy is
$$
\hat{Y}_i^{*} = (1 - \alpha) Y_i + \alpha \hat{Y}_i, \qquad \alpha \in [0, 1].
$$
• Here, α = 0 yields the raw data and α = 1 yields a very smooth map. Try different values of α in an exploratory fashion.
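A minimal sketch of this smoother is given here, assuming a five-region path map and a single "spike" in the data (both made up for illustration). With α = 1 the smoothed value at a region is the plain neighbor average, which does not use the region's own observation at all since w_ii = 0; the spike region shows this clearly.

```python
import numpy as np

def smooth(y, W, alpha):
    """Return (1 - alpha) * Y_i + alpha * (weighted average of neighbors of i)."""
    w_plus = W.sum(axis=1)            # w_{i+}; every region needs a neighbor
    y_hat = (W @ y) / w_plus          # neighbor average, ignores Y_i itself
    return (1.0 - alpha) * y + alpha * y_hat

# Hypothetical map: five regions in a row, with one isolated high value.
n = 5
W = np.diag(np.ones(n - 1), 1)
W = W + W.T
y = np.array([0.0, 0.0, 10.0, 0.0, 0.0])

raw = smooth(y, W, 0.0)               # alpha = 0: the raw data
half = smooth(y, W, 0.5)              # compromise between data and neighbors
full = smooth(y, W, 1.0)              # alpha = 1: pure neighbor averages
```

Trying a few values of α and mapping the results is the exploratory use suggested in the notes.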
• In Chapter 5 we will discuss hierarchical models for smoothing which will incorporate covariate information and spatial random effects.
• In that setting our smoothed Yi’s will be posterior means E[Yi|Data].
Markov Random Fields
• In the point-referenced data setting we specified the joint distribution of the observed data Y1, ..., Yn directly.
• In the areal setting, where we have Y1, ..., Yn and a neighborhood matrix W, we will take a different approach and build the required joint distribution $f(y_1, \ldots, y_n)$ through the specification of a set of simpler full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$.
• For a given joint distribution $f(y_1, \ldots, y_n)$ we can always obtain unique and well-defined conditional distributions
$$
f(y_i \mid y_j, j \neq i) = \frac{f(y_1, \ldots, y_n)}{\int f(y_1, \ldots, y_n)\, dy_i}
$$
• But note that the converse is not always true!
• We cannot simply write down a set of full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, and claim that these determine a unique $f(y_1, \ldots, y_n)$.
• Consider two random variables with
$$
Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2) \quad \text{and} \quad Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2)
$$
• In this case
$$
E[Y_1] = E[E[Y_1 \mid Y_2]] = E[\alpha_0 + \alpha_1 Y_2] = \alpha_0 + \alpha_1 E[Y_2]
$$
→ (for $\alpha_1 \neq 0$) $E[Y_2]$ is a linear function of $E[Y_1]$
• But we also have
$$
E[Y_2] = E[E[Y_2 \mid Y_1]] = E[\beta_0 + \beta_1 Y_1^3] = \beta_0 + \beta_1 E[Y_1^3]
$$
• Both conditions cannot hold (except in trivial cases) and so here the two conditional distributions do not determine a valid and unique joint distribution.
• In general when a set of full conditional distributions determine a unique and valid joint distribution we say that the set of conditional distributions is compatible.
• Improper distribution: An improper distribution is a distribution with non-integrable density. That is, if S is the sample space of Y then
$$
\int_S f(y)\, dy = \infty
$$
• When would such an object be useful in statistics? Clearly, an improper distribution is not useful as a model for data.
• In Bayesian statistics, where parameters are assigned probability distributions, improper distributions may be employed as priors.
How?
• Even though the prior density $\pi(\theta)$ is such that
$$
\int \pi(\theta)\, d\theta = \infty,
$$
having observed data y (assumed to arise from a proper distribution), the corresponding posterior may be proper,
$$
\int \pi(\theta \mid y)\, d\theta < \infty,
$$
and so inference based on this posterior is valid.
• Such distributions have their uses in Bayesian statistics and in fact are used, as we shall see later, as models for random effects in an areal data setting.
• Given a set of compatible and proper full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, the resulting joint distribution can be improper!
• Example: consider the bivariate joint distribution with
$$
f(y_1, y_2) \propto \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right], \qquad (y_1, y_2) \in \mathbb{R}^2
$$
This density has no valid normalizing constant since
$$
\int \int \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right] dy_1\, dy_2 = \infty
$$
and so the distribution is improper.
• What about the corresponding full conditional distributions?
• Clearly
$$
[Y_1 \mid Y_2 = y_2] \sim N(y_2, 1) \quad \text{and} \quad [Y_2 \mid Y_1 = y_1] \sim N(y_1, 1)
$$
so here we have an example of two compatible and proper full conditional distributions that yield an improper joint distribution.
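Both claims in this example can be checked in a couple of lines. The following short derivation is added as a sketch of the reasoning, using only the density given in the example.

```latex
% Impropriety: integrate over y_1 first (substitute u = y_1 - y_2).
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
  \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right] dy_1\, dy_2
  = \int_{-\infty}^{\infty} \sqrt{2\pi}\; dy_2 = \infty .

% Full conditional: for fixed y_2, the joint density as a function of y_1 is
f(y_1 \mid y_2) \propto \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right],
% which is the kernel of a N(y_2, 1) density. The argument for
% Y_2 given Y_1 = y_1 is identical with the roles of y_1 and y_2 swapped.
```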
• If we have a set of compatible full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, how can we determine the form of the resulting joint distribution $f(y_1, \ldots, y_n)$?
→ Brook's Lemma
• Brook's Lemma notes that if $\{f(y_i \mid y_j, j \neq i), i = 1, \ldots, n\}$ is a set of compatible full conditional distributions and $y_0 = (y_{10}, \ldots, y_{n0})$ is any fixed point in the support of $f(y_1, \ldots, y_n)$ then
$$
f(y_1, \ldots, y_n) =
\frac{f(y_1 \mid y_2, \ldots, y_n)}{f(y_{10} \mid y_2, \ldots, y_n)} \cdot
\frac{f(y_2 \mid y_{10}, y_3, \ldots, y_n)}{f(y_{20} \mid y_{10}, y_3, \ldots, y_n)}
\cdots
\frac{f(y_n \mid y_{10}, \ldots, y_{n-1,0})}{f(y_{n0} \mid y_{10}, \ldots, y_{n-1,0})}
\, f(y_{10}, \ldots, y_{n0})
$$
• This gives us the joint distribution up to a normalizing constant.
• If f (y1, ..., yn) is proper, then the fact that it integrates to 1 determines the normalizing constant.
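Brook's Lemma can be checked numerically in a case where both the joint density and the full conditionals are known. The sketch below assumes a standard bivariate normal with correlation 0.6 (a choice made only for this illustration), for which each full conditional is N(r × other value, 1 − r²).

```python
import math

R = 0.6  # assumed correlation of the illustrative bivariate normal

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def joint(y1, y2):
    """Standard bivariate normal density with correlation R."""
    q = (y1 ** 2 - 2 * R * y1 * y2 + y2 ** 2) / (1 - R ** 2)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(1 - R ** 2))

def cond(a, b):
    """Full conditional density of one coordinate at a, given the other is b."""
    return normal_pdf(a, R * b, 1 - R ** 2)

def brook(y1, y2, y10=0.0, y20=0.0):
    """Rebuild f(y1, y2) from the full conditionals and the value f(y10, y20)."""
    return (cond(y1, y2) / cond(y10, y2)) \
        * (cond(y2, y10) / cond(y20, y10)) \
        * joint(y10, y20)

direct = joint(0.7, -1.2)
rebuilt = brook(0.7, -1.2)
rebuilt_other_base = brook(0.7, -1.2, y10=1.0, y20=0.5)
```

The reconstruction agrees with the joint density whichever base point (y10, y20) is used; in practice f(y10, y20) is unknown and plays the role of the normalizing constant.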
• How should we specify the full conditional distributions so that (1) they are compatible and (2) they are simple enough and yet yield useful spatial structure?
• We will not worry about (1). To address (2) we will assume that the full conditional distribution of Yi depends only on its 'neighbors'.
• That is, the full conditional distribution of Yi will depend only on those Yj's that have $W_{ij} \neq 0$.
• Letting $\partial_i = \{j : W_{ij} \neq 0\}$ denote the set of neighbors for region i ($i \sim j \leftrightarrow W_{ij} \neq 0$), this implies
$$
f(y_i \mid y_j, j \neq i) = f(y_i \mid y_j, j \in \partial_i), \qquad i = 1, \ldots, n
$$
• This sort of specification for the full conditional distributions, when compatible, is referred to as a Markov random field (MRF) due to the obvious Markovian structure of the full conditional distributions.
• The idea behind such models is the development of a complicated spatial dependence structure through a set of simple 'local' specifications that depend only on lattice (or map) adjacencies.
• We will develop and employ these sorts of models as models for areal data or as models for random effects in an areal setting.
• Clique: A clique is a set of cells (or indices) such that each element in the set is a neighbor of every other element in the set.
• Think of the graph representation of the neighborhood structure mentioned earlier.
A clique represents a set of nodes M on the graph such that each pair of indices (i, j) with both i and j in M represents an edge of the graph.
• With n spatial units, we can have cliques of size 1, ..., n.
• Potential function: A potential of order k is a function of k arguments that is exchangeable in its arguments.
• A potential function of order k typically operates on the variable values $y_{s_1}, \ldots, y_{s_k}$ associated with a clique $\{s_1, \ldots, s_k\}$ of size k.
• Examples k = 2
1. yiyj
2. (yi − yj)2
3. yiyj + (1 − yi)(1 − yj) for binary data
• Gibbs Distribution: A joint distribution for Y1, ..., Yn is a Gibbs distribution if the joint density/pmf $f(y_1, \ldots, y_n)$ takes the following form
$$
f(y_1, \ldots, y_n) \propto \exp\Big\{ \gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \ldots, y_{\alpha_k}) \Big\}
$$
where $\phi^{(k)}(\cdot)$ is a potential of order k, $M_k$ is the collection of all cliques of size k, and γ > 0 is a parameter.
• The joint distribution $f(y_1, \ldots, y_n)$ depends on $y_1, \ldots, y_n$ only through potential functions evaluated over the cliques induced by the neighborhood (graph) structure.
• Note such a distribution may have more than one parameter → the potential functions may depend on unknown parameters.
• Hammersley-Clifford Theorem: If we have a MRF then the corresponding joint distribution is a Gibbs distribution.
• Only cliques of order 1 → independence (consider the form of the corresponding Gibbs distribution).
• Distributions having cliques of order ≤ 2 are most common. An example is the pairwise difference form
$$
f(y_1, \ldots, y_n) \propto \exp\Big\{ -\frac{1}{2\tau^2} \sum_{i,j} (y_i - y_j)^2 \Big\}
$$
based on quadratic potential functions.
Conditionally autoregressive (CAR) models
• Particularly popular class of MRF models introduced by J. Besag in 1974.
• These models have become very popular within the last decade, particularly since the advent of Gibbs sampling.
• Gibbs sampling is a procedure for simulating realizations from a joint distribution $f(y_1, \ldots, y_n)$ using only the full conditional distributions $\{f(y_i \mid y_j, j \neq i), i = 1, \ldots, n\}$.
• Useful in Bayesian statistics when we want to draw samples from a posterior distribution of interest.
• MRF models are ideal in this setting since they are specified in terms of full conditional distributions. More on this later...
Autonormal (Gaussian) CAR models
• Here we begin with the full conditionals
$$
[Y_i \mid y_j, j \neq i] \sim N\Big( \sum_j b_{ij} y_j, \; \tau_i^2 \Big), \qquad i = 1, \ldots, n
$$
• For appropriately chosen $b_{ij}$ these full conditionals are compatible, so using Brook's lemma we can obtain the joint distribution as
$$
f(y_1, \ldots, y_n) \propto \exp\left\{ -\tfrac{1}{2}\, y' D^{-1} (I - B)\, y \right\}
$$
where $B = (b_{ij})$ and $D = \mathrm{diag}\{\tau_1^2, \ldots, \tau_n^2\}$.
• Looks like a multivariate normal distribution with µ = 0 and $\Sigma_y^{-1} = D^{-1}(I - B)$.
• This is of course only true if $D^{-1}(I - B)$ is symmetric.
• We must choose $b_{ij}$ in the conditional Gaussian distributions to ensure this symmetry.
• In particular, choosing $b_{ij}$ so that
$$
\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \quad \text{for all } i, j
$$
will ensure symmetry (and compatibility).
• Notice that if $\tau_i^2 \neq \tau_j^2$ then we cannot have $b_{ij} = b_{ji}$ (unless both are zero).
• How should we choose the $b_{ij}$'s subject to the above constraint so that they also yield a reasonable joint spatial distribution?
• We will take the $b_{ij}$'s to be functions of the neighborhood matrix W:
$$
b_{ij} = \frac{w_{ij}}{w_{i+}}, \qquad \tau_i^2 = \frac{\tau^2}{w_{i+}}
$$
Does this specification satisfy the symmetry condition?
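The symmetry question can be checked numerically: with these choices $b_{ij}/\tau_i^2 = w_{ij}/\tau^2$, which is symmetric whenever W is. The sketch below uses a made-up four-region map and an arbitrary value of τ², both assumptions for illustration only.

```python
import numpy as np

# Hypothetical map: regions 0, 1, 2 are mutual neighbors and
# region 3 touches only region 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
tau2 = 2.0                                # assumed value of tau^2
w_plus = W.sum(axis=1)                    # w_{i+}

B = W / w_plus[:, None]                   # b_ij = w_ij / w_{i+}
D = np.diag(tau2 / w_plus)                # tau_i^2 = tau^2 / w_{i+}

ratio = B / np.diag(D)[:, None]           # entries b_ij / tau_i^2
Q = np.linalg.inv(D) @ (np.eye(4) - B)    # D^{-1}(I - B)
```

Here B itself is not symmetric (regions have different numbers of neighbors), but `ratio` and `Q` are, and `Q` equals $(D_W - W)/\tau^2$.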
• With these choices the full conditional distributions are
$$
[Y_i \mid y_j, j \neq i] \sim N\Big( \sum_j \frac{w_{ij}}{w_{i+}}\, y_j, \; \frac{\tau^2}{w_{i+}} \Big), \qquad i = 1, \ldots, n
$$
Interpretation?
• The joint distribution for these choices of $b_{ij}$ and $\tau_i^2$ is
$$
f(y_1, \ldots, y_n) \propto \exp\left\{ -\frac{1}{2\tau^2}\, y' (D_W - W)\, y \right\}
$$
where $D_W = \mathrm{diag}\{w_{1+}, \ldots, w_{n+}\}$.
• This is again MVN with µ = 0 and $\Sigma_y^{-1} = \tau^{-2}(D_W - W)$.
• Note here that $(D_W - W)\mathbf{1} = 0$ → $\Sigma_y^{-1}$ is singular!
• This is a singular MVN distribution → an improper distribution → no valid normalizing constant.
• Such a distribution is often referred to as a Gaussian intrinsic autoregression (IAR).
• To further investigate this impropriety we can rewrite the joint distribution as
$$
f(y_1, \ldots, y_n) \propto \exp\Big\{ -\frac{1}{2\tau^2} \sum_{i < j} w_{ij} (y_i - y_j)^2 \Big\}
$$
→ a pairwise difference Gibbs distribution with quadratic potentials (each pair of regions is counted once in the sum).
• What happens to this distribution if I add a constant µ to all the Yi?
→ nothing → the Yi's are not centered.
• This distribution does not identify an overall mean.
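These properties are easy to confirm numerically. The sketch below, which assumes a small made-up four-region map and an arbitrary data vector, checks that the rows of $D_W - W$ sum to zero, that the quadratic form equals the pairwise difference sum with each pair counted once, and that adding a constant to every value leaves the quadratic form unchanged.

```python
import numpy as np

# Hypothetical map: regions 0, 1, 2 are mutual neighbors and
# region 3 touches only region 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = len(W)
Q = np.diag(W.sum(axis=1)) - W            # D_W - W

def quad_form(y):
    """y'(D_W - W)y."""
    return y @ Q @ y

def pairwise_form(y):
    """Sum over pairs i < j of w_ij (y_i - y_j)^2."""
    return sum(W[i, j] * (y[i] - y[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

y = np.array([1.0, -0.5, 2.0, 0.3])       # arbitrary illustrative values
row_sums = Q @ np.ones(n)                 # (D_W - W) 1
rank = np.linalg.matrix_rank(Q)           # n - 1 here, so Q is singular
shifted = quad_form(y + 10.0)             # add the same constant to every Y_i
```

Because the density depends on y only through differences between neighbors, the overall level of the Yi's is left completely undetermined.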
• To provide the required centering we can impose a constraint
$$
\sum_i Y_i = 0
$$
• Problems with this as a model for data?
• Can not expect our data to respect this constraint...
• This constrained improper distribution can not be used as a model for data, but can be used as a model for spatial random effects (a prior for parameters that vary spatially).
Perhaps explain this in the context of a map...
• If we want to use the autonormal model as a distribution for data (as opposed to a prior for spatial random effects) we need an alternative solution to the impropriety problem.
• We have $(D_W - W)\mathbf{1} = 0$ → causing unfortunate results.
• An obvious remedy is to incorporate a constant ρ so that
$$
\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)
$$
is non-singular.
• Such models are often referred to as proper CAR models.
• How to choose ρ to ensure non-singularity?
• Such non-singularity is guaranteed provided $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where $\lambda_{(1)} \le \lambda_{(2)} \le \cdots \le \lambda_{(n)}$ are the ordered eigenvalues of $D_W^{-1/2} W D_W^{-1/2}$.
• It is also possible to show $\lambda_{(1)} < 0$ and $\lambda_{(n)} > 0$ so that the interval $(1/\lambda_{(1)}, 1/\lambda_{(n)})$ contains 0.
• How to choose ρ?
• Leave $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$ unspecified as a parameter in our model.
• One usually adopts the simple choice ρ ∈ [0, 1) when $\lambda_{(n)} = 1$.
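The admissible range for ρ can be computed directly from W. The following sketch, again on a made-up four-region map, finds the eigenvalues of $D_W^{-1/2} W D_W^{-1/2}$ and checks the definiteness of $D_W - \rho W$ at a few values of ρ; the helper name `min_eig` is an assumption for the example.

```python
import numpy as np

# Hypothetical map: regions 0, 1, 2 are mutual neighbors and
# region 3 touches only region 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
w_plus = W.sum(axis=1)

S = W / np.sqrt(np.outer(w_plus, w_plus))   # D_W^{-1/2} W D_W^{-1/2}
lam = np.linalg.eigvalsh(S)                 # eigenvalues in ascending order
lower, upper = 1.0 / lam[0], 1.0 / lam[-1]  # admissible interval for rho

def min_eig(rho):
    """Smallest eigenvalue of D_W - rho W (positive means positive definite)."""
    return np.linalg.eigvalsh(np.diag(w_plus) - rho * W)[0]

inside = [min_eig(0.5 * lower), min_eig(0.0), min_eig(0.9)]
at_one = min_eig(1.0)                       # the IAR case: singular
outside = [min_eig(1.05 * lower), min_eig(1.1)]
```

For this W the largest eigenvalue is 1, so the upper end of the interval is ρ = 1; inside the interval $D_W - \rho W$ is positive definite, at ρ = 1 it is singular, and outside it is indefinite.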
• Here ρ = 0 corresponds to conditional distributions
$$
[Y_i \mid y_j, j \neq i] \sim N\Big( 0, \; \frac{\tau^2}{w_{i+}} \Big), \qquad i = 1, \ldots, n
$$
→ spatial independence.
• Further, ρ → 1 corresponds to the IAR model, and larger values of ρ imply a greater degree of spatial dependence.
• Note with the IAR model (ρ = 1) we only have one parameter, τ², the variance component.
• This variance component does not quantify spatial dependence in any way.
• With the IAR model, much of the spatial structure imposed by the model is pre-implied by the chosen W.
• Note also that independence does not arise as a special case of this model.
• Of course one could, in principle, allow the neighborhood structure, W, itself to be a parameter in the model → fairly complicated.
• When the more general CAR model incorporating ρ is employed, how does one interpret ρ? → very carefully.
• In particular, ρ does not represent correlation. Rather, ρ is some measure of dependence in the sense that ρ = 0 corresponds to independence and spatial dependence increases with ρ.
• The maximum allowable spatial dependence corresponds to the IAR model when ρ = 1.
• To calibrate ρ for a given neighborhood structure and map, one could simulate realizations from the CAR model for different values of ρ. For each realization we could compute Moran's I to gauge the strength of the spatial dependence implied by a particular ρ value.
• In general, even moderate amounts of spatial dependence will require ρ > 0.9, and usually estimates of ρ are close to its upper bound.
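The calibration idea can be sketched as a small simulation. The code assumes a 6 × 6 grid with edge-sharing neighbors, τ² = 1, 200 realizations per value of ρ, and a precision matrix $(D_W - \rho W)/\tau^2$; the function names and all of these settings are illustrative choices, not part of the notes.

```python
import numpy as np

def rook_adjacency(nrow, ncol):
    """Binary W for a grid; cells sharing an edge are neighbors."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W

def morans_i(y, W):
    z = y - y.mean()
    return len(y) * (z @ W @ z) / (W.sum() * (z @ z))

def mean_morans_i(rho, W, n_sim=200, tau2=1.0, seed=0):
    """Average Moran's I over realizations of a proper CAR model (rho < 1)."""
    rng = np.random.default_rng(seed)
    Q = (np.diag(W.sum(axis=1)) - rho * W) / tau2   # precision matrix
    L = np.linalg.cholesky(np.linalg.inv(Q))        # covariance = L L'
    Y = rng.standard_normal((n_sim, len(W))) @ L.T  # rows are N(0, Q^{-1}) draws
    return float(np.mean([morans_i(y, W) for y in Y]))

W = rook_adjacency(6, 6)
calibration = {rho: mean_morans_i(rho, W) for rho in (0.0, 0.5, 0.9, 0.99)}
```

Inspecting `calibration` shows how slowly the average Moran's I grows with ρ until ρ gets close to 1.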
• When modeling random effects in an areal data setting, I usually fit models based on the proper CAR model as well as the IAR model and then compare the two using some model selection tool.
• Usually, at least in my experience, the IAR model ends up being the preferred model.
• I note again that in the framework of this model we specify a joint normal distribution for the data and specify the inverse covariance matrix
$$
\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)
$$
but in general have no simple form for the covariance matrix.
• The elements of $\Sigma_y$ give us, of course, information on the marginal covariance structure of Y. The elements of $\Sigma_y^{-1}$ give us information on the conditional covariance structure of Y.
• For example, using standard results associated with the MVN distribution, we can show that $1/(\Sigma_y^{-1})_{ii}$ gives us $\mathrm{Var}(Y_i \mid y_j, j \neq i)$.
• Moreover, if $(\Sigma_y^{-1})_{ij} = 0$ then $Y_i$ and $Y_j$ are conditionally independent given $\{y_k, k \neq i, j\}$.
• We see that $W_{ij} = 0$ implies conditional independence between $Y_i$ and $Y_j$ (given all other Y's). From this we see that the specification of a neighborhood structure W is essentially a set of conditional independence assumptions.
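The relationship between the precision matrix and the conditional structure can be verified numerically. The sketch below takes the precision to be $(D_W - \rho W)/\tau^2$ on a made-up four-region map with ρ = 0.9 and τ² = 1.5 (assumed values), and computes each conditional variance from the covariance matrix alone using the standard MVN conditioning formula.

```python
import numpy as np

# Hypothetical map: regions 0, 1, 2 are mutual neighbors and
# region 3 touches only region 2 (so regions 0 and 3 are not neighbors).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rho, tau2 = 0.9, 1.5                       # assumed parameter values
w_plus = W.sum(axis=1)

Q = (np.diag(w_plus) - rho * W) / tau2     # precision matrix
Sigma = np.linalg.inv(Q)                   # covariance matrix

def conditional_variance(Sigma, i):
    """Var(Y_i | all other Y_j), computed from the covariance matrix only."""
    rest = [k for k in range(len(Sigma)) if k != i]
    s_ir = Sigma[i, rest]
    return Sigma[i, i] - s_ir @ np.linalg.solve(Sigma[np.ix_(rest, rest)], s_ir)

cond_var = np.array([conditional_variance(Sigma, i) for i in range(4)])
```

The conditional variances agree with both 1/Q_ii and τ²/w_{i+}. Regions 0 and 3 have a zero entry in Q (conditional independence) but a nonzero covariance in Sigma, since they are linked through region 2.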
• Regression: If the proper CAR model is used as a distribution for data, we can accommodate covariates $x_i$ by modifying the conditional distributions to
$$
[Y_i \mid y_j, j \neq i] \sim N\Big( x_i'\beta + \rho \sum_j \frac{w_{ij}}{w_{i+}} (y_j - x_j'\beta), \; \frac{\tau^2}{w_{i+}} \Big), \qquad i = 1, \ldots, n
$$
• With these conditional specifications the joint distribution for Y is MVN with µ = Xβ and $\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$.
• We will mostly be concerned with the µ = 0 case when CAR models are applied as a (prior) distribution for random effects.
• Multivariate spatial data: Suppose, associated with each areal unit, we observe several, say p, dependent observations $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ip})$.
• Models for these sorts of data must account for the spatial dependence across areal units and also dependence within each $Y_i$.
• Multivariate conditional autoregressive (MCAR) models have been developed for such data.
• The idea is a straightforward extension of the univariate case where we specify the joint distribution of all np random variables
$$
Y = (Y_1', \ldots, Y_n')'
$$
through a set of full conditional distributions. These full conditional distributions will be p-variate normal instead of univariate normal.
• Note also that a CAR model can, in principle, be adopted to model point-referenced data by allowing the elements of W to depend on the distance between points.
• This may be useful for very large datasets since CAR models, as we shall see in Chapter 5, are numerically less demanding to fit within a Gibbs sampling framework.
• When prediction is not of interest, this is a perfectly acceptable way of building a joint distribution. Whether or not such an approach yields an adequate representation of the underlying spatial structure in a given application is a model assessment issue, and a critical one at that.
Non-Gaussian CAR models
• When dealing with non-Gaussian areal data, our preferred approach will be based on generalized linear mixed models, where we incorporate Gaussian CAR random effects into models for non-Gaussian data → Chapter 5.
• An alternative to this approach, which we consider now, is to adopt a MRF type specification for the data Y1, ..., Yn and determine a joint distribution through the specification of a set of compatible non-Gaussian full conditional distributions.
• For example, we can allow the full conditional distributions $f(y_i \mid y_j, j \neq i)$ to take Poisson, binomial, Gamma or in fact any form from the exponential family.
• When these are compatible, the result is a joint spatial distribution for non-Gaussian data. See Cressie (1993) for a full development of CAR models in a general framework.
• I will present two examples of such non-Gaussian CAR models and discuss the computational problems associated with these.
• Binary Data: For binary Y1, ..., Yn an autologistic (binary MRF) model specifies the full conditional distributions as
$$
p_i = P(Y_i = 1 \mid y_j, j \neq i) = P(Y_i = 1 \mid y_j, j \in \partial_i)
$$
and
$$
\log\Big( \frac{p_i}{1 - p_i} \Big) = x_i'\beta + \psi \sum_j w_{ij} y_j
$$
where β is a vector of regression parameters and ψ ∈ R is a spatial dependence parameter.
• These full conditional distributions are compatible and Brook's lemma yields the form of the joint pmf:
$$
f(y_1, \ldots, y_n) \propto \exp\Big\{ \beta'\Big( \sum_i y_i x_i \Big) + \psi \sum_{i < j} w_{ij} y_i y_j \Big\}
$$
a Gibbs distribution with potentials on cliques of order 2. Each neighboring pair is counted once in the second sum, which is what matches the conditional log odds $x_i'\beta + \psi \sum_j w_{ij} y_j$.
• We can, in principle, use this form to fit the model and obtain, for example, MLEs of β and ψ.
• Unfortunately, there is a computational problem that arises. The normalizing constant in $f(y_1, \ldots, y_n)$ depends on model parameters,
$$
f(y_1, \ldots, y_n) = C(\beta, \psi) \times \exp\Big\{ \beta'\Big( \sum_i y_i x_i \Big) + \psi \sum_{i < j} w_{ij} y_i y_j \Big\}
$$
and so would need to be evaluated at each iteration of the maximization procedure.
• Note that
$$
C(\beta, \psi)^{-1} = \sum_{y_1 = 0}^{1} \cdots \sum_{y_n = 0}^{1} \exp\Big\{ \beta'\Big( \sum_i y_i x_i \Big) + \psi \sum_{i < j} w_{ij} y_i y_j \Big\}
$$
• Evaluating this constant for any particular value of β and ψ requires summing $2^n$ terms → not feasible even for moderate n, in particular since we would have to do this iteratively.
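To make the cost concrete, the following sketch evaluates $C(\beta, \psi)^{-1}$ by brute force for very small maps. It is only an illustration: the function name, the tiny maps, the intercept-only design matrices, and the parameter values are assumptions, and the pair sum is taken once per neighboring pair (one reading of the double sum in the joint pmf).

```python
import itertools
import numpy as np

def normalizing_sum(beta, psi, X, W):
    """Return C(beta, psi)^{-1} by brute force, and the number of terms summed."""
    n = W.shape[0]
    total, n_terms = 0.0, 0
    for y in itertools.product((0, 1), repeat=n):   # all 2^n binary vectors
        y = np.array(y, dtype=float)
        pair_term = 0.5 * (y @ W @ y)               # each neighboring pair once
        total += np.exp(beta @ (X.T @ y) + psi * pair_term)
        n_terms += 1
    return total, n_terms

# Hypothetical two-region map with the two regions adjacent, intercept only.
W2 = np.array([[0.0, 1.0], [1.0, 0.0]])
X2 = np.ones((2, 1))
total2, terms2 = normalizing_sum(np.array([0.0]), np.log(2.0), X2, W2)

# Hypothetical four-region map; psi = 0 and beta = 0 make every term equal 1.
W4 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 0],
               [1, 1, 0, 1],
               [0, 0, 1, 0]], dtype=float)
X4 = np.ones((4, 1))
total4, terms4 = normalizing_sum(np.array([0.0]), 0.0, X4, W4)
```

The loop visits every binary configuration, so the work doubles with each additional region; this is fine for n = 4 and hopeless for a map with even a few dozen regions.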
• Evaluating the normalizing constant is also required for Bayesian inference. Pseudo-likelihood, a somewhat ad hoc inferential scheme, can be employed to avoid the calculation of the normalizing constant.
• The autologistic model can be generalized to the case where each Yi is categorical and takes values in the set {0, 1, ..., L − 1} for some L ≥ 2.
• In this case the full conditional distributions are defined by
$$
P(Y_i = l \mid y_j, j \neq i) \propto \exp\Big( \psi \sum_{j \neq i} w_{ij}\, I(y_j = l) \Big)
$$
where ψ ∈ R is again a spatial dependence parameter.
• Covariates can be added to this model just as in the autologistic case.
• This model, referred to as the Potts model, can be used to model allocations in finite mixture models, providing a robust alternative to the usual Gaussian spatial random effects models.
• As before, the model contains a normalizing constant C(ψ) that causes computational problems when fitting this model.
Simultaneous autoregressive (SAR) models
• MRF models such as the CAR models we have discussed are by far the most popular sorts of models for areal data.
• An alternative class of models for areal data can be based on an autoregressive structure similar to that adopted in time series modeling.
• As before we have data Y1, ..., Yn and spatial information W.
• Unlike the MRF approach, we do not focus on full conditionals in this framework.
• Instead, we start with a vector of independent errors or innovations $e \sim MVN(0, \tilde{D})$ with $\tilde{D} = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_n^2\}$ or, more simply, $\tilde{D} = \sigma^2 I$.
• We then construct a simple functional relationship between Y and e, and this relationship induces a distribution for Y.
• Consider the relationship
$$
Y_i = \sum_j b_{ij} Y_j + e_i, \qquad i = 1, \ldots, n
$$
for some constants $b_{ij}$ with $b_{ii} = 0$.
• In matrix form this is
$$
Y = BY + e
$$
where $B = (b_{ij})$.
• From this we can obtain the relationship between Y and e
$$
Y = (I - B)^{-1} e
$$
assuming I − B is invertible.
• The simple distribution assigned to e then induces the following for Y:
$$
Y \sim MVN\big( 0, \; (I - B)^{-1} \tilde{D}\, [(I - B)^{-1}]' \big)
$$
and when $\tilde{D} = \sigma^2 I$ this is just
$$
Y \sim MVN\big( 0, \; \sigma^2 (I - B)^{-1} [(I - B)^{-1}]' \big)
$$
• To ensure that I − B is invertible, we can take B = ρW and restrict ρ to an appropriate range.
• Invertibility is ensured when $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$ where $\lambda_{(1)}$ and $\lambda_{(n)}$ are the smallest and largest eigenvalues of W.
• The SAR model is then based on
$$
\Sigma_y = \sigma^2 \big[ (I - \rho W)'(I - \rho W) \big]^{-1}
$$
where ρ is referred to as the autoregression parameter, with ρ = 0 corresponding to $\Sigma_y = \sigma^2 I$, an independence model.
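The SAR construction translates almost line for line into code. The sketch below assumes a symmetric binary W on a made-up four-region map, with ρ = 0.3 and σ² = 1 chosen for illustration; it computes the admissible range for ρ, the induced covariance matrix in two equivalent ways, and one simulated realization obtained by transforming independent errors.

```python
import numpy as np

# Hypothetical map: regions 0, 1, 2 are mutual neighbors and
# region 3 touches only region 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = len(W)

lam = np.linalg.eigvalsh(W)               # W is symmetric here, eigenvalues real
rho_range = (1.0 / lam[0], 1.0 / lam[-1])

def sar_covariance(rho, sigma2, W):
    """sigma^2 (I - rho W)^{-1} [(I - rho W)^{-1}]'."""
    A_inv = np.linalg.inv(np.eye(len(W)) - rho * W)
    return sigma2 * A_inv @ A_inv.T

rho, sigma2 = 0.3, 1.0                    # assumed parameter values
A = np.eye(n) - rho * W
Sigma = sar_covariance(rho, sigma2, W)
Sigma_alt = sigma2 * np.linalg.inv(A.T @ A)   # same matrix, written as one inverse

# One realization: draw independent errors e, then solve (I - rho W) Y = e.
rng = np.random.default_rng(1)
e = rng.normal(scale=np.sqrt(sigma2), size=n)
Y = np.linalg.solve(A, e)
```

Simulating Y only requires solving one linear system per draw, which reflects the "transformation of independent errors" view of the model.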
• Regression: When covariates are present, the SAR model can be adopted as a model for residuals.
• In this case we define U = Y − Xβ and assume U follows a SAR model so that
$$
(I - \rho W) U = e
\;\rightarrow\; (I - \rho W)(Y - X\beta) = e
\;\rightarrow\; Y = \rho W Y + (I - \rho W) X\beta + e
$$
• Note here that if W = 0 this is the standard linear model.
• Note that the spatial covariance structure implied by the SAR model, just as with the CAR model, is not entirely intuitive.
• In addition, the SAR models, unlike the CAR models, are not based on a set of full conditional distributions. These of course exist, but they do not have a computationally convenient form.
• As a result, SAR models are not well suited to model fitting using the Gibbs sampler.
• Finally, Cressie (1993) shows that any SAR model can be represented as a CAR model; however, the converse is not true.
• There exist CAR models that do not have a representation as a SAR model.
• Given the above, we will not consider SAR models further in this course.
• I note, however, that the general approach of building spatial distributions using transformations of independent RVs is simple, intuitive and appealing. Other similar approaches could (and should) be explored further...