Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

• Recall that areal data, also known as lattice data, are data Y(s), s ∈ D, where D is a discrete index set.

• This usually corresponds to data Y₁, ..., Y_n observed on a set of geographical units (over a map), the pixels of an image, or a regular arrangement of points on a lattice.

• Models for areal data are also sometimes employed for irregularly arranged point-referenced data sets when the number of spatial units is very large → computational considerations.

• As we shall see in Chapter 5, certain types of areal models are computationally easier to work with and ideal for use with the Gibbs sampler.

• In this setting, unlike the geostatistical one, we are typically not interested in prediction, since we have observed data at all spatial sites.

• What is of interest in this setting?

• Is a spatial pattern evident? Are there clusters of high/low values?

• Smoothing: Filter out some of the noise in the data → help elucidate spatial pattern.

• How much to smooth the data is not always clear. Smoother maps are easier to interpret but will generally represent the data less faithfully, and vice versa.

• Example: No smoothing at all is equivalent to presenting a raw map of the data. Extreme smoothing would involve associating the same value Ȳ with all units. Optimal smoothing lies somewhere between these two extremes.

• Also of interest in this setting is relating the response to covariates through regression models → need to account for spatial dependence in such regression models.

• Also in the regression setting, we would be interested in examining the residual spatial structure after accounting for covariates.

Exploratory methods for areal data

• Recall that the primary source of spatial information in the areal setting consists of adjacencies → knowing, for each region, all the ‘neighboring’ regions (for some appropriate definition of neighbor), i.e., the arrangement of the regions across the map.

• This adjacency structure is quantified through the neighborhood (or proximity) matrix W:

$$W_{ij} = \begin{cases} 0 & \text{if } i = j \\ 0 & \text{if } i \text{ and } j \text{ are not neighbors} \\ c_{ij} > 0 & \text{if } i \text{ and } j \text{ are neighbors} \end{cases}$$

where c_ij quantifies the strength of the neighbor relationship.

• Most often c_ij = 1 for all neighbor pairs and two regions are considered neighbors if they share a common boundary.

• It is instructive to think of this spatial structure as a graph, where nodes correspond to regions and two nodes on the graph are connected if the associated regions are neighbors.
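As a minimal sketch, assuming a regular grid of cells with rook adjacency (two cells are neighbors if they share an edge) and c_ij = 1, the neighborhood matrix W can be built as below. The helper name `grid_adjacency` is hypothetical, chosen for illustration; row i of the result lists the graph edges incident to node i.

```python
def grid_adjacency(nrow, ncol):
    """Binary rook-adjacency matrix W for an nrow x ncol grid of cells.

    Cells are indexed row by row: cell (r, c) has index r * ncol + c.
    W[i][j] = 1 if cells i and j share an edge, else 0 (so W[i][i] = 0).
    """
    n = nrow * ncol
    W = [[0] * n for _ in range(n)]
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:                 # neighbor to the right
                W[i][i + 1] = W[i + 1][i] = 1
            if r + 1 < nrow:                 # neighbor below
                W[i][i + ncol] = W[i + ncol][i] = 1
    return W


W = grid_adjacency(3, 3)
# w_{i+} = number of neighbors: corners 2, edge cells 3, center 4
print([sum(row) for row in W])  # [2, 3, 2, 3, 4, 3, 2, 3, 2]
```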

• The neighborhood matrix W can be used for exploratory analysis and will also be used when we discuss models for areal data.

• Note that it is also possible to define second-order neighbors (e.g., neighbors of neighbors) and a corresponding second-order neighborhood matrix.

• After plotting the data (usually on a map in this case), an exploratory analysis usually proceeds with an attempt to quantify the strength of spatial association in the data.

• For this, two statistics can be employed:

1. Moran’s I:

$$I = \frac{n \sum_i \sum_j w_{ij}(Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$

where

– I ≈ 0 → no spatial dependence

– I > 0 → positive spatial dependence

– I < 0 → negative spatial dependence

Can be thought of as an areal ‘correlation coefficient’.

2. Geary’s C:

$$C = \frac{(n - 1) \sum_i \sum_j w_{ij}(Y_i - Y_j)^2}{2\left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$

where C ≥ 0 and

– C ≈ 1 → no spatial dependence

– C < 1 → positive spatial dependence

– C > 1 → negative spatial dependence
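Both statistics are direct transcriptions of their formulas. The sketch below uses plain Python lists; the function names and the four-region toy map are illustrative assumptions, and Geary's C uses the factor 2 in the denominator (the usual normalization that gives E[C] = 1 under independence). A smooth trend across neighbors gives I > 0 and C < 1, while values that alternate between neighbors give I < 0 and C > 1.

```python
def morans_i(y, W):
    """Moran's I for data y (length n) and an n x n weight matrix W (lists)."""
    n = len(y)
    ybar = sum(y) / n
    d = [yi - ybar for yi in y]                                # Y_i - Ybar
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    s0 = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
    return n * num / (s0 * sum(di * di for di in d))


def gearys_c(y, W):
    """Geary's C, with the factor 2 in the denominator so that E[C] = 1 under iid."""
    n = len(y)
    ybar = sum(y) / n
    num = sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    s0 = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
    return (n - 1) * num / (2 * s0 * sum((yi - ybar) ** 2 for yi in y))


# Four regions on a line (1 - 2 - 3 - 4); each region neighbors the next.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
trend = [1, 2, 3, 4]          # smooth across neighbors
alternating = [1, -1, 1, -1]  # neighbors always differ
print(morans_i(trend, W), gearys_c(trend, W))              # I = 1/3 > 0, C = 0.3 < 1
print(morans_i(alternating, W), gearys_c(alternating, W))  # I = -1 < 0, C = 1.5 > 1
```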

• Under the hypothesis that the Y_i’s are iid, one can show that the asymptotic distributions of both statistics are normal and that

E[I] = 0; E[C] = 1

• Using these asymptotic distributions, one can easily construct a hypothesis test of

H₀ : E[I] = 0

against either a one- or two-sided alternative.

• Another, perhaps preferable, way to test for association is to use a Monte Carlo test for independence.

• Idea: Under the assumption that the Y_i’s are iid, the distribution of I (and C) is invariant to permutations of the Y_i’s.

• What does this mean?

• The distribution of I clearly depends on W; however, if the spatial structure has no role to play, then permuting the rows and columns of W (applying the same permutation to both) will not change the distribution of I.

• So [I|W] ≡ [I|W^∗], where W^∗ is obtained from W by any such simultaneous permutation of its rows and columns.

• To carry out a Monte Carlo test for spatial association, we randomly permute the data vector Y (equivalent to permuting the rows and columns of W) and calculate the new value of the statistic, say I⁽¹⁾.

• Repeat this procedure many times, say 999 times, to obtain I⁽¹⁾, I⁽²⁾, ..., I⁽⁹⁹⁹⁾, and plot the histogram of these values.

• We then locate the original observed value I^(obs) on this histogram.

• Under the assumption that the Y_i’s are iid, the observed value I^(obs) comes from the same distribution as I⁽¹⁾, I⁽²⁾, ..., I⁽⁹⁹⁹⁾ → I^(obs) should lie somewhere in the main body of the histogram.

• If I^(obs) lies in the tails of the histogram, we have evidence against the hypothesis that the Y_i’s are iid.

• Can quantify this by calculating an empirical p-value.
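The permutation test can be sketched in plain Python. This is a minimal sketch with several assumptions: a one-sided alternative (positive dependence), the common convention of counting the observed value as one of the n_perm + 1 values when forming the empirical p-value, and a hypothetical 5 × 5 rook grid carrying a smooth trend as toy data. The name `moran_mc_test` is illustrative.

```python
import random


def morans_i(y, W):
    n = len(y)
    ybar = sum(y) / n
    d = [yi - ybar for yi in y]
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    s0 = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
    return n * num / (s0 * sum(di * di for di in d))


def moran_mc_test(y, W, n_perm=999, seed=0):
    """One-sided Monte Carlo test of 'Y_i iid' against positive dependence.

    Returns (observed I, list of permuted I values, empirical p-value).
    """
    rng = random.Random(seed)
    i_obs = morans_i(y, W)
    y_perm = list(y)
    perm_values = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)              # random relabeling of data over regions
        perm_values.append(morans_i(y_perm, W))
    # The observed value counts as one of n_perm + 1 exchangeable values.
    n_as_extreme = sum(v >= i_obs for v in perm_values)
    return i_obs, perm_values, (n_as_extreme + 1) / (n_perm + 1)


# Toy data: 5 x 5 rook grid with a smooth diagonal trend y = row + col.
nrow = ncol = 5
n = nrow * ncol
W = [[0] * n for _ in range(n)]
for r in range(nrow):
    for c in range(ncol):
        i = r * ncol + c
        if c + 1 < ncol:
            W[i][i + 1] = W[i + 1][i] = 1
        if r + 1 < nrow:
            W[i][i + ncol] = W[i + ncol][i] = 1
y = [r + c for r in range(nrow) for c in range(ncol)]

i_obs, perm_values, p = moran_mc_test(y, W)
print(round(i_obs, 3), p)
```

Plotting a histogram of `perm_values` and marking `i_obs` on it reproduces the picture described above.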

• If associated with each Y_i is a vector of covariates x_i, then even if the Y_i’s are spatially independent they may not be identically distributed.

• As in the point-referenced setting, this suggests applying these techniques to the estimated residuals from standard regression models.

Simple Smoothing

• To filter out noise in the data and produce a smooth map we can use the W matrix and replace each Y_i with

$$\hat{Y}_i = \sum_j \frac{w_{ij}}{w_{i+}} Y_j, \qquad w_{i+} = \sum_j w_{ij},$$

a weighted average that will encourage the smoothed Y_i to be similar to its neighbors.

Problems with this? (Note, for instance, that Ŷ_i does not use Y_i itself, since w_ii = 0.)

• A possible remedy is

$$\hat{Y}_i^* = (1 - \alpha) Y_i + \alpha \hat{Y}_i, \qquad \alpha \in [0, 1]$$

• Here, α = 0 yields the raw data and α = 1 yields a very smooth map. Try different values of α in an exploratory fashion.
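A minimal sketch of both smoothers in one function, assuming every region has at least one neighbor (w_i+ > 0). The function name `smooth` and the three-region "spike" data are hypothetical. With α = 1 each value is replaced by its neighbors' average alone, so the spike in the middle region is moved onto its neighbors rather than damped.

```python
def smooth(y, W, alpha):
    """Return Y*_i = (1 - alpha) * Y_i + alpha * Yhat_i for each region i,
    where Yhat_i = sum_j w_ij Y_j / w_{i+} is the neighbor average.
    Assumes every region has at least one neighbor (w_{i+} > 0)."""
    n = len(y)
    out = []
    for i in range(n):
        w_plus = sum(W[i])
        y_hat = sum(W[i][j] * y[j] for j in range(n)) / w_plus
        out.append((1 - alpha) * y[i] + alpha * y_hat)
    return out


# Three regions on a line with a spike in the middle.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
y = [0.0, 6.0, 0.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, smooth(y, W, alpha))
# 0.0 [0.0, 6.0, 0.0]   raw data
# 0.5 [3.0, 3.0, 3.0]
# 1.0 [6.0, 0.0, 6.0]   neighbor averages only
```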

• In Chapter 5 we will discuss hierarchical models for smoothing which will incorporate covariate information and spatial random effects.

• In that setting our smoothed Y_i’s will be posterior means E[Y_i|Data].

Markov Random Fields

• In the point-referenced data setting we specified the joint distribution of the observed data Y₁, ..., Y_n directly.

• In the areal setting, where we have Y₁, ..., Y_n and a neighborhood matrix W, we will take a different approach and build the required joint distribution f(y₁, ..., y_n) through the specification of a set of simpler full conditional distributions f(y_i | y_j, j ≠ i), i = 1, ..., n.

• For a given joint distribution f(y₁, ..., y_n) we can always obtain unique and well-defined full conditional distributions

$$f(y_i \mid y_j, j \neq i) = \frac{f(y_1, \ldots, y_n)}{\int f(y_1, \ldots, y_n)\, dy_i}$$

• But note that the converse is not always true!

• We cannot simply write down a set of full conditional distributions f(y_i | y_j, j ≠ i), i = 1, ..., n and claim that these determine a unique f(y₁, ..., y_n).

• Consider two random variables with

$$Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2) \quad \text{and} \quad Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2)$$

• In this case

$$E[Y_1] = E\big[E[Y_1 \mid Y_2]\big] = E[\alpha_0 + \alpha_1 Y_2] = \alpha_0 + \alpha_1 E[Y_2]$$

→ E[Y₁] and E[Y₂] are linearly related.

• But we also have

$$E[Y_2] = E\big[E[Y_2 \mid Y_1]\big] = E[\beta_0 + \beta_1 Y_1^3] = \beta_0 + \beta_1 E[Y_1^3]$$

• Both conditions cannot hold (except in trivial cases), and so here the two conditional distributions do not determine a valid and unique joint distribution.

• In general, when a set of full conditional distributions determines a unique and valid joint distribution, we say that the set of conditional distributions is compatible.

• Improper distribution: An improper distribution is a distribution with non-integrable density. That is, if S is the sample space of Y then

$$\int_S f(y)\, dy = \infty$$

• When would such an object be useful in statistics? Clearly, an improper distribution is not useful as a model for data.

• In Bayesian statistics, where parameters are assigned probability distributions, improper distributions may be employed as priors.

How?

• Even though the prior density π(θ) is such that

$$\int \pi(\theta)\, d\theta = \infty,$$

having observed data y (assumed to arise from a proper distribution), the corresponding posterior may be proper,

$$\int \pi(\theta \mid y)\, d\theta < \infty,$$

and so inference based on this posterior is valid.

• Such distributions have their uses in Bayesian statistics and in fact are used, as we shall see later, as models for random effects in an areal data setting.

• Given a set of compatible and proper full conditional distributions f(y_i | y_j, j ≠ i), i = 1, ..., n, the resulting joint distribution can be improper!

• Example: consider the bivariate joint distribution with

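$$f(y_1, y_2) \propto \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right], \qquad (y_1, y_2) \in \mathbb{R}^2$$

This density has no valid normalizing constant since

$$\iint \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right] dy_1\, dy_2 = \infty,$$

and so the distribution is improper.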

• What about the corresponding full conditional distributions?

• Clearly

[Y₁|Y₂ = y₂] ∼ N (y₂, 1) and

[Y₂|Y₁ = y₁] ∼ N (y₁, 1)

so here we have an example of two compatible and proper full conditional distributions that yield an improper joint distribution.

• If we have a set of compatible full conditional distributions f(y_i | y_j, j ≠ i), i = 1, ..., n, how can we determine the form of the resulting joint distribution f(y₁, ..., y_n)?

→ Brook’s Lemma

• Brook’s Lemma notes that if {f(y_i | y_j, j ≠ i), i = 1, ..., n} is a set of compatible full conditional distributions and y₀ = (y₁₀, ..., y_n0) is any fixed point in the support of f(y₁, ..., y_n), then

$$f(y_1, \ldots, y_n) = \frac{f(y_1 \mid y_2, \ldots, y_n)}{f(y_{10} \mid y_2, \ldots, y_n)} \cdot \frac{f(y_2 \mid y_{10}, y_3, \ldots, y_n)}{f(y_{20} \mid y_{10}, y_3, \ldots, y_n)} \cdots \frac{f(y_n \mid y_{10}, \ldots, y_{n-1,0})}{f(y_{n0} \mid y_{10}, \ldots, y_{n-1,0})} \, f(y_{10}, \ldots, y_{n0})$$

• This gives us the joint distribution up to a normalizing constant.

• If f (y₁, ..., y_n) is proper, then the fact that it integrates to 1 determines the normalizing constant.
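A small numerical check of Brook's lemma, assuming a standard bivariate normal with correlation r = 0.6 (an illustrative choice). Its full conditionals are Y₁ | y₂ ∼ N(r y₂, 1 − r²) and Y₂ | y₁ ∼ N(r y₁, 1 − r²), and the lemma with n = 2 should reproduce the ratio f(y₁, y₂)/f(y₁₀, y₂₀) of the known joint density using the conditionals alone. The fixed point (0.1, 0.2) and the helper names are arbitrary.

```python
import math

R = 0.6  # correlation of the illustrative bivariate normal


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def conditional(a, b):
    """f(y_a | y_b): density of one coordinate given the other (same form for both)."""
    return normal_pdf(a, R * b, 1 - R * R)


def joint(y1, y2):
    """Standard bivariate normal density with correlation R."""
    det = 1 - R * R
    q = (y1 * y1 - 2 * R * y1 * y2 + y2 * y2) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))


def brook_ratio(y1, y2, y10, y20):
    """f(y1, y2) / f(y10, y20) computed from the full conditionals alone."""
    return ((conditional(y1, y2) / conditional(y10, y2))
            * (conditional(y2, y10) / conditional(y20, y10)))


for y1, y2 in [(0.5, -1.2), (2.0, 1.5), (-0.3, 0.8)]:
    # the two numbers in each row agree up to floating-point rounding
    print(brook_ratio(y1, y2, 0.1, 0.2), joint(y1, y2) / joint(0.1, 0.2))
```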

• How should we specify the full conditional distributions so that (1) they are compatible and (2) they are simple enough and yet yield useful spatial structure?

• We will not worry about (1). To address (2) we will assume that the full conditional distribution of Y_i depends only on its ‘neighbors’.

• That is, the full conditional distribution of Y_i will depend only on those Y_j’s that have W_ij ≠ 0.

• Letting ∂_i = {j | W_ij ≠ 0} denote the set of neighbors for region i (i ∼ j ↔ W_ij ≠ 0), this implies

$$f(y_i \mid y_j, j \neq i) = f(y_i \mid y_j, j \in \partial_i), \qquad i = 1, \ldots, n$$

• This sort of specification for the full conditional distributions, when compatible, is referred to as a Markov random field (MRF) due to the obvious Markovian structure of the full conditional distributions.

• The idea behind such models is the development of a complicated spatial dependence structure through a set of simple ‘local’ specifications that depend only on lattice (or map) adjacencies.

• We will develop and employ these sorts of models as models for areal data or as models for random effects in an areal setting.

• Clique: A clique is a set of cells (or indices) such that each element in the set is a neighbor of every other element in the set.

• Think of the graph representation of the neighborhood structure mentioned earlier. A clique represents a set of nodes M on the graph such that each pair of distinct nodes i, j in M is joined by an edge of the graph.

• With n spatial units, we can have cliques of size 1, ..., n.

• Potential function: A potential of order k is a function of k arguments that is exchangeable in its arguments.

• A potential function of order k typically operates on the variable values y_s₁, ..., y_s_k associated with a clique {s₁, ..., s_k} of size k.

• Examples with k = 2:

1. y_i y_j

2. (y_i − y_j)²

3. y_i y_j + (1 − y_i)(1 − y_j) for binary data

• Gibbs Distribution: A joint distribution for Y₁, ..., Y_n is a Gibbs distribution if the joint density/pmf f (y₁, ..., y_n) takes the following form

$$f(y_1, \ldots, y_n) \propto \exp\Big\{\gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \ldots, y_{\alpha_k})\Big\}$$

where φ^(k)(·) is a potential of order k, M_k is the collection of all cliques of size k, and γ > 0 is a parameter.

• The joint distribution f (y₁, ..., y_n) depends on y₁, ..., y_n only through potential functions evaluated over the cliques induced by the neighborhood (graph) structure.

• Note such a distribution may have more than one parameter → the potential functions may depend on unknown parameters.

• Hammersley-Clifford Theorem: If we have an MRF (with a strictly positive joint density), then the corresponding joint distribution is a Gibbs distribution.

• If there are only cliques of order 1 → independence (consider the form of the corresponding Gibbs distribution).

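• Distributions having cliques of order ≤ 2 are most common. An example is the pairwise difference form

$$f(y_1, \ldots, y_n) \propto \exp\Big\{-\frac{1}{2\tau^2} \sum_{i \sim j} (y_i - y_j)^2\Big\}$$

based on quadratic potential functions, where the sum runs over all pairs of neighbors i ∼ j.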

Conditionally autoregressive (CAR) models

• Particularly popular class of MRF models introduced by J. Besag in 1974.

• These models have become very popular within the last decade, particularly since the advent of Gibbs sampling.

• Gibbs sampling is a procedure for simulating realizations from a joint distribution f(y₁, ..., y_n) using only the full conditional distributions {f(y_i | y_j, j ≠ i), i = 1, ..., n}.

• Useful in Bayesian statistics when we want to draw samples from a posterior distribution of interest.

• MRF models are ideal in this setting since they are specified in terms of full conditional distributions. More on this later...
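To make the idea concrete, here is a sketch of a Gibbs sampler for the simplest possible case: a bivariate normal with unit variances and correlation r (an illustrative target, not an areal model), using only its two full conditionals Y₁ | y₂ ∼ N(r y₂, 1 − r²) and Y₂ | y₁ ∼ N(r y₁, 1 − r²). The sample correlation of the draws should be close to r. The burn-in length, number of iterations, and seed are arbitrary choices, and `gibbs_bivariate_normal` is a hypothetical name.

```python
import random


def gibbs_bivariate_normal(r, n_iter=20000, burn_in=1000, seed=1):
    """Draw (Y1, Y2) using only the full conditionals
    Y1 | y2 ~ N(r * y2, 1 - r^2) and Y2 | y1 ~ N(r * y1, 1 - r^2)."""
    rng = random.Random(seed)
    sd = (1 - r * r) ** 0.5
    y1 = y2 = 0.0                       # arbitrary starting point
    draws = []
    for t in range(burn_in + n_iter):
        y1 = rng.gauss(r * y2, sd)      # update Y1 given the current y2
        y2 = rng.gauss(r * y1, sd)      # update Y2 given the *new* y1
        if t >= burn_in:                # discard the burn-in sweeps
            draws.append((y1, y2))
    return draws


def sample_corr(pairs):
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    return s12 / (s11 * s22) ** 0.5


draws = gibbs_bivariate_normal(0.6)
print(round(sample_corr(draws), 2))  # should be close to 0.6
```

Successive draws are autocorrelated, so the effective sample size is smaller than `n_iter`.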

Autonormal (Gaussian) CAR models

• Here we begin with the full conditionals

$$[Y_i \mid y_j, j \neq i] \sim N\Big(\sum_j b_{ij} y_j, \; \tau_i^2\Big), \qquad i = 1, \ldots, n$$

• For appropriately chosen b_ij these full conditionals are compatible, so using Brook’s lemma we can obtain the joint distribution as

$$f(y_1, \ldots, y_n) \propto \exp\left\{-\tfrac{1}{2} y^\top D^{-1}(I - B) y\right\}$$

where B = (b_ij) and D = diag{τ₁², ..., τ_n²}.

• Looks like a multivariate normal distribution with µ = 0 and Σ⁻¹_y = D⁻¹(I − B).

• This is of course only true if D⁻¹(I − B) is symmetric.

• We must choose b_ij in the conditional Gaussian distributions to ensure this symmetry.

• In particular, choosing b_ij so that

$$\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \quad \text{for all } i, j$$

will ensure symmetry (and compatibility).

• Notice that if τ_i² ≠ τ_j², then we cannot have b_ij = b_ji (unless both are 0).

• How should we choose the b_ij’s subject to the above constraints, and also so as to yield a reasonable joint spatial distribution?

• We will take the b_ij’s to be functions of the neighborhood matrix W:

$$b_{ij} = \frac{w_{ij}}{w_{i+}}, \qquad \tau_i^2 = \frac{\tau^2}{w_{i+}}$$

Does this specification satisfy the symmetry condition?

• With these choices the full conditional distributions are

$$[Y_i \mid y_j, j \neq i] \sim N\left(\sum_j \frac{w_{ij}}{w_{i+}} y_j, \; \frac{\tau^2}{w_{i+}}\right), \qquad i = 1, \ldots, n$$

Interpretation?
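The symmetry question can be checked numerically. A sketch, assuming a small hypothetical four-region map and τ² = 2: with b_ij = w_ij/w_i+ and τ_i² = τ²/w_i+, the matrix D⁻¹(I − B) from the general autonormal formula should be symmetric and equal to (D_W − W)/τ², even though B itself is not symmetric.

```python
# Hypothetical map: region 0 borders 1 and 2, region 1 borders 2, region 2 borders 3.
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
tau2 = 2.0
n = len(W)
w_plus = [sum(row) for row in W]

# b_ij = w_ij / w_{i+}  and  tau_i^2 = tau^2 / w_{i+}
B = [[W[i][j] / w_plus[i] for j in range(n)] for i in range(n)]
tau2_i = [tau2 / w_plus[i] for i in range(n)]

# M = D^{-1}(I - B), with D = diag(tau_1^2, ..., tau_n^2)
M = [[((1.0 if i == j else 0.0) - B[i][j]) / tau2_i[i] for j in range(n)]
     for i in range(n)]

# (D_W - W) / tau^2 for comparison
Q = [[((w_plus[i] if i == j else 0) - W[i][j]) / tau2 for j in range(n)]
     for i in range(n)]

symmetric = all(abs(M[i][j] - M[j][i]) < 1e-12 for i in range(n) for j in range(n))
matches = all(abs(M[i][j] - Q[i][j]) < 1e-12 for i in range(n) for j in range(n))
print(symmetric, matches)   # True True
print(B[2][3], B[3][2])     # b_23 != b_32: B itself is not symmetric
```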

• The joint distribution for these choices of b_ij and τ_i² is

$$f(y_1, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2} y^\top (D_W - W) y\right\}$$

where D_W = diag{w₁₊, ..., w_n₊}.

• This is again MVN with µ = 0 and Σ⁻¹_y = (D_W − W)

• Note here that (D_W − W)1 = 0 → Σ⁻¹_y is singular!

• This is a singular MVN distribution → an improper distribution → no valid normalizing constant.

• Such a distribution is often referred to as a Gaussian intrinsic autoregression.

• To further investigate this impropriety we can rewrite the joint distribution as

$$f(y_1, \ldots, y_n) \propto \exp\Big\{-\frac{1}{2\tau^2} \sum_{i<j} w_{ij}(y_i - y_j)^2\Big\}$$

→ a pairwise difference Gibbs distribution with quadratic potentials.

• What happens to this distribution if I add a constant µ to all the Y_i?

→ nothing → the Y_i’s are not centered.

• This distribution does not identify an overall mean.
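A numerical illustration, assuming a small hypothetical four-region map and arbitrary data y: Q = D_W − W has zero row sums, the quadratic form yᵀQy equals the pairwise-difference sum Σ_{i<j} w_ij (y_i − y_j)², and adding a constant µ to every y_i leaves it unchanged. That invariance is exactly why the overall mean is not identified.

```python
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
n = len(W)
# Q = D_W - W
Q = [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]


def quad_form(y):
    """y^T (D_W - W) y."""
    return sum(y[i] * Q[i][j] * y[j] for i in range(n) for j in range(n))


def pairwise_sum(y):
    """Sum over i < j of w_ij (y_i - y_j)^2."""
    return sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(i + 1, n))


y = [1.0, -0.5, 2.0, 0.3]
mu = 10.0
print([sum(row) for row in Q])                          # [0, 0, 0, 0]: (D_W - W) 1 = 0
print(quad_form(y), pairwise_sum(y))                    # equal up to rounding
print(quad_form([yi + mu for yi in y]) - quad_form(y))  # ~0: shifting by mu changes nothing
```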

• To provide the required centering we can impose the constraint

$$\sum_i Y_i = 0$$

• Problems with this as a model for data?

• We cannot expect our data to respect this constraint...

• This constrained improper distribution cannot be used as a model for data, but can be used as a model for spatial random effects (a prior for parameters that vary spatially).

Perhaps explain this in the context of a map...

• If we want to use the autonormal model as a distribution for data (as opposed to a prior for spatial random effects), we need an alternative solution to the impropriety problem.

• We have (D_W − W)1 = 0 → causing unfortunate results.

• An obvious remedy is to incorporate a constant ρ so that

$$\Sigma_y^{-1} = D_W - \rho W$$

is non-singular.

• Such models are often referred to as proper CAR models.

• How to choose ρ to ensure non-singularity?

• Such non-singularity is guaranteed provided

$$\rho \in \left(\frac{1}{\lambda_{(1)}}, \frac{1}{\lambda_{(n)}}\right)$$

where λ₍₁₎ ≤ λ₍₂₎ ≤ ··· ≤ λ₍ₙ₎ are the ordered eigenvalues of D_W^{−1/2} W D_W^{−1/2}.

• It is also possible to show that λ₍₁₎ < 0 and λ₍ₙ₎ > 0, so that the interval (1/λ₍₁₎, 1/λ₍ₙ₎) contains 0.
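The interval can be computed numerically. A NumPy sketch, assuming a symmetric W in which every region has at least one neighbor; the four-region map and ρ = 0.95 are hypothetical. Because D_W^{−1/2} W D_W^{−1/2} is symmetric, `numpy.linalg.eigvalsh` returns its real eigenvalues in ascending order. The final check confirms that D_W − ρW is positive definite for a ρ inside the interval.

```python
import numpy as np


def car_rho_bounds(W):
    """Return (1/lambda_(1), 1/lambda_(n), eigenvalues) of D_W^{-1/2} W D_W^{-1/2}.

    Assumes W is symmetric with non-negative entries and that every region
    has at least one neighbor (all row sums positive).
    """
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    lam = np.linalg.eigvalsh(S)                           # ascending, real
    return 1.0 / lam[0], 1.0 / lam[-1], lam


W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lower, upper, lam = car_rho_bounds(W)
print(lower, upper)   # lower < 0 < upper; upper is 1 up to rounding

# For rho inside the interval, D_W - rho W is positive definite.
rho = 0.95
Q = np.diag(W.sum(axis=1)) - rho * W
print(np.linalg.eigvalsh(Q).min() > 0)  # True
```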

• How to choose ρ?

• Leave ρ ∈ (1/λ₍₁₎, 1/λ₍ₙ₎) unspecified as a parameter in our model.

• One usually adopts the simple choice ρ ∈ [0, 1), noting that λ₍ₙ₎ = 1 for D_W^{−1/2} W D_W^{−1/2}.

• Here ρ = 0 corresponds to the conditional distributions

$$[Y_i \mid y_j, j \neq i] \sim N\left(0, \frac{\tau^2}{w_{i+}}\right), \qquad i = 1, \ldots, n$$

→ spatial independence.

• Further, ρ = 1 corresponds to the IAR model, and larger values of ρ imply a greater degree of spatial dependence.

• Note that with the IAR model (ρ = 1) we only have one parameter, τ², the variance component.

• This variance component does not quantify spatial dependence in any way.

• With the IAR model, much of the spatial structure imposed by the model is already implied by the chosen W.

• Note also that independence does not arise as a special case of this model.

• Of course one could, in principle, allow the neighborhood structure, W, itself to be a parameter in the model → fairly complicated.

• When the more general CAR model incorporating ρ is employed, how does one interpret ρ? → very carefully.

• In particular, ρ does not represent correlation. Rather, ρ is some measure of dependence in the sense that ρ = 0 corresponds to independence and spatial dependence increases with ρ.

• The maximum allowable spatial dependence corresponds to the IAR model when ρ = 1.

• To calibrate ρ for a given neighborhood structure and map, one could simulate realizations from the CAR model for different values of ρ. For each realization we could compute Moran’s I to gauge the strength of the spatial dependence implied by a particular ρ value.
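A sketch of this calibration with NumPy, assuming a 10 × 10 rook grid as the "map", τ² = 1, and 200 simulations per ρ value (all illustrative choices; the helper names are hypothetical). Draws from N(0, Q⁻¹) with Q = (D_W − ρW)/τ² are obtained from a Cholesky factor Q = LLᵀ by solving Lᵀy = z for standard normal z, which gives Cov(y) = (LLᵀ)⁻¹ = Q⁻¹.

```python
import numpy as np


def grid_adjacency(nrow, ncol):
    """Binary rook-adjacency matrix for an nrow x ncol grid (row-major indexing)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrow:
                W[i, i + ncol] = W[i + ncol, i] = 1
    return W


def morans_i(y, W):
    d = y - y.mean()
    # W has a zero diagonal, so W.sum() equals sum_{i != j} w_ij.
    return len(y) * (d @ W @ d) / (W.sum() * (d @ d))


def mean_moran_under_car(W, rho, tau2=1.0, n_sim=200, seed=0):
    """Average Moran's I over draws from the proper CAR model
    Y ~ N(0, Q^{-1}) with Q = (D_W - rho W) / tau2."""
    rng = np.random.default_rng(seed)
    Q = (np.diag(W.sum(axis=1)) - rho * W) / tau2
    L = np.linalg.cholesky(Q)                  # Q = L L^T
    vals = []
    for _ in range(n_sim):
        z = rng.standard_normal(W.shape[0])
        y = np.linalg.solve(L.T, z)            # Cov(y) = Q^{-1}
        vals.append(morans_i(y, W))
    return float(np.mean(vals))


W = grid_adjacency(10, 10)
for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(mean_moran_under_car(W, rho), 3))
```

On such a grid one would expect the average I to stay modest until ρ is close to 1, in line with the remark that follows.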

• In general, even moderate amounts of spatial dependence will require ρ > 0.9, and usually estimates of ρ are close to its upper bound value.

• When modeling random effects in an areal data setting, I usually fit models based on the proper CAR model as well as the IAR model and then compare the two using some model selection tool.

• Usually, at least in my experience, the IAR model ends up being the preferred model.

• I note again that in the framework of this model we specify a joint normal distribution for the data and specify the inverse covariance matrix

$$\Sigma_y^{-1} = D_W - \rho W$$

but in general have no simple form for the covariance matrix.

• The elements of Σ_y give us, of course, information on the marginal covariance structure of Y. The elements of Σ⁻¹_y give us information on the conditional covariance structure of Y.

• For example, using standard results associated with the MVN distribution, we can show that 1/(Σ⁻¹_y)_ii gives us Var(Y_i | y_j, j ≠ i).

• Moreover, if (Σ⁻¹_y)_ij = 0 then Y_i and Y_j are conditionally independent given {y_k, k ≠ i, j}.

• Since (Σ⁻¹_y)_ij = −ρW_ij for i ≠ j, we see that W_ij = 0 implies conditional independence between Y_i and Y_j (given all other Y’s). From this we see that the specification of a neighborhood structure W is essentially a set of conditional independence assumptions.
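Both facts can be checked numerically. A NumPy sketch, assuming a hypothetical four-region map, ρ = 0.8, and τ² = 1 (so that Σ_y⁻¹ = D_W − ρW exactly, as written in the text). Conditional covariances are computed with the standard Schur-complement formula for the MVN; the helper name `conditional_cov` is illustrative.

```python
import numpy as np

# Hypothetical map: regions 0 and 3 are not neighbors (W[0, 3] = 0).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rho = 0.8
Q = np.diag(W.sum(axis=1)) - rho * W   # Sigma_y^{-1} = D_W - rho W (tau^2 = 1)
Sigma = np.linalg.inv(Q)


def conditional_cov(Sigma, a, b):
    """Covariance of Y_a given Y_b: S_aa - S_ab S_bb^{-1} S_ba (Schur complement)."""
    S_ab = Sigma[np.ix_(a, b)]
    return Sigma[np.ix_(a, a)] - S_ab @ np.linalg.solve(Sigma[np.ix_(b, b)], S_ab.T)


# 1 / (Sigma^{-1})_00 equals Var(Y_0 | Y_1, Y_2, Y_3)
v0 = conditional_cov(Sigma, [0], [1, 2, 3])[0, 0]
print(v0, 1 / Q[0, 0])

# Q[0, 3] = 0, so Y_0 and Y_3 are conditionally independent given Y_1, Y_2 ...
C = conditional_cov(Sigma, [0, 3], [1, 2])
print(C[0, 1])          # 0 up to rounding
# ... even though they are marginally correlated.
print(Sigma[0, 3] > 0)  # True
```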

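• Regression: If the proper CAR model is used as a distribution for data, we can accommodate covariates x_i by modifying the conditional distributions to

$$[Y_i \mid y_j, j \neq i] \sim N\left(x_i^\top \beta + \rho \sum_j \frac{w_{ij}}{w_{i+}} (y_j - x_j^\top \beta), \; \frac{\tau^2}{w_{i+}}\right), \qquad i = 1, \ldots, n$$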

• With these conditional specifications the joint distribution for Y is MVN with µ = Xβ and Σ⁻¹_y = (D_W − ρW).

• We will mostly be concerned with the µ = 0 case when CAR models are applied as a (prior) distribution for random effects.

• Multivariate spatial data: Suppose that, associated with each areal unit, we observe several, say p, dependent observations Y_i = (Y_i1, Y_i2, ..., Y_ip).

• Models for these sorts of data must account for the spatial dependence across areal units and also dependence within each Y_i.

• Multivariate conditional autoregressive models (MCAR) have been developed for such data.

• The idea is a straightforward extension of the univariate case where we specify the joint distribution of all np random variables

$$\mathbf{Y} = (\mathbf{Y}_1^\top, \ldots, \mathbf{Y}_n^\top)^\top$$

through a set of full conditional distributions. These full conditional distributions will be p-variate normal instead of univariate normal.

• Note also that a CAR model can, in principle, be adopted to model point-referenced data by allowing the elements of W to depend on the distance between points.

• This may be useful for very large datasets since CAR models, as we shall see in Chapter 5, are numerically less demanding to fit within a Gibbs sampling framework.

• When prediction is not of interest, this is a perfectly acceptable way of building a joint distribution. Whether or not such an approach yields an adequate representation of the underlying spatial structure in a given application is a model assessment issue, and a critical one at that.

Non-Gaussian CAR models

• When dealing with non-Gaussian areal data, our preferred approach will be based on generalized linear mixed models, where we incorporate Gaussian CAR random effects into models for non-Gaussian data → Chapter 5.

• An alternative to this approach, which we consider now, is to adopt an MRF-type specification for the data Y₁, ..., Y_n and determine a joint distribution through the specification of a set of compatible non-Gaussian full conditional distributions.

• For example, we can allow the full conditional distributions f(y_i | y_j, j ≠ i) to take Poisson, binomial, Gamma or in fact any form from the exponential family.

• When these are compatible, the result is a joint spatial distribution for non-Gaussian data. See Cressie (1993) for a full development of CAR models in a general framework.

• I will present two examples of such non-Gaussian CAR models and discuss the computational problems associated with these.

• Binary Data: For binary Y₁, ..., Y_n an autologistic (binary MRF) model specifies the full conditional distributions as

$$p_i = P(Y_i = 1 \mid y_j, j \neq i) = P(Y_i = 1 \mid y_j, j \in \partial_i)$$

and

$$\log\left(\frac{p_i}{1 - p_i}\right) = x_i^\top \beta + \psi \sum_j w_{ij} y_j$$

where β is a vector of regression parameters and ψ ∈ ℝ is a spatial dependence parameter.

• These full conditional distributions are compatible and Brook’s lemma yields the form of the joint pmf:

$$f(y_1, \ldots, y_n) \propto \exp\Big\{\beta^\top \Big(\sum_i y_i x_i\Big) + \psi \sum_{i<j} w_{ij} y_i y_j\Big\}$$

A Gibbs distribution with potentials on cliques of order 1 and 2.

• We can, in principle, use this form to fit the model and obtain, for example, MLEs of β and ψ.

• Unfortunately, there is a computational problem that arises. The normalizing constant in f (y₁, ..., y_n) depends on model parameters

$$f(y_1, \ldots, y_n) = C(\beta, \psi) \exp\Big\{\beta^\top \Big(\sum_i y_i x_i\Big) + \psi \sum_{i<j} w_{ij} y_i y_j\Big\}$$

and so would need to be evaluated at each iteration of the maximization procedure.

• Note that

$$C(\beta, \psi)^{-1} = \sum_{y_1=0}^{1} \cdots \sum_{y_n=0}^{1} \exp\Big\{\beta^\top \Big(\sum_i y_i x_i\Big) + \psi \sum_{i<j} w_{ij} y_i y_j\Big\}$$

• Evaluating this constant for any particular value of β and ψ requires summing 2ⁿ terms → not feasible even for moderate n, particularly since we would have to do this iteratively.
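To see where the 2ⁿ comes from, here is a brute-force sketch that sums the unnormalized pmf over every binary configuration. It assumes a symmetric W and writes the pairwise term as a sum over i < j, so that each neighboring pair is counted once, which matches the logit ψ Σ_j w_ij y_j in the full conditionals. The three-region map, the intercept-only covariate, and the function names are hypothetical.

```python
import itertools
import math


def unnormalized(y, X, beta, psi, W):
    """exp{ beta' sum_i y_i x_i + psi * sum_{i<j} w_ij y_i y_j } for binary y."""
    n = len(y)
    lin = sum(y[i] * sum(b * x for b, x in zip(beta, X[i])) for i in range(n))
    pair = sum(W[i][j] * y[i] * y[j] for i in range(n) for j in range(i + 1, n))
    return math.exp(lin + psi * pair)


def inverse_normalizing_constant(X, beta, psi, W):
    """C(beta, psi)^{-1} by brute force: a sum over all 2^n binary vectors."""
    return sum(unnormalized(y, X, beta, psi, W)
               for y in itertools.product((0, 1), repeat=len(X)))


# Three regions on a line, intercept-only covariate x_i = (1,).
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [(1.0,), (1.0,), (1.0,)]
print(inverse_normalizing_constant(X, beta=(0.0,), psi=0.0, W=W))  # 8.0 = 2^3
print(inverse_normalizing_constant(X, beta=(0.0,), psi=1.0, W=W))  # equals 5 + 2e + e^2
```

The number of terms doubles with every region added: n = 30 already gives about 10⁹ configurations, and the whole sum would have to be recomputed for every (β, ψ) an optimizer visits.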

• Evaluating the normalizing constant is also required for Bayesian inference. Pseudo-likelihood, a somewhat ad hoc inferential scheme, can be employed to avoid the calculation of the normalizing constant.
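A sketch of the log pseudo-likelihood for the autologistic model. It replaces the joint pmf by the product of the full conditionals, Σ_i [y_i η_i − log(1 + e^{η_i})] with η_i = x_iᵀβ + ψ Σ_j w_ij y_j, so no normalizing constant appears. The toy data, holding β fixed at 0, and the crude grid search over ψ are illustrative assumptions, not a recommended estimation routine.

```python
import math


def log_pseudo_likelihood(y, X, beta, psi, W):
    """sum_i log P(Y_i = y_i | y_j, j != i) under the autologistic model."""
    n = len(y)
    total = 0.0
    for i in range(n):
        eta = sum(b * x for b, x in zip(beta, X[i])) + psi * sum(
            W[i][j] * y[j] for j in range(n) if j != i)
        # log of the Bernoulli(p_i) pmf at y_i, with logit(p_i) = eta
        total += y[i] * eta - math.log1p(math.exp(eta))
    return total


# Toy binary map: six regions on a line, intercept-only covariate.
W = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
X = [(1.0,)] * 6
y = [1, 1, 1, 0, 0, 0]
beta = (0.0,)

grid = [k / 10 for k in range(-20, 41)]  # psi from -2.0 to 4.0
psi_hat = max(grid, key=lambda p: log_pseudo_likelihood(y, X, beta, p, W))
print(psi_hat)
```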

• The autologistic model can be generalized to the case where each Y_i is categorical and takes values in the set {0, 1, ..., L − 1} for some L ≥ 2.

• In this case the full conditional distributions are defined by

$$P(Y_i = l \mid y_j, j \neq i) \propto \exp\Big(\psi \sum_{j \neq i} w_{ij} I(y_j = l)\Big)$$

where ψ ∈ ℝ is again a spatial dependence parameter.
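A sketch of the Potts full conditional for a single region, assuming a hypothetical map in which region 0 has three neighbors carrying labels 0, 0 and 2, with ψ = 1 and L = 3. With ψ > 0, the label shared by more neighbors receives the higher probability. Only the full conditional is computed here, so the joint normalizing constant is never needed.

```python
import math


def potts_conditional(i, y, W, psi, L):
    """P(Y_i = l | y_j, j != i) for l = 0, ..., L-1 under the Potts model."""
    n = len(y)
    # For each label l: psi * sum_{j != i} w_ij I(y_j = l)
    scores = [psi * sum(W[i][j] for j in range(n) if j != i and y[j] == l)
              for l in range(L)]
    m = max(scores)                                  # for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Region 0 borders regions 1, 2 and 3, whose labels are 0, 0 and 2.
W = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
y = [1, 0, 0, 2]  # y_0's own value does not enter its full conditional
probs = potts_conditional(0, y, W, psi=1.0, L=3)
print([round(p, 3) for p in probs])  # proportional to e^2, e^0, e^1
```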

• Covariates can be added to this model just as in the autologistic case.

• This model, referred to as the Potts model, can be used to model allocations in finite mixture models, providing a robust alternative to the usual Gaussian spatial random effects models.

• As before, the model contains a normalizing constant C(ψ) that causes computational problems when fitting this model.

Simultaneous autoregressive (SAR) models

• MRF models such as the CAR models we have discussed are by far the most popular sorts of models for areal data.

• An alternative class of models for areal data can be based on an autoregressive structure similar to that adopted in time series modeling.

• As before we have data Y₁, ..., Y_n and spatial information W.

• Unlike the MRF approach, we do not focus on full conditionals in this framework.

• Instead, we start with a vector of independent errors or innovations e ∼ MVN(0, D̃) with D̃ = diag{σ₁², ..., σ_n²} or, more simply, D̃ = σ²I.

• We then construct a simple functional relationship between Y and e and this relationship induces a distribution for Y.

• Consider the relationship

$$Y_i = \sum_j b_{ij} Y_j + e_i, \qquad i = 1, \ldots, n$$

for some constants b_ij with b_ii = 0.

• In matrix form this is

Y = BY + e where B = (b_ij).

• From this we can obtain the relationship between Y and e

Y = (I − B)⁻¹e assuming I − B is invertible.

• The simple distribution assigned to e then induces the following for Y:

$$Y \sim MVN\left(0, (I - B)^{-1} \tilde{D} \left[(I - B)^{-1}\right]^\top\right)$$

and when D̃ = σ²I this is just

$$Y \sim MVN\left(0, \sigma^2 (I - B)^{-1} \left[(I - B)^{-1}\right]^\top\right)$$

• To ensure that I − B is invertible, we can take B = ρW and restrict ρ to an appropriate range.

• Invertibility is ensured when ρ ∈ (1/λ₍₁₎, 1/λ₍ₙ₎), where λ₍₁₎ and λ₍ₙ₎ are the smallest and largest eigenvalues of W.

• The SAR model is then based on

$$\Sigma_y = \sigma^2 \left[(I - \rho W)^\top (I - \rho W)\right]^{-1}$$

where ρ is referred to as the autoregression parameter, with ρ = 0 corresponding to Σ_y = σ²I, an independence model.
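A NumPy sketch of the SAR covariance, assuming a symmetric, hypothetical four-region W and illustrative values ρ = 0.3 and σ² = 1.5. It checks that σ²(I − ρW)⁻¹[(I − ρW)⁻¹]ᵀ agrees with σ²[(I − ρW)ᵀ(I − ρW)]⁻¹ (for symmetric W the order of the two factors does not matter) and that ρ = 0 gives σ²I.

```python
import numpy as np


def sar_covariance(W, rho, sigma2):
    """Sigma_y = sigma^2 (I - rho W)^{-1} [(I - rho W)^{-1}]^T for the SAR model."""
    A = np.eye(W.shape[0]) - rho * W
    A_inv = np.linalg.inv(A)
    return sigma2 * A_inv @ A_inv.T


W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lam = np.linalg.eigvalsh(W)      # W symmetric: real eigenvalues, ascending
print(1 / lam[0], 1 / lam[-1])   # admissible range for rho

rho, sigma2 = 0.3, 1.5
Sigma = sar_covariance(W, rho, sigma2)
A = np.eye(4) - rho * W
print(np.allclose(Sigma, sigma2 * np.linalg.inv(A.T @ A)))               # True
print(np.allclose(sar_covariance(W, 0.0, sigma2), sigma2 * np.eye(4)))   # True
```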

• Regression: When covariates are present, the SAR model can be adopted as a model for residuals.

• In this case we define U = Y − Xβ and assume U follows a SAR model so that

(I − ρW)U = e

→ (I − ρW)(Y − Xβ) = e

→ Y = ρWY + (I − ρW)Xβ + e

• Note here that if W = 0 this is the standard linear model.

• Note that the spatial covariance structure implied by the SAR model, just as with the CAR model, is not entirely intuitive.

• In addition, SAR models, unlike CAR models, are not based on a set of full conditional distributions. These of course exist, but they do not have a computationally convenient form.

• As a result, SAR models are not well suited to model fitting using the Gibbs sampler.

• Finally, Cressie (1993) shows that any SAR model can be represented as a CAR model; however, the converse is not true.

• There exist CAR models that do not have a representation as a SAR model.

• Given the above, we will not consider SAR models further in this course.

• I note, however, that the general approach of building spatial distributions using transformations of independent RVs is simple, intuitive and appealing. Other similar approaches could (and should) be explored further...