Model Inference - Bayesian regression for network data

After determining the dimension of DX and specifying the hyper-prior parameter λθ,

we are in the position to estimate model parameters. To this end, we sample from the posterior distribution of θ using a one-step Riemannian manifold Hamiltonian Monte Carlo proposal, also known as a manifold Metropolis adjusted Langevin algorithm (MMALA); see Girolami and Calderhead (2011). Given the target density may be high dimensional, this sampling algorithm can provide efficient convergence and

exploration of the target density in comparison to more commonly used Metropo- lis Hastings schemes such as the IRLS method presented in Gamerman (1997) or the preconditioned MALA (Girolami and Calderhead, 2011). The MMALA proposal mechanism generalizes the MALA, based on a Langevin diffusion, by taking into ac- count the natural geometry of the target density and making transitions proposals that are informed by its local structure. Specifically, the proposal density is a normal distribution, with mean based on the local curvature of the space along the direction of steepest gradient and covariance given by the scaled inverse Fisher information matrix of the posterior. The process, in the general case, is as follows:

1. Set t = 1 and begin with θ = θ(0)_;

2. Sample θ∗ from proposal density, q;

3. Accept with probability, ρ(θ(t−1)_{, θ}∗_{). If accept, θ}(t) _{= θ}∗_{; if reject, θ}(t) _{= θ}(t−1)_; 4. Set t = t + 1 and return to step 2.

Here q(θ∗|θn_{, ) = N θ}∗_{| µ(θ}n_{, ),}2_G−1_(θn_{) where} µ(θn, )i = θni + 2 2{G −1 (θn)∇θlog{p(θn)}}i− 2 D X j=1 G−1 (θn)∂G(θ n₎ ∂θj G−1(θn) _ij+ 2 2 D X j=1 G−1 (θn) _ijtr ( G−1(θn)∂G(θ n₎ ∂θj ) ,

is a step parameter, and G is the Fisher information matrix of the posterior distribution. Furthermore we have,

ρ(θ(t−1), θ∗) = min 1, π(θ ∗_)q(θ(t−1)_|θ∗₎ π(θ(t−1)_)q(θ∗_|θ(t−1)₎ ,

130 of Girolami and Calderhead (2011) for further details. We also provide examples in section 3.3.3, along with step size and initial value recommendations, for additional reference.

Now that we have defined our model and outlined fitting procedures and guidelines for hyper-prior parameter selection, we turn to a case study.

A Case Study: Modeling Occurrences of

Residential Burglary

Occurrences of residential burglary in urban areas are not spatially homogeneous. Crime rates vary across the city causing some areas to be heavily impacted by crime in comparison to others. Motivated by the importance of understanding these spatial patterns, we apply the statistical model outlined in chapter 2 defined on the network of city streets in Boston, Massachusetts. Our resulting model and interpretations provide valuable insight into the spatial dynamics of residential burglary in the city. This chapter is organized as follows: in section 1, we introduce and describe our data; in section 2 we apply and extend model (2.4.2) to meet the needs of our application; section 3 outlines hyper-prior specification and examines the posterior distribution of fitted coefficients; section 4 continues with a discussion of our results; section 5 describes a simulation study used to assess the performance of our model; and the chapter concludes in section 6 with a summary of our contributions to modeling vertex attributes.

3.1 The Data

We are interested in a specific type of urban crime, residential burglary. Burglary can be legally defined as “the act of breaking and entering a building with the intent to commit a felony” (Garner, 2001). Our network of interest, the street network of

Figure 3·1: Residential burglary occurrence counts (left) and average wealth estimate (right) for intersections in the network of Boston, MA.

Boston, Massachusetts, contains 18,889 streets segments (edges) and 13,308 intersections (vertices) forming an undirected simple graph.

The crime data, maintained by the city of Boston and available to the public (City of Boston, 2016; Open Data, 2016) consists of crime incident reports provided by the Boston Police Department (BPD) to document the initial details surrounding an incident to which BPD officers respond. The complete data set contains fields categorizing the type of incident, as well as when and where it occurred. We focus on the 7,012 reported instances of residential burglary that occurred between July 2012 and October 2015. We pooled the data over time and mapped each occurrence to its closest intersection. Figure 3·1 pictures each street intersection in the city colored to indicate the value of our attribute of interest, counts of residential burglary occurrences. The attribute covariates gathered and included in our final model are:

Distance from each intersection to the nearest police station.

Sub-district designation of intersection location (business, residential, or other). Gross tax amount for each parcel in the city of Boston in 2015. To convert gross tax

into a vertex indexed covariate, we construct a buffer around each intersection and then aggregate the fractions of gross tax from each parcel in proportion to the parcel area covered by the buffer.

Predictor attributes analyzed but not present in our final model include population measures, a majority-minority ethnicity composition indicator, neighborhood clas- sifications, and an English language proficiency indicator. Our choice of analyzing the aforementioned covariates was driven by available data and established crime theory (Bernasco and Block, 2009). A variable selection process was also used (see section 3.3.2).

A naive approach to modeling crime counts on each intersection would entail identifying a set of predictor attributes describing each intersection and performing count regression, for instance, Poisson or negative binomial regression. That is, if

Yv and xv are the crime occurrence counts and covariate attributes at vertex v, we

could suppose Yv

ind

∼ Po[exp(xT

vβ)]. However, this specification assumes that crime

effects β are constant over the network and thus spatially homogeneous. Empirical evidence suggests otherwise; for example, figure 3·1 displays gross tax information for the city of Boston in 2015. Comparing this attribute to the counts of residential burglary, we see that in some areas of the city higher taxes, indicative of wealth, correspond to larger crime rates, while in other locations lower taxes, identifying poorer neighborhoods, correlate with higher crime rates. Thus, even when including informative covariates, a simple regression cannot adequately explain the variability in burglary occurrences across a city.

Moreover, the existence of crime “hot spots,” areas of concentrated crime counts, is widely acknowledged in crime theory literature (see, e.g., Eck et al., 2005). These zones can be identified visually in figure 3·1; for instance, in the southern region of the northwest peninsula neighborhood, Allston. Many current crime mapping algorithms, including kernel intensity estimation and univariate time series models, focus on identifying these hot spots. A variety of qualitative theories, such as the routine-activity and broken-window theories, aim to describe hot spot propagation (Eck et al., 2005). Yet, interpretative models providing probabilities of hot spot formulation accounting for and explaining heterogeneous effects seem to be lacking.

Our proposed model aims at capturing both gradual variations in crime explained by predictor attributes and abrupt variations attributed to a hot spots. The inter- pretability of the model allows not only an understanding as to what makes certain parts of the city more susceptible to burglary, but identifies how the factors contribut- ing to high crime change across the city. With this information local law enforcement can direct crime prevention efforts to narrowly defined regions of the city and identify area specific interventions to decrease the occurrence rate of residential burglary.

In document Bayesian regression for network data (Page 33-39)