CHAPTER 4: SOURCES THAT MOVE IN SPACE AND TIME – USING
4.2.3 Identifying Outbreaks
The procedure described above yielded CBG centroids that could be dichotomized as “outbreak location” or “not an outbreak location” at each month for both syphilis and gonorrhea. We attempted to correctly identify the outbreak and non-outbreak locations with two methods: space/time prospective SaTScan, a widely used software program for detecting case clusters, and our new outbreak-detection method, which uses the rate of change in the posterior distribution produced by geostatistical estimation of the incidence rate with the BUMBME method.
We ran prospective SaTScan for each month from January 1, 2005 to January 1, 2010 for syphilis and from July 1, 2008 to January 1, 2011 for gonorrhea. A reported case was included in the prospective SaTScan analysis if its incidence date was prior to the date being evaluated. The maximum spatial cluster size of space/time prospective SaTScan was set so that a cluster could contain no more than 5% of the state’s population. The temporal length of a prospective cluster was unrestricted; however, because the analysis is prospective, a significant cluster (p < 0.10) must include the most recent time period for which data are available. For each month, the analysis therefore yielded a collection of spatial areas that SaTScan identified as prospective clusters, and any CBG centroid inside one of these clusters was labeled as an outbreak location identified by prospective SaTScan.
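The two bookkeeping rules above (which cases enter a monthly run, and how centroids are labeled from the resulting clusters) can be sketched as follows. This is a minimal illustration, not SaTScan itself: it assumes circular scanning windows (SaTScan's default shape) and assumes each significant cluster's center and radius have already been read from the SaTScan output into the same projected coordinate system as the CBG centroids. The `Cluster` type and both function names are hypothetical.

```python
import math
from datetime import date
from typing import NamedTuple


class Cluster(NamedTuple):
    """Hypothetical record for one circular SaTScan cluster (projected coordinates)."""
    x: float
    y: float
    radius: float


def cases_for_analysis(case_dates, eval_date):
    """Keep only cases whose incidence date is prior to the evaluation date."""
    return [d for d in case_dates if d < eval_date]


def label_outbreak_centroids(centroids, clusters):
    """Map each CBG id to True if its centroid lies inside any significant cluster."""
    return {
        cbg_id: any(math.hypot(cx - c.x, cy - c.y) <= c.radius for c in clusters)
        for cbg_id, (cx, cy) in centroids.items()
    }


# Example: one cluster of radius 5 centred at the origin.
labels = label_outbreak_centroids({"A": (0, 0), "B": (10, 0)}, [Cluster(0, 0, 5)])
# labels == {"A": True, "B": False}
```

A centroid on the cluster boundary is counted as inside here; how boundary cases were treated in the original analysis is not stated.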
Our new approach consisted of two parts: (1) producing the geostatistical estimates and (2) analyzing the rate of change in the expected values of the estimation posteriors to identify outbreaks. Six-month incidence rates were computed for each month at each CBG centroid for both syphilis and gonorrhea by summing the number of syphilis or gonorrhea cases within the CBG during the previous 6 months and dividing by the block group’s estimated population at risk for the corresponding year. To estimate each CBG’s population for each year, we interpolated between the block group’s 2000 census population and its 2007 estimate, and extrapolated the rate of change over this 7-year period to years after 2007.
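The incidence-rate construction can be sketched as below. This sketch assumes the population changes linearly (a constant number of people per year) between 2000 and 2007 and beyond, that the "previous 6 months" are the six calendar months ending with the evaluation month, and that the "corresponding year" is the year of that month; all three are assumptions, and the function names are hypothetical.

```python
from datetime import date


def population_estimate(pop_2000: float, pop_2007: float, year: int) -> float:
    """ASSUMED linear interpolation/extrapolation of a block group's population."""
    annual_change = (pop_2007 - pop_2000) / 7.0  # change per year over 2000-2007
    return pop_2000 + annual_change * (year - 2000)


def month_index(d: date) -> int:
    """Consecutive integer index for the calendar month containing d."""
    return d.year * 12 + (d.month - 1)


def six_month_incidence(case_dates, end: date, pop_2000: float, pop_2007: float) -> float:
    """Cases in the 6 months ending with end's month, divided by that year's population."""
    end_idx = month_index(end)
    n_cases = sum(1 for d in case_dates if end_idx - 5 <= month_index(d) <= end_idx)
    return n_cases / population_estimate(pop_2000, pop_2007, end.year)
```

For example, a block group with 1,000 residents in 2000 and 1,700 in 2007 gets an estimated 1,500 in 2005, so two cases in January through June 2005 give a 6-month rate of 2/1500.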
We used the BME computational library to produce the space/time geostatistical estimates because of its rigorous nonlinear mathematical framework, which can incorporate both Gaussian and non-Gaussian data. BME is a two-stage process. First, it uses the maximum entropy principle to process the general knowledge (G) about the space/time random field (STRF), such as its space/time mean trend and covariance functions, into a prior probability density function (PDF) of the space/time process. Second, the prior PDF is updated with the site-specific knowledge (S), which can include hard data (measured without error) and soft data (characterized by a PDF), to produce a posterior PDF that characterizes the STRF at any space/time point. When only hard data are available, BME yields the simple space/time kriging estimator.
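To make the hard-data limiting case concrete, the sketch below computes a simple space/time kriging estimate and its variance with a known mean. It is not the BME library: the separable exponential covariance and its sill and practical ranges (`sill`, `a_s`, `a_t`) are hypothetical choices, points are `(x, y, t)` tuples, and a small Gaussian-elimination solver is included so the example needs only the standard library.

```python
import math


def cov(p, q, sill=1.0, a_s=10.0, a_t=6.0):
    """Hypothetical separable exponential space/time covariance."""
    r = math.hypot(p[0] - q[0], p[1] - q[1])  # spatial lag
    tau = abs(p[2] - q[2])                    # temporal lag
    return sill * math.exp(-3.0 * r / a_s) * math.exp(-3.0 * tau / a_t)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def simple_kriging(points, values, target, mean):
    """Simple kriging estimate and variance at target, given a known constant mean."""
    k_mat = [[cov(p, q) for q in points] for p in points]
    k_vec = [cov(p, target) for p in points]
    lam = solve(k_mat, k_vec)  # kriging weights
    est = mean + sum(l * (z - mean) for l, z in zip(lam, values))
    var = cov(target, target) - sum(l * k for l, k in zip(lam, k_vec))
    return est, var
```

With no nugget, the estimator reproduces the data exactly at a sampled point (zero variance), and far from all data it reverts to the known mean with the full sill as its variance.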
The BUMBME approach was used to produce the geostatistical estimates. This method smooths the map of the incidence rate by replacing the observed rate at each space/time location with a distribution of possible incidence rates (soft data). These distributions are Bayesian posterior distributions, derived by using a uniform distribution around the observed rate as the likelihood function to update a lognormal prior distribution for the 6-month incidence rate; this prior comes from a model based on the CBG’s long-term incidence rate. Because a time period is needed to establish the long-term incidence rate, BUMBME geostatistical estimates are available for January 2005 to January 2010 for syphilis and July 2008 to January 2011 for gonorrhea.
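Because a uniform likelihood is constant on its support and zero elsewhere, Bayes' rule simply restricts the lognormal prior to that interval, so each soft datum is a truncated lognormal distribution. The sketch below computes its mean and variance in closed form. It assumes the prior is lognormal with log-scale parameters `mu` and `sigma` and that the uniform likelihood spans a hypothetical interval `[lo, hi]` around the observed rate; how BUMBME sets these quantities is not reproduced here.

```python
import math


def _phi(x):
    """Standard normal CDF (math.erf handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _log_or_neg_inf(x):
    """log(x), with -inf for a lower bound of zero (e.g. an observed rate of 0)."""
    return math.log(x) if x > 0 else float("-inf")


def truncated_lognormal_moments(mu, sigma, lo, hi):
    """Posterior mean and variance: a lognormal prior restricted to [lo, hi]."""
    a, b = _log_or_neg_inf(lo), math.log(hi)

    def partial_moment(k):
        # E[Y^k ; lo < Y < hi] for log(Y) ~ Normal(mu, sigma^2)
        scale = math.exp(k * mu + 0.5 * k * k * sigma * sigma)
        shift = k * sigma * sigma
        return scale * (_phi((b - mu - shift) / sigma) - _phi((a - mu - shift) / sigma))

    z = partial_moment(0)  # prior probability mass inside [lo, hi]
    mean = partial_moment(1) / z
    var = partial_moment(2) / z - mean ** 2
    return mean, var
```

For an observed rate `r_obs` and a hypothetical half-width `d`, one would call `truncated_lognormal_moments(mu, sigma, max(0.0, r_obs - d), r_obs + d)`. A very wide interval recovers the untruncated lognormal moments, which is a useful sanity check.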
The geostatistical estimation yields a BME posterior distribution $f_K(y_{K,ij})$ for the 6-month period ending at time $j$ at every CBG centroid $i$, where the subscript $K = G \cup S$ denotes the total physical knowledge base within the BME framework. At each centroid, moving forward through time, we compare the expected value of the posterior distribution with that of the previous time period, flag the increases that are sufficiently large, and sum these significant increases over time (Eq. 4.1). When $a_{ij} \geq \beta_3$, an alarm is sounded at CBG $i$ for the 6-month period ending at time $j$.
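$$a_{ij} = \sum_{t=j-\beta_1}^{j} I\left[\hat{E}\left[f_K(y_{K,it})\right] - \hat{E}\left[f_K(y_{K,i,t-1})\right] > \beta_2 \left(\hat{V}\left[f_K(y_{K,i,t-1})\right]\right)^{1/2}\right] \tag{4.1}$$
where $I[\cdot]$ equals 1 when its argument is true and 0 otherwise, and $\hat{E}[\cdot]$ and $\hat{V}[\cdot]$ denote the expected value and variance of a posterior distribution. Thus $\beta_1$ sets the length of the look-back window, $\beta_2$ sets how many posterior standard deviations the expected value must rise from one period to the next, and $\beta_3$ sets how many such rises trigger an alarm.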
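The alarm rule of Eq. 4.1 for a single CBG centroid can be sketched as follows, given the sequence of posterior expected values and variances for consecutive monthly 6-month periods. The treatment of the start of the series (no increase is counted for the first period, and the window is clipped at the first period) is an assumption, and the function names are hypothetical.

```python
import math


def alarm_counts(means, variances, beta1, beta2):
    """a_j for one CBG: number of significant increases over periods j - beta1 .. j."""
    flags = [False]  # ASSUMPTION: the first period has no predecessor, so no increase
    for t in range(1, len(means)):
        threshold = beta2 * math.sqrt(variances[t - 1])  # beta2 posterior std. devs.
        flags.append(means[t] - means[t - 1] > threshold)
    counts = []
    for j in range(len(means)):
        start = max(0, j - beta1)  # ASSUMPTION: clip the window at the series start
        counts.append(sum(flags[start:j + 1]))
    return counts


def alarms(means, variances, beta1, beta2, beta3):
    """True where a_j >= beta3, i.e. an alarm is sounded for that period."""
    return [a >= beta3 for a in alarm_counts(means, variances, beta1, beta2)]
```

For example, with posterior means `[1, 1, 2, 3, 3, 3]`, a constant posterior variance of 0.25, and `beta1=2`, `beta2=1`, the counts are `[0, 0, 1, 2, 2, 1]`, so `beta3=2` sounds alarms in the fourth and fifth periods only.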
The parameter set $\beta = [\beta_1\ \beta_2\ \beta_3]$ is estimated by optimizing a function that combines the sensitivity (Sn) and specificity (Sp) of our method for identifying outbreaks with the associated cost (Eq. 4.2). Analogous to the definitions used for a diagnostic test for disease, we define Sn as the proportion of space/time outbreak locations that are correctly identified (an alarm is sounded) and Sp as the proportion of space/time locations that are not outbreak locations and are correctly identified as such (no alarm is sounded). The type I error cost (C1), the cost associated with failing to identify locations where an outbreak is actually occurring, and the type II error cost (C2), the cost of using public health resources when an outbreak is not occurring, are complicated functions that depend on the infection type, the spatial area an outbreak covers, the number of people and potential cases affected, and the type of intervention, among other factors.
C = 𝑐𝑐2
Since the number of space/time locations (N) in the study area and the proportion (p) of them labeled as outbreak locations are held constant, we use the ratio of the costs to obtain a relative cost (C) that measures performance for a given set of parameters.
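The evaluation step can be sketched as below: Sn and Sp follow directly from the definitions above, while the relative-cost function is an ASSUMED form chosen for illustration (the expected misclassification cost per location divided by C2, so that N cancels and only the ratio C1/C2 remains); it is not necessarily the document's Eq. 4.2. The exhaustive search over a grid of candidate β values is likewise an assumption about how the optimization might be carried out, and all function names are hypothetical.

```python
def sensitivity_specificity(alarm, outbreak):
    """Sn and Sp from paired booleans, one pair per space/time location."""
    pairs = list(zip(alarm, outbreak))
    tp = sum(1 for a, o in pairs if a and o)          # outbreak, alarm sounded
    fn = sum(1 for a, o in pairs if not a and o)      # outbreak, no alarm
    tn = sum(1 for a, o in pairs if not a and not o)  # no outbreak, no alarm
    fp = sum(1 for a, o in pairs if a and not o)      # no outbreak, alarm sounded
    return tp / (tp + fn), tn / (tn + fp)


def relative_cost(sn, sp, p, cost_ratio):
    """ASSUMED form: p*(1 - Sn)*(C1/C2) + (1 - p)*(1 - Sp); not necessarily Eq. 4.2."""
    return p * (1.0 - sn) * cost_ratio + (1.0 - p) * (1.0 - sp)


def best_beta(evaluate, grid, p, cost_ratio):
    """ASSUMED grid search: evaluate(beta) -> (Sn, Sp); return the cheapest beta."""
    return min(grid, key=lambda beta: relative_cost(*evaluate(beta), p, cost_ratio))
```

Under this assumed form, raising the cost ratio C1/C2 favors parameter sets with higher sensitivity, since missed outbreaks are weighted more heavily than unnecessary alarms.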
4.3 Results