3.5 Simulation studies
3.5.3 Study 3: Spatiotemporal exceedance probability sensitivity
The previous two simulation studies examined the performance of the commonly employed, albeit rather ad hoc, minimum contrast approach to estimating the parameters that define the correlation structure of the spatial LGCP. They illustrated how much we can expect the estimates of the spatial parameters to vary, and how this variation changes when additional components of the modelling framework, such as the deterministic kernel-smoothed heterogeneous ‘global’ intensity, are present. Though these results are useful in their own right, a key question remains: does this degree of variation in the parameter estimates actually matter in practice? We pose the question in the context of spatiotemporal disease surveillance where, as in the AEGISS analysis, the goal is to produce real-time maps of exceedance probabilities, which describe the chance of having observed ‘anomalies’ (i.e. unusually extreme behaviour) under the assumed model at each space-time location.
To examine this issue, we design three distinct Spatiotemporal Scenarios (STS), each with a ‘true’ space-time stationary, isotropic Gaussian field. After exponentiation, which yields the unit-mean ‘residual’ LGCP, this field is the sole ingredient needed to compute the exceedance probabilities; see Section 3.4.2. In each scenario the true Gaussian field is a single, unconditionally generated realisation with pre-defined $\sigma^2_\psi$, $\phi_\psi$ and $\theta$, using the Exponential correlation function throughout. Each STS is spatially constrained to the unit square $W$ and split into two versions according to a pre-defined temporal interval, with a maximum number of integer time points $T$ and a constant mean number of observations $\eta$, in the sense that $\eta(t) = \eta$ for all $t \in \{1, \ldots, T\}$. In all cases, the global spatial trend is ignored (i.e. made uniform, such that $\zeta_\psi(x) = 1$ for all $x \in W$).
The two versions of each STS have either $T = 10$, $\eta = 100$ or $T = 100$, $\eta = 10$; that is, either a spatiotemporal field defined over 10 distinct time points with a mean of 100 (Poisson-distributed) observations at each time, or a longer-term field over 100 time points with a mean of only 10 observations per time. STS1 has $\sigma^2_\psi = 4$, $\phi_\psi = 0.04$, $\theta = 4$; STS2 has $\sigma^2_\psi = 8$, $\phi_\psi = 0.1$, $\theta = 6$; and STS3 has $\sigma^2_\psi = 2$, $\phi_\psi = 0.02$, $\theta = 2$. The first scenario, STS1, represents data sets exhibiting a moderate level of dependence and cluster concentration relative to the others. STS2 has the ‘longest-reaching’ dependencies in terms of both the spatial and temporal lags, as well as the largest variance. STS3 has a considerably ‘weaker’ or ‘shorter’ dependence structure, as well as the smallest field variance. For each of the six problems (three scenarios times two versions), 10 distinct data sets were generated and stored. Each represents a single instance of a hypothetically observed data set under the given conditions, with a realised sample size determined by the original unconditional spatiotemporal field realisation defining the STS itself.
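To make the design concrete, the following is a minimal sketch of simulating one scenario on a coarse grid; it is an illustration under stated assumptions, not the code used for this study. It assumes (hypothetically) a separable covariance $\sigma^2_\psi \exp(-\|x - x'\|/\phi_\psi)\exp(-|t - t'|/\theta)$, parameterised so that larger $\theta$ gives longer temporal dependence as described above, and discretises $W$ into an $m \times m$ grid of cells (`simulate_sts` and `m` are illustrative names). It exploits the fact that an exponential temporal correlation at integer lags is reproduced exactly by an AR(1) recursion with coefficient $\exp(-1/\theta)$.

```python
import numpy as np

def simulate_sts(sigma2, phi, theta, T, eta, m=16, seed=None):
    """Unconditionally simulate one space-time LGCP scenario on an m x m grid.

    Hypothetical assumptions: separable covariance
    sigma2 * exp(-|x - x'| / phi) * exp(-|t - t'| / theta) on the unit square,
    uniform global trend (zeta = 1) and mean count eta at every time point.
    """
    rng = np.random.default_rng(seed)
    centres = (np.arange(m) + 0.5) / m
    xx, yy = np.meshgrid(centres, centres)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Cholesky factor of the spatial covariance (tiny jitter for numerical stability)
    L = np.linalg.cholesky(sigma2 * np.exp(-dist / phi) + 1e-10 * np.eye(m * m))
    rho = np.exp(-1.0 / theta)  # correlation between consecutive integer times
    mu = -0.5 * sigma2          # stationary mean giving exp(Psi) unit mean
    cell_area = 1.0 / m**2
    psi = np.empty((T, m * m))
    psi[0] = L @ rng.standard_normal(m * m)
    for t in range(1, T):
        # AR(1) step: keeps the marginal spatial covariance, gives rho**k at lag k
        innov = L @ rng.standard_normal(m * m)
        psi[t] = rho * psi[t - 1] + np.sqrt(1.0 - rho**2) * innov
    field = np.exp(mu + psi)                       # unit-mean 'residual' process
    counts = rng.poisson(eta * cell_area * field)  # Poisson counts per cell and time
    return field.reshape(T, m, m), counts.reshape(T, m, m)

# One realisation in the spirit of STS1, version T = 10, eta = 100
field, counts = simulate_sts(4.0, 0.04, 4.0, T=10, eta=100, seed=1)
```

The grid here is far coarser than the $64 \times 64$ grid used in Phase 2, which keeps the dense Cholesky factorisation cheap.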
Phase 1
This simulation study takes the form of two phases, the first of which involves minimum contrast estimation of the three parameters controlling the stochastic covariance. Incidentally, this phase is of some interest in itself, as it provides a (limited) opportunity to examine the accuracy and precision of minimum contrast estimation of the temporal dependence parameter $\theta$. For a given STS and version, $\sigma^2_\psi$, $\phi_\psi$ and $\theta$ are estimated for each of the 10 corresponding pre-generated data sets: the first two based on the time-averaged version of either $K$ or $g$, and $\theta$ based on $C$, using the estimates already obtained for $\sigma^2_\psi$ and $\phi_\psi$. Calibration of the contrast integrals (e.g. function transformation) follows the decisions made in Section 3.2.3, which were also used in the previous two simulation studies. Table 3.5 gives the means of the estimated parameters ($\bar{\sigma}^2_\psi$, $\bar{\phi}_\psi$ and $\bar{\theta}$) and the corresponding standard errors for each scenario and version.
| STS (true values) | $T$; $\eta$ | Function | $\bar{\phi}_\psi$ (s.e.) | $\bar{\sigma}^2_\psi$ (s.e.) | $\bar{\theta}$ (s.e.) |
|---|---|---|---|---|---|
| 1 ($\phi_\psi = 0.04$, $\sigma^2_\psi = 4$, $\theta = 4$) | 10; 100 | $K$ | 0.0361 (0.0002) | 3.9810 (0.0196) | 1.5673 (0.0094) |
| | 10; 100 | $g$ | 0.0376 (0.0001) | 3.9039 (0.0183) | 1.5294 (0.0087) |
| | 100; 10 | $K$ | 0.0263 (0.0003) | 3.6088 (0.0335) | 4.5564 (0.0598) |
| | 100; 10 | $g$ | 0.0287 (0.0002) | 3.1826 (0.0152) | 3.9776 (0.0202) |
| 2 ($\phi_\psi = 0.1$, $\sigma^2_\psi = 8$, $\theta = 6$) | 10; 100 | $K$ | 0.1043 (0.0002) | 5.2463 (0.0065) | 4.4921 (0.0196) |
| | 10; 100 | $g$ | 0.0956 (0.0002) | 5.4259 (0.0069) | 4.9436 (0.0254) |
| | 100; 10 | $K$ | 0.0522 (0.0003) | 5.0739 (0.0189) | 6.1134 (0.0405) |
| | 100; 10 | $g$ | 0.0493 (0.0002) | 5.0622 (0.0162) | 6.1699 (0.0318) |
| 3 ($\phi_\psi = 0.02$, $\sigma^2_\psi = 2$, $\theta = 2$) | 10; 100 | $K$ | 0.0302 (0.0005) | 1.8243 (0.0250) | 0.8185 (0.0067) |
| | 10; 100 | $g$ | 0.0226 (0.0003) | 2.1881 (0.0199) | 0.9411 (0.0062) |
| | 100; 10 | $K$ | 0.0136 (0.0004) | 2.0677 (0.0732) | 1.6500 (0.0363) |
| | 100; 10 | $g$ | 0.0099 (0.0002) | 2.6932 (0.0094) | 1.9772 (0.0055) |

Table 3.5: Study 3, Phase 1: Means and associated standard errors of the minimum contrast-estimated correlation parameters for both versions of each STS, based on the hypothetically observed data sets.
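For readers less familiar with the mechanics, the following is a minimal sketch of minimum contrast estimation of $(\sigma^2_\psi, \phi_\psi)$ from a pair correlation function. It uses the standard LGCP result $g(u) = \exp\{\sigma^2_\psi r(u)\}$ with the Exponential correlation $r(u) = \exp(-u/\phi_\psi)$, and minimises a discretised contrast $\int \{\hat{g}(u)^q - g(u)^q\}^2\,\mathrm{d}u$. The power $q = 0.25$, the lag grid, the starting values and the function names are hypothetical choices for illustration; they do not reproduce the calibration decisions of Section 3.2.3, and the estimation of $\theta$ from $C$ is omitted.

```python
import numpy as np
from scipy.optimize import minimize

def lgcp_pcf(u, sigma2, phi):
    # LGCP pair correlation with Exponential correlation: g(u) = exp(sigma2 * exp(-u / phi))
    return np.exp(sigma2 * np.exp(-u / phi))

def min_contrast_pcf(u, g_hat, q=0.25, start=(2.0, 0.1)):
    """Fit (sigma2, phi) by minimising sum_u (g_hat(u)**q - g(u)**q)**2 * du (hypothetical q)."""
    du = np.gradient(u)  # lag spacing, for a Riemann approximation of the integral
    def contrast(log_par):
        sigma2, phi = np.exp(log_par)  # optimise on the log scale to keep both positive
        # g(u)**q computed as exp(q * log g(u)) to avoid overflow
        g_q = np.exp(q * sigma2 * np.exp(-u / phi))
        return np.sum((g_hat**q - g_q) ** 2 * du)
    res = minimize(contrast, np.log(start), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-14, "maxiter": 5000})
    sigma2_hat, phi_hat = np.exp(res.x)
    return float(sigma2_hat), float(phi_hat)

# Self-check on a noiseless 'estimate' generated from the STS1 values
u = np.linspace(0.005, 0.25, 50)
g_hat = lgcp_pcf(u, 4.0, 0.04)
sigma2_hat, phi_hat = min_contrast_pcf(u, g_hat)
```

In practice `g_hat` would be the time-averaged nonparametric estimate; feeding in the exact curve simply checks that the optimiser recovers the generating values.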
An examination of the figures in Table 3.5 shows generally what we would expect given our design specifications for each STS, though there are some exceptions. Where $T = 10$ and $\eta = 100$, the estimates of the spatial correlation parameters, $\sigma^2_\psi$ and $\phi_\psi$, tend to be closer to their true counterparts than when $T = 100$ and $\eta = 10$. Naturally, with far more spatial data at each time point, estimates based on second-order spatial characteristics should be more reliable than in the version where $\eta$ is only 10. Unfortunately, the estimates of $\theta$ in the $T = 10$, $\eta = 100$ versions suffer for much the same reason: with far fewer time points in the data sets, the estimation procedure struggles to capture the temporal correlation accurately. For example, in STS1 with $T = 10$, $\eta = 100$, the mean estimated spatial parameters are notably closer to the truth than in the $T = 100$, $\eta = 10$ version (for both $K$ and $g$), but $\theta$ is drastically underestimated, at around 1.5 (the true value is 4). Conversely, the $T = 100$, $\eta = 10$ version gives far better estimates of $\theta$, at the cost of misspecifying $\sigma^2_\psi$ and $\phi_\psi$ to a greater extent. This pattern is mirrored in the results for STS2 and once more in STS3, albeit with some overestimation of the spatial parameters in the latter. Note that STS3 is the scenario with the shortest range of spatiotemporal correlation, with $\phi_\psi = 0.02$ and $\theta = 2$. Estimating these ‘finer’ parameters as satisfactorily as in the earlier scenarios could conceivably require a far larger data set, in both space and time. This is where the spatial and temporal discretisation levels also come into play, since smaller dependence ranges require finer discrete approximations in order to be detectable in practice. We leave STS3 as-is, and regard it as a scenario in which the researcher has small-scale dependence in their space-time data and may therefore be more susceptible to using too coarse a discretisation in conditional simulation, which we perform in the following phase of this simulation study.
As observed in Studies 1 and 2, the differences between the results based on the $K$-function and those based on the pair correlation function for estimating the spatial parameters are relatively small. It is nevertheless reassuring that the time-averaged versions of the relevant nonparametric estimators appear to behave similarly to the ‘spatial only’ estimators. As a consequence, the differences in the estimates of $\theta$ between $K$ and $g$ are also relatively minor. One noteworthy point is that in most cases (the main exception being the $T = 10$, $\eta = 100$ version of STS2), the standard errors associated with $g$ were lower than those obtained with $K$, as was also noticed in the previous simulation studies. This is of little value, however, if the estimates themselves exhibit considerable bias.
Before entering Phase 2 of the study, we remark that Phase 1 is essentially a ‘by-product’ of the procedure needed for the conditional prediction that follows. To draw more precise conclusions about the performance of minimum contrast parameter estimation for the spatiotemporal LGCP, it would of course be desirable to use a larger number of data sets. Nonetheless, the results observed with the 10 generated sets for each version of each STS are already useful: they are intuitively sensible, and reflect earlier findings in the purely spatial setting. The relatively small number of data sets in this simulation study was chosen primarily for computational reasons relating to the operations that follow; these details are clarified below.
Phase 2
By now, we have gained a good understanding of the behaviour of minimum contrast parameter estimation in terms of the proximity of the estimates to the true values. As mentioned at the start of this study, however, it remains to be seen whether these discrepancies affect our practical interpretation of the process when we use the observed data to predict the intensity itself. Stated this way, the question is rather vague. The computational expense of conditional simulation of the spatiotemporal LGCP (which, in a numerical exercise, requires multiple MALA runs) means that every aspect of the design of Phase 2 must be carefully considered in order to obtain meaningful results with respect to clearly defined objectives in a computationally feasible way.
Disease surveillance, as described for the AEGISS data in Diggle, Rowlingson and Su (2005) and Section 3.4.2, is among the primary objectives of epidemiological applications of the spatiotemporal LGCP. ‘Unusual’ space-time clusters of observations, relative to the fitted model, are flagged using exceedance probabilities. At each space-time location, these are computed empirically as the proportion of retained, exponentiated Gaussian fields (from a corresponding MALA run) that exceed a user-defined threshold, chosen to be high enough that the researcher is confident in interpreting larger values as a true ‘anomaly’. Naturally, we should be concerned if, at a given time, the spatial exceedance probability surfaces obtained in practice fail to detect a true anomaly or, conversely, flag an anomaly where none exists. When these are interpreted as potential disease outbreaks, the consequences of non-detection or false flagging, in terms of both public health and economic cost, could be substantial. Do the minimum contrast parameter estimates, and their discrepancies from the true values, diminish our ability to arrive at a ‘suitably correct’ representation of exceedances?
Brix and Diggle (2001) and Diggle, Rowlingson and Su (2005) offer some sensible conjecture on this issue, albeit without formal investigation. They argue that, when the data set is suitably large, uncertainty in the minimum contrast estimates can safely be ignored in conditional simulation of a spatiotemporal LGCP. The reasoning is that all historical data are used to estimate the parameters, yet only the last few time points (user-specified, depending on the perceived strength of the temporal dependence), with relatively few observations, are used to predict the ‘current’ spatial intensity. We should therefore expect variation in the prediction procedure to dominate variation due to parameter estimation, so that the conditional simulation algorithm is relatively insensitive to the choice of initial parameters.
This is a perfectly reasonable notion, but it depends heavily on how much historical data we have, as well as on the (underlying) expected number of observations per time point. The fact remains that in both Studies 1 and 2, as well as in Phase 1 of the current study, we have in some cases observed rather extreme departures of the minimum contrast estimates from the ‘truth’. Should we expect even discrepancies of this size to have little impact on anomaly detection?
This motivates Phase 2, in which we use each of the 10 data sets generated from each STS (and version) to predict, via the MALA, the exponentiated spatial Gaussian field at the final time point (which would be of most interest in a real-time surveillance setting). MALAs of length 44000 are run using (a) the true parameter values; (b) the parameters estimated with the $K$-function; and (c) the parameters estimated with $g$. In all cases there is a burn-in period of 4000 iterations, after which every 100th iteration is retained, giving $(44000 - 4000)/100 = 400$ retained fields per run.
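As a rough illustration of what each MALA run involves, the following is a minimal sketch of a plain Metropolis-adjusted Langevin algorithm with the burn-in and thinning settings above, so that 400 states are retained. It is a toy under explicit assumptions, not the implementation used here: the target is the posterior of a whitened latent field $\Gamma \sim N(0, I)$ on a hypothetical $4 \times 4$ grid at a single time point, with $\Psi = \mu_\psi + L\Gamma$, $LL^\top$ the Exponential covariance matrix and cell counts $y_i \sim \mathrm{Poisson}(\eta\,|A|\,e^{\Psi_i})$. There is no temporal lag, no step-size adaptation, and the step size 0.02 is an arbitrary choice.

```python
import numpy as np

def mala(log_target, grad_log_target, x0, step, n_iter=44000, burn_in=4000,
         thin=100, rng=None):
    """Metropolis-adjusted Langevin algorithm with burn-in and thinning.

    With the defaults, (44000 - 4000) / 100 = 400 states are retained.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    lp, gr = log_target(x), grad_log_target(x)
    kept, n_accept = [], 0
    for it in range(1, n_iter + 1):
        # Langevin proposal: drift along the gradient plus Gaussian noise
        fwd_mean = x + 0.5 * step**2 * gr
        prop = fwd_mean + step * rng.standard_normal(x.shape)
        lp_prop, gr_prop = log_target(prop), grad_log_target(prop)
        bwd_mean = prop + 0.5 * step**2 * gr_prop
        # Metropolis-Hastings correction for the asymmetric proposal
        log_q_fwd = -np.sum((prop - fwd_mean) ** 2) / (2 * step**2)
        log_q_bwd = -np.sum((x - bwd_mean) ** 2) / (2 * step**2)
        if np.log(rng.uniform()) < lp_prop - lp + log_q_bwd - log_q_fwd:
            x, lp, gr = prop, lp_prop, gr_prop
            n_accept += 1
        if it > burn_in and (it - burn_in) % thin == 0:
            kept.append(x.copy())
    return np.array(kept), n_accept / n_iter

# Hypothetical toy target on a 4 x 4 grid over the unit square
rng = np.random.default_rng(1)
sigma2, phi, eta, m = 4.0, 0.04, 100.0, 4
centres = (np.arange(m) + 0.5) / m
xx, yy = np.meshgrid(centres, centres)
pts = np.column_stack([xx.ravel(), yy.ravel()])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
L = np.linalg.cholesky(sigma2 * np.exp(-dist / phi))
mu, area = -0.5 * sigma2, 1.0 / m**2
y = rng.poisson(eta * area * np.exp(mu + L @ rng.standard_normal(m * m)))

def log_post(g):
    # Poisson log-likelihood (up to a constant) plus standard normal prior on Gamma
    psi = mu + L @ g
    return np.sum(y * psi - eta * area * np.exp(psi)) - 0.5 * g @ g

def grad_log_post(g):
    return L.T @ (y - eta * area * np.exp(mu + L @ g)) - g

samples, acc = mala(log_post, grad_log_post, np.zeros(m * m), step=0.02, rng=rng)
exp_fields = np.exp(mu + samples @ L.T)  # retained exponentiated fields, one per row
```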
To compare the intensities estimated by conditional simulation with the true form of the latent LGCP, we first determine, for each scenario and version, which cells of the $(M - 1) \times (N - 1) = 64 \times 64$ grid at the last time point $T$ actually exceed the 95th percentile threshold. This yields a binary matrix, with 1 denoting an exceedance and 0 no exceedance, which we label the ‘true anomaly matrix’. For a given STS and version, we then obtain, for each of the 10 data sets, a collection of 400 retained posterior exponentiated spatial Gaussian fields on the same $64 \times 64$ cells from a MALA of type (a), (b) or (c). The exceedance probability surfaces are then computed using the 95th percentile of the relevant lognormal distribution, which in each case depends on the variance (true or estimated) of the Gaussian field. The performance of the MALAs is judged on the cells with empirical exceedance probability greater than 0.5, that is, the cells for which more than half of the retained realisations exceed the threshold. This yields another binary matrix, labelled the ‘predicted anomaly matrix’.
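The two binary structures can be sketched as follows, assuming (as stated above) that the threshold is the 95th percentile of the unit-mean lognormal marginal of $\exp(\Psi)$, i.e. $\exp(\mu_\psi + \sigma_\psi z_{0.95})$ with $\mu_\psi = -0.5\sigma^2_\psi$. The function names are hypothetical; `exp_samples` stands for the 400 retained exponentiated fields from one MALA run, stacked along the first axis.

```python
import numpy as np
from scipy.stats import norm

def exceedance_threshold(sigma2, level=0.95):
    """level-quantile of exp(Psi), Psi ~ N(-sigma2 / 2, sigma2): a unit-mean lognormal."""
    return float(np.exp(-0.5 * sigma2 + np.sqrt(sigma2) * norm.ppf(level)))

def true_anomaly_matrix(true_exp_field, sigma2_true, level=0.95):
    """1 where the true exponentiated field exceeds the threshold, 0 elsewhere."""
    return (true_exp_field > exceedance_threshold(sigma2_true, level)).astype(int)

def predicted_anomaly_matrix(exp_samples, sigma2_used, level=0.95, prob_cut=0.5):
    """exp_samples has shape (n_retained, rows, cols).

    Returns the binary predicted anomaly matrix and the exceedance probability surface.
    """
    tau = exceedance_threshold(sigma2_used, level)
    exceed_prob = (exp_samples > tau).mean(axis=0)  # proportion of retained fields above tau
    return (exceed_prob > prob_cut).astype(int), exceed_prob
```

Note that `sigma2_used` is the true variance for MALA (a) and the relevant estimate for MALAs (b) and (c), so the threshold itself moves with the estimate.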
The Jaccard similarity coefficient is used to evaluate the agreement between the true and predicted anomaly matrices. Briefly, for two sets $A$ and $B$, the Jaccard index is the size of their intersection divided by the size of their union, $J = |A \cap B| / |A \cup B|$, with $0 \le J \le 1$. In our case, $A$ and $B$ are the sets of cells flagged as exceedances (value 1) in the true and predicted anomaly matrices, respectively. By summing the two binary matrices, $J$ can be computed quickly as the number of cells with value 2 divided by the number of cells with value greater than or equal to 1. Thus, a ‘perfect’ match gives $J = 1$ and no overlap at all gives $J = 0$. It is important to note that we do not expect to see any perfect matches, regardless of how well the prediction algorithm performs. Precisely which cells are flagged in the predicted anomaly matrix depends on a number of variable components, some of which are extremely difficult to control. The number of observations per time point, the cell resolution relative to the magnitude of the LGCP parameters, and the convergence and mixing performance of the MALA chain will all affect the final exceedance results and the magnitudes of the recorded $J$ statistics. As such, we stress that the Jaccard coefficient should be viewed here as a very simple within-scenario, between-MALA quality measure.
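The summation shortcut for $J$ translates directly into code. This is a minimal sketch with a hypothetical function name; returning `nan` when neither matrix flags any cell is our own convention, since $J$ is undefined for two empty sets.

```python
import numpy as np

def jaccard_anomaly(true_mat, pred_mat):
    """Jaccard index of two binary anomaly matrices via their cell-wise sum.

    Cells equal to 2 are flagged in both (intersection); cells >= 1 are flagged
    in at least one (union).
    """
    total = np.asarray(true_mat, dtype=int) + np.asarray(pred_mat, dtype=int)
    union = np.count_nonzero(total >= 1)
    if union == 0:
        return float("nan")  # J is undefined when both sets are empty
    return np.count_nonzero(total == 2) / union

example = jaccard_anomaly([[1, 0], [1, 1]], [[1, 1], [0, 1]])  # 2 shared / 4 flagged = 0.5
```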
Before executing the MALAs, it is worth highlighting the variability in the 95% exceedance thresholds themselves. Because these thresholds depend on the variance of the Gaussian field, $\sigma^2_\psi$, discrepancies between different estimates of it flow through to the cutoff values, potentially affecting the location and prevalence of flagged exceedances (and therefore the resulting Jaccard indices). Figure 3.30 illustrates this for the 10 data sets of each STS and version, using the true variance and the mean estimates of $\sigma^2_\psi$ in Table 3.5. Almost all of the minimum contrast-estimated variances yield, on average, a higher 95% threshold than the true variance; the one exception is the $K$-based estimate for the $T = 10$, $\eta = 100$ version of STS3, whose mean (1.8243) lies below the true value of 2 in a range where the threshold increases with the variance. Furthermore, recall that the LGCP is a ‘residual process’ with unit mean, so the ‘location’ (stationary mean) parameter $\mu_\psi = -0.5\sigma^2_\psi$ is also affected. These effects, in addition to the obvious role of $\sigma^2_\psi$ in defining the discretised covariance matrix of the Gaussian field, mean that there is considerable potential for the precise value of the supplied $\sigma^2_\psi$ to affect a given MALA.
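The direction of these threshold shifts follows from a short calculation, assuming the threshold is the 95th percentile of the unit-mean lognormal marginal, with $\mu_\psi = -0.5\sigma^2_\psi$ and $\sigma_\psi = \sqrt{\sigma^2_\psi}$:

```latex
\tau(\sigma^2_\psi) = \exp\!\left(-\tfrac{1}{2}\sigma^2_\psi + z_{0.95}\,\sigma_\psi\right),
\qquad z_{0.95} = \Phi^{-1}(0.95) \approx 1.645,

\frac{\mathrm{d}\log\tau}{\mathrm{d}\sigma_\psi} = z_{0.95} - \sigma_\psi = 0
\;\Longrightarrow\;
\sigma^2_\psi = z_{0.95}^2 \approx 2.71,
\qquad
\tau_{\max} = \exp\!\left(\tfrac{1}{2} z_{0.95}^2\right) \approx 3.868,

\tau(2) \approx 3.767, \qquad \tau(4) \approx 3.632, \qquad \tau(8) \approx 1.920.
```

So $\tau$ increases with $\sigma^2_\psi$ below roughly 2.71 and decreases above it. In STS1 and STS2 (true variances 4 and 8, both above the turning point), underestimates of $\sigma^2_\psi$ therefore raise the threshold, whereas in STS3 (true variance 2, below it) it is overestimates that do so.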
The computational demands of Phase 2 cannot be overstated. The size of the prediction grid ($64 \times 64$); the number of distinct data sets (10); the number of scenarios (3); the number of versions of each scenario (2); the number of different model parameterisations, or ‘MALA types’ (3); the MALA length and burn-in (44000 and 4000, respectively); the retention rate (1/100); and the number of previous time points used for prediction at the latest time (the ‘lag length’, 5) were all chosen after careful examination of the performance of the required operations. These values strike a balance between manageable computational cost and the meaningfulness of the resulting empirical results (in terms of the variety of scenarios, the accuracy and precision of the results, and the validity of the MCMC methods as assessed by mixing and convergence). In total, $3 \times 2 \times 3 \times 10 = 180$ individual 44000-iteration spatiotemporal MALAs, each with a lag length of 5, were run, requiring several further weeks of execution time across multiple machines.
Results are given in Table 3.6 as the mean Jaccard scores $\bar{J}_{\mathrm{TRUE}}$, $\bar{J}_K$ and $\bar{J}_g$ over the 10 data sets specific to each STS and version, each score obtained by comparing the corresponding predicted anomaly matrix with the true anomaly matrix. The subscripts TRUE, $K$ and $g$ refer to MALAs (a), (b) and (c) respectively, that is, those using the true parameter values, the individually derived $K$-function parameter values and the individually derived PCF parameter values.

[Figure 3.30: six panels, each plotting the 95% exceedance threshold against $\sigma^2_\psi$; rows correspond to STS1, STS2 and STS3, columns to $T = 10$, $\eta = 100$ (left) and $T = 100$, $\eta = 10$ (right).]

Figure 3.30: Cutoff threshold value (95th percentile) as a function of the variance $\sigma^2_\psi$ (solid, bold line). Values for STS1 (top), STS2 (middle) and STS3 (bottom) are given for the true variance (solid line), the mean minimum contrast estimate by $K$ (dashed line) and the mean minimum contrast estimate by $g$ (dotted line). Versions $T = 10$; $\eta = 100$ and $T = 100$; $\eta = 10$ are given in the left and right columns respectively.

| STS (true values) | $T$; $\eta$ | $\bar{J}_{\mathrm{TRUE}}$ (s.d.) | $\bar{J}_K$ (s.d.) | $\bar{J}_g$ (s.d.) |
|---|---|---|---|---|
| 1 ($\phi_\psi = 0.04$, $\sigma^2_\psi = 4$, $\theta = 4$) | 10; 100 | 0.1631 (0.1004) | 0.2166 (0.0367) | 0.2094 (0.0429) |
| | 100; 10 | 0.0566 (0.0312) | 0.0521 (0.0141) | 0.0513 (0.0248) |
| 2 ($\phi_\psi = 0.1$, $\sigma^2_\psi = 8$, $\theta = 6$) | 10; 100 | 0.5212 (0.0606) | 0.5361 (0.0800) | 0.5825 (0.0663) |
| | 100; 10 | 0.3905 (0.0752) | 0.2763 (0.0405) | 0.2622 (0.0365) |
| 3 ($\phi_\psi = 0.02$, $\sigma^2_\psi = 2$, $\theta = 2$) | 10; 100 | 0.1036 (0.0226) | 0.0873 (0.0295) | 0.0976 (0.0267) |
| | 100; 10 | 0.0073 (0.0071) | 0.0078 (0.0105) | 0.0108 (0.0083) |

Table 3.6: Study 3, Phase 2: Means and standard deviations of the Jaccard indices for both versions of each STS, based on MALA prediction of the final time point exceedances.
In general, ‘within scenarios’, using the minimum contrast-estimated parameters to predict the intensity at the final time point appears to have little effect on the Jaccard scores relative to using the true parameters. The notable exception is the $T = 100$, $\eta = 10$ version of STS2, where both estimated-parameter MALAs score markedly lower (0.2763 and 0.2622, against 0.3905). Indeed, some cases even show higher scores for the MALAs using the estimated parameters than for those using the true values themselves (for example STS1, version $T = 10$, $\eta = 100$; STS3, version $T = 100$, $\eta = 10$). That the estimated parameters sometimes give better results may not be due only to simulation noise. In each case, the generated data sets and associated results are based on a single realisation of the spatiotemporal Gaussian process, and it could be that the characteristics of these particular realisations (in particular, for anomaly detection) are better described by the estimated parameters than by the true ones. Between the two versions within each scenario, the Jaccard scores are noticeably lower in the $T = 100$, $\eta = 10$ setting. This is understandable: with far fewer observations at the time point of prediction, fewer cells are capable of producing intensities large enough to flag an exceedance.
Between scenarios, the highest Jaccard indices are reserved for problems with longer ranges of dependence (e.g. STS2): this is intuitively sensible when we consider the fact that