6.6 Model ability to replicate known flows
Having calibrated the model to replicate known characteristics of consumers’ trip-making behaviour via ATD, the focus now turns to assessing the model’s overall performance. This is achieved by validating its ability to reproduce the known flow data supplied by Sainsbury’s for four stores of interest. Knudsen and Fotheringham (1986) note that this assessment of the model’s ability to replicate an observed set of data is an important component of model building. Validation via GOF statistics is based on measuring the differences between observed and predicted values (Batty and Mackie, 1972). This section makes use of two GOF statistics: R² (the coefficient of determination), which is commonly used to assess SIM performance, and SRMSE (standardised root mean square error). Both are considered to be among the ‘better performing’ and more commonly used GOF statistics (Fotheringham and O'Kelly, 1989). SRMSE is observed to be very sensitive to any differences between the observed and predicted flow matrices (Harland, 2008).
SRMSE is calculated as shown in equation 6.13. A value of zero represents a perfect fit between observed and predicted matrices (Knudsen and Fotheringham, 1986), with an upper limit generally accepted to be 1, though Harland (2008) illustrates that the upper limit can rise above 1 under certain conditions in a sparsely populated matrix where a number of zero flows exist.
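\[
\mathrm{SRMSE} = \frac{\sqrt{\dfrac{\sum_{i}\sum_{j}\left(S_{ij} - \hat{S}_{ij}\right)^{2}}{m \times n}}}{\dfrac{\sum_{i}\sum_{j} S_{ij}}{m \times n}} \tag{6.13}
\]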
As previously, $S_{ij}$ represents predicted flows, $\hat{S}_{ij}$ represents observed flows, and $m$ and $n$ represent the dimensions of the observed and predicted LSOA ($i$) to store ($j$) flow matrix. Harland (2008) notes that the SRMSE does a good job of identifying discrepancies between observed and predicted flows in a number of different scenarios, all of which he simulated on a dataset in order to evaluate the sensitivity of different GOF statistics. He noted that under all his scenarios (which involved either altering the magnitude of flows or shifting flows to alternative origin/destination cells) the SRMSE picked up that differences existed between $S_{ij}$ and $\hat{S}_{ij}$. By contrast, R², outlined in equation 6.14, was found to be sensitive to values that had been shifted elsewhere in the matrix, but less sensitive to differences in the volume of individual flows when they appeared in the correct cells of the matrix. Nonetheless, both are valuable tools in assessing model performance and have been utilised here.
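The SRMSE calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: the function name `srmse`, the use of NumPy and all flow values are hypothetical. Standardising by the mean observed flow is an assumption of the sketch; whenever predicted and observed totals match, as they do in both examples here, standardising by the mean predicted flow gives the same value. The second example places a single flow in the wrong cell of a sparse matrix and yields a value above 1, in line with Harland's (2008) observation.

```python
import numpy as np


def srmse(predicted, observed):
    """Standardised root mean square error between two flow matrices.

    Both arguments are m x n arrays (origin zones by stores). The RMSE is
    divided by the mean observed flow (an assumption of this sketch).
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("flow matrices must have the same shape")
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return rmse / observed.mean()


# Hypothetical 3 x 2 flow matrices, for illustration only.
observed = np.array([[120.0, 30.0], [80.0, 60.0], [10.0, 100.0]])
predicted = np.array([[110.0, 40.0], [85.0, 55.0], [15.0, 95.0]])

perfect_fit = srmse(observed, observed)   # identical matrices
close_fit = srmse(predicted, observed)    # small errors in every cell

# Sparse matrix in which the only flow is predicted in the wrong cell.
sparse_observed = np.array([[10.0, 0.0], [0.0, 0.0]])
sparse_predicted = np.array([[0.0, 0.0], [0.0, 10.0]])
sparse_fit = srmse(sparse_predicted, sparse_observed)

print(perfect_fit, close_fit, sparse_fit)
```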
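R² is calculated as follows:
\[
R^{2} = \left[ \frac{\sum_{i}\sum_{j}\left(S_{ij} - \bar{S}\right)\left(\hat{S}_{ij} - \bar{\hat{S}}\right)}{\sqrt{\sum_{i}\sum_{j}\left(S_{ij} - \bar{S}\right)^{2} \, \sum_{i}\sum_{j}\left(\hat{S}_{ij} - \bar{\hat{S}}\right)^{2}}} \right]^{2} \tag{6.14}
\]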
where $\bar{S}$ represents the mean of all $S_{ij}$ (predicted flows) and $\bar{\hat{S}}$ represents the mean of all $\hat{S}_{ij}$ (observed flows). R² is bounded by an upper limit of 1 (Knudsen and Fotheringham, 1986).
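The behaviour of R² described above can be illustrated with a short Python sketch. The function name `r_squared` and the flow values are hypothetical and serve only as an illustration. R² is computed as the squared correlation between predicted and observed flows, so inflating every flow by the same factor while keeping it in the correct cell leaves R² at 1, whereas moving the same volumes to different origin zones lowers it.

```python
import numpy as np


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed flows."""
    p = np.asarray(predicted, dtype=float).ravel()
    o = np.asarray(observed, dtype=float).ravel()
    p_dev = p - p.mean()   # deviations from the mean predicted flow
    o_dev = o - o.mean()   # deviations from the mean observed flow
    r = np.sum(p_dev * o_dev) / np.sqrt(np.sum(p_dev ** 2) * np.sum(o_dev ** 2))
    return r ** 2


# Hypothetical observed flows (origin zones by stores), for illustration only.
observed = np.array([[120.0, 30.0], [80.0, 60.0], [10.0, 100.0]])

# Every flow in the correct cell, but 50% too large.
scaled = 1.5 * observed
# Correct volumes, but each row of flows moved to the next origin zone.
shifted = np.roll(observed, 1, axis=0)

r2_scaled = r_squared(scaled, observed)
r2_shifted = r_squared(shifted, observed)
print(r2_scaled, r2_shifted)
```

In this example the scaled matrix would be flagged by SRMSE but not by R², which is one reason for reporting the two statistics together.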
Table 6.8 shows the SRMSE and R² values for the disaggregate model for the four calibration/validation stores.
Recall that an SRMSE of 0 and an R² of 1 would denote an exact fit between observed and predicted values. Table 6.8 indicates that the model is performing well with reference to the four study stores, demonstrated by an overall SRMSE of 0.05 and R² of 0.88. On a store-by-store basis, the model replicates flows to the Newquay store most accurately. All stores exhibit an R² of 0.84 or above and, with the exception of Bude, an SRMSE of 0.1 or lower, suggesting that both the spatial distribution of flows and the magnitude of individual flows correspond closely with the observed values. The Bude store exhibits a higher SRMSE of 0.2, suggesting that, whilst the spatial pattern of flows shows a close match to observed data (R² = 0.86), the volume of modelled flows shows some disparity with observed flows. The characteristics of this store make it difficult to model. It is a popular but very modestly sized store (11,500 square feet) serving a thriving town centre and seasonal tourist trade. Based on their experience at Tesco and Sainsbury’s, Wood and Tasker (2008) acknowledge that smaller format supermarkets such as the Bude store are harder to model using a SIM, as they tend to have a smaller catchment than larger supermarkets. Sainsbury’s own analysis also identifies that only 58.7% of store spend (2010 trading year) was associated with a Nectar card, and thus the flow data at this store are more limited. In spite of this, section 6.7 demonstrates that modelled flows can be used to predict seasonal variations in revenue at this store to an acceptable level of accuracy.
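As the note to Table 6.8 states, the overall values are calculated from the combined observed and predicted matrices rather than by averaging the store values. A minimal sketch suggests why the two approaches generally give different results. The flows for two stores and the `srmse` helper, which standardises by the mean observed flow, are hypothetical and are not the study data or code.

```python
import numpy as np


def srmse(predicted, observed):
    """SRMSE standardised by the mean observed flow (assumed convention)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean((predicted - observed) ** 2)) / observed.mean()


# Hypothetical flows: rows are origin zones, columns are two stores.
observed = np.array([[200.0, 10.0], [150.0, 20.0], [50.0, 30.0]])
predicted = np.array([[190.0, 14.0], [160.0, 16.0], [50.0, 30.0]])

# SRMSE for each store (one column at a time), then the simple average.
per_store = [srmse(predicted[:, j], observed[:, j])
             for j in range(observed.shape[1])]
mean_of_stores = float(np.mean(per_store))

# SRMSE calculated once on the combined matrix for both stores.
pooled = srmse(predicted, observed)

print(per_store, mean_of_stores, pooled)
```

Because each store's error is standardised by that store's own mean flow, a small store with low flows can dominate the simple average, whereas the pooled value weights every cell of the combined matrix equally.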
Table 6.8 - GOF statistics for four Cornish study stores, based on 52-week average flow data.
Store        SRMSE    R²
Newquay      0.08     0.93
Bude         0.20     0.86
Bodmin       0.10     0.84
Truro        0.08     0.84
Overall²⁵    0.05     0.88
²⁵ Not averaged from individual store values but calculated from the observed and predicted matrices for all four stores.
The application of GOF statistics goes some way to validating the model’s ability to replicate known flows and thus to assessing how well the model has been specified (and the assumptions made). Nonetheless, the GOF statistics are only indicative of the model’s performance. To understand more about any differences between observed and predicted revenue, especially at the coastal resort stores (which are subject to the greatest seasonal variations), flows should be considered spatially. Figure 6.2 demonstrates the spatial pattern of observed and predicted flows within the catchment area of the Newquay store. There is a good spatial fit between observed and predicted flows, with the model showing a tendency to predict within 10% of reality in most OAs, with some over-prediction in OAs in close proximity to the store. Figure 6.3 begins to consider the predicted inflow on a month-by-month basis, again at the Newquay store. As might reasonably be expected, April (fringe season) and August (peak season) demonstrate a higher inflow from a fairly wide catchment, incorporating inflow from a number of rural and coastal output areas to the south of the town, which are home to much of the visitor accommodation within this store catchment. In January, when much of this accommodation is closed or operating well below capacity, the spatial pattern of trade around the store produces a noticeably tighter core catchment area.
Since residential and visitor demand are handled separately within the model, it is also possible to consider the total inflow from local residents and from visitors. Figure 6.4 considers June 2010 and demonstrates, on an OA-by-OA basis, the inflow from residential and visitor demand. Visitor demand exhibits a greater degree of concentration around the resort of Newquay itself, driven by the location of visitor accommodation, which, as explored in Chapter 5, is concentrated around resorts such as Newquay, with a number of OAs generating over £5,000 per week inflow from visitor demand alone. By contrast, and as expected, residential demand is drawn more uniformly from the OAs that make up the store catchment, with distance decay, driven by drive time, more pronounced.
This brief exploration of flow patterns at the Newquay store highlights that the model appears to be performing well, replicating observed flows and producing flow patterns which are consistent with the input data and assumptions made, and clearly highlighting the impact of seasonal variations driven by tourism. The real value of the model is its ability to predict store revenue with accuracy, such that it can be used in a predictive capacity. Birkin et al. (2010a) suggest a move away from traditional goodness-of-fit statistics towards a more complex approach to model validation, which considers whether models are able to replicate customer flows and store revenue accurately. This is effectively termed goodness-of-forecast and is considered in section 6.7.