Chapter 5 Modelling of large human populations
5.1.3 Individual based models
Alongside the increase in available data, computing memory and speed have greatly increased, allowing large, memory-intensive simulations to run quickly enough to be practical.
One way in which data have improved is through the use of Radio Frequency Identification (RFID) devices. These are used to track the locations of their wearers and to generate person-to-person interaction networks from the proximity between subjects [Cattuto et al., 2010; Machens et al., 2013]. Once these networks are formed, it is possible to simulate an epidemic on them.
In addition, the ubiquity of mobile phones has led to some large datasets [Gonzalez et al., 2008], which give information about the movements that people make.
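As a rough illustration of how a proximity-based network can be turned into an epidemic simulation, the sketch below aggregates hypothetical RFID proximity records of the form (time step, person, person) into a weighted contact network and then runs a simple discrete-time SIR model on it. The record format, the per-step transmission probability 1 − (1 − β)^w for a pair seen together in w intervals, and the parameter names `beta` and `gamma` are assumptions made for this example, not the methods of the cited studies.

```python
import random
from collections import defaultdict


def build_contact_network(proximity_records):
    """Aggregate hypothetical (time_step, person_a, person_b) proximity records
    into an undirected weighted network {person: {neighbour: n_intervals}}."""
    network = defaultdict(lambda: defaultdict(int))
    for _, a, b in proximity_records:
        if a == b:
            continue  # ignore self-detections
        network[a][b] += 1
        network[b][a] += 1
    return network


def simulate_sir(network, seed, beta, gamma, rng):
    """Discrete-time SIR on a weighted network (illustrative assumptions):
    each step, every infected node infects each susceptible neighbour with
    probability 1 - (1 - beta) ** weight, then recovers with probability gamma.
    gamma must be in (0, 1] so that the outbreak eventually ends."""
    state = {node: "S" for node in network}
    state[seed] = "I"
    infected = {seed}
    while infected:
        newly_infected = set()
        for node in infected:
            for neighbour, weight in network[node].items():
                if state[neighbour] == "S" and rng.random() < 1 - (1 - beta) ** weight:
                    newly_infected.add(neighbour)
        recovered = {node for node in infected if rng.random() < gamma}
        for node in newly_infected:
            state[node] = "I"
        for node in recovered:
            state[node] = "R"
        infected = (infected - recovered) | newly_infected
    return state
```

Weighting transmission by the number of shared intervals is one simple way of letting longer exposure raise the infection risk; an unweighted network would instead use `beta` directly for every edge.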
Using many different datasets to model real-world populations is a well-established approach to characterising the spread of epidemics, as seen in Ferguson et al. [2005, 2006]. These papers combined data on household size, age structure, and school and workplace size with commuting data to generate individual-based simulations in which the spatial density of people agrees with reality. Their focus was the prevention of serious flu epidemics, through investigating the effectiveness of various interventions and combinations of intervention strategies.
Multiple data sources have also been combined to create synthetic populations, an established method in the use of networks to model disease spread [Eubank et al., 2004]. To construct a synthetic population, data sources including census data, diary-based activity surveys and workplace data are combined in an attempt to build a realistic contact network for a given area, containing the same number of individuals as the true population of that area. Once this is completed, the spread of epidemics on the population can be investigated and different control strategies can be tested. A smallpox outbreak in Portland, Oregon was the first hypothetical outbreak considered on a synthetic population constructed using this method [Barrett et al., 2005; Eubank et al., 2004], but many other areas have since been considered, for example Boston [Lewis et al., 2013] and Washington DC [Parikh et al., 2013b]. The synthetic populations generated have even been used to investigate the impact of human behaviour on the scale of the disaster that follows a nuclear explosion [Parikh et al., 2013a].
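The full synthetic population pipeline of Eubank et al. [2004] is far richer than can be shown here, but one of its basic constraints, that the synthetic population contains the same number of individuals as the true one, can be sketched. The example below samples household sizes from a hypothetical household-size distribution until an exact target population is reached, then links all members of each household. The distribution, the function names and the assumption that everyone in a household contacts everyone else are illustrative only.

```python
import random


def build_households(target_population, size_distribution, rng):
    """Sample household sizes from a hypothetical census distribution
    {size: probability} until the synthetic population has exactly
    target_population people. Returns households as lists of person ids."""
    sizes = list(size_distribution)
    weights = [size_distribution[s] for s in sizes]
    households, next_id = [], 0
    while next_id < target_population:
        size = rng.choices(sizes, weights=weights)[0]
        # Truncate the final household so the total matches the target exactly.
        size = min(size, target_population - next_id)
        households.append(list(range(next_id, next_id + size)))
        next_id += size
    return households


def household_contacts(households):
    """Assume every pair of people sharing a household is in contact."""
    edges = set()
    for members in households:
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                edges.add((a, b))
    return edges
```

In a real pipeline, workplace, school and activity-location contacts from the other data sources would be layered on top of these household edges.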
Simulating on such large networks has required the creation of efficient algorithms to generate useful statistics about intervention and surveillance strategies for combating epidemics [Lewis et al., 2013].
A similar approach was used to construct the Little Italy model [Iozzi et al., 2010]. This makes use of a survey of how the Italian public spend their time, carried out by the Istituto Nazionale di Statistica (ISTAT) [ISTAT homepage] on 55,773 individuals from 21,075 households. The survey consists of 144 ten-minute intervals over a 24-hour day, and for each interval records the type of activity being performed and the type of location the person was in. This was then combined with a ‘“minimally” complex set of rules’ [Iozzi et al., 2010] to generate a synthetic population. These rules included retaining only individuals who filled in the form for a weekday, which limited the number of responses to 18,085, and, rather than trying to scale up to the population level, simply constructing a population with this number of people in it. The survey data were combined with data on household size and composition, school class size and workplace size for towns of a similar size to Little Italy.
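The diary format described above lends itself to a fixed-length representation of each respondent's day. The sketch below assumes a hypothetical `DiaryEntry` record holding one location label per ten-minute slot; it maps clock times to the 144 slots and keeps only complete weekday diaries, in the spirit of the weekday rule used for Little Italy. It does not reproduce the ISTAT file format or the full rule set of Iozzi et al. [2010].

```python
from dataclasses import dataclass
from typing import List

SLOTS_PER_DAY = 24 * 60 // 10  # 144 ten-minute intervals in a day


@dataclass
class DiaryEntry:
    """Hypothetical time-use diary record."""
    respondent_id: int
    household_id: int
    is_weekday: bool
    locations: List[str]  # one location-type label per ten-minute slot


def slot_index(hour, minute):
    """Map a clock time to its ten-minute diary slot (0 to 143)."""
    return hour * 6 + minute // 10


def weekday_respondents(diaries):
    """Keep only complete diaries that were filled in for a weekday."""
    return [d for d in diaries
            if d.is_weekday and len(d.locations) == SLOTS_PER_DAY]
```

For example, `slot_index(8, 35)` gives slot 51, the interval from 08:30 to 08:40.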
To generate the contact matrix, people are placed in households as defined by the household data and are then assigned to workplaces or schools throughout the day; the contacts made in each of these places give an adjacency matrix for the population. Three different methods are then used to give the final contact matrix: Type 1 weights contacts by the time spent together, Type 2 weights contacts by the number of times the pair meet in a day, and Type 3 is unweighted.
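The three weighting schemes can be illustrated on slot-by-slot location schedules. In the sketch below, each person's day is a hypothetical list of location ids (one per ten-minute slot, unique per household or workplace, with `None` for unknown), and two people are taken to be in contact in a slot when they share a known location. Counting a "meeting" for Type 2 as a run of consecutive shared slots is our interpretation, not necessarily the exact definition used by Iozzi et al. [2010].

```python
from itertools import combinations


def contact_matrices(schedules):
    """schedules: {person: [location_id or None per ten-minute slot]}.
    Returns three dicts keyed by (a, b) pairs with a < b:
      type1: number of slots spent at the same location (time together),
      type2: number of separate meetings (runs of consecutive shared slots),
      type3: 1 if the pair ever share a location (unweighted)."""
    type1, type2, type3 = {}, {}, {}
    for a, b in combinations(sorted(schedules), 2):
        # True in each slot where both people are at the same known location.
        shared = [la is not None and la == lb
                  for la, lb in zip(schedules[a], schedules[b])]
        slots_together = sum(shared)
        if slots_together == 0:
            continue  # no contact, so the pair is absent from all three matrices
        # A new meeting starts at a shared slot not preceded by a shared slot.
        meetings = sum(1 for i, s in enumerate(shared)
                       if s and (i == 0 or not shared[i - 1]))
        type1[(a, b)] = slots_together
        type2[(a, b)] = meetings
        type3[(a, b)] = 1
    return type1, type2, type3
```

Here Type 1 weights are in units of ten-minute slots; dividing by 6 would give hours together.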
The benefit of constructing a population in this manner is that the population is small, so simulation is relatively quick compared with large populations. However, the small population is also a major limitation, as it rules out the investigation of realistic control strategies. Comparison with actual epidemics is possible, and is carried out by Iozzi et al. [2010]. The Little Italy population is compared with contact structures derived from POLYMOD [Mossong et al., 2008], along with ‘Big Italy’, a contact structure that gives the average number of contacts between all possible ages in the household, workplace and general community contexts. Big Italy was generated to match data on household composition, family structure, and school, university and workplace structure, with homogeneous mixing assumed for community-level interactions.
Which method reproduces observed epidemics most accurately is then considered. However, this question is not answered satisfactorily: the spread of two infections through Italy is considered, varicella and parvovirus, and a different method gives the best fit (as measured by AIC) in each case (POLYMOD for varicella and Little Italy Type 2 for parvovirus).
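The Akaike information criterion used for this comparison is AIC = 2k − 2 ln L̂, where k is the number of fitted parameters and L̂ is the maximised likelihood; lower values indicate a better trade-off between goodness of fit and complexity. The sketch below computes AIC and ranks candidate contact structures by it. The model names, log-likelihoods and parameter counts used with it are made-up placeholders, not values from Iozzi et al. [2010].

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L_hat). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits):
    """fits: {model_name: (max_log_likelihood, n_params)}.
    Returns [(name, aic, delta_aic)] sorted from best to worst, where
    delta_aic is the difference from the best (lowest) AIC."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda t: t[1])
```

With hypothetical fits `{"A": (-100.0, 2), "B": (-98.0, 4), "C": (-99.5, 1)}`, model C ranks first with AIC 201, while A and B tie at 204: B's higher likelihood is exactly offset by its two extra parameters.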
This clearly demonstrates that contact structure can have a large impact on the spread of an epidemic, and also the difficulty of choosing the best assumptions to make for a given disease. POLYMOD, for example, gives information on the age, sex, location, duration and frequency of contacts and on whether physical contact occurred, which seems a sensible set of variables to collect, yet it fits the parvovirus outbreak worse than all of the Little Italy models and Big Italy, so it is clearly not a perfect basis for constructing contact networks. Iozzi et al. [2010] suggest that the high level of assortativity present in the Little Italy models allows a good fit for parvovirus but also prevents them from fitting varicella well. This implies that assortativity is an important property for a realistic model, but also one that is difficult to get right.
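Assortativity can be quantified in several ways. One standard choice, sketched here, is Newman's assortativity coefficient for a categorical mixing matrix, r = (Σ_i e_ii − Σ_i a_i b_i) / (1 − Σ_i a_i b_i), where e_ij is the fraction of all contacts that are between group i and group j, and a_i and b_i are the row and column sums of e. It equals 1 for perfectly assortative mixing, 0 for proportionate (random) mixing and is negative for disassortative mixing. We do not claim that Iozzi et al. [2010] use this particular measure, and the small mixing matrices used with it (for example, contacts between two age groups) are hypothetical.

```python
def assortativity(mixing):
    """Newman's assortativity coefficient for a categorical mixing matrix.
    mixing[i][j] = number of contacts between group i and group j.
    Undefined (division by zero) when all contacts fall in a single group."""
    total = sum(sum(row) for row in mixing)
    e = [[x / total for x in row] for row in mixing]  # normalise to fractions
    n = len(e)
    a = [sum(e[i][j] for j in range(n)) for i in range(n)]  # row sums
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]  # column sums
    trace = sum(e[i][i] for i in range(n))  # fraction of within-group contacts
    expected = sum(a[i] * b[i] for i in range(n))  # expected under random mixing
    return (trace - expected) / (1 - expected)
```

A model that places most contacts within the same age group, as the highly assortative Little Italy structures do, would push r towards 1.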
As mentioned previously, a large portion of my PhD was dedicated to the construction of a synthetic population for England and Wales which was done in collaboration with the Network Dynamics and Simulation Science Laboratory at Virginia Tech (NDSSL). The process for this construction is described next.