Model Validation and Testing
6.2 Model Validation - AMARC1 (ppt.mp30)
Several versions of the AMARC model have been produced for use at different stages of the validation and analysis in this research. The first of these, AMARC1, initialises with all 4,449 individuals for whom 1970 data are available in the EMIUB. In order to perform the most stringent validation test possible, the model is run with only these 4,449 agents present until its completion in 1999. The EMIUB dataset provides migration history data for these individuals up to 1999, permitting direct comparison of the model with the record, herein referred to as EMIUB1. As such, no agent birth or death is permitted within the model throughout its thirty-year run time. Because this test requires the AMARC1 model to attempt an exact replication of the migration decisions made by 4,449 individuals over a thirty-year period, results from this level of validation should not be expected to replicate the EMIUB1 data precisely, but rather to display similar patterns of migrants leaving each model zone.
Agents in the AMARC1 model are located in random networks where each agent has, on average, fifty networked peers. Rainfall patterns between 1970 and 1999 are gained from the historical Delaware rainfall data described in Chapter 4. Dry, average and wet rainfall years are defined according to the quartile values of the rainfall data for each zone over the thirty year period in question. Migration probability values for use in the agents’ migration decisions are derived from analysis of 1970-1999 migration data while each year is assessed as average, wet, or dry on the basis of that year alone (therefore using weight matrices ).
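The quartile-based classification of rainfall years described above can be illustrated with a minimal sketch. The thresholds used here are an assumption: a year is treated as dry when its rainfall falls below the lower quartile of the zone's series and as wet when it exceeds the upper quartile, with everything in between labelled average. The function name and the rainfall values in the usage note are hypothetical and are not taken from the Delaware data.

```python
import statistics


def classify_rainfall_years(rainfall_by_year):
    """Label each year 'dry', 'average' or 'wet' for a single zone.

    Assumption (not confirmed by the text): dry means below the lower
    quartile of the zone's series, wet means above the upper quartile.
    """
    values = list(rainfall_by_year.values())
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")

    labels = {}
    for year, rainfall_mm in rainfall_by_year.items():
        if rainfall_mm < q1:
            labels[year] = "dry"
        elif rainfall_mm > q3:
            labels[year] = "wet"
        else:
            labels[year] = "average"
    return labels
```

Applied to an invented five-year series such as `{1970: 100, 1971: 200, 1972: 300, 1973: 400, 1974: 500}`, the sketch labels the first year dry, the last year wet and the remaining three average. In the model itself the quartiles would be taken over the full thirty-year series for each zone.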
Table 1 of Appendix 5 displays the observed EMIUB1 migration flow data used for validation.
The average annual number of migrants from all five zones over the 30 year period from 1970 to 1999 is recorded by the EMIUB1 data as 191, or 4.30% of the total population, ranging from a low of 104 migrants (2.34%) in 1971 to a high of 284 (6.38%) in 1985. The largest number of these migrants (annual average of 45) originates in the most populated zone, Centre. However, when considered as a percentage of the population of each zone, Centre represents the zone with the lowest migration rate (annual average rate of 3.27%). When viewed as a percentage migration rate, the origin location with the highest rate of migration is Southwest (annual average of 4.68%). The year with the highest overall migration, 1985, sees a contrasting pattern with the highest migration rate that year from Sahel (7.46% of zone population) and lowest from Southwest (3.79% of zone population).
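The percentage rates quoted above follow from dividing each migrant count by the fixed population of 4,449 agents. The short sketch below reproduces that arithmetic for the national totals; the function name is illustrative only. Using the rounded mean of 191 migrants gives 4.29% rather than the 4.30% quoted, which presumably reflects an unrounded mean in the underlying table (an assumption, since the unrounded figure is not given here).

```python
POPULATION_1970 = 4449  # agents initialised into AMARC1


def migration_rate_percent(migrants, population=POPULATION_1970):
    """Return migrants as a percentage of the population."""
    return 100.0 * migrants / population


# National EMIUB1 figures quoted in the text.
for label, migrants in [("mean 1970-1999", 191), ("1971 low", 104), ("1985 high", 284)]:
    print(f"{label}: {migrants} migrants = {migration_rate_percent(migrants):.2f}%")
```

Zone-level rates are obtained in the same way, with the zone population replacing the national total as the denominator.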
Table 2 of Appendix 5 displays the averaged results of five runs of the AMARC1 model over the period 1970 to 1999. The average annual number of AMARC1 modelled migrants from all five zones over the 30 year period from 1970 to 1999 is 244 (5.48%), compared to the EMIUB1 total of 191 (4.30%). The largest number of these migrants (annual average of 57) again originates in the most populated zone, Centre. Again, when considered as a percentage of the population of each zone, and in accordance with the EMIUB1 data, Centre displays the lowest migration rate (annual average rate of 4.15%). The location modelled as having the highest migration rate (annual average of 6.21%) is Ouagadougou, although both Bobo Dioulasso and Southwest also show modelled migration rates of over 6%. The AMARC1 model zone with the lowest absolute number of total migrants is Southwest (annual average of 38). Total migration peaks in the model data in 1990 at 373 migrants, 5 years later than witnessed in the EMIUB1.
The highest absolute migration modelled in 1990 originates from Bobo Dioulasso with 88 migrants (9.82%) although the highest migration rate modelled that year originates from Ouagadougou at 10.64% (70 migrants). The EMIUB1 data year with the highest total migrants, 1985 (284 migrants across all zones), is modelled by AMARC1 as a year with a five model run average of 309 migrants. Of these, most (72 individuals) originate in Centre although the highest migration rate is modelled as occurring in Southwest (8.78% of total population).
When the migration histories of the 4,449 agents recorded as alive in 1970 by the EMIUB1 data are followed both in the observed data and in the AMARC1 model, some discrepancies inevitably occur. These discrepancies are, to some extent, to be anticipated because the AMARC1 model is designed to isolate rainfall as the primary motive for migration. Within the EMIUB1 data, individuals will be migrating for numerous interconnected reasons. By contrast, the AMARC1 model uses rainfall as the only precursor to migration in its attempt to isolate and assess its role in the migration decision. This most stringent means of testing and validating the AMARC models against the observed data is best considered in terms of the general trend in the overall flow of migrants from the model and its comparison to the EMIUB1 data. Figure 6.1 displays the overall migration trend from all zones, both from the EMIUB1 data and as a mean of five runs of the AMARC1 model.
Figure 6.1: Total number of individuals alive in 1970 migrating from all zones each year from 1970-1999 in the EMIUB1 Data and the AMARC1 model five run mean. Correlation coefficient = 0.65.
[Figure 6.1 plot. Vertical axis: number of migrants (0 to 400). Series: AMARC1 Modelled, EMIUB1 Observed.]
As can be seen from Figure 6.1, there is considerable agreement between the EMIUB1 record of the 1970-1999 migration history of the 4,449 agents initialised into the model at start-up and the five-run mean AMARC1 modelled migration for the same individuals over the same period. The correlation coefficient of the two sets of data is 0.65 which, over the 30 data points, gives a significance value of greater than 0.995. Furthermore, the variation between the migration flows modelled by each of the five runs of the AMARC1 model is seen, from the small error bars, to be minimal: the difference between the minimum and maximum five-run modelled migration flows does not alter the relationship evident between observed and modelled data.
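The step from a correlation coefficient to a significance level can be sketched as follows. This is an illustration under stated assumptions, not the procedure documented for this research: it assumes a Student t test on Pearson's r with n - 2 degrees of freedom, and it compares the statistic against standard tabulated t quantiles for 28 degrees of freedom. Whether a one-tailed or two-tailed convention was used in the original analysis is not stated, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Standard tabulated Student t quantiles for 28 degrees of freedom.
T_QUANTILES_DF28 = {0.975: 2.048, 0.995: 2.763}


def highest_quantile_exceeded(r, n=30):
    """Return the largest tabulated quantile whose critical t is exceeded.

    Only valid for n = 30, because the table above is for 28 degrees
    of freedom. Returns None when no tabulated quantile is exceeded.
    """
    t = t_statistic(r, n)
    passed = [q for q, critical in T_QUANTILES_DF28.items() if t > critical]
    return max(passed) if passed else None
```

For r = 0.65 and n = 30 the statistic is approximately 4.53, which exceeds the 0.995 quantile of 2.763 and is therefore consistent with the significance level reported above. With the yearly observed and modelled totals held as two lists, `pearson_r` would supply the value of r passed to `t_statistic`.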
It can also be seen from Figure 6.1 however that, while the EMIUB1 recorded migration peaks in 1985 at 284, the year of greatest AMARC1 modelled migration, 1990, is recorded by EMIUB1 as having the third highest number of migrants, 273. As a result, although from 1981 onwards modelled migration is consistently higher than that observed, the general patterns of migration shown are similar, resulting in the significant correlation value. The fact that both observed and modelled migration rates decrease after 1990 is likely due to the ageing of individuals towards 1999. As this most stringent form of model validation allows no birth or death of individuals, either in the data or the model, the population in question ages towards 1999 and therefore becomes less likely to migrate as a result of the general tendency of migration in Burkina Faso being the undertaking of younger generations, of whom there are increasingly few.
It is evident therefore that, at this most stringent level of validation, the AMARC1 model is, without the input of any demographic change other than ageing, able to simulate the migration history of the 4,449 agents initialised into the model relatively accurately. As can be expected however, differences between the observed and modelled data are evident. These can be largely attributed to the efforts made in the construction of the model to focus upon rainfall as the primary driver of migration when, in truth, the migration trends seen in the EMIUB1 data will have been motivated by multiple characteristics and circumstances of the physical, social and financial environment in which individuals find themselves. If a large number of people in the EMIUB1 data migrated in a particular year as a result of, for example, conflict in neighbouring Côte d’Ivoire or the discovery of gold in Essakan, 330 km north of Ouagadougou (both events that occurred between 1970 and 1999), this would not be reflected by the AMARC1 model. As such migration is unlikely to be linked in any significant way to rainfall, but will have been recorded by the EMIUB survey, the use of the EMIUB1 data as the migration record from which behavioural attitude values in the model are derived means that migration trends seen in the data that do not correlate with variations in rainfall will not be reflected in the model. Although using a dataset that records all migration, not just that caused by rainfall, over the validation period may seem illogical, it is this data that has shaped the development of the AMARC models. As a result, the model does not attempt to elucidate the precise quantitative impact of rainfall on migration but is intended to investigate the large-scale influence of changes in rainfall upon migration in general. Although many migrations within and from Burkina Faso for a variety of motives may be affected to some degree by rainfall, a non-rainfall contributory factor such as political activism or the discovery of gold is beyond the scope of the model and can only be identified as a point of contrast.
In order to look in more detail at the total migration figures displayed above, the migration flows from each origin location can be considered. Figures 6.2 to 6.6 display the total migration flows from each origin location according to both the EMIUB1 data and the AMARC1 model.
[Zone-level figure plots. Horizontal axis: year, 1970 to 1998. Legend includes AMARC1.]
[...] AMARC1 model. Correlation coefficient for the two sets of data = 0.51.
Figure 6.3: Number of migrants leaving Bobo Dioulasso each year from 1970-1999 in the EMIUB1 data and in the AMARC1 model. Correlation coefficient for the two sets of data = 0.44.
All of the modelled flows of migrants from the five origin locations within AMARC1 (Figures 6.2 to 6.6) are similar to those observed in the EMIUB1 data, giving rise to the significant correlation values across all zones. The most notable difference between observed and modelled flows is evident in Figure 6.3, in the case of departures from Bobo Dioulasso. In this example, modelled flows are clearly higher than observed from 1986 onwards and fluctuate markedly over two or three year periods to the end of the simulation. Such an effect is likely to result from the manner in which rainfall has been categorised in this research. According to the dry, average, wet classification system used, rainfall in Bobo Dioulasso fluctuates regularly from 1986 onwards, with modelled peaks in migration tending to align with average rainfall years and modelled troughs aligning with wet years.
Despite the possible limitation of categorising rainfall as within one of three classes (dry, average, wet), it is evident from the 0.975 significance between modelled and observed data that the AMARC1 model shows a good amount of agreement with the relevant EMIUB1 data at the most stringent level of model validation, at both national- and zone-scale resolutions. Although this level of correlation could be considered acceptable as a means of model validation (being at least 0.975 significant in all cases), there is evidently potential for improvement in the scale of accuracy with which the agent model represents the reality provided by the EMIUB record.