Model Validation and Testing
6.3 Model Validation - AMARC2 (ppt.mp30)
Despite the possible limitation of categorising rainfall into one of three classes (dry, average, wet), it is evident from the correlations between modelled and observed data, significant at the 0.975 level, that the AMARC1 model agrees well with the relevant EMIUB1 data at the most stringent level of model validation, at both national and zone-scale resolutions. Although this level of correlation could be considered acceptable for model validation (being significant at the 0.975 level in all cases), there is clearly scope to improve the accuracy with which the agent model represents the reality recorded by the EMIUB data.
The central component missing from the AMARC1 model is the demographic change that affects the population of each zone. As a further means of model validation, the next version of the model, AMARC2, uses the same model processes as AMARC1 but introduces birth as an additional demographic process. The results of AMARC2 can then be compared with the full set of EMIUB data, which also includes those individuals surveyed in 2000 but born after 1970 (referred to herein as EMIUB2). By permitting birth within the AMARC2 model, the population of each zone increases each year as more agents are born and reach the age at which they are deemed to make their own migration decisions.
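The birth process described above can be illustrated with a minimal Python sketch. It assumes a single per-capita annual birth probability and a fixed age at which agents begin to make their own migration decisions; both values (`BIRTH_RATE`, `DECISION_AGE`) and all function names are hypothetical, since the text does not give the parameters AMARC2 actually uses. As in AMARC2, there is no death step, so the population can only grow.

```python
import random
from dataclasses import dataclass

# Hypothetical parameters: AMARC2's actual values are not given in the text.
BIRTH_RATE = 0.04    # assumed annual probability that an agent produces a newborn
DECISION_AGE = 15    # assumed age at which an agent makes its own migration decisions


@dataclass
class Agent:
    age: int


def step_year(agents, birth_rate, rng):
    """Advance one year: age every agent, then add newborn agents.

    There is deliberately no death step, mirroring AMARC2.
    """
    for agent in agents:
        agent.age += 1
    # Count births before adding newborns so that this year's newborns cannot give birth.
    n_births = sum(1 for _ in agents if rng.random() < birth_rate)
    agents.extend(Agent(age=0) for _ in range(n_births))
    return agents


def decision_makers(agents, decision_age):
    """Agents old enough to make their own migration decisions."""
    return [a for a in agents if a.age >= decision_age]


if __name__ == "__main__":
    rng = random.Random(42)
    population = [Agent(age=rng.randint(0, 60)) for _ in range(1000)]
    for _ in range(1970, 2000):  # one step per year of the validation period
        step_year(population, BIRTH_RATE, rng)
    print(len(population), len(decision_makers(population, DECISION_AGE)))
```

Because newborns only join the decision-making population once they reach `DECISION_AGE`, the number of potential migrants lags behind total population growth in this sketch.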
At this second stage of model validation, full demographic processes cannot be included in the ABM because the EMIUB2 data contains no information on the death of individuals. This results from the retrospective nature of the survey: for an individual's migration history to be recorded, that individual had to be alive and present in 2000 for interview. As a result, the AMARC2 model includes no function for agent death. Agents in the model are located in random networks in which each agent has, on average, 50 networked peers, and agents use weight matrices in making their migration decisions.
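The text does not say how the random networks are generated. One common choice, assumed here, is an Erdős–Rényi graph in which each pair of agents is linked independently with probability p = k/(N − 1), giving an expected mean degree of k; with k = 50 this reproduces the average of 50 networked peers. The sketch below (function name hypothetical) builds such a network for illustration only and does not model the weight matrices used in the migration decision.

```python
import random


def random_network(n_agents, mean_degree, seed=None):
    """Erdős–Rényi G(n, p) network with expected mean degree `mean_degree`.

    Returns a dict mapping each agent index to the set of its peers.
    The O(n^2) pair loop is fine for a sketch but slow for very large populations.
    """
    if n_agents < 2:
        raise ValueError("need at least two agents")
    p = mean_degree / (n_agents - 1)
    if not 0.0 <= p <= 1.0:
        raise ValueError("mean_degree must lie between 0 and n_agents - 1")
    rng = random.Random(seed)
    peers = {i: set() for i in range(n_agents)}
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            if rng.random() < p:
                peers[i].add(j)  # links are undirected, so record both ends
                peers[j].add(i)
    return peers


if __name__ == "__main__":
    net = random_network(1000, 50, seed=1)
    print(sum(len(s) for s in net.values()) / len(net))  # close to 50
```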
Table 3 of Appendix 5 displays the EMIUB2 migration data for all surveyed individuals between 1970 and 1999, including those born after 1970. The EMIUB2 data records total migration from all zones in Burkina Faso rising from 121 migrants in 1970 to 482 in 1999. The distribution of this overall migrant flow reveals that, in 1999, the zone from which the largest number of migrants (140) originated was Sahel. In both 1970 and 1999 the EMIUB2 data records Ouagadougou as the zone with the lowest contribution (16 and 60 migrants respectively).
Table 4 of Appendix 5 displays the results averaged over five runs of the AMARC2 model between 1970 and 1999. These five-run averaged outputs record total migration from all zones in Burkina Faso rising from 71 migrants in 1970 to 409 in 1999. This compares well with the EMIUB2 observed total migration data, which shows a low of 95 migrants in 1971 and a high of 482 in 1999. The distribution of the overall migrant flow modelled by AMARC2 reveals that, in 1999, the zone from which the largest number of migrants originated was Sahel (140 individuals). This again corresponds well with the EMIUB2 data, which also shows Sahel as the zone contributing the most migrants to the total flow, despite the larger overall population of Centre.
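The five-run averaging behind Table 4 can be sketched as an element-wise mean of the annual migrant counts from each run. The text does not state what the error bars in Figure 6.7 represent; this sketch assumes plus or minus one sample standard deviation across runs, and the function name and toy inputs are hypothetical rather than AMARC2 output.

```python
import statistics


def average_runs(runs):
    """Average annual migrant counts over several model runs.

    `runs` is a list of equal-length lists, one per run, each holding one
    count per year. Returns (means, sds): the per-year mean and the per-year
    sample standard deviation across runs (assumed basis for error bars).
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    if len({len(r) for r in runs}) != 1:
        raise ValueError("all runs must cover the same years")
    per_year = list(zip(*runs))  # regroup the values year by year
    means = [statistics.mean(values) for values in per_year]
    sds = [statistics.stdev(values) for values in per_year]
    return means, sds


if __name__ == "__main__":
    # Toy values for two years from five hypothetical runs.
    toy_runs = [[10, 20], [12, 22], [11, 21], [9, 19], [13, 23]]
    print(average_runs(toy_runs))
```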
The two lowest migration flows modelled by AMARC2 originate in Ouagadougou and Southwest, with flows in 1999 of 46 and 56 respectively. This again corresponds well with the observed trend, as the smallest EMIUB2 migration flows in 1999 also originate in Ouagadougou and Southwest, with 60 and 62 migrants respectively. To investigate further the similarities between EMIUB2 observed and AMARC2 modelled migration flows, Figure 6.7 displays the EMIUB2 migration record for all surveyed individuals between 1970 and 1999 (including those born after 1970) alongside the equivalent data averaged from five runs of the AMARC2 model.
Figure 6.7: Total number of migrants leaving all zones each year from 1970-1999 in the EMIUB2 data and the AMARC2 model. Correlation coefficient for the two sets of data = 0.94. RMSD = 43 (12.7%).
Figure 6.7 shows a considerably better correlation (correlation coefficient of 0.94) between the AMARC2 model output and the EMIUB2 data than was seen in Figure 6.1 between AMARC1 and EMIUB1. The scale of the migration flows modelled by AMARC2 is similar to that of the EMIUB2 data, with the highest migrant numbers both modelled and surveyed occurring in 1999 (409 AMARC2 migrants compared with 482 in the EMIUB2 data).
The five-run error bars shown in Figure 6.7 also confirm that the natural variation between model runs is not large enough to alter the relationship between observed and modelled data. It therefore appears that adding the demographic component of birth to both the EMIUB data record and the AMARC model produces a closer statistical relationship between modelled and observed data. A correlation coefficient of 0.94 is roughly double that required for significance at the 0.995 level over the 30 data points from 1970 to 1999. Two clear peaks in the EMIUB2 observed migration flow are evident in Figure 6.7, in 1980 and 1990. Neither of these peaks is replicated by the AMARC2 modelled flow of migrants. Because the AMARC models aim to elucidate the role of changes in rainfall variability in migration in Burkina Faso, it is proposed that these peaks in observed migration result from historical events that an analysis focused on rainfall would not capture. Focus group participants mentioned the discovery of gold at Essakan in northern Burkina Faso as an historical event that triggered a considerable shift in the usual flows of migrants in the country.
Although gold was first discovered at Essakan in 1985, mining activities did not start at the site until some time later, making it difficult to determine when any resulting change in migration flows may have occurred. However, the 1980 peak in migration flows may be more easily attributed, although not conclusively, to the workers' strikes across the country that led up to the overthrow of President Lamizana and that perhaps increased labour migration.
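The statement that a correlation coefficient of 0.94 is double the value needed for 0.995 significance over 30 data points can be checked from the sampling distribution of the correlation coefficient under the null hypothesis of no correlation. This sketch assumes that "0.995 significance" denotes a one-tailed test at α = 0.005 and that the usual bivariate-normal assumptions hold. It integrates the null density of r numerically (Simpson's rule) and finds the critical value by bisection, using only the Python standard library; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_upper_tail(r0, n, steps=2000):
    """P(R >= r0) when the true correlation is zero (bivariate normal data).

    Integrates the null density f(r) = (1 - r^2)^((n - 4) / 2) / B(1/2, (n - 2) / 2)
    from r0 to 1 with Simpson's rule; `steps` must be even.
    """
    if n < 4:
        raise ValueError("need at least 4 data points")
    k = (n - 4) / 2
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)

    def density(r):
        if r * r >= 1.0:
            return 0.0
        return math.exp(k * math.log1p(-r * r) - log_beta)

    h = (1.0 - r0) / steps
    total = density(r0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(r0 + i * h)
    return total * h / 3


def critical_r(n, alpha):
    """Critical r for a one-tailed test at level alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if r_upper_tail(mid, n) > alpha:
            lo = mid  # tail probability still too large: threshold lies higher
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    r_crit = critical_r(30, 0.005)
    print(round(r_crit, 3), round(0.94 / r_crit, 2))
```

For n = 30 and α = 0.005 the critical value comes out at about 0.46 (equivalent to t ≈ 2.76 with 28 degrees of freedom), so 0.94 is roughly twice the threshold; α = 0.025, corresponding to a 0.975 significance level, gives about 0.36.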
As with the results of the AMARC1 model, it is useful to investigate the EMIUB2 data and AMARC2 outputs in more detail. Figures 6.8 to 6.12 compare the numbers of migrants leaving each zone as recorded by EMIUB2 and as modelled by AMARC2 (averaged over five runs).
Figure 6.8: … AMARC2 model. Correlation coefficient for the two sets of data = 0.89. RMSD = 11 (19.7%).
Figure 6.9: Numbers of migrants leaving Bobo Dioulasso each year from 1970-1999 in the EMIUB2 data and in the AMARC2 model. Correlation coefficient for the two sets of data = 0.93. RMSD = 8 (9.9%).
The AMARC2 modelled flows of migrants from each of the five origin zones all show similarly good agreement with the EMIUB2 data. The lowest correlation between observed and modelled data is for the Southwest zone, with a coefficient of 0.82, which is still comfortably significant at the 0.995 level. The highest correlation is for Bobo Dioulasso, where observed and modelled flows produce a correlation coefficient of 0.94.
Comparisons of observed and modelled data using EMIUB2 and AMARC2 show considerable improvements in correlation over the comparison of EMIUB1 and AMARC1 results. The peak in observed migration seen in the total migrant flows of Figure 6.7 is replicated in departures from all zones except Southwest, further suggesting some non-rainfall-related factor that increased migration almost nationwide.
Another means of assessing the ability of the modelled results to replicate the observed data is the root-mean-square deviation (RMSD). The lowest deviation is for Bobo Dioulasso, with an RMSD of 8 (9.9%) for the five-run averaged output. By contrast, the highest deviation is for Centre, at 19 (26.1%), indicating relatively large individual differences between observed and modelled data points. For the total AMARC2 modelled migration, aggregated across all zones, the RMSD is 43 (12.7%). There is no absolute criterion for a “good” RMSD value, but this baseline provides a useful platform from which to compare later model performance under different structural arrangements.
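RMSD is the square root of the mean squared difference between observed and modelled annual values. The text does not define the percentage quoted alongside each RMSD; this sketch assumes it is the RMSD expressed as a percentage of the mean observed value, which is one common normalisation. The function names and toy series are hypothetical.

```python
import math


def rmsd(observed, modelled):
    """Root-mean-square deviation between two equal-length series."""
    if not observed or len(observed) != len(modelled):
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - m) ** 2 for o, m in zip(observed, modelled)]
    return math.sqrt(sum(squared) / len(squared))


def rmsd_percent(observed, modelled):
    """RMSD as a percentage of the mean observed value.

    ASSUMPTION: the normalisation used in the text is not stated; dividing by
    the observed mean is used here for illustration only.
    """
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmsd(observed, modelled) / mean_obs


if __name__ == "__main__":
    obs = [10, 20, 30]  # toy values, not EMIUB2 data
    mod = [12, 18, 33]
    print(round(rmsd(obs, mod), 2), round(rmsd_percent(obs, mod), 1))  # prints 2.38 11.9
```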
Correlation coefficients calculated for the AMARC2 model results indicate that the model accurately replicates the migratory trends in the EMIUB2 data across all five model zones. The RMSD values, however, suggest that although the general trends are accurately portrayed, the model is weaker at generating the level of annual variation in migration flows recorded by EMIUB2. For example, as displayed in Figure 6.11, observed migration from Centre varies greatly from year to year while maintaining a general pattern of increasing migration over the thirty-year validation period. While the AMARC2 modelled migration follows the same increasing pattern, it does not replicate the year-to-year variation of the observed data to the same extent, resulting in the relatively high RMSD value. However, it would be unrealistic to expect AMARC2 to capture the full extent of the observed variation, given the nature of both the model and this research, which aims to elucidate the role of changes in rainfall variability in the migration decision rather than to capture the full migration decision. In reality, that decision is influenced by numerous interacting socio-economic factors, not all of which are affected by rainfall.
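The distinction drawn above, between replicating a trend (high correlation) and replicating year-to-year variation (low RMSD), can be illustrated with synthetic data. In this hypothetical example the "observed" series is a linear trend plus an alternating ±15 fluctuation, while the "modelled" series follows the trend alone: the correlation stays close to 1, yet the RMSD equals the full size of the fluctuation the model misses. The series are invented for illustration and are not EMIUB2 or AMARC2 values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsd(x, y):
    """Root-mean-square deviation between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Synthetic 30-year series (hypothetical).
years = range(30)
trend = [100 + 10 * t for t in years]
observed = [v + (15 if t % 2 else -15) for t, v in zip(years, trend)]
modelled = trend  # captures the trend but none of the annual fluctuation

print(round(pearson_r(observed, modelled), 3), rmsd(observed, modelled))  # prints 0.986 15.0
```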