Chapter 7: Conclusion - Statistical problems in measuring surface ozone and modelling its patterns

In chapter 2, the inverse Gaussian distribution was considered as a possible alternative to the lognormal as a model for the frequency distribution of observed SO2 concentrations. It was suggested that, as the inverse Gaussian distribution has a heavier upper tail than the lognormal, its use might address the problem of underestimating the frequency of potentially damaging high concentrations of SO2. The work carried out indicated that the fit of the inverse Gaussian was even less acceptable than that of the lognormal. Given a better result at this stage, it would have been possible to improve the fit by using alternative fitting methods, such as minimisation of the Anderson-Darling test statistic considered there, but with the results obtained this was not worthwhile.
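As an illustration of the comparison described above, the following is a minimal sketch, not the code used in chapter 2. It assumes maximum likelihood fits for both distributions and uses the standard Anderson-Darling statistic to compare them. The function names and the sample values in the demonstration are hypothetical.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal with log-scale mean mu and log-scale sd sigma."""
    return _PHI((math.log(x) - mu) / sigma)


def inverse_gaussian_cdf(x, mu, lam):
    """CDF of an inverse Gaussian with mean mu and shape lam.

    Note: exp(2 * lam / mu) can overflow when lam / mu is very large.
    """
    r = math.sqrt(lam / x)
    return (_PHI(r * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _PHI(-r * (x / mu + 1.0)))


def fit_lognormal(data):
    """Maximum likelihood estimates (mu, sigma) from positive data."""
    logs = [math.log(v) for v in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def fit_inverse_gaussian(data):
    """Maximum likelihood estimates (mu, lam) from positive data."""
    n = len(data)
    mu = sum(data) / n
    lam = n / sum(1.0 / v - 1.0 / mu for v in data)
    return mu, lam


def anderson_darling(data, cdf):
    """Anderson-Darling statistic of data against a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    eps = 1e-12  # keep the logarithms finite at the extremes
    u = [min(max(cdf(v), eps), 1.0 - eps) for v in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


if __name__ == "__main__":
    # Hypothetical concentrations, for illustration only.
    sample = [3.1, 4.7, 5.2, 6.0, 7.4, 8.8, 10.5, 13.9, 19.6, 41.0]
    m, s = fit_lognormal(sample)
    mu, lam = fit_inverse_gaussian(sample)
    print("lognormal A2:", anderson_darling(sample, lambda v: lognormal_cdf(v, m, s)))
    print("inverse Gaussian A2:",
          anderson_darling(sample, lambda v: inverse_gaussian_cdf(v, mu, lam)))
```

A smaller statistic indicates a closer fit, with the upper and lower tails weighted more heavily than the centre. Fitting by minimisation, as mentioned above, would treat the statistic as an objective function of the distribution parameters rather than evaluating it at the maximum likelihood estimates.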

Chapter 3 made use of the fact that data were available from two ozone analysers recording side by side to investigate any differences between the observations obtained from the two machines. The data analysis indicated that while the differences were generally small (only 1-2 ppb), there were a number of occasions in every sequence where only one of the two machines had recorded a temporary jump in ozone levels, by as much as 40 ppb. The fact that each jump occurred on only one of the two machines, and that jumps occurred with roughly equal frequency on each, suggests that these jumps are a feature of the monitoring equipment. These jumps could have serious implications for analyses of extreme values conducted on these data, as the jumps would in many cases be considered to be the daily or hourly maxima. It would therefore be of great interest to obtain more data sets where two machines are available and investigate the occurrences of these jumps further. After removing the jumps from the data, three Box-Jenkins ARIMA type models were considered as possible models for the difference in readings between the two machines. One of these, the IMA(1,1) model, was shown to have a sound theoretical basis for selection, and fitted the data reasonably well. This approach could be extended by the use of a similar model, but with a non-Gaussian random shock distribution, as the residuals obtained after model fitting were shown to be from a distribution with slightly heavier tails than the normal. The t-distribution is an obvious choice, and this would be the logical next step in any extension of this work.
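The suggested extension can be sketched as follows. This is an illustrative sketch only: it assumes the Box-Jenkins sign convention y_t = y_{t-1} + a_t - theta * a_{t-1}, a known value of theta, and a starting level of zero. The function names, the value of theta and the degrees of freedom are hypothetical choices, not values from chapter 3.

```python
import math
import random


def student_t(rng, df):
    """Draw one t-distributed shock as a normal over a scaled chi-square."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulate_ima11(n, theta, shock, start=0.0):
    """Simulate y_t = y_{t-1} + a_t - theta * a_{t-1} with shocks from shock()."""
    y, prev_a = start, 0.0
    series = []
    for _ in range(n):
        a = shock()
        y = y + a - theta * prev_a
        series.append(y)
        prev_a = a
    return series


def ima11_residuals(series, theta, start=0.0):
    """Recover the shocks a_t = (y_t - y_{t-1}) + theta * a_{t-1} for a given theta."""
    residuals = []
    prev_y, prev_a = start, 0.0
    for y in series:
        a = (y - prev_y) + theta * prev_a
        residuals.append(a)
        prev_y, prev_a = y, a
    return residuals


if __name__ == "__main__":
    rng = random.Random(0)
    diffs = simulate_ima11(500, theta=0.6, shock=lambda: student_t(rng, 5))
    res = ima11_residuals(diffs, theta=0.6)
    print("largest absolute residual:", max(abs(r) for r in res))
```

In practice theta and the degrees of freedom would be estimated from the data, for example by maximising the t likelihood of the recovered residuals. The sketch shows only the model structure and the residual recursion on which such a fit would rely.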

Chapter 4 began the largest topic of this thesis, the analysis of the behaviour of ozone levels over a day. The first step was to consider the average diurnal cycle of ozone for each month of a year, and to investigate the variation in shape and level of these curves. This chapter also introduced the concept of distorting the time axis for each day of the year such that sunrise, sunset, midday and midnight occurred at the same (transformed) time each day. This approach was shown to make comparison of the behaviour of ozone in a day over the year simpler, as any effects due to the position of the sun in the sky (notably solar radiation and temperature) are not masked by changes in sunrise and sunset times over the year. The monthly average diurnal curves calculated after this time distortion was carried out were shown to have similar properties, with the daytime and night-time sections being quite distinct in each case. The reduced parameter version of the polynomial model was shown to provide a reasonable description of these curves from month to month.
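One simple way to realise such a time distortion is a piecewise-linear map between anchor times, sketched below. This is an assumption made for illustration and is not necessarily the transformation used in chapter 4. The target times of 06:00, 12:00 and 18:00 for sunrise, midday and sunset, the default of taking midday as the midpoint of sunrise and sunset, and the function name are all hypothetical.

```python
def warp_time(t, sunrise, sunset, midday=None,
              targets=(0.0, 6.0, 12.0, 18.0, 24.0)):
    """Map clock time t (hours, 0 to 24) onto a transformed time axis.

    Midnight, sunrise, midday, sunset and the following midnight are sent
    to the fixed values in targets, with linear interpolation in between.
    """
    if midday is None:
        midday = 0.5 * (sunrise + sunset)  # assumed definition of midday
    anchors = (0.0, sunrise, midday, sunset, 24.0)
    if any(a >= b for a, b in zip(anchors, anchors[1:])):
        raise ValueError("anchor times must be strictly increasing")
    if not 0.0 <= t <= 24.0:
        raise ValueError("t must lie between 0 and 24 hours")
    for k in range(len(anchors) - 1):
        a0, a1 = anchors[k], anchors[k + 1]
        if a0 <= t <= a1:
            b0, b1 = targets[k], targets[k + 1]
            return b0 + (t - a0) * (b1 - b0) / (a1 - a0)
    raise AssertionError("unreachable: t lies within the anchors")


if __name__ == "__main__":
    # Hypothetical summer and winter days (times in decimal hours).
    for label, rise, fall in (("summer", 4.5, 21.5), ("winter", 8.5, 15.5)):
        print(label, [round(warp_time(h, rise, fall), 2) for h in (3, 9, 12, 15, 21)])
```

Under this map an observation taken at sunrise always lands at the same transformed time, whatever the season, so curves from different months can be compared point by point.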

The diurnal behaviour of ozone was further investigated in chapter 5. The difference in average diurnal cycle from month to month over a year was hypothesised to be due to the existence of several typical types of ozone day, representing different weather conditions, whose relative frequencies change over a year. The work carried out in chapter 5 sought to identify these types by the use of a classification methodology treating each ozone day as an object for classification. Following the work carried out in chapter 4, it was decided that the use of the distorted time axis was desirable when considering days from different times of the year together. It was also decided after the work in chapter 4 to consider the whole day, daytime and night-time curves from each day separately. The classification methodology was then used and identified two types of whole day curve, four types of daytime curve and five types of night-time curve. The behaviour of ozone over the daytime, night-time or whole day curve was shown to differ in level and/or spread according to the basic type to which the curve had been assigned. The distribution of weather types during the day, night or whole day was also shown to be significantly different over types. It was also shown to be possible to explain much of the variability observed in the monthly average curves considered in chapter 4 by using the frequency of types of whole day observed in each month. These facts indicate that the different types of curve discovered by the classification method can be related to differences in physical processes and that they can reasonably be considered to have physical meaning. As further validation of this approach, data from another site were considered using the same methodology, and the resulting whole day shapes were found to be very similar. This provides further evidence that the types discovered represent differing physical processes.

There are many possibilities for further work using this classification methodology. First, data from many other sites could be analysed. If similar types are discovered in the data from these, then further investigation of the behaviour of ozone within each of these could prove useful for understanding the different processes governing ozone production and depletion. Each type could possibly be related to a different set of physical processes operating. Modelling ozone behaviour within the same type could be made simpler, as each curve would be generated by similar processes. The relative frequencies of the types over time or different sites could also provide valuable insights into the changes in physical processes governing ozone levels.

The classification methodology introduced could also prove useful when analysing data from other areas. The primary case would be when data are in the form of objects consisting of ordered observations. The 'area between two curves' dissimilarity measure has been shown in chapter 6 to have desirable theoretical properties, and takes account of any such ordering, unlike most measures of dissimilarity commonly used. The importance of considering the ordering was illustrated in section 6.1, where examples are given where ignoring the time ordering by using a standard dissimilarity measure leads to very different conclusions.
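The role of ordering can be illustrated with a minimal sketch. It assumes that each curve is observed at the same ordered times and is joined by straight lines between observations, so the area between two curves is the integral of the absolute difference. The exact definition used in chapter 6 may differ, and the function names and example values are hypothetical.

```python
import math


def area_between(times, f, g):
    """Area between two piecewise-linear curves observed at common times."""
    if not (len(times) == len(f) == len(g)) or len(times) < 2:
        raise ValueError("need at least two common observation times")
    total = 0.0
    for k in range(len(times) - 1):
        h = times[k + 1] - times[k]
        if h <= 0:
            raise ValueError("times must be strictly increasing")
        d0 = f[k] - g[k]
        d1 = f[k + 1] - g[k + 1]
        if d0 * d1 >= 0:
            # No crossing in this interval: a single trapezium.
            total += h * (abs(d0) + abs(d1)) / 2.0
        else:
            # Curves cross: two triangles meeting at the crossing point.
            total += h * (d0 * d0 + d1 * d1) / (2.0 * (abs(d0) + abs(d1)))
    return total


def euclidean(f, g):
    """Standard Euclidean distance, which ignores the order of observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


if __name__ == "__main__":
    times = [0, 1, 2, 3]
    flat = [0, 0, 0, 0]
    oscillating = [1, -1, 1, -1]   # crosses the flat curve in every interval
    shifted = [1, 1, -1, -1]       # same values in a different time order
    for name, curve in (("oscillating", oscillating), ("shifted", shifted)):
        print(name, euclidean(curve, flat), area_between(times, curve, flat))
```

The two example curves contain the same values in a different time order, so their Euclidean distances from the flat curve are identical, while the areas differ because the oscillating curve crosses the flat curve more often.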

In document Statistical problems in measuring surface ozone and modelling its patterns (Page 186-189)