3. Description of Monitoring Campaigns and Data
3.7. Preparing Monitored Particle Concentrations for Analyses
Outlier Removal
The monitored datasets described in this chapter showed some very high concentration values at minute level, for example during cooking events or in traffic situations. High concentrations, even during short time periods, contribute to the adverse health effects caused by particle exposure. On the other hand, high values can pose a mathematical problem, as they can influence, for example, an hourly mean concentration or the fit of a model when compared to monitoring data. To avoid these effects, high (or low) concentrations that are very different from the rest of the data distribution should therefore be removed as outliers.
As preparation for the analyses in this thesis it was decided to remove unusual values. Outlier removal was undertaken at minute level to a) allow for a more precise removal of single values, and b) avoid unnecessary removal of data points. There are several common methods for outlier removal, such as Chauvenet's criterion or Grubbs' test. These methods, however, assume that a dataset is normally distributed, an assumption that is not met by the monitored data of this thesis. Some other methods focus on removing single outliers, whereas the datasets collected for this thesis frequently showed several very high values. A generic removal of extreme concentrations is sometimes practised, such as removal of concentrations outside a range of twice the IQR around the mean. This approach is, however, not feasible for the datasets of this thesis, as it would remove many high concentrations during and after cooking, as well as peak concentrations in traffic. The best remaining method was the removal of outliers by visual observation using box plots, as suggested by Harris & Jarvis (2011). This method relies on visually identifying gaps between the majority of data points and data points that lie far above or below most other values. Removal of outliers for this thesis was based on this principle of visual observation. To ensure consistency within this thesis, the following protocol was followed:
• Concentrations within twice the IQR of a dataset are never removed as outliers.
• For concentrations above two times the IQR (no low extreme values existed in any of the datasets) the following rules were observed:
o For datasets with a range below 1000 µg/m3: If the distance between two adjacent data values (in ascending order) is more than 10% of the dataset’s range, all minute concentrations above this ‘gap’ in the dataset are removed.
o For datasets with a range above 1000 µg/m3: If the distance between two adjacent data values (in ascending order) is more than 5% of the dataset’s range, all minute concentrations above this ‘gap’ in the dataset are removed.
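The protocol above can be sketched in code as follows. This is a minimal illustration and not the procedure actually used for the thesis, which was based on visual inspection of box plots. The function name `remove_high_outliers` is hypothetical, and the reading of "twice the IQR" as an upper fence at Q3 + 2 × IQR is an assumption, as the text does not state the reference point.

```python
import numpy as np

def remove_high_outliers(values, iqr_factor=2.0):
    """Split minute concentrations into (kept, removed), both sorted ascending."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, q3 = np.percentile(x, [25, 75])
    # Assumption: "within twice the IQR" is read as an upper fence at Q3 + 2 * IQR.
    fence = q3 + iqr_factor * (q3 - q1)
    data_range = x[-1] - x[0]
    # 10% of the range for datasets with a range below 1000 µg/m3, otherwise 5%.
    min_gap = (0.10 if data_range < 1000.0 else 0.05) * data_range
    gaps = np.diff(x)  # gaps[i] is the distance between x[i] and x[i + 1]
    # A gap qualifies only if the value directly above it lies beyond the fence,
    # so values inside the fence are never removed.
    candidates = np.where((gaps > min_gap) & (x[1:] > fence))[0]
    if candidates.size == 0:
        return x, np.array([])
    cut = candidates[0] + 1  # index of the first value above the lowest qualifying gap
    return x[:cut], x[cut:]

# Example: twenty ordinary values and two isolated peaks.
kept, removed = remove_high_outliers(list(range(1, 21)) + [100, 105])
```

In the example, the gap between 20 and 100 exceeds 10% of the range (104), so both peaks end up in `removed`.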
Removed data values are indicated in the data distributions presented in Appendix G.
In a few of the following analyses, extreme data values at hourly level substantially altered the results. On these occasions it was decided to remove the extreme values in the same way as described above for minute concentrations. Results for these analyses are, however, always presented both with and without extreme values.
Addressing Skewed Data Distributions
Datasets of particle concentrations are often positively skewed, as they have a minimum level of zero and no upper limit (Kruize et al. 2003). A positively skewed dataset has a tail of high values away from the mean (Harris & Jarvis 2011). Several methods applied in the following analyses, such as Pearson’s r and R2, are influenced by the skewness of a dataset. Particle concentrations with high temporal resolution, such as minute values, often show particularly high skewness, as the variability of minute concentrations can be high. All datasets described above have been aggregated to some extent for the following analyses. After aggregation and the removal of minute outliers, many of the datasets were approximately normally distributed, with a skewness between -1 and 1 (Harris & Jarvis 2011).
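As an illustration of the skewness criterion, the following sketch computes the skewness of a dataset and checks it against the range of -1 to 1. The thesis does not state which skewness estimator was used; the moment-based (Fisher-Pearson) coefficient is assumed here, and both function names are hypothetical.

```python
import numpy as np

def sample_skewness(values):
    """Moment-based skewness: third central moment over the second to the power 1.5."""
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)
    m3 = np.mean(dev ** 3)
    return m3 / m2 ** 1.5

def is_approximately_normal(values, limit=1.0):
    """True if the skewness lies within [-limit, +limit]."""
    return abs(sample_skewness(values)) <= limit

# A single high value among low ones gives a positive skewness above 1.
skewed = sample_skewness([1, 1, 1, 1, 10])
symmetric = sample_skewness([1, 2, 3, 4, 5])
```

A symmetric dataset gives a skewness of zero, while a tail of high values pushes the coefficient above zero.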
For the remaining positively skewed datasets, log transformation would be a potential method to reduce skewness. It was, however, decided not to log transform skewed datasets, as log transformation would reduce the importance of extreme values in an analysis. High concentrations are of particular importance in human exposure to particles, and reducing their importance in the analyses is therefore not considered a valid option. It should be noted that several results of the following analyses (especially for Pearson’s r and R2) would have been improved if the concentrations had been log transformed.
A few additional datasets are introduced in the coming chapters. These datasets are described using mean, SD and range, which are some of the most commonly used parameters for data description (Briggs et al. 2008; Gulliver & Briggs 2007; Boogaard et al. 2009; Pfeifer et al. 1999b; McNabola et al. 2009b; McNabola et al. 2009a; Praml & Schierl 2000). The datasets described in the following chapters generally show only a mild degree of skewness in their data distribution. Mean and SD may therefore be applied, instead of the median and IQR applied above for the highly positively skewed minute datasets.
3.7.2. Methods to Evaluate Model Performance
Several parameters that measure the degree of similarity between two datasets are applied in this thesis. In this context these parameters are primarily used to compare modelled and monitored particle concentrations, and as such to evaluate model performance. Two of the most commonly applied model performance parameters are Pearson’s correlation, or Pearson’s r, (r) and R2. Pearson’s r measures the linear relationship between two variables. The range of r is from -1, a perfect negative linear correlation, to +1, a perfect positive linear correlation. A value of 0 signifies that there is no linear relationship between the two variables. R2, as applied here, is equal to the squared Pearson’s r. The interpretation of R2 is therefore similar to the interpretation of r. All results of R2 are however non-negative, ranging between 0 and 1 (Harris & Jarvis 2011).
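The relationship between r and R2 can be illustrated with a short sketch. The function names and the example values are hypothetical; the formula is the usual product-moment definition of Pearson’s r.

```python
import numpy as np

def pearson_r(predicted, observed):
    """Pearson's product-moment correlation between two equally long series."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    dp, do = p - p.mean(), o - o.mean()
    return np.sum(dp * do) / np.sqrt(np.sum(dp ** 2) * np.sum(do ** 2))

def r_squared(predicted, observed):
    """R2 as applied here: the squared Pearson's r."""
    return pearson_r(predicted, observed) ** 2

observed = [1.0, 2.0, 3.0, 4.0]
r_pos = pearson_r([2.0, 4.0, 6.0, 8.0], observed)  # predictions twice the observations
r_neg = pearson_r([8.0, 6.0, 4.0, 2.0], observed)  # predictions in reversed order
```

Both example series give R2 = 1 although neither reproduces the observed values, which is why r and R2 are read here together with parameters that are sensitive to absolute differences.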
Coefficient of Divergence (COD)
Index of Agreement (IA)
Normalised Mean Square Error (NMSE)
Fractional Bias (FB)
Table 27: Model Performance Parameters.
Several less commonly applied model performance parameters are presented with their equations in Table 27, with predicted values (P) and observed values (O) of samples (i). The coefficient of divergence (COD) measures the degree of similarity between the values of two datasets. It ranges between 0 and 1, with a COD of 0 indicating that the values are the same and a COD of 1 indicating maximum difference. The COD serves a similar purpose to Pearson’s correlation (or R2), but it provides a stricter metric: a COD of 0 means that the data values are identical, whereas a Pearson’s correlation of -1 or +1 only indicates a perfect linear relationship (Gaines Wilson & Zawar-reza 2006; Arku et al. 2008).
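For reference, the forms in which these four parameters are commonly given in the air quality modelling literature are sketched below. They are assumed, not confirmed, to correspond to the equations of Table 27. The number of samples n and the overbar notation for means are introduced here for illustration; the sign convention of FB follows the one stated in this section (positive values indicate overprediction).

```latex
\begin{align}
\mathrm{COD}  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{P_i - O_i}{P_i + O_i}\right)^{2}} \\
\mathrm{IA}   &= 1 - \frac{\sum_{i=1}^{n}\left(P_i - O_i\right)^{2}}
                        {\sum_{i=1}^{n}\left(\left|P_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^{2}} \\
\mathrm{NMSE} &= \frac{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^{2}}{\bar{P}\,\bar{O}} \\
\mathrm{FB}   &= \frac{2\left(\bar{P} - \bar{O}\right)}{\bar{P} + \bar{O}}
\end{align}
```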
R2, Index of Agreement (IA or IOA), Normalised Mean Square Error (NMSE), and Fractional Bias (FB) are commonly applied together to describe model performance in the context of air pollution exposure modelling (Kousa et al. 2001; Beckx, Int Panis, Van De Vel, et al. 2009; Elbir et al. 2010; Gulliver & Briggs 2005). This set of four parameters has also been applied in this thesis to describe and evaluate model performance.
R2 has been introduced above and describes in this context the degree of linear correlation between modelled and observed concentrations. For the description of R2 results the following broad categories are used: 0.0 < R2 < 0.3 = low, 0.3 < R2 < 0.4 = medium low, 0.4 < R2 < 0.6 = medium/moderate, 0.6 < R2 < 0.8 = medium high, R2 > 0.8 = high.
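The categories can be expressed as a simple lookup, sketched below. The function name is hypothetical, and the handling of values that fall exactly on a category boundary (assigned here to the higher category) is an assumption, as the text does not specify it.

```python
def r2_category(r2):
    """Map an R2 value to the broad descriptive categories used in the text."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R2 must lie between 0 and 1")
    # Assumption: a value exactly on a boundary falls into the higher category.
    if r2 < 0.3:
        return "low"
    if r2 < 0.4:
        return "medium low"
    if r2 < 0.6:
        return "medium/moderate"
    if r2 < 0.8:
        return "medium high"
    return "high"

labels = [r2_category(v) for v in (0.1, 0.35, 0.5, 0.7, 0.9)]
```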
The IA (sometimes also abbreviated IOA) measures the ability of the model to predict variations about the observed mean. IA values range between 0 and 1, with 0 indicating no agreement and 1 indicating perfect agreement between modelled and observed values. A value above 0.5 is considered a good model fit (Gaines Wilson & Zawar-reza 2006; Noth et al. 2011; Beckx, Int Panis, Van De Vel, et al. 2009). The IA is sensitive to total differences between observed and predicted data values, as well as to proportional changes.
The NMSE measures the distance between modelled and observed data points and as such is a measure of mean relative scatter, reflecting both systematic and unsystematic (random) errors. The NMSE parameter is normalised, so it is not dependent on unit. The NMSE is sensitive to occasional large errors due to the squaring process (Gaines Wilson & Zawar-reza 2006; Beckx, Int Panis, Van De Vel, et al. 2009).
The fractional bias (FB) between two datasets shows the degree to which the predictions over- or underpredict the observations. The FB can range from -2 to +2, with -2 indicating extreme underprediction and +2 indicating extreme overprediction; an FB of 0 indicates no over- or underprediction. FB values of −0.67 and +0.67 are equivalent to under- and overprediction by a factor of two respectively (Beckx, Int Panis, Van De Vel, et al. 2009).
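The values of ±0.67 and the limits of ±2 can be verified with a short calculation. It assumes the commonly used definition of FB in terms of the mean predicted and mean observed concentrations, written here with overbars (notation introduced for illustration), and non-negative concentrations.

```latex
\begin{align}
\mathrm{FB} &= \frac{2\left(\bar{P} - \bar{O}\right)}{\bar{P} + \bar{O}} \\
\bar{P} = 2\,\bar{O}: \quad \mathrm{FB} &= \frac{2\left(2\bar{O} - \bar{O}\right)}{2\bar{O} + \bar{O}} = \frac{2}{3} \approx +0.67 \\
\bar{P} = \tfrac{1}{2}\,\bar{O}: \quad \mathrm{FB} &= \frac{2\left(\tfrac{1}{2}\bar{O} - \bar{O}\right)}{\tfrac{1}{2}\bar{O} + \bar{O}} = -\frac{2}{3} \approx -0.67 \\
\bar{O} \to 0: \quad \mathrm{FB} &\to +2, \qquad \bar{P} \to 0: \quad \mathrm{FB} \to -2
\end{align}
```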
Applying the presented set of four parameters, perfect model performance would be as follows: R2 = 1, IA = 1, NMSE = 0, FB = 0. In the coming chapters model performance is evaluated using this set of parameters, first in chapter 4 for the evaluation of a temporally adjusted LUR model, developed as the ambient base model, then in chapter 5 to compare the performance of different indoor models applied from the literature, and finally in chapter 6 to evaluate the performance of the personal exposure model.
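The set of four parameters can be computed together as in the following sketch. It uses the forms of IA, NMSE and FB that are common in the literature, which are assumed rather than confirmed to match the equations of Table 27; the function name `model_performance` and the example concentrations are hypothetical.

```python
import numpy as np

def model_performance(predicted, observed):
    """Return R2, IA, NMSE and FB for paired predicted and observed values."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    r = np.corrcoef(p, o)[0, 1]  # Pearson's r
    o_mean, p_mean = o.mean(), p.mean()
    # Index of Agreement: 1 minus squared error over "potential" error.
    ia = 1.0 - np.sum((p - o) ** 2) / np.sum(
        (np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2
    )
    # Normalised Mean Square Error: mean squared error over the product of the means.
    nmse = np.mean((p - o) ** 2) / (p_mean * o_mean)
    # Fractional Bias: positive values indicate overprediction.
    fb = 2.0 * (p_mean - o_mean) / (p_mean + o_mean)
    return {"R2": r ** 2, "IA": ia, "NMSE": nmse, "FB": fb}

observed = [10.0, 20.0, 30.0, 40.0]
perfect = model_performance(observed, observed)
doubled = model_performance([2.0 * v for v in observed], observed)
```

For identical series the sketch returns the perfect values stated above. For predictions that are exactly twice the observations, R2 remains 1 while FB rises to about 0.67 and NMSE to 0.6, which illustrates why the four parameters are read together.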