
Using RASCAL to Investigate Atmospheric Science Data In Flight

The screen captures used to illustrate the following discussion were created during a normal run of the software and can be easily reproduced; no special prior knowledge is required.

6.5.1 Identification of Data of Interest

There are two aspects to identifying data that may be of interest, whether for adapting a flight path or for analysis after the campaign. The first is separating data that is truly anomalous from data that appears anomalous but is not; the second is identifying multiple regions of data that share similar anomalous characteristics. These are illustrated in figure 6.3.

Fig. 6.3 RASCAL screen views showing (6.3a) the early part of the flight, with no significant anomalies and measurements falling within a range of standard values; (6.3b) two data regions in similar locations relative to O3 spikes, where the red data shows abnormal levels of acetaldehyde in the cluster plot; (6.3c) two data regions with abnormal acetaldehyde levels in the cluster plot, together with their location on the flight path and the altitudes marked in black on the trace plot.

at the trace plot (top) this data can be seen after a significant spike in O3. Immediately after the next spike and before the following one is a similar region coloured magenta. However, when the group cluster plot (lower right) is examined, it can be seen that whereas the red region constitutes the bulge, the magenta region has measurements in the normal region.

Using RASCAL Clustering to Identify Multiple Anomalies of Interest

Still later in the flight, a second bulge begins to appear, also related to an O3 spike, as shown in figure 6.3c. This time the acetone values are higher than in the previous bulge; however, they remain consistent with normal values, and it is the acetaldehyde measurements that increase. In this image the two regions are identified as red and magenta, and their data values are shown in the lower-centre cluster plot. The map view of the flight path is also shown: the two data regions are separated in time but, because of the flight path, they are physically closer than might be expected. The trace plot, where the altitudes are indicated by the black lines, shows that the magenta region is at a lower altitude. This is consistent with a plume of air rising in height and drifting geographically, with a slow loss, or mixing out, of acetone as it does so.

Using Selectable Data Clustering for Additional Information

Once multiple regions of anomalous data have been identified, they can be explored further using DDC clustering on alternative selected data streams. Using the drop-down menus to the right of the lower-right plot (see figure 6.4), DDC can be performed on any pair of data sets. Figure 6.4 shows the plot that results from selecting Acetaldehyde and MVK-MACR (i.e., the signal derived from proton-transfer-reaction mass spectrometry corresponding to methyl vinyl ketone and methacrolein, first-generation reaction products of the biogenic hydrocarbon isoprene). In the cluster plot on the lower right it is seen that both regions have raised levels of both MVK-MACR and acetaldehyde. These parameters are known to correlate with biomass burning [166, 78, 39].

Fig. 6.4 Using selectable data-stream clustering to explore data streams. By selecting suitable data streams from the drop-down menus, clustering can be applied to MVK-MACR and Acetaldehyde. The display shows that the two selected data regions, red and magenta, both have raised levels of MVK-MACR and Acetaldehyde.

It can also be seen from the flight path that the magenta region is at lower altitude and appears to be geographically narrower than the higher altitude red region. These data appear consistent, therefore, with the spread of pollutants as a biomass burning plume rises and drifts.
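The idea of choosing any pair of data streams and asking which samples sit away from the normal region can be illustrated with a short sketch. This is not the DDC algorithm used by RASCAL; it swaps in a much simpler technique, a standardised distance from a baseline segment assumed to be normal. The function name `pairwise_outliers`, the stream names, the threshold, and the sample values are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def pairwise_outliers(streams, name_x, name_y, baseline, threshold=3.0):
    """Flag samples far from the baseline in the (name_x, name_y) plane.

    streams   : dict mapping a stream name to a list of readings
    baseline  : slice selecting samples ASSUMED to be normal
    threshold : distance, in baseline standard deviations, above which
                a sample is flagged (an arbitrary illustrative default)

    NOTE: a simple stand-in for illustration, not RASCAL's DDC clustering.
    """
    xs, ys = streams[name_x], streams[name_y]
    if len(xs) != len(ys):
        raise ValueError("both streams must have the same number of samples")

    # Centre and scale of each stream over the assumed-normal segment.
    mean_x, sd_x = statistics.mean(xs[baseline]), statistics.stdev(xs[baseline])
    mean_y, sd_y = statistics.mean(ys[baseline]), statistics.stdev(ys[baseline])
    if sd_x == 0 or sd_y == 0:
        raise ValueError("baseline segment has zero spread in one stream")

    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Euclidean distance in standardised (z-score) coordinates.
        dist = math.hypot((x - mean_x) / sd_x, (y - mean_y) / sd_y)
        if dist > threshold:
            flagged.append(i)
    return flagged


# Invented readings: the last sample is raised in both streams.
example_streams = {
    "acetaldehyde": [1.0, 1.1, 0.9, 1.0, 3.0],
    "mvk_macr": [0.5, 0.6, 0.4, 0.5, 2.0],
}
print(pairwise_outliers(example_streams, "acetaldehyde", "mvk_macr", slice(0, 4)))
```

Passing the two stream names as arguments mirrors the role of the drop-down menus: the same routine can be rerun on any other pair without changing the code.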

6.5.2 Using Model Outputs to Identify Data of Interest

In figure 6.5 the output from the CiTTyCAT Lagrangian model [96, 132] has been used to demonstrate the use of RASCAL with model data. The trace plots compare the CiTTyCAT model output (dashed line) with the actual instrument reading (solid line) for O3 and indicate three sets of data for further investigation. The solid green line is the flight altitude.

Red Data Region

Fig. 6.5 Identification of further regions of interest: RASCAL screen view showing model data comparison, where the regions in red and black indicate where the model incorrectly predicts dips in O3, and magenta where the model accurately predicted narrow spikes in O3.

Where the data is highlighted in red, the model output bears little relation to the actual readings. Whereas, generally, throughout the flight the model values rise and fall in approximate synchronization with the readings, in the red zone the model predicts a large drop in O3 despite the rising altitude. Overall, throughout the flight there is a general tendency for the model output to become closer to reality, suggesting the red anomaly may be caused by the initial model parameters. CiTTyCAT is a research modelling tool; for operational use it would be a straightforward extension of the software to provide on-the-fly skill metrics.
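As an illustration of what on-the-fly skill metrics might look like, the sketch below computes bias, root-mean-square error and Pearson correlation between model output and instrument readings over a trailing window. The choice of these three metrics, the function names `skill_metrics` and `rolling_skill`, the window length, and the sample values are assumptions made for this example; they are not part of RASCAL or CiTTyCAT.

```python
import math


def skill_metrics(model, observed):
    """Return bias, RMSE and Pearson correlation for paired samples."""
    if len(model) != len(observed) or len(model) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(model)
    diffs = [m - o for m, o in zip(model, observed)]
    bias = sum(diffs) / n                           # mean model-minus-reading
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, observed))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    denom = math.sqrt(var_m * var_o)
    # Correlation is undefined when either series is constant in the window.
    corr = cov / denom if denom > 0 else float("nan")
    return {"bias": bias, "rmse": rmse, "corr": corr}


def rolling_skill(model, observed, window):
    """Skill metrics over a trailing window ending at each sample."""
    if window < 2:
        raise ValueError("window must cover at least 2 samples")
    return [
        skill_metrics(model[end - window:end], observed[end - window:end])
        for end in range(window, len(model) + 1)
    ]


# Invented O3-like values: the model dips mid-series while readings rise.
model_o3 = [40.0, 42.0, 45.0, 30.0, 28.0, 50.0]
instrument_o3 = [41.0, 43.0, 46.0, 47.0, 48.0, 51.0]
for metrics in rolling_skill(model_o3, instrument_o3, window=3):
    print(metrics)
```

A trailing window is used so that each value depends only on data already received, which is what an in-flight display would require.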

Numbered Spikes

Where the data is indicated with numerical values, the model (subscript 'm') has predicted spikes in O3 approximately matching the instrument readings (subscript 'i'). The region highlighted in magenta (labels 4-6) shows spikes with a good temporal match to the readings and, in one case, a good match in magnitude.

Dark Green Data Region

In the dark green data region, from time code ≈ 55,000, the model predicts a drop in O3 where the actual readings show a spike. Other than this, the model still provides a reasonable match to the overall profile of the instrument readings. Although there is some temporal shift for most of the spikes, and the scaling and amplitude of the peaks may differ, the general shape of the profile is similar from the red region up until this point. This marks data worthy of investigation, as it may reflect, for example, an unknown or unexpected cause of the actual chemistry change, or some accumulated error in the model output.
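Regions of this kind, where the model predicts a drop while the readings show a spike, could in principle be flagged automatically. The sketch below is one hypothetical approach and not a feature of RASCAL: it compares the change in each series over a fixed lag and flags samples where both changes are large but of opposite sign. The function name `opposing_trend_indices`, the lag, the minimum change, and the sample values are assumptions for illustration.

```python
def opposing_trend_indices(model, observed, lag, min_change):
    """Indices where model and readings move in opposite directions.

    Sample i is flagged when the change over the previous `lag` samples
    is at least `min_change` in magnitude in BOTH series and the two
    changes have opposite signs.
    """
    if len(model) != len(observed):
        raise ValueError("both series must have the same number of samples")
    if lag < 1:
        raise ValueError("lag must be at least 1")

    flagged = []
    for i in range(lag, len(model)):
        d_model = model[i] - model[i - lag]
        d_obs = observed[i] - observed[i - lag]
        large_enough = abs(d_model) >= min_change and abs(d_obs) >= min_change
        if large_enough and d_model * d_obs < 0:
            flagged.append(i)
    return flagged


# Invented values: the model dips while the readings spike.
model_o3 = [50.0, 50.0, 50.0, 40.0, 30.0, 50.0, 50.0]
instrument_o3 = [50.0, 50.0, 50.0, 60.0, 70.0, 50.0, 50.0]
print(opposing_trend_indices(model_o3, instrument_o3, lag=1, min_change=5.0))
# -> [3, 4, 5]
```

Because most spikes in the comparison show some temporal shift, a practical version would probably need to tolerate small offsets between the two series before flagging a disagreement; this sketch does not attempt that.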