In the previous section, we evaluated the performance of our multivariate imputation method against a range of alternatives for a variety of simulated scenarios. We now consider an application related to the use of atmospheric monitoring techniques within the oil and gas industry.
There has been increased interest in Carbon Capture and Storage (CCS) projects over recent years due to the environmental benefits that such operations can bring (Bachu, 2008; Leung et al., 2014). However, the associated climate benefits are highly dependent on the efficient containment of the injected gases. For this reason, there has been a particular focus on developing reliable atmospheric monitoring techniques to detect and locate CO2 leakages within CCS sites. Current approaches include methods based on atmospheric tomography (Jenkins et al., 2011; Levine et al., 2016), Lagrangian particle dispersion models (Luhar et al., 2014) and Gaussian plume dispersion models (Hirst et al., 2017). Any attempt to detect such leaks can be hindered by missingness within the data, as missing entries must either be removed or replaced with suitable estimates before further analysis can take place.
Within this section, we focus on the problem of imputing missing values in a multivariate time series arising from a CCS project, using data provided by our industrial collaborator. We consider a trivariate signal of length 2048 corresponding to approximately one week of measurements; Figure 4.5.1(a) shows the CO2 concentrations over time for three sensors. The signal exhibits a range of missingness patterns, including values missing at random and values missing at consecutive time points across one or more sensors. In total, approximately 6.5% of the values in the signal are missing. Because the mvLSW model assumes a zero-mean process, we detrend the series before analysis by fitting a smoothing spline to each component and working with the residuals (see Figure 4.5.1(b)).
Figure 4.5.1: Time series plots of the CO2 concentration for three sensors over the same time period: (a) original series; (b) detrended series.
Note that, after removing the trend and imputing missing values in the now zero-mean process, the trend must be added back to the series so that the original and imputed time series can be compared directly. Consequently, any interesting behaviour within an imputed time series may be a feature of the imputation method or may result from the trend being added back into the series.
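
The detrend, impute and re-trend workflow described above can be sketched for a single sensor as follows. This is a minimal illustration rather than the code used in the study: the use of SciPy's `UnivariateSpline`, the default smoothing level, fitting the spline to the observed entries only, the synthetic signal, and the zero-fill stand-in for the imputation step are all assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def detrend_with_spline(y, smoothing=None):
    """Fit a cubic smoothing spline to the observed entries of y.

    Returns the fitted trend at every time point and the residuals
    (which remain NaN wherever y is missing).
    """
    t = np.arange(len(y), dtype=float)
    observed = ~np.isnan(y)
    # Assumption: the spline is fitted to the observed entries only.
    spline = UnivariateSpline(t[observed], y[observed], k=3, s=smoothing)
    trend = spline(t)
    residuals = y - trend
    return trend, residuals


def restore_trend(imputed_residuals, trend):
    """Add the fitted trend back onto the imputed zero-mean series."""
    return imputed_residuals + trend


# Synthetic single-sensor signal (hypothetical, not the CCS data):
# a daily-like cycle plus noise, length 2048, with missing entries.
rng = np.random.default_rng(0)
n = 2048
t = np.arange(n)
signal = 400 + 10 * np.sin(2 * np.pi * t * 7 / n) + rng.normal(scale=1.0, size=n)
signal[rng.random(n) < 0.065] = np.nan   # values missing at random
signal[500:530] = np.nan                 # a run of consecutive missing values

trend, resid = detrend_with_spline(signal)

# Stand-in for the imputation step: fill residual gaps with zero
# (the assumed process mean). A real analysis would use an imputation
# method such as those compared in this section.
filled_resid = np.where(np.isnan(resid), 0.0, resid)

reconstructed = restore_trend(filled_resid, trend)
```

With this construction, the observed entries are recovered unchanged once the trend is restored, while each imputed entry is the sum of the fitted trend and the imputed residual, which is why features in the imputed regions may originate from either component.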
We apply the mvLSW-based imputation approach with p = 20 points considered in the clipped predictor for both the forecasting and backcasting steps. For comparison, we apply the mtsdi method and the VAR-fb approach, as these performed better than the other existing methods in the simulation study. Since the true values of the test signal are unknown at the missing time points, we compare the results visually. The imputation results for each of the methods are shown in Figure 4.5.2, with imputed values in red. Whilst the imputation results for all three methods are quite similar, the mvLSWimpute method produces the most reliable estimate for the missing data between August 5 and 6 when compared to the daily behaviour over the rest of the week. Figure 4.5.3 shows the imputation results for each of the methods, focusing on August 5 only. As the background concentration of CO2 within the atmosphere naturally varies over the course of a day, we would be unlikely to see features such as those imputed by mtsdi and VAR-fb (Figures 4.5.3(b) and 4.5.3(c)), where the concentration changes suddenly over a short period of time.
Figure 4.5.2: Imputation results obtained from applying the mvLSWimpute-fb, mtsdi and VAR-fb approaches to the CO2 data: (a) mvLSWimpute-fb; (b) mtsdi; (c) VAR-fb. Imputed values are shown in red.
Figure 4.5.3 panels: (a) mvLSWimpute-fb; (b) mtsdi; (c) VAR-fb.
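
To illustrate the general idea of combining a forecasting step with a backcasting step when filling a gap, the following sketch uses a deliberately simple univariate AR(1) predictor fitted to the p points on either side of the gap. It is not the mvLSW clipped predictor, and the equal weighting of forecast and backcast is an assumption made for this example; the function names are hypothetical.

```python
import numpy as np


def ar1_coefficient(window):
    """Least-squares AR(1) coefficient for a zero-mean window (no intercept)."""
    prev, curr = window[:-1], window[1:]
    denom = np.dot(prev, prev)
    return np.dot(prev, curr) / denom if denom > 0 else 0.0


def ar1_forecast(window, horizon):
    """Forecast `horizon` steps ahead from the last value in `window`."""
    phi = ar1_coefficient(window)
    return window[-1] * phi ** np.arange(1, horizon + 1)


def impute_gap_fb(x, start, stop, p=20):
    """Fill x[start:stop] by averaging a forecast and a backcast.

    Assumes the p points on either side of the gap are observed and
    that the series is (approximately) zero-mean.
    """
    horizon = stop - start
    before = x[max(0, start - p):start]
    after = x[stop:stop + p]
    forecast = ar1_forecast(before, horizon)
    # Backcasting: forecast on the time-reversed series, then reverse back
    # so that the values line up with time points start, ..., stop - 1.
    backcast = ar1_forecast(after[::-1], horizon)[::-1]
    filled = x.copy()
    # Assumption: equal weights for the forecast and the backcast.
    filled[start:stop] = 0.5 * (forecast + backcast)
    return filled


# Simulated zero-mean AR(1) series with one gap of consecutive missing values.
rng = np.random.default_rng(1)
n = 200
truth = np.zeros(n)
for i in range(1, n):
    truth[i] = 0.8 * truth[i - 1] + rng.normal()

x = truth.copy()
x[100:110] = np.nan
filled = impute_gap_fb(x, 100, 110, p=20)
```

Using information from both sides of the gap means the filled values are anchored to the observed data at each end, rather than drifting away from the values that follow the gap as a forecast-only approach can.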
Recall that the overall aim of atmospheric monitoring within CCS regions is to be able to detect anomalous regions that could indicate releases of CO2. It is therefore important to be able to replace missing values with reasonable estimates which will then allow further analysis to be carried out. Our mvLSW imputation approach can be used as a first step to infill any missing values before attempting to detect anomalous regions or other secondary analysis tasks of interest.
4.6 Concluding remarks
In this work, we have introduced a wavelet-based imputation method that can be used to replace missing values within a multivariate, nonstationary time series. We compared the performance of our method against existing imputation approaches using simulated data examples and a dataset from a Carbon Capture and Storage facility. The simulated data examples demonstrate that the use of a backcasting step within imputation can improve the performance of the prediction methods. The case study shows that our method can be used to successfully impute missing values within time series containing both nonstationarity and seasonality, resulting in a more reliable imputation estimate compared to existing approaches.
In practice, we have found that, as with competitor methods, the performance of our approach suffers in extreme scenarios, such as chunks of consecutive missing time points or bursts of missingness. An avenue for future research would be to investigate ways of improving the imputation results in these cases.