5.5.1 Selection of predictors
To identify potentially informative predictors for the regional SPI-1 fields over the whole of Russia, we perform a detailed cross-correlation analysis of the H500 and MSLP fields from the SL-AV hindcast data sets. For evaluating possibly relevant cross-correlations, we fix the spatial range for both predictor fields to 20-90°N and 50°W to 180°E. This area is chosen to cover the northern parts of the Atlantic, Pacific and Arctic oceans as well as the continental parts of northern Eurasia. The SPI-1 fields are obtained from the CAMS precipitation data within the region 40-90°N and 20-180°E. To reduce the number of pairwise correlations, and because a very fine regional resolution is hard to achieve and interpret, we divide the aforementioned regions into larger spatial sub-regions, each covering 5° (10°) in latitude (longitude), instead of working with the native spatial resolution of the data (2.5°). This coarse-graining procedure results in 322 grid cells for the SL-AV data sets. From the CAMS-derived SPI-1 data, 121 of the resulting cells are considered as FRs, all of which cover at least some part of the land surface of Russia (Fig. 5.15).
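The cell count quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper name `n_cells` and the convention of encoding western longitudes as negative numbers are assumptions made here, not part of the original analysis.

```python
def n_cells(lat_min, lat_max, lon_min, lon_max, dlat=5.0, dlon=10.0):
    """Number of coarse-grained cells in a latitude/longitude box.

    Longitudes west of Greenwich are given as negative values (assumed convention).
    """
    n_lat = round((lat_max - lat_min) / dlat)   # number of latitude bands
    n_lon = round((lon_max - lon_min) / dlon)   # number of longitude bands
    return n_lat * n_lon


# Predictor domain: 20-90°N, 50°W-180°E -> 14 latitude bands x 23 longitude bands
predictor_cells = n_cells(20, 90, -50, 180)

# Target domain: 40-90°N, 20-180°E -> 10 latitude bands x 16 longitude bands
target_cells = n_cells(40, 90, 20, 180)

print(predictor_cells, target_cells)  # 322 160
```

Under these assumptions, the predictor domain yields the 322 cells stated in the text, and the target domain yields 160 cells, of which 121 are retained as FRs.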
The coarse-graining of the predictor (H500 and MSLP) and target (SPI-1) fields is achieved by representing each resulting grid cell by its geographical centre and computing a weighted mean (with weights according to the spatial distance to the centre of each cell) over the data for all original grid points that contribute to the respective cell. Based on the reduced data sets thus obtained, we define potential predictors for the SPI-1 values in each FR as those spatial regions in which the respective pressure variable exhibits an absolute correlation above 0.4 (statistically significant at the 95% confidence level) with the target SPI-1 value for a given calendar month (i.e., separately for the three boreal summer months June, July and August). This analysis identifies a set of candidate regions in the H500 and MSLP fields from SL-AV that might serve as informative predictors for each FR and each calendar month for which SPI-1 is to be predicted. Notably, these candidate regions tend to cluster in space.
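A minimal sketch of the two steps described above is given below. Several details are assumptions, since the text does not specify them: inverse-distance weights stand in for the unspecified distance weighting, and the significance check uses the Fisher z-transformation with a normal approximation (critical value 1.96) rather than whatever test was actually applied. The function names and the synthetic series are purely illustrative.

```python
import math
import numpy as np


def weighted_cell_mean(values, dists, eps=1e-6):
    """Weighted mean of the native grid points in one coarse cell.

    Assumption: weights are inversely proportional to the distance from
    the cell centre; eps avoids division by zero at the centre itself.
    """
    w = 1.0 / (np.asarray(dists, dtype=float) + eps)
    return float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))


def is_candidate(x, z, r_min=0.4, z_crit=1.96):
    """Return (flag, r): flag is True if |r| > r_min and r is significant.

    Significance is approximated via the Fisher z-transformation
    (two-sided, roughly the 95% level for z_crit = 1.96).
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, z)[0, 1])
    r_clipped = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    stat = math.atanh(r_clipped) * math.sqrt(n - 3)
    return bool(abs(r) > r_min and abs(stat) > z_crit), r


# Synthetic 30-year series for one calendar month (illustrative only)
years = np.arange(30, dtype=float)
alternating = np.where(np.arange(30) % 2 == 0, 1.0, -1.0)

related, r_related = is_candidate(years, 2.0 * years + alternating)
unrelated, r_unrelated = is_candidate(years, alternating)
print(related, unrelated)
```

In this toy setting the first series is flagged as a candidate and the second is not; with real hindcast data the same screening would be repeated for every coarse-grained pressure cell, FR and calendar month.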
To obtain a robust and numerically feasible forecasting scheme, we further reduce the number of individual candidate regions to a set of predictors by taking the weighted mean value over each group of candidate regions that meets the following conditions (see Figure 5.16 for a schematic illustration): (i) a minimum size of three sub-regions, (ii) spatial connectedness (i.e., each region in a group needs to have at least one direct neighbour region that belongs to the same cluster) and (iii) stability (i.e., groups used for predicting SPI-1 values at neighbouring regions need to have a sufficiently large overlap). For the purposes of this case study, the latter requirement is checked manually for all pairs of FRs, while a more formal and automatically testable criterion for spatial overlap is used in the later West African case study (see Section 6.4.1) to increase its objectivity. Taken together, our reduction procedure results in a comparatively small set of predictors, which are subsequently used in the forecasting step (see below). Note that while most FRs are associated with several predictor combinations (typically of the order of 5-10), there are others for which our approach does not yield any suitable pair of predictor variables from the MSLP and H500 fields. As a consequence, of the initial 121 FRs, the predictor selection leaves us with pressure covariates allowing SPI-1 forecasts for 81, 73 and 78 FRs for June, July and August, respectively.
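Conditions (i) and (ii) can be illustrated with a simple connected-component search on the coarse grid. This is a sketch under stated assumptions: "direct neighbour" is interpreted as the four cells sharing an edge, no wrap-around in longitude is applied, and the stability condition (iii), which was checked manually in this case study, is not implemented. The function name `candidate_groups` and the example mask are hypothetical.

```python
from collections import deque


def candidate_groups(mask, min_size=3):
    """Group candidate cells into connected clusters of at least min_size cells.

    mask is a 2-D list (rows = latitude bands, columns = longitude bands)
    with truthy entries for candidate cells. Connectivity is assumed to
    mean sharing an edge (4-neighbourhood).
    """
    n_rows, n_cols = len(mask), len(mask[0])
    seen = set()
    groups = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first search over edge-sharing candidate cells
            queue = deque([(r, c)])
            seen.add((r, c))
            group = []
            while queue:
                i, j = queue.popleft()
                group.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < n_rows and 0 <= nj < n_cols
                    if inside and mask[ni][nj] and (ni, nj) not in seen:
                        seen.add((ni, nj))
                        queue.append((ni, nj))
            if len(group) >= min_size:  # condition (i)
                groups.append(sorted(group))
    return groups


# Toy candidate mask: one cluster of three cells, one of two, two isolated cells
mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
groups = candidate_groups(mask)
print(groups)  # [[(0, 0), (0, 1), (1, 1)]]
```

Only the three-cell cluster survives; the two-cell cluster and the isolated cells are discarded. Each surviving group would then be collapsed into a single predictor series by averaging over its cells.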
5.5.2 Scheme description and implementation
In Section 5.5.1, it has been demonstrated that both H500 and MSLP exhibit regions that show strong co-variability with local SPI-1 changes over the Russian Federation. Motivated by this finding, we hypothesize that combinations of the potential H500 and MSLP predictors thus derived can be utilized to constrain the expected SPI-1 values for any given grid point, together with their associated uncertainty. Specifically, we propose using model 1 (Section 4.3.1), in which all possible pairwise combinations of individual H500 and MSLP predictors identified in Section 5.5.1 are considered to form a set of linear regression equations for the predictand (local SPI-1 value) for the corresponding FR and calendar month of interest. Note that it would be equally possible to utilize combinations of more than two predictor variables from the considered set of candidate predictors, as well as pairs of variables stemming from the same pressure field. Here, we use only pairs of H500 and MSLP predictors, on the one hand to reflect the fact that both fields influence SPI-1 and, on the other hand, to keep the regression models as sparse as possible, so that the corresponding regression parameters can be well constrained and numerical problems due to a possible collinearity of predictors are minimized. Instead, the maximal possible variety of combinations of predictors from the H500 and MSLP fields is utilized to explore the full space of possibilities in the probabilistic forecasting task and capture as much of the associated forecast uncertainty as possible.
Figure 5.16: Schematic illustration of the selection of informative predictors. The black square depicts the forecast region (FR) for which a prediction is to be made for a certain calendar month (for all years with the same set of statistical models, see text), while groups of dots with different colours indicate different informative predictors. In the example illustrated here, 10 independent regression equations can be formed by combining 5 MSLP and 2 H500 predictors to generate the forecast ensemble.
Following this rationale, we consider all combinations of pairs of previously identified H500 ($x_i$) and MSLP ($y_j$) predictors to provide individual forecasts of the SPI-1 ($z_{ij}$) for a given FR and month in terms of a set of linear regression equations, as described in Section 4.3.1. By this procedure, each forecasting site is associated with an individual set of linear equations, which are solved independently to generate an ensemble of individual SPI-1 forecasts based on one-month lead-time H500 and MSLP forecasts of the SL-AV model. This set of forecasts can then be exploited in both deterministic and probabilistic ways.
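The pairwise ensemble construction can be sketched as follows. The exact form of model 1 is defined in Section 4.3.1 and is not reproduced here; the sketch assumes an ordinary least-squares regression with an intercept and one coefficient per predictor. The function names, array layout and synthetic data are illustrative assumptions, and reading the ensemble mean as a deterministic forecast and the ensemble spread as an uncertainty measure is only one plausible use of the ensemble.

```python
import numpy as np


def fit_pair(x, y, z):
    """Least-squares fit of z ~ a + b*x + c*y (assumed form of the pairwise model)."""
    design = np.column_stack([np.ones_like(x), x, y])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef  # array [a, b, c]


def ensemble_forecast(h500, mslp, spi, h500_new, mslp_new):
    """One forecast per (H500, MSLP) predictor pair for a single FR and month.

    h500: (n_years, n_h) and mslp: (n_years, n_m) hold the predictor series,
    spi: (n_years,) the target, h500_new/mslp_new the predictor values
    for the year to be forecast.
    """
    forecasts = []
    for i in range(h500.shape[1]):
        for j in range(mslp.shape[1]):
            a, b, c = fit_pair(h500[:, i], mslp[:, j], spi)
            forecasts.append(a + b * h500_new[i] + c * mslp_new[j])
    return np.array(forecasts)


# Synthetic example mirroring Figure 5.16: 2 H500 and 5 MSLP predictors
rng = np.random.default_rng(42)
n_years = 30
h500 = rng.standard_normal((n_years, 2))
mslp = rng.standard_normal((n_years, 5))
spi = 0.6 * h500[:, 0] - 0.4 * mslp[:, 1] + 0.2 * rng.standard_normal(n_years)

ens = ensemble_forecast(h500, mslp, spi, h500[-1], mslp[-1])
print(ens.size)                      # 10 ensemble members (2 x 5 pairs)
print(ens.mean(), ens.std(ddof=1))   # possible point forecast and spread
```

With 2 H500 and 5 MSLP predictors, the loop yields the 10 regression equations mentioned in the caption of Figure 5.16, each contributing one ensemble member.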