
5. Multivariate Extreme Storm Surge Flooding Model

5.3 Missing Value Analysis

5.3.1 Missing Value Approaches

The missing-data approaches can be classified into two groups (Table 5.5), univariate and multivariate approaches, plus the proposed Alternative Missing Value Approach (AMVA) (see Section 5.3.2).

The univariate approach is a simple technique that retains all the data. Rather than removing variables or observations with missing data, it fills in the missing values. The imputation approaches range from simple to rather complex. They keep the full sample size, which can be advantageous for bias and precision, but they can also introduce different kinds of bias.

There are four types of univariate approaches implemented in this research: mean, maximum or minimum imputation; previous and following value; simple random imputation; and interpolation.

• Mean, maximum or minimum imputation. Perhaps the easiest way to impute is to replace each missing value with the mean, maximum or minimum of the observed values for that variable. Unfortunately, this strategy can severely distort the distribution of the variable, which complicates summary measures and, notably, leads to underestimates of the standard deviation.

• Previous and following value. Another strategy is to replace each missing value with the previous or the following observed value, or to interpolate linearly between them. This linear interpolation is done on the index scale, not on the time scale of the dataset. The interpolation is obtained with R's interpNA function (R Development Core Team, 2009).

• Simple random imputation. Each missing value is replaced by a value drawn at random from the observed data for that variable. Simple random imputation can miss extreme values, but it is a convenient starting point.

• Interpolation. Missing values are replaced by linear interpolation or by cubic spline interpolation. Linear interpolation uses a linear function on each interval. Spline interpolation instead uses a low-degree polynomial on each interval and chooses the pieces so that they fit smoothly together. Like polynomial interpolation, spline interpolation incurs a smaller error than linear interpolation, and the interpolant is smoother. Both linear and spline interpolation are applied in two ways: to all gaps, and only to gaps shorter than one week. The cluster analysis in Section 5.5.3 shows that concurrent extreme events in the research dataset last about one week on average. The polynomial interpolation is obtained with R's loess function (R Core Team, 2014).
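The four univariate strategies above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the R code used in this research. The function names (fill_constant, fill_previous, fill_following, fill_random, interp_linear) are hypothetical, and missing records are assumed to be stored as None in a plain list. Only linear interpolation is shown; the cubic spline variant is omitted. The one-week gap limit is expressed as a maximum number of consecutive missing records. The example takes that to be 7 × 24, which assumes an hourly series, so max_gap should be adjusted to the actual sampling interval.

```python
import random
import statistics


def fill_constant(series, how="mean"):
    """Mean, maximum or minimum imputation: every gap gets the same constant."""
    observed = [x for x in series if x is not None]
    value = {"mean": statistics.mean, "max": max, "min": min}[how](observed)
    return [value if x is None else x for x in series]


def fill_previous(series):
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for x in series:
        if x is not None:
            last = x
        out.append(last if x is None else x)
    return out


def fill_following(series):
    """Carry the next observed value backward; trailing gaps stay None."""
    return fill_previous(series[::-1])[::-1]


def fill_random(series, seed=None):
    """Simple random imputation: each gap gets a randomly drawn observed value."""
    rng = random.Random(seed)
    observed = [x for x in series if x is not None]
    return [rng.choice(observed) if x is None else x for x in series]


def interp_linear(series, max_gap=None):
    """Linear interpolation on the index scale (positions treated as equally spaced).

    If max_gap is given, only runs of at most max_gap missing records are filled.
    Longer runs, and gaps at either end of the series, are left as None.
    """
    out = list(series)
    known = [i for i, x in enumerate(series) if x is not None]
    for a, b in zip(known, known[1:]):
        gap = b - a - 1
        if gap == 0 or (max_gap is not None and gap > max_gap):
            continue
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = series[a] + w * (series[b] - series[a])
    return out


if __name__ == "__main__":
    # Hypothetical surge heights (m) for illustration; None marks a missing record.
    surge = [0.2, 0.5, None, 1.4, None, 0.3, 0.9]
    observed = [x for x in surge if x is not None]
    filled = fill_constant(surge, "mean")
    # Mean imputation shrinks the sample standard deviation.
    print(statistics.stdev(observed), statistics.stdev(filled))
    print(fill_previous(surge))
    print(fill_following(surge))
    print(fill_random(surge, seed=1))
    # One week of hourly records (an assumption about the sampling interval).
    print(interp_linear(surge, max_gap=7 * 24))
```

Running the script shows the effect described for mean imputation. The first printed standard deviation (observed values only) is larger than the second (mean-filled series), because every imputed point sits exactly at the mean and adds no spread.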

The multivariate approach takes into account all the locations in the data, since it is common for several variables (or locations) in an analysis to have missing data. As in the univariate approach, the gaps can be filled with the mean, maximum or minimum value, here computed across all the locations. A better approach is to impute the missing variables with linear regression models: a regression model is fitted to the observed cases and then used to fill the missing cases with the predicted values.
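A comparable sketch for the multivariate case is given below. Again, it illustrates the idea under assumptions and is not the method used in this research. The data are assumed to be a list of time steps (rows), each holding one reading per gauge, with None for a gap. The names fill_rowwise and fill_by_regression are hypothetical. For brevity, the regression fits the target gauge on a single predictor gauge by ordinary least squares; a model using all the gauges would be a multiple regression.

```python
import statistics


def fill_rowwise(rows, how="mean"):
    """Fill each gap with the mean, max or min of the other gauges at that time step.

    rows: list of time steps, each a list of gauge readings (None = missing).
    Rows in which every gauge is missing are left unchanged.
    """
    agg = {"mean": statistics.mean, "max": max, "min": min}[how]
    out = []
    for row in rows:
        observed = [x for x in row if x is not None]
        value = agg(observed) if observed else None
        out.append([value if x is None else x for x in row])
    return out


def fill_by_regression(target, predictor):
    """Fit target = a + b * predictor by ordinary least squares on the time steps
    where both gauges are observed, then fill the target's gaps with predictions.
    Gaps where the predictor is also missing stay None."""
    pairs = [(x, y) for x, y in zip(predictor, target)
             if x is not None and y is not None]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return [a + b * x if (y is None and x is not None) else y
            for x, y in zip(predictor, target)]
```

Take a hypothetical predictor gauge reading 1, 2, 3, 4 and a target gauge reading 3, 5, missing, 9. These values lie on y = 1 + 2x, so the fitted line recovers that relation and fill_by_regression fills the gap with 7 (up to floating-point rounding).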

Table 5.5. Missing value approaches (univariate, multivariate and AMVA).

No.  NAmethods               Description

Original dataset
1    DataMAT                 Original dataset.

Univariate missing value approach
2    DataMAT.linZ            Fill missing values with zeros.
3    DataMAT.linM            Fill missing values with the mean.
4    DataMAT.linD            Fill missing values with the median.
5    DataMAT.linI            Linear interpolation on the index scale.
6    DataMAT.linB            Impute the previous value on the index scale.
7    DataMAT.linA            Impute the following value on the index scale.
8    DataMAT.impr01          Simple random imputation.
9    DataMAT.impr02          Simple random imputation.
10   DataMAT.impr03          Simple random imputation.
11   DataMAT.na.approx.a     Linear interpolation for all gaps.
12   DataMAT.na.approx.s     Linear interpolation only for gaps shorter than one week.
13   DataMAT.na.spline.a     Cubic spline interpolation for all gaps.
14   DataMAT.na.spline.s     Cubic spline interpolation only for gaps shorter than one week.

Multivariate missing value approach
15   DataMAT.max.imp         Deterministic imputation of the maximum value per row (all the locations).
16   DataMAT.min.imp         Deterministic imputation of the minimum value per row (all the locations).
17   DataMAT.ave.imp         Deterministic imputation of the mean value per row (all the locations).
18   DataMAT.lm.i.ALL        Impute values predicted by a linear model for all the gauges.

Alternative missing value approach
19   DataMAT.conds.max       Deterministic imputation with the maximum value per row from the gauges of conditions 1 and 2.
20   DataMAT.conds.min       Deterministic imputation with the minimum value per row from the gauges of conditions 1 and 2.
21   DataMAT.conds.ave       Deterministic imputation with the average value per row from the gauges of conditions 1 and 2.
22   DataMAT.cond1.imp       Deterministic imputation with the values from the gauges of condition 1.
23   DataMAT.cond2.imp       Deterministic imputation with the values from the gauges of condition 2.
24   DataMAT.lm.p.cond1      Predict values with a linear model based on the gauges of condition 1.
25   DataMAT.lm.p.cond2      Predict values with a linear model based on the gauges of condition 2.
26   DataMAT.lm.p.cond3      Predict values with a linear model based on the gauges of conditions 1 and 2.
27   DataMAT.lm.i.cond1      Impute values predicted by a linear model based on the gauges of condition 1.
28   DataMAT.lm.i.cond2      Impute values predicted by a linear model based on the gauges of condition 2.
29   DataMAT.lm.i.cond3      Impute values predicted by a linear model based on the gauges of conditions 1 and 2.
30   DataMAT.loess.p.cond1   Predict values with local polynomial regression fitting for the gauges of condition 1.
31   DataMAT.loess.p.cond2   Predict values with local polynomial regression fitting for the gauges of condition 2.
32   DataMAT.loess.p.cond3   Predict values with local polynomial regression fitting for the gauges of conditions 1 and 2.
33   DataMAT.loess.i.cond1   Impute values predicted by local polynomial regression fitting for the gauges of condition 1.
34   DataMAT.loess.i.cond2   Impute values predicted by local polynomial regression fitting for the gauges of condition 2.
35   DataMAT.loess.i.cond3   Impute values predicted by local polynomial regression fitting for the gauges of conditions 1 and 2.

5.3.2 Alternative Missing Value Approach (AMVA)