Machine Learning Procedure - Coupling Data Science Techniques and Numerical Weather Prediction


4.1.3 Machine Learning Procedure

After storm tracking and matching are complete, forecast inputs are extracted from each storm object. This approach extracts statistics about the distribution of meteorological variable values from the pixels within the extent of a storm object. These statistics are the mean, standard deviation, minimum, maximum, percentiles, and skewness. The full list of input variables for the CAPS ensemble is given in Table 4.2, and the full list for the NCAR ensemble is in Table 4.3. The CAPS and NCAR ensembles use different post-processing systems and archive differing amounts of data, so some variables available in one system were not available in the other.
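The per-object statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments: the function name `storm_object_statistics` and the `field` and `mask` inputs are hypothetical, and the choice of the median and the 10th and 90th percentiles follows the captions of Tables 4.2 and 4.3.

```python
import numpy as np
from scipy import stats


def storm_object_statistics(field, mask):
    """Summarize one gridded variable over the pixels of one storm object.

    field: 2D array of a meteorological variable (hypothetical input).
    mask:  2D boolean array, True for grid points inside the storm object.
    """
    values = np.asarray(field, dtype=float)[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("storm object mask selects no grid points")
    return {
        "mean": float(np.mean(values)),
        "max": float(np.max(values)),
        "min": float(np.min(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),
        "skew": float(stats.skew(values)),
        "p10": float(np.percentile(values, 10)),
        "p90": float(np.percentile(values, 90)),
    }
```

Calling this once per variable and per storm object, and concatenating the results with the morphological attributes, would give one input row per storm object.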

The machine learning hail forecast procedure is summarized in Fig. 4.2. Hail occurrence and the distribution of hail sizes are extracted from the MESH tracks and are used as the output labels for the machine learning models. If a forecast storm track is matched with a MESH track, then hail is assumed to have occurred. Forecast storm tracks with no matching MESH track are false alarms, and unmatched MESH tracks are misses. The hail size distribution within a MESH object is modeled as a gamma distribution fit to the MESH object values using maximum likelihood estimation. The gamma distribution probability density function takes the form:

$$f(x) = \frac{\left(\frac{x - x_0}{\beta}\right)^{\alpha - 1} \exp\left(-\frac{x - x_0}{\beta}\right)}{\beta\,\Gamma(\alpha)}, \qquad x > x_0,\ \alpha, \beta > 0 \tag{4.2}$$

where α is the shape parameter, x0 is the location parameter, and β is the scale parameter. An example of a MESH object and the gamma distribution fit to its hail size distribution is shown in Fig. 4.3.

Table 4.2: Input variables for the CAPS ensemble machine learning models. The mean, maximum, minimum, median, standard deviation, skewness, 10th percentile, and 90th percentile of the grid point values within the boundaries of each storm object were calculated for the Storm Proxy and Environment variables. CAPE is Convective Available Potential Energy, CIN is Convective Inhibition, ML is the Mean Layer, MU is the Most Unstable layer, and LCL is the Lifted Condensation Level.

Storm Proxy                   Environment                   Morphological
Column Total Graupel          MLCAPE                        Area
2-5 km Updraft Helicity       MLCIN                         Eccentricity
Reflectivity -10°C            MUCAPE                        Major Axis Length
Updraft Speed                 MUCIN                         Minor Axis Length
Downdraft Speed               LCL Height                    Orientation
Echo Top Height               0-3 km Storm Rel. Helicity    Extent
Precipitation                 0-6 km Wind Shear
Precipitable Water            500 mb Temperature
Bunkers U                     700 mb Temperature
Bunkers V                     2 m Dewpoint
                              2 m Temperature
                              850 mb Specific Humidity
                              0-3 km Lapse Rate
                              700-500 mb Lapse Rate
                              10 m U-Wind
                              10 m V-Wind
                              700 mb U-Wind
                              700 mb V-Wind

Location Forecast Hour Valid Hour UTC Current Duration Total Duration W.-E. Storm Motion S.-N. Storm Motion

Table 4.3: Input variables for the NCAR ensemble machine learning models. The mean, maximum, minimum, median, standard deviation, skewness, 10th percentile, and 90th percentile of the grid point values within the boundaries of each storm object were calculated for the Storm Proxy and Environment variables.

Storm Proxy                   Environment                   Morphological
Column Total Graupel          SBCAPE                        Area
2-5 km Updraft Helicity       SBCIN                         Eccentricity
Composite Reflectivity        MUCAPE                        Major Axis Length
Updraft Speed                 Precipitable Water            Minor Axis Length
Downdraft Speed               LCL Height                    Orientation
Thompson Hail Size Max.       0-3 km S.R. Helicity          Extent
Thompson Hail Size Sfc.       0-6 km Wind Shear
0-3 km Updraft Helicity       0-1 km Wind Shear
2-5 km Min. Updraft Helicity
10 m Wind Speed
0-1 km Vorticity

Location Forecast Hour Valid Hour UTC Current Duration Total Duration W.-E. Storm Motion S.-N. Storm Motion

Figure 4.2: Machine learning hail size forecasting procedure diagram. [Flowchart: The training data are NWP model storm statistics, and ensemble members are grouped by shared microphysics scheme. A random forest classifier decides whether the storm produces hail. If no, the storm is removed from the grid. If yes, a random forest or elastic net regression estimates the shape and scale parameters of the gamma distribution simultaneously; 1000·Area samples are drawn from the hail size distribution; the samples are sorted by size in each realization and averaged across realizations; and the mean sizes are applied to the storm object in rank order of total column graupel intensities. The output is neighborhood ensemble probabilities and ensemble maximum hail sizes.]

The shape parameter affects the skewness of the distribution, with small shape parameter values causing more skew and larger values leading to less skew. The location parameter determines the minimum value of the distribution and was fixed at 6 mm. The scale parameter stretches or squeezes the gamma distribution and has the same units as the quantity being modeled (Wilks 2011). The MESH size distributions exhibit an inverse log-linear relationship between the shape and scale parameters.

Figure 4.3: An example MESH object and the resulting discrete and parametric MESH size distributions.
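The maximum likelihood fit of the gamma distribution with a fixed location parameter can be sketched with SciPy. This is an illustration under stated assumptions rather than the fitting code used for the experiments: `fit_hail_size_gamma` is a hypothetical helper, the synthetic data are made up, and values at or below the location parameter are dropped because the density is only defined above it.

```python
import numpy as np
from scipy import stats

LOCATION_MM = 6.0  # location parameter fixed at 6 mm, as stated in the text


def fit_hail_size_gamma(mesh_values_mm, location=LOCATION_MM):
    """Fit shape and scale by maximum likelihood with the location held fixed."""
    values = np.asarray(mesh_values_mm, dtype=float)
    values = values[values > location]  # assumption: discard values at or below x0
    shape, _, scale = stats.gamma.fit(values, floc=location)
    return shape, scale


# Synthetic check with made-up parameters (not MESH data).
rng = np.random.default_rng(0)
synthetic_mesh = LOCATION_MM + rng.gamma(shape=2.0, scale=8.0, size=5000)
shape_hat, scale_hat = fit_hail_size_gamma(synthetic_mesh)
print(f"shape={shape_hat:.2f}, scale={scale_hat:.2f} mm, "
      f"skewness={2 / np.sqrt(shape_hat):.2f}")
```

The skewness of a gamma distribution is 2/√α, which is why small shape values give strongly skewed hail size distributions and large values give nearly symmetric ones.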

I use machine learning models from the scikit-learn package, version 0.16.1 (Pedregosa et al. 2011). A stacked model approach is used, with a classifier that predicts whether or not hail occurs and a regression model that predicts the parameters of the gamma distribution. The classifier is a random forest (Breiman 2001a) that weights each training example by the inverse of its class frequency, has 500 trees, and requires a minimum of 10 samples at each leaf node. A grid search with cross-validation selects the maximum number of features sampled at each split from the square root of the number of features, 20, 30, and 50 features.
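A classifier with these settings might be configured as follows. The sketch uses the current scikit-learn API, which differs from version 0.16.1 (for example, `GridSearchCV` now lives in `sklearn.model_selection`). The helper name `build_hail_classifier`, the number of cross-validation folds, the default accuracy scoring, and the random seed are assumptions, not settings from the experiments.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def build_hail_classifier(n_features, n_estimators=500, cv=3):
    """Random forest classifier with a cross-validated search over max_features."""
    # Candidate values from the text; integer candidates larger than the
    # number of available features are skipped (an assumption for safety).
    max_features_grid = ["sqrt"] + [k for k in (20, 30, 50) if k <= n_features]
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        min_samples_leaf=10,
        class_weight="balanced",  # weights inversely proportional to class frequency
        random_state=0,           # assumption: fixed seed for reproducibility
    )
    return GridSearchCV(forest, {"max_features": max_features_grid}, cv=cv)
```

After `search = build_hail_classifier(X.shape[1])` and `search.fit(X, y)`, the hail probability for new storm objects would be `search.predict_proba(X_new)[:, 1]`, where `X`, `y`, and `X_new` are hypothetical arrays of storm statistics and hail labels.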

Multitask learning models (Caruana 1997) predict the shape and scale parameters of the hail size distribution simultaneously. During training, the models choose weights and parameters to minimize the total error over all predicted values instead of fitting separate models for each value. Multitask learning provides additional information about the fitting process and maintains correlations among the predicted values. The hail size distribution parameters are log-transformed and normalized before fitting to capture the log-linear relationships and reduce bias in the error metric from fitting variables with differing ranges of values. Both the random forest and elastic net regression (Zou and Hastie 2005) support multitask learning and are used in this experiment. A default random forest, called “Random Forest,” uses 500 trees, a minimum of 10 samples at a split node, and samples the square root of the total number of features. An optimized random forest, called “Random Forest CV,” also uses 500 trees but applies a cross-validated grid search over the maximum number of features (square root, 30, 50, and 100 features) and the minimum number of samples at a splitting node (5, 10, and 20 samples). The elastic net determines the ratio between the ridge and lasso terms from a validation set and normalizes the values of the input features.
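The multitask setup can be sketched with the current scikit-learn API. Several details are assumptions made for illustration: the `LogStandardTargets` helper and `build_regressors` function are hypothetical, the elastic net chooses its ridge/lasso ratio by cross-validation (`MultiTaskElasticNetCV`) rather than from a single validation set, and the candidate ratios are arbitrary. Only the default "Random Forest" and the elastic net are shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class LogStandardTargets:
    """Log-transform and standardize (shape, scale) targets, and invert."""

    def fit(self, params):
        logs = np.log(params)
        self.mean_ = logs.mean(axis=0)
        self.std_ = logs.std(axis=0)
        return self

    def transform(self, params):
        return (np.log(params) - self.mean_) / self.std_

    def inverse_transform(self, scaled):
        return np.exp(scaled * self.std_ + self.mean_)


def build_regressors(n_estimators=500):
    """Two models that each predict both gamma parameters at once."""
    forest = RandomForestRegressor(
        n_estimators=n_estimators,
        min_samples_split=10,
        max_features="sqrt",
        random_state=0,  # assumption: fixed seed for reproducibility
    )
    elastic_net = make_pipeline(
        StandardScaler(),  # normalizes the input features
        MultiTaskElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3),  # assumed grid
    )
    return {"Random Forest": forest, "Elastic Net": elastic_net}
```

Each model is fit on a two-column target array (transformed shape and scale), and predictions are mapped back to positive gamma parameters with `inverse_transform`.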

After the machine learning models estimate the probability of any hail and the MESH probability distribution, grid point hail size estimates are produced through a sampling and sorting procedure (Fig. 4.2). For each hailstorm object with a probability of hail occurrence of at least 50%, 1000 random samples are drawn from the predicted gamma distribution for each grid point. The samples are then sorted by size along the area dimension of the object. The mean and percentiles are then calculated across the samples and are applied to the prediction grid. Convection-Allowing Model ensemble post-processing products such as neighborhood probabilities and ensemble maximum fields can then be derived from these grids and compared with other hail size and storm surrogate forecasts.
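The sampling and sorting procedure can be sketched with NumPy. This illustration makes several assumptions: the function name `hail_size_grid` and its arguments are hypothetical, only the mean (not the percentiles) is computed, and the largest mean size is assigned to the grid point with the largest total column graupel, following the rank-order step in Fig. 4.2.

```python
import numpy as np


def hail_size_grid(hail_probability, shape, scale, graupel, mask,
                   location=6.0, n_realizations=1000, seed=0):
    """Spread a predicted hail size distribution over a storm object's grid points."""
    mask = np.asarray(mask, dtype=bool)
    sizes = np.zeros(mask.shape)
    if hail_probability < 0.5:
        return sizes  # storms below the 50% threshold contribute no hail
    rng = np.random.default_rng(seed)
    area = int(mask.sum())
    # One realization per row: `area` gamma draws shifted by the location parameter.
    samples = location + rng.gamma(shape, scale, size=(n_realizations, area))
    # Sort each realization by size, then average rank by rank across realizations.
    mean_sizes = np.sort(samples, axis=1).mean(axis=0)
    # Rank the grid points by column graupel (0 = smallest) and assign the
    # mean size with the same rank to each point.
    graupel_values = np.asarray(graupel, dtype=float)[mask]
    ranks = np.argsort(np.argsort(graupel_values))
    sizes[mask] = mean_sizes[ranks]
    return sizes
```

Repeating this for every ensemble member yields hail size grids from which neighborhood probabilities and ensemble maximum fields can be computed.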

In document Coupling Data Science Techniques and Numerical Weather Prediction Models for High-Impact Weather Prediction (Page 86-91)