Skill Scores - NWP ERROR VERSUS VISIBILITY PARAMETERIZATION

A. NWP ERROR VERSUS VISIBILITY PARAMETERIZATION

2. Skill Scores

As a baseline performance metric, the Brier Skill Score (BSS) of the ensemble predictions is computed at four βe thresholds corresponding to daytime visibilities of approximately 6.5, 4.5, 2.75, and 0.875 mi.¹ The BSS is obtained by comparing the Brier Score of the forecasts to the Brier Score of a reference forecast, which for this research is persistence.

¹ These thresholds are approximate due to uncertainty in the relationship between βe and visibility.

The persistence forecast is defined as the condition observed at the initialization time of the forecast, preserved unchanged through the remainder of the forecast run. As noted previously, observations reporting an elevated βe due to precipitation were removed from the dataset. However, when precipitation was occurring at the initialization time of an NWP run, the observation still had to be categorized as either above or below the βe threshold of interest so the persistence forecast could be defined (even though the 00-h observation itself is still excluded from the results). In these cases, the persistence forecast was categorized as meeting the βe criterion if the 00-h observation had a dewpoint depression <2.2 K (following the logic used by ASOS) and the observed βe was above the threshold of interest. If either of these conditions was not met, the persistence forecast was categorized as not meeting the βe criterion.
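The persistence rule and the skill score relative to persistence can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this research: the function names, the argument layout, and the sample numbers are all assumptions, and the event is assumed to be "βe above the threshold".

```python
def persistence_meets_criterion(beta_e_obs, threshold, precip_at_init=False,
                                dewpoint_depression_k=None):
    """Categorize the 00-h observation for the persistence forecast.

    Assumption: the event is "beta_e above threshold". When precipitation is
    occurring at initialization, the dewpoint depression must also be < 2.2 K.
    """
    above = beta_e_obs > threshold
    if not precip_at_init:
        return above
    return (above and dewpoint_depression_k is not None
            and dewpoint_depression_k < 2.2)


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, reference_probs, outcomes):
    """BSS = 1 - BS_forecast / BS_reference (here the reference is persistence)."""
    bs_ref = brier_score(reference_probs, outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Made-up example: four forecast/observation pairs at one threshold.
ensemble_probs = [0.9, 0.1, 0.8, 0.2]
outcomes = [1, 0, 1, 0]
persistence_probs = [1.0, 1.0, 0.0, 0.0]  # persistence is a 0-or-1 forecast
print(brier_skill_score(ensemble_probs, persistence_probs, outcomes))  # approximately 0.95
```

Because persistence is a categorical forecast, its Brier Score is simply the fraction of times the initial condition failed to persist, so a positive BSS means the ensemble probabilities beat that failure rate.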

Following Wilks (1995), the Brier Score can be decomposed into reliability, resolution, and uncertainty, and these are also shown. A Ranked Probability Skill Score (RPSS), which is similar to BSS except it combines the performance at all four thresholds into a single metric, is also computed. Each of the relevant metrics is described in Table 4.
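As a rough illustration of how the RPSS pools the thresholds, the sketch below follows the form given in Table 4: one minus the ratio of the summed Brier Scores of the forecast to the summed Brier Scores of persistence. The function name and the Brier Score values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def ranked_probability_skill_score(bs_forecast_by_threshold,
                                   bs_persistence_by_threshold):
    """RPSS = 1 - sum_k(BS_k) / sum_k(BS_persistence_k), summed over thresholds."""
    if (len(bs_forecast_by_threshold) != len(bs_persistence_by_threshold)
            or not bs_forecast_by_threshold):
        raise ValueError("need one Brier Score per threshold for both forecasts")
    total_reference = sum(bs_persistence_by_threshold)
    if total_reference == 0.0:
        raise ValueError("summed reference Brier Score is zero; RPSS is undefined")
    return 1.0 - sum(bs_forecast_by_threshold) / total_reference


# Made-up Brier Scores at four thresholds (not results from this study).
bs_ensemble = [0.10, 0.08, 0.05, 0.02]
bs_persistence = [0.20, 0.16, 0.10, 0.04]
print(ranked_probability_skill_score(bs_ensemble, bs_persistence))  # approximately 0.5
```

Summing before dividing means thresholds with larger Brier Scores carry more weight in the RPSS than they would if the four BSS values were simply averaged.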

Except for RPSS, verification metrics for all sites combined are provided in Figure 13. To assess the relative impact of NWP model error versus visibility parameterization error on the final predictions, two sets of results are shown on each plot: the results using only the deterministic SW99 visibility parameterization (solid blue lines), and the results using the parametric visibility parameterization (dashed black lines). The same metrics are provided separately for the coastal, valley, and mountain regions in Figures 14, 15, and 16, respectively. The RPSS for all regions combined and for each individual region is shown in Figure 17.

As a broad summary of Figures 13–17, the NWP predictions show increasing skill with forecast hour compared to persistence, with the most skill in the mountain region and the least skill in the valley region. A close examination of these results follows in subsequent sections; for now, note that in nearly every plot in Figures 13–17, the results obtained with the SW99 visibility parameterization are indistinguishable from those obtained with the parametric visibility parameterization.

The negligible contribution of visibility parameterization uncertainty at the four tested thresholds is evident in virtually every metric and region. The first-order error in βe prediction from the ensemble comes from the NWP predictions of qc, and the conversion of qc to βe plays a negligible role. This does not mean visibility parameterization error is absent, only that it is unimportant given the magnitude and nature of the errors in the qc predictions from the NWP model. The following section examines the qc prediction errors and reveals why this is the case.

Table 4. Description of metrics used to assess stochastic predictions from the ensemble.

| Metric | Formula | Description | Best Score | Worst Score |
|---|---|---|---|---|
| Reliability | $\frac{1}{M}\sum_{i=1}^{I} N_i\left((p_e')_i - \bar{o}_i\right)^2$ | Measures how well a given forecast probability matches the observed frequency of occurrence | 0 | 1 |
| Resolution | $\frac{1}{M}\sum_{i=1}^{I} N_i\left(\bar{o}_i - \bar{o}\right)^2$ | Measures degree to which ensemble, through its probability forecasts, can parse data into subsamples having frequency of occurrence different from overall climatological frequency | Uncertainty score | 0 (frequency of occurrence in every subsample = overall climatological frequency) |
| Uncertainty | $\bar{o}(1 - \bar{o})$ | Does not depend on forecast, only on climatological frequency; indicates level of difficulty in obtaining resolution | N/A, but scores may range from 0 (event occurs 0% or 100% of time, so no resolution possible) to 0.25 (event occurs 50% of time, maximizing potential resolution score) | N/A |
| Brier Score | $\text{Reliability} - \text{Resolution} + \text{Uncertainty}$ | Combines reliability and resolution to summarize overall ensemble accuracy | 0 | 1 |
| Brier Skill Score (relative to persistence) | $1 - \frac{\text{Brier Score}}{\text{Brier Score}_{\text{persistence}}}$ | Measures overall stochastic skill of ensemble at particular threshold. Value of 0 indicates forecast is no better or worse than persistence forecast. | 1 | −∞ |
| Ranked Probability Skill Score (relative to persistence) | $1 - \frac{\sum_{k=1}^{T} \text{Brier Score}_k}{\sum_{k=1}^{T} \left(\text{Brier Score}_{\text{persistence}}\right)_k}$ | Combines multiple thresholds to indicate overall stochastic skill of ensemble. Value of 0 indicates forecast is no better or worse than persistence forecast. | 1 | −∞ |

$M$ = number of forecast/observation pairs

$I$ = number of probability bins (11)

$N_i$ = number of data pairs in bin $i$

$(p_e')_i$ = center of forecast probability bin (0.025, 0.1, 0.2, 0, … 0.7, 0.8, 0.975) for bin $i$

$\bar{o}_i$ = observed relative frequency for bin $i$

$\bar{o}$ = climatological frequency (total occurrences / total forecasts)

$T$ = number of event thresholds
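The reliability, resolution, and uncertainty terms in Table 4 can be computed from binned forecasts as in the following Python sketch. It is illustrative only: the function name is hypothetical, assigning each forecast to the nearest bin center is an assumption about the binning rule, and the three-bin example uses made-up data rather than the 11 bins used in this research.

```python
def brier_decomposition(bin_centers, probs, outcomes):
    """Return (reliability, resolution, uncertainty) for binned forecasts.

    bin_centers : centers of the I forecast probability bins
    probs       : M forecast probabilities
    outcomes    : M observed outcomes (1 = event occurred, 0 = it did not)
    """
    m = len(probs)
    if m == 0 or m != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal in length")

    counts = [0] * len(bin_centers)  # N_i: data pairs in bin i
    events = [0] * len(bin_centers)  # event occurrences in bin i
    for p, o in zip(probs, outcomes):
        # Assumption: each forecast belongs to the nearest bin center.
        i = min(range(len(bin_centers)), key=lambda j: abs(bin_centers[j] - p))
        counts[i] += 1
        events[i] += o

    o_bar = sum(outcomes) / m  # climatological frequency
    reliability = sum(n * (c - e / n) ** 2
                      for c, n, e in zip(bin_centers, counts, events) if n) / m
    resolution = sum(n * (e / n - o_bar) ** 2
                     for n, e in zip(counts, events) if n) / m
    uncertainty = o_bar * (1.0 - o_bar)
    return reliability, resolution, uncertainty


# Made-up example with three hypothetical bins.
centers = [0.0, 0.5, 1.0]
probs = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
outcomes = [0, 0, 0, 1, 1, 1]
rel, res, unc = brier_decomposition(centers, probs, outcomes)
print(rel - res + unc)  # Brier Score rebuilt from its three components
```

When every forecast probability coincides with a bin center, as in this example, reliability minus resolution plus uncertainty reproduces the directly computed Brier Score; otherwise the binned value is an approximation.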

Figure 13. Ensemble reliability (left column), resolution (center column), and Brier Skill Score (right column) at four different βe thresholds: 0.29 km-1 (top row), 0.41 km-

Figure 17. Ranked Probability Skill Score for all regions (top left), coastal region (top right), valley region (bottom left), and mountain region (bottom right).

In document Toward Improving Short-Range Fog Prediction in Data-Denied Areas Using the Air Force Weather Agency Mesoscale Ensemble (Page 62-69)