2. DEVELOPMENT OF AN IMPROVED PROTOCOL FOR EVALUATING THE
2.3 Development of PROMPT
2.3.2 Design and scope
As we reviewed in the previous section, assuming that the first MPE outcome question (i.e. model formulation) has been resolved satisfactorily, a good MPE for SIP use should provide answers to the following two questions: (1) does the model make predictions that match history?, and (2) does the model fulfill the designed task? The first question seeks to determine (a) if the model can show reasonable agreement with observation for the right reasons and (b) what level of reliability we can achieve with the model. This second
modeling results in creating SIP policy. Thus, these two questions can be condensed into one question with a complementary follow-up: To what extent can we accept the PAQM predictions at face value for SIP development? And if we cannot, how should we make judgments about the effectiveness of ozone control options? The goal of PROMPT is to provide guidelines for constructing a proper MPE protocol that state modelers can follow when they answer these questions for policy makers in their specific SIP modeling.
These questions cannot be answered by following the EPA’s current protocol without performing many ad hoc diagnostic analyses. These analyses often require substantial time and resources, and in the past they were performed without systematic guidance. Without a systematic approach, many of them are ineffective because they turn out to be irrelevant to the problem at hand. For example, running DDM is unlikely to be useful if the meteorological inputs have serious wind speed or direction errors.
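The DDM example above amounts to a gate: check the wind inputs first, and run advanced diagnostics only if the wind errors are tolerable. The following is a minimal sketch of such a gate, not part of PROMPT itself. The function names and the tolerance values (30 degrees, 1.0 m/s) are hypothetical and chosen only for illustration; a real application would take its tolerances from the evaluation protocol.

```python
def wind_direction_error(model_deg, obs_deg):
    """Signed smallest angular difference (model - obs), in degrees within [-180, 180)."""
    return (model_deg - obs_deg + 180.0) % 360.0 - 180.0


def mean_abs_direction_error(model_dirs, obs_dirs):
    """Mean absolute wind direction error over paired values, in degrees."""
    errors = [abs(wind_direction_error(m, o)) for m, o in zip(model_dirs, obs_dirs)]
    return sum(errors) / len(errors)


def mean_speed_bias(model_speeds, obs_speeds):
    """Mean of (model - obs) wind speed over paired values."""
    diffs = [m - o for m, o in zip(model_speeds, obs_speeds)]
    return sum(diffs) / len(diffs)


def winds_adequate(model_dirs, obs_dirs, model_speeds, obs_speeds,
                   max_dir_error_deg=30.0, max_abs_speed_bias=1.0):
    """Return True if wind errors are within the (hypothetical) tolerances.

    The default tolerances are illustrative assumptions, not values
    taken from PROMPT or from EPA guidance.
    """
    dir_ok = mean_abs_direction_error(model_dirs, obs_dirs) <= max_dir_error_deg
    speed_ok = abs(mean_speed_bias(model_speeds, obs_speeds)) <= max_abs_speed_bias
    return dir_ok and speed_ok


if __name__ == "__main__":
    # Directions straddle north: a naive difference would report 340 degrees.
    ok = winds_adequate([350.0, 10.0], [10.0, 350.0], [3.0, 4.0], [2.5, 3.5])
    print("proceed to advanced diagnostics:", ok)
```

The direction error is computed on the circle so that 350 and 10 degrees are treated as 20 degrees apart rather than 340.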
Based on our rethinking of MPE for SIP modeling and to resolve issues found in the EPA’s existing protocol, we designed PROMPT to have four desirable features so that PROMPT can (1) provide a systematic guideline for what to examine in various graphical analyses and how to perform analytical procedures including guidance on when to perform advanced diagnostic analyses, (2) extend its scope beyond the traditional range of
observation such as ground monitors by providing guidance on the use of high-resolution data sets such as aircraft measurements, (3) incorporate explicit ways of taking policy-relevant tolerance into account in the model evaluation framework, and (4) appraise the possible impact of model input biases on choices of ozone control options. The first two features are
designed for improved history matching. The latter two features are designed to test if a model can fulfill its designed tasks.
2.3.2.2 Scope and limitations
The application of PROMPT requires an operational PAQM for at least one episode. This means an operational modeling system including all the proper model inputs for the episode. We recognize, however, that the application of PROMPT may generate needs for reviewing the setup process of the PAQM. PROMPT does not address issues regarding general acceptability of PAQMs; these can be judged better by special evaluation studies specifically designed as part of model development. Thus, PROMPT is focused on
addressing how to evaluate the performance of a PAQM used in a specific SIP modeling case. A generally acceptable PAQM may not work on a specific case due to a limited range of meteorological conditions on which the PAQM was tested and/or other factors that were not resolved while the PAQM was developed. At the same time, it is risky to use a PAQM that is not generally acceptable. Thus, PROMPT presumes that a state selects, upon the EPA’s approval, a generally acceptable PAQM including selection of the run-time options for the specific science modules in the PAQM and has selected a proper episode for its SIP
modeling case. Again, we recognize that the findings from the application of PROMPT may lead to changes in the model configuration (including possible model formulation changes made by selecting alternative sets of run-time options or even modifying source code) or to the selection of alternative episodes.
The outcome of the examination may raise the need for formal uncertainty analysis, such as Monte Carlo analysis, but a protocol for conducting such analyses is beyond the scope of the current study. Adopting formal uncertainty analysis also calls for caution; the current state of these approaches is not considered sufficient to treat epistemic uncertainties systematically and consistently with aleatory uncertainties (Ferson et al., 2004). Thus, PROMPT does not include guidance on combining formal uncertainty analyses with the outcome of PROMPT, because the primary focus of this study is to develop a protocol that helps the evaluator identify the models’ epistemic uncertainties.
2.3.2.3 Structure of PROMPT
As described in the previous section, one of the significant issues in the EPA’s MPE protocol is that it takes a ‘waterfall’ (McConnell, 1996) approach to the evaluation of model inputs with respect to the effects of input biases on ozone prediction. That is, there is no systematic feedback to model input evaluation after output evaluation. In addition, MPE practice following the EPA protocol frequently leads modelers to make trial-and-error changes to inputs until the model performance is satisfactory in terms of statistical measures. This iterative process often requires a lot of resources and turns into fine tuning that loses the focus of base case SIP modeling: constructing a model that is sufficient to show the ‘right causes’ of ozone problems in an area. Also, in the iterations exercised in past MPE practice, a performance measure is often used only once during the evaluation process. For example, time series plots are made, and the evaluator’s judgment about them is described only in general terms in very short paragraphs, often with no more than a simple statement that the model shows ‘reasonable agreement’.
To overcome these shortcomings, we argue that proper MPE should be conducted in a progressive manner, i.e., a multi-phase evaluation is needed. Each phase needs to give a different degree of information with regard to the model’s ability to replicate historical
all of the evaluation phases, performance measures need to be used multiple times but with a higher degree of concern for detail in each consecutive phase. For example, in the first phase, time series may be examined to see whether peak ozone in the model at a monitor location occurs close to the time of the observed peak ozone. In a later phase, the time series should be inspected to see whether ozone changes are correlated with changes in nitrogen oxides, or whether there are any irregular changes in the ozone signal compared with typical urban ozone signals. Therefore, we structured PROMPT to consist of four sets of analytical procedures that are carried out in sequence. Thus, PROMPT’s design goals are pursued incrementally at each evaluation step. This approach is made possible by constructing the PROMPT procedures to use the same analysis material multiple times, with a degree of inspection that corresponds to the phase of evaluation, while adding more material to the already evaluated material as the evaluation advances. The advantages of progressive analysis are that it provides (1) quick screening at the beginning of the evaluation, (2) chances of finding deficiencies by inspecting the same material repeatedly, and (3) fast feedback that lets evaluators revisit previous evaluation phases. Therefore, the model evaluators can conduct more guided analyses in sequential phases. As shown in Table 2.1, we constructed four major questions, each with subsequent corollary questions, that evaluators have to answer with PROMPT by gradually increasing the level of analysis.
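The first-phase time series check mentioned above (does the modeled peak ozone at a monitor occur close to the time of the observed peak?) can be sketched in a few lines. This is an illustrative sketch only: the function names, the two-hour tolerance, and the sample values are hypothetical, and the series are assumed to be complete, equal-length hourly values for one monitor and one day.

```python
def peak(series):
    """Return (hour_index, value) of the maximum in an hourly series."""
    hour = max(range(len(series)), key=lambda h: series[h])
    return hour, series[hour]


def peak_timing_check(model, obs, max_lag_hours=2):
    """Compare the timing of modeled and observed peak ozone at one monitor.

    max_lag_hours is a hypothetical screening tolerance, not a PROMPT value.
    A positive lag means the modeled peak occurs later than the observed one.
    """
    if len(model) != len(obs):
        raise ValueError("model and obs series must have the same length")
    model_hour, model_peak = peak(model)
    obs_hour, obs_peak = peak(obs)
    lag = model_hour - obs_hour
    return {
        "model_peak_hour": model_hour,
        "obs_peak_hour": obs_hour,
        "lag_hours": lag,
        "peak_difference": model_peak - obs_peak,
        "within_tolerance": abs(lag) <= max_lag_hours,
    }


if __name__ == "__main__":
    # Made-up hourly ozone values (ppb) for illustration only.
    observed = [20, 25, 40, 70, 95, 80, 50]
    modeled = [22, 24, 35, 60, 85, 90, 55]
    print(peak_timing_check(modeled, observed))
```

A later phase would revisit the same series with closer inspection, for example by comparing ozone changes against nitrogen oxides changes, rather than replacing this quick screen.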
To incorporate the four desirable features that we identified in the previous section (two features for the matching history question and the other two features for fulfillment test), for each set of procedures, PROMPT contains the statement of analysis goals, the required information (including characteristics of information) for following procedures, the list of proposed analyses with recommended material, and the suggested procedures to follow.
PROMPT also includes the relationship among different tasks and the documentation requirement.
Table 2.1. Summary of PROMPT procedural questions.
1. Does this model show or have all necessary components to produce the phenomena that we can expect from the current best perceptual/conceptual model?
   a. What are the model setup and justification? What amounts and kinds of observation are available for evaluation? How are model inputs prepared for model operation?
   b. Is the overall ozone behavior in the model consistent with the conceptual model?
   c. If not, what are the possible causes? Are there any alternative model inputs or configurations?
2. Can this model distinguish which precursor(s) to control for ozone reduction?
   a. Does a protocol for graphical measure construction exist?
   b. Does the model show the correct source-receptor relationship?
   c. Does the model have biases in surface winds, NOX, and O3 (plus CO if available)?
   d. Which precursors are important for potential policy options?
3. How precisely can the model estimate control requirements?
   a. How does the model perform at locations where observations are available?
   b. How does the model predict at locations where no observation exists?
   c. What is the resolution of control options in space and time?
4. What are the possible biases in the prediction and the impact of biases on the policy choice?
   a. Where does the future ozone problem occur in the model? How does the model perform and/or predict at those locations?
   b. Do the biases found in model predictions affect the choices of possible control options?
   c. What is the evaluator’s confidence in the reliability of model performance in supporting proposed policy options?
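As one possible way to approach question 2c in Table 2.1 (biases in NOX, O3, and CO), the sketch below computes the mean bias and normalized mean bias of paired model and observed values. These are common bias statistics used here as an assumption; the source does not state which statistics PROMPT prescribes. The function names and sample numbers are hypothetical, and missing observations are assumed to be marked with None.

```python
def paired(model, obs):
    """Keep only the pairs for which an observation is available (not None)."""
    return [(m, o) for m, o in zip(model, obs) if o is not None]


def mean_bias(model, obs):
    """Mean of (model - obs) over valid pairs, in the units of the data."""
    pairs = paired(model, obs)
    return sum(m - o for m, o in pairs) / len(pairs)


def normalized_mean_bias(model, obs):
    """Sum of (model - obs) divided by the sum of observations (a fraction)."""
    pairs = paired(model, obs)
    return sum(m - o for m, o in pairs) / sum(o for _, o in pairs)


def bias_summary(model_by_species, obs_by_species):
    """Return {species: (mean_bias, normalized_mean_bias)} for each species."""
    return {
        species: (
            mean_bias(model_by_species[species], obs_by_species[species]),
            normalized_mean_bias(model_by_species[species], obs_by_species[species]),
        )
        for species in model_by_species
    }


if __name__ == "__main__":
    # Made-up values (ppb) for illustration only; None marks a missing observation.
    model = {"O3": [55.0, 66.0, 77.0], "NOX": [12.0, 9.0, 20.0]}
    obs = {"O3": [50.0, 60.0, 70.0], "NOX": [10.0, None, 25.0]}
    for species, (mb, nmb) in bias_summary(model, obs).items():
        print(f"{species}: mean bias = {mb:.2f}, normalized mean bias = {nmb:.2%}")
```

Reporting a signed bias per species, rather than a single pass/fail statistic, keeps the direction of each bias visible when question 4b later asks whether those biases affect the choice of control options.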