Summary - Summary and Contributions - Hazard Avoidance Alerting With Markov Decision Processes

7. Summary and Contributions

7.1 Summary

In this thesis, a framework for designing hazard avoidance alerting systems was presented. The framework is based on a Markov decision process (MDP) model of alerting and is motivated by identified weaknesses in existing methods. Two alerting “logics” were created using the framework and were compared, in terms of standard performance measures, to logics created with more typical methods, demonstrating a performance benefit.

The use of MDP methods was motivated by a lack of certain features in existing methods for direct derivation of alerting logics from performance requirements. One of these is the ability to reason about future decision opportunities that might influence the current decision. In particular, such knowledge is important for placement of the alerting threshold, because it is what allows deferral of alerts: knowing whether safe options will be available in the future affects the current decision. Another desired feature is the ability to model and account for uncertain dynamic modes in the observed situation. Modes describe distinct types of behavior a system could exhibit at a given time, and uncertainty in the mode complicates the state predictions needed for decision making. Mode uncertainty also motivates being aware of future decision opportunities, because actions have predictable effects on mode uncertainty. In particular, alerts may be deferred partly in expectation of decreasing mode uncertainty.

The MDP-based methodology requires a Markov state and probabilistic dynamic model of the operator-plant system, a probabilistic observation model, and creation of a reward function that describes the alerting system’s (designer’s) goals in terms of cumulative rewards that can be gained along future system trajectories. Uncertain mode variables are modeled probabilistically, and the resulting distribution, or belief state, can be updated at each step to reflect changes in the uncertainty due to new evidence. With these components, MDP theory provides means to derive an efficient alerting policy that allows computations for alerting decisions to be done in real time. The policy determines both the threshold for alerts and the later sequence of cues that guide an operator during resolution of the hazard.
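The belief-state update described above can be sketched as a discrete Bayes filter over the mode variable. This is a minimal illustration, not the thesis's implementation: the function name `update_belief`, the two-mode labeling, and all numerical values are assumptions chosen for the example.

```python
def update_belief(belief, transition, likelihood):
    """One Bayes-filter step over a discrete mode variable.

    belief[i]        : prior P(mode = i)
    transition[i][j] : P(next mode = j | mode = i)
    likelihood[j]    : P(observation | next mode = j)
    """
    n = len(belief)
    # Predict: propagate the prior through the mode transition model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correct: weight each mode by the observation likelihood.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under every mode")
    # Normalize so the posterior sums to one.
    return [u / total for u in unnormalized]


# Hypothetical numbers: mode 0 = level-off, mode 1 = continued descent.
belief = [0.5, 0.5]
transition = [[1.0, 0.0], [0.0, 1.0]]  # assumption: modes persist between steps
likelihood = [0.2, 0.8]                # assumption: observation favors descent
posterior = update_belief(belief, transition, likelihood)
```

With these assumed values, the posterior shifts toward the continued-descent mode. Repeating the update as observations arrive is what allows mode uncertainty to shrink over time.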

The policy is a function of the current system state that produces the best action from an available set. The state can be a set of variables or a distribution over variables (a belief state), including mode variables. In the belief state case the solution can be less straightforward, but methods exist.

The policy inherently takes into account future decisions through application of Bellman’s equation, which itself follows from the principle of optimality. Under an assumption of utility-based preferences, this principle says that the utility of an action at a given state depends only on the immediate reward and the utility of the state reached next, assuming the next-state utility is optimal (maximized). Thus, choosing the next action requires no assumption of any particular trajectory being followed later.
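To make the role of Bellman's equation concrete, the following sketch runs tabular value iteration on a toy three-state alerting problem. The states (`far`, `near`, `done`), the actions (`wait`, `alert`), and every probability and reward are hypothetical values chosen for illustration; they are not taken from the case studies.

```python
def q_value(state, action, values, transitions, rewards, gamma):
    """Bellman backup: immediate reward plus discounted expected next value."""
    expected_next = sum(
        p * values[s2] for s2, p in transitions[state][action].items()
    )
    return rewards[state][action] + gamma * expected_next


def value_iteration(transitions, rewards, gamma=1.0, tol=1e-9, max_iters=1000):
    """Return (values, policy) for a small tabular MDP."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        new_values = {
            s: max(
                q_value(s, a, values, transitions, rewards, gamma)
                for a in transitions[s]
            )
            for s in transitions
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    policy = {
        s: max(
            transitions[s],
            key=lambda a: q_value(s, a, values, transitions, rewards, gamma),
        )
        for s in transitions
    }
    return values, policy


# Hypothetical model: from "far", waiting usually resolves safely (0.7)
# but may lead to "near" (0.3), where waiting carries a large expected cost.
transitions = {
    "far": {"wait": {"done": 0.7, "near": 0.3}, "alert": {"done": 1.0}},
    "near": {"wait": {"done": 1.0}, "alert": {"done": 1.0}},
    "done": {"wait": {"done": 1.0}, "alert": {"done": 1.0}},
}
rewards = {
    "far": {"wait": 0.0, "alert": -1.0},   # alert cost of 1 (assumed)
    "near": {"wait": -5.0, "alert": -1.0},  # expected incident cost of 5 (assumed)
    "done": {"wait": 0.0, "alert": 0.0},
}
values, policy = value_iteration(transitions, rewards)
```

In this toy model the computed policy defers the alert in `far` and issues it in `near`. Waiting is preferred in `far` only because the backup accounts for a safe alerting option remaining available in `near`, which is the deferral behavior the summary describes.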

The MDP-based methodology was used to derive alerting logics for two kinds of aircraft encounter: a head-on collision scenario with random altitude variations, and an uncertain two-mode scenario with a safe (level-off) mode and an unsafe (continued descent) mode. These case studies demonstrated how alerting system goals can be expressed as a reward function, how an alerting policy is computed, and how the policy is used as an alerting and guidance threshold. The second case study also showed the modeling of an uncertain mode, the effects of the mode on policy computation, and the behavior of the resulting logic. In the second case study the MDP-based alerting logic was compared, using standard performance metrics, against alternate logics designed according to current practice, and the performance benefits of MDP design were made apparent. The importance of using global average performance metrics, including traditional metrics like unnecessary alert and incident rates, alongside reward function requirements was also explained.

The thesis claims that the reward function underlying the alerting process must agree with or complement the alerting preferences of the human operators. The purpose is to minimize the rate of improper alerts, defined as alerts that the operators find incorrect, which include nuisance alerts. However, at this time there is no clear description of this relationship to guide design of the reward function. In the case studies a simple reward function was chosen that makes a trade-off between safety and unnecessary alerts at the threshold. Trading off safety against unnecessary alerts is an established practice in alerting design. The resulting performance compares well with SOC-based alerting, where a threshold is defined in the space of P(SA) and P(UA). In terms of the global SOC performance metrics, the MDP-based logic achieves better safety than the SOC-based logics it was compared against for a given unnecessary alert rate. In addition, the MDP-based logic is better able to avoid alerts during level-off mode scenarios while maintaining a given level of safety.

The case study systems were made purposely simple for clarity. This leaves a question of whether MDP methods will also apply to more complex alerting systems requiring more state variables. In principle they do, but because the number of states can increase exponentially with the number of state variables, it is easy to run into computing speed and memory limits (Bellman called this problem the “curse of dimensionality”). As a consequence, more complex alerting systems may require policy or utility function approximations that reduce the number of variables and states. The tabular utility function representation and policy derivation methods that were convenient in the case studies may be too inefficient for general use.
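The exponential growth behind the curse of dimensionality is easy to see with a little arithmetic. The sketch below assumes each state variable is discretized into the same number of bins and that each table entry is stored as an 8-byte float; the bin count of 50 and the helper names are hypothetical.

```python
def table_entries(bins_per_variable, num_variables):
    """Number of entries in a tabular utility function over a gridded state."""
    return bins_per_variable ** num_variables


def table_gigabytes(bins_per_variable, num_variables, bytes_per_entry=8):
    """Approximate storage for the table, in gigabytes (10**9 bytes)."""
    return table_entries(bins_per_variable, num_variables) * bytes_per_entry / 1e9


# Assumed discretization: 50 bins per state variable.
for num_variables in (2, 4, 6, 8):
    entries = table_entries(50, num_variables)
    gigabytes = table_gigabytes(50, num_variables)
    print(f"{num_variables} variables: {entries} entries, about {gigabytes:.3g} GB")
```

Under these assumptions, each added pair of variables multiplies the table size by 2500, which is why a tabular representation that is practical for a few variables quickly becomes infeasible.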
