2.4.4 Modelling Approaches
According to Fawcett and Provost (1999), there are two classes of methods for activity monitoring:
profiling In a profiling strategy, a model is constructed using only the normal activity of the data, without reference to abnormal cases. An alarm is then triggered if the current activity deviates significantly from normal activity. This approach may be useful in complex time-dependent data where the concept of anomaly is not well defined. For example, fraud attempts often take different forms. By modelling only normal activity, one is apt to detect different types of anomalies, including ones that were previously unknown.
discriminating In a discriminating strategy, a model is constructed to characterise anomalous activity with respect to the normal activity, handling the problem as a classification one. A system then uses the model to examine the time series and look for anomalies. In this scenario, the recent past dynamics of the data are used as predictor variables. The target variable denotes whether the event of interest occurs.
On top of these two classes, profiling and discriminating, there is a possible second distinction between approaches: uniform or individual. In uniform approaches, a model is built using information from all Di ∈ D. On the other hand, individual approaches build a specific model for each Di. The most appropriate solution is domain-dependent. For example, if each Di comprises some idiosyncratic signal that should be modelled, it may be worthwhile to use an individual approach.
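The difference between uniform and individual profiling can be illustrated with a minimal sketch. It assumes a very simple profile (the mean and standard deviation of normal activity) and an alarm rule based on a z-score threshold. The function names, the threshold of 3 standard deviations, and the toy data are assumptions made for illustration, not part of any method cited here.

```python
from statistics import mean, stdev


def fit_profile(normal_values):
    """Summarise normal activity by its mean and sample standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_alarm(profile, value, threshold=3.0):
    """Flag a value that deviates from the profile by more than
    `threshold` standard deviations (hypothetical alarm rule)."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Toy normal activity for two entities D_i (illustrative values only).
normal_activity = {
    "entity_a": [10.0, 11.0, 9.0, 10.5, 9.5],
    "entity_b": [100.0, 120.0, 80.0, 110.0, 90.0],
}

# Individual approach: one profile per entity.
individual_profiles = {
    name: fit_profile(values) for name, values in normal_activity.items()
}

# Uniform approach: a single profile built from all entities.
pooled = [v for values in normal_activity.values() for v in values]
uniform_profile = fit_profile(pooled)

# A value of 20 is unusual for entity_a, but not for the pooled data.
alarm_individual = is_alarm(individual_profiles["entity_a"], 20.0)
alarm_uniform = is_alarm(uniform_profile, 20.0)
```

In this toy setting the individual profile raises an alarm for entity_a while the uniform profile does not, because pooling the two entities hides the idiosyncratic scale of each one.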
State of the Art Methods
In this section, we present a small number of methods designed for activity monitoring. Although the list is not comprehensive, to the best of our knowledge it is a reasonable representation of the types of approaches used to tackle activity monitoring problems. One of the pioneering works in activity monitoring is due to Fawcett and Provost (1999), who formalise the activity monitoring predictive task. They also describe a model, dubbed DC-1 (Fawcett and Provost, 1997), which was used to detect fraud in telecom data and to monitor news stories. DC-1 works in three main steps. Initially, a set of rules is created, which are designed to indicate possible fraudulent behaviour. Afterwards, an individual profiling approach is carried out as follows. The rules obtained before are used to build profiling monitors for each entity. These are used to model the typical behaviour of the respective entity relative to a rule. In effect, the system can be used to quantify how far the entity deviates from normal activity. The final step of DC-1 is a weighting mechanism that maximises the performance of the system.
Weiss and Hirsh (1998) presented a method called Timeweaver to predict rare events from a sequence of events. They applied the method to predict telecom equipment failure using a set of alarm messages. Timeweaver works by using a genetic algorithm to identify prediction patterns from the data. Afterwards, a greedy algorithm uses the generated patterns to create rules that distinguish normal events from anomalous events.
Other examples of approaches in the literature are the works by Salvador et al. (2004); Vilalta and Ma (2002); Ghosh et al. (2016). Although these methods vary in their details, the basic idea is similar. A system models whether a rare event has started recently, or is about to start, according to the recent activity of the entity being monitored.
Actionable Forecasting
In some domains of application, the event of interest is defined according to the observed values of a certain numeric variable. For example, a common event of interest in the intensive care unit of hospitals is acute hypotension. An acute hypotensive episode (AHE) is defined as a 30-minute interval in which 90% of the values of mean arterial blood pressure are below 60 millimetres of mercury (Ghosh et al., 2016). The most common approach is to model AHEs as a binary classification problem. The recent values of several physiological signals are used to create the predictor variables. The target variable denotes whether or not there is an impending AHE.
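The AHE definition and the classification formulation above can be sketched as follows. The sketch assumes one mean arterial pressure (MAP) reading per minute, so a 30-minute interval corresponds to 30 values, and it reads "90% of the values" as "at least 90%". It also simplifies the target to whether the interval immediately following the predictors is an AHE. The function names and these choices are assumptions for illustration.

```python
def is_ahe(map_values, threshold=60.0, fraction=0.9):
    """Return True if at least `fraction` of the MAP readings in the
    interval are below `threshold` mmHg."""
    if not map_values:
        raise ValueError("the interval must contain at least one reading")
    below = sum(1 for v in map_values if v < threshold)
    return below / len(map_values) >= fraction


def build_classification_examples(series, n_lags, window=30):
    """Pair the `n_lags` most recent readings (predictors) with a binary
    target saying whether the next `window` readings form an AHE."""
    examples = []
    for t in range(n_lags, len(series) - window + 1):
        predictors = series[t - n_lags:t]
        target = is_ahe(series[t:t + window])
        examples.append((predictors, target))
    return examples


# Toy series: 5 minutes of normal pressure followed by 30 minutes of low pressure.
series = [70.0] * 5 + [55.0] * 30
examples = build_classification_examples(series, n_lags=5)
```

In practice the predictors would come from several physiological signals rather than from the MAP series alone, as noted in the text.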
An alternative formulation is to model the underlying numeric variable. Using the same predictor variables as a classification model, a regression algorithm can be used to forecast the future values of the numeric variable used to define the event. A subsequent deterministic function is used to map the forecasted value(s) into a decision (i.e. whether or not the event of interest occurs). This regression-based approach is designated as actionable forecasting (Baía, 2015). In our example, Rocha et al. (2011) take this approach to predict impending AHE. They create a model for forecasting the future values of mean arterial blood pressure, which is the variable used to define an AHE. The decision process about whether or not there is an impending AHE is carried out according to these predicted values.
Baía and Torgo (2017) present a study comparing the two approaches, i.e. a classification-based approach and a regression-based approach, for deciding the correct trading actions in the context of financial trading. In the first approach, a classification model predicts the correct course of action: buy an asset, hold it, or sell it. In the second approach, a forecasting model first predicts the price variation of an asset. A subsequent deterministic function is applied to decide the correct course of action.
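The two-stage structure of the regression-based approach can be sketched as below: a forecast of the price variation, followed by a deterministic mapping into an action. The naive forecaster is only a stand-in for a regression model, and the function names and the 2% thresholds are hypothetical values chosen for illustration, not those used by the cited study.

```python
def naive_forecast(prices):
    """Stand-in for a regression model: predict that the next relative
    price variation equals the most recent one."""
    if len(prices) < 2:
        raise ValueError("at least two prices are required")
    return (prices[-1] - prices[-2]) / prices[-2]


def decide_action(predicted_variation, buy_threshold=0.02, sell_threshold=-0.02):
    """Deterministic mapping from a forecasted variation to a trading action."""
    if predicted_variation >= buy_threshold:
        return "buy"
    if predicted_variation <= sell_threshold:
        return "sell"
    return "hold"


# Stage 1: forecast the numeric variable. Stage 2: map it into a decision.
recent_prices = [100.0, 103.0]
action = decide_action(naive_forecast(recent_prices))
```

A classification-based alternative would instead learn the mapping from the predictors to the action directly, with no intermediate numeric forecast.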
Evaluation
The evaluation of predictive models for activity monitoring tasks is typically constrained by two issues: class imbalance, and time-dependency among consecutive observations. As we mentioned, events of interest are typically rare. This issue has an impact on the evaluation of predictive models (Branco et al., 2016a). Moreover, missing an event of interest does not have the same cost as issuing a false alarm. Recalling the hospital example, failing to anticipate a health crisis in a patient is more costly than launching a false alarm. Cost-sensitive models are often used to cope with this problem (Chan and Stolfo, 1998).
The evaluation of activity monitoring problems also needs to take into account the timeliness of alarms. Suppose that an alarm is issued about an event. A second alarm about the same event adds no information. Moreover, the concept of true negative (Flach, 2019) is not well defined in these problems. Because of the continuity of time, there may be “infinitely many true negatives” (Fawcett and Provost, 1999).
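The following sketch illustrates the timeliness issue only; it does not implement any specific metric from the literature. It assumes that an event counts as detected when at least one alarm is issued within a fixed warning window before its onset, that repeated alarms about the same event are not counted again, and that any alarm outside every warning window is a false alarm. The function name, the window semantics, and the toy timestamps are assumptions.

```python
def score_alarms(event_times, alarm_times, warning_window):
    """Return (detected_events, false_alarms).

    An event starting at time e is detected if some alarm a satisfies
    e - warning_window <= a < e. Extra alarms for the same event add
    nothing; alarms that match no event are false alarms.
    """
    detected = 0
    useful_alarms = set()
    for e in event_times:
        hits = [a for a in alarm_times if e - warning_window <= a < e]
        if hits:
            detected += 1  # counted once, however many alarms were issued
            useful_alarms.update(hits)
    false_alarms = sum(1 for a in alarm_times if a not in useful_alarms)
    return detected, false_alarms


# Two events; the first is announced twice, the second is missed,
# and one alarm is unrelated to any event.
detected, false_alarms = score_alarms(
    event_times=[100, 200], alarm_times=[95, 97, 150], warning_window=10
)
```

Note that no count of true negatives appears in this sketch, which is consistent with the observation that the concept is not well defined in continuous time.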
In order to cope with these issues, Fawcett and Provost (1999) proposed to use the AMOC (Activity Monitoring Operating Characteristic) curve as an evaluation framework for these problems. This approach is similar to ROC (Receiver Operating Characteristic) (Provost et al., 1997) but tailored for time-dependent domains.
Weiss and Hirsh (1998) extended the classical precision and recall metrics to evaluate activity monitoring models. Similarly to AMOC, these metrics, reduced precision and event recall, were designed to accommodate the time-dependency among observations. In Chapter 6, we will describe these metrics in more detail.