Handling Time - Designing and Implementing Features

2 Data to Insights to Decisions

2.4 Designing and Implementing Features

2.4.3 Handling Time

Many of the predictive models that we build are propensity models, which predict the likelihood (or propensity) of a future outcome based on a set of descriptive features describing the past. For example, the goal in the insurance claim fraud scenario we have been considering is to make predictions about whether an insurance claim will turn out to be fraudulent after investigation based on the details of the claim itself and the details of the claimant’s behavior in the time preceding the claim. Propensity models inherently have a temporal element, and when this is the case, we must take time into account when designing the ABT. For propensity modeling, there are two key periods: the observation period, over which descriptive features are calculated, and the outcome period, over which the target feature is calculated.³

In some cases the observation period and outcome period are measured over the same time for all prediction subjects. Consider the task of predicting the likelihood that a customer will buy a new product based on past shopping behavior: features describing the past shopping behavior are calculated over the observation period, while the outcome period is the time during which we observe whether the customer bought the product. In this situation, the observation period for all the prediction subjects, in this case customers, might be defined as the six months prior to the launch of the new product, and the outcome period might cover the three months after the launch. Figure 2.5(a) shows these two periods, assuming that the customer’s shopping behavior was measured from August 2012 through January 2013 and that whether they bought the product of interest was observed from February 2013 through April 2013. Figure 2.5(b) illustrates how the observation and outcome periods for multiple customers are measured over the same dates.

Figure 2.5: Modeling points in time using an observation period and an outcome period.
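The fixed-window case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window dates come from the example above, but the purchase log, the feature names (`num_purchases`, `total_spend`), and the `build_abt` helper are hypothetical and not part of the original text.

```python
from datetime import date

# Fixed calendar windows taken from the example in the text.
OBS_START, OBS_END = date(2012, 8, 1), date(2013, 1, 31)
OUT_START, OUT_END = date(2013, 2, 1), date(2013, 4, 30)

# Hypothetical purchase log: (customer_id, purchase_date, product, amount).
purchases = [
    ("c1", date(2012, 9, 14), "groceries", 40.0),
    ("c1", date(2012, 12, 2), "groceries", 25.5),
    ("c1", date(2013, 3, 10), "new_product", 99.0),
    ("c2", date(2012, 7, 30), "groceries", 12.0),    # before the observation period
    ("c2", date(2012, 11, 5), "electronics", 310.0),
    ("c2", date(2013, 5, 2), "new_product", 99.0),   # after the outcome period
]


def build_abt(purchases, target_product="new_product"):
    """Return one ABT row per customer, keyed by customer id."""
    rows = {}
    for customer, day, product, amount in purchases:
        row = rows.setdefault(
            customer,
            {"num_purchases": 0, "total_spend": 0.0, "bought_new_product": False},
        )
        if OBS_START <= day <= OBS_END:
            # Descriptive features use only the observation period.
            row["num_purchases"] += 1
            row["total_spend"] += amount
        elif OUT_START <= day <= OUT_END and product == target_product:
            # The target feature uses only the outcome period.
            row["bought_new_product"] = True
    return rows


abt = build_abt(purchases)
```

Because every customer shares the same two windows, the date bounds can be module-level constants. Purchases that fall outside both windows are ignored, so they leak into neither the descriptive features nor the target.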

Often, however, the observation period and outcome period will be measured over different dates for each prediction subject. Figure 2.6(a) shows an example in which, rather than being defined by a fixed date, the observation period and outcome period are defined relative to an event that occurs at different dates for each prediction subject. The insurance claims fraud scenario we have been discussing throughout this section is a good example of this. In this example the observation period and outcome period are both defined relative to the date of the claim event, which will happen on different dates for different claims. The observation period is the time before the claim event, across which the descriptive features capturing the claimant’s behavior are calculated, while the outcome period is the time immediately after the claim event, during which it will emerge whether the claim is fraudulent or genuine. Figure 2.6(a) shows an illustration of this kind of data, while Figure 2.6(b) shows how this is aligned so that descriptive and target features can be extracted to build an ABT. Note that in Figure 2.6(b) the month names have been abstracted and are now defined relative to the transition between the observation and outcome periods.
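The alignment step can be illustrated with a small Python sketch that re-expresses each dated record as an offset from the subject's own claim date. The window lengths (180 and 90 days), the record layout, and the names `split_by_event` and `claim_row` are assumptions made for illustration; the text does not specify them.

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 180  # assumed window length, not given in the text
OUTCOME_DAYS = 90       # assumed window length, not given in the text


def split_by_event(event_date, records):
    """Split (date, value) records into observation and outcome periods.

    Each returned item is (days_from_event, value), so calendar dates are
    replaced by offsets relative to the subject's own event date.
    """
    obs_start = event_date - timedelta(days=OBSERVATION_DAYS)
    out_end = event_date + timedelta(days=OUTCOME_DAYS)
    observation, outcome = [], []
    for day, value in records:
        offset = (day - event_date).days
        if obs_start <= day < event_date:
            observation.append((offset, value))
        elif event_date <= day <= out_end:
            outcome.append((offset, value))
    return observation, outcome


# Hypothetical claims, each with its own claim date and dated history.
claims = {
    "claimA": {
        "claim_date": date(2012, 3, 15),
        "history": [
            (date(2011, 12, 1), "prior_claim"),
            (date(2012, 4, 20), "fraud_confirmed"),
        ],
    },
    "claimB": {
        "claim_date": date(2012, 9, 3),
        "history": [
            (date(2012, 5, 10), "prior_claim"),
            (date(2012, 7, 22), "prior_claim"),
            (date(2011, 1, 1), "prior_claim"),  # older than the observation window
        ],
    },
}


def claim_row(claim):
    observation, outcome = split_by_event(claim["claim_date"], claim["history"])
    return {
        "num_prior_claims": sum(1 for _, kind in observation if kind == "prior_claim"),
        "fraud": any(kind == "fraud_confirmed" for _, kind in outcome),
    }


abt = {claim_id: claim_row(claim) for claim_id, claim in claims.items()}
```

The two claims occur about six months apart, yet after alignment both are described on the same relative timeline, which is what allows them to sit side by side as rows of one ABT.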

Next-best-offer models provide an example scenario where the descriptive features are time dependent but the target feature is not. A next-best-offer model is used to determine the least expensive incentive that needs to be offered to a customer who is considering canceling a service, for example, a mobile phone contract, in order to make them reconsider and stay. In this case the customer contacting the company to cancel their service is the key event in time. The observation period that the descriptive features will be based on is the customer’s entire behavior up to the point at which they make this contact. There is no outcome period as the target feature is determined by whether the company is able to entice the customer to reconsider and, if so, the incentive that was required to do this. Figure 2.7 illustrates this scenario.
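A minimal Python sketch of this case follows. The customer record, its field names, and the `next_best_offer_row` helper are hypothetical; the point is only that the descriptive features are cut off at the cancellation contact, while the target is read directly from how that contact was resolved rather than from a later window.

```python
from datetime import date

# Hypothetical customer record; all field names are illustrative only.
customer = {
    # Monthly usage readings as (date, minutes).
    "usage": [
        (date(2012, 1, 5), 310),
        (date(2012, 2, 5), 290),
        (date(2012, 3, 5), 120),
        (date(2012, 4, 5), 95),   # after the contact, so it must be excluded
    ],
    "cancel_contact": date(2012, 3, 20),
    "accepted_offer": "10% discount",  # None if no incentive retained the customer
}


def next_best_offer_row(customer):
    contact = customer["cancel_contact"]
    # Observation period: the entire history up to the cancellation contact.
    history = [minutes for day, minutes in customer["usage"] if day <= contact]
    return {
        "months_observed": len(history),
        "avg_monthly_minutes": sum(history) / len(history) if history else 0.0,
        # No outcome period: the target is known when the contact is resolved.
        "target_offer": customer["accepted_offer"] or "none",
    }


row = next_best_offer_row(customer)
```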

Loan default prediction is an example where the definition of the target feature has a time element but the descriptive features are time independent. In loan default prediction, the likelihood that an applicant will default on a loan is predicted based on the information the applicant provides on the application form. There really isn’t an observation period in this case as all descriptive features will be based on information provided by the applicant on the application form, rather than on observing the applicant’s behavior over time.⁴ The outcome period in this case is considered the period of the lifetime of the loan during which the applicant will have either fully repaid or defaulted on the loan. In order to build an ABT for such a problem, a historical dataset of application details and subsequent repayment behavior is required (this might stretch back over multiple years depending on the terms of the loans in question). This scenario is illustrated in Figure 2.8.
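The loan case can be sketched as follows in Python. The loan records, field names, and `loan_abt` helper are hypothetical. Skipping loans that are still open is an assumption of this sketch rather than something the text prescribes: it reflects the idea that the target is only known once the outcome period has run its course.

```python
# Hypothetical historical loans; "status" records how the outcome period ended.
loans = [
    {"id": "L1", "income": 42000, "amount": 10000, "status": "repaid"},
    {"id": "L2", "income": 28000, "amount": 15000, "status": "defaulted"},
    {"id": "L3", "income": 55000, "amount": 8000, "status": "open"},
]


def loan_abt(loans):
    """Build ABT rows from application details and the final loan outcome."""
    rows = []
    for loan in loans:
        if loan["status"] not in ("repaid", "defaulted"):
            # Assumption: the outcome period is still running, so the
            # target is unknown and the loan is left out of the ABT.
            continue
        rows.append({
            # Descriptive features come straight from the application form.
            "income": loan["income"],
            "amount": loan["amount"],
            "loan_to_income": loan["amount"] / loan["income"],
            # Target feature is determined over the lifetime of the loan.
            "defaulted": loan["status"] == "defaulted",
        })
    return rows


rows = loan_abt(loans)
```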

Figure 2.7: Modeling points in time for a scenario with no real outcome period (each line represents a customer, and stars signify events).

Figure 2.8: Modeling points in time for a scenario with no real observation period (each line represents a customer, and stars signify events).

Source: Fundamentals of Machine Learning for Predictive Data Analytics, pp. 82-85.