2 Data to Insights to Decisions
2.4 Designing and Implementing Features
2.4.2 Different Types of Features
The features in an ABT can be of two types: raw features or derived features. Raw features are features that come directly from raw data sources. For example, customer age, customer gender, loan amount, or insurance claim type are all descriptive features that we would most likely be able to transfer directly from a raw data source to an ABT.
Derived descriptive features do not exist in any raw data source, so they must be constructed from data in one or more raw data sources. For example, average customer purchases per month, loan-to-value ratios, or changes in usage frequencies for different periods are all descriptive features that could be useful in an ABT but that most likely need to be derived from multiple raw data sources. The variety of derived features that we might wish to use is limitless. For example, consider the number of features we can derive from the monthly payment a customer makes on an electricity bill. From this single raw data point, we can easily derive features that store the average payment over six months; the maximum payment over six months; the minimum payment over six months; the average payment over three months; the maximum payment over three months; the minimum payment over three months; a flag to indicate that a missed payment has occurred over the last six months; a mapping of the last payment made to a low, medium, or high level; the ratio between the current and previous bill payments; and many more.
Figure 2.4 Sample descriptive feature data illustrating numeric, binary, ordinal, interval, categorical, and textual types.
Despite this limitless variety, however, there are a number of common derived feature types:
Aggregates: These are aggregate measures defined over a group or period and are usually defined as the count, sum, average, minimum, or maximum of the values within a group. For example, the total number of insurance claims that a member of an insurance company has made over his or her lifetime might be a useful derived feature. Similarly, the average amount of money spent by a customer at an online retailer over periods of one, three, and six months might make an interesting set of derived features.
Flags: Flags are binary features that indicate presence or absence of some characteristic within a dataset. For example, a flag indicating whether or not a bank account has ever been overdrawn might be a useful descriptive feature.
Ratios: Ratios are continuous features that capture the relationship between two or more raw data values. Including a ratio between two values can often be much more powerful in a predictive model than including the two values themselves. For example, in a banking scenario, we might include a ratio between a loan applicant’s salary and the amount for which they are requesting a loan rather than including these two values themselves. In a mobile phone scenario, we might include three ratio features to indicate the mix between voice, data, and SMS services that a customer uses.
Mappings: Mappings are used to convert continuous features into categorical features and are often used to reduce the number of unique values that a model will have to deal with. For example, rather than using a continuous feature measuring salary, we might instead map the salary values to low, medium, and high levels to create a categorical feature.
Other: There are no restrictions to the ways in which we can combine data to make derived features. One especially creative example of feature design was when a large retailer wanted to use the level of activity at a competitor’s stores as a descriptive feature in one of their analytics solutions. Obviously, the competitor would not give them this information, and so the analytics team at the retailer sought to find some proxy feature that would give them much the same information. Being a large retailer, they had considerable resources at their disposal, one of which was the ability to regularly take high-resolution satellite photos. Using satellite photos of their competitor’s premises, they were able to count the number of cars in their competitor’s parking lots and use this as a proxy measure of activity within their competitor’s stores!
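To make the common derived feature types concrete, the following minimal Python sketch derives aggregate, flag, mapping, and ratio features from the monthly electricity payments discussed earlier. The payment values, the function name derive_features, the feature names, the low/medium/high cutoffs, and the convention that a payment of zero denotes a missed payment are all assumptions made for illustration; a real ABT would take these from the domain and the available data sources.

```python
from statistics import mean

# Hypothetical monthly electricity payments for one customer, oldest first.
# ASSUMPTION: a payment of 0.0 represents a missed payment.
payments = [82.0, 75.5, 0.0, 91.2, 88.4, 104.3]


def derive_features(payments, low_cutoff=60.0, high_cutoff=100.0):
    """Derive aggregate, flag, mapping, and ratio features from payments.

    The cutoffs used for the low/medium/high mapping are illustrative only.
    """
    if len(payments) < 6:
        raise ValueError("need at least six months of payments")
    last_six = payments[-6:]
    last_three = payments[-3:]
    current, previous = payments[-1], payments[-2]

    # Mapping: convert the continuous last payment into a categorical level.
    if current < low_cutoff:
        level = "low"
    elif current < high_cutoff:
        level = "medium"
    else:
        level = "high"

    return {
        # Aggregates over six-month and three-month windows.
        "avg_6m": mean(last_six),
        "max_6m": max(last_six),
        "min_6m": min(last_six),
        "avg_3m": mean(last_three),
        "max_3m": max(last_three),
        "min_3m": min(last_three),
        # Flag: was any payment missed in the last six months?
        "missed_6m": any(p == 0.0 for p in last_six),
        # Mapping of the last payment to a low, medium, or high level.
        "last_payment_level": level,
        # Ratio: current payment relative to the previous payment
        # (None when the previous payment is zero, to avoid division by zero).
        "current_to_previous": current / previous if previous else None,
    }


features = derive_features(payments)
```

Each entry in the returned dictionary would become one column of the ABT for this customer. Note that a single raw data series yields nine derived features here, which is why the choice of which features to derive is usually guided by domain concepts rather than by what is merely computable.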
Although in some applications the target feature is a raw value copied directly from an existing data source, in many others it must be derived. Implementing the target feature for an ABT can demand significant effort. For example, consider a problem in which we are trying to predict whether a customer will default on a loan obligation. Should we count one missed payment as a default or, to avoid predicting that good customers will default, should we consider a customer to have defaulted only after they miss three consecutive payments? Or three payments in a six-month period? Or two payments in a five-month period? Just like descriptive features, target features are based on a domain concept, and we must determine what actual implementation is useful, feasible, and correct according to the specifics of the domain in question. In defining target features, it is especially important to seek input from domain experts.
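The effect of choosing between these candidate definitions of default can be seen in a small Python sketch. The payment history below and the helper names (longest_missed_run, max_missed_in_window, default_targets) are hypothetical and exist only to illustrate the point; which definition is appropriate remains a question for domain experts.

```python
def longest_missed_run(missed):
    """Length of the longest run of consecutive missed payments."""
    longest = run = 0
    for m in missed:
        run = run + 1 if m else 0
        longest = max(longest, run)
    return longest


def max_missed_in_window(missed, window):
    """Largest number of missed payments in any `window` consecutive months."""
    if len(missed) <= window:
        return sum(missed)
    return max(
        sum(missed[i:i + window]) for i in range(len(missed) - window + 1)
    )


def default_targets(missed):
    """Evaluate four candidate definitions of the target feature 'default'."""
    return {
        "any_missed": any(missed),
        "three_consecutive": longest_missed_run(missed) >= 3,
        "three_in_six_months": max_missed_in_window(missed, 6) >= 3,
        "two_in_five_months": max_missed_in_window(missed, 5) >= 2,
    }


# Hypothetical 12-month history for one customer: True marks a missed payment.
history = [False, True, False, False, True, False,
           False, False, True, False, False, False]
targets = default_targets(history)
```

For this hypothetical customer, who missed payments in months 2, 5, and 9, the one-missed-payment and two-in-five-months definitions label the customer as a defaulter, whereas the three-consecutive and three-in-six-months definitions do not. The same raw data therefore produces different target values depending on how the domain concept is implemented.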