4.4 Classification Framework and Classifiers
4.4.3 Features
We next describe the features (explanatory variables) that we use to train our abandonment and redial classifiers. With respect to caller history features, we concentrate on the callers’ waiting times, abandonment decisions, and intercontact times in their two most recent calls.7 In addition to our set of caller history features, we include indicators for how long callers have waited in their current call and whether the call arrived during the busy period. We describe each feature below:
• Caller History Features:
– log lag1(2) Wait: The log of the caller’s waiting time (in seconds) during the most recent (2nd-most recent) call.8
– lag1(2) Ab: Indicator that the caller abandoned the most recent (2nd-most recent) call.
– log lag1(2) Intercontact Days: Log of the number of days between the current call and the most recent (2nd-most recent) call.9
6 We measured prediction accuracy as the area under the receiver operating characteristic curve.
7 In reduced-form tests, we found statistically significant relationships between the callers’ current behavior and their waiting times, abandonment decisions, and intercontact times in their third- and fourth-most recent calls. However, to limit the computational burden of training the classifiers and to avoid overfitting, we concentrate on caller history from the two most recent calls.
8 We choose the log of the waiting time rather than the waiting time itself due to its better fit in reduced-form tests.
– lag1(2) Fresh: Indicator that the most recent (2nd-most recent) call occurred within the past 2 days (16 days). We find that our classifier prediction accuracy improves when we include these indicators. One potential reason for this is that callers who called recently may have the experience fresh in their mind during their current call. We choose 2 days (16 days) because approximately one third of intercontact times between the most recent call (2nd-most recent call) and the current call occurred within the past 2 days (16 days).
– lag1(2) Old: Indicator that the most recent (2nd-most recent) call occurred more than 35 days ago (85 days ago). We also find that our classifier accuracy improves when we include these indicators. One potential reason for this is that callers whose previous calls were not recent may not recall their previous experience during the current call. We choose 35 days (85 days) because approximately one third of intercontact times between the most recent call (2nd-most recent call) and the current call occurred at least 35 days (85 days) ago.
– Interactions: We include several interaction variables built from the above history features:
∗ log lag1(2) Wait × lag1(2) Ab: The log of the waiting time in the most recent (2nd-most recent) call if the call was abandoned.
∗ lag1(2) Fresh × lag1(2) Ab: Indicator that the most recent (2nd-most recent) call occurred within the past 2 days (16 days) and was abandoned.
∗ lag1(2) Old × lag1(2) Ab: Indicator that the most recent (2nd-most recent) call occurred at least 35 days (85 days) ago and was abandoned.
∗ lag1(2) Fresh × log lag1(2) Wait: The log of the waiting time in the most recent (2nd-most recent) call if the call occurred within the past 2 days (16 days).
∗ lag1(2) Old × log lag1(2) Wait: The log of the waiting time in the most recent (2nd-most recent) call if that call occurred at least 35 days (85 days) ago.
– First Call History Group (FCH Group): We find that the callers’ waiting time and abandonment decision in the first call of their history are highly predictive of their behavior in subsequent calls. Hence, we assign each caller to a “first-call history” (FCH) group based on how long they waited (in seconds) during the first call of their history and the outcome of that call (whether they abandoned or received service). For example, one FCH group includes all of the histories where callers immediately entered service in their first call, while another FCH group includes all of the histories where callers waited between 105 and 122 seconds before abandoning their first call. We include an indicator for each FCH group as an available feature for the classifiers to use in prediction. We choose 20 FCH groups for each outcome, making a total of 40 FCH groups (20 groups × 2 outcomes). We choose the waiting time cutoffs so that the number of histories assigned to each group is as close to equal as possible.10
• Other Features:
– Current Wait Bin: In reduced-form tests we find that caller patience varies as they wait for service. For example, callers are much more likely to abandon during their first 10 seconds of waiting than during their second 10 seconds of waiting. We also find that callers who wait longer before abandoning are more likely to redial. Hence, we use the callers’ waiting time in their current call to train our classifiers. To capture non-linearities in the effects of the callers’ current waiting time on their current behavior, we create 40 waiting time bins, where the bin boundaries are based on the number of periods that callers have waited during their current call. We choose 40 bins and select the boundaries so that the number of abandonment decisions that callers made in each bin is as close to equal as possible.11 For example, one bin contains the first period that callers chose to wait (abandon after redialing), while another bin contains the 41st, 42nd, 43rd, and 44th periods. Finally, because there are 40 current waiting time bins, we create 39 Current Wait Bin indicator variables.
10 The waiting time cutoffs (in seconds) for callers who received service in their first call are the following: 0, 1, 2 - 6, 7 - 10, 11, 12 - 22, 23 - 34, 35 - 47, 48 - 61, 62 - 76, 77 - 93, 94 - 112, 113 - 134, 135 - 158, 159 - 186, 187 - 219, 220 - 261, 262 - 323, 324 - 437, and 438 seconds and greater. The waiting time cutoffs (in seconds) for callers who abandoned their first call are the following: 0 - 7, 8 - 14, 15 - 24, 25 - 33, 34 - 43, 44 - 54, 55 - 68, 69 - 86, 87 - 104, 105 - 122, 123 - 149, 150 - 172, 173 - 203, 204 - 231, 232 - 243, 244 - 266, 267 - 311, 312 - 387, 388 - 513, and 514 seconds and greater. Per FCH group, the average number of calls that we examine is 4,114 (596) for the FCH groups where the caller received service in their first call (abandoned their first call).
11 The boundaries for the current wait bins are the following periods: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 25, 27, 29, 31, 33, 36, 38, 41, 44, 48, 52, 56, 61, 68, 76, 86, 101, and 130. We also trained our classifiers using 10, 20, and 80 current wait bins but found that 40 bins led to the most accurate predictions as measured by AUC.
– Busy Period: To control for seasonality, we include an indicator that the current call arrived during the busy period (Monday through Friday, 9:00 - 17:00), as defined in §4.3.
Finally, note that in the first call of each history there are no available history features, since callers have no history to that point. Furthermore, in the second call of each history there are only history features from the most recent call. We therefore have an unbalanced panel of data. To account for this, we train our classifiers separately for the first call of each history, the second call of each history, and the third and later calls of each history, and include the features available for each type of call. Going forward, we refer to these subsets of calls as the Call 1, Call 2, and Call 3+ subsets, respectively. For reference, Figure 4.9 lists the available features under the Call 1, Call 2, and Call 3+ subsets.
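To make the caller history features concrete, the following is a minimal Python sketch of how they might be computed for one caller's ordered call sequence. It is an illustration under stated assumptions, not the implementation used in this study. The `Call` record, the function names, and the feature keys are hypothetical. Two further choices are assumptions: logs are taken as log(1 + x) so that zero waits and same-day calls remain defined (the text does not say how zeros are handled), and "Old" uses a strict "more than" comparison (the text uses both "more than" and "at least").

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Call:
    """Hypothetical record for one call in a caller's history."""
    start: datetime
    wait_seconds: float
    abandoned: bool


# Thresholds from the text: lag 1 uses 2 and 35 days, lag 2 uses 16 and 85 days.
FRESH_DAYS = {1: 2, 2: 16}
OLD_DAYS = {1: 35, 2: 85}


def history_features(calls, i):
    """Return history features for calls[i]; calls must be sorted by start time.

    Only lags that exist are included, so the first call yields no features
    and the second call yields lag-1 features only.
    """
    current = calls[i]
    feats = {}
    for lag in (1, 2):
        if i - lag < 0:
            continue
        prev = calls[i - lag]
        days = (current.start - prev.start).total_seconds() / 86400.0
        log_wait = math.log1p(prev.wait_seconds)  # assumption: log(1 + wait)
        ab = int(prev.abandoned)
        fresh = int(days <= FRESH_DAYS[lag])
        old = int(days > OLD_DAYS[lag])  # assumption: strict inequality
        p = f"lag{lag}"
        feats.update({
            f"log_{p}_wait": log_wait,
            f"{p}_ab": ab,
            f"log_{p}_intercontact_days": math.log1p(days),  # assumption
            f"{p}_fresh": fresh,
            f"{p}_old": old,
            # Interaction terms listed in the text.
            f"log_{p}_wait_x_ab": log_wait * ab,
            f"{p}_fresh_x_ab": fresh * ab,
            f"{p}_old_x_ab": old * ab,
            f"{p}_fresh_x_log_wait": fresh * log_wait,
            f"{p}_old_x_log_wait": old * log_wait,
        })
    return feats


def call_subset(i):
    """Map a zero-based call index to the Call 1, Call 2, or Call 3+ subset."""
    return "Call 1" if i == 0 else "Call 2" if i == 1 else "Call 3+"
```

Because the function skips lags that do not exist, the returned dictionary has a different set of keys for each subset, which mirrors the reason for training one classifier per subset.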
Figure 4.9: Available Features (Explanatory Variables) by Call Number
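The binned features (FCH group, Current Wait Bin, and Busy Period) can be sketched in the same spirit. The cutoffs below are copied from footnotes 10 and 11; everything else is an assumption made for illustration. In particular, the function names are hypothetical, each current-wait boundary is treated as the inclusive upper edge of its bin (the text does not state the convention, so individual bin memberships may differ from those used in the study), the first bin is the one dropped as the reference category, and the busy period is treated as the half-open interval from 9:00 up to but not including 17:00.

```python
from bisect import bisect_left
from datetime import datetime

# Inclusive upper edges of the first 19 FCH groups (footnote 10);
# the 20th group is open-ended. Waits are assumed to be whole seconds.
SERVED_UPPER = [0, 1, 6, 10, 11, 22, 34, 47, 61, 76, 93, 112, 134,
                158, 186, 219, 261, 323, 437]
ABANDONED_UPPER = [7, 14, 24, 33, 43, 54, 68, 86, 104, 122, 149, 172,
                   203, 231, 243, 266, 311, 387, 513]

# Current wait bin boundaries in periods (footnote 11): 39 values, 40 bins.
WAIT_BIN_BOUNDARIES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                       16, 17, 18, 20, 21, 23, 25, 27, 29, 31, 33, 36, 38,
                       41, 44, 48, 52, 56, 61, 68, 76, 86, 101, 130]


def fch_group(first_wait_seconds, first_abandoned):
    """Return (outcome, group index 0..19) for a caller's first call."""
    edges = ABANDONED_UPPER if first_abandoned else SERVED_UPPER
    outcome = "abandoned" if first_abandoned else "served"
    return outcome, bisect_left(edges, first_wait_seconds)


def current_wait_bin(period):
    """Return the bin index 0..39 for a 1-based waiting period.

    Assumption: each boundary is the inclusive upper edge of its bin.
    """
    return bisect_left(WAIT_BIN_BOUNDARIES, period)


def wait_bin_indicators(period):
    """Return 39 indicators for bins 1..39; bin 0 is the assumed reference."""
    b = current_wait_bin(period)
    return [int(b == k) for k in range(1, len(WAIT_BIN_BOUNDARIES) + 1)]


def is_busy_period(arrival):
    """Indicator for Monday through Friday, 9:00 <= time < 17:00 (assumed)."""
    return int(arrival.weekday() < 5 and 9 <= arrival.hour < 17)
```

For example, `fch_group(110, True)` falls in the 105 - 122 second abandonment group described in the text, and `wait_bin_indicators(1)` is all zeros because the first period belongs to the assumed reference bin.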