A User Session-Level Analysis
3.4.1. Breadth and Specificity
In our basic hypotheses, H1 and H2, we expect that users' intent can be inferred from their on-site behavior and, specifically, that attributes of search sessions can be used to predict whether a user will make a purchase. Since purchase is a binary variable (either the search session resulted in a purchase or it did not), we conducted a logistic regression analysis to test H1 and H2. As earlier research has found that time spent searching affects user behavior in the broader internet context (Johnson, Moe, Fader, Bellman, & Lohse, 2004), we also control for total session time in this model⁷. To check for multicollinearity, an OLS regression was carried out prior to the logit analysis (Menard, 2002). The highest variance inflation factor (VIF) found was 2.47, indicating that no serious multicollinearity is present. Descriptive statistics and a correlation matrix for the referenced variables can be found in Appendix B, Tables B.2 and B.3 respectively. The logistic regression results are summarized in Table 3.3 below and prediction accuracy in Table 3.4.
7 Past research has also found that the number of queries significantly affects user behavior; however, because this variable is highly correlated with session duration in our data set and is nearly constant across the full data set, we have omitted it from our model.
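As a compact restatement of the model described above, the specification and the collinearity check can be written out as follows. The symbols are assumptions introduced here for illustration, not the authors' notation: x_{ik} denotes the k-th breadth or specificity measure for session i, t_i the total session time, and R_k^2 the fit obtained when predictor k is regressed on the remaining predictors.

```latex
% Logistic regression of purchase on session attributes, controlling for session time
\[
\ln\frac{\Pr(\text{purchase}_i = 1)}{1 - \Pr(\text{purchase}_i = 1)}
  = \beta_0 + \sum_{k} \beta_k x_{ik} + \gamma\, t_i
\]

% Odds ratio reported as Exp(B): values below 1 correspond to negative coefficients
\[
\operatorname{Exp}(B_k) = e^{\beta_k}, \qquad e^{\beta_k} < 1 \iff \beta_k < 0
\]

% Variance inflation factor and what the reported maximum of 2.47 implies
\[
\mathrm{VIF}_k = \frac{1}{1 - R_k^2}, \qquad
\mathrm{VIF}_{\max} = 2.47 \;\Rightarrow\; R_k^2 = 1 - \frac{1}{2.47} \approx 0.595
\]
```

Read this way, the largest VIF means that at most about 60% of the variance in any one predictor is explained by the others.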
Table 3.3. Breadth and specificity test results, full data set. Nagelkerke R² = 0.369, -2LL = 128,382.71.
                Predicted No    Predicted Yes    % Correct
Observed No        1,803,558            2,273       99.87%
Observed Yes          14,682            3,026       17.09%
Overall                                             99.07%
Table 3.4. Prediction accuracy, full data set.
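The percentages in Table 3.4 follow directly from its four cell counts. The short sketch below recomputes them; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Cell counts copied from Table 3.4, keyed as (observed, predicted).
confusion = {
    ("no", "no"): 1_803_558,
    ("no", "yes"): 2_273,
    ("yes", "no"): 14_682,
    ("yes", "yes"): 3_026,
}


def class_accuracy(observed):
    """Share of sessions with this observed outcome that were predicted correctly."""
    row_total = sum(n for (obs, _), n in confusion.items() if obs == observed)
    return confusion[(observed, observed)] / row_total


def overall_accuracy():
    """Share of all sessions whose predicted outcome matches the observed one."""
    correct = confusion[("no", "no")] + confusion[("yes", "yes")]
    return correct / sum(confusion.values())


print(f"non-purchasers: {class_accuracy('no'):.2%}")   # 99.87%
print(f"purchasers:     {class_accuracy('yes'):.2%}")  # 17.09%
print(f"overall:        {overall_accuracy():.2%}")     # 99.07%
```

The overall figure is dominated by the non-purchaser row, which is why it can be high while the purchaser row is classified poorly.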
Using the full data set, we find that broader searches (i.e., those less focused on narrow topics) are significantly related to a lower probability of conversion (hence the odds ratios below 1), both in terms of location breadth and other breadth⁸. We also find support for the relationship between specificity and conversion. Specificity, measured in terms of other depth, was significantly related to conversion. Further, both brand specificity and airline name specificity significantly contributed to higher conversion.
We find no significant effect of location depth. Nevertheless, the preponderance of evidence suggests that users who search less broadly and with greater specificity can be expected to have a stronger intent to purchase. The model correctly classifies 99.07% of sessions, versus 98.89% accuracy if predicting “no purchase” for each session. This represents a notable improvement (0.18 percentage points, compared to a possible improvement of only 1.11 percentage points before 100% accuracy is reached; that is, the model captures 16.2% of the possible gain).

However, the accuracy of predicting purchasers, the more interesting of the two groups for advertisers seeking to attract only those searchers lower in the funnel, is poor. This low prediction accuracy is not unexpected: in data sets with a binary dependent variable where one outcome is many times more frequent than the other, logistic regression underestimates the likelihood of the “rare event,” the purchase in this case (King & Zeng, 2001). To address this issue, we created a subset of the data that included 5,000 randomly selected purchasers and 5,000 randomly selected non-purchasers and again estimated a logistic regression (King & Zeng, 2001)⁹. This approach yields a less accurate intercept (compared to the intercept estimated from the full data set), but the slope coefficients for the independent variables remain unbiased (Allison, 2012). Results of this estimation can be seen in Tables 3.5 and 3.6 below.

8 We note here that both measures of breadth are near constants and both have mean values of 0.01. Results of a logistic regression with these two variables omitted can be found in Appendix B, Table B.4.
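The balanced-subset approach, and the reason its intercept is distorted, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the session identifiers are synthetic, the function names are hypothetical, and the intercept adjustment shown is the prior correction described by King and Zeng (2001), which the text does not say was applied. The population purchase rate is taken from the counts in Table 3.4 (17,708 purchases among 1,823,539 sessions).

```python
import math
import random


def balanced_subsample(purchaser_ids, non_purchaser_ids, n_per_class, seed=0):
    """Draw the same number of sessions at random from each outcome group."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return (
        rng.sample(purchaser_ids, n_per_class),
        rng.sample(non_purchaser_ids, n_per_class),
    )


def prior_corrected_intercept(intercept, sample_rate, population_rate):
    """Shift an intercept estimated on an outcome-balanced sample back to the
    population scale (prior correction); slope coefficients are left untouched."""
    offset = math.log(
        ((1 - population_rate) / population_rate)
        * (sample_rate / (1 - sample_rate))
    )
    return intercept - offset


# Synthetic IDs standing in for real sessions (assumption for illustration).
purchasers = range(17_708)
non_purchasers = range(17_708, 1_823_539)
buy_sample, no_buy_sample = balanced_subsample(purchasers, non_purchasers, 5_000)

population_rate = 17_708 / 1_823_539  # share of purchases in the full data set
# Hypothetical intercept of 0.0 from the balanced model, where purchases are 50%.
corrected = prior_corrected_intercept(0.0, 0.5, population_rate)
print(len(buy_sample), len(no_buy_sample), round(corrected, 3))
```

Because only the intercept shifts, the direction and relative size of the breadth and specificity effects can still be read from the balanced model, while predicted probabilities on the original population would need the adjusted intercept.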
Table 3.5. Breadth and specificity test results, evenly distributed data subset. Nagelkerke R² = 0.894, -2LL = 2,760.120.
9 We also conducted a rare events logistic analysis using the full data set. Results of this regression were similar to those of the initial logistic regression, which also used the full data set. See Appendix B, Table B.5 for results of the rare events logistic regression.
Table 3.6. Prediction accuracy, evenly distributed data subset.
Adjusting for rare events in this manner produces a few interesting differences in our results. First, the pseudo-R² (Nagelkerke) and -2 log likelihood values suggest that this may be a better model, although given the difference in sample size and DV distribution, quality of fit is difficult to compare. Second, we see weakened support for one of our two hypotheses. While H1, regarding the effects of breadth, is still supported for both measures of breadth, H2, regarding specificity of search terms, receives less support than in the previous model. Finally, while the overall prediction rate is lower in this model than in the previous one (96.29% vs. 99.13%), we note that this represents a 46.29 percentage point improvement over predicting either one result or the other in all cases (which would yield only 50% accuracy). Thus, relative to that 50% baseline, this model increases prediction accuracy by 92.58%.
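The two relative-improvement figures quoted in this section (16.2% for the full data set and 92.58% for the balanced subset) both express the gain over a naive baseline as a share of the gap that remained between that baseline and 100% accuracy. A small sketch of the arithmetic, with a hypothetical function name:

```python
def share_of_possible_gain(model_accuracy, baseline_accuracy):
    """Improvement over the baseline as a fraction of the remaining gap to 100%.

    Both arguments are percentages, e.g. 99.07 for 99.07%.
    """
    return (model_accuracy - baseline_accuracy) / (100.0 - baseline_accuracy)


# Full data set: 99.07% versus 98.89% from always predicting "no purchase".
full = share_of_possible_gain(99.07, 98.89)
# Balanced subset: 96.29% versus 50% from always predicting a single outcome.
balanced = share_of_possible_gain(96.29, 50.0)

print(f"full data set:   {full:.1%}")      # 16.2%
print(f"balanced subset: {balanced:.2%}")  # 92.58%
```

Expressed this way, the two models become comparable even though their baselines differ sharply.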
One of the strengths of this data set is that it contains multiple queries from the same user within the timeframe necessary to be considered a “search session” under the criteria explained earlier. The mean number of queries per session, however, is 1.01; in other words, most sessions consist of only one query. What happens when we consider only the sessions with multiple queries? Do our hypotheses still hold? To examine this, we isolated the multiple-query search sessions and conducted a logistic regression analysis on them¹⁰. Descriptive statistics for this sub-sample (n = 133,372) can be found in Appendix B, Table B.6. The findings are in Tables 3.7 and 3.8 below.
Variable B SE p Exp(B) Supported?
Table 3.7. Breadth and specificity test results, multiple query sessions only. Nagelkerke R² = 0.215, -2LL = 76,951.710.
Table 3.8. Prediction accuracy, multiple query sessions only.
We see, then, increased support for the importance of location depth. Airline name continues to be a weak predictor of funnel location (i.e., purchase behavior). However, for website owners who track users over multiple query-click dyads within a single search session, both breadth and specificity measures continue to predict funnel location.
From this point, we also explored whether purchasers who began by looking for specific terms would spend less time searching, thus giving us further evidence that these users are further down the funnel and pursuing a more transactional intent. To test this hypothesis, we performed a regression analysis with the following results.
10 While comparing multiple-query sessions to single-query sessions would appear interesting, of the 1,652,531 single-query search sessions, only seven resulted in a purchase. See Appendix B, Table B.7 for descriptive statistics of the data set sub-sample including only single-query sessions.
Table 3.9. Starting specificity test results. R² = 0.595.
Here, we see mixed support for this idea. Users who begin their search sessions with the brand name (“XYZ Travel”) indeed spend significantly less time searching. However, for those who begin with a specific airline name, we do not find a significant relationship (although the sign is in the predicted direction). This may be caused by a number of factors, including an ineffective link (i.e., the ad’s link does not send users to a page that includes the expected information) or the possibility that those searching for an airline name are still price shopping and, thus, are still higher in the funnel, in need of more information, and engaged in correspondingly longer search sessions. It is also possible that many users have frequent flier accounts with particular airlines and thus always begin their travel-related browsing in that way, regardless of funnel location.