Second-order digital inequality: A clickstream analysis of e-commerce use
3.1 Data sample
We tested our hypotheses on a unique set of clickstream data courtesy of comScore. Clickstream data represents a record of an individual’s online activities. It tracks the user’s navigation path online, collecting information, for example, on the websites the user visits, on the actions conducted at each site as well as on e-commerce transaction details such as domain names, products and prices. In contrast to site-centric data, which only assimilates information for a given website, syndicated clickstream data is “user-centric” (Padmanabhan, Zheng, & Kimbrough, 2001) because it chronicles the online activities of individual users across multiple websites.
Clickstream data provides a particularly powerful empirical basis for studying facets of Internet use. It is frequently applied in the field of online marketing to evaluate browsing behavior, the effectiveness of online advertising and online shopping patterns (Bucklin & Sismeiro, 2009). With regard to the latter, the focus of research has largely been on predicting purchase conversion, understanding the factors driving successful transactions and investigating alternative pricing mechanisms in auctions (Moe, 2006; Park & Bradlow, 2005).
Using clickstream data as an empirical basis has several key advantages. First, by tracking actual behavior, it avoids the typical weaknesses of cross-sectional data such as self-report bias and common rater effects (Podsakoff et al., 2003). Second, a clickstream dataset typically covers a period of several months. The longitudinal nature of the data means that the risk of a sustained behavioral bias by the user is minimal. Third, user-centric clickstream data in particular encompasses a very large and detailed set of information that would be difficult to aggregate using survey-based measures. For the purpose of our study, which attempts to understand e-commerce use in a more in-depth and nuanced manner, clickstream data provides the level of detail needed to accurately capture actual use. These advantages come with the tradeoff that the clickstream dataset does not provide empirical insight into the mediating factors that influence use.
Our dataset comprised 19,958 Internet users from 10,000 households in the US whose Internet activities were tracked for a period of six months from May to October 2012. Participants were part of an opt-in comScore consumer sample, which is compiled using industry standard methodologies such as random digit dial (RDD) recruitment and membership incentives (Padmanabhan et al., 2001). To normalize self-selection bias in the opt-in sample, comScore employs a technique called “iterative proportional fitting”. In this process, they use an enumeration survey and a calibration panel sample whose participants are recruited only via offline channels (Cook & Pettit, 2009). The obtained measures are used to calculate a weighting scheme for the opt-in panel to ensure population representativeness and to normalize the main sources of online recruitment bias as well as self-selection bias, such as disproportionately attracting heavy Internet users (comScore, 2014).
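The general idea of iterative proportional fitting can be illustrated with a minimal sketch. This is not comScore's actual procedure, whose details are not given here; the function names, the two-dimensional layout and the example counts are all assumptions made for illustration. The sketch rescales a table of sample counts until its row and column totals match known population margins, and then derives one weight per cell.

```python
def ipf(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale a 2D table of counts so its margins match the targets.

    Hypothetical helper: `table` holds sample counts per cell (for example
    age group by Internet usage level), and the targets are population
    margins taken from an external benchmark such as an enumeration survey.
    """
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, row in enumerate(fitted):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                fitted[i] = [v * factor for v in row]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fitted)
            if total > 0:
                factor = target / total
                for row in fitted:
                    row[j] *= factor
        # Columns are exact after step 2, so convergence is checked on rows.
        max_err = max(abs(sum(row) - t) for row, t in zip(fitted, row_targets))
        if max_err < tol:
            break
    return fitted


def cell_weights(table, fitted):
    """Weight per cell: fitted count divided by observed sample count."""
    return [
        [f / s if s > 0 else 0.0 for s, f in zip(sample_row, fitted_row)]
        for sample_row, fitted_row in zip(table, fitted)
    ]


# Made-up example: the sample over-represents the first column.
sample = [[40.0, 10.0], [30.0, 20.0]]
fitted = ipf(sample, row_targets=[50.0, 50.0], col_targets=[60.0, 40.0])
weights = cell_weights(sample, fitted)
```

Every panelist in a given cell would receive that cell's weight, so over-represented groups are weighted down and under-represented groups are weighted up. The row and column targets must sum to the same total for the procedure to converge.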
To ensure sample validity, we applied a number of restrictions. We limited transactional data observations to four product categories: apparel & accessories, consumer electronics, home supplies & living, and health & beauty. Other purchases, such as music downloads, digital subscriptions and food delivery orders, were excluded. The rationale behind this selection was to define a homogeneous comparison basis that only includes products that can be purchased online on several different platforms and for which price comparisons and e-coupons are available. In addition, we only included participants with complete demographic data, a minimum age of 18 years and at least one e-commerce transaction in the observation period. The resulting sub-sample encompassed 2,819 users and 14,260 transactions. This constitutes one of the largest samples in the study of e-commerce use to date.
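The restrictions above can be expressed as a short filtering routine. The following is a minimal sketch only: the record layout, the field names (`user_id`, `age`, `gender`, `income_class`, `category`, `date`) and the exact start and end dates of the observation period are assumptions, not the structure of the comScore data.

```python
from datetime import date

# The four product categories retained in the study.
INCLUDED_CATEGORIES = {
    "apparel & accessories",
    "consumer electronics",
    "home supplies & living",
    "health & beauty",
}
# Assumed day boundaries for the May to October 2012 observation period.
PERIOD_START = date(2012, 5, 1)
PERIOD_END = date(2012, 10, 31)
# Hypothetical set of demographic fields that must be complete.
DEMOGRAPHIC_FIELDS = ("age", "gender", "income_class")


def filter_sample(users, transactions):
    """Return (users, transactions) that satisfy the sample restrictions."""
    # Keep transactions in the included categories and observation period.
    kept_tx = [
        t for t in transactions
        if t["category"] in INCLUDED_CATEGORIES
        and PERIOD_START <= t["date"] <= PERIOD_END
    ]
    buyers = {t["user_id"] for t in kept_tx}
    # Keep adult users with complete demographics and at least one purchase.
    kept_users = [
        u for u in users
        if all(u.get(f) is not None for f in DEMOGRAPHIC_FIELDS)
        and u["age"] >= 18
        and u["user_id"] in buyers
    ]
    kept_ids = {u["user_id"] for u in kept_users}
    # Drop transactions whose user was excluded in the previous step.
    kept_tx = [t for t in kept_tx if t["user_id"] in kept_ids]
    return kept_users, kept_tx
```

Filtering transactions before users matters in this sketch: a user whose only purchases fall in an excluded category, such as music downloads, has no qualifying transaction and is dropped.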
The dataset included user-level browsing and transaction-related data points from the top 200 mainstream e-commerce websites in the US (see footnote 30) and the largest alternative e-commerce, e-coupon and price comparison websites. As we were interested in e-commerce platforms rather than individual websites, we classified each URL into one of the following disjoint categories: general retailers, specialized retailers, brand shops, auctions, daily deals, flash sales, price comparisons and e-coupons. The URL classification was undertaken by two independent raters, who received the same platform descriptions and coding criteria. Intercoder reliability between the two raters was 97%. After discussing the eight discrepant codes, the two raters reached full agreement (see Appendix).
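Agreement between two raters of this kind can be quantified as follows. The sketch assumes the reported figure is simple percent agreement, which the text does not state explicitly; Cohen's kappa is included only as a common chance-corrected alternative. The function names and the example labels are hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items to which both raters assigned the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty item list")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of the raters' marginal category shares.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def discrepancies(codes_a, codes_b):
    """Indices of items the raters coded differently, for later discussion."""
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]


# Made-up example with four URLs and one disagreement.
rater_1 = ["general retailers", "auctions", "auctions", "brand shops"]
rater_2 = ["general retailers", "auctions", "brand shops", "brand shops"]
agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
to_discuss = discrepancies(rater_1, rater_2)
```

The list returned by `discrepancies` corresponds to the items that the raters would then resolve through discussion.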
Table 3 summarizes the sample characteristics. We observe an even gender split in all except the highest income category. The age distribution is skewed towards users between the ages of 18 and 44. This is, however, consistent with findings on the age distribution of the actual online shopping population in the US (Forrester Research Inc, 2013). Over 80% of the participants use the Internet for personal purposes for at least five hours a week. This reflects a good level of exposure. Notably, the average number of transactions is fairly equal across income classes, and participants from the lowest income class spend a proportionally higher percentage of their income online compared to participants from higher income classes. As such, a general familiarity with e-commerce transactions can be expected for all income groups.
30. In the resulting sub-sample of 14,260 transactions, only 144 of the top 200 mainstream e-commerce websites were represented (see Appendix I).
Table 3. Demographic characteristics of sample (n=2,819)