8.6 Experiments
8.6.1 Predictability of Innovation Diffusion
First, by assessing the diffusion of speaker innovations, we show that language is predictable using only information up to the current state of a diffusion (n); the method used is detailed in Section 8.3.3.
In Figure 8.6.1, we plot the distributions of diffusion sizes for the four networks we consider. Our experiments have shown exponent values to be within a 0.2 interval of exponent 2 and, given this approximation, we assume a prediction setting similar to Cheng:2014kmb. As the classes are split on the median value of the typical diffusion size, this is a binary classification task (implemented using logistic regression) in which the classes are balanced and, as a consequence, a random baseline will yield an accuracy of 0.5. Thus, a model’s performance above this threshold implies that diffusion is predictable.
[Figure 8.6.1 plot: log-log axes; x-axis: Cascade size (10^0 to 10^5); y-axis: p(X > x) (10^-5 to 10^0); series: Reddit Traversal (α = 1.86), Reddit User (α = 1.85), Twitter Geo (α = 4.29), Twitter Mention (α = 4.04)]
Figure 8.6.1: The complementary cumulative distribution function (CCDF) of diffusion sizes for each network.
Figure 8.6.2: Average accuracy (acu) in predicting whether, after n observations, the final diffusion will be greater or less than the median of all diffusions, as measured through the number of posts (|Pn(o)|).
Figure 8.6.3: Average accuracy (acu) in predicting whether, after n observations, the final diffusion will be greater or less than the median of all diffusions, as measured through the number of unique users/nodes (|An(o)|).
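As a rough illustration of the quantities plotted in Figure 8.6.1, the sketch below computes an empirical CCDF and estimates a power-law exponent. The fitting procedure behind the reported exponents is not stated here, so the continuous maximum-likelihood estimator used below is an assumption, as are the function names `ccdf`, `estimate_alpha` and `sample_power_law` and the synthetic data.

```python
import math
import random


def ccdf(sizes):
    """Return (x, P(X > x)) for each distinct diffusion size x."""
    n = len(sizes)
    ordered = sorted(sizes)
    points = []
    idx = 0
    for x in sorted(set(sizes)):
        # Advance past every observation that is <= x.
        while idx < n and ordered[idx] <= x:
            idx += 1
        points.append((x, (n - idx) / n))
    return points


def estimate_alpha(sizes, x_min=1.0):
    """Continuous MLE (assumed estimator): alpha = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, count, rng):
    """Inverse-transform sampling from a continuous power law (synthetic data only)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(count)]
```

On synthetic sizes drawn with exponent 2, `estimate_alpha` should recover a value close to 2; real diffusion sizes are discrete, so a discrete estimator may be more appropriate in practice.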
We assess two values across a diffusion: first, the final number of usages of an innovation (|Pn(o)|), and, second, the number of unique nodes that have used the innovation o (|An(o)|). We highlight that, unlike in Cheng:2014kmb, we are not aiming to measure the size of a continuous cascade, but rather a diffusion across a network that may not have neighbouring activated nodes.
As our binary classifier, we use logistic regression, with evaluation performed through 10-fold cross-validation. During evaluation, we consider different models, one for each family of information sources (e.g. network topology, grammatical context). Finally, the accuracy score averaged across prediction tasks (of size n) is the metric employed to assess the effectiveness of each model.
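The evaluation described above can be sketched as follows: label each diffusion by a median split, fit a logistic regression, and average accuracy over k folds. This is a minimal, dependency-free sketch under stated assumptions, not the implementation used in the experiments; the gradient-descent fitting, the hyperparameters and all function names (`median_split`, `fit_logistic`, `cross_val_accuracy`) are hypothetical.

```python
import math
import random
from statistics import median


def median_split(final_sizes):
    """Label a diffusion 1 if its final size exceeds the median, else 0."""
    m = median(final_sizes)
    return [1 if s > m else 0 for s in final_sizes]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights (bias stored last) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1]) - label
            for j, xi in enumerate(row):
                grad[j] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w, row):
    """Predict class 1 when the linear score is non-negative."""
    return 1 if sum(wi * xi for wi, xi in zip(w, row)) + w[-1] >= 0 else 0


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(w, X[i]) == y[i] for i in fold) / len(fold))
    return sum(scores) / k
```

In this setting, each row of `X` would hold the features of one family (for example, topology features computed on the first n observations), and one such model would be evaluated per feature family and per value of n.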
Results: The model predicts, after n observations, whether the final size of the diffusion will be above or below the median value of all diffusions that have been used at least n times. Figure 8.6.3 represents the class prediction accuracy across all four networks. The Reddit networks (both user and traversal) achieve the highest accuracies consistently over the observation periods. Reddit’s traversal accuracy peaks at 0.9 after 200 observations, with basic and community features performing the best. Similarly, the Reddit comment network’s community and basic features consistently achieve the highest accuracy, though there is little variation over time, with values plateauing around 0.6.
The difference in accuracy between the macro and micro Reddit networks could be attributed to the variation in network structures. The Reddit user network is significantly larger (see Table 5.2), which we would expect to make it more expressive of language change. However, the content generated by each node is significantly less than in the traversal network, which aggregates content; this aggregation could in turn reduce or amalgamate a number of the external influences that affect user language adoption.
Unlike Reddit, the predictability of the different models for the Twitter networks appears to be less stable (potentially caused by their exponents, which cause imbalanced classes (see Figure 8.6.1)). Nevertheless, topology features perform best for early-stage diffusions (small values of n) for Twitter mentions, achieving similar results to baseline and temporal factors. Across the Twitter mention metrics, there is significant variation in performance; this variation could be due to higher degrees of noise given the sparser nature of the underlying network structure (there are relatively few users who are mentioned in the dataset, with relatively few edges). This sparsity can also be seen to affect the results in Chapter 7.
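One way a steep exponent could unbalance a median split is through ties: when most diffusions share the smallest size, the median equals that size and few diffusions fall strictly above it. The text does not state this mechanism, so the sketch below is only an illustration under that assumption, using synthetic integer sizes; the function names and the truncation at `x_max` are hypothetical.

```python
import random
from statistics import median


def sample_discrete_power_law(alpha, count, rng, x_max=10_000):
    """Draw integer sizes with P(X = x) proportional to x ** -alpha, truncated at x_max."""
    support = range(1, x_max + 1)
    weights = [x ** -alpha for x in support]
    return rng.choices(support, weights=weights, k=count)


def positive_fraction(sizes):
    """Fraction of diffusions strictly above the median (the 'large' class)."""
    m = median(sizes)
    return sum(s > m for s in sizes) / len(sizes)
```

Comparing `positive_fraction` for samples drawn with an exponent near 4 against samples drawn with an exponent near 2 shows the steeper distribution leaving a much smaller share of diffusions in the large class.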
Twitter’s geo network, which is built on regions of postcodes, again appears to be a less predictable setting for language diffusion. Feature sets such as all and community-based features perform the best within the early stages of a diffusion, with accuracies reaching a high of 0.75, though this quickly reduces below the 0.5 random baseline (Figure 8.6.3).
This may be due to the structure of the network. The Twitter geo network has a maximum of 2,910 nodes, which means that at n = 100, 5 % of the network would have been activated. Achieving relatively large coverage in a small number of steps might not be an issue for other networks, though for Twitter the proportion of activity per postcode is skewed, with a large number of nodes concentrated in big cities that have large populations. This means that the distribution of content across the nodes is unbalanced, concentrating the effective network into a smaller subset of nodes and thus reducing the effective size of the network. This is compounded by the fact that only 4 % of the tweets are geo-tagged, and these are skewed toward more densely populated locations.
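To make the notion of a reduced effective network size concrete, the sketch below uses the inverse Simpson index over per-node activity shares. The text does not define "effective size", so this measure is a swapped-in assumption, and the function name `effective_size` and the example counts are hypothetical.

```python
def effective_size(activity_counts):
    """Inverse Simpson index: 1 / sum(p_i ** 2) over per-node activity shares."""
    total = sum(activity_counts)
    shares = [count / total for count in activity_counts]
    return 1.0 / sum(p * p for p in shares)
```

With activity spread evenly over four nodes the measure equals 4, whereas counts such as `[97, 1, 1, 1]` give a value only slightly above 1, mirroring how activity concentrated in a few densely populated postcodes shrinks the network that effectively takes part in a diffusion.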
The predictability of the number of unique users (Figure 8.6.2) paints a similar picture. This is largely due to the fact that the number of users/nodes and the number of posts featuring an innovation are likely to be highly correlated. In the case of the subreddits network, accuracies hover above the random baseline of 0.5, with basic features breaking the 0.6 boundary. For large diffusions, where n > 200, basic and community increase. In the case of adoption number, in the Reddit comment network, accuracy stays constant for different n values.
Temporal features across all networks frequently achieve accuracies below the 0.5 baseline. This is in contrast to research into the dynamics of meme and information diffusion online [185], [208], which identified that temporal aspects of a diffusion are highly important. However, [148] and [129] argue that language change is a slow and drawn-out process, with some innovations spreading fast (like viral memes) as they are associated with events, but most taking an extended period of time as they are slowly integrated into people’s vocabulary.
It is worth mentioning that the set of diffusions in the data collected is expected to be highly heterogeneous in nature, which, as with information diffusion in general, could also be a cause of variations in model performance. Additionally, for some feature classes, accuracy decreases as diffusion size increases. This potentially indicates a change in the diffusion’s form as it grows, which is in agreement with [207], which suggests that external influence on the diffusion process becomes more profound as it grows in size.
Overall, we have shown that basic, community and topology-based models appear to be the best predictors of language innovation diffusion. Temporal models have been less stable, potentially due to variations in the diffusion process. Additionally, the aggregation of content (macro networks) into the respective groups implies higher accuracy scores, as some of the external influences are mitigated. However, the variation in the Twitter networks’ power-law distributions results in classes being unbalanced, thereby affecting the accuracy.
[Figure 8.6.4 plot: panels: Reddit Traversal, Twitter Geo, Twitter Mention; x-axis: n; legend: 1st Quartile, 2nd Quartile, 3rd Quartile, 4th Quartile]
Figure 8.6.4: Zn¯(o) computed on the first n observations of all cascades. The first quartile represents the smallest diffusions, and the fourth quartile the largest.