The main takeaway from these analyses is that looking at functions of the graph history is a must when setting out to explore temporal graph data.
In the case of plain vanilla preferential attachment, we showed that the estimator works well and has simple and promising extensions to preferential attachment with multiple edges. It is not a stretch, then, to believe that it would extend naturally to multiple change points as well.
Unfortunately, real world data is not generally so well-behaved. Nevertheless, we showed that looking at very simple graph functions can reveal a great deal about what’s going on structurally behind the scenes. At this point the approach raises more questions than answers, but we feel that it will be possible in the near future to extend our change point methodology to adapt to these messier situations.
10 http://www.github.com/yichijin/pa-changepoint
11 https://graph-tool.skewed.de/
CHAPTER 5
Decreasing cascades on scale-free graphs
5.1. Introduction
This chapter seeks a simple answer to a complicated question: how does information spread on a social network?
The dynamic we would like to study in this chapter is the propagation of information through a scale-free network, in particular a type of propagation inspired by “retweet” dynamics on the social networking platform Twitter. In contrast to our change point work, we are less interested in the growth dynamic of the underlying network and more interested in the cascade on top of the network.
Almost all person-to-person interaction on the internet is, by definition, carried through on social networks. One way that interactions on the internet differ from interactions in reality is that the internet facilitates large-scale, instantaneous propagation. In essence, social networks make each user their own personal media outlet.
On most social networks, such as Facebook or Twitter, interactions broadly fall into one of two classes, which we will call engagements and broadcasts. Engagements are selective interactions between two users, such as private messages on Facebook or direct messages on Twitter. Broadcasts are exactly what the name implies: indiscriminate blasts to all of a user’s contacts on the network.
We are interested in studying the dynamics of how information disseminates on a social network through broadcasts. When a single user authors content on a social network and broadcasts it to their neighbors, each neighbor can either ignore it or re-broadcast it to their own neighbors. This propagation history traces a subgraph on the originating user’s social network.
Cascades have been extensively studied, and researchers have proposed countless plausible mechanisms for generating them (see Chapter 2). However, in light of new results by [52] on the shape of viral cascades, we believe a new, simple model of information cascades is relevant.
In thinking about models for cascades on social networks, a few empirical observations must be taken into account. The first is the intuitively obvious fact that the vast majority of them are tiny. In layman’s terms, most content on the internet is not extensively shared. However, a very small fraction of cascades break the mold and propagate explosively across the internet, or go viral. Any cascade model must be flexible enough to generate cascades at either extreme.
The second observation is that in very large (viral) cascades, the shape of the cascade does not match what is predicted by simple epidemic models (see Section 2.3.2). To recap, classical cascade models all predict that large cascades will generally fall into one of two extremes: those that are truly viral, reaching far from their source and many users at each distance; and those that are simply broadcasts, reaching only users within one or two hops of the source. In [52] this is summarized using the following concept.
For each cascade T we can associate a measure of its “viralness” by the average shortest path distance between nodes:
\[
\nu(T) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}
\]
where i, j index the nodes of T and d_{ij} is the distance from node i to node j.
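As a minimal sketch of how this measure can be computed, the Python below evaluates ν(T) for a cascade stored as an adjacency dictionary, using breadth-first search from every node. The function names (`bfs_distances`, `virality`) and the helper graphs `star` and `path` are hypothetical and chosen only for illustration; they are not taken from [52] or from the code accompanying this thesis.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def virality(adj):
    """Average shortest-path distance: sum of d_ij over all i, j, divided by n(n-1).

    `adj` maps each node to a list of its neighbours (undirected cascade tree).
    The i == j terms contribute zero, so summing over all ordered pairs is safe.
    """
    n = len(adj)
    if n < 2:
        raise ValueError("virality needs at least two nodes")
    total = 0
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("cascade must be connected")
        total += sum(dist.values())
    return total / (n * (n - 1))


def star(k):
    """Illustrative 'pure broadcast': a source (node 0) with k direct children."""
    adj = {0: list(range(1, k + 1))}
    for leaf in range(1, k + 1):
        adj[leaf] = [0]
    return adj


def path(n):
    """Illustrative 'pure chain': n nodes, each passing content to the next."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    print(virality(star(99)))   # broadcast-like cascade on 100 nodes
    print(virality(path(100)))  # chain-like cascade on 100 nodes
```

For a star with k leaves the sum works out to 2k/(k+1), which stays below 2 and approaches it as k grows, while for a path on n nodes it is (n+1)/3, which grows with the size of the cascade. These two toy shapes bracket the broadcast and viral extremes described in the text.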
Most simple cascade models predict that ν(T) is either close to 2 or very large (depending on the size of the graph). In terms of familiar constructs, if a branching process is supercritical and survives past the first generation, it will tend to survive for many generations and have large ν(T). If, on the other hand, a branching process is subcritical, or supercritical but conditioned on extinction, it is unlikely to survive past a handful of generations and thus has ν(T) close to the minimum value, 2.
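The dichotomy described above can be explored with a small simulation. The sketch below generates a Galton-Watson branching process and records the size of each generation. The Poisson offspring distribution, the `max_nodes` cap, and the function names are assumptions made purely for illustration; this is not the model proposed later in the chapter. Depth (number of generations) is used here only as a rough proxy for how far a cascade reaches, not as a substitute for ν(T).

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cascade(mean_offspring, rng, max_nodes=10_000):
    """Return the list of generation sizes of one Galton-Watson tree.

    Assumption: each node independently has Poisson(mean_offspring) children.
    Simulation stops at extinction or once `max_nodes` nodes have been created,
    in which case the last recorded generation is non-empty.
    """
    generations = [1]  # generation 0 is the single source node
    total = 1
    while generations[-1] > 0 and total < max_nodes:
        children = sum(poisson(rng, mean_offspring) for _ in range(generations[-1]))
        generations.append(children)
        total += children
    if generations[-1] == 0:
        generations.pop()  # drop the empty generation that marks extinction
    return generations


if __name__ == "__main__":
    rng = random.Random(2024)
    for mean in (0.5, 1.5):  # subcritical and supercritical examples
        depths = [len(simulate_cascade(mean, rng)) - 1 for _ in range(200)]
        shallow = sum(d <= 3 for d in depths)
        print(f"mean offspring {mean}: {shallow}/200 runs died within 3 generations, "
              f"max depth {max(depths)}")
```

Under these assumptions, one would expect the subcritical runs to die out almost immediately, and the supercritical runs to split between early extinction and growth up to the node cap, which is the bimodal behavior the text attributes to simple models.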
In practice, [52] notes that actual social network cascades exhibit a wide range of viralities rather than being bimodally distributed at the extremes, and this is backed up by many other studies. Therefore our challenge is twofold. First of all, how can we capture this range of behaviors? Second of all, given the simple way retweeting works in reality, how can we accomplish our first goal using the simplest possible model?
A complete answer to these questions is out of the scope of this thesis, but we endeavor to take the first step by proposing a simple branching process model which, we argue, is a reasonable candidate for generating these types of flows.