1.6 Thesis structure
2.1.7 Content Popularity as an illustrative example of users’ aggregated behaviors⁴
Users’ interests are spread over various types of content, including online videos [109, 49], online news articles [110], and content in online social networks (SNs), e.g. tweets on Twitter [111]. Depending on the content type, users exhibit different kinds of aggregated behaviors.
For instance, users show their interest in a news article by reading it many times, request an interesting video a large number of times, or like a popular post in an SN many times [40]. As a result, a prediction model for aggregated behaviors ought to account for these differences, which also shape the approaches that studies select to model the popularity of different content types. With these differences in mind, we review the studies on modeling content popularity in what follows.
⁴ In this section, we briefly discuss content popularity prediction models (C-PPMs) as examples of aggregated behaviors. However, since content popularity prediction is considered a use case for EPM, we discuss the existing content popularity models extensively in Section 2.2.
The initial attempts at modeling the mass reaction of users to content were devoted to finding the distribution that best fits users’ reactions to different contents. For example, recent studies have shown that the distribution of users’ requests for web pages is highly skewed and can be represented by Zipf’s law [112].
However, studies on modeling the view count of online videos have observed various distributions, reporting log-normal [109], Zipf-like [113], power-law with exponential cut-off [49], Gamma [56], and Weibull [108] distributions as the best fit to video view counts.
Whatever the view count distribution is, it can show different levels of sharpness. Observing a high level of sharpness in the view count distribution of a specific content indicates that a large number of users have reacted to this content within a short period of time.
In addition to the above findings, it has been shown that the popularity of online videos usually follows a power-law distribution [114, 49]. This indicates that users’ interest is concentrated on a small number of videos.
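As a rough illustration of what a skewed, Zipf-like popularity distribution implies, the following minimal sketch estimates the exponent from a rank plot and measures how much of the total demand falls on the most popular items. It is not the fitting procedure used in the cited studies: the log-log least-squares estimator, the function names `zipf_exponent` and `top_share`, and the synthetic catalogue are all assumptions made for illustration.

```python
import numpy as np

def zipf_exponent(view_counts):
    """Estimate s in count ~ C * rank**(-s) by least squares on the log-log rank plot.

    This is a simple, illustrative estimator; it is known to be less robust
    than maximum-likelihood fitting on noisy data.
    """
    counts = np.sort(np.asarray(view_counts, dtype=float))[::-1]
    counts = counts[counts > 0]              # log is undefined for zero counts
    ranks = np.arange(1, counts.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

def top_share(view_counts, fraction=0.1):
    """Fraction of all views captured by the top `fraction` of items."""
    counts = np.sort(np.asarray(view_counts, dtype=float))[::-1]
    k = max(1, int(np.ceil(fraction * counts.size)))
    return counts[:k].sum() / counts.sum()

# Hypothetical catalogue of 1000 videos whose views follow an exact Zipf law (s = 1).
ranks = np.arange(1, 1001)
synthetic_views = 1e6 / ranks

print("estimated exponent:", zipf_exponent(synthetic_views))
print("share of views on top 10% of videos:", top_share(synthetic_views, 0.1))
```

On real traces the counts are noisy and the tail is often truncated, so the estimated exponent should be read as indicative only.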
Apart from the distribution-fitting approach explained above, researchers have also searched for the set of factors that are correlated with the number of users who perform a specific aggregated behavior. In the popularity prediction paradigm, requesting a video, reading a news article, or reacting to a post in SNs are examples of users’ aggregated behaviors. These factors are listed below.
• The number of users who perform a specific aggregated behavior is correlated with the number of users who have performed this aggregated behavior in the past: A wide range of models has been proposed to capture the correlation between the number of users who performed a particular aggregated behavior in the past and the number who will perform it in the future. To take this correlation into account, some of the proposed models exploit a regression-based technique. The Szabo and Huberman model (S-H model) [42], the Multivariate Linear model (ML model) [43], linear regression [115], the Multivariate Radial Basis Functions model (MRBF model) [43], and constant growth [54] are some of the well-known C-PPMs that use regression to relate the number of users requesting a video in the future to the number of users who have requested the video up to a specific time in the past.
Probabilistic approaches are another well-known family of methods for capturing the correlations discussed above. For example, Wu et al. [116] developed a model based on Reservoir Computing (RC), a neural network technique, to capture these correlations. The authors observed a high correlation between the number of users requesting a video in certain time intervals in the past and in the future. Tan et al. [117] proposed a model based on the Pure Birth Process, a time-dependent Markov chain model, to predict the accumulated future view count of online videos. As a final example, Zaman et al. [118] proposed a model based on a Bayesian network to predict the popularity of tweets.
• The social networks and social relationships of users provide valuable information for foreseeing the aggregated behavior of users: One of the factors with an undeniable impact on the popularity of an aggregated behavior amongst users is the social influence of other users. For example, the popularity of online content (e.g. a video) can grow because users recommend the content to their acquaintances. As an illustrative example, Nwana et al. [119] developed a latent social approach to foresee popular contents in a campus network. The dataset that the authors used for evaluation contained YouTube traces collected at the University of Massachusetts. The authors used the S-I model (a virus diffusion model) with latent parameters to capture user-user sharing probabilities. Moreover, it has been shown that there is a significant correlation between the social influence of the uploader of a video (or the uploader’s number of followers) and the popularity of the uploaded video [120].
Social streams gathered from social networks, such as Twitter, can be used to foresee sudden growth in the popularity of videos [121]. The streams can be exploited as an indicator of popular topics and also as a criterion of videos’ social prominence.
As a final example, Castillo et al. [122] proposed a method that uses the early attention that news articles receive on social networks to forecast the total number of page views that each article will receive on a news site.
• The content and topic of videos influence users’ decisions about whether or not to perform an aggregated behavior: Recent studies have shown that the genre and content of a video significantly affect its popularity [123, 120]. Some genres are more popular than others, and in some periods of time some topics are more popular than others. For illustration, Xie et al. [123] used image processing techniques to find videos related to two popular topics on YouTube in 2009, namely swine flu and the Iran election. The authors showed that this information is very useful for predicting the popularity of new content.
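The regression-based C-PPMs listed under the first factor relate early request counts to later ones. The following minimal sketch illustrates that idea with an ordinary least-squares fit on log-transformed counts. It is written in the spirit of such log-linear models and is not a reproduction of any cited formulation; the function names `fit_log_linear` and `predict_later`, the growth factor of 5, and the synthetic training data are assumptions made purely for illustration.

```python
import numpy as np

def fit_log_linear(early, later):
    """Fit ln(later) = a + b * ln(early) by ordinary least squares.

    `early` holds the request counts observed up to an early reference time,
    `later` the counts of the same items at the prediction target time.
    """
    x = np.log(np.asarray(early, dtype=float))
    y = np.log(np.asarray(later, dtype=float))
    b, a = np.polyfit(x, y, 1)   # polyfit returns the highest-degree coefficient first
    return a, b

def predict_later(a, b, early):
    """Predict later counts for new items from their early counts."""
    return np.exp(a + b * np.log(np.asarray(early, dtype=float)))

# Hypothetical training set: later views are about 5x the early views,
# perturbed by multiplicative log-normal noise.
rng = np.random.default_rng(0)
early_views = rng.integers(10, 10_000, size=200)
later_views = 5.0 * early_views * np.exp(rng.normal(0.0, 0.1, size=200))

a, b = fit_log_linear(early_views, later_views)
print("intercept a:", a, "slope b:", b)
print("predicted later views:", predict_later(a, b, [100, 1000]))
```

Working on log-transformed counts keeps a few very popular items from dominating the fit, which matters given the skewed popularity distributions discussed earlier in this section.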