Predictability of the users’ activities in World Wide Web


2.1.3 Predictability of the users’ activities in World Wide Web

In the following, we review some models that exploit the history of users’ activities in the World Wide Web (WWW) to extract the repetitive patterns in those activities. A user’s history contains all of her searches and requests, together with the location and time at which she performs each search. Throughout our review, we also discuss how the findings of the presented studies can be regarded as supporting evidence for the predictability of user activities in the WWW.

• Statistical analysis: In the literature, various aspects of the statistical properties of user activities in the WWW have been studied. The common initial step for this analysis is to categorize the activities into a number of classes.

Kumar et al. [72] classified user activities in the WWW into three main classes: content (e.g. news or videos), communication (e.g. email or forums) and search (main search or item search).

Statistical analysis of the above classes shows that more than 50% of user activities belong to the content category [72]. Moreover, on average about 81% of the web-pages that a user visits have been visited at least once before, and most users visit two web-pages more often than other pages [73]. The purpose of more than 60% of the activities is to reach a specific web-page, while the purpose of 25% of the activities is to access specific data, e.g. weather-related measurements, from any web-page [74]. Some other studies have restricted their analysis to the statistics of user activities in Social Networks (SNs). As a general result, it has been shown that users’ activities in SNs increase in the middle of the day; however, users do not interact continuously with their SNs during long log-in periods (i.e. more than 30 mins) [75]. Like the above-mentioned studies, the studies on user activities in SNs also grouped the activities into a number of categories [52, 75].

As an illustrative example, Benevenuto et al. [52] introduced 9 categories of customary user behaviors, namely Universal search, Scrapbook, Messages, Testimonials, Videos, Photos, Profile & Friends, Communities and Others (e.g. account log in). First, Benevenuto et al. studied the session length, the inter-request time distribution and the inter-session time, and showed that a power-law distribution fits these time distributions well. The authors also investigated the statistics of the sequential activities that are performed within and between categories. Their findings showed that 77% of transitions from one activity to another happen within similar categories (which is in agreement with the findings of [75]). Schneider et al. [75] demonstrated the existence of repetitive activity sequences in the interaction of users with SNs.

The above-presented findings show that there exist a number of underlying patterns that frequently repeat in the users’ activities in the WWW.
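Two of the statistics quoted above, the share of page visits that are revisits and the share of activity transitions that stay within a category, can be computed from a simple activity log. The following is a minimal sketch in Python, not the procedure used in [52], [73] or [75]. The function names and the toy log are hypothetical, and we assume that a transition "within similar categories" means that two consecutive activities carry the same category label.

```python
def revisit_rate(visits):
    """Fraction of page visits whose URL already appeared earlier in the log."""
    seen = set()
    revisits = 0
    for url in visits:
        if url in seen:
            revisits += 1
        seen.add(url)
    return revisits / len(visits) if visits else 0.0


def within_category_share(categories):
    """Fraction of consecutive activity pairs that keep the same category label."""
    pairs = list(zip(categories, categories[1:]))
    if not pairs:
        return 0.0
    same = sum(1 for prev, nxt in pairs if prev == nxt)
    return same / len(pairs)


# Hypothetical toy log for one user (illustrative data only).
toy_visits = ["a", "b", "a", "a", "c", "b"]
toy_categories = ["Photos", "Photos", "Messages", "Messages", "Messages", "Videos"]

print(revisit_rate(toy_visits))
print(within_category_share(toy_categories))
```

Applied to a real log, values close to the reported 81% and 77% would indicate the same kind of repetition; the toy data here only illustrate the computation.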

• Proposing a number of factors that limit user activities in WWW: To predict user activities in the WWW, some models rely on a set of basic factors that can explain why users perform those activities. In what follows, we discuss some of the well-known factors employed by existing works.

1. Graph theory provides a mathematical tool to represent and model the activities of users in WWW and SNs. Graph-theory-based models have been proposed as a powerful means of capturing the connections between users and the correlations between their activities in WWW, particularly in SNs. To do this, researchers have employed two types of methods, namely undirected graphs and directed graphs [76].

The first method creates an edge without any direction between two users, e.g. when there is a friendship [77] between them, while the second one uses a directed edge to represent a social interaction between two users in SNs. For example, a visit by user 2 to the webpage of user 1 is considered an interaction between them; to represent it, an edge directed from user 2 to user 1 is constructed.

For illustration, Wilson et al. [78] used an undirected graph to construct a social graph between users. The authors examined whether social links in users’ SNs can be considered a sign of interaction amongst users. Taking wall posts and photo comments as indicative signs of social interaction, the authors found that users interact with only a small group of their friends.

Jiang et al. utilized the directed graph method to model the main graph-theoretical characteristics of SNs’ users, e.g. the average length of the path that connects two users and the social degree distribution. The authors considered a visit to a user’s profile by another user as an indicator of an interaction between them. Each such interaction determines a directed edge in the constructed social graph.

Ghosh et al. [79] considered a bipartite graph model to represent the evolution of inter-group relationships and the spreading of information amongst users in different groups. They defined two different sets: one consists of popular social networks and the other contains the users of those social networks. Directed edges are created between a user and the social networks of which the user is a member.
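The difference between the two graph types can be sketched with plain adjacency sets. The following Python fragment is only an illustration under our own assumptions, not the construction used in [78] or by Jiang et al.: the function names and the toy data are hypothetical, friendships are stored as undirected edges, and each interaction is stored as a directed edge from the acting user to the target user. The last function computes the share of a user's friends with whom she actually interacts, in the spirit of the comparison between social links and interactions.

```python
from collections import defaultdict


def build_undirected(friendships):
    """Friendship graph: each pair (u, v) adds an edge in both directions."""
    adj = defaultdict(set)
    for u, v in friendships:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def build_directed(interactions):
    """Interaction graph: each pair (actor, target) adds one edge actor -> target."""
    out = defaultdict(set)
    for actor, target in interactions:
        out[actor].add(target)
    return out


def interacting_friend_share(user, friends, interactions):
    """Share of the user's friends that the user has interacted with at least once."""
    user_friends = friends.get(user, set())
    if not user_friends:
        return 0.0
    reached = interactions.get(user, set()) & user_friends
    return len(reached) / len(user_friends)


# Hypothetical toy data (illustrative only).
toy_friendships = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]
toy_interactions = [("u2", "u1"), ("u1", "u2"), ("u1", "u2")]

friend_graph = build_undirected(toy_friendships)
interaction_graph = build_directed(toy_interactions)
print(interacting_friend_share("u1", friend_graph, interaction_graph))
```

In this toy case user u1 has three friends but interacts with only one of them, which is the kind of gap between the social graph and the interaction graph that the studies above examine.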

2. There is a correlation between the future activities of users and their previous activities as well as their acquaintances’ activities: As the first example of the models that build on this observation, Raghavan et al. [55] proposed a model that takes advantage of a first-order Markov model to correlate the future activity of users with their previous behaviors and the current behaviors of the users. The authors used a dataset collected from Twitter to test their model. Their results show that the proposed model describes the statistical properties of inter-tweet durations well.

As the second example, Tan et al. [80] presented a “noise tolerant time-varying factor graph model” (NTT-FGM) that is able to model and predict users’ next activities in SNs. NTT-FGM is a graph-theory-based model that correlates each user’s activity with the attributes of the user and her previous activities, as well as with other users’ current and previous activities. Tan et al. evaluated the performance of their proposed model in predicting the future activities of users. The findings reported in [80] show that NTT-FGM outperforms the weighted-vote relational neighbor (wvRN) and SVM-light models.

As the last example, using the impact of a user’s activity on the activity of other users (i.e. the correlation between their activities), Trusov et al. [81] developed and empirically tested a solution to identify active users. Active users are defined as those who keep a social network attractive and have an impact on other users’ activities. More specifically, the authors investigated whether an increase or decrease in the active time of a user causes changes in the activity time of others. Results in [81] show that the proposed model is able to identify active users with an accuracy of 92%.
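The idea that a user's next activity depends on her previous one can be illustrated with a plain first-order Markov chain estimated from an activity sequence. The sketch below is a simplified stand-in written for illustration; it is not the model of [55], [80] or [81], and it ignores the influence of acquaintances. The function names and the toy sequence are hypothetical; the activity labels reuse the three classes of [72].

```python
from collections import Counter, defaultdict


def fit_transitions(sequence):
    """Estimate first-order transition probabilities P(next | current) by counting."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for state, ctr in counts.items()
    }


def predict_next(probs, current):
    """Return the most probable next activity, or None for an unseen state."""
    options = probs.get(current)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical activity sequence of one user (illustrative only).
toy_sequence = [
    "search", "content", "content", "communication",
    "content", "content", "search", "content",
]

toy_probs = fit_transitions(toy_sequence)
print(toy_probs["content"])
print(predict_next(toy_probs, "search"))
```

A model of this kind only captures the dependence on the user's own last activity; accounting for the activities of other users, as the works above do, would require additional conditioning variables.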

In this section, we discussed the degree of predictability of user behaviors by considering both their statistical properties and the underlying factors governing them. In doing so, we used two familiar examples, namely users’ mobility and user activities in the WWW. Since these examples cover a relatively large subset of user behaviors, the observed predictability can be generalized to other kinds of behaviors performed by users.

In document Pattern profiling of users' behaviour. (Page 39-42)