5. IDENTIFYING INFLUENTIAL USERS

5.1.4 Chapter Contribution and Organization

A preliminary study of this work appeared at the 15th Asia-Pacific Web Conference in 2013 [LPLS13]. That conference paper focuses on the proposed influence model, IDM-CTMP, and shows its advantages over two baselines, which are not necessarily continuous-time models. In this dissertation chapter, (1) we propose three “dimensions” of users’ influence in the social network to help others understand different aspects of influence; (2) we conduct comprehensive experiments to systematically measure users’ influence and compare different influence models over the three proposed dimensions; and (3) we define two heuristic continuous-time influence models as baselines to further show the advantages of our proposed model.

In summary, the contributions of this chapter are listed below.

1. We introduce three dimensions of influence from an application perspective and provide an evaluation framework to systematically measure influence and compare different influence models (see Section 5.6.3).

2. Comprehensive experiments are conducted on various extracted networks (mentions, retweets, replies), as well as on temporal propagation paths from large-scale Twitter data (see Section 5.6).

3. Two heuristic influence models that consider topic diffusion in continuous time are defined as baselines (see Section 5.4) to highlight the strengths of our proposed dynamic information diffusion model based on the Continuous-Time Markov Process.

The remainder of this chapter is organized as follows. Related work on influence modeling is reviewed in Section 5.2. Before discussing any influence models, we propose three dimensions of social influence in Section 5.3. In Section 5.4, we first define the temporal influence network, introduce some existing influence models, and propose two heuristic dynamic influence models. In Section 5.5, we propose an information diffusion model based on the Continuous-Time Markov Process. Experimental results are presented in Section 5.6. In particular, in Section 5.6.3 we discuss the three dimensions of influence and present a comprehensive empirical study on a large-scale Twitter data set that compares the influence metrics (both the dynamic influence metrics and well-known static influence metrics) within our proposed evaluation framework. We evaluate the predictive power of our proposed information diffusion model in Section 5.6.4. Finally, Section 5.7 concludes the chapter.

5.2 Related Work

A number of recent works have addressed user influence on social networks. Many of them measure user influence by network metrics. Kwak et al. [KLPM10] found differences among three influence measures: number of followers, PageRank, and number of retweets. Cha et al. [CHBG10] compared three measures (number of followers, number of retweets, and number of mentions) and found that the number of retweets and the number of mentions correlate well with each other, while the number of followers does not correlate well with the other two measures. Their hypothesis is that a user's number of followers may not be a good influence measure. Weng et al. [WLJH10] regarded the central users of each topic-sensitive subnetwork of the follower-and-followee graph as influential users. Other works such as [GL10, RGAH10, ALTY08, TSWY09] mined users’ influence from static network properties derived from either their social graphs or their activity graphs.
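Comparisons of influence measures like those described above are commonly carried out with a rank correlation. The following is a minimal sketch, assuming Spearman's rank correlation and entirely hypothetical toy counts; it is not the procedure or data of the cited studies. The names `ranks`, `spearman`, `followers`, `retweets`, and `mentions` are illustrative.

```python
import math
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Assign 1-based ranks (highest score gets rank 1), averaging ranks over ties."""
    ordered = sorted(scores, key=lambda u: scores[u], reverse=True)
    result: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of users tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[ordered[k]] = average_rank
        i = j + 1
    return result


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    users = sorted(set(a) & set(b))
    ra = ranks({u: a[u] for u in users})
    rb = ranks({u: b[u] for u in users})
    mean = (len(users) + 1) / 2  # mean rank, also valid with averaged ties
    cov = sum((ra[u] - mean) * (rb[u] - mean) for u in users)
    var_a = sum((ra[u] - mean) ** 2 for u in users)
    var_b = sum((rb[u] - mean) ** 2 for u in users)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy counts for four users (not real data).
followers = {"a": 1000, "b": 500, "c": 100, "d": 10}
retweets = {"a": 5, "b": 40, "c": 30, "d": 1}
mentions = {"a": 8, "b": 50, "c": 35, "d": 2}

print(spearman(retweets, mentions))   # 1.0
print(spearman(followers, retweets))  # 0.4
```

In this toy data the retweet and mention rankings agree exactly, while the follower ranking agrees with them only weakly, which mirrors the qualitative pattern described in the text.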

Various dynamic diffusion models have also been proposed to discover influential users. They have been shown to outperform influence models based on static network metrics [RD02, GL10]. Much of the work in this direction is devoted to viral marketing. Domingos and Richardson [DR01, RD02] were the first in the data mining domain to mine customer network values for “influence maximization” in viral marketing. Their approach is a probability optimization method with hill-climbing heuristics. Kempe et al. [KKT03] further showed that a natural greedy strategy can achieve 63% of the optimal for two fundamental discrete-time propagation models: the Independent Cascade Model (IC) and the Linear Threshold Model (LT). Many diffusion models assume that the influence probabilities on the edges or the acceptance probabilities on the nodes are given or randomly simulated. Goyal et al. [GBL10] proposed to mine these probabilities by analyzing the past behavior of users. Saito et al. [SKOM10a, SKOM10b] extended the IC and LT models to incorporate asynchronous time delay. Their model parameters, including activation probabilities and continuous time delays, are estimated by maximum likelihood. Our proposed diffusion model differs from the models discussed above: (1) We model edge diffusion probabilities and node thresholds that change over time, rather than computing static probabilities. (2) Our model is a continuous-time diffusion model instead of a discrete-time diffusion model. Although Saito et al. also proposed continuous-time models, the underlying diffusion processes of their models still follow the LT and IC models. For example, in asynchronous IC, an active node can only infect one of its neighbors in one iteration, while our proposed model does not assume iterations, so nodes can be activated at any time without resetting the clock in a new iteration. Moreover, the models proposed by Saito et al. assume only one initial active user and focus on model parameter estimation rather than on prediction. Their experiments were evaluated on data simulated over real network topologies. Our proposed model estimates its parameters from real large-scale social network data, allows many initial active users to influence other users asynchronously or simultaneously, and predicts real future diffusion sizes.
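To make the discrete-time setting concrete, the sketch below simulates the Independent Cascade model and picks seeds with the natural greedy strategy, using Monte Carlo runs to estimate the expected spread. It is a simplified illustration under stated assumptions, not the implementation of any cited work: the edge probabilities are hypothetical, and the names `simulate_ic`, `expected_spread`, and `greedy_seeds` are ours.

```python
import random
from typing import Dict, List, Tuple

# graph[u] = list of (v, p): u activates v with probability p (hypothetical values).
Graph = Dict[str, List[Tuple[str, float]]]


def simulate_ic(graph: Graph, seeds: List[str], rng: random.Random) -> int:
    """Run one Independent Cascade and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets a single chance per inactive neighbor.
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph: Graph, seeds: List[str], runs: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected cascade size for a seed set."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph: Graph, k: int, runs: int = 200) -> List[str]:
    """Greedy selection: repeatedly add the node giving the largest estimated spread."""
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    chosen: List[str] = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(chosen))  # sorted for deterministic tie-breaking
        best = max(candidates, key=lambda n: expected_spread(graph, chosen + [n], runs))
        chosen.append(best)
    return chosen


# Toy graph with probability-1 edges, so the outcome is deterministic.
toy: Graph = {"a": [("b", 1.0), ("c", 1.0)], "b": [("c", 1.0)], "d": [("e", 1.0)]}
print(greedy_seeds(toy, 2))  # ['a', 'd']
```

The 63% figure in the text is the constant 1 - 1/e, which holds for greedy selection on the exact expected spread; a sampled estimate such as the one here only approximates that setting.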

In addition, most influence models are descriptive rather than predictive. Bakshy et al. [BHMW11] studied the diffusion trees of URLs on Twitter and trained a regression tree model on a set of user network attributes, users’ past influence, and URL content to predict users’ future influence. Our work differs from that of Bakshy et al. in the following aspects.

1. They predict users’ average spreading size in the next month based on data from the previous month. However, the dynamic nature of word-of-mouth marketing means that influence coverage varies over time. Our work therefore aims to predict the spreading size of each individual user within a given time period, so that we can answer questions such as “what is the spreading size of user A within 2 hours, 1 day, or 1 month?”

2. Their work is based on a regression model, whereas we propose a real-time stochastic model. The inputs and outputs of the two models are different.

3. Besides URL diffusion, we also study the diffusion of hashtags on Twitter.
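The quantity in the first item, a user's spreading size within a given time period, can be illustrated with a small counting sketch. It assumes a hypothetical event format, a list of (hours since the source user's post, adopting user) pairs, and only measures an observed spreading size; it does not perform the prediction discussed in this chapter.

```python
from typing import List, Tuple

# Hypothetical propagation events for one source user: (hours since post, adopting user).
Event = Tuple[float, str]


def spreading_size(events: List[Event], window_hours: float) -> int:
    """Count distinct users who adopt the item within the given time window."""
    return len({user for hours, user in events if 0.0 <= hours <= window_hours})


events_of_user_a: List[Event] = [(0.5, "b"), (1.5, "c"), (1.8, "b"), (30.0, "d")]

print(spreading_size(events_of_user_a, 2))    # 2 users within 2 hours
print(spreading_size(events_of_user_a, 24))   # 2 users within 1 day
print(spreading_size(events_of_user_a, 720))  # 3 users within 30 days
```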

The Continuous-Time Markov Process (CTMP) has been used to model web-page and document browsing. Huang et al. [HYHN04] adopted it to model web users’ visiting patterns. Liu et al. [LGL+08] also used a Continuous-Time Markov Process to model user web browsing patterns for ranking web pages. Song et al. [SCHT07] employed CTMP to mine document and movie browsing patterns for recommendation. To the best of our knowledge, our work is the first to construct an influence diffusion model based on CTMP for predicting spreading coverage and user influence on social networks. We are also the first to introduce three intuitive criteria for users to compare and choose among different influence models.
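For readers unfamiliar with CTMPs, the sketch below simulates a generic continuous-time Markov process: the process waits in each state for an exponentially distributed holding time and then jumps to a next state with probability proportional to the transition rates. This is a textbook illustration, not the IDM-CTMP model of this chapter; the states, rates, and the name `simulate_ctmp` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# rates[i][j] = transition rate from state i to state j (i != j); hypothetical values.
Rates = Dict[str, Dict[str, float]]


def simulate_ctmp(rates: Rates, start: str, horizon: float,
                  rng: random.Random) -> List[Tuple[float, str]]:
    """Simulate one trajectory up to `horizon`; return (jump time, state) pairs."""
    t = 0.0
    state = start
    path = [(t, state)]
    while True:
        out = rates.get(state, {})
        total = sum(out.values())
        if total <= 0:
            break  # absorbing state: no outgoing transitions
        t += rng.expovariate(total)  # holding time ~ Exponential(total rate)
        if t > horizon:
            break
        # Choose the next state with probability rate / total.
        r = rng.random() * total
        acc = 0.0
        nxt = state
        for candidate, rate in sorted(out.items()):
            acc += rate
            nxt = candidate
            if r < acc:
                break
        state = nxt
        path.append((t, state))
    return path


# Toy chain u1 -> u2 -> u3, with u3 absorbing.
toy_rates: Rates = {"u1": {"u2": 2.0}, "u2": {"u3": 1.0}}
trajectory = simulate_ctmp(toy_rates, "u1", horizon=1e9, rng=random.Random(42))
print([s for _, s in trajectory])  # ['u1', 'u2', 'u3']
```

Because time advances by real-valued holding times rather than fixed iterations, a jump can occur at any moment, which is the property that distinguishes continuous-time models from the discrete-time IC and LT models.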