• No results found

Walls-in-one: usage and temporal patterns in a social media aggregator

N/A
N/A
Protected

Academic year: 2020

Share "Walls-in-one: usage and temporal patterns in a social media aggregator"

Copied!
24
0
0

Loading.... (view fulltext now)

Full text

(1)

R E S E A R C H

Open Access

Walls-in-one: usage and temporal

patterns in a social media aggregator

Matteo Zignani

*

, Azadeh Esfandyari, Sabrina Gaito and Gian Paolo Rossi

*Correspondence: [email protected] Universitá degli Studi di Milano, Dipartimento di Informatica, Via Comelico 39/41, Milan, Italy

Abstract

The continual launches of new online social media that meet the most varied people’s needs are resulting in a simultaneous adoption of different social platforms. As a consequence people are pushed to handle their identity across multiple platforms. However, due the to specialization of the services, people’s identity and behavior are often partial, incomplete and scattered in different “places”. To overcome this identity fragmentation and to give an all-around picture of people’s online behavior, in this paper we perform a multidimensional analysis of users across multiple social media sites. Our study relies on a new rich dataset collecting information about how and when users post their favorite contents, about their centrality on different social media and about the choice of their username. Specifically we gathered the posting activities and social sites usage from Alternion, a social media aggregator.

The analysis of social media usage shows that Alternion data reflect the novel trend of today’s users of branching out into different social platforms. However the novelty is the multidimensional and longitudinal nature of the dataset. Having at our disposal users’ degree in five different social networks, we performed a rank correlation analysis on users’ degree centrality and we find that the degrees of a given user are scarcely correlated. This is suggesting that the individuals’ importance changes from medium to medium. The longitudinal nature of the dataset has been exploited to investigate the posting activity. We find a slightly positive correlation on how often users publish on different social media and we confirm the burstiness of the posting activities extending it to multidimensional time-series. Finally we show that users tend to use similar usernames to keep their identifiability across social sites.

Keywords: Multiplex network, Social media aggregators, Degree correlated networks, Burstiness, Username usage

Introduction

In the last decade large datasets describing online social networks (OSNs) have been made available together with an extensive literature which enabled a comprehensive description of OSNs. Most of these studies assume that people connect and establish relationships on a single social platform. This approach, however, enables the achievement of a partial rep-resentation of the online social behavior of individuals Ruths et al. (2014). It is becoming evident that individuals are used to express their sociality through multiple layers ranging from face-to-face and on-phone communications to a variety of online social networks. Especially in their online life, users have access to a wide portfolio of social platforms which allow them to differentiate their interests and convey diverse contents. As a result,

(2)

a user can be registered to many social media and interact with her/his friends’ circles throughout different channels. For instance, a user can share her/his photos on Insta-gram, organize an event on Facebook and maintain the relationships with workmates on LinkedIn. These arguments are quantitatively reported in a survey conducted in Septem-ber 20141where it has been highlighted that the adoption of multiple social platforms is on the rise. In fact, 52 % of online users now use at least two social media, representing a significant increase from 2013, when the figures were close to 42 %. The diversification in the usage of the social platforms, in conjunction with their specialization, is causing a fragmentation of users’ identities among diverse social media. This process of identity fragmentation makes the understanding of the online behaviors more difficult since data are split and need to be matched and fused.

In this paper we move towards the reduction of the above segmentation by facing with the new challenge of giving a complete and compelling picture of people’s online behavior. To this aim we move within the multidimensional or multiplex network theory Magnani and Rossi (2011); Bródka et al. (2012); Berlingerio et al. (2012); Kivelä et al. (2014); D’AGostino and Scala (2014) since it provides the tools and the models which better capture and measure the interplay/correlation among the social sites users adopt. Specifically the scenario we are dealing with is well represented by a multigraph Zignani et al. (2015), a special case of an heterogeneous information network Sun and Han (2012). So far multiplex network theory has been applied to different real case studies; from cita-tion Li et al. (2016a), co-author Sun et al. (2011) and conference-author networks Sun et al. (2009) to power grids Brummitt et al. (2012), economic Li et al. (2016b); Li et al. (2014); Barigozzi et al. (2011) and biological networks Bauch and Galvani (2013). However most of these studies assume the underlying multiplex network is static or well-known dynamical processes occur onto them. On the contrary, little is known about the tem-poral interactions of people when they have different communication media available, especially in the online world. So the main goals of this paper arei) applying the multi-plex network theory to understand the on-going phenomenon of the adoption of multiple social site; andii) measuring how this process impacts on the interaction dynamics.

In practice, the study of the behaviors of people across different online social networks is at its very beginning. In fact, while many works have been published about profile matching algorithms across sites Zafarani and Liu (2013); Carmagnola and Cena (2009); Vosecky et al. (2009); Goga et al. (2015), only a few works Benevenuto et al. (2009); Meo et al. (2013); Abel et al. (2013); Magnani and Rossi (2013) analyze user behavior across sites, but they focus on specific behaviors as tagging or changes in clickstream.

(3)

media aggregators. Thus, it is possible to collect data about their profiles with personal public information and the posted contents.

This paper offers a first data-driven contribution to the study of people behaviors across social sites while addressing the following research issues:

Q1: How common is the use of multiple social media sites? Does the same hold for active users?

Q2: Is a person’s popularity uniform, i.e. more or less equally distributed across the social sites in use?

Q3: Is users’ production alike in different social media ? Do most active users behave similarly on different media?

Q4: How do people manage their posting activities during the day? Do they post on multiple social media every day or do they alternate in the choice of the publishing platforms?

Q5: Are users coherent in the choice of their usernames, keeping their identifiability across social sites?

By answering the above questions the paper introduces the following main findings and contributions. On the one hand, some findings confirm that the adoption and the active usage of multiple social sites is gaining momentum and maybe be studied through social media aggregators. On the other, we find that a full multiplexity of the interactions and posting activity is difficult to reach in short periods (day) but more evident in longer.

Membership distribution across social media.The way users distribute their membership across sites has been recently studied in Zafarani and Liu (2014) and has been the subject of a few market surveys. In the third section we confirm and extend these results by means of a more recent dataset, which consequently relies on the today’s most popular social websites. The usage of multiple social platforms, raising in 2008, is now strengthening and social media aggregators offer a chance to collect data about this phenomenon.

Popularity across social media.By performing a correlation analysis we investigate the maintenance of users’ popularity across social sites. The analysis led to not straightforward results and indicates that user’s popularity in a given social site barely corresponds or does not correspond at all to his/her popularity on another social platform. The lack of a strong degree correlation has been observed in different contexts such as online/offline interactions Gaito et al. (2012a), virtual worlds Szell et al. (2010) and other multiplex networks Nicosia and Latora (2015). Here we observe the same phenomenon in online social platforms.

Posting activity across social media.We aim at understanding whether an active and productive user on a social media preserves his/her aptitude on the other social media. Correlations between the posting rates on different social sites show that an easy answer is not possible, although we measure slightly positive correlations for many couples of social platforms. These results represent one of the main novel insights of this paper since little is known about users’ engagement on multiple social media.

(4)

medium to medium. The ability of handling some particular media better than oth-ers, the different interests and the usability of the services may be the reasons behind this behavior.

Burstiness in temporal patterns.We have analyzed the posting inter-event times both aggregated on all social sites, to get an overall picture of the activity dynamics of users, and a per site evaluation, to get temporal patterns specific to a social website. This way, we evaluate the dynamics of the online activity by a true multidimensional approach. We discovered that the posting activity on online social media is bursty and highly heterogeneous. In particular the bursty behavior of the aggregated sequence is the union of bursty posting event sequences on different social media. Moreover a period of high activity in the aggregated sequence does not imply that the user is highly productive in each social media along the same period.

Temporal multiplexity.We introduce thepost multiplexity index to measure the propensity of a user of being multidimensional. Results show that most of the users tend to prefer a single media per day while sometimes they adopt multiple social media for posting.

The above results, presented in the fifth section, represent the first steps towards a multidimensional approach in the study of human dynamics Szell and Thurner (2010) in communication and social networks. Some attempts to extend and model bursty dynamics can be found in Quadri et al. (2014) and Jo et al. (2013), however the lay-ers are few or do not really represent communication channels. In this paper, thanks to a large set of social media, we highlight the multidimensional bursty nature of human dynamics in multiple social sites. Specifically, we show that the overall bursti-ness is a consequence of a complex mixture of non-stationary interests to the chosen social media.

Patterns in username usage.We evaluate how users are coherent in the choice of their username across social sites. Results show that users often maintain the same username across different social websites, but are more likely to change them among websites whose norms and scopes are different.

Alternion datasets.The above results rely on two new collected datasets which capture multidimensional and longitudinal information about how online users behave across multiple social platforms. To the best of our knowledge, they represent the most updated available datasets which combine posts, their contents and the multiple profiles of a large set of people. The Alternion dataset allows us to quantify the multiple platforms usage without requiring a periodical survey; on the other, additional information about posts could be extracted to verify whether the same observations hold for users’ engagement. Furthermore, posts provide the temporal information necessary to the study of multidimensional human dynamics in online social networks.

(5)

Methodology

Nowadays people have at their disposal a wide selection of online social sites, each having its own peculiarity. This way, the adoption of multiple social platforms by the same person is becoming increasingly spread. Now people can exploit and combine their favorite social media according to their needs. For example, a tourist could share his/her position by Foursquare, meanwhile s/he uses Twitter to communicate his/her mood and shares on Instagram a selfie with the “Gioconda”.

At the same time, people have recently shown a growing interest in tools managing their online life in a centralized way, i.e. social media aggregators. Social media aggregators allow users to combine their own identities into a single profile by gathering their online activities from different social platforms, such as Twitter, YouTube, LinkedIn, Facebook, and many others. The aggregation is enabled by APIs provided by social networks, so that people may have control on the data the aggregation platform can access.

Most of these services offer a private space to the users, where they can share a content simultaneously on multiple media, but profiles with public contents and public API are not available. The media aggregator Alternion3stands out because, unlike most of simi-lar services, allows users to decide which information about their social sites - including profile information and contents - can be made public. Furthermore, the service retrieves data from more than 200 social sites and manages social relationships among Alternion profiles.

Data collection methodology

Since the service does not expose a public API, we developed a crawler to retrieve the Alternion profiles and their public updates. An example of an Alternion profile page is shown in Fig. 1. The page exposes the social sites associated to the Alternion identity. Each icon links to the relative profile and shows the username chosen in the target social site. The "Updates" tab reports the public posts grouped by social sites. The content of each post highly depends on the information released by APIs.

(6)

We collected registered users by exploiting a search function which returns a different set of 40 random users every two minutes. Data collection started on October 2014 and resulted in 19.680 distinct profiles from different countries and in different languages. From a user profile we extract the set of social sites a user associates to the profile and all the public contents s/he had published since the registration date. For each content we gather information about the social media used to post it, the content itself according to the format returned by the APIs and the publication date. Furthermore, not to over-load the service and to respect politeness, we limit the number of contents for each user to 10.000 at most. In practice, the dataset contains, for each individual, a list of her/is favorite social sites and a multidimensional time series of posts, where each dimension corresponds to a specific social site. All the analysis, but username patterns reported in the last section, will be performed on this datasetD1. In addition, each profile reports the degree of the user on the social networks that make this information available. How-ever, we are not able to get the node neighborhoods since Alternion does not return the neighbors’ usernames. Consequently, we are not able to build any network from the information provided by the social media aggregator. Since we may be able to build dif-ferent types of relation by exploiting implicit links such as mention, we model our dataset by a directed multigraphD = (V,E,D). Each user (identity) is an element ofV, while

EV×V ×D, withDthe set of social media in Alternion, is the set of directed multi-edges. Finally, we associate to each useruVa sequence of timestamped events(t,ped),

whereperepresents a post published on the social mediadD.

For the study of username patterns only, we collected a different dataset,D2, whose characteristics are specifically designed to this aim. We adopted a more targeted sampling approach by collecting 15.000 profiles with an English first name4to make the text analy-sis of usernames more affordable. This way we maintain the alphabet accordance between the username (typically ASCII character) and the information about the identity (first and last name are usually written in mother tongue). Finally, the datasetD2 contains for each individual a list of her/is usernames.

Dataset characteristics and representativity

The datasetD15contains the profile information of 19680 users, each reporting the plat-form Alternion among her/is social sites. Specifically the number of social platplat-forms amounts to 152, from the most famous Facebook, Twitter to the less common Discus or Zazzle. In Fig. 2 we report the number of users who share their profile information of a social site, while in Fig. 3 we report the number of users for the top ten social platforms. As expected, and ignoring Alternion, Facebook is the most used social site and the top 10 correspond to the most popular social media.

(7)

Fig. 2Social site adoption. The number of users for each social platform

We obtain a deeper understanding if we combine the above results with those presented in Fig. 4, where we report the number of posts grouped by social sites. Here we note that Facebook active users are more productive than Google+ users. In fact, although they are fewer, Facebook users publish more or less the same amount of posts as Google+’s. Finally, results in Fig. 4 confirm the predominant role of Twitter in the production of contents and posts, i.e. more than 6 million posts over an overall amount of more than 8.5 million posting events.

As regards the posting activities of the active users, in Fig. 5 we show the distribution of the number of published contents per user. The post sampling covers the time interval from 10 July 2005 to 10 November 2014. The distribution seems to obey to a heavy tail

(8)

Fig. 4Posting activity. The number of posts for each social site

with exponential cut-off. During the examined period, we observe that users have been quite productive: on average, users have published about 800 different contents and half of the users have produced more than 300 updates.

The above results highlight that Alternion effectively captures a representative sam-ple of today’s users of social media sites, in terms of favorite social sites and publishing media as compared with public available data1on social media usage. Indeed, the usage proportion across the social sites are almost equal to those collected by aggregating the official statistics released by the different social platforms. Despite data show an alignment between the Alternion users and the users of the different social media, the representativeness issue of the dataset still persists. For instance, users’ demography is unbalanced towards western and English-speaking countries, the contents posted on the Alternion profile depend on the API provided by the social media. Lastly, the adoption of social media aggregators is more common and useful to an audience active on social

(9)

media. Nevertheless, it represents the most recent dataset able to capture how users behave across multiple sites and an easy tool for studying the on-going phenomenon of the multiple adoption of social platforms.

Usage of multiple social sites

In a survey conducted in September 20141it has been highlighted that the adoption of multiple social platforms is on the rise. In fact, 52 % of online users now use at least two social sites representing a significant increase from 2013, when the figures were close to 42 %. On the one hand the Alternion dataset allows us to quantify this phenomenon without requiring a periodical survey; on the other, additional information about posts could be extracted to verify whether the same observations hold for users’ engagement. Indeed, being registered to a service does not always imply using it.

To answer Q1 -How common is the use of multiple social sites? Does the same hold for active users?- we analyze how users are distributed across sites. The main goal of a social media aggregator is to group different social sites into a single access point. This characteristic allows us to compute how many different social websites people are able to manage. How users distribute their membership across sites has been studied in Zafarani and Liu (2014) only, where authors showed that it follows a power-law distribution from a dataset gathered in 2008. Our goal is to extend that result, by confirming it on a more recent and larger dataset, which relies on the current social platforms. In Fig. 6 we report the probability distribution function (PDF) of the number of sites joined by a given per-son. On average a user is simultaneously registered on 5 social sites, while more than 95 % of users have joined 17 platforms at most. Alternion results are consistent with the afore-mentioned survey; indeed about 56 % of Alternion users use at least three different social platforms6.

(10)

The behavior of active users gives a more direct and clear evidence of the adoption of multiple sites. In fact the publication of contents on a specific social media confirms its adoption in practice and it is strictly related to the user’s engagement. This way, we com-pute the distribution of the number of social platforms being simultaneously used by each active user. As shown in Fig. 6, the observations about the multiple usage get stronger: 73 % of active users publish on at least two social networks. In general, the propensity to adopt multiple social sites is becoming increasingly ingrained amongst online users, especially if we consider the effective publishing on different platforms.

The above analysis on the people’s propensity to be active on multiple social media does not take into account the amount of contents published on each platform. To this aim, we analyze the productivity of the users who join more the one site simultaneously. Let

p1,p2,. . .,pnindicate the number of posts of a user on thensites where s/he is active and

¯

p=pi/nthe average number of posts per site. For each active user we computep¯and

group them by the number of multiple sites they have joined; then we calculate the mean and the standard deviation ofp¯for each group. In Fig. 7 we report the above quantities as a function of the number of sites. The figure indicates thati) the more sites a user joins, less posts for site on average s/he publishes, andii) users who post on few sites are more heterogeneous in their activity since the standard deviation decreases as the number of sites increases. We suppose that the first remark is a consequence of the limited human resources in terms of usable time and ability of handling multiple tasks at the same time Miritello et al. (2013).

Finally, in Fig. 8, we show the number of users who have joined different social sites by considering two classes: social media and social networks. The former denotes social sites where relationships are implicit, the latter indicates social sites where relationships are explicitly defined. These figures show that more than 90 % of users have registered to at most 5 sites and some of them share information on more than 10 social media. In general we observe that it is more likely to adopt social media than social networks.

Centrality and activity correlation across social sites

To answer Q2 -Is a person’s centrality uniform, i.e. more or less equally distributed across the social sites in use?- we treat the user degree on different social sites as an index of

(11)

Fig. 8Social media and networks. The PDF of the number of social sites joined by each user subdivided in social media and social networks

centrality in the network. This information, reported on the Alternion profile, is not avail-able for every site since just a few APIs allow to retrieve this information. So, we limit our analysis on the degree to Facebook, Twitter (in/out), LinkedIn and YouTube (in/out).

Q2 corresponds to verify if statistically significant correlations between the degrees of the same group of users in different sites exist. The presence of these correlations mea-sures whether or not popular users in one site maintain their centrality across media. If we denotekusas the degree of the nodeuin the sites, we can evaluate the degree of correlation between pairwise social sites by adopting different methods Boccaletti et al. (2014). First we compute, for each pair of social sites, the joint distributionP(ks1,ks2)to obtain a char-acterization of the relations between the degree sequences. For example, in Fig. 9, where we show the joint distribution for Facebook and LinkedIn, we observe a slightly positive correlation, especially in the left bottom part of the distribution (dark blue regions). In particular people with about 200 friends in Facebook and about 20/25 relationships in LinkedIn are more likely than others.

By analyzing the joint distributions only, it is difficult to compare the relations between a social site against the other ones. To this aim, we compute the average degree of a node in a social sites1conditioned on the degree in the sites2. The resulting computation for LinkedIn conditioned on Facebook has been shown in Fig. 10. In this case we observe an initial increase of the average degree in LinkedIn as a function of the Facebook degree up to about 500 friends. Then we observe a more unstable trend also due to a low number of data points. In the Fig. 11 we report a more pronounced lack of correlation involving the out-degree in Twitter and YouTube. Here we cannot establish an increasing relation between the out-degrees, so people how follow many users in Twitter would not follow the same amount of channels in YouTube. The specificity of the two social platforms, i.e. news broadcasting and video sharing, could be the cause of the weak correlation.

(12)

Fig. 9Facebook and LinkedIn. The joint probability distribution estimated by kernel density method

the degree and the in/out degree is dictated by the inability to extract mutual links (more similar to a friendship link in Facebook or LinkedIn) since we can only retrieve the counting of the followers/followees in Twitter and YouTube.

Finally, to get an overall picture of the pairwise degree correlations, we apply a rank cor-relation analysis on the different pairwise sequences. Rank corcor-relation analysis allows us to test if the ranking induced by the different degrees is similar or not. As a rank corre-lation method, we compute the Kendall’s rank correcorre-lation coefficientτb7on the ranking

(13)

Fig. 11 Twitter and YouTube. The conditional distribution of the out-degree in YouTube given the Twitter out-degree. For each 50 size bin, we show the degree distribution through the violin plot with its average (blue) and its median (green)

induced by the degrees. In Fig. 13 we visualize the rank correlation matrix, where each row (column) corresponds to a different social site. A strong positive correlation does not exist, rather the scenario is multifaceted. In most pairs, there is only a limited posi-tive correlation (0.1−0.23) between degree centralities. This means that users may have a very different centrality across the services, i.e. a single user might be an hub on one system and loose part of its hubbiness on the other. One reason may rely on the differ-ent goals of the services; whereas LinkedIn is business-oridiffer-ented or Twitter is an interest network, Facebook incorporates all the previous features. So, for instance, LinkedIn may only capture a part of the Facebook friends.

The above conjecture cannot be verified since we are not able to compute the inter-section between a node’s neighborhoods in different social networks (information not provided by Alternion). However we can quantify and analyze the difference between the neighborhoods in terms of their size. To this aim, given a useruand two sitess1ands2, we define thefriend deviationksu1s2as:

(14)

Fig. 13 Degree rank correlation. The correlation matrix among the degree sequences in the different social sites

ksu1s2=kus1−kus2 (1)

We compute the friend deviation between Facebook and Linkedin(kuFL), Facebook and Twitter(kuFT), Facebook and YouTube(kuFY), Twitter and LinkedIn(kTLu ) and Twit-ter and YouTube(kTY

u ) for the users who have joined them. We report the trends of

the friend deviations in Fig. 14 sorted in decreasing order. We observe thatkFTu ,kuFL, kFYu ,kuTLandkuTYare positive for 5360 users (out of 8527), 3313 users (out of 4361), 2944 users (out of 3177), 2518 users (out of 4148) and 2760 users (out of 3098), respec-tively. These results indicate that users in our dataset prefer to create friendships in Facebook rather than in LinkedIn, Twitter and YouTube. Moreover, users prefer Twitter rather than LinkedIn and YouTube to establish friendships. One remarkable result is that users preferring Twitter rather than Facebook have significantly more friends than those they have in Facebook.

In general maintaining the importance across social media is not a straightforward task and asks for a deeper understanding; for example it is not clear how user’s neighborhoods in different media overlap.

Finally we apply the above methodologyi) to investigate how often users publish con-tents in different social sites, andii) to asses if a form of correlation exists between the amount of posts and the number of friends. For pointi), we are verifying whether users who post a lot and often on a social platform, are equally active in other platforms (see

(15)

Q3). We measure the activity level of a user within a social media by means of the post-ing rate, measured in number of posts per week. By considerpost-ing the postpost-ing rate rather than the post count, we mitigate the effects given by the adoption of social media in dif-ferent periods. We report results about the analysis of the rank correlation matrix applied to pairwise posting rate sequences. In Fig. 15 we visualize the Kendall’s coefficientτbfor

the most used pairs of sites. Unlike the above discussion on centrality, there is a more evident positive correlation between the posting activity across social sites. The obtained values do not mean that users are equally active on both social media, however there is a positive tendency to be active in different social sites. In general, the maintenance of the posting activity across social media is not a straightforward task, like in the degree analysis.

By the second point we wonder if users with many friends in a OSN are more active and productive than people with fewer friends. To this aim, we consider only users whose degree and posting rate are available and combine these information. The analysis of the Kendall’s coefficientτbhighlights a medium positive correlation between the two

vari-ables for all the online social networks (τb∈[ 0.27, 0.4]) except YouTube (τb= 0.1). High

degree people tend to post and publish more than those with few friends.

A case study on 4 social media

Up to now the analysis has focused on the pairs of social media. In this section we present a particular case study that involves 524 users who have joined Google+, Pinterest, LinkedIn and Twitter. We select this subset of social media because they have the high-est number of users w.r.t. other subsets of 4 elements and their users are also very active. For instance in Twitter users published 1025 elements on average, followed by LinkedIn (139.32), Google+ (137.55) and Pinterest (134.71). In fact, we observe that less than 40 % of users in Google+, Pinterest and LinkedIn have more than 100 posts. While in Twitter almost 87 percent of the users produced more than 100 posts.

In the light of the results about the moderate correlation among the posting rates, we wonder if, in this group, people who actively post in a service necessarily produce many posts in the other services. To this aim we denote the top 5 % of users in each media as most active users. Only two users are the most active in all the services. 36 %, 30 % and 42 % of the most active users in Twitter are also the most active in Google+, Pinterest and

(16)

LinkedIn respectively. 23 % and 14 % of the most active users in Google+ are also the most active in Pinterest and LinkedIn respectively. The above observations support the pres-ence of users who are very active pairwise but, whereas the number of sites actively used increases, the number of posts and consequently the productivity reduces as observed in the previous section.

We quantify the diversity of the posting activity across social media by defining the posting deviationps1s2

u of a useruas:

psu1s2= |np s1

unpsu1| max(nps1

u,npsu1)

(2)

wherenps1

u(npsu2)denotes the number of posts of useruon the social medias1 (s2).psu1s2

ranges from 0 to 1: ifpdecreases towards 0,utends to publish the same number of posts both ins1 ands2. As shown in Fig. 16, where we report the posting deviationpbetween Google+ and the other social networks, this quantity decreases linearly. From the figure emerge that users’ behavior is more variable between Google+ and Twitter than the other social networks. Only 9 % of users show a post deviation less than 0.5. Most of the users prefer to post on a single service. So users who laboriously publish posts in one service do not publish with the same rate in the other ones.

Temporal patterns in the posting activity

It is a well established result that the human dynamics on communication media are bursty and scarcely uniformly distributed in time, even removing the effect of seasonal or circadian cycles Karsai et al. (2012); Quadri et al. (2014). So we investigate whether the burstiness characterizes how people post on online social media, too. Figure 17a dis-plays a typical posting activity of a user adopting different social media simultaneously. The overall activity presents periods where the user is almost inactive, interleaved with periods of high activity. We analyze the statistical properties of the inter-event timet, i.e. the period between two consecutive posts published by the same user. Many stud-ies have shown that the inter-event time distribution in different human processes Gaito et al. (2012b); Barabasi (2005); Karsai et al. (2012) (e.g. mobile phone call, email, direct physical contact, link formation in online social networks) is heavy-tailed. In Fig. 18 we report the distribution of the inter-event time resulting from the aggregation of the

(17)

Fig. 17 Posting activities on different media. Posting activity on different social media. Inathe raster plot of the overall activity of a person who uses many social media simultaneously. Inbthe user’s activity split into the six social media s/he is using

behavior of all the users. The distribution seems to obey to a heavy-tailed law and sup-ports the hypothesis that the posting activity on online social media is bursty and highly heterogeneous.

The opportunity of tracking multiple online activities simultaneously allows us to highlight the complexity of the users’ posting activity, which appears like a compound combination of the single activities on the different social media. For instance, the sequence of events in Fig. 17 results from the union of the activity of the user in six different social media. The example emphasizes two characteristics of the event sequences:i) singularly each event sequence keeps an high level of burstiness, i.e. the bursty behavior of the aggregated sequence is the union of bursty event sequences

(18)

and; ii) a period of high activity in the aggregated sequence does not imply that the user is highly productive in each social media along the same period. In the example (see Fig. 17) the user is initially active on BipTV only, then s/he begins to adopt the other social media. But the activity on Twitter ceases almost immediately and the other activities interchange like in Tumblr or Pinterest. The characteristici) can be captured by the distribution of the inter-event time on each social media reported in Fig. 19. Here we extract the distributions from the social media with the highest number of posts. From the analysis of the distributions we infer two main insights:a) most of the distributions seem to follow a heavy-tailed law andb) each social media is characterized by a distribution that is far from the others. For example the probability of having a short inter-event time is higher in Pinterest, Delicious or LastFM w.r.t the other social media.

To verify the observationii) we analyze the distribution of the inter-event time between two consecutive events that happen on different social media, a quantity strictly related to the propensity of switching among the media. The distribution shown in Fig. 19 is similar to the overall inter-event time distribution. The figure suggests that users alternate and mix their activities. The heavy-tailed distribution also indicates that the time to switch between different social media is likely to be short even if longer periods are possible. This finding suggests that during a single day a user may publish on multiple social media. To this aim we introduce an index, thepost multiplexity, which captures the number of days a user is active on multiple social networks. Formally, letp(u,si)=<p0,p1,. . .,pT >

denote the sequence of posts published by the useruon the social mediasi, wherepi

indicates the number of posts during the dayi. We define a new binary sequencep(u,si),

wherepi =δ(pi)8and the functiondm(u) = sip(u,si)which computes the number of social media used in each day. The post multiplexity corresponds to the ratio between the number of days s.t.dm(u) = 1 and the overall number of sampling days. The index measures the propensity of a user of being multidimensional. A value close to 1 indi-cates than an individual uses only a social network per day. The distribution of the post

(19)

multiplexity has been reported in Fig. 20. We consider different thresholds on the number of sampling days to make the results independent of the length of the sequence and reduce the effects of less active users. The distribution suggests completely multidi-mensional users are quite rare as well as unmultiplex users; however most of the users tend to prefer a single media per day while sometimes they adopt multiple social media for posting.

Patterns in username usage

Besides providing information about people’s behavior in using and posting across social sites, the collected datasets sum up properties concerning the choice of the username in different sites. The username represents the first information provided to the social site, the first way of being identified in the site and an element in the self-presentation. In fact many behavioral aspects flow into the choice of the username, not least the limited memory capacity. OnD2 we evaluate how users are coherent in the choice of the user-name by computing the edit distance and the complement of the Jaccard Index on the pairs(uji,uki)build from the setUiof the usernames associated to the individuali. The

former quantifies how dissimilar two strings are by computing the minimum number of operations to changeujiintouki. The allowed operation are the insertion, the deletion and the substitution of a character with another. The latter computes the Jaccard coefficient on the sets of characters ofuji intouki. In Figs. 21 and 22 we report the distribution of the above measures computed on each possible pair and in the inner figures the same quantities computed on randomly shuffled pairs. In both cases we obtain a peak at 0, indicating exact matches or the usage of the same character set, while the randomization shift the peak of the distributions towards values typical of dissimilar strings. However a portion of the pairs indicates that people may change the username across social sites. An example is shown in Fig. 23 where the change of the username between Pinterest and Google+ is more evident w.r.t. the average behavior reported in the previous figures. In fact we observe that the decision of changing the username depends on the pair of social sites.

(20)

Fig. 21 Distance distributions between usernames. The distribution of the complement of the Jaccard Index between pairs of usernames associated to the same person

Concluding remarks

To face the new challenge of giving an all-around picture of people’s online behavior, in this paper we perform an analysis on the same users across multiple social media. Our study relies on a new rich dataset gathered from the social media aggregator Alternion. It collects information about how users post their favorite contents, their centrality on different social media and the usernames they choose. The study of this dataset let emerge two main insights. On the one hand we confirmed that the on-going phenomenon of the adoption of multiple social site is spreading. Not just people sign

(21)

Fig. 23 Google+ and Pinterest. The distribution of the edit distance computed on username pairs from Google+ and Pinterest

up in many social media but they are active and exploit their services as well. On the other, the temporal information about how and when users post allowed us to investigate how people manage the opportunity of having different communication channels at their disposal. As far as we know this study represents the first attempt to deal with the people’s activities on multiple social media by using a large set of social platforms and users.

As regards the multiple adoption of social sites, the analysis of social media usage shows that Alternion data capture the typical trend in today’s users, despite the limitations dis-cussed on the dataset section. The usage of multiple platforms is gaining momentum. In fact, half of the users signed up in at least three different social sites. This result stresses the importance of a multiplex approach when conducting studies which rely on online social networks. In fact people are expressing their identity and their behaviors through multiple communication media. This observation is further strengthened by the results about active users. The fact that 73 % of active users publish on at least two social sites means that people choose the right online channel to communicate and convey their con-tents. Also in this case the multiplex approach is fundamental in the extraction of people’s interests and preferences. To this aim we plan to exploit these first results to study how users build their social identities across their social platforms Kooti et al. (2012). In partic-ular we will verify whether different social norms characterize the major social platforms and if users adapt their self-presentation to these norms, as the results about the choice of the username may suggest. Finally we will ask whether the identity of a single user expressed through the contents s/he publishes is persistent and coherent with the profile information.

(22)

For instance, someone better expresses and communicates through video, others through texts or blogs. That may result into different levels of engagement with fans or friends and, consequently, in different degree centralities. To this we asked whether a person’s pop-ularity is uniform across the social sites. The analysis led to not straightforward results and indicated that user’s popularity in a given social site barely corresponds or does not correspond at all to his/her popularity on another social platform. Nverethenless, a more evident positive correlation between the posting activities across social platforms exists. In the future we plan to analyze the reasons behind the weak correlations we observe. In particular we will verify whether the social norms of the online platforms and the services they provide impact on the centrality of the users.

The multidimensional and longitudinal nature of the dataset is fundamental in the understanding of the human dynamics in communication and online social networks since it offers the opportunity to study how people handle different media. The primary goal of our analysis was to characterize the time-series associated to the posting activity of the single users across multiple social platforms and unveil their statistical properties focusing on measures which describe their level of burstiness. The analysis of the bursti-ness moved in two directions: i) we classified the aggregated behavior of each single user; and ii) within each user we characterized the bursty behavior in each single social media. The results show that i) singularly each event sequence keeps an high level of burstiness, i.e. the bursty behavior of the aggregated interaction sequence is the union of bursty event sequences and; ii) a period of high activity in the aggregated sequence does not imply that the user is highly productive in each social media during the same period. These observa-tions represent the basis for a model of temporal patterns in multilayered social network which combines the idea of attention allocation and novel models which reproduce the human bursty dynamics Perra et al. (2012). We also plan to investigate the interaction sequences through frequent pattern analysis in order to highlight whether users are char-acterized by specific usage subsequences, i.e. they have a predefined scheduling in the usage of their preferred social media. The above analysis will be combined to the con-tent analysis of the posts to see if personal topics change along the observation period or users change their posting behavior in terms of covered subjects or whether some social platforms specialize on specific subjects.

Endnotes

1 http://www.pewinternet.org/2015/01/09/social-media-update-2014/

2 http://www.alternion.com

3 http://www.alternion.com

4 http://census.gov

5 The dataset is available by e-mail.

6 Alternion account is included by default, so counting starts from 1

7 τ

btakes into account ties.

8 δis the Heaviside function

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

(23)

Received: 14 February 2016 Accepted: 19 June 2016

References

Abel F, Herder E, Houben GJ, Henze N, Krause D (2013) Cross-system user modeling and personalization on the social web. User Model User-Adapted Interact 23(2–3):169–209

Barabasi AL (2005) The origin of bursts and heavy tails in human dynamics. Nature 435(7039):207–211

Barigozzi M, Fagiolo G, Mangioni G (2011) Identifying the community structure of the international-trade multi-network. Phys A: Stat Mech Appl 390(11):2051–2066

Bauch CT, Galvani AP (2013) Social factors in epidemiology. Science 342(6154):47–49

Benevenuto F, Rodrigues T, Cha M, Almeida V (2009) Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference. IMC ’09. ACM, New York Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D (2012) Multidimensional networks: foundations of

structural analysis. World Wide Web 16(5):1–27

Boccaletti S, Bianconi G, Criado R, del Genio CI, Gómez-Gardeñes J, Romance M, Sendiña-Nadal I, Wang Z, Zanin M (2014) The structure and dynamics of multilayer networks. Phys Rep 544(1):1–122

Bródka P, Kazienko P, Musiał K, Skibicki K (2012) Analysis of neighbourhoods in multi-layered dynamic social networks. Int J Comput Int Syst 5(3):582–596

Brummitt CD, D’Souza RM, Leicht E (2012) Suppressing cascades of load in interdependent networks. Proc Nat Acad Sci 109(12):680–689

Carmagnola F, Cena F (2009) User identification for cross-system personalisation. Inf Sci 179(1–2):16–32 D’AGostino G, Scala A (2014) Networks of Networks: The Last Frontier of Complexity. Springer, Berlin

Gaito S, Rossi GP, Zignani M (2012a) Facencounter: bridging the gap between offline and online social networks. In: Proceedings of the Eight International Conference on Signal Image Technology and Internet Based System (SITIS). Complex Networks ’12. IEEE, New York

Gaito S, Zignani M, Rossi GP, Sala A, Zhao X, Zheng H, Zhao BY (2012b) On the bursty evolution of online social networks. In: Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research. HotSocial ’12. ACM, New York

Goga O, Loiseau P, Sommer R, Teixeira R, Gummadi KP (2015) On the reliability of profile matching across large online social networks. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’15. ACM, New York

Jo HH, Pan RK, Perotti JI, Kaski K (2013) Contextual analysis framework for bursty dynamics. Phys Rev E 87(6):062131 Karsai M, Kaski K, Barabási AL, Kertész J (2012) Universal features of correlated bursty behaviour. Sci Rep 2(397) Kivelä M., Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA (2014) Multilayer networks. J Complex Netw

2(3):203–271

Kooti F, Mason WA, Gummadi KP, Cha M (2012) Predicting emerging social conventions in online social networks. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. CIKM ’12. ACM, New York, NY, USA. pp 445–454

Li H, Fang W, An H, Yan L (2014) The shareholding similarity of the shareholders of the worldwide listed energy companies based on a two-mode primitive network and a one-mode derivative holding-based network. Phys A: Stat Mech Appl 415:525–532

Li H, An H, Wang Y, Huang J, Gao X (2016a) Evolutionary features of academic articles co-keyword network and keywords co-occurrence network: Based on two-mode affiliation network. Phys A: Stat Mech Appl 450:657–669

Li H, Fang W, An H, Gao X, Yan L (2016b) Holding-based network of nations based on listed energy companies: An empirical study on two-mode affiliation network of two sets of actors. Phys A: Stat Mech Appl 449:224–232 Magnani M, Rossi L (2011) The ml-model for multi-layer social networks. In: Proceedings of the International Conference

on Advances in Social Networks Analysis and Mining. ASONAM ’11. IEEE, New York

Magnani, M, Rossi L (2013) Formation of multiple networks. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction. SBP’13. Springer, Berlin. pp 257–264

Meo P. d., Ferrara E, Abel F, Aroyo L, Houben GJ (2013) Analyzing user behavior across social sharing environments. ACM Trans Intell Syst Technol (TIST) 5(1):14

Miritello G, Moro E, Lara R, Martínez-López R, Belchamber J, Roberts SG, Dunbar RI (2013) Time as a limited resource: Communication strategy in mobile phone networks. Soc Netw 35(1):89–95

Nicosia V, Latora V (2015) Measuring and modeling correlations in multiplex networks. Phys Rev E 92(3):032805 Perra N, Gonçalves B, Pastor-Satorras R, Vespignani A (2012) Activity driven modeling of time varying networks. Sci Rep

2(469)

Quadri C, Zignani M, Capra L, Gaito S, Rossi GP (2014) Multidimensional human dynamics in mobile phone communications. PLoS ONE 9(7):103183

Ruths D, Pfeffer J, et al (2014) Social media for large studies of behavior. Science 346(6213):1063–1064

Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synth Lect Data Mining Knowl Discov 3(2):1–159

Sun Y, Han J, Zhao P, Yin Z, Cheng H, Wu T (2009) Rankclus: Integrating clustering with ranking for heterogeneous information network analysis. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. EDBT ’09. ACM, New York, NY, USA. pp 565–576

Sun Y, Barber R, Gupta M, Aggarwal CC, Han J (2011) Co-author relationship prediction in heterogeneous bibliographic networks. In: Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining. ASONAM ’11. IEEE, New York. pp 121–128

Szell M, Thurner S (2010) Measuring social dynamics in a massive multiplayer online game. Soc Netw 32(4):313–329 Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc

(24)

Vosecky J, Hong D, Shen VY (2009) User identification across multiple social networks. In: Proceedings of First International Conference on Networked Digital Technologies. NDT ’09. IEEE, USA

Zafarani R, Liu H (2013) Connecting users across social media sites: A behavioral-modeling approach. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’13. ACM, New York Zafarani, R, Liu H (2014) Users multiple sites. In: Proceedings of the Eights International Conference on Weblogs and

Social Media. ICWSM’14. AAAI, Palo Alto

Zignani M, Quadri C, Gaito S, Rossi GP (2015) Calling, texting, and moving: multidimensional interactions of mobile phone users. Comput Soc Netw 2(1):1–24

Submit your manuscript to a

journal and benefi t from:

7Convenient online submission 7Rigorous peer review

7Immediate publication on acceptance 7Open access: articles freely available online 7High visibility within the fi eld

7Retaining the copyright to your article

Figure

Fig. 1 Example of a profile page in Alternion. Thepop-menu shows the number of relationships in each social network
Fig. 3 Social site engagement. The number of users/ active users for the top ten social sites
Fig. 5 Posting activity. The complementary cumulative distribution function (CCDF) of the number of postspublished by each user
Fig. 6 Social sites per user. The PDF of the number of social sites joined by each user (blue)/active user (green)
+7

References

Related documents

We also know that larger sample size lead to increased precision when estimating a population therefore for our survey, we will decided to target more than 5000 auto

While considerable research has looked at how people use the Internet for sharing and engaging with various types of content from celebrity news to politics, very little of this

“To consider how best Scotland’s renewable energy potential might be realised having regard to the 2020 targets set by the UK Government and the Scottish Executive and to advise on

Patients and methods: Differences in Psoriasis Activity and Severity Index 50, 75, and 90; Nail Psoriasis Severity Index; and Dermatology Life Quality Index at 16 weeks were

Beth Summers: We… The water heater's already installed and the cash has been used upfront and then we get cash like it's financed and we can do whatever we want with that

In our research, the experiments are carried out in two stages. In the first stage, we attempt to show the impact of optimization by observing the output trend after the wave

This point is connected to the 100Ω centre of the stacking bars of the other Yagi pair by 100Ω shielded transmission line in the form of the inners of two 50Ω coax lines of any

Includes the Gorilla Grip™ stainless steel boom clamps, the stainless steel bolts, nuts and washers that attach them to the DXE-BEB-1 element bracket.. Also included are the