5.1.1 What is Twitter?

Twitter is a social network and microblogging service first introduced in 2006 [33]. Users can post short messages of up to 140 characters (known as tweets) to the network (examples are given in Figure 5.2). These messages can contain URLs, images, and videos. All tweets are public to the entire social network and can be found by searching for keywords. In particular, words preceded by the hash symbol, # (known as hashtags), can be used to tag content as relevant to a topic. Twitter allows users to search for all tweets containing a particular tag and provides a list of the most commonly used hashtags.

Figure 5.2: Examples of tweets. a) A singleton tweet which uses hashtags. b) A message from one user to another which shares a website link. c) A retweet of a previously seen tweet. d) A singleton tweet which shares a photo.

Users can choose to follow any other users, up to a maximum of 5,000 at any given time. Following subscribes them to any tweets produced by the users they follow. This relationship is not symmetric, i.e. a user can follow another user without being followed in return. Because of this, there can exist users who follow very few other users but are themselves followed by many. These are potentially influential users in the network, as their messages can reach a relatively large fraction of other users. Furthermore, other users of the network can be mentioned by using the syntax @username in a tweet. Any number of users can be mentioned (within the 140-character limit). This allows users to direct messages towards other users even if they are not followed by them.
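The asymmetry of the follow relation described above can be illustrated by treating follows as directed edges and comparing, for each user, the number of accounts followed with the number of followers. The following is a minimal sketch only: the edge list, the function names, and the `min_ratio` threshold are hypothetical choices made for illustration, and are not taken from Twitter or from any measure of influence defined in this work.

```python
from collections import defaultdict

# Hypothetical follow edges as (follower, followed) pairs; not real data.
follows = [
    ("alice", "news_desk"),
    ("bob", "news_desk"),
    ("carol", "news_desk"),
    ("news_desk", "alice"),
    ("bob", "alice"),
]


def degree_counts(edges):
    """Return {user: (n_following, n_followers)} for a directed follow list."""
    following = defaultdict(set)
    followers = defaultdict(set)
    for follower, followed in edges:
        following[follower].add(followed)
        followers[followed].add(follower)
    users = set(following) | set(followers)
    return {u: (len(following[u]), len(followers[u])) for u in users}


def potentially_influential(edges, min_ratio=3.0):
    """Users with at least min_ratio times as many followers as followees.

    The ratio threshold is an illustrative assumption, not an established
    definition of influence.
    """
    counts = degree_counts(edges)
    return sorted(
        user
        for user, (n_following, n_followers) in counts.items()
        if n_followers >= min_ratio * max(n_following, 1)
    )


print(potentially_influential(follows))
```

In this toy example only one account is followed by three users while following a single user itself, so it is the only one flagged. A raw follower ratio is a crude proxy and says nothing about whether followers actually read or act on a tweet.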

Tweets can be categorised into four different types:

Singleton: A tweet containing no mentions, i.e. no use of @username.

Message: A tweet containing mentions of one or more other users which is created independently of any other tweet.

Reply: A tweet containing one or more mentions, posted as a direct response to another tweet. The fact that it is a response is not encoded in the tweet text but is available as metadata.

Retweet: A direct copy of another tweet, consisting of the term ‘RT @username’ followed by the original tweet text. This is used as a means of extending a message to a new set of users while also being seen as an endorsement of the original tweet.

These distinctions are important for understanding the behaviour of users and the construction of temporal networks from Twitter data, as will be discussed in Section 5.1.2.
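The four categories above can be sketched as a simple rule-based classifier. This is an illustrative sketch rather than the procedure used in this work: the function name, the `in_reply_to` argument (standing in for whatever reply metadata accompanies a tweet), and the regular expressions are all assumptions. In particular, the mention pattern is a simplification and does not enforce Twitter's actual username rules.

```python
import re

# Simplified patterns (assumptions): a mention is '@' followed by word
# characters and not preceded by a word character (to skip e-mail addresses);
# a retweet starts with 'RT @username'.
MENTION = re.compile(r"(?<!\w)@\w+")
RETWEET = re.compile(r"^RT @\w+")


def classify_tweet(text, in_reply_to=None):
    """Classify a tweet as 'retweet', 'singleton', 'reply', or 'message'.

    in_reply_to is a hypothetical metadata field: the identifier of the
    tweet being responded to, or None if the tweet is not a response.
    """
    # Check retweets first, since 'RT @username' itself contains a mention.
    if RETWEET.match(text):
        return "retweet"
    # No mentions at all: a singleton, following the definition above.
    if not MENTION.search(text):
        return "singleton"
    # Mentions plus reply metadata: a reply; otherwise an independent message.
    if in_reply_to is not None:
        return "reply"
    return "message"


examples = [
    ("Loving the #sunshine today", None),
    ("@bob have you seen http://example.com", None),
    ("@bob yes, thanks!", 123),
    ("RT @alice: Loving the #sunshine today", None),
]
for text, reply_id in examples:
    print(classify_tweet(text, reply_id), "|", text)
```

The order of the checks matters: testing for the retweet prefix before testing for mentions prevents retweets from being counted as messages. The reply test relies entirely on metadata, consistent with the observation that replies cannot be identified from the tweet text alone.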

Statistics

Over the course of its lifetime Twitter has become one of the leading online social networks, with over 317 million monthly active users [129] (Fig. 5.3). By comparison, the largest social network, Facebook, has 1.87 billion monthly active users [23]. There is a high level of activity on the network, with an estimated 500 million tweets per day [129]. This makes the platform both interesting to study and difficult to manage computationally. The platform is globally relevant, with 79% of all activity outside the US [24], although a notable omission is China, where access is restricted. This means that users are able to spread their messages widely across the globe.

Figure 5.3: Growth of monthly active users from 2010 to 2016. Over this period the social network has seen strong linear growth, although growth has slowed since 2014. Monthly active users are defined as persons who have logged onto the social network at least once in a month (they do not need to post).

This is reflective of a shift in the way we interact with the internet, a move from desktop computers to mobile devices such as phones and tablets. This has occurred not only with social media but across a number of sectors, from banking to shopping. The mobility of internet access has introduced a number of phenomena, including interaction with social media during live events occurring around the world. This ranges from tweeting about live events while on location (which has been the draw for Twitter as a news medium) to tweeting while watching a concert, sporting event, or the latest television show. In 2015, approximately 87% of people reported using a second digital device while watching television [150]. Because of this, the social media ‘buzz’ generated online before, during, and after a television show has become an important tool for assessing user engagement. This phenomenon is very apparent in the millennial generation (roughly those born between 1983 and 2001): 71% say tweeting during an event makes it more fun, 70% enjoy reading tweets while watching an event on TV, and 69% will use a hashtag to follow all the tweets related to an event [151]. Given the appetite for discussion of events on social media, there are many opportunities for advertisers to generate content with the ‘second screen’ in mind through event-specific content and discussion.

Despite widespread use across the globe, Twitter has struggled to monetise its platform, making a loss of $521 million in 2015 [152]. Approximately 90% of Twitter's revenue comes from advertising ($1.99 billion of $2.22 billion in 2015). For this advertising to be successful, it requires a large and targetable user base. In its annual report [152], Twitter lists a number of risk factors to the platform that need to be addressed:

The platform must remain relevant: The platform needs to attract usage from celebrities, organisations, and subsequently users to increase the number of active users, all the while adapting to new trends and the movements of competitors.

An estimated 5% of active users are bots: As Twitter allows a degree of automation through its platform, an industry of ‘fake’ accounts has appeared. These fake accounts are used to artificially boost follower numbers, generate higher activity levels for particular topics, and push an agenda or content piece. This is problematic for Twitter (and researchers) as it can be difficult to differentiate between human accounts and robotic (bot) accounts. If the latter come to dominate the social network then this will drive away users and advertisers.

Advertisers must be able to optimise campaigns: While Twitter has a large user base, not all content is suitable for everyone. Twitter needs to be able to better understand its users in order to give opportunities to companies to offer tailored advertisements. This includes targeting users, using particular advertising styles and content, and timing advertisements so that they have the greatest effect and largest potential audience.

The first risk depends on the executive choices and innovations that Twitter and competing platforms make, as well as on continued usage by high-profile organisations and people. The latter two risks pose two interesting questions that can be addressed using the framework of the TEG:

Can we systematically detect automated behaviour?

How can we understand user behaviour better and monitor it over time?