What bot software is used? - Deep Diving into Bot Behaviours

6.4 Deep Diving into Bot Behaviours

6.4.1 What bot software is used?

I begin by inspecting the bot software used by each account. This is trivial as tweets are accompanied by “source endpoints” which describe the endpoint that created the tweet. Whereas, nearly all (more than 339k tweets, 78.09%) human accounts rely on the official Twitter client (either web or mobile), I observe significant diversity amongst the bot-operated accounts.

To study these, Table 6.3 presents a summary of the di↵erent source types I observe, and Figures 6.3 shows the distribution of source type across clusters. It is worth noting that, even though I exclusively include bot accounts, almost 320k tweets (53.83%) from tools involve human usage and intervention (S1 and S2),

whereas almost 274k tweets (46.17%) are tweeted using automated tools (S3–S7).

This confirms that many bots are not exclusively automated and, instead, consist of significant human intervention.

In fact, this is further enforced by the human population in the dataset (re- call that I detected 11,379 humans as part of Stweeler bot detection cam- paign). From the accounts that are detected as humans, approximately 343k tweets (78.90%) of all tweets are generated by tools involving human usage and interventionvs. almost 92k tweets (21.10%) by automated tools. This goes a long way in explaining the usual challenges with bot detection – most bots are not exclusively software-based, and most humans are not exclusively using manually operated apps, despite distinctive trends. Inspection of these accounts there-

Table 6.3: Types of most prevalent Twitter activity sources for bot clusters.

Source type Tool/App Usage description # Tweets

S1: Browser or Web client Twitter Web Client Human intervention. 98,991

S2: Mobile device apps Twitter for iPhone, Twitter for An- droid, Mobile Web, Facebook, Drudge

Human intervention. 220,176

S3 Social media management apps

TweetDeck Social media dashboard management and primitive scheduling.

60,158

S4: Social media integration, scheduling and automation Bu↵er, Hootsuite, SocialOomph, Echobox Social, Postcron, dlvr.it, twittbot.net

Social media integration (Twitter, Facebook,etc) and advanced tweet scheduling and automation.

115,663

S5: Social media optimisa- tion and intelligent tweeting

SocialFlow Optimise the delivery of messages on Twitter using the commercial Twitter Firehose API and pro- prietary link proxy (accumulating click data) for large brands and publishers.

34,418

S6: Social media marketing, brand promotion and customer experience management and analytics for enterprises and businesses

Sprinklr, Spred- fast, Sprout Social

Social media marketing, advertising, content management, commu- nity management, collaboration, advocacy, monitoring and analytics tools for large brands and agencies.

24,834

S7: Content web services SnappyTV.com, IFTTT, Vine

Applets, video editing (e.g.creat- ing highlights), video sharing.

38,640

fore reveals a mix of types. Most prominently, I notice that many celebrities (e.g. 0220nicole, hughhewitt, sa↵rontaylor) and organisations (e.g. airandspace, TEDTalks, Xbox) with Twitter-facing communications rely on both humans and software to handle significant tweet activity.

As well as revealing human involvement in bot activity, Table 6.3 also presents a number of sources that are automated: S3–S7 are all software-based. These in-

clude social media integration management and primitive scheduling services (S3)

as well as more advanced tweet scheduling and automation services (S4). In fact,

together S3 and S4 form the second largest endpoints for generating tweet ac-

tivity with almost 176k tweets (29.65%) produced. Beyond these basic tools, I also observe a range of sophisticated and targeted bot platforms. For example, I observe pattern mining bots6 _{that learn optimal ways to obtain visibility (}_S

5),

and marketing, monitoring and analytics bots for large brand and enterprises (S6). The platform provides advertising and marketing products, and monitor-

ing through dashboard services. It is important to note that they account for

6_{These are based on collecting data from Twitter’s commercial Firehose API and accumu-}

0 1 2 3 4 5 6 7 Clusters 0 1 2 3 4 Tweets 104 (a) S1 clusters. 0 1 2 3 4 5 6 7 Clusters 0 2 4 6 8 10 Tweets 104 (b)S2 clusters. 0 1 2 3 4 5 6 7 Clusters 0 0.5 1 1.5 2 2.5 Tweets 104 (c)S3 clusters. 0 1 2 3 4 5 6 7 Clusters 0 2 4 6 8 Tweets 104 (d)S4 clusters. 0 1 2 3 4 5 6 7 Clusters 0 0.5 1 1.5 2 Tweets 104 (e) S5 clusters. 0 1 2 3 4 5 6 7 Clusters 0 5000 10000 15000 Tweets (f) S6 clusters. 0 1 2 3 4 5 6 7 Clusters 0 0.5 1 1.5 2 Tweets 104 (g) S7 clusters.

Figure 6.3: Types of most prevalent Twitter activity sources for bot clusters.

less than 60k tweets (10% of my dataset) but are highly optimised: SocialFlow, Sprinklr, Spredfast and Sprout Social (S6) are specifically designed to optimise

tweet activity for large brands (Xbox), agencies (CNN, TIME) and even popular individuals (alexburnsNYT) for maximum visibility and screen time. For example, Xbox retweeted a tweet7 _{(originally posted on Friday evening at 2200 hours)}

every few hours on Saturday to get maximum participants.

The final category of activity source endpoints includes content web services that are purposed to create applets, video editing and sharing on-the-fly by con-

7_{The original tweet can be found here (last accessed 16 June 2018) –} _{https://twitter.}

tent creators (S7). These services involve a combination of humans (e.g. to cre-

ate highlights of sports events or bulletin news via SnappyTV.com or Vine) and rapid content sharing (e.g. through content management and replication such as IFTTT conditional applets). While only around 39k (6.52%) tweets are produced by these services in the dataset, it shows the rapid ability of an information social network to distribute content.

I next inspect how the di↵erent clusters exploit each software platform. Fig- ure 6.4 presents the fraction of tweets generated by each source endpoint across the eight clusters. Di↵erences can immediately be seen across the choices made within each cluster. For example, it shows that 100% of tweets injected by Drudge8 _{were from accounts in Cluster 1, such as news reporters tweeting for}

AFP, AJENews, AlArabiya EGY, AlArabiya, bbcbrasil, FoxSports br, etc; rep- resentatives from ELLEfashion; sta↵ from DunkinDonuts, HarvardHealth, etc; individuals BobVila, jimcramer, etc; and the app itself DRUDGE REPORT. It is also noticeable that clusters 0 (Young producers), 1 (Young assistants) and 2 (Assistants) use most of the available activity sources, that range from human usage and intervention (left hand side) to completely automated services (right hand side). While clusters 3 (Popular content producers) and 4 (Popular content redirectors) show considerable human usage vs. automation, clusters 5 (Stellar active engagers), 6 (Stellar passive engagers) and 7 (Social chameleons) show much higher automation and scheduling vs. negligible human usage. This is understandable since content popularity is directly proportional to content nov- elty and popular trends, that in turn engages human interest. Most bots lack these properties and thus earn much lower popularity levels than human-created content, as noticed previously in Chapter 4.

In document Understanding the behaviour and influence of automated social agents (Page 106-109)