3.5 User Engagement Graph
3.5.1 Graph Builder
We create the engagement graph from interactions between users, modeling the way users interact with a video or a channel. This allows us to detect orchestrated actions by sets of users that have a very low likelihood of happening spontaneously or organically.
In practice, the YouTube Comment engagement graph is built from the anonymized aggregate YouTube user activity logs over the past 30-day window, and is updated daily using a MapReduce implementation. Here we take the snapshot of the graph created on August 3rd, 2015. The Comment engagement graph consists of hundreds of thousands of nodes and tens of millions of edges. The detailed statistics of the engagement graph in use are not discussed here for privacy reasons. Note that the engagement graph we created here constitutes a subgraph of the entire YouTube engagement graph, since we only captured entities that had activities within the scope of a month.
² We set dmax to be 500 by default. This is because the degree of most known spammer nodes
In the engagement graph, nodes represent users and edges represent common videos or channels in which the users engage. Users that have interacted with a common video share an edge and are consequently joined in the graph. Edge weights are by default computed from the number of common engagement activities between two nodes. For example, in the case of users commenting on YouTube videos, two users have an edge weight between them equal to the number of common videos they have commented on.
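The default edge weighting described above can be sketched as follows. This is a minimal single-machine illustration, not the MapReduce graph builder itself; the function name `build_engagement_graph` and the `(user_id, video_id)` log format are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_engagement_graph(comment_log):
    """Return {(user_a, user_b): weight}, where weight is the number of
    videos both users commented on. `comment_log` is an iterable of
    (user_id, video_id) pairs (a hypothetical log format)."""
    # Group commenters by video; a set ignores repeated comments by the
    # same user on the same video.
    users_by_video = defaultdict(set)
    for user, video in comment_log:
        users_by_video[video].add(user)

    # Every pair of users sharing a video gains one unit of edge weight.
    weights = defaultdict(int)
    for users in users_by_video.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)


log = [("u1", "v1"), ("u2", "v1"), ("u1", "v2"),
       ("u2", "v2"), ("u3", "v2"), ("u1", "v1")]
graph = build_engagement_graph(log)
```

Note that enumerating all pairs per video is quadratic in the number of commenters on that video, so a production builder would likely need to cap or sample very popular videos; the source does not say how this is handled.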
Adding weight penalty
The way we built the YouTube Comments engagement graph is essentially the same as above, except for the subtle difference that a node can be one of two types of entities: a user or a Google+ Page. It is worth noting here that YouTube Comments can be made through the Google+ social platform, without having to log into the YouTube sites. This feature is powered by YouTube’s Google+ comment integration system, introduced in November 2013. Each PlusPage behaves like a unique user ID and can be used to write comments across platforms including YouTube.
[Figure 3.1 diagram labels: users U1, U2; Google+ pages A, B, C, D, E; videos 1 to 8; resulting weighted graph over pages A to E]
Figure 3.1: Example of constructing Google+ pages engaged graph. It shows a group of two users using their PlusPages to spam video #1.
step when constructing the graph. This modification penalizes PlusPages created by the same user in the following way:
$$\tilde{w}_{p_i,p_j} = \mathbb{1}\big(u(p_i) = u(p_j)\big) \cdot |P(u(p_i))| + w_{p_i,p_j}, \tag{3.6}$$
where $\mathbb{1}(\cdot)$ is the indicator function, $u(\cdot)$ gives the owner of a PlusPage, and $P(\cdot)$ gives the set of PlusPages a user has created. We use $w_{p_i,p_j}$ to denote the original edge weight between PlusPages $p_i$ and $p_j$, calculated as the number of common videos both $p_i$ and $p_j$ commented on. $\tilde{w}_{p_i,p_j}$ is the updated edge weight, and is equal to $w_{p_i,p_j}$ when $p_i$ and $p_j$ have different owners. In the case where $p_i$ and $p_j$ are created by the same user, we add extra weight equal to the total number of PlusPages the user has created. The rationale is that owning a larger number of PlusPages is a stronger signal of being potentially abusive.
Figure 3.1 gives an example of constructing the Google+ pages engagement graph. It shows that the edge weights of (A, B), (A, C) and (B, C) are all increased by 3, which is the total number of PlusPages that user U1 has created. The clique structure formed by nodes A, B and C becomes more noticeable after applying the penalty.
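The penalty of Eq. (3.6) can be illustrated with a short sketch. The function name `apply_owner_penalty`, the ownership map, and the toy weights below are hypothetical and are not taken from Figure 3.1; the sketch also assumes the penalty is applied only to pairs that already share an edge, which the source does not state explicitly.

```python
from collections import Counter


def apply_owner_penalty(weights, owner_of):
    """Apply the same-owner penalty to existing edges.

    weights:  {(page_i, page_j): original weight w}
    owner_of: {page: owning user}, i.e. u(.) in the text
    """
    # |P(u)|: number of PlusPages created by each user.
    pages_per_owner = Counter(owner_of.values())

    penalized = {}
    for (p_i, p_j), w in weights.items():
        same_owner = owner_of[p_i] == owner_of[p_j]
        # The indicator term: add |P(u(p_i))| only when owners match.
        penalty = pages_per_owner[owner_of[p_i]] if same_owner else 0
        penalized[(p_i, p_j)] = w + penalty
    return penalized


# Hypothetical ownership: U1 created three pages, U2 created two.
owner_of = {"A": "U1", "B": "U1", "C": "U1", "D": "U2", "E": "U2"}
weights = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1,
           ("A", "D"): 1, ("D", "E"): 1}
penalized = apply_owner_penalty(weights, owner_of)
```

In this toy input the three edges among A, B and C each gain 3, the edge (D, E) gains 2, and the cross-owner edge (A, D) is unchanged.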
3.5.2 Spammer Seeds
In the context of anomaly detection, when we find suspicious users, we often want to quickly find additional users with similar patterns of behavior that should be disabled as well. LEAS uses as seeds those users that have been identified as abusive by YouTube’s other security mechanisms.
In practice, spammer seeds are also updated on a daily basis together with the engagement graph. Since the number of available seeds can be limited, LEAS can greatly expand the coverage of daily fake engagement take-down volume.
Degree distribution
We started probing the difference in behavior between the spammer nodes and the general population by examining the node degree distribution. A salient observation from Figure 3.2 is that the degree distribution of seeds (depicted in magenta) has a different tail from that of the general population (depicted in blue). The difference can be seen across all engagement-level activities, and is most evident in the Comments graph.
[Figure 3.2 plot: fraction of nodes versus degree, log-log axes]
Figure 3.2: Comparison of node degree distribution between spammers and the general population in YouTube Comment engagement graph. The degree distribution of seeds is depicted in magenta, whereas the distribution of general population is depicted in blue. The number of seeds used for plotting is 2k. To plot the general population distribution, we first randomly sampled 10k nodes from the engagement graph. We further excluded those known abusive nodes from the sampled population, which left us with 9,957 nodes. Note that the sampled population may contain unknown malicious nodes.
in relatively modest scope and scale. For example, we looked into several existing online vendor sites that claim to sell YouTube fake engagement. Through investigation we found that YouTube Comments are usually sold in packages ranging in size from 15 to several hundred, which matches the seed degree distribution in Figure 3.2. For example, we find that spammer nodes rarely have degree greater than 781 in the Comments graph.
3.6 A MapReduce Implementation
Our local spectral diffusion method adapts straightforwardly to the MapReduce framework. In this section, we introduce practical details and potential caveats in applying the method at scale. The implementation is provably scalable to massive datasets and trivially parallelizable, with the capability of searching for many clusters simultaneously. Furthermore, our pipeline has the same performance guarantee as the serial version, since each diffusion procedure is performed locally on the graph.
Data Server. The engagement graph is served using SSTableService, a distributed in-memory key-value serving system within Google. Each data server holds a partition containing 1/P of the total amount of data, where P denotes the number of shards (partitions) of the data. SSTableService serves graph queries much faster than on-disk queries. The SSTableService is shared across mappers when running the job.
Data Format. We use Protocol Buffers³ for defining the I/O data streams in our implementation. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. The graph protocol stores the weighted adjacency list keyed by each node; the seed protocol contains the IDs of the spammer seeds; and the accomplice protocol defines the output of detected accomplice clusters, consisting of suspicious nodes with patterns of behavior similar to the seed. Additionally, we define a config protocol for conveniently encapsulating and passing configuration parameters to each mapper when initializing the jobs. Some tunable parameters in our pipeline include, for example, the dimensionality of the local spectral subspace l, the number of short random walk steps k, the minimum cluster size n, the maximum size of the sampled subgraph N, the degree threshold dmax for sampling the subgraph, and the edge weight threshold m.
[Figure 3.3 diagram labels: YouTube engagement log; Graph builder; SSTables; Spammer seeds; Seed expansion mappers; Accomplices; SSTableService; Engagement graph]
Figure 3.3: MapReduce implementation of the YouTube fake engagement detection pipeline.
Algorithm 2: MAPREDUCELEAS
Globals: graph G = (A, E, W), configuration parameters
INITIALIZEREPLICA()
for s ∈ S do
    if deg(s) ≤ dmax then
        Sample subgraph Gs
        Vk,l = LOCALSPECTRAL(Gs, s)    ▷ compute local spectral subspace
        Solve the optimization objective y in Section 3.3
        C′ = SWEEPCUT(y)
        emit ⟨s, accomplice C′⟩
    end if
end for
The core of the MapReduce LEAS algorithm is shown in Algorithm 2. The INITIALIZEREPLICA module passes the parameters defined by the configuration protocol to all the mappers. Each mapper job then processes one seed at a time, independently of the others. The entire fake engagement detection pipeline is illustrated in Figure 3.3, and encompasses the two main components: the graph builder and the seed expander. The graph builder is also implemented using the MapReduce framework; its details are omitted here due to space limits.
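The control flow of the map phase can be sketched as below. This is only an illustration of the per-seed independence and the degree filter: the `expand` callable stands in for the subgraph sampling, local spectral, optimization and sweep-cut steps, none of which are implemented here, and the names `seed_expansion_map` and `neighbors_only` as well as the use of unweighted degree are assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

# Weighted adjacency list keyed by node, as in the graph protocol.
Graph = Dict[str, Dict[str, float]]


def seed_expansion_map(
    seeds: Iterable[str],
    graph: Graph,
    d_max: int,
    expand: Callable[[Graph, str], Set[str]],
) -> Iterator[Tuple[str, Set[str]]]:
    """Emit (seed, accomplice cluster) pairs, skipping high-degree seeds."""
    for s in seeds:
        # Assumption: deg(s) is the number of neighbors (unweighted degree).
        if len(graph.get(s, {})) > d_max:
            continue
        # Each seed is expanded independently of every other seed.
        yield s, expand(graph, s)


def neighbors_only(graph: Graph, seed: str) -> Set[str]:
    """Placeholder expansion: returns the direct neighbors of the seed."""
    return set(graph[seed])


toy_graph: Graph = {
    "s1": {"a": 2.0, "b": 1.0},
    "s2": {"a": 1.0, "b": 1.0, "c": 1.0},
}
emitted = dict(seed_expansion_map(["s1", "s2"], toy_graph, 2, neighbors_only))
```

Because no state is shared between iterations, the loop body can be distributed across mappers without coordination, which is the property the text relies on.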
3.7 Experimental Analysis
3.7.1 Scalability
YouTube now has over a billion users and is continuing to grow. Therefore, it is important that the algorithm scales well to large datasets in order to efficiently catch fake engagement activities on a daily basis. We test and compare the performance with COPYCATCH, the state-of-the-art algorithm that detects fake Page Likes by analyzing the engagement graph of user-Page interaction.
First, we test the scalability of the algorithm by running our implementation on the YouTube Comments graph with different numbers of seeds. To make the test results comparable, we choose the same set of seed numbers as that reported in [23]. The number of seeds varies from 100 to 5,000. We additionally run the pipeline with only 10 seeds to test the system start-up time. De-
[Figure 3.4 plots: (a) running time (seconds) versus number of seeds, for LEAS and CopyCatch; (b) metrics value (internal density and Flake-ODF), with x-axis labeled number of seeds]
Figure 3.4: (a) Comparison of pipeline running time with the state of the art as the number of seeds increases. (b) Internal density and Flake-ODF of detected accomplice clusters in the YouTube Comments engagement graph. We filtered out those seeds with degree greater than 500, i.e., dmax = 500, and performed the diffusion algorithm on the rest of the seeds. The number of clusters in the plot is 955. Cluster indices are sorted by the internal density value.
the system to allocate and set up the data servers and the MapReduce clusters. Figure 3.4(a) shows the comparison of running time between COPYCATCH and LEAS⁴. It is worth noting that LEAS achieves a 10 times faster running time with far fewer machines. For example, 3,000 mappers and 500 reducers were used for all the testing data points in [23], whereas at most 1,500 mappers and 2 reducers are required in the LEAS test run with 5,000 seeds. Even fewer mappers are required for tests with a smaller number of seeds. For example, running the pipeline with 1,000 seeds uses 295 mappers, 2,000 seeds uses 597 mappers and 10,000 seeds uses 2,999 mappers.
As seen in the results, the running time of LEAS is almost independent of the number of seeds. This is reassuring: our implementation exploits the parallelism of the problem and can continue to scale as the data scales.
3.7.2 Performance Evaluation
Graph Metrics
To evaluate the accomplice clusters found by LEAS, we first measure the structural properties using two commonly adopted metrics [191].
• Internal density measures the internal edge density of a node set V′, where E′ is the set of edges with both endpoints in V′. A larger internal density value indicates a more densely connected community-like structure among nodes.
$$f(V') = \frac{2|E'|}{|V'|(|V'| - 1)}$$
• Flake-ODF is a cluster metric that takes into account both the internal and external connectivity of a set. It measures the fraction of nodes in V′ that have fewer edges pointing inside than to the outside of the set. A smaller Flake-ODF value indicates better cluster quality.
$$f(V') = \frac{\left|\{v : v \in V',\ |\{(v, u) \in E' : u \in V'\}| < \deg(v)/2\}\right|}{|V'|}$$
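Both metrics can be computed directly from a neighbor-set representation of the graph. The sketch below assumes an undirected, unweighted graph stored as a dict of neighbor sets; the function names and the toy graph are illustrative only.

```python
def internal_density(nodes, adj):
    """2|E'| / (|V'|(|V'| - 1)) for the node set `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Summing internal neighbor counts visits each internal edge twice,
    # which yields 2|E'| directly.
    twice_internal_edges = sum(len(adj[v] & nodes) for v in nodes)
    return twice_internal_edges / (n * (n - 1))


def flake_odf(nodes, adj):
    """Fraction of nodes with fewer than half of their edges inside the set."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    weak = sum(1 for v in nodes if len(adj[v] & nodes) < len(adj[v]) / 2)
    return weak / len(nodes)


# Toy graph: a triangle {a, b, c}, with c also linked to x, y and z.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "x", "y", "z"},
    "x": {"c"}, "y": {"c"}, "z": {"c"},
}
cluster = {"a", "b", "c"}
density = internal_density(cluster, adj)
odf = flake_odf(cluster, adj)
```

For the toy cluster, the triangle is fully connected, so the internal density is 1.0, while only node c has most of its edges leaving the set, giving a Flake-ODF of 1/3.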
Figure 3.4(b) presents the measurement scores of accomplice clusters detected in the YouTube Comments engagement graph. The most striking observation is the difference concerning the internal density distribution exhibited by the Comments graph. We see that clusters detected from the engagement graph are in general compact with high internal density, which may signify the orchestration strategy used when performing fake engagement: YouTube fake Comments spammers exhibit a stronger lockstep behavior pattern, with groups of users acting together, commenting on the same videos at
on the other hand, displays a less orchestrated pattern with more likelihood to be incentivized campaigns. Our probe into the structural properties of the detected clusters also suggests that further evaluation is imperative.
YouTube Comment: Manual Review Results
To verify the effectiveness of the algorithm, we ran the pipeline on the engagement graph built on August 3rd, 2015 with a 30-day time window, and performed intensive manual review of the detected accounts. In total, the pipeline detected roughly 24,000 unique accounts from 955 spammer seeds. Among the newly detected accounts, 8,500 were found by more than one seed, while the other 15,500 accounts were detected by only one seed. Figure 3.5 depicts the distribution of the number of seeds by which each account was detected. Being detected by several seeds is a stronger indication that an account is potentially abusive. We therefore divide the results into two types and perform analysis accordingly:
• Tier I: accounts that are repeatedly detected by more than one seed (35%).
• Tier II: accounts that are uniquely detected by only one seed (65%).
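The tier split amounts to counting, for each detected account, how many seeds' clusters contain it. A minimal sketch follows, assuming the pipeline output is available as a mapping from seed to accomplice cluster and that known seeds are excluded from the newly detected accounts; the function name and toy data are hypothetical.

```python
from collections import Counter


def split_into_tiers(clusters_by_seed):
    """Return (tier_1, tier_2) sets of newly detected accounts.

    clusters_by_seed: {seed: set of accounts in its accomplice cluster}
    """
    seeds = set(clusters_by_seed)
    hits = Counter()
    for cluster in clusters_by_seed.values():
        # Assumption: seeds are already known to be abusive, so they are
        # not counted among the newly detected accounts.
        hits.update(set(cluster) - seeds)

    tier_1 = {acct for acct, count in hits.items() if count > 1}
    tier_2 = {acct for acct, count in hits.items() if count == 1}
    return tier_1, tier_2


clusters = {
    "s1": {"s1", "a", "b"},
    "s2": {"s2", "a", "c"},
    "s3": {"s3", "a", "b"},
}
tier_1, tier_2 = split_into_tiers(clusters)
```

Here accounts a and b are found by more than one seed and fall into Tier I, while c is found only once and falls into Tier II.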
To investigate the Tier I accounts, we randomly selected 36 accounts without applying any metric thresholding. We manually examined each account’s information and YouTube post history. We also took into consideration the Google internal security measures associated with each account, but do not discuss them in detail here for security reasons. The manual review shows that 100% of the reviewed Tier I accounts were verified to be fake. Among the Tier I accounts, the most frequently detected account was found by 64 seeds. We find that this particular account was created less than 10 days ago, yet had made more than 253 posts, with the quota exceeded many times. We manually clicked through the comments posted by these accounts, and found that most comments are short text pieces such as “good videos”, “very cool”, “nice”, “oh”, “lol” or smiley-face emoji. We also find a common pattern of accounts posting exactly the same or similar short, fake comments on different videos. In addition, we discovered a few accounts posting comments under popular songs whose contents are irrelevant to the video itself and instead ask for views and subscriptions (e.g., “please subscribe” or “subscribe now”). Several other spammy accounts posting comments that include malicious URLs and advertisements were also detected.
[Figure 3.5 plot annotations: high confidence region; hard cases: 98% precision with metric thresholding]
Figure 3.5: Detection frequency distribution among the accounts detected by LEAS.
[Figure 3.6 plot: (a) count versus account age, with age bins 10d, 1m, 3~4m, 0.5~1.5y, 2y, 3~4y, 6y; (b) see caption]
Figure 3.6: (a) Age distribution of 36 manually reviewed Tier I suspicious accounts. (b) Google live runs on YouTube engagement graphs with a portion of the seeds, dating from August 6th to August 13th, 2015. The magenta curve depicts the daily volume of unique accounts detected by the LEAS pipeline, and the blue curve indicates the daily number of videos these accounts have acted upon.
Besides the contextual information, we also looked into the lifespan of each suspicious account. Although one might expect most spammer accounts to be relatively young, it was quite surprising to see the age heterogeneity of those accounts, as shown in Figure 3.6(a). Among the 36 accounts, the most frequent age falls into the range between 0.5 and 1.5 years, whereas the oldest spammer account has existed for more than 6 years.
The Tier II accounts are the harder cases. In order to guarantee the false positive (FP) guards in production, we randomly selected 100 Tier II accounts that belong to an accomplice cluster with internal density greater than 0.7. The manual investigation shows that 98% of these Tier II accounts are fake⁵. The comments posted by these accounts share a similar pattern with those made by Tier I accounts. Quite interestingly, we found a detected cluster of 15 accounts posting the same comments of either “i love pets”, “yeah” or URLs under certain videos. This further verifies that the suspicious groups detected by the algorithm are of high accuracy. As for the other two accounts, about which we are uncertain, one has a huge amount of Google+ shares of good deals although it posted nothing in YouTube comments; the other, a 4-month-old account, posts a mixture of both organic and fake-like comments, which might be incentivized.
⁵ In practice, we treat activities made by both Tier I and Tier II accounts as fake engagement.
3.7.3 Deployment at Google
LEAS now runs regularly at Google, expanding the coverage of fake engagement activities on YouTube. Parameters have been chosen to clearly distinguish organic user behavior from fake social engagement. There are two levels of take-down actions in practice: engagement level and account level. Engagement level take-down is a soft penalty that removes all the fake engagement activities associated with the detected accounts that happened during the day; account level take-down is a more severe outcome, applied when we have very high confidence that certain bad actors commit fake engagement from time to time. Figure 3.6(b) shows the daily aggregate volume of detected accounts when running our pipeline on YouTube Comments graphs with a portion of the spammer seeds, dating from August 6th to August 13th, 2015⁶. We do not display the entire daily take-down volume here for security reasons. Note that engagement level take-down was the main penalty applied during our test runs, hence the volume of detected accounts did not fluctuate much from day to day; otherwise we would expect to see a decreasing volume of detected accounts when applying the account level take-down policy. Overall, this method, in combination with other existing abuse infrastructure at Google, is effective in decreasing the volume of fake social engagement on YouTube.
⁶ The decreased number of detected accounts on August 8th and 9th was due to the reduced