
John M. Doe

4.5 Community Detection Bias

The robustness experiment for the Random perturbation strategy revealed rather large differences in the robustness of the scoring functions. In Figure 4.7, the reported Z-scores for the combined datasets reach up to 8.0 standard deviations from the mean. For the Pope2013-Spl dataset they reach as high as 15.0 standard deviations. This suggests that the scoring functions might be subject to a community size bias, where small communities disproportionately affect the results. Therefore, an additional experiment is proposed to investigate this potential size bias in the scores for the static microblogging scenario.

The experiment is set up as follows. First, a relatively high constant perturbation intensity (p = 0.20) is chosen. Then, for that intensity, the change in the Z-score as a function of ground-truth community size is observed. Each Z-score is calculated with respect to all the ground-truth communities of a given size. p = 0.20 represents a moderately strong intensity for all the investigated perturbation strategies. An unbiased score should therefore yield high Z-score values that are independent of community size, i.e. approximately constant across sizes.
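The grouping step described above can be sketched as follows. This section does not restate the exact Z-score formula. The sketch assumes a hypothetical definition: for each community size, Z = (mean of the unperturbed scores − mean of the perturbed scores) / standard deviation of the perturbed scores. The input format and the function name `z_scores_by_size` are also illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_scores_by_size(records):
    """Group (size, original_score, perturbed_score) records by community size
    and compute one Z-score per size.

    Hypothetical definition: Z = (mean(original) - mean(perturbed)) / stdev(perturbed).
    For scores where lower is better (e.g. Conductance) the sign would be flipped.
    """
    groups = defaultdict(lambda: ([], []))  # size -> (original scores, perturbed scores)
    for size, original, perturbed in records:
        groups[size][0].append(original)
        groups[size][1].append(perturbed)

    result = {}
    for size, (orig, pert) in sorted(groups.items()):
        if len(pert) < 2:
            continue  # the sample standard deviation needs at least two values
        sd = stdev(pert)
        if sd == 0:
            continue  # Z-score undefined when perturbed scores do not vary
        result[size] = (mean(orig) - mean(pert)) / sd
    return result


z = z_scores_by_size([(10, 1.0, 0.5), (10, 1.0, 0.7), (20, 0.9, 0.4)])
print(z)
```

In this toy input, size 10 gets Z ≈ 2.83. Size 20 is skipped because only one community has that size. A flat curve of high Z-values across sizes would indicate the unbiased behaviour described above.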

Figures 4.10, 4.11, 4.12 and 4.13 show the results for the Node Swap, Random, Expand and Shrink perturbation strategies, respectively, under the proposed p = 0.20 intensity. Each figure has one plot per ground-truth dataset, plus a plot with all the data combined. Initially, the results contained very large Z-score values that overshadowed the majority of the smaller values. Therefore, a simple outlier detection strategy [IH93] is applied to the plots to improve the visualisation of the smaller, more relevant portions of the data.
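The text does not spell out which outlier rule from [IH93] is applied. One widely used simple option is the modified Z-score based on the median absolute deviation (MAD): M = 0.6745 · (x − median) / MAD, with points where |M| > 3.5 treated as outliers. The sketch below assumes that rule. The function name `remove_outliers` and the default threshold are illustrative, not taken from the thesis.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Drop values whose modified Z-score (MAD-based) exceeds the threshold.

    Assumed rule: M = 0.6745 * (x - median) / MAD, outlier if |M| > threshold.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values are identical; the modified Z-score is
        # undefined, so keep everything rather than dividing by zero.
        return values
    return [x for x in values if abs(0.6745 * (x - med) / mad) <= threshold]


print(remove_outliers([1, 2, 3, 4, 100]))  # [1, 2, 3, 4]
```

A median-based rule is used here because a few extreme Z-scores would inflate a mean/standard-deviation cut-off. That is exactly the situation described above.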

4.5.1 Node Swap

For the Node Swap perturbation in Figure 4.10, no scoring function in the experiment is robust for small communities. This holds for sizes up to ≈10^1.5 (about 32 members) in the Pope2013, Pope2013-Spl and RTE2015 datasets, and up to ≈10^2 in the WorldCup2014 and Ireland2017 datasets. Beyond these size limits, the Z-score values are much higher. FOMD, TPR and Modularity are the exceptions to this observation for Ireland2017, where they exhibit good robustness even for smaller communities. This is explained by the long timespan of the dataset. Despite being small, its ground-truth communities have the most prominent Clustering Coefficient and Cohesiveness structural properties among all the datasets (refer to Table 4.1).
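The Node Swap strategy is defined earlier in the thesis and is not restated in this section. As an illustration only, the following minimal sketch assumes a common formulation:

- pick a random boundary edge (u, v), with u inside the community C and v outside it;
- swap the memberships of u and v;
- repeat round(p · |C|) times.

The adjacency-dict graph representation and the names `boundary_edges` and `node_swap` are hypothetical.

```python
import random


def boundary_edges(adj, community):
    """Return sorted edges (u, v) with u inside the community and v outside it."""
    # Sorting makes the random choice reproducible for a fixed seed
    # (assumes node identifiers are mutually comparable).
    return sorted((u, v) for u in community for v in adj[u] if v not in community)


def node_swap(adj, community, p, seed=None):
    """Hypothetical Node Swap: swap memberships across round(p * |C|) boundary edges."""
    rng = random.Random(seed)
    perturbed = set(community)  # never mutate the caller's set
    for _ in range(round(p * len(community))):
        edges = boundary_edges(adj, perturbed)
        if not edges:  # no edge leaves the community, nothing to swap
            break
        u, v = rng.choice(edges)
        perturbed.remove(u)  # a member on the boundary leaves...
        perturbed.add(v)  # ...and its outside neighbour joins
    return perturbed


# Path graph 0-1-2-3-4-5 as an adjacency dict.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(sorted(node_swap(adj, {0, 1, 2}, p=0.34, seed=1)))  # [0, 1, 3]
```

Under this formulation the community size is preserved. For a small community, a single swap therefore replaces a large fraction of its members, which is one plausible reason why scores behave differently at small sizes.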


Figure 4.10: Z-scores as a function of community size for the Node Swap perturbation strategy applied to all community types for each Twitter ground-truth dataset. A combined plot is also presented.

4.5.2 Random

For the Random perturbation in Figure 4.11, a very similar behaviour is observed. However, with this strategy, FOMD, TPR, Conductance and Flake ODF show more consistent robustness across community sizes. Cut Ratio remains stable but with Z-score values close to zero. This suggests that it cannot distinguish between perturbed and non-perturbed communities when the communities are small enough, e.g. smaller than ≈10^2 members for the Pope2013 and Pope2013-Spl datasets.
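To contrast with Node Swap, the sketch below illustrates a Random perturbation under an explicit assumption. It assumes the strategy replaces round(p · |C|) members with randomly chosen non-members and ignores the graph structure entirely. The function name `random_perturbation` is hypothetical and is not the thesis's implementation.

```python
import random


def random_perturbation(nodes, community, p, seed=None):
    """Hypothetical Random perturbation: replace round(p * |C|) members
    with randomly chosen non-members, ignoring the graph structure."""
    rng = random.Random(seed)
    # Sorting makes the sampling reproducible for a fixed seed
    # (assumes node identifiers are mutually comparable).
    members = sorted(community)
    outsiders = sorted(set(nodes) - set(community))
    k = min(round(p * len(members)), len(outsiders))
    removed = rng.sample(members, k)
    added = rng.sample(outsiders, k)
    return (set(community) - set(removed)) | set(added)


community = {0, 1, 2, 3, 4}
perturbed = random_perturbation(range(10), community, p=0.20, seed=42)
print(len(perturbed), len(perturbed & community))  # 5 4
```

With p = 0.20 and a five-member community, exactly one member is replaced. In a small community, that one replacement can barely move a ratio-based score such as Cut Ratio, which is consistent with the near-zero Z-values reported above.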

4.5.3 Expand and Shrink

Lastly, the Expand and Shrink perturbations, shown in Figures 4.12 and 4.13 respectively, also reveal that the scoring functions are biased towards smaller communities, especially under the Expand strategy.

Under the Expand strategy, Conductance and Flake ODF (mixed connectivity family) are the most robust for larger ground-truth communities in the static scenario. For the Shrink perturbation, on the other hand, the Modularity scoring function is noticeably more robust on larger communities. This again shows that its resolution limit also applies to microblogging data.
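For completeness, the two size-changing strategies can be sketched under the same assumed edge-based formulation used for Node Swap. Both repeat an operation round(p · |C|) times:

- Expand adds the outside endpoint of a random boundary edge;
- Shrink removes the inside endpoint of a random boundary edge.

These definitions and the names `expand` and `shrink` are illustrative assumptions, not taken from this section.

```python
import random


def boundary_edges(adj, community):
    """Return sorted edges (u, v) with u inside the community and v outside it."""
    return sorted((u, v) for u in community for v in adj[u] if v not in community)


def expand(adj, community, p, seed=None):
    """Hypothetical Expand: add round(p * |C|) outside neighbours via random boundary edges."""
    rng = random.Random(seed)
    perturbed = set(community)
    for _ in range(round(p * len(community))):
        edges = boundary_edges(adj, perturbed)
        if not edges:
            break
        _, v = rng.choice(edges)
        perturbed.add(v)  # the community grows by one outside neighbour
    return perturbed


def shrink(adj, community, p, seed=None):
    """Hypothetical Shrink: remove round(p * |C|) members lying on random boundary edges."""
    rng = random.Random(seed)
    perturbed = set(community)
    for _ in range(round(p * len(community))):
        edges = boundary_edges(adj, perturbed)
        if not edges or len(perturbed) == 1:  # keep at least one member
            break
        u, _ = rng.choice(edges)
        perturbed.remove(u)  # the community loses one boundary member
    return perturbed


adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(len(expand(adj, {1, 2, 3}, p=0.34)), len(shrink(adj, {1, 2, 3}, p=0.34)))  # 4 2
```

Because both operations act only on the community boundary, they change the boundary-sensitive quantities that Conductance, Flake ODF and Modularity depend on. This is why these strategies are informative for the mixed connectivity family.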

In general, this experiment shows that all the studied scoring functions have an inherent bias towards small communities, i.e. they produce artificially higher performance, for the static scenario of microblogging social streams. In particular, some of the scores, e.g. Cut Ratio and Modularity, do not perform well when applied to communities smaller than ≈100 users. Nevertheless, the identified size bias does not render the scoring functions unusable for microblogging. The scores belonging to the internal and mixed connectivity families proved the most robust and reliable for the analyst to consider, provided that the communities under study are large enough. Alternatively, the Conductance and Flake ODF scores are also good candidates for consideration, to a lesser degree.

Figure 4.11: Z-scores as a function of community size for the Random perturbation strategy applied to all community types for each Twitter ground-truth dataset. A combined plot is also presented.

Figure 4.12: Z-scores as a function of community size for the Expand perturbation strategy applied to all community types for each Twitter ground-truth dataset. A combined plot is also presented.

Figure 4.13: Z-scores as a function of community size for the Shrink perturbation strategy applied to all community types for each Twitter ground-truth dataset. A combined plot is also presented.

4.6 Chapter Summary

In this chapter, the problem of evaluating community detection in the context of microblogging services, represented by Twitter, was addressed. First, the structural properties of the functional ground-truth communities constructed in Chapter 3 were evaluated. Afterwards, a set of structural community scoring functions from the literature was thoroughly evaluated using the constructed functional ground truth. This evaluation used a static scenario that does not consider any temporal information in the microblogging streams. It investigated the community detection goodness of the scoring functions and their robustness to a number of perturbation strategies. Furthermore, the sensitivity and bias of the scoring functions were also studied.

For the static scenario, the scoring functions based on internal structural information, i.e. from the internal and mixed families, proved to be the best performing. Overall, to identify more clustered, dense and cohesive communities in Twitter, FOMD and TPR are the recommended structural scoring functions. However, if the analyst wants dense but more separated communities, then Conductance or Cut Ratio should be considered instead. On the other hand, Modularity and Cut Ratio were found to be weaker in the same context and should not be preferred. As an alternative, Flake ODF and, to a lesser degree, Conductance from the mixed connectivity family were also found to be reasonably robust for community detection in microblogging data. They can also be recommended for consideration.

In terms of robustness to random perturbations, the FOMD score (internal connectivity) generally stands out as the most robust and sensitive score for all the perturbation strategies evaluated in the static scenario. The Modularity score performs worst under every perturbation strategy except Shrink, where only Cut Ratio performs worse on microblogging data.

The experiments in this chapter also show that all the studied scoring functions have an inherent bias towards small communities, i.e. they produce artificially higher performance, for the static scenario of microblogging social streams. In particular, some of the scores, e.g. Cut Ratio and Modularity, do not perform well when applied to communities smaller than ≈100 members. Nevertheless, the identified size bias does not make the scoring functions unusable in microblogging. Instead, it must be taken into consideration.

5 Temporal Community Detection in Microblogging

In this chapter, the dynamic scenario of community detection in microblogging is investigated. The following main research question, proposed for this scenario in Chapter 1, is addressed:

(RQ3) → How can activity hotspots based on the dynamic user activity in time be identified in the defined ground-truth communities to improve community detection?

To provide an answer to this research question, the following research sub-questions are also proposed for this stage, and are investigated in detail in this chapter.

• (RQ3.1) → What are the temporal characteristics, for instance the user activity distributions, of the defined ground-truth functional communities in (RQ1.1) and (RQ1.2)?

• (RQ3.2) → Using the dynamic user activity in time as a basis, how can activity hotspots be identified in the defined ground-truth functional communities in (RQ1.1) and (RQ1.2) to be used for further identifying time-scoped sub-communities?

• (RQ3.3) → Considering the identified time-scoped sub-communities based on user activity hotspots defined in (RQ3.2), how well do the state-of-the-art structural community definitions investigated in (RQ2.2) now align to these sub-communities in comparison to the ground-truth functional communities in the static scenario, i.e. without considering their user activity context?

First, a definition of user activity hotspots is introduced. Then, methods for identifying hotspots in the ground-truth functional communities defined in Chapter 3 are proposed. Afterwards, temporal sub-communities are generated using the identified user activity hotspots, and an evaluation is carried out in a dynamic scenario of microblogging social networks. The same thirteen structural community definitions discussed in Chapter 4 are re-evaluated using the temporal sub-communities, including their robustness and sensitivity to random perturbations.

The identified contributions of this chapter are:

(1) a strategy for identifying temporal activity hotspots in functional communities in microblogging, based on the network of user interactions, which improves the performance of existing community detection algorithms designed for static data;

(2) an in-depth characterisation, understanding and evaluation of the structural properties of functional communities in microblogging social media for the dynamic scenario; and

(3) a set of recommendations on community detection algorithms based on a data-driven evaluation of Twitter user interaction networks.