4.2 Edge-based MOGA-OCD Algorithm
4.3.5 Experimental Conclusions
Taking into account all the experimental analysis carried out in this chapter, some main conclusions can be drawn. The Coda algorithm obtains the worst results in most of the experiments. It tends to detect many unstructured communities with a very low value of Q, in some cases even equal to 0. In addition, the community overlap is so high that the distinct communities cannot be told apart, and its precision is very low on the datasets with ground truth.
The Conga and Congo algorithms usually obtain good results when the graph is highly structured and can be partitioned according to its network topology. However, when they are applied to large graphs, they are not able to produce an overlapping partition. In addition, for unstructured or sparse graphs, the accuracy and quality of the partitions they detect decrease significantly. This means that they are highly dependent on the topology and structure of the input graph.
On unstructured or sparse graphs, the node-based MOGA-OCD algorithm improves the results in most cases. On structured graphs, this version of the algorithm obtains results similar, or close, to those of the other algorithms. The edge-based MOGA-OCD algorithm, in turn, obtains good results for both types of graphs (structured and unstructured).
Therefore, in general terms, the new MOGA-OCD algorithms obtain good results for the different types of graphs analysed, including the largest datasets, such as pgp, where some algorithms are not able to compute a graph partition. Furthermore, on the dataset collection with ground truth, the new algorithms reach the best accuracy in most of the cases studied.
Overall, the experimental results show that these new approaches improve on the results of the classical approaches from the state of the art.
Applications for detecting communities on Social Graph-based Information
“This world is not so kind, People trap your mind It’s so hard to find someone to admire.”
- Madonna (Nobody Knows Me - American Life)
This chapter presents the application of community detection algorithms to identify opinion communities related to Public Healthcare topics in Social Networks, acquiring new collective knowledge about their behaviour, preferences, profiles, etc. For this purpose, both classical algorithms and the new algorithms implemented in this thesis are used, and their results are compared in order to evaluate them. In this particular work, the algorithms are applied to detect communities in Twitter that are disseminating vaccine opinions. Finally, the influence of these communities on the rest of the users in a particular zone or country is analysed, showing how useful this new knowledge is to Public Healthcare Organizations. These organizations could improve their strategies by increasing control and preventive measures in the risk zones identified.
For this purpose, a dataset collected from Twitter and the vaccination coverage rates retrieved from the immunization monitoring system of the WHO have been used to perform several analyses. Using both datasets, an initial analysis measures the potential influence of vaccine opinions based on the variation in the coverage rates. This analysis uses two factors: the Topic Relevance Factor (quantifying the relevance of the vaccine topic in a given country) and the Kurtosis of Vaccination Coverages (measuring the changes in the distribution of vaccination coverage rates). Afterwards, a network representation of the Twitter dataset is generated, and Community Detection Algorithms are applied to identify groups of similar users giving opinions about vaccines. Finally, several network centrality metrics are used to study these communities, discovering the most relevant users and analysing their social influence.
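As an illustration of the second factor, the following minimal sketch computes the excess kurtosis of a series of yearly coverage rates. It assumes the population (moment-based) definition of kurtosis; the exact variant used in this work is not stated in this passage, and both the function name `excess_kurtosis` and the coverage values are hypothetical.

```python
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis, m4 / m2**2 - 3 (0 for a normal distribution).

    Assumption: moment-based definition without small-sample correction.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Second and fourth central moments of the series.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant series")
    return m4 / m2 ** 2 - 3.0


# Hypothetical yearly coverage rates (%), not taken from the WHO data.
stable = [94.0, 95.0, 96.0, 95.0, 94.0, 96.0]
one_drop = [95.0, 95.0, 95.0, 95.0, 95.0, 80.0]

print(round(excess_kurtosis(stable), 3))    # flat series: negative excess kurtosis
print(round(excess_kurtosis(one_drop), 3))  # single sharp drop: positive excess kurtosis
```

In this toy example, a series with one abrupt drop in coverage produces a heavier tail, and therefore a higher kurtosis, than a series that fluctuates slightly around a constant level.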
5.1 Detecting Discussion Communities on Vaccination in Twitter
The use of vaccines has contributed to a dramatic decrease in mortality rates from infectious diseases in the 20th century [72]. In 1920, 469,924 measles cases were reported in the United States, and 7,575 patients died. The number of cases decreased to fewer than 150 per year in the 50s, and in 2008 there were only 64 suspected cases of measles in the world. However, social groups related to vaccines have since emerged, influencing the opinion of the population about vaccination. This could lead to disease outbreaks, which are more common when vaccination rates decrease [95, 136, 169]. These vaccination communities have taken advantage of social media technologies to effectively disseminate their message and spread their theories [99].
In recent years, several studies on various social media services, such as YouTube [102], MySpace blogs [101], and Social Networks (SN) [160], have described this dissemination and its effects. In addition, statistical analyses show how this vaccination information influences social media users in their treatment decisions [166].
Currently, one of the most popular social networks (SN) is Twitter [4], producing huge amounts of public information. Twitter users can generate new data sources of collective intelligence through their comments and interactions, allowing the application of data mining techniques in several fields [21] such as marketing campaigns [38, 23], financial prediction [13] or public healthcare [16, 19, 45], amongst others.
In the related literature, there are several works investigating knowledge acquisition from social networks about vaccine sentiments, in most cases using classification techniques [33, 113, 153]. These classification techniques usually obtain better results than clustering techniques as a consequence of their supervised nature. However, clustering techniques are able to discover hidden information (or patterns) in a dataset, and they do not need a previous human-labelling process. Any human-labelling process can be really time-consuming, or even impossible, for huge datasets extracted from a SN such as Twitter.
The information extracted from a SN can be represented as a graph, where the vertices represent the users, and the edges represent the relationships among them (e.g. a re-tweet of a message or a favourited tweet). This graph representation can be clustered into user groups, or communities, based on the topology information of the graph. Each community should include strongly interconnected vertices that have few connections with the rest of the graph vertices. Therefore, the problem of community detection within a SN can be handled using graph clustering algorithms [154]. These algorithms can automatically organize a set of users from a SN into similar communities to acquire collective knowledge about their behaviour, preferences, profiles, etc.
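The graph representation described above can be sketched as follows. This is a minimal illustration, not the implementation used in this thesis: it assumes an undirected, unweighted graph built from hypothetical (user, retweeted author) pairs, and it scores a non-overlapping partition with the standard Newman modularity Q. The overlapping communities studied in this work require an extended measure that is not shown here. The names `build_retweet_graph` and `modularity` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_retweet_graph(retweets: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph with one edge per pair of users linked by a re-tweet."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for user, author in retweets:
        if user != author:  # ignore self re-tweets
            adj[user].add(author)
            adj[author].add(user)
    return dict(adj)


def modularity(adj: Dict[str, Set[str]], community_of: Dict[str, int]) -> float:
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))**2 ] for a disjoint partition."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(float)  # L_c: edges with both ends in community c
    degree = defaultdict(float)    # d_c: sum of vertex degrees in community c
    for u, nbrs in adj.items():
        cu = community_of[u]
        degree[cu] += len(nbrs)
        for v in nbrs:
            if community_of[v] == cu:
                internal[cu] += 0.5  # each internal edge is visited from both ends
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical re-tweet pairs: two triangles of users joined by one bridge (c, d).
retweets = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a"),
            ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
graph = build_retweet_graph(retweets)

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {u: 0 for u in graph}

print(round(modularity(graph, two_groups), 4))  # the two triangles as communities
print(round(modularity(graph, one_group), 4))   # all users in a single community
```

In this toy graph, the partition into the two triangles scores Q = 5/14, whereas placing every user in one community scores Q = 0, which matches the intuition that a community should be densely connected internally and sparsely connected to the rest of the graph.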
This section presents an application of community detection algorithms to detect user groups in Twitter that are disseminating vaccine opinions, in order to analyse their influence on the rest of the users in their own community, zone, or country. Many people look for vaccination information on the internet, and the data found can have an impact on their vaccination decisions. Therefore, Public Healthcare strategies could be improved through the application of community detection techniques, increasing control and preventive measures in the identified risk zones. In this particular work, the use of these algorithms is focused on discovering and tracking anti-vaccine movements arising in SN. For this purpose, an analysis of the Twitter Social Influence on the vaccine coverage rates is carried out first. Afterwards, a second part of the work focuses on the study of a real re-tweet graph, representing the interactions of users who talk about vaccination.
5.1.1 Methodology to analyse the potential influence of social vaccine communities on