
Building a relational classifier using only the network structure

In this section, we propose a classifier that takes advantage of the network structure to classify entities. Inspired by the literature on social networks, where homophily is usually reported, our classifier uses the labels assigned to an entity's neighbors to classify that entity. The problem with this approach is evident: needing the labels of the neighbors in order to classify an entity creates a recursive problem, since the labels of the neighbors' own neighbors are needed to classify those neighbors, and so on.

Our approach to solving this problem is a two-module classifier, as shown in Figure 7.1. It is composed of an initial module, which makes a first labeling of the entities into the desired categories, and a relational module, which uses the results of the initial module to exploit the neighborhood profile. The initial module uses label-independent features extracted from the network structure. Note that the performance of the initial module is not critical, since its results are used only once, as inputs for the relational module. In turn, the relational module makes use of both the label-independent features computed in the previous stage and label-dependent features, which can be computed for all nodes because the output of the initial module provides labels for all the entities. Moreover, the relational module can be applied iteratively, so that the results of one execution of the relational classifier serve as the labels for the next execution. By doing so, the classifier is able to refine its results by taking into account the newly learned information.

Figure 7.1: Classifier modules scheme. The Initial Module comprises structural features extraction, data preprocessing, and the initial classifier, which outputs class labels; the Relational Module comprises neighborhood analysis, data preprocessing, and the relational classifier, which outputs new class labels.

This architecture is similar, to some extent, to the one used by NetKit [40]. In our proposal, the initial module plays the role of NetKit's non-relational model: both provide a model to initialize the relational component. However, one of our contributions is to use the network structure to define the features for this classifier, instead of relying only on the local attributes of the nodes. NetKit's relational and collective inference models, in turn, correspond to our second module, which uses relationships in a way similar to the Class-Distribution Relational Neighbor Classifier [40,199,200].

The next subsections describe each of the two modules in detail. As we will see, the initial module is composed of three components: the structural feature extraction, the data preprocessing, and the initial classifier. The relational module has a similar architecture and is also made of three elements: the neighborhood analysis, the data preprocessing, and the relational classifier.

7.2.1 Initial module

The initial module uses structural node properties to obtain an initial node labeling that can be used afterwards by the relational module. Starting from a graph, a set of structural features is extracted. These features are then used to map nodes to $|F|$-dimensional samples, which can be labeled with a classic non-relational classifier.

Chapter7. Improving classification using network structure 95

The feature extraction component obtains local features for each node of the graph. Although any structural feature computed over the nodes of the graph can be used in this setting, our experiments show that very basic metrics already provide good enough results for an initial labeling of the nodes. Node metrics such as degree, betweenness and closeness centrality, clustering coefficient, or degree assortativity may be used as local features. The number of features used by the feature extraction component determines the number of dimensions of the resulting samples. We denote by $F$ the set of local node features used in the initial module.
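As an illustration of the feature extraction step, the following minimal sketch maps each node of a small directed graph to a three-dimensional sample. The function name `structural_features`, the adjacency-dictionary input, and the particular choice of features (in-degree, out-degree, and clustering coefficient of the underlying undirected graph) are assumptions made for this example, not the feature set used in the experiments.

```python
from itertools import combinations

def structural_features(successors):
    """Map each node of a directed graph to a small feature vector.

    successors: dict node -> set of nodes it points to.
    Features (illustrative choice): in-degree, out-degree, and the
    clustering coefficient of the underlying undirected graph.
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes |= set(targets)

    # Build predecessor sets and undirected neighbor sets.
    predecessors = {v: set() for v in nodes}
    for u, targets in successors.items():
        for v in targets:
            predecessors[v].add(u)
    neighbors = {
        v: (set(successors.get(v, ())) | predecessors[v]) - {v} for v in nodes
    }

    features = {}
    for v in nodes:
        k = len(neighbors[v])
        # Count the pairs of neighbors that are themselves connected.
        links = sum(1 for a, b in combinations(neighbors[v], 2) if b in neighbors[a])
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        features[v] = [len(predecessors[v]), len(successors.get(v, ())), clustering]
    return features

graph = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}
features = structural_features(graph)
```

Here $|F| = 3$, so every node becomes a three-dimensional sample that no longer depends on the graph.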

Just before classifying the samples obtained from the feature extraction component, a data preprocessing step is applied. This step consists of basic transformations of the data that ease the classification process. First, the data are standardized by dividing each attribute value by the standard deviation of that attribute. Second, feature extraction and dimensionality reduction are performed with Principal Component Analysis (PCA) to try to optimize the classifier performance. This preprocessing step is independent of our structural feature extraction component, i.e., it is applied to the $|F|$-dimensional samples obtained by that component, without taking the graph itself into account any more.
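The standardization part of this step can be sketched as follows. This is a minimal example under two assumptions that the text does not settle: the population standard deviation is used, and constant attributes are left unscaled to avoid dividing by zero. The helper name `scale_by_std` is hypothetical, and the PCA stage is not shown.

```python
from statistics import pstdev

def scale_by_std(samples):
    """Divide every attribute (column) by its standard deviation."""
    columns = list(zip(*samples))
    # Constant attributes have zero deviation; leave them unscaled.
    scales = [pstdev(column) or 1.0 for column in columns]
    return [[x / s for x, s in zip(row, scales)] for row in samples]

samples = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
scaled = scale_by_std(samples)
```

After scaling, every non-constant attribute has unit standard deviation, so no single structural feature dominates the distances computed by the classifier.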

Then, the initial classifier component is built upon a Support Vector Machine (SVM) classifier with soft margins, a Gaussian Radial Basis Function kernel, and a scaling factor of one. Relational datasets usually contain some atypical nodes that, although labeled as members of one type, exhibit attribute values very similar to those of nodes of the other types. For this reason, we use a soft-margin classifier, which finds a solution that better separates the majority of the nodes while tolerating the misclassification of these outliers. A Radial Basis Function kernel gives good enough results for the initial classifier, and other kernels, such as high-degree polynomials, offer similar results. Support vectors are computed from the training dataset with the quadratic programming method, and the final result is an initial classification $P_0$ where all nodes $v_i$ are assigned a class label.
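For reference, a Gaussian Radial Basis Function kernel can be written as the short function below. The text only states a "scaling factor of one"; reading that factor as the width $\sigma$ in $\exp(-\lVert x - y \rVert^2 / (2\sigma^2))$ is an assumption of this sketch, and the name `gaussian_rbf` is hypothetical.

```python
import math

def gaussian_rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2)).

    sigma is assumed to be the 'scaling factor' mentioned in the text.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

same = gaussian_rbf([0.0, 0.0], [0.0, 0.0])
near = gaussian_rbf([0.0], [1.0])
far = gaussian_rbf([0.0], [3.0])
```

The kernel equals one for identical samples and decays towards zero as samples move apart, which is what lets the classifier draw non-linear boundaries in the structural feature space.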

7.2.2 Relational module

The initial classification provides us with an initial mapping $P_0(v_i) = c_k$, $\forall i = 1, \cdots, n$, that can be used together with the relations between entities expressed by the graph to further improve classification accuracy. The relational module uses this initial classification to start exploiting the relationships expressed by the graph. However, note that this information is used only once, in the first iteration of the relational classifier. From that moment on, since updated (and presumably more accurate) information about the nodes' labels is available, the relational module does not need this initial labeling any more.

The neighborhood analysis module takes as inputs the graph, that is, the entities and their relationships, and the initial labeling $P_0(v_i)$, and outputs a $(|C| \times 2)$-dimensional sample for each node of the graph. This sample contains an aggregated description of the neighborhood of the node. The relational module assumes that the class of a node depends only on the classes of its direct neighbors, so that, given its immediate neighborhood, the probability of a node belonging to a given class is independent of the rest of the graph. This makes the problem of inferring class membership more tractable. Then, as in the Class-Distribution Relational Neighbor Classifier [40,199,200], the neighborhood analysis module constructs the class vector $CV(v_i)$ of node $v_i$ as the vector of summed linkage weights to the various known classes. In this way, the $k$-th position of the class vector, $CV(v_i)_k$, contains the number of neighbors of $v_i$ within the predicted class $c_k$.

Following the scheme shown in Figure 7.1, $CV^{(t)}(v_i)$ is the result of the neighborhood analysis box at the $t$-th iteration, which is used as input for the relational classifier after being properly preprocessed. The first time the neighborhood analysis module is used, the mapping $P_0(v_i)$ resulting from the initial classification is used to construct the nodes' class vectors. From that moment on, the neighborhood analysis module takes as inputs the new mappings $P_t$, $t \neq 0$, resulting from the relational classification phase. It is worth mentioning that classification at stage $t + 1$ uses only labels assigned at stage $t$.

However, since we are dealing with directed graphs, we extend the class vector to contain two different values for each category, corresponding to the predecessors and the successors of the analyzed node. Therefore, each component of the $CV(v_i)$ vector has exactly two dimensions,¹ the first corresponding to the aggregated counts over the successors of the node, and the second to the aggregated counts over its predecessors:

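$$CV^{(t)}(v_i)_{k,1} = \left|\{\, v_j \in \Gamma(v_i) \ \text{s.t.}\ P_{t-1}(v_j) = c_k \,\}\right|$$

$$CV^{(t)}(v_i)_{k,2} = \left|\{\, v_j \in \Gamma^{-1}(v_i) \ \text{s.t.}\ P_{t-1}(v_j) = c_k \,\}\right|$$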
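The two counts above can be computed in a single pass over the edges, as in the following minimal sketch. The function name `class_vectors`, the adjacency-dictionary representation, and the assumption that every node already has a label from stage $t-1$ are choices made for this example only.

```python
def class_vectors(successors, labels, classes):
    """Build CV(v) for every node from the labels of the previous stage.

    successors: dict node -> set of successor nodes (directed edges).
    labels:     dict node -> class assigned at stage t-1 (all nodes present).
    classes:    ordered list of class names c_1, ..., c_|C|.
    Returns a dict node -> list of [successor_count, predecessor_count]
    pairs, one pair per class.
    """
    index = {c: k for k, c in enumerate(classes)}
    vectors = {v: [[0, 0] for _ in classes] for v in labels}
    for u, targets in successors.items():
        for v in targets:
            vectors[u][index[labels[v]]][0] += 1  # v is a successor of u
            vectors[v][index[labels[u]]][1] += 1  # u is a predecessor of v
    return vectors

edges = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
labels = {"a": "x", "b": "y", "c": "y"}
cv = class_vectors(edges, labels, ["x", "y"])
```

For node `a`, both successors are labeled `y` and its only predecessor `c` is also labeled `y`, so its class vector is `[[0, 0], [2, 1]]`. Flattening each vector yields the $(|C| \times 2)$-dimensional sample passed on to the preprocessing step.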

Then, as can be seen in the classifier architecture, another data preprocessing module is applied just before the relational classifier. This module has two main functions. First, it takes the samples created by the neighborhood analysis module and appends to them the features used by the initial module. Since the information generated by the feature extraction component of the initial module has proven useful for classifying entities, there is no apparent reason to discard it. Second, as in the data preprocessing step of the initial module, basic preprocessing techniques such as standardization and dimensionality reduction are applied.

Once the vectors for each of the samples have been constructed, we use them as inputs for the relational classifier. The relational classifier is instantiated in the same way as the initial classifier: it uses Support Vector Machines with soft margins and a Gaussian Radial Basis Function kernel with a scaling factor of one.

¹ Note that this very same approach can be used to extend the methodology to heterogeneous

Let us stress again that the relational module and, with it, the relational classifier are applied iteratively. Since the output of the relational classifier should be better than that of the initial classifier, we can use it to compute new values describing the relationships of the entities, and then apply the relational classifier again to improve classification performance. Ideally, we would run the relational classifier iteratively as many times as needed until the results converge. However, this method may not always converge, so some other termination condition has to be set to stop the iterative process. In our case, we fixed a maximum number of iterations and took as final results those obtained when that maximum is reached.
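The iterative scheme can be sketched as a simple loop with an iteration cap. In this minimal example, `iterate_relational` and `majority_vote` are hypothetical names, the early exit on unchanged labels is an addition for illustration (the text only fixes a maximum number of iterations), and the neighbor majority vote is a stand-in for the relational classifier, not the SVM described above.

```python
def iterate_relational(successors, initial_labels, classify, max_iterations=10):
    """Repeat the relational step until labels stop changing or the cap is hit.

    classify(successors, labels) -> new labels; it must read only the
    labels of stage t to produce the labels of stage t+1.
    """
    labels = dict(initial_labels)
    for iteration in range(1, max_iterations + 1):
        new_labels = classify(successors, labels)
        if new_labels == labels:  # nothing changed: converged early
            return labels, iteration
        labels = new_labels
    return labels, max_iterations

def majority_vote(successors, labels):
    """Hypothetical stand-in classifier: adopt the neighbors' majority label."""
    new_labels = {}
    for v in labels:
        votes = [labels[u] for u in successors.get(v, ())]
        votes += [labels[u] for u, targets in successors.items() if v in targets]
        counts = {c: votes.count(c) for c in set(votes)}
        best = max(counts.values(), default=0)
        winners = [c for c, n in counts.items() if n == best]
        # Keep the current label when there are no neighbors or on a tie.
        new_labels[v] = winners[0] if len(winners) == 1 else labels[v]
    return new_labels

edges = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a", "b", "c"}}
start = {"a": "x", "b": "x", "c": "x", "d": "y"}
final, used = iterate_relational(edges, start, majority_vote, max_iterations=5)
```

Because `classify` receives a complete labeling and returns a new one, stage $t + 1$ never sees labels that were updated within the same stage.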

7.2.3 Multiclass classification

Since basic support vector machines are applicable only to binary classification problems, our architecture as described above would have the same limitation. To overcome it, we allow both classifier modules to use a combination of binary SVM classifiers with the one-versus-all methodology. In this setting, we construct $|C|$ binary classifiers, each of them treating the nodes of one class as positive samples and the nodes of all other classes as negative samples. Then, we assign each test sample to the class whose classifier classifies it with the greatest margin. Individual binary classifiers are built with the configuration explained in Sections 7.2.1 and 7.2.2.
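The one-versus-all decision rule can be illustrated with the short sketch below. The function name `one_versus_all_predict`, the class names, and the linear scoring functions are all hypothetical; the scorers merely stand in for the signed margins of trained binary SVMs.

```python
def one_versus_all_predict(sample, decision_functions):
    """Return the class whose binary classifier gives the largest margin.

    decision_functions: dict class -> callable(sample) -> signed margin,
    positive when the sample looks like a member of that class.
    """
    return max(decision_functions, key=lambda c: decision_functions[c](sample))

# Hypothetical linear scorers standing in for trained binary SVMs.
scorers = {
    "hub": lambda s: s[0] - 2.0,
    "leaf": lambda s: 1.0 - s[0],
    "bridge": lambda s: s[1] - 0.5,
}
prediction = one_versus_all_predict([3.0, 0.2], scorers)
```

Taking the largest margin resolves the cases in which several binary classifiers, or none of them, claim the sample as positive.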