2.3 Alternative clustering and classification methods

2.3.5 Transversal methods

A time series X = X1, . . . , XT can also be represented as a single point in a multi-dimensional space, where each dimension d corresponds to the observation of the series at a given time t. The number of dimensions therefore equals the number of measurements in time, D = T. With this representation we lose the notion of sequence, which causes numerous problems, especially for the interpretation of the clusters. For example, groups can often be formed simply because their members are in similar states in one or more disjoint periods, which makes little sense when interpreting the partitions.

It is therefore important to keep in mind that time series (and longitudinal data in general) cannot be treated as simple points in a T-dimensional space, because in reality the T distinct dimensions are often far from independent. If the ordering of the measurements is not taken into account, the time dependence between the observations is lost.
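The loss of ordering can be illustrated with a minimal sketch. The two series and the reordering below are made-up values chosen only for illustration: applying the same arbitrary permutation of the time points to both series leaves their Euclidean distance unchanged, so a method based on that distance cannot see the time order.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equally long sequences seen as points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical series observed at T = 5 time points.
series_a = [1.0, 2.0, 3.0, 4.0, 5.0]
series_b = [1.5, 2.5, 2.0, 4.5, 6.0]

# Apply the same arbitrary reordering of the time points to both series.
order = list(range(len(series_a)))
random.Random(0).shuffle(order)
shuffled_a = [series_a[t] for t in order]
shuffled_b = [series_b[t] for t in order]

d_original = euclidean(series_a, series_b)
d_shuffled = euclidean(shuffled_a, shuffled_b)

# The two distances agree up to floating-point rounding.
print(d_original, d_shuffled)
```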

However, it is not uncommon for these methods to be used on time-dependent data. Some researchers are ready to accept this loss of information for the sake of simplicity and for the possibility of using more advanced transversal methods. This is most often done with discrete or categorical data, but a non-negligible number of continuous longitudinal data clustering problems have also been treated with transversal tools. It is therefore useful to briefly review some of the frequently used transversal models.

Some of these methods make use of a given metric indicating a distance between sequences. After the distance between every pair of sequences is computed, a distance matrix is constructed and used to create clusters. Multidimensional clustering methods are often applied directly to the sequences, neglecting the time order. Depending on the approach, the distance can be measured between static points in a multivariate space, rather than between ordered time-varying sequences. One issue with such a procedure is that clusters may be formed solely on the basis of proximity at a single time point (for instance at t=5), which can be considered the “most discriminant dimension” of the data. This procedure can be criticised because it is based on distance measures that are difficult to interpret for time-dependent data, since they neglect the evolution and dependence in time. Some specific procedures are described hereafter.

Optimal matching

One of the most widely used algorithms for measuring the distance between data sequences, especially in the social sciences, is Optimal Matching (OM). We have already mentioned the main drawbacks of this method, but it is worth adding that it does not belong to the model-based clustering procedures, since it is a data-driven approach. Therefore, no inference on the results seems feasible. It is also more appropriate for discrete and nominal data, although applications to discretized continuous data are also frequent. Once the dissimilarities between data sequences are found, classical clustering tools may be applied.
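As a rough illustration, OM can be viewed as an edit distance between sequences of states, computed by dynamic programming. The sketch below assumes a constant insertion/deletion cost and a constant substitution cost; the function name, the cost values and the example sequences are illustrative choices, not taken from a particular study.

```python
def optimal_matching(seq_a, seq_b, indel=1.0, sub=2.0):
    """Edit-distance sketch of OM with constant costs (assumed values)."""
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] = cheapest way to turn seq_a[:i] into seq_b[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel  # delete the first i states
    for j in range(1, m + 1):
        dist[0][j] = j * indel  # insert the first j states
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub
            dist[i][j] = min(
                dist[i - 1][j] + indel,     # deletion
                dist[i][j - 1] + indel,     # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[n][m]


# Hypothetical categorical sequences (one letter per period).
print(optimal_matching("AAB", "ABB"))
```

The resulting pairwise dissimilarities would then be collected in a distance matrix and passed to a classical clustering tool, as described above.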

Multidimensional scaling

Multidimensional Scaling (MDS) is an alternative method generally used on transversal data. It is a form of non-linear dimensionality reduction, frequently used as a data visualization technique: it takes a distance matrix as input and returns a coordinate matrix, obtained by eigenvalue decomposition, that minimizes a loss function.
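For reference, the eigenvalue-based computation can be written out for the classical (Torgerson) variant of MDS. The notation below is introduced here for illustration and is not taken from the rest of the document: n is the number of objects, D^{(2)} the n × n matrix of squared distances, and m the number of output dimensions.

$$
J = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad
B = -\frac{1}{2}\, J\, D^{(2)} J, \qquad
B = V \Lambda V^{\top}, \qquad
\hat{X} = V_m \Lambda_m^{1/2}
$$

Here V_m contains the eigenvectors associated with the m largest (positive) eigenvalues collected in the diagonal matrix \Lambda_m, and the rows of \hat{X} are the returned coordinates. In this variant the loss being minimized is the so-called strain, \| B - \hat{X}\hat{X}^{\top} \|_F^2.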

K-means

One of the simplest and most popular transversal clustering methods is the K-means algorithm. It separates data observations into k different clusters, where each observation is associated with the cluster with the closest mean µj.

Even though it is mostly associated with means, what the k-means algorithm performs is essentially variance minimization. The function F (representing the sum of squared errors within the clusters) is minimised:

$$
F = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left\| x_i^{(j)} - \mu_j \right\|^2
$$

where n_j is the number of data points in the j-th cluster and x_i^{(j)} are the observations assigned to it. K-means (like the EM algorithm) iterates between two steps until convergence. After randomly attributing each observation to a class, one first computes the mean of each class (M-like step) and then re-assigns the observations to the nearest cluster by minimizing the distance to the cluster means (E-like step). The algorithm stops when no observation moves from one class to another (see Bishop [23]).
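The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the random seed, the re-seeding of empty clusters and the toy data are all choices made here.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Alternate mean computation and re-assignment until labels stop changing."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial attribution of each observation to a class.
    labels = [rng.randrange(k) for _ in points]
    means = []
    for _ in range(max_iter):
        # Compute the mean of each class.
        means = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # Assumption: an empty class is re-seeded with a random point.
                members = [rng.choice(points)]
            means.append(
                [sum(p[d] for p in members) / len(members) for d in range(dim)]
            )
        # Re-assign every observation to the class with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no observation moved: converged
        labels = new_labels
    return labels, means


def within_cluster_sse(points, labels, means):
    """Sum of squared errors within the clusters for a given partition."""
    return sum(squared_distance(p, means[lab]) for p, lab in zip(points, labels))


# Toy one-dimensional data with two well-separated groups.
data = [[0.0], [1.0], [2.0], [100.0], [101.0], [102.0]]
labels, means = kmeans(data, k=2)
sse = within_cluster_sse(data, labels, means)
print(labels, means, sse)
```

When each row of the data is a whole time series, this code treats it exactly as described in this section: as a single point whose coordinates could be permuted without changing the result.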

Although k-means is basically a transversal data-driven method, a version for longitudinal data clustering (called KmL) was developed by Genolini and Falissard [57] in 2010. However, this method seems to ignore the time sequencing of the observations, since it computes the Euclidean or Manhattan distance between sequences.

Thus we return to the situation in which a sequence is viewed as a single point in a multidimensional space, with all its related issues. Moreover, this approach can also be criticised because it uses the Gower adjustment of the Euclidean distance, and this adjustment simply ignores the time periods in which missing observations occur:

$$
D_{Gower}(x_{it}, x_{jt}) = \sqrt{ \frac{1}{\sum_{t} \omega_{ijt}} \sum_{t=1}^{T} (x_{it} - x_{jt})^2 \, \omega_{ijt} }
$$

where ω_ijt = 0 if one of the sequences (x_i or x_j) is unobserved at time t, and 1 otherwise. This leads to a clustering based on only a part of the available data.
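A minimal sketch of this adjusted distance is given below, assuming that missing observations are encoded as None; the function name and the example values are illustrative only. The second call shows that the distance does not change whatever value is observed at a time point where the other sequence is missing.

```python
import math


def gower_distance(x_i, x_j):
    """Gower-adjusted Euclidean distance; None marks a missing observation."""
    # Keep only the time points where both sequences are observed (weight 1).
    terms = [
        (a - b) ** 2
        for a, b in zip(x_i, x_j)
        if a is not None and b is not None
    ]
    if not terms:
        return math.nan  # no common time point: the distance is undefined
    return math.sqrt(sum(terms) / len(terms))


x_i = [1.0, 2.0, None, 4.0]
x_j = [1.0, None, 3.0, 8.0]
d_ij = gower_distance(x_i, x_j)

# Changing x_i at t = 2, where x_j is unobserved, has no effect.
x_i_changed = [1.0, 50.0, None, 4.0]
d_changed = gower_distance(x_i_changed, x_j)
print(d_ij, d_changed)
```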

Neural networks and self organizing maps

Neural Networks, inspired by the central nervous system in biology, are statistical learning models that are mainly used in machine learning, particularly in supervised learning, where some output usually needs to be observed before the model can be used. They are therefore better suited to classification than to clustering. Some related models, such as Self Organizing Maps (SOM), invented by Kohonen [81], and Adaptive Resonance Theory, were however developed to perform clustering. The latter has also been applied to the clustering of time-varying data (Tomida, Hanai, Honda & Kobayashi [157]).

SOM is a tool for the visualization of high-dimensional data that attempts to “convert complex non-linear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display” (Kohonen [81]). It is therefore a dimensionality reduction tool (most often onto two dimensions), and it is related to non-parametric regression models.

A distance measure is used by SOM in order to select the best node (best matching unit). This is again an important part of the procedure, which makes it well adapted to multivariate data but is also a drawback when dealing with time-varying sequences. Nevertheless, SOM has been applied to time-varying data clustering, for example by Cherif, Cardot & Boné [35] and Sarlin [133]. The conclusion of these authors was that the results depend on the seasonality and on the characteristics of the series.
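The role of the distance measure can be made concrete with a minimal sketch of one SOM training step. The Gaussian neighbourhood function, the learning rate, the radius, the one-dimensional grid and all numerical values are assumptions made for illustration; practical implementations also decrease the learning rate and radius over time.

```python
import math


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (Euclidean)."""
    return min(
        range(len(nodes)),
        key=lambda k: sum((w - v) ** 2 for w, v in zip(nodes[k], x)),
    )


def som_update(nodes, grid, x, learning_rate=0.5, radius=1.0):
    """Move the best matching unit and its grid neighbours towards x, in place."""
    bmu = best_matching_unit(nodes, x)
    for k, (weights, position) in enumerate(zip(nodes, grid)):
        # Squared distance on the low-dimensional grid, not in the data space.
        grid_d2 = sum((a - b) ** 2 for a, b in zip(position, grid[bmu]))
        influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
        nodes[k] = [
            w + learning_rate * influence * (v - w) for w, v in zip(weights, x)
        ]
    return bmu


# Three hypothetical nodes on a one-dimensional grid; each weight vector has
# one coordinate per time point, so the series is again treated as a point.
nodes = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
grid = [(0,), (1,), (2,)]
x = [1.2, 0.9, 1.1]

bmu = som_update(nodes, grid, x)
print(bmu, nodes[bmu])
```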

In document Latent Markovian Modelling and Clustering for Continuous Data Sequences (Page 40-43)