3 Machine Learning Interfaces - Proceedings of the KI 2009 Workshop on Complex Cognition

The translation of high-dimensional subsymbolic data into symbols is a task that is well known in data mining and machine learning under the terms classification, clustering and dimensionality reduction. In the past, much work related to symbol grounding concentrated exclusively on neural networks [6, 7, 2, 18], perhaps due to a historical affinity to connectionist models. To overcome this restriction, this section shows the relation between symbol grounding and machine learning: assigning unknown objects to known concepts is known as classification, grouping objects is known as clustering, and finding low-dimensional representations for high-dimensional data is known as dimensionality reduction.

3.1 How are Machine Learning Algorithms Related to Symbol Grounding?

The problem of iconization, discrimination and identification formulated by Harnad [6] is closely related to the question of how to map high-dimensional data to classes or clusters. Classification, clustering and dimensionality reduction are similar in this context: each performs a mapping from a high-dimensional data space D to a low-dimensional set of symbols S, which may consist of classes, clusters, or points on a low-dimensional manifold. The three machine learning tasks implement this reduction as follows.

Classification algorithms deliver a subsymbolic-to-symbolic mapping I : D → S with regard to explicitly labeled data samples in a supervised way. In a training phase the mapping I is learned by reducing the classification error. The learned interface is then used to classify unknown data, i.e. to assign symbols to classes of similar high-dimensional input data.

Clustering algorithms deliver the subsymbolic-to-symbolic mapping from D to a set of clusters S with regard to the intrinsic structure of the subsymbolic data and the properties of the algorithm in an unsupervised way.

Frequently, the dimensionality of observed data is much higher than its intrinsic dimensionality. A 3D object, for example, has an intrinsic dimensionality of 3, but on a digital image the dimensionality of the data vector is much higher, depending on the resolution of the picture. Last, dimension reduction methods have a task similar to that of classification and clustering. For high-dimensional data, low-dimensional representations have to be found, e.g. a mapping from subsymbolic to symbolic data I : D → S, or a mapping ℝ^m → ℝ^n with m > n. I come to the conclusion that classification, clustering and dimensionality reduction methods from machine learning are eligible algorithms for the interface I from subsymbolic to symbolic representations.
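To make the supervised variant of the interface I : D → S concrete, the following is a minimal sketch that uses a nearest-prototype classifier. The choice of classifier, the function names `train_interface` and `interface`, and the toy data are assumptions made for illustration only; any classification algorithm could take this role.

```python
import math
from collections import defaultdict


def train_interface(samples, labels):
    """Training phase: learn one prototype (mean vector) per symbol."""
    grouped = defaultdict(list)
    for x, s in zip(samples, labels):
        grouped[s].append(x)
    # The prototype of a symbol is the component-wise mean of its samples.
    return {s: [sum(col) / len(xs) for col in zip(*xs)]
            for s, xs in grouped.items()}


def interface(prototypes, x):
    """Map a data vector x in D to the symbol in S with the nearest prototype."""
    return min(prototypes, key=lambda s: math.dist(prototypes[s], x))


# Toy data (hypothetical): two symbols "A" and "B" in a 3-dimensional data space.
samples = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [1.0, 0.9, 1.1], [0.9, 1.0, 1.0]]
labels = ["A", "A", "B", "B"]

prototypes = train_interface(samples, labels)
symbol = interface(prototypes, [0.8, 1.1, 0.9])  # classify an unknown sample
```

The learned prototypes play the role of the trained mapping: an unseen vector receives the symbol of the class whose training samples it most resembles.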

3.2 Examples for Related Machine Learning Algorithms

In recent years kernel methods have become quite popular in machine learning and data mining. It is beyond the scope of this work to review these methods. For a detailed introduction I refer to textbooks like Bishop's [1] or Hastie's [10]. Here, I only comment on the properties of three possible interface algorithms with regard to the interface problem.

A simple but successful clustering technique is k-means clustering [1]. K-means needs one essential parameter: the number of clusters k, which we denote as the number of symbols. Each cluster C_j ∈ S with 1 ≤ j ≤ k can be described by its cluster center c_j, the barycenter of the cluster elements. This concept shows that clustering and cognition share similar ideas: elements that lie close together in the data space should be represented by the same center, whilst far-out accumulations of elements belong to different centers. This principle is similar to the idea of semantic distances in mental models.

K-means works as follows. At the beginning it randomly generates k initial cluster centers c_j. In order to minimize the sum of distances between the data elements and their cluster centers, k-means works iteratively in two steps. In the first step each data element x_i is assigned to the cluster C_j whose center is at minimal distance. In the next step k-means computes the new cluster centers c_j as the average of the data elements that belong to C_j. K-means then continues with the assignment step, and so forth. The algorithm ends if the cluster assignment does not change or if the change falls below a threshold value. The process converges, but may get stuck in local optima.

K-means requires the number of clusters to be specified. If we use k-means as the interface algorithm, we can treat k as a free parameter that can be optimized with regard to f_A. The optimal number of symbols to solve cognitive tasks may frequently not be known in advance. Examples are the number of states in reinforcement learning scenarios or the number of words in language learning scenarios. Other clustering algorithms may also be applied, e.g. density-based approaches like DBSCAN, which operate on the distances between the data samples [4].
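The two-step k-means iteration can be sketched as follows. This is a minimal illustration and not a tuned implementation: drawing the initial centers from the data, the Euclidean distance, the `max_iter` cap and the toy data are assumptions that the text does not prescribe.

```python
import math
import random


def kmeans(data, k, max_iter=100, seed=0):
    """Return (centers, assignment) for the given number of symbols k."""
    rng = random.Random(seed)
    # Assumption: the initial centers are k distinct, randomly drawn data elements.
    centers = [list(x) for x in rng.sample(data, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each element x_i to the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(x, centers[j])) for x in data
        ]
        if new_assignment == assignment:
            break  # the cluster assignment no longer changes
        assignment = new_assignment
        # Step 2: recompute each center c_j as the average of its elements.
        for j in range(k):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Toy data (hypothetical): two well-separated groups in a 2-dimensional space.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, assignment = kmeans(data, k=2)
```

Used as an interface, the index of the nearest center is the symbol assigned to a data element, and k is the free parameter that an outer optimization loop could vary.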

In comparison to clustering algorithms, most dimensionality reduction algorithms maintain the structure of the data space, e.g. data samples that are neighbors in data space are also neighbors on a low-dimensional manifold. A recommended example is the self-organizing map by Kohonen [11]. Its number of neurons and its learning parameters are eligible free parameters for optimization. In each generation the self-organizing map updates the weights w of a winner neuron and of its neighborhood with the help of a learning parameter η and a neighborhood parameter h, so that they are pulled in the direction of the data sample x. The algorithm leads to a mapping from the feature space D to the map. The mapping maintains the topology of the neighborhood: data samples that are close in the high-dimensional space lie close together on the map. Whether this property is important for the interface depends on the interpretation of the symbols.
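A single update of a self-organizing map can be sketched as follows for a one-dimensional chain of neurons. The Gaussian form of the neighborhood function h, its width `sigma`, and the function name `som_update` are assumptions for illustration; Kohonen's map admits other neighborhood functions and map layouts.

```python
import math


def som_update(weights, x, eta=0.5, sigma=1.0):
    """Pull the winner neuron and its neighbors toward the data sample x.

    weights: one weight vector per neuron, ordered along a 1-D map.
    Returns the index of the winner neuron; weights are updated in place.
    """
    # The winner is the neuron whose weight vector is closest to x.
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    for i in range(len(weights)):
        # Assumed Gaussian neighborhood: 1 for the winner, decaying with
        # the distance between neuron i and the winner on the map.
        h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
        weights[i] = [w + eta * h * (xi - w) for w, xi in zip(weights[i], x)]
    return winner


# Toy map (hypothetical): three neurons in a 2-dimensional feature space.
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
winner = som_update(weights, [1.0, 0.8])
```

Repeating this update over many data samples, typically with decreasing η and neighborhood width, yields the topology-preserving mapping; the index of the winner neuron then serves as the symbol for a data sample.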

3.3 Optimization Algorithms

When the optimization objectives are clearly specified and a feedback f_A of a given interface I is available, an adequate optimization algorithm has to be chosen. If no more information is available than the feedback, i.e. neither explicitly given functions nor derivatives, we recommend applying evolutionary algorithms. A comprehensive survey of evolutionary algorithms is given by Eiben [3]. Evolutionary computation comprises stochastic methods for global optimization, i.e. for optimization problems with multiple local optima. These methods are biologically inspired and imitate principles that can be observed in natural evolution, such as mutation, crossover and selection. If the optimization problem is not expected to suffer from multiple local optima, deterministic direct search methods like Powell's conjugate direction method [14] or similar algorithms for convex optimization may be applied.
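As an illustration of optimization from feedback alone, the following sketches a simple (μ + λ) evolution strategy with crossover, mutation and selection. It is one possible evolutionary algorithm and not a method prescribed by the text; the population sizes, the fixed mutation strength `sigma`, the convention that lower feedback is better, and the stand-in feedback function `sphere` are all assumptions.

```python
import random


def evolve(feedback, dim, mu=5, lam=20, sigma=0.3, generations=100, seed=0):
    """Minimize a black-box feedback function over real-valued parameter vectors."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            p1, p2 = rng.sample(population, 2)
            # Crossover: intermediate recombination of two parents.
            child = [(a + b) / 2 for a, b in zip(p1, p2)]
            # Mutation: add Gaussian noise to every component.
            child = [c + rng.gauss(0, sigma) for c in child]
            offspring.append(child)
        # Plus selection: the best mu of parents and offspring survive.
        population = sorted(population + offspring, key=feedback)[:mu]
    return population[0]


def sphere(params):
    """Stand-in feedback (hypothetical): squared distance to the origin."""
    return sum(p * p for p in params)


best = evolve(sphere, dim=2)
```

In the setting of this section, the feedback function would instead train an interface with the candidate parameters (for example the number of clusters or neurons, after rounding to an integer) and return the resulting f_A.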
