
5.5 The clustering visualisation

5.5.5 The self organizing maps visualisation

The self-organising maps visualisation demonstrates how self-organising maps (SOMs) cluster genes. A SOM clusters a set of genes by stretching a grid of units (e.g. a line, an array, a cube, or a parallelepiped) to fit the gene set. The grid is fitted in such a way that the relationships between the units correspond to the relationships between the genes. Thus, the grid becomes a map of the set of genes. To achieve this, the units of the grid are organised in neighbourhoods, and when a unit is moved, it pulls the neighbouring units with it.

The BioTeach implementation of SOMs allows the users to choose between a line grid and an array grid. The line grid is composed of one third as many units as there are genes in the gene set. Each unit in the line grid has one neighbour in front and one behind, except the first unit, which only has a neighbour in front, and the last unit, which only has a neighbour behind. The array grid is initially also composed of one third as many units as there are genes in the gene set, but the number is adjusted so that the array always forms a full rectangle or square. Each interior unit in the array grid has four neighbours: the units above, below, to the left, and to the right. A corner unit has only two neighbours: the unit below or above, and the unit to the left or to the right. A unit along a horizontal edge has three neighbours: the unit above or below, and the units to the left and to the right. A unit along a vertical edge also has three neighbours: the units above and below, and the unit to the left or to the right.
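The neighbourhood rules for the two grid types can be sketched in a few lines of Python. This is only an illustration and not the BioTeach source code; the function names `line_neighbours` and `array_neighbours` are hypothetical, and the grid dimensions are passed in directly because the text does not say how the number of units is adjusted to a full rectangle.

```python
def line_neighbours(n_units):
    """Map each unit index on a line grid to the indices of its neighbours."""
    return {
        i: [j for j in (i - 1, i + 1) if 0 <= j < n_units]
        for i in range(n_units)
    }


def array_neighbours(rows, cols):
    """Map each (row, col) unit on an array grid to its neighbours
    above, below, to the left, and to the right (where they exist)."""
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbours[(r, c)] = [
                (rr, cc) for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbours


line = line_neighbours(5)
grid = array_neighbours(3, 4)
print(line[0], line[2], line[4])   # first, middle, and last unit of the line
print(len(grid[(0, 0)]), len(grid[(0, 1)]), len(grid[(1, 1)]))  # corner, edge, interior
```

For a line of five units, the first and last unit get one neighbour each and the middle unit gets two. For a 3 x 4 array, the corner unit gets two neighbours, the edge unit three, and the interior unit four.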

During the stretching process, each gene moves its closest unit 5% of the distance between the gene and that unit, while the neighbours of the unit are moved 2% of that distance.
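A single stretching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the BioTeach implementation: genes and units are taken to be 2D points, the closest unit is moved towards the gene, and the neighbours are assumed to be shifted along the same direction as the closest unit by 2% of the gene-to-unit distance. The names `apply_gene`, `closest_unit`, `WINNER_RATE` and `NEIGHBOUR_RATE` are hypothetical.

```python
import math

WINNER_RATE = 0.05     # the closest unit moves 5% of the gene-unit distance
NEIGHBOUR_RATE = 0.02  # its neighbours move 2% of that same distance (assumed direction)


def closest_unit(gene, units):
    """Return the index of the unit nearest to the gene (Euclidean distance)."""
    return min(range(len(units)), key=lambda i: math.dist(gene, units[i]))


def apply_gene(gene, units, neighbours):
    """Move the unit closest to `gene`, and that unit's neighbours, in place."""
    winner = closest_unit(gene, units)
    dx = gene[0] - units[winner][0]
    dy = gene[1] - units[winner][1]
    units[winner] = (units[winner][0] + WINNER_RATE * dx,
                     units[winner][1] + WINNER_RATE * dy)
    for n in neighbours[winner]:
        units[n] = (units[n][0] + NEIGHBOUR_RATE * dx,
                    units[n][1] + NEIGHBOUR_RATE * dy)
    return winner


# Three units on a line; the middle unit is closest to the gene.
units = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
winner = apply_gene((1.0, 10.0), units, neighbours)
print(winner, units)
```

In this example the gene lies 10 units straight above the middle unit, so the middle unit rises by about 0.5 and its two neighbours by about 0.2.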

A visualisation of how SOMs work is started by selecting SOM and either the line option or the grid option (i.e. the array grid) from the lists in the menu, and then pressing the automated clustering button, the stepwise clustering button, or the epochwise clustering button. The epochwise clustering presents the visualisation in epochs, that is, a full iteration over the set of genes is performed before any of the units are moved. In comparison, the stepwise clustering moves a unit at each step in the iteration. It is not recommended to select the stepwise clustering, because the visualisation is implemented to do 3000 iterations over the set of genes (with a set of 50 genes, for example, the next-button has to be pressed 50 × 3000 = 150,000 times to complete the visualisation). Thus, the epochwise visualisation is the better alternative if the users want to step through the visualisation.
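The difference between the two stepping modes can be illustrated with two Python generators that pause (yield) at different granularities. This is a sketch only: the function names are hypothetical, the actual unit movement is left out, and a pause is assumed to correspond to one press of the next-button.

```python
def stepwise(genes, epochs):
    """Pause after every gene presentation, i.e. after every unit move."""
    for epoch in range(epochs):
        for index, _gene in enumerate(genes):
            # A real visualisation would move the closest unit here.
            yield epoch, index


def epochwise(genes, epochs):
    """Pause only after a full iteration over the gene set."""
    for epoch in range(epochs):
        for _gene in genes:
            pass  # every gene is processed before the display pauses
        yield epoch


genes = ["g1", "g2", "g3", "g4"]
step_presses = sum(1 for _ in stepwise(genes, epochs=3))
epoch_presses = sum(1 for _ in epochwise(genes, epochs=3))
print(step_presses, epoch_presses)  # 12 3
```

With four genes and three iterations, stepping through takes 12 pauses in stepwise mode but only 3 in epochwise mode. The stepwise count grows with the product of the number of genes and the number of iterations.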

The visualisation of both the line grid variant and the array grid variant starts with the units of the grid distributed randomly around the origin of the coordinate system (fig. 5.28).

Fig. 5.28: The beginning of a SOM visualisation in which the units of the array grid have been distributed around the origin of the coordinate system.

The visualisation proceeds to iterate over the set of genes and stretch the grid to gradually fit the set of genes (fig. 5.29).


After 3000 iterations over the set of genes, the visualisation is completed and the grid has been fitted to the set of genes (fig. 5.30).
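The whole process for the line grid, from the random start around the origin to the fitted grid after 3000 iterations, can be sketched as below. This is a simplified illustration under several assumptions and not the BioTeach code: the name `fit_line_som` is hypothetical, the number of units is rounded down to one third of the genes (with a minimum of two), the start positions are drawn from a small square around the origin, and neighbours are shifted in the same direction as the closest unit.

```python
import math
import random


def fit_line_som(genes, iterations=3000, seed=0):
    """Fit a line grid to 2D genes and return the final unit positions."""
    rng = random.Random(seed)
    n_units = max(2, len(genes) // 3)  # assumed rounding of "one third"
    # Units start scattered randomly around the origin (assumed spread).
    units = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
             for _ in range(n_units)]
    for _ in range(iterations):
        for gx, gy in genes:
            # Find the unit closest to the current gene.
            w = min(range(n_units),
                    key=lambda i: math.dist((gx, gy), units[i]))
            dx, dy = gx - units[w][0], gy - units[w][1]
            # Closest unit moves 5%, its line neighbours 2%.
            for i, rate in ((w - 1, 0.02), (w, 0.05), (w + 1, 0.02)):
                if 0 <= i < n_units:
                    units[i] = (units[i][0] + rate * dx,
                                units[i][1] + rate * dy)
    return units


# Twelve genes in four clusters, loosely modelled on the predefined example.
clusters = [(5, 5), (-5, 5), (-5, -5), (5, -5)]
genes = [(cx + ox, cy + oy)
         for cx, cy in clusters
         for ox, oy in ((0.0, 0.0), (0.4, -0.3), (-0.3, 0.4))]
for unit in fit_line_som(genes):
    print(unit)
```

The gene coordinates in the usage example are made up for illustration; only the four-cluster layout is taken from the text.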

SOMs work best on large datasets (e.g. thousands of genes), and since the coordinate system of the BioTeach system is too small to accommodate such a number of genes, it is difficult to provide an exact visualisation of how SOMs are constructed. The best results are produced by a visualisation that uses the predefined example with the genes distributed in four clusters. Although the visualisation is not perfect, it should be good enough to enable the users to understand how a SOM is constructed.

Chapter 6 Hypothetical exercise examples

6.1 Introduction

Although web technology offers possibilities for designing exercises that comment on the answers that are given, there are obvious limitations to such exercises. Firstly, as it is difficult to design computer programs that are able to assess the quality of answers based on creative thinking, such exercises should have a definite answer. Exercises that require discussion and creative thinking could, of course, be included in a web-based learning environment, but the answers to such exercises should be forwarded to the lecturer, or to his or her assistants, for proper evaluation instead of being evaluated by a computer program. Secondly, there should be a limited number of possible incorrect answers to such exercises. The reason is more or less the same as with exercises that have an indefinite answer: it is difficult to anticipate all possible answers, and therefore difficult to provide appropriate comments. One approach that meets these limitations is multiple choice exercises. The Needleman-Wunsch exercise discussed in the previous chapter is another possible approach, and a third possible approach is the Sourcer’s Apprentice approach discussed in chapter 2.
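The two limitations, a definite answer and a limited set of anticipated incorrect answers, map naturally onto a small data structure. The following Python sketch is purely hypothetical (the class `MultipleChoiceExercise` and its fields are not part of BioTeach); it only illustrates how a prepared comment can be attached to every possible choice.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceExercise:
    question: str
    options: dict   # option key -> option text
    correct: str    # key of the single definite answer
    comments: dict  # option key -> prepared comment for that choice

    def evaluate(self, choice):
        """Return (is_correct, comment) for the chosen option."""
        if choice not in self.options:
            raise ValueError(f"unknown option: {choice!r}")
        return choice == self.correct, self.comments[choice]


exercise = MultipleChoiceExercise(
    question="How many neighbours does a corner unit in an array grid have?",
    options={"a": "two", "b": "three", "c": "four"},
    correct="a",
    comments={
        "a": "Correct: one unit above or below, and one to the left or right.",
        "b": "Not quite: three neighbours is the case for units along an edge.",
        "c": "Not quite: only interior units have four neighbours.",
    },
)
print(exercise.evaluate("b"))
```

Because every option has its own comment, the program never has to interpret a free-text answer; an unanticipated choice is rejected rather than evaluated.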


This chapter discusses how multiple choice exercises and the Sourcer’s Apprentice approach could be used in a web-based learning environment for bioinformatics.

In document Developing an interactive webbased learning. environment for bioinformatics. Master thesis. Daniel Løkken Rustad UNIVERSITY OF OSLO (Page 85-90)