EXPLORATORY GEOSPATIAL DATA ANALYSIS USING SELF-ORGANIZING MAPS

Case Study of Portuguese Mainland Regions

Fernando C. LOURENÇO, Victor S. LOBO and Fernando L. BAÇÃO

This paper describes the application of the Self-Organizing Map (SOM) to the visual exploration of physical geography data. The main justifications for applying the SOM to this problem are that it stresses local factors and topological order. Public domain thematic maps from the Portuguese Environment Institute are used. An adequate geospatial unfolding of the SOM is presumed to assist a better representation of geographic phenomena. Some approaches to achieve this objective are put forward and tested, such as weighting attributes and samples, and a SOM variant named Geo-SOM. Visualization methods to address the information extraction issue are also suggested. Notwithstanding the vagueness of geographic phenomena, experimental results reveal major patterns that are consistent with reference maps of Portuguese mainland regions.

KEYWORDS

Geographical Information Systems, Exploratory Data Analysis, Self-Organizing Maps.

INTRODUCTION

The rapidly increasing volume, dimensionality and complexity of digital geographic data overwhelm conventional analysis techniques and methods [11,14,24,25]. Most traditional statistical methods are not adequate here because of their high computational burden or their very restrictive assumptions about the data [7,10,25]. They are generally unsuitable for geographic data analysis, largely due to the special characteristics of spatial data [e.g. 22,24]. Therefore new approaches are needed to transform data into information and, ultimately, into knowledge [11,24,25,26]. Data Mining (DM), Knowledge Discovery in Databases (KDD) and Artificial Intelligence (AI) bring important contributions to this issue [24]. Neural networks have, in recent years, proved to be an interesting and promising field [e.g. 26]. However, a shortcoming of neural networks is the difficulty of extracting the information embedded in them [e.g. 26].

As one of the first steps in the knowledge discovery process, Exploratory Data Analysis (EDA) addresses data not yet understood, classified, summarized or structured. Much like KDD, it aims to uncover interesting structures or patterns in large data sets and provide some means of explaining them [11]. Most of its techniques are graphical in nature [32,33]. Visualization of geographic data (geographic visualization or geo-visualization in computer systems) can leverage human experts’ capabilities for pattern recognition, in a rich environment for formulating hypotheses, theories, and explanations [e.g. 13,39]. Moreover, maps are an effective tool for presenting geospatial data because they provide an abstract graphical model of the world [e.g. 39].

In this study a type of neural network, the Self-Organizing Map (SOM), is used to perform data clustering and projection [20]. This allows the presentation of multidimensional data in a low-dimensional space, typically two-dimensional, assisting the visual exploration of data.

Several proposals and studies for the delineation of geographic regions on mainland Portugal have been undertaken [1,6,15,28]. Their objective is achieved, basically, through a clustering scheme that tries to find regions characterized by internal homogeneity while maximizing heterogeneity between regions [e.g. 22]. Nowadays, wide-ranging public domain data are available that can be used for this purpose [e.g. 17]. This poses both a challenge and an opportunity to take advantage of that asset in order to enhance the understanding of our country.

The aim of this study is to demonstrate the effectiveness of the SOM algorithm in the task of exploring geospatial data, in the search for interesting patterns in the physical geography of the Portuguese mainland, and, ultimately, to seek empirical evidence of, or relations with, the regions presented by several authors.

This study is focused on exploratory geospatial data analysis with SOM. It emphasizes what is special about geospatial data that needs consideration for SOM application, and methods to extract and present information from SOM output. Several approaches to address these issues are studied, proposed and evaluated.

SEARCHING FOR NATURAL REGIONS IN PHYSICAL GEOGRAPHY

The main goal of this study is related to regional geography, which is mainly concerned with the delineation of zones characterized by internal homogeneity (uniform regions) while maximizing heterogeneity between zones [e.g. 22]. The methodology follows the guidelines of the knowledge discovery process, which is stated by Fayyad, Piatetsky-Shapiro and Smyth [11] as “the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data”.

Data selection is based on a literature review of Portuguese geography [1,4,6,15,23,29] and is focused on Physical Geography. The themes used reflect, essentially, land relief and structure, geographic position, and atmospheric dynamics. Some examples of geographic divisions based on some or most of these factors are shown in Map 1 and Map 2 [6]. These maps will be used as references to evaluate the experimental results.

The source for the geospatial data set used in the experiments is the public domain “Atlas do Ambiente” from the Portuguese Environmental Agency [17]. Six themes were considered: elevation, hydrographic basins, lithology, precipitation, solar radiation, and temperature.

Map 2 - Regions of Lautensach (left) and Albuquerque (right) [6]

In order to produce the data set in the tabular format required by the SOM, the thematic maps were overlaid and then intersected spatially. Linear scaling (min-max transformation into the [0,1] interval) is used for data normalization.
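The min-max transformation mentioned above can be sketched as follows. This is only an illustrative sketch in Python (the study itself used Matlab); the function name and the handling of constant attributes are assumptions, not taken from the original work.

```python
def min_max_scale(values):
    """Linearly rescale a sequence of numbers into the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical elevation values in meters, for illustration only.
elevation_m = [10.0, 250.0, 1990.0]
scaled = min_max_scale(elevation_m)
```

The smallest value maps to 0, the largest to 1, and every other value keeps its relative position between them.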

SOM ALGORITHM AND ITS APPLICATION TO GEOSPATIAL DATA

The Self-Organizing Map (SOM) algorithm, also called the Kohonen network, the self-organizing feature map (SOFM) or the topological map, is an unsupervised neural network algorithm developed by the Finnish physicist Teuvo Kohonen [20]. It provides two useful operations in exploratory data analysis: clustering, reducing the amount of data into representative categories, and non-linear projection, aiding in the exploration of proximity relations in the data [18]. The SOM algorithm has some resemblance to vector quantization algorithms, like k-means, and is closely related to principal curves [20]. The great distinction from other vector quantization techniques is that the units are organized on a regular grid and that, along with the selected unit, its neighbors are also updated. As a result, the SOM performs an ordering of the units [20]. Topology preservation is accomplished through an ordered mapping, in such a way that nearby patterns in the input space are also mapped to nearby units in the output space.
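The neighbor update that distinguishes the SOM from plain vector quantization can be illustrated with one sequential training step. The sketch below is a simplified illustration and not the SOM Toolbox implementation: it assumes a rectangular grid given as (row, column) positions rather than the hexagonal lattice used in the experiments, and the names `som_step`, `alpha` and `radius` are chosen here for illustration.

```python
import math


def best_matching_unit(codebook, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(codebook[i], sample))


def som_step(codebook, grid_pos, sample, alpha, radius):
    """One sequential update: pull the BMU and its grid neighbors toward the sample."""
    bmu = best_matching_unit(codebook, sample)
    for i, pos in enumerate(grid_pos):
        # Gaussian neighborhood measured on the output grid, not in input space.
        grid_d2 = math.dist(pos, grid_pos[bmu]) ** 2
        h = math.exp(-grid_d2 / (2.0 * radius ** 2))
        codebook[i] = [w + alpha * h * (s - w)
                       for w, s in zip(codebook[i], sample)]
    return bmu


# Three units on a 1x3 grid with two-dimensional weight vectors.
codebook = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
grid_pos = [(0, 0), (0, 1), (0, 2)]
winner = som_step(codebook, grid_pos, [0.0, 0.0], alpha=0.5, radius=1.0)
```

Units close to the winner on the grid move more than distant ones, which is what gradually orders the map.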

Information extraction is usually a bottleneck in neural networks [26,34], and this is no exception in SOM. For this reason, further processing of the SOM is needed. Several methods can be applied, such as k-means clustering or U-mat, described further ahead.

In some experiments of this study, k-means clustering is applied to the SOM. The Davies-Bouldin index is calculated for each clustering and used as a guideline for the best partition [e.g. 30]. Performing this clustering over the SOM instead of the data directly is, for the task at hand, a computationally effective approach [36].
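For reference, the Davies-Bouldin index used to compare partitions can be computed as sketched below. This is a minimal sketch of the standard definition (mean over clusters of the worst ratio of within-cluster scatter to between-centroid distance), where lower values indicate a better partition; it assumes at least two clusters with distinct centroids. The function name is illustrative.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition; lower values mean better separation."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {k: [sum(col) / len(pts) for col in zip(*pts)]
                 for k, pts in clusters.items()}
    # Scatter: mean distance of the members of a cluster to its centroid.
    scatter = {k: sum(math.dist(p, centroids[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max((scatter[i] + scatter[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in keys if j != i)
    return total / len(keys)


# Hypothetical SOM unit vectors forming two compact, well-separated groups.
units = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = davies_bouldin(units, [0, 0, 1, 1])
bad = davies_bouldin(units, [0, 1, 0, 1])
```

In the setting described above, `points` would be the SOM unit vectors and `labels` the k-means assignment for a given number of clusters; the partition with the lowest index is taken as the guideline.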

The unified distance matrix methods, U-matrix (U-mat) [34], are a graphical representation of the SOM map structure. In the simplest of these methods, for each SOM unit the mean of the dissimilarity measures to its neighbors is computed. Plotting this data on a 2-D map, using a gray or color scheme, we can visualize a landscape with “walls” (mountains) and “valleys” (plains). The “walls” separate different classes, and patterns in the same valleys are similar [34]. They provide assistance in identifying cluster structure and borders. Using a color scheme we can envisage the regions because areas close to each other on the U-mat will have similar colors [e.g. 35].
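The simplest U-mat variant described above (mean dissimilarity of each unit to its neighbors) can be sketched as follows. The sketch assumes a rectangular grid with 4-connected neighbors, a row-major codebook, Euclidean distance as the dissimilarity measure, and a grid with at least two units; the experiments in this paper use a hexagonal lattice, where each interior unit has six neighbors instead.

```python
import math


def u_matrix(codebook, n_rows, n_cols):
    """Mean distance from each unit to its 4-connected grid neighbors."""
    umat = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            w = codebook[r * n_cols + c]
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    dists.append(math.dist(w, codebook[rr * n_cols + cc]))
            umat[r][c] = sum(dists) / len(dists)
    return umat


# A 1x4 map with two tight pairs of units separated by a large gap.
umat = u_matrix([[0.0], [0.1], [5.0], [5.1]], 1, 4)
```

In this toy map the two middle units get high values (a "wall"), while the two outer units get low values ("valleys").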

Geospatial data are characterized by information about position or location, connections with other entities and details of non-spatial characteristics [5]. Besides these, geospatial data have some other unique features when compared with most non-geographic data. Examples of such features are: spatial dependency, the tendency of attributes at some locations in space to be related [e.g. 24,38], spatial heterogeneity, the non-stationarity of most geographic phenomena [e.g. 24], and the complexity of geographic phenomena [e.g. 16].

Most non-geographic data can be easily represented with discrete points without losing important properties. However, this is not the case for most geographic phenomena where features as size, shape, boundaries, direction, and connectivity cannot be properly represented by simple points or lines [e.g. 24]. Furthermore, observations of geographic phenomena are intrinsically imperfect due to error, imprecision and vagueness [9,22,39]. There is a geospatial variation due to the fundamental aspect of nature that occurs at all scales, also perceived by its gradual, fuzzy or vague changes [5].

These features of geospatial data raise some problems in the application of the SOM algorithm. Several approaches have been presented and suggested to address them, such as data preprocessing (normalization, attribute weighting) and initialization [2,3]. We furthermore suggest sample weighting and geo-initialization.

The SOM, as a quantization algorithm, tries to adjust a discrete manifold to a discrete data distribution. In fact, the SOM algorithm is basically conceived to deal with 0-D geometry objects, that is, points. These are characterized by a position in space, but no length [e.g. 16]. However, the objects represented by the source data set have a 2-D geometry. This issue was tackled in the experiments by sampling the input space in order to obtain the discrete input data required by the SOM. Here a choice was made to represent each 2-D polygon by its centroid (a 0-D point). This artificial 0-D point reference provides a summary of the location of the object [e.g. 22]. A shortcoming of this procedure is that we lose valuable topological relations in the data such as size, shape, and neighborhood [24]. To compensate for this loss we propose to weight the data points with their corresponding area values (of the source polygons). Other options include a more even geospatial sample density, obtained, for instance, by generating random extra data samples, mostly in larger areas. These areas would, consequently, get a more balanced representation in the SOM. Though more accurate, this procedure would have the major drawback of greatly enlarging the data set and, consequently, introducing a heavy computational burden.
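One way to picture the proposed area weighting is as a weighted mean in a batch update, where each centroid sample contributes in proportion to the area of its source polygon. The sketch below is an assumption about how such a weighting could look, not a description of the SOM Toolbox internals; the grid is rectangular and the names `weighted_batch_update` and `areas` are illustrative.

```python
import math


def weighted_batch_update(codebook, grid_pos, samples, areas, radius):
    """One batch update in which each sample is weighted by its polygon area."""
    dim = len(codebook[0])
    # Best matching unit of every sample under the current codebook.
    bmus = [min(range(len(codebook)),
                key=lambda i: math.dist(codebook[i], x))
            for x in samples]
    new_codebook = []
    for i, pos in enumerate(grid_pos):
        num = [0.0] * dim
        den = 0.0
        for x, area, b in zip(samples, areas, bmus):
            # Gaussian neighborhood on the grid, scaled by the sample's area.
            h = math.exp(-math.dist(pos, grid_pos[b]) ** 2
                         / (2.0 * radius ** 2))
            for d in range(dim):
                num[d] += area * h * x[d]
            den += area * h
        # Keep the old vector if no sample influences this unit.
        new_codebook.append([n / den for n in num] if den > 0
                            else list(codebook[i]))
    return new_codebook


# One unit, two samples: the sample with three times the area pulls harder.
updated = weighted_batch_update([[0.5]], [(0, 0)], [[0.0], [1.0]],
                                [1.0, 3.0], radius=1.0)
```

With a single unit the update reduces to the area-weighted mean of the samples, which makes the effect of the weights easy to check by hand.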

Common approaches to the initialization of the SOM units use an initial state that is ordered or arbitrary. The former, generally, provides better and faster results [20]. Because in this study we are interested in a geographic “ordering”, we suggest the use of geo-initialization of the SOM. In this case, the initial state is achieved by spanning the SOM grid (SOM units) regularly along the geospatial rectangle defined by the data location attributes (coordinates x and y). The other components of the SOM units are drawn from the geographically nearest neighbor.
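The geo-initialization described above can be sketched as follows. The sketch assumes that each sample is stored as [x, y, attribute, ...], that the units are laid out row-major on a rectangular grid, and that "geographically nearest neighbor" means the sample with the smallest Euclidean distance in (x, y); the function name is illustrative.

```python
import math


def geo_initialize(samples, n_rows, n_cols):
    """Spread units regularly over the bounding rectangle of the sample locations."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    codebook = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Fractional position of the unit along each grid axis.
            fx = c / (n_cols - 1) if n_cols > 1 else 0.5
            fy = r / (n_rows - 1) if n_rows > 1 else 0.5
            ux = x_min + fx * (x_max - x_min)
            uy = y_min + fy * (y_max - y_min)
            # Non-location components are copied from the nearest sample.
            nearest = min(samples,
                          key=lambda s: math.dist((ux, uy), s[:2]))
            codebook.append([ux, uy] + list(nearest[2:]))
    return codebook


# Four hypothetical samples at the corners of a 10 x 20 rectangle.
samples = [[0.0, 0.0, 1.0], [10.0, 0.0, 2.0],
           [0.0, 20.0, 3.0], [10.0, 20.0, 4.0]]
initial_units = geo_initialize(samples, 2, 2)
```

Each unit starts at a regular grid position in the location plane, so the map begins geographically ordered before training.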

Another issue is how to visualize the unfolding of the SOM and extract information from it, both from a geospatial viewpoint. It would be interesting if we could envisage the SOM somehow as a conventional geographical map, thus making full use of the map metaphor [9,22,31,39]. The first problem is dealt with by projecting the SOM grid onto the location plane (input subspace). This assists in revealing the SOM geospatial unfolding, making it a useful tool to evaluate the SOM initialization and training. With it, a geographically twisted SOM grid is easily detected. Additionally, this paper shows that representing U-mat values on that grid also makes it useful in exploratory analysis.

Furthermore, to harness the potential of the map metaphor, the geographic map of U-mat is introduced here. This map may assist, at first glance, the exploratory geospatial analysis. For this purpose, a simple classification of areas is proposed based on their most similar or best matching unit (BMU). Therefore, we present several maps where each area is colored according to the U-mat value of its corresponding BMU. For a proper application of this method, it is assumed that the SOM has a smooth and comprehensive geospatial unfolding, and preserves most topological relations in the input data. Hence, it can be expected that those U-mat values will give a meaningful representation of geospatial relations in the input data.
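The classification behind the geographic map of U-mat can be sketched in a few lines: each area is assigned the U-mat value of its BMU, and that value then drives the map color. The sketch assumes the U-mat is stored as a flat list aligned with the codebook and uses Euclidean distance for the BMU search; the names are illustrative.

```python
import math


def umat_value_per_area(area_vectors, codebook, umat_flat):
    """Return, for each area, the U-mat value of its best matching unit."""
    values = []
    for vec in area_vectors:
        bmu = min(range(len(codebook)),
                  key=lambda i: math.dist(codebook[i], vec))
        values.append(umat_flat[bmu])
    return values


# Two units with hypothetical U-mat values, and three areas to classify.
colors = umat_value_per_area([[0.1], [0.9], [0.4]],
                             [[0.0], [1.0]],
                             [0.2, 0.8])
```

The resulting values can be joined back to the polygons in the GIS and classified into color steps.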

The following tips are very important in interpreting these maps:

• Exploratory analysis on the map must be done locally.

• Color (U-mat value) reveals how much a region is similar to surrounding areas.

Relations between specific surrounding areas (units) must be investigated through the SOM grid or U-mat. A global interpretation can be achieved projecting the SOM into a color space [19] or k-means clustering of SOM [36]. In these cases, a more extensive analysis may be done but at the expense of missing local and fine detailed patterns.

EXPERIMENTS WITH SOM USING GEOSPATIAL DATA

The simulations were done in Matlab v.6.1 using SOM Toolbox 5 [37]. The “batch train” method is used whenever possible, as it is much faster to calculate in Matlab than the normal sequential algorithm, and the results are typically just as good or even better [37]. Except where stated otherwise, the values and parameters used in the procedures are the default ones, e.g. sheet with hexagonal lattice, gaussian neighborhood. Practical advice was followed concerning the form of the array (hexagonal lattice with oblong form) and the training radius (wide at the beginning and decreasing with the number of epochs) [21]. Geo-visualization and further exploration were done using essentially ArcGIS 8.1 software.

In the coloring scheme used along most of the figures, high U-mat values are colored red, and low ones blue. Quantities are generally classified using five natural breaks (steps) in the GIS software. Applying other classification methods can produce slightly different maps and insights.

Preliminary Issues

Before proceeding to the main experimental results, a foreword is needed about some groundwork experiments where initialization methods and sample weighting were evaluated.

Linear initialization (using PCA) of the SOM generally produced a geographically tangled SOM grid (example shown in Figure 1, left). More satisfactory results were obtained with geo-initialization and, additionally, weighting samples with the area of each polygon during SOM training. The SOM grid obtained this way generally has a smoother geographic unfolding (example shown in Figure 1, right).

Figure 1 - SOM grid on location plane (twisted, left, and smooth, right). Axes: x and y (x 10^5 m).

Experiments with SOM using the Geospatial Data Set

The geospatial data used in the next sections comprises the previously selected themes: temperature, solar radiation, precipitation, elevation, hydrography and lithology. As hydrographic basins and lithology are categorical variables, they are expanded into 26 and 5 binary variables, respectively. Afterwards each new binary variable was scaled down by the number of classes in its original theme (26 in hydrography, and 5 in lithology). The preprocessing of the other variables was done using min-max normalization to the interval [0,1], and additionally preserving geospatial aspect (x/y ratio approximately 0.48). Including location components, the final data set has an overall dimension of 37 attributes and a size of 33237 data samples.
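The expansion of the categorical themes can be illustrated with a small sketch. It reads "scaled down by the number of classes" as a division of each binary variable by the class count of its theme, which is an interpretation and not something the text states explicitly; the function name is illustrative.

```python
def scaled_one_hot(category_index, n_classes):
    """Binary expansion of a category, with each entry divided by the class count."""
    return [1.0 / n_classes if k == category_index else 0.0
            for k in range(n_classes)]


# Lithology has 5 classes: the active class gets 1/5, all others get 0.
lithology_vector = scaled_one_hot(2, 5)
```

Under this reading, a hydrographic basin contributes 1/26 and a lithology class 1/5 in its active position, so themes with many classes do not dominate the distance computation through their sheer number of binary variables.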

Geo-initialization is used to start up the SOM. Data samples are weighted with full area values only in standard SOM. Location components (x and y) are each weighted with a value of eight. For lower weightings of these components, the SOM unfolding is not adequate (the lattice is twisted). The SOM grid has 27x7 units. “Batch training” is used in the standard SOM, while sequential training is used in Geo-SOM. We use a gaussian kernel neighborhood function for BMU search. In Geo-SOM, the geographic first search uses a bubble neighborhood function (radius k=3).
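The two-stage BMU search of Geo-SOM can be sketched as below. This follows our reading of the Geo-SOM procedure in [2] and should be taken as an assumption: the geographically nearest unit is found first using only x and y, and the final BMU is then chosen, using all components, among the units within k grid steps of that unit (the bubble neighborhood). A rectangular grid and Euclidean distances are assumed, and the names are illustrative.

```python
import math


def geo_som_bmu(codebook, grid_pos, sample, k):
    """Geo-SOM style BMU search; vectors are [x, y, attribute, ...]."""
    # Stage 1: geographically nearest unit, comparing location only.
    geo_winner = min(range(len(codebook)),
                     key=lambda i: math.dist(codebook[i][:2], sample[:2]))
    # Stage 2: candidate units inside the bubble of radius k on the grid.
    candidates = [i for i, pos in enumerate(grid_pos)
                  if math.dist(pos, grid_pos[geo_winner]) <= k]
    # Stage 3: best match among the candidates, using all components.
    return min(candidates,
               key=lambda i: math.dist(codebook[i], sample))


# Four units on a 1x4 grid; the best attribute match lies far to the east.
codebook = [[0.0, 0.0, 0.0], [1.0, 0.0, 4.0],
            [2.0, 0.0, 0.0], [3.0, 0.0, 10.0]]
grid_pos = [(0, 0), (0, 1), (0, 2), (0, 3)]
sample = [0.0, 0.0, 10.0]
near = geo_som_bmu(codebook, grid_pos, sample, k=1)
far = geo_som_bmu(codebook, grid_pos, sample, k=3)
```

With a small k the sample is forced onto a unit near its own location even if a better attribute match exists elsewhere; with a k that covers the whole grid the search behaves like the standard SOM.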

Standard SOM

In this experiment, we weight data samples with their full area value in the SOM training process. Training was done in 21 epochs and two phases: rough training with neighborhood radius decreasing from 6 to 1.5, and fine tuning with neighborhood radius decreasing from 1.5 to 1. The resulting scalar component planes and U-mat are shown in Figure 2, and the SOM grid in Figure 3.

Figure 2 - Scalar component planes and U-mat for standard SOM. Panels: x, y, codtemp, codradi, codpret, gridcode and U-mat. Regions labeled on the U-mat: Algarve, Trás-os-Montes, Minho, Cordilheira Central.

The relation between U-mat and SOM grid is shown in detail in Figure 3 where labels identify the SOM units. Some regions are labeled on the U-mat.

Figure 3 - Standard SOM: U-mat (left) and grid (right) on location plane. Labels identify SOM units. Some regions are pointed out: Minho, Trás-os-Montes, Cordilheira Central, Estremadura, Algarve, Alto Alentejo, Baixo Alentejo, N. Ribatejo, S. Ribatejo, Beira Litoral, Baixo Douro, Beira Transmontana. Axes: x and y (x 10^5 m).

The tendency of the SOM not to span completely to the data extents is illustrated in Figure 3 (right). [...] down to zero instead of one. However, that may damage the geographical ordering of the units. A drift of the bottom corner of the SOM grid towards the Lisbon region is also observable in that figure.

The geographic map of U-mat is depicted in Map 3 (left). At this point, it is worth recalling the interpretation guidelines for these maps presented before. High U-mat values (red) indicate regions that are dissimilar from some neighboring areas, therefore revealing contrasts. On the other hand, low U-mat values (blue) uncover regions that are quite similar to surrounding areas. Although this map has many small and scattered regions, it reveals patterns that are related to the regions presented by most of the authors.

In Map 3 (right) the best k-means partition of the SOM according to the Davies-Bouldin index is shown. As can be seen there, k-means clustering may reveal major regions, but it is also clear that fine details, such as the Algarve region, are concealed.

Map 3 - U-mat for standard SOM (left), background maps of Gomes [6]; k-means best Davies-Bouldin index clustering (11 regions, right).

As can be seen in Map 3 (left), standard SOM performs well in distinguishing Algarve Litoral from Algarve Montanhoso, and Beira Litoral from Beira Alta and Beira Serra. It also depicts Nordeste Transmontano and Minho well. Alentejo may be seen as one large region or may be split into two regions (possibly Alto Alentejo and Baixo Alentejo), as suggested by middle areas having mild U-mat values. Along the western coast, from Beira Litoral to Estremadura, there are no sharp borders, suggesting a smooth transition along those regions. Some interesting details are also revealed, such as the small region of Marvão.

Geo-SOM

The following experiment was done using Geo-SOM with geospatial radius k=3. The training was done in 31 epochs using the sequential algorithm, in two phases: rough training with standard neighborhood radius decreasing from 6 to 1.5 and fine tuning with neighborhood radius decreasing from 1.5 to 1. The learning rate α decreases during this training from 0.5 to 0.05. Sample weighting is not implemented in this sequential algorithm.

Figure 4 - Regions on U-mat for Geo-SOM. Labeled regions: Minho, Trás-os-Montes, Baixo Douro, Cordilheira Central, Estremadura, Algarve, Alentejo, Lisbon.

The thematic map of U-mat is shown in Map 4 (left); on the right side the best k-means partition (using Davies-Bouldin index, 9 regions) is shown.

Map 4 - Geo-SOM: U-mat values with all components with background regions of Lautensach [6] (left), and k-means best partition for Davies-Bouldin index (right).

Again, k-means clustering may reveal major regions but is unable to expose fine details such as Algarve region.

In Geo-SOM (see Map 4), it is noteworthy that Algarve Litoral is easily detected as a plain region with borders quite similar to those defined by Lautensach. One can observe sharp borders for Algarve Litoral, Estremadura vs Ribatejo, and Cordilheira Central vs Beira Baixa/Alta. Geo-SOM also reveals the Trás-os-Montes region. The results may suggest a potential division of Alentejo. This SOM variant apparently does not disclose so well some details in the north, like the division between Beira Litoral and Beira Alta (Central) or between Minho and Trás-os-Montes.

DISCUSSION

It has been shown that a proper geospatial enforcing of SOM enhances the modeling of the geographic phenomena. This was addressed in various aspects such as in the initialization of SOM, data preprocessing, or also through variants of SOM algorithm.

In the preprocessing phase of geographic data for SOM input, experiments show that it is important to enforce the geospatial character of data and therefore location weight should be high enough to provide a suitable geospatial unfolding of SOM. However, a too high weight may be a disadvantage by masking other geographic attributes. Maintaining geospatial aspect is also a reasonable matter at this point in SOM. The geo-initialization, whereby SOM units at startup are spanned along the location plane, provides a good start for a geographic unfolding of SOM grid compared with initialization by PCA.

For evaluating the SOM geospatial unfolding, the SOM grid projected on the location plane (input subspace) has proven to be a valuable tool. An improper or twisted grid is easily detected. In addition, it is important for exploratory purposes. On the other hand, the geographic maps of U-mat assist in revealing local similarities and contrasts between nearby areas. Hence, these visualization methods make powerful tools for exploratory data analysis and for the search for regions. However, their effectiveness depends largely on a suitable geospatial unfolding of the SOM. Computing the U-mat without location components may also help highlight physically similar regions. This is clearer when the overall weight of the location components is high.

The clustering of SOM with k-means, despite concealing fine details, may be of valuable assistance because it helps render global patterns.

Sample weighting with area value tries to address some problems of the application of 2-D geospatial data to SOM. It provides a geospatially more evenly distributed unfolding at the expense of small areas, some with significance but others just a by-product of geospatial processing. However, location and sample weighting introduce an offset in SOM unfolding that affects its objective function. For example, in standard SOM, the grid is forced to follow larger areas better than smaller ones. Therefore, it is expected that geo-enforcing the SOM will, generally, influence negatively the standard quality measures of SOM such as quantization, and to a lesser extent topographic errors.

Analyzing the overlaid maps of regions, we can observe, in U-mat terms, that “mountains” may delimit regions but, when they are large enough, they can themselves be interpreted as regions characterized by great diversity.

Inspecting the component planes suggests some interesting patterns: some well-delimited areas with extreme values may in fact characterize some regions. For instance:

• low latitude (y) and precipitation, high temperature and radiation can typify Algarve;

• low temperature, and high elevation can typify Cordilheira Central;

• high precipitation, and low solar radiation characterizes Minho.

One could observe the reference maps of regions and attempt to find one that best suits the patterns revealed in the experiments. However, this proved to be a hard and risky task due to the complexity of the geographic maps of U-mat and to the subtleties in the delineation of some reference maps. Although several methods can be found in the literature for map comparison [e.g. 8,27] this subject is out of the scope of this study.

The major patterns that are suggested by most experiments are in accordance with the reference maps of regions. Nonetheless, the task of exploratory data analysis may be beset by the complexity of the presented maps and the need to relate the several visualization tools. The former issue can be dealt with by introducing a geospatial post-processing after the training of the SOM. For example, applying a geospatial neighborhood function [5] or increasing the weight of location variables when classifying the areas with their BMU. Another approach would be to establish a smaller size for the SOM grid. This way, one decreases the resolution of the SOM and, therefore, the number of regions (and details) that are possible to represent by the SOM.

CONCLUSIONS

The aim of this study was to show how effective the SOM algorithm can be in exploratory data analysis of geographic data. The emphasis on local factors and topologic preservation properties are the main justifications for SOM application. Through its abstraction process and adequate visualization tools new insights into data can be gained. For empirical assessment, a public domain geospatial data set was used. Meanwhile, its application in SOM raised some questions.

The unique character of geographic phenomena represented by the data set raises several issues that are discussed in this study. These are addressed in methods deployed to visualize the geographic unfolding of the SOM grid and in the geographic maps of U-mat. The former is particularly useful to evaluate the SOM training. Both are of valuable assistance in the exploratory analysis stage. The latter provides the geographic groundwork for the search for patterns and regions. The methods introduced rely on the suggested procedures to enforce a geospatial unfolding of the SOM, notably geo-initialization, and location and sample weighting. Standard SOM and a variant, Geo-SOM, were tested, the latter having a sensible geospatial foundation and presenting geospatial smoothing characteristics.

At the outset of this study, there was a plan to identify clear borders for all regions. Experiments confirm, however, that generally there are no crisp borders delimiting most regions. This may be explained, largely, by the fuzzy character of geographic phenomena. Even so, with the methods introduced, interesting patterns have been revealed by the SOM, most of which agree with the presented reference maps of regions.

It is worth mentioning, as a major shortcoming in computer simulation, the lack of interoperability and integration between different software packages [e.g. 12,31]. To overcome this, much hard work has to be done, for example, exporting data from the SOM Toolbox, formatting data on another package and importing it, again, to another, the GIS software. Unfortunately, this loose coupling between software is a major drawback in interaction, which is crucial in exploratory data visualization and analysis. This research group has submitted a funding proposal to develop software that helps this integration.

The discrete nature of SOM algorithm reveals shortcomings in its application to non-discrete entities with richer geometry and topology, such as 2-D data in GIS. One approach, here suggested, would be to increase the sampling rate when discretizing those phenomena. Another would be conceiving a topologically richer SOM through its implementation, or tight integration, in GIS platform. For instance, the SOM discrete lattice could be represented by a surface, the geographic map, embedded partially in the geographic input space.

REFERENCES

1. Albuquerque, J. P. M., Zonas fito-climáticas e regiões de Portugal. Boletim da Sociedade Broteriana (separata), XIX, 1945.

2. Bação, F., et al., Geo-self-organizing map for building and exploring homogeneous regions, in Lecture Notes in Computer Science, Berlin, 2004, Springer, pp. 22-37.

3. Bação, F., et al., Applications of different self-organizing map variants to geographic information science problems. In Self-Organizing Maps: Applications in Geographic Information Science, John Wiley & Sons, New York, in press.

4. Birot, P., Portugal, Livros Horizonte, Lisboa, 1950.

5. [...] Clarendon Press, 6th ed., Oxford, 1990.

6. Caldas, E. C., et al., Regiões homogéneas no continente português: primeiro ensaio de delimitação, Instituto Nacional de Investigação Industrial, Fundação Calouste Gulbenkian, Lisboa, 1966.

7. Chambers, J., et al., Graphical methods for data analysis, CRC, Boca Raton, 1998.

8. Congalton, R. G., et al., Assessing the accuracy of remotely sensed data: principles and practices, CRC, Boca Raton, 1999.

9. Duckham, M., et al., A formal ontological approach to imperfection in geographic information. Computer, Environment and Urban Systems, 25, 2001, pp. 89-103.

10. Fayyad, U., et al., Eds., Information visualization in data mining and knowledge discovery, Academic Press, San Diego, 2002.

11. Fayyad, U., et al., From data mining to knowledge discovery in databases. AI Magazine, 17, 1996, pp. 37-54.

12. Fischer, M., et al., Spatial analytical perspectives on GIS, Taylor & Francis, London, 1996.

13. Gahegan, M., Visual exploration in geography. In Geographic Data Mining and Knowledge Discovery, pp. 260-287, Taylor & Francis, London, 2001.

14. Gahegan, M. N., Visualisation strategies for exploratory spatial analysis, in Proceedings Third International Conference on GIS and Environmental Modelling, Santa Barbara, 1996, NCGIA.

15. Girão, A. d. A., Esboço de uma carta regional de Portugal, Imprensa da Universidade, Coimbra, 1933.

16. Goodchild, M. F., et al., Eds., NCGIA core curriculum in GIS, National Center for Geographic Information and Analysis, Santa Barbara CA, 1990.

17. Instituto do Ambiente. Atlas do ambiente, 2003, (URL: http://elara.iambiente.pt/, accession date: 2003.09.18).

18. Kaski, S., et al., Exploratory data analysis by the self-organizing map: structures of welfare and poverty in the world. In Neural Networks in Financial Engineering, pp. 498-507, World Scientific, Singapore, 1996.

19. Kaski, S., et al., Coloring that reveals cluster structures in multivariate data. Australian Journal of Intelligent Processing Systems, 6, 2000, pp. 82-88.

20. Kohonen, T., Self-organizing maps, Springer-Verlag, 3rd ed., Berlin, 2001.

21. Kohonen, T., et al., SOM_PAK: the self-organizing map program package. Technical report A31, Helsinki University of Technology, Espoo, 1996.

22. Longley, P., et al., Geographic information systems and science, John Wiley & Sons, Chichester, 2001.

23. Medeiros, C. A., Geografia de Portugal: ambiente natural e ocupação humana uma introdução, Editorial Estampa, 2nd ed., Lisboa, 1991.

24. Miller, H., et al., Eds., Geographic data mining, Taylor & Francis, New York, 2001.

25. Openshaw, S., Some suggestions concerning the development of artificial intelligence tools for spatial modelling and analysis in GIS. In Geographic Information Systems, Spatial Modelling and Policy Evaluation, pp. 17-33, Springer-Verlag, Berlin, 1993.

26. Openshaw, S., et al., Artificial intelligence in geography, John Wiley & Sons, Chichester, 1997.

27. Pontius, R. G., Jr, et al., Components of agreement in categorical maps at multiple resolutions. In Remote Sensing and GIS Accuracy Assessment, pp. 233-251, CRC Press, Boca Raton, 2004.

28. Ribeiro, O., Introdução ao estudo da geografia regional, Edições João Sá da Costa, Lisboa, 1987.

29. Ribeiro, O., et al., Geografia de Portugal: a posição geográfica e o território, Edições João Sá da Costa, Lisboa, 1987.

30. Siponen, M., et al., An approach to automated interpretation of SOM, in Workshop on Self-Organizing Maps, WSOM2001, Lincoln, 2001, Springer, pp. 89-94.

31. Skupin, A., et al., Attribute space visualization of demographic change, in Proceedings of the Eleventh ACM International Symposium on Advances in Geographic Information Systems, New York, 2003, ACM Press, pp. 56-62.

32. Tufte, E. R., Envisioning information, Graphics Press, 6th ed., Cheshire, 1998.

33. Tukey, J., Exploratory data analysis, Addison-Wesley, Reading, 1977.

34. Ultsch, A., et al., Knowledge extraction from artificial neural networks and applications, in Transputer-Anwender-Treffen / World-Transputer-Congress, Aachen, 1993, Springer Verlag, pp. 194-203.

35. Vesanto, J., SOM-based data visualization methods. IDA, 3, 1999, pp. 111-126.

36. Vesanto, J., et al., Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11, 2000, pp. 586-600.

37. Vesanto, J., et al., SOM toolbox for Matlab 5. A57, Helsinki University of Technology, Espoo, 2000.

38. Wong, D., Aggregation effects in geo-referenced data. In Practical Handbook of Spatial Statistics, pp. 83-106, CRC Press, Boca Raton, 1996.

39. Worboys, M., et al., GIS: a computing perspective, CRC Press, 2nd ed., Boca Raton, 2004.

AUTHORS INFORMATION

Fernando C. LOURENÇO
g2001167@isegi.unl.pt
PT Comunicações

Victor S. LOBO
vlobo@isegi.unl.pt
Escola Naval

Fernando L. BAÇÃO
bacao@isegi.unl.pt
ISEGI - UNL