Object-based classification of remote sensing data
for change detection
Volker Walter*
Institute for Photogrammetry, University of Stuttgart, Geschwister-Scholl-Str. 24 D, Stuttgart D-70174, Germany
Received 31 January 2003; accepted 26 September 2003
Abstract
In this paper, a change detection approach based on an object-based classification of remote sensing data is introduced. The approach classifies not single pixels but groups of pixels that represent already existing objects in a GIS database. The approach is based on a supervised maximum likelihood classification. The multispectral bands grouped by objects, together with a variety of measures that can be derived from the multispectral bands, span the n-dimensional feature space for the classification. The training areas are derived automatically from the geographical information system (GIS) database.
After an introduction into the general approach, different input channels for the classification are defined and discussed. The results of a test on two test areas are presented. Afterwards, further measures, which can improve the result of the classification and enable the distinction between more land-use classes than with the introduced approach, are presented.
© 2003 Elsevier B.V. All rights reserved.
Keywords: change detection; classification; object-oriented image analysis; data fusion
1. Introduction
In Walter and Fritsch (2000), a concept for the automatic revision of geographical information system (GIS) databases using multispectral remote sensing data was introduced. This approach can be subdivided into two steps (see Fig. 1). In a first step, remote sensing data are classified with a supervised maximum likelihood classification into different land-use classes. The training areas are derived from an already existing GIS database in order to avoid the time-consuming task of manual acquisition. This can be done if it is assumed that the number of changes in
the real world is very small compared with the number of all GIS objects in the database. This assumption is justified because we want to realise update cycles in the range of several months.
In a second step, the classified remote sensing data have to be matched with the existing GIS objects in order to find those objects where a change occurred, or which were collected wrongly. We solved this task by measuring per object the percentage, homogeneity, and form of the pixels, which are classified to the same object class as the respective object stored in the database (Walter, 2000). All objects are classified into the classes fully verified, partly verified, and not found
by using thresholds that can be defined interactively by the user.
The problem of using thresholds is that they are data-dependent. For example, the percentage of vegetation pixels varies significantly between data that are captured in summer or in winter. Other influencing factors are light and weather conditions, soil type, or time of day. Therefore, we cannot use the same thresholds for different datasets. In order to avoid the problem of defining data-dependent thresholds, we introduce an object-based supervised classification approach. The object-based classification works in the same way as a pixel-based classification (see Fig. 2), with the difference that we do not classify each pixel but combine all pixels of each object and classify them together. Again, the training areas for the classification of the objects are derived from the existing database in order to avoid a time-consuming manual acquisition.
0924-2716/$ - see front matter
doi:10.1016/j.isprsjprs.2003.09.007
* Tel.: +49-711-121-4091; fax: +49-711-121-3297.
E-mail address: [email protected] (V. Walter).
In a "normal" classification, the greyscale values of each pixel in different multispectral channels and possibly some other preprocessed texture channels are used as input. For the classification of groups of
pixels, we have to define new measures that can be very simple (e.g., the mean grey value of all pixels of an object in a specific channel) but also very complex, like measures that describe the form of an object. This approach is very flexible because it can combine very different measures for describing an object. We can even use the result of a pixel-based classification and count for each object the percentage of pixels that are classified to a specific land-use class.
Because the result of the approach is a classification into the most likely class, the problematic part of matching is now replaced by a single comparison of the classification result with the GIS database without using any thresholds.
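The threshold-free comparison described above can be illustrated with a minimal sketch. The function name `detect_changes`, the dictionary layout, and the example object ids are assumptions made for illustration and are not taken from the paper.

```python
def detect_changes(db_classes, predicted_classes):
    """Return the ids of objects whose most likely class differs from the
    class stored in the GIS database.

    db_classes        -- dict: object id -> land-use class in the database
    predicted_classes -- dict: object id -> class from the object-based classification
    Objects without a prediction are also reported (design choice of this sketch).
    """
    return sorted(
        obj_id
        for obj_id, db_class in db_classes.items()
        if predicted_classes.get(obj_id) != db_class
    )


# Hypothetical example: object 2 is stored as settlement but classified as greenland.
db = {1: "forest", 2: "settlement", 3: "greenland"}
predicted = {1: "forest", 2: "greenland", 3: "greenland"}
print(detect_changes(db, predicted))  # [2]
```

No threshold appears anywhere in the comparison; an object is flagged exactly when the two class labels disagree.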
1.1. Related work
This kind of approach is an object-oriented image analysis, which has also been applied successfully to other problems. A good overview of different approaches can be found in Blaschke et al. (2000). These approaches can be subdivided into approaches that superimpose existing GIS data on an image (per-field or per-parcel classification), and approaches that use object-oriented classification rules without any GIS input. Approaches that use existing GIS data are not very widely used today. In Aplin et al. (1999), an example of a per-field classification approach is introduced, which first classifies the image into different land-use classes. Afterwards, the fields (which represent forest parcels from a GIS database) are subdivided into different classes, depending on the classification result, by using thresholds. The main difference between existing approaches and our approach is that no thresholds are used in our approach.
Fig. 1. Pixel-based classification approach.
2. Object-based classification
2.1. Input data
The following tests were carried out with ATKIS datasets. ATKIS is the German national topographic and cartographic database, and captures the landscape in the scale of 1:25,000 (AdV, 1988). In Walter (1999), it was shown that a spatial resolution of at least 2 m is needed to update data in the scale of 1:25,000. The remote sensing data were captured with the DPA system, which is an optical airborne digital camera (Hahn et al., 1996). The original resolution of 0.5 m was resampled to a resolution of 2 m. The DPA system has four multispectral channels [blue 440–525 nm, green 520–600 nm, red 610–685 nm, near-infrared (NIR) 770–890 nm].
Fig. 2. Differences between object-based and pixel-based classification.
2.2. Classification classes
Currently, 63 different object classes are collected in ATKIS. There are a lot of object classes that can have very similar appearances in an image of 2 m pixel size (e.g., industrial areas, residential areas, or areas of mixed use). Therefore, we do not use 63 land-use classes for the classification but subdivide all object classes into the five land-use classes: water, forest, settlement, greenland, and roads. The land-use class roads is only used in the first step in the process for the pixel-based classification. Because of the linear shape, roads consist of many mixed pixels in a resolution of 2 m and have to be checked with other techniques (see Walter, 1998).
2.3. Input channels
As in a pixel-based classification, we can use all spectral bands as input channels. The difference is that in the pixel-based classification, each pixel is classified separately, whereas in the object-based classification, all pixels that belong to one GIS object are grouped together. In order to analyse the spectral behaviour of objects, we calculate the mean grey value of each channel for all GIS objects. Fig. 3 shows as an example the original input data (b) and the mean RGB (red green blue) value (a) of each GIS object. The result of the pixel grouping is like a smoothing of the data. The spectral behaviour of the objects is similar to the typical spectral behaviour of the pixels. For example, forest areas are represented in the green channel by dark pixels/objects, whereas settlements are represented by bright pixels/objects.
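The grouping of pixels by GIS object can be sketched as follows. The sketch assumes that the GIS objects have been rasterised into a label image in which each pixel holds the id of its object and 0 means "no object"; this representation, the function name `object_means`, and the use of plain Python lists (to keep the example dependency-free) are assumptions, not details from the paper.

```python
from collections import defaultdict


def object_means(band, labels, background=0):
    """Mean grey value of one band for every object in a label raster.

    band       -- 2-D list of grey values
    labels     -- 2-D list of the same shape with the GIS object id of each pixel
    background -- label meaning "no object" (an assumption of this sketch)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for band_row, label_row in zip(band, labels):
        for value, obj_id in zip(band_row, label_row):
            if obj_id == background:
                continue
            sums[obj_id] += value
            counts[obj_id] += 1
    return {obj_id: sums[obj_id] / counts[obj_id] for obj_id in sums}


# Hypothetical 2 x 3 green band: object 1 is dark (forest-like), object 2 is bright.
green = [[10, 20, 200],
         [30, 40, 220]]
labels = [[1, 1, 2],
          [1, 1, 2]]
print(object_means(green, labels))  # {1: 25.0, 2: 210.0}
```

Calling the function once per channel yields the mean-grey-value input channels for each object.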
This behaviour can also be seen in Fig. 4. The scatterplots show the distribution of (a) the grey values of settlement and forest pixels compared with the distribution of (b) the mean grey value of settlement and forest objects in the channels red and NIR. It can be seen that the behaviour is similar but the separation of the two classes becomes blurred because of the smoothing effect. In the object-based classification, all multispectral bands of the DPA camera system (blue, green, red, and NIR) are used as input channels.
Different land-use classes can be distinguished not only by their spectral behaviour but also by their different textures. Texture operators transform input images in such a way that the texture is coded in grey values. In our approach, we use a texture operator based on a co-occurrence matrix that measures the contrast in a 5 × 5 pixel window. Fig. 5 shows the texture operator applied to an example. The input image is shown in Fig. 5a, the texture (calculated from the blue band) in Fig. 5b, and the average object textures in Fig. 5c. Settlements are represented with dark pixels, greenlands with bright pixels, and forests with middle grey pixels.
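A contrast measure derived from a co-occurrence matrix can be sketched as below. The paper does not state the pixel offset, the grey-level quantisation, or how the result is scaled to grey values, so the horizontal offset of one pixel and the unquantised grey levels used here are assumptions; the function names are hypothetical.

```python
from collections import Counter


def glcm_contrast(window, dx=1, dy=0):
    """Contrast of the grey-level co-occurrence matrix of one window.

    The matrix counts how often grey level i occurs next to grey level j
    at offset (dx, dy); contrast is the sum of p(i, j) * (i - j)**2.
    """
    rows, cols = len(window), len(window[0])
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(window[r][c], window[r2][c2])] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return sum(n * (i - j) ** 2 for (i, j), n in pairs.items()) / total


def contrast_image(image, size=5):
    """Apply glcm_contrast in a sliding size x size window.

    Pixels closer than size // 2 to the image border keep the value 0.0.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [row[c - half:c + half + 1]
                      for row in image[r - half:r + half + 1]]
            out[r][c] = glcm_contrast(window)
    return out


flat = [[7] * 5 for _ in range(5)]
checker = [[(r + c) % 2 for c in range(5)] for r in range(5)]
print(glcm_contrast(flat))     # 0.0 (no grey-value differences)
print(glcm_contrast(checker))  # 1.0 (every neighbouring pair differs by 1)
```

The resulting contrast image can then be averaged per object in the same way as a spectral band to obtain the average object texture.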
The variance of the grey values of the pixels of an object is also a good indicator of the roughness of a texture. Fig. 6 shows the calculated mean variance in the blue band for all objects. Settlement objects have high variance, greenland objects have middle variance, and forest objects have low variance. Fig. 7 shows the behaviour of the variance in the different bands: blue, green, red, and NIR. The best discrimination between land-use classes using the variance can be seen in the blue band. In the NIR band, all land-use classes have a similar distribution, which makes discrimination in this band impossible.
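The per-object variance can be computed in the same way as the per-object mean. The sketch below again assumes a rasterised label image with 0 as background and uses the population variance; the paper does not say which variance estimator was used, so that choice is an assumption.

```python
def object_variances(band, labels, background=0):
    """Population variance of the grey values of each object in one band."""
    values = {}
    for band_row, label_row in zip(band, labels):
        for value, obj_id in zip(band_row, label_row):
            if obj_id != background:
                values.setdefault(obj_id, []).append(value)
    result = {}
    for obj_id, vals in values.items():
        mean = sum(vals) / len(vals)
        result[obj_id] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return result


# Hypothetical blue band: object 1 is homogeneous, object 2 is rough.
blue = [[10, 10, 0],
        [10, 10, 100]]
labels = [[1, 1, 2],
          [1, 1, 2]]
print(object_variances(blue, labels))  # {1: 0.0, 2: 2500.0}
```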
Vegetation indices are very often used in pixel-based classification as an input channel to improve the classification result. They are based on the spectral behaviour of chlorophyll, which absorbs red light and reflects NIR light. In our approach, we employ the most widely used normalised difference (Campbell, 1987):

$$\mathrm{VI} = \frac{\mathrm{IR} - \mathrm{R}}{\mathrm{IR} + \mathrm{R}} \qquad (1)$$
Fig. 5. (a) Input image, (b) texture blue band, and (c) average object texture.
Fig. 8a shows the calculated vegetation index for pixels and Fig. 8b for objects. It can be seen that settlements are represented typically by dark areas, whereas forests are represented mostly by bright areas. The classification of greenlands is difficult because they can be represented by very bright areas (e.g., fields with a high amount of vegetation) as well as by very dark areas (e.g., fields shortly after the harvest).
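As a small worked example of Eq. (1), the following sketch evaluates the normalised difference for single grey values. The guard for a zero denominator and the sample grey values are assumptions added for illustration.

```python
def vegetation_index(nir, red):
    """Normalised difference of Eq. (1): (IR - R) / (IR + R).

    Returns 0.0 when both values are zero (a convention of this sketch).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical grey values: vegetation reflects NIR strongly and absorbs red.
print(vegetation_index(200, 50))  # 0.6, bright in the index image
print(vegetation_index(50, 50))   # 0.0, e.g. sealed surfaces
```

The index lies between -1 and 1; high values correspond to the bright areas mentioned above, low values to the dark ones.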
All input channels defined so far are also used in "normal" pixel-based classification. In object-based classification, it is possible to add further input channels, which do not directly describe spectral or textural characteristics. For example, we can use the result of a pixel-based classification and count the percentage of pixels that are classified to a specific land-use class. This evaluation is shown in Fig. 9. The input image is shown in Fig. 9a and the pixel-based classification result in Fig. 9b. Fig. 9c shows for each object the percentage of pixels that are classified to the land-use class forest. White colour represents 100% and black colour represents 0%. In Fig. 9b and c, it can be seen that forest is a land-use class that can be classified with high accuracy in pixel-based as well as object-based classifications. Fig. 9d shows the percentage of settlement pixels. Because of the high resolution (2 m) of the data, settlements cannot be detected as homogeneous areas but they are split into different land-use classes depending on what the pixels are actually representing. Therefore, settlement objects contain typically only 50–70% settlement pixels in 2-m resolution images. This can also be seen in Fig. 9e, which shows the percentage of greenland pixels. Whereas greenlands contain up to 100% greenland pixels, it can be seen that, in settlement areas, pixels are also classified as greenlands.
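Counting the share of pixel-based class labels inside each object can be sketched as follows. The inputs are assumed to be a pixel-based classification result and a rasterised label image of the same shape (0 = no object); the function name and the tiny example are hypothetical.

```python
from collections import Counter, defaultdict


def class_percentages(class_map, labels, classes, background=0):
    """Percentage of pixels of every land-use class inside each object.

    class_map -- 2-D list with the pixel-based class of each pixel
    labels    -- 2-D list with the GIS object id of each pixel
    classes   -- class names for which a percentage is reported
    """
    counts = defaultdict(Counter)
    for class_row, label_row in zip(class_map, labels):
        for cls, obj_id in zip(class_row, label_row):
            if obj_id != background:
                counts[obj_id][cls] += 1
    result = {}
    for obj_id, counter in counts.items():
        total = sum(counter.values())
        result[obj_id] = {cls: 100.0 * counter[cls] / total for cls in classes}
    return result


classes = ["forest", "greenland", "settlement", "water"]
class_map = [["forest", "forest", "settlement"],
             ["forest", "greenland", "settlement"]]
labels = [[1, 1, 2],
          [1, 2, 2]]
pct = class_percentages(class_map, labels, classes)
print(pct[1]["forest"])                  # 100.0
print(round(pct[2]["settlement"], 1))    # 66.7
```

In the toy example, object 2 behaves like the settlement objects described above: only about two thirds of its pixels carry the settlement label, the rest are classified as greenland.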
An interesting visualisation of the feature space of the object-based classification can be made with the combination of three object-based evaluations of the pixel-based classification. In Fig. 10, the percentage of settlement pixels is assigned to the red band, the percentage of forest pixels to the green band, and the percentage of greenland pixels to the blue band of an RGB image. The combination of these three bands shows that the pixel-based classification of forests and greenlands is very reliable, which can be seen on the bright green and blue colour of the corresponding objects. Settlement areas in contrast cannot be classified as homogeneous areas. Therefore, settlement objects are represented in a reddish colour that can be brownish or purple.
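The colour assignment used for this visualisation can be sketched in a few lines. The linear scaling of percentages to 8-bit values is an assumption; the paper only states which percentage is assigned to which band.

```python
def percentages_to_rgb(settlement_pct, forest_pct, greenland_pct):
    """Map three class percentages (0-100) to an 8-bit RGB triple.

    red = settlement, green = forest, blue = greenland.
    """
    def scale(pct):
        return round(255 * pct / 100)

    return (scale(settlement_pct), scale(forest_pct), scale(greenland_pct))


print(percentages_to_rgb(0, 100, 0))   # (0, 255, 0): pure forest object
print(percentages_to_rgb(60, 5, 35))   # (153, 13, 89): mixed settlement object
```

A settlement object with 60% settlement and 35% greenland pixels ends up with a strong red and a noticeable blue component, which matches the reddish to purple colours described above.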
3. Classification results
The approach was tested on two test areas (16 and 9.1 km²), which were acquired at different dates and contain a total of 951 objects (194 forests, 252 greenlands, 497 settlements, and 8 water objects). The input channels were:
mean grey value blue band
mean grey value green band
mean grey value red band
mean grey value NIR band
mean grey value vegetation index
mean grey value texture from blue band
variance blue band
variance green band
variance red band
variance NIR band
variance vegetation index
variance texture
percentage forest pixel
percentage greenland pixel
percentage settlement pixel
percentage water pixel.
Fig. 6. Mean variance of GIS objects in blue band.
The input channels span a 16-dimensional feature space. All objects of the test areas are used as training objects for the classification. This means that objects that are wrong in the database are also used as training objects. In a manual revision, we compared the GIS data with the images. The number of objects that were not collected correctly, or for which it was not possible to decide without further information sources whether they were collected correctly, is 63, which is more than 6% of all objects. The average percentage of changes in topographic maps in western Europe per year is 6.4% in scale 1:50,000, 7.4% in scale 1:25,000 and 8% in scale 1:1,000,000 (Konecny, 1996). Therefore, the approach is robust enough if we want to update the GIS database in 1-year cycles.
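The principle of a supervised Gaussian maximum likelihood classification with training objects taken from the database can be sketched as follows. To stay short and dependency-free, the sketch uses only two of the 16 features, a closed-form 2 × 2 covariance inverse, and equal prior probabilities; these simplifications, the function names, and all feature values are assumptions and not the configuration used in the paper.

```python
import math


def fit_gaussian(samples):
    """Mean vector and covariance (sxx, sxy, syy) of a list of (x, y) features."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_likelihood(x, mean, cov):
    """Log density of a 2-D normal distribution at feature vector x."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be > 0; degenerate training data will fail
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    mahalanobis = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (mahalanobis + math.log(det)) - math.log(2 * math.pi)


def classify(x, models):
    """Return the class with the highest likelihood (equal priors assumed)."""
    return max(models, key=lambda cls: log_likelihood(x, *models[cls]))


# Hypothetical training objects: (mean NIR grey value, percentage forest pixels),
# grouped by the class stored in the GIS database.
training = {
    "forest": [(60, 90), (65, 95), (55, 85), (62, 80)],
    "settlement": [(120, 10), (130, 5), (110, 20), (125, 8)],
}
models = {cls: fit_gaussian(samples) for cls, samples in training.items()}

print(classify((58, 88), models))   # forest
print(classify((122, 10), models))  # settlement
```

Because the class statistics are estimated from all objects of a class, a small share of wrongly labelled training objects shifts the mean and covariance only slightly, which is the robustness argument made above.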
Fig. 11a shows the GIS data and Fig. 11b shows the result of the object-based classification on a part of one test area. Altogether, 82 objects (which are 8.6% of all objects) were classified into a different land-use class than the one assigned to them in the GIS database.
These objects were subdivided manually into three classes. The first class contains all objects where a change in the landscape has happened and an update in the GIS database has to be done. In this class, there are 37 objects (45%). The second class contains all objects where it is not clear if the GIS objects were collected correctly. Higher-resolution data or sometimes even field inspections are needed to decide if the GIS database has to be updated or not. In this class, there are 26 objects (31%). The third class contains all objects where the result of the classification is incorrect. In this class, there are 19 objects (23%).
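The reported shares can be reproduced from the object counts given in the text. The short calculation below uses only those counts; the dictionary keys are descriptive labels chosen for this sketch.

```python
total_objects = 951
flagged = {"real change": 37, "unclear": 26, "wrong classification": 19}

flagged_total = sum(flagged.values())
share_flagged = 100.0 * flagged_total / total_objects

print(f"flagged: {flagged_total} of {total_objects} ({share_flagged:.1f}%)")
for name, count in flagged.items():
    print(f"{name}: {count} ({100.0 * count / flagged_total:.1f}%)")
# flagged: 82 of 951 (8.6%)
# real change: 37 (45.1%)
# unclear: 26 (31.7%)
# wrong classification: 19 (23.2%)
```

The three groups add up to the 82 flagged objects. The second group comes out at 31.7%, so the 31% quoted in the text appears to be truncated rather than rounded.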
4. Further work
The approach subdivides all objects into the classes water, forest, settlement, and greenland. This can be refined if more object characteristics are evaluated. In the following, we suggest three possible extensions of the approach.
Fig. 7. Object variance in different bands (x-axis, variance; y-axis,
4.1. Additional use of laser data
In Haala and Walter (1999), it was shown that the result of a pixel-based classification can be improved significantly by the combined use of multispectral and laser data. Fig. 12 shows a pixel-based classification result of a CIR (colour infrared) image without (b) and with (c) the use of laser data as an additional channel. The laser data improve the classification result because they have a complementary "behaviour" to the multispectral data. With laser data, the classes greenland and road can be separated very well from the classes forest and settlement because of the different heights of the pixels above the ground, whereas in multispectral data, the classes greenland and forest can be separated very well from the classes roads and settlement because of the strongly different percentages of chlorophyll. The four input channels, which were calculated from the result of the pixel-based classification (percentage forest pixels, percentage greenland pixels, percentage settlement pixels, and percentage water pixels), are the channels with the highest amount of influence for the object-based classification. Therefore, the object-based classification should also be improved by the combined use of multispectral and laser data.
Fig. 8. Vegetation index for (a) single pixels and (b) objects.
With laser data, further input channels can be calculated like slope, average object height, average object slope, etc. With high-density laser data, it could be possible to distinguish, for example, between residential areas and industrial areas. Fig. 13 shows a laser profile (1 m raster width) of a residential area (a) and an industrial area (b). In residential areas, there are typically houses with sloped roofs and a lot of vegetation between the houses, whereas in industrial areas, there are buildings with flat roofs and less vegetation. This characteristic can be described by a two-dimensional evaluation of the slope directions of each object and could also be useful to distinguish between different types of vegetation.
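A slope channel of the kind mentioned above could be derived from a rasterised laser height model roughly as follows. The central-difference gradient, the 1 m cell size, and the function name are assumptions for illustration; the paper does not describe how slope would be computed.

```python
import math


def slope_degrees(heights, cell_size=1.0):
    """Slope in degrees for every interior cell of a height raster.

    Uses central differences; border cells keep the value 0.0.
    """
    rows, cols = len(heights), len(heights[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (heights[r][c + 1] - heights[r][c - 1]) / (2 * cell_size)
            dz_dy = (heights[r + 1][c] - heights[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# Hypothetical 3 x 3 patches: a flat roof and a roof rising 1 m per 1 m cell.
flat_roof = [[5.0] * 3 for _ in range(3)]
sloped_roof = [[float(c) for c in range(3)] for _ in range(3)]
print(slope_degrees(flat_roof)[1][1])             # 0.0
print(round(slope_degrees(sloped_roof)[1][1], 1)) # 45.0
```

Averaging the slope raster per object, in the same way as a spectral band, would give the average object slope mentioned in the text.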
The fusion of data from different sensors for image segmentation is a relatively new field (Pohl and van Genderen, 1998). The general aim is to increase the information content in order to make the segmentation easier. Instead of laser data, it could also be possible to make a fusion with SAR data (e.g., see Dupas, 2000).
4.2. More texture measures
At the moment, we use a co-occurrence matrix, mean variance, and mean contrast to describe the texture of objects. These texture measures can also be used in pixel-based classification by measuring the variance and contrast of each pixel in an n × n window. The problem of a window with a fixed size is that mixed pixels at the object borders are very often classified to a wrong land-use class. The larger the window, the more pixels will be classified wrongly. This problem does not appear in object-based classification because we do not evaluate a window with a fixed size but use the existing object geometry (in order not to use mixed pixels at the object border, a buffer is used and border pixels are removed). Therefore, we suggest using more texture measures. Fig. 14 shows an example of a possible evaluation of the texture. The images are processed with a Sobel operator. Typically, farmland objects contain many edges with one main edge direction (a), whereas in forest objects, the direction of the edges is equally distributed (b) and in settlement objects, several main directions can be found (c). Other texture measures could be, for example, the average length or contrast of the edges. However, several tests have to be performed in order to prove these ideas.
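One way to evaluate edge directions with a Sobel operator is a histogram of gradient orientations, sketched below. The number of bins, the magnitude threshold, and the restriction to a rectangular image chip instead of the true object geometry are assumptions of this sketch. Note that the gradient direction is perpendicular to the edge itself.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_direction_histogram(image, bins=4, min_magnitude=1.0):
    """Histogram of Sobel gradient orientations, folded to 0-180 degrees.

    Only interior pixels whose gradient magnitude reaches min_magnitude count.
    """
    rows, cols = len(image), len(image[0])
    hist = [0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            if math.hypot(gx, gy) < min_magnitude:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % bins] += 1
    return hist


# Hypothetical chip with a single vertical edge, as between two field strips.
chip = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
print(edge_direction_histogram(chip))  # [6, 0, 0, 0]
```

A histogram concentrated in one bin corresponds to one main edge direction, a flat histogram to equally distributed directions, and a histogram with a few distinct peaks to several main directions.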
4.3. Use of multitemporal data
The main reason that the approach classifies objects into a wrong class is that in practice, the appearance of objects can be very inhomogeneous. If, for example, a settlement object contains large areas of greenland but only few pixels that represent a house or a road, it will be classified as greenland and not as settlement. The object will be marked as an updated object and an operator has to check the object each time the data are revised because the approach will classify the object every time as greenland.
Fig. 9. Percentage right classified pixel. (a) Input image, (b) pixel-based classification result, (c) percentage right classified forest pixels, (d) percentage right classified settlement pixels, (e) percentage right classified greenland pixels.
A solution for this problem is to store all parameters of the n-dimensional feature space (mean grey values, mean variance, etc.) of an object when it is checked for the first time. If, then, later the object is marked again as an update, the program can measure the distance of the object in the current and the earlier stored feature space. If the distance is under a specific threshold, it can be assumed that the object is still the same and therefore does not have to be updated.
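The comparison with a stored feature vector can be sketched as below. The paper does not specify the distance measure, so the Euclidean distance, the threshold value, and the feature values are assumptions; in practice the features would probably need to be normalised first, because they have very different value ranges.

```python
import math


def feature_distance(current, stored):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((c - s) ** 2 for c, s in zip(current, stored)))


def is_unchanged(current, stored, threshold):
    """True if the object is close enough to its previously checked state."""
    return feature_distance(current, stored) < threshold


# Hypothetical 3-feature vectors of an object at two revision dates.
stored = (120.0, 35.0, 60.0)
current = (123.0, 39.0, 60.0)
print(feature_distance(current, stored))             # 5.0
print(is_unchanged(current, stored, threshold=10.0))  # True
```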
5. Conclusion
The basic idea of the approach is that image interpretation is not based only on the interpretation of single pixels but on whole object structures. Therefore, we do not classify only single pixels but groups of pixels that represent already existing objects in a GIS database. Each object is described by an n-dimensional feature vector and classified to the most likely class based on a supervised maximum likelihood classification. The object-based classification needs no tuning parameters like user-defined thresholds. It works fully automatically because all information for the classification is derived from automatically generated training areas. The result is not only a change detection but also a classification into the most likely land-use class.
The results show that approximately 8.6% of all objects (82 objects from 951) are marked as changes. From these 82 objects, 45% are real changes, 31% are potential changes, and 23% are wrongly classified. That means that the amount of interactive checking of the data can be decreased significantly. On the other hand, we have to ask if the object-based classification finds all changes. A change in the landscape can only be detected if it affects a large part of an object because the object-based classification uses the existing object geometry. If, for example, a forest object has a size of 5000 m² and in that forest object a small settlement area with 200 m² is built up, then this approach will fail.
Fig. 10. Visualisation of the feature space of the object-based classification.
Fig. 11. (a) GIS data and (b) result of the classification.
Fig. 12. (a) Input image, (b) classification with multispectral data, and (c) classification with multispectral and laser data.
Further techniques have to be developed in order to cover this problem. Because forest areas can be classified very accurately in pixel-based classification, it could be additionally tested whether there are large areas in a forest object that are classified to another land-use class. The same approach could be used for water areas because water is also a land-use class that can be classified very accurately in pixel-based classification. More difficult is the situation for the land-use classes greenland and settlement, which have typically an inhomogeneous appearance in a pixel-based classification. Here, we suggest using a multiscale approach to make additional verification of the objects (e.g., see Heipke and Straub, 1999).
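The additional test suggested above for forest and water objects, namely whether an object contains a large connected area classified to another land-use class, could look roughly like this. The 4-neighbourhood, the pixel size, the function name, and the example raster are assumptions of this sketch.

```python
from collections import deque


def largest_foreign_region(class_map, labels, obj_id, expected):
    """Size in pixels of the largest 4-connected region inside object obj_id
    whose pixel-based class differs from the expected class."""
    rows, cols = len(class_map), len(class_map[0])

    def is_foreign(r, c):
        return labels[r][c] == obj_id and class_map[r][c] != expected

    seen = set()
    largest = 0
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not is_foreign(r, c):
                continue
            # Breadth-first search over one connected foreign region.
            queue = deque([(r, c)])
            seen.add((r, c))
            size = 0
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and is_foreign(nr, nc)):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            largest = max(largest, size)
    return largest


# Hypothetical forest object (id 1) with a small built-up patch; F = forest, S = settlement.
class_map = [["F", "F", "F", "F"],
             ["F", "S", "S", "F"],
             ["F", "S", "F", "F"],
             ["F", "F", "F", "S"]]
labels = [[1] * 4 for _ in range(4)]
pixels = largest_foreign_region(class_map, labels, obj_id=1, expected="F")
pixel_size = 2.0  # metres, as in the resampled DPA data
print(pixels, pixels * pixel_size ** 2)  # 3 12.0
```

At 2 m resolution each pixel covers 4 m², so the 200 m² settlement area from the example above would correspond to a connected region of about 50 pixels.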
Fig. 13. Laser profiles of (a) a residential and (b) an industrial area.
Up to now, we can only distinguish between the land-use classes forest, settlement, greenland, and water. This can be refined if more object characteristics are evaluated. Some possible object characteristics are defined in this paper and have to be tested in future work.
References
Aplin, P., Atkinson, P., Curran, P., 1999. Per-field classification of landuse using the forthcoming very fine resolution satellite sensors: problems and potential solutions. In: Atkinson, P., Tate, N. (Eds.), Advances in Remote Sensing and GIS Analysis. Wiley, Chichester, pp. 219–239.
Arbeitsgemeinschaft der Vermessungsverwaltungen der Länder der Bundesrepublik Deutschland (AdV), 1988. Amtlich Topographisches-Kartographisches Informationssystem (ATKIS). Landesvermessungsamt Nordrhein-Westfalen, Bonn.
Blaschke, T., Lang, S., Lorup, E., Strobl, J., Zeil, P., 2000. Object-oriented image processing in an integrated GIS/remote sensing environment and perspectives for environmental applications. In: Cremers, A., Greve, K. (Eds.), Environmental Information for Planning, Politics and the Public, vol. II. Metropolis-Verlag, Marburg, pp. 555–570.
Campbell, J.B., 1987. Introduction into Remote Sensing. The Guildford Press, New York.
Dupas, C.A., 2000. SAR and LANDSAT TM image fusion for land cover classification in the Brazilian Atlantic Forest Domain. International Archives for Photogrammetry and Remote Sensing XXXIII (Part B1), 96–103.
Haala, N., Walter, V., 1999. Classification of urban environments using LIDAR and color aerial imagery. International Archives for Photogrammetry and Remote Sensing XXXII (Part 7-4-3W6), 76–82.
Hahn, M., Stallmann, D., Staetter, C., 1996. The DPA-sensor system for topographic and thematic mapping. International Archives of Photogrammetry and Remote Sensing XXXI (Part B2), 141–146.
Heipke, C., Straub, B.-M., 1999. Relations between multi scale imagery and GIS aggregation levels for the automatic extraction of vegetation areas. Proceedings of the ISPRS Joint Workshop on "Sensors and Mapping from Space", Hannover. On CD-ROM.
Konecny, G., 1996. Hochauflösende Fernerkundungssensoren für kartographische Anwendungen in Entwicklungsländer. ZPF 64 (2), 39–51.
Pohl, C., van Genderen, J., 1998. Multisensor image fusion in remote sensing: concepts, methods and applications. International Journal on Remote Sensing 19 (5), 823–864.
Walter, V., 1998. Automatic classification of remote sensing data for GIS database revision. International Archives for Photogrammetry and Remote Sensing XXXII (Part 4), 641–648.
Walter, V., 1999. Comparison of the potential of different sensors for an automatic approach for change detection in GIS databases. Lecture Notes in Computer Science, Integrated Spatial Databases: Digital Images and GIS, International Workshop ISD '99. Springer, Heidelberg, pp. 47–63.
Walter, V., 2000. Automatic change detection in GIS databases based on classification of multispectral data. International Archives of Photogrammetry and Remote Sensing XXXIII (Part B4), 1138–1145.
Walter, V., Fritsch, D., 2000. Automatic verification of GIS data using high resolution multispectral data. International Archives of Photogrammetry and Remote Sensing XXXII (Part 3/1), 485–489.