3 CONTRIBUTION OF MULTISPECTRAL AND MULTITEMPORAL INFORMATION
3.2. Study area and experimental data set
3.2.1. Study area
We focused our experiments on the Portuguese mainland territory. The country has an area of approximately 89 000 km² and ground altitudes ranging from 0 to 2000 m above sea level.
Portugal is in a transition zone featuring diverse landscapes representing both Mediterranean and Atlantic climate environments. According to Caetano et al. (2005), in 2000 agricultural areas occupied 48% of the country, followed by forest (38%) and semi-natural areas (9%), while artificial surfaces comprised only 3% of the total land cover.
3.2.2. Land cover classes
The nine land cover classes of the developed nomenclature were defined using the Land Cover Classification System (LCCS) of the Food and Agriculture Organization (FAO) (Di Gregorio and Jansen 2000) (Table 3.1). The rationale behind the nomenclature was three-fold: (1) it should be well adapted to the types of landscape found in regions with characteristics similar to the Portuguese mainland; (2) it should be compatible with established nomenclatures (e.g. CORINE Land Cover, Global Land Cover and the International Geosphere-Biosphere Programme nomenclatures), so that our results can be compared with others obtained using different nomenclatures; and (3) it should match the spatial resolution of the satellite imagery used. In addition, its development followed the suggestion endorsed by Loveland et al. (2000), i.e. each vegetated land cover class should represent relatively homogeneous land cover characteristics (e.g. similar floristic and physiognomic characteristics), exhibit distinctive phenology (i.e. onset, peak and seasonal duration of greenness) and have comparable levels of relative primary production.
Table 3.1. Land cover class descriptions, labels and respective numbers of collected samples.
Class | Label | Description | Sample size
Artificial Areas | AA | Built-up areas and transport network. Non-linear areas of vegetation and bare soils are exceptional. | 47
Rainfed Crops | RC | Agricultural areas that are not artificially irrigated and consequently do not present vigorous vegetation during summer. | 59
Irrigated Crops | IC | Agricultural areas irrigated artificially and periodically (during summer-time) that only present vegetation during the summer. | 52
Broadleaved Forest | BF | Wooded areas where broadleaved species predominate. | 40
Needleleaved Forest | NF | Wooded areas where coniferous species predominate. | 55
Natural Grassland | NG | Natural areas with herbaceous vegetation, namely gramineous species. | 16
Shrubland | S | Natural areas dominated by bushes and other shrubby vegetation that is characteristic of Mediterranean climate. | 18
Barren | B | Area of limited ability to support life, namely thin soil, sand, or rocks. | 27
Water Bodies | WB | Natural or artificial stretches of water. | 40
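The sample counts in Table 3.1 can be tallied to show the total number of sample units and how unevenly they are spread across classes. The following is a minimal sketch; the names `SAMPLE_SIZES` and `class_shares` are illustrative, and only the labels and counts come from the table.

```python
# Sample sizes per class label, copied from Table 3.1.
SAMPLE_SIZES = {
    "AA": 47, "RC": 59, "IC": 52, "BF": 40, "NF": 55,
    "NG": 16, "S": 18, "B": 27, "WB": 40,
}


def class_shares(sizes):
    """Return each class's share of the total number of sample units."""
    total = sum(sizes.values())
    return {label: n / total for label, n in sizes.items()}


total = sum(SAMPLE_SIZES.values())
shares = class_shares(SAMPLE_SIZES)
smallest = min(SAMPLE_SIZES, key=SAMPLE_SIZES.get)
largest = max(SAMPLE_SIZES, key=SAMPLE_SIZES.get)

print(f"total sample units: {total}")
print(f"smallest class: {smallest} ({shares[smallest]:.1%})")
print(f"largest class: {largest} ({shares[largest]:.1%})")
```

The tally gives 354 sample units in total, with Natural Grassland (NG) the smallest class and Rainfed Crops (RC) the largest.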
3.2.3. Earth Observation data
Our study relies on data acquired by the Moderate Resolution Imaging Spectroradiometer (MODIS), an instrument on board the Terra and Aqua satellites of the National Aeronautics and Space Administration (NASA). Terra MODIS and Aqua MODIS take between one and two days to cover the entire Earth's surface, with a complete 16-day repeat cycle. Both sensors acquire data in 36 spectral bands, or groups of wavelengths, and their spatial resolution (pixel size at nadir) is 250 m for channels 1 and 2 (0.6 µm - 0.9 µm), 500 m for channels 3 to 7 (0.4 µm - 2.1 µm) and 1000 m for channels 8 to 36 (0.4 µm - 14.4 µm). These channels are calibrated on orbit by a solar diffuser (SD) and a solar diffuser stability monitor (SDSM) system, so that the measured Earth surface radiance can be converted into radiometrically calibrated and geo-located products for each band (Xiong et al. 2003). Although recent evaluations have reported a geo-location error of 113 m at nadir (Knight et al. 2006), the official technical specifications state a geo-location accuracy of 50 m (Wolfe et al. 2002). In both cases, the geo-location error is less than half a pixel, and hence acceptable for our multitemporal analysis.
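The half-pixel argument is simple arithmetic and can be checked for each of the three nominal pixel sizes. The sketch below is illustrative only: the function names and the `max_fraction` parameter are assumptions introduced here, while the error figures and pixel sizes are those quoted above.

```python
# Nominal MODIS pixel sizes at nadir, in metres.
PIXEL_SIZES_M = (250, 500, 1000)

# Geo-location error figures quoted in the text, in metres.
GEOLOCATION_ERRORS_M = {
    "Knight et al. 2006": 113,
    "Wolfe et al. 2002": 50,
}


def error_in_pixels(error_m, pixel_m):
    """Express a geo-location error as a fraction of the pixel size."""
    return error_m / pixel_m


def within_tolerance(error_m, pixel_m, max_fraction=0.5):
    """True if the error is strictly below `max_fraction` of a pixel."""
    return error_in_pixels(error_m, pixel_m) < max_fraction


for source, error_m in GEOLOCATION_ERRORS_M.items():
    for pixel_m in PIXEL_SIZES_M:
        fraction = error_in_pixels(error_m, pixel_m)
        ok = within_tolerance(error_m, pixel_m)
        print(f"{source}: {error_m} m on a {pixel_m} m pixel = "
              f"{fraction:.3f} pixel, below half a pixel: {ok}")
```

Even the larger reported error on the finest (250 m) grid stays below half a pixel, and on the 500 m grid it is under a quarter of a pixel.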
The data acquired by the MODIS sensor are used to generate multiple products at different processing levels. In this study we used the MOD09A1 product, an 8-day composite of surface reflectance images, freely available from the MODIS Data Product website (http://modis.gsfc.nasa.gov). This product is an estimate of the surface spectral reflectance, imaged at a nominal spatial resolution of 500 m for the first seven bands, as it would have been measured at ground level if there were no atmospheric scattering or absorption. The applied correction scheme compensates for the effects of gaseous and aerosol scattering and absorption, for adjacency effects caused by variation of land cover, for Bidirectional Reflectance Distribution Function (BRDF) effects, for coupling effects, and for contamination by thin cirrus (Vermote and Vermeulen 1999). We considered a set of 43 MOD09A1 images for each of the first seven spectral bands, covering a full-year observation period from February 2000 to January 2001. In addition, from spectral bands B2 (near infrared channel - ρnir), B1 (red channel - ρred) and B3 (blue channel - ρblue), two vegetation indices were calculated for each date and used as additional band information, namely the Normalized Difference Vegetation Index (NDVI):
$$\mathrm{NDVI} = \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + \rho_{red}} \qquad (3.1)$$
and the Enhanced Vegetation Index (EVI):
[...] nine-dimensional space [0,1]^7 × [-1,1]^2 corresponding, respectively, to the reflectances and to the vegetation indices of a 500 m-by-500 m square area.
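The construction of the nine-dimensional observation from the seven reflectances can be sketched as follows. NDVI follows equation (3.1). The EVI coefficients used here (gain 2.5, 6 for red, 7.5 for blue, soil adjustment 1) are the values commonly used for MODIS and are an assumption, since the coefficients actually applied are not shown in this text. Function names and the example reflectances are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index, equation (3.1).

    Undefined (division by zero) when nir + red == 0.
    """
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c_red=6.0, c_blue=7.5, soil=1.0):
    """Enhanced Vegetation Index.

    The default coefficients are the commonly used MODIS values and are
    an assumption here, not taken from the source text.
    """
    return gain * (nir - red) / (nir + c_red * red - c_blue * blue + soil)


def feature_vector(reflectances):
    """Append NDVI and EVI to the seven band reflectances of one date.

    `reflectances` is ordered B1..B7, with B1 = red, B2 = near infrared
    and B3 = blue, as in the text.
    """
    if len(reflectances) != 7:
        raise ValueError("expected seven band reflectances (B1..B7)")
    red, nir, blue = reflectances[0], reflectances[1], reflectances[2]
    return list(reflectances) + [ndvi(nir, red), evi(nir, red, blue)]


# Hypothetical reflectances of a vegetated pixel for a single composite.
pixel = [0.05, 0.40, 0.03, 0.08, 0.30, 0.20, 0.10]
features = feature_vector(pixel)
print(len(features))                      # seven reflectances + two indices
print(round(features[7], 3), round(features[8], 3))
```

With 43 composites per sample unit, repeating this for every date yields 43 × 9 = 387 values per sample over the observation period.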
3.2.4. Sampling procedure
For each of the nine land cover classes, representative ground reference data were collected across the Portuguese mainland territory for the year 2000 and used to train and test our methodology. Homogeneous sample units, each corresponding to a MODIS pixel area, were selected for each land cover class using the CORINE Land Cover map for 2000 (CLC2000) (Painho and Caetano 2006) as strata. High spatial resolution Earth Observation (EO) data, namely Landsat ETM+ images acquired in 2000 and orthorectified colour infrared aerial photographs from 1995, each covering the whole territory, were used as the base data source to verify the reference land cover classification of the sample units.
Reference information for each sample unit was derived using the method described by DeFries et al. (1998), i.e. by visual interpretation of high resolution EO data overlaid with a co-registered 500 m fishnet corresponding to the MODIS data grid. At such a coarse resolution, random sampling can hardly guarantee land cover homogeneity, because most pixels are likely to contain features from two or more distinct classes. We therefore deterministically retained only pixels that are representative of pure land cover class occupation. As our intention is not to produce a thematic map based on a new operational classification procedure, it was essential to consider empirical classes with typical spectral and temporal characteristics that are statistically sound. During collection, we endeavoured to spread the samples of each class as widely as possible over the mainland territory, in order to account for possible regional differences and to prevent geographic correlation due to adjacent pixels (Hammond and Verbyla 1996). Finally, as our objective is to assess the pertinence of intra-annual temporal information for land cover characterization, each sampling unit was selected within geographical sites that did not undergo drastic land cover change during the study period.
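The non-adjacency constraint on sample units can be illustrated with a simple greedy filter on the 500 m grid. This is only a sketch of the constraint, not the selection procedure used in the study, which relied on visual interpretation. The function name, the `min_gap` parameter and the example grid cells are hypothetical.

```python
def thin_candidates(candidates, min_gap=2):
    """Greedily keep grid cells that are at least `min_gap` cells apart.

    `candidates` is an iterable of (row, col) indices of pure pixels on
    the MODIS grid. Distance is measured as Chebyshev distance, so
    min_gap=2 rejects any cell that touches a kept cell, including
    diagonally.
    """
    kept = []
    for row, col in sorted(candidates):
        far_enough = all(
            max(abs(row - r), abs(col - c)) >= min_gap for r, c in kept
        )
        if far_enough:
            kept.append((row, col))
    return kept


# Hypothetical candidate pure pixels of one class, as (row, col) cells.
candidates = [(0, 0), (0, 1), (1, 1), (0, 3), (5, 5), (5, 6)]
kept = thin_candidates(candidates)
print(kept)
```

A greedy pass depends on the visiting order, so it does not necessarily retain the largest possible set of non-adjacent pixels; it only guarantees that no two retained pixels are neighbours.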
Although the appropriate number of sample units per class needed to train a supervised classifier remains open to debate, certain hints or simple heuristics relate the strict minimum to (i) the algorithm used, (ii) the number of input variables, (iii) the sample selection method, and (iv) the size and spatial variability of the study area (Huang et al. 2002, Jensen 1996, Mather 2004, Foody and Mathur 2006, Foody et al. 2006).
However, Ho and Basu (2002) asserted that certain classification problems have nonzero Bayes error, no matter the sample size or the feature space dimension.