Chapter 2. Research issues
2.2 Continental Forest Measurement in Australia
2.2.3 Utilising data within integrated sampling schemes
Sampling strategies overview
It is important to understand the strengths and limitations of different sampling schemes in order to have confidence in the collected data, and to provide avenues of query to test assumptions of representativeness at multiple scales. The following section provides a brief outline of common sampling designs, and a summary of the use of field data for remote sensing calibration within integrated sampling schemes. The use of remotely sensed data for forest assessment is described in Section 2.3.
Permanent plot-based inventory systems utilising representative sampling are not new to forest assessment, and provide the most direct and effective measurement of trends in forest change and tree growth (Norman et al., 2003). Trends have been estimated through comparison of successive aggregate values from single-measure temporary plots, through aggregation of compared successive measures of individual permanent plots, or through a hybrid of these approaches, such as sampling with partial replacement (Scott, 1998). These inventory systems are widely utilised throughout the world, with Canada, New Zealand, Scandinavia and the United States of America (USA) all having similar forest assessment, management and implementation issues (Norman et al., 2003). For example, these countries have extensive areas of relatively undisturbed natural forests over which accessibility, the level of existing knowledge and management intervention are relatively limited. Inventory sampling used by these countries is based on simple and flexible (though less efficient) systematic grids using permanent field plots with limited or no pre-stratification (Norman et al., 2003). Systems are commonly two- or three-stage (or phase), incorporating remotely sensed data from API or satellite-based remote sensing. For example, the USA uses a three-phase system, where aerial photos (and increasingly satellite imagery) are used on a one kilometre grid in the first phase to stratify locations for placing field plots (phase 2 and 3 plots), and to determine expansion factors for strata (e.g., forest, non-forest) (Smith, 2002).
New Zealand (Coomes et al., 2002) and Canada (Wulder et al., 2004) employ similar strategies but rely more heavily on remotely sensed data to provide some of the required forest attributes. New methodologies are continually being developed. In Canada, for example, the Forest Research Partnership developed the Enhanced Inventory Project as well as the Earth Observation for Sustainable Development of Forests, which have the main objectives of testing and evaluating new technology to develop an enhanced forest resource inventory to replace the current aerial photographic and ground sampling approach. Key components of the project were the integration of airborne LiDAR, multi-band orthophotography and other remotely sensed data (e.g., hyperspectral, SAR) to generate digital terrain models, canopy surfaces, stand variables by species and stand-level diameter distributions, and an increased understanding of the effect of LiDAR data collection variables on the estimation of forest variables (Natural Resources Canada, 2007).
In Scandinavian countries, forest inventory is well advanced, with many countries utilising private companies for resource assessment (for example, FORAN Remote Sensing). These companies use systematic field plots and/or aerial photography to delineate stands and carry out stand inventory of floristics and structure. Research into LiDAR inventory has been widely taken up, and LiDAR has now become a practical and economic alternative. LiDAR-based inventory utilises aerial photography for stand delineation and stratification, followed by LiDAR and field sampling for stand assessment (e.g., height, mean diameter, basal area, stocking and volume) (Næsset, 2004; Holmgren and Jonsson, 2004). In Finland, major changes are underway in national forest inventory; for example, it is planned that from 2010 all forest inventory conducted by Forest Centres will be based on laser scanning, aerial photography and deliberately positioned field plots. Non-parametric plot-based methods will be applied to generate estimates of age, height, diameter and volume by forest stand and species (Finnish Forest Association, 2008).
There is wide variation between countries in plot dimensions, orientation and sample density. The same issues facing these countries also occur in Australia, and the NFI is utilising this international experience in the development of the proposed CFMF (Wood et al., 2006).
Within any sampling framework, errors in measurement, estimation and sampling are commonly recognised. Measurement errors may occur when measuring structural attributes, such as tree height. Estimation errors are associated with the prediction of new attributes, many of which are difficult to measure (e.g., biomass), from measured attributes (e.g., diameter at 130 cm height (D130)). Within a sampling strategy, a key component is to reduce both the measurement and estimation errors at the site level, for both field and remotely sensed data (Schreuder and Gregoire, 1993; West, 2004). Sampling errors relate to how well the sample represents the entire population or region. When designing sampling strategies, consideration needs to be given to the intensity of sampling required to adequately represent the area or population. Samples too close together will tend to duplicate information, creating a wasteful (and expensive) design. However, samples too far apart will give rise to large sampling variances (potential error) and so will also be inefficient (Scott, 1998).
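The link between sampling intensity and sampling error can be illustrated with the standard error of a sample mean. The following is a minimal sketch only: the plot values are invented, the function names are hypothetical, and plots are assumed to be independent (so the duplication of information between closely spaced plots is ignored).

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean, assuming independent plots."""
    return math.sqrt(statistics.variance(values) / len(values))

def plots_required(pilot_values, target_se):
    """Plots needed for the standard error to fall to target_se,
    based on the variance observed in a pilot sample."""
    return math.ceil(statistics.variance(pilot_values) / target_se ** 2)

# Hypothetical pilot sample of plot basal area (m^2 per hectare).
pilot = [22.0, 30.0, 26.0, 34.0, 28.0]

se = standard_error(pilot)                        # error with 5 plots
n_needed = plots_required(pilot, target_se=1.0)   # plots for a smaller error
print(f"standard error: {se:.2f}, plots required: {n_needed}")
```

Under these assumptions, halving the standard error requires roughly four times as many plots, which is one reason sampling intensity has to be weighed against cost.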
Random, systematic, and stratified random sampling
The simplest technique for sampling a population is to select an unbiased sample across the area to be assessed. This is achieved by randomly locating samples until the desired number has been collected. A limitation of simple random sampling is that clusters of sites are often generated in some parts of the area while other parts have no observations, which, depending on the research interest, can limit the suitability of the method for effective spatial sampling (Haining, 1990). This limitation can be offset with stratified random sampling, where an independent random selection is made within partitioned regions or strata. This strategy allows the variances of the estimators from each stratum to be combined to obtain variances of estimators for the whole population (Thompson, 2002).
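The way stratum estimates are combined can be sketched as follows. This is an illustrative example rather than a prescribed method: the stratum names, area weights and plot values are invented, the function names are hypothetical, and finite population corrections are ignored. The combined mean is the weighted sum of stratum means, and its variance is the sum of squared weights multiplied by each stratum's variance of the mean.

```python
import random
import statistics

def draw_stratified_sample(units_by_stratum, n_per_stratum, rng):
    """Independent simple random selection (without replacement) per stratum."""
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in units_by_stratum.items()}

def stratified_estimate(weights, samples):
    """Population mean and its variance from per-stratum samples.

    weights: stratum -> proportion of total area (summing to 1)
    samples: stratum -> list of measured plot values
    """
    mean = sum(weights[h] * statistics.mean(samples[h]) for h in samples)
    variance = sum(
        weights[h] ** 2 * statistics.variance(samples[h]) / len(samples[h])
        for h in samples
    )
    return mean, variance

# Hypothetical strata, area weights and plot values.
weights = {"forest": 0.6, "woodland": 0.4}
samples = {"forest": [10.0, 12.0, 14.0], "woodland": [4.0, 6.0]}
mean, variance = stratified_estimate(weights, samples)

# Selecting plot identifiers independently within each stratum.
rng = random.Random(1)
frame = {"forest": list(range(100)), "woodland": list(range(100, 160))}
chosen = draw_stratified_sample(frame, {"forest": 3, "woodland": 2}, rng)
print(mean, variance, chosen)
```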
Stratification schemes are most efficient when the population is partitioned such that the units within each stratum are as similar as possible. Whilst variance between strata may be high, a stratified sample with adequate units from each stratum in the population will tend to be representative of the population as a whole (Thompson, 2002). Stratified random samples become inefficient when the distance between samples is less than a predefined optimal distance used in a systematic sample. Therefore, some form of systematic sampling that keeps sites at some optimal distance apart, while providing full coverage of the area under investigation, may be desirable. If attributes under investigation tend towards highly variable distributions, then a relatively dense network of sites is required so that the variable nature of the area can be characterised (Haining, 1990).
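An aligned systematic sample of the kind described above can be sketched as a square grid with a random starting offset. The extent, spacing and function name below are assumptions made for illustration; a real design would also need to handle irregular boundaries and map projections.

```python
import random

def systematic_grid(x_min, y_min, x_max, y_max, spacing, rng):
    """Square grid of sample sites with a single random start offset."""
    x0 = x_min + rng.uniform(0, spacing)
    y0 = y_min + rng.uniform(0, spacing)
    points = []
    j = 0
    while y0 + j * spacing < y_max:
        i = 0
        while x0 + i * spacing < x_max:
            points.append((x0 + i * spacing, y0 + j * spacing))
            i += 1
        j += 1
    return points

# Hypothetical 10 km x 10 km study area sampled on a 2 km grid.
rng = random.Random(1)
grid = systematic_grid(0.0, 0.0, 10.0, 10.0, spacing=2.0, rng=rng)
print(len(grid), grid[:3])
```

Every site is a fixed distance from its neighbours, so the clustering that can occur with random allocation is avoided, while the random start retains an element of objective selection.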
Whilst random allocation of sites reduces bias toward any particular spatial attribute, clustering can occur even with stratification, for example near strata boundaries (Thompson, 2002). Theoretical evidence stresses the effectiveness of systematic sampling in a variety of spatial situations. However, issues can arise with aligned systematic sampling because of spatial variability occurring at a range of measurement scales, discontinuous spatial variation, or where there are features that are not easily sampled using a regular grid (e.g., riparian vegetation; Haining, 1990). Practical issues also influence sampling design. While a potentially optimal design may use a dense network of sites on a regular grid, when cost, timeliness of survey, and access (especially to private land or remote/difficult terrain) are considered, it is likely that not all potential sites can be used. In these instances, an increased sampling error has to be accepted or a different sampling methodology used (Thompson, 2002).
Model-based sampling
An example of a potentially powerful sampling design, developed to address some of the issues of random and systematic sampling, is model-based sampling. With this strategy, a regression model is used to determine the value of the attribute of interest, based on its relationship to an easily observed variable that has been measured on every sampling unit in the population (West, 2004). Multiple covariate attributes can be utilised in the regression model, so long as they are all available across the sampled population (e.g., remotely sensed data), which can often result in improved model prediction accuracies. Model-based sampling designs have been used at local and regional levels for over 20 years (Biggs et al., 1985; Wood and Schreuder, 1986; Hamilton and Brack, 1999). However, whilst the application of model-based sampling at continental scales has been proposed (Brack, 2004), and forms a key part of the Australian Greenhouse Office NCAS methodology (Brack et al., 2006), the sampling strategy has not yet been implemented for national forest monitoring.
An advantage of a model-based strategy is that it utilises the full power of regression analysis in establishing relationships between the variable of interest (often something more difficult to measure, such as field data) and one or more covariates that can be (generally more easily) measured in the population (e.g., using remote sensing). Generally, all that is required is that the data collected cover most of the value range of the covariates occurring in the population, and that the sample is objectively selected (West, 2004). These criteria allow field data collected for purposes other than forest inventory to be utilised more often, thereby reducing cost and resource requirements.
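The model-based approach can be illustrated with a single covariate. In this sketch, which is not drawn from any of the cited studies, a simple linear regression is fitted to a small field sample and then applied to the covariate values known for every unit in the population. The variable names and values (a LiDAR-style height covariate and field biomass) are invented, and ordinary least squares is assumed as the regression method.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fraction_within_range(population_x, sample_x):
    """Share of population covariate values spanned by the sample."""
    lo, hi = min(sample_x), max(sample_x)
    return sum(lo <= x <= hi for x in population_x) / len(population_x)

# Hypothetical field sample: covariate (canopy height, m) and the
# attribute of interest (biomass, t/ha) measured on the same plots.
sample_height = [5.0, 10.0, 15.0, 20.0]
sample_biomass = [52.0, 98.0, 152.0, 198.0]

# Covariate values available for every unit in the population.
population_height = [6.0, 9.0, 12.0, 14.0, 17.0, 20.0]

slope, intercept = fit_line(sample_height, sample_biomass)
predictions = [slope * h + intercept for h in population_height]
population_mean = sum(predictions) / len(predictions)
coverage = fraction_within_range(population_height, sample_height)
print(slope, intercept, population_mean, coverage)
```

The coverage check reflects the requirement that the sample spans most of the covariate range in the population; predictions for units outside that range would be extrapolations.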
Model-based sampling also has disadvantages. First, as with most stratified designs, some prior knowledge or data are required in order to develop the regression models. If there are large uncertainties in the existing data, or gaps in the extent or knowledge of the range of values in the population, then the resultant models may have large variances and poor prediction ability when applied to new areas. Second, a model-based sampling strategy could become confusing and unwieldy (and therefore difficult to ‘sell’ and implement) when a large number of variables of interest are being investigated, with each requiring a separate regression model to be developed. This may be mitigated to some extent if there is correlation between forest metrics, thereby allowing a smaller set of variables to predict a larger range of metrics. Despite these potential disadvantages, where cost is a factor, the ability to predict a required attribute based on readily available remotely sensed data can provide a relatively inexpensive initial estimate (with confidence levels), especially if the required information was not available previously, or is available but has low spatial or spectral resolution and limited attributes (Brack, 2007).
Using field data for remote sensing calibration
Field data are used to enhance the extraction of information from remotely sensed sources, through calibration of the data and information, and to provide an assessment of the accuracy of derived information. This methodology is a core part of the strategy for integrated sampling schemes. The following section outlines current knowledge and theory on the data integration process, and provides guidance for addressing the primary research question, in terms of linking and calibrating LiDAR with field estimates of forest structure.
Field data can be defined as data that are independently verifiable, more detailed and accurate (spatially and in information content), and collected using proven and repeatable techniques, usually at a fine spatial resolution (Curran and Williamson, 1985). Field data are collected at key locations determined through appropriately designed sampling strategies, as outlined previously (Curran and Williamson, 1986). The field site concept is defined by McDonald et al. (1998) as a small area of land that is considered to be representative of the vegetation, landform, or land surface / features associated with the observations. It is noted that whilst the extent of a site is arbitrary, a square or rectangular site of 400 m² is appropriate for sampling vegetation; however, this may vary depending on the surrounding land cover. What is required with all field data collections used with remotely sensed data is a clear definition of the purpose of the data, and a specification of the criteria that the field data must meet, such as (but not limited to) the type or intended use, spatial resolution, timeliness and accuracy of the data collected (Zhou et al., 1998; Fisher et al., 2006).
When using field data to validate remote sensing derived information, accuracy is defined as the closeness of derived values to field estimates (Cooke and Harris, 1970). However, as field data are rarely totally accurate, the accuracy assessment is not necessarily an estimate of the closeness to the ‘true’ values (although it may be close). This also suggests that field data should not be called ‘ground truth’, as they are also estimates of the attribute of interest (Brogaard and Ólafsdóttir, 1997). Two other concepts related to the usefulness of field data for calibration are bias and precision. Bias is defined as the difference between the mean value from a set of repeated measurements and the ‘true’ value. Precision is defined as the variation in a set of repeated measurements (West, 2004).
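The two definitions can be made concrete with a small worked example. The measurements and reference value below are invented, and the sample standard deviation is used as one possible measure of variation; other measures (e.g., the range or the coefficient of variation) could equally be used.

```python
import statistics

def bias(measurements, reference):
    """Mean of repeated measurements minus the reference ('true') value."""
    return statistics.mean(measurements) - reference

def precision(measurements):
    """Variation among repeated measurements (sample standard deviation)."""
    return statistics.stdev(measurements)

# Hypothetical repeated height measurements (m) of one tree, with a
# reference height obtained by an independent, more accurate method.
heights = [20.5, 21.0, 21.5, 21.0]
reference_height = 20.0

height_bias = bias(heights, reference_height)
height_precision = precision(heights)
print(f"bias: {height_bias:.2f} m, precision: {height_precision:.2f} m")
```

In this example the measurements are tightly grouped (good precision) but sit about one metre above the reference (a positive bias), which shows that the two properties are independent of each other.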
Three different types of accuracy can be recognised with respect to remote sensing analyses: classification accuracy, parameter estimate accuracy and location accuracy. Classification accuracy can be assessed at a pixel level, where the classification result is compared to field estimates for each pixel in a sample, and a table of commission and omission errors is generated. When object-oriented analyses are undertaken on the data, objects (e.g., crops, fields or forest patches) that are generated from a number of pixels (or segments / clusters) can be assessed for accuracy of both class attributes and area / boundary location (Van Genderen et al., 1978).
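A pixel-level assessment of this kind can be sketched as an error matrix built from paired field and classified labels. The class labels and sample below are invented and the function name is hypothetical. Omission error is taken here as the proportion of field (reference) pixels of a class that were assigned to another class, and commission error as the proportion of pixels mapped as a class that belong to another class in the field data.

```python
from collections import Counter

def accuracy_summary(reference, classified):
    """Overall accuracy plus per-class omission and commission errors."""
    matrix = Counter(zip(reference, classified))  # (field, map) -> count
    classes = sorted(set(reference) | set(classified))
    overall = sum(matrix[(c, c)] for c in classes) / len(reference)
    omission, commission = {}, {}
    for c in classes:
        field_total = sum(matrix[(c, k)] for k in classes)
        map_total = sum(matrix[(k, c)] for k in classes)
        omission[c] = 1 - matrix[(c, c)] / field_total if field_total else None
        commission[c] = 1 - matrix[(c, c)] / map_total if map_total else None
    return overall, omission, commission

# Hypothetical sample of ten pixels: F = forest, N = non-forest.
field_labels = ["F", "F", "F", "F", "N", "N", "N", "N", "N", "N"]
map_labels = ["F", "F", "F", "N", "N", "N", "N", "N", "N", "F"]

overall, omission, commission = accuracy_summary(field_labels, map_labels)
print(overall, omission, commission)
```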
Parameter estimate accuracy compares the estimate of a physical attribute (e.g., forest height or cover) with ground measurements, usually with correlation and regression analyses to establish the relationship and set confidence ranges (Thompson, 2002). Whilst the assessment process is generally simple (depending on the metric being compared), accuracy estimates can be compromised when there is a mismatch between the sensor spatial resolution and the scale at which in-situ measurements are collected. The use of in-situ measurements for model calibration and validation therefore requires robust and defensible methods to adequately sample or spatially aggregate ground measurements to the scale (e.g., size and shape) at which the remotely sensed data are acquired (Curran and Williamson, 1986; Atkinson and Curran, 1995; Baccini et al., 2007).
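One simple way of matching scales is to average the field measurements that fall within each pixel before comparing them with the sensor-derived values. The sketch below assumes square pixels aligned to a common origin, uses invented coordinates and heights, and summarises agreement with the root mean square error (RMSE); it is an illustration of the aggregation step rather than one of the methods in the cited studies.

```python
import math
from collections import defaultdict

def aggregate_to_pixels(points, pixel_size):
    """Mean field value per pixel; points are (x, y, value) in metres
    measured from the raster origin."""
    values = defaultdict(list)
    for x, y, value in points:
        key = (int(x // pixel_size), int(y // pixel_size))
        values[key].append(value)
    return {key: sum(v) / len(v) for key, v in values.items()}

def rmse(field_by_pixel, sensor_by_pixel):
    """Root mean square difference over pixels present in both sources."""
    shared = field_by_pixel.keys() & sensor_by_pixel.keys()
    squared = [(sensor_by_pixel[k] - field_by_pixel[k]) ** 2 for k in shared]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical field heights (m) and sensor-derived heights on 25 m pixels.
field_points = [(5, 5, 20.0), (20, 10, 24.0), (30, 5, 30.0),
                (40, 20, 34.0), (10, 30, 18.0)]
sensor_heights = {(0, 0): 23.0, (1, 0): 30.0, (0, 1): 19.0}

field_heights = aggregate_to_pixels(field_points, pixel_size=25.0)
error = rmse(field_heights, sensor_heights)
print(field_heights, round(error, 3))
```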
Location accuracy assessments between field and remotely sensed data compare points, lines, or polygon area features. Assessments generally rely on specialised field survey techniques (e.g., through the use of ground control points) and/or Global Positioning Systems (GPS) for the registration and geo-rectification of the remotely sensed data (Lund, 1998). Where both field and remotely sensed data have utilised the same ground control points, and thus have the same inherent positional error, it can be difficult to judge which source provides a more accurate estimate of the object location, particularly when fine scale remotely sensed data are used.
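For point features, positional agreement is often summarised as the root mean square of the horizontal offsets between matched locations. The coordinates below are invented, and the function name is hypothetical. As noted above, such a figure describes the disagreement between the two sources and does not by itself show which source is closer to the true location.

```python
import math

def positional_rmse(field_points, image_points):
    """RMSE of horizontal distances between matched point pairs."""
    squared = [
        (fx - ix) ** 2 + (fy - iy) ** 2
        for (fx, fy), (ix, iy) in zip(field_points, image_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical matched check points (easting, northing in metres).
field_xy = [(100.0, 200.0), (150.0, 250.0), (300.0, 100.0)]
image_xy = [(103.0, 204.0), (150.0, 250.0), (300.0, 105.0)]

offset_rmse = positional_rmse(field_xy, image_xy)
print(f"positional RMSE: {offset_rmse:.2f} m")
```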