Statistical analyses were carried out to (a) identify the important drivers of LU/LCC in GNP, (b) understand the effects of these drivers on landcover change, and (c) determine how the influence of certain drivers changed over time. Similar to a study by Mertens et al. (2000) that used statistical models to investigate the influence of macroeconomic changes on land use in Cameroon, this study of changes over time in drivers of LU/LCC in GNP is framed in the political, social, and macroeconomic contexts of post-Soviet Latvia.
A classification tree analysis and a multinomial logistic regression model were run for each pair of consecutive image dates, and for the 1985 – 2002 image pair. In each case, the analysis was done at the pixel level. To reduce the expected influence of spatial autocorrelation, the pixels were systematically sampled so that every 10th pixel was used in the analysis, reducing the total number of pixels from approximately one million to just under 100,000. For each model, the observations were further restricted to those pixels that changed landcover type between the two image dates. The total number of pixels (n) used in the analysis was 21,803 for the 1985 – 1994 image pair, 20,851 for the 1994 – 1999 image pair, 20,105 for the 1999 – 2002 image pair, and 23,483 for the 1985 – 2002 image pair.
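The sampling and filtering steps described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes the pixels have been flattened into a one-dimensional sequence and that "every 10th pixel" means a fixed stride along that sequence (the text does not say whether sampling followed rows, columns, or a grid). The function name `sample_changed_pixels` and the class labels are hypothetical.

```python
def sample_changed_pixels(before, after, step=10):
    """Systematically sample pixels, then keep only those that changed class.

    before, after: equal-length sequences of landcover labels for the two
    image dates (assumed to be flattened in the same pixel order).
    step: sampling stride; 10 mirrors the "every 10th pixel" rule.
    """
    if len(before) != len(after):
        raise ValueError("both images must contain the same number of pixels")
    # Systematic sample: positions 0, step, 2*step, ...
    sampled = list(zip(before, after))[::step]
    # Restrict to pixels whose landcover class differs between the dates.
    return [(b, a) for b, a in sampled if b != a]


# Toy usage with 30 pixels; only positions 0, 10 and 20 are sampled.
before = ["forest"] * 30
after = list(before)
after[0] = "field"   # sampled and changed: kept
after[5] = "field"   # changed but not sampled: dropped
after[20] = "bog"    # sampled and changed: kept
changed = sample_changed_pixels(before, after)
```

Sampling before filtering matches the order given in the text: the stride thins the full image first, and the change restriction is then applied to the thinned set.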
Four pairs of models were run (a pair being a classification tree analysis and a multinomial logistic regression), each with different dates for the dependent landcover change variable, but with the same set of independent variables. The first three model pairs used pairs of consecutive image dates, and the fourth model pair used the pair of dates straddling the full study period:
1. 1985 – 1994
2. 1994 – 1999
3. 1999 – 2002
4. 1985 – 2002
The only exception to the shared set of independent variables was that the 2000 management zones were not used in the 1985 – 1994 models, because these zones had not yet been created, or even discussed, at the time the images were taken. The 2000 management zones were used in the 1994 – 1999 models: although the zones had not yet gone into effect during that period, they had already been designated by 1999, and knowledge of the new zone boundaries and forthcoming landuse laws had influenced landuse decisions for some time, particularly on the part of the GNP Administration, the organization that created the new zones.
Classification Trees
In classification trees, binary splits of the data are performed on the independent variables, in decreasing order of their predictive value for the dependent variable. A classification tree then allocates each observation to a particular class of the categorical dependent variable based on a set of binary indicators of the independent variables. For example, suppose variable #1 is continuous. If an observation’s value of variable #1 is, for instance, greater than x, the observation is assigned to one of the 2 major branches of the tree. (Variable #1 need not be the first variable in the list of independent variables; it is the variable determined to have the greatest predictive value for the dependent variable.) If the observation’s value of variable #2 (suppose it is categorical) falls within a particular subset of that variable’s classes, the observation is assigned to one of the 2 sub-branches of the first branch. If the observation’s value of variable #3 (suppose it is continuous) is, for instance, less than y, the observation is assigned to one of the 2 sub-branches of the previous sub-branch, and so on. This process continues to the lowest branch of the tree, where the observation is allocated to the class, z, of the categorical dependent variable to which it most likely belongs. Each successive split further down the tree has lower predictive value for the dependent variable than the splits above it.
The goals of these analyses were, for each classification tree, to (a) determine the most important splits in the independent variables in their predictive value for landcover change, (b) determine how the variables affected the probability of landcover change, and (c) assess how the most important splits in the independent variables changed over time (from one classification tree to the next). The S-plus software was used to run the classification tree analyses.
To avoid generating relatively meaningless splits on variables (at the bottom of the classification trees) involving an insignificant number of observations, the classification trees were run with the following parameters: each branch of a split on a variable was constrained to a minimum of 500 observations, the minimum node size to allow a split was 1,000 observations, and the minimum node deviance before tree growing was stopped (if the other constraints were not yet satisfied) was set to 0.01. Note that there were approximately 20,000 observations (pixels) for each tree. Once the full trees were created, they were pruned by cutting off the least important nodes, in terms of both the values of the node deviances and the scientific relevance of the splits, as determined through an analysis of branch size. In summarizing the results, the most important of these branches are discussed. The effects of the splits (and therefore the effects of the independent variables) on the probability of the outcome variable (the landcover type the pixel changed to) were analyzed by examining the relative probabilities (between the branches of each important split) of each outcome at each node. Large differences in probabilities were noted in the results.
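To illustrate how the size constraints above limit tree growth, the following sketch searches for the best binary split on a single continuous variable. It is not the S-plus implementation: it assumes the split criterion is the reduction in multinomial deviance, -2 Σ n_k log(n_k / n), and it implements only the minimum node size and minimum branch size rules (the 0.01 deviance threshold and categorical splits are left out). The names `node_deviance` and `best_split` are hypothetical.

```python
import math
from collections import Counter


def node_deviance(labels):
    """Multinomial deviance of a node: -2 * sum_k n_k * log(n_k / n)."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())


def best_split(values, labels, min_node=1000, min_branch=500):
    """Find the threshold on one continuous variable that most reduces deviance.

    Returns (threshold, deviance_reduction), or None when the node has fewer
    than `min_node` observations or when no threshold leaves at least
    `min_branch` observations in both branches.
    """
    n = len(labels)
    if n < min_node:
        return None
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    parent = node_deviance(ys)
    best = None
    # `cut` is the number of observations sent to the left branch.
    for cut in range(min_branch, n - min_branch + 1):
        if xs[cut - 1] == xs[cut]:
            continue  # tied values cannot be separated by a threshold
        gain = parent - node_deviance(ys[:cut]) - node_deviance(ys[cut:])
        if best is None or gain > best[1]:
            best = ((xs[cut - 1] + xs[cut]) / 2.0, gain)
    return best


# Toy usage with scaled-down constraints: 20 observations, two pure groups.
values = list(range(20))
labels = ["a"] * 10 + ["b"] * 10
split = best_split(values, labels, min_node=10, min_branch=5)
```

With the defaults of 1,000 and 500, a node of roughly 20,000 pixels can be split at most a few levels deep before the constraints stop further growth, which is the intended effect of excluding splits that involve few observations.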
Multinomial Logistic Regression Models
The goals of the multinomial logistic regression models were to (a) assess the statistical significance of each variable in its predictive value for landcover change, (b) examine the effects of specific variables on specific types of landcover change, and (c) assess how these effects changed over time. The SAS program was used to run the multinomial logistic regressions.
Multinomial logistic regression is a regression method that can be used when the dependent variable is a categorical variable with more than two levels. Instead of modeling the dependent variable directly against the independent variables, the log of the probability of the dependent variable occurring in one class relative to a referent class, log(Pi/Pr), is modeled, where Pi is the probability that the dependent variable is in class i, and Pr is the probability that the dependent variable is in the referent class. One of the benefits of multinomial logistic regression is that it is less restrictive than ordinary least squares (OLS) regression. Multinomial logistic regression models have the following important freedoms from the restrictive assumptions of the OLS model (Garson 1998):
1) although log(Pi/Pr) must have a linear relationship with the independent variables, the outcome variable itself need not have a linear relationship with the independent variables,
2) there is no homogeneity of variances assumption, and
3) the error terms are not assumed to be normally distributed.
As mentioned, the observations were restricted to those pixels that underwent change between the two image dates. Each multinomial logistic regression took the following form, simultaneously estimating these 5 equations (each equation with one of 5 different levels of i):
log(Pi/P6) = β0 + β1X1 + β2X2 + … + βjXj + ε, where:
i = 1, 2, ..., 5, representing each landcover type except forest, the reference category,
j is the number of independent variables,
Pi is the probability of a pixel changing to landcover class i,
P6 is the probability of a pixel changing to the forest (referent) class,
X1, X2, ..., Xj represent the j independent variables,
β1, β2, ..., βj represent the j parameter estimates, and
ε represents the independent, identically distributed error terms.
Note: j = 9 variables except for the model using the 1985 – 1994 image pair, where j = 8, since the 2000 management zone variable was not included in this model.
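As an illustration of how the five log-odds equations determine the six class probabilities, the following sketch converts fitted values of log(Pi/P6) into probabilities. It covers only the deterministic part of the model and is not the SAS procedure. The function names and all numbers in the usage example are hypothetical, and the sketch assumes, as is standard for this kind of model, that each of the five equations has its own intercept and coefficients.

```python
import math


def linear_predictor(intercept, coefs, x):
    """Evaluate b0 + b1*x1 + ... + bj*xj for one equation."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def class_probabilities(log_odds):
    """Convert log(Pi/Pref) values into class probabilities.

    log_odds: one value per non-referent class (five in the text).
    Returns the non-referent probabilities in the same order, followed by
    the referent-class probability as the last element.
    """
    odds = [math.exp(v) for v in log_odds]   # Pi / Pref
    denom = 1.0 + sum(odds)                  # equals 1 / Pref
    return [o / denom for o in odds] + [1.0 / denom]


# Hypothetical pixel with two predictors and five made-up equations.
x = [1.0, 0.5]
equations = [
    (0.2, [0.4, -1.0]),
    (-0.5, [0.1, 0.3]),
    (0.0, [0.0, 0.0]),
    (1.0, [-0.2, 0.6]),
    (-1.2, [0.5, 0.5]),
]
log_odds = [linear_predictor(b0, b, x) for b0, b in equations]
probs = class_probabilities(log_odds)
```

Because the probabilities sum to one, the referent (forest) probability follows from the other five, which is why only five equations are estimated for six landcover classes.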