Chapter 4 METHODS

4.3 Statistical analysis

4.3.1 Soil data analysis

Chapter 6

A Principal Component Analysis was conducted using the PC-ORD software (McCune & Mefford 1999) to investigate the relationship between wetland types and zones, and to determine the main environmental variables influencing these relationships. Since clay content was not available for all the wetland sites, a constant of 1 was added to compensate for the absent data. Thereafter all the data were log-transformed to improve the linearity of the environmental relationships (McCune & Grace 2002). The pH data were not transformed, as pH is already a log-value.

The variation of soil properties in the wetland zones down a topographical gradient was investigated by fitting a series of mixed models, each addressing a different aspect of the data. The mixed models were fitted using the SAS software package (version 9.22, SAS procedure MIXED; SAS 2009). Generally the data were modelled as a function of the factors ‘Type’, ‘Zone’, ‘Transect’ and ‘Depth’. Only data to a depth of 400 mm were analysed, as it was statistically determined using one-way ANOVA (significance at p < 0.05) that most variation in soil properties occurs in the top 400 mm of a soil profile (refer also to Chapter 5). The following depth increments were used: 0-50, 50-100, 100-150, 150-200, 200-250, 250-300, and 300-400 mm. The 11 variable measurements (C, N, Ca, K, Mg, Na, Fe, Mn, CEC, pH and resistance) constitute the dependent variables in the statistical analyses. The 11 dependent variables were analysed separately. The objectives were to:

• Assess the effects of the factors ‘Type’ and ‘Depth’, and

• Investigate differences with respect to ‘Zone’.

Models 1 and 2

In models 1 and 2, data from all five wetland types were analysed jointly by fitting a linear mixed model with the following categorical (class) effects:

• Fixed effects: type, zone, type*zone, depth, type*depth, zone*depth, type*zone*depth

• Random effects: transect*type, transect*type*zone

If the ‘transect*type’ random effect were to be left out, the resulting mixed model would be equivalent to a split-plot ANOVA; where transect*type*zone identifies the sampling points, with between-plot factors type and zone, and within-plot factor depth. The additional random effect transect*type was fitted in order to model correlation along transects.
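Written out as an equation, the model described above might look as follows. This is a sketch only: the symbols (t for type, z for zone, d for depth, r for transect) are introduced here for illustration, and the normal distributions are the usual linear mixed model assumptions rather than something stated in the text.

```latex
\[
\begin{aligned}
y_{tzdr} ={}& \mu + \alpha_t + \beta_z + (\alpha\beta)_{tz} + \gamma_d
            + (\alpha\gamma)_{td} + (\beta\gamma)_{zd} + (\alpha\beta\gamma)_{tzd} \\
          & + u_{tr} + v_{tzr} + \varepsilon_{tzdr}, \\
u_{tr} \sim{}& N(0, \sigma_u^2), \quad
v_{tzr} \sim N(0, \sigma_v^2), \quad
\varepsilon_{tzdr} \sim N(0, \sigma^2)
\end{aligned}
\]
```

Here the Greek terms are the fixed effects, u corresponds to the transect*type random effect (shared by all sampling points on a transect) and v to the transect*type*zone random effect (one value per sampling point). Dropping u leaves the split-plot structure mentioned above.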

Model 1 was fitted to the data as measured (untransformed), while Model 2 was fitted to the log-transformed data (natural logarithm). When analysing the untransformed data, plots of the residuals against predicted values generally suggested that the variance increased with the mean (funnel-shaped residual plots). When analysing the log-transformed data, the residual variance seemed to stabilise. Therefore, all subsequent analyses were carried out using the log-transformed data, with the exception of the variable pH, which was analysed using the untransformed data (since pH is already a log-value).

Generally the type*zone and type*zone*depth interaction terms were statistically significant. The effect of between-zone differences therefore depended on the type of wetland. As a result further analyses were carried out separately for the different wetland types.

Model 3

The following mixed model was fitted to data from the five wetland types separately:

• Fixed effects: zone, depth, zone*depth

• Random effects: transect, transect*zone

Based on this model, (least squares) means for the various zones in each wetland type were calculated. In order to assess the differences between zones, pairwise differences between the zone means and associated P-values were calculated.

The data were analysed on the log-scale, so that the antilogs of the zone means were geometric means of the measurements. Similarly, the antilogs of the pairwise differences between zone means were the ratios of the geometric means.
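The back-transformation can be illustrated with a small numerical sketch. The measurements below are hypothetical, and plain arithmetic means of the logs stand in for the least squares means of the mixed model, which is an assumption made purely to keep the example short.

```python
import math

def mean_log(values):
    """Arithmetic mean of the natural logarithms of the measurements."""
    return sum(math.log(v) for v in values) / len(values)

# Hypothetical measurements for two zones (illustrative values only).
zone_a = [1.0, 100.0]
zone_b = [2.0, 8.0]

mean_log_a = mean_log(zone_a)
mean_log_b = mean_log(zone_b)

# The antilog of a mean on the log-scale is a geometric mean.
gm_a = math.exp(mean_log_a)
gm_b = math.exp(mean_log_b)

# The antilog of a difference between log-scale means is the ratio
# of the two geometric means.
ratio = math.exp(mean_log_a - mean_log_b)

print(gm_a, gm_b, ratio)
```

A ratio above 1 indicates that the first zone has the higher geometric mean, and a ratio of 1 corresponds to no difference on the log-scale.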

Because Model 3 assumed neither that the effect of depth was linear (depth was fitted as a categorical effect) nor that the zone*depth interaction was absent, the results from this model were valid for all dependent variables and wetland types.

Model 4

Where the zone*depth interaction term in Model 3 was not significant, this interaction term was dropped from the model, and the following mixed model was fitted to data from the five wetland types separately:

• Fixed effects: zone, depth_c, depth

• Random effects: transect, transect*zone

The variable depth_c represents depth fitted as a covariate. It is included in Model 4 in order to test whether depth as a categorical variable, fitted after depth_c, remains significant. If depth fitted after depth_c is found not to be significant, the effect of depth is linear and can be modelled using the covariate depth_c alone. Generally, but not always, the factor depth was not significant.

Model 5

Where the factor depth was not significant in Model 4, it was dropped and the covariate depth_c was retained in the following mixed model (fitted to the data from the five wetland types separately):

• Fixed effects: zone, depth_c

• Random effects: transect, transect*zone

As for Model 3, (least squares) means for the zones, pairwise differences between the zone means and the associated P-values were calculated.

As with Model 3, the data were analysed on the log-scale. Therefore the antilogs of the zone means were geometric means of the measurements. Similarly, the antilogs of the pairwise differences between zone means were the ratios of the geometric means.

Chapter 8

The Munsell system has three components: Hue (a specific colour), Value (lightness and darkness), and Chroma (colour intensity), which are arranged in books of colour chips. Soil is then matched visually and assigned the corresponding Munsell notation.

Indices to determine the correlation between organic carbon and soil colour

The following relationships were determined to establish which give the best correlation between soil colour and SOC:

• Hue, Value and Chroma: wet and dry.

• Dry Value - Wet Value. The ΔValue from dry to wet was correlated against SOC. Soil generally becomes bleached when it is dried, whereas SOC retains its dark colour whether it is dry or wet; the colour change is therefore greater in the sandy soil samples than in the high organic soil samples. Only the Value component of the Munsell system was used, as this showed the best correlation with SOC.

• Dry Value + Wet Value. Similar to above.

• Mokma & Cremeens (1991) developed a horizon colour index based on matrix colour, size and colour of mottles and continuity and colour of clay films. The colour of mottles was not determined in this study, as the soil was ground and sieved and the resulting matrix colour read. No clay cutans were observed in the soil in this study. Therefore the colour index of the soil matrix was adapted, and determined by: numeric Hue + (8 - Chroma).

• Evans & Franzmeier (1988) developed an index to combine Hue and Chroma. To account for the high Hue values of wet soils a Hue index ‘hi’ was calculated by subtracting the hue number from 30 and assigning neutral hues a hi value of 2.5. Thus hi numbers are 2.5YR = 17.5, 5YR = 15, 7.5YR = 12.5, 10YR = 10, 2.5Y = 7.5, 5Y = 5.0. These conventions are arbitrary, and the number 30 was chosen to keep the ‘hi’ value positive. The equation ‘hi + Chroma’ was used to combine the effects of Hue and Chroma.

• Godlove (1951) and Melville & Atkinson (1985) propose that Euclidean distance is a valid measure of perceived colour differences, and that a single numerical value for this distance can be obtained with the equation ΔE = [2C₁C₂(1 − cos(3.6 × ΔH)) + (ΔC)² + (4ΔV)²]^½, where C₁ and C₂ are the Chroma units of two colours separated by ΔC Chroma units, ΔH Hue units and ΔV Value units.

• Van Huyssteen (1997) developed a number of indices to correlate the degree of wetness with soil colour.

• Effect of substrate. From the literature review it is apparent that texture class, land use, size of geographic area and climatic region influence whether there is a relationship between SOC and soil colour, and how significant this relationship is. Since all the wetland systems occur within one climatic region, the land use was similar and none of the land was under cultivation, only the effect of texture on soil colour was determined. Chapters 5 and 7 indicated that there are three main substrate types that are highly influential in wetlands on the MCP. The data were analysed per substrate type:

- High Organic soils, where SOC > 10% (Soil Classification Working Group, 1991);

- Clay soil, where the clay fraction is > 10% (as per Pretorius, 2011); and

- Sandy soil, where the clay fraction is < 10% (as per Pretorius, 2011).
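Three of the indices listed above (Dry Value - Wet Value, hi + Chroma and ΔE) can be sketched in a few lines. The function names are hypothetical, the hi table simply repeats the values quoted in the text, and treating 3.6 × ΔH as an angle in degrees is an assumption (it corresponds to 100 hue steps around the Munsell hue circle).

```python
import math

# hi values exactly as listed in the text (Evans & Franzmeier 1988);
# "N" stands for a neutral hue.
HUE_INDEX = {"2.5YR": 17.5, "5YR": 15.0, "7.5YR": 12.5,
             "10YR": 10.0, "2.5Y": 7.5, "5Y": 5.0, "N": 2.5}

def delta_value(dry_value, wet_value):
    """Change in Munsell Value from dry to wet (Dry Value - Wet Value)."""
    return dry_value - wet_value

def hue_chroma_index(hue, chroma):
    """Combined index 'hi + Chroma'."""
    return HUE_INDEX[hue] + chroma

def colour_difference(hue1, value1, chroma1, hue2, value2, chroma2):
    """Colour difference (delta E) between two Munsell notations."""
    if "N" in (hue1, hue2):
        # The hi value for neutral hues is an arbitrary convention,
        # so no hue difference is derived from it here.
        raise ValueError("hue difference is undefined for neutral hues")
    # hi = 30 - hue number, so a difference in hi equals the difference
    # in hue number (in hue units).
    d_hue = abs(HUE_INDEX[hue1] - HUE_INDEX[hue2])
    d_chroma = chroma1 - chroma2
    d_value = value1 - value2
    angle = math.radians(3.6 * d_hue)  # ASSUMPTION: 3.6 x dH is in degrees
    inside = (2 * chroma1 * chroma2 * (1 - math.cos(angle))
              + d_chroma ** 2 + (4 * d_value) ** 2)
    return math.sqrt(inside)

# Hypothetical readings for one sample: 10YR 5/2 dry and 10YR 3/2 wet.
print(delta_value(5, 3))
print(hue_chroma_index("10YR", 2))
print(colour_difference("10YR", 5, 2, "10YR", 3, 2))
```

When only the Value differs, the hue and Chroma terms vanish and ΔE reduces to four times the Value difference, which shows how heavily the formula weights Value.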

There were no data for the Utilised Perched Pans (DP Type) for any of the soil colour analyses; this wetland type was therefore omitted from this chapter.

Segmented quantile regression models

It is not unusual that relationships between soil properties cannot be described by conventional correlation or regression analysis. Blavet et al. (2000) suggest that colour limits could be defined when constructing relationships between soil morphology and the duration of water saturation. This is because there tends to be a scatter of values of which the only meaningful feature may be a boundary line that separates a zone of reality from that of imagination (Mills et al. 2006).

It is believed that envelopes delineated by segmented quantile regression would provide greater insight into relationships between two soil parameters than would straight-line regression (Koenker and Hallock 2001). Quantile regression models are useful when the response variables are affected by more than one factor, when the response differs between factors, when not all applicable factors are measured and when there is an interaction of multiple factors (Cade and Noon 2003). Unlike multiple regression and multivariate analyses, the quantile regression approach illustrates that the expression of the dependent variable can only occur within a limited range of a particular variable, but that this potentially maximal expression is not guaranteed. Conversely, there may be a predictably minimal expression of the dependent variable over one or more ranges of the environmental variable (Mills et al. 2006). Segmented quantile regression would result in an understanding of the relationships between SOC and soil colour by demarcating zones of potentially maximal and predictably reduced expression (Mills et al. 2006).

The idea behind quantile regression is to fit a regression line through a part of a set of data points to create a response envelope (Mosteller and Tukey 1977). Inside of this envelope will be the zone of reality, where actual data points occur; while outside of this envelope would be the imagination zone, where data points could, but do not occur. Depending on the quantiles chosen to create this regression line a certain percentage of data points will occur beneath it (Figure 4.13; Van Zijl et al. 2014, Medinsky 2006, Mills et al. 2006).

Figure 4.13. Hypothetical relationship between infiltrability and a soil property showing a boundary line that divides a zone of reality from that of imagination (Mills et al. 2006).

There should be a balance between a sufficient number of classes and a sufficient number of data points in each class to accurately reflect the distribution of the response variable over the particular range of the independent variable. This is a somewhat subjective choice. The boundary lines representing the 0.9 and 0.1 quantiles were calculated in MS Excel, as these quantiles adequately reduced the number of outliers. To construct the boundary lines the data were sorted in ascending order according to the independent variable and subdivided into a number of classes with an equal number of samples in each class. The number of data points per class was mostly 50, although some classes had as few as 20 and others as many as 99. Mean soil variables and quantiles (0.1 and 0.9) were obtained for each class. Regression lines fitting the 0.9 quantile were selected.
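The class-based procedure described above can be sketched as follows. The calculation in the study was done in MS Excel; this Python version is only an illustration, the function names are hypothetical, and the "inclusive" quantile method is an assumption chosen because it interpolates between the smallest and largest value in a class.

```python
from statistics import mean, quantiles

def class_quantiles(x, y, class_size):
    """Sort by x, split into classes of class_size points, and return
    (mean x, mean y, 0.1 quantile of y, 0.9 quantile of y) per class."""
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    rows = []
    for start in range(0, len(pairs), class_size):
        chunk = pairs[start:start + class_size]
        if len(chunk) < 2:
            break  # quantiles() needs at least two data points
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        deciles = quantiles(ys, n=10, method="inclusive")
        rows.append((mean(xs), mean(ys), deciles[0], deciles[8]))
    return rows

def fit_line(xs, ys):
    """Ordinary least squares line through the points; returns
    (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: 20 points, two classes of 10.
x = list(range(20))
y = [float(v) for v in x]
rows = class_quantiles(x, y, 10)

# Boundary line through the 0.9 quantile of each class.
slope, intercept = fit_line([r[0] for r in rows], [r[3] for r in rows])
print(rows)
print(slope, intercept)
```

Fitting the line to the per-class 0.9 quantiles rather than to the raw points is what turns an ordinary regression into an upper boundary of the response envelope.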

Topsoil colour as indicator of wetland boundaries

The four different wetland systems were analysed separately. For each wetland system the zones were statistically compared to determine whether colour values differ when moving from the outside of the wetland to the inside. Significant differences between topsoil colours for the various zones could then be used as an indicator of wetland boundaries. This analysis was done for a selected number of Munsell colour indices.

A one-way between-groups ANOVA was conducted to determine whether the colour indices can be used to differentiate between the various zones in the various wetland types. Normality was assessed using the Shapiro-Wilk test (p > 0.05). The assumption of homogeneity of variances was investigated using Levene’s statistic (p > 0.05). In cases where the assumption of homogeneity of variances was violated (p < 0.05), a Welch ANOVA was applied. A Tukey Post-Hoc test was applied to the ANOVA results, and a Dunnett Post-Hoc test to the Welch ANOVA test results.
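To show what the Welch ANOVA does differently when variances are unequal, the sketch below computes Welch's F statistic and its degrees of freedom by hand. The text does not state which software was used for this analysis, so the function name and the data are hypothetical, and the p-value (which requires the F distribution from a statistics package) is left out.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's F statistic for k groups with unequal variances.
    Returns (F, df1, df2)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n / s^2, so noisy groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    numerator = sum(w * (m - weighted_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f_stat = numerator / denominator
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2

# Hypothetical colour index values for two zones.
f_stat, df1, df2 = welch_anova([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(f_stat, df1, df2)
```

With two groups the statistic equals the square of Welch's t statistic, which is a convenient check on the arithmetic.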

In document Selected soil properties and vegetation composition of five wetland systems on the Maputaland Coastal Plain, Kwazulu-Natal (Page 62-67)