3.5 Data Analysis
3.5.1 Demographic Household Data
For the data on demographic household attributes, preliminary data analysis was carried out in Microsoft Excel to check for and correct errors. The preliminary data operations involved processing, cleaning and data reduction for accuracy and reliability, together with coding and restructuring of all recurrent answers from the respondents by rapid qualitative analysis.
Categorical Principal Component Analysis (CATPCA; Meulman and Heiser, 2005) in SPSS Version 20 was used to categorize households based on farm size, age of the household head, total household income (on-farm and off-farm), family size, production orientation and education level of the household head. The household variables that made up the first four principal components were selected using factor loadings obtained by varimax rotation. Variables that loaded onto the same component were analysed to identify indicators for use as proxies for household categorization. The households were then grouped into generic categories using their individual factor scores on the first and second principal components.
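CATPCA's optimal scaling of categorical variables is specific to SPSS and is not reproduced here. As a rough illustration of the rotation and scoring steps only, the sketch below assumes the household variables have already been coded numerically, applies ordinary PCA to the standardized variables and then varimax-rotates the first four loadings. The function names `varimax` and `pca_varimax` are hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion (gamma=1 gives Kaiser's varimax)
        grad = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ grad)
        rotation = u @ vt                      # nearest orthogonal matrix to the gradient
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break                              # criterion has stopped improving
        criterion = new_criterion
    return loadings @ rotation, rotation


def pca_varimax(data, n_components=4):
    """PCA on standardized columns, then varimax; returns loadings, rotated loadings, scores."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest components
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)              # variable-component correlations
    rotated, rotation = varimax(loadings)
    # Unit-variance component scores, rotated with the same orthogonal matrix
    scores = ((z @ eigvecs) / np.sqrt(eigvals)) @ rotation
    return loadings, rotated, scores
```

Variables with large absolute rotated loadings in the same column would be read together as one component, and the first two columns of `scores` play the role of the factor scores used for grouping. Because varimax is an orthogonal rotation, each variable's communality (row sum of squared loadings) is unchanged by it.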
3.5.2 Rainfall data
The rainfall data obtained from the National Meteorological Station, Katumani, had three months of missing data (January 1985, and September and October 1990), which were in-filled by the extrapolation method (HPTM, 2002) as indicated in Appendix V. The rainfall data were found to be consistent using the Single Mass Curve method, and no further corrections were required. The following parameters were used to assess rainfall variability: the Standardized Precipitation Index (SPI), the Standardized Anomaly Index (SAI), the Coefficient of Variation (CV) and polynomial regression.
(i) Standardized Precipitation Indices (SPI)
Monthly Standardized Precipitation Indices (SPI), following the World Meteorological Organization (WMO, 2012), were used to identify dry and wet months according to Equation (iii):
$$\mathrm{SPI} = \frac{Q_j - \bar{Q}_j}{\sigma_j} \tag{iii}$$
where $Q_j$ is the measured rainfall for month $j$, $\bar{Q}_j$ is the long-term mean rainfall for that month and $\sigma_j$ is the long-term standard deviation of rainfall for that month.
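Equation (iii) standardizes each month's rainfall against the long-term statistics of the same calendar month. A minimal sketch, assuming the record is arranged as a years × 12 array of monthly totals and that the sample standard deviation (ddof=1) is intended; it follows Equation (iii) directly rather than the gamma-distribution fitting commonly used to compute SPI. The function name `monthly_spi` is hypothetical.

```python
import numpy as np


def monthly_spi(rain):
    """SPI per Equation (iii): (Q_j - mean_j) / sd_j for each calendar month j.

    rain: array of shape (n_years, 12) holding monthly rainfall totals.
    """
    rain = np.asarray(rain, dtype=float)
    mean = rain.mean(axis=0)          # long-term mean of each calendar month
    sd = rain.std(axis=0, ddof=1)     # long-term standard deviation of each month
    # Months that never vary (e.g. always 0 mm) get SPI 0 instead of dividing by zero
    return np.divide(rain - mean, sd, out=np.zeros_like(rain), where=sd > 0)
```

Flattening the result row by row (`spi.ravel()`) gives the chronological monthly SPI series needed for identifying drought events.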
The SPI was normalized so that rainfall in both wet and dry months could be represented and monitored on the same scale, as shown in Table 3.3.
Table 3.3: The monthly standardized precipitation indices (SPI)
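| SPI | Description of rainfall pattern |
|---|---|
| 2.0 and above | Extremely wet |
| 1.5 to 1.99 | Very wet |
| 1.0 to 1.49 | Moderately wet |
| -0.99 to 0.99 | Near normal |
| -1.0 to -1.49 | Moderately dry |
| -1.5 to -1.99 | Severely dry |
| -2.0 and less | Extremely dry |

Source: Hayes (2005)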
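The class boundaries of the SPI classification can be applied as a simple threshold lookup. This sketch uses the standard Hayes (2005) classes, including severely dry for -1.5 to -1.99, and assumes that each cut-off value (2.0, 1.5, 1.0, -1.0, -1.5, -2.0) belongs to the more extreme class, so that values such as 0.995 that fall between the printed ranges still receive a class. The name `classify_spi` is hypothetical.

```python
def classify_spi(spi):
    """Map one SPI value to a rainfall-pattern class (Hayes, 2005 cut-offs)."""
    if spi >= 2.0:
        return "Extremely wet"
    if spi >= 1.5:
        return "Very wet"
    if spi >= 1.0:
        return "Moderately wet"
    if spi > -1.0:
        return "Near normal"
    if spi > -1.5:
        return "Moderately dry"
    if spi > -2.0:
        return "Severely dry"
    return "Extremely dry"   # -2.0 and less
```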
(ii) Drought occurrence
A meteorological drought event was deemed to have occurred when the monthly SPI remained negative over consecutive months, and the event ended when the SPI became positive. Each drought event therefore had a duration, defined as the number of months within the event, and a magnitude, defined as the positive sum of the SPI over all months within the event. Drought events were classified from mild to extreme using the drought severity classification of the World Meteorological Organization (Table 3.4).
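The run definition above translates into a single scan over the chronological monthly SPI series. This is a sketch under two interpretive assumptions: a month with SPI exactly zero ends a run (it is not negative), and a single negative month counts as an event unless `min_duration` is raised. The name `drought_events` is hypothetical.

```python
def drought_events(spi, min_duration=1):
    """Return (start_index, duration_months, magnitude) for each run of negative SPI."""
    values = [float(v) for v in spi]
    events = []
    start = None
    for i, value in enumerate(values + [0.0]):  # trailing 0.0 closes a run at series end
        if value < 0:
            if start is None:
                start = i                      # first month of a new run of negative SPI
        elif start is not None:
            duration = i - start
            if duration >= min_duration:
                magnitude = -sum(values[start:i])  # positive sum of the negative SPI values
                events.append((start, duration, magnitude))
            start = None
    return events
```

Each tuple gives the index of the first drought month, the duration in months and the magnitude of the event.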
Table 3.4: Drought severity classification

| Drought description | Severity of event | Frequency in 100 years | SPI |
|---|---|---|---|
| Mild | 1 in 3 years | 33 | 0 to -0.99 |
| Moderate | 1 in 10 years | 10 | -1.0 to -1.49 |
| Severe | 1 in 20 years | 5.0 | -1.50 to -1.99 |
| Extreme | 1 in 50 years | 2.5 | -2.0 or less |

Source: WMO (2012)
(iii) Standardized Anomaly Index (SAI)
The Standardized Rainfall Anomaly Index (SAI) was used to analyse annual and seasonal rainfall variability according to Equation (iv):
$$\mathrm{SAI} = \frac{X - \bar{X}}{SD} \tag{iv}$$
where $X$ is the annual or seasonal rainfall total, $\bar{X}$ is the mean of the entire annual or seasonal series and $SD$ is the standard deviation of that series.
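Equation (iv) applied to a series of annual or seasonal totals, as a sketch assuming the sample standard deviation (ddof=1); the name `sai` is hypothetical. Positive values mark years or seasons wetter than the long-term mean and negative values drier ones.

```python
import numpy as np


def sai(totals):
    """Standardized Anomaly Index (Equation iv) for annual or seasonal rainfall totals."""
    x = np.asarray(totals, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)   # sample standard deviation assumed
```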
(iv) Coefficient of Variation (CV)
CV was used to analyse seasonal rainfall variability according to Equation (v).
$$CV = \frac{SD}{\bar{X}} \tag{v}$$
where $SD$ is the standard deviation of seasonal rainfall and $\bar{X}$ is the seasonal mean rainfall.
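Equation (v) for a series of seasonal totals, again as a sketch assuming the sample standard deviation; multiplying the result by 100 expresses CV as a percentage. The name `coefficient_of_variation` is hypothetical.

```python
import numpy as np


def coefficient_of_variation(totals):
    """Coefficient of variation (Equation v): SD divided by the mean of the series."""
    x = np.asarray(totals, dtype=float)
    return x.std(ddof=1) / x.mean()         # ratio; multiply by 100 for a percentage
```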
(v) Polynomial regression
Quadratic polynomial regression was used to assess rainfall variability by modelling the relationship between seasonal/annual rainfall and time according to Equation (vi).
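$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \tag{vi}$$
where $Y$ is the seasonal/annual rainfall, $X$ is time, $\beta_0$ is the Y intercept, $\beta_1$ is the first-order (linear) coefficient, $\beta_2$ is the second-order (quadratic) coefficient and $\varepsilon$ is the error term.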
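Equation (vi) can be fitted by ordinary least squares, for example with NumPy's `polyfit`, which returns coefficients from the highest power down. This sketch assumes time X is expressed as years elapsed since the first (earliest) year of record, which keeps X² small; the text does not specify the time origin. The name `fit_quadratic_trend` is hypothetical.

```python
import numpy as np


def fit_quadratic_trend(years, rainfall):
    """Least-squares fit of Y = b0 + b1*X + b2*X^2 (Equation vi); X = years since start."""
    x = np.asarray(years, dtype=float)
    x = x - x[0]                          # time origin at the first year (years assumed sorted)
    y = np.asarray(rainfall, dtype=float)
    b2, b1, b0 = np.polyfit(x, y, deg=2)  # polyfit orders coefficients highest power first
    fitted = b0 + b1 * x + b2 * x ** 2
    return (b0, b1, b2), fitted
```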
To test the significance of the polynomial regression (its coefficient of determination, R²), the following hypotheses were tested:
H0: R² = 0; there was no significant difference in the seasonal/annual variability of rainfall over time in Muooni Catchment.
H1: R² ≠ 0; there was a significant difference in the seasonal/annual variability of rainfall over time in Muooni Catchment.
Analysis of variance (ANOVA) was used to test whether a significant relationship existed between rainfall variability and time. The total variance of the seasonal/annual rainfall series was estimated according to Equation (vii).
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \tag{vii}$$
where $s^2$ is the variance of the seasonal/annual rainfall, $y_i$ is the $i$th observation, $n$ is the number of observations and $\bar{y}$ is the mean of the $n$ observations.
The numerator in Equation (vii) is the total sum of squares (SST), the sum of the squared deviations of all observations $y_i$ from their mean, and is associated with the total variance of the observations. The denominator in Equation (vii), $n - 1$, is the degrees of freedom associated with SST.
The sample variance, or total mean square (MST), was obtained by dividing SST by its degrees of freedom according to Equation (viii).
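$$MST = \frac{SST}{n - 1} \tag{viii}$$
Fitting a polynomial regression model to the observations was an attempt to explain part of the variability in the data. This explained part is the regression (model) sum of squares (SSR), calculated with a relationship similar to the one used for SST according to Equation (ix).
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \tag{ix}$$
where $\hat{y}_i$ is the value predicted by the fitted model for the $i$th observation.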
The coefficient of determination (R²) was a measure of the amount of variability in the data accounted for by the polynomial regression model. As noted above, the total variability of the data was measured by the total sum of squares (SST), and the portion of this variability explained by the regression model was the regression sum of squares (SSR). R² was therefore the ratio of SSR to SST according to Equation (x) and ranged from 0 to 1.
$$R^2 = \frac{SSR}{SST} \tag{x}$$
The regression model was assumed to be imperfect, so that a proportion of the total variability in the observed data remained unexplained; the portion of the total sum of squares not explained by the model was the error sum of squares (SSE). The deviations making up this sum of squares were obtained at each observation as residuals according to Equation (xi).
$$e_i = y_i - \hat{y}_i \tag{xi}$$
where $e_i$ is the residual of the $i$th observation.
The error sum of squares (SSE) was obtained as the sum of the squares of these residuals according to Equation (xii):
$$SSE = \sum_{i=1}^{n} e_i^2 \tag{xii}$$
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{xiii}$$
The total variability of the observed data (the total sum of squares, SST) can be expressed by the analysis of variance identity as the sum of the portion explained by the model (SSR) and the portion unexplained by the model (SSE), according to Equation (xiv).
$$SST = SSR + SSE \tag{xiv}$$
To test whether R² was significantly different from zero, a statistic based on the F distribution was used. The F statistic is the ratio of the model (regression) mean square to the residual mean square according to Equation (xv).
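$$F_0 = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n - k - 1)} \tag{xv}$$
where $MSR$ is the model mean square, $MSE$ is the residual mean square and $k$ is the number of regression terms excluding the intercept.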
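The sums of squares, R² and F statistic described above can be computed together from the fitted quadratic model. This sketch assumes Equation (vi) with k = 2 regression terms (X and X²), for which the F statistic has 2 numerator and n - 3 denominator degrees of freedom (a single regressor would give 1 and n - 2), and again takes time as years since the first year of record. The name `quadratic_anova` is hypothetical.

```python
import numpy as np


def quadratic_anova(years, rainfall):
    """ANOVA quantities for the quadratic trend of Equation (vi)."""
    x = np.asarray(years, dtype=float)
    x = x - x[0]                              # time origin at the first year of record
    y = np.asarray(rainfall, dtype=float)
    n, k = y.size, 2                          # k = regression terms X and X^2
    y_hat = np.polyval(np.polyfit(x, y, deg=2), x)
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)            # numerator of Equation (vii)
    ssr = np.sum((y_hat - y_bar) ** 2)        # Equation (ix)
    residuals = y - y_hat                     # Equation (xi)
    sse = np.sum(residuals ** 2)              # Equation (xii)
    msr, mse = ssr / k, sse / (n - k - 1)     # model and residual mean squares
    return {
        "SST": sst,
        "MST": sst / (n - 1),                 # Equation (viii)
        "SSR": ssr,
        "SSE": sse,
        "R2": ssr / sst,                      # Equation (x)
        "F0": msr / mse,                      # Equation (xv)
        "df": (k, n - k - 1),
    }
```

For least-squares fits with an intercept, the returned values satisfy SST = SSR + SSE. The critical value F1 can be read from F tables or computed, for example, with `scipy.stats.f.ppf(0.95, dfn, dfd)` at an assumed 5% significance level; `scipy.stats.f.sf(F0, dfn, dfd)` gives the corresponding p-value.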
Using the F value from Equation (xv), the null hypothesis (H0: R² = 0), which states that there was no significant difference in rainfall variability over time, was either rejected or not rejected according to the following decision rule:
If $F_0 > F_1$, H0 (R² = 0) is rejected and H1 (R² ≠ 0) is accepted.
If $F_0 < F_1$, H0 (R² = 0) is not rejected and H1 (R² ≠ 0) is rejected.
Where $F_1$ was the tabulated F statistic obtained from statistical tables (1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator) and $F_0$ was