
Research Design and Methodology

6 Stratified sampling was involved in dividing the population into homogenous groups containing

4.4. Data Collection and Analysis

4.4.3. Data Analysis


households' perceptions about climate change, vulnerability to food insecurity, livelihood capital assets and livelihood strategies was documented and analyzed textually⁸ to substantiate the statistical results from the structured questionnaire. In general, the qualitative data were analyzed through narration, description and quotation.

The quantitative data analysis, on the other hand, was a process of tabulating, interpreting and summarizing empirical and numerical data in order to describe the samples and generalize from them to the population. Upon completion of the data collection, the data were coded, edited, digitized and entered into SPSS (Statistical Package for the Social Sciences). Descriptive statistics such as frequencies, percentages and tables were used to summarize the data, and inferential statistics such as the paired t-test, one-way ANOVA, independent t-test, chi-square test and bivariate correlation were used to investigate relationships between variables and differences between groups. In general, the quantitative data were analyzed using descriptive statistics and inferential statistics (bivariate correlation, linear regression and binary logistic regression modeling).
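The inferential statistics listed above were computed in SPSS. As a hedged, minimal sketch of what each test statistic measures, the Python below (standard library only) computes the raw statistics; it does not reproduce the p-values or the full output SPSS reports, and the function names and example numbers are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Student's t statistic for two independent samples, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def paired_t(before, after):
    """t statistic for paired observations, e.g. the same households measured twice."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def one_way_anova_f(*groups):
    """F ratio of the between-group mean square to the within-group mean square."""
    values = [v for g in groups for v in g]
    grand, k, n = mean(values), len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def pearson_r(x, y):
    """Bivariate (Pearson) correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, with two hypothetical groups [1, 2, 3] and [4, 5, 6], the pooled t statistic is about -3.67 and the one-way ANOVA F ratio is 13.5, which illustrates that with only two groups F equals t squared.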

4.4.3.1. Regression Modeling

Linear Regression Model

Linear regression modeling was used to identify the significant factors influencing total annual household income, the dependent variable. Crop production of the sample households in 2010/11 was expressed in monetary terms to capture its contribution to total annual income; for that purpose, estimated average prices of the crops produced in 2010/11 were collected during the field survey. Annual income from the sale of livestock during the same period was also considered, and annual income from non-agricultural activities was estimated as well. Hence, the major sources of household income in the study area were small-scale agriculture (crop and livestock production, and sale of trees and fruits), engagement in off-farm and non-farm activities, participation in public works programs and remittances. Annual household income was taken as a proxy for the livelihood outcome of households from their diverse set of livelihood strategies, as annual income broadly determines the food security status and wellbeing of households. Babu and Sanyal (2009) added that household annual income is one of the determinants of household food security outcomes. The annual household incomes reported here, summed over all sources of income, were estimated by the respondents themselves. The explanatory variables included selected socio-economic and biophysical factors assumed to influence annual household income in the study area (Chapter 6).

⁸ If the data are in the form of text, the raw data require some sort of organizing and processing before they can actually be analyzed. Field notes, for example, may fill hundreds of pages of notebooks or take up thousands of megabytes of space on a computer disk (Berg, 2001).
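A minimal sketch of the two steps described above, under stated assumptions: (1) valuing each household's crop output at average survey prices and adding the other income sources to obtain total annual income, and (2) fitting an ordinary least squares (OLS) regression of that income on explanatory variables by solving the normal equations. Crop names, prices, explanatory variables and all numbers are hypothetical, and the thesis's own estimates were produced in SPSS, not with this code.

```python
def total_income(crop_output, avg_prices, livestock_sales, other_income):
    """Crop output valued at average prices plus livestock sales and other income."""
    crop_value = sum(qty * avg_prices[crop] for crop, qty in crop_output.items())
    return crop_value + livestock_sales + other_income


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients, intercept first, from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)
```

For instance, a household with 200 units of a hypothetical crop A priced at 10 and 300 units of crop B priced at 5, plus 1,200 from livestock sales and 800 from other sources, has a total annual income of 5,500 in the same currency units. Solving the normal equations directly keeps the sketch dependency-free; a statistics package would also report standard errors and significance tests for each coefficient.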

Binary Logistic Regression Model

A binary logistic regression model was employed to identify the determinants of households' vulnerability to food insecurity. This kind of model is suitable when the dependent variable is a dummy (binary) variable, in this case household food security status, as shown in the succeeding sections. The factors that determine household food security were grouped into natural and socio-economic factors, and the variables selected for the model were predominantly socio-economic.
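As a hedged sketch of how the coefficients of such a model are estimated, the following Python code fits a binary logistic regression by maximum likelihood using Newton-Raphson iterations, the standard estimation approach for this model (SPSS's own routine is not reproduced here). The 0/1 coding of the dependent variable, the function names and the example data are assumptions for illustration. A fixed number of iterations stands in for a proper convergence check, and the fit diverges if a predictor perfectly separates the two outcome groups.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iterations=25):
    """Maximum-likelihood logistic regression coefficients (intercept first) by Newton-Raphson."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        # Predicted probability P(y = 1 | x) under the current coefficients.
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
        # Score vector X'(y - p) and information matrix X'WX with weights p(1 - p).
        grad = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * r[i] * r[j] for r, pi in zip(rows, p)) for j in range(k)]
                for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def odds_ratios(beta):
    """exp(B) for each coefficient, the quantity SPSS labels Exp(B)."""
    return [math.exp(b) for b in beta]
```

With one binary predictor the result can be checked by hand: if 1 of 4 households with x = 0 and 3 of 4 with x = 1 have y = 1, the intercept is ln(1/3) ≈ -1.10 and the slope is ln 9 ≈ 2.20, giving an odds ratio (Exp(B)) of 9.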

An assessment of the Goodness-of-fit of the model

Checking the goodness of fit is important for a binary logistic regression model (Quinn and Keough, 2001). The Pearson χ² statistic, based on the observed (o) and expected (e) frequencies,

$$\chi^2 = \sum \frac{(o - e)^2}{e},$$

is used to compare observed and expected counts for the two outcomes of the binary response in contingency tables (Quinn and Keough, 2001). The fit of the logistic model is thus judged by how similar the observed values are to the expected (predicted) values. The null hypothesis that the model fits the data, against the alternative that it does not, was also tested using the Hosmer-Lemeshow test. The Hosmer-Lemeshow goodness-of-fit test compares predicted and observed frequencies; the more closely they match, the better the fit (Alemu, 2007; Tang, 2001). According to Babu and Sanyal (2009), the binary logistic regression model fits best when the significance value (p-value) of the Hosmer-Lemeshow goodness-of-fit test approaches one.
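To make the two checks concrete, here is a hedged Python sketch (standard library only) of the Pearson χ² statistic and of the Hosmer-Lemeshow statistic computed from a fitted model's predicted probabilities. The grouping rule (sorting cases by predicted probability into equal-sized groups, ten by default, giving 8 degrees of freedom) follows the usual Hosmer-Lemeshow construction, but statistical packages, SPSS included, differ in how they form groups and break ties, so results may not match package output exactly. The closed-form p-value used here is valid only for an even number of degrees of freedom, and the function names are hypothetical.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form holds only for even df >= 2."""
    if df < 2 or df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom >= 2")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2).

    Cases are sorted by predicted probability and cut into `groups` groups of
    (nearly) equal size; within each group the observed and expected counts of
    y = 1 and y = 0 are compared with the Pearson statistic.
    """
    if len(y) < groups:
        raise ValueError("need at least one case per group")
    pairs = sorted(zip(p, y))
    n = len(pairs)
    observed, expected = [], []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        o1 = sum(yi for _, yi in chunk)  # observed count of y = 1 in the group
        e1 = sum(pi for pi, _ in chunk)  # expected count of y = 1 in the group
        observed += [o1, n_g - o1]
        expected += [e1, n_g - e1]
    stat = pearson_chi_square(observed, expected)
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A large p-value (a non-significant statistic) means the null hypothesis of adequate fit is not rejected, which is the sense in which a value approaching one indicates a good fit.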

Multicollinearity checking

Once the model has been fitted to the binary response variable, a thorough examination of the extent to which the fitted model provides an appropriate description of the observed data is vital in the modeling process (Alemu, 2007). According to the same author, the fitted logistic regression model may be inadequate because particular observations, termed outliers or influential values, might affect the conclusions drawn from the results. Techniques employed to examine model adequacy include checks for multicollinearity using the tolerance and the variance inflation factor (VIF). Multicollinearity occurs when an independent variable is approximately determined by a linear combination of the other independent variables in the model (Quinn and Keough, 2001). When the collinearity is perfectly linear, it is impossible to obtain unique estimates of the regression coefficients for all the independent variables. Gupta (1999) suggested that a bivariate correlation coefficient greater than 0.8 (in absolute terms) between two independent variables indicates a significant multicollinearity effect. The correlation coefficient shows the strength of the interrelationship between independent variables; the extent to which collinearity inflates the standard errors, however, can be checked using the tolerance ($1 - R_j^2$, where $R_j^2$ is the coefficient of determination from regressing the j-th independent variable on all the other independent variables) and the VIF (1/tolerance). As a rule of thumb, a VIF greater than 10 indicates high multicollinearity, and a tolerance close to zero likewise indicates high multicollinearity among the independent variables (Alemu, 2007).
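The checks in this paragraph can be sketched in Python as follows, as a minimal illustration rather than the procedure SPSS uses. The sketch relies on the standard identity that, for the correlation matrix R of the independent variables, the VIF of variable j equals the j-th diagonal element of R⁻¹, which is the same quantity as 1/(1 − R_j²); the tolerance is its reciprocal. The 0.8 correlation cut-off is the rule of thumb cited above, while the function names, variable names and data are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Bivariate (Pearson) correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def collinearity_report(columns, r_cutoff=0.8):
    """High pairwise correlations plus tolerance and VIF for each independent variable.

    `columns` maps a variable name to its list of values (same length for all).
    """
    names = list(columns)
    corr = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    high_pairs = [(names[i], names[j], corr[i][j])
                  for i in range(len(names)) for j in range(i + 1, len(names))
                  if abs(corr[i][j]) > r_cutoff]
    inv = invert(corr)
    vif = {name: inv[i][i] for i, name in enumerate(names)}  # VIF_j = [R^-1]_jj
    tolerance = {name: 1.0 / v for name, v in vif.items()}  # tolerance = 1 - R_j^2
    return high_pairs, tolerance, vif
```

With two variables whose correlation is 0.8, each VIF is 1/(1 − 0.64) ≈ 2.78 and each tolerance is 0.36. Under perfect collinearity R is singular, so the inversion fails or returns extremely large VIFs, mirroring the point that unique coefficients cannot then be estimated.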

Source: Determinants of rural household food security in drought-prone areas of Ethiopia: case study in Lay Gaint District, Amhara Region (pages 88-90)