Data management, preparation and cleaning

3.4.1 Quantitative data handling and dealing with missing values

The first step in any research that involves data analysis is data handling (Richards, 2014). In this study, quantitative data were collected using a questionnaire, which was distributed mainly to retirees and retrieved over a period of four months (January 2016 to April 2016). Midway into the fieldwork, the variables (questionnaire items) were defined in the SPSS version 24 Data Editor in preparation for data entry. SPSS was chosen mainly because the researcher is more conversant with it, and also because it is one of the most widely used statistical applications and is versatile in its applicability: it offers a wide range of tools for both simple and complex analyses and allows existing variables to be modified and new ones created (Arkkelin, 2014). Each completed questionnaire was given a unique identity number, from 001 to 330, in preparation for processing in SPSS. Since the questionnaire consisted largely of close-ended questions, most responses were pre-coded. The raw data were keyed into the pre-designed SPSS template, and responses to open-ended questions were entered as string variables.
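To illustrate the identity numbering and pre-coding described above, the following Python sketch assigns three-digit case identifiers and converts closed-ended answers into numeric codes, leaving open-ended answers as text. It is not the study's SPSS procedure: the codebook, item names and answer categories are hypothetical.

```python
# Hypothetical codebook: the study's actual SPSS value labels are not reproduced here.
CODEBOOK = {
    "sex": {"Male": 1, "Female": 2},
    "marital_status": {"Married": 1, "Single": 2, "Widowed": 3, "Divorced": 4},
}


def case_id(n: int) -> str:
    """Return a three-digit, zero-padded identity number (1 -> '001')."""
    if not 1 <= n <= 999:
        raise ValueError("identity numbers are three digits")
    return f"{n:03d}"


def precode(response: dict) -> dict:
    """Translate closed-ended answers to numeric codes; keep open-ended text as strings."""
    coded = {}
    for item, answer in response.items():
        if item in CODEBOOK:
            coded[item] = CODEBOOK[item][answer]  # a KeyError flags an unexpected answer
        else:
            coded[item] = answer                  # open-ended item stored as a string
    return coded


ids = [case_id(i) for i in range(1, 331)]
print(ids[0], ids[-1])  # 001 330
print(precode({"sex": "Female", "marital_status": "Widowed", "comments": "..."}))
```

Letting an unexpected answer raise an error, rather than silently storing it, mirrors the purpose of a pre-designed template: entries outside the defined codes are caught at keying time.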

After data entry, SPSS was used to screen the data for errors arising from incorrect responses to questions and from data-entry mistakes. Where appropriate, these were identified and corrected. Basic descriptive statistics were then computed for all variables to observe trends in the dataset: frequencies only for categorical variables, and additionally the mean, standard deviation, minimum and maximum for numerical variables. These gave the researcher a snapshot of the dataset and helped determine the next level of analysis. For instance, this process revealed that some data were missing, which led to a missing data analysis to determine the extent of missingness and how to address it. As noted by Hill (1997), missing values or incomplete data are common in surveys, and SPSS provides tools for analysing missingness.
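The screening and descriptive steps above can be sketched in Python as follows, assuming records stored as dictionaries with None marking a missing entry. The valid ranges, variable names and example records are hypothetical; the study performed these steps in SPSS.

```python
import statistics
from collections import Counter

# Hypothetical valid ranges used to flag keying errors (not the study's actual rules).
VALID_RANGES = {"age": (50, 110), "years_worked": (0, 60)}


def out_of_range(records, ranges=VALID_RANGES):
    """Return (case_id, variable, value) for every numeric entry outside its valid range."""
    problems = []
    for rec in records:
        for var, (low, high) in ranges.items():
            value = rec.get(var)
            if value is not None and not low <= value <= high:
                problems.append((rec["id"], var, value))
    return problems


def describe_numeric(values):
    """Mean, standard deviation, minimum and maximum, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample SD (n - 1 divisor); needs at least 2 values
        "min": min(present),
        "max": max(present),
    }


def frequencies(values):
    """Frequency table for a categorical variable; None is tallied as 'missing'."""
    return Counter("missing" if v is None else v for v in values)


records = [
    {"id": "001", "age": 64, "years_worked": 30, "sex": 1},
    {"id": "002", "age": 640, "years_worked": 35, "sex": 2},  # likely a keying error
    {"id": "003", "age": 71, "years_worked": None, "sex": 2},
]
print(out_of_range(records))  # [('002', 'age', 640)]
print(describe_numeric([r["years_worked"] for r in records]))
print(frequencies([r["sex"] for r in records]))
```

Note that the descriptive summary also reveals missingness as a by-product: `n` is smaller than the number of records whenever a value is absent, which is how a first pass over the data can prompt a fuller missing data analysis.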

Analysis of missing values

Theoretically, a researcher has a number of options for addressing missing values in a dataset (Little and Rubin, 1989), but the first step is to establish the extent of the missing data. To this end, an analysis of missing values was conducted on the dataset using SPSS. The procedure described the pattern of missing values: where they were located (which variables had missing values), the extent of missingness (how many cases had missing values) and whether the values were missing randomly (Hill, 1997). The analysis showed that four (4) of the 104 variables (3.846%) had missing values. These were the ‘number of years worked before retirement’, ‘number of years of contribution to SSNIT fund’, ‘amount of monthly pension received from SSNIT’ and ‘amount of other incomes received per month’. As shown in Figure 3.2, 42 of the 330 cases (12.73%) had missing values, each with 1, 2, 3 or 4 missing values. Overall, there were 49 missing values out of a total of 34,320 (0.143%). Although this proportion appears negligible, the missing values were unevenly distributed: two (2) of the four (4) variables (number of years worked before retirement and number of years of contribution to SSNIT fund) had only two missing values each, whereas the other two (amount of monthly pension received from SSNIT and amount of other incomes received per month) had 23 and 22 missing values respectively. This means that some of the respondents did not report their income.

Figure 3.2: Summary of missing values
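The percentages reported above can be reproduced from the raw counts. The sketch assumes, as the totals imply, that the dataset has 34,320 / 330 = 104 variables.

```python
N_CASES = 330
N_VARIABLES = 34_320 // N_CASES  # 104 variables, implied by the totals reported above

missing_per_variable = {
    "years worked before retirement": 2,
    "years of contribution to SSNIT": 2,
    "monthly SSNIT pension": 23,
    "other monthly income": 22,
}
cases_with_missing = 42

total_missing = sum(missing_per_variable.values())
pct_variables = 100 * len(missing_per_variable) / N_VARIABLES
pct_cases = 100 * cases_with_missing / N_CASES
pct_values = 100 * total_missing / (N_CASES * N_VARIABLES)

print(total_missing)            # 49
print(round(pct_variables, 3))  # 3.846
print(round(pct_cases, 2))      # 12.73
print(round(pct_values, 3))     # 0.143
for name, k in missing_per_variable.items():
    # Per-variable share of the 330 cases: 0.6, 0.6, 7.0 and 6.7 percent
    print(name, round(100 * k / N_CASES, 1))
```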

Missing values in a dataset pose a challenge because they can shrink the effective sample size, complicate the handling and analysis of the data, and bias the results (Kaiser, 2014, p. 42; Soley-Bori, 2013b, p. 1). Given that income is a key independent variable in the study and that the income variables had the highest proportions of missing values, further tests were deemed necessary to determine the mechanism of missingness, which would guide the decision on how to treat the missing values.

There are three mechanisms of missingness: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Soley-Bori (2013a, p. 5) notes that MCAR (also called uniform non-response) is the most favourable mechanism of missingness, and hence the study opted to test for MCAR. MCAR means that the probability of having missing values in one independent variable is not related to another independent variable in the dataset, although it may be related to a dependent variable (Briggs, Clark, Wolstenholme, and Clarke, 2003; Graham, 2009; Kaiser, 2014). According to Graham (2009), MCAR results in unbiased estimation of population parameters, with loss of statistical power as its main drawback. Thus, if the missing values in a dataset are missing completely at random, they can reasonably be assumed not to bias the outcome of the study, although the reduced effective sample still costs some statistical power.
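The difference between the three mechanisms can be illustrated with a small simulation on a hypothetical population (not the study's data), comparing the mean income among respondents who report it with the true population mean. It illustrates the mechanisms only; it is not Little's MCAR test, which SPSS's Missing Value Analysis procedure reports. The age range, income distribution and missingness probabilities are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(2016)
N = 20_000

# Hypothetical population (not the study's data): income rises modestly with age.
people = []
for _ in range(N):
    age = random.uniform(60, 90)
    income = random.lognormvariate(6 + 0.01 * (age - 60), 0.5)
    people.append((age, income))

true_mean = statistics.mean(inc for _, inc in people)

# Probability that a respondent leaves the income item blank, under each mechanism.
mechanisms = {
    # MCAR: the same 7% chance for everyone, unrelated to age or income
    "MCAR": lambda age, income: 0.07,
    # MAR: depends only on an observed variable (older respondents skip more often)
    "MAR": lambda age, income: 0.02 + 0.3 * (age - 60) / 30,
    # MNAR: depends on the unreported value itself (above-average earners withhold)
    "MNAR": lambda age, income: 0.40 if income > true_mean else 0.02,
}


def complete_case_mean(p_missing):
    """Mean income among respondents who report it, given a missingness rule."""
    reported = [inc for age, inc in people if random.random() >= p_missing(age, inc)]
    return statistics.mean(reported)


bias = {name: complete_case_mean(rule) - true_mean for name, rule in mechanisms.items()}
for name, b in bias.items():
    print(f"{name}: bias of the complete-case mean = {b:+.1f}")
```

In this setup the complete-case mean stays close to the true mean under MCAR, whereas it is pulled downwards under MAR and, more strongly, under MNAR. The MAR bias can in principle be corrected by methods that use the observed variable (age here), such as EM; the MNAR bias cannot be corrected from the observed data alone without further assumptions.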

Addressing the challenge of missing values

There are two conventional methods for addressing missing values in a dataset (Soley-Bori, 2013b). The first is listwise deletion, which excludes from the analysis every case with a missing value on any of the variables in the analysis. According to Soley-Bori (2013a, p. 6), the main strength of this approach is that it requires no computation and can be used with any statistical analysis. Its main disadvantage is that it may exclude a significant proportion of the sample. In this study, for instance, 42 cases (12.73%) had missing values, so excluding them from the analysis would reduce the sample significantly. The second method is to substitute each missing value with an imputed value, using what are described as imputation methods.
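Listwise deletion can be sketched as follows, with hypothetical records and None marking a missing value. The last lines apply the study's counts, assuming all four affected variables enter the same analysis.

```python
def listwise_delete(records, variables):
    """Keep only cases with no missing (None) value on any of the analysis variables."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical records for illustration only.
records = [
    {"id": "001", "pension": 450.0, "other_income": 120.0},
    {"id": "002", "pension": None, "other_income": 80.0},
    {"id": "003", "pension": 610.0, "other_income": None},
    {"id": "004", "pension": 390.0, "other_income": 0.0},
]
complete = listwise_delete(records, ["pension", "other_income"])
print([r["id"] for r in complete])  # ['001', '004']

# Applied to the study's totals: 42 of 330 cases have at least one missing value.
remaining = 330 - 42
print(remaining, f"{100 * remaining / 330:.2f}%")  # 288 87.27%
```

Testing `is not None` rather than truthiness matters here: a respondent who genuinely reports zero other income (case 004) is a complete case and must not be dropped.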

Two methods were employed to address the missing values in the dataset. The first, replacement of missing values with the series mean, is usually used where the proportion of missing values is low enough not to significantly affect the outcome of the study. Of the 4 variables with missing values, 2 (‘total number of years worked’ and ‘total number of years of contribution to the SSNIT pension scheme’) had 0.6% missing values each (2 cases), and these were replaced with the series mean. The other 2 variables (‘amount of pension’ and ‘amount of other income’) had larger proportions of missing values (7.0% and 6.7% respectively), so it was deemed inappropriate to replace those values with the series mean. Consequently, the expectation-maximization (EM) estimation technique in SPSS was employed to estimate the missing values. According to Graham (2009), this is a ‘modern’ method for addressing missing data in social science research. Through this method, the researcher arrived at a complete dataset with which to proceed with data analysis.
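The two imputation approaches can be sketched in Python. Series-mean replacement is straightforward. For EM, the sketch covers only the simplest case: a hypothetical pair of variables in which years of contribution (x) are complete and pension (y) has gaps, assumed to follow a bivariate normal distribution. SPSS's EM routine works with all the variables supplied to it, and its internal details are not reproduced here; the function names and example values are hypothetical.

```python
import statistics


def mean_impute(values):
    """Replace each None with the mean of the observed values (series mean)."""
    m = statistics.mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def em_bivariate(x, y, iterations=200, tol=1e-10):
    """EM estimates for a bivariate normal when x is complete and y has None entries.

    Returns (mu_y, var_y, cov_xy, y_filled); missing y values are replaced by their
    conditional expectations E[y | x] under the final parameter estimates.
    """
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n  # x is complete, so these stay fixed
    observed = [yi for yi in y if yi is not None]
    mu_y = sum(observed) / len(observed)           # starting values from observed y
    s_yy = sum((yi - mu_y) ** 2 for yi in observed) / len(observed)
    s_xy = 0.0
    for _ in range(iterations):
        # E-step: expected y and y^2 for each missing value, given x and current parameters
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)
        # M-step: maximum-likelihood parameters from the completed sufficient statistics
        new_mu_y = sum(ey) / n
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        new_s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * new_mu_y
        change = abs(new_mu_y - mu_y) + abs(new_s_yy - s_yy) + abs(new_s_xy - s_xy)
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break
    beta = s_xy / s_xx
    y_filled = [mu_y + beta * (xi - mu_x) if yi is None else yi for xi, yi in zip(x, y)]
    return mu_y, s_yy, s_xy, y_filled


years = [30, 28, None, 35]
print(mean_impute(years))  # [30, 28, 31, 35]

contribution_years = [10, 15, 20, 25, 30, 35]
pension = [200.0, 270.0, None, 370.0, None, 510.0]
mu_y, var_y, cov_xy, filled = em_bivariate(contribution_years, pension)
print(round(mu_y, 2), [round(v, 1) for v in filled])
```

Replacing missing values with conditional expectations, as in the last step of `em_bivariate`, preserves the relationship between x and y, unlike the series mean, but it understates the spread of the imputed variable because the imputed points sit exactly on the regression line.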

3.4.2 Qualitative data handling and analysis

As observed by Richards (2014, p. 16), qualitative data can be very complex and difficult for researchers to handle. The qualitative data were derived from in-depth face-to-face interviews with 12 retirees. Each audio-recorded interview, which lasted between 45 and 60 minutes, was transferred to a laptop as an audio file and labelled for easy reference, as suggested by McLellan, MacQueen, and Neidig (2003, p. 74). The researcher replayed and listened attentively to each interview to make sure that the voices were clear enough for transcription. Since all the interviews were conducted in English, translation was not required. Given that any meaningful analysis of qualitative data depends on the quality of the data (Bazeley, 2009, p. 7), it was important to transcribe the recordings for reading and interpretation. McLellan et al. (2003, p. 66) advise researchers to choose between verbatim transcription alone and verbatim transcription that also captures non-verbal expressions and context. The researcher chose verbatim transcription of the 12 interviews, using transcription software on a laptop aided by a transcription foot pedal, which regulated the replay speed so that the recorded voices could be captured fully as text. Following the transcription protocol of McLellan et al. (2003), copies of the transcripts were sent to the researcher’s academic advisers for validation.

Each transcribed interview was proofread, after which the researcher described each interviewee’s story in relation to background, work experience, retirement income, and experience of health, housing, food, social and financial wellbeing. These descriptions were used as case studies, where appropriate, in Chapter Seven. A three-stage technique was used to analyse the qualitative data: data reduction, data display, and conclusion drawing (Appleton, 1995).

Data reduction involved reading the transcribed data to identify key issues emerging under each of the five dimensions of wellbeing. As suggested by O'Leary (2014), a different colour highlighter was used for each dimension of wellbeing. The highlighted statements were then coded into themes and sub-themes to identify emerging patterns. Under each dimension, the data were reduced to a few issues, which were reflected on and related to findings from the quantitative data analysis. The interconnectedness of issues under each dimension of wellbeing was also noted. An inductive approach to qualitative data analysis was adopted in identifying key drivers of wellbeing in the five dimensions, as in the study by Aryeetey, Doh, and Andoh (2013).
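The nesting of coded statements into dimensions, themes and sub-themes can be represented as a simple data structure. The study coded the transcripts with colour highlighters rather than software, so the sketch below is only illustrative; the interviewee labels, themes and sub-themes are placeholders, not the study's codes or findings.

```python
from collections import defaultdict

# Hypothetical coded segments: (interviewee, dimension, theme, sub-theme).
# Labels are placeholders, not the study's actual codes or findings.
segments = [
    ("R01", "financial", "theme_a", "sub_a1"),
    ("R01", "health", "theme_b", "sub_b1"),
    ("R02", "financial", "theme_a", "sub_a2"),
    ("R03", "financial", "theme_a", "sub_a1"),
    ("R03", "housing", "theme_c", "sub_c1"),
]


def build_codebook(segments):
    """Nest coded segments as dimension -> theme -> sub-theme -> set of interviewees."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for who, dimension, theme, sub in segments:
        tree[dimension][theme][sub].add(who)
    return tree


def theme_spread(tree, dimension):
    """Number of distinct interviewees raising each theme within one dimension."""
    return {
        theme: len(set().union(*subs.values()))
        for theme, subs in tree[dimension].items()
    }


tree = build_codebook(segments)
print(theme_spread(tree, "financial"))  # {'theme_a': 3}
```

Counting distinct interviewees per theme, rather than raw segments, keeps one talkative interviewee from making a theme look more widespread than it is during data reduction.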

Data display involved the presentation of qualitative data in reporting the findings of the research with the use of quotations, narratives, and discussions. It also entailed a reflexive engagement with relevant literature, findings from the quantitative data analysis, and the study’s research questions. The process followed the advice given by Bazeley (2009) that researchers ought to use this stage of the qualitative data analysis process to build arguments that drive home a point or points.

The final stage of the qualitative data analysis involved drawing conclusions from the plethora of issues, themes, and patterns that emerged. For each dimension of wellbeing, the mechanisms or trajectories by which pension income directly or indirectly affected interviewees’ experience of wellbeing were derived from the data, and the interconnectedness of the dimensions was noted.

In document From pension to wellbeing: A study of retirement income and wellbeing of retirees in Ghana (Page 76-81)