3.4 Methodology
3.4.6 Data processing, data reduction and data cleaning
Authors such as Richards (2009) and Robson (2002) write about setting up soft and hard copy filing systems before collecting data, as it is easy to become overwhelmed by the data once fieldwork is under way. Despite having an idea of the type of data records sought, the researcher was keen to maintain flexibility in the structure until these records were sourced or created. As a consequence, various approaches were tried before settling on the final method of organising data. Pre-emptive decisions over hard and soft copy filing resulted in the creation of many sub-folders. As familiarity with the data grew, the number of sub-folders reduced hand-in-hand with the data reduction process.
Data reduction
As reported in Chapter two, a process of data reduction was applied to the literature review data. This section focuses on the data records sourced as part of fieldwork and desk reviews.
Following the guidance of Miles and Huberman (1994), the researcher considered data reduction (referred to as ‘data condensation’ by Tesch, 1990) a priority in making the data manageable for analysis. It was also recognised that the process of data reduction would, by default, involve some preliminary analysis.
The process itself requires the preparation of a document sheet, or session summary sheet, per data record, depending on the data record type. For this research, the principles were applied in soft copy format only, for ease of managing the data and to make use of soft copy storage. A single Excel workbook was compiled as a ‘master workbook’, with each sheet containing data relevant to a single stakeholder organisation. However, relatively early on, this initial tool became unmanageable. During months 9-15, various versions of ‘master workbooks’ were tried and tested to capture the necessary and relevant information. As the number of data records and familiarity with the data grew, many of these attempts either failed to provide the flexibility required to store, access and further analyse the data, or became too unruly in quantity and quality to manage the data once processed. After what seemed a disproportionate amount of time, the researcher reverted to a tool similar to the one being used for the literature review: still an Excel workbook, but one which also enabled filtering and robust pivot table analysis. Following discussions with PhD colleagues, and in view of what the researcher anticipated doing with the data, this was considered an appropriate basis to revert to. The framework was slightly amended and proved suitable through to completion.
It was anticipated that not all literature and other data records sourced would be applicable to answering one or more of the research questions and objectives. Both the ‘Literature Articles’ master book and the ‘Data Reduction’ master book therefore used a filter function, which allowed sub-sets of data to be extracted and combined into a third master sheet for specific analysis. This third sheet is called the ‘Combined Master Work Book’ (see Appendix C, C1, for an extract).
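The filter-and-combine step described above was carried out in Excel. Purely as an illustration of the logic, it can be sketched in Python as follows. The record fields (`id`, `source`, `relevant_rq`) and the sample rows are hypothetical and do not reproduce the columns or contents of the actual master workbooks.

```python
# Hypothetical rows standing in for the two master workbooks.
# Field names and values are illustrative assumptions only.
literature_articles = [
    {"id": "LIT-01", "source": "literature", "relevant_rq": "RQ1"},
    {"id": "LIT-02", "source": "literature", "relevant_rq": ""},
]
data_reduction = [
    {"id": "DOC-01", "source": "fieldwork", "relevant_rq": "RQ1"},
    {"id": "DOC-02", "source": "desk review", "relevant_rq": "RQ2"},
]


def filter_records(records, research_question):
    """Keep only the records tagged as relevant to one research question."""
    return [r for r in records if r["relevant_rq"] == research_question]


def combine(*subsets):
    """Concatenate the filtered sub-sets into a single combined list."""
    combined = []
    for subset in subsets:
        combined.extend(subset)
    return combined


# Analogue of extracting sub-sets from both workbooks into a third sheet.
combined_master = combine(
    filter_records(literature_articles, "RQ1"),
    filter_records(data_reduction, "RQ1"),
)
```

Records without a research-question tag (such as `LIT-02` here) simply drop out of the combined sheet while remaining in their original workbook.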
An Access database may have been equally appropriate; however, the researcher was less experienced with the Access© software than with Excel©, and the resources were not available for the researcher to ‘get up to speed’, so this option was considered counter-productive. Furthermore, there was a risk that, if components of the data reduction and analysis findings were to be sent to the stakeholders (a consideration at the time [18]), the stakeholders might also have limited experience in using such software.
Specifically, each document and data archive was searched, using a search and find function, for information on each of the components of M&E (see Appendix C, C2, for search terms), and the corresponding text or numbers were highlighted at the same time in preparation for analysis. A scale of yes, no and TBD [19] (amongst other categories) was entered in the data reduction worksheet. For hard copy data records a similar approach was used: the researcher speed-read each record and marked the ‘key’ narrative or numbers using pencil, post-its or, in some cases, highlighters.
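As a minimal sketch of this search-and-code step, assuming plain-text versions of the documents, the logic could be expressed in Python as below. The component names and search terms are hypothetical placeholders (the actual terms are listed in Appendix C, C2), and using ‘TBD’ for a record that has not yet been read is an assumption made for the example.

```python
import re

# Hypothetical search terms per M&E component (placeholders only;
# the terms actually used are given in Appendix C, C2).
SEARCH_TERMS = {
    "indicators": ["indicator", "baseline"],
    "reporting": ["report"],
}


def code_document(text, search_terms=SEARCH_TERMS):
    """Return a yes/no/TBD code for each component of one data record.

    text is the document content, or None if the record has not yet
    been read (coded 'TBD' here, which is an assumption).
    """
    if text is None:
        return {component: "TBD" for component in search_terms}
    lowered = text.lower()
    codes = {}
    for component, terms in search_terms.items():
        # Match each term at the start of a word so that plurals also count.
        found = any(re.search(r"\b" + re.escape(term), lowered) for term in terms)
        codes[component] = "yes" if found else "no"
    return codes


example = code_document("Three indicators were tracked against a baseline.")
```

Each returned dictionary corresponds to one row of codes in the data reduction worksheet; the highlighting of the matched text is not reproduced here.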
For the series of SSI1 interviews, the recordings were transcribed verbatim where possible [20], using an additional research resource [21]. The researcher then listened to the recordings and ‘cleaned’ the transcripts, rectifying typos and word gaps. Thereafter, the cleaned transcripts were sent back to the relevant interviewee for comment, amendment and annotation as appropriate.
18 The researcher had considered returning an extract of the indicator (‘What’ component) worksheets to the stakeholders to review and code as priorities.
19 TBD: To Be Determined.
20 A few parts of the recordings were obscured by unavoidable background noise, e.g. birds, traffic, music and other conversations.
21
For the series of SSI2 interviews, again using an additional research resource, the recordings were transcribed in full, omitting some of the ‘Umms’ and ‘Mmms’, as the researcher considered these expressive sounds superfluous to the research analysis. As with SSI1, the transcripts were ‘cleaned’ and returned to the interviewee for comment. In both cases, where feasible, reflective notes were also added. Certain information relating to the interviews was documented in the ‘data reduction’ worksheet, catalogued as a session summary.
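The omission of filler sounds was done by hand during transcription. An automated analogue, shown only as an illustrative sketch, might look like the following. The regular expression is an assumption covering ‘umm’ and ‘mmm’ style sounds; unlike the manual process, which omitted only some of them, it removes every match.

```python
import re

# Assumed pattern: 'um', 'umm', 'mm', 'mmm' and similar as whole words,
# together with a directly following comma or full stop and spaces.
FILLER = re.compile(r"\b(?:u+m+|m{2,})\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(line):
    """Remove filler sounds from one transcript line."""
    return FILLER.sub("", line).strip()


cleaned = strip_fillers("Umm, we report, mmm, every quarter.")
```

Words that merely begin with the same letters, such as ‘umbrella’, are left untouched because the pattern requires a word boundary after the filler.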
Normalising data
In theory, the temporal questions require data to be normalised wherever financial data is reported and there is an intention to compare it from one year to the next. In this research, normalisation was limited to aligning data by financial year and, where applicable, converting local currency into US$ at market exchange rates.
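The currency conversion can be illustrated with a short Python sketch. The currency code, financial years and exchange rates below are invented placeholders, not figures used in the research, and the rates are assumed to be quoted as local currency units per US$ 1.

```python
# Hypothetical market exchange rates: local currency units per 1 US$,
# keyed by (currency, financial year). Placeholder values only.
RATES_PER_USD = {
    ("XYZ", "2009/10"): 4.0,
    ("XYZ", "2010/11"): 5.0,
}


def to_usd(amount_local, currency, financial_year, rates=RATES_PER_USD):
    """Convert a local-currency amount to US$ using that year's rate."""
    rate = rates[(currency, financial_year)]
    return amount_local / rate


# The same local amount in two financial years, expressed in US$.
usd_2009 = to_usd(1000, "XYZ", "2009/10")
usd_2010 = to_usd(1000, "XYZ", "2010/11")
```

Keying the rate on the financial year keeps year-on-year comparisons tied to the rate for the year in which the amount was reported. A missing rate raises a `KeyError` rather than silently falling back to a default.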
Data cleaning
As with the interviews, the data included in the master workbooks needed cleaning in order to maximise the rigour of the research. This was achieved through (but not limited to) cross-checking printouts of each of the analysis worksheets against the filtered sheets from the original workbooks. Further ‘cleaning’ of data also took place during the data analysis phase.
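The cross-check was performed manually on printouts. A programmatic analogue, given here only as a sketch, compares the rows of an analysis worksheet with the filtered rows of the original workbook. The `id` key and the sample rows are hypothetical.

```python
def cross_check(original_rows, analysis_rows, key="id"):
    """Compare two sets of rows and list the discrepancies by record key."""
    original = {row[key]: row for row in original_rows}
    analysis = {row[key]: row for row in analysis_rows}
    shared = set(original) & set(analysis)
    return {
        # In the original workbook but absent from the analysis sheet.
        "missing": sorted(set(original) - set(analysis)),
        # In the analysis sheet but not in the original workbook.
        "unexpected": sorted(set(analysis) - set(original)),
        # Present in both, but with differing content.
        "changed": sorted(k for k in shared if original[k] != analysis[k]),
    }


# Hypothetical sample rows for illustration.
original_sheet = [
    {"id": "DOC-01", "code": "yes"},
    {"id": "DOC-02", "code": "no"},
    {"id": "DOC-03", "code": "TBD"},
]
analysis_sheet = [
    {"id": "DOC-01", "code": "yes"},
    {"id": "DOC-02", "code": "yes"},
]

report = cross_check(original_sheet, analysis_sheet)
```

An empty list under all three headings would indicate that the analysis worksheet agrees with its source.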