Data Flow: Organising action on Research Methods and Data Management

Research Methods Support for Collaborative Crop Research Program (CCRP) Projects Funded by the McKnight Foundation

Why has this guide been written?

SSC has been asked to provide support and training to the projects funded under the McKnight Foundation CCRP. This guide on "Data Flow" was written because, while much of this support is ‘on demand’, we need a way of organising and structuring our offer so that:

• Projects and scientists have a good idea of what it is reasonable to demand

• We can communicate efficiently through the use of a common means of visualising and describing what projects and SSC are doing

• We can link the work to the integrated monitoring and planning of the CCRP

Data Flow

The idea of ‘data flow’ was developed by participants of the E/H Af CoP meeting in May 2009. Since then it has been further developed and discussed by SSC staff and by the S Af CoP at their meeting in September 2009.

All the SSC work in support of Research Methods is about contributing to the generation, management, quality assurance and use of research data within research and development projects. We are therefore structuring our inputs around the way data 'flows' through a project. A first thought is that typically when a researcher collects a data set in the field or lab, he or she enters it into a computer and organises it, then subjects it to statistical analysis. A little further reflection shows that there are important steps before and after those, as shown in Figure 1:

Figure 1: Data Flow

In any project there are loops and feedback. This is acknowledged but for the purpose of this guide those loops are implicit (not indicated in the figure). An important feature of the data flow depicted is that there are multiple stages in the flow of information. At each of them, quality assurance needs to be considered to ensure the overall quality of the research, leading to effective knowledge generation.

For each stage in data flow (as illustrated in Figure 1), there are a number of aspects that project teams and scientists need to be aware of and, usually, take some decisive action on. Some of the most common aspects are listed in Table 1. This list is not exhaustive, nor are the items prioritised. They will have differing relevance for different projects and we expect further items of concern to emerge when specific projects assess their data flow process.

Many of the items in Table 1 are relevant and important for any type of research for development, including social research and participatory research. The concepts behind many are also relevant to qualitative research and to action research that will generate and use data.

Table 1: Data flow and quality assurance

Data flow step | Examples of areas for action | Quality assurance

Data ownership

Examples of areas for action:
– Intellectual property
– Data exchange and sharing agreements
– Authorship

Planning for data collection

Examples of areas for action:
– Understand the problem and set clear objectives
– Determine appropriate research approaches
– Plan and describe the outputs, including outline tables and graphs, that the data will be used to generate
– Design the research activities (experiments, surveys, observation, or other information collection techniques)
– Decide what to measure
– Decide what supporting data to collect, e.g. climatic data
– Design data collection tools and instruments
– Plan details – field layout, sampling, calendar…
– Prepare human & other resources: training and logistics
– Document the plan in a research protocol
– Liaise with partners to ensure relevance and shared objectives
Quality assurance:
– Are we using current best practice in our methods?
– Is there clear allocation of responsibilities to team members?
– Does the research protocol exist?
– Is there a system of version control for all documents (protocols, questionnaires etc)?
– Is there a well defined calendar of activities?
– Has the design of the research activity been optimised?
– Is the research plan understood by all involved?

Data collection

Examples of areas for action:
– Field and lab work
– Monitoring quality
Quality assurance:
– Implementation of an error trapping system at the time of data collection
– Assessment of field reports to take actions to improve quality
– Identification of omissions and gaps in the data collected with respect to the original plan
– Has the plan of activities been updated to reflect the implementation of the research process?

Data entry

Examples of areas for action:
– Find efficient, accurate and well-adapted systems
– Prepare data entry system (e.g. spreadsheets, databases)
– Enter or import data
– Clean data and validate data
– Document how it was done
Quality assurance:
– Design of automatic checks for data entry
– Ensure well trained staff and collaborators
– Ensure backups are made of all the information in electronic form
– Keeping records of data processing decisions

Statistical analysis

Examples of areas for action:
– Plan and describe the statistical analyses that will lead from data to outputs
– Decision on which statistical or other information processing software to use
– Data formatting for analysis
– Indicators and transformations
– Data exploration and summary
– Statistical analysis – trade-off perfection and practice
Quality assurance:
– Keep a data processing log
– Keep well documented syntax of data processing tasks
– Well organised and documented datasets

Interpret and write up

Examples of areas for action:
– Interpret data
– Presentation of results – tables & graphs
– Merging new information with what was previously known and working out the implications
– Reporting summary outputs, conclusions, and next steps
– Documentation for dissemination (report, journal papers, leaflets etc)
Quality assurance:
– Does the research team have a system for checking and reviewing quality of products before they are released?

Storage and access

Examples of areas for action:
– Data storage and archive
– Data re-use
– Public access and donor requirements
Quality assurance:
– Existence of a clear access policy and a defined system to request data access
– Metadata available and organised together with the data

Feedback to originators

Examples of areas for action:
– Suitable formats and occasions
Quality assurance:
– Have research products been generated in a form appropriate to the intended audiences?
– Is there a defined mechanism for receiving feedback from interested parties?
– Has the plan of activities been updated to reflect the implementation of the research process?
– Have research products been reviewed and accepted by stakeholders?

There are loops and feedbacks implicit in this system. This means projects can and should be planning for some of the later steps in the data flow before the earlier steps have been completed. For example, data entry procedures should be in place prior to data collection. Indeed, developing the data entry system in parallel with developing the data collection instruments facilitates and speeds up the data entry process, as it can be started as soon as the first data collection sheets are returned.
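One way to have data entry procedures ready before collection starts is to write the entry checks down alongside the data collection sheet. The following is a minimal sketch of such 'automatic checks for data entry', not a prescribed SSC method. The field names (`plot_id`, `crop`, `yield_kg`), the allowed crops, the numeric limits and the function name `check_record` are all hypothetical; in a real project they would come from the research protocol.

```python
# Hypothetical entry rules for one data collection sheet.
# Real field names and limits would be taken from the project's protocol.
RULES = {
    "plot_id": {"required": True, "type": str},
    "crop": {"required": True, "type": str,
             "allowed": {"maize", "sorghum", "cowpea"}},
    "yield_kg": {"required": True, "type": float, "min": 0.0, "max": 500.0},
}


def check_record(record, rules=RULES):
    """Return a list of problems found in one entered record (empty if none)."""
    problems = []
    for field, rule in rules.items():
        value = record.get(field)
        # Trap missing values first, so later checks only see real entries.
        if value is None or value == "":
            if rule.get("required"):
                problems.append(f"{field}: missing")
            continue
        # Convert the typed-in text to the expected type.
        try:
            value = rule["type"](value)
        except (TypeError, ValueError):
            problems.append(f"{field}: cannot be read as {rule['type'].__name__}")
            continue
        # Check against the permitted categories or numeric range.
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: '{value}' is not an allowed value")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{field}: {value} is below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{field}: {value} is above {rule['max']}")
    return problems
```

Running `check_record` on each sheet as it is entered means that omissions and implausible values are noticed while the field team can still be asked about them.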

Diagnosis, planning and action

The Data Flow concept can be used for improved performance of research projects, using the ideas of IMEP (integrated monitoring, evaluation and planning) or the ‘What? So What? Now What?’ framework.

For Data Flow, we find it useful to think of:

– Problems (What?)

– Targets (So what?)

– Strategies and actions (Now what?)

The same model is used in quality assurance, with the emphasis on continuous improvement of performance.

This is very relevant in this context as project teams try to improve themselves so they reach their own goals, are better prepared for future projects, and contribute to building the skills of research organisations.

Diagnosing problems

Researchers are well aware of some of the data flow problems they have. However, there are some areas in which they are not aware of the issues. This might be because they have not yet had to face them (e.g. if a data sharing conflict has not yet arisen in their project) or because they are unaware of alternatives (e.g. if data has always been collected on paper and then entered by hand onto a computer). Therefore projects will need assistance in assessing the current status. This could be (a) with experts visiting a project, (b) during workshops, or (c) through self-diagnosis guides.

Setting targets

The target is the state of managing data flow that the project would like to get to. For each data flow step and issue there are many possible targets, with no universally applicable standards. However, there are some recognised good practices and expectations of some parties (such as journal editors, donors, and universities in which students are registered). Projects should select data flow targets which (a) will ensure they meet their overall project objectives, (b) are appropriate to the scale of the project and its human and technical resources, and (c) are realistic and achievable while also pushing the project towards higher standards. Hence we will help projects set targets by describing some of the options and standards that have been used elsewhere.

Taking action

The first action that needs taking is for the projects to agree that they want to try to change and improve their data flow, and to set suitable targets. Then the actions needed will be a blend of formal training, support from specialists and self-learning. The last is most important: no amount of formal training and external input can make a real difference if scientists do not take seriously the task of broadening their understanding and learning new skills. So the actions of those outside the project should be directed at that – helping scientists become self-learners and providing access to appropriate resources.

Project level

It is assumed that the ‘project’ is the right level at which to do the diagnosis and planning and to take action. Here we are referring to a ‘project’ as a body of work supported by a grant from The McKnight Foundation. Such a project has a (fairly) clear boundary and timeframe, specific outputs to produce and an identified team to implement it – all necessary for diagnosis, planning and action. Some projects are spread across countries and organisations. In these cases it may be sensible for different parts of a project to manage their data flow in different ways. However, at some point data for a project will have to be pulled together, and this will be easiest if there is some coherence across sites.
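To illustrate why coherence across sites matters when data are pulled together, the sketch below combines per-site records only when each site uses the agreed column names, and reports the sites that do not. It is an illustration under stated assumptions: the function name `combine_sites`, the site labels and the column names are hypothetical and not drawn from any CCRP project.

```python
def combine_sites(site_tables, expected_columns):
    """Combine per-site rows into one list, setting aside sites whose columns differ.

    site_tables maps a site name to a list of dict rows.
    Returns (combined_rows, rejected), where rejected maps a site name to the
    sorted list of column names that are missing or unexpected at that site.
    """
    expected = set(expected_columns)
    combined, rejected = [], {}
    for site, rows in site_tables.items():
        # A site with no rows has no columns, so it is reported rather than merged.
        columns = set(rows[0].keys()) if rows else set()
        if columns != expected:
            rejected[site] = sorted(columns ^ expected)
            continue
        # Tag each row with its site so the origin is kept after merging.
        combined.extend({"site": site, **row} for row in rows)
    return combined, rejected
```

For example, if one site records `plot_id` and another records `plot`, the second site is listed in `rejected` with the two differing names, which points directly at what needs to be agreed between the sites.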

Projects actually function because of the individuals in them. Improving data flow on a project will only be possible if those individuals understand and are committed to it. It is not sufficient for ‘The project’ (usually meaning the PI) to take a decision on improving data flow if the people who will need to do something different do not understand and support the decision.

Next Steps

The final two tables here are provided to help projects through the process of exploring data flow, exploring important issues and deciding on priorities for action. Table 2 provides examples of some of the problems, targets and actions that are commonly found in research projects' data flows. Table 3 is a template for you to use to diagnose the current situation with your own project.

Table 2: Potential problems, targets and actions

Data flow item | Diagnosis (What?): typical problems and diagnosis tools | Targets (So what?): examples of targets projects may set, including those used by others | Taking action (What now?): possible actions, guides and tools to help

Data ownership

Diagnosis (What?):
– No clear understanding of data ownership
– Conflicts over who can access and use data
– Conflicts over authorship of publications
Targets (So what?):
– Written data sharing and authorship agreements in place at the start of the project
Taking action (What now?):
– Discuss authorship and data ownership and access in early project meetings
– SSC prepare a template data sharing agreement
– SSC prepare a template authorship agreement

Diagnosis (What?):
– Unclear IP ownership with project scientists from different organisations
Targets (So what?):
– All partners and partner organisations understand and agree to IP status of project outputs
Taking action (What now?):
– Raise awareness of the IP ownership early in the project before any problem arises
– All project scientists look at CCRP IP pages

Planning for data collection

Diagnosis (What?):
– Lack of rigour in collection of data in participatory research
Targets (So what?):
– All participatory research meets usual standards of scientific rigour and could be published
Taking action (What now?):
– Seek peer review of research design
– Seek exchange of experience with partners who may have experience in tackling this issue
– Get statistical advice (Yes! This is important in participatory research!)

Diagnosis (What?):
– No sample size justification
Targets (So what?):
– Explicit sample size justification for all research studies
Taking action (What now?):
– Seek guidance on choice of sample size
– Learn about methods of choosing sample size

Diagnosis (What?):
– Experimental designs not suitable or optimised for the problem
Targets (So what?):
– Design of all experiments goes through peer and statistical review
Taking action (What now?):
– Seek guidance on experimental designs
– Training courses in design of research studies
– SSC prepare checklists for some common designs

Diagnosis (What?):
– No written protocol before start of data collection
– Project does not use research activity protocols
Targets (So what?):
– A written protocol prepared, shared and reviewed before every data collection activity starts (survey, experiment, participatory data collection)
Taking action (What now?):
– SSC prepare templates or checklists for protocols for some common research types
– SSC provide a service that reads protocols and offers feedback

Diagnosis (What?):
– Starting data collection with no plan of exactly how the data will be used and the outputs that will be generated from it
Targets (So what?):
– Skeleton or outline tables and graphs prepared that show how the data will be used before every data collection activity
Taking action (What now?):
– SSC prepare a guide on use of skeleton tables, graphs and statistical analyses
– Get all such plans reviewed by SSC

Data collection

Diagnosis (What?):
– Data returned from field is often incomplete or error-prone
– No formal system exists to ensure completeness of data when brought from the field to the office
– Senior scientist rarely in the field to check data collection
Targets (So what?):
– Explicit quality assurance processes in place for all data collection
– All involved in field data collection trained in their use
– Senior scientists spend time in the field during each data collection activity
Taking action (What now?):
– SSC prepare guide on simple field data quality assurance techniques
– Develop a list of key skills for data collection and organise training

Data entry

Diagnosis (What?):
– Data entry and organisation of low quality as no one in the team is an expert
Targets (So what?):
– A data manager with the right skills and background in each project team
Taking action (What now?):
– Seek advice on key skills required from a competent data manager
– Train a member of the team on data management
– Recruit a new team member who brings good data management skills

Diagnosis (What?):
– Data entry very slow, delaying progress with research
Targets (So what?):
– All data ready for analysis within 1 week of field collection
Taking action (What now?):
– Get advice on appropriate data entry methods
– Adopt new technology for data entry – entry in the field, data entry software, etc.

Diagnosis (What?):
– Multiple versions of data files exist with no one certain why they differ and which is correct
– Data dispersed over many computers and locations
Targets (So what?):
– Everyone in a project knows where the current correct data is, and is able to access and use any data they need
Taking action (What now?):
– Prepare a data management plan that includes processes for tracking changes to data file versions and reasons for changes

Statistical analysis

Diagnosis (What?):
– Difficulties in deciding which statistical technique is most appropriate to ensure that the project data provides evidence that fulfils the project objectives
Taking action (What now?):
– Seek advice on statistical techniques available to fulfil project objectives

Diagnosis (What?):
– Long time spent organising data sets when trying to analyse
Targets (So what?):
– Data organised for efficient processing
Taking action (What now?):
– Seek advice on options available for data organisation from SSC
– Data organisation and formats planning before data collection

Diagnosis (What?):
– Project does not have the skills required to carry out statistical data analysis
Targets (So what?):
– Difficulties with statistical analysis never hold up interpretation and use of research data
Taking action (What now?):
– Seek partnership with institutions or individuals who can provide data analysis services
– Request SSC to help in developing data analysis skills and/or analysis of project data

Interpret and write up

Diagnosis (What?):
– Results from data analysis are difficult to interpret
Targets (So what?):
– ?
Taking action (What now?):
– Seek advice on the interpretation of statistical results from SSC
– Seek advice on interpretation of non-statistical results from experts in the field (SSC may be able to help locating them)

Diagnosis (What?):
– Difficulties in linking data processing with writing up because different people and skills are required for each stage
Targets (So what?):
– ?
Taking action (What now?):
– Organise an analysis and writing up workshop. Seek specialised support from SSC and Regional Team.

Storage and access

Diagnosis (What?):
– Data lost due to hardware failure, theft or misplacement
– Data dispersed over many locations and computers, with no one able to retrieve what is needed
– Data from earlier activities cannot be found, retrieved or used
– Project progress held up because some project members hold data needed by others
Targets (So what?):
– Data archive built as the project progresses, so it is complete when the project finishes
Taking action (What now?):
– Seek advice on technical options available for the production of a data archive
– Training in data archiving, including use of public data archives

Feedback to originators

Diagnosis (What?):
– There is no defined mechanism to give feedback to the originators of the data, for example, farmers engaged in the implementation of field activities and collection of information
Targets (So what?):
– All research results shared with farmers who participated in the research process
– Use the process of feedback to engage in discussions about the "So what" and "What now" of the research outputs with the farmers who participated in the research process
Taking action (What now?):
– Plan for and budget activities that enable the project to share results with farmers
– Engage participating farmers in the process of analysis and conclusions from the research process
– Produce research results dissemination products that are suitable for participating farmers and their communities
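The data entry action in Table 2 on tracking changes to data file versions, and the reasons for them, can be supported by a very simple change log. The sketch below is one possible approach, assuming a shared CSV log kept next to the data. The function names (`file_checksum`, `log_version`, `is_current`) and the log columns are hypothetical, not an SSC tool.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_version(data_file, log_file, changed_by, reason):
    """Append one row to a CSV change log describing the current state of a data file."""
    log_path = Path(log_file)
    is_new = not log_path.exists()
    row = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "file": Path(data_file).name,
        "sha256": file_checksum(data_file),
        "changed_by": changed_by,
        "reason": reason,
    }
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()  # Write column names only when the log is created.
        writer.writerow(row)
    return row


def is_current(data_file, log_file):
    """True if the file matches the most recent logged checksum for that file name."""
    with Path(log_file).open(newline="", encoding="utf-8") as handle:
        rows = [r for r in csv.DictReader(handle) if r["file"] == Path(data_file).name]
    return bool(rows) and rows[-1]["sha256"] == file_checksum(data_file)
```

A team member who finds a copy of a data file on another computer can call `is_current` to see whether it matches the latest logged version, and the `reason` column records why each version differs from the one before.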

Table 3: Template for problems, targets and actions

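Data flow item | Diagnosis (What?) | Targets (So what?) | Taking action (What now?)

Data ownership

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Planning for data collection

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data collection

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data entry

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Statistical analysis

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Interpret and write up

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Storage and access

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Feedback to originators

Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–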