
Data Flow

Organising action on Research Methods and Data Management

Research Methods Support for Collaborative Crop Research Program (CCRP) Projects Funded by the McKnight Foundation


Why has this guide been written?

SSC has been asked to provide support and training to the projects funded under the McKnight Foundation CCRP. This guide on "Data Flow" was written because, while much of this support is ‘on demand’, we need a way of organising and structuring our offer so that:

• Projects and scientists have a good idea of what it is reasonable to demand

• We can communicate efficiently through the use of a common means of visualising and describing what projects and SSC are doing

• We can link the work to the integrated monitoring and planning of the CCRP  

Data Flow

The idea of ‘data flow’ was developed by participants of the E/H Af CoP meeting in May 2009. Since then it has been further developed and discussed by SSC staff and by the S Af CoP at their meeting in September 2009.

All the SSC work in support of Research Methods is about contributing to the generation, management, quality assurance and use of research data within research and development projects. We are therefore structuring our inputs around the way data 'flows' through a project. A first thought is that typically when a researcher collects a data set in the field or lab, he or she enters it into a computer and organises it, then subjects it to statistical analysis. A little further reflection shows that there are important steps before and after those, as shown in Figure 1:

  Figure 1: Data Flow

In any project there are loops and feedback. This is acknowledged but for the purpose of this guide those loops are implicit (not indicated in the figure). An important feature of the data flow depicted is that there are multiple stages in the flow of information. At each of them, quality assurance needs to be considered to ensure the overall quality of the research, leading to effective knowledge generation.


For each stage in data flow (as illustrated in Figure 1), there are a number of aspects that project teams and scientists need to be aware of and, usually, take some decisive action on. Some of the most common aspects are listed in Table 1. This list is not exhaustive, nor are the items prioritised. They will have differing relevance for different projects and we expect further items of concern to emerge when specific projects assess their data flow process.

Many of the items in Table 1 are relevant and important for any type of research for development, including social research and participatory research. The concepts behind many are also relevant to qualitative research and to action research that will generate and use data.

Table 1: Data flow and quality assurance

For each data flow step, the table lists examples of areas for action and the related quality assurance points.

Data flow step: Data ownership
Examples of areas for action:
– Intellectual property
– Data exchange and sharing agreements
– Authorship

Data flow step: Planning for data collection
Examples of areas for action:
– Understand the problem and set clear objectives
– Determine appropriate research approaches
– Plan and describe the outputs, including outline tables and graphs, that the data will be used to generate
– Design the research activities (experiments, surveys, observation, or other information collection techniques)
– Decide what to measure
– Decide what supporting data to collect, e.g. climatic data
– Design data collection tools and instruments
– Plan details: field layout, sampling, calendar…
– Prepare human and other resources: training and logistics
– Document the plan in a research protocol
Quality assurance:
– Liaise with partners to ensure relevance of and shared objectives
– Are we using current best practice in our methods?
– Is there clear allocation of responsibilities to team members?
– Does the research protocol exist?
– Is there a system of version control for all documents (protocols, questionnaires etc)?
– Is there a well defined calendar of activities?
– Has the design of the research activity been optimised?
– Is the research plan understood by all involved?

Data flow step: Data collection
Examples of areas for action:
– Field and lab work
– Monitoring quality
Quality assurance:
– Implementation of an error trapping system at the time of data collection
– Assessment of field reports to take actions to improve quality
– Identification of omissions and gaps in the data collected with respect to the original plan
– Has the plan of activities been updated to reflect the implementation of the research process?

Data flow step: Data entry
Examples of areas for action:
– Find efficient, accurate and well-adapted systems
– Prepare data entry system (e.g. spreadsheets, databases)
– Enter or import data
– Clean data and validate data
– Document how it was done
Quality assurance:
– Design of automatic checks for data entry
– Ensure well trained staff and collaborators
– Ensure backups are made of all the information in electronic form
– Keeping records of data processing decisions

Data flow step: Statistical analysis
Examples of areas for action:
– Plan and describe the statistical analyses that will lead from data to outputs
– Decision on which statistical or other information processing software to use
– Data formatting for analysis
– Indicators and transformations
– Data exploration and summary
– Statistical analysis: trade-off perfection and practice
Quality assurance:
– Keep a data processing log
– Keep well documented syntax of data processing tasks
– Well organised and documented datasets

Data flow step: Interpret and write up
Examples of areas for action:
– Interpret data
– Presentation of results: tables and graphs
– Merging new information with what was previously known and working out the implications
– Reporting summary outputs, conclusions, and next steps
– Documentation for dissemination (report, journal papers, leaflets etc)
Quality assurance:
– Does the research team have a system for checking and reviewing quality of products before they are released?

Data flow step: Storage and access
Examples of areas for action:
– Data storage and archive
– Data re-use
– Public access and donor requirements
Quality assurance:
– Existence of a clear access policy and a defined system to request data access
– Metadata available and organised together with the data

Data flow step: Feedback to originators
Examples of areas for action:
– Suitable formats and occasions
Quality assurance:
– Have research products been generated in a form appropriate to the intended audiences?
– Is there a defined mechanism for receiving feedback from interested parties?
– Has the plan of activities been updated to reflect the implementation of the research process?
– Have research products been reviewed and accepted by stakeholders?

   

 

There are loops and feedbacks implicit in this system. This means projects can and should be planning for some of the later steps in the data flow before the earlier steps have been completed. For example data entry procedures should be in place prior to data collection. Indeed developing the data entry system in parallel with developing the data collection instruments facilitates and speeds up the data entry process as it can be started as soon as the first data collection sheets are returned.
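As an illustration of what a data entry system prepared alongside the data collection instruments might include, the following is a minimal sketch of automatic checks applied to each record as it is entered. The field names (`plot_id`, `treatment`, `yield_kg`), the allowed treatments and the 0 to 500 kg range are hypothetical and are not taken from any CCRP project; a real project would derive its rules from its own data collection sheets.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldRule:
    """One check for one column of the data collection sheet."""
    name: str
    required: bool = True
    check: Optional[Callable[[str], bool]] = None
    message: str = "invalid value"


def in_range(low, high):
    """Build a check that accepts numeric values between low and high."""
    def check(value):
        try:
            return low <= float(value) <= high
        except ValueError:
            return False
    return check


def one_of(*allowed):
    """Build a check that accepts only the listed codes."""
    return lambda value: value in allowed


# Hypothetical rules, written at the same time as the field sheet is designed.
RULES = [
    FieldRule("plot_id"),
    FieldRule("treatment", check=one_of("control", "fertiliser"),
              message="unknown treatment"),
    FieldRule("yield_kg", check=in_range(0, 500),
              message="yield outside 0-500 kg"),
    FieldRule("notes", required=False),
]


def validate_record(record, rules=RULES):
    """Return a list of problems found in one entered record (empty if none)."""
    problems = []
    for rule in rules:
        value = (record.get(rule.name) or "").strip()
        if not value:
            if rule.required:
                problems.append(f"{rule.name}: missing")
            continue
        if rule.check is not None and not rule.check(value):
            problems.append(f"{rule.name}: {rule.message} ({value!r})")
    return problems
```

Running `validate_record` on each row as the first sheets come back from the field gives an early list of omissions and out-of-range values, while it is still possible to return to the field and correct them.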

Diagnosis, planning and action

The Data Flow concept can be used for improved performance of research projects, using the ideas of IMEP (integrated monitoring, evaluation and planning) or the ‘What? So What? Now What?’ framework.

For Data Flow, we find it useful to think of problems (What?), targets (So what?), and strategies and actions (Now what?).

The same model is used in quality assurance, with the emphasis on continuous improvement of performance.

This is very relevant in this context as project teams try to improve themselves so they reach their own goals, are better prepared for future projects, and contribute to building the skills of research organisations.

Diagnosing problems

Researchers are well aware of some of the data flow problems they have. However, there are some areas in which they are not aware of the issues. This might be because they have not yet had to face them (e.g. if a data sharing conflict has not yet arisen in their project) or because they are unaware of alternatives (e.g. if data has always been collected on paper then entered by hand onto a computer). Therefore projects will need assistance in assessing the current status. This could be (a) with experts visiting a project, (b) during workshops, or (c) through self-diagnosis guides.

Setting targets

The target is the state of managing data flow that the project would like to get to. For each data flow step and issue there are many possible targets, with no universally applicable standards. However, there are some recognised good practices, and some parties (such as journal editors, donors, and universities in which students are registered) have their own expectations. Projects should be selecting data flow targets which (a) will ensure they meet their overall project objectives, (b) are appropriate to the scale of the project and its human and technical resources, and (c) are realistic and achievable while also pushing the project towards higher standards. Hence we will help projects set targets by describing some of the options and standards that have been used elsewhere.

Taking action

The first action that needs taking is for the projects to agree that they want to try to change and improve their data flow, and to set suitable targets. Then the actions needed will be a blend of formal training, support from specialists and self-learning. The last is most important: no amount of formal training and external input can make a real difference if scientists do not take seriously the task of broadening their understanding and learning new skills. So the actions of those outside the project should be directed at that – helping scientists become self-learners and providing access to appropriate resources.

Project level

It is assumed that the ‘project’ is the right level at which to do the diagnosis and planning and to take action. Here we are referring to a ‘project’ as a body of work supported by a grant from The McKnight Foundation. Such a project has a (fairly) clear boundary and timeframe, specific outputs to produce and an identified team to implement it – all necessary for diagnosis, planning and action. Some projects are spread across countries and organisations. In these cases it may be sensible for different parts of a project to manage their data flow in different ways. However, at some point data for a project will have to be pulled together, and this will be easiest if there is some coherence across sites.

Projects actually function because of the individuals in them. Improving data flow on a project will only be possible if those individuals understand and are committed to it. It is not sufficient for ‘The project’ (usually meaning the PI) to take a decision on improving data flow if the people who will need to do something different do not understand and support the decision.

Next Steps

The final two tables here are provided to help projects through the process of exploring data flow, exploring important issues and deciding on priorities for action. Table 2 provides examples of some of the problems, targets and actions that are commonly found in research projects' data flows. Table 3 is a template for you to use to diagnose the current situation with your own project.

 


Table 2: Potential problems, targets and actions

 

Each entry gives, for a data flow item: the diagnosis (What?), meaning typical problems and diagnosis tools; the targets (So what?), meaning examples of targets projects may set, including those used by others; and taking action (What now?), meaning possible actions and guides and tools to help.

Data flow item: Data ownership
Diagnosis (What?):
– No clear understanding of data ownership
– Conflicts over who can access and use data
– Conflicts over authorship of publications
Targets (So what?):
– Written data sharing and authorship agreements in place at the start of the project
Taking action (What now?):
– Discuss authorship and data ownership and access in early project meetings
– SSC prepare a template data sharing agreement
– SSC prepare a template authorship agreement

Data flow item: Data ownership
Diagnosis (What?):
– Unclear IP ownership with project scientists from different organisations
Targets (So what?):
– All partners and partner organisations understand and agree to IP status of project outputs
Taking action (What now?):
– Raise awareness of the IP ownership early in the project before any problem arises
– All project scientists look at CCRP IP pages

Data flow item: Planning for data collection
Diagnosis (What?):
– Lack of rigour in collection of data in participatory research
Targets (So what?):
– All participatory research meets usual standards of scientific rigour and could be published
Taking action (What now?):
– Seek peer review of research design
– Seek exchange of experience with partners who may have experience in tackling this issue
– Get statistical advice (Yes! This is important in participatory research!)

Data flow item: Planning for data collection
Diagnosis (What?):
– No sample size justification
Targets (So what?):
– Explicit sample size justification for all research studies
Taking action (What now?):
– Seek guidance on choice of sample size
– Learn about methods of choosing sample size

Data flow item: Planning for data collection
Diagnosis (What?):
– Experimental designs not suitable or optimised for the problem
Targets (So what?):
– Design of all experiments goes through peer and statistical review
Taking action (What now?):
– Seek guidance on experimental designs
– Training courses in design of research studies
– SSC prepare checklists for some common designs

 

Data flow item: Planning for data collection (continued)
Diagnosis (What?):
– No written protocol before start of data collection
– Project does not use research activity protocols
Targets (So what?):
– A written protocol prepared, shared and reviewed before every data collection activity starts (survey, experiment, participatory data collection)
Taking action (What now?):
– SSC prepare templates or checklists for protocols for some common research types
– SSC provide a service that reads protocols and offers feedback

Data flow item: Planning for data collection (continued)
Diagnosis (What?):
– Starting data collection with no plan of exactly how the data will be used and the outputs that will be generated from it
Targets (So what?):
– Skeleton or outline tables and graphs prepared that show how the data will be used before every data collection activity
Taking action (What now?):
– SSC prepare a guide on use of skeleton tables, graphs and statistical analyses
– Get all such plans reviewed by SSC

Data flow item: Data collection
Diagnosis (What?):
– Data returned from field is often incomplete or error-prone
– No formal system exists to ensure completeness of data when brought from the field to the office
– Senior scientist rarely in the field to check data collection
Targets (So what?):
– Explicit quality assurance processes in place for all data collection
– All involved in field data collection trained in their use
– Senior scientists spend time in the field during each data collection activity
Taking action (What now?):
– SSC prepare guide on simple field data quality assurance techniques
– Develop a list of key skills for data collection and organise training

Data flow item: Data entry
Diagnosis (What?):
– Data entry and organisation of low quality as no one in the team is an expert
Targets (So what?):
– A data manager with the right skills and background in each project team
Taking action (What now?):
– Seek advice on key skills required from a competent data manager
– Train a member of the team on data management
– Recruit a new team member who brings good data management skills

Data flow item: Data entry
Diagnosis (What?):
– Data entry very slow, delaying progress with research
Targets (So what?):
– All data ready for analysis within 1 week of field collection
Taking action (What now?):
– Get advice on appropriate data entry methods
– Adopt new technology for data entry: entry in the field, data entry software, etc.

Data flow item: Data entry (continued)
Diagnosis (What?):
– Multiple versions of data files exist with no one certain why they differ and which is correct
– Data dispersed over many computers and locations
Targets (So what?):
– Everyone in a project knows where the current correct data is, and is able to access and use any data they need
Taking action (What now?):
– Prepare a data management plan that includes processes for tracking changes to data file versions and reasons for changes

Data flow item: Statistical analysis
Diagnosis (What?):
– Difficulties in deciding which statistical technique is most appropriate to ensure that the project data provides evidence that fulfils the project objectives
Targets (So what?):
–
Taking action (What now?):
– Seek advice on statistical techniques available to fulfil project objectives

Data flow item: Statistical analysis
Diagnosis (What?):
– Long time spent organising data sets when trying to analyse
Targets (So what?):
– Data organised for efficient processing
Taking action (What now?):
– Seek advice on options available for data organisation from SSC
– Data organisation and formats planning before data collection

Data flow item: Statistical analysis
Diagnosis (What?):
– Project does not have the skills required to carry out statistical data analysis
Targets (So what?):
– Difficulties with statistical analysis never hold up interpretation and use of research data
Taking action (What now?):
– Seek partnership with institutions or individuals who can provide data analysis services
– Request SSC to help in developing data analysis skills and/or analysis of project data

Data flow item: Interpret and write up
Diagnosis (What?):
– Results from data analysis are difficult to interpret
Targets (So what?):
– ?
Taking action (What now?):
– Seek advice on the interpretation of statistical results from SSC
– Seek advice on interpretation of non-statistical results from experts in the field (SSC may be able to help locating them)

Data flow item: Interpret and write up (continued)
Diagnosis (What?):
– Difficulties in linking data processing with writing up because different people and skills are required for each stage
Targets (So what?):
– ?
Taking action (What now?):
– Organise an analysis and writing up workshop. Seek specialised support from SSC and Regional Team.

Data flow item: Storage and access
Diagnosis (What?):
– Data lost due to hardware failure, theft or misplacement
– Data dispersed over many locations and computers, with no one able to retrieve what is needed
– Data from earlier activities can not be found, retrieved or used
– Project progress held up because some project members hold data needed by others
Targets (So what?):
– Data archive built as the project progresses, so it is complete when the project finishes
Taking action (What now?):
– Seek advice on technical options available for the production of a data archive
– Training in data archiving, including use of public data archives

Data flow item: Feedback to originators
Diagnosis (What?):
– There is no defined mechanism to give feedback to the originators of the data, for example, farmers engaged in the implementation of field activities and collection of information
Targets (So what?):
– All research results shared with farmers who participated in the research process
– Use the process of feedback to engage in discussions about the "So what" and "What now" of the research outputs with the farmers who participated in the research process
Taking action (What now?):
– Plan for and budget activities that enable the project to share results with farmers
– Engage participating farmers in the process of analysis and conclusions from the research process
– Produce research results dissemination products that are suitable for participating farmers and their communities
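One of the actions in Table 2 is a data management plan with a process for tracking changes to data file versions and the reasons for them. The following is a minimal sketch of one way such a process could be supported, assuming data files are kept as ordinary files and a small JSON log sits beside them. The function names, the log layout and the example file names are hypothetical; they are not an SSC or CCRP tool.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(data_file, log_file, author, reason):
    """Append an entry to the log if the data file has changed.

    Returns True if a new version was logged, False if the file is
    identical to the last logged version.
    """
    log_path = Path(log_file)
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    name = Path(data_file).name
    checksum = file_checksum(data_file)

    previous = [entry for entry in entries if entry["file"] == name]
    if previous and previous[-1]["sha256"] == checksum:
        return False  # nothing changed since the last logged version

    entries.append({
        "file": name,
        "sha256": checksum,
        "author": author,
        "reason": reason,  # why the file was changed
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })
    log_path.write_text(json.dumps(entries, indent=2))
    return True
```

A team member would call, for example, `record_version("yields.csv", "version_log.json", "AB", "corrected plot 12 yield after field check")` after each edit. Two copies of a file with the same checksum are identical, so the log shows which copy is current and why earlier versions differ.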

 

 

 


Table 3: Template for problems, targets and actions 

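For each data flow item, fill in the diagnosis (What?: typical problems and diagnosis tools), the targets (So what?: examples of targets projects may set, including those used by others) and the actions to take (What now?: possible actions). Add further entries under an item as needed.

Data flow item: Data ownership
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Planning for data collection
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Data collection
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Data entry
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Statistical analysis
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Interpret and write up
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Storage and access
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–

Data flow item: Feedback to originators
Diagnosis (What?):
–
Targets (So what?):
–
Taking action (What now?):
–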

   
