Statistics and Data Management Guides
Guidelines for Research Data Management and Analysis
Our guides are intended to help researchers plan and implement their research programmes in a way that will allow all data management and statistical aspects of the research to be dealt with efficiently and effectively. We have grouped the guides to convey the idea of data flowing through a research project. Several important steps in this Data Flow process have been identified. Our guides have been organised below according to where they fit in the Data Flow process.
Please use the menu on the side to browse our resources for the different stages of the research project.
We also have links to a number of papers related to the use of participatory activities in research.
This is the initial task that follows directly from the research proposal. The proposal usually specifies the project as a whole, which has to be broken into specific activities that contribute to achieving the research objectives. One output from this planning stage is the documentation of a protocol for each activity. The protocol would include approaches to be used for each activity. Further work at this stage would include planning the rest of the Data Flow process, e.g. establishing protocols for Data Ownership and documenting strategies and responsibilities for data entry, organisation, storage and dissemination.
We have put Data Ownership as the very first step in the Data Flow process as we feel it is important to establish protocols at the outset with agreement from all parties. This will help to avoid problems in the future such as individuals being unwilling to share “their” data with the rest of the project team. We have raised the subject of Data Ownership in many of our training events and it has always generated an interesting and often emotive debate.
You should establish who owns the data, and who therefore has the responsibility for collecting, using and looking after the data in an ethical and responsible manner.
The following PowerPoint slide-show, available in both English and French, gives further details to help researchers prepare a Data Ownership Protocol for their project.
A related issue is Authorship of publications resulting from a research project. When working with partners on joint projects authorship should be discussed early in the process, long before any possible problems or misunderstandings arise. Authorship brings advantages but also responsibilities and, if not handled correctly, can lead to bad feeling among members of the project team. Our colleagues at ICRAF have put together the following guidelines regarding authorship.
We include here a template (in English and French) for use in setting up a Data Management Protocol. This can be adjusted appropriately to serve the needs of your own project. We also include a set of guidelines suitable for those undertaking experimental research.
Writing research protocols for each research activity is an important aspect to consider at the planning stage. Such protocols should be considered to be dynamic entities and modified/updated while the activity is in progress. The final version would then reflect what actually took place during the activity. The guide we give below concentrates on the statistical aspects of the protocol. It is available in both English and Spanish.
Planning for Data Collection
Planning data collection includes preparing the fieldworker manuals, training the fieldworkers, planning the schedule for the work, etc. High-quality data is only possible if the data collection exercise is planned (and piloted) in detail. Decisions have to be made on what primary data to measure and what supporting data to collect. The detailed planning also includes the design of the data collection tools/instruments, plans of field layouts, and sampling schemes. Other practical issues include logistics, costs, timing of collection activities and the planning and execution of enumerator training programmes.
The first three of the guides below are specifically related to the design of experimental studies, while the last two relate to surveys and sampling.
Planning for Data Entry
In parallel with planning the data collection, you should also be preparing the data entry system. This can and should be developed at the same time as the data collection instruments and before any fieldwork, so that data entry can start as soon as possible after data collection. In planning data entry, you also need to consider which software to use: should you use a database system, or is a spreadsheet adequate? The third guide below will help you to make this decision.
CS-Pro (Census and Survey Processing System) is a software package for entry, editing, tabulation and dissemination of census and survey data. It was developed jointly by the U.S. Census Bureau, Macro International and Serpro S.A., with major funding from the U.S. Agency for International Development. The software can be downloaded from the website of the U.S. Census Bureau and is available free of charge.
We have produced a series of video demonstrations to help you set up and use data entry systems using CS-Pro. Here are our choice picks of the series that relate to this stage of the Data Flow process:
Watch our full CS-Pro playlist on YouTube
Mobile Data Entry
Many researchers now use mobile devices to enter data directly in the field. If these devices are properly configured with all the necessary checks, they can help to reduce the data entry phase substantially, thus making the data ready for analysis that much sooner.
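As a rough illustration of what "the necessary checks" might look like, the sketch below applies a required-field check, a range check and a consistency check to one entered record. The field names (household_id, age, area_planted_ha, area_harvested_ha) and the limits are hypothetical, chosen only for the example; a real project would take them from its own data collection instrument.

```python
# Hypothetical entry-time checks of the kind a mobile form might enforce.
# Field names and limits are illustrative assumptions, not from any real survey.

def check_record(record):
    """Return a list of human-readable problems found in one entered record."""
    problems = []

    # Required-field check: a household ID must be present and non-empty.
    if not record.get("household_id"):
        problems.append("household_id is missing")

    # Range check: age must be a whole number between 0 and 120.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append(f"age out of range: {age!r}")

    # Consistency check: harvested area cannot exceed planted area.
    planted = record.get("area_planted_ha", 0.0)
    harvested = record.get("area_harvested_ha", 0.0)
    if harvested > planted:
        problems.append("area_harvested_ha exceeds area_planted_ha")

    return problems


good = {"household_id": "H001", "age": 34,
        "area_planted_ha": 1.5, "area_harvested_ha": 1.2}
bad = {"household_id": "", "age": 150,
       "area_planted_ha": 1.0, "area_harvested_ha": 2.0}

print(check_record(good))  # no problems expected
for problem in check_record(bad):
    print(problem)
```

Catching such problems while the enumerator is still with the respondent is what allows the later data entry and cleaning phase to be shortened.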
The Open Data Kit (ODK) is a free and open-source set of tools that allow users to author forms in Excel, collect data on Android devices and aggregate data onto a server for storage and organisation. There are many guides to the various components of ODK available online. Here are links to a few useful resources.
- XLSForm.org: A good introduction to authoring digital data collection forms in Excel.
- ODK Collect: A popular Android application to enable digital data collection on mobile devices.
- ODK Aggregate: Information about installing an ODK Aggregate server.
- ODK Training Guides: A list of links and information about training for ODK. This includes guides on using ODK for a project, and detailed notes about training enumerators to use ODK Collect in the field.
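To give a feel for what authoring a form in Excel involves, the sketch below builds the rows of a small "survey" sheet and prints them as comma-separated text. The column headings (type, name, label, required, constraint, constraint_message) follow the XLSForm convention as we understand it; the three questions are hypothetical. This is only an illustration of the layout: the rows would still need to be placed in the survey sheet of an Excel workbook, and the XLSForm.org documentation remains the authority on the details.

```python
import csv
import io

# Column headings follow the XLSForm "survey" sheet convention.
# The questions themselves are hypothetical examples.
FIELDNAMES = ["type", "name", "label", "required",
              "constraint", "constraint_message"]

survey_rows = [
    {"type": "text", "name": "household_id", "label": "Household ID",
     "required": "yes", "constraint": "", "constraint_message": ""},
    {"type": "integer", "name": "age", "label": "Age of respondent (years)",
     "required": "yes", "constraint": ". >= 0 and . <= 120",
     "constraint_message": "Age must be between 0 and 120"},
    {"type": "decimal", "name": "area_planted_ha",
     "label": "Area planted (hectares)",
     "required": "yes", "constraint": ". > 0",
     "constraint_message": "Area must be greater than zero"},
]


def survey_sheet_as_csv(rows):
    """Render the survey rows as CSV text, one form question per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheet_text = survey_sheet_as_csv(survey_rows)
print(sheet_text)
```

In a constraint, the dot stands for the value just entered, so the checks travel with the form and are applied on the device at the moment of entry.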
CSPro also now has an Android application for mobile data collection. The application is quite new, and works with the latest version of CSPro (6.1, as of July 2015): http://www.csprousers.org/category/android/
Exploratory Data Analysis
The initial step in the analysis is often an exploration of the data. This includes searching for oddities, and continuing to pay attention to the accuracy and validity of the data. We include two videos here, from our series of CS-Pro demonstrations. The first shows how to create frequency tables in CS-Pro while the second demonstrates how to export data so that it can be read into a statistical package for analysis. Frequency tables are useful to produce at this stage as they provide the means to check for anomalies and allow an initial feel for the data.
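The same idea can be tried outside CS-Pro. The sketch below, using only the Python standard library, tabulates one categorical variable and flags any value that is not among the expected codes. The responses and the list of valid codes are made up for the illustration, with a typing error and a blank planted deliberately.

```python
from collections import Counter

# Made-up responses for one categorical variable.
# "mael" and the blank entry are deliberate errors for the illustration.
responses = ["male", "female", "female", "mael",
             "male", "female", "", "female"]

# Assumed set of valid codes for this variable.
valid_codes = {"male", "female"}


def frequency_table(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


table = frequency_table(responses)
for value, count in table:
    flag = "" if value in valid_codes else "  <-- check"
    print(f"{value or '(blank)':<10}{count:>4}{flag}")

# Values that appear in the data but are not valid codes.
unexpected = sorted(value for value, _ in table if value not in valid_codes)
```

A misspelt or blank category shows up as an extra row with a small count, which is usually easier to spot in a frequency table than by scanning the raw records.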
The data are now processed to answer the objectives of the activity. When researchers are skilled at using suitable software, the initial exploratory work is often quick and the main story in the data can be readily seen. Any delay is often because the objectives were not complete or were not clearly specified. Other common reasons for a delay at this stage are lack of confidence by the researchers in the appropriate methods of analysis, or poor organisation of the data from the previous stage. Our guides below cover general statistical guidelines; information on various statistical techniques; the analysis of survey data and the analysis of experimental data. A glossary of statistical terms is also included.
Interpretation and Write-up
Preparing materials for dissemination and publication requires writing up the results of the analysis carefully and presenting them in such a way that the important messages emerging from the research can be understood easily by other interested researchers and/or policy planners. The guides below are intended to assist the researcher at the write-up stage of the research findings.
Bridging the gap between statistics and participatory methods
This page contains materials aimed at helping with the integration of statistical and participatory principles for research. Our intention is to contribute to the development of methods that take advantage of the strengths of statistics and participatory methods when gathering information for decision making in a development context.
For more information or contributions to this section, please contact Carlos Barahona at firstname.lastname@example.org.
Statistical Services Centre, Working Paper
by C. Barahona and S. Levy
Ian Wilson, email@example.com
Savitri Wilson née Abeyasekera, firstname.lastname@example.org
Savitri Wilson née Abeyasekera, email@example.com