The research data lifecycle
It can be helpful to think of research data management in terms of a research data lifecycle and the data-related activities that take place at each stage of this lifecycle. The diagram below and the following legend illustrate this lifecycle in seven stages.
At the planning stage you will identify the data that will be collected or used to answer your research question, and will plan for data management throughout the lifecycle. This is the stage at which a data management plan would be created. Many public funders of research ask for a data management plan to be submitted as part of a research application.
Data collection is the stage at which experiments are carried out, observations made, surveys undertaken, secondary materials acquired, etc. You will need to document the data collection instruments and methods, along with any other information necessary to interpret and use the data.
Data once collected will need to be processed in order to be usable. This might involve cleaning data to eliminate noise, combining data from multiple sources, transforming data from one state to another (e.g. by format conversion), and using procedures to validate or quality-control data. Any data processing will need to be documented, such that the end result could be replicated from the raw data.
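The idea of documented, replicable processing can be illustrated with a minimal sketch. It assumes a small, hypothetical CSV of temperature readings and illustrative plausibility bounds; the function name, column names and thresholds are invented for the example and are not prescribed by any standard. Each cleaning or validation step is recorded in a log, so that the processed result could be reproduced from the raw data.

```python
import csv
import io
import json

# Hypothetical raw data; in practice this would be read from a raw file
# that is kept unmodified alongside the processed version.
RAW_CSV = """id,temperature_c
1,21.5
2,
3,999
4,19.8
"""


def process(raw_text, min_c=-50.0, max_c=60.0):
    """Clean and validate raw readings, recording each step in a log."""
    log = []
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    log.append({"step": "read", "rows": len(rows)})

    # Cleaning: drop rows with a missing temperature value.
    present = [r for r in rows if r["temperature_c"].strip()]
    log.append({"step": "drop_missing", "rows": len(present)})

    # Validation: keep only values inside the assumed plausible range.
    valid = [
        {"id": int(r["id"]), "temperature_c": float(r["temperature_c"])}
        for r in present
        if min_c <= float(r["temperature_c"]) <= max_c
    ]
    log.append(
        {"step": "range_check", "min_c": min_c, "max_c": max_c, "rows": len(valid)}
    )
    return valid, log


clean_rows, processing_log = process(RAW_CSV)
print(json.dumps(processing_log, indent=2))
```

Saving the log alongside the processed file gives later users a record of what was removed and why; the raw file itself is never altered.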
Data analysis is the stage at which the raw materials of research are interrogated to produce the insights that constitute the research findings, which will be written up and published in research outputs. Instruments and methods used for analysis should be documented; code written for purposes of data analysis and visualisation may need to be preserved and made available in support of research results.
Towards the completion of your research you will preserve, for the long term, those data that substantiate your research findings and have long-term value. Data will need to be prepared for preservation and archived in a suitable location. In many cases this will involve deposit of the digital data in a suitable data repository/data centre. Preservation activities may involve quality assurance of data, file format conversion, creation of metadata records with assignment of Digital Object Identifiers (DOIs) to datasets, licensing datasets for re-use, and putting in place any required access controls. Confidential and non-digital data may be held locally or in a non-public location, in which case they should be managed by an accountable person or group, who can ensure they are stored and preserved properly.
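As a rough illustration of preparing a dataset for deposit, the sketch below builds a simple metadata record for a data file. The field names are illustrative only and do not follow any particular repository's schema; the inclusion of a SHA-256 checksum is an assumption about what a quality assurance step might record, and the DOI is left empty because it would normally be assigned by the repository.

```python
import hashlib
import json


def build_metadata_record(title, creators, licence, data_bytes, access="open"):
    """Return a simple descriptive record for a dataset file (hypothetical fields)."""
    return {
        "title": title,
        "creators": creators,
        "licence": licence,
        "access": access,  # e.g. "open" or "restricted"
        "doi": None,  # left empty: assigned by the repository on deposit
        "size_bytes": len(data_bytes),
        # Checksum lets anyone verify later that the file is unchanged.
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


# Example content standing in for a real data file.
example_bytes = b"id,temperature_c\n1,21.5\n4,19.8\n"

record = build_metadata_record(
    title="Example temperature readings",
    creators=["A. Researcher"],
    licence="CC-BY-4.0",
    data_bytes=example_bytes,
)
print(json.dumps(record, indent=2))
```

In practice a repository will usually provide its own deposit form or metadata template, which takes precedence over any locally devised record like this one.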
Publications based on data should include a data citation or a statement indicating where and on what terms the data can be accessed. A data repository will enable discovery of the data in its care by exposing the metadata online, and will provide access to the data when this is permitted. Data may be made publicly available, or restrictions on access may be imposed where data are of a sensitive or confidential nature. Data held locally or in non-public locations should be managed in such a way that others can discover and apply for access to the data.
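A data citation can be assembled from the same details held in a dataset's metadata record. The sketch below uses one commonly seen pattern (creators, year, title, publisher, identifier); the function name is hypothetical, the repository name and DOI are placeholders, and a publisher or repository may specify a different citation style that should be followed instead.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Format a dataset citation as 'Creators (Year): Title. Publisher. DOI URL'."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


citation = format_data_citation(
    creators=["Researcher, A.", "Colleague, B."],
    year=2024,
    title="Example temperature readings",
    publisher="Example Data Repository",
    doi="10.1234/example",  # placeholder, not a real DOI
)
print(citation)
```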
Data that are available for discovery and access may be re-used by other researchers, either to substantiate the findings of the original research, or to generate new insights through further interrogation and analysis. At this stage the data may become raw materials collected within a new cycle of research. Research data may also have other valuable uses, e.g. in policy-making, development of commercial products and services, and teaching.