Before you can preserve and share your data, you will need to identify what needs to be preserved. You are unlikely to need to preserve all the data you collect or create in the course of your research. You will therefore need to select data of value, and dispose of the remainder.
It can be useful to undertake a systematic value assessment of your data to help you make an informed decision about what to preserve. We provide a set of appraisal criteria in the Data Selection and Appraisal Checklist (PDF). This document is intended for use by prospective depositors in the University's Research Data Archive, but the appraisal criteria are applicable to any preservation selection activity. They are based on the criteria provided in the NERC Data Value Checklist, which should be used by NERC-funded researchers. A more detailed guide to appraising and selecting research data for curation is provided by the Digital Curation Centre.
Some of the key considerations to bear in mind when selecting data for preservation are highlighted below.
Validating published findings
What data will be required to validate the research findings that are placed on the public record, e.g. through publication in a research article or inclusion in a PhD thesis? Test data, results of failed experiments, and data from faulty instruments are obvious candidates for disposal. Data at intermediate stages of processing will often be surplus to requirements, as it is more important to preserve the raw data and the record of processing by which they were transformed into their final state. It may also be useful to preserve your data in their final processed format. Bear in mind that code files used to generate, process and analyse data may form part of the material required to validate results.
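One way to keep a record of processing is to log each step as it is run, together with checksums of the files going in and coming out. The following is a minimal sketch only, not a required method: the function names `sha256_of` and `record_step`, the log fields, and the choice of a JSON Lines log file are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_step(log_path, description, input_path, output_path, script=None):
    """Append one processing step to a JSON Lines log and return the entry.

    The field names used here are illustrative assumptions, not a standard.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "description": description,
        "input": {"file": Path(input_path).name, "sha256": sha256_of(input_path)},
        "output": {"file": Path(output_path).name, "sha256": sha256_of(output_path)},
        "script": script,  # name of the code file that performed the step, if any
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

A log of this kind, kept alongside the raw data and the code files, lets a reader see how each processed file was derived without the intermediate files themselves being preserved.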
What is the intrinsic value of the data? Environmental data, for example, are unique to their time and place and have inherent value as part of the historical record. If these are lost they can never be replaced. Experiments can in principle be repeated, and the data reproduced, although the cost of doing so may be high.
Data may also have specific value for re-use by other stakeholders, for example, because of their usefulness for research, or because they may be re-used in products or services.
Some research may generate large volumes of data, at the scale of hundreds of gigabytes (GB) or several terabytes (TB). Examples of research producing data at these scales might include large-scale high-resolution imaging and video recording, and computer simulations of complex systems, where raw output can run to terabytes. Many data repositories will not have the capacity to handle very large datasets. Storage, preservation and transfer of data at these scales present both technical and financial challenges, to the extent that the cost of meaningful preservation and sharing of such data outputs may be in excess of any possible benefit. In the case of computer simulations in particular, it may be less important to preserve individual outputs than the model code and input parameters, by means of which a set of results can be reproduced.
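For a simulation, the input parameters and the exact version of the model code can be captured in a small manifest file that is preserved in place of the bulky outputs. The sketch below is illustrative only: the function name `write_run_manifest`, the manifest layout and the example parameter names are assumptions, and it presumes the model code is held in a handful of files with distinct names.

```python
import hashlib
import json
from pathlib import Path


def write_run_manifest(manifest_path, code_files, parameters):
    """Write a JSON manifest of the inputs needed to re-run a simulation.

    code_files: paths to the model code files (assumed to have distinct names).
    parameters: a JSON-serialisable dict of input parameters.
    """
    manifest = {
        "parameters": parameters,
        # A checksum per code file, so the code version used can be verified later.
        "code": {
            Path(f).name: hashlib.sha256(Path(f).read_bytes()).hexdigest()
            for f in code_files
        },
    }
    Path(manifest_path).write_text(
        json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8"
    )
    return manifest
```

For example, `write_run_manifest("run_001.json", ["model.py"], {"grid_size": 256, "seed": 42})` would write a manifest for one run. The file name and parameters here are hypothetical.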
Funders recognise that there may be practical limits to the preservability of some data. UKRI accepts that 'there may be cases in which it may not be possible or cost effective to preserve research data. This will depend on the type and scale of the data, their role in validating published results, and their predicted long-term usefulness for further research' (see guidance to the Common Principles on Data Policy, p. 4).
Even where it is not desirable or possible to deposit high-volume data outputs in a data repository, you may still wish to retain them, for your own ongoing use, and/or in order to be able to share them with others on request. In this case, you would need to store the data in a personal storage solution (with appropriate backup), and register them in the University Archive, so that others can find out about them and how to access them. For more information see the web page Where to archive data.
Data that cannot be shared
Are there any legal, ethical or contractual restrictions on what data can be shared? Such restrictions rarely mean that data cannot be shared at all. Data may need to be redacted, e.g. to remove confidential or commercially-privileged information, or access to them may need to be restricted in some way.
As a general rule, you would be expected to preserve anonymised data only. For example, you may preserve anonymised transcripts, but dispose of original interview audio recordings; you may preserve anonymised quantitative data from an observation study, but would not record data by means of which individual participants might be identified.
Where confidential information or personal data cannot be removed from data (as may be the case with biometric data, for example), or where the risk of causing harm or distress by disclosure is significant, data may be preserved on a restricted-access basis using closed storage. Some data repositories, e.g. the UK Data Service ReShare repository and the European Genome-phenome Archive, can manage controlled access to sensitive/confidential data. The University's Research Data Archive can also offer a restricted access option. Contact us if you wish to discuss this.