Open and FAIR data
The University and many public funders expect data supporting research findings to be made openly available wherever possible. According to The Open Definition, open data 'can be freely used, modified, and shared by anyone for any purpose'.
But permission alone is not enough if there is no means to find, access and use the data. Open data also have to be:
- explicitly identified and formally entered on the online public record, so that they can be accurately cited and discovered;
- accessible, so that they can be opened, read and processed;
- presented and documented in such a way that they can be understood and used.
These usability conditions are expressed in the FAIR Data Principles, according to which data must be Findable, Accessible, Interoperable and Re-usable. The FAIR Principles were first set out in 2016 by a group of stakeholders from academia, industry, funding agencies, and scholarly publishers. The Principles put specific emphasis on the ability of machines to automatically find and use data and/or related metadata, in addition to supporting re-use by individuals.
Since they were first published, the FAIR Principles have achieved widespread acceptance, and have been adopted as standards for management of data, development of infrastructure and delivery of services.
Open data, to be open to the fullest extent, must also be FAIR. (But note that FAIR data do not have to be open: restricted-access data may be FAIR, providing the metadata describing them are openly accessible.)
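As an illustration of how restricted data can still have open, machine-readable metadata, here is a minimal sketch in Python. The field names, the placeholder DOI and the `missing_fields` helper are assumptions made for this example; they do not represent the schema of any particular repository.

```python
import json

# Hypothetical metadata record for a restricted-access dataset.
# All values are placeholders; the field names are illustrative only.
record = {
    "identifier": "https://doi.org/10.1234/example-dataset",  # placeholder DOI
    "title": "Example survey dataset",
    "creators": ["Example, A."],
    "publication_year": 2024,
    "access_rights": "restricted",  # the data files are not openly available
    "description": "Placeholder description of the dataset.",
}

# Fields assumed (for this sketch) to be the minimum needed for discovery.
REQUIRED_FIELDS = (
    "identifier",
    "title",
    "creators",
    "publication_year",
    "access_rights",
)


def missing_fields(rec):
    """Return the required fields that are absent or empty in rec."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


def to_json(rec):
    """Serialise the record to JSON so that software can read it."""
    return json.dumps(rec, indent=2, sort_keys=True)
```

Calling `to_json(record)` produces a description that can be published and indexed even though the data files themselves remain under access control.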
It is important to think about making data FAIR from the outset of your research, as this may affect how you collect and document your data, the formats you store the data in, how you preserve and share the data, and how they are licensed for re-use.
Use a data repository
To be made open and FAIR, data should be deposited in a data repository. A data repository is a service that exists to preserve and provide access to research data. It is a future-proofed vehicle for ensuring that data remain accessible and usable over the long term.
Using a data repository is preferable to sharing data as supplementary files alongside a published article, or via cloud-based file storage and sharing services (such as Dropbox, or the Open Science Framework), or maintaining data in private storage and sharing on request only. None of these ways of sharing data is fully FAIR.
A data repository performs a number of specific FAIR functions:
- It actively preserves data for long-term viability, e.g. by replicating and validating data files and migrating them to preservation formats;
- It publishes machine-readable metadata to enable online discovery;
- It assigns persistent unique identifiers (e.g. DOIs) to datasets and makes them citable;
- It quality-controls datasets and enhances metadata, e.g. by applying standard vocabularies (not all repositories do this);
- It manages online access to data so that they can be used by other people.
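To give a sense of what validating data files can involve, the following minimal Python sketch compares a file's checksum with a value recorded at deposit. The function names `sha256_of` and `verify_fixity` are hypothetical, and the choice of SHA-256 is an assumption; repositories may use other algorithms and tools.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large data files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == expected_checksum.strip().lower()
```

A mismatch indicates that the file has changed since the checksum was recorded, for example through corruption in storage or an incomplete transfer.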
Examples of data repositories include: disciplinary data centres and their component databases, such as the NERC data centres and the databases of the European Bioinformatics Institute; institutional data repositories, such as the University of Reading Research Data Archive; and general-purpose data sharing services, such as Zenodo and figshare.
We provide guidance on finding a suitable repository for your data.