Where to Archive Your Data
Whether you want to preserve your data for re-use in future research, or archive them for the long term in order to meet your funder's requirements, you will need to know what data archive services are available to you.
The University provides a Research Data Archive for researchers and research students to preserve and share their data, and this should be used wherever possible, unless a more suitable service exists. More information about the Archive and how to deposit data is available on the Archive's web pages.
Other services may be more suitable for the following reasons:
- You are part of a research collaboration or partnership and it has been agreed that data will be archived in another service;
- Your funder or publisher requires you to deposit data in a specified data service;
- There is a dedicated data repository that is the preferred place of deposit for data in your field;
- You prefer to use a different service, which meets your requirements better and provides a suitable standard of data preservation and publication.
The choices available to you may depend on the field you work in, and on any relevant policies or requirements of sponsors, collaborators and publishers. They will also depend on whether the available services meet your requirements. Key considerations to bear in mind are these:
- Does the repository guarantee long-term preservation and integrity of the data?
- Is the repository trustworthy? Is it funded sustainably? Is it certified, for example to the Trustworthy Repositories Audit and Certification (TRAC) standard?
- Are suitable data licensing options offered?
- Does the repository publish online metadata to required standards?
- Are datasets assigned persistent identifiers such as DOIs?
- Can access controls be used to restrict access to sensitive data if this is necessary?
The University's Research Data Archive provides data preservation and publication to guaranteed standards and offers the features outlined above.
Other data archiving options, with some examples, are outlined below. For comprehensive worldwide listings of data centres and repositories, which can be searched by discipline, consult the re3data.org data repository registry. There is further information on preserving data in the MANTRA training programme.
Research Council data centres
Some Research Councils directly fund data centres and expect funded researchers to offer their data to these. For research that falls outside these data centres' remits, other preservation services must be used.
The main data centres are:
- Archaeology Data Service (AHRC);
- UK Data Service (ESRC);
- NERC data centres (British Atmospheric Data Centre, NERC Earth Observation Data Centre, and others).
Disciplinary data repositories
There are many data repositories that provide data preservation and sharing services to particular disciplines or for specific kinds of data, mostly in areas of the physical and life sciences. In many cases these are funded by various public funding bodies or research organisations. Some repositories may accept submissions free of charge; others may levy a charge to cover the costs of data curation and storage.
Examples include Dryad Digital Repository (general science data, mainly life sciences); the databases of the European Bioinformatics Institute (molecular biology); and the Cambridge Crystallographic Data Centre (crystal structures).
General data sharing services
There are some general online data sharing services, including figshare, Zenodo and Dataverse. These allow you to share data at no cost, and provide some basic services, such as licensing data under standard open licences (usually Creative Commons) and assigning Digital Object Identifiers (DOIs) to uploaded datasets, so that they are citable and trackable. However, their limitations mean they are not suitable for all datasets:
- There may be a file size limit (for Zenodo this is set at 2GB), and in practice these services are more suited to sharing of smaller datasets;
- These are essentially sharing services, not preservation repositories. There is no long-term guarantee that the data will remain intact and persistent, although all services offer contingency support in case they close;
- These are not suitable for sharing confidential data or data that require stringent access controls, e.g. acceptance of end-user licences, or access for authorised users only. Dataverse does, however, allow you to restrict access to authorised users, such as a research group, until data are ready for publication.
Journals and publishers
Some journals accept the submission of supplementary data, and will provide access to them alongside a journal article. They are likely to accept small datasets only, however. Many publishers do not accept data themselves, but ask for associated data to be deposited in a suitable data repository which can be linked to. Here are two examples:
- Nature Publishing Group does not host datasets but maintains a list of Recommended Data Repositories where data associated with papers published in its journals can be archived.
- PLOS journals allow smaller datasets to be submitted with articles as supplementary information files, but otherwise require all data and related metadata underlying the findings reported in a submitted manuscript to be deposited in an appropriate public repository. The PLOS Data Availability web page provides a list of established repositories, which are recognized and trusted within their respective communities.
Partner research organisations
If you are involved in a research partnership or consortium, partner research organisations may host or have access to institutional data archiving services.
Cloud storage services
Cloud services such as Dropbox and Google Drive are not suitable for long-term preservation of data. They are useful for sharing files with collaborators during your research, but the storage capacity is limited (unless you are willing to pay), information security is not robust enough for sensitive data, and file backup is short-term only. Some services may restrict users to a limited range of proprietary file formats. You cannot create discoverable metadata records for data with such services, or assign DOIs to datasets.
Personal storage solutions
You should not archive data using your University network drive, or personal storage solutions such as a PC hard drive or external storage device. Storing data by these means will not meet the University's or funders' requirements for long-term preservation and provision of access to data. Master versions of data should always be archived using an institutional service that preserves and enables access to data according to standard policies and procedures.
If you do wish to maintain your own personal archive copy of data, you will need to use suitable storage media and implement a preservation strategy to ensure the continuing integrity of the data. Here are some tips:
- Use the highest-quality storage media that are available to you: computer hard drives and external hard drives are lower-risk options than CDs and USB sticks;
- If the data you wish to store are confidential or sensitive, ensure the files or storage areas are encrypted;
- Maintain at least two copies of your data on two different forms of storage, for example hard drive and CD;
- Store the different copies of your data in different locations if possible, e.g. in your office and your home;
- Keep storage media in appropriate physical environments. Magnetic media such as computer hard drives can crash if overheated; optical media such as CDs and DVDs are vulnerable to damage from handling, temperature variations, excessive humidity, poor air quality and strong light;
- Check the integrity of your data files at regular intervals. A commonly used method is to create checksums: character strings computed by an algorithm from the bytes of individual files, which act like digital fingerprints. If a single byte in the file changes, so will the checksum. You can use them to check that backup copies are exact matches of originals, and to monitor the bit-integrity of files over time. Further information is available from the UK Data Service;
- Copy or migrate files to new media every two to five years, as both magnetic and optical media are subject to physical degradation.
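The checksum tip above can be illustrated with a short script. What follows is a minimal sketch in Python, not a recommended or University-endorsed tool. It assumes SHA-256 as the checksum algorithm (others exist), and the function names sha256_of, build_manifest and find_problems are hypothetical names chosen for this illustration. It records a checksum for every file in a folder of originals, then reports any file in a backup copy that is missing or no longer matches.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def find_problems(manifest: dict[str, str], copy_dir: Path) -> list[str]:
    """List manifest entries that are missing from, or differ in, copy_dir."""
    problems = []
    for relative_name, expected in manifest.items():
        candidate = copy_dir / relative_name
        if not candidate.is_file():
            problems.append(f"missing: {relative_name}")
        elif sha256_of(candidate) != expected:
            problems.append(f"changed: {relative_name}")
    return problems
```

In use, you might call build_manifest on the folder of originals when the archive copy is first made, keep the resulting list of checksums alongside the data, and run find_problems against each stored copy at regular intervals and after every migration to new media. An empty result indicates that every file recorded in the manifest is still present and unchanged in that copy.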