Where to archive data
Choosing the right place to archive your data is important, as this will determine to what extent they are Findable, Accessible, Interoperable and Re-usable (FAIR). In most cases, a digital data repository will be suitable, but different services may be appropriate depending on the type and nature of the data. High-volume and non-digital data will require other solutions.
As a rule, digital data should be archived to a suitable public data repository. A data repository will ensure your dataset is FAIR by undertaking the following functions:
- It actively preserves data for long-term viability, e.g. replicating and validating data files, migrating to preservation formats;
- It publishes machine-readable metadata to enable online discovery;
- It assigns persistent unique identifiers (e.g. DOIs) to datasets and makes them citable;
- It quality-controls datasets and enhances metadata, e.g. by applying standard vocabularies (not all repositories do this);
- It enables online access to data, so that they can be used by other people.
We provide guidance on choosing a suitable data repository.
Software code that supports published results (e.g. model code used to generate output data, or code written for purposes of statistical analysis) should be archived to a public data repository, so that it is preserved in the specific version relevant to the reported results, and can be cited by DOI. GitHub provides an easy-to-use function for archiving code files to the Zenodo digital repository. Code files can also be deposited in the University's Research Data Archive, or any other general-purpose repository.
Where it is desirable to release source code so that others can download and run it, and contribute to its ongoing development, it should also be made available as a public code repository. A code repository will provide version control, code review, bug tracking, documentation, user support and other features. The University provides a GitLab code repository service; other popular platforms are GitHub and Bitbucket.
Some data may not be suitable for public access, for example because they contain confidential information that cannot be easily removed (such as biometric data), or because redacting data to remove sensitive or confidential information would significantly diminish their value. This does not mean the data cannot be archived and made available to others.
Some repositories can manage sensitive data under a controlled access procedure. This may require a prospective data user to make an application to consult a specific dataset, which can be rejected or approved by the data owner or a nominated data steward. The requestor may also need to fulfil certain conditions to be granted access to the data, such as signing a confidentiality agreement. Access to personal data will also be subject to consent from the data subject, so this would need to be considered at the planning and recruitment stage of the research. See the University's guide to Data Protection and Research for more information.
If there is no suitable repository in which restricted-access data can be deposited, they should be held in a designated archive location on the University network, under the authority of and accessible by at least two members of staff. You should create a metadata record in the University's Research Data Archive describing the data and the means by which they can be accessed, so that others can find out about the data and request access to them. Requests for access to such data should be directed to a responsible person, who can respond to the requests following a procedure as outlined above.
If you need to archive a large volume of data, on the scale of hundreds of gigabytes (GB) or of terabytes (TB), practical and cost limitations may constrain your options. Most data repositories cannot effectively handle datasets of this size, although there are exceptions: the NERC CEDA data centre, for example, routinely manages datasets at the TB scale. If there is no suitable data repository for your data, you may have various options:
- Some research facilities, such as the Diamond Light Source, provide an archive facility for raw data collected on their instruments. In this case you would not need to archive the data yourself as this will be done as part of facility operational procedures.
- You may have access to University network storage infrastructure through your Department or research group. It may be possible to set aside part of this allocation as a designated archive area, although capacity in local network storage is likely to be limited. Storage provided through the DTS Research Data Storage service is charged for on an ongoing basis, so that service is unlikely to be a viable long-term option.
- Institutional cloud storage options offer free high-volume storage. Your University account provides staff users with 5 TB of OneDrive storage at no cost as standard, and Teams sites come with 25 TB of storage. While the allowances are generous, these cloud services are not designed as long-term storage solutions, and are not optimal for storage and use of high volumes of data. Upload and retrieval times may be slow, and large directories in Teams can cause problems. Data stored in OneDrive are held in personal accounts, so to ensure they remain easily accessible, the data should be retrievable by more than one person. For example, a copy of the data could be stored in another OneDrive account or on an external hard drive in an office accessible by members of the project team.
- External hard drives provide inexpensive storage solutions, although you would need to ensure you back up the data following the 3-2-1 rule: 3 copies, on 2 different media, one copy in a separate location. The data would also need to be accessible by at least two people. Data would need to be migrated to new media periodically, e.g. every five years.
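The 3-2-1 rule mentioned in the last option can be checked mechanically once you have written down where each copy lives. The following is a minimal sketch only: the `StoredCopy` class, the `satisfies_3_2_1` function and the medium and location labels are hypothetical names chosen for illustration, not part of any University service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    """One stored copy of an archived dataset (fields are illustrative)."""
    medium: str    # e.g. "external-hdd", "network-storage", "cloud"
    location: str  # e.g. "office", "datacentre", "home"


def satisfies_3_2_1(copies: list[StoredCopy]) -> bool:
    """Return True if the copies meet the 3-2-1 rule.

    The rule requires at least 3 copies, on at least 2 different media,
    with at least one copy held in a separate location from the others
    (i.e. at least 2 distinct locations overall).
    """
    media = {c.medium for c in copies}
    locations = {c.location for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and len(locations) >= 2


# Example inventory: two hard drives in one office plus one network copy.
copies = [
    StoredCopy("external-hdd", "office"),
    StoredCopy("external-hdd", "office"),
    StoredCopy("network-storage", "datacentre"),
]
print(satisfies_3_2_1(copies))
```

A check like this only confirms that the inventory is adequate on paper; it does not confirm that each copy is still readable, which needs periodic inspection of the media themselves.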
If data are stored under individual management, in addition to ensuring they are accessible to/retrievable by at least two people, some handover policy would be required, so that if the data owner or primary data steward left the University, the data would continue to be accessible, either at the University or elsewhere.
Data that are archived by any of the above means should be write-protected, so that once they are stored in their definitive form, they cannot be further modified. It is advisable to have a designated steward for archived data within a research group or department, who maintains a register of archived datasets, their locations, and responsible owners.
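One way to approximate write-protection on ordinary file storage is to remove write permission from each file and record a checksum at the same time, so that any later change can be detected. The sketch below assumes a POSIX-style file system and uses only the Python standard library; `sha256_of`, `write_protect` and `freeze_archive` are hypothetical helper names. File permissions guard against accidental modification rather than deliberate tampering, because the file owner can restore write access.

```python
import hashlib
import stat
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_protect(path: Path) -> None:
    """Remove write permission for owner, group and others."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def freeze_archive(root: Path) -> dict[str, str]:
    """Checksum every file under root, then make each file read-only.

    Returns a manifest mapping each relative path to its SHA-256 digest.
    """
    manifest: dict[str, str] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        manifest[path.relative_to(root).as_posix()] = sha256_of(path)
        write_protect(path)
    return manifest
```

The returned manifest could be kept alongside the steward's register; recomputing the digests later and comparing them with the manifest shows whether any file has changed since it was archived.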
In each case described above, you should create a metadata record in the University's Research Data Archive describing the data and the means by which they can be accessed, so that others can find out about the data and request access to them. If a request to access the data is received, this can be granted by inviting the requester to view the data on site (if this is feasible), or by arranging (at their expense) to send a hard drive containing a copy of the relevant data.
Non-digital (offline) data
Non-digital data should be stored in a secure School or Department archive or office. At least one colleague (for example, a departmental administrator) should have a record of where the data are and how they can be accessed. Ideally this information would be held in an information asset register maintained by the School or Department; if no such register exists you may wish to establish one, even if only with your research group.
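If no register exists, a simple spreadsheet or CSV file is enough to start one. The sketch below shows one possible layout; the `RegisterEntry` fields and the `write_register` function are assumptions made for illustration, not a University template, and the example values are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RegisterEntry:
    """One row of a hypothetical information asset register."""
    dataset_title: str
    location: str       # where the data are physically held
    owner: str          # person with authority over the data
    steward: str        # second person who knows how to reach the data
    access_notes: str   # how the data can be accessed


def write_register(entries: list[RegisterEntry], stream) -> None:
    """Write the register as CSV, with a header row, to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(RegisterEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Placeholder entry; in practice the stream would be a file opened with newline="".
buffer = io.StringIO()
write_register(
    [RegisterEntry("Example field notebooks", "Room X, cabinet Y", "Owner name",
                   "Steward name", "Contact the owner to arrange a visit")],
    buffer,
)
print(buffer.getvalue())
```

Recording both an owner and a steward in each row reflects the guidance that at least one colleague should know where the data are and how to access them.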
You should create a metadata record in the University's Research Data Archive describing the data and the means by which they can be accessed, so that others can find out about the data and request access to them.
If data are stored under your authority and you leave the University, you must either hand the data over to a colleague (if they remain at the University) or provide a forwarding contact (if you take them with you).