LibGuides: Research data management: Research Data Archive

The Research Data Archive

The University of Reading Research Data Archive is an institutional repository for the preservation and sharing of research data produced or collected at the University of Reading. We accept open and restricted datasets.

University research staff and research students can deposit data in the Archive. Up to 20 GB of data per project can be deposited at no charge. Deposits greater than 20 GB may be subject to a charge and must be discussed with us before the deposit is made.

The Research Data Archive will enable you to comply with the University's Research Data Management Policy and any relevant funder policy for the long-term preservation/sharing of research data. By depositing data in the Archive you can:

create and publish an online metadata record for a digital and/or non-digital dataset;
generate a unique permanent Digital Object Identifier (DOI) for your dataset, so that it can be cited and linked to;
deposit digital data files and related documentation for long-term preservation and access;
license your data and use access settings to control how your data files can be accessed and used.

Data offered or submitted to the Archive will be assessed for eligibility in accordance with the Archive's Collection Policy.

If you are depositing for the first time or require support, you are welcome to book a deposit consultation with us.

We strongly recommend you consult our guide to preparing data for sharing before you deposit data with us. There are a number of key things you will need to consider and address as part of preparing a dataset for deposit, including identifying creators and rights-holders, determining licensing requirements, seeking permission to share the data where necessary, reviewing and including consent documentation where data have been collected from research participants, and preparing your data files and supporting documentation. If you do not address these things before you deposit your data, it may delay the data publication process.

We also provide guidance on Recommended File Formats (PDF) and a README template (txt) you can use to prepare your dataset documentation.

If you wish to book a deposit consultation, please contact us using the booking link on this page. We can advise you on preparing your data and documentation files for deposit, and help you address any relevant considerations, including intellectual property rights in the data, issues of consent and confidentiality, and how the data will be licensed.

Deposits of restricted datasets must be discussed with us prior to deposit. See the Restricted data tab for more information.

When you have prepared your data and supporting documentation and you are ready to deposit, you can follow our step-by-step Data deposit instructions (PDF) to submit your data. We also provide a short video walkthrough of the process on this page.

Members of staff and research students can log in to the Research Data Archive using their University username and password.

The Research Data Archive can hold and manage controlled access to restricted datasets. These are datasets that meet the definition of Restricted or Highly Restricted information in the University's Classification Policy. They may encompass datasets containing identifiable information or information that is confidential for other reasons, but which, with the consent of the data subjects/owners, may be shared with legitimate researchers on a controlled basis.

If you wish to deposit a restricted dataset we will ask you to create a metadata record for the dataset according to the standard deposit procedure, to upload any public documentation files, and to transfer the restricted data files to us by secure means. We will hold the restricted dataset in secure storage on the University network, and publish the metadata record in the Research Data Archive, so that the dataset can be cited and interested researchers may request access. There is an example of a record for a restricted dataset in the Research Data Archive here.

Access to a restricted dataset for research purposes can be requested by any researcher affiliated to a research organisation. The request will be reviewed by a Data Access Committee including the PI of the original study. If the request is authorised, we will arrange for the recipient organisation to sign a Data Access Agreement (see an example in this dataset) and transfer the data for use by the authorised user(s).

We only accept restricted datasets where there is a documented ethical and legal basis for retaining the data and making them accessible to others. Participants or owners of data must have consented to the data being preserved and made available to other researchers under safeguards. Providing you are transparent about who the data may be shared with, and for what purpose, and as long as appropriate safeguards are in place, it is possible to preserve data and make them accessible in compliance with legal and ethical obligations.

If you wish to deposit a restricted dataset, please first read the Restricted dataset deposit instructions (PDF), and then contact us to discuss your deposit.

Preparing data for sharing

You should put as much care into preparing a dataset as you would any other research output. A deposit in a data repository can be delayed and in some cases rejected if, for example, you have not prepared and documented your data to an appropriate standard, or correctly identified intellectual property rights in a dataset and obtained relevant permissions, or established an ethical basis for sharing of data collected from participants, or anonymised a dataset where this is required.

This guide takes you through the main things to consider and address before you deposit a dataset in a data repository. It will help you to address critical requirements and produce a good quality, appropriately documented dataset.

For a more detailed version of this guide, download Preparing for data sharing (PDF).

It is important to define your dataset and identify its contents, as this will also determine what preparation is necessary. Refer to What data should you share? for more guidance on defining your dataset.

Check your preferred repository's guidance on depositing data and note any requirements it may have. Repositories may have content and metadata requirements for certain types of data, require submission of data in specific formats, and place limitations on the volume of data that can be deposited. Some repositories may also charge for deposit of data (although most do not). If you have not identified the repository in which you will deposit data, refer to our guidance on choosing a data repository.

If data have been collected from research participants, check that you have documented consent for data sharing. It is acceptable to disclose data obtained from human subjects without consent if the data have been fully anonymised, but it is good practice to inform participants of your intention to do this. It is not acceptable to disclose even anonymised data if in your consent procedure you stated that the data would not be disclosed, or would be destroyed at a given time. Identifiable data can be disclosed under a controlled access procedure, providing that participants have consented to participate in the study on the understanding that data would be shared in this way. The University provides a sample consent form including statements suitable for open data sharing and sharing of data subject to safeguards.

If you are depositing data collected from participants in the Research Data Archive, you will be required to submit your participant information sheet(s) and unsigned copies of any consent form(s) used alongside your data files, so that we can confirm you have a basis for data sharing. These documents will be stored alongside the dataset for administrative purposes. Access to them will be restricted, meaning they will not form part of the dataset available for users to download.

It is important to understand who is a creator of your dataset – as well as who is not – because intellectual property rights and permission to distribute the data will be associated with its creators. Creators of datasets also have the moral right to be identified as such. Datasets may be the work of many hands, and it is not always easy to clearly distinguish their creators from other people who contributed to the work of the project.

According to the Copyright, Designs and Patents Act 1988 it is ‘the selection or arrangement of the contents of the database’ that constitutes the creative act which attracts copyright. Creators are those who have had a direct creative role in the selection and arrangement of data in the dataset. This is not the same as being involved in the design of the research or in the original data collection. In most cases, a project PI or student supervisor will not be a creator of the dataset, unless they had a direct authorial hand in its creation. Technicians, contractors and others involved in the collection of data are not usually creators of a dataset, unless they had creative input into the selection and arrangement of the data points.

Anyone who does not meet the definition of a Creator but has contributed to the production of the dataset can still be acknowledged for their contribution in the dataset documentation. The Research Data Archive includes a Contributors field in its metadata schema.

You must clearly identify rights-holders, because your authorisation to deposit the dataset depends on their permission. By depositing data you are also distributing them, and doing this without the authorisation of the rights-holder will be a breach of copyright.

Owners of intellectual property rights (IPR) in the data will be associated with the creators of the dataset.

In general, an employer will own IPR created by its employees: the University is ordinarily the rights-holder in IP created by members of staff. Research contracts generally allow ownership of ‘arising IP’ (i.e. created under the contract) to reside with the originating institution.

Students registered with the University own the IP they create by default, but this may not be the case if they are funded under a third-party sponsorship agreement (excluding public funders such as Research Councils, which do not assign student IP to other parties), or if they have assigned their IP to the University. A sponsorship agreement will include Intellectual Property clauses stating which party has ownership of arising IP. Ownership of IP created by a student at another institution will be subject to that institution's IP policy and any relevant agreements.

If a dataset has multiple creators, it may also have multiple rights-holders, which may include the University, students in their own right, and collaborating and partner organisations. There is more guidance on IPR in primary data/software in the Managing data section.

You may need to investigate any applicable research contracts or studentship agreements to establish what parties hold rights in a dataset. Students and/or their supervisors should have copies of any contracts relating to their research programmes. If you need to locate a copy of a contract, contact your Contracts Manager.

Where datasets incorporate secondary data, the owners of these data will also have the rights to determine how and on what terms their data are distributed by you.

IP should always be published under a licence, so that ownership of the IP and terms of use are clear to others. In accordance with the University's Research Data Management Policy you are expected to share data under an open licence wherever possible. The most widely used open data licences are the Creative Commons Attribution (CC BY) licence, which permits re-use of the data provided proper attribution is made, and the Creative Commons Zero Public Domain Dedication (CC0), a waiver of all rights in the work.

In order to license the data you must be the data owner or authorised to assign a licence on behalf of the data owner, so the choice of licence may be subject to the permission of other parties. For example: a third-party co-creator with commercial interests may request the application of a non-commercial licence; if the dataset incorporates third-party materials these may be made available with the third party's permission on an ‘All Rights Reserved’ basis.

Data held under a controlled access policy (such as UK Data Service safeguarded data and restricted datasets in the Research Data Archive) will be made available under special licence terms. The Data Access Agreement for restricted datasets deposited in the Research Data Archive allows data to be used, subject to authorisation, in confidence for non-commercial research and learning purposes only. The Agreement will be made between the University and the organisation to which the authorised user is affiliated.

As a general rule we recommend you use the Creative Commons Attribution licence for open data, and this is the default applied to uploaded files in the Research Data Archive. More restrictive licences should only be used if there is a justification for doing so, for example, to protect commercial or other confidential interests.

We provide guidance on licences and licensing. Guidance on licence options for software can be found in our Guide to publishing research software.

You must ensure that you have permission to archive and distribute the dataset from: the creators; the rights-holders; parties with contractual rights regarding publication of research outputs; secondary data owners.

Creators

Creators of datasets have the moral right in copyright law to be identified as such. Individuals also have the moral right not to have a work falsely attributed to them as an author. You must therefore ensure that the dataset is archived with the knowledge and permission of its creators.

Rights-holders

Where the employer is a University or publicly-funded research organisation, permission to publish the data can be inferred from their policy position on research data, which is, certainly in the case of universities, to promote the public sharing of data supporting research outputs wherever possible. Other parties, including students, industrial studentship sponsors and commercial research partners, will need to give written consent to publication of the dataset.

Parties to contracts

Research and studentship contracts have Publication clauses, which generally grant other parties the right to be notified of and have the opportunity to approve or delay any intended publication. This right exists irrespective of who owns the IP created under the contract. The standard notice period is 30 days. Persons to whom notices should be sent will be identified in the agreement (usually towards the end).

Secondary data owners

If your dataset incorporates IP from existing sources, you may need to seek permission to distribute the dataset. If data have been obtained from a public resource such as a website or a data repository, you should check the source for any terms of use or licence information. If you have incorporated government or research data, these may well have been made available under open licences that permit redistribution, providing acknowledgement of the source is given. If you cannot find any information in the published source, or the data have been obtained from a non-public source, you may need to contact the data owner directly. We provide guidance on using secondary data.

Seeking permission

Permission should be requested in writing. Email is acceptable. Research contracts and sponsorship agreements will nominate a contact for each party, to whom any notices under the contract can be directed. In the case of studentship agreements, notices would usually be sent to the student's supervisors at the University and the sponsor organisation.

When contacting other parties for permission to archive and distribute data, it is important to identify the data unambiguously, and to be clear how the data will be made available, and on what terms they will be licensed for use. While you should always seek to license the dataset on the most open terms, other parties may legitimately require more restrictive licensing. For example, a commercial partner may not be willing to distribute a dataset under terms that permit re-use for commercial purposes.

Depositing data in a repository is not simply a matter of transferring the files from your active storage location into the repository. Your data will need to be tidied up, put into order, and documented. When forming the dataset, consider the following:

Identify all the files that will compose it. These might include: raw data files (in the initial collection format); processed data files (e.g. cleaned data; raw data saved to another format; statistical analyses and visualisations); programming code (e.g. analysis scripts); documentation.
Ensure the data are stored in suitable formats for preservation, for example by saving tabular data in an open format such as CSV. You may need to check file format requirements specified by your chosen repository. Guidance is provided on suitable file formats for preservation in the Research Data Archive.
Make sure your data files are well-formed and readable. Poorly-presented data are harder to read, more likely to contain errors, and will inspire less trust. Check the data for errors. Apply consistent style and formatting, and spellcheck your text. Ensure relevant information is clearly presented in data files, e.g. variable names and units of measurement, missing value codes, etc. Present actual values; avoid encoded content, such as formulae in spreadsheets and colour formatting.
Redact data as necessary. Data collected from research participants may need to be anonymised. There is guidance on anonymisation provided by the UK Data Service. Other kinds of information may also need to be removed or obscured, such as commercially-confidential information. Link-coded data, where data records are identified by a unique code which is linked to identifiable participant information held in a separate table, are in data protection law still personal data. For a dataset to be anonymised, and suitable for sharing as open data, you will need to remove any means of linking data records to identifiable participants, e.g. by destroying all documented records of the link, or by replacing linked IDs in the dataset with unlinked IDs.
If the dataset is composed of multiple files, make sure they are organised in a logical fashion. You can upload zip files to some repositories, including the Research Data Archive, which would allow you to organise files within a folder structure.
Use appropriate and consistent file names, which are descriptive of the file contents, formatted without spaces or special characters, and not longer than 32 characters. We provide guidance on file naming.
Check the size of the dataset and make sure it does not exceed any size limitations specified by your chosen data repository. The Research Data Archive allows the deposit of datasets up to 20 GB free of charge and recommends that individual files be no larger than 4 GB. If you have a large dataset and/or a large number of files, it may be easier for both you and prospective users of the data to use an archive format to package/compress the files. Zip and tar.gz are good choices, as they provide lossless compression.
You could ask a colleague to review your dataset. A pair of eyes unfamiliar with the data may spot mistakes and things you have overlooked. Remember that the people reading your data will not have your experience of the research context.
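Some of the file name and size checks above can be automated before you deposit. The following Python sketch is illustrative only and is not a tool provided by the Research Data Service. It assumes that "special characters" means anything other than letters, digits, hyphens, underscores and full stops, that the 32-character limit applies to the whole file name including its extension, and that 1 GB is 10^9 bytes. The function names check_name and check_dataset are hypothetical.

```python
import re
from pathlib import Path

# Assumption: "special characters" means anything outside letters, digits,
# hyphen, underscore and full stop.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

MAX_NAME_LENGTH = 32          # file name limit from the guidance above
MAX_FILE_BYTES = 4 * 10**9    # recommended per-file limit (assuming 1 GB = 10^9 bytes)
MAX_TOTAL_BYTES = 20 * 10**9  # free-of-charge deposit limit (same assumption)


def check_name(name: str) -> list[str]:
    """Return a list of problems found in a single file name."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name longer than {MAX_NAME_LENGTH} characters")
    if not SAFE_NAME.fullmatch(name):
        problems.append("name contains spaces or special characters")
    return problems


def check_dataset(folder: str) -> list[str]:
    """Check every file under `folder` and return human-readable messages."""
    report = []
    total = 0
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        for problem in check_name(path.name):
            report.append(f"{path.name}: {problem}")
        if size > MAX_FILE_BYTES:
            report.append(f"{path.name}: larger than 4 GB")
    if total > MAX_TOTAL_BYTES:
        report.append("dataset larger than 20 GB in total")
    return report
```

Calling check_dataset on the folder that holds your dataset returns a list of messages, which is empty if no problems were found. A check like this does not replace reading through your files or asking a colleague to review them; it only catches mechanical issues.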

Every dataset should have at least a basic manual or user guide. This should include the following:

citation metadata for the dataset (creators, title, publication year);
identification of the rights-holder(s) with licence statements;
a brief description of the dataset. This might include summary information about what and how much data were collected, the research context in which they were collected, the purpose for which they were collected, and the instruments and methods used;
information about the project in which data were collected, with any external funding details;
a description of the contents of the dataset, e.g. as a file listing;
key interpretative information, e.g. a full definition of variables and units used, such as a codebook or data dictionary;
details of the methods and instruments used to collect, process and analyse the data, and relevant supporting information, such as analysis scripts;
references to any secondary data sources used;
references to related publications. If a publication is in process, as much information as possible should be provided to enable identification of the published item, e.g. authors, provisional title, journal (if known), year and status (in preparation/under review, in press).

For deposits in the University's Research Data Archive, a README template (txt) is provided, which can be used to record basic documentation. Documentation can be saved in PDF, Word or another text format as preferred.
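If your dataset contains many files, the file listing suggested above can be generated rather than typed by hand. The Python sketch below is one possible approach, not part of the README template. The function names file_listing and format_size are hypothetical, the tab-separated layout is an arbitrary choice, and sizes are reported in decimal units (1 kB = 1000 bytes) as an assumption.

```python
from pathlib import Path


def format_size(num_bytes: int) -> str:
    """Return a human-readable size using decimal units (1 kB = 1000 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        # Stop at the first unit that fits, or at GB as the largest unit used here.
        if size < 1000 or unit == "GB":
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1000


def file_listing(folder: str) -> str:
    """List every file under `folder` with its relative path and size."""
    root = Path(folder)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            lines.append(f"{relative}\t{format_size(path.stat().st_size)}")
    return "\n".join(lines)
```

The returned text can be pasted into your documentation file and then annotated with a short description of what each file contains, which the script cannot supply for you.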

How to deposit in the Archive

Click here to access the Research Data Archive

Contact the Research Data Service

researchdata@reading.ac.uk

0118 378 4141
