Data that support research findings should be preserved, and made accessible wherever possible, by deposit in a suitable data repository no later than the publication of the related research findings or, for research students, the point at which the awarded thesis becomes available in CentAUR. Supporting datasets should be referenced from related publications.
Data should be as open as possible, as closed as necessary. Almost all data can be made accessible to others outside the original project on some basis. While open data should be the presumed default in the absence of any reason to restrict access, there may be valid commercial, legal and ethical reasons why access to some data needs to be restricted. There are controlled-access repositories that can manage such datasets.
The data you preserve and make accessible to others are part of the legacy of the research, and in many cases will be necessary to validate the findings you place on the public record. It is important, therefore, that the data are of good quality, preserved according to appropriate standards, and are made accessible and re-usable.
The basis of effective open data sharing is described by the FAIR Data Principles, according to which data should be Findable, Accessible, Interoperable and Re-usable. In most cases these principles can be met by depositing data in a data repository. Various data repository options may be available to you, including disciplinary data centres and data type-specific databases, the University's Research Data Archive, and general-purpose data sharing services.
The data deposit process should begin towards the end of the project, as results are being finalised and publications prepared. PhD students should deposit data before they submit their thesis for examination, so that the dataset can be cited from the thesis. Depositing data in a data repository requires preparation, and time should be allowed for this.
It is important to reference and link to datasets from related publications, by means of what is called a data access statement or data availability statement. This is required by funders, and also now by many publishers.
'Open means anyone can freely access, use, modify, and share for any purpose' (The Open Definition)
Data should be as 'open as possible, as closed as necessary'. This is the expectation of this University's Research Data Management Policy, and it reflects widely-accepted standards of research transparency. Funders and publishers express similar expectations in their data sharing policies.
Making available is not the same as making open. Open content is content that has been licensed by or on behalf of the rights-holder as free to use, modify and share for any purpose. Research data should always be licensed when they are made available, whether the licence used is an open licence or one defining more restricted permissions. See the Licences and Licensing data tabs for more information.
A large proportion of research data is suitable for being made openly available. This includes much data collected from participants, provided they have been anonymised.
Some data may not be suitable for making openly available. This may be the case for example where data have been collected from participants and there is a higher risk of identification and/or harm, for one or more of the following reasons:
This does not mean data cannot be made available. Data can be made available under restricted licences and using controlled-access repositories. For example, the UK Data Service ReShare repository has a 'safeguarded' option for higher-risk anonymised data, and the University's Research Data Archive offers a restricted dataset option for very high-risk data and data containing identifiable/confidential information. Data managed on such terms would only be made available in confidence to authorised researchers under a data access agreement. See the tab on Controlled-access repositories for more information.
Data sharing should be thought of as a formal process akin to publication. This entails a number of requirements:

These usability conditions are expressed in the FAIR Data Principles, according to which data must be Findable, Accessible, Interoperable and Re-usable. These Principles have been widely adopted as standards for management of data, development of infrastructure and delivery of services. They put special emphasis on the ability of machines to automatically find and use data and/or related metadata, in addition to supporting re-use by individuals.
Open data, to be open to the fullest extent, must also be FAIR. But FAIR data do not have to be open: restricted-access data may be FAIR, provided the metadata describing them are openly accessible.
It is important to think about making data FAIR from the outset of your research, as this may affect how you collect and document your data, the formats you store the data in, how you preserve and share the data, and how they are licensed for re-use.
To be made FAIR, data should be deposited in a data repository. This is a service that exists to preserve and provide access to research data. It is a future-proofed vehicle for ensuring that data remain accessible and usable over the long-term.
Using a data repository is preferable to sharing data as supplementary files alongside a published article, or via cloud-based file storage and sharing services (such as the Open Science Framework), or maintaining data in private storage and sharing on request only. None of these ways of sharing data is fully FAIR.
A data repository performs a number of specific FAIR functions:
You are unlikely to need to preserve and share all the data you collect or create in the course of your research. You will therefore need to select data of value, and dispose of the remainder. Bear in mind the following considerations when selecting data for preservation and sharing.
What data will be required to validate the research findings that are placed on the public record, i.e. through publication in a research article or inclusion in a PhD thesis? Test data, results of failed experiments, and data from faulty instruments need not be included. Data at intermediate stages of processing will often be unnecessary, providing the raw data are preserved and you have documented any processing used to generate the final results. It may also be useful to preserve your data in a final processed format, especially if the effort required to reproduce them would be considerable. Bear in mind that code files used to generate, process and analyse data may form part of the material required to validate results.
The data you share should be your raw data, or as near as possible, at the individual record level (with appropriate anonymisation if required) and in an appropriate format for use and analysis. It is not enough to share only summary or aggregate values without the raw source data, as the results of processing alone are not sufficient to enable others to validate or reproduce your results.
There are practical limits to the preservability and shareability of some data. Some research may generate large volumes of data, at the scale of hundreds of gigabytes (GB) or several terabytes (TB). Examples of such research might include large-scale high-resolution imaging and video recording, and computer simulations of complex systems, where raw output can run to terabytes. Many data repositories will not have the capacity to handle very large datasets. Storage, preservation and transfer of data at these scales present both technical and financial challenges, to the extent that the cost of meaningful preservation and sharing of such data outputs may exceed any possible benefit. In the case of computer simulations in particular, it may be less important to preserve individual outputs than the model code and input parameters, by means of which a set of results can be reproduced.
Even where it is not desirable or possible to deposit high-volume data outputs in a data repository, you may still wish to retain them, for your own ongoing use and/or in order to be able to share them with others on request. The University can offer options for such use cases. See the tab on Where to deposit data.
Are there any legal/ethical/contractual restrictions on what data can be shared? In many cases, this is unlikely to mean that data cannot be shared at all. Data may need to be redacted, e.g. to remove confidential or commercially-privileged information, or access to them may need to be restricted in some way.
As a general rule, you would be expected to preserve anonymised data only. For example, you may preserve anonymised transcripts, but dispose of original interview audio recordings; you may preserve anonymised quantitative data from an observation study, but would not record data by means of which individual participants might be identified.
Where confidential information or personal data cannot be removed from data (as may be the case with biometric data, for example), or where the risk of causing harm or distress by disclosure is significant, it may be possible to preserve data on a controlled-access basis. Some data repositories, e.g. the UK Data Service ReShare repository and the European Genome-phenome Archive, can manage controlled access to sensitive/confidential data. The University's Research Data Archive can also offer a restricted access option. See the Research Data Archive section for more information.
Computer code written to generate, process, analyse and validate research data is part of the data produced by the research, and falls within the scope of the University's Research Data Management Policy. Principles of data management should be applied to code, and code written in support of research findings should be preserved and shared wherever possible. Our Publishing research software guide (PDF) provides guidance on best practice in software code sharing.
An online code repository is often used to manage and publish code. A code repository provides various management features, including version control, code review, bug tracking, documentation, and user support, and allows the user to publish code releases. The University provides a GitLab code repository service; other popular platforms are GitHub and Bitbucket. A code repository is a good solution for managing code that is under ongoing development or for building a community of developers and users. But code repository platforms do not guarantee long-term preservation of the code or issue DOIs, and links to code repositories are not version-specific.
Any version of code that supports published results (e.g. model code used to generate output data, or code written for purposes of statistical analysis) should be archived in a public data repository, so that the version relevant to the reported results is preserved and can be cited by DOI from the related publication. Small scripts specific to a dataset can be archived in a data repository alongside the data. Code that may exist as an output in its own right, e.g. model code, may be better archived as a standalone item. GitHub provides an easy-to-use function for archiving code files to the Zenodo digital repository. Code files can also be deposited in the University's Research Data Archive, or any other general-purpose repository.
A licence is an official authorisation to make use of specified material. As well as telling users what they are and are not allowed to do with the material, a licence also provides protection to the creators and owners of intellectual property. An accompanying rights statement asserts legal ownership of the licensed item and the right of its creator(s) to be recognised as such. The attribution condition that is common to many open licences is the legal basis of your right to be credited as the creator of the licensed material. Many licences also include formal disclaimers of liability for any harm or damage that may arise from someone else's use of the material.
An open licence makes an item free to access, use, modify and share by anyone for any purpose. Examples of open licences include:
The Creative Commons licence suite includes versions with Non-Commercial and No-Derivatives terms. These and any licences with similar terms are less open licences, because of the restrictions they place on re-use. But if material cannot be made available under a more open licence, it is still wise to publish under a standard licence. The Creative Commons Attribution-NonCommercial (CC BY-NC) licence still grants broad permission for use in research and teaching and other non-commercial activities.
The Open Definition provides a list of conformant open licences for creative works (including publications and datasets). The Open Source Initiative lists Open Source licences for software.
The University does not prescribe use of any particular open licences for data or software, as the most appropriate licence will depend on the nature of the material and related requirements.
Creative Commons Attribution (CC BY) is widely used for the licensing of datasets (as well as Open Access publications and other materials), and is a good choice that will suit most requirements. It is the default licence recommended by the University's Research Data Archive. Other licences may be used or preferred by some repositories. For example, by default NERC data centres release primary data from NERC-funded research under the Open Government Licence; the Dryad Digital Repository releases data only under the Creative Commons Zero Public Domain Dedication.
More customised and restrictive licences may be used where data have been deposited in a controlled-access repository. Examples include:
When making data available to others outside the research team, you should observe two rules:
Data should be made available under an open licence, unless there is good reason to license them on a more restrictive basis, for example, to prohibit commercial re-use of data in which a commercial partner has an interest.
A licence to make use of intellectual property is issued by or on behalf of the intellectual property rights-holders. The first thing therefore is to establish who owns the intellectual property, and your right or authorisation to issue the licence. The IPR in primary data tab of the Managing data section provides guidance on identifying the rights-holders in data or software. Rights-holders are typically the University (for IP created by University employees), students (in the absence of any contract or assignment agreement indicating otherwise), or third parties involved in research, such as commercial partners, collaborator organisations, or studentship sponsors.
If the material has been created by multiple authors, or multiple parties have interests in it, you should ensure that any proposed release under a specific licence is agreed by all concerned beforehand, because a licence cannot be revoked once it has been applied to material. Where ownership of research data resides with the University, researchers are authorised under the Research Data Management Policy to make data and source code available under an open licence, providing no commercial, legal or ethical restrictions apply.
To license material, you should clearly mark it with both a rights statement and a licence statement. These combined statements make clear to any prospective user who is the owner of intellectual property rights in the licensed material, and the terms on which the material can be used.
The rights and licence statements should be included in the public information recorded about the material (such as a metadata record in a data repository, or the landing page of a software code repository), as well as in the material itself and/or its primary documentation (such as a readme file or user manual). You do not necessarily have to mark all individual files with these statements, providing item-level statements are clearly visible. Licence statements should include the URL to the full legal code of the licence used (the URL can be embedded in text or a licence logo image).
Most data repositories will include rights and licence statements in the metadata record for an item. A repository will usually enable you to specify rights and licence information when you deposit the dataset. The University's Research Data Archive provides a licence picker tool for uploaded files, with various standard licences and the option to upload your own licence. The licence information displays both in the file metadata and on the item record.
It is important that the rights statement identify all owners of intellectual property in the material. For example, the rights statement for a dataset created by a member of University staff jointly with student John Smith must identify the University and John Smith as rights-holders (assuming the student has not assigned his IP to any other party under contract).
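As an illustration of these requirements, the following Python sketch assembles a combined rights and licence statement that names every rights-holder and includes the URL of the licence's full legal code. The function name `rights_and_licence_statement`, the "Copyright YEAR HOLDERS." wording and the small table of licence URLs are assumptions made for this example, not a University template; check the wording your chosen repository expects.

```python
# Hypothetical helper: the statement wording below is an illustrative
# assumption, not an official University or repository template.

# URLs of the full legal code for two Creative Commons licences
# mentioned in this guide.
LICENCE_URLS = {
    "CC BY 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC BY-NC 4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
}


def rights_and_licence_statement(rights_holders, year, licence):
    """Return a rights statement naming every rights-holder, followed by
    a licence statement that includes the URL of the licence's legal code."""
    if not rights_holders:
        raise ValueError("At least one rights-holder must be identified")
    if licence not in LICENCE_URLS:
        raise ValueError(f"No legal code URL recorded for {licence!r}")
    # Join holders as "A", "A and B", or "A, B and C".
    if len(rights_holders) == 1:
        holders = rights_holders[0]
    else:
        holders = ", ".join(rights_holders[:-1]) + " and " + rights_holders[-1]
    rights = f"Copyright {year} {holders}."
    licence_text = (
        f"This dataset is licensed under {licence} ({LICENCE_URLS[licence]})."
    )
    return f"{rights} {licence_text}"
```

For the staff/student example above, `rights_and_licence_statement(["University of Reading", "John Smith"], 2024, "CC BY 4.0")` would return "Copyright 2024 University of Reading and John Smith. This dataset is licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)." Refusing an empty list of holders reflects the rule that all owners must be identified.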
Examples of combined rights and open licence statements are:
In most cases short scripts and segments of code written to perform standard operations, e.g. for purposes of data processing, statistical analysis or data visualisation, can be archived alongside data, under the same licence as the dataset (for example, a Creative Commons Attribution licence). This is best suited for situations where the code is likely to have little independent use value, and any re-use is likely to be solely for the purpose of validating results, e.g. by re-running analyses described in a paper.
Where re-use of source code in new contexts or further development is anticipated, for example if substantial original software has been developed, or source code has been written in the context of an ongoing project or established community, it will be appropriate to release the code under an Open Source licence. Note that where existing code has been modified, any licence for the modified code must be in accordance with the licence terms of the original code.
There are a number of popular Open Source licences for software, which are listed by the Open Source Initiative, and there is a useful licence picker tool at choosealicense.com. Another useful resource, tl;drLegal, provides plain English summaries of many Open Source licences. For detailed guidance on software licensing, consult our guide to Publishing research software (PDF).
A data repository is a service that exists to preserve and provide access to research data. It is a future-proofed vehicle for ensuring that data remain accessible and usable over the long-term. It should always be used in preference to sharing data as supplementary files alongside a published article, or via cloud-based file storage and sharing services (such as the Open Science Framework), or maintaining data in private storage and sharing on request only. None of these ways of sharing data is fully FAIR.
A data repository performs a number of specific functions to make research data Findable, Accessible, Interoperable and Re-usable:

The University does not prescribe the use of specific repositories, and there may be a variety of options open to you. As a general rule we recommend your first choice should be a relevant domain repository where there is one available; alternatively, you can in most cases use the University's Research Data Archive; as a third choice, general-purpose data sharing services may be used.
Most repositories are free to use. Where there is an archiving charge for a data repository, this can usually be recovered from grant funding.
Data should be deposited in a data repository specific to your research discipline or the data type, where one is available. These are community services and provide subject-specialist curation. They include repositories recommended by various funders and publishers. Some have the capacity to accept large volumes of data.
These are some examples of recommended repositories. They are free to use except where otherwise specified.
Many publishers recommend discipline-specific repositories, especially in the sciences, for example Springer Nature and PLOS. The Wellcome Trust also maintains a list of approved data repositories.
You can search for data repositories by discipline in re3data.org and FAIRsharing.
In the absence of a suitable external service, staff and research students can use the University's Research Data Archive. This is free to University members and provides both open data archiving and a restricted dataset option for data containing confidential information which can be shared only on a strictly controlled basis.
The Archive has a 20 GB limit for deposits, but other services with more capacity may be options where needed. If no alternative data repository is available for a high-volume dataset, it can be archived offline with DTS for a modest charge, with a linked metadata record created in the Research Data Archive so that the dataset can be cited and access to the data can be requested.
You can also use general-purpose data sharing services, such as Zenodo (funded by the EC), and Figshare (a commercial service that is free to individual users). These will not provide the quality control that a specialist or institutional data repository offers, but they are free, quick and easy to use.
Figshare+ can be used to share datasets up to several TB in scale for a one-off charge. (The standard Figshare service is free to use for deposits up to 20 GB.) Zenodo accepts deposits of up to 50 GB for free, and up to 200 GB on a one-off basis.
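Before choosing a service, it can help to measure the total size of the dataset. The Python sketch below sums the sizes of all files under a directory and lists the services whose free deposit limit, as given in this guide, the dataset fits within. The limits table, the function names and the use of decimal gigabytes (1 GB = 1,000,000,000 bytes) are assumptions; service limits change, so confirm them with each repository before depositing.

```python
from pathlib import Path

# Free deposit limits in GB as stated in this guide. These are
# assumptions to be checked against each service's current terms.
FREE_LIMITS_GB = {
    "University Research Data Archive": 20,
    "Figshare (standard)": 20,
    "Zenodo": 50,
}

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def dataset_size_bytes(root):
    """Total size in bytes of all regular files under the directory root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def services_within_limit(root, limits=FREE_LIMITS_GB):
    """Return the names of services whose free limit the dataset fits within."""
    size_gb = dataset_size_bytes(root) / BYTES_PER_GB
    return [name for name, limit_gb in limits.items() if size_gb <= limit_gb]
```

Calling `services_within_limit("my_dataset")` (a hypothetical folder name) returns an empty list for a dataset larger than every free limit, which is the point at which the high-volume options described in this guide become relevant.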
Some data may not be suitable for public access, for a number of reasons:
A number of repositories exist that can manage sensitive data falling into one or more of these categories under controlled access procedures. Such a procedure may require a prospective data user to make an application to consult a specific dataset, which can be approved or rejected by the data owner or a nominated data steward. Access would be granted under a special licence or data access agreement. Access to personal data will also be subject to consent from the data subject, so this would need to be considered at the planning and recruitment stage of the research. See the University's guide to Data Protection and Research for more information.
Repositories that provide controlled-access procedures include:
Some research can generate large volumes of data, at the scale of hundreds of gigabytes (GB) or terabytes (TB), such as computational modelling and various kinds of experimental imaging. If you need to archive these data, practical and cost limitations may constrain your options, as some data repositories have size limits. But repositories designed to handle large-scale datasets do exist, notably:
Bear in mind that you may not need to archive or maintain all of the raw data collected or generated in a project. See What data should you share?
If there is no suitable data repository for a high-volume dataset, we recommend you consider the following solutions, in the order presented. If combined with creation of a metadata record in the Research Data Archive describing the dataset and the means by which it can be accessed, this can enable compliance with the University's data sharing requirements.
If data are stored by these means, you are advised to observe the following principles:
The Research Data Service can advise on and support you in archiving data using the principles outlined above.
Non-digital data should be digitised for long-term preservation wherever possible. If for any reason this is not possible or desirable, they should be archived following the principles for high-volume data. There should be clear documented ownership and local management of the data. If the data are necessary to support published research findings, a record should be published in the University's Research Data Archive describing the data and the means by which they can be accessed, so that they can be cited from the related publication.
Research outputs that rely on supporting data, code and other materials should provide information about where and how these materials can be accessed. This is a requirement specified by UKRI in its Common principles on research data and Open Access Policy, as well as by other funders of research. Many publishers ask for articles to be accompanied by a data availability statement.
This will usually appear either at the head of the article or in the end matter, often in the Acknowledgements section. Your journal's guidance for authors should indicate how to provide your data access statement.
We also recommend that you include a full citation to the dataset in your main references list. An example citation is provided further down this page.
You must bear this requirement in mind when preparing your research outputs. In order to be able to cite your data from the output, you will have to first deposit the dataset in your chosen data repository.
These general principles apply when providing a data availability statement. Examples are provided on a separate tab:
Below there are examples of data access statements covering a variety of different scenarios. In these examples a dummy DOI is used; this will not resolve.
Data supporting the results reported in this paper are openly available from the University of Reading Research Data Archive at https://doi.org/10.17864/1947.000999.
All data supporting this study are provided as supplementary information accompanying this paper.
All data are provided in full in the results section of this paper.
This study was a re-analysis of data that are publicly available from the British Atmospheric Data Centre at [DOI]. Data derived through the re-analysis undertaken in this study are available from the University of Reading Research Data Archive at https://doi.org/10.17864/1947.000999.
Interview transcripts are held under safeguards by the UK Data Service and may be accessed by authorised researchers, subject to registration, at [DOI].
Because of the sensitive nature of the research, interviewees did not consent to the retention or sharing of their data.
Supporting data are subject to IP protection and will be available from the University of Reading Research Data Archive at https://doi.org/10.17864/1947.000999 after a temporary embargo period.
Research data are commercially confidential, but can be made available to bona fide researchers subject to a data access agreement. Details of the data and how to request access are available at the University of Reading Research Data Archive: https://doi.org/10.17864/1947.000999.
No new data were created in this study.
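If you prepare statements for several outputs, a small template can keep the wording consistent. This Python sketch reproduces the openly available and commercially confidential examples above; the function names and parameters are hypothetical, and the dummy DOI in the usage below is the non-resolving example used on this page.

```python
# Hypothetical templates based on two of the example statements above.

def open_data_statement(repository, doi_url):
    """Statement for data deposited openly in a named repository."""
    return (
        "Data supporting the results reported in this paper are openly "
        f"available from the {repository} at {doi_url}."
    )


def restricted_data_statement(reason, repository, doi_url):
    """Statement for data available only under a data access agreement."""
    return (
        f"Research data are {reason}, but can be made available to bona fide "
        "researchers subject to a data access agreement. Details of the data "
        f"and how to request access are available at the {repository}: "
        f"{doi_url}."
    )
```

For example, `open_data_statement("University of Reading Research Data Archive", "https://doi.org/10.17864/1947.000999")` returns the first example statement on this page word for word. Always check the result against your journal's guidance for authors.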
The standard citation format for a dataset is:
Creator(s) (PublicationYear): Title. Publisher. Resource Type. Identifier
For example:
Smith, John and Jones, David (2015): Electricity pylons of the UK, 1928-2005. University of Reading. Dataset. https://doi.org/10.17864/1947.000999.
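The citation format above can be expressed as a simple template. The Python sketch below builds a citation in that format and reproduces the example, including its closing full stop. The function name and the handling of three or more creators (separated by semicolons, with "and" before the last) are assumptions; follow your repository's or publisher's guidance where it differs.

```python
def format_dataset_citation(creators, year, title, publisher,
                            resource_type, identifier):
    """Format: Creator(s) (PublicationYear): Title. Publisher.
    Resource Type. Identifier. Creators are given as "Surname, Forename"."""
    if not creators:
        raise ValueError("A dataset citation needs at least one creator")
    if len(creators) == 1:
        names = creators[0]
    else:
        # Assumption for 3+ creators: "A; B and C" (names contain commas).
        names = "; ".join(creators[:-1]) + " and " + creators[-1]
    return (f"{names} ({year}): {title}. {publisher}. "
            f"{resource_type}. {identifier}.")


print(format_dataset_citation(
    ["Smith, John", "Jones, David"], 2015,
    "Electricity pylons of the UK, 1928-2005",
    "University of Reading", "Dataset",
    "https://doi.org/10.17864/1947.000999"))
```

Run as written, this prints the example citation shown above.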
You should put as much care into preparing a dataset as you would any other research output. A deposit in a data repository can be delayed and in some cases rejected if, for example, you have not prepared and documented your data to an appropriate standard, or correctly identified intellectual property rights in a dataset and obtained relevant permissions, or established an ethical basis for sharing of data collected from participants, or anonymised a dataset where this is required.
This guide takes you through the main things to consider and address before you deposit a dataset in a data repository. It will help you to address critical requirements and produce a good quality, appropriately documented dataset.
For a more detailed version of this guide, download Preparing for data sharing (PDF).
It is important to define your dataset and identify its contents, as this will also determine what preparation is necessary. Refer to What data should you share? for more guidance on defining your dataset.
Check your preferred repository's guidance on depositing data and note any requirements it may have. Repositories may have content and metadata requirements for certain types of data, require submission of data in specific formats, and place limitations on the volume of data that can be deposited. Some repositories may also charge for deposit of data (although most do not). If you have not identified the repository in which you will deposit data, refer to our guidance on choosing a data repository.
If data have been collected from research participants, check that you have documented consent for data sharing. It is acceptable to disclose data obtained from human subjects without consent if the data have been fully anonymised, but it is good practice to inform participants of your intention to do this. It is not acceptable to disclose even anonymised data if in your consent procedure you stated that the data would not be disclosed, or would be destroyed at a given time. Identifiable data can be disclosed under a controlled access procedure, providing that participants have consented to participate in the study on the understanding that data would be shared in this way. The University provides a sample consent form including statements suitable for open data sharing and sharing of data subject to safeguards.
If you are depositing data collected from participants in the Research Data Archive, you will be required to submit your participant information sheet(s) and unsigned copies of any consent form(s) used alongside your data files, so that we can confirm you have a basis for data sharing. These documents will be stored alongside the dataset for administrative purposes. Access to them will be restricted, meaning they will not form part of the dataset available for users to download.
It is important to understand who is a creator of your dataset, and who is not, because intellectual property rights and permission to distribute the data will be associated with its creators. Creators of datasets also have the moral right to be identified as such. Datasets may be the work of many hands, and it is not always easy to clearly distinguish their creators from other people who contributed to the work of the project.
According to the Copyright, Designs and Patents Act 1988 it is ‘the selection or arrangement of the contents of the database’ that constitutes the creative act which attracts copyright. Creators are those who have had a direct creative role in the selection and arrangement of data in the dataset. This is not the same as being involved in the design of the research or in the original data collection. In most cases, a project PI or student supervisor will not be a creator of the dataset, unless they had a direct authorial hand in its creation. Technicians, contractors and others involved in the collection of data are not usually creators of a dataset, unless they had creative input into the selection and arrangement of the data points.
Anyone who does not meet the definition of a Creator but has contributed to the production of the dataset can still be acknowledged for their contribution in the dataset documentation. The Research Data Archive includes a Contributors field in its metadata schema.
You must clearly identify rights-holders, because your authorisation to deposit the dataset depends on their permission. By depositing data you are also distributing them, and doing this without the authorisation of the rights-holder will be a breach of copyright.
Owners of intellectual property rights (IPR) in the data will be associated with the creators of the dataset.
In general, an employer will own IPR created by its employees: the University is ordinarily the rights-holder in IP created by members of staff. Research contracts generally allow ownership of ‘arising IP’ (i.e. created under the contract) to reside with the originating institution.
Students registered with the University own the IP they create by default, but this may not be the case if they are funded under a third-party sponsorship agreement (excluding public funders such as Research Councils, which do not assign student IP to other parties), or if they have assigned their IP to the University. A sponsorship agreement will include Intellectual Property clauses stating which party has ownership of arising IP. Ownership of IP created by a student at another institution will be subject to that institution's IP policy and any relevant agreements.
If a dataset has multiple creators, it may also have multiple rights-holders, which may include the University, students in their own right, and collaborating and partner organisations. There is more guidance on IPR in primary data/software in the Managing data section.
You may need to examine any applicable research contracts or studentship agreements to establish which parties hold rights in a dataset. Students and/or their supervisors should have copies of any contracts relating to their research programmes. If you need to locate a copy of a contract, contact your Contracts Manager.
Where datasets incorporate secondary data, the owners of these data also have the right to determine whether, how and on what terms you may distribute their data.
IP should always be published under a licence, so that ownership of the IP and the terms of use are clear to others. In accordance with the University's Research Data Management Policy, you are expected to share data under an open licence wherever possible. The most widely used open data licences are the Creative Commons Attribution (CC BY) licence, which permits re-use of the data provided proper attribution is made, and the Creative Commons Zero Public Domain Dedication (CC0), which waives all rights in the work to the fullest extent permitted by law.
To license the data, you must be the data owner or be authorised to grant a licence on behalf of the data owner, so the choice of licence may be subject to the permission of other parties. For example, a third-party co-creator with commercial interests may request a non-commercial licence. Similarly, if the dataset incorporates third-party materials, these may only be made available with the third party's permission, on an ‘All Rights Reserved’ basis.
Data held under a controlled-access policy (such as UK Data Service safeguarded data and restricted datasets in the Research Data Archive) will be made available under special licence terms. The Data Access Agreement for restricted datasets deposited in the Research Data Archive allows data to be used, subject to authorisation, in confidence and for non-commercial research and learning purposes only. The Agreement is made between the University and the organisation with which the authorised user is affiliated.
As a general rule, we recommend you use the Creative Commons Attribution licence for open data; this is the default licence applied to uploaded files in the Research Data Archive. More restrictive licences should be used only where there is a justification for doing so, for example to protect commercial or other confidential interests.
We provide guidance on licences and licensing. Guidance on licence options for software can be found in our Guide to publishing research software.
You must ensure that you have permission to archive and distribute the dataset from all of the following: the creators; the rights-holders; any parties with contractual rights regarding publication of research outputs; and the owners of any secondary data.
Creators of datasets have the moral right in copyright law to be identified as such. Individuals also have the moral right not to have a work falsely attributed to them as an author. You must therefore ensure that the dataset is archived with the knowledge and permission of its creators.
Where the employer is a university or publicly funded research organisation, permission to publish the data can be inferred from its policy position on research data, which (certainly in the case of universities) is to promote the public sharing of data supporting research outputs wherever possible. Other parties, including students, industrial studentship sponsors and commercial research partners, will need to give written consent to publication of the dataset.
Research and studentship contracts have Publication clauses, which generally grant other parties the right to be notified of and have the opportunity to approve or delay any intended publication. This right exists irrespective of who owns the IP created under the contract. The standard notice period is 30 days. Persons to whom notices should be sent will be identified in the agreement (usually towards the end).
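If you want to plan a deposit around the notice period, the arithmetic is straightforward. The sketch below is a minimal illustration in Python, assuming the 30-day period is counted in calendar days from the date notice is sent; the function names are hypothetical, and your agreement may define the period, or the point from which it runs, differently. Because the other party may approve or delay publication, the computed date is the earliest possible publication date, not a guaranteed one.

```python
from datetime import date, timedelta

# Assumption: the notice period is counted in calendar days from the day
# notice is sent. Check the wording of your own agreement.
NOTICE_PERIOD_DAYS = 30


def latest_notice_date(intended_publication: date,
                       notice_days: int = NOTICE_PERIOD_DAYS) -> date:
    """Latest date to send notice so the full period has elapsed by the
    intended publication date (hypothetical helper)."""
    return intended_publication - timedelta(days=notice_days)


def earliest_publication_date(notice_sent: date,
                              notice_days: int = NOTICE_PERIOD_DAYS) -> date:
    """Earliest date the dataset could be published if no objection or
    delay is raised, counting from the date notice was sent
    (hypothetical helper)."""
    return notice_sent + timedelta(days=notice_days)


if __name__ == "__main__":
    # To publish on 30 June 2024, send notice no later than 31 May 2024.
    print(latest_notice_date(date(2024, 6, 30)))
```

In practice, allow extra time beyond this minimum, since a request to delay publication would push the date back.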
If your dataset incorporates IP from existing sources, you may need to seek permission to distribute the dataset. If data have been obtained from a public resource such as a website or a data repository, you should check the source for any terms of use or licence information. If you have incorporated government or research data, these may well have been made available under open licences that permit redistribution, provided that the source is acknowledged. If you cannot find any such information in the published source, or the data have been obtained from a non-public source, you may need to contact the data owner directly. We provide guidance on using secondary data.
Permission should be requested in writing; email is acceptable. Research contracts and sponsorship agreements will nominate a contact for each party, to whom any notices under the contract can be directed. In the case of studentship agreements, notices would usually be sent to the student's supervisors at the University and the sponsor organisation.
When contacting other parties for permission to archive and distribute data, it is important to identify the data unambiguously, and to be clear how the data will be made available and on what terms they will be licensed for use. While you should always seek to license the dataset on the most open terms, other parties may legitimately require more restrictive licensing. For example, a commercial partner may not be willing to allow the dataset to be distributed under terms that permit re-use for commercial purposes.
Depositing data in a repository is not simply a matter of transferring the files from your active storage location into the repository. Your data will need to be tidied up, put into order, and documented. When forming the dataset, consider the following:
Every dataset should have at least a basic manual or user guide. This should include the following:
For deposits in the University's Research Data Archive, a plain-text (.txt) README template is provided, which can be used to record basic documentation. Documentation can be saved in PDF, Word or another text format, as preferred.