Research data management: Managing data

A guide to managing and sharing research data, with information about University policies and services

Managing data

Managing data on a day-to-day basis during the course of your research involves a number of practical considerations and activities. These include:

  • storing and backing up your data in suitable locations, so that they are protected against loss, accessible to those who need them, and secure against unauthorised access. The University provides a number of options for the storage and sharing of research data
  • ensuring data are managed in compliance with information security, data protection and confidentiality requirements
  • organising your data, using suitable filing systems, file naming and version control policies, so that data files are easy to navigate and locate, and versions of files are clearly identifiable
  • using appropriate file formats to store and process your data and prepare them for preservation
  • using quality control procedures to manage the risks of collecting poor quality data and introducing errors into data in the course of processing and analysis
  • capturing relevant documentation and metadata, so that data can be correctly and accurately understood and used by you and colleagues, and to support effective public sharing of data in due course
  • ensuring intellectual property rights in any data collected or used are understood and that data are managed in accordance with these, especially when disclosure of data to other parties is in question

Collecting data also involves deciding what tools are appropriate to the purpose. We provide information about a number of online survey tools available through the University, including UoR REDCap.

To explore further aspects of data management described above, you may wish to consult some of the training and information resources listed on the Useful links page. UK Data Service guidance is particularly recommended.

Data collected under the authority of the University should be stored in University infrastructure during the project. University storage provides data security and resilience through access controls, replication in separate data centres, automated backup, and file recovery. Sensitive/confidential data can be stored in these locations.

Data collected in the field should be stored securely, backed up using local devices in the absence of an internet connection, and transferred at the earliest opportunity to the primary storage location.

University storage solutions

The University storage solutions described below can be used to store and share research data. They are all suitable for the storage of personal data and other confidential information. Care should be taken if you store confidential information in OneDrive and Teams, as these both allow file sharing, including with people outside the University.

Information about these services and how to apply for them can be found in the DTS service catalogue (login required).

  • Microsoft Teams provides high-capacity file storage and sharing capability. The Team provides group ownership of files, which mitigates the risk of files being lost if someone leaves the University. DTS provides information about Teams and sites can be requested via DTS (login required).
  • Your University OneDrive account allows you to store and share up to 5 TB of data at no cost. Note that OneDrive accounts and any files stored in them are deleted when account owners leave the University. You should ensure any files stored in OneDrive that need to be retained are handed over when you are preparing to leave the University. DTS provides information about OneDrive.
  • The University's Research Data Storage service is a high-volume data storage service provided at different specifications and costs (per TB per year). This may be more efficient than Teams or OneDrive for storing and processing large volumes of data. Storage can integrate with the Reading Research Cloud service. Capacity can be requested in increments of 0.5 TB. The minimum subscription term is one month. For grant costing purposes we recommend costing by the year. DTS provides information about the Research Data Storage service.
  • For group access with small to medium-volume storage requirements, staff can set up a local network collaborative share for the project of up to 100 GB at no cost. However, Teams is the preferred solution where collaborative file access is required.

If data are acquired using specialist infrastructure, such as the ISIS neutron and muon source, or the JASMIN supercomputing environment, raw data may be stored in the facility infrastructure, and data copied or extracted locally as required.

Hard-copy data and documentation

You may need to address storage of non-digital data and documentation, e.g. signed consent forms, in appropriate secure environments. Where possible, you can digitise hard copies for secure storage and then destroy the paper originals. This is acceptable practice with consent forms. Even better would be to obtain consent electronically to obviate the need for paper at all. For example, UoR REDCap has an e-consent function.

Where hard-copy data and documents are collected, they should be stored securely when in transit, and kept in offices or other storage areas on University premises that can be locked and are accessible only by authorised persons. If the data are kept in offices that remain open throughout the day, they should be kept in a locked storage area, such as a desk drawer, cupboard or filing cabinet.

Signed consent forms or other non-digital records may contain identifying information and should be stored separately from data files, although a link-coding system can enable the two sets of materials to be linked if required.

Information security

The basic principle of information security is that information should be accessible only to those for whom access is authorised. In the case of confidential and legally-protected information, such as personal data, this principle must be observed with care and you must put in place appropriate access controls to keep data safe from unauthorised access.

Confidential information should be managed in accordance with the University's Data Protection, Encryption and Remote Working policies, which can be found on the Information Compliance Policies web page. Information about sensible approaches to the storage and sharing of personal data can be found in the IMPS guide to Data protection for researchers.

The University network and OneDrive provide warranted security for the storage of confidential information. Data can be transferred to the University network via VPN, which is an encrypted channel, and can be safely moved between devices and shared with others using your OneDrive account. Microsoft Teams can be used to store/share confidential information, but should be used with care, especially where people external to the University are granted access. Be aware that any member or guest of a Team will have access to all its files unless they are stored in a private channel.

If you are using personal and portable devices to collect and store confidential data, such as audio recorders, video cameras, laptops and tablets, removable hard drives, and USB sticks, the devices should be encrypted. Guidance is provided in the Encryption Policy.

Device-level protection may not in itself be sufficient if the data are highly sensitive. You may additionally need to password-protect individual files, or better still encrypt them or the folder area in which they are stored. The University's Encryption Policy provides instructions on how to encrypt files and storage areas. The UK Data Service also provides guidance on data encryption methods.

Be aware of the risks involved in using third-party services other than those provided through the University. Use of personal accounts in third-party platforms is not covered by data processing agreements with the University. Using such services to store and transfer personal data could put you in violation of the UK Data Protection Act.

If you are collecting data using an online survey you should, where possible, use one of the University's approved online survey tools. If you wish to use an alternative third-party supplier for data collection or processing, contact your DTS Business Relationship Manager to discuss your requirements.

You should also be careful about transferring confidential data by less secure methods, such as email. Data transmitted via email are likely to pass through and remain on a number of servers, so it is best to avoid sending confidential data by such means. It is better to share such data by granting individual access to the files or folder area in your University OneDrive or Teams site.

Another approach to the storage of personal data is to remove individual identifiers from a data file (such as participants' names and contact details) and store them in a separate secure location. You can use a unique ID or pseudonym to maintain the link between the personal data and the related data held separately. This is sometimes called link-coding or pseudonymisation. This is a sensible data minimisation practice that mitigates the risk of identification in the event that the coded data are disclosed to unauthorised parties. Be aware, though, that pseudonymised data may still be personal data in data protection law, as the means to relink them to an identifiable individual exists. For this reason, they should be protected in the same way that personal data are protected.

We provide more information about compliance with ethical and legal obligations in the storage and processing of information in the Research ethics section.

Implementing a logical and consistent system to organise your data files allows you and others to locate and use them effectively, and helps to preserve the integrity of your data.

There are three main elements to data organisation:

  • a filing system
  • file naming rules
  • a version control policy

These elements are discussed in separate tabs. When considering how you will organise your data, bear the following principles in mind:

  • Use established conventions and procedures if they exist and meet your needs. Your research group or laboratory may already have standard protocols.
  • Ensure everyone involved in the project is aware of and follows the policy. If a policy is not observed or is applied inconsistently it has little value.
  • Keep your policies and practices under review. Don't leave files unsorted, hanging under top level folders; weed and tidy your folders periodically, removing redundant files.
  • You may want to maintain a retention schedule, with retention and review periods for designated files. This would be particularly important if you are collecting personal data, which will need to be processed lawfully and securely destroyed when no longer required. A simple spreadsheet can be used as a retention schedule.

These principles apply to information in whatever format it is held, whether physical or digital.

The following guidance is mostly addressed to the storage of digital information. For further information on good practice see guidance on data organisation from the UK Data Service and MIT Libraries.

Use a logical, hierarchical folder structure to store your files, grouping files in categories, and descending from broad high-level categories to more specific folders within these. There is no single right way to do this; the important thing is that the structure is logical, legible, and meaningful for its purpose. For example, you could organise files into folders according to task (e.g. work package, experiment), then a significant defining property (e.g. location, sample number, run, company name) or type of data (e.g. raw, processed, final). You would probably have separate high-level folders for data, administrative documentation, publications, etc.

Don't let your folder structure become too complicated, and avoid too many layers in your hierarchy (three is comfortable; no more than four at most).

Confidential information, for example participant records, should be stored in separate, clearly identified folders with appropriate access controls.

Appropriate file naming can help you and others to easily identify the contents of a file, and to organise and version-control files. This applies whether you are storing digital or physical materials. It can be very important if you are generating large numbers of files, for example by automated process. The following suggestions may help you to develop a file naming protocol:

  • Use short but meaningful file names composed of significant elements, e.g. ABCProject_Interview_P012_2014-06-18 (where P012 is the participant ID number). You should be able to tell what is in a file by looking at the filename. Some properties you might use include: project identifier, data collection method or instrument, data type, location, subject, date, version number.
  • Don't make file names too long (32 characters should be the maximum). Avoid redundant information in file names and file paths.
  • Avoid spaces in file names; you can use _ or - to separate elements, or run them together using CamelCase.
  • Consider the sort order of your files, as this will aid identification and retrieval. Files will sort according to the types of characters used in their names, with special characters first (e.g. @), followed by numbers, and then alphabetic characters. For example, the file datafile.txt, if renamed, would sort in this order: @_datafile.txt, 001_datafile.txt, 20190731_datafile.txt.
  • Write dates in reverse order in the format YYYYMMDD or YYYY-MM-DD. For example, use 20250812 for August 12th 2025. This is an international standard for representing dates (ISO 8601), and it will enable you to sort files chronologically.
  • If you want to use numbers to force a sort order, and the number will exceed 9, use leading zeros (i.e. 001, 002 etc., not 1, 2, etc.).
  • Embed version control in filenames where this is relevant: date and time or version numbers will enable accurate identification of current and previous versions of files.

You can use this file naming convention worksheet to generate your file naming syntax.
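As a sketch of how such a protocol might be applied in practice, a small Python helper can compose names from the significant elements; the elements and their order here are illustrative, and should be adapted to your own convention:

```python
from datetime import date

def make_filename(project, method, participant, when, ext="txt"):
    """Compose a short, sortable file name from significant elements.
    The elements and their order are illustrative, not a standard."""
    parts = [
        project,                      # project identifier
        method,                       # data collection method
        f"P{participant:03d}",        # zero-padded ID so files sort correctly
        when.strftime("%Y-%m-%d"),    # ISO 8601 date sorts chronologically
    ]
    return "_".join(parts) + "." + ext   # underscores instead of spaces

name = make_filename("ABCProject", "Interview", 12, date(2014, 6, 18))
# -> "ABCProject_Interview_P012_2014-06-18.txt"
```

Generating names programmatically in this way is particularly useful when files are produced by an automated process.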

Version control or versioning is a system to record changes to a file or set of files over time. It is important at all times when working with digital items, which can easily be modified. It is essential if you are working in a research group and sharing and modifying files between yourselves. Uncontrolled versions of files modified by different people can easily proliferate, causing you to lose track of your data and the transformations they have undergone. In the worst-case scenario this can compromise the integrity of the data - for example, if a raw data file is overwritten.

There are some simple things you can do to put in place effective version control. Not all of the following need be used. It will depend on the nature of the work and the processes the data undergo. More detailed guidance on version control is available from the UK Data Service.

  • Use access control and read/write permissions in files/storage areas to restrict the ability to modify files to authorised users only.
  • Save new versions with a different file name element to signify the version, e.g. by incorporating a version number (001) or a version date (20251008).
  • Use OneDrive or Teams to synchronise versions of files stored in multiple locations and to retain the file version history, or use versioning software, e.g. Subversion (SVN).
  • Make raw data files, master and milestone versions of files read-only and store them in separate folders. You may not need to keep all old versions of files, but it is advisable to retain milestone versions or old master files.
  • Document changes in a version control table within the document. This should contain headings for Version number, Author, Purpose/Change, and Date.
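Two of the measures above, version-stamped file names and read-only milestone copies, can be combined in a small script. This is a minimal sketch; the `_v001` naming scheme is illustrative:

```python
import os
import shutil
import stat
import tempfile

def save_version(path, version):
    """Copy a working file to a version-stamped copy and make it read-only.
    The naming scheme (_v001, _v002, ...) is illustrative."""
    root, ext = os.path.splitext(path)
    dest = f"{root}_v{version:03d}{ext}"
    shutil.copy2(path, dest)
    os.chmod(dest, stat.S_IREAD)   # read-only guards milestone versions
    return dest

# Demonstration in a temporary directory
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "analysis.txt")
with open(src, "w") as fh:
    fh.write("raw data")
milestone = save_version(src, 1)   # -> .../analysis_v001.txt
```

The working file remains editable while each milestone copy is protected against accidental overwriting.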


The file formats you use for your data may affect what you can do with the data and how effectively they can be preserved and shared. In practice your choice of file formats may be governed by the standards in your discipline, or the types of hardware and software you use in your research, but you should follow best practice principles as far as possible.

The UK Data Service provides detailed advice on formatting your data, including recommendations on optimal formats for preservation. The University provides guidance on recommended file formats (PDF) for deposit of data in the Research Data Archive.

Proprietary and open formats

File formats may be proprietary, such as Microsoft Excel and Adobe PDF, or open, such as comma-separated values (CSV) or Open Document Format (ODF).

The best formats for data collection and analysis may not be the most suitable formats for long-term data preservation. Proprietary formats can provide rich, highly-specified functionality, but they may limit the usability of your data and carry higher long-term risk: they are commercial products available under licence only, and less widely-used formats may be prone to obsolescence.

Open formats may lack rich functionality and be more generic, but they provide high usability and carry a low risk over the long term because there are no licence fees, their specifications are publicly available, and they can be rendered by multiple software packages.

Working and preservation formats

For day-to-day working, use file formats that are fit for purpose and accessible to your research group. For example, you may use Microsoft Excel for quantitative data analysis and visualisation.

For long-term preservation, where possible, you should store data in open or widely-used formats, and plan for conversion from proprietary formats where necessary. For detailed information about any of the formats mentioned below, refer to Library of Congress format assessments.

Suitable preservation formats may be:

  • Open formats, such as CSV for tabular data, ASCII text (.txt) and PDF/A for text and documentation, XML with an appropriate Document Type Definition (DTD) for structured machine-readable information, JPEG for images, FLAC for audio, and MPEG-4 for video. Included in this category are self-describing formats encoded in text files, where the file contains a header with information about the variables reported in the body of the file: examples include the NetCDF format used in climate system models, and the FASTA format for representing nucleotide or peptide sequences.
  • Widely-used proprietary formats, such as MS Excel and MS Access for tabular data and databases, MS Word for text, TIFF 6.0 uncompressed for images, and MP3 or WAV for audio.

For example, raw instrument data in a proprietary format may be preserved, but also or alternatively converted into a .txt format, to be more widely accessible; data analysed in proprietary software, such as MATLAB or SPSS, should be preserved in a format accessible to users without a software licence.
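As a minimal sketch of such a conversion, tabular data can be written out as open CSV using only the Python standard library. The hard-coded rows below stand in for data exported from proprietary analysis software; the variable names are invented:

```python
import csv
import io

# Illustrative rows standing in for an export from proprietary software.
rows = [
    {"sample": "S01", "temp_c": 21.4},
    {"sample": "S02", "temp_c": 22.1},
]

# Write to an in-memory buffer; for a real file use
# open("data.csv", "w", newline="") instead of io.StringIO().
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sample", "temp_c"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The resulting CSV file can be opened by any spreadsheet or statistics package, with no licence required.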

In some conversions you may lose rich features and formatting, but you have a greater chance of retaining the integrity of the content in the long run. If the richer features provided by a proprietary format add value to your data, you can always retain the data in that format as well. Popular formats such as Adobe PDF and those of Microsoft applications are likely to endure for many years.

Image and audiovisual files may need to be preserved at the most information-rich level in order to support future uses, but practical considerations of usability may also come into play. For example, an uncompressed TIFF file will preserve the highest level of information; by comparison, a lossy compressed format such as JPEG will preserve less information, but has practical benefits: file sizes will be smaller and faster to serve online.

Research software

Use of open programming languages such as Python and R to process and analyse data can have functional advantages over 'point-and-click' proprietary software, and makes the analysis intrinsically reproducible, since every step is captured in a script.

For example, to undertake statistical analysis of your data you could use SPSS, which is proprietary software and requires a licence. Because operations are performed by interacting with a Graphical User Interface, there is no script of your operations that can be automated. Anyone wishing to replicate your analysis would need to access SPSS, import your data, and reconstruct the analysis on the basis of information provided by you.

If instead you use the free programming language R, you can conduct your analysis without having to access proprietary software, and you will be able to preserve the full analysis workflow by saving your scripts to text files. You or anyone else with these scripts can then re-run exactly the same analysis by executing the code; because the analysis is automated, it is guaranteed to be reproducible. Because a software licence is not required to run the analysis, it is also a more transparent method.
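The same point holds for Python. As a minimal illustration of a scripted, re-runnable analysis (the data values are invented), re-executing this file reproduces the result exactly:

```python
import statistics

# A scripted analysis: anyone with this file and a Python interpreter
# can re-run it and obtain identical results. Values are invented.
reaction_times = [0.31, 0.27, 0.35, 0.29, 0.33]   # seconds

mean_rt = statistics.mean(reaction_times)
sd_rt = statistics.stdev(reaction_times)
print(f"mean = {mean_rt:.3f} s, sd = {sd_rt:.3f} s")
```

Saving such scripts alongside the data preserves the full analysis workflow described above.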

Quality control is essential to maintaining the integrity and viability of data. Every action performed on data, from the point of collection onwards, presents an opportunity for errors to be introduced. You must therefore implement procedures to manage this risk, and to mitigate the impact of errors when they occur.

Various quality control strategies can be used:

  • Map your data workflows, from the point of collection to the final dataset, and break them down into the operations performed on the data. For every operation, identify the quality control procedure to be applied. Ensure the quality control procedures are consistently performed and documented where relevant.
  • Standardise and document your workflows, so that another person could follow your instructions and achieve the same result as you, for example, by writing a step-by-step protocol for data collection, or guidelines for formatting and anonymisation of interview transcriptions. Include a data dictionary containing a full definition of variables, and information about permitted values (including missing value codes). Follow established procedures where relevant, such as laboratory Standard Operating Procedures: these have been tried and tested. Keep in mind that protocols are only useful if they are followed. If other people are involved in data collection, ensure they are appropriately trained and supervised until they perform their operations to the required standard.  
  • Make sure you have tested and calibrated your instruments before you start live data collection. This can include creating and piloting data collection forms or templates in advance. For example, set up a spreadsheet with variables clearly labelled in column headings, including units of measurement, or create a transcription template for interview transcripts. Some software may have data validation functions that you can use to reduce the risk of error, for example by allowing only a controlled range of values to be recorded in a field.
  • Methods such as taking repeat measurements and random sample checking can reduce the incidence of error.
  • Review data to check they make sense. Data visualisation can help to identify suspicious outliers and anomalies: a trendline with an obvious spike in it may highlight a suspicious value.
  • Ask someone to review your data. You could ask a co-author or pair up with a colleague or fellow student and agree to review each other's data.

The UK Data Service provides guidance on quality control.
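The automated validation mentioned above can also be written as a simple check against a data dictionary of permitted values. This is an illustrative sketch: the variables, ranges and missing-value code are invented:

```python
# Illustrative validation against a simple data dictionary.
# The variables, ranges and missing-value code (-9) are invented.
data_dictionary = {
    "age":   {"min": 18, "max": 99,  "missing": -9},
    "score": {"min": 0,  "max": 100, "missing": -9},
}

def validate(record):
    """Return (field, value) pairs that fall outside permitted values."""
    errors = []
    for field, rule in data_dictionary.items():
        value = record[field]
        if value == rule["missing"]:
            continue                  # explicit missing-value code is allowed
        if not rule["min"] <= value <= rule["max"]:
            errors.append((field, value))
    return errors

print(validate({"age": 17, "score": 101}))   # both values out of range
```

Running such a check on each batch of collected data catches out-of-range values before they propagate into analysis.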

Would you understand your data in five or ten years' time? If somebody else wanted to use your data in their own research, or wished to replicate your results, what information would they need?

Documentation makes your raw data meaningful and provides the means to validate them and the analyses on which your findings are based. You should record relevant information as soon as possible, and ensure it is stored and organised efficiently. This process of documentation is iterative, and will begin before you even start data collection, as you identify and define the data you plan to collect.

Documenting early and often will make it much easier to work with your data over the long term. For example, when you are preparing a dataset for deposit in a data repository at the end of your project, you will need to create documentation to support the dataset, which might include information about methods and instruments, a file listing, and a data dictionary listing and defining the variables recorded in the dataset.

It can be useful to think of documentation in terms of four levels: variable, file/database, project, and metadata.

Variable

Variable-level documentation defines your variables, and specifies units of measurement and permitted values (including missing value codes). This information is often embedded within data files, e.g. as a header, or in column labels. We recommend you create a data dictionary listing and defining variables. Some software may be able to do this for you: for example, UoR REDCap has a data dictionary feature. The Open Science Framework provides a useful guide to creating a data dictionary.
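A data dictionary need not be elaborate; it can be a simple table with one row per variable. The sketch below, with invented variables, units and codes, shows one way to hold such a dictionary and flatten it into rows for a CSV or readme file:

```python
# A minimal data dictionary: one entry per variable. The variables,
# units, permitted values and missing codes are invented examples.
data_dictionary = [
    {"name": "pid", "definition": "Participant identifier",
     "type": "string", "units": None, "permitted": "P001-P999",
     "missing": None},
    {"name": "temp_c", "definition": "Body temperature",
     "type": "float", "units": "degrees Celsius", "permitted": "30.0-45.0",
     "missing": "-9"},
]

def as_rows(dictionary):
    """Flatten the dictionary into rows suitable for a CSV or readme table."""
    header = ["name", "definition", "type", "units", "permitted", "missing"]
    return [header] + [
        [str(v[h]) if v[h] is not None else "" for h in header]
        for v in dictionary
    ]

rows = as_rows(data_dictionary)
```

Keeping the dictionary in a machine-readable form also makes it available for automated validation of incoming data.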

File/database

File or database-level information describes the components and logical structure of the dataset. This could be as simple as a listing of files with details of their contents, or a database schema. This information might typically be recorded in a separate readme file.

Project

Project-level information describes the research questions and hypotheses the data are collected to answer or test, the research methodologies, the instruments used to collect and process the data, and records of the research process. There may be standard experimental reporting protocols in your field that you can use to document your methods and instruments. Documentation might include laboratory notebooks, interview schedules, instrument or software specifications and guides, in-line commentary of software code written in the research, interview transcription and anonymisation guidelines, etc. In scientific research the documentation of this information may be more formalised, and may be supported by specific processes or tools. For example, it is increasingly common for study protocols to be publicly pre-registered, and there are a number of online tools such as protocols.io, Benchling, Labstep or RSpace that can be used to record and publish experimental protocols and lab notes.

Metadata

Metadata can refer to any information about data in a general sense, but it is often used more specifically to mean a set of defined elements organised in a structured description of an information item such as a dataset. Metadata are created when a dataset is deposited in a data repository or described in a data catalogue, and will be composed of information generated at the first three levels of documentation. The metadata record enables a dataset to be discovered online and provides key information to support continued curation and use of the dataset. Core metadata properties are typically: Creator(s), Title, Publisher, Publication Year, Resource Type, and Unique Identifier (e.g. DOI). Additional properties may be included to facilitate discovery and use, such as description, keywords, temporal and geographical references, rights and licence information, and links to related publications.
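As a sketch of what such a record contains, the core properties listed above can be held as a simple structure. All the values below, including the identifier, are invented for illustration; real repositories will have their own schemas and fields:

```python
# A minimal dataset metadata record with the core properties.
# All values, including the identifier, are invented examples.
metadata = {
    "creators": ["Smith, A.", "Jones, B."],
    "title": "Example survey dataset",
    "publisher": "University of Reading",
    "publication_year": 2025,
    "resource_type": "Dataset",
    "identifier": "doi:10.0000/example",   # illustrative, not a real DOI
    # optional, discovery-oriented properties
    "keywords": ["survey", "example"],
    "rights": "CC BY 4.0",
}

def citation(m):
    """Compose a simple citation line from the core properties."""
    return (f"{'; '.join(m['creators'])} ({m['publication_year']}). "
            f"{m['title']}. {m['publisher']}. {m['identifier']}")

print(citation(metadata))
```

Gathering these properties as you go means the deposit form can be completed quickly when the time comes.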

You will not need to create a metadata record until you are at the stage of depositing your data in a relevant repository. But if you have identified a specific repository that you plan to deposit data in, it is worth familiarising yourself with their metadata requirements, so that you have all the information you need when the time comes. Disciplinary and data-type specific repositories may have particular metadata requirements. For example, if you are conducting microarray or next-generation sequencing experiments and plan to deposit data in Array Express, you should be prepared to record your experiment using the Minimum Information About a Microarray Experiment (MIAME) or Minimum Information About a Sequencing Experiment (MINSEQE) guidelines.

A database is defined in law as 'a collection of independent works, data or other materials which: a) are arranged in a systematic or methodical way, and b) are individually accessible by electronic or other means' (Copyright and Rights in Databases Regulations 1997). Therefore any collection of data made or used in the course of research is likely to constitute a database subject to legal protections for intellectual property.

Intellectual property rights (IPR) affect the way both you and others can use your and others' research data, and these issues should be considered at the outset of any research project.

Failure to clarify rights in your primary data and permissions for the use of secondary data at the start of your research can affect your ability to use and disseminate the data. It can also cause you legal trouble if you infringe another party's IPR, for example by publishing data without authorisation.

The UK Data Service provides useful guidance on rights in data.

University IP policy can be found in its Code of Practice on Intellectual Property.

If you need assistance with any research contracts that affect IPR, you should contact Research Contracts. For queries relating to commercial exploitation of IP and any related restrictions on data sharing, contact Intellectual Property and Commercialisation.

In general the following three principles apply:

  • Where no external contract exists, the University has ownership of IP created by researchers in its employment;
  • Where research is carried out under a contract or research agreement, the terms of the agreement will determine ownership and rights to exploit the data;
  • The University does not automatically own student IP, although in some circumstances students may assign IP to the University, for example, where research is carried out under third-party contract or where the data are produced with the significant involvement of University employees.

Research collaboration and partnership agreements, and industrial sponsorship/CASE studentship agreements will include IP clauses, specifying where ownership of arising IP resides. It is standard in collaboration and partnership agreements for ownership of IP to belong to the originating party. In industrial sponsorship/CASE agreements, ownership of arising IP usually resides with either the University or the sponsoring organisation.

Research contracts also have Publication clauses, which generally grant other parties the right to be notified of and have the opportunity to approve or delay any intended publication. This right exists irrespective of who owns the IP created under the contract. Deposit of data in a data repository for long-term preservation and sharing will constitute publication, so any notice requirements and lead times must be factored into planning for deposit of data. The standard notice period is 30 days.

You should always keep a copy of any legal agreements governing your research, and refer to them in your planning, and again prior to making your data publicly available, in order to ensure you are complying with your contractual obligations.

Where ownership of research data or software resides with the University, researchers are authorised under the Research Data Management Policy to make data and source code openly available, providing no commercial, legal or ethical restrictions apply. If staff at other universities are co-creators of the data, it is likely that these institutions will permit data sharing under similar policies. But you should always ensure any colleagues at other organisations who had a hand in creating the dataset or software are aware that the data or code will be published, and of the terms on which they will be made available.

While University employees may assume authorisation in principle to publish data and software source code created in research under an open licence, the particular context should always be taken into account. For example, if an objective of the research is to create a proof of concept with a view to commercialisation, there may be a very good reason not to make project IP (including relevant data) openly available. Be aware that open licences applied to data and source code cannot be revoked once they have been applied.

If students at other institutions have co-created a dataset, they should be asked to confirm who has ownership of their IP and that permission to disclose it on the specified licence terms is granted. They may need to check their institution's IP policy and any relevant sponsorship or IP assignment agreement.

Who created the data/software?

Ownership of IP follows creation, so it is important to be clear about who is and is not the creator of a dataset or piece of software. The creation of research outputs is often a collective endeavour, involving the work of many hands, which may belong to staff and students of the University, as well as employees of other organisations.

A dataset creator is someone who has had a direct creative hand in the selection and arrangement of data to form the dataset. This is not necessarily the same as being involved in the design of the research or in the collection of data. In most cases, a student's supervisor will not be a creator of a dataset, nor will a technician or contractor who has been involved in the collection of data. The creator of written code is similarly one who has had a direct authorial hand in its creation. If existing code is incorporated or adapted, then there may be subsisting IP rights that will need to be taken into account.

It should always be borne in mind that when a person is identified as a creator of a piece of intellectual property, IP rights will follow, and disclosure of the IP, for example by deposit of a dataset in a data repository, may entail seeking permission from acknowledged rights-holders.

Some research may include the creation of copyright materials by research participants, such as photographs, drawings or written work. In such a case, in order to be able to use and share the IP, you should ask your participants to transfer copyright in the materials to the University or to grant the University a licence to use and distribute the materials. You can discuss this in the Participant Information Sheet and include a consent statement such as one of the following, adapted as necessary:

I agree to transfer copyright in any material created by me in the course of this project to the University of Reading

or

I grant the University of Reading a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable licence to reproduce and share any copyright material created by me in the course of this research project, in whole or in part


Bear in mind that any materials would need to be shared without attribution if you intend to preserve the anonymity of your participants. If you wished to attribute materials to individuals, for example if an exhibition featuring works created by participants were a planned output, you would need to address in your recruitment procedure the fact that participants will be identifiable. See our guidance on consent for data sharing for more information.

IPR considerations

Secondary data are intellectual property belonging to another party, so they may only be used with that party's permission and subject to any terms specified by the provider. You should investigate any secondary data sources that will be essential to your research in advance, so that you are confident they can be used for the purposes anticipated. Check the data source for a licence or terms of use, and document these in a record of the data sources you plan to use. If you cannot find any information, contact the data provider to obtain their terms of use in writing.

Much of the research and public sector data available from public data repositories and government providers will be published under an open licence or a licence with broad re-use permissions, such as the Creative Commons Attribution licence or the Open Government Licence (widely used for public sector information). These allow data to be re-used, modified and redistributed for any purpose; often the only requirement will be that the source is acknowledged in any published re-use.

Some research and public sector data may be published under licences that place restrictions on commercial re-use and redistribution (such as the Creative Commons Attribution-NonCommercial licence).

Data containing confidential information may be supplied under a special licence or data sharing agreement that requires compliance with confidentiality and information security conditions (see below).

Commercial data products are supplied for a fee under proprietary licences that prohibit the distribution of copies or derived information.

Special conditions on use of confidential information

Any data not supplied under an open licence may have special conditions of use that will restrict what can be done with the data and any derived information, and may require organisational signature of a licence or data use agreement before data can be used.

Data that contain confidential information, such as national census microdata provided by the UK Data Service, may be supplied only once you and the University have signed a data use agreement containing confidentiality clauses, and have demonstrated that you will meet any information security conditions.

For example, in order to consult Office for National Statistics data held in the UK Data Service Secure Lab you would need to:

  • complete the Safe Researcher training course to qualify as an ONS Accredited Researcher;
  • arrange organisational signature of the Secure Access User Agreement. You would need to contact your School's Contracts Manager to have the agreement reviewed and signed;
  • complete any additional documentation specific to the data collection(s) you wish to consult;
  • either attend the Secure Lab in person at the UK Data Archive, or, if you wish to access the data remotely from on campus, ensure the PC you use is: located in a room not accessible to the public and locked when unattended; protected by a password lock with a five-minute timeout; assigned a static routable IP address. You should contact your School's IT Business Partner (login required) to arrange for assignment of a static IP address to your designated PC;
  • submit any derived datasets to the data provider for clearance before publication.

Can you distribute the data, in whole or in part, or any derived datasets?

If secondary data will be incorporated into new datasets, or used to derive new datasets, ideally you should be able to distribute the modified or derived data in support of your research outputs, for example by depositing the modified or derived dataset in a data repository.

If data have been supplied under an open licence or a standard licence for research and public sector data, then redistribution in whole or in part will be possible, unless the licence includes a No Derivatives clause (e.g. Creative Commons Attribution-NoDerivatives 4.0 International). Beware of any standard licence restrictions (e.g. Non-Commercial or Share-Alike clauses) that may dictate the terms on which a modified dataset can be shared.

Commercial data providers are unlikely to permit distribution of modified or derived datasets. Data supplied under a confidentiality agreement will not be distributable, and publication of derived data may also be prohibited, or permitted only after the derived dataset has been vetted by the data provider.

If you wish to use secondary data, consider as early as possible in your research whether you are likely to want to reproduce any subset of the data or any derived dataset as part of your eventual research outputs. If such permissions are not granted through the copyright owner's standard licence, it may be possible to negotiate them. Bear in mind that permissions may distinguish between non-commercial and commercial uses, and that there may be a cost to secure any permissions.


Survey tools can be useful data collection instruments for a variety of research and research-related purposes, but should be used with care.

Considerations of information security and legal compliance apply if you are using online instruments to process personal data. Under data protection law, the University will be the data controller for personal data collected under its authority, including by research students. This means the University has legal responsibility for these data.

Any third-party service provider collecting personal survey data on your behalf will be acting in the capacity of a data processor as defined under data protection laws. Whenever a data controller uses a data processor, a written contract must be in place so that both parties understand their responsibilities and liabilities.

This means you must use a service provider under agreed terms of service that provide specific guarantees about the processing of personal data in accordance with UK data protection laws.

The University provides access under institutional agreements to a number of tools with survey capabilities.

Online Surveys

Online Surveys is a tool provided by Jisc and designed for use in academic research. Available to University staff and students. To obtain an account, contact the Planning and Strategy Office.

UoR REDCap

REDCap is a secure web-based application for building and managing online databases and surveys. It can be used to collect data through both surveys and direct entry by members of a project team. Multiple instruments can be created in a single project, and longitudinal data collection is supported. Available to University staff and students. Visit the UoR REDCap web page for more information.

Microsoft Forms

Microsoft Forms is a basic web form application, useful for simple surveys and registration forms. Available to all University members through University Office 365.

Qualtrics

Qualtrics is an online survey tool widely used in academic research. Available to Agriculture, Policy and Development staff and students only. To obtain access, contact Giacomo Zanello.

If you wish to use other online services for collecting and processing personal data, but are unsure if they meet information security and data protection requirements, contact your DTS Business Partner (login required) for advice.

Resources