LibGuides: Research data management: About research data management

Research data management basics

Research data management (RDM) encompasses the collection, management, preservation and sharing of research data throughout the research lifecycle. It includes planning for data management at the grant application and research development stage, managing data on a day-to-day basis during the research project, and preserving and sharing primary data for the long term on completion of the research and publication of findings.

Many funders have policies requiring researchers to preserve and share primary data collected in the research that support findings placed on public record. The University supports these policies and in its own Research Data Management Policy requires researchers, wherever possible, to deposit data that support published research findings in a suitable data repository for long-term preservation and sharing.

RDM requirements apply to any data that are collected or used in the research process, including research software and code that may be created to generate, process and analyse research data. RDM is relevant to both primary and secondary data: while there is no requirement to preserve and share secondary data (and you may not have permission to do this in any case), various aspects of managing data will apply, including making sure the intellectual property rights of the data provider are respected.

RDM starts with data management planning. A data management plan (DMP) may need to be completed as part of a grant application, an application for ethical review, or, if you are a PGR at this University, as a requirement for confirmation of registration and annual review. It is always advisable to create a DMP as a practical tool for any research project involving the collection and use of data.

These are two definitions of research data:

'the evidence that underpins the answer to the research question' (Concordat on Open Research Data)
'recorded factual material commonly retained by and accepted in the [research] community as necessary to validate research findings' (EPSRC Policy Framework on Research Data)

Research data are the raw materials collected, processed and studied in order for the researcher to answer a research question. They are the evidential basis that substantiates published research findings.

They may be primary data generated or collected by the researcher, or secondary data collected from existing sources and used as part of the research activity. Primary data may include code created in order to generate, process or analyse data. Primary data should be preserved wherever possible in support of published findings.

In addition to the 'raw' data, the research process generates working materials such as administrative information and metadata: information about the data and the means by which they were collected or generated. This might include details of methods and instruments used, and essential interpretive and contextual information, such as specifications of variables. This information can be essential to understanding and validating research data, and often needs to be preserved along with primary data in metadata and supporting documentation.

Definitions and illustrations of primary data, secondary data and working materials are given below.

Primary data are new data collected or created by the researcher to enable them to answer a research question.

These data may be:

collected by means of experiment, e.g. in a laboratory or field experiment, which may or may not involve research participants;
observations, e.g. physical observations collected by sensors, and surveys, interviews and ethnographic observations;
the outputs of computational simulations, e.g. numerical data generated by climate models;
arrangements of data that have been obtained from other sources and compiled in a format designed by the researcher, e.g. a database containing information extracted from various sources;
software and code created as part of the research process to generate, process or analyse data, which may be necessary to enable the replication of data and analyses.

The researcher must preserve these data and make them accessible to others in support of any published research findings (including PhD theses) by deposit in a data repository wherever possible. Data collected from participants can usually be shared, with appropriate consent and anonymisation processes.

Secondary data are data that already exist and that are analysed or used as inputs into the research.

These data may be:

published literature;
materials held in archives;
datasets held in data centres and data repositories, such as national census and social science research data held by the UK Data Service;
commercial data products, e.g. the Factiva business intelligence database;
material that is published online, e.g. social media and website content;
non-public material that is provided directly to the researcher by other parties, such as an organisation’s administrative records or internal reports;
digital copies of secondary materials, which are created solely as aids to the research and not in order to produce an output such as a digital edition or collection, e.g. photographs of documents in archives, or a corpus of textual material compiled for computational analysis.

The researcher is not responsible for preserving and sharing these data. Copies of confidential and copyright materials may not be shared without permission from the owner/provider. Sources used should be referenced appropriately in the thesis and any publications.

You will need to consider issues of data management, as you would for primary data, such as: where any copies of the data will be stored, and how they will be kept secure, particularly if they contain confidential or sensitive information; and how the data will be processed, for example if data are being combined from multiple sources, used as inputs into modelling activities, or transformed in any way as part of the research.

It is recommended that as part of your data management planning you prepare a list of the key data sources you will use in your research, with full references by DOI or other persistent identifier where possible. For each data source, record the terms of use, and whether the data will be consulted only, or will be incorporated into data outputs intended for distribution in support of project findings.
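
The list of data sources recommended above could be kept as a simple structured register. The following is a minimal sketch in Python, not a prescribed format: the `DataSource` fields, the `register_to_csv` helper and the two usage labels ("consulted" and "incorporated") are assumptions made for illustration, and the entry in the usage note is a made-up placeholder rather than a real dataset.

```python
import csv
import io
from dataclasses import asdict, dataclass

# Hypothetical labels for how a source is used in the project.
ALLOWED_USAGE = {"consulted", "incorporated"}


@dataclass
class DataSource:
    """One entry in a register of secondary data sources (illustrative structure)."""
    title: str
    identifier: str    # DOI or other persistent identifier, where one exists
    terms_of_use: str  # licence or access conditions stated by the provider
    usage: str         # "consulted" or "incorporated" (into data outputs)


def register_to_csv(sources):
    """Return the register as CSV text, rejecting unknown usage labels."""
    for source in sources:
        if source.usage not in ALLOWED_USAGE:
            raise ValueError(f"Unknown usage label: {source.usage!r}")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["title", "identifier", "terms_of_use", "usage"],
        lineterminator="\n",
    )
    writer.writeheader()
    for source in sources:
        writer.writerow(asdict(source))
    return buffer.getvalue()
```

For example, `register_to_csv([DataSource("Example dataset", "doi:10.1234/placeholder", "Licence terms here", "consulted")])` returns a header row followed by one row for the placeholder entry. A spreadsheet with the same four columns would serve equally well.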

Working materials are administrative documentation and materials created by the researcher as aids to the research process.

These may be:

reading notes and quotations from secondary sources;
administrative documents related to the creation or use of data, e.g. participant recruitment documentation, data collection and processing protocols, data collection instruments such as surveys, and agreements and contracts.

As a rule, the researcher is not responsible for preserving and sharing these materials, but may need to include information/documentation relating to data collection and processing as part of shared datasets, e.g. a description of the data collection methodology, a survey questionnaire, a data dictionary.

Key elements of research data management include:

understanding the project context: the type of project, the people, organisations, funders or sponsors involved, and any policies, contracts or agreements, are all factors that will have a bearing on how data are managed
defining data management roles and responsibilities, so that data are consistently and accountably managed within the project
storing data securely so that they are protected against corruption and loss, and unauthorised access
organising data, using meaningful file names and logical folder structures, and applying version control to modified files, to enable effective working
using appropriate data structures, formats and standards for data, to support effective use and interoperability with recognised standards where these exist
applying quality controls, so that the integrity of data is maintained and the incidence and impact of error is minimised
documenting data, so that you (and others) can understand what the data are, how they were collected/generated, and how they have been processed and analysed
using appropriate protocols to process personal and confidential data, to ensure you are meeting the requirements of the Data Protection Act and your ethical obligations
handling intellectual property rights in data, whether primary data created by you or secondary data belonging to other parties
preserving and sharing primary data collected in the research using suitable data repositories, so that they can be consulted and re-used by other researchers
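
Two of the elements above, meaningful file names with version numbers and protection against corruption, can be illustrated with a short Python sketch. The naming pattern (project, description, date, version) is only one possible convention, and the function names `make_file_name` and `sha256_of_stream` are hypothetical. Recording a checksum when a file is stored and recomputing it later is a common way to detect whether the file has changed.

```python
import hashlib
import re
from datetime import date


def make_file_name(project, description, day, version, extension):
    """Build a name such as 'soil-study_field-observations_2024-01-31_v02.csv'."""
    def slug(text):
        # Lower-case the text and replace runs of other characters with hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    suffix = extension.lstrip(".")
    return f"{slug(project)}_{slug(description)}_{day.isoformat()}_v{version:02d}.{suffix}"


def sha256_of_stream(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary file-like object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

A file on disk can be checked with `with open(path, "rb") as handle: sha256_of_stream(handle)`. If the digest recorded at the time of storage no longer matches, the file has been modified or corrupted.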

A data management plan (DMP) is likely to cover some or all of the elements highlighted above, depending on the purpose for which it is created. For example, a DMP required for ethical review will be focused mainly on the safe management of data collected from research participants; a DMP created as a practical tool to support a project team will be more comprehensive and detailed.

Research data management is especially important when applied to primary data, i.e. new data collected or generated in the research activity. Because these are new and in many cases are essential to validation of your research findings, it is important to ensure they are properly curated from the outset and that they are preserved and made accessible when research findings are placed on public record.

While researchers are not responsible for the preservation and sharing of secondary data used in research, they will still need to consider a number of issues, including: how and on what terms the data are to be accessed and used; where and how any copies of data will be stored; and whether the data provider allows copies of the data or derived data to be distributed.

Data are your research capital. They enable you to answer your research questions; they provide the evidence base for the results you make public; they may have ongoing value to you and to others; and where they are used, they can be cited to your benefit. By actively managing your research data you will:

make life easier: well-organised data management increases your efficiency, and saves time and effort in the long run
protect yourself and others: you can reduce the risk of costly/embarrassing/damaging accidents, such as losing data, or disclosing confidential data
preserve the integrity of your research: well-documented data demonstrate the authenticity of your research and the reliability of your findings
realise the full value of your data: data that are preserved and accessible in the long term can be re-used to your benefit and others'

It can be helpful to think of research data management in terms of a research data lifecycle and the activities that take place at different stages of the cycle.

Plan

Here you will identify the data that will be collected or used to answer your research question, and plan for data management throughout the lifecycle. This is the stage at which a data management plan would be created.

Collect

This is the stage at which experiments are carried out, observations made, secondary materials acquired, etc. This will involve documentation of data collection instruments and methods and information necessary to interpret and use the data.

Process

Data may need to be processed in order to be usable. This might involve cleaning data, combining data from separate collection events, transforming data from one state to another (e.g. format conversion), and using procedures to validate or quality-control data. Any data processing will need to be documented, such that the end result could be replicated from the raw data.
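
One way to document processing so that it can be replicated is to run each step as a named function and record what it did. The Python sketch below assumes data held as a list of dictionaries with a `value` field. The helper names (`run_pipeline`, `drop_missing`, `to_float`) and the log fields are illustrative assumptions, not a required method.

```python
def drop_missing(rows):
    """Cleaning step: remove rows whose 'value' field is empty or missing."""
    return [row for row in rows if row.get("value") not in (None, "")]


def to_float(rows):
    """Transformation step: convert the 'value' field from text to a number."""
    return [{**row, "value": float(row["value"])} for row in rows]


def run_pipeline(rows, steps):
    """Apply each (name, function) step in order and log row counts for each."""
    log = []
    for name, step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({"step": name, "rows_in": rows_in, "rows_out": len(rows)})
    return rows, log


raw = [{"id": 1, "value": "2.5"}, {"id": 2, "value": ""}, {"id": 3, "value": "4"}]
processed, log = run_pipeline(raw, [("drop_missing", drop_missing), ("to_float", to_float)])
```

Here `raw` is left unchanged, `processed` holds the two cleaned rows, and `log` records that `drop_missing` reduced three rows to two. Saving the log and the script alongside the raw data gives a record from which the processed data could be regenerated.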

Analyse

Data analysis is the stage at which the raw materials of research are interrogated to produce the insights that constitute the research findings. Instruments and methods used for analysis should be documented; code written for purposes of data analysis may need to be preserved and made available in support of research results.

Preserve

Towards the completion of your research you will preserve for the long term the primary data that substantiate your research findings. Data will need to be prepared for preservation and deposited in a suitable data repository. Preservation activities may involve quality assurance, file format conversion, creation of metadata records with assignment of Digital Object Identifiers (DOIs) to datasets, licensing data for re-use, and implementing any required access controls.

Research outputs based on data (including PhD theses) should include a data availability statement indicating where and on what terms the supporting data can be accessed. A data repository will enable discovery of the data in its care by exposing the metadata online, and will provide access to the data when this is permitted. Data may be made publicly available, or restrictions on access may be imposed where data are of a sensitive or confidential nature.
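
As an illustration, a data availability statement can be assembled from a few details of the deposited dataset. The Python sketch below is an assumption about what such a statement might contain: the `DatasetRecord` fields, the wording of the statements and the two access labels are hypothetical, and publishers or funders may require specific wording. The only external convention relied on is that a DOI can be written as a link by prefixing it with `https://doi.org/`.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal description of a deposited dataset (illustrative fields only)."""
    repository: str
    doi: str
    licence: str
    access: str  # "open" or "restricted"


def availability_statement(record):
    """Compose a short data availability statement for a research output."""
    url = f"https://doi.org/{record.doi}"
    if record.access == "open":
        return (
            f"The data supporting this study are openly available from "
            f"{record.repository} at {url} under a {record.licence} licence."
        )
    if record.access == "restricted":
        return (
            f"The data supporting this study are held in {record.repository} "
            f"at {url}. Access is restricted; the conditions of access are "
            f"described in the repository record."
        )
    raise ValueError(f"Unknown access label: {record.access!r}")
```

For instance, a record with the placeholder values `DatasetRecord("Example Data Repository", "10.1234/placeholder", "CC BY 4.0", "open")` yields a sentence naming the repository, the DOI link and the licence.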

Re-use

Data that are available for discovery and access may be re-used by other researchers, either to substantiate the findings of the original research, or to inform new research activities. Research data may also have other valuable uses, e.g. in policy-making, development of commercial products and services, and teaching.
