Research data management (RDM) encompasses the collection, management, preservation and sharing of research data throughout the research lifecycle. It includes planning for data management at the grant application and research development stage, managing data on a day-to-day basis during the research project, and preserving and sharing primary data for the long term on completion of the research and publication of findings.
Many funders have policies requiring researchers to preserve and share primary data collected in the research that support findings placed on public record. The University supports these policies and in its own Research Data Management Policy requires researchers, wherever possible, to deposit data that support published research findings in a suitable data repository for long-term preservation and sharing.
RDM requirements apply to any data that are collected or used in the research process, including research software and code that may be created to generate, process and analyse research data. RDM is relevant to both primary and secondary data: while there is no requirement to preserve and share secondary data (and you may not have permission to do this in any case), various aspects of managing data will apply, including making sure the intellectual property rights of the data provider are respected.
RDM starts with data management planning. A data management plan (DMP) may need to be completed as part of a grant application, an application for ethical review, or, if you are a postgraduate researcher (PGR) at this University, as a requirement for confirmation of registration and annual review. It is always advisable to create a DMP as a practical tool for any research project involving the collection and use of data.
These are two definitions of research data:
Research data are the raw materials collected, processed and studied in order for the researcher to answer a research question. They are the evidential basis that substantiates published research findings.
They may be primary data generated or collected by the researcher, or secondary data collected from existing sources and used as part of the research activity. Primary data may include code created in order to generate, process or analyse data. Primary data should be preserved wherever possible in support of published findings.
In addition to the 'raw' data, the research process generates working materials such as administrative information and metadata: information about the data and the means by which they were collected or generated. This might include details of methods and instruments used, and essential interpretive and contextual information, such as specifications of variables. This information can be essential to understanding and validating research data, and often needs to be preserved along with primary data in metadata and supporting documentation.
Primary data, secondary data and working materials are each defined below.
Primary data are new data collected or created by the researcher to enable them to answer a research question.
These data may be:
Wherever possible, the researcher must preserve these data and make them accessible to others, in support of any published research findings (including PhD theses), by depositing them in a data repository. Data collected from participants can usually be shared, given appropriate consent and anonymisation processes.
Secondary data are data that already exist and that are analysed or used as inputs into the research.
These data may be:
The researcher is not responsible for preserving and sharing these data. Copies of confidential and copyright materials may not be shared without permission from the owner or provider. Sources used should be referenced appropriately in the thesis and any publications.
You will need to consider issues of data management, as you would for primary data, such as: where any copies of the data will be stored, and how they will be kept secure, particularly if they contain confidential or sensitive information; and how the data will be processed, for example if data are being combined from multiple sources, used as inputs into modelling activities, or transformed in any way as part of the research.
It is recommended that as part of your data management planning you prepare a list of the key data sources you will use in your research, with full references by DOI or other persistent identifier where possible. For each data source, record the terms of use, and whether the data will be consulted only, or will be incorporated into data outputs intended for distribution in support of project findings.
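A source list of this kind can be kept in any format, including a simple spreadsheet. As one possible illustration, the Python sketch below holds each source as a record and writes the list out as CSV. The field names, the example entries and the placeholder identifiers are all assumptions made for the illustration, not a required template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataSource:
    """One entry in a list of key data sources (field names are illustrative)."""
    title: str
    identifier: str                # DOI or other persistent identifier
    terms_of_use: str              # summary of the provider's terms
    incorporated_in_outputs: bool  # False means the source is consulted only


def sources_to_csv(sources):
    """Return the source list as CSV text, one row per source."""
    buffer = io.StringIO()
    fieldnames = [f.name for f in fields(DataSource)]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for source in sources:
        writer.writerow(asdict(source))
    return buffer.getvalue()


def sources_for_distribution(sources):
    """Pick out sources whose terms need checking before data outputs are shared."""
    return [s for s in sources if s.incorporated_in_outputs]


# Hypothetical entries; the identifiers are placeholders, not real DOIs.
register = [
    DataSource(
        title="Example national survey extract",
        identifier="doi:10.0000/placeholder-1",
        terms_of_use="Redistribution of derived data allowed with attribution",
        incorporated_in_outputs=True,
    ),
    DataSource(
        title="Example licensed market dataset",
        identifier="doi:10.0000/placeholder-2",
        terms_of_use="No redistribution",
        incorporated_in_outputs=False,
    ),
]

print(sources_to_csv(register))
```

Keeping the "consulted only" flag alongside the terms of use makes it straightforward to pick out the sources whose terms must be checked before any data outputs are distributed.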
Working materials are administrative documentation and materials created by the researcher as aids to the research process.
These may be:
As a rule, the researcher is not responsible for preserving and sharing these materials, but may need to include documentation relating to data collection and processing as part of shared datasets, e.g. a description of the data collection methodology or a survey questionnaire.
Key elements of research data management include:
A data management plan (DMP) is likely to cover some or all of the elements highlighted above, depending on the purpose for which it is created. For example, a DMP required for ethical review will be focused mainly on the safe management of data collected from research participants; a DMP created as a practical tool to support a project team will be more comprehensive and detailed.
Research data management is especially important when applied to primary data, i.e. new data collected or generated in the research activity. Because these data are new and in many cases are essential to the validation of your research findings, it is important to ensure they are properly curated from the outset and that they are preserved and made accessible when research findings are placed on public record.
While researchers are not responsible for the preservation and sharing of secondary data used in research, they will still need to consider a number of issues, including: how and on what terms the data are to be accessed and used; where and how any copies of the data will be stored; and whether the data provider allows copies of the data or derived data to be distributed.
Data are your research capital. They enable you to answer your research questions; they provide the evidence base for the results you make public; they may have ongoing value to you and to others; and where they are used, they can be cited to your benefit. By actively managing your research data you will:
It can be helpful to think of research data management in terms of a research data lifecycle and the activities that take place at different stages of the cycle.
At the planning stage you will identify the data that will be collected or used to answer your research question, and plan for data management throughout the lifecycle. This is the stage at which a data management plan would be created.
Data collection is the stage at which experiments are carried out, observations made, secondary materials acquired, etc. This will involve documenting the data collection instruments and methods, and recording the information necessary to interpret and use the data.
Data may need to be processed in order to be usable. This might involve cleaning data, combining data, transforming data from one state to another (e.g. format conversion), and using procedures to validate or quality-control data. Any data processing will need to be documented, such that the end result could be replicated from the raw data.
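One way to document processing so that it can be repeated is to script each step and record what was run. The Python sketch below is an illustration only: the step functions, field names and example records are hypothetical, and the log format is an assumption rather than a required standard.

```python
import hashlib
import json


def fingerprint(records):
    """SHA-256 of the records in a stable JSON form, used to identify a data state."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def drop_missing(records):
    """Cleaning step: discard records with no measurement."""
    return [r for r in records if r["length_m"] is not None]


def metres_to_millimetres(records):
    """Transformation step: convert the measurement and rename the field."""
    return [{"id": r["id"], "length_mm": r["length_m"] * 1000} for r in records]


def run_pipeline(raw, steps):
    """Apply each step in order and log the data state before and after it."""
    log = []
    data = raw
    for step in steps:
        before = fingerprint(data)
        data = step(data)
        log.append({
            "step": step.__name__,
            "input_sha256": before,
            "output_sha256": fingerprint(data),
            "records_out": len(data),
        })
    return data, log


# Hypothetical raw data: three measurements, one of them missing.
raw = [
    {"id": 1, "length_m": 2},
    {"id": 2, "length_m": None},
    {"id": 3, "length_m": 5},
]
steps = [drop_missing, metres_to_millimetres]

processed, log = run_pipeline(raw, steps)
print(json.dumps(log, indent=2))
```

Because the steps leave the raw records unchanged and the log records a fingerprint of the data before and after each step, rerunning the pipeline on the same raw data should reproduce both the processed data and the log.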
Data analysis is the stage at which the raw materials of research are interrogated to produce the insights that constitute the research findings. Instruments and methods used for analysis should be documented; code written for purposes of data analysis may need to be preserved and made available in support of research results.
Towards the completion of your research you will preserve for the long term the primary data that substantiate your research findings. Data will need to be prepared for preservation and deposited in a suitable data repository. Preservation activities may involve quality assurance, file format conversion, creation of metadata records with assignment of Digital Object Identifiers (DOIs) to datasets, licensing data for re-use, and implementing any required access controls.
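Before deposit, it can help to check that the descriptive information for a dataset is complete. The Python sketch below shows one possible check on a draft metadata record. The field names, the access levels and the example values are assumptions made for illustration; a data repository will define its own metadata requirements, and these should take precedence.

```python
# Fields this sketch treats as required; hypothetical, not a repository schema.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "description",
    "licence",
    "access_level",
    "file_formats",
)

# Access levels assumed for the illustration.
ACCESS_LEVELS = {"open", "restricted"}


def check_record(record):
    """Return a list of problems found in a draft metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Empty strings and empty lists count as missing.
        if not record.get(field):
            problems.append(f"missing: {field}")
    level = record.get("access_level")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"unknown access_level: {level}")
    return problems


# Hypothetical draft record with the description still to be written.
draft = {
    "title": "Example survey dataset",
    "creators": ["A. Researcher"],
    "description": "",
    "licence": "CC BY 4.0",
    "access_level": "open",
    "file_formats": ["csv"],
}

problems = check_record(draft)
print(problems)
```

An empty list from `check_record` indicates only that the fields assumed here are filled in; it does not show that the record meets any particular repository's requirements.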
Publications based on data should include a data availability statement indicating where and on what terms the data can be accessed. A data repository will enable discovery of the data in its care by exposing the metadata online, and will provide access to the data when this is permitted. Data may be made publicly available, or restrictions on access may be imposed where data are of a sensitive or confidential nature.
Data that are available for discovery and access may be re-used by other researchers, either to substantiate the findings of the original research, or to inform new research activities. Research data may also have other valuable uses, e.g. in policy-making, development of commercial products and services, and teaching.