Many kinds of research today involve creating software or writing code to generate, process and analyse data. Code written by researchers may be integral to reported findings, and should be preserved and made openly accessible where possible, so that others can reproduce results and verify analyses.
Some research will also involve the creation or modification of software, which may be subject to ongoing development and use within a research or commercial context. Release of source code under an Open Source licence can be a powerful tool for building a community of users and contributors, and enabling the creation of impact.
Writing computer code is most widespread in the physical and life sciences, where programming for statistical analysis and data visualisation is common, and where some areas of research are based on computational simulation. But programming is also used in some social sciences and arts and humanities research, and where this is the case, code preservation and sharing should be considered in the same way.
Funders' expectations in respect of software and code are mostly implicit in their policies on research data, where code is considered a component of the data necessary to validate research findings. This is the case for EPSRC, for example – see guidance from the Software Sustainability Institute. NERC also recognises that model code is a valuable research output which makes the research process more transparent and auditable, and should be preserved beyond the lifetime of the project. To that end, it includes guidance on the preservation of model code and model output alongside its data policy. The Wellcome Trust explicitly includes software alongside data in its Data, software and materials management and sharing policy.
Computer code that is necessary to the validation of research findings is within scope of the University's Research Data Management Policy, and should be preserved and shared wherever possible.
In most cases short scripts and segments of code written to perform standard operations, e.g. for purposes of data processing, statistical analysis or data visualisation, can be archived (with any comments/documentation) alongside data, under the same licence as the dataset (for example, a Creative Commons Attribution licence). This approach is best suited to situations where the code is likely to have little independent use value to anyone else, and any re-use is likely to be solely for the purpose of validating results, e.g. by re-running analyses described in a paper.
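As an illustration of archiving scripts alongside data, the following is a minimal Python sketch that bundles a project folder (scripts, data and documentation) into a single zip file, adding a licence statement and a checksum manifest. The function names and the file names LICENCE.txt and MANIFEST.txt are assumptions made for this example, not requirements of any particular repository; check your chosen repository's deposit guidance for its own conventions.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_for_deposit(source_dir: Path, archive_path: Path, licence_text: str) -> list[str]:
    """Zip every file under source_dir, adding LICENCE.txt and MANIFEST.txt.

    LICENCE.txt and MANIFEST.txt are hypothetical names chosen for this sketch.
    Returns the sorted list of names written to the archive.
    """
    # Collect the files first, so the archive itself is never picked up.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    manifest_lines = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            name = path.relative_to(source_dir).as_posix()
            archive.write(path, arcname=name)
            manifest_lines.append(f"{sha256_of(path)}  {name}")
        # The licence statement travels with the code and data.
        archive.writestr("LICENCE.txt", licence_text)
        # The manifest lets a later user check that files are unchanged.
        archive.writestr("MANIFEST.txt", "\n".join(manifest_lines) + "\n")
        return sorted(archive.namelist())
```

A call such as bundle_for_deposit(Path("project"), Path("deposit.zip"), "Licensed under CC BY 4.0") would produce one file ready for upload. It is safest to write the archive to a location outside the folder being bundled.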
Where source code is more substantial or has been written in the context of an ongoing project or established community, and re-use in new contexts or further development is anticipated, a development-oriented approach to code management will be appropriate, with code released under an Open Source licence. This might be the case, for example, where code has been written to implement a simulation model, or where existing source code has been developed. There is more information about licences for software in the Open licences section of this Handbook. For detailed guidance on software licensing, consult our guide to publishing research software.
In this scenario it is best to release code (with an appropriate licence) using a code repository platform, such as the widely used GitHub, or GitLab, which is available as a University service. This will provide a version-controlled environment for ongoing public management and development of code. If specific versions of the code used to generate published results need to be preserved and referenced from publications, code files can be exported from the platform and archived (again with an appropriate licence statement) to a suitable data repository. This can be done easily from GitHub, or by a more manual export-import process to other data repositories.
For managing and sharing code under ongoing development or with multiple contributors, it is a good idea to use an online code repository platform, such as the widely used GitHub, or GitLab, which is available as a University service. Many code repository platforms are free to use (this is the case with the University GitLab), although you may need to pay for more advanced features. They provide version control, code review, bug tracking, documentation, and other features. Repositories can be private or public, so that code can be maintained during a closed development phase and then released for open use at an appropriate stage.
Online code repositories are good solutions for ongoing software development and for building a community of users and contributors. But it may also be desirable to archive a specific version of the code used to generate results reported in a paper. If a version of the code is archived to a suitable data repository, it can be assigned a DOI, making it citable.
GitHub has an integration with the Zenodo digital repository that allows you to archive a snapshot of code files and get a DOI at a click. It is also easy enough to archive code hosted in the University GitLab or any other repository, but the process is a bit more ‘manual’: you would need to export your code files and then deposit them into your chosen repository along with relevant documentation and a licence statement. Most general data sharing services such as Zenodo and figshare are suitable for archiving code files. Code files can also be deposited in the University's Research Data Archive. Our licence picker includes a selection of popular Open Source licences, and other licence options can be specified as required.
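For the manual export step, one possible approach is sketched below in Python, using the standard git archive command to export the files at a tagged version as a zip file. It assumes that Git is installed, that the code is held in a local clone of the repository, and that the version used for the publication has been tagged. The tag name "v1.0.0" and the function names are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def git_archive_command(tag: str, output: Path, prefix: str) -> list[str]:
    """Build the `git archive` command that exports the tree at `tag` as a zip."""
    return [
        "git", "archive",
        "--format=zip",
        f"--prefix={prefix}/",   # top-level folder name inside the zip
        f"--output={output}",    # where the zip file is written
        tag,                     # the tagged version to export
    ]


def export_snapshot(repo_dir: Path, tag: str, output: Path) -> Path:
    """Run `git archive` inside repo_dir and return the path of the zip written.

    Raises subprocess.CalledProcessError if git fails (e.g. unknown tag).
    """
    prefix = f"{repo_dir.resolve().name}-{tag}"
    command = git_archive_command(tag, output.resolve(), prefix)
    subprocess.run(command, cwd=repo_dir, check=True)
    return output


# Example (hypothetical paths and tag):
# export_snapshot(Path("my-model"), "v1.0.0", Path("my-model-v1.0.0.zip"))
```

The resulting zip file contains only the files tracked at that version, without the repository history, and can then be deposited together with documentation and a licence statement.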
If you have created a substantive piece of open software in your research, and you think it will be of use to others for research or other applications, or you would like to build a development community, or simply wish to gain some academic credit for your work, then you should consider publishing a software paper. This is a peer-reviewed article, published in an academic journal, which describes a piece of software that has been developed in a research context.
A software paper can be an effective means of advertising your code and encouraging others to make use of it and cite it. A software paper is also a citable output in its own right, and is a means to ensure that proper recognition is given to software developers and to the role of software development in research. Such a paper can also provide prospective users and developers of the software with valuable information about how and why it was developed, how it has been used, and how it might be used or further developed.
Bear in mind that the primary purpose of the software paper is to promote re-use, and many journals will require the software described to be available under an Open Source licence.
An example of a software paper published by University members is provided below.
There are plenty of journals that will publish software papers, including generalist publications and those serving specific disciplinary areas. Mathematics, the life sciences and the physical sciences are particularly well represented.