LibGuides: Open Research Handbook: Reproducibility

About reproducibility

Discussion of the 'reproducibility crisis' in scientific research has highlighted high rates of failure to replicate results of published studies. A survey of researchers published in Nature in 2016 reported that more than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments. Research whose results cannot be reproduced is unreliable and wasteful. In 2015 it was estimated that irreproducible biology research costs USD 28 billion per year.

Various reasons have been adduced for low rates of reproducibility, including poor reporting of research methods, weaknesses in study design and statistical analysis, and failure to provide access to data and software code supporting published results.

Critics also point to fundamental flaws in the academic reward system, which overwhelmingly values the rapid publication of novel results in high-impact journals, and lacks rigorous, systematically applied reproducibility standards. Researchers are incentivised to take the shortest route to publication, to over-report significance and to under-substantiate results. It is argued that if the reward system were to put a higher premium on verifiability, and if researchers were more motivated to make the hypotheses, methods and data supporting scientific findings open, they would be more likely to produce reproducible and reliable research, and the levels of waste and risk of fraud would be reduced.

The case for reform is being actively promoted by many researchers across the empirical sciences. In A manifesto for reproducible science, a group of concerned researchers proposes a series of measures that stakeholders in scientific research, including researchers, research organisations, funders and publishers, can take to improve research efficiency and the robustness of scientific findings.

Here we outline some of the key steps you can take to increase the reproducibility of your research.

7 top tips for reproducibility

1. Write a Data Management Plan

Reproducibility begins with planning. Writing a Data Management Plan (DMP) at the outset of a project can help you to maximise the reproducibility of your research. Some funders (including most Research Councils, the European Commission, the Royal Society and the Wellcome Trust) will ask you to submit a DMP as part of a grant application. Advantages of writing a DMP are:

It helps you to plan how the data you collect or generate will be managed both during the project and for the long term, and identifies at an early stage requirements that need to be addressed, for example, the need to obtain consent for data sharing;
Where data are managed within a research group or in a partnership, it helps to document roles and responsibilities, so that data are managed efficiently and consistently to agreed standards;
In collaborative research activities it can help to establish Intellectual Property Rights and data ownership, and permitted uses of the data by others, so that confusions or disagreements over ownership and use of the data can be avoided;
It allows you to identify the costs of data management activities, which you may be able to recover through your grant.

The University provides guidance on data management planning, including guides on writing DMPs for specific funders.

2. Use electronic tools to document experimental protocols and lab notes

You should document and publish your experiments in sufficient detail to allow experiments to be replicated. DMPs and protocols should be created at the outset and kept under review - more detail can be added as you engage with the day-to-day practicalities of running your experiments and managing data.

Instead of maintaining closed documentation and using paper lab notebooks, consider using online tools such as protocols.io, Benchling, Labstep or RSpace to record and publish experimental protocols and lab notes. These browser-based tools can be used for free (with advanced features available on subscription), and provide an efficient means for an individual or group to record, annotate and publish detailed information about experimental procedures. For example, using protocols.io you can develop and annotate your protocols (in a closed group or in public) over time in a version-controlled process, and published versions can be assigned DOIs and linked from the methods section in a paper. The RSpace online lab notebook is integrated with protocols.io, and notebooks can be archived to repositories such as figshare and Dataverse for public sharing.

There is a wide variety of applications to suit a range of different research methods and areas.

3. Pre-register your study design

In some areas of research, notably in the health and psychological sciences, practices are becoming established for the registration of study hypotheses and protocols in advance of undertaking the research. The rationale for this is to provide transparency about the research methods used, and to eliminate poor practice, such as hypothesising after the results are known (HARKing) and cherry-picking of results to ‘create’ or exaggerate significance. Registration of clinical trials is mandatory in many countries, and growing numbers of researchers are using platforms such as the Open Science Framework to register study protocols.

Public registration of hypotheses and protocols can establish the priority of a research approach and safeguard the integrity of results. Various models for introducing formal peer review of research processes into earlier stages of the research pathway have also emerged. This can increase the quality of study design and the reliability/reproducibility of results. It also provides a solution to the phenomenon of publication bias - where the decision to publicise or disseminate research is based on the perceived significance or interest of the results. A number of publishers now offer registered reports options, by means of which researchers can submit a study design for peer review and on acceptance receive a commitment from the journal to publish the final results.

4. Be computationally reproducible

There is almost no research that cannot become more efficient and reproducible by the intelligent use of computational methods. Computers, unlike humans, do what they are told to do in exactly the same way, again and again and again, and preserve a record of doing it.

Instead of pointing and clicking, script your workflows for generating, downloading, processing, and analysing data. Use a free, open programming language so that anyone can rerun your scripts. Languages such as Python and R are fully open and universally accessible, unlike some proprietary languages, such as SPSS syntax, which require a software licence to be used.
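
As an illustration of what a scripted workflow can look like, here is a minimal Python sketch that loads raw data, cleans it, and summarises it using only the standard library. The data, the column names and the function names (load_rows, group_means, bootstrap_mean) are invented for this example and are not taken from any particular project. The random step uses a fixed seed, which is one simple way to make sure that rerunning the script gives the same numbers.

```python
import csv
import io
import random
import statistics

# Hypothetical raw data; in practice this would be read from a file on disk.
RAW_CSV = """sample,group,value
s1,control,4.1
s2,control,3.9
s3,treated,5.2
s4,treated,5.6
s5,treated,
"""


def load_rows(text):
    """Parse the CSV text and drop rows with a missing measurement."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"sample": r["sample"], "group": r["group"], "value": float(r["value"])}
        for r in reader
        if r["value"].strip()
    ]


def group_means(rows):
    """Return the mean value per group, with groups in sorted order."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["value"])
    return {name: statistics.mean(values) for name, values in sorted(groups.items())}


def bootstrap_mean(values, n_resamples=1000, seed=42):
    """Bootstrap the mean with a fixed seed so reruns give identical numbers."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.mean(means)


if __name__ == "__main__":
    rows = load_rows(RAW_CSV)
    print(group_means(rows))
    print(bootstrap_mean([r["value"] for r in rows]))
```

Because every step from raw data to summary is written down as code, the cleaning rule (dropping rows with no measurement) is documented by the script itself rather than living in someone's memory of a spreadsheet session.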

Remember to use a version control system to manage your code, and make sure the code is well commented. Popular choices are the Git-based platforms GitHub and GitLab; GitLab is available through a dedicated University server. Preserve and document the code you write so that others can reproduce your workflows and analyses.
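
One way to connect version control to your results is to record which version of the code produced them. The Python sketch below is an illustrative example only: the function names (current_commit, provenance_record) and the fields recorded are our own assumptions, not a standard. It assumes the script is run inside a Git repository with the git command available, and it records None for the commit if that is not the case.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def current_commit():
    """Return the current Git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or this directory is not a Git repository.
        return None
    return result.stdout.strip()


def provenance_record():
    """Collect basic facts about the code version and computing environment."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command": sys.argv,
        "git_commit": current_commit(),
    }


if __name__ == "__main__":
    # Print the record as JSON; it could equally be saved next to the results.
    print(json.dumps(provenance_record(), indent=2))
```

Saving a record like this alongside each set of outputs makes it possible to check out the same commit later and rerun the analysis in a comparable environment.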

If you don't know where to start, read Five selfish reasons to work reproducibly and Ten simple rules for reproducible computational research. The Software Sustainability Institute provides plenty of guidance to help you get up to speed. The Turing Way project is developing a handbook for reproducible data science. Lastly, look for help within the University: SPCLS runs a Coding Club.

5. Get your statistics right

Are your statistics up to scratch? Is your experiment sufficiently powered? Common errors in reporting of research findings arise from lack of statistical and methodological rigour. As the authors of A manifesto for reproducible science report, 'the interpretation of P values, limitations of null-hypothesis significance testing, the meaning and importance of statistical power, the accuracy of reported effect sizes, and the likelihood that a sample size that generated a statistically significant finding will also be adequate to replicate a true finding, could all be addressed through improved statistical training'. If you are not confident in your skills, take steps to educate yourself. There are plenty of online courses in statistical methods for research.
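
To make the idea of statistical power concrete, the Python sketch below estimates how many participants per group are needed to compare two group means. It uses the common normal approximation n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2, where d is the standardised effect size (Cohen's d). The function name sample_size_per_group and its default values are chosen for this example; an exact calculation based on the t distribution gives slightly larger numbers, so treat the result as a rough guide rather than a substitute for a proper power analysis.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    Uses the normal approximation, so it slightly underestimates the
    sample size required by an exact t-test calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Conventional "small", "medium" and "large" standardised effect sizes.
    for d in (0.2, 0.5, 0.8):
        print(d, sample_size_per_group(d))
```

Under this approximation a medium effect (d = 0.5) at 5% significance and 80% power needs 63 participants per group, and the required sample grows quickly as the expected effect shrinks. This is why underpowered studies are a common source of findings that fail to replicate.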

6. Share your data and code using repositories

To be transparent and reproducible, published research findings must be backed up by openly accessible supporting data and code. Supporting materials should be preserved and made available using suitable repositories, and referenced from related publications by means of a DOI citation. See the sections of this guide on Open research data and Open research software and code for further guidance.

7. Use the power of peer review

You don't have to be tied to the traditional model of publication-stage closed peer review. You can bring the power of peer review to bear on your study design and methods by using a pre-registration or registered reports model. You can publish a preprint of your findings and invite feedback from your peer network, or submit your work to a journal that operates an open peer review system. This can make the peer review process more transparent and improve its quality.

Reproducibility resources

The Turing Way
A handbook for reproducible data science.
SPCLS Coding Club
Contains information about sessions and course materials: using University computing resources, coding for complete beginners, Unix basics, introduction to R and Python.
A manifesto for reproducible science
A call to action and checklist of practices that can be used to improve reproducibility.
protocols.io
A collaborative platform and preprint server for documenting and sharing methods.
Electronic Lab Notebooks - for prospective users
A useful overview of different Electronic Lab Notebook products.
Open Science Framework
A popular platform for reproducible research, which can be used for study pre-registrations and to share preprints, and provides collaborative functionality with file sharing. OSF includes integrations with cloud storage, code repository and data archiving services.
Registered reports
A complete guide to registered reports, with an up-to-date list of participating journals.
UoR GitLab guide
A guide to using the University GitLab server to manage your code, with a link to the service.
UK Reproducibility Network
The UK Reproducibility Network (UKRN) is a peer-led consortium that promotes the adoption of reproducible research policies and practices in UK research organisations. The Local Lead at the University of Reading is Etienne Roesch, Associate Professor in Psychology.