Reproducibility SOLUTIONS: Taking the Pulse
After brief introductions and background on the goals of the Emerging Interest Group (EIG) on Reproducibility and Replicability and the P-WG, the group expressed its priorities for the conversation. We bring you a summary of the conversation here. For each of the topics we discussed — finding the right tools, optimizing their use, and supporting their sustainability — we highlight the key questions.
Finding the right tools
Is there an inventory of reproducibility-enabling or -enhancing tools? What criteria do / can we use to assess tools and solutions? (e.g., widespread use, open source, interoperability, FAIR)
It was noted that the P-RECS Workshop, which focuses heavily on practical, actionable aspects of reproducibility, is inviting submissions for P-RECS’21 and suggesting the following tools to automate experiments (not an exhaustive list): CK, CWL, Popper, ReproZip, Sciunit, Sumatra.
- What standards should reproducibility tools meet?
An important consideration is the standards the community would want tools to meet. In addition to the standards mentioned in our pre-meeting blog post (widespread use, open source, interoperability, FAIR), participants suggested two other standards: simplicity and veracity.
Simplicity captures the idea that a tool should be easy to use and easy to integrate with current research practices, increasing the likelihood that researchers will adopt it.
Veracity refers to the tool’s ability to execute computation as part of the full research cycle and to verify the results: not only generating a result, but generating the same result as originally claimed. This ties in nicely with current conversations around reproducibility and quality assessment (QA) and quality control (QC), for example, the ACM Policy on Artifact Review and Badging in computer science, and work on how journals address concerns about the quality and rigor of computational research (Willis & Stodden, 2020).
This also raises a practical question of how to verify results at scale. A comment was made that the ability to determine comparability of results based on an automated comparison should be added to a “reproducibility wish list.”
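To make the idea concrete, here is a minimal sketch of what such an automated comparison might look like. The function names and the tolerance value are illustrative assumptions, not part of any tool discussed at the meeting; the point is that bit-for-bit checks and tolerance-based checks answer different questions about whether a reproduced result matches the original claim.

```python
import hashlib
import math

def file_checksum(path):
    """Bit-for-bit comparison: identical files yield identical SHA-256 digests."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def results_comparable(original, reproduced, rel_tol=1e-9):
    """Compare two lists of numeric results within a relative tolerance.

    Bitwise equality is often too strict: floating-point results can differ
    across platforms or library versions while still supporting the same claim.
    """
    if len(original) != len(reproduced):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol)
               for a, b in zip(original, reproduced))

# A reproduced value that differs only at machine precision still "matches":
print(results_comparable([0.1 + 0.2], [0.3]))  # True under the tolerance
```

Deciding which kind of comparison is appropriate for a given artifact (exact bytes, numeric tolerance, or a statistical test) is itself part of the "reproducibility wish list" question raised above.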
- How can standards be applied at scale?
- Who should be entrusted to do a public evaluation of reproducibility tools?
Optimizing the use of reproducibility tools
How can we collaborate towards an interoperable ecosystem of tools to satisfy different reproducibility needs? Do we want to discourage one-off tools? Do we want to aim toward a unified system that can accommodate specialized tools?
With a proliferation of tools, it may be challenging for researchers to know what to use. Common tools and home-grown solutions may meet immediate needs but fall short on other aspects of reproducibility (e.g., openness).
Participants expressed the opinion that no single tool provides a comprehensive solution for reproducibility due to the variety and complexity of issues and contexts, and that there is a need to create an ecosystem of reproducibility tools.
- How do we map the boundaries of an ecosystem of reproducibility tools?
- How do we build an ecosystem of reproducibility tools?
Whether a formal ecosystem of reproducibility tools emerges, or simply a list of available tools, is both a conceptual and a practical question. It was suggested that a useful first step toward conceptualizing an ecosystem might be to categorize reproducibility tools, for example:
By the phase in the research cycle: “reproducibility in hindsight” vs “planning for reproducibility.”
By the main function: Web-based integrated development environments (IDEs) (e.g., WholeTale), Web-based replay systems (e.g., Binder), packaging tools (e.g., ReproZip), containers (e.g., Docker, Singularity).
By level of specificity: tools that capture the computation steps vs the computation environment.
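To illustrate the last distinction: a tool focused on the computation environment records what software the work ran on, while a tool focused on computation steps records what was done, in what order, with which inputs and outputs. The sketch below contrasts the two; the snapshot and step formats are hypothetical illustrations, not the output of any specific tool named above.

```python
import platform
import sys
from importlib import metadata

def environment_snapshot():
    """Capture the computation *environment*: interpreter, OS, installed packages."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {dist.metadata["Name"]: dist.version
                     for dist in metadata.distributions()},
    }

def record_step(log, command, inputs, outputs):
    """Capture a computation *step*: what was run, on what, producing what."""
    log.append({"command": command, "inputs": inputs, "outputs": outputs})
    return log

# Hypothetical example: one cleaning step, plus the environment it ran in.
steps = record_step([], "clean.py", ["raw.csv"], ["clean.csv"])
snapshot = environment_snapshot()
print(sorted(snapshot.keys()), steps[0]["command"])
```

A full reproducibility solution typically needs both kinds of capture, which is one reason participants saw an ecosystem of complementary tools, rather than a single tool, as the realistic goal.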
It was also suggested at the meeting that workflows need to be considered. Progress mentioned includes work to identify the crucial research challenges related to workflows (e.g., the Workflows Community Summit; Ferreira da Silva et al., 2021), work to define canonical components in the research lifecycle (Hardisty & Wittenburg, 2020), and efforts to capture information about workflows such as provenance and performance metrics (e.g., Pouchard et al., 2019). Workflow tools are commonly used in some disciplines (e.g., biotech, machine learning, computing systems), and it was suggested that these communities could be brought together to collaborate around issues of reproducibility.
Supporting the sustainability of reproducibility tools
What are the lessons of recent efforts around research software sustainability?
Efforts around research software sustainability are highly relevant to this community. The time horizon for functional software is typically shorter than it is for data, with major implications for computational reproducibility. A working group at the Research Data Alliance is a good source of information about these issues.
- What assurance does the community want to have about the long-term usability of reproducibility tools?
- Who holds the responsibility for sustaining research software that supports reproducibility?
In order to bring the use of reproducibility tools to the fore, and therefore help make them visible to funders and potentially increase the likelihood that they will be sustained, the use of a “reproducibility plan” was suggested. This idea, and the educational and training effort required to support it, will be discussed at a future “Taking the Pulse” conversation.
This is the second in a series of open community meetings in which the P-WG explores issues related to reproducibility and documents the views of different communities, culminating in recommendations informed by current practices. See our previous post on reproducibility principles. Each month we focus on a particular topic related to reproducibility: principles, solutions, training, publishing, and preservation.
By Limor Peer and Vicky Rampin
April 30, 2021