Reproducibility Training & Education: Taking the Pulse

The community meeting on reproducibility training and education took place on May 20, 2021 (see meeting notes, slides, and invitation).

In this post, we will summarize the main themes that emerged from the conversation and highlight key remaining questions. 

The goal of reproducibility training and education

An important distinction needs to be recognized between the goals of instilling basic values and ensuring good (or best) practices on an ongoing basis. Both goals are essential and each has its own set of considerations. To instill basic values of responsible conduct, scientific rigor, and research integrity, the academy requires that researchers receive training (e.g., Responsible Conduct of Research at the NIH). In that context, reproducibility is often taught as a central concept and a guiding principle.

When it comes to fixing good habits and ensuring that reproducible practices are widely implemented (and improved), training is often delivered haphazardly and “on the job.” Researchers have weak incentives to refresh their practical reproducibility skills and knowledge, and enforcement mechanisms are generally weak.

·        What are the fundamentals of reproducibility training?

·        What is the right sequence for teaching reproducibility?

Research integrity and transparency are key concepts, as are data management and organization.

In the United States, the NIH curriculum on Responsible Conduct of Research (RCR) covers topics related to scientific rigor and integrity in pre-clinical research. These trainings are critical but tend to be taught once, for example as a requirement for new faculty or postdocs, and are not broadly embedded or enforced in the practice of research. The signal new researchers may be receiving is that fundamentals such as integrity, transparency, and ethics as they relate to reproducibility are somehow “extra” on top of what they actually need to do to get a paper published in a high-impact journal.

The observation that the daily grind of research practice, in some corners of the academy, sometimes works against some ethical norms (e.g., who gets authorship) in the race to publish is not new. In the community meeting, there was strong consensus that for ethical concepts to take hold they ought to be applied to the daily practice of research on an ongoing basis. This includes training in data management and basic organization skills, including directory structure. One example is educating researchers to use a common directory structure for research projects, in line with FAIR principles or the TIER Protocol. Training (and tool development) in automated and standardized data collection is a worthy investment and will facilitate reproducibility. The RCR (and similar) training is often a successful outreach point for librarians teaching data management. Among the participants in the community meeting who are teaching reproducibility, the sentiment was that transparency is an underlying focus in all their training, but that “90% of reproducibility is research data management.”

This may be thought of as an issue of progression, sequence, timing, and frequency. Currently, not enough thought is put into how all the pieces fit together.

Approaches to teaching reproducibility

·        What is the most effective format for teaching reproducibility?

The short answer is that it’s difficult to say. There are various formats in use, ranging from a single “one and done” lesson, to more in-depth workshops, to embedding reproducibility in formal academic training, to online self-taught modules and MOOCs, lectures, and videos. The NSF Advisory Committee for Cyberinfrastructure (ACCI) is looking into the creation of educational modules about reproducibility that can be added into academic curricula. In addition, efforts to cultivate “communities of practice” such as ReproducibiliTea, which is run by graduate students, can be an effective mechanism to introduce or supplement reproducibility training on campus. The longer answer is that all formats are useful and have their place.

Little has been done by way of assessing reproducibility training. Some efforts were reported by librarians working on reproducibility at the conference Librarians Building Momentum for Reproducibility, including an ethnographic study of a hands-on workshop on reproducibility. It was reported in the meeting that UC Berkeley does informal assessment and case studies at the time of RCR training, looking at re-invitations and tracking consultation requests. This may point to the difficulty of assessing the success of practical training. For some high-level perspective, see this issue of the Harvard Data Science Review.

·        Who are the main recipients of this training?

The observation was made that reproducibility training can benefit a large and broad constituency: from PIs, who are increasingly asked to comply with funder, journal, and institutional requirements, to junior faculty and graduate students, who can benefit from fixing ideas and habits at the start of their academic careers, to research administrators and staff, who have various touch points with the research process and therefore opportunities to support (and enforce) reproducibility.

The group seemed to agree that a lot of progress can be made, in particular, by creating pathways for graduate students, both because they are often intrinsically motivated and because there are opportunities to educate them about reproducibility in the curriculum.

·        What incentives are in place for reproducibility training?

Required reproducibility training in universities typically focuses on RCR, as mentioned above. There are fewer formalized incentives for researchers to seek additional, more practical training. Viewing replications (successful and unsuccessful) as part of tenure and promotion may be a strong incentive for researchers to seek more training. For example, one university changed its promotion structures to emphasize and encourage collegiality. “Top down” approaches that emphasize new norms for DEI (diversity, equity and inclusion) and recruitment can raise the profile of reproducibility and transparency practices and encourage people to seek more training. Additionally, universities and funders can do more to promote work that supports reproducibility infrastructure, for example, taking code from a prominent publication and turning it into reproducible software.

The desire to publish in top journals may incentivize researchers to seek training if their target journal requires reproducibility. The American Economic Association, for example, requires reproducibility verification in its journals. ReScience is a journal dedicated to publishing replications. Obviously, some researchers are self-motivated and will seek out training opportunities (and badges may provide a nudge for some).

Who trains?

In the context of individual universities, RCR training is often conducted by the office of research. When offered, data management is generally taught by librarians, and relevant tools and solutions offered on campus are typically taught by librarians and research support professionals. Discipline-specific reproducible research techniques and tools may be taught in methods classes and may be modeled at labs. A promising development at some universities is something akin to a reproducibility resource center (e.g., University of Michigan Reproducibility Hub; see also TU Delft’s Research Data Management group and Digital Competence Center; TU Graz program on reproducibility plans).

Use of university-provided resources varies. One participant reported that at their university there is a designated Academic Lead for Research Improvement and Research Integrity who has reproducibility in their remit, and that actual support varies from department to department. Some universities mount an effort to train researchers to use reproducibility-aligned tools, for example, HPC and Carpentries in bioinformatics, but “getting researchers trained in programming feels Sisyphean.” The general point is that reproducibility training at universities should be located close to research labs, rather than campus-wide. Ideally, one person per lab would be named the “reproducibility expert.”

We note that several organizations and groups offer reproducibility training (see list compiled by attendees in the collaborative notes).

Perceived gaps in reproducibility training

Not enough focus on practical skills: Creating sustainable habits is a challenge. Reproducibility is often taught in the context of training about responsible conduct of research, as a concept related to transparency and integrity, but institutions often do not require that it be taught as part of practical training (or enforce that it is practiced).

Too much focus on tools: Much training is focused on cool tools at the expense of the fundamentals such as data management. Demand for training on popular tools may affect the allocation of teaching resources away from more fundamental (but less trendy) topics. 

Teaching documentation needs to be a priority: Teaching best practices in documentation should be a priority, given how valuable documentation is for reproducibility and how poorly it is implemented in practice. More work can be done to highlight the benefits of applying good data management practices from the beginning (for example, by pointing out to researchers and universities the sunk cost of irreproducible or unusable research).

Unsupported mandate: Journals that systematically check reproducibility do not train their reviewers. There is an opportunity here for the ACM to play a role and recommend training for authors and reviewers.

Lack of enforcing norms: Reproducibility training will not take hold until responsible and ethical conduct of research is a social norm enforced by institutions, funders, and journals.

Lack of incentives: Incentives to produce and publish a lot are often perceived as at odds with working reproducibly.

By Limor Peer and Vicky Rampin

June 2, 2021