Templates for Reproducible Research: Documentation#

Introduction#

An empirical or computational research project only becomes a useful building block for science and policy when all steps can be easily repeated and modified by others.

Hence, some actions should be absent as much as possible. This includes copying and pasting, pointing and clicking with a mouse, or other forms of interactive input, which are not stored as part of the project.

The idea behind these templates is that the researcher specifies a set of tasks, which are executed in the correct order as required. The only input for (re-)producing results will be the action setting this pipeline to run.

This code base aims to provide two stepping stones to assist you in achieving this goal:

  1. A sensible directory structure. This will save you a bunch of thinking about this structure time and again, which typically happens when incrementally building up a new project. Put differently, instead of starting from scratch, you modify an example for your needs.

  2. Pre-configured useful tools such as pre-commit hooks, a conda environment, and pytask. These tools help you to maintain a clean code base and to automate the workflow of your project.

  3. A pre-configured instance of pytask, which facilitates the reproducibility of your research findings from the beginning to the end by letting the computer handle the project’s workflow.

The first should lure you in quickly. The second should convince you to stick to the tools in the long run – unless you have fought with large research projects before, at this point you may think that all of this is overkill and far more difficult than necessary. It is not. [although I am always happy to hear about easier alternatives]

The templates support a variety of programming languages already. They can be easily extended to cover others. Everything is tied together by pytask, which is written in Python. You do not need to know a lot of Python to use these tools, though.