Guides and Explanations#

This section contains some guides that have proven useful when starting a project from scratch or porting an existing project.

If you are unsure about the use(fulness) of Conda Environments and Pre-Commit Hooks, you will find concise explanations below.

Starting a new project from scratch#

Your general strategy should be one of divide and conquer. If you are not used to thinking in computer science / software engineering terms, it will be hard to wrap your head around all of the things that are going on. So write one bit of code at a time, understand what is happening and why, and move on.

Assuming you have installed the template for the language(s) of your choice as described in Customising the template for your needs, my recommendation would be as follows.

Leave the examples in place.
Now add your own data and code bit by bit. Extend the task_xxx files as necessary or create new ones.
Remove the build directory regularly to make sure you do not rely on outputs from tasks that no longer exist. This is a frequent source of confusion.
Once you have got the hang of how things work, remove the examples (both the data files and the code in the task_xxx files). Also remove the build directory.
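
The clean-up step in the list above can be wrapped in a small shell helper. This is only a sketch: the function name `clean_build` is made up for illustration, and the directory name `bld` in the usage comment is an assumption, so substitute whatever your project uses as its build directory.

```shell
# Hypothetical helper: delete a build directory so that the next run
# regenerates every output from scratch.
clean_build() {
    # Abort with a usage message if no directory was given.
    build_dir="${1:?usage: clean_build <build_dir>}"
    rm -rf -- "$build_dir"
}

# Example usage (assumes the build directory is called "bld"):
#   clean_build bld && pytask
```

Running pytask right after the clean-up rebuilds all outputs, which should surface any code that still depends on a file no remaining task produces.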

Porting an existing project#

Your general strategy should be one of divide and conquer. If you are not used to thinking in computer science / software engineering terms, it will be hard to wrap your head around all of the things that are going on. So move one bit of code at a time to the template, understand what is happening and why, and move on.

Assuming that you use Git, first move all the code in the existing project to a subdirectory called old_code. Commit.
Now set up the templates.
Start with the data management code and move your data files to the spot where they belong under the new structure.
Move (the first steps of) your data management code to the corresponding folder in the template. Modify the task_xxx files accordingly or create new ones.
Run pytask and fix the errors you will likely see.
Move on step-by-step like this.
Delete the example files and the corresponding sections of the task_xxx files, or the entire task_xxx files if you created new ones for your own code.
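
The first step of the list above might look as follows in a terminal. This is a sketch under assumptions: the helper name `move_to_old_code` and the commit message are made up, file names are assumed to contain no newlines or unusual characters, and every tracked top-level entry (including dotfiles such as .gitignore) is moved, which you may want to adjust.

```shell
# Hypothetical helper: move everything Git tracks into old_code/ and commit.
# Run it from the root of the existing project.
move_to_old_code() {
    mkdir -p old_code || return 1
    # List tracked files, reduce them to their top-level entry, and move each one.
    git ls-files | cut -d/ -f1 | sort -u | while IFS= read -r entry; do
        [ "$entry" = "old_code" ] && continue
        git mv -- "$entry" old_code/ || exit 1
    done || return 1
    git commit -m "Move existing code to old_code"
}
```

Using `git mv` rather than a plain `mv` keeps the move as a single, easily reviewable commit, so the history of the old files stays traceable while you port them bit by bit.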

Conda Environments#

Programs change. Few things are as frustrating as coming back to a project after a long time and spending the first {hours, days} updating your code to work with a new version of your favourite data analysis library. The same holds for debugging errors that occur only because your coauthor uses a slightly different setup.

The solution is to have isolated environments on a per-project basis. Conda environments allow you to do precisely this. This page describes them a little bit and explains their use.

The following commands can be executed either in a terminal or in the Anaconda Prompt (Windows).

Using the environment#

During the installation of the template, a new environment was created unless you explicitly declined it. It took its specification from the environment.yml file in your project's root folder.

To activate it, execute:

$ conda activate <env_name>

Repeat this step every time you want to run your project from a new terminal window.
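
Because a forgotten activation is easy to miss, a small guard can check which environment is active before you run anything. The sketch below relies on the `CONDA_DEFAULT_ENV` variable, which conda sets when an environment is activated; the function name `require_env` and the environment name `my_project` in the usage comment are made up for illustration.

```shell
# Hypothetical guard: succeed only if the expected conda environment is active.
require_env() {
    expected="${1:?usage: require_env <env_name>}"
    # conda activate exports the name of the active environment in CONDA_DEFAULT_ENV.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected conda environment '$expected', but '${CONDA_DEFAULT_ENV:-none}' is active." >&2
        return 1
    fi
}

# Example usage (assumes the environment is called "my_project"):
#   require_env my_project && pytask
```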

Setting up a new environment#

If you want to create a clean environment, we recommend specifying it through an environment.yml file. Below we show the contents of an example environment.yml file. A detailed explanation is given in the Conda documentation.

name: <env_name>

channels:
  - conda-forge
  - defaults

dependencies:
  - python=3.10
  - numpy
  - pandas
  - pip
  - pip:
    - black

If the environment.yml file exists you can create the environment using

$ conda env create -f path/to/environment.yml

Updating packages#

Make sure you have activated the environment with conda activate <env_name>. Then run

$ conda update [package]

to update a specific [package], or run

$ conda update --all

to update all packages.

Installing additional packages#

To list installed packages, activate the environment and type

$ conda list

If you want to add a package to your environment, add it to the environment.yml file. Once you have edited the environment.yml file, run

$ conda env update -f environment.yml

Choosing between conda and pip

Generally, it is recommended to use conda whenever possible. It is a necessity for many scientific packages, which often are not pure-Python code, whereas pip is built mainly for pure-Python packages. For some pure-Python packages nobody has bothered to set up a conda package; in that case we use pip.

If you add a package under dependencies: in the environment.yml file, conda will try to install it as a conda package. If you add a package under pip:, conda will try to install the package via pip.

Information about your conda environments#

For listing your installed conda environments, type

$ conda info --envs

The currently activated one will be marked.

Pre-Commit Hooks#

Pre-commit hooks are checks and code formatters that run on every commit. If one of the hooks fails, the commit is aborted and you have to commit again after you have resolved the issues raised by the hooks. Pre-commit hooks are defined in the .pre-commit-config.yaml file. The example project contains most hooks you will need. Below we present three common hooks. Note that some hooks are programming-language agnostic while others work on a specific language. You can find a list of most hooks in the pre-commit documentation under Supported hooks.

black: Reformats your Python code according to a single, consistent style. Blackened code looks the same regardless of the project you’re reading. Having black as a hook allows you to focus on the content while writing code and lets the formatting be done automatically before each commit.
check-yaml: Checks whether all .yaml and .yml files within your project are valid YAML. Similarly, having check-yaml as a hook allows you to focus on the content while writing YAML files. If you accidentally use invalid syntax, this hook will tell you before you commit.
codespell: Fixes common misspellings in text files. It’s designed primarily for checking misspelled words in source code, but it can be used with other files as well.
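
You do not have to wait for a commit to see what the hooks report. The sketch below uses two commands of the pre-commit command-line tool, `pre-commit install` and `pre-commit run --all-files`; the wrapper function `run_all_hooks` is made up for illustration and assumes that you call it from the root of a Git repository containing a .pre-commit-config.yaml file.

```shell
# Hypothetical wrapper: run every configured hook on all files in the repository.
run_all_hooks() {
    # Stop early if the pre-commit executable is not on the PATH.
    if ! command -v pre-commit >/dev/null 2>&1; then
        echo "pre-commit was not found; is the project environment activated?" >&2
        return 127
    fi
    # Register the hooks with Git (safe to repeat), then run them on every file.
    pre-commit install && pre-commit run --all-files
}
```

Hooks such as black modify files in place, so after a failing run it is often enough to stage the reformatted files and commit again.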

If you want to skip the pre-commit hooks for a particular commit, you can run:

$ git commit -am "<your commit message>" --no-verify
