PyMC3 Jupyter Notebook Style Guide

Martina Cantaro edited this page Aug 22, 2021 · 27 revisions

These guidelines should be followed by all notebooks in the documentation.

General guidelines

  • Don't use abbreviations or acronyms whenever you can use complete words. For example, write "random variables" instead of "RVs".

  • Explain the reasoning behind each step.

  • Use the glossary whenever possible. If you use a term that is defined in the Glossary, link to it the first time that term appears in a significant manner. Use this syntax to add a term reference.

  • Attribute quoted text or code, and link to relevant references.

  • Keep notebooks short: 20 to 30 cells for content aimed at beginners or intermediate users. Longer notebooks are fine at the advanced level.

Variable names

  • Above all, stay consistent with variable names within the notebook.

  • Use meaningful variable names wherever possible. Our users come from different backgrounds and not everyone is familiar with the same naming conventions.

  • Sometimes it makes sense to use Greek letters to refer to variables, for example when writing equations, as this makes them easier to read. In that case, use LaTeX to insert the Greek letter, like this: $\theta$, instead of using Unicode, like θ.

  • If you need to use Greek letter variable names inside the code, please spell them out instead of using Unicode. For example, theta, not θ.
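As a small illustration of the two points about Greek letters, the sketch below shows how the same quantity might appear in a Markdown cell and in a code cell. The variable names and the numbers are hypothetical and only serve the example.

```python
# Markdown cell (prose and equations): write the symbol with LaTeX, e.g.
#   "We estimate the success probability $\theta$ from the observed trials."

# Code cell: spell the Greek letter out and keep the names descriptive.
observed_successes = 7   # hypothetical data
observed_trials = 20     # hypothetical data

# "theta_estimate" instead of "θ" or a bare "t"
theta_estimate = observed_successes / observed_trials
print(f"theta_estimate = {theta_estimate:.2f}")
```

Using the same spelled-out name everywhere in the notebook also satisfies the consistency rule at the top of this section.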

Development guidelines

PyMC3 has a very rich notebook (NB) gallery. With the goal of standardizing and giving an identity to this gallery, here are a few steps to check when you create or update a NB:

  1. In a cell just below the cell where you imported matplotlib (usually the first one), set the ArviZ style and display format (this has to be in a different cell from the MPL import because of the way MPL sets its defaults):

    RANDOM_SEED = 8927
    rng = np.random.default_rng(RANDOM_SEED)
    az.style.use("arviz-darkgrid")

    Setting a random seed as above is also good practice when generating synthetic data, because it improves reproducibility. Also, please check convergence (e.g. assert all(r_hat < 1.03)), because we sometimes re-run notebooks automatically without carefully checking each one.

  2. Use a try ... except clause to load the data, and use pm.get_data in the except path. This ensures that users who have cloned the pymc-examples repo read their local copy of the data, while users without a local copy download the data from GitHub. Here is one example:

    try:
        df_all = pd.read_csv(os.path.join("..", "data", "file.csv"), ...)
    except FileNotFoundError:
        df_all = pd.read_csv(pm.get_data("file.csv"), ...)

  3. We run some code-quality checks on our notebooks during Continuous Integration. The easiest way to make sure your notebook(s) pass the CI checks is to use pre-commit. You can install it with

    pip install -U pre-commit

    and then enable it with

    pre-commit install

    Then, the code-quality checks will run automatically whenever you commit any changes. To run the code-quality checks manually, use, for example:

    pre-commit run --files notebook1.ipynb notebook2.ipynb

    replacing notebook1.ipynb and notebook2.ipynb with any notebook you've modified.

    Note: sometimes, Black will be frustrating (well, who isn't?). In these cases, you can disable its formatting for specific lines of code: write # fmt: off before them and # fmt: on after them to re-enable it, like this:

    # fmt: off
    np.array(
        [
            [1, 0, 0, 0],
            [0, -1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, -1],
        ]
    )
    # fmt: on

  4. Once you're finished with your NB, add a very last cell with the watermark package. This will automatically print the versions of Python and the packages you used to run the NB (reproducibility rocks!). Here is some example code. Note that the -p argument may not be necessary (or it may need different libraries as input), but all the other arguments must be present.

    %load_ext watermark
    %watermark -n -u -v -iv -w -p theano,xarray

    watermark should be in your virtual environment if you installed our requirements-dev.txt. Otherwise, just run pip install watermark. The -p flag is optional but should be added if Theano (or Aesara in v4) or xarray is not imported explicitly. This will also be checked by pre-commit (because we all forget to do things sometimes 😳).
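
The convergence check mentioned in step 1 can be made a little more informative than a bare assert. The following is a minimal sketch, not an official PyMC3 or ArviZ helper: the function name find_unconverged and the r_hat values are hypothetical, and it assumes the values have already been collected into a plain dictionary (in a notebook they might come from something like az.summary(idata)["r_hat"].to_dict()).

```python
def find_unconverged(r_hat_by_variable, threshold=1.03):
    """Return the sorted names of variables whose r_hat is not below the threshold.

    "not value < threshold" (rather than "value >= threshold") also flags NaN,
    since every comparison with NaN is False.
    """
    return sorted(
        name for name, value in r_hat_by_variable.items() if not value < threshold
    )


# Hypothetical r_hat values, hard-coded so the sketch runs without sampling.
r_hat_by_variable = {"intercept": 1.00, "slope": 1.01, "sigma": 1.02}

unconverged = find_unconverged(r_hat_by_variable)
# Fails loudly, naming the offending variables, when a re-run does not converge.
assert not unconverged, f"r_hat is not below 1.03 for: {unconverged}"
```

Because the assertion message lists the variables that failed, an automatic re-run that breaks points directly at the part of the model that needs attention.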

You're all set now 🎉 You can push your changes, open a pull request, and, once it's merged, rest with the feeling of a job well done 👏 Thanks a lot for your contribution to open-source, we really appreciate it!