
GSoD 2021 Meeting Minutes

Martina Cantaro edited this page Jun 18, 2021 · 16 revisions

Friday, May 21, 2021

These are the topics that were discussed during the first meeting, plus some notes taken after the meeting.

Sources

These are the sources I’m looking at. Am I missing anything?

  • docs.pymc.io
  • videos and books (linked in the docs)
  • GitHub READMEs
  • Discourse

Integration

Integration of the existing standalone content into learner-focused guides that link components to one another in order to help users make sound decisions regarding the use of the software.

Step 1 for everyone who is getting started with PyMC3 is currently the Quickstart guide.

  • Is there anything you would add/do differently to that guide?

I envision a tree where the trunk is the Step 1 guide (everybody starts there) and Step 2 might not be the same for everyone since they have different goals. As people make use of more specific techniques, their paths branch out further.

  • How can we help beginners progress from step 1 to step 2, assisting them in choosing the right path? How many “branches” would you say we need to document? Is there existing information we can refactor?

Revision

Revision of the tutorials and guides to reflect the important changes to the library that are currently underway, and to make them consistent in notation and language.

  • The first part might overlap partially/completely with the scope of Abhipsha’s work. How should we proceed?
  • As for the second part (consistency in terms of notation and language), are there specific examples that come to mind?

Expansion

  • Do you think a brief introduction to basic concepts would be helpful? These could include:
    • What is Probabilistic Programming?
    • What is MCMC?
    • What are variational inference algorithms?
    • What is a Bayesian model?
    • What is ArviZ?
  • What other concepts do beginners tend to need in order to use PyMC3?

Developer guide

As the current developer documentation consists of an API reference and a single developer guide notebook, we also view this as an opportunity to make PyMC3’s developer resources more robust, with the ultimate goal of attracting and retaining more contributors, and allowing all users the opportunity to better understand the underlying implementation of their favorite probabilistic programming methods.

  • Who should I interact with on this topic?

After-meeting Notes

All notes regarding the docs will be stored in the GSoD wiki:

https://github.com/pymc-devs/pymc3/wiki/Season-of-Docs-2021-Proposal

Integration

Right now, creating a good structure is the main goal. Once the structure is created, people will fill in the gaps as they go. We just need to create the proper space for it to happen.

We can take inspiration from scikit-learn's documentation when structuring ours.

There are currently two starting points for beginners: the Quickstart guide (the big button on the front page) and the Getting Started guide (which receives more visits, maybe because it's linked in the paper). We agreed to unify both. That will be Step 1 for beginners.

As for Step 2 (the branches of the tree), I (Martina) will look into the Discourse forum and see what users try to do after getting started, so we can orient them better. I also encourage PyMC3 collaborators to think about which “categories” users can be divided into, according to their needs.

Revision

  • Revision of the tutorials and guides to reflect the important changes to the library that are currently underway
    • Abhipsha is in charge of this task, and we will communicate as we move forward to see where we need to collaborate.
  • Consistency in terms of notation and language
    • Most examples were created by mathematicians/statisticians (lots of equations) or computer scientists (lots of code)(*). For the most advanced notebooks, this is fine, because advanced users will know how to interpret them. Notebooks intended for beginners should be friendlier and more careful when introducing technical terms (maybe provide a quick definition or link to a useful explanation).
    • (*) Some of the notebooks produced by the latter consist only of code and Markdown text. It would be neater to have plain text instead of Markdown, but that's not something we need to solve right away; we can create an issue for the time being.

Expansion

  • There are blog posts by Oriol, Thomas Wiecki, Ravin and Colin that explain many topics and might be helpful to link to or work into the documentation. The same goes for some books, for example Bayesian Methods for Hackers. I will check whether the open-source licenses of these books allow us to use examples or paragraphs in PyMC3's docs. Video content could also be transcribed into written form if needed.
  • A proper place needs to be created for people to add developer documentation. This is not aimed only at advanced users; anyone should be able to contribute according to their skills.
  • There's a plan to move the documentation to https://readthedocs.org/. If this happens, there will be two kinds of documentation: “multi-version” documentation, which is not version-dependent and is therefore updated immediately, and version-controlled documentation. For the time being, this is not 100% certain.

Next steps

  • Go through Discourse and the GitHub issues to find common questions from beginners, and try to map out the tree structure we discussed. This will be a first draft that will most likely need multiple iterations before we're happy with it.
  • Research the open-source licenses of the books to see whether we can quote large portions of them in the docs if we need to.
  • Scan the blog posts written by the community for relevant information.

Friday, June 21

Agenda

  • Migration to Read the Docs (estimated time: 30 mins)

    • Explain changes that are being introduced. More detail
    • Keep building the examples and the docs independently, or merge them? This affects the configuration of ablog and the use of JavaScript.
      • Pros and cons: do we want to version notebooks like we version docs?
    • Versioned docs
      • Estimating the amount of work involved and who is available to do it.
      • Potential obstacles:
        • Configuration can take a long time to get right
        • Configuring a custom URL
        • Using RTD server (Aesara?)
        • Others
    • Search
  • Overall structure of the docs (estimated time: 20 mins)

    • Cementing the structure tree
    • Tags and categories
      • Tags: topics (e.g. linear regression, A/B testing)
      • Categories: levels (beginner, intermediate, advanced)
    • Getting started
    • Style guide
  • Feedback

    • Reviewing options