
GSoD 2021 Meeting Minutes

Oriol Abril-Pla edited this page Aug 9, 2021 · 16 revisions

Friday, July 30

Agenda

Friday, June 18

Agenda

  • Documentation rendering and hosting

    • Migration to Read the Docs
    • Explain changes that are being introduced. More detail
    • Keep building the examples and the docs together or separate them? This affects the configuration of ablog and the use of JavaScript.
      • Pros and cons: do we want to version notebooks like we version docs?
      • Proposal: move away from JavaScript in pymc-examples and use pure rST/MyST with sphinx_panels
    • Versioned docs
      • Estimating the amount of work involved and the availability to do it.
      • Potential obstacles:
        • Configuration can take a long time to get right
        • Configuring a custom URL
        • Using RTD server (Aesara?)
        • Unknown unknowns
    • Search (new version should fix this with no additional work, but let's make sure we have that)
  • Overall structure of the docs

    • Cementing the structure tree
    • Tags and categories
      • Tags: topics (e.g. Linear regression, A/B testing)
      • Categories: levels (beginner, intermediate, advanced)
    • Getting started
    • Style guide
  • Feedback

    • Reviewing options

After-meeting Notes

Development tasks

Oriol will focus on finishing the changes he already started, while Ravin does a quick test in the next couple of weeks to try to see exactly how difficult it might be to set up Read the Docs in order to have versioned docs. We can get in touch with Read the Docs devs who are willing to lend a hand.

Regarding this item:

Keep building the examples and the docs together or separate them?

Building them together is unnecessary, since the examples don't need to be versioned and some of them take a very long time to build (up to three days).

The search bar is working fine now.

Non-development tasks

We agreed on using categories to label the different levels of documentation and tags to label the topics.
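As a rough illustration of that split (a sketch only, not something decided in the meeting), each notebook could carry exactly one category and any number of tags. The `NotebookLabels` class, the `by_tag` helper and the two notebook titles below are hypothetical names made up for the example; only the three levels come from the agenda.

```python
from dataclasses import dataclass, field

# Levels named in the agenda; used here as the only allowed categories.
LEVELS = ("beginner", "intermediate", "advanced")


@dataclass
class NotebookLabels:
    """Hypothetical label record for one example notebook."""

    title: str
    category: str  # exactly one level
    tags: list = field(default_factory=list)  # zero or more topics

    def __post_init__(self):
        if self.category not in LEVELS:
            raise ValueError(f"unknown category: {self.category!r}")
        # Normalise tags so "Linear regression" and "linear regression" match.
        self.tags = sorted({t.strip().lower() for t in self.tags})


def by_tag(notebooks, tag):
    """Return the titles of notebooks labelled with the given topic tag."""
    tag = tag.strip().lower()
    return [nb.title for nb in notebooks if tag in nb.tags]


# Made-up entries, for illustration only.
notebooks = [
    NotebookLabels("GLM: Linear regression", "beginner", ["Linear regression"]),
    NotebookLabels("Bayesian A/B testing", "intermediate", ["A/B testing"]),
]
```

Keeping the category a single required value and the tags a free-form list mirrors the agreement above: levels are a fixed, small set, while topics can grow as notebooks are added.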

We're playing around with some ideas for site redesign.

  • Sections: Home, Installation, Learn, API, Developers, Community
    • Home: includes more information on what PyMC3 is and why to use it, sponsors, marketing (PyMC for enterprise), governance (if updated), info on the difference between PyMC3 and PyMC3 V4.
    • Installation: info on the home page is broken and could be more detailed (e.g. troubleshooting)
    • API: nothing changes, we keep the automatically generated API documentation.
    • Learn: contains getting started and the entire tree structure for users.
    • Developers: contains the entire tree structure for developers
    • Community: Discourse, conferences, meetups, community guidelines.
  • The current about section can be deleted and its content placed in the other sections.
  • A footer will be visible on every page with links to find help (like Discourse) and socials.

We're not yet focused on this, but when we get to planning out the developer branch of documentation, we need to make sure people understand how to become PyMC3 developers so that more people feel welcome to join.

Idea: create a group focused on documentation. There are people who have made many contributions but are not part of the PyMC core team. We could start by inviting them. Martina will write a proposal for this.

Guidelines for deprecating notebooks (since it's hard to maintain them all): we can use Google Analytics to check whether anybody still uses them. We should not have redundant notebooks.
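A minimal sketch of how such a check could work, assuming page-view counts have already been exported from the analytics tool into a plain mapping. The `deprecation_candidates` function, the paths, the numbers and the threshold are all made up for illustration; no export format or cut-off was discussed in the meeting.

```python
def deprecation_candidates(page_views, threshold):
    """Return notebook paths with fewer views than threshold, least viewed first."""
    low = [(views, path) for path, views in page_views.items() if views < threshold]
    return [path for views, path in sorted(low)]


# Made-up numbers, not real traffic data.
example_views = {
    "notebooks/a.ipynb": 1200,
    "notebooks/b.ipynb": 3,
    "notebooks/c.ipynb": 0,
}
candidates = deprecation_candidates(example_views, threshold=10)
```

Low traffic would only flag a notebook for review; whether it is actually redundant still needs a human decision.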

Martina will write the style guide for notebooks over the weekend so that Abhipsha can use those guidelines in the work she's doing. The style guide will be included in the PR template.

Abhipsha will call out any duplicate/obsolete notebook she sees and flag it for deprecation, and also check out opportunities for tagging and categorizing notebooks by level of difficulty.

To think about: some case studies are intended to showcase the power of PyMC and are useful in different ways at all levels. Where's the best place to put them?


Friday, May 21, 2021

These are the topics that were discussed during the first meeting, plus some notes taken after the meeting.

Sources

These are the sources I’m looking at. Am I missing anything?

  • docs.pymc.io
  • videos+books (linked in the docs)
  • GitHub readmes
  • Discourse

Integration

Integration of the existing standalone content into learner-focused guides that link components to one another in order to help users make sound decisions regarding the use of the software.

Step 1 for everyone who is getting started with PyMC3 is currently this quickstart guide.

  • Is there anything you would add/do differently to that guide?

I envision a tree where the trunk is the Step 1 guide (everybody starts there) and Step 2 might not be the same for everyone since they have different goals. As people make use of more specific techniques, their paths branch out further.

  • How can we help beginners progress from step 1 to step 2, assisting them in choosing the right path? How many “branches” would you say we need to document? Is there existing information we can refactor?

Revision

Revision of the tutorials and guides to reflect the important changes to the library that are currently underway, and to give them consistency, in terms of notation and language.

  • The first part might overlap partially/completely with the scope of Abhipsha’s work. How should we proceed?
  • As for the second part (consistency in terms of notation and language), are there specific examples that come to mind?

Expansion

  • Do you think a brief introduction to basic concepts would be helpful? These could include:
    • What is Probabilistic Programming?
    • What is MCMC?
    • What are variational inference algorithms?
    • What is a Bayesian model?
    • What is ArviZ?
  • What other concepts do beginners tend to need in order to use PyMC3?

Developer guide

As the current developer documentation consists of an API reference and a single developer guide notebook, we also view this as an opportunity to make PyMC3’s developer resources more robust, with the ultimate goal of attracting and retaining more contributors, and allowing all users the opportunity to better understand the underlying implementation of their favorite probabilistic programming methods.

  • Who should I interact with on this topic?

After-meeting Notes

All notes regarding the docs will be stored in the GSoD wiki:

https://github.com/pymc-devs/pymc3/wiki/Season-of-Docs-2021-Proposal

Integration

Right now, creating a good structure is the main goal. Once the structure is created, people will fill in the gaps as they go. We just need to create the proper space for it to happen.

We can get inspiration from the Scikit-learn model in order to structure the documentation.

There are currently two starting points for beginners: the Quickstart guide (big button on the front page) and the Getting Started guide (which is the one that receives more visits, maybe because it's linked in the paper). We agreed to unify both. That will be step 1 for beginners.

As for Step 2 (the branches of the tree), I (Martina) will look into the Discourse forum and see what users try to do after getting started, so we can orient them better. I also encourage PyMC3 collaborators to think about what “categories” we can divide users into, according to their needs.

Revision

  • Revision of the tutorials and guides to reflect the important changes to the library that are currently underway
    • Abhipsha is in charge of this task and we will communicate as we move forwards to see where we need to collaborate.
  • Consistency in terms of notation and language
    • Most examples were created by mathematicians/statisticians (lots of equations) or computer scientists (lots of code) (*). For the most advanced notebooks, this is fine, because advanced users will know how to interpret them. Notebooks intended for beginners should be friendlier and more careful when introducing technical terms (maybe provide a quick definition or link to a useful explanation).
    • (*) Some of the notebooks produced by the latter are just code and markdown text. It would be neater to have plain text instead of markdown, but that's not something we need to solve right away; we can create an issue for the time being.

Expansion

  • There are blog posts by Oriol, Thomas Wiecki, Ravin and Colin that provide explanations for many topics that might be helpful to link or work into the documentation. The same goes for some books, for example Bayesian Methods for Hackers. I will see if the open source policy of these books allows us to use examples or paragraphs in PyMC3's docs. Videos could be transcribed if needed.
  • A proper place needs to be created for people to add developer documentation. This is not only aimed at advanced users; anyone should be able to contribute according to their skills.
  • There's a plan to move documentation to https://readthedocs.org/. If this happens, there will be two kinds of documentation: “multi-version” documentation, which is updated immediately because it's not version-dependent, and version-controlled documentation. For the time being, this is not 100% certain.

Next steps

  • Go through the Discourse and GitHub issues to find common questions from beginners, and try to map out the tree structure we discussed. This will be a first draft that will most likely need multiple iterations before we're happy with it.
  • Research the open source policies of the books to see whether we can quote big chunks of them in the docs if we need to.
  • Scan the information contained in the blogs created by the community.