Django under the hood: documentation systems - Eric Holscher

Tags: django, djangocon

(One of my summaries of a talk at the 2015 django under the hood conference).

Note by Reinout: I hope all the examples are rendered OK as I’m writing this blog with… sphinx and restructured text :-)

Eric Holscher started https://readthedocs.org/ a few years ago during a “django dash” weekend. That site is widely used and even gave rise to “write the docs” conferences and a community around technical writing.

Readthedocs and the django documentation both use sphinx.

A documentation system? It is not just a single documentation page on one item. It is documentation for a whole project. Sphinx extends RST (restructured text) with additions for documenting software. It also adds semantic meaning.

A lot of documentation is written in markdown nowadays. A link to pep8 in markdown would be a simple link to pep8 on the internet. In sphinx/rst it could be:

Check out :pep:`8`

He becomes sad if he sees documentation written in markdown.

“Read the docs” is built on “sphinx” is built on “docutils” is built on “restructured text”. Here are some of the internal concepts of RST:

  • A reader reads input.

  • A parser takes the input and actually turns it into a “Doctree”. RST is the only parser implemented in docutils. It handles directives, multi-line inputs, etc.

  • The “doctree” is like an AST for docutils. This is the basis for everything else. The tree consists of nodes.

  • Nodes can be structural elements like document, section, sidebar. Body elements like paragraph, image, note. Inline elements like emphasis, strong, subscript.

    The most common types of nodes are “Text nodes”.

  • RST “directives” are the most common extension mechanism. They allow block-level extension of RST.

  • You can also use “RST interpreted text roles”. They allow inline extension of RST, inside a paragraph. :pep:`8` is an example.
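To make the role idea concrete, here is a toy sketch in plain Python. It is not how docutils implements roles: the names ROLE_PATTERN, ROLE_HANDLERS, pep_role and expand_roles are invented for this illustration, and the output format is an assumption. It only shows the principle that a role name selects a handler, which gives the marked-up text a meaning.

```python
import re

# Matches interpreted text roles such as :pep:`8`.
ROLE_PATTERN = re.compile(r":(?P<name>[a-z]+):`(?P<text>[^`]+)`")


def pep_role(text):
    """Turn a PEP number into a labelled link (hypothetical handler)."""
    number = int(text)
    return f"PEP {number} <https://peps.python.org/pep-{number:04d}/>"


# Hypothetical registry: role name -> handler function.
ROLE_HANDLERS = {"pep": pep_role}


def expand_roles(line):
    """Replace every known role in ``line``; leave unknown roles untouched."""

    def replace(match):
        handler = ROLE_HANDLERS.get(match.group("name"))
        if handler is None:
            return match.group(0)
        return handler(match.group("text"))

    return ROLE_PATTERN.sub(replace, line)


print(expand_roles("Check out :pep:`8`"))
```

A plain markdown link would carry only the URL. Here the source still says "this is PEP 8", and the handler decides how to render it.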

RST is a really neat language. Some directives are tied to RST because of the way it is parsed, which makes it hard to re-use things. We need to think about how to port this to other parsers, so that we could use and extend markdown and the like, as we might be too tied to RST.

Once the parser is done, you can start doing things with the resulting doctree.

  • “Transformers” take the doctree and modify it in place. They have full knowledge of the tree, which is needed for something like generating a TOC.

  • “Visitors” allow you to quickly add arbitrary node types. You implement a visitor with a visit_yournodename() and a depart_yournodename() method that output content.

  • “Translators” convert the doctree to an output type: html, pdf, etcetera.

  • “Writers” are used to actually write the translated items to disk.

So… docutils READS the document, PARSES it, TRANSFORMS it and TRANSLATES/WRITES it to output.
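The pipeline can be sketched with a toy model in plain Python. This is an illustration under loud assumptions, not docutils code: the Node class, the "Title:" mini-syntax, add_toc and HTMLTranslator are all made up. Only the method name walkabout and the visit_/depart_ naming mirror what docutils uses.

```python
class Node:
    """A minimal doctree node: a type name, optional text, and children."""

    def __init__(self, name, text="", children=None):
        self.name = name
        self.text = text
        self.children = children or []

    def walkabout(self, visitor):
        """Call visit_<name>, recurse into the children, then depart_<name>."""
        getattr(visitor, f"visit_{self.name}")(self)
        for child in self.children:
            child.walkabout(visitor)
        getattr(visitor, f"depart_{self.name}")(self)


def parse(source):
    """PARSE: a line ending in ':' opens a section, other lines are paragraphs."""
    document = Node("document")
    current = document
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            current = Node("section", text=line[:-1])
            document.children.append(current)
        else:
            current.children.append(Node("paragraph", text=line))
    return document


def add_toc(document):
    """TRANSFORM: modify the tree in place by prepending a table of contents."""
    titles = [n.text for n in document.children if n.name == "section"]
    document.children.insert(0, Node("toc", text=", ".join(titles)))


class HTMLTranslator:
    """TRANSLATE: a visitor that collects HTML fragments for each node type."""

    def __init__(self):
        self.body = []

    def visit_document(self, node):
        pass

    def depart_document(self, node):
        pass

    def visit_toc(self, node):
        self.body.append(f"<nav>{node.text}")

    def depart_toc(self, node):
        self.body.append("</nav>")

    def visit_section(self, node):
        self.body.append(f"<section><h1>{node.text}</h1>")

    def depart_section(self, node):
        self.body.append("</section>")

    def visit_paragraph(self, node):
        self.body.append(f"<p>{node.text}")

    def depart_paragraph(self, node):
        self.body.append("</p>")


def publish(source):
    document = parse(source)        # READ + PARSE
    add_toc(document)               # TRANSFORM
    translator = HTMLTranslator()   # TRANSLATE
    document.walkabout(translator)
    return "".join(translator.body)  # a real writer would save this to disk


print(publish("Intro:\nHello\nUsage:\nRun it"))
```

The TOC transform needs to see the whole tree before it can do its work, which is why it runs between parsing and translating. Supporting a new node type in the output only takes one more visit_/depart_ pair.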

On top of RST you have sphinx.

  • The sphinx “application” is the central part that steers the entire process.

  • The sphinx “environment” keeps state for all the files for a project. It is serialized to disk in-between runs.

    It is cached as pickles between runs. This makes re-building the documentation much faster if you’ve only changed one file.

  • “Builders” are wrappers around docutils’ writers. They generate all types of output. Most HTML output is generated through Jinja templates instead of translators.

Sphinx has lots of events like source-read, doctree-read, env-updated etcetera. If you want to extend sphinx, this is the place to start.

Some extension examples:

  • One of the extensions they made themselves for read the docs is markdown support. They used recommonmark for that. Recommonmark’s node class is mapped to a node class that is understood by sphinx. Markdown files can even co-exist with RST files inside the same set of documents.

    The drawback of markdown is that it lacks ways to extend the language. There is a proposal for markdown inline markup, though. That would make it possible to support more of the RST features in markdown.

  • Table of contents. They implemented a “pending” node that can be filled later in the sphinx rendering process with the actual table of contents.

  • References can refer to anything elsewhere in one of the other pages. A transformation later on in the process resolves the references.

    With the proper setup, you can even reference items in other sphinx documents, like pointing at Django’s documentation on some subject. Google for “intersphinx_mapping” if you need it. It works wonderfully.
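A conf.py fragment along these lines would enable both markdown files and cross-project references. Treat it as a sketch: it assumes a reasonably recent sphinx and an installed recommonmark package (the setup used at the time of the talk was different), and the Django inventory location is an assumption worth checking against Django's own documentation.

```python
# conf.py (fragment)

extensions = [
    "sphinx.ext.intersphinx",  # references into other sphinx projects
    "recommonmark",            # markdown parser; assumes the package is installed
]

# Let .md files live next to .rst files in the same set of documents.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# name -> (base URL of the other project's docs, inventory location or None)
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "django": (
        "https://docs.djangoproject.com/en/stable/",
        # Assumed location of Django's objects.inv file.
        "https://docs.djangoproject.com/en/stable/_objects/",
    ),
}
```

With None as the second item, sphinx looks for the inventory file directly under the base URL. References that cannot be resolved locally are then looked up in the listed projects.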

Django uses sphinx in a slightly different way.

  • All documentation is written in RST, but the HTML is generated as JSON blobs (!!!). It is rendered through django templates on the website.

  • It has some django-specific additions like directly linking to settings or tickets:

    :setting:`DATABASES`
    

    He showed the implementation of this feature.

Some take-aways:

  • Make sure to use semantic markup when writing docs. You can write down more information about what is going on in your brain.

  • Generally your job is to get the nodes to exist in the way that you want. So when you write extensions, keep the node tree in mind.

    To help with this, you can run make pseudoxml. It outputs a pseudo-xml view of the sphinx node tree of your document.

  • Understand where you need to plug into the pipeline and do as little as possible to make it happen.

