PyGrunn: python as a scientist’s playground - Peter Kroon

Tags: pygrunn, python

(One of my summaries of a talk at the 2019 PyGrunn conference).

He’s a scientist. Quite often, he searches for python packages.

  • If you’re writing python packages, you can learn how someone might search for your package.

  • If you don’t write python packages, you can learn how to investigate.

Scientists try to solve unsolved problems. When doing it with computers, you basically do three things.

  • Perform simulations.

  • Set up simulations.

  • Analyze results.

Newton said something about “standing on the shoulders of giants”. So basically he predicted the python package index! So much libraries to build upon!

A problem is that there is so much software. There are multiple libraries that can handle graphs (directed graphs, not diagrams). He’s going to use that as an example.

Rule one: PR is important. If you don’t know a package exists, it won’t come on the list. Google, github discovery, stackoverflow, scientific liberature, friends, pygrunn talks, etc.

A README is critical. Without a good readme: forget it.

The five he found: graph-tool, networkx, igraph, python-graph scipy.sparse.csgraph.

Rule two: documentation is very important. Docs should showcase the capabilities. This goes beyond explaining it, it should show it.

I must be able to learn how to use your package from the docs. Just some API documentation is not enough, you need examples.

Watch out with technical jargon and terms. On the one hand: as a scientist you’re often from a different field and you might not know your way around the terms. On the other hand, you do want to mention those terms to help with further investigation.

Bonus points for references to scientific literature.

Documentation gold standard: scikit-learn!

python-graph has no online docs, so that one’s off the shortlist. The other four are pretty OK.

Rule three: it must be python3 compatible. On 1 january 2020 he’s going to wipe python2 from all the machines that he has write access to.

All four packages are OK.

Rule four: it must be easy to install. So pypi (or a conda channel). You want to let pip/coda deal with your dependencies. If not, at least list them.

Pure python is desirable. If you need compilation of c/c++/fortran, you need all the build dependencies. This also applies to your dependencies.

He himself is a computer scientist, so he can compile stuff. But most scientists can’t really do.

He himself actually wants to do research: he doesn’t want to solve packaging problems.

graph-tool is not on pypi, networkx is pure python. scipy is fortran/c, but provides wheels. igraph is C core and not on pypi.

So scipy and networkx are left.

Rule five: it must be versatile. Your package must do everything. If your package does a lot, there are fewer dependencies in the future. And I have to learn fewer packages.

If it doesn’t do everything, it might still be ok if it is extendable. He might even open a pull request to add the functionality that he needs.

Note: small projects that solve one project and solve it well are OK, too.

networkx: does too much to count. Nice. scipy.sparse.csgraph has six functions. So for now, networkx is the package of choice.

The first and third rules are hard rules: if it is a python2-only package it is out and if you can’t find a package, you can’t find it, period.

Conclusions

  • You need to invest effort to make ME try your package.

  • “My software is so amazing, so you should invest time and effort to use it”: NO :-)

  • If it doesn’t work in 15 minutes: next candidate.

 
vanrees.org logo

About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.

Weblog feeds

Most of my website content is in my weblog. You can keep up to date by subscribing to the automatic feeds (for instance with Google reader):