(One of my summaries of the first Python Leiden (NL) meetup in Leiden, NL).
FawltyDeps is a python dependency checker. “Finding undeclared and unused dependencies in your notebooks and projects”.
Note by Reinout: since 2009 I’m one of the maintainers of z3c.dependencychecker... also a python dependency checker :-) So this talk interested me a lot, as I didn’t yet know about FawltyDeps.
A big problem in science is the “replication crisis”. Lots of research cannot actually be reproduced when you try it… Data science is part of this problem. Reproducing your jupyter notebook for instance.
Someone looked at 22k+ jupyter notebooks. Only 70% had declared their dependencies, 46% could actually install the dependencies and only 5% actually could be run. ModuleNotFoundError and ImportError were the number 1 and 3 in the list of exceptions!
What is a dependency? For instance “numpy”, if you have an import numpy as np in your file. Numpy isn’t in the python standard library, you have to install it first.
You can specify dependencies in setup.py, pyproject.toml, requirements.txt and so on. If you import something and don’t specify it, it is an “undeclared dependency”. When you later on remove an import and don’t adjust your requirements.txt, you have an “unused dependency”. That’s not immediately fatal, but it might take up unnecessary space.
FawltyDeps was started to help with this problem: find undeclared and unused dependencies. It reports them. You can ask for a more detailed report with line numbers where the dependencies were found.
FawltyDeps supports most dependency declaration locations. requirements.txt, setup.py, pyproject, conda, etc. And it works with plain python files, notebooks, most python versions and most OSs. You can configure it on the command line and in config files. There’s even a handy command to add an example config to your pyproject.toml.
Handy: you can add it as a pre-commit hook (https://pre-commit.com). And: there’s a ready-made github action for it, including good reporting.
FawltyDeps has to deal with several corner cases:
Package names that don’t match what you import: import sklearn and the dependency scikit-learn. Or setuptools, which provides both setuptools and pkg_resources.
For this it looks at various locations for installed packages to help figure out those mappings. It helps if you’ve installed FawltyDeps in your project’s virtualenv.
You can add your own custom mappings in your configuration to help FawltyDeps.
You can exclude directories.
There’s a default list of “tool” packages that FawltyDeps doesn’t complain about if you include them as dependency without importing them. Ruff, black, isort: those kinds of tools.
Django projects can have dependencies that aren’t actually imported. You can ignore those in the config to prevent them from being reported as unused.
At the moment, extra dependencies (like [test] or [dev] dependencies) are just handled as part of the whole set of dependencies.
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.