I use docker-compose quite a lot. Most of the python/django stuff we deploy is done with docker-compose (one of the two big ones is in kubernetes already). A while back I moved several “geoservers” to docker-compose. Geoserver is a web mapping server written with java/tomcat. Normally pretty stable, but you can get it to crash or to become unresponsive.
So that’s something for which docker’s health check comes in handy. You
can configure it in docker-compose itself, but I put it in our geoserver’s
Dockerfile as I was making some other modifications anyway:
    FROM docker.osgeo.org/geoserver:2.23.1
    ... some unrelated customizations ...
    HEALTHCHECK --interval=20s --timeout=10s --retries=3 --start-period=150s \
        CMD curl --fail --max-time 3 http://localhost:8080/geoserver/web/ || exit 1
A simple “curl” command to see if the geoserver still displays its start page. With a generous --start-period, as geoserver needs quite some time to start up.
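To make explicit what that curl one-liner checks, here is a rough Python equivalent. It is only a sketch: the function names `probe` and `exit_code` are made up for illustration, and it is not what the Dockerfile above uses. Like `curl --fail --max-time 3`, it treats connection errors, timeouts and HTTP error statuses (400 and up) as failures.

```python
import http.client
import urllib.request

# URL taken from the HEALTHCHECK line above.
URL = "http://localhost:8080/geoserver/web/"


def probe(url, timeout=3.0):
    """Return True if the page answers without an HTTP error within `timeout` seconds."""
    try:
        # urlopen raises HTTPError (a subclass of OSError) for 4xx/5xx responses,
        # roughly what curl's --fail does.
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, broken response...
        return False


def exit_code(url=URL):
    """Map the probe result to the exit code docker expects: 0 healthy, 1 unhealthy."""
    return 0 if probe(url) else 1
```

Docker only looks at the exit code of the health check command, so a script doing `sys.exit(exit_code())` would behave much like the curl line, assuming python is available in the image.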
Docker-compose allows for healthchecks, and displays Up (healthy) in the “state” column when you call docker-compose ps. But docker-compose doesn’t actually restart failed services. For that, you need docker-autoheal as an extra service. At the core, it consists of a single shell script that asks docker if there are containers matching the filter health=unhealthy and optionally autoheal=true. If found, they get restarted.
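The idea behind that filter can be sketched in a few lines of Python. This is not autoheal's actual code (that is a shell script); the function names are hypothetical and the sketch simply shells out to the regular `docker ps --filter` and `docker restart` commands.

```python
import subprocess


def build_filters(label_name="autoheal"):
    """Filters for unhealthy containers, optionally limited to <label_name>=true."""
    filters = {"health": ["unhealthy"]}
    if label_name is not None:
        # Note the lowercase "true": the label value must match exactly.
        filters["label"] = [f"{label_name}=true"]
    return filters


def ps_command(filters):
    """Turn the filters into a `docker ps` command that prints only container ids."""
    cmd = ["docker", "ps", "--quiet"]
    for key, values in filters.items():
        for value in values:
            cmd += ["--filter", f"{key}={value}"]
    return cmd


def restart_unhealthy(label_name="autoheal"):
    """Restart every matching container and return the ids that were restarted."""
    result = subprocess.run(
        ps_command(build_filters(label_name)),
        capture_output=True, text=True, check=True,
    )
    container_ids = result.stdout.split()
    for container_id in container_ids:
        subprocess.run(["docker", "restart", container_id], check=True)
    return container_ids
```

Calling `restart_unhealthy()` in a loop with a sleep in between would give a crude approximation of what the autoheal service does.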
I have a mix of services (geoserver, pgbouncer, nginx) with only the geoservers having a health check. So I configured autoheal like this in my docker-compose file:

    autoheal:
      image: willfarrell/autoheal:1.1.0
      tty: true
      restart: unless-stopped
      environment:
        - AUTOHEAL_CONTAINER_LABEL=autoheal
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
And the services with healthcheck got the autoheal label:
    geoserver:
      image: ...
      labels:
        autoheal: true  # <= there's an error here
Autoheal didn’t seem to be working for me. No logs. Well, the geoservers that could need to be autohealed rarely failed, which is good news, but made it harder to see if autoheal was working.
Last week I made some changes that improved the speed for several geoserver
maps. But it also made geoserver as a whole unstable. So I had an
(unhealthy) container. But autoheal didn’t restart it. And there was nothing
in autoheal’s log output.
It turned out that autoheal: true was the problem. The value needs to be quoted, autoheal: "true", as autoheal searches for the lowercase string true. An unquoted true gets translated to a capitalized True (probably a string representation of the boolean value) by docker compose, which autoheal doesn’t match.
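The mismatch is easy to reproduce in plain Python. The sketch below assumes the conversion is simply Python's `str()` applied to the parsed YAML value, which is a guess consistent with the "probably" above and has not been checked against docker compose's code. The helper names are made up.

```python
def label_value(parsed_yaml_value):
    """Assumed conversion: the parsed YAML value is turned into a label string with str()."""
    return str(parsed_yaml_value)


def matches_autoheal(labels):
    """Mimic a filter on autoheal=true: an exact, case-sensitive string comparison."""
    return labels.get("autoheal") == "true"


# `autoheal: true` is a YAML boolean, so the parser hands over Python's True.
unquoted = {"autoheal": label_value(True)}
# `autoheal: "true"` is a YAML string and stays as it is.
quoted = {"autoheal": label_value("true")}

print(unquoted, matches_autoheal(unquoted))  # {'autoheal': 'True'} False
print(quoted, matches_autoheal(quoted))      # {'autoheal': 'true'} True
```

Running `docker inspect` on the container and looking at its labels is a quick way to see which of the two values actually ended up on it.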
After quoting the value, autoheal properly restarted misbehaving geoservers when they went belly-up:
    geoserver:
      image: ...
      labels:
        autoheal: "true"  # <= quoted value works
That took some time to figure out… Especially as there was no output at all from the autoheal container. A short message upon startup ("autoheal is running") would have helped me to be sure the logging was actually working. I spent quite some time googling and figuring out whether there was actually something wrong with my logging. That’s why the tty: true is in there, for instance.
I hope this blog entry has the right words to help someone else plagued with the same problem :-) A quick note in the README, warning about the quotes, is probably a better solution. I’ve submitted an issue for it.
A win for open source, btw: I could read the source code for the autoheal shell script. That helped me figure out what was going wrong.
My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.