Ansible thoughts (plus questions for which I need your input)

Now… how do we structure our servers? How do we put our software on them? How do we configure them?

I’ve done my share of server maintenance and devops and what have you. On Linux, that is. I’ve been maintaining my own (web)server since 1996 or so. I’ve done a lot with buildout (which means grabbing together everything for a site that has to do with Python). And I automated much of our installation with Fabric so that

$ bin/fab update production

was more or less the standard way to update a site. So I know my way around reasonably and I normally keep everything pretty neat and tidy.

Now Ansible has come along and several of our sites are managed with that tool. It feels like a much more elaborate version of Fabric. “Elaborate” is meant very positively here. And I want to dive into it and start using it too.

Fabric is great for executing commands via ssh on a remote server (including copying files, modifying files etc). Ansible also executes commands via ssh, but comes with a complete system for storing your configuration, handling variables, dividing up the workload into libraries and also for sharing those libraries (called “roles” in ansible-speak).

Problem description

I’ve only experimented with it a little bit. Apt-get update/upgrade on the two local laptops at home, that sort of stuff. Today I wanted to get backups working: just a basic rsync cronjob on the two laptops, rsyncing their stuff to my “vanrees.org” server, somewhere remote in England. Connection via a passwordless SSH connection.

Looks simple. Just add the cronjob and a private ssh key to the laptops, add the public ssh key to the server’s backup user and you’re ready to go. But it did not feel right: these are really two kinds of configuration that I want to keep separate.

The server is one that I manage together with my brother Maurits. The laptops at home are mine alone to manage. Different “organizations” if you want. Should they be mixed?

Ansible seems to like it best when it is, in the end, one big configuration. Of course you split it up a bit in roles, but when you want to install a new website, you want to add a database to the database server, add a new site to the external proxy configuration, install the django site on one of the webserver machines, perhaps make sure memcached is on that webserver machine, etc: for one site you need changes in multiple places on multiple machines.

This again, to me, looks like two organizations are mixed up. On the one hand there are the people who have to make sure that the main external proxy keeps working, that the production database doesn’t get abused with 25 different test/staging/development sites, and that nobody kills off four websites by upgrading some dependency (a gdal or numpy .deb package or so) for the sole other site on that machine that needs it.

On the other hand, the developers of the site want to make sure the server(s) are up to date and ready for the site. And that means adding a database, installing some packages, adding the url to the proxy, etc. Exactly the things that shouldn’t need to be modified all that often.

So… is this a lack of trust? No. Complete trust, actually, that something is going to get royally messed up if everyone gets root access to all of our servers. I mean, puppet (used by the sysadmins) gets de-installed from servers because someone thinks it is unused… Or packages get upgraded (“let’s add ubuntugis-unstable as PPA to our 12.04 LTS so that we can use the latest GDAL that we need”), followed by the four sites on the same server dying due to failed dependencies once the server reboots.

So… Ansible can handle most of our servers and infrastructure and sites. But there are two kinds of configuration, I think:

Server/infrastructure configuration. Which LTS to install on which server? Front-end main web proxy config. Database server configuration. Which packages to install on which server. Which services like memcached or redis to run on the machines.
Site configuration. Special cases. Database to be installed. Nginx configuration. Necessary packages. Perhaps memcached or redis.

There’s quite some overlap between the two. But they develop at quite a different pace. Tricky, as they partially need each other’s data. The main proxy needs to know which site is on which server, for instance.
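To make that overlap a bit more concrete, here is a minimal sketch of how the proxy’s knowledge could be derived from data that lives with the sites. It is plain Python, not Ansible, and everything in it (the `SITES` list, the server names, the second site and the `proxy_backends` function) is made up for illustration:

```python
# Hypothetical site-level data, owned by each site's developers.
SITES = [
    {"name": "example.org", "server": "web1", "port": 8004},
    {"name": "other.example.org", "server": "web2", "port": 8005},
]


def proxy_backends(sites):
    """Return {hostname: "server:port"}, the part the main proxy needs."""
    backends = {}
    for site in sites:
        if site["name"] in backends:
            # Two sites claiming the same hostname is an infrastructure problem.
            raise ValueError("duplicate site name: " + site["name"])
        backends[site["name"]] = "{}:{}".format(site["server"], site["port"])
    return backends


if __name__ == "__main__":
    for hostname, backend in sorted(proxy_backends(SITES).items()):
        print(hostname, "->", backend)
```

The sites change often and the proxy config should not, yet the second is computed from the first. That is exactly the tricky part.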

Current situation

We now have a base set of ansible roles we can reuse for multiple sites to set up a server and for setting up sites the way we want.

There are three or four sites where the above-mentioned base roles are used in combination with custom roles and custom config to actually configure and install the site on the server.

Handy way to install a site, whether on a local vagrant/virtualbox or on the production server. Same command, same setup.

You need full root access on the server, though, to get it all working. And you need “create database” permission on the database server. If there are other sites on the same server (as is the custom for us, now), you’ve suddenly got multiple sites ansible-bombarding the same poor server.

Doing it all in one configuration doesn’t sound like a perfect fit, either. Too many parallel projects, too many individual changes. For instance, someone locked himself out of a server after running ansible on it, because someone else had added a role to fix the sudoers file. The one running ansible wasn’t allowed to become root anymore after his ansible script updated the sudoers file, as that file was now tweaked to the needs of another one of the projects…

My impression

Ansible is hailed as the cure for all our installing woes. And for finally breaking free of the sysadmin mold (“I’m not allowed to do anything on this machine”). But I’m pretty sure that if everyone can just update/fix/improve everything with the same big all-encompassing ansible config, it’ll be just a nice automatic way to shoot ourselves in the foot.

We’re, in practice, using it for installing individual projects. Sometimes even to manage two or three servers for one individual project. Servers specifically for that project: fine.

We’re not using it yet to manage all aspects of all our servers. Using our current setup in the all-encompassing way is not very safe-looking.

The problem is that we do talk about ansible as if it is one big configuration where everything is stored and as if all of us developers can now automatically manage everything ourselves. Our mental model doesn’t match up with reality.

My questions

My questions are a reality check, really:

We have lots of different kinds of sites (django 1.4, 1.5, 1.6, flask, static, socket). Each wants to use ansible to install itself on the server (staging/production) and also to set up a virtualbox for development. Is ansible suited for per-site setup work?
Ansible seems the right tool to manage a large collection of servers. Making sure the right packages are installed (“a couple of 12.04 ones with memcached pre-installed”). Making sure the right services run. This should be relatively stable, so the current problem (someone mucking about on the server by hand) should not be automated, right? (So: not everyone can update the server).
The first point lets developers update the server just like they would the local development virtual machine. From within a site’s own custom ansible config. The second point lets sysadmins set up the right infrastructure from within one big ansible config. Those two don’t match, right? (At least not together at the same time).

Some options

Brainstorming….

Docker means effectively a per-site packaged virtual machine. So you can muck about in it as much as you want, basically. All the packages you want, all the nginx config you want, all the nginx/memcached/redis/whatever you want. Root access. Copy-able to your local dev machine. You can set up your docker instance with a custom ansible config.

The docker instance has to be installed on a server somewhere, but (apart from some port forwarding and perhaps mounting a directory), there’s nothing much that needs configuring. Could be a nice combination with an infrastructure-wide ansible config, right?

Docker seems to remove quite some of the belongs-to-site-or-infrastructure problems by allowing, effectively, a full server to be in user space, right?
The setup of a server could also be handled, for a big part, with debian packages. A site’s installer (probably ansible) can easily check if a package (“our-company-standard-LTS-with-memcached.deb”) is installed OK and it can safely assume that the environment is OK then.
You could extract this-should-be-done-globally tasks from a site’s ansible config. And write a program to send those over as a pull request to the main infrastructure-wide ansible config. Things like “I want my port 8004 to be available via the main proxy as example.org” or “I need a database on the staging database server”.
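A rough sketch of what that last option could look like, just to check whether the idea holds together. It is plain Python with a made-up data format. The `kind` field, `GLOBAL_KINDS` and both function names are my own assumptions and not something ansible provides. Actually sending the pull request is left out.

```python
# Hypothetical: the kinds of requests only the infrastructure config may handle.
GLOBAL_KINDS = {"proxy", "database"}


def split_requirements(requirements):
    """Split a site's requirements into (site_local, infrastructure_wide)."""
    local, shared = [], []
    for req in requirements:
        if req["kind"] in GLOBAL_KINDS:
            shared.append(req)
        else:
            local.append(req)
    return local, shared


def describe(site, shared):
    """Render the infrastructure-wide requests as text for a pull request."""
    lines = ["Requests from site {}:".format(site)]
    for req in shared:
        details = ", ".join(
            "{}={}".format(key, value)
            for key, value in sorted(req.items())
            if key != "kind"
        )
        lines.append("- {}: {}".format(req["kind"], details))
    return "\n".join(lines)


if __name__ == "__main__":
    requirements = [
        {"kind": "package", "name": "memcached"},
        {"kind": "proxy", "hostname": "example.org", "port": 8004},
        {"kind": "database", "server": "staging"},
    ]
    local, shared = split_requirements(requirements)
    print(describe("example.org", shared))
```

The site keeps handling its own `local` list. Only the `shared` part has to pass by whoever guards the infrastructure-wide config.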

Anyway… any input appreciated! I want to hone (or fix) my thinking.

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
