(Some summaries of the May 2022 Dutch Edgecase meeting.)
You can run kubernetes in lots of places: greenhouses, chip-making machines, factories, etcetera.
Connectivity: plugging in a cable is the best-case scenario, but perhaps you need a VPN. Wifi is challenging. And perhaps the site is completely airgapped: no direct internet at all.
Security: if you work in the cloud, security is much different from when lots of people can have direct hardware access to your equipment because it is in a factory or similar. Are your disks encrypted?
Unpredictability: on the edge, predictability is out the window. The situation can be quite diverse and weird. Combine that unpredictability with kubernetes’ complexity and you get operational overhead, so why would you even do it? Well, that was the situation a few years ago.
Now you have k3s, and it changes everything. k3s, whether agent node or master, is a single binary. You can even run single-node clusters. k3s is great and easy. He did a quick demo: a k3s master in Amazon and three battery-powered Raspberry Pi machines spread over the room, running k3s.
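To give an idea of how little configuration joining a cluster takes: an agent (one of the raspberry pis, say) can point itself at the master with a tiny k3s config file. A sketch, with placeholder values:

```yaml
# /etc/rancher/k3s/config.yaml on an agent node (placeholder values)
server: https://my-k3s-master.example.com:6443  # the master, e.g. in amazon
token: "<node-token from the master>"
```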
What they normally use to manage all those sorts of nodes: Rancher, and also ArgoCD. The software side is slowly becoming a bit standardised. The hardware is a bit of a problem: basically every customer has different equipment needs. But if you want to productise what you’re offering, you need to standardise a bit more.
We see forest fires, tornadoes in Germany, melting glaciers, rising sea levels. Can we fix it? No. We can’t undo what has happened and what is already happening. But we can dampen the effect quite a bit if we put in the effort.
Data centers are a big user of energy and thus a big contributor to global warming. 2.7% of Dutch energy usage is by data centers; Ireland is at 14.4%! Google’s datacenter in the Netherlands is right next door to the biggest coal-fired power plant. Sure, most data centers say they use only green energy, but almost nobody gives solid figures… Google is one of the few that publish reasonably useful numbers.
One GPU running for a full year has CO2 emissions equal to 23000 km by car; an 8-CPU kubernetes node some 5000 km. Both calculated with the Dutch mix of electricity sources and including the construction and operation of the data center building.
Some tips on what you can do about it:
Remove unused or underused servers.
Build or write more efficient code.
Clever caching.
Look at the size of your environment. Can you scale down? Review everything periodically.
Automatically stop staging/testing environments outside office hours (see the sketch after this list).
Can you improve utilization by using a queue? Run stuff at night?
If you run on the edge: turn on power save mode.
If you run in the cloud: pick the right region (from a carbon footprint perspective).
Choose where you run: in the cloud or on the edge. Which takes less energy? Which is more efficient?
Tip from the audience: vote with your wallet. Energy costs money, so if you pick the cheaper offerings of cloud providers, you automatically tend to pick the more energy-efficient ones.
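A minimal sketch of the “automatically stop staging” tip: a kubernetes CronJob that scales every deployment in a (hypothetical) staging namespace down to zero in the evening. All names, the schedule and the service account are assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: staging-sleep
  namespace: staging              # assumption: the staging environment
spec:
  schedule: "0 20 * * 1-5"        # weekdays at 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: staging-scaler  # needs RBAC rights to scale deployments
          restartPolicy: OnFailure
          containers:
            - name: scale-down
              image: bitnami/kubectl
              command: ["/bin/sh", "-c"]
              args:
                - kubectl scale deployment --all --replicas=0 -n staging
```

A mirror-image cronjob in the morning scales everything back up (easiest if staging runs everything with one replica anyway).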
The cloud is a huge drain on the available energy. But the cloud also enabled lots of innovation and progress. At leafcloud, they’re trying to change the design.
Data centers are basically space heaters. Energy goes in, heat goes out. That’s not efficient. At leafcloud they place servers in buildings where they actually can use the heat, like apartment buildings. So… they use the heat of the servers to pre-heat tap water. They make a trade: you provide room and they provide warmth.
So… basically a distributed datacenter. They use fiber-optic connections from their central datacenter to the leaf nodes. Storage stays in the central location; it is compute that runs on the leaf nodes. If a leaf goes offline, everything gets re-scheduled on different leaf nodes.
He showed a couple of use cases: a central office/location with remote locations such as greenhouses, separate factories, or equipment inside a huge factory. Sometimes an unstable internet connection, sometimes limited physical access. And you still want to manage everything as easily as possible.
How do we manage clusters? Git, flat yaml files, git-based pipelines, helm charts. “Bah!” The only way to run clusters, according to him, is with a “gitops engine” like ArgoCD.
If you have your setup in git: nice. But how do you deploy it? Can there be manual changes? Do you give developers access to kubernetes so that they can play with their namespace? In that case the state isn’t necessarily in git, as it can be changed by hand/kubectl.
What’s better is something like ArgoCD. It will monitor your cluster and compare it with the desired state (in git). And it will roll back manual changes! So using such a strict gitops tool really helps to keep everything nice and clean.
So: no kubectl access to your cluster. For anyone. Only ArgoCD. Killing off your production cluster and re-creating it from git: that should be possible.
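To make that concrete, a minimal ArgoCD Application with automated sync (repo URL, paths and names are placeholders). The prune and selfHeal flags are what make ArgoCD clean up and roll back manual changes:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/config/my-app.git  # placeholder
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc  # the cluster argocd runs in
    namespace: my-app
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from git
      selfHeal: true  # revert manual kubectl changes back to the git state
```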
Ok, ArgoCD… Now how do we manage k8s clusters in remote locations when the location has a limited connection? A central ArgoCD server isn’t the handiest in that case. But ArgoCD can maintain itself: it only needs some yaml config to start itself. So you can have an ArgoCD instance in every location. It needs a git server (which you can run locally) for its yaml config, plus a quick cronjob to pull the config from a central git location.
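Such a config-pulling cronjob could look roughly like this; the central and local git URLs are made up and authentication is left out for brevity:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: config-repo-sync
  namespace: gitops
spec:
  schedule: "*/15 * * * *"        # retry often: the uplink isn't always there
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mirror
              image: alpine/git
              command: ["/bin/sh", "-c"]
              args:
                - |
                  git clone --mirror https://git.example.com/central/edge-config.git /tmp/repo
                  cd /tmp/repo
                  git push --mirror http://git-server.gitops.svc:3000/edge/edge-config.git
```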
Observability: prometheus. Prometheus writes to a “live file” and after two hours it starts a new one. The older files aren’t changed anymore, and thus they can be shipped (with a cronjob) to the central location. Use Thanos to read those files as if it were a real prometheus server.
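A sketch of such a shipping cronjob, assuming the central location offers S3-compatible storage (the bucket, credentials handling and PVC name are assumptions). The trick is to skip prometheus’s live data (the wal/ and chunks_head/ directories) and only copy the finished two-hour blocks:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prometheus-block-ship
  namespace: monitoring
spec:
  schedule: "30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: ship
              image: minio/mc    # MinIO client for S3-style copying
              command: ["/bin/sh", "-c"]
              args:
                - |
                  mc alias set central https://s3.example.com "$ACCESS_KEY" "$SECRET_KEY"
                  mc mirror --exclude "wal/*" --exclude "chunks_head/*" \
                      /prometheus central/thanos-blocks/edge-site-1
              volumeMounts:
                - name: data
                  mountPath: /prometheus
                  readOnly: true
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: prometheus-data   # assumption: prometheus's data PVC
```

(The standard Thanos setup uses a sidecar that uploads blocks itself; a cronjob like this matches the “ship whenever there’s a connection” approach from the talk. A Thanos store gateway at the central location then serves the shipped blocks to queries.)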
He’s sorry for those poor souls that have to use the ELK stack for logging: Grafana Loki is way nicer. Like Prometheus, it writes the logging to a “live file” and rolls over every two hours. The archived files can again be shipped to a centralized location as soon as there’s an internet connection.
He works at Veeam, originally a virtual machine backup company. They now also have a kubernetes offering (K10, https://kasten.io).
An example: dredging companies. Their ships come home into the harbor only once every few years; that’s the only time the entire IT environment on board can be replaced. Once on location, often a very remote location, the internet connectivity is really bad: 64 kbit/s, that kind of stuff. So backups often have to be done locally, on the edge.
Why backups? A backup is like insurance for your data center. Hardware can break. Software can break. But that’s not the biggest problem nowadays: ransomware is what they see most. What also occurs: rogue employees deleting data.
Kubernetes opened up a big “backup gap”: the devs run stuff on the cluster but don’t know how the backup works, and ops needs to back it up but doesn’t know what is running on the cluster.
Kubernetes offers nice high availability, but that is no backup: if the server location burns down, everything is gone. It also doesn’t help with non-node-failure events like data corruption, accidental deletion, or ransomware. And it is difficult to have truly offsite backups. Recovery is also often complex.
Their “K10” backup solution actually runs inside the cluster as microservices. Protection/backup happens at the namespace level. They back up not only the persistent volumes, but also the configuration of everything that’s running (as it might have changed compared to the git-based config files that you’re depending on).
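For a feel of what that looks like: a K10 backup policy is itself a kubernetes resource, roughly like the sketch below. I’m writing the field names from memory of the kasten docs, so treat them as assumptions and check the real documentation:

```yaml
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: backup-stock-demo
  namespace: kasten-io
spec:
  frequency: "@daily"
  retention:
    daily: 7            # keep a week of daily restore points
  actions:
    - action: backup
  selector:
    matchExpressions:
      - key: k10.kasten.io/appNamespace
        operator: In
        values:
          - stock-demo  # the namespace to protect
```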
He showed a demo (https://github.com/tdewin/stock-demo). One of the technologies he used was ZFS plus the “openebs” zfs storage provider. ZFS can do snapshots; that’s one of its advantages. At the end of the demo he deleted his namespace and then restored everything from backup.
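The openebs side of that demo boils down to a StorageClass that hands volume provisioning to the ZFS CSI driver; the pool name below is an assumption:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-zfspv
provisioner: zfs.csi.openebs.io   # openebs' zfs-localpv CSI driver
parameters:
  poolname: "zfspv-pool"          # assumption: a ZFS pool created on the node
  fstype: "zfs"
volumeBindingMode: WaitForFirstConsumer
```

Volume snapshots on such a StorageClass map onto cheap copy-on-write ZFS snapshots, which is what makes the quick backup and restore possible.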