Edgecase 2023: wednesday early morning sessions

Tags: kubernetes, python

(One of my summaries of the 2023 Dutch edgecase k8s conference in Utrecht, NL).

The truth about kubernetes - Gerrit Tamboer

He’s one of the founders of Fullstaq (the conference organisers). They started with aws/azure/etc. Working for everything from…

Around 2019, hybrid cloud became popular. Part in the regular cloud, part in your own datacenter. For that you need it to be reasonably similar on both sides. Now, edge computing is on the rise. Internet of things, CDN edge locations, all sorts of things.

A big help for them was K3S. But the core that makes it all possible: gitops (argocd, flux, etc).

2023: kubernetes was released in 2014. So nine years. What’s coming up in the near future? We’re maturing and lots of cool things are happening. But… for almost all companies, that maturity isn’t actually true. Some cloud-only companies that recently started might use kubernetes to the full, but most are slowly transitioning. Common problems are complexity and skill set.

Complexity: kubernetes is only part of the solution. 10%. You also have observability, security, advanced networking, ci/cd, data/storage, multi-cluster/multi-cloud… Only after you have all those layers are you really production-ready, and only then can you start migrating your software.

“Complexity” means “operational overhead”, which offsets most of the benefits of kubernetes for many companies. He showed a picture of a regular kubernetes observability stack: 10 components or so. Compare that to the regular monitoring in regular companies: just nagios (or zabbix in our case)… That’s quite a step!

Going from real servers to virtual machines was pretty OK to do. Not a big step. Moving from virtual machines to container deployments (“docker compose on a VM”) is also a small step. But jumping to a kubernetes deployment is a huge step. Lots of things are different. They need help.

Help? One solution is baby steps. How much of the full “production ready” stack do you really need to show value? You don’t need a service mesh from the start… Nor complex networking.

You could use a SaaS instead of doing observability yourself. You can use ci/cd via github. Storage can be done in the public cloud (or ask your local sysadmin). K8s management can also be done in the public cloud.

Less complexity means less time and less money.

Skill gap. Big problem. What can help are workshops. If you want kubernetes to “land” in your company, you need to get more people involved and knowledgeable.

Chick-fil-A’s edge architecture - Brian Chambers

Chick-fil-A is a chicken sandwich restaurant chain, something I didn’t know. They don’t have locations in the Netherlands, that’s why. They’ve run kubernetes for quite a while at 3000 locations.

Unrelated personal comment… He showed some examples of the restaurant chain, mostly in the USA/Canada. Lots of drive-thru stuff. Two lanes. Experiments with four lanes. Restaurants in the middle of a parking lot. What I saw was an awful car-centric environment. I’m used to city centers where you can walk. Our office is in the center of Utrecht: walking and cycling. There it was a lost little restaurant surrounded by cars in a sea of asphalt. And still he mentioned “we want a personal connection with the customer”. Ouch. Horrible.

Anyway, back to kubernetes :-) What are some of the kinds of data? IoT like kitchen equipment, temperature sensors. The POS terminals. Payments. Monitoring data (he mentioned Lidar, a laser-based point cloud sensor, so I’m wondering what they’re monitoring with that: probably length of car queues or so).

Lots of forecasting is happening based on that data. Nice. Car queue length is fed back into instructions for the kitchen. Or whether someone extra needs to take orders outside.

They looked at lots of kubernetes edge solutions. AWS Greengrass + Outposts, Nomad, etc. They all looked pretty heavy for their use case. The solution was K3S. A couple of intel NUC machines suffice. A standard partition scheme on top of it, plus ubuntu 18.04, plus K3S and then the applications on top of it. “Partition scheme” in this case means that the NUCs are always wiped and freshly installed. (He also mentioned “overlayFS”, which apparently helped them with updates, but I didn’t get that fully).

The apps on the edge K3S are things like local authentication, message broker, postgres+mongo, observability with prometheus/vector, vessel for gitops.

K8s on the edge: they also run it in the cloud, mostly to manage the 3000 edge locations. The edge locations don’t know about each other, they only have to talk with the central system.

Deployment. Their organisation is split into separate teams. An app team using python and datadog. Another app team with java and amazon cloudwatch. An infra team for gitops (with “fleet” as the orchestrator and “vessel” on the edge nodes).

Every edge location has its own git repository (gitlab). Rolling out changes incrementally over locations is easier that way. It is also possible to roll out changes everywhere at the same time. Having one repo might have been theoretically nicer, but their approach is more practical for their reality.
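The per-repo approach makes staged rollouts natural: push a change to a handful of locations first, then to progressively bigger batches. As a rough illustration (not their actual tooling, and the wave sizes are made up), the batching could look like this:

```python
def rollout_waves(locations, wave_sizes=(1, 10, 100)):
    """Split a list of edge locations into increasing rollout waves:
    a canary first, then bigger batches, then everything that's left."""
    waves, start = [], 0
    for size in wave_sizes:
        if start >= len(locations):
            break
        waves.append(locations[start:start + size])
        start += size
    if start < len(locations):
        # Final wave: all remaining locations at once.
        waves.append(locations[start:])
    return waves
```

Each wave would then translate into commits pushed to that batch of per-location repositories; rolling out everywhere at once is just one big wave.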

Persistence strategy. Their approach is to offer best-effort only. No guarantees at the edge. Most of the data used at the edge is what is needed right at that moment, so a few minutes of data that is lost isn’t really bad. The setup is pretty resilient though. So you can use mongo and postgres in your app just fine. App developers are encouraged to sync their app state to the cloud at regular intervals.
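The “sync your state to the cloud so a wiped node can recover” idea can be sketched in a few lines of Python. Everything here is hypothetical (the class name, the dict standing in for a cloud bucket); it only shows the shape of the pattern:

```python
import json


class EdgeStateSync:
    """Sketch of the edge persistence pattern from the talk: an app keeps
    state locally, pushes best-effort snapshots to the cloud, and a freshly
    wiped node can re-hydrate from the last snapshot."""

    def __init__(self, cloud_store):
        self.cloud_store = cloud_store  # stands in for an object store
        self.local_state = {}

    def update(self, key, value):
        self.local_state[key] = value

    def sync_to_cloud(self, location_id):
        # Best-effort: losing a few minutes of edge data is acceptable.
        self.cloud_store[location_id] = json.dumps(self.local_state)

    def rehydrate(self, location_id):
        # After a wipe (or on a replacement device), restore the snapshot.
        self.local_state = json.loads(self.cloud_store.get(location_id, "{}"))
```

The “no guarantees at the edge” design choice is visible here: the cloud copy is a periodic snapshot, not a transaction log.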

Monitoring: at the edge, they use Vector as a place to send the logs and metrics to. Vector then sends on the errors to the cloud: the rest is filtered out and stays local. This is also handy for some IoT things like the fridge, which you don’t want to send to cloud directly for security reasons.
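The filtering step (errors forwarded, everything else stays local) is simple enough to sketch. In reality this would live in Vector’s own configuration, not Python; the function below just illustrates the split:

```python
def split_for_forwarding(events, forward_levels=("error", "critical")):
    """Partition log events the way the talk describes Vector doing it:
    error-level events go to the cloud, the rest stays on the edge node."""
    to_cloud = [e for e in events if e.get("level") in forward_levels]
    keep_local = [e for e in events if e.get("level") not in forward_levels]
    return to_cloud, keep_local
```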

In the cloud, everything also goes to a Vector instance, which then distributes it to the actual destination (datadog, cloudwatch, etc.).

Some principles:

  • Constraints breed creativity. Helps to keep it simple. For example, there is no server room in a restaurant: you need to have simple small servers. Few people to do the work, so the hardware solution had to be simple as they didn’t have the capacity to troubleshoot. Network had to be simple, too.

  • Just enough kubernetes. Kubernetes sounds cute, but watch out: a cute small baby bear ends up being a dangerous large animal. Stay lightweight. K3s. Aim at highly recoverable instead of highly available. They embrace the “kube movement”: the open source ecosystem. People want to work with the open source stuff, so it is easier to find people.

  • Cattle, not pets. Zero-touch provisioning: plug-and-play install. “Wipe” pattern: the capability to remotely wipe nodes back to their initial state. Throw-away pattern: if a device is broken, leave it out of the cluster and ship a replacement. Re-hydrate pattern: encourage teams to send critical data out to the cloud when they can and be able to rehydrate if needed, just like with a new iphone.

On the edge, mirror the cloud paradigms as much as you can. Use containers. The “cattle not pets” paradigm.

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
