In my current project we faced the challenge of deploying a Cassandra cluster in Kubernetes. We don’t use any cloud provider to host either Cassandra or Kubernetes. From the beginning there were almost no problems with spinning up a Cassandra cluster. Recently, however, because of our hardware setup, we faced the issue of making Cassandra rack-aware on a Kubernetes cluster.
Infrastructure
The setup is(n’t) straightforward. We have 6 VMs for Cassandra, grouped into 3 racks - 2 VMs per rack. All of the Cassandra VMs are labeled in k8s, so that affinity rules guarantee that only Cassandra instances get scheduled there. Additionally, the VMs are labeled with rack information: rack-1, rack-2, rack-3. This is precisely the information I needed to push down through Kubernetes to Cassandra itself.
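For illustration, assuming hypothetical label keys (`role` and `rack` - the real keys in our cluster may differ), labeling the nodes and pinning the Cassandra pods to them looks roughly like this:

```yaml
# Nodes are labeled up front, e.g.:
#   kubectl label node cassandra-vm-1 role=cassandra rack=rack-1
# The pod template then restricts scheduling to those nodes:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: role
              operator: In
              values:
                - cassandra
```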
Kubernetes and DownwardAPI
After some quick investigation I found the Kubernetes DownwardAPI. Without looking too closely I was sure that I could take any label specified on the node and put it into a container environment variable:
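Roughly what I had in mind, with the label key and image as placeholders - as the next paragraph explains, only pod-level references actually resolve:

```yaml
containers:
  - name: cassandra
    image: cassandra:3.11
    env:
      # This works - pod-level fields are exposed via the DownwardAPI:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
      # This resolves *pod* labels only - there is no fieldPath that
      # reaches a label set on the node:
      - name: CASSANDRA_RACK
        valueFrom:
          fieldRef:
            fieldPath: metadata.labels['rack']
```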
You should have seen my face when I found out that you can only reference a restricted set of metadata with the DownwardAPI, and node labels aren’t part of it. There are even a couple of issues and feature requests open on how to pass a node label through to the pod:
So, ok, it’s not that easy, but it’s not something that can’t be done, right? A moment later I thought about using an initContainer to read the label of the node the pod is scheduled on, and then add that label to the pod itself. Shouldn’t be that hard, right:
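A rough sketch of that idea - the image name, the label key and the RBAC setup are all assumptions on my part (the init image needs kubectl plus permissions to read nodes and label pods):

```yaml
initContainers:
  - name: label-pod-with-rack
    image: bitnami/kubectl
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
    command:
      - sh
      - -c
      - |
        # Read the rack label from the node this pod landed on...
        RACK=$(kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels.rack}')
        # ...and copy it onto the pod, hoping the DownwardAPI picks it up later.
        kubectl label pod "$POD_NAME" rack="$RACK" --overwrite
containers:
  - name: cassandra
    image: cassandra:3.11
    env:
      - name: CASSANDRA_RACK
        valueFrom:
          fieldRef:
            fieldPath: metadata.labels['rack']
```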
Well. Almost. But not quite what I’d expected. Though the pod was labeled:
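something like this on the pod object (the names and values here are just illustrative):

```yaml
metadata:
  name: cassandra-0
  labels:
    rack: rack-1
```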
the environment variable was still empty inside the container. That’s because the resolution of env vars with the DownwardAPI happens when the pod is scheduled, not when the container is executed, so a label added later by an initContainer is never picked up. Dohhh. So, another head-scratcher. But fortunately, with a little help from a teammate of mine, I finally made it work with the following approach.
Solution
Just as a reminder, the original idea was to pass a node label into the container running Cassandra, so it can use that information to configure the Cassandra node with its rack. It’s also important to note that Cassandra is configured with multiple files, and one of them, cassandra-rackdc.properties, is where the rack information finally has to end up. The solution is not that simple, so a picture describes it best, but in steps:
a configMap is used to store a generic cassandra-rackdc.properties which should be updated during deployment
an initContainer takes this (immutable) configMap and copies it onto a volume shared with the Cassandra container
the Cassandra container mounts the shared volume and uses subPath to mount just that one file, so we don’t overwrite the other configuration files (see the sketch right below)
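A trimmed sketch of the moving parts - the names, the paths and, in particular, the way the rack value is looked up (via the node label, as in the earlier attempt) are my assumptions here; the full manifest follows further down:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cassandra-rackdc
data:
  cassandra-rackdc.properties: |
    dc=DC1
    rack=RACK_PLACEHOLDER
---
# Shown as a bare Pod for brevity; in reality this sits in the
# StatefulSet's pod template.
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-demo
spec:
  initContainers:
    - name: prepare-rackdc
      image: bitnami/kubectl          # needs kubectl + RBAC to read nodes
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      command:
        - sh
        - -c
        - |
          # Look up the rack from the node label (one possible source),
          # then write the filled-in file onto the shared volume.
          RACK=$(kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels.rack}')
          sed "s/RACK_PLACEHOLDER/${RACK}/" /config/cassandra-rackdc.properties \
            > /shared/cassandra-rackdc.properties
      volumeMounts:
        - name: rackdc-configmap
          mountPath: /config
        - name: shared-config
          mountPath: /shared
  containers:
    - name: cassandra
      image: cassandra:3.11
      volumeMounts:
        # subPath mounts only this single file, so the rest of
        # /etc/cassandra shipped with the image stays untouched.
        - name: shared-config
          mountPath: /etc/cassandra/cassandra-rackdc.properties
          subPath: cassandra-rackdc.properties
  volumes:
    - name: rackdc-configmap
      configMap:
        name: cassandra-rackdc
    - name: shared-config
      emptyDir: {}
```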
Drawing
The full-blown YAML
For the sake of readability, much of the configuration was removed.
Uff and yay! The following is the proof that 4 of the nodes were up with the proper rack settings: