
Conquering Elastic Kubernetes Service secondary subnets in Terraform

29 March 2023, by Nicholas Thompson

When setting up an Amazon Elastic Kubernetes Service (EKS) cluster, DevOps engineers often don’t have the ability or permission to change the size of their primary subnet. This leads to a situation where a secondary subnet is needed; however, that comes with its own set of difficulties. Here is how we at Entersekt set up secondary subnets via Terraform.


Why use a secondary subnet in EKS?

Amazon EKS is a managed service that allows developers to make use of a Kubernetes cluster without having to worry about managing it themselves. In the default configuration for an EKS Kubernetes cluster, a single subnet is provided. This is fine for most deployments and a good starting point. As platform engineers at Entersekt, a fintech company, we regularly create multiple clusters in the same environment. That’s where we encountered a problem: once we started creating more clusters, we ran out of IP addresses.

Why, you might ask? For each pod that your EKS cluster creates, an IP address is assigned from your VPC subnet. Once you start scaling out your cluster, your subnet’s IP addresses can be depleted quickly. This is especially true for us, as we have other non-EKS services running in the same subnet. The solution we found in AWS is to provide a secondary subnet for your cluster, allowing for more IP addresses.

Creating helper modules in Terraform

At Entersekt, we use Terraform to configure our infrastructure as code. We built our modules to also assist with creating a secondary subnet in code.

We found no single existing Terraform module that enables secondary subnets in EKS, so we solved this by writing helper modules that set up an EKS cluster with secondary subnets.

By creating these helper modules via Terraform to set up the secondary subnets, you can sleep easy at night knowing your infrastructure is in code and secure. Below are some of the benefits of doing this:

  • Infrastructure as code: Your infrastructure is defined and saved in code format. This helps to track changes and manage the state of your infrastructure.
  • Automation: You can deploy and manage your infrastructure from a pipeline.
  • Reproducibility: Every time you deploy, you can expect the same result.
  • Your dev friends will like you

Setting up EKS secondary subnets in Terraform

Before you begin

You will need the following installed on the machine you will be running your Terraform deployments from:

  • kubectl
  • AWS CLI, with a profile set up to connect to the cluster
  • Terraform
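
As a quick optional check that the tooling and your AWS credentials are in place, you can run:

kubectl version --client    # kubectl is installed
aws --version               # AWS CLI is installed
aws sts get-caller-identity # your AWS profile can authenticate
terraform version           # Terraform is installed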

EKS, Add-ons and Terraform

To configure a secondary subnet in EKS, we need to configure the Amazon VPC CNI add-on. EKS add-ons provide operational support to the cluster and include components like CoreDNS and kube-proxy. Add-on versions are matched and validated against specific Kubernetes versions. You can, for example, check which add-on versions are supported for a given Kubernetes version using the following eksctl command:

eksctl utils describe-addon-versions --kubernetes-version 1.23
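
The same information is available from the AWS CLI via describe-addon-versions, optionally filtered to a single add-on such as the VPC CNI:

aws eks describe-addon-versions --kubernetes-version 1.23 --addon-name vpc-cni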

In Terraform, you can specify add-ons and their versions using the official AWS provider. For example:

resource "aws_eks_addon" "example" {
  cluster_name      = aws_eks_cluster.example.name
  addon_name        = "coredns"
  addon_version     = "v1.8.7-eksbuild.3" #e.g., previous version v1.8.7-
eksbuild.2 and the new version is v1.8.7-eksbuild.3
  resolve_conflicts = "PRESERVE"
}

This allows for upgrading add-ons as you maintain and upgrade your cluster. A number of these add-ons are also installed by default if you do not specify them, which makes it difficult to manage your cluster upgrades in Terraform. For this reason, it is useful to specify these add-ons in your Terraform module.

The use of add-ons does, however, come with a bit of complexity. The add-ons’ interface does not allow for any modification of the installation. Any changes need to be done via the Kubernetes API. Okay, fine, so why not just use the official Kubernetes provider then and be done with it? Unfortunately, it is not that simple when developing modules.

What we found was that any changes in the core Terraform EKS module would result in the output for the Kubernetes endpoint being unknown. This, in turn, causes the Kubernetes provider to fail, as it defaults back to localhost as the endpoint and cannot connect.

This brings us to the Amazon VPC CNI. This is an AWS-managed add-on that needs specific configuration to allow for secondary subnets. It has to be configured before the worker nodes are created.
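
Later in this article we assume an aws_eks_addon.vpc_cni resource exists in the module. A minimal sketch of what that might look like (the add-on version shown is illustrative; pick one supported by your cluster version):

resource "aws_eks_addon" "vpc_cni" {
  cluster_name      = aws_eks_cluster.my_cluster.name
  addon_name        = "vpc-cni"
  addon_version     = "v1.12.6-eksbuild.1" # illustrative; check with describe-addon-versions
  resolve_conflicts = "PRESERVE"
}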

The Terraform module solution

We can start by creating our secondary subnets via Terraform:

variable "vpc_id" {
  type        = string
  description = "The VPC your cluster will run in."
}

variable "secondary_subnet" {
  type        = string
  description = "The secondary subnet range. The secondary subnet range 
can only make use of 100.64.0.0/10 or 198.19.0.0/16 ranges, see https://aws.amazon.com/premiumsupport/knowledge-center/eks-multiple-cidr-ranges/"
}

variable "az_config" {
  type = list(object({
    cidr_block        = string
    availability_zone = string
    route_table_id    = string
  }))
  description = "Set a smaller subnet for each availability zone that form part of the overall secondary subnet."
}

resource "aws_vpc_ipv4_cidr_block_association" "eks_secondary_subnet" {
  vpc_id     = var.vpc_id
  cidr_block = var.secondary_subnet
}

resource "aws_subnet" "az_secondary_subnet" {
  count = length(var.az_config)
  vpc_id            = var.vpc_id
  cidr_block        = var.az_config[count.index].cidr_block
  availability_zone = var.az_config[count.index].availability_zone
  depends_on = [
    aws_vpc_ipv4_cidr_block_association.esp_eks_secondary_subnet
  ]
  tags = {
    Name = "eks-secondary-subnet-private${count.index}-$
{var.az_config[count.index].availability_zone}"
  }
  lifecycle {
    ignore_changes = [tags]
  }
}

resource "aws_route_table_association" "az_secondary_subnet" {
  count = length(var.az_config)
  subnet_id      = aws_subnet.az_secondary_subnet[count.index].id
  route_table_id = var.az_config[count.index].route_table_id
}

output "subnet_ids" {
  value = { for k, v in aws_subnet.az_secondary_subnet : v.id => 
v.tags_all }
}

Example input values:

input = {
  vpc_id           = "vpc-xxxxxx"
  secondary_subnet = "100.70.0.0/16"
  az_config = [{
    cidr_block        = "100.70.0.0/19"
    availability_zone = "us-west-1a"
    route_table_id    = "rtb-xxxxxx"
    }, {
    cidr_block        = "100.70.32.0/19"
    availability_zone = "us-west-1b"
    route_table_id    = "rtb-xxxxxx"
    }, {
    cidr_block        = "100.70.64.0/19"
    availability_zone = "us-west-1c"
    route_table_id    = "rtb-xxxxxx"
  }]
}
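
If you wrap the resources above in a module, the same example values might be passed in like this (the module source path is hypothetical):

module "eks_secondary_subnets" {
  source           = "./modules/eks-secondary-subnets" # hypothetical path
  vpc_id           = "vpc-xxxxxx"
  secondary_subnet = "100.70.0.0/16"
  az_config = [
    { cidr_block = "100.70.0.0/19", availability_zone = "us-west-1a", route_table_id = "rtb-xxxxxx" },
    { cidr_block = "100.70.32.0/19", availability_zone = "us-west-1b", route_table_id = "rtb-xxxxxx" },
    { cidr_block = "100.70.64.0/19", availability_zone = "us-west-1c", route_table_id = "rtb-xxxxxx" },
  ]
}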

With our subnets created, we can work on the kubectl commands we will need for the cluster. Let’s fetch the cluster’s kubeconfig.yaml first:

aws eks update-kubeconfig \
    --region $AWS_REGION \
    --name $CLUSTER_NAME \
    --kubeconfig kubeconfig.yaml
export KUBECONFIG=./kubeconfig.yaml

Following the secondary subnet guide, we modify the aws-node daemonset with additional environment variables:

kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

Next, we add a custom resource for each availability zone in the secondary subnet, but let’s first create a reusable block with environment variables:

cat <<CR_EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $AWS_AZ
spec:
  securityGroups:
    - $WORKERGROUPS_SG
  subnet: $SECONDARY_SUBNET_ID
CR_EOF

We can run this code block for each secondary subnet ID.
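
For illustration, a small shell sketch (with placeholder subnet IDs and security group) that applies the block once per availability zone:

# Placeholder values for illustration
AZS=(us-west-1a us-west-1b us-west-1c)
SUBNET_IDS=(subnet-aaaa subnet-bbbb subnet-cccc)
WORKERGROUPS_SG=sg-xxxxxx

for i in "${!AZS[@]}"; do
  AWS_AZ="${AZS[$i]}"
  SECONDARY_SUBNET_ID="${SUBNET_IDS[$i]}"
  cat <<CR_EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $AWS_AZ
spec:
  securityGroups:
    - $WORKERGROUPS_SG
  subnet: $SECONDARY_SUBNET_ID
CR_EOF
done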

Now that we have our commands, we need to run them from Terraform. The answer is null_resource and local-exec: we can run local commands in Terraform with a local-exec provisioner wrapped in a null_resource.

For this example, we will assume the following:

  • There is an aws_eks_cluster.my_cluster resource.
  • There is an aws_eks_addon.vpc_cni resource that installs the VPC CNI add-on.
  • There is an aws_security_group_rule.control_plane security group rule that controls access to the cluster.

With the commands in place, we can start building our Terraform module. The environment variable updates are the easiest place to start. Because these resources do not have direct dependencies on each other, we have to add our own depends_on blocks. Ensure this resource is created after the vpc_cni add-on has been installed, as we need the CRDs it provides, and after the EKS control plane security group rule exists.

Setting the triggers ensures that the resources can be updated if a subnet changes. It also helps with the deletion step as we can get our previous state.

resource "null_resource" "az_subnet_daemonset_vars" {
  depends_on = [
    // Needs the VPC CNI installed to update its change
    aws_eks_addon.vpc_cni,
    // Needs this security group rule to exist to gain access to the api
    aws_security_group_rule.control_plane
  ]
  triggers = {
    aws_region   = var.region
    cluster_name = var.cluster_name
  }
  provisioner "local-exec" {
    command = &lt;<-EOT
        aws eks update-kubeconfig \
            --region $AWS_REGION \
            --name $CLUSTER_NAME \
            --kubeconfig kubeconfig.yaml
        export KUBECONFIG=./kubeconfig.yaml
        kubectl set env daemonset aws-node -n kube-system 
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
        kubectl set env daemonset aws-node -n kube-system 
ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone
    EOT
    environment = {
      AWS_REGION   = self.triggers.aws_region
      CLUSTER_NAME = self.triggers.cluster_name
    }
  }
}

With that, we can start creating our Amazon VPC CNI custom resources (ENIConfig):

resource "null_resource" "az_subnet_eniconfig_cr_app" {
  count = length(var.secondary_subnets)
  depends_on = [
    // Documentation suggests that the env as set before setting the CR
    null_resource.az_subnet_daemonset_vars
  ]
  triggers = {
    aws_region   = var.region
    cluster_name = var.cluster_name
    az           = var.secondary_subnets[count.index].availability_zone
    sg_id        = 
aws_eks_cluster.my_cluster.vpc_config[0].cluster_security_group_id
    subnet_id    = var.secondary_subnets[count.index].subnet_id
  }
  provisioner "local-exec" {
    command = &lt;<-EOT
        aws eks update-kubeconfig \
            --region $AWS_REGION \
            --name $CLUSTER_NAME \
            --kubeconfig kubeconfig.yaml
        export KUBECONFIG=./kubeconfig.yaml
        cat &lt;<CR_EOF | kubectl apply -f -
        apiVersion: crd.k8s.amazonaws.com/v1alpha1
        kind: ENIConfig
        metadata:
          name: $AWS_AZ
        spec:
          securityGroups:
            - $WORKERGROUPS_SG
          subnet: $SECONDARY_SUBNET_ID
        CR_EOF
    EOT
    environment = {
      AWS_AZ              = self.triggers.az
      WORKERGROUPS_SG     = self.triggers.sg_id
      SECONDARY_SUBNET_ID = self.triggers.subnet_id
      AWS_REGION          = self.triggers.aws_region
      CLUSTER_NAME        = self.triggers.cluster_name
    }
  }
}
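
Once applied, you can confirm the daemonset variables and ENIConfig resources from the same machine (a quick manual check, not part of the module):

kubectl describe daemonset aws-node -n kube-system | grep -E 'AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG|ENI_CONFIG_LABEL_DEF'
kubectl get eniconfigs.crd.k8s.amazonaws.com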

Given the following variables:

variable "cluster_name" {
  type = string
  validation {
    condition     = can(regex("^([a-z][a-z0-9]*)(-[a-z0-9]+)*$", 
var.cluster_name))
    error_message = "Variable 'cluster_name' must be kebab-case."
  }
  description = "EKS cluster name."
}

variable "region" {
  type        = string
  description = "AWS region where the cluster is created."
}

variable "secondary_subnets" {
  type = list(object(
    {
      availability_zone = string
      subnet_id         = string
    }
  ))
  description = "A list of availability zones and corresponding secondary 
subnet ID."
}

Finally, ensure that these changes happen before the worker nodes are created by adding the following depends_on block to your aws_eks_node_group resource:

  depends_on = [
    # Ensure secondary subnets are configured before worker nodes are created.
    null_resource.az_subnet_eniconfig_cr_app
  ]
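
For context, a minimal managed node group including that depends_on might look like the sketch below (the node role ARN and primary subnet IDs are hypothetical variables; the worker nodes themselves stay in the primary subnets, while pods draw addresses from the secondary ones):

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.my_cluster.name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn      # hypothetical variable
  subnet_ids      = var.primary_subnet_ids # hypothetical variable: nodes live in the primary subnets

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  depends_on = [
    # Ensure secondary subnets are configured before worker nodes are created.
    null_resource.az_subnet_eniconfig_cr_app
  ]
}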

If you want to ensure these changes are removed on teardown, a destroy step can be added. In the az_subnet_daemonset_vars resource:

  provisioner "local-exec" {
    when    = destroy
    command = &lt;<-EOT
        aws eks update-kubeconfig \
            --region $AWS_REGION \
            --name $CLUSTER_NAME \
            --kubeconfig kubeconfig.yaml
        export KUBECONFIG=./kubeconfig.yaml
        kubectl set env daemonset aws-node -n kube-system 
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=false
        kubectl set env daemonset aws-node -n kube-system 
ENI_CONFIG_LABEL_DEF-
    EOT
    environment = {
      AWS_REGION   = self.triggers.aws_region
      CLUSTER_NAME = self.triggers.cluster_name
    }
  }

And in the az_subnet_eniconfig_cr_app resource:

  provisioner "local-exec" {
    when    = destroy
    command = &lt;<-EOT
          aws eks update-kubeconfig \
              --region $AWS_REGION \
              --name $CLUSTER_NAME \
              --kubeconfig kubeconfig.yaml
          export KUBECONFIG=./kubeconfig.yaml
          cat &lt;<CR_EOF | kubectl delete -f -
          apiVersion: crd.k8s.amazonaws.com/v1alpha1
          kind: ENIConfig
          metadata:
            name: $AWS_AZ
          spec:
            securityGroups:
              - $WORKERGROUPS_SG
            subnet: $SECONDARY_SUBNET_ID
          CR_EOF
      EOT
    environment = {
      AWS_AZ              = self.triggers.az
      WORKERGROUPS_SG     = self.triggers.sg_id
      SECONDARY_SUBNET_ID = self.triggers.subnet_id
      AWS_REGION          = self.triggers.aws_region
      CLUSTER_NAME        = self.triggers.cluster_name
    }
  }

Applying the solution to a running cluster

If you have a running cluster that you want to retrofit with this, you will have to recreate the nodes. You can delete all your nodes, which will result in downtime, or you can update your launch template. Updating your launch template should create new nodes and hand traffic over to them.
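
One way to take the launch template route in Terraform is to reference the launch template from the node group; when the template version changes, EKS replaces the nodes in a rolling fashion. A sketch, extending the earlier node group example and assuming an existing aws_launch_template.workers resource:

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.my_cluster.name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn      # hypothetical variable
  subnet_ids      = var.primary_subnet_ids # hypothetical variable

  launch_template {
    id      = aws_launch_template.workers.id             # assumed existing launch template
    version = aws_launch_template.workers.latest_version # a new version triggers a rolling replacement of nodes
  }

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  depends_on = [
    null_resource.az_subnet_eniconfig_cr_app
  ]
}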

In Conclusion

Conquering EKS secondary subnets in Terraform simplified our lives by ensuring we don’t have to manually manage our secondary subnet config outside of Terraform. We can now easily deploy and manage multiple clusters in the same VPC without having to worry about running out of address space. We can also expect the same cluster configuration every time we deploy. And finally, our dev friends think we are the coolest kids on the block with all our infrastructure saved in Git.


Nicholas Thompson is an experienced backend platform developer with 8+ years of experience in the tech industry. He has expertise in Kubernetes, Go, and Terraform, and is responsible for designing and maintaining backend systems at Entersekt.

Apart from his work, Nicholas has a passion for photography and hiking. He enjoys exploring the outdoors and capturing the beauty of nature with his camera.
