January 13th, 2020

What is Kubernetes, and Why are My Cloud Costs So High?! – Part 1

by in Kubernetes In Action


What is Kubernetes, and should I blame my DevOps engineer?

Kubernetes is a container orchestration system. It lets you deploy, scale, and manage containerized applications. Kubernetes is always deployed as a cluster, made up of a master node and many minion nodes. Each node is a virtual machine (VM) running in the cloud, consuming cloud resources.

Cloud complexity and costs have forced many companies to migrate their cloud infrastructure into Kubernetes clusters. Their cloud resources get put into containers (‘containerized’), vast Kubernetes clusters are created, and workloads are deployed into these clusters. Since the clusters are running within the cloud, the cloud providers are more than happy to assist in such a move.

Once you go through this migration process, however, you might find that your cloud bill has stayed the same or even increased. There’s no reason to blame your poor, overworked DevOps engineer. By the time you finish reading this article, you’ll see how they can be a powerful ally against your rising cloud costs. So let’s examine strategies that can help you manage a Kubernetes cluster without breaking the bank.

Almost half of C-suite executives (47%) cited cloud complexity as the factor that will have the most negative impact on cloud computing’s ROI over the next five years.


Cloud Complexity Management survey results


What’s in Our Cloud?

Figure: The Cloud, an amorphous monolith made up of CPU, GPU, RAM, disk, and networking.

For analysis, we are looking at the cloud as an amorphous monolith made up of CPU, GPU (and TPU), RAM, disks, and networking. These are the primary resources that cloud providers sell. A Kubernetes cluster created and deployed in the cloud uses allocated resources to run containers.

Kubernetes in the Cloud

Figure: Kubernetes Cluster, with a Master Node and many Minion Nodes.

The next logical question is: What is Kubernetes actually doing with those cloud resources? Well, the first thing it does is claim its space and marshal its resources. Think of Kubernetes as a sponge in a too-narrow beaker, dipped in not-quite-enough water. It’s going to soak up every drop that you give it, and expand to the exact size of the container that it’s in.

So before cluster creation, you define the number of nodes (the size of the beaker for our sponge) and types and power levels (the amount of water poured onto the sponge) of virtual machines that are used to create the nodes. Cloud providers have a lot of different kinds of virtual machines from which to choose. The usual approach is to pick something in the middle range, not too big and not too small, something that feels like it has just enough RAM and CPU. An educated guesstimation. Your DevOps engineer can help you to do the research and make these decisions.

Figure: Heterogeneous Workloads – Underutilized Node Resources (an example of underutilized cluster nodes).

Remember: Kubernetes claims these resources and does not share with outside processes.

Workloads running within the Kubernetes cluster are heterogeneous. The microservices, functions, jobs, databases, etc. running in the cluster use different amounts of cloud resources through the nodes. It’s important to understand what resources your typical tasks are going to use, and tailor your nodes accordingly. Some workloads, like a Jenkins builder, are CPU heavy. Other workloads, like Elasticsearch databases, are RAM heavy. Still others, such as machine learning training jobs, are GPU heavy. Running all of these workloads on a cluster made up of a single type of node means you are losing money, since the cluster is not cost-optimized for any of them.

Table 1: Average Monthly Cost for Virtual Machines used by a Node
Node | CPU | RAM | Monthly Cost
Standard VM | 2 | 7.5GB | $47.00
High RAM VM | 2 | 13GB | $60.00
High CPU VM | 2 | 1.8GB | $36.00
* Using average cloud provider prices, not reflecting any individual cloud provider.

Pool Strategy

Figure: Kubernetes Cluster made up of Node Pools.

Every major cloud provider, including AWS, GCP, and Azure, lets you group multiple nodes into a node pool. The nodes in a node pool all use the same kind of virtual machine, with the same cloud resource specs. By creating specific pools that are cost-optimized for particular workloads, you can reduce the total cost of the cluster. For example, moving a CPU-intensive workload to a CPU node pool, where high CPU and low RAM virtual machines have been configured, would reduce the total cost.

This is where your DevOps engineer can really help out. They can run a performance test that charts the typical resource usage of a given workload. Armed with that knowledge, you can assign the right workloads to the right node pools.
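As a rough illustration of that matching step, the sketch below picks a pool for each profiled workload. The VM shapes come from Table 1; the closest RAM-per-CPU-ratio rule, the pool names, and the workload figures are assumptions made for this example, not measurements or a method from the article.

```python
from dataclasses import dataclass

# VM shapes from Table 1: (CPU count, RAM in GB). Pool names are hypothetical.
POOLS = {
    "standard": (2, 7.5),
    "high-ram": (2, 13.0),
    "high-cpu": (2, 1.8),
}


@dataclass
class Workload:
    name: str
    cpu: float     # peak CPU cores seen in a performance test (made-up values)
    ram_gb: float  # peak RAM in GB seen in a performance test (made-up values)


def pick_pool(workload: Workload) -> str:
    """Return the pool whose RAM-per-CPU ratio is closest to the workload's.

    This is a simple heuristic assumed for illustration; real placement
    should also account for GPU needs, bursts, and node overhead.
    """
    ratio = workload.ram_gb / workload.cpu
    return min(POOLS, key=lambda pool: abs(POOLS[pool][1] / POOLS[pool][0] - ratio))


if __name__ == "__main__":
    profiled = [
        Workload("jenkins-builder", cpu=1.8, ram_gb=1.5),
        Workload("elasticsearch", cpu=1.0, ram_gb=6.0),
        Workload("web-api", cpu=0.5, ram_gb=2.0),
    ]
    for w in profiled:
        print(f"{w.name} -> {pick_pool(w)}")
```

With these made-up profiles, the CPU-heavy builder lands in the high CPU pool and the RAM-heavy database in the high RAM pool, which is the outcome the pool strategy is aiming for.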

Using the pricing from Table 1, here is an example of a 5-node Kubernetes cluster. Three of the configurations use five nodes of a single type, and the fourth uses a mix of node types.

Table 2: Average Yearly Cost for 5 Node Kubernetes Cluster.
Pools | CPU | RAM | Monthly Cost | Yearly Cost
Pool (5) Standard VM | 10 | 37.5GB | $235 | $2820
Pool (5) High RAM VM | 10 | 65GB | $300 | $3600
Pool (5) High CPU VM | 10 | 9GB | $180 | $2160
Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $226 | $2712
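The monthly and yearly costs for each configuration can be recomputed from the per-VM prices in Table 1. The following is a minimal sketch of that arithmetic; the dictionary keys and the function name are illustrative only.

```python
# Monthly price per VM in dollars, from Table 1 (averaged across providers).
MONTHLY_PRICE = {"standard": 47, "high-ram": 60, "high-cpu": 36}


def cluster_cost(nodes: dict) -> tuple:
    """Return (monthly, yearly) cost for a mapping of VM kind -> node count."""
    monthly = sum(MONTHLY_PRICE[kind] * count for kind, count in nodes.items())
    return monthly, monthly * 12


if __name__ == "__main__":
    configurations = {
        "5 x standard": {"standard": 5},
        "5 x high RAM": {"high-ram": 5},
        "5 x high CPU": {"high-cpu": 5},
        "mixed pools": {"high-cpu": 2, "high-ram": 1, "standard": 2},
    }
    for label, nodes in configurations.items():
        monthly, yearly = cluster_cost(nodes)
        print(f"{label}: ${monthly}/month, ${yearly}/year")
```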

Using a cluster with only standard VM nodes will leave the nodes underutilized, with leftover CPU and/or RAM capacity. Using only high RAM VMs will be costly, since the running workloads are unlikely to all be RAM heavy: any CPU-heavy workloads will max out a node’s CPU allocation and leave its RAM underutilized.

Understanding your workloads is essential to optimizing the total cost of the cluster. Start with a naive configuration that uses homogeneous VMs in a single node pool, then profile your workloads with monitoring tools to see what they actually consume. That feedback can then be used to create more specialized pools, fully utilize the nodes, and reduce the total cloud cost of the cluster.

Preemptible Strategy

All the major cloud providers allow users to create virtual machines from their excess capacity. AWS has EC2 Spot, Google has Preemptible VM, and Azure has Low-priority VM. These virtual machines are short-lived. The expectation is that they will be shut down, and the workloads running on them terminated. Depending on the cloud, you get different time limits. Because they are short-lived and can be reclaimed at any time, you can save up to 90% compared with regular virtual machines. They are recommended for batch jobs and fault-tolerant applications, and they can also be used in a Kubernetes cluster.

Table 3: Average Monthly Cost for Preemptible Virtual Machines used by a Node
Node | CPU | RAM | Monthly Cost
Standard VM | 2 | 7.5GB | $14.00
High RAM VM | 2 | 13GB | $18.00
High CPU VM | 2 | 1.8GB | $10.00
* Using average cloud provider prices, not reflecting any individual cloud provider.

Now you might be thinking, what is Kubernetes supposed to do with a bunch of virtual machines that are going to be terminated often, most likely daily? The answer is: quite a bit. Because a Kubernetes cluster is self-healing, it brings the nodes back and restarts the workloads automatically after a termination. So it can make use of these temporary VMs with more CPU and RAM at a fraction of the cost of regular VMs.

Having a pool that is composed of preemptible virtual machines is essential when it comes to reducing the total cost of the cluster and running workloads efficiently. Let’s use a Jenkins builder workload to show how the preemptible pool can save money and improve efficiency. The Jenkins builder is a job created by a DevOps engineer that pulls source code, compiles the code, runs tests and builds the Docker image.

Use Case 1 – Swap

The Jenkins builder workload takes between 3 and 20 minutes on average. In this use case, we replace the High CPU VMs with Preemptible High CPU VMs.

Table 4: Preemptible Pool 5 Node Kubernetes Cluster
Pools | CPU | RAM | Monthly Cost | Yearly Cost
Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $226 | $2712
Preemptible Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $174 | $2088

Here we swap the High CPU VMs for Preemptible High CPU VMs. This is a direct replacement with VMs that have the same specification. The daily termination of the virtual machines only affects a Jenkins builder workload if the virtual machine is terminated in the middle of the job. Since we are saving over $600 a year ($2712 versus $2088), we can allow a terminated job to be executed again after the nodes have been restarted.
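The difference between the two rows of Table 4 can be checked from the per-VM prices in Table 1 and Table 3. This is a minimal sketch of the calculation; the names used here are illustrative.

```python
# Monthly price per VM in dollars: regular (Table 1) and preemptible (Table 3).
REGULAR = {"standard": 47, "high-ram": 60, "high-cpu": 36}
PREEMPTIBLE = {"standard": 14, "high-ram": 18, "high-cpu": 10}


def monthly_cost(regular_nodes: dict, preemptible_nodes: dict = None) -> int:
    """Sum the monthly cost of regular and preemptible nodes (kind -> count)."""
    cost = sum(REGULAR[kind] * n for kind, n in regular_nodes.items())
    cost += sum(PREEMPTIBLE[kind] * n for kind, n in (preemptible_nodes or {}).items())
    return cost


# Mixed cluster with regular VMs only.
baseline = monthly_cost({"high-cpu": 2, "high-ram": 1, "standard": 2})
# Same cluster with the two High CPU nodes swapped for preemptible ones.
swapped = monthly_cost({"high-ram": 1, "standard": 2}, {"high-cpu": 2})
yearly_saving = (baseline - swapped) * 12

if __name__ == "__main__":
    print(f"baseline: ${baseline}/month, swapped: ${swapped}/month")
    print(f"yearly saving: ${yearly_saving}")
```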

Use Case 2 – Use more

Because preemptible virtual machines cost so much less, we can afford to use virtual machines with more CPU and RAM.

Table 5: Preemptible Pool 5 Node Kubernetes Cluster with more resources
Pools | CPU | RAM | Monthly Cost | Yearly Cost
Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $226 | $2712
Preemptible Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 14 | 51.4GB | $196 | $2352

Staying below the price of the non-preemptible cluster pool configuration, we increased both the CPU and RAM. The High CPU VMs were replaced by more powerful Preemptible High CPU VMs. This allows the Jenkins builder workloads to complete faster. It also leads to higher throughput, since we can run more Jenkins builder workloads within a given day. This makes our developers and QA happy, and brings a grin to our faces when we see the lower cloud bill.
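To see how much room the preemptible discount leaves for larger VMs, the sketch below works backwards from the totals in Table 5. The article does not state a price for the more powerful preemptible VM, so the per-VM figure here is only what the table totals imply, and the variable names are illustrative.

```python
# Monthly totals in dollars from Table 5.
BASELINE_MONTHLY = 226   # regular mixed cluster
UPGRADED_MONTHLY = 196   # cluster with larger preemptible High CPU VMs

# The High RAM and Standard nodes stay on regular pricing (Table 1).
FIXED_MONTHLY = 1 * 60 + 2 * 47
PREEMPTIBLE_NODES = 2

# Price per larger preemptible VM implied by Table 5 (derived, not quoted).
implied_price = (UPGRADED_MONTHLY - FIXED_MONTHLY) / PREEMPTIBLE_NODES

# Highest per-VM price that would still not exceed the regular cluster's bill.
price_ceiling = (BASELINE_MONTHLY - FIXED_MONTHLY) / PREEMPTIBLE_NODES

if __name__ == "__main__":
    print(f"implied price per larger preemptible VM: ${implied_price:.2f}")
    print(f"price ceiling per VM to match the regular bill: ${price_ceiling:.2f}")
```

The gap between the two numbers is the headroom left over after the upgrade.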


In this post, we hope to have thoroughly answered the question ‘What is Kubernetes’, before looking at node pools and preemptible virtual machines as strategies for managing the cloud costs of a Kubernetes cluster. Applied together with your DevOps engineer, these strategies can reduce cloud costs and let you be more efficient with the cloud resources used by the cluster nodes. In the next post, we will look at strategies for reducing the cost by examining the workloads.

