January 13th, 2020

Strategies For Managing Kubernetes Cloud Cost Part 1

by in Kubernetes In Action

Intro

Cloud complexity and cost have pushed many companies to migrate their cloud infrastructure into Kubernetes clusters. Cloud resources are containerized, vast Kubernetes clusters are created, and workloads are deployed into them. Since the cluster runs within the cloud, the cloud providers are happy to assist in such a move. Once you go through this migration, however, you might find that your cloud bill has stayed the same or even increased. In this post, we examine strategies that help manage the cloud cost of a Kubernetes cluster.

Almost half of C-suite executives cited cloud complexity (47%) as the factor that will have the most negative impact on cloud computing’s ROI over the next five years

Cloud Complexity Management survey results

Cloud

Figure: The Cloud, an amorphous monolith made up of CPU, GPU, RAM, disk, and networking.

For analysis, we are looking at the cloud as an amorphous monolith made up of CPU, GPU (and TPU), RAM, disks, and networking. These are the primary cloud resources that cloud providers sell. A Kubernetes cluster created and deployed in the cloud uses these cloud resources to run containers.

Kubernetes

Figure: Kubernetes cluster with a master node and many minion nodes.

Kubernetes is a container orchestration system. It is always deployed as a cluster made up of a master node and many minion nodes. Each node is a virtual machine (VM) running in the cloud and consuming cloud resources. Before creating the cluster, you define the number of nodes and the types of virtual machines used to create them. Cloud providers offer many kinds of virtual machines to choose from. The usual approach is to pick something not too big and not too small, something that feels like it has enough RAM and CPU: a guesstimate.

Figure: Heterogeneous workloads leaving cluster node resources underutilized.

Workloads running within the Kubernetes cluster are heterogeneous. The microservices, functions, jobs, databases, and so on running in the cluster consume different amounts of cloud resources through the nodes. Some workloads, like a Jenkins builder, are CPU heavy. Others, like Elasticsearch databases, are RAM heavy. Still others, like machine learning training jobs, are GPU heavy. Supporting these different workloads with a cluster composed of a single type of node means you are losing money, since the cluster is not cost-optimized for any of them.

Table 1: Average Monthly Cost for Virtual Machines used by a Node

Node        | CPU | RAM   | Monthly Cost
Standard VM | 2   | 7.5GB | $47.00
High RAM VM | 2   | 13GB  | $60.00
High CPU VM | 2   | 1.8GB | $36.00

* Using average cloud provider prices, not reflecting any individual cloud provider.
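
As a rough illustration of how the prices in Table 1 could guide node selection, the sketch below picks the cheapest VM type that can hold a workload. The `VMType` class, the `cheapest_fit` helper, and the sample workload sizes are hypothetical names and numbers introduced only for this example; the specs and prices are the averages from Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VMType:
    name: str
    cpu: int
    ram_gb: float
    monthly_cost: float


# Specs and average prices copied from Table 1.
VM_TYPES = [
    VMType("Standard VM", 2, 7.5, 47.00),
    VMType("High RAM VM", 2, 13.0, 60.00),
    VMType("High CPU VM", 2, 1.8, 36.00),
]


def cheapest_fit(cpu_needed: float, ram_needed_gb: float) -> Optional[VMType]:
    """Return the cheapest VM type that can hold the workload, or None if none fits."""
    candidates = [
        vm for vm in VM_TYPES
        if vm.cpu >= cpu_needed and vm.ram_gb >= ram_needed_gb
    ]
    return min(candidates, key=lambda vm: vm.monthly_cost, default=None)


# Hypothetical workload sizes, for illustration only.
print(cheapest_fit(2, 1.5).name)  # CPU-heavy build job -> High CPU VM
print(cheapest_fit(1, 10).name)   # RAM-heavy database  -> High RAM VM
print(cheapest_fit(1, 4).name)    # balanced service    -> Standard VM
```

A real selection would also have to account for how many workloads share a node, which this sketch ignores.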

Pool Strategy

Figure: Kubernetes cluster made up of node pools.

Every major cloud provider, including AWS, GCP, and Azure, lets you group multiple nodes into a node pool. The nodes in a node pool all use the same kind of virtual machine, with the same cloud resource specs. By creating pools that are cost-optimized for particular workloads, you can reduce the total cost of the cluster. For example, moving a CPU-intensive workload to a CPU node pool, configured with high CPU and low RAM virtual machines, would reduce the total cost.
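
One common way to steer a workload onto a particular pool is a `nodeSelector` in the Pod spec, which restricts scheduling to nodes carrying a matching label. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The `pool` label key, the `high-cpu` value, the `pod_for_pool` helper, and the image name are assumptions made for illustration; the actual label applied to pool nodes depends on your cloud provider and on how the pool was created.

```python
import json


def pod_for_pool(name: str, image: str, pool_label: str) -> dict:
    """Build a Pod manifest pinned to nodes labeled pool=<pool_label> (hypothetical label)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # nodeSelector limits scheduling to nodes whose labels match.
            "nodeSelector": {"pool": pool_label},
            "containers": [{"name": name, "image": image}],
        },
    }


# Placeholder image name; substitute your own builder image.
manifest = pod_for_pool("jenkins-builder", "example/jenkins-builder:latest", "high-cpu")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest, provided the nodes in the target pool actually carry the chosen label.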

Using the pricing from Table 1, here is an example of a 5-node Kubernetes cluster. Table 2 compares three clusters built from five nodes of a single kind with one cluster that mixes different types of nodes.

Table 2: Average Yearly Cost for 5 Node Kubernetes Cluster.

Pools | CPU | RAM | Monthly Cost | Yearly Cost
Pool (5) Standard VM | 10 | 37.5GB | $235 | $2820
Pool (5) High RAM VM | 10 | 65GB | $300 | $3600
Pool (5) High CPU VM | 10 | 9GB | $180 | $2160
Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $226 | $2712
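
The cost arithmetic behind this comparison can be sketched in a few lines. The snippet recomputes monthly and yearly totals from the Table 1 prices; the `cluster_cost` helper and the dictionary keys are hypothetical names chosen for this example.

```python
# Average monthly price per VM, from Table 1.
MONTHLY_PRICE = {"standard": 47, "high_ram": 60, "high_cpu": 36}


def cluster_cost(pools: dict) -> tuple:
    """Return (monthly, yearly) cost for a mapping of VM type -> node count."""
    monthly = sum(MONTHLY_PRICE[vm] * count for vm, count in pools.items())
    return monthly, monthly * 12


configs = {
    "5 x Standard": {"standard": 5},
    "5 x High RAM": {"high_ram": 5},
    "5 x High CPU": {"high_cpu": 5},
    "Mixed pools": {"high_cpu": 2, "high_ram": 1, "standard": 2},
}

for label, pools in configs.items():
    monthly, yearly = cluster_cost(pools)
    print(f"{label}: ${monthly}/month, ${yearly}/year")
```

Changing the node counts in `configs` is a quick way to compare other pool layouts before creating them.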

Using a cluster with only standard VM nodes will leave the nodes underutilized, with leftover CPU and/or RAM capacity. Using only high RAM VMs will be costly, since the running workloads will most likely not all be RAM heavy: any CPU-heavy workloads will max out the node's CPU allocation and leave you with underutilized RAM.

Understanding your workloads is essential to optimizing the total cost of the cluster. Start with a naive cluster configuration that uses homogeneous VMs in a single node pool, and profile your workloads with monitoring tools to gain insight into how they behave. You can then use that feedback to create more specialized pools, fully utilize the nodes, and reduce the total cloud cost of the cluster.

Preemptible Strategy

All the major cloud providers allow users to create virtual machines from their excess capacity. AWS has EC2 Spot, Google has Preemptible VM, and Azure has Low-priority VM. These virtual machines are short-lived: the expectation is that they will be shut down and the workloads running on them terminated. The time limits differ by cloud provider. Because they are short-lived and not regular virtual machines, you can save up to 90% on the cost of running them. They are recommended for batch jobs and fault-tolerant applications, and they can also be used in a Kubernetes cluster.

Table 3: Average Monthly Cost for Preemptible Virtual Machines used by Node

Node        | CPU | RAM   | Monthly Cost
Standard VM | 2   | 7.5GB | $14.00
High RAM VM | 2   | 13GB  | $18.00
High CPU VM | 2   | 1.8GB | $10.00

* Using average cloud provider prices, not reflecting any individual cloud provider.

Before creating a pool of preemptible virtual machines, there are a few things to keep in mind:

  • Short-Lived – VMs will be terminated often, most likely daily
  • Low price – Use VMs with more CPU and RAM for the same price as regular VMs
  • Self-Healing – The Kubernetes cluster is self-healing; it will restart the nodes and workloads automatically

Having a pool composed of preemptible virtual machines is essential when it comes to reducing the total cost of the cluster and running workloads efficiently. Let’s use a Jenkins builder workload to show how a preemptible pool can save money and improve efficiency. The Jenkins builder is a job that pulls source code, compiles it, runs tests, and builds the Docker image.

Use Case 1 – Swap

The Jenkins builder workload takes 3 to 20 minutes on average. In this use case, we replace the High CPU VMs with Preemptible High CPU VMs.

Table 4: Preemptible Pool 5 Node Kubernetes Cluster

Pools | CPU | RAM | Monthly Cost | Yearly Cost
Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $226 | $2712
Preemptible Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $174 | $2088

Here we swap the High CPU VMs for Preemptible High CPU VMs. This is a direct replacement with VMs that have the same specification. The daily termination of the virtual machines only affects a Jenkins builder workload if the virtual machine is terminated in the middle of the job. Since we are saving over $600 a year ($2712 versus $2088), we can allow a terminated job to be executed again after the nodes have been restarted.
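
The savings from the swap can be checked with a short calculation. This is a minimal sketch using the average prices from Table 1 and Table 3; the `yearly_cost` helper and the dictionary keys are hypothetical names used only here.

```python
REGULAR = {"standard": 47, "high_ram": 60, "high_cpu": 36}      # Table 1, per month
PREEMPTIBLE = {"standard": 14, "high_ram": 18, "high_cpu": 10}  # Table 3, per month


def yearly_cost(regular_nodes: dict, preemptible_nodes: dict) -> int:
    """Yearly cost of a cluster that mixes regular and preemptible nodes."""
    monthly = sum(REGULAR[vm] * n for vm, n in regular_nodes.items())
    monthly += sum(PREEMPTIBLE[vm] * n for vm, n in preemptible_nodes.items())
    return monthly * 12


# Mixed cluster with regular High CPU nodes.
before = yearly_cost({"high_cpu": 2, "high_ram": 1, "standard": 2}, {})
# Same cluster with the two High CPU nodes swapped for preemptible ones.
after = yearly_cost({"high_ram": 1, "standard": 2}, {"high_cpu": 2})

print(before, after, before - after)  # 2712 2088 624
```

The calculation ignores the cost of re-running jobs that are interrupted by a termination, which depends on how often builds overlap with a shutdown.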

Use Case 2 – Use more

Preemptible virtual machines cost a lot less, so we can use virtual machines with more CPU and RAM.

Table 5: Preemptible Pool 5 Node Kubernetes Cluster with more resources

Pools | CPU | RAM | Monthly Cost | Yearly Cost
Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 10 | 47.8GB | $226 | $2712
Preemptible Pool High CPU (2 VM) + Pool High RAM (1 VM) + Pool Standard (2 VM) | 14 | 51.4GB | $196 | $2352

While staying below the price of the non-preemptible pool configuration, we increased the CPU and RAM. The High CPU VMs were replaced by Preemptible High CPU VMs with more CPU and RAM. This allows the Jenkins builder workloads to complete faster. It also leads to higher throughput, since we can run more Jenkins builder workloads within a given day. That makes our developers and QA happy, and the lower cloud bill brings a grin to my face.

Conclusion

In this post, we have looked at using node pools and preemptible virtual machines as strategies to help manage cloud costs for a Kubernetes cluster. Both strategies will reduce the cloud cost and allow you to be more efficient with the cloud resources used by the cluster nodes. In the next post, we will look at strategies for reducing the cost by examining the workloads.
