Partial-hour billing when using Kubernetes?
We've started using Kubernetes autoscaling for bursty workloads and have a question about billing for the temporary Linodes.
A common workflow we're seeing is that a Linode comes up for 15 minutes, completes its work, and is then removed by the autoscaler once the Kubernetes node has no pods scheduled on it. This is expected based on our configuration.
Are those 15 minutes charged as a full hour?
If a node is brought up and then down again, and another node is brought up within the same hour, is that charged as ANOTHER hour (so two billable hours in total)?
Is there a way to have the Kubernetes autoscaler keep a node around until it approaches the full hour (e.g., don't remove it until it's 50 minutes old, or 110 minutes, and so on)?
Separately, but related to our workflow: is there documentation on how the autoscaler selects which pool to scale to satisfy pod requirements? For example, if we launch 4 pods simultaneously that each require 3 CPUs, is the autoscaler smart enough to expand the 16-core pool by one node rather than the 4-core pool by four?
Currently, our billing system can't invoice at any level more granular than hourly per service. Depending on your use case, this could be costly in the long term if you are rapidly deploying and then deleting nodes.
In this scenario, if you were to deploy a Linode, delete it before the full hour, and then deploy a second Linode (regardless of size), you would be charged one hour per Linode deployed (two hours total). With the rise of scalable workloads (namely LKE), this could quickly add up: although each billable instance is small, the charges can accumulate into significant spending on under-utilized services.
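To make the arithmetic concrete, here is a minimal Python sketch, assuming (as described above) that each Linode is billed per started hour of its own lifetime. Plan prices and any monthly cap are left out, and the function name `billed_hours` is illustrative only.

```python
import math


def billed_hours(lifetimes_minutes):
    """Total billable hours: each Linode's lifetime is rounded up to a whole hour.

    Assumes a deployed Linode is always charged at least one hour.
    """
    return sum(max(1, math.ceil(m / 60)) for m in lifetimes_minutes)


print(billed_hours([15, 15]))  # 2: two short-lived Linodes within the same hour
print(billed_hours([50]))      # 1: one Linode kept around for 50 minutes
print(billed_hours([15] * 4))  # 4: four 15-minute Linodes back to back
```

Under this model, four short-lived nodes cost four hours, while a single node kept for the whole hour costs one.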
I've submitted your questions to our internal request tracker so that we can note ways to improve our billing system. As for your other questions, I may have found a resource that answers them in one fell swoop!
Digging around in our blog, I found a post that discusses proactive cluster scaling as well as Kubernetes scheduling, which determines how pods are placed on nodes. When a workload scales and a new pod is created, scheduling rules and node availability determine which node it ends up on.
As I understand your use case, if your cluster were to scale up without any rules in place, I'm not sure how our system would decide which pool to expand when either node type could meet the pod's needs. You should, however, be able to control this process more precisely using node affinity and taints.
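To see why the choice of pool matters for the example in the question (four pods requesting 3 CPUs each), here is a hypothetical Python sketch that counts how many new nodes each pool would need using simple first-fit packing. It is not how LKE's autoscaler decides, and it ignores the CPU that Kubernetes and system daemons reserve on each node, so real allocatable capacity is somewhat lower than the raw core count.

```python
def nodes_needed(pod_cpus, node_cpus):
    """First-fit packing: how many nodes of one size are needed to hold these pods."""
    free = []  # remaining CPU on each node opened so far
    for cpu in pod_cpus:
        if cpu > node_cpus:
            raise ValueError(f"pod needs {cpu} CPUs, node has only {node_cpus}")
        for i, remaining in enumerate(free):
            if remaining >= cpu:
                free[i] -= cpu  # place the pod on the first node with room
                break
        else:
            free.append(node_cpus - cpu)  # no room anywhere: open a new node
    return len(free)


pods = [3, 3, 3, 3]
for size in (4, 16):
    print(f"{size}-core pool: {nodes_needed(pods, size)} new node(s)")
# 4-core pool: 4 new node(s)
# 16-core pool: 1 new node(s)
```

Under these assumptions the 16-core pool needs one new node and the 4-core pool needs four, which is the difference the question is asking about.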
In practice, this means you could use node affinity (or taint a node) to keep a pod from ending up on a larger-than-necessary node, while also ensuring that sufficiently large nodes are available when needed. These rules are defined in your deployment's YAML configuration or your Helm chart and, if set up correctly, should ensure that the right size of node is deployed when you use pools with multiple node sizes.
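Here is a minimal sketch of what those rules could look like, written in Python that emits a Deployment manifest as JSON (which `kubectl apply -f` accepts just like YAML). The label and taint `pool-size=large`, the name `batch-worker`, and the image `example/worker:latest` are all hypothetical: the sketch assumes you have labeled and tainted the nodes in your 16-core pool that way, ideally on the pool itself (if your LKE setup supports pool-level labels and taints) so that autoscaled nodes inherit them. Pods built with `large=True` require the labeled nodes and tolerate their taint; pods without the toleration are kept off those nodes by the taint.

```python
import json

# Hypothetical label/taint: assumes the 16-core pool's nodes are labeled
# pool-size=large and tainted pool-size=large:NoSchedule.
LABEL_KEY = "pool-size"
LARGE = "large"


def deployment(name, image, cpu, replicas, large=False):
    """Build a Deployment manifest; large=True pins pods to the large pool."""
    pod_spec = {
        "containers": [{
            "name": name,
            "image": image,
            "resources": {"requests": {"cpu": str(cpu)}},
        }],
    }
    if large:
        # Require nodes carrying the (hypothetical) large-pool label.
        pod_spec["affinity"] = {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": LABEL_KEY, "operator": "In", "values": [LARGE],
                        }],
                    }],
                },
            },
        }
        # Tolerate the taint that keeps other pods off the large nodes.
        pod_spec["tolerations"] = [{
            "key": LABEL_KEY, "operator": "Equal", "value": LARGE, "effect": "NoSchedule",
        }]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": pod_spec,
            },
        },
    }


manifest = deployment("batch-worker", "example/worker:latest", cpu=3, replicas=4, large=True)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`; the same structure translates directly into a YAML manifest or a Helm template.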
That same post also covers proactively scheduling your scaling based on scraped Prometheus metrics. Although it's beyond my own scripting knowledge, it seems those metrics could let you set timeframes that maximize each node's usage relative to billing (keeping a node available for a full hour even if the cluster would normally scale down sooner than that).
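The decision part of that idea can be sketched in Python. This is a hypothetical policy, assuming billing hours are counted from each node's creation time and that the `idle` flag comes from something like the Prometheus metrics mentioned in the post; the function name `ok_to_remove` and the 50-minute threshold (taken from the question) are illustrative. Actually triggering or blocking the scale-down depends on what LKE's autoscaler exposes and is not shown here.

```python
from datetime import datetime, timedelta, timezone


def ok_to_remove(created_at, now, idle, keep_until_minute=50):
    """Hypothetical policy: remove an idle node only late in its current billed hour.

    If each Linode is billed per started hour, an idle node that is 15 minutes
    old costs the same as one that is 50 minutes old, so there is no billing
    reason to remove it early; keeping it also leaves capacity for new pods.
    """
    if not idle:
        return False
    age_minutes = (now - created_at).total_seconds() / 60
    minutes_into_hour = age_minutes % 60  # position within the current billed hour
    return minutes_into_hour >= keep_until_minute


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(ok_to_remove(start, start + timedelta(minutes=15), idle=True))  # False
print(ok_to_remove(start, start + timedelta(minutes=52), idle=True))  # True
print(ok_to_remove(start, start + timedelta(minutes=75), idle=True))  # False
```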