Dan
Our traffic is seasonal, so it is most cost-effective for us to run a single larger node during the off season and scale out with small nodes during our peak season. Scaling the main pool of larger nodes is very wasteful.
Our workloads are a mix of infrastructure support and public-facing services. The public workloads are small and easily scalable, while things like Prometheus and our build pipeline operators are fairly static. When not under load, all of them fit onto a single mid-sized node.
Bjorn Goossens
I'd like to bump this idea given the upcoming addition of GPU worker nodes for Kubernetes clusters on DO. I feel strongly that, for many developers, being able to scale their pool of GPU worker nodes to zero would greatly help in managing costs. We'd really like to use the H100 instances, just not all the time. Our planned AI workloads would run training for a few hours at night, so that we can serve the updated models the next day on less costly nodes.
Dmitry Golubets
Until DO fixes this, you can try my custom K8s operator: https://github.com/DGolubets/k8s-managed-node-pool.
Josh Mengerink
This would be a huge benefit to us, and would actually cause us to use more expensive Droplets (which we are not using now because of their idle time).
Hoffman
Not being able to scale to zero adds unnecessary cost. For non-production clusters, we would want to scale the node pools down to 0 on weekends and holidays.
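A weekend/holiday schedule like this boils down to a simple policy for a pool's target size. The sketch below is purely illustrative: the function name `desired_node_count`, the holiday set, and the assumption that some external job applies the returned count to a node pool are all hypothetical, not a DigitalOcean feature or API.

```python
from datetime import date


def desired_node_count(day: date, holidays: set, working_count: int) -> int:
    """Hypothetical policy for a non-production node pool.

    Returns 0 on Saturdays, Sundays, and any date in `holidays`;
    otherwise returns the normal working-day node count.
    """
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    if day.weekday() >= 5 or day in holidays:
        return 0
    return working_count


if __name__ == "__main__":
    holidays = {date(2024, 12, 25)}
    print(desired_node_count(date(2024, 12, 21), holidays, 3))  # Saturday
    print(desired_node_count(date(2024, 12, 23), holidays, 3))  # Monday
    print(desired_node_count(date(2024, 12, 25), holidays, 3))  # listed holiday
```

The point of the request is that the "0" branch is only useful if the pool is actually allowed to sit at zero nodes.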
Johnny
This idea is beneficial for use cases that occasionally require a lot of resources but usually only minimal resources.
In my case, I have a pod that requires a lot of resources (e.g., 32 CPU cores). However, this pod only runs once a week, so running a node with 32 CPU cores 24/7 is cost-inefficient.
So instead of having one node with 32 CPU cores, we would like to run a small instance for autoscalers such as KEDA and scale the larger instance up from zero as needed.
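The cost argument here is simple arithmetic: an always-on large node is billed for all 168 hours of the week, whereas scale-from-zero pays for a small node all week plus the large node only while the job runs. The sketch below uses made-up hourly prices and job duration (not actual DigitalOcean pricing) and ignores node provisioning time and billing granularity.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_costs(large_hourly: float, small_hourly: float, job_hours: float):
    """Compare an always-on large node with a small node plus a large node scaled from zero.

    All prices and durations are hypothetical inputs supplied by the caller.
    """
    # Option 1: keep the large (e.g. 32-core) node running 24/7.
    always_on = large_hourly * HOURS_PER_WEEK
    # Option 2: a small node runs all week (for KEDA etc.), and the large
    # node exists only while the weekly job runs.
    scale_from_zero = small_hourly * HOURS_PER_WEEK + large_hourly * job_hours
    return always_on, scale_from_zero


if __name__ == "__main__":
    # Hypothetical prices: 2.00/h for the large node, 0.25/h for the small one,
    # and a 4-hour weekly job.
    always_on, scale_from_zero = weekly_costs(2.00, 0.25, 4)
    print(always_on, scale_from_zero)
```

With these assumed numbers the comparison is 336.0 versus 50.0 per week; the gap grows as the job gets shorter relative to the week.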
Bjorn Goossens
This would be really helpful for us. Other cloud providers already support this, and adding support here would keep us from switching to another provider.
I also don't understand why it is possible to create an additional node pool with zero nodes from the CLI, but not have it autoscale between zero and n nodes.