Add user-controlled hard spend caps for inference
lunarloom
There are many horror stories of small businesses getting hit with surprise bills from inference run against leaked API keys. Google Vertex AI added a feature to set inference spending limits for exactly this reason, and most other inference providers offer something similar.
By the time a monitoring email fires, it's usually too late, since costs can climb very high very quickly.
There's no ambiguity about what to shut down: once the cap is hit, the server should simply refuse any further inference. DigitalOcean is currently missing this feature.
DigitalOcean already tracks usage for its own limits, so most of the work needed is already done. A hard, user-adjustable, per-API-key monthly spend limit would be ideal.
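To make the request concrete, here's a minimal sketch of what the enforcement could look like server-side. Everything here is hypothetical (the `SpendCap` class and its fields are illustrative names, not any DigitalOcean API): the key property is that the check happens *before* the request is served, so exceeding the cap is impossible rather than merely alerted on.

```python
from dataclasses import dataclass

@dataclass
class SpendCap:
    """Hypothetical per-API-key spend cap, reset each billing month."""
    monthly_limit_usd: float   # user-adjustable hard cap
    spent_usd: float = 0.0     # usage accrued so far this month

    def try_charge(self, cost_usd: float) -> bool:
        """Charge the request only if it fits under the cap.

        Returns False (request refused) once the cap would be
        exceeded -- a hard stop, not an after-the-fact alert.
        """
        if self.spent_usd + cost_usd > self.monthly_limit_usd:
            return False
        self.spent_usd += cost_usd
        return True

cap = SpendCap(monthly_limit_usd=10.0)
assert cap.try_charge(9.5)        # within the cap: request served
assert not cap.try_charge(1.0)    # would exceed $10: refused outright
```

A refused request would map to an HTTP 429 or 402 from the inference endpoint, and the cap value itself would be editable in the control panel or via the API.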