I'm managing a Kubernetes cluster that uses KEDA to dispatch agents on demand based on the pool queue in DevOps. The agents run with --once so that they shut down after each job, which allows the cluster to scale down its nodes when no jobs are running. This works fine most of the time.
The issue arises when, for whatever reason, a new agent does not receive a job (for example, if someone cancels a job or something else unexpected happens). This is usually fine in a busy pool, since the agent will receive a job within a short amount of time. However, when it happens at the end of the day or the end of the work week, unnecessary infrastructure can keep running over the weekend, which dramatically increases the cost, especially if the infrastructure includes GPUs or other expensive hardware.
I think this could be easily fixed by adding an "idle timeout" flag to the agent. The flag would specify how long an agent is allowed to run while idle.
./run-agent.sh --timeout 5m --once
The above command would ensure that the agent times out after 5 minutes unless it receives a job within that time frame.
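To make the intended semantics concrete, here is a rough sketch of a wrapper that approximates the proposed behavior from outside the agent. It is not how the agent works today, and `--timeout` above remains a proposed flag. The function names, `AGENT_DIR`, and `POLL_SECONDS` are hypothetical, and the job check assumes the agent creates a `_diag/Worker_*.log` file once it picks up a job, which should be verified for your agent version and only holds in a fresh pod with no older logs.

```sh
#!/bin/sh
# Sketch only: emulate an idle timeout around an agent started with --once.

# Convert a duration such as 30s, 5m or 2h into seconds.
to_seconds() {
  case "$1" in
    *h) echo $(( ${1%h} * 3600 )) ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *s) echo "${1%s}" ;;
    *)  echo "$1" ;;
  esac
}

# Succeed once a job has been picked up.
# ASSUMPTION: a worker log under _diag appears when a job starts.
has_job() {
  local log
  for log in "${1:-.}"/_diag/Worker_*.log; do
    [ -e "$log" ] && return 0
  done
  return 1
}

# Usage: run_with_idle_timeout DURATION COMMAND [ARGS...]
# Stops COMMAND if no job was detected within DURATION; once a job is
# detected, simply waits for COMMAND to exit on its own.
run_with_idle_timeout() {
  local limit
  local poll
  local waited
  local agent_pid
  limit=$(to_seconds "$1")
  shift
  poll=${POLL_SECONDS:-5}
  waited=0
  "$@" &
  agent_pid=$!
  while kill -0 "$agent_pid" 2>/dev/null; do
    if has_job "${AGENT_DIR:-.}"; then
      break
    fi
    if [ "$waited" -ge "$limit" ]; then
      echo "No job after ${limit}s, stopping idle agent" >&2
      kill -TERM "$agent_pid"
      break
    fi
    sleep "$poll"
    waited=$(( waited + poll ))
  done
  # Propagate the agent's exit status to the caller.
  wait "$agent_pid"
}
```

With these functions sourced, `run_with_idle_timeout 5m ./run-agent.sh --once` would play the role of the proposed command. A built-in flag would still be preferable, since the agent itself knows reliably whether it has been assigned a job, whereas this wrapper can only guess from the outside.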
I could work around this issue by using the DevOps API to fetch idle agents and tell Kubernetes to stop the pod, but this seems like a lot of work that could be easily avoided with this proposal.