Wanted change
Save GPU VRAM when the model is not in use. VRAM is a valuable resource, so it should be possible to configure a keep_alive value. For example, Ollama configures it like this:
keep_alive=-1 keeps the model in memory indefinitely
keep_alive=0 unloads the model after each use
keep_alive=60 keeps the model in memory for 1 minute after use
This could be an environment variable, defaulting to -1 so that it is not a breaking change for anyone.
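To make the three settings concrete, here is a minimal Python sketch of how an idle-unload wrapper could behave. It is only an illustration of the requested semantics, not this project's code: the class name `KeepAliveModel`, the `load`/`unload` callbacks, and the `KEEP_ALIVE` environment variable name are all assumptions made up for the example.

```python
import threading
from typing import Any, Callable, Optional


def parse_keep_alive(raw: Optional[str]) -> int:
    """Parse a keep_alive setting in seconds; missing or empty means -1 (never unload)."""
    if raw is None or raw.strip() == "":
        return -1
    value = int(raw)
    if value < -1:
        raise ValueError("keep_alive must be -1, 0, or a positive number of seconds")
    return value


class KeepAliveModel:
    """Hypothetical wrapper: loads a model lazily and unloads it when idle."""

    def __init__(self, load: Callable[[], Any], unload: Callable[[Any], None],
                 keep_alive: int = -1) -> None:
        self._load = load
        self._unload = unload
        self._keep_alive = keep_alive
        self._lock = threading.Lock()
        self._model: Any = None
        self._timer: Optional[threading.Timer] = None
        self._generation = 0  # bumped on every use so stale timers are ignored

    @property
    def loaded(self) -> bool:
        with self._lock:
            return self._model is not None

    def run(self, fn: Callable[[Any], Any]) -> Any:
        """Run fn(model), loading the model first if it is not in memory."""
        # Holding the lock serializes requests; good enough for a sketch.
        with self._lock:
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if self._model is None:
                self._model = self._load()
            try:
                return fn(self._model)
            finally:
                self._after_use()

    def _after_use(self) -> None:
        if self._keep_alive < 0:
            return  # -1: keep the model in memory indefinitely
        if self._keep_alive == 0:
            self._release()  # 0: unload right after each use
            return
        # N > 0: unload after N seconds without another request
        self._timer = threading.Timer(
            self._keep_alive, self._on_idle, args=(self._generation,))
        self._timer.daemon = True
        self._timer.start()

    def _on_idle(self, generation: int) -> None:
        with self._lock:
            if generation == self._generation:  # no request arrived since the timer was set
                self._release()

    def _release(self) -> None:
        if self._model is not None:
            self._unload(self._model)
            self._model = None
```

With something like this, the container could read the setting once at startup, e.g. `parse_keep_alive(os.environ.get("KEEP_ALIVE"))`, and pass it to the wrapper. Because a missing value maps to -1, existing setups would keep the current always-loaded behavior, apart from the model being loaded on first request rather than at container start in this sketch.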
Reason for change
Right now the model is loaded into memory as soon as the container starts, and stays there even when not in use.
Proposed code change
No response