
[FEAT] Models is always loaded in vram #32

Open
1 task done
ecker00 opened this issue Jan 22, 2025 · 1 comment
Labels
enhancement New feature or request

Comments


ecker00 commented Jan 22, 2025

Is this a new feature request?

  • I have searched the existing issues

Wanted change

Save GPU VRAM when the model is not in use. VRAM is a valuable resource, so it should be possible to configure a keep_alive value. For example, Ollama configures it like this:

  • keep_alive=-1 keeps model in memory indefinitely
  • keep_alive=0 unloads model after each use
  • keep_alive=60 keeps the model in memory for 1 minute after use

This could be an environment variable, defaulting to -1 so it is not a breaking change for anyone.
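The semantics above could be sketched roughly as follows. This is a hypothetical illustration only, not this project's actual code; names such as `ModelManager`, `acquire`, and `release`, and the `KEEP_ALIVE` variable name, are all assumptions.

```python
import os
import threading

# Hypothetical env var implementing the requested keep_alive semantics:
# -1 = keep loaded indefinitely, 0 = unload after each use,
# N > 0 = unload after N seconds of inactivity.
KEEP_ALIVE = int(os.environ.get("KEEP_ALIVE", "-1"))

class ModelManager:
    def __init__(self):
        self._model = None
        self._timer = None
        self._lock = threading.Lock()

    def _load(self):
        # Placeholder for the real model-loading call (into VRAM).
        return object()

    def acquire(self):
        """Load the model on demand and cancel any pending unload."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if self._model is None:
                self._model = self._load()
            return self._model

    def release(self):
        """Apply the keep_alive policy after a request finishes."""
        with self._lock:
            if KEEP_ALIVE < 0:
                return  # keep in VRAM indefinitely
            if KEEP_ALIVE == 0:
                self._model = None  # unload immediately
            else:
                self._timer = threading.Timer(KEEP_ALIVE, self._unload)
                self._timer.daemon = True
                self._timer.start()

    def _unload(self):
        with self._lock:
            self._model = None
            self._timer = None
```

With the default of -1 the model stays resident exactly as it does today, so existing deployments would see no behaviour change.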

Reason for change

Right now the model is loaded into memory as soon as the container starts, and stays there even when not in use.

Proposed code change

No response

@ecker00 ecker00 added the enhancement New feature or request label Jan 22, 2025

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
