LLM model not loaded in memory until first request for content-categoriser (and others?) causes timeouts and slow initial queries #890

Open
jeffbl opened this issue Oct 4, 2024 · 0 comments
jeffbl commented Oct 4, 2024

With #887 implemented, the llava LLM on pegasus is now pinned in memory once it has been loaded. However, the first request is still slow, because the model is not loaded until that first request arrives. I think this is true of several models we're currently using. The work item is to figure out how to load everything into memory ahead of time, so that even first requests are fast.
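As a rough illustration of the "load before the first request" idea, here is a minimal Python sketch. It is not how content-categoriser or any other service here is actually written: `PreloadedModel`, `warm_up`, and the `loader` callable are hypothetical names, and `loader` stands in for whatever call really brings the model into memory.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class PreloadedModel(Generic[T]):
    """Holds a model that is loaded once, ideally before any request arrives."""

    def __init__(self, loader: Callable[[], T]) -> None:
        # Hypothetical: `loader` is whatever actually loads the model.
        self._loader = loader
        self._model: Optional[T] = None
        self._lock = threading.Lock()
        self._ready = threading.Event()

    def warm_up(self) -> None:
        """Load the model now. Safe to call more than once."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
                self._ready.set()

    def warm_up_in_background(self) -> threading.Thread:
        """Start loading without blocking service startup."""
        thread = threading.Thread(target=self.warm_up, daemon=True)
        thread.start()
        return thread

    def is_ready(self) -> bool:
        """True once the model is in memory."""
        return self._ready.is_set()

    def get(self) -> T:
        """Return the model, loading on demand only as a fallback."""
        if not self._ready.is_set():
            self.warm_up()
        assert self._model is not None
        return self._model
```

A service would call `warm_up_in_background()` once at startup, so that by the time the first request calls `get()` the slow load has usually finished. The `is_ready()` flag could back a health or readiness endpoint, assuming the service has one.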

Note that on a slower machine like unicorn, the initial load of an LLM can take so long that the request times out and the orchestrator moves on. In effect, the first request fails completely.
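One possible mitigation on the caller's side, sketched under the assumption that each service can report whether its model is loaded: wait for readiness with a generous deadline before sending the first real request. The `wait_until_ready` helper and the `is_ready` probe are hypothetical and do not describe the orchestrator's current behaviour.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 0.5,
) -> bool:
    """Poll a readiness probe until it succeeds or the deadline passes.

    `is_ready` is a hypothetical probe, e.g. a call to a health endpoint.
    Returns True if the service became ready in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_s)
    # One last check so a probe that flips right at the deadline still counts.
    return is_ready()
```

The timeout here can be much longer than the per-request timeout, since it only applies while a slow machine is still loading its models.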
