[Bug] CrashLoopBackOff starting ChatQnA on a single-node k8s cluster #1202
Comments
@arun-gupta From huggingface/text-generation-inference#451, the I've tried deploying ChatQnA with k8s using helm charts on both
May I ask which AWS instance you are using and how much memory it has?
This is due to pods not specifying their (CPU/memory) resource requests. The difficulty in specifying them is that resource usage depends on the model and data types configured for the service. The default Mistral model needs >30GB RAM with FP32, and half of that with FP16/BF16 (supported only by the very latest processors). With correct resource requests, pod would be in See discussion: opea-project/GenAIInfra#431
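To make the memory figures in the comment above concrete, here is a rough Python sketch that estimates the RAM needed for model weights per dtype and turns it into a Kubernetes-style memory request. The parameter count (about 7.24 billion, typical of a Mistral-7B-class model), the 20% headroom factor, and the helper names are assumptions for illustration only; none of them come from the Helm charts, and real usage also depends on runtime overhead that this sketch does not model.

```python
import math

# Bytes needed per parameter for the dtypes mentioned in this thread.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

# Assumption: ~7.24 billion parameters (a Mistral-7B-class model).
ASSUMED_PARAM_COUNT = 7.24e9


def weight_memory_gib(param_count, dtype):
    """Memory for the model weights alone, in GiB (no runtime overhead)."""
    return param_count * BYTES_PER_PARAM[dtype] / 2**30


def memory_request(param_count, dtype, headroom=1.2):
    """Build a Kubernetes-style `resources` mapping with a memory request.

    `headroom` is a hypothetical safety factor on top of the weights.
    """
    gib = math.ceil(weight_memory_gib(param_count, dtype) * headroom)
    return {"requests": {"memory": f"{gib}Gi"}}


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        weights = weight_memory_gib(ASSUMED_PARAM_COUNT, dtype)
        print(dtype, round(weights, 1), memory_request(ASSUMED_PARAM_COUNT, dtype))
```

The halving from FP32 to BF16/FP16 follows directly from the bytes-per-parameter table; the resulting mapping has the same shape as the `resources` field of a container spec, so the scheduler could refuse to place the pod on a node that is too small instead of letting it crash at startup.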
I was using
But the TGI pod is still giving an error:
It is essential that we provide the minimum vCPU and memory requirements as part of the instructions. @devpramod I've requested a service quota increase so that a larger instance type can be started, for example
Yes. It seems that the We could recommend provisioning a larger AWS instance type, switching to a smaller model, or using the bfloat16 dtype (available on c7i/m7i instance types and above) to lower memory consumption.
I could successfully start the pods with
Hi @wangkl2 Are the CPU, memory and disk requirements for ChatQnA documented for K8s?
No. Apparently, the m7i.4xlarge instance with at least 100 GB of disk documented here (I think it was only verified with the docker compose deployment) is not suitable for the default settings of the K8s deployment with helm charts, which caused the model conversion issue and the CrashLoopBackOff. From Arun's and my testing, either
Priority
Undecided
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
Deploy method
Running nodes
Single Node
What's the version?
1.1
Description
Following the instructions outlined in opea-project/docs#179 and trying to get ChatQnA running on a single-node k8s cluster using Helm charts.
Environment:
Ubuntu 24.04 on EC2 instance
TGI server is failing to start:
The pod tries to restart itself and then gets into a CrashLoopBackOff:
Reproduce steps
opea-project/docs#179
Raw log
No response