Python backend with multiple instances causes unexpected and non-deterministic results #7907
Comments
@NadavShmayo Thanks for sharing this concise reproducer. Looking at the instructions at a high level, it seems that you are using a PyTorch model. In that case, can you review this section of the documentation?
@NadavShmayo, we were able to reproduce similar behavior: when running inference with an instance count greater than 1, the following error occurred randomly and frequently. Have you experienced the same error on your end?
Thank you for looking into this!
Description
When using the Python backend with multiple model instances and sending many identical inference requests, the results are non-deterministic and often far from the expected output.
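For context, a minimal `config.pbtxt` sketch of the "multiple model instances" setting; the model name, tensor names, and shapes here are placeholders, not taken from the actual reproducer:

```
name: "example_python_model"
backend: "python"
max_batch_size: 0

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

# More than one execution instance of the same model; this is the
# setting under which the non-deterministic results appear.
instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]
```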
Triton Information
24.09
Are you using the Triton container or did you build it yourself?
Triton container (with additional Python libraries)
To Reproduce
Clone the following repository and follow the steps in the README.md file: https://github.com/NadavShmayo/fairseq-triton-example
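For readers unfamiliar with the Python backend, here is a minimal sketch of the `model.py` structure involved; this is the standard `TritonPythonModel` skeleton with a placeholder computation, not the actual fairseq model from the repository:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Called once per model instance; with instance_group count > 1,
        # Triton constructs one TritonPythonModel object per instance.
        # The real reproducer loads a fairseq/PyTorch model here.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            input_array = input_tensor.as_numpy()

            # Placeholder computation standing in for real model inference.
            # For identical inputs this should always yield identical
            # outputs, which is what the issue reports is violated.
            output_array = input_array * 2.0

            output_tensor = pb_utils.Tensor("OUTPUT0", output_array.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses
```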
Expected behavior
I expect the outputs from the Python model to be consistently the same for a request with the same input values.
The locust script in the example repository prints the output every time it differs from the expected output.
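A minimal sketch of such a consistency check using the standard Triton HTTP client, independent of the locust script; the model name, tensor names, and input values are hypothetical placeholders:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One fixed input, sent many times; every response should be identical.
data = np.arange(8, dtype=np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

baseline = client.infer("example_python_model", inputs=[inp]).as_numpy("OUTPUT0")
for i in range(100):
    result = client.infer("example_python_model", inputs=[inp]).as_numpy("OUTPUT0")
    if not np.array_equal(result, baseline):
        print(f"request {i}: output differs from baseline")
        print("expected:", baseline)
        print("got:     ", result)
```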
Additional Information
I also tried running the same model under a ThreadPoolExecutor outside of Triton, which led to the same problem, even when moving every object initialization inside the thread worker to avoid non-thread-safe behavior.
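For illustration, a sketch of the shape of that ThreadPoolExecutor experiment; `load_model` and `run_model` are hypothetical stand-ins for the real model construction and inference calls:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def load_model():
    # Hypothetical stand-in for the real (e.g. fairseq) model construction.
    return lambda x: x * 2.0


def run_model(model, x):
    # Hypothetical stand-in for the real inference call.
    return model(x)


def infer_once(fixed_input):
    # Every object is initialized inside the worker, so no state is
    # shared between threads that could explain the divergence.
    model = load_model()
    return run_model(model, fixed_input)


fixed_input = np.arange(8, dtype=np.float32)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer_once, [fixed_input] * 100))

baseline = results[0]
mismatches = sum(not np.array_equal(r, baseline) for r in results)
print(f"{mismatches} of {len(results)} outputs differ from the first result")
```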