ChatQnA: accelerate also teirerank with Gaudi #475
Marking as draft because I'm not sure in which other places this should be added. (I can add And because I'm not sure what the recent changes in the other GenAI repos imply for reranking use...
Looking at the CI failures, CI currently seems to be in a rather broken state:
I've fixed the pre-commit failure.
The Xeon failure is caused by opea-project/GenAIExamples#891, which removed the use of microservice layers for LLM and Embedding. #474 is the follow-up change for helm-charts. For guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in a Helm chart, so it's OK to duplicate the changes there, or just keep it as is (the guardrails case would still use CPU for reranking). You'll have to rebase onto the latest changes to continue.
I added
Thanks, done!
I dropped the PR draft status, but I haven't tested this yet with the ChatQnA "nowrapper" changes that were merged after I filed this. I would expect rerank performance to be even more important now that its wrapper service no longer provides extra buffering / slowdown, though...
As Gaudi rerank worked fine for me, the CI failure for it could be a result of the later nowrapper changes:
I think it's another bug in CI though, that of specifying only one of the 2 related TEI options.
This is not a bug in CI, but with git HEAD, max warmup (matching specified max input length) is given only for
Rebased so that the teirerank config fix is the first commit and every commit works. This PR does not change anything related to guardrails, but that test still fails, due to another CI bug:
The failure is due to ChatQnA timing out on guardrails:
Although that service (eventually) gets to Ready state and its log shows no errors:
=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?
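One way to avoid the race suspected above would be for the test to poll backend readiness before sending the query. A minimal sketch, assuming the readiness probes are supplied as plain callables; `wait_until_ready` and all of its parameters are hypothetical names for illustration, not part of the actual CI scripts:

```python
import time
from typing import Callable, Dict, List


def wait_until_ready(
    checks: Dict[str, Callable[[], bool]],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll every readiness check until all pass or the timeout expires.

    Returns the names of the backends that are still not ready
    (an empty list means the query can be sent).
    """
    deadline = clock() + timeout_s
    pending = dict(checks)
    while pending:
        # Keep only the backends whose check still reports "not ready".
        pending = {name: check for name, check in pending.items() if not check()}
        if not pending or clock() >= deadline:
            break
        sleep(interval_s)
    return sorted(pending)
```

In a real pipeline each check would presumably wrap a pod status lookup for TGI, TEI and the other backends, and the test would fail early with the returned names instead of timing out on the query itself.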
@lianhao Has this not been fixed in CI yet: #454 (comment) ?
The CI failure is caused by this commit: opea-project/GenAIExamples#977
This failure is caused by a recent PR #977 in GenAIExamples.
Let's wait for PR #489 to land first.
Any idea why that regression was not caught by CI?
Because it's a change in GenAIExamples repo, not in this GenAIInfra repo.
Hm. Such breakage seems to happen often enough that I think PRs in those repos could also trigger GenAIInfra CI tests (e.g. after merging)... EDIT: Or, if cross-repo triggers are not possible, maybe there could be automated GenAIInfra CI test runs at some specific interval (e.g. weekly or more often) to catch regressions caused by changes in external dependencies. It would be nice if such tests could file the detected issues as bugs ("[Bug] CI external dependency test 2024 WW43 - failed").
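The issue title format suggested above could be generated from the run date. A small sketch, assuming "WW" means the ISO 8601 week number (other work-week conventions may differ by one); `regression_issue_title` is a hypothetical helper, not an existing CI function:

```python
from datetime import date


def regression_issue_title(run_day: date, failed: bool) -> str:
    """Build a title like '[Bug] CI external dependency test 2024 WW43 - failed'."""
    # isocalendar() yields (ISO year, ISO week number, ISO weekday).
    iso_year, iso_week, _ = run_day.isocalendar()
    status = "failed" if failed else "passed"
    # Only failed runs are tagged as bugs.
    prefix = "[Bug] " if failed else ""
    return f"{prefix}CI external dependency test {iso_year} WW{iso_week:02d} - {status}"
```

A scheduled job would call this with the current date and only file an issue when `failed` is true.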
Cross-repo triggers do not seem to be available right now. We have an automated trigger that creates an issue in GenAIInfra if any of the GenAIExamples docker compose files change (in this specific case, it's issue #484). I believe @daisy-ycguo has already enabled the CD test for GenAIInfra, which we might be able to leverage for periodic CD tests.
Max input length applies to both, so teirerank needs also max warmup length. Fixes: opea-project#483 Signed-off-by: Eero Tamminen <[email protected]>
Currently it's the same image. Signed-off-by: Eero Tamminen <[email protected]>
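The first commit message above states that the max input length applies to both services, so each one needs a matching max warmup length. A rough sketch of a consistency check for that rule; the function name and the dictionary layout are hypothetical and do not reflect the chart's real value keys:

```python
from typing import Dict, List, Optional


def services_missing_warmup(
    max_input_length: int,
    warmup_by_service: Dict[str, Optional[int]],
) -> List[str]:
    """Return services whose max warmup length is unset or differs
    from the shared max input length."""
    return sorted(
        name
        for name, warmup in warmup_by_service.items()
        if warmup != max_input_length
    )


# Hypothetical state before the fix: only tei has a warmup length.
before_fix = services_missing_warmup(512, {"tei": 512, "teirerank": None})
# Hypothetical state after the fix: both services match.
after_fix = services_missing_warmup(512, {"tei": 512, "teirerank": 512})
```

Here `before_fix` lists `teirerank` as the service lacking the setting, and `after_fix` is empty.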
Description
Accelerate also teirerank with Gaudi, not just tei.
When reranking is used, it does not make sense (performance-wise) to accelerate just tei, as reranking is a larger bottleneck.
Issues
#486
Type of change
Dependencies
n/a.
Tests
Manually checked the ChatQnA throughput.