ChatQnA: accelerate also teirerank with Gaudi #475

eero-t · 2024-10-15T14:10:08Z

Description

Accelerate also teirerank with Gaudi, not just tei.

When reranking is used, it does not make sense (performance-wise) to accelerate just tei, as reranking is a larger bottleneck.

Issues

#486

Type of change

Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a.

Tests

Manually checked the ChatQnA throughput.

eero-t · 2024-10-15T14:32:04Z

Marking as draft because I'm not sure in which other places this should be added.

(I can add common/teirerank/gaudi-values.yaml, but IMHO guardrails-gaudi-values.yaml files should be something added on top of gaudi-values.yaml, not independent.)

And because I'm not sure what the recent changes in there other GenAI repos imply on reranking use...

eero-t · 2024-10-15T14:51:04Z

Looking at the CI fails, CI seems to be currently in rather broken state:

pre-commit.ci fails due to CI check for Python doc formatting not being configured properly
Change to gaudi-values.yaml cannot be cause for the Xeon failures
PR does not touch guardrail files yet either

yongfengdu · 2024-10-16T02:13:50Z

I've fixed the pre-commit failure.
Guardrail failure should be caused by other changes(Docker or Python), I'm debugging it but maybe we can disable it first to make the CI work.
Please hold a while until this one merged.#473
Besides the guardrail fail, there is another relative path check failed, I'm sure it's not due to the code itself, but https://github.com/yongfengdu/GenAIInfra/blob/ci-fix/.github/workflows/pr-path-detection.yml

yongfengdu · 2024-10-16T08:52:20Z

The Xeon failure is caused by opea-project/GenAIExamples#891, which removed the use of microservice layers for LLM and Embedding. #474 is the follow up change for helm-charts.

For the guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in helm chart, so it's ok to do duplicate changes there, or just keep it as is(Guardrail case still use CPU for reranking).

You'll have to rebase with latest change to continue.

eero-t · 2024-10-16T12:46:19Z

For the guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in helm chart, so it's ok to do duplicate changes there, or just keep it as is(Guardrail case still use CPU for reranking).

I added gaudi-values.yaml to common/teirerank/, but did not touch guardrails, as I haven't done any performance testing for it.

You'll have to rebase with latest change to continue.

Thanks, done!

eero-t · 2024-10-16T12:52:02Z

I dropped the PR draft status, but I haven't tested this yet with the ChatQnA "nowrapper" changes that were merged after I filed this. I would expect rerank perf to be even more important after its wrapper service is not providing extra buffering / slowdown though...

eero-t · 2024-10-16T14:44:59Z

As Gaudi rerank worked fine for me, CI failure for it could be result of the later nowrapper changes:

2024-10-16T13:00:32.007226Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:136: Python backend process terminated
Error: Error when doing warmup

Caused by:
    Could not start backend: max_warmup_length (1024) exceeds model's max_input_length (512), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command

I think it's another bug in CI though, of only specifying one of the 2 related TEI options.

eero-t · 2024-10-18T08:13:00Z

    Could not start backend: max_warmup_length (1024) exceeds model's max_input_length (512), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command

I think it's another bug in CI though, of only specifying one of the 2 related TEI options.

This is not a bug in CI, but with git HEAD, max warmup (matching specified max input length) is given only for tei, not teirerank: #483

eero-t · 2024-10-18T11:44:46Z

Rebased teirerank config fix as first one, so that every commit works.

This PR does not change anything related to guardrails, but still that test fails, due to another CI bug:

#kubectl logs chatqna20241018105227-testpod -n chatqna-20241018105227
curl: (22) The requested URL returned error: 500
Internal Server Errorcurl failed with code 22

Fail is due to ChatQnA timeouting on guardrails:

-----------Pod: chatqna20241018105227-6897bff66b-cm9sv---------
#kubectl describe pod chatqna20241018105227-6897bff66b-cm9sv -n chatqna-20241018105227
...
     Image:           100.83.111.229:5000/opea/chatqna-guardrails:latest
...
INFO:     10.244.164.179:36050 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 1091, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
...
 TimeoutError: [Errno 110] Connect call failed ('10.103.241.79', 80)
...
 aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna20241018105227-guardrails-usvc:80 ssl:default [Connect call failed ('10.103.241.79', 80)]

Although that service gets (eventually) to Ready state and its log shows now errors:

NAME                                                     READY   STATUS      RESTARTS       AGE     IP               NODE                NOMINATED NODE   READINESS GATES
...
chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z   1/1     Running     0              3m22s   10.244.164.175   aise-cluster-01-3   <none>           <none>
...
-----------Pod: chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z---------
#kubectl describe pod chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
    Liveness:        http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=24
    Readiness:       http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=3
    Startup:         http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=120
...
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Created    3m11s  kubelet            Created container chatqna20241018105227
  Normal   Started    3m11s  kubelet            Started container chatqna20241018105227
  Warning  Unhealthy  3m5s   kubelet            Startup probe failed: Get "http://10.244.164.175:9090/v1/health_check": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
...
-----------Pod: chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z---------
#kubectl describe pod chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
#kubectl logs chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
INFO:     100.83.111.245:34224 - "GET /v1/health_check HTTP/1.1" 200 OK
INFO:     10.244.164.137:50580 - "POST /v1/guardrails HTTP/1.1" 200 OK
INFO:     100.83.111.245:52736 - "GET /v1/health_check HTTP/1.1" 200 OK

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?

eero-t · 2024-10-18T11:57:39Z

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?

@lianhao Has this not been fixed in CI yet: #454 (comment) ?

yongfengdu · 2024-10-21T01:37:53Z

The CI failure is caused by this commit opea-project/GenAIExamples#977
which changed the default port of guardrails, but we didn't specify the GUARDRAIL_SERVICE_PORT in our helm chart.

lianhao · 2024-10-21T01:38:20Z

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?

@lianhao Has this not been fixed in CI yet: #454 (comment) ?

This failures is caused by a recent PR #977 in GenAIExample

lianhao · 2024-10-21T01:55:49Z

let's wait for PR #489 to land-in first

eero-t · 2024-10-21T08:52:39Z

The CI failure is caused by this commit opea-project/GenAIExamples#977 which changed the default port of guardrails, but we didn't specify the GUARDRAIL_SERVICE_PORT in our helm chart.

Any idea why that regression was not caught by CI?

lianhao · 2024-10-22T01:25:32Z

The CI failure is caused by this commit opea-project/GenAIExamples#977 which changed the default port of guardrails, but we didn't specify the GUARDRAIL_SERVICE_PORT in our helm chart.

Any idea why that regression was not caught by CI?

Because it's a change in GenAIExamples repo, not in this GenAIInfra repo.

eero-t · 2024-10-22T09:36:49Z

Because it's a change in GenAIExamples repo, not in this GenAIInfra repo.

Hm. Such breakage seems to happen often enough that I think PRs in those repos could trigger (e.g. after merging) also GenAIInfra CI tests...

EDIT: Or if cross-repo triggers are not possible, maybe there could be automated GenAIInfra CI test runs at some specific interval (e.g. weekly or more often), to catch regressions caused by changes in external dependencies. It would be nice if such tests could file the detected issues as bugs ("[Bug] CI external dependency test 2024 WW43 - failed").

lianhao · 2024-10-23T06:43:04Z

Because it's a change in GenAIExamples repo, not in this GenAIInfra repo.

Hm. Such breakage seems to happen often enough that I think PRs in those repos could trigger (e.g. after merging) also GenAIInfra CI tests...

EDIT: Or if cross-repo triggers are not possible, maybe there could be automated GenAIInfra CI test runs at some specific interval (e.g. weekly or more often), to catch regressions caused by changes in external dependencies. It would be nice if such tests could file the detected issues as bugs ("[Bug] CI external dependency test 2024 WW43 - failed").

Cross-repo trigger seems not available right now. We have an automated trigger to create issue in GenAIInfra if any GenAIExamples' docker compose files get changed(In this specific case, it's issue #484). I believe @daisy-ycguo has already enabled the CD test for GenAIInfra which we might be able to leverage for periodical CD test.

Max input length applies to both, so teirerank needs also max warmup length. Fixes: opea-project#483 Signed-off-by: Eero Tamminen <[email protected]>

Signed-off-by: Eero Tamminen <[email protected]>

helm-charts/common/teirerank/gaudi-values.yaml

helm-charts/chatqna/gaudi-values.yaml

Currently it's the same image. Signed-off-by: Eero Tamminen <[email protected]>

eero-t requested review from yongfengdu and lianhao as code owners October 15, 2024 14:10

eero-t marked this pull request as draft October 15, 2024 14:10

eero-t force-pushed the gaudi-accel branch from b034a8d to 513f319 Compare October 15, 2024 14:15

eero-t force-pushed the gaudi-accel branch from 513f319 to bb5cea1 Compare October 16, 2024 12:42

eero-t marked this pull request as ready for review October 16, 2024 12:46

eero-t mentioned this pull request Oct 17, 2024

Gaudi teirerank is in crash loop after Helm install #483

Closed

lianhao force-pushed the gaudi-accel branch from bb5cea1 to c8f3d1f Compare October 18, 2024 01:30

eero-t force-pushed the gaudi-accel branch from c8f3d1f to b57cb16 Compare October 18, 2024 10:48

eero-t added 3 commits October 23, 2024 14:43

Fix: use MAX_WARMUP_SEQUENCE_LENGTH for both tei & teirerank

7009a77

Max input length applies to both, so teirerank needs also max warmup length. Fixes: opea-project#483 Signed-off-by: Eero Tamminen <[email protected]>

gaudi-values.yaml for teirerank

310067d

Signed-off-by: Eero Tamminen <[email protected]>

ChatQnA: accelerate also teirerank

f3dbe74

Signed-off-by: Eero Tamminen <[email protected]>

lianhao force-pushed the gaudi-accel branch from b57cb16 to f3dbe74 Compare October 23, 2024 06:43

lianhao requested changes Oct 23, 2024

View reviewed changes

helm-charts/common/teirerank/gaudi-values.yaml Outdated Show resolved Hide resolved

helm-charts/chatqna/gaudi-values.yaml Outdated Show resolved Hide resolved

Use tei-gaudi "latest" tag instead of "1.0"

95530f5

Currently it's the same image. Signed-off-by: Eero Tamminen <[email protected]>

lianhao approved these changes Oct 24, 2024

View reviewed changes

yongfengdu approved these changes Oct 24, 2024

View reviewed changes

lianhao merged commit 620963f into opea-project:main Oct 25, 2024
12 checks passed

eero-t deleted the gaudi-accel branch December 5, 2024 10:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ChatQnA: accelerate also teirerank with Gaudi #475

ChatQnA: accelerate also teirerank with Gaudi #475

eero-t commented Oct 15, 2024 •

edited by lianhao

Loading

eero-t commented Oct 15, 2024 •

edited

Loading

eero-t commented Oct 15, 2024

yongfengdu commented Oct 16, 2024

yongfengdu commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024 •

edited

Loading

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024 •

edited

Loading

yongfengdu commented Oct 21, 2024

lianhao commented Oct 21, 2024

lianhao commented Oct 21, 2024

eero-t commented Oct 21, 2024

lianhao commented Oct 22, 2024

eero-t commented Oct 22, 2024 •

edited

Loading

lianhao commented Oct 23, 2024 •

edited

Loading

ChatQnA: accelerate also teirerank with Gaudi #475

ChatQnA: accelerate also teirerank with Gaudi #475

Conversation

eero-t commented Oct 15, 2024 • edited by lianhao Loading

Description

Issues

Type of change

Dependencies

Tests

eero-t commented Oct 15, 2024 • edited Loading

eero-t commented Oct 15, 2024

yongfengdu commented Oct 16, 2024

yongfengdu commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024 • edited Loading

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024 • edited Loading

yongfengdu commented Oct 21, 2024

lianhao commented Oct 21, 2024

lianhao commented Oct 21, 2024

eero-t commented Oct 21, 2024

lianhao commented Oct 22, 2024

eero-t commented Oct 22, 2024 • edited Loading

lianhao commented Oct 23, 2024 • edited Loading

eero-t commented Oct 15, 2024 •

edited by lianhao

Loading

eero-t commented Oct 15, 2024 •

edited

Loading

eero-t commented Oct 16, 2024 •

edited

Loading

eero-t commented Oct 18, 2024 •

edited

Loading

eero-t commented Oct 22, 2024 •

edited

Loading

lianhao commented Oct 23, 2024 •

edited

Loading