chore(llmobs): use span store instead of temporary tags #11543

Yun-Kim · 2024-11-26T00:23:56Z

This PR performs some cleanup refactors on the LLMObs SDK and associated integrations, specifically regarding how we store LLMObs span data (metadata/metrics/tags/IO):

Stop storing these as temporary span tags and instead use the span store field, which allows arbitrary key-value pairs but is not submitted to Datadog. This removes the risk of a temporary tag not being extracted and then being submitted as an APM span tag.
Stop calling safe_json() (i.e. json.dumps()) when storing the above data. It is an expensive operation whose cost adds up with the number of separate calls. Instead, store the raw values of the objects in the store field and call safe_json() only "once", at payload encoding time. (Note that for the input/output.value fields we still need to call safe_json() before encoding, since those values must be cast to strings first.)
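
The two changes above can be illustrated with a minimal sketch. It does not use the real ddtrace classes: `ToySpan`, its `tags` and `store` attributes, the key names, and `encode_span_event` are hypothetical stand-ins, and `safe_json` here is only a simplified assumption of what the SDK helper does.

```python
import json
from typing import Any, Dict


def safe_json(obj: Any) -> str:
    """Simplified stand-in for the SDK helper: fall back to repr() for odd objects."""
    return json.dumps(obj, default=repr)


class ToySpan:
    """Hypothetical span: `tags` would be submitted to APM, `store` would not."""

    def __init__(self) -> None:
        self.tags: Dict[str, str] = {}
        self.store: Dict[str, Any] = {}


def annotate(span: ToySpan, metadata: Dict[str, Any], metrics: Dict[str, Any]) -> None:
    # Keep the raw Python objects; no json.dumps() per annotation call.
    span.store["metadata"] = metadata
    span.store["metrics"] = metrics


def encode_span_event(span: ToySpan) -> str:
    # Serialize exactly once, when the payload is built.
    event = {
        "meta": {"metadata": span.store.get("metadata", {})},
        "metrics": span.store.get("metrics", {}),
    }
    return safe_json(event)


span = ToySpan()
annotate(span, {"temperature": 0.2}, {"input_tokens": 12})
payload = encode_span_event(span)
```

Because nothing is ever written to `tags` in this sketch, there is no temporary tag left behind to leak into the APM span.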

Things to look out for:

Previously we called safe_json() every time we stored data, so everything was stored as a string span tag. One danger now is errors during span processing caused by unexpected types (code that expects a string will likely receive a dictionary/object from the span store field).
By avoiding any jsonify processing before encode time, a small edge case appeared in the LLMObs SDK decorator function, which auto-annotates non-LLM spans with a map of the function's input arguments. In Python 3.8, the bind_partial().arguments call used to extract the function arguments returns an OrderedDict (in Python >= 3.9 it returns a regular dict). This broke some tests, as we were simply casting the value to a string when storing the input/output value. I added a fix to cast the bind_partial().arguments object to a dict to avoid this issue.
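
The OrderedDict edge case can be reproduced with the standard library alone. This is a minimal sketch, not the decorator's actual code; `argument_map` and `add` are hypothetical names used only for illustration.

```python
import inspect


def argument_map(func, *args, **kwargs):
    """Return the bound arguments of a call as a plain dict."""
    bound = inspect.signature(func).bind_partial(*args, **kwargs)
    # `arguments` is an OrderedDict on Python 3.8 and a dict on 3.9+;
    # casting to dict makes str() produce the same text on both.
    return dict(bound.arguments)


def add(x, y=0):
    return x + y


args_map = argument_map(add, 1, y=2)
as_text = str(args_map)
```

Without the cast, `str()` on Python 3.8 would render the value in the `OrderedDict([...])` form rather than as a plain dict literal, which is the kind of mismatch that broke the tests.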

Next Steps

This is a great first step, but there are still tons of performance improvements we can make to our encoding/writing. The most notable is that we call json.dumps() on span events more than once (to calculate the payload size before adding to the buffer).
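
One possible direction for that follow-up, sketched under assumptions: encode each event a single time and reuse the encoded bytes both for the size check and for the final payload. `ToyBuffer` and its methods are hypothetical and do not reflect the writer's actual implementation.

```python
import json
from typing import Any, Dict, List


class ToyBuffer:
    """Hypothetical size-limited buffer that stores pre-encoded events."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.size = 0
        self.encoded_events: List[bytes] = []

    def put(self, event: Dict[str, Any]) -> bool:
        encoded = json.dumps(event).encode("utf-8")  # the only dumps() call
        if self.size + len(encoded) > self.limit_bytes:
            return False  # a real writer would flush or drop here
        self.encoded_events.append(encoded)
        self.size += len(encoded)
        return True

    def payload(self) -> bytes:
        # Join the already-encoded events into a JSON array without re-encoding.
        return b"[" + b",".join(self.encoded_events) + b"]"


buf = ToyBuffer(limit_bytes=1024)
buf.put({"name": "llm-call", "duration": 12})
buf.put({"name": "tool-call", "duration": 3})
events = json.loads(buf.payload())
```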

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2024-11-26T00:24:27Z

CODEOWNERS have been resolved as:

tests/llmobs/test_llmobs_span_encoder.py                                @DataDog/ml-observability
ddtrace/llmobs/_integrations/anthropic.py                               @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/gemini.py                                  @DataDog/ml-observability
ddtrace/llmobs/_integrations/langchain.py                               @DataDog/ml-observability
ddtrace/llmobs/_integrations/openai.py                                  @DataDog/ml-observability
ddtrace/llmobs/_integrations/vertexai.py                                @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_trace_processor.py                                      @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
ddtrace/llmobs/decorators.py                                            @DataDog/ml-observability
tests/contrib/anthropic/test_anthropic_llmobs.py                        @DataDog/ml-observability
tests/contrib/openai/test_openai_llmobs.py                              @DataDog/ml-observability
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_decorators.py                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_llmobs_span_agentless_writer.py                       @DataDog/ml-observability
tests/llmobs/test_llmobs_trace_processor.py                             @DataDog/ml-observability

ddtrace/llmobs/_integrations/openai.py

pr-commenter · 2024-11-26T22:38:57Z

Benchmarks

Benchmark execution time: 2024-12-12 22:58:54

Comparing candidate commit fd00236 in PR branch yunkim/llmobs-use-span-store with baseline commit 79be26d in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.

datadog-dd-trace-py-rkomorn · 2024-12-04T22:49:37Z

Datadog Report

Branch report: yunkim/llmobs-use-span-store
Commit report: 433f9cb
Test service: dd-trace-py

✅ 0 Failed, 1193 Passed, 275 Skipped, 8m 36.51s Total duration (16m 21.29s time saved)

lievan

niceee

ddtrace/llmobs/_integrations/langchain.py

Kyle-Verhoog

2 smaller things but overall huge improvement 👏

glad to see those json encodes and decodes go away! 🧹

ddtrace/llmobs/_writer.py

tests/llmobs/test_llmobs_service.py

…#11745) This PR makes a small revert to #11543, where access to propagated parent IDs (for distributed tracing) was unintentionally changed to go through the span store object, even though automatic context propagation results are always added to the span as tags (not to the store). While all other LLMObs SDK data is added/accessed via the span store, `_dd.p.llmobs_parent_id` is added automatically by the tracer internals, so we'll continue reading it from the span tags for now, until our overall context management solution removes this problem entirely.
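
The mismatch that #11745 reverts can be shown with a toy model. Only the `_dd.p.llmobs_parent_id` key comes from the description above; `ToySpan`, `propagate_context`, and the two lookup helpers are hypothetical stand-ins, not ddtrace APIs.

```python
from typing import Any, Dict, Optional

PARENT_ID_KEY = "_dd.p.llmobs_parent_id"


class ToySpan:
    """Hypothetical span with separate tag and store mappings."""

    def __init__(self) -> None:
        self.tags: Dict[str, str] = {}
        self.store: Dict[str, Any] = {}


def propagate_context(span: ToySpan, parent_id: str) -> None:
    # Stand-in for tracer internals: propagation results land in tags.
    span.tags[PARENT_ID_KEY] = parent_id


def parent_id_from_store(span: ToySpan) -> Optional[str]:
    # The unintended lookup: the store never receives the propagated ID.
    return span.store.get(PARENT_ID_KEY)


def parent_id_from_tags(span: ToySpan) -> Optional[str]:
    # The restored lookup: read the value where propagation wrote it.
    return span.tags.get(PARENT_ID_KEY)


span = ToySpan()
propagate_context(span, "12345")
```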

This PR resolves an issue in the Python SDK where non-ASCII (UTF-8) characters annotated on spans resulted in span payloads being dropped due to encoding errors.

In #11330 we added the `ensure_ascii=False` option to our `safe_json()` helper's use of `json.dumps(...)` in order to keep non-ASCII characters from being encoded multiple times into nonsense (as we were calling `safe_json()` multiple nested times while building the span event from the span tags). However, this resulted in issues where characters outside latin-1 (whose character set is only a small subset of what UTF-8 can represent, and which is apparently the encoding the HTTP library relies on, which we in turn rely on to submit payloads) broke the encoding at payload submission time. To fix this, we remove the `ensure_ascii=False` option at the final write time.

Also note that after #11543 we mostly centralized the places where a span event is encoded: at write time, and when encoding the span's input/output value fields (which can be in a JSON dictionary format). Since we need to provide valid JSON formatting for the IO fields (which leads to a prettier UI display), we still need to call `json.dumps(ensure_ascii=False)` there to avoid the problem fixed by #11330, i.e. to keep the non-ASCII characters unencoded until the very end (write time).

This PR also adds minor test fixtures that mock out the LLMObs backend intake so we can make assertions on the payloads we should be submitting to LLMObs. Previous tests all relied on the span events prior to encoding/submission and weren't able to cover this scenario.

--------- Co-authored-by: Kyle Verhoog <[email protected]> Co-authored-by: Yun Kim <[email protected]> Co-authored-by: Yun Kim <[email protected]>
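
The interaction between nested encoding and `ensure_ascii` can be checked with the standard `json` module. This is an illustrative sketch only; the sample text and variable names are made up, and `fits_latin1` is a hypothetical helper that mimics a client encoding a string body as latin-1.

```python
import json

text = "café 日本語"

# Nested encoding with the default ensure_ascii=True: the inner escapes
# survive as literal backslash sequences after one round of decoding.
inner_escaped = json.dumps({"text": text})
seen_escaped = json.loads(json.dumps({"value": inner_escaped}))["value"]

# Inner encoding with ensure_ascii=False keeps the characters readable.
inner_raw = json.dumps({"text": text}, ensure_ascii=False)
seen_raw = json.loads(json.dumps({"value": inner_raw}))["value"]


def fits_latin1(s: str) -> bool:
    """Return True if the string can be encoded as latin-1."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Final write: leaving ensure_ascii at its default yields pure ASCII output.
final_raw = json.dumps({"value": inner_raw}, ensure_ascii=False)
final_escaped = json.dumps({"value": inner_raw})
```

Here `final_escaped` is safe to hand to a latin-1 encoder while `final_raw` is not, and decoding `final_escaped` still recovers the original text.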

(cherry picked from commit e11a0a3)


…0] (#12031) Backport e11a0a3 from #11961 to 2.20. Co-authored-by: Jonathan Chavez <[email protected]>

to 2.19] (#12033) Backports #11961 to 2.19. **Note that due to non-existent test files/conftest utilities in the 2.19 branch, this backport avoids backporting the entire diff of #11961 and instead backports only the fix implementation.** Co-authored-by: Jonathan Chavez <[email protected]> Co-authored-by: Kyle Verhoog <[email protected]>

Use span store instead of temporary tags

9b400ff

Yun-Kim added the changelog/no-changelog label (A changelog entry is not required for this PR.) Nov 26, 2024

datadog-datadog-prod-us1 bot reviewed Nov 26, 2024

ddtrace/llmobs/_integrations/openai.py (resolved)

Yun-Kim added 3 commits November 25, 2024 19:24

fmt

6412e16

Fix changes caught by tests

5d98ad1

Fix LLMObs tests, add span encoder test, remove unserializable error tests from integrations

fb5e470

Yun-Kim added 2 commits November 27, 2024 14:07

Fix langchain tests, decorator cast to dict (see py3.8 ordereddict)

cef1f18

fmt

3346570

Yun-Kim marked this pull request as ready for review November 27, 2024 19:16

Yun-Kim requested a review from a team as a code owner November 27, 2024 19:16

Yun-Kim and others added 6 commits November 27, 2024 16:13

fix langchain chain test helpers

e28c218

fmt

fdf51ba

typing + missed langchain test fix

8cf2317

Merge branch 'main' into yunkim/llmobs-use-span-store

c05e0ef

fmt

24de3d0

Fix vertexAI integration to use span store

13551cd

Fix omitted change due to upstream merge

433f9cb

lievan approved these changes Dec 11, 2024

ddtrace/llmobs/_integrations/langchain.py (outdated, resolved)

Yun-Kim added the manual merge label (Do not automatically merge) Dec 11, 2024

Move input/output value safe json casting to trace processor

05c1e1b

Kyle-Verhoog approved these changes Dec 12, 2024

ddtrace/llmobs/_writer.py (outdated, resolved)

tests/llmobs/test_llmobs_service.py (resolved)

Remove unnecessary try/catch

fd00236

Yun-Kim enabled auto-merge (squash) December 12, 2024 23:24

Yun-Kim merged commit e474267 into main Dec 12, 2024
206 checks passed

Yun-Kim deleted the yunkim/llmobs-use-span-store branch December 12, 2024 23:34

Yun-Kim mentioned this pull request Dec 16, 2024

chore(llmobs): ensure propagated parent IDs are still using span tags #11745

Merged

2 tasks

Yun-Kim mentioned this pull request Jan 17, 2025

fix(llmobs): encode llm objects in utf-8 before sending #11961

Merged

2 tasks

github-actions bot mentioned this pull request Jan 22, 2025

fix(llmobs): encode llm objects in utf-8 before sending [backport 2.20] #12031

Merged

2 tasks

Yun-Kim mentioned this pull request Jan 22, 2025

fix(llmobs): encode llm objects in utf-8 before sending [backport #11961 to 2.19] #12033

Merged

2 tasks
