
Integration Exception Tracking #11732

Open: @ygree wants to merge 5 commits into main from ygree/integration-exception-tracking

@ygree commented Dec 13, 2024

Collects and deduplicates errors that ddtrace.contrib integrations report via DDLogger, and sends them to telemetry.
A record is reported only if it is logged at ERROR level or carries an exception with a stack trace.
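The filtering and deduplication rules above can be illustrated with a standard-library logging.Handler. This is a minimal sketch, not the PR's implementation: the class name ContribErrorCollector, the logger name ddtrace.contrib.example, and the dedup key (logger name, source file, line number, exception type) are assumptions made for illustration.

```python
import logging
import traceback
from typing import Dict, List, Optional, Tuple

# Hypothetical dedup key: (logger name, source file, line number, exception type name).
DedupKey = Tuple[str, str, int, str]


class ContribErrorCollector(logging.Handler):
    """Illustrative handler that keeps one entry per distinct contrib error.

    Assumption: mirrors the rules in the PR description (ERROR level, or any
    record carrying an exception), not the actual DDTelemetryLogger code.
    """

    def __init__(self) -> None:
        super().__init__(level=logging.NOTSET)
        # key -> [occurrence count, formatted stack trace or None]
        self.entries: Dict[DedupKey, List] = {}

    def emit(self, record: logging.LogRecord) -> None:
        # Only integration loggers are considered.
        if not (record.name == "ddtrace.contrib" or record.name.startswith("ddtrace.contrib.")):
            return
        has_exc = bool(record.exc_info) and record.exc_info[0] is not None
        # Keep ERROR-and-above records, or lower-level records with an exception attached.
        if record.levelno < logging.ERROR and not has_exc:
            return
        exc_type = record.exc_info[0].__name__ if has_exc else ""
        key = (record.name, record.pathname, record.lineno, exc_type)
        entry = self.entries.get(key)
        if entry is None:
            stack: Optional[str] = (
                "".join(traceback.format_exception(*record.exc_info)) if has_exc else None
            )
            self.entries[key] = [1, stack]
        else:
            entry[0] += 1  # duplicate: only bump the counter


# Usage: one exception logged three times from the same call site, plus one plain error.
collector = ContribErrorCollector()
log = logging.getLogger("ddtrace.contrib.example")  # hypothetical integration logger
log.addHandler(collector)
log.propagate = False

for _ in range(3):
    try:
        1 / 0
    except ZeroDivisionError:
        log.warning("integration hook failed", exc_info=True)
log.error("plain error without exception")
log.warning("dropped: no exception and below ERROR")
```

Keying on the call site rather than the formatted message keeps variable data in messages from defeating deduplication, at the cost of merging distinct messages logged from the same line.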


Jira ticket: AIDM-389

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@ygree self-assigned this Dec 13, 2024

github-actions bot commented Dec 13, 2024

CODEOWNERS have been resolved as:

ddtrace/internal/logger.py                                              @DataDog/apm-core-python
ddtrace/internal/telemetry/writer.py                                    @DataDog/apm-core-python

pr-commenter bot commented Dec 13, 2024

Benchmarks

Benchmark execution time: 2025-01-09 06:56:04

Comparing candidate commit b132c4e in PR branch ygree/integration-exception-tracking with baseline commit 232e2eb in branch main.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 392 metrics, 2 unstable metrics.

scenario:iast_aspects-ospathbasename_aspect

  • 🟥 execution_time [+310.826ns; +372.786ns] or [+9.319%; +11.177%]

scenario:iast_aspects-ospathdirname_aspect

  • 🟥 execution_time [+413.111ns; +485.606ns] or [+11.242%; +13.215%]

@ygree marked this pull request as ready for review December 14, 2024 01:53
@ygree requested a review from a team as a code owner December 14, 2024 01:53
@ygree requested a review from erikayasuda December 14, 2024 01:53
ygree added 4 commits January 6, 2025 15:38
Collect, dedupe, ddtrace.contrib logs, and send to the telemetry.
Report only an error or an exception with a stack trace. Added tags and stack trace (without redaction)
@ygree force-pushed the ygree/integration-exception-tracking branch from b11966f to ec8f7ca


class _TelemetryConfig:
Contributor

It looks like we are introducing telemetry-specific logic into a logging source. Can we try to see if there is a different design that allows keeping the two separate, please?

Author

@ygree, Jan 8, 2025

Not really "introducing", since some of this was already there to capture errors, and this change just extends it to exception tracking.
Alternatively, we would have to duplicate every logging call in the contrib modules just to get exception tracking, which is easy to forget and only adds duplicated code to the instrumentation.

I'll consider adding a separate telemetry logger if you think that's a better solution. It will probably need to live in the same package, because my attempt to put it in the telemetry package failed with:

ImportError: cannot import name 'get_logger' from partially initialized module 'ddtrace.internal.logger' (most likely due to circular import)
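A common general-purpose workaround for this kind of error is to defer one side of the cycle to call time with a function-local import. The sketch below reproduces the shape of the cycle with two hypothetical throwaway modules, mylogger and mytelemetry (not ddtrace modules), written to a temporary directory; whether this pattern fits ddtrace's import graph is exactly what the thread leaves open.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins: mytelemetry needs mylogger.get_logger at import time,
# and mylogger needs to hand errors over to mytelemetry.
LOGGER_SRC = textwrap.dedent(
    """
    import logging

    def get_logger(name):
        return logging.getLogger(name)

    def report_error(msg):
        # Deferred import: resolved when this function runs, after both modules
        # are fully initialized. A top-level 'from mytelemetry import ...' here
        # would re-enter a partially initialized mylogger and raise ImportError.
        from mytelemetry import record_error
        record_error(msg)
    """
)

TELEMETRY_SRC = textwrap.dedent(
    """
    from mylogger import get_logger

    log = get_logger(__name__)
    errors = []

    def record_error(msg):
        errors.append(msg)
    """
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "mylogger.py").write_text(LOGGER_SRC)
    Path(tmp, "mytelemetry.py").write_text(TELEMETRY_SRC)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        mylogger = importlib.import_module("mylogger")
        mylogger.report_error("boom")  # triggers the deferred import of mytelemetry
        mytelemetry = importlib.import_module("mytelemetry")
        recorded = list(mytelemetry.errors)
    finally:
        sys.path.remove(tmp)
```

This removes the import-time cycle but not the conceptual coupling the reviewer objects to: mylogger still names mytelemetry directly.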

Author

I have introduced DDTelemetryLogger to separate concerns. Please let me know what you think about it.

Contributor

Great, thanks. I think we really need to move all telemetry-related code to the already existing telemetry sources. For instance, we already parse DD_INSTRUMENTATION_TELEMETRY_ENABLED in

self._telemetry_enabled = _get_config("DD_INSTRUMENTATION_TELEMETRY_ENABLED", True, asbool)
self._telemetry_heartbeat_interval = _get_config("DD_TELEMETRY_HEARTBEAT_INTERVAL", 60, float)
so there is no need to duplicate that logic here. In general, we should avoid creating tight coupling between components, or tightening existing coupling. If logging and telemetry need to interact, one of them will have to do so through an abstract interface that knows nothing about the other; otherwise we will end up with circular reference issues. Perhaps @mabdinur can advise on how best to proceed here.
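The decoupling described above is essentially dependency inversion: the logging side exposes a generic hook, and the telemetry side registers a callback, so logging never imports telemetry. A minimal sketch, assuming hypothetical names register_error_hook, HookDispatchHandler and TelemetrySink (none of them are ddtrace APIs); the enabled flag stands in for the value telemetry already parses from DD_INSTRUMENTATION_TELEMETRY_ENABLED.

```python
import logging
from typing import Callable, List

# ---- logging side: a generic extension point that imports nothing from telemetry ----
ErrorHook = Callable[[logging.LogRecord], None]
_error_hooks: List[ErrorHook] = []


def register_error_hook(hook: ErrorHook) -> None:
    """Hypothetical API: any subsystem may subscribe to log records."""
    _error_hooks.append(hook)


class HookDispatchHandler(logging.Handler):
    """Forwards every record it receives to the registered hooks."""

    def emit(self, record: logging.LogRecord) -> None:
        for hook in list(_error_hooks):
            try:
                hook(record)
            except Exception:
                # A misbehaving subscriber must never break application logging.
                self.handleError(record)


# ---- telemetry side: owns its own config and decides what to keep ----
class TelemetrySink:
    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled  # stand-in for the already-parsed telemetry setting
        self.messages: List[str] = []

    def on_log_record(self, record: logging.LogRecord) -> None:
        if not self.enabled:
            return
        has_exc = bool(record.exc_info) and record.exc_info[0] is not None
        if record.levelno >= logging.ERROR or has_exc:
            self.messages.append(record.getMessage())


# Wiring happens on the telemetry side (or at startup), so the dependency points one way.
sink = TelemetrySink(enabled=True)
register_error_hook(sink.on_log_record)

log = logging.getLogger("ddtrace.contrib.demo")  # hypothetical logger name
log.addHandler(HookDispatchHandler())
log.propagate = False
log.error("connection failed")
log.warning("retrying")  # below ERROR and no exception: ignored by the sink
```

The logging module only knows about "a callable that accepts a LogRecord", which is the kind of abstract interface the comment asks for.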

Author

Thank you for the feedback! While I agree with the general concern about coupling software components, I would appreciate some clarification and guidance on how the proposed improvements can be implemented effectively. My previous attempts to achieve this didn’t succeed, so your input would be invaluable.

Could you elaborate on what you mean by "all telemetry-related code"? Moving DDTelemetryLogger to the telemetry module isn't straightforward because it is tightly coupled with DDLogger. Its primary functionality revolves around logging: extracting exceptions and passing them to the telemetry module. As a result, its logic and state are more closely tied to the logger than to telemetry itself.

Regarding the configuration, this is indeed a trade-off. Moving it to the telemetry module would result in circular dependency issues during initialization. Any suggestions on how to address these challenges while keeping the codebase clean and decoupled would be greatly appreciated.
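One option for the configuration trade-off, assuming ddtrace's package layout allows it, is a leaf module that imports only the standard library, so both the logger and the telemetry writer can read the same settings without importing each other. The defaults mirror the two _get_config calls quoted in the review; the names TelemetrySettings, load_telemetry_settings and _asbool, and the boolean spellings _asbool accepts, are assumptions, and ddtrace's own asbool may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _asbool(value: str) -> bool:
    # Assumed spellings; ddtrace's own asbool helper may accept a different set.
    return value.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool
    heartbeat_interval: float


def load_telemetry_settings(env: Optional[Mapping[str, str]] = None) -> TelemetrySettings:
    """Parse telemetry settings once. This module imports only the stdlib, so
    logging and telemetry code can both import it without creating a cycle."""
    env = os.environ if env is None else env
    enabled = env.get("DD_INSTRUMENTATION_TELEMETRY_ENABLED")
    interval = env.get("DD_TELEMETRY_HEARTBEAT_INTERVAL")
    return TelemetrySettings(
        # Defaults mirror the quoted _get_config calls (True and 60).
        enabled=True if enabled is None else _asbool(enabled),
        heartbeat_interval=60.0 if interval is None else float(interval),
    )


# Usage with an explicit mapping instead of the real process environment:
settings = load_telemetry_settings({"DD_INSTRUMENTATION_TELEMETRY_ENABLED": "false"})
```

Accepting an explicit mapping keeps the parser testable; in production it would be called once with no arguments and the result shared.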

@ygree ygree requested a review from P403n1x87 January 9, 2025 06:17