chore: introduce first-class origin tags abstraction #441

tobz · 2025-01-23T19:37:28Z

Context

Currently, we've frontloaded origin detection/enrichment in the ingestion phase, where we strive to both resolve the origin of a metric and then also resolve the tags for that origin, cache them, and return a stable reference to those tags.... all before we resolve the context.

This technically works, but has one major issue: origin tags arrive incrementally from the Core Agent, especially during startup. Taking this approach, which necessitates caching to maintain reasonable performance, leads to issues with missing tags and effectively breaks origin enrichment.

Solution

We've taken the tack of breaking apart the resolving of an origin -- taking the raw identifiers and computing the unique origin identifier -- and the resolving of an origins tags.

Our new approach maintains much of the same machinery -- turning a raw OriginInfo<'a> into an OriginKey that is later used to look up the origin tags -- but we've added the aspect of deferring the actual resolution of the origin tags until they're queried.

To support this, we've created a new abstraction for origin tags called OriginTags. This type is the facade to visiting the tags attached to a given origin. It is powered by OriginKey, and a new trait, OriginTagsResolver, that replaces the old OriginEnricher. This allows us to resolve a stable origin identifier (OriginKey), bundle that identifier (and the resolver used to get it) into OriginTags itself, and then at the point where the origin tags are querier, the resolver is used again to look up the tags associated with the OriginKey.

Instead of merging and allocating storage for these tags, we've also added a new abstraction -- OriginTagsVisitor -- that is used to visit the tags attached to an origin. This lets us maintain a single copy of the tags in TagStore and simply iterate over them on-demand when requested.

A new type, OriginTagsQuerier, manages the link between the tag store, External Data resolver, and the logic to determine how to "resolve" an origin to its constituent parts -- pod, container, etc -- and sits within saluki-env. This type is wired into the ADP-specific workload provider, and bridged back to saluki-context by implementing the new OriginTagsResolver, such that configuring the DogStatsD source to use it is as simple as referencing the existing workload provider used in ADP.

Additionally, the Tagged implementation for Context now transparently visits the origin tags as well, which means both instrumented and origin tags are transparently chained together when visiting the tags for a context.

tobz · 2025-01-23T19:37:41Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

pr-commenter · 2025-01-23T20:17:01Z

Regression Detector (DogStatsD)

Regression Detector Results

Run ID: b95e7750-c31f-4dbc-81d2-1ba814485db5

Baseline: 7.62.0-rc.8
Comparison: 7.62.0-rc.8

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials
➖	dsd_uds_100mb_3k_contexts_distributions_only	memory utilization	+1.56	[+1.36, +1.76]	1
➖	dsd_uds_100mb_3k_contexts	ingress throughput	+0.00	[-0.05, +0.05]	1
➖	dsd_uds_40mb_12k_contexts_40_senders	ingress throughput	+0.00	[-0.00, +0.00]	1
➖	dsd_uds_500mb_3k_contexts	ingress throughput	+0.00	[-0.01, +0.01]	1
➖	dsd_uds_1mb_3k_contexts_dualship	ingress throughput	+0.00	[-0.00, +0.00]	1
➖	dsd_uds_1mb_50k_contexts_memlimit	ingress throughput	+0.00	[-0.00, +0.00]	1
➖	dsd_uds_100mb_250k_contexts	ingress throughput	-0.00	[-0.00, +0.00]	1
➖	dsd_uds_1mb_50k_contexts	ingress throughput	-0.00	[-0.00, +0.00]	1
➖	dsd_uds_512kb_3k_contexts	ingress throughput	-0.00	[-0.01, +0.01]	1
➖	dsd_uds_1mb_3k_contexts	ingress throughput	-0.00	[-0.00, +0.00]	1
➖	quality_gates_idle_rss	memory utilization	-0.00	[-0.08, +0.08]	1
➖	dsd_uds_10mb_3k_contexts	ingress throughput	-0.01	[-0.02, +0.00]	1

Bounds Checks: ❌ Failed

perf	experiment	bounds_check_name	replicates_passed	links
❌	quality_gates_idle_rss	memory_usage	0/10

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

pr-commenter · 2025-01-23T20:31:31Z

Regression Detector Links

Experiment Result Links

experiment	link(s)
dsd_uds_100mb_250k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_100mb_3k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_100mb_3k_contexts_distributions_only	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_10mb_3k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_1mb_3k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_1mb_3k_contexts_dualship	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_1mb_50k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_1mb_50k_contexts_memlimit	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_40mb_12k_contexts_40_senders	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_500mb_3k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_512kb_3k_contexts	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
quality_gates_idle_rss	[Profiling (ADP)] [Profiling (DSD)] [SMP Dashboard]
dsd_uds_50mb_10k_contexts_no_inlining (ADP only)	[Profiling (ADP)] [SMP Dashboard]
dsd_uds_50mb_10k_contexts_no_inlining_no_allocs (ADP only)	[Profiling (ADP)] [SMP Dashboard]

lukesteensen

Just a note on something it'd be nice to simplify and/or document a bit more, which can certainly be a followup.

lukesteensen · 2025-01-28T16:21:09Z

lib/saluki-components/src/sources/dogstatsd/mod.rs

-    /// See [`OriginEnrichmentConfiguration`] for more details.
-    #[serde(default)]
-    origin_enrichment: OriginEnrichmentConfiguration,
+pub struct DogStatsDConfiguration<R = ()> {


It'd be nice not to add this generic, since I don't think it's really serving any meaningful purpose. You can't build the source with the default unit type, and there's really only one implementation of the trait.

More generally, abstracting with the trait leads to a bigger stack of nouns to hold in your head when reviewing this stuff (i.e. WorkloadProvider -> OriginTagsResolver -> OriginTagsQuerier -> TagStoreQuerier + DogStatsDOriginTagResolver), which is especially confusing because at least TagStoreQuerier is used at multiple levels. That's not to say the design is wrong, it's just complex. If there's a workable way to simplify it, great, and if not then it might be useful have some kind of ascii diagram in a comment that illustrates how all the pieces fit together 😅

I'll do an immediate follow-up PR to clean up all of this and remove the generic because I don't disagree that it's a bit intertwined.

This was referenced Jan 23, 2025

chore: slightly revamp External Data support to expose both pod and container entity ID #440

Merged

chore: add resolver test for origin uniqueness + small cleanup #442

Closed

github-actions bot added area/core Core functionality, event model, etc. area/config Configuration. area/components Sources, transforms, and destinations. source/dogstatsd DogStatsD source. labels Jan 23, 2025

tobz marked this pull request as ready for review January 23, 2025 20:22

tobz requested review from a team as code owners January 23, 2025 20:22

This was referenced Jan 24, 2025

debugging: add trace logging to show exactly which buckets are being flushed during aggregation #443

Closed

fix: merge gauges properly and avoid dropping existing points #444

Merged

tobz force-pushed the tobz/external-data-expose-pod-container-entity-ids branch from d9aba07 to 80ef087 Compare January 27, 2025 14:51

tobz force-pushed the tobz/origin-enrichment-origin-tags branch from ec65bf8 to 30c0540 Compare January 27, 2025 14:51

tobz force-pushed the tobz/external-data-expose-pod-container-entity-ids branch from 80ef087 to b21a8c5 Compare January 27, 2025 14:59

tobz force-pushed the tobz/origin-enrichment-origin-tags branch from 30c0540 to 71ee9c4 Compare January 27, 2025 14:59

Base automatically changed from tobz/external-data-expose-pod-container-entity-ids to main January 28, 2025 15:53

tobz requested a review from a team as a code owner January 28, 2025 15:53

tobz force-pushed the tobz/origin-enrichment-origin-tags branch from fe4c64e to 71ee9c4 Compare January 28, 2025 16:01

github-actions bot removed area/io General I/O and networking. area/observability Internal observability of ADP and Saluki. labels Jan 28, 2025

tobz added 2 commits January 28, 2025 11:01

chore: make Tagged relevant to materialized tags, not raw tags

c03d80e

chore: introduce first-class origin tags abstraction

623ad73

tobz force-pushed the tobz/origin-enrichment-origin-tags branch from 71ee9c4 to 623ad73 Compare January 28, 2025 16:01

github-actions bot removed destination/datadog-metrics Datadog Metrics destination. destination/prometheus Prometheus Scrape destination. destination/datadog Common Datadog destination code. labels Jan 28, 2025

tobz removed request for a team January 28, 2025 16:02

lukesteensen approved these changes Jan 28, 2025

View reviewed changes

tobz merged commit 06ccd7f into main Jan 28, 2025
17 checks passed

tobz deleted the tobz/origin-enrichment-origin-tags branch January 28, 2025 16:36

tobz mentioned this pull request Jan 30, 2025

chore: clean up origin enrichment logic / type names / etc #463

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: introduce first-class origin tags abstraction #441

chore: introduce first-class origin tags abstraction #441

tobz commented Jan 23, 2025 •

edited

Loading

tobz commented Jan 23, 2025 •

edited

Loading

pr-commenter bot commented Jan 23, 2025 •

edited

Loading

Fine details of change detection per experiment

Explanation

pr-commenter bot commented Jan 23, 2025 •

edited

Loading

lukesteensen left a comment

lukesteensen Jan 28, 2025

tobz Jan 28, 2025

chore: introduce first-class origin tags abstraction #441

chore: introduce first-class origin tags abstraction #441

Conversation

tobz commented Jan 23, 2025 • edited Loading

Context

Solution

tobz commented Jan 23, 2025 • edited Loading

pr-commenter bot commented Jan 23, 2025 • edited Loading

Regression Detector (DogStatsD)

Regression Detector Results

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment

Bounds Checks: ❌ Failed

Explanation

pr-commenter bot commented Jan 23, 2025 • edited Loading

Regression Detector Links

Experiment Result Links

lukesteensen left a comment

Choose a reason for hiding this comment

lukesteensen Jan 28, 2025

Choose a reason for hiding this comment

tobz Jan 28, 2025

Choose a reason for hiding this comment

tobz commented Jan 23, 2025 •

edited

Loading

tobz commented Jan 23, 2025 •

edited

Loading

pr-commenter bot commented Jan 23, 2025 •

edited

Loading

pr-commenter bot commented Jan 23, 2025 •

edited

Loading