OTEL sources should create events without any transformations #5259

kkondaka · 2024-12-12T19:54:26Z

Is your feature request related to a problem? Please describe.
Currently, all OTEL sources (OTEL trace source, OTEL metric source and OTEL log source) does some transformations while creating events from OTEL data.

In all sources, the keys are created by replacing "." with "@" (dedotting)
In all sources, the attributes are "flattened" by moving them to the root of the event instead of nesting under "attributes"

The dedotting is done to make the data is compatible with OpenSearch.

Describe the solution you'd like
I think the transformations should be outside of the OTEL sources because sink is not always OpenSearch. Also, the users are not given any option to not do the transformations. We should remove the transformations from the OTEL sources and let users explicitly do this as a processor or OpenSearch sink option

   processor:
      - opensearch_compatibility_transform:
           flatten_attributes: true
           dedotting: true

or Alternatively

   sink:
     - opensearch:
           opensearch_compatibility_transformation: true

Describe alternatives you've considered (Optional)
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

The text was updated successfully, but these errors were encountered:

KarstenSchnitter · 2024-12-12T22:02:42Z

OpenTelemetry usually has a threefold aggregation of its signals: resource > instrumentation scope > logs/metrics/spans. This means a resource can hold a collection of instrumentation scopes which can hold a collection of either logs, metrics or spans. Each of these levels have a name and can have a nested map of attributes usually described by the semantic conventions.

The flattening of the data structure was introduced to break down these collections to the smallest part, i.e., a log record, metric data point or span record. This helps generate single documents for each of these parts. If the flattening is removed, the collections will stay together. Note, that these collections might have been built arbitrarily by the batch processor, e.g., implemented in the OpenTelemetry Collector. This seems strange to keep in Data Prepper.

Furthermore, Data Prepper currently does not handle the instrumentation scope attributes correctly. They are added as log/metric/span attributes without any means of distinction from the proper attributes of these signals. This needs to be fixed in any case.

Finally, the OpenSearch data model created by Data Prepper does not align to the schemes and mappings defined by the OpenSearch catalogue: https://github.com/opensearch-project/opensearch-catalog/tree/main/schema/observability. This should be aligned, so that data ingested by Data Prepper can be used with the visualizations from the catalogue.

dlvenable · 2024-12-13T15:03:51Z

@kkondaka , I like the goal of this proposal and think it makes sense. However, I'm not sure that a generic OpenSearch compatibility processor would be ideal. What does it mean to make something "OpenSearch compatible?"

For OTel, some fields are nested and some are not. You can see this in the OTel trace mappings.

Here is some sample data from the current trace pipeline. You see that resources.attributes. creates nested fields. But, within there the fields are flattened using @.

"span.attributes.sampler@param": true,
"span.attributes.http@method": "GET",
"span.attributes.http@url": "/jquery-3.1.1.min.js",
"resource.attributes.client-uuid": "41b4ebf38f38063a",
"resource.attributes.ip": "172.20.0.7",
"resource.attributes.host@name": "3a28510b4824",
"resource.attributes.opencensus@exporterversion": "Jaeger-Go-2.30.0",
"resource.attributes.service@name": "frontend",
"span.attributes.component": "net/http",

I think we'd do better moving this logic into the otel_traces processor. Similarly, we'd want to have processors for otel_metrics and otel_logs which get these to conform to the OpenSearch standard.

Also, we'd need to create a migration path. Could you elaborate on what that migration path would look like?

kkondaka · 2024-12-17T20:26:08Z

@KarstenSchnitter by "flattening" I meant flattening of "attributes" only. Which is, instead of putting attributes under "attributes" key, they are being put under the root. I have added "flatten_attributes" option to OTEL metrics to skip flattening if config option is provided. We need to add the same option to OTEL traces and logs.

kkondaka · 2024-12-17T20:29:46Z

@dlvenable regarding migration, I think in 2.x release, we should keep the default behavior as is but provide config options to skip "flattening" of attributes (already available for OTEL metrics) and also provide an option to skip "dedotting".

We should revisit at the time of 3.x release if we want to remove the default behavior altogether or provide optionally (basically, make default behavior optional and optional behavior default)

alamzeeshan · 2024-12-27T11:11:24Z

We are planning to migrate from Elasticsearch and Splunk to Opensearch and the feedback what we are getting from our users is that this @ notation is not user friendly. It is very confusing for them while writing PPL.
We don't have any issue with nested fields and we are fine with it. Also these renaming is not happening in http input. It is happening only with otel input.

I can see how this dedotting is going to be helpful in preventing mapping conflicts and will reduce field depth, but if you can control your incoming event format then this is not really helpful.

All we are asking is to provide an option to disable this feature. You can have the default behavior of replacing dot with @. Just give user a choice to opt out of it. We can add an option to otel_logs_source. Something like skip_dedot with default as false. People can use this option to skip dedot.

  source:
    otel_logs_source:
      skip_dedot: true

People are used to separate fields with dot and they will make mistake if it is a mix of dot and @.
"resource.attributes.k8s@pod@name" doesn't look as good as "resource.attributes.k8s.pod.name". Now you are saying that people has to remember to give @ instead of dot starting from second field after attributes. This is not at all user friendly.

LVMalekBe · 2025-01-03T18:51:59Z

Hi everyone,

I've been following the discussion regarding the handling of dotted fields and the transformations applied by the OTEL sources in Data Prepper. I've encountered a related issue that I'd like to bring to your attention and seek guidance or a potential fix.

Issue:

The current transformations applied by Data Prepper during the flattening process introduce attributes with dotted field names that the existing processors like copy_values, rename_key, and grok struggle to handle efficiently. This often results in errors or incomplete data processing. Additionally, while attributes are flattened and dotted, the traceGroupFields (like shown in the example below) are kept nested. This inconsistency suggests that other attributes could also be kept nested without flattening, providing a more consistent and flexible approach for users.

Example:

Given the following Span:

{
    "traceId": "12345678abcd1234abcd5678abcd1234",
    "droppedLinksCount": 2,
    "kind": "SPAN_KIND_SERVER",
    "droppedEventsCount": 1,
    "traceGroupFields": {
        "endTime": "2025-02-15T11:45:30.199Z",
        "durationInNanos": 2100000000,
        "statusCode": 1
    },
    "traceGroup": "paymentProcessing",
    "serviceName": "payment-service",
    "parentSpanId": "abcdefabcdef1234",
    "spanId": "1234abcd5678efgh",
    "traceState": "sampled=true",
    "name": "processPayment",
    "startTime": "2025-02-15T11:45:28.099Z",
    "links": [],
    "endTime": "2025-02-15T11:45:30.199Z",
    "droppedAttributesCount": 1,
    "durationInNanos": 2100000000,
    "events": [],
    "span.attributes.ip": "192.168.1.100",
    "span.attributes.app_id": "paymentApp",
    "span.attributes.client_id": "financeClient",
    "resource.attributes.serviceName": "payment-backend-service",
    "status.code": 1
}

Attempts to process dotted fields like "span.attributes.client_id" and "span.attributes.app_id" using existing processors have not been successful. For instance, the copy_values processor fails to handle these fields correctly, resulting in errors like:

java.lang.IllegalArgumentException: key span\.attributes\.app_id must contain only alphanumeric chars with .-_@/ and must follow JsonPointer (ie. 'field/to/key')

Proposed Solution:

Given the issues, it would be highly beneficial to provide an option to disable the automatic dotting and flattening transformations in the OTEL sources or just by replacing dots with dashes could offer a clear and user-friendly alternative for handling nested field names.

besha100 · 2025-01-20T14:56:39Z

First I think we started talking about 2 different things here, so It may make sense to create two different issues:

1- The fields to be moved to the root of the spans instead of being under span.attributes or resource.attributes

2- The flattening of the fields with @ notation instead of keeping them nested with dots
I think we’ve started discussing two different topics here, so it would make sense to split this into two issues:

WDYT?

ps48 · 2025-01-24T00:40:57Z

@kkondaka @dlvenable Based on the above discussions can you please let us know how is the end schema expected to look like with flattening of fields by moving attributes to root and flattening of the fields with @ notation. Also:

A "before and after" of the schema will be helpful to understand the impact on the trace analytics plugin. This will need an update in queries as the UI is tightly coupled with the schema.
What's the plan for migration of existing customers, who are still using the older mappings? Is this a breaking change?
Since, Are there any plans to merge this schema with simple schema: https://opensearch.org/docs/latest/observing-your-data/ss4o/ and https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability/traces#readme?

Thanks!

KarstenSchnitter · 2025-01-24T22:03:01Z

@ps48, thank you for that summary. I think, Data Prepper should eventually index OpenTelemetry data as described in the simple schema. I mentioned before, that this is currently not the case. I agree with @kkondaka, that this should be a 3.0 change leaving 2.x in the current state. I think, this would be highly beneficial for the observability use-case. Still, this will not solve all the issues, that were brought up in this discussion.

The OTel semantic conventions usually specify attribute names containing dots for namespaces, e.g., http.response.status_code. That means, the issues in processing the data remain as described by @LVMalekBe. Furthermore, the OTel semantic conventions include open-ended names such as http.request.header. or k8s.pod.label.. Especially the pod labels are notorious for creating field type conflicts in OpenSearch. If you cannot align your OS index mappings with the used labels you need help from Data Prepper to avoid the conflicts. This is why the @ replacement was introduced, I think.

Separating the data transformations for OS from the OTel*Sources as proposed by @dlvenable is the way to go for me. Of course, this raises the question, what the internal format emitted by the sources should be. I can see two options: The simple schema or OTLP JSON. I prefer the simple schema, since OTLP allows for arbitrary batches. I think, the sources should still emit one event at a time.

As a closing remark I am open to work on these changes, once we decide, what approach to take.

kkondaka added the untriaged label Dec 12, 2024

github-project-automation bot added this to Data Prepper Tracking Board Dec 12, 2024

github-project-automation bot moved this to Unplanned in Data Prepper Tracking Board Dec 12, 2024

kkondaka removed the untriaged label Dec 17, 2024

TomasLongo mentioned this issue Jan 10, 2025

Introduce otlp/http support #5322

Draft

4 tasks

KarstenSchnitter mentioned this issue Jan 24, 2025

[Question] Opentelemetry format for resource.attributes (replacing dot with at sign) #4632

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OTEL sources should create events without any transformations #5259

OTEL sources should create events without any transformations #5259

kkondaka commented Dec 12, 2024 •

edited

Loading

KarstenSchnitter commented Dec 12, 2024 •

edited

Loading

dlvenable commented Dec 13, 2024

kkondaka commented Dec 17, 2024 •

edited

Loading

kkondaka commented Dec 17, 2024

alamzeeshan commented Dec 27, 2024 •

edited

Loading

LVMalekBe commented Jan 3, 2025

besha100 commented Jan 20, 2025 •

edited

Loading

ps48 commented Jan 24, 2025

KarstenSchnitter commented Jan 24, 2025

OTEL sources should create events without any transformations #5259

OTEL sources should create events without any transformations #5259

Comments

kkondaka commented Dec 12, 2024 • edited Loading

KarstenSchnitter commented Dec 12, 2024 • edited Loading

dlvenable commented Dec 13, 2024

kkondaka commented Dec 17, 2024 • edited Loading

kkondaka commented Dec 17, 2024

alamzeeshan commented Dec 27, 2024 • edited Loading

LVMalekBe commented Jan 3, 2025

Issue:

Example:

Proposed Solution:

besha100 commented Jan 20, 2025 • edited Loading

ps48 commented Jan 24, 2025

KarstenSchnitter commented Jan 24, 2025

kkondaka commented Dec 12, 2024 •

edited

Loading

KarstenSchnitter commented Dec 12, 2024 •

edited

Loading

kkondaka commented Dec 17, 2024 •

edited

Loading

alamzeeshan commented Dec 27, 2024 •

edited

Loading

besha100 commented Jan 20, 2025 •

edited

Loading