Commit

Revert "Rename Data Prepper to OpenSearch Data Prepper (#9086)"
This reverts commit 2bd86c3.
kolchfa-aws authored Jan 17, 2025
1 parent 2bd86c3 commit 0e84b39
Showing 95 changed files with 379 additions and 379 deletions.
4 changes: 2 additions & 2 deletions _config.yml
@@ -223,7 +223,7 @@ benchmark_collection:
data_prepper_collection:
collections:
data-prepper:
-name: OpenSearch Data Prepper
+name: Data Prepper
nav_fold: true

# Defaults
@@ -240,7 +240,7 @@ defaults:
path: "_data-prepper"
values:
section: "data-prepper"
-section-name: "OpenSearch Data Prepper"
+section-name: "Data Prepper"
-
scope:
path: "_clients"
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/anomaly-detection.md
@@ -7,7 +7,7 @@ nav_order: 5

# Anomaly detection

-You can use OpenSearch Data Prepper to train models and generate anomalies in near real time on time-series aggregated events. You can generate anomalies either on events generated within the pipeline or on events coming directly into the pipeline, like OpenTelemetry metrics. You can feed these tumbling window aggregated time-series events to the [`anomaly_detector` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/anomaly-detector/), which trains a model and generates anomalies with a grade score. Then you can configure your pipeline to write the anomalies to a separate index to create document monitors and trigger fast alerting.
+You can use Data Prepper to train models and generate anomalies in near real time on time-series aggregated events. You can generate anomalies either on events generated within the pipeline or on events coming directly into the pipeline, like OpenTelemetry metrics. You can feed these tumbling window aggregated time-series events to the [`anomaly_detector` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/anomaly-detector/), which trains a model and generates anomalies with a grade score. Then you can configure your pipeline to write the anomalies to a separate index to create document monitors and trigger fast alerting.

## Metrics from logs

@@ -7,7 +7,7 @@ nav_order: 10

# Codec processor combinations

-At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compresses and decompresses large data sets in a certain format before ingestion them through an OpenSearch Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
+At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compresses and decompresses large data sets in a certain format before ingestion them through a Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).

While most codecs can be used with most processors, the following codec processor combinations can make your pipeline more efficient when used with the following input types.

@@ -47,4 +47,4 @@ The [`newline` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/config

## `event_json`

-The `event_json` output codec converts event data and metadata into JSON format to send to a sink, such as an S3 sink. The `event_json` input codec reads the event and its metadata to create an event in OpenSearch Data Prepper.
+The `event_json` output codec converts event data and metadata into JSON format to send to a sink, such as an S3 sink. The `event_json` input codec reads the event and its metadata to create an event in Data Prepper.
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/common-use-cases.md
@@ -9,4 +9,4 @@ redirect_from:

# Common use cases

-You can use OpenSearch Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
+You can use Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/event-aggregation.md
@@ -7,7 +7,7 @@ nav_order: 25

# Event aggregation

-You can use OpenSearch Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.
+You can use Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.

The `aggregate` processor state is stored in memory. For example, in order to combine four events into one, the processor needs to retain pieces of the first three events. The state of an aggregate group of events is kept for a configurable amount of time. Depending on your logs, the aggregate action being used, and the number of memory options in the processor configuration, the aggregation could take place over a long period of time.

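The `aggregate` processor described in the changed `event-aggregation.md` page is configured in pipeline YAML rather than written as code. Purely as a conceptual sketch, the following Python groups events by the values of a set of identification keys and merges each group in memory, loosely similar in spirit to the processor's `put_all` action. The function `aggregate_events`, the sample field names, and the merge rule (later events overwrite existing keys and add new ones) are illustrative assumptions, not Data Prepper's implementation, and the sketch ignores the configurable time window after which real groups are concluded.

```python
from typing import Any, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


def aggregate_events(events: Iterable[Event], identification_keys: List[str]) -> List[Event]:
    """Group events by identification-key values and merge each group in memory.

    Hypothetical merge rule: later events overwrite existing keys and add new
    ones. Groups are returned in the order they were first seen.
    """
    groups: Dict[Tuple[Any, ...], Event] = {}
    for event in events:
        # In this sketch, a missing identification key is treated as None.
        group_key = tuple(event.get(key) for key in identification_keys)
        groups.setdefault(group_key, {}).update(event)
    return list(groups.values())


events = [
    {"sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 3000},
    {"sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "http_verb": "GET"},
    {"sourceIp": "10.0.0.5", "destinationIp": "192.168.0.1", "bytes": 120},
]
for merged in aggregate_events(events, ["sourceIp", "destinationIp"]):
    print(merged)
```

Because the whole group state lives in a dictionary, memory grows with the number of distinct key combinations, which mirrors the page's note that the processor must retain pieces of earlier events until a group is complete.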
20 changes: 10 additions & 10 deletions _data-prepper/common-use-cases/log-analytics.md
@@ -7,17 +7,17 @@ nav_order: 30

# Log analytics

-OpenSearch Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. OpenSearch Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).
+Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).

-The following image shows all of the components used for log analytics with Fluent Bit, OpenSearch Data Prepper, and OpenSearch.
+The following image shows all of the components used for log analytics with Fluent Bit, Data Prepper, and OpenSearch.

![Log analytics component]({{site.url}}{{site.baseurl}}/images/data-prepper/log-analytics/log-analytics-components.jpg)

-In the application environment, run Fluent Bit. Fluent Bit can be containerized through Kubernetes, Docker, or Amazon Elastic Container Service (Amazon ECS). You can also run Fluent Bit as an agent on Amazon Elastic Compute Cloud (Amazon EC2). Configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to export log data to OpenSearch Data Prepper. Then deploy OpenSearch Data Prepper as an intermediate component and configure it to send the enriched log data to your OpenSearch cluster. From there, use OpenSearch Dashboards to perform more intensive visualization and analysis.
+In the application environment, run Fluent Bit. Fluent Bit can be containerized through Kubernetes, Docker, or Amazon Elastic Container Service (Amazon ECS). You can also run Fluent Bit as an agent on Amazon Elastic Compute Cloud (Amazon EC2). Configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to export log data to Data Prepper. Then deploy Data Prepper as an intermediate component and configure it to send the enriched log data to your OpenSearch cluster. From there, use OpenSearch Dashboards to perform more intensive visualization and analysis.

## Log analytics pipeline

-Log analytics pipelines in OpenSearch Data Prepper are extremely customizable. The following image shows a simple pipeline.
+Log analytics pipelines in Data Prepper are extremely customizable. The following image shows a simple pipeline.

![Log analytics component]({{site.url}}{{site.baseurl}}/images/data-prepper/log-analytics/log-ingestion-pipeline.jpg)

@@ -27,7 +27,7 @@ The [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/d

### Processor

-OpenSearch Data Prepper 1.2 and above come with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md). The Grok Processor is an invaluable tool for structuring and extracting important fields from your logs, making them more queryable.
+Data Prepper 1.2 and above come with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md). The Grok Processor is an invaluable tool for structuring and extracting important fields from your logs, making them more queryable.

The Grok Processor comes with a wide variety of [default patterns](https://github.com/thekrakken/java-grok/blob/master/src/main/resources/patterns/patterns) that match common log formats like Apache logs or syslogs, but it can easily accept any custom patterns that cater to your specific log format.

@@ -92,9 +92,9 @@ The following are the main changes you need to make:

## Fluent Bit

-You will need to run Fluent Bit in your service environment. See [Getting Started with Fluent Bit](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit) for installation instructions. Ensure that you can configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to your OpenSearch Data Prepper HTTP source. The following is an example `fluent-bit.conf` that tails a log file named `test.log` and forwards it to a locally running OpenSearch Data Prepper HTTP source, which runs by default on port 2021.
+You will need to run Fluent Bit in your service environment. See [Getting Started with Fluent Bit](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit) for installation instructions. Ensure that you can configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to your Data Prepper HTTP source. The following is an example `fluent-bit.conf` that tails a log file named `test.log` and forwards it to a locally running Data Prepper HTTP source, which runs by default on port 2021.

-Note that you should adjust the file `path`, output `Host`, and `Port` according to how and where you have Fluent Bit and OpenSearch Data Prepper running.
+Note that you should adjust the file `path`, output `Host`, and `Port` according to how and where you have Fluent Bit and Data Prepper running.

### Example: Fluent Bit file without SSL and basic authentication enabled

@@ -145,8 +145,8 @@ The following is an example `fluent-bit.conf` file with SSL and basic authentica

# Next steps

-See the [OpenSearch Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> OpenSearch Data Prepper -> OpenSearch` running through Docker.
+See the [Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> Data Prepper -> OpenSearch` running through Docker.

-In the future, OpenSearch Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [OpenSearch Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming.
+In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming.

-If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to OpenSearch Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md).
+If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md).
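
In the log analytics flow described in the changed `log-analytics.md` page, the Fluent Bit `http` output plugin sends log records to the Data Prepper HTTP source, which the page says runs by default on port 2021. The following Python sketch only illustrates the shape of such a request using the standard library: a POST whose body is a JSON array of log objects. The `/log/ingest` path is assumed to be the HTTP source's default path, `build_log_request` is a hypothetical helper, and SSL is assumed to be disabled; adjust the host, port, and path to match your pipeline.

```python
import json
import urllib.request
from typing import Dict, List


def build_log_request(
    records: List[Dict[str, str]],
    host: str = "localhost",
    port: int = 2021,           # default HTTP source port stated in the page
    path: str = "/log/ingest",  # assumed default ingest path; match your pipeline
) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is a JSON array of log records."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_log_request(
    [{"log": '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'}]
)
print(req.full_url, req.get_method())
# To actually send it to a running pipeline:
# urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before pointing it at a live pipeline; in production, Fluent Bit handles batching and retries for you.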
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/log-enrichment.md
@@ -7,7 +7,7 @@ nav_order: 35

# Log enrichment

-You can perform different types of log enrichment with OpenSearch Data Prepper, including:
+You can perform different types of log enrichment with Data Prepper, including:

- Filtering.
- Extracting key-value pairs from strings.
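The changed `log-enrichment.md` page lists extracting key-value pairs from strings as one type of enrichment. As a minimal, hypothetical sketch of that idea (not Data Prepper's `key_value` processor), the following Python splits a `key=value&key=value` string into a dictionary and stores it under a new field of the event. The `&` and `=` separators, the `message` and `parsed_message` field names, and the rule of skipping fragments without `=` are assumptions made for illustration.

```python
from typing import Dict


def extract_key_values(text: str, field_sep: str = "&", value_sep: str = "=") -> Dict[str, str]:
    """Split a string such as 'a=1&b=2' into {'a': '1', 'b': '2'}.

    Illustrative only: fragments without value_sep are skipped, and only the
    first value_sep in each fragment splits key from value.
    """
    result: Dict[str, str] = {}
    for pair in text.split(field_sep):
        if value_sep not in pair:
            continue  # skip fragments that are not key=value pairs
        key, value = pair.split(value_sep, 1)
        result[key.strip()] = value.strip()
    return result


event = {"message": "user=alice&action=login&status=success"}
event["parsed_message"] = extract_key_values(event["message"])
print(event["parsed_message"])
```

Keeping the original `message` and writing the parsed pairs to a separate field preserves the raw log for debugging while making the extracted values directly queryable.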
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/metrics-logs.md
@@ -7,7 +7,7 @@ nav_order: 15

# Deriving metrics from logs

-You can use OpenSearch Data Prepper to derive metrics from logs.
+You can use Data Prepper to derive metrics from logs.

The following example pipeline receives incoming logs using the [`http` source plugin]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/http-source) and the [`grok` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). It then uses the [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) to extract the metric bytes aggregated during a 30-second window and derives histograms from the results.

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/metrics-traces.md
@@ -7,7 +7,7 @@ nav_order: 20

# Deriving metrics from traces

-You can use OpenSearch Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces.
+You can use Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces.

The pipeline contains the following pipelines:

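The changed `metrics-logs.md` and `metrics-traces.md` pages both describe aggregating a numeric field over a 30-second tumbling window and deriving a histogram, for example from `durationInNanos`. The following Python sketch shows only the underlying idea: each `(timestamp_seconds, value)` pair is assigned to exactly one non-overlapping 30-second window and counted into fixed buckets. The bucket boundaries, the input tuple shape, and the helper `tumbling_histograms` are assumptions for illustration; Data Prepper's own histogram output carries more information in a different format.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 30  # tumbling window length used in the example pipelines


def tumbling_histograms(
    samples: Iterable[Tuple[float, int]],
    buckets: List[int],
) -> Dict[int, List[int]]:
    """Count (timestamp_seconds, value) samples per 30-second tumbling window.

    Each result maps a window start (in seconds) to a list of counts:
    counts[0] holds values < buckets[0], counts[i] holds values in
    [buckets[i-1], buckets[i]), and counts[-1] holds values >= buckets[-1].
    """
    histograms: Dict[int, List[int]] = defaultdict(lambda: [0] * (len(buckets) + 1))
    for timestamp, value in samples:
        # Windows do not overlap, so every sample lands in exactly one window.
        window_start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        histograms[window_start][bisect_right(buckets, value)] += 1
    return dict(histograms)


spans = [(0.5, 500_000), (12.0, 2_000_000), (29.9, 20_000_000), (31.0, 750_000)]
print(tumbling_histograms(spans, buckets=[1_000_000, 10_000_000]))
# {0: [1, 1, 1], 30: [1, 0, 0]}
```

Using `bisect_right` keeps bucket lookup logarithmic in the number of boundaries and places a value equal to a boundary in the higher bucket.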