Add a markdown with high level kubernetes metadata enrichment explanation #38757

88 changes: 88 additions & 0 deletions metricbeat/module/kubernetes/util/enrichers.md
@@ -0,0 +1,88 @@
## Kubernetes Metadata enrichment

The metadata enrichment process involves associating contextual information, such as Kubernetes metadata (e.g., labels, annotations, resource names), with metrics and events collected by Elastic Agent and Beats in Kubernetes environments. This process enhances the understanding and analysis of collected data by providing additional context.

### Key Components:

1. **Metricsets/Datasets:**
- Metricsets/Datasets are responsible for collecting metrics and events from various sources within Kubernetes, such as kubelet and kube-state-metrics.
Review comment (Contributor):

This still doesn't explain why we have two terms. Users don't understand why the term changes between Beats and Agent. I think we should make this clear.

Reply (Contributor Author):

@gizas This markdown file is in the beats repo, under the kubernetes module. There, only metricsets exist. Personally, I would remove "dataset" because it is confusing. What kind of sentence do you want me to add?

Reply (Contributor):

I missed that we are under beats; I had in mind that this could be a more general doc.
So OK, you can remove datasets altogether.


2. **Enrichers:**
- Enrichers are components responsible for enriching collected data with Kubernetes metadata. Each metricset has its own enricher, which handles the metadata enrichment process for that metricset.

3. **Watchers:**
- Watchers are mechanisms used to monitor Kubernetes resources and detect changes, such as the addition, update, or deletion of resources like pods or nodes.

4. **Metadata Generators:**
- Metadata generators are responsible for generating metadata associated with Kubernetes resources. These generators are utilized by enrichers to collect relevant metadata. Each enricher has one metadata generator.
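
The relationships between these components can be sketched in Go. This is a minimal illustration, not the module's actual code: the type names (`Metadata`, `Generator`, `Enricher`, `Watcher`), the `Attach` method, and `labelGenerator` are all hypothetical and chosen only to mirror the description above (one generator per enricher, one enricher per metricset, several enrichers per watcher).

```go
package main

import "fmt"

// Metadata is a flat key/value view of Kubernetes metadata (hypothetical type).
type Metadata map[string]string

// Generator builds metadata for one resource, identified here by name and labels.
type Generator func(name string, labels map[string]string) Metadata

// Enricher belongs to exactly one metricset and owns exactly one generator.
type Enricher struct {
	Metricset string
	Generate  Generator
}

// Watcher monitors one resource kind and may serve several enrichers.
type Watcher struct {
	Kind      string
	Enrichers []*Enricher
}

// Attach registers an enricher with the watcher.
func (w *Watcher) Attach(e *Enricher) { w.Enrichers = append(w.Enrichers, e) }

// labelGenerator stores the resource name under "name" and copies each label
// under a "labels." prefix (an illustrative layout, not the real field names).
func labelGenerator(name string, labels map[string]string) Metadata {
	m := Metadata{"name": name}
	for k, v := range labels {
		m["labels."+k] = v
	}
	return m
}

func main() {
	podWatcher := &Watcher{Kind: "pod"}
	for _, ms := range []string{"pod", "state_pod", "container", "state_container"} {
		podWatcher.Attach(&Enricher{Metricset: ms, Generate: labelGenerator})
	}
	fmt.Println(podWatcher.Kind, "watcher serves", len(podWatcher.Enrichers), "enrichers")
}
```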

### Metadata Generation Process:

1. **Initialization:**
- Metricsets are initialized with their respective enrichers during startup. Enrichers are responsible for managing the metadata enrichment process for their associated metricsets.

2. **Watcher Creation:**
- Multiple enrichers are associated with one watcher. For example, a pod watcher is associated with the pod, state_pod, container, and state_container metricsets and their enrichers.
- Watchers are created to monitor the Kubernetes resources relevant to the metricset's data collection requirements. For example, the pod metricset triggers the creation of watchers for pods, nodes, and namespaces.

3. **Metadata Generation:**
- When a watcher detects a change in a monitored resource (e.g., a new pod creation or a label update), it triggers the associated enrichers' metadata generation process.

4. **Enrichment Generation Process:**
- The enricher collects relevant metadata from Kubernetes API objects corresponding to the detected changes. This metadata includes information like labels, annotations, resource names, etc.

5. **Association with Events:**
- The collected metadata is then associated with the metricset's events. This association enriches the events with contextual information, providing deeper insights into the collected data. The enriched events generated by Beats or Elastic Agent are then sent to the configured output (e.g., Elasticsearch).
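
The flow above (a watcher reports a change, the enricher regenerates metadata, and the metadata is attached to events) could look roughly like the following sketch. It assumes a simplified `Pod` struct in place of the real Kubernetes API object, and the `OnChange`, `OnDelete`, and `Enrich` method names as well as the event field names are illustrative assumptions rather than the module's API.

```go
package main

import "fmt"

// Metadata and Event are flat key/value maps (hypothetical simplifications).
type Metadata map[string]string
type Event map[string]string

// Pod is a stand-in for a Kubernetes pod object.
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// Enricher keeps generated metadata keyed by "namespace/name".
type Enricher struct {
	metadata map[string]Metadata
}

func NewEnricher() *Enricher { return &Enricher{metadata: map[string]Metadata{}} }

// OnChange would be called by the watcher on add or update events.
// It regenerates the metadata for the pod and stores it in the map.
func (e *Enricher) OnChange(p Pod) {
	m := Metadata{"kubernetes.namespace": p.Namespace, "kubernetes.pod.name": p.Name}
	for k, v := range p.Labels {
		m["kubernetes.labels."+k] = v
	}
	e.metadata[p.Namespace+"/"+p.Name] = m
}

// OnDelete would be called by the watcher when the pod is removed.
func (e *Enricher) OnDelete(p Pod) { delete(e.metadata, p.Namespace+"/"+p.Name) }

// Enrich copies the stored metadata into the event.
// It reports false when no metadata is known for the given ID.
func (e *Enricher) Enrich(id string, ev Event) bool {
	m, ok := e.metadata[id]
	if !ok {
		return false
	}
	for k, v := range m {
		ev[k] = v
	}
	return true
}

func main() {
	e := NewEnricher()
	e.OnChange(Pod{Namespace: "default", Name: "nginx", Labels: map[string]string{"app": "web"}})

	ev := Event{"pod.cpu.usage.nanocores": "1200"}
	e.Enrich("default/nginx", ev)
	fmt.Println(ev["kubernetes.pod.name"], ev["kubernetes.labels.app"])
}
```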

### Handling Edge Cases:

1. **Synchronization:**
- Special mechanisms are in place to handle scenarios where resources trigger events before associated enrichers are fully initialized. Proactive synchronization ensures that existing resource metadata is captured and updated in enricher maps.
- When a watcher detects events (like object additions or updates), it updates a list (metadataObjects) with the IDs of these detected objects. Before introducing new enrichers, existing metadataObjects are reviewed. For each existing object ID, the corresponding metadata is retrieved and used to update the new enrichers, ensuring that metadata for pre-existing resources is properly captured and integrated into the new enricher's metadata map. This synchronization process guarantees accurate metadata enrichment, even for resources that triggered events before the initialization of certain enrichers.
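
A minimal sketch of this synchronization step follows. It assumes the watcher keeps the seen object IDs in a set named `metadataObjects` (the text above calls it a list) plus a small cache of labels per object; `HandleEvent`, `AddEnricher`, and `generate` are hypothetical names used only for illustration.

```go
package main

import "fmt"

type Metadata map[string]string

// Enricher holds the metadata map for one metricset.
type Enricher struct {
	Metricset string
	metadata  map[string]Metadata
}

// Watcher tracks which objects it has already seen.
type Watcher struct {
	store           map[string]map[string]string // object ID -> labels (stand-in for the watcher cache)
	metadataObjects map[string]bool              // IDs of objects that already triggered events
	enrichers       []*Enricher
}

func NewWatcher() *Watcher {
	return &Watcher{store: map[string]map[string]string{}, metadataObjects: map[string]bool{}}
}

// generate builds metadata for one object (simplified metadata generator).
func generate(id string, labels map[string]string) Metadata {
	m := Metadata{"id": id}
	for k, v := range labels {
		m["labels."+k] = v
	}
	return m
}

// HandleEvent records the object ID and updates every registered enricher.
func (w *Watcher) HandleEvent(id string, labels map[string]string) {
	w.store[id] = labels
	w.metadataObjects[id] = true
	for _, e := range w.enrichers {
		e.metadata[id] = generate(id, labels)
	}
}

// AddEnricher replays all previously seen objects into the new enricher
// before registering it, so that earlier events are not lost.
func (w *Watcher) AddEnricher(e *Enricher) {
	for id := range w.metadataObjects {
		e.metadata[id] = generate(id, w.store[id])
	}
	w.enrichers = append(w.enrichers, e)
}

func main() {
	w := NewWatcher()
	// The pod is observed before any enricher exists.
	w.HandleEvent("default/nginx", map[string]string{"app": "web"})

	late := &Enricher{Metricset: "state_pod", metadata: map[string]Metadata{}}
	w.AddEnricher(late)
	fmt.Println(late.metadata["default/nginx"]["labels.app"])
}
```

Without the replay loop in `AddEnricher`, the late enricher would have no entry for `default/nginx` until the pod changed again.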

### Watcher Management:

1. **Initialization Sequence:**
- Watchers are initialized and managed by metricsets. Extra watchers, such as those for namespaces and nodes, are initialized first to ensure metadata availability before the main watcher starts monitoring resources.

2. **Configuration Updates:**
- Watcher configurations, such as watch options or resource filtering criteria, can be updated dynamically. A mechanism is in place to seamlessly transition to updated configurations without disrupting data collection.
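
The initialization sequence can be illustrated with a small helper that computes the order in which a metricset's watchers would be started: extra watchers first, the main resource watcher last, and no kind started twice. The function `startOrder` is a hypothetical sketch of that ordering rule, not the module's implementation.

```go
package main

import "fmt"

// startOrder returns the start order for a metricset's watchers:
// extra watchers (for example namespace and node) come first, the main
// resource watcher comes last, and a kind never appears twice.
func startOrder(mainKind string, extras []string) []string {
	seen := map[string]bool{mainKind: true}
	order := []string{}
	for _, kind := range extras {
		if !seen[kind] {
			seen[kind] = true
			order = append(order, kind)
		}
	}
	return append(order, mainKind)
}

func main() {
	// The pod metricset needs namespace and node metadata before pods are watched.
	fmt.Println(startOrder("pod", []string{"namespace", "node"}))
	// For the node metricset the extra watcher and the main watcher are the same.
	fmt.Println(startOrder("node", []string{"node"}))
}
```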



In the following diagram, an example of different metricsets leveraging the same watchers is depicted. Metricsets have their own enrichers but share watchers. The watchers monitor the Kubernetes API for specific resource updates.
[metadata diag](../_meta/images/enrichers.png)

### Expected watchers per metricset

The following table demonstrates which watchers are needed for each metricset by default.
Note that no watcher monitoring the same resource kind will be created twice.

| Metricset | Namespace watcher | Node watcher | Resource watcher | Notes |
|----------------------|:-----------------:|:------------:|:----------------:|-----------------------------------------------------------|
| API Server | ✕ | ✕ | ✕ | |
| Container | ✓ | ✓ | ✓ | |
| Controller manager | ✕ | ✕ | ✓ | |
| Event | ✓ | ✕ | ✓ | |
| Node | ✕ | ✓ | ✓ | Resource watcher should be the same as node watcher. |
| Pod | ✓ | ✓ | ✓ | |
| Proxy | ✕ | ✕ | ✕ | |
| Scheduler | ✕ | ✕ | ✕ | |
| State container | ✓ | ✓ | ✓ | |
| State cronjob | ✓ | ✕ | ✓ | |
| State daemonset | ✓ | ✕ | ✓ | |
| State deployment | ✓ | ✕ | ✓ | |
| State job | ✓ | ✕ | ✓ | |
| State namespace | ✓ | ✕ | ✓ | Resource watcher should be the same as namespace watcher. |
| State node | ✕ | ✓ | ✓ | Resource watcher should be the same as node watcher. |
| State PV | ✕ | ✕ | ✓ | |
| State PVC | ✓ | ✕ | ✓ | |
| State pod | ✓ | ✓ | ✓ | |
| State replicaset | ✓ | ✕ | ✓ | |
| State resource quota | ✓ | ✕ | ✓ | |
| State service | ✓ | ✕ | ✓ | |
| State statefulset | ✓ | ✕ | ✓ | |
| State storage class | ✕ | ✕ | ✓ | |
| System | ✕ | ✕ | ✕ | |
| Volume | ✕ | ✕ | ✕ | |
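
The rule that no watcher for the same resource kind is created twice can be sketched with a small registry keyed by resource kind. The `Registry` type, its methods, and the `needs` map are hypothetical; `needs` encodes only three rows of the table above (Pod, State node, State deployment).

```go
package main

import (
	"fmt"
	"sort"
)

// Registry holds at most one watcher per resource kind.
type Registry struct {
	watchers map[string]bool
}

func NewRegistry() *Registry { return &Registry{watchers: map[string]bool{}} }

// Ensure creates a watcher for the kind unless one already exists.
// It reports whether a new watcher was created.
func (r *Registry) Ensure(kind string) bool {
	if r.watchers[kind] {
		return false
	}
	r.watchers[kind] = true
	return true
}

// Kinds returns the watched resource kinds in sorted order.
func (r *Registry) Kinds() []string {
	kinds := make([]string, 0, len(r.watchers))
	for k := range r.watchers {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}

// needs lists the watchers required by a few metricsets, taken from the table.
var needs = map[string][]string{
	"pod":              {"namespace", "node", "pod"},
	"state_node":       {"node"},
	"state_deployment": {"namespace", "deployment"},
}

func main() {
	r := NewRegistry()
	for _, ms := range []string{"pod", "state_node", "state_deployment"} {
		for _, kind := range needs[ms] {
			r.Ensure(kind)
		}
	}
	// Six requirements collapse into four shared watchers.
	fmt.Println(r.Kinds())
}
```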
