Skip to content

Commit

Permalink
docs: change node operator docs structure
Browse files Browse the repository at this point in the history
  • Loading branch information
muXxer committed Feb 3, 2025
1 parent 82a2850 commit 69cb811
Show file tree
Hide file tree
Showing 22 changed files with 130 additions and 247 deletions.
Original file line number Diff line number Diff line change
@@ -1,14 +1,3 @@
---
title: Metrics
description: Prometheus-based metrics for the node.
tags: [node-operation]
teams:
- iotaledger/node
---

import Quiz from '@site/src/components/Quiz';
import questions from '/json/node-operators/telemetry/iota-metrics.json';

The `iota-metrics` crate defines a `Metrics` struct with various [`IntGaugeVec`][] metrics to monitor running tasks,
pending futures, channel sizes, and scope activities. A gauge is a type of metric that represents a single numerical value
that can go up or down.
Expand Down Expand Up @@ -235,6 +224,4 @@ The `iota-node` crate, for example, starts a push task `start_metrics_push_task`
This push task assigns the current timestamp to each metric, encodes the metrics into the Protobuf format, adds compression and
pushes the compressed metrics data via an HTTP POST.

[`IntGaugeVec`]: https://docs.rs/prometheus/latest/prometheus/type.IntGaugeVec.html

<Quiz questions={questions} />
[`IntGaugeVec`]: https://docs.rs/prometheus/latest/prometheus/type.IntGaugeVec.html
116 changes: 116 additions & 0 deletions crates/telemetry-subscribers/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,3 +96,119 @@ This library installs a custom panic hook which records a log (event) at ERROR l
crate. This allows span information from the panic to be properly recorded as well.

To exit the process on panic, set the `CRASH_ON_PANIC` environment variable.




# TODO: Combine with the above

The `telemetry-subscribers` crate is used for logging and tracing the node.
It provides a [`tracing`](https://docs.rs/tracing/latest/tracing/) subscriber with flexible options to monitor the nodes performance and behavior.
It is currently used in different main functions of the node, including tests and is typically initialized at the
beginning of a main function like:

```rust
// initialize tracing
let _guard = telemetry_subscribers::TelemetryConfig::new()
.with_env()
.init();
```

## Configuration

The `TelemetryConfig` allows setting various configuration options, such as tracing with the OpenTelemetry protocol (OTLP), outputting JSON logs,
writing logs to files, setting log levels, defining span levels, setting panic hooks, or specifying Prometheus
registries.

```rust
/// Configuration for different logging/tracing options
/// ===
/// - json_log_output: Output JSON logs to stdout only.
/// - log_file: If defined, write output to a file starting with this name, ex
/// app.log
/// - log_level: error/warn/info/debug/trace, defaults to info
#[derive(Default, Clone, Debug)]
pub struct TelemetryConfig {
/// Enable OpenTelemetry tracing
pub enable_otlp_tracing: bool,
/// Enables Tokio Console debugging on port 6669
pub tokio_console: bool,
/// Output JSON logs.
pub json_log_output: bool,
/// If defined, write output to a file starting with this name, ex app.log
pub log_file: Option<String>,
/// Log level to set, defaults to info
pub log_string: Option<String>,
/// Span level - what level of spans should be created. Note this is not
/// same as logging level If set to None, then defaults to INFO
pub span_level: Option<Level>,
/// Set a panic hook
pub panic_hook: bool,
/// Crash on panic
pub crash_on_panic: bool,
/// Optional Prometheus registry - if present, all enabled span latencies
/// are measured
pub prom_registry: Option<prometheus::Registry>,
pub sample_rate: f64,
/// Add directive to include trace logs with provided target
pub trace_target: Option<Vec<String>>,
}
```

It implements a `with_env` function in order to set the config fields based on environment variables.
After a `TelemetryConfig` instance is created and values are set, the `init` function will enable it.

For that, the `init` first sets up a `EnvFilter` to manage which log messages are shown, based on the log level.
Per default, the log level is set to `info`, but it can be adjusted by setting the `log_string` variable.
Then, another filter is created for span levels.

A span is a single unit of work and represents a specific operation, like a database query, and has a start and end
time.
Spans can be linked together to show the flow of operations in a system. Each span can include:
* A name describing the operation
* Timing information (start and end)
* Attributes (key-value pairs) for extra context or relationships to other spans (parent-child).

Spans help understand the flow of requests through a system better.
The span filter, created by the `init` function ensures that only relevant spans are processed, which helps manage
performance and logging noise.

After the span filter is set up, a collection of layers is initialized. These layers send data to tokio-console for
debugging or integrate with Prometheus for measuring span latencies. Each layer will be enabled or disabled based
on the configuration.
If OTLP tracing is enabled, an `OpenTelemetryLayer` will be set up for tracing to either a file or an OTLP endpoint
based on environment settings.

After setting up all layers, a tracing subscriber is created with the configured layers and set as the global default.
Ultimately, the function creates and returns `TelemetryGuards` and `TracingHandle` structs, which manage the tracing
subscriber.
They should be kept active in the main function to ensure logging and tracing throughout the application's lifecycle.

## Prometheus Layer

The `telemetry-subscriber` allows to measure Tokio-tracing [span](https://docs.rs/tracing/latest/tracing/span/index.html) latencies that will be recorded into Prometheus histograms directly.
For that, a `prometheus::Registry` must be passed to the `TelemetryConfig` using the `with_prom_registry` function.
The name of the Prometheus histogram is `tracing_span_latencies(_sum/count/bucket)`. For example, in the `iota-node` it is set up as follows:

```rust
let runtimes = IotaRuntimes::new( & config);
let metrics_rt = runtimes.metrics.enter();
let registry_service = iota_metrics::start_prometheus_server(config.metrics_address);
let prometheus_registry = registry_service.default_registry();

// Initialize logging
let (_guard, filter_handle) = telemetry_subscribers::TelemetryConfig::new()
.with_env()
.with_prom_registry(&prometheus_registry)
.init();

drop(metrics_rt);
```

In order to set up the subscriber, it enters the metrics runtime first and creates a `RegistryService` from
the [`iota-metrics` crate](iota-metrics.mdx).
It initializes the default Prometheus registry in the `RegistryService` and returns it.
The `metrics_address` is the address where the Prometheus metrics will be exposed and can be specified in the node's
configuration file.
Per default, it is set to `0.0.0.0:9400`.
After the default registry is created, it is passed to the `TelemetryConfig` with the `with_prom_registry` function.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: Operator Guides
title: Overview
tags: [node-operation]
teams:
- iotaledger/node
Expand Down
123 changes: 0 additions & 123 deletions docs/content/operator/telemetry/telemetry-subscribers.mdx

This file was deleted.

6 changes: 6 additions & 0 deletions docs/content/operator/validator-node/docker.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_label: Docker Configuration
tags: [node-operation]
teams:
- iotaledger/node
---
File renamed without changes.
6 changes: 6 additions & 0 deletions docs/content/operator/validator-node/systemd.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_label: Systemd Configuration
tags: [node-operation]
teams:
- iotaledger/node
---
Original file line number Diff line number Diff line change
Expand Up @@ -283,5 +283,4 @@ sleep 2
iota validator display-metadata
```


<Quiz questions={questions} />
11 changes: 0 additions & 11 deletions docs/content/operator/validator-operation/docker/README.mdx

This file was deleted.

11 changes: 0 additions & 11 deletions docs/content/operator/validator-operation/systemd/README.mdx

This file was deleted.

28 changes: 0 additions & 28 deletions docs/site/static/json/node-operators/telemetry/iota-metrics.json

This file was deleted.

29 changes: 0 additions & 29 deletions docs/site/static/json/node-operators/telemetry/iota-telemetry.json

This file was deleted.

Loading

0 comments on commit 69cb811

Please sign in to comment.