diff --git a/docs/tempo/website/_index.md b/docs/tempo/website/_index.md index e6191c74a94..ed51edaa678 100644 --- a/docs/tempo/website/_index.md +++ b/docs/tempo/website/_index.md @@ -13,9 +13,10 @@ Grafana Tempo is an open source, easy-to-use and high-volume distributed tracing - [Getting Started](getting-started/) - [Configuration](configuration/) -- [Monitoring](monitoring/) +- [Deployment](deployment/) +- [Operations](operations/) +- [API](api_docs/) - [Integration Guides/Trace Discovery](guides/) -- [Tempo CLI](cli/) - [Architecture](architecture/) - [Troubleshooting](troubleshooting/) - [Community](community/) \ No newline at end of file diff --git a/docs/tempo/website/api_docs/_index.md b/docs/tempo/website/api_docs/_index.md new file mode 100644 index 00000000000..61eac0a021d --- /dev/null +++ b/docs/tempo/website/api_docs/_index.md @@ -0,0 +1,186 @@ +--- +title: API documentation +weight: 350 +--- + +# Tempo's API + +Tempo exposes an API for pushing and querying traces, and operating the cluster itself. + +For the sake of clarity, in this document we have grouped API endpoints by service, but keep in mind that they're exposed both when running Tempo in microservices and singly-binary mode: +- **Microservices**: each service exposes its own endpoints +- **Single-binary**: the Tempo process exposes all API endpoints for the services running internally + +## Endpoints + +| API | Service | Type | Endpoint | +| --- | ------- | ---- | -------- | +| [Configuration](#configuration) | _All services_ | HTTP | `GET /config` | +| [Readiness probe](#readiness-probe) | _All services_ | HTTP | `GET /ready` | +| [Metrics](#metrics) | _All services_ | HTTP | `GET /metrics` | +| [Pprof](#pprof) | _All services_ | HTTP | `GET /debug/pprof` | +| [Ingest traces](#ingest) | Distributor | - | See section for details | +| [Querying traces](#query) | Query-frontend | HTTP | `GET /api/traces/` | +| [Query Path Readiness Check](#query-path-readiness-check) | Query-frontend | HTTP | `GET /api/echo` | +| [Memberlist](#memberlist) | Distributor, Ingester, Querier, Compactor | HTTP | `GET /memberlist` | +| [Flush](#flush) | Ingester | HTTP | `GET,POST /flush` | +| [Shutdown](#shutdown) | Ingester | HTTP | `GET,POST /shutdown` | +| [Distributor ring status](#distributor-ring-status) | Distributor | HTTP | `GET /distributor/ring` | +| [Ingesters ring status](#ingesters-ring-status) | Distributor, Querier | HTTP | `GET /ingester/ring` | +| [Compactor ring status](#compactor-ring-status) | Compactor | HTTP | `GET /compactor/ring` | + + +### Configuration + +``` +GET /config +``` + +Displays the configuration currently applied to Tempo (in YAML format), including default values and settings via CLI flags. +Sensitive data is masked. Please be aware that the exported configuration **doesn't include the per-tenant overrides**. + + +### Readiness probe + +``` +GET /ready +``` + +Returns status code 200 when Tempo is ready to serve traffic. + +### Metrics + +``` +GET /metrics +``` + +Returns the metrics for the running Tempo service in the Prometheus exposition format. + +### Pprof + +``` +GET /debug/pprof/heap +GET /debug/pprof/block +GET /debug/pprof/profile +GET /debug/pprof/trace +GET /debug/pprof/goroutine +GET /debug/pprof/mutex +``` + +Returns the runtime profiling data in the format expected by the pprof visualization tool. +There are many things which can be profiled using this including heap, trace, goroutine, etc. 
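These endpoints come from Go's standard `net/http/pprof` handler, so the usual Go tooling works against them. A minimal sketch (not taken from this PR), assuming Tempo's HTTP server is listening on its default port 3100 as in the config examples further down in this diff:

```bash
# Inspect a heap profile from a locally running Tempo in the interactive pprof shell.
go tool pprof http://localhost:3100/debug/pprof/heap

# Capture a 30-second CPU profile to a file for offline analysis.
curl -o cpu.pprof "http://localhost:3100/debug/pprof/profile?seconds=30"
go tool pprof cpu.pprof
```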
+ +_For more information, please check out the official documentation of [pprof](https://golang.org/pkg/net/http/pprof/)._ + +### Ingest + +Tempo distributor uses the OpenTelemetry Receivers as a shim to ingest trace data. +Note that these APIs are meant to be consumed by the corresponding client SDK or a pipeline service like Grafana +Agent / OpenTelemetry Collector / Jaeger Agent. + +| Protocol | Type | Docs | +| -------- | ---- | ---- | +| OpenTelemetry | GRPC | [Link](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md) | +| OpenTelemetry | HTTP | [Link](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md) | +| Jaeger | Thrift Compact | [Link](https://www.jaegertracing.io/docs/latest/apis/#span-reporting-apis) | +| Jaeger | Thrift Binary | [Link](https://www.jaegertracing.io/docs/latest/apis/#span-reporting-apis) | +| Jaeger | Thrift HTTP | [Link](https://www.jaegertracing.io/docs/latest/apis/#span-reporting-apis) | +| Jaeger | GRPC | [Link](https://www.jaegertracing.io/docs/latest/apis/#span-reporting-apis) | +| Zipkin | HTTP | [Link](https://zipkin.io/zipkin-api/) | + +_For information on how to use the Zipkin endpoint with curl (for debugging purposes) check [here](pushing-spans-with-http)._ + +### Query + +Tempo's Query API is simple. The following request is used to retrieve a trace from the query frontend service in +a microservices deployment, or the Tempo endpoint in a single binary deployment. + +``` +GET /api/traces/ +``` + +The following query API is also provided on the querier service for _debugging_ purposes. + +``` +GET /querier/api/traces/?mode=xxxx&blockStart=0000&blockEnd=FFFF +``` +Parameters: +- `mode = (blocks|ingesters|all)` + Specifies whether the querier should look for the trace in blocks, ingesters or both (all). + Default = `all` +- `blockStart = (GUID)` + Specifies the blockID start boundary. If specified, the querier will only search blocks with IDs > blockStart. + Default = `00000000-0000-0000-0000-000000000000` + Example: `blockStart=12345678-0000-0000-1235-000001240000` +- `blockEnd = (GUID)` + Specifies the blockID finish boundary. If specified, the querier will only search blocks with IDs < blockEnd. + Default = `FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF` + Example: `blockStart=FFFFFFFF-FFFF-FFFF-FFFF-456787652341` + +Note that this API is not meant to be used directly unless for debugging the sharding functionality of the query +frontend. + +### Query Path Readiness Check + +``` +GET /api/echo +``` + +Returns status code 200 and body `echo` when the query frontend is up and ready to receive requests. + +**Note**: Meant to be used in a Query Visualization UI like Grafana to test that the Tempo datasource is working. + + +### Flush + +``` +GET,POST /flush +``` + +Triggers a flush of all in-memory traces to the WAL. Useful at the time of rollout restarts and unexpected crashes. + +### Shutdown + +``` +GET,POST /shutdown +``` + +Flushes all in-memory traces and the WAL to the long term backend. Gracefully exits from the ring. Shuts down the +ingester service. + +**Note**: This is usually used at the time of scaling down a cluster. + +### Distributor ring status + +``` +GET /distributor/ring +``` + +Displays a web page with the distributor hash ring status, including the state, healthy and last heartbeat time of each +distributor. 
+ +_For more information, check the page on [consistent hash ring](../operations/consistent_hash_ring)._ + +### Ingesters ring status + +``` +GET /ingester/ring +``` + +Displays a web page with the ingesters hash ring status, including the state, healthy and last heartbeat time of each ingester. + +_For more information, check the page on [consistent hash ring](../operations/consistent_hash_ring)._ + + + +### Compactor ring status + +``` +GET /compactor/ring +``` + +Displays a web page with the compactor hash ring status, including the state, healthy and last heartbeat time of each +compactor. + +_For more information, check the page on [consistent hash ring](../operations/consistent_hash_ring)._ + diff --git a/docs/tempo/website/api_docs/pushing-spans-with-http.md b/docs/tempo/website/api_docs/pushing-spans-with-http.md new file mode 100644 index 00000000000..150f2ea446b --- /dev/null +++ b/docs/tempo/website/api_docs/pushing-spans-with-http.md @@ -0,0 +1,104 @@ +--- +title: Pushing Spans with HTTP +--- + +Sometimes using a tracing system is intimidating because it seems like you need complex application instrumentation +or a span ingestion pipeline in order to push spans. This guide aims to show an extremely basic technique for +pushing spans with http/json from a Bash script using the [Zipkin](https://zipkin.io/) receiver. + +## Starting Tempo + +Let's first start Tempo with the Zipkin receiver configured. In order to do this create a config file like so: + +```yaml +server: + http_listen_port: 3100 + +distributor: + receivers: + zipkin: + +storage: + trace: + backend: local + local: + path: /tmp/tempo/blocks +``` + +and run Tempo using it: + +```bash +docker run -p 9411:9411 -p 3100:3100 -v $(pwd)/config.yaml:/config.yaml grafana/tempo:latest -config.file /config.yaml +``` + +## Pushing Spans + +Now that Tempo is running and listening on port 9411 for [Zipkin spans](https://zipkin.io/zipkin-api/#/default/post_spans) let's push a span to it using `curl`. + +```bash +curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -d '[{ + "id": "1234", + "traceId": "0123456789abcdef", + "timestamp": 1608239395286533, + "duration": 100000, + "name": "span from bash!", + "tags": { + "http.method": "GET", + "http.path": "/api" + }, + "localEndpoint": { + "serviceName": "shell script" + } +}]' +``` + +Note that the `timestamp` field is in microseconds and was obtained by running `date +%s%6N`. The `duration` field is also in microseconds and so 100000 is 100 milliseconds. + +## Retrieving Traces + +The easiest way to get the trace is to execute a simple curl command to Tempo. The returned format is [OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto). + +```bash +curl http://localhost:3100/api/traces/0123456789abcdef + +{"batches":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"shell script"}}]},"instrumentationLibrarySpans":[{"spans":[{"traceId":"AAAAAAAAAAABI0VniavN7w==","spanId":"AAAAAAAAEjQ=","name":"span from bash!","startTimeUnixNano":"1608239395286533000","endTimeUnixNano":"1608239395386533000","attributes":[{"key":"http.path","value":{"stringValue":"/api"}},{"key":"http.method","value":{"stringValue":"GET"}}]}]}]}]} +``` + +However, staring at a json blob in bash is not very fun. Let's start up Tempo query so we can visualize our trace. 
Tempo query is [Jaeger Query](https://hub.docker.com/r/jaegertracing/jaeger-query/) with a [GRPC Plugin](https://github.com/jaegertracing/jaeger/tree/master/plugin/storage/grpc) that allows it to query Tempo. + +```bash +docker run --env BACKEND=localhost:3100 --net host grafana/tempo-query:latest +``` + +And open `http://localhost:16686/trace/0123456789abcdef` in the browser of your choice to see: + +

+<p align="center"><img src="pushing-spans-with-http.png" alt="single span"></p>
+ +## More Spans! + +Now that we have the basics down it's easy to continue building our trace. By specifying the same trace id and a parent span id we can start building a trace. + +```bash +curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -d '[{ + "id": "5678", + "traceId": "0123456789abcdef", + "parentId": "1234", + "timestamp": 1608239395316533, + "duration": 100000, + "name": "child span from bash!", + "localEndpoint": { + "serviceName": "shell script" + } +}]' +``` + +And now the UI shows: +

+<p align="center"><img src="pushing-spans-with-http2.png" alt="parent and child spans"></p>
+ +## Spans from everything! + +Tracing is not limited to enterprise languages with complex frameworks. As you can see it's easy to store and track events from your js, python or bash scripts. +You can use Tempo/distributed tracing today to trace CI pipelines, long running bash processes, python data processing flows or anything else +you can think of. + +Happy tracing! diff --git a/docs/tempo/website/guides/pushing-spans-with-http.png b/docs/tempo/website/api_docs/pushing-spans-with-http.png similarity index 100% rename from docs/tempo/website/guides/pushing-spans-with-http.png rename to docs/tempo/website/api_docs/pushing-spans-with-http.png diff --git a/docs/tempo/website/guides/pushing-spans-with-http2.png b/docs/tempo/website/api_docs/pushing-spans-with-http2.png similarity index 100% rename from docs/tempo/website/guides/pushing-spans-with-http2.png rename to docs/tempo/website/api_docs/pushing-spans-with-http2.png diff --git a/docs/tempo/website/community/_index.md b/docs/tempo/website/community/_index.md index 0072cfc75b5..90d18460861 100644 --- a/docs/tempo/website/community/_index.md +++ b/docs/tempo/website/community/_index.md @@ -3,7 +3,16 @@ title: Community weight: 600 --- -# Contribute +## Communicate + +- [Grafana Slack](https://slack.grafana.com/) #tempo channel +- [Community Forum](https://community.grafana.com/c/grafana-tempo/40) - for questions/feedback. +- [Community Call](https://docs.google.com/document/d/1yGsI6ywU-PxZBjmq3p3vAXr9g5yBXSDk4NU8LGo8qeY/edit#) - Monthly + on the second thursday at 1630 UTC. + Recordings available [online](https://www.youtube.com/playlist?list=PLDGkOdUX1Ujqe8WZ8T1h2pNjpll0t-KLw). +- [Google Groups](https://groups.google.com/forum/#!forum/tempo-users) + +## Contribute This page lists resources for developers who want to contribute to the Tempo software ecosystem. - [Governance](https://github.com/grafana/tempo/blob/main/GOVERNANCE.md) diff --git a/docs/tempo/website/community/communication.md b/docs/tempo/website/community/communication.md deleted file mode 100644 index 74775ac5741..00000000000 --- a/docs/tempo/website/community/communication.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -title: Communication ---- - -## Communicate - -- [Grafana Slack](https://slack.grafana.com/) #tempo channel -- [Google Groups](https://groups.google.com/forum/#!forum/tempo-users) \ No newline at end of file diff --git a/docs/tempo/website/guides/_index.md b/docs/tempo/website/guides/_index.md index 198e284c30f..0ebe04c47d9 100644 --- a/docs/tempo/website/guides/_index.md +++ b/docs/tempo/website/guides/_index.md @@ -6,5 +6,4 @@ weight: 400 Because Tempo is a trace id only lookup it relies on integrations for trace discovery. Common methods of discovery are through logs and exemplars. [The examples](https://github.com/grafana/tempo/tree/main/example) are also a good place to see how some of these discovery flows work. 
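The log-based discovery flow mentioned above is typically wired up with Loki derived fields in Grafana. A minimal provisioning sketch (not taken from this repository) — the Loki URL, the trace-ID regex, and the `tempo` datasource UID are illustrative assumptions:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100            # illustrative Loki address
    jsonData:
      derivedFields:
        # Pull a trace ID out of each log line and turn it into a link
        # that opens the trace in the configured Tempo datasource.
        - name: TraceID
          matcherRegex: "traceID=(\\w+)"   # adjust to match your log format
          url: "$${__value.raw}"           # the captured trace ID ($$ escapes provisioning interpolation)
          datasourceUid: tempo             # UID of the Tempo datasource in Grafana
```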
- [Loki Derived Fields](loki-derived-fields/) -- [Pushing Spans with HTTP](pushing-spans-with-http/) - [Instrumentation Examples](instrumentation/) \ No newline at end of file diff --git a/docs/tempo/website/guides/pushing-spans-with-http.md b/docs/tempo/website/guides/pushing-spans-with-http.md index 150f2ea446b..7c9f5410595 100644 --- a/docs/tempo/website/guides/pushing-spans-with-http.md +++ b/docs/tempo/website/guides/pushing-spans-with-http.md @@ -1,104 +1,4 @@ --- title: Pushing Spans with HTTP --- - -Sometimes using a tracing system is intimidating because it seems like you need complex application instrumentation -or a span ingestion pipeline in order to push spans. This guide aims to show an extremely basic technique for -pushing spans with http/json from a Bash script using the [Zipkin](https://zipkin.io/) receiver. - -## Starting Tempo - -Let's first start Tempo with the Zipkin receiver configured. In order to do this create a config file like so: - -```yaml -server: - http_listen_port: 3100 - -distributor: - receivers: - zipkin: - -storage: - trace: - backend: local - local: - path: /tmp/tempo/blocks -``` - -and run Tempo using it: - -```bash -docker run -p 9411:9411 -p 3100:3100 -v $(pwd)/config.yaml:/config.yaml grafana/tempo:latest -config.file /config.yaml -``` - -## Pushing Spans - -Now that Tempo is running and listening on port 9411 for [Zipkin spans](https://zipkin.io/zipkin-api/#/default/post_spans) let's push a span to it using `curl`. - -```bash -curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -d '[{ - "id": "1234", - "traceId": "0123456789abcdef", - "timestamp": 1608239395286533, - "duration": 100000, - "name": "span from bash!", - "tags": { - "http.method": "GET", - "http.path": "/api" - }, - "localEndpoint": { - "serviceName": "shell script" - } -}]' -``` - -Note that the `timestamp` field is in microseconds and was obtained by running `date +%s%6N`. The `duration` field is also in microseconds and so 100000 is 100 milliseconds. - -## Retrieving Traces - -The easiest way to get the trace is to execute a simple curl command to Tempo. The returned format is [OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto). - -```bash -curl http://localhost:3100/api/traces/0123456789abcdef - -{"batches":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"shell script"}}]},"instrumentationLibrarySpans":[{"spans":[{"traceId":"AAAAAAAAAAABI0VniavN7w==","spanId":"AAAAAAAAEjQ=","name":"span from bash!","startTimeUnixNano":"1608239395286533000","endTimeUnixNano":"1608239395386533000","attributes":[{"key":"http.path","value":{"stringValue":"/api"}},{"key":"http.method","value":{"stringValue":"GET"}}]}]}]}]} -``` - -However, staring at a json blob in bash is not very fun. Let's start up Tempo query so we can visualize our trace. Tempo query is [Jaeger Query](https://hub.docker.com/r/jaegertracing/jaeger-query/) with a [GRPC Plugin](https://github.com/jaegertracing/jaeger/tree/master/plugin/storage/grpc) that allows it to query Tempo. - -```bash -docker run --env BACKEND=localhost:3100 --net host grafana/tempo-query:latest -``` - -And open `http://localhost:16686/trace/0123456789abcdef` in the browser of your choice to see: - -

-<p align="center"><img src="pushing-spans-with-http.png" alt="single span"></p>
- -## More Spans! - -Now that we have the basics down it's easy to continue building our trace. By specifying the same trace id and a parent span id we can start building a trace. - -```bash -curl -X POST http://localhost:9411 -H 'Content-Type: application/json' -d '[{ - "id": "5678", - "traceId": "0123456789abcdef", - "parentId": "1234", - "timestamp": 1608239395316533, - "duration": 100000, - "name": "child span from bash!", - "localEndpoint": { - "serviceName": "shell script" - } -}]' -``` - -And now the UI shows: -

-<p align="center"><img src="pushing-spans-with-http2.png" alt="parent and child spans"></p>
- -## Spans from everything! - -Tracing is not limited to enterprise languages with complex frameworks. As you can see it's easy to store and track events from your js, python or bash scripts. -You can use Tempo/distributed tracing today to trace CI pipelines, long running bash processes, python data processing flows or anything else -you can think of. - -Happy tracing! +This page has been moved. Check [here](../../api_docs/pushing-spans-with-http) for the new location. \ No newline at end of file diff --git a/docs/tempo/website/operations/_index.md b/docs/tempo/website/operations/_index.md index dd5e5ea8ba7..952cde3aba5 100644 --- a/docs/tempo/website/operations/_index.md +++ b/docs/tempo/website/operations/_index.md @@ -5,5 +5,6 @@ weight: 300 Operations for Tempo include - [Monitoring](monitoring) +- [Tempo CLI](tempo_cli) - [Ingester PVCs](ingester_pvcs) - [Consistent Hash Ring](consistent_hash_ring) \ No newline at end of file diff --git a/docs/tempo/website/operations/monitoring.md b/docs/tempo/website/operations/monitoring.md index 0c50fd4615f..de060262375 100644 --- a/docs/tempo/website/operations/monitoring.md +++ b/docs/tempo/website/operations/monitoring.md @@ -9,7 +9,7 @@ set of dashboards, rules and alerts. Together, these can be used to monitor Temp ## Dashboards -The Tempo mixin has four Grafana dashboards in the `out` folder that can be downloaded and imported into your Grafana UI. +The Tempo mixin has four Grafana dashboards in the `yamls` folder that can be downloaded and imported into your Grafana UI. Note that at the moment, these work well when Tempo is run in a k8s environment and metrics scraped have the `cluster` and `namespace` labels! @@ -59,7 +59,7 @@ This dashboard is included in this repo for two reasons: ## Rules and Alerts -The Rules and Alerts are available as [yaml files in the mixin](https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/out) on the repository. +The Rules and Alerts are available as [yaml files in the mixin](https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/yamls) on the repository. To set up alerting, download the provided json files and configure them for use on your Prometheus monitoring server. diff --git a/docs/tempo/website/cli/_index.md b/docs/tempo/website/operations/tempo_cli.md similarity index 98% rename from docs/tempo/website/cli/_index.md rename to docs/tempo/website/operations/tempo_cli.md index 0c15a94de6d..ab178cf5814 100644 --- a/docs/tempo/website/cli/_index.md +++ b/docs/tempo/website/operations/tempo_cli.md @@ -2,7 +2,6 @@ title: "Tempo CLI" description: "Guide to using tempo-cli" keywords: ["tempo", "cli", "tempo-cli", "command line interface"] -weight: 450 --- # Tempo CLI @@ -24,7 +23,7 @@ tempo-cli command [subcommand] -h ``` ## Running Tempo CLI -Tempo CLI is currently available as source code. A working Go installation is required to build it. It can be compiled to a native binary and executed normally, or it can be executed using the `go run` command. +Tempo CLI is currently available as source code. A working Go installation is required to build it. It can be compiled to a native binary and executed normally, or it can be executed using the `go run` command. **Example:** ```bash @@ -88,7 +87,7 @@ Explanation of output: - `Age` The age of the block. - `Duration`Duration between the start and end time. - `Idx` Number of records stored in the index (present when --load-index is specified). -- `Dupe` Number of duplicate entries in the index (present when --load-index is specified). 
Should be zero. +- `Dupe` Number of duplicate entries in the index (present when --load-index is specified). Should be zero. - `Cmp` Whether the block has been compacted (present when --include-compacted is specified). **Example:** diff --git a/docs/tempo/website/troubleshooting/max-trace-limit-reached.md b/docs/tempo/website/troubleshooting/max-trace-limit-reached.md index b16c0e47a0d..b6af77f0e97 100644 --- a/docs/tempo/website/troubleshooting/max-trace-limit-reached.md +++ b/docs/tempo/website/troubleshooting/max-trace-limit-reached.md @@ -4,7 +4,7 @@ weight: 474 --- # I am seeing the error: max live traces per tenant exceeded -In high volume tracing environments the default trace limits are sometimes not sufficient. For example, if you reach the [maximum number of live traces allowed](https://github.com/grafana/tempo/blob/3710d944cfe2a51836c3e4ef4a97316ed0526a58/modules/overrides/limits.go#L25) per tenant in the ingester, you will see the following messages: +In high volume tracing environments the default trace limits are sometimes not sufficient. For example, if you reach the [maximum number of live traces allowed](https://github.com/grafana/tempo/blob/626efef93cacb0e5044548bbbeb0be72b759f7c2/modules/overrides/limits.go#L32) per tenant in the ingester, you will see the following messages: `max live traces per tenant exceeded: per-user traces limit (local: 10000 global: 0 actual local: 10000) exceeded`. ### Solutions diff --git a/docs/tempo/website/troubleshooting/missing-trace.md b/docs/tempo/website/troubleshooting/missing-trace.md deleted file mode 100644 index 28e798f8c19..00000000000 --- a/docs/tempo/website/troubleshooting/missing-trace.md +++ /dev/null @@ -1,18 +0,0 @@ ---- -title: Missing traces in Tempo -weight: 472 ---- - -# Some of my traces are missing in Tempo -This could happen because of a number of reasons and some have been detailed in this blog post - -[Where did all my spans go? A guide to diagnosing dropped spans in Jaeger distributed tracing](https://grafana.com/blog/2020/07/09/where-did-all-my-spans-go-a-guide-to-diagnosing-dropped-spans-in-jaeger-distributed-tracing/). - -### Diagnosing the issue -If the pipeline is not reporting any dropped spans, check whether application spans are being dropped by Tempo. The following metrics help determine this - -- `tempo_receiver_refused_spans`. The value of `tempo_receiver_refused_spans` should be 0. -If the value of `tempo_receiver_refused_spans` is greater than 0, then the possible reason is the application spans are being dropped due to rate limiting. - -#### Solution -- The rate limiting may be appropriate and does not need to be fixed. The metric simply explained the cause of the missing spans, and there is nothing more to be done. -- If more ingestion volume is needed, increase the configuration for the rate limiting, by adding this CLI flag to Tempo at startup - https://github.com/grafana/tempo/blob/78f3554ca30bd5a4dec01629b8b7b2b0b2b489be/modules/overrides/limits.go#L42 - \ No newline at end of file diff --git a/docs/tempo/website/troubleshooting/unable-to-see-trace.md b/docs/tempo/website/troubleshooting/unable-to-see-trace.md index 5c9a2ae3134..319edbdcfad 100644 --- a/docs/tempo/website/troubleshooting/unable-to-see-trace.md +++ b/docs/tempo/website/troubleshooting/unable-to-see-trace.md @@ -10,7 +10,12 @@ weight: 471 - There could be issues querying for traces that have been received by Tempo. ## Section 1: Diagnosing and fixing ingestion issues -Check whether the application spans are actually reaching Tempo. 
The following metrics help determine this +The first step is to check whether the application spans are actually reaching Tempo. + +Add the following flag to the distributor container - [`distributor.log-received-traces`](https://github.com/grafana/tempo/blob/57da4f3fd5d2966e13a39d27dbed4342af6a857a/modules/distributor/config.go#L55). +This enables debug logging of all the traces received by the distributor, and is useful to check if Tempo is receiving any traces at all. + +Or, check the following metrics - - `tempo_distributor_spans_received_total` - `tempo_ingester_traces_created_total` @@ -52,7 +57,37 @@ This can also be confirmed by checking the metric `tempo_request_duration_second - Check logs of distributors for a message like `msg="pusher failed to consume trace data" err="DoBatch: IngesterCount <= 0"`. This is likely because no ingester is joining the gossip ring, make sure the same gossip ring address is supplied to the distributors and ingesters. -## Section 2: Diagnosing and fixing issues with querying traces +## Section 2: Diagnosing and fixing sampling & limits issues + +If you are able to query some traces in Tempo but not others, you have come to the right section! + +This could happen because of a number of reasons and some have been detailed in this blog post - +[Where did all my spans go? A guide to diagnosing dropped spans in Jaeger distributed tracing](https://grafana.com/blog/2020/07/09/where-did-all-my-spans-go-a-guide-to-diagnosing-dropped-spans-in-jaeger-distributed-tracing/). +This is useful if you are using the Jaeger Agent. + +If you are using the Grafana Agent, continue reading the following section for metrics to monitor. + +### Diagnosing the issue +Check if the pipeline is dropping spans. The following metrics on the _Grafana Agent_ help determine this - +- `tempo_exporter_send_failed_spans`. The value of this metric should be 0. +- `tempo_receiver_refused_spans`. This value of this metric should be 0. +- `tempo_processor_dropped_spans`. The value of this metric should be 0. + +If the pipeline is not reporting any dropped spans, check whether application spans are being dropped by Tempo. The following metrics help determine this - +- `tempo_receiver_refused_spans`. The value of `tempo_receiver_refused_spans` should be 0. + Note that the Grafana Agent and Tempo share the same metric. Make sure to check the value of the metric from both services. + If the value of `tempo_receiver_refused_spans` is greater than 0, then the possible reason is the application spans are being dropped due to rate limiting. + +#### Solution +- If the pipeline (Grafana Agent) is found to be dropping spans, the deployment may need to be scaled up. Look for a message like `too few agents compared to the ingestion rate` in the agent logs. +- There might also be issues with connectivity to Tempo backend, check the agent for logs like `error sending batch, will retry` and make sure the Tempo endpoint and credentials are correctly configured. +- If Tempo is found to be dropping spans, then the possible reason is the application spans are being dropped due to rate limiting. + The rate limiting may be appropriate and does not need to be fixed. The metric simply explained the cause of the missing spans, and there is nothing more to be done. 
+- If more ingestion volume is needed, increase the configuration for the rate limiting, by adding this CLI flag to Tempo at startup - https://github.com/grafana/tempo/blob/78f3554ca30bd5a4dec01629b8b7b2b0b2b489be/modules/overrides/limits.go#L42 + +> **Note**: Check the [ingestion limits page](../../configuration/ingestion-limit) for further information on limits. + +## Section 3: Diagnosing and fixing issues with querying traces If you have determined that data has been ingested correctly into Tempo, then it is time to investigate possible issues with querying the data. A quick thing to check is your version of Grafana. The way Tempo is queried differs from 7.4.x to 7.5.x. Please refer to [the querying documentation](https://grafana.com/docs/tempo/latest/configuration/querying/) for help. If this is not a Grafana version issue, proceed! Check the logs of the Tempo Query Frontend. The Query Frontend pod runs with two containers (Query Frontend & Tempo Query), so lets use the following command to view Query Frontend logs - diff --git a/operations/tempo-mixin/README.md b/operations/tempo-mixin/README.md index ecde1a10506..d5c90823582 100644 --- a/operations/tempo-mixin/README.md +++ b/operations/tempo-mixin/README.md @@ -1,15 +1,17 @@ +Dashboards, rules and alerts are in the `yamls` folder. Use them directly in Prometheus & Grafana to monitor Tempo. + To generate dashboards with this mixin use: ```console -jb install && jsonnet -J vendor -S dashboards.jsonnet -m out +jb install && jsonnet -J vendor -S dashboards.jsonnet -m yamls ``` To generate alerts, use: ```console -jsonnet -J vendor -S alerts.jsonnet > out/alerts.yaml +jsonnet -J vendor -S alerts.jsonnet > yamls/alerts.yaml ``` To generate recording rules, use: ```console -jsonnet -J vendor -S rules.jsonnet > out/rules.yaml +jsonnet -J vendor -S rules.jsonnet > yamls/rules.yaml ``` diff --git a/operations/tempo-mixin/out/alerts.yaml b/operations/tempo-mixin/yamls/alerts.yaml similarity index 100% rename from operations/tempo-mixin/out/alerts.yaml rename to operations/tempo-mixin/yamls/alerts.yaml diff --git a/operations/tempo-mixin/out/rules.yaml b/operations/tempo-mixin/yamls/rules.yaml similarity index 100% rename from operations/tempo-mixin/out/rules.yaml rename to operations/tempo-mixin/yamls/rules.yaml diff --git a/operations/tempo-mixin/out/tempo-operational.json b/operations/tempo-mixin/yamls/tempo-operational.json similarity index 100% rename from operations/tempo-mixin/out/tempo-operational.json rename to operations/tempo-mixin/yamls/tempo-operational.json diff --git a/operations/tempo-mixin/out/tempo-reads.json b/operations/tempo-mixin/yamls/tempo-reads.json similarity index 100% rename from operations/tempo-mixin/out/tempo-reads.json rename to operations/tempo-mixin/yamls/tempo-reads.json diff --git a/operations/tempo-mixin/out/tempo-resources.json b/operations/tempo-mixin/yamls/tempo-resources.json similarity index 100% rename from operations/tempo-mixin/out/tempo-resources.json rename to operations/tempo-mixin/yamls/tempo-resources.json diff --git a/operations/tempo-mixin/out/tempo-writes.json b/operations/tempo-mixin/yamls/tempo-writes.json similarity index 100% rename from operations/tempo-mixin/out/tempo-writes.json rename to operations/tempo-mixin/yamls/tempo-writes.json
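The mixin README above points users at the generated files in the `yamls` folder; one minimal sketch of loading the generated rules and alerts into Prometheus (the file paths are illustrative assumptions, not from this PR):

```yaml
# prometheus.yml (fragment): load the Tempo mixin recording rules and alerts
# generated into the yamls/ folder by the jsonnet commands above.
rule_files:
  - /etc/prometheus/tempo-mixin/rules.yaml
  - /etc/prometheus/tempo-mixin/alerts.yaml
```

The dashboard JSON files in the same folder are imported into Grafana through the UI or dashboard provisioning, as described in the monitoring page earlier in this diff.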