Add inference token usage metrics #952

Closed
wants to merge 10 commits into from

@dineshyv dineshyv commented Feb 4, 2025

What does this PR do?

Add token usage metrics for inference API
Changes:

  1. Adds an optional dependency on telemetry for all routers
  2. Inference Router accepts an optional router
  3. Adds a new MetricsMixin which can be used in any response that injects metrics
  4. Inference Router computes the prompt and completion tokens and injects a token_usage metric in the response
  5. Inference Router then calls the telemetry API, if provided, and logs the metrics
  6. Adds a new endpoint in telemetry to query metrics. Currently, this is specific to Prometheus. It can be extended to support different backends.
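
To make items 3 to 5 concrete, here is a minimal sketch of how a metrics mixin and a router could fit together. It is not the code from this PR: the `Metric`, `CompletionResponse`, and `count_tokens` names, the metric names, and the callable-based `telemetry` hook are all assumptions made for illustration, and the whitespace tokenizer is only a stand-in for a real model tokenizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Metric:
    # Hypothetical shape of one metric attached to a response.
    name: str
    value: float
    unit: Optional[str] = None


@dataclass
class MetricsMixin:
    # Any response type can inherit this to carry metrics next to its payload.
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class CompletionResponse(MetricsMixin):
    # Illustrative response type; the field name is an assumption.
    content: str = ""


def count_tokens(text: str) -> int:
    # Placeholder: whitespace split. A real router would use the model's tokenizer.
    return len(text.split())


class InferenceRouter:
    def __init__(
        self,
        provider: Callable[[str], str],
        telemetry: Optional[Callable[[Metric], None]] = None,
    ):
        self.provider = provider
        # Telemetry is optional; when absent, metrics are only returned inline.
        self.telemetry = telemetry

    def completion(self, prompt: str) -> CompletionResponse:
        content = self.provider(prompt)
        metrics = [
            Metric("prompt_tokens", count_tokens(prompt), "tokens"),
            Metric("completion_tokens", count_tokens(content), "tokens"),
        ]
        # Log to telemetry only if a sink was provided.
        if self.telemetry is not None:
            for metric in metrics:
                self.telemetry(metric)
        # Inject the same metrics into the response.
        return CompletionResponse(metrics=metrics, content=content)
```

Passing `telemetry=some_list.append` is enough to see which metrics would be logged, and leaving it out exercises the path where telemetry is not configured.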

Metrics flow:

Stack -> OpenTelemetry collector -> Prometheus -> Grafana

Grafana UI:
image

Curl to query metrics:

curl --request POST \
  --url http://localhost:8321/v1/telemetry/metrics/completion_tokens_total \
  --header 'content-type: application/json' \
  --data '{
  "start_time": 1738774535,
  "end_time":  1738778135,
  "step": "14s",
  "query_type": "range"
}'
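
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the URL, header, and payload fields of the curl command above; the helper names `build_metrics_query` and `query_metrics` are made up for this example, and the base URL assumes a stack listening on `localhost:8321` as in the curl call.

```python
import json
import urllib.request


def build_metrics_query(base_url, metric_name, start_time, end_time,
                        step="14s", query_type="range"):
    # Same JSON body as the curl example: Unix timestamps plus step and query type.
    payload = {
        "start_time": start_time,
        "end_time": end_time,
        "step": step,
        "query_type": query_type,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/telemetry/metrics/{metric_name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def query_metrics(base_url, metric_name, start_time, end_time, **kwargs):
    # Sends the request and decodes the JSON response; needs a running server.
    request = build_metrics_query(base_url, metric_name, start_time, end_time, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    result = query_metrics(
        "http://localhost:8321", "completion_tokens_total", 1738774535, 1738778135
    )
    print(json.dumps(result, indent=2))
```

Building the request is kept separate from sending it so the payload can be inspected without a server running.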

Response:

{
  "data": [
    {
      "metric": "completion_tokens_total",
      "labels": {
        "__name__": "completion_tokens_total",
        "exported_job": "llama-stack",
        "instance": "otel-collector:8889",
        "job": "otel-collector",
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "together"
      },
      "values": [
        {
          "timestamp": "2025-02-05T09:23:21",
          "value": 4109.0
        },
       .......
      ]
    }
  ]
}
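
A client might reduce a response of this shape to the most recent sample per series. The following is a rough sketch that assumes the field names shown above (`data`, `labels`, `values`, `timestamp`, `value`); `latest_points` is a hypothetical helper, and the sample contains only the single data point visible in the response, since the rest is elided there.

```python
from datetime import datetime


def latest_points(response):
    """Return {(provider_id, model_id): (timestamp, value)} for the newest sample per series."""
    latest = {}
    for series in response.get("data", []):
        values = series.get("values", [])
        if not values:
            continue  # skip series without samples
        labels = series.get("labels", {})
        key = (labels.get("provider_id"), labels.get("model_id"))
        # Timestamps are ISO 8601 strings, so parse them before comparing.
        newest = max(values, key=lambda v: datetime.fromisoformat(v["timestamp"]))
        latest[key] = (datetime.fromisoformat(newest["timestamp"]), float(newest["value"]))
    return latest


# Only the one sample shown in the response above; the elided values are left out.
sample = {
    "data": [
        {
            "metric": "completion_tokens_total",
            "labels": {
                "model_id": "meta-llama/Llama-3.1-70B-Instruct",
                "provider_id": "together",
            },
            "values": [{"timestamp": "2025-02-05T09:23:21", "value": 4109.0}],
        }
    ]
}

for (provider, model), (ts, value) in latest_points(sample).items():
    print(provider, model, ts.isoformat(), value)
```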

@facebook-github-bot added the CLA Signed label Feb 4, 2025
@dineshyv changed the title from "[WIP] Add inference token usage metrics" to "Add inference token usage metrics" Feb 5, 2025

ashwinb commented Feb 20, 2025

@dineshyv do you want to continue / polish this PR? it has a bunch of useful stuff I feel

dineshyv commented

@ashwinb Yes, I will continue working on this.

dineshyv commented

Actually, I will create new PRs with only the relevant changes. Closing this one.

@dineshyv closed this Feb 27, 2025