Allow annotations to be added to just the nvidia-dcgm-node-exporter daemonset for Datadog monitoring via helm install #681

Open
flowinh2o opened this issue Mar 14, 2024 · 4 comments

Comments

@flowinh2o

Would it be possible to add dcgmExporter.annotations to the helm chart? We use Datadog to monitor our clusters, and the Autodiscovery agent (v7.51.0) seems to have a problem when all of the daemonsets carry the same annotations, as seen below:

Thank you!

  Configuration Errors
  ====================
    gpu-operator/gpu-feature-discovery-nrsl7 (d57f6d2c-e8f2-48e7-9989-4f795acf9b10)
    -------------------------------------------------------------------------------
        annotation ad.datadoghq.com/nvidia-dcgm-exporter.checks is invalid: nvidia-dcgm-exporter doesn't match a container identifier [gpu-feature-discovery toolkit-validation]
    gpu-operator/nvidia-container-toolkit-daemonset-q8f25 (53c136bf-e3ed-4dca-9cce-87f0830312fb)
    --------------------------------------------------------------------------------------------
        annotation ad.datadoghq.com/nvidia-dcgm-exporter.checks is invalid: nvidia-dcgm-exporter doesn't match a container identifier [driver-validation nvidia-container-toolkit-ctr]
    gpu-operator/nvidia-device-plugin-daemonset-9jp9f (3b692b0d-55e9-4ca4-a125-949f019c3618)
    ----------------------------------------------------------------------------------------
        annotation ad.datadoghq.com/nvidia-dcgm-exporter.checks is invalid: nvidia-dcgm-exporter doesn't match a container identifier [nvidia-device-plugin toolkit-validation]
    gpu-operator/nvidia-driver-daemonset-rltzp (91ba4b61-6a1f-4135-a63e-44995fb7acfd)
    ---------------------------------------------------------------------------------
        annotation ad.datadoghq.com/nvidia-dcgm-exporter.checks is invalid: nvidia-dcgm-exporter doesn't match a container identifier [k8s-driver-manager mofed-validation nvidia-driver-ctr nvidia-peermem-ctr]
    gpu-operator/nvidia-mig-manager-5tvkk (ef6705e1-2a9d-4ea8-a23c-e9726e644fb0)
    ----------------------------------------------------------------------------
        annotation ad.datadoghq.com/nvidia-dcgm-exporter.checks is invalid: nvidia-dcgm-exporter doesn't match a container identifier [nvidia-mig-manager toolkit-validation]
    gpu-operator/nvidia-operator-validator-642wm (b72e3d8b-b633-4617-bb62-5ab05585935b)
    -----------------------------------------------------------------------------------
        annotation ad.datadoghq.com/nvidia-dcgm-exporter.checks is invalid: nvidia-dcgm-exporter doesn't match a container identifier [cuda-validation driver-validation nvidia-operator-validator plugin-validation toolkit-validation]
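
The errors above suggest that the agent takes the identifier between `ad.datadoghq.com/` and `.checks` and expects it to match a container name in the pod. Here is a minimal Python sketch of that rule. It is inferred from the log output only and is not taken from the Datadog agent's code; the function name, the regular expression, and the empty `{}` annotation value are hypothetical.

```python
import re

# Assumed key shape, copied from the error output:
#   ad.datadoghq.com/<identifier>.checks
AD_KEY = re.compile(r"^ad\.datadoghq\.com/(?P<ident>[^/]+)\.checks$")


def invalid_ad_annotations(annotations, container_names):
    """Return annotation keys whose identifier matches no container in the pod."""
    names = set(container_names)
    bad = []
    for key in annotations:
        match = AD_KEY.match(key)
        if match and match.group("ident") not in names:
            bad.append(key)
    return bad


# The same annotation applied to every daemonset, as described in the issue.
shared = {"ad.datadoghq.com/nvidia-dcgm-exporter.checks": "{}"}

# Container names taken from the gpu-feature-discovery entry in the log.
print(invalid_ad_annotations(shared, ["gpu-feature-discovery", "toolkit-validation"]))

# A pod that does have a container with the matching name (assumed here).
print(invalid_ad_annotations(shared, ["nvidia-dcgm-exporter"]))
```

Under this reading, only a pod that contains a container named `nvidia-dcgm-exporter` passes the check, which is why an annotation shared across all daemonsets produces one error per non-matching pod.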
@flowinh2o
Author

Actually, it looks like the integration works, so this is not strictly needed; it would only eliminate the errors seen in the agent output above.

@shashiranjan84

@flowinh2o I am getting the same error and I don't see any metrics in DD. How did you manage to see the metrics?

@shashiranjan84

Got the metrics working once I fixed the annotation. But I agree, @flowinh2o, we need a dcgmExporter-specific annotation.
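
The comment does not say what the fix was. As a rough illustration only, an Autodiscovery annotation keyed to a single container could be built as in the Python sketch below. The check name `dcgm`, the port `9400`, the `openmetrics_endpoint` field, and the helper name are all assumptions, not values confirmed in this thread.

```python
import json


def dcgm_check_annotation(container, port=9400):
    """Build an (annotation key, JSON value) pair for one container.

    Assumptions: the check is named "dcgm", the exporter serves metrics on
    the given port, and %%host%% is resolved by the agent at runtime.
    """
    key = f"ad.datadoghq.com/{container}.checks"
    value = json.dumps(
        {
            "dcgm": {
                "instances": [
                    {"openmetrics_endpoint": f"http://%%host%%:{port}/metrics"}
                ]
            }
        }
    )
    return key, value


key, value = dcgm_check_annotation("nvidia-dcgm-exporter")
print(key)
print(value)
```

The container name in the key is the part that has to match a container in the pod, so the annotation is only meaningful on the DCGM exporter's own pods.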

@changhyuni

@shashiranjan84
Is it possible to separate comments by container?
I need to use Datadog's OpenMetrics...
