Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom grafana dashboard #317

Merged
merged 12 commits into from
Dec 6, 2023
Merged

Conversation

jbasu01
Copy link
Contributor

@jbasu01 jbasu01 commented Nov 13, 2023

A new custom Grafana dashboard was created and tested by bringing in metrics from ArgoCD. ArgoCD provides a JSON template download, and code from the JSON download could be copied into the config file and appropriate changes made to ensure metrics from ArgoCD dashboard are pulled in and made visible in the custom Grafana dashboard. Additional information is available at https://grafana.com/grafana/dashboards/14584-argocd/

A key item to make a note of is that one cannot use the word "ArgoCD" or "argocd" just by itself in the value for the key "title" in the grafana-dashboard-argocd.yaml. When changed to "ArgoCD xxxxx" ("ArgoCD stats" in our case and it worked fine.

@schwesig
Copy link
Contributor

@schwesig

@larsks
Copy link
Member

larsks commented Nov 13, 2023

A key item to make a note of is that one cannot use the word "ArgoCD" or "argocd" just by itself in the value for the key "title" in the grafana-dashboard-argocd.yaml.

Why?

Copy link
Contributor

@schwesig schwesig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working together with Jeet in a call
checking and linting together

@computate
Copy link
Member

@jbasu01 @schwesig Nice job resolving the pre-commit hooks and squashing to one commit. I just want to test this out together in an upcoming meeting on Monday, update the datasources, and make sure it's ready. I realize @jbasu01 will be on vacation, but we can work out the final details next week.

@computate
Copy link
Member

@larsks after a couple hours going crazy testing out a custom dashboard in the ACM grafana, I can't explain why, but naming the dashboard anything but argocd or ArgoCD magically made the dashboard work.

@schwesig
Copy link
Contributor

@computate thanks for checking the argocd naming problem we had and confirming it.

grafana/base/grafanadatasources/grafanadatasource.yaml Outdated Show resolved Hide resolved
grafana/base/grafanas/grafana.yaml Outdated Show resolved Hide resolved
grafana/base/kustomization.yaml Outdated Show resolved Hide resolved
grafana/base/routes/route.yaml Outdated Show resolved Hide resolved
grafana/base/serviceaccounts/serviceaccount.yaml Outdated Show resolved Hide resolved
grafana/overlays/nerc-ocp-infra/kustomization.yaml Outdated Show resolved Hide resolved
@schwesig
Copy link
Contributor

@larsks Thanks for your feedback.
@jbasu01 next steps to work on for us

@schwesig schwesig self-requested a review November 29, 2023 15:42
@computate computate force-pushed the custom-grafana-dashboard branch 4 times, most recently from 2a74c05 to 513a5a7 Compare November 29, 2023 22:05
@computate
Copy link
Member

@larsks I'm having trouble with the Patch Operator with this PR. I have applied the changes from this branch to test it:

oc --as system:admin apply -k grafana/overlays/nerc-ocp-infra/

The Patches say LastReconcileCycleSucceded:

oc --as system:admin -n grafana get patch/grafana-oauth -o yaml | grep 'status:' -A 20
oc --as system:admin -n grafana get patch/grafanadatasource-observability-metrics -o yaml | grep 'status:' -A 20

But the objects are not patched:

oc --as system:admin -n grafana get grafana/grafana -o yaml | grep 'root_url:'
oc --as system:admin -n grafana get grafanadatasource/observability-metrics -o yaml | grep 'secureJsonData:' -A 2

@larsks Can you please help me understand why the patches are not working, and if I need this ClusterRoleBinding?

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-node-labeler-patcher
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-labeler
subjects:
  - kind: ServiceAccount
    name: patcher
    namespace: grafana

@larsks
Copy link
Member

larsks commented Dec 5, 2023

@computate, looking at the logs for the patch operator, we see the following errors after creating the Patch resource:

...: failed to list /v1, Kind=Secret: secrets is forbidden: User
"system:serviceaccount:grafana:patcher" cannot list resource "secrets" in API
group "" at the cluster scope

...: failed to list integreatly.org/v1alpha1, Kind=Grafana:
grafanas.integreatly.org is forbidden: User
"system:serviceaccount:grafana:patcher" cannot list resource "grafanas" in API
group "integreatly.org" at the cluster scope

...: Failed to watch /v1, Kind=Secret: failed to list /v1, Kind=Secret: secrets
is forbidden: User "system:serviceaccount:grafana:patcher" cannot list resource
"secrets" in API group "" at the cluster scope

...: Failed to watch integreatly.org/v1alpha1, Kind=Grafana: failed to list
integreatly.org/v1alpha1, Kind=Grafana: grafanas.integreatly.org is forbidden:
User "system:serviceaccount:grafana:patcher" cannot list resource "grafanas" in
API group "integreatly.org" at the cluster scope

It looks like a permissions issue; we'll need to add some RBAC to provide the necessary permissions.

...but the node-labeler role isn't appropriate, because you're not labelling nodes.

@larsks
Copy link
Member

larsks commented Dec 5, 2023

Add this RBAC to your pr and things seem to work as expected:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-patcher
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
  - update
  - patch
- apiGroups:
  - integreatly.org
  resources:
  - grafanas
  verbs:
  - get
  - list
  - watch
  - update
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-patcher
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-patcher
subjects:
- kind: ServiceAccount
  name: patcher
  namespace: grafana

Following our organizational model, that should go into the cluster-scope application somewhere.

@larsks
Copy link
Member

larsks commented Dec 5, 2023

Following our organizational model, that should go into the cluster-scope application somewhere.

...or not, since I see the grafana application includes the subscription and other resources we would normally put in cluster-scope, so it's best to keep things together.

@larsks
Copy link
Member

larsks commented Dec 5, 2023

@computate, looking at https://github.com/grafana-operator/grafana-operator/blob/v3.6.0/documentation/env_vars.md this patching may no longer be necessary. I'm not 100% positive; followed grafana/grafana-operator#130 to find that link and I'm taking a closer look now.

@larsks
Copy link
Member

larsks commented Dec 5, 2023

I think you ought to be able to do something like this...

In the Grafana resource:

  deployment:
    envFrom:
    - configMapRef:
        name: grafana-config-overrides
    - secretRef:
        name: grafana-secrets

And in grafana-secrets:

apiVersion: v1
kind: Secret
metadata:
  name: grafana-secrets
type: Opaque
data:
  GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET: "..."
  GF_AUTH_GENERIC_OAUTH_CLIENT_ID: "..."

You should be patching spec.config.server via Kustomize, not via a dynamic Patch resource.

@larsks
Copy link
Member

larsks commented Dec 5, 2023

@computate with this latest set of changes I think everything is working (and no more patch operator required).

@larsks larsks force-pushed the custom-grafana-dashboard branch from 89a9b74 to 39577cb Compare December 6, 2023 00:24
jbasu01 and others added 6 commits December 5, 2023 19:25
This will allow cluster-admins, nerc-org-admins, and nerc-ops teams to
develop new dashboards using the existing multi-cluster observability
metrics.

rh-pre-commit.version: 2.0.3
rh-pre-commit.check-secrets: ENABLED
We're reusing the dex configuration for logging-grafana, but we had not
updated the list of valid redirect urls in Dex, so oauth logins were
failing.
Previously we were storing a service account token in the vault and
retrieving it via an ExternalSecret. This is not necessary; Kubernetes
already has the ability to populate a Secret with the token for a service
account [1].

[1]: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#create-token
This commit removes the patch-operator [1] based patches and accompanying
support resources such as service accounts, RBAC, etc.

[1]: https://github.com/redhat-cop/patch-operator
We're not using the grafana-config-overrides ConfigMap.
Grafana supports setting configuration from environment variables [1]. This
commit renames the secret key in the oauth-client-secret resource to match
the required variable name pattern so that Grafana will get the oauth
client secret from the environment.

[1]: https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#override-configuration-with-environment-variables
Grafana supports environment variable substitution in data sources [1].
This commit modifies the observability-metrics datasource to get the
service account token and CA certificate from the environment, rather than
using a live patch.

[1]: https://grafana.com/docs/grafana/latest/administration/provisioning/#using-environment-variables
The server root_url and the oauth client id are not dynamic nor are they
secret, so we can set them statically rather than patching them.
This commit modifies the Grafana resource to read environment variables
from the oauth-client-secret and grafana-serviceaccount-token Secrets and
the openshift-service-ca.crt ConfigMap.
This commit has some minor formatting changes that didn't
really fit in anywhere else.
@larsks larsks force-pushed the custom-grafana-dashboard branch from 39577cb to 968d127 Compare December 6, 2023 00:25
Copy link
Member

@computate computate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsks Nice work with the secretGenerator, ExternalSecrets, and environment variable substitutions, thanks!

grafana/base/grafanadatasources/grafanadatasource.yaml Outdated Show resolved Hide resolved
grafana/base/grafanas/grafana.yaml Outdated Show resolved Hide resolved
grafana/base/routes/route.yaml Outdated Show resolved Hide resolved
grafana/base/serviceaccounts/serviceaccount.yaml Outdated Show resolved Hide resolved
grafana/base/kustomization.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@schwesig schwesig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsks awesome work, thanks for having these ideas/digging them up. appreciated.
Also thanks for getting me all the details in our call.

@computate computate force-pushed the custom-grafana-dashboard branch from 960850c to ca68014 Compare December 6, 2023 21:26
@schwesig schwesig merged commit 7a3d2d5 into OCP-on-NERC:main Dec 6, 2023
2 checks passed
@jbasu01 jbasu01 deleted the custom-grafana-dashboard branch January 23, 2024 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants