
add state_horizontalpodautoscaler metricset #42392

Draft · wants to merge 13 commits into main
Conversation

endorama
Member

@endorama endorama commented Jan 22, 2025

Proposed commit message

Introduce the state_horizontalpodautoscaler metricset to retrieve metrics for Horizontal Pod Autoscaler from kube-state-metrics.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 22, 2025
Contributor

mergify bot commented Jan 22, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @endorama? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Jan 22, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 22, 2025
@endorama endorama added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Jan 22, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 22, 2025
@endorama
Member Author

endorama commented Feb 4, 2025

I've been able to run it successfully to collect the new metrics, see screenshot:
Screenshot 2025-02-04 at 12 42 13

@endorama
Member Author

endorama commented Feb 4, 2025

While running it I faced an unexpected error: Error starting the watchers: resource for name horizontalpodautoscaler does not exist. Watcher cannot be created.

This is the relevant log line:

{"log.level":"error","@timestamp":"2025-02-04T11:31:01.447Z","log.logger":"kubernetes","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util.NewResourceMetadataEnricher","file.name":"util/kubernetes.go","file.line":739},"message":"Error starting the watchers: resource for name horizontalpodautoscaler does not exist. Watcher cannot be created","service.name":"metricbeat","ecs.version":"1.6.0"}

It happened because I enabled add_metadata in the metricbeat configuration.

@@ -0,0 +1 @@
null
Contributor

I believe you should also add the HPA metrics in ksm.v2.10.0.plain so that this file is not null. In a later PR we can remove the support for ksm 2.10.0
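For reference, kube-state-metrics serves its metrics in the Prometheus text exposition format, so the lines added to the .plain fixture would look roughly like the following sketch (the metric names are from the kube-state-metrics docs; the namespace and HPA name are hypothetical):

```text
# TYPE kube_horizontalpodautoscaler_spec_max_replicas gauge
kube_horizontalpodautoscaler_spec_max_replicas{namespace="default",horizontalpodautoscaler="example-hpa"} 5
# TYPE kube_horizontalpodautoscaler_spec_min_replicas gauge
kube_horizontalpodautoscaler_spec_min_replicas{namespace="default",horizontalpodautoscaler="example-hpa"} 1
# TYPE kube_horizontalpodautoscaler_status_current_replicas gauge
kube_horizontalpodautoscaler_status_current_replicas{namespace="default",horizontalpodautoscaler="example-hpa"} 1
# TYPE kube_horizontalpodautoscaler_status_desired_replicas gauge
kube_horizontalpodautoscaler_status_desired_replicas{namespace="default",horizontalpodautoscaler="example-hpa"} 1
```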

Contributor

Correct, to align with what we have in other metricsets.

Member Author

added in 47ae556

@MichaelKatsoulis
Contributor

MichaelKatsoulis commented Feb 4, 2025

I believe the tests are failing because, in the integration tests, kind is used to create the Kubernetes clusters, and with kind you don't get HPA metrics.

kind create cluster --image kindest/node:${K8S_VERSION}

@MichaelKatsoulis
Contributor

@elastic/elastic-agent-data-plane could someone also review this PR?

"kube_horizontalpodautoscaler_status_condition": p.LabelMetric("status.condition", "condition", p.OpLowercaseValue()),
},

Labels: map[string]p.LabelMap{
Contributor

Do you think we should include kube_horizontalpodautoscaler_labels?

Do you happen to have some info from a cluster for that?

Member Author

Do you think shall we include kube_horizontalpodautoscaler_labels ?

I think this poses more risks than benefits. It's easy to use this to increase cardinality and create additional issues, and I don't see a use case for it at the moment. In beats we already enrich this with the various metadata providers; I think we should postpone this until a use case appears to justify it.


// mapping stores the state metrics we want to fetch and will be used by this metricset
var mapping = &p.MetricsMapping{
Metrics: map[string]p.MetricMap{
Contributor

From the list here I see that kube_horizontalpodautoscaler_spec_target_metric (although experimental) might also be interesting.

WDYT? @MichaelKatsoulis

Contributor

I am not 100% sure what this metric means. But why not add it? Maybe @endorama, whose team actually uses these metrics, could argue its importance.

Member Author

This metric represents the target metric utilization that the HPA uses in its calculation. In our example the HPA was set to trigger on 30% CPU utilization:

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30

Definitely a useful metric for understanding why an HPA behaves in a certain way, but I would not consider it critical and we have not planned to use it.
I'm also not familiar with policies around including experimental metrics, and I would favor keeping the maintenance cost low unless we have a use case that requires it.
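For the spec above, a sketch of what kube-state-metrics would emit for this metric, assuming the label set documented for kube_horizontalpodautoscaler_spec_target_metric (the namespace and HPA name here are hypothetical):

```text
kube_horizontalpodautoscaler_spec_target_metric{namespace="default",horizontalpodautoscaler="example-hpa",metric_name="cpu",metric_target_type="utilization"} 30
```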

@gizas
Contributor

gizas commented Feb 4, 2025

While running it I faced an unexpected error: Error starting the watchers: resource for name horizontalpodautoscaler does not exist. Watcher cannot be created.

This is the relevant log line:

{"log.level":"error","@timestamp":"2025-02-04T11:31:01.447Z","log.logger":"kubernetes","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util.NewResourceMetadataEnricher","file.name":"util/kubernetes.go","file.line":739},"message":"Error starting the watchers: resource for name horizontalpodautoscaler does not exist. Watcher cannot be created","service.name":"metricbeat","ecs.version":"1.6.0"}

It happened because I enabled add_metadata in the metricbeat configuration.

You will need to update the list as here https://github.com/elastic/beats/pull/37332/files#diff-1d7b17f1ef239e6177db5ee660f2234bfd69c82e93494d4e46ee97f8a8617cfdR183

We have one watcher per metricset, as described here.
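To illustrate why the error above occurs, here is a hypothetical, self-contained sketch (not the actual beats code) of the pattern: watchers can only be created for resource names registered in a lookup table, so a missing "horizontalpodautoscaler" entry produces the "resource for name ... does not exist" failure until the list is updated.

```go
package main

import "fmt"

// supportedResources mimics the registry of resource names the
// metadata enricher knows how to watch. The entries here are
// illustrative, not the real beats list.
var supportedResources = map[string]bool{
	"pod":        true,
	"deployment": true,
	// "horizontalpodautoscaler": true, // must be registered for add_metadata to work
}

// startWatcher fails for any resource name that is not registered,
// mirroring the error seen in the metricbeat logs.
func startWatcher(resourceName string) error {
	if !supportedResources[resourceName] {
		return fmt.Errorf("resource for name %s does not exist. Watcher cannot be created", resourceName)
	}
	return nil
}

func main() {
	if err := startWatcher("horizontalpodautoscaler"); err != nil {
		fmt.Println(err)
	}
}
```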

@endorama
Member Author

endorama commented Feb 4, 2025

I believe the tests are failing because, in the integration tests, kind is used to create the Kubernetes clusters, and with kind you don't get HPA metrics.

@MichaelKatsoulis should I disable integration tests for this metricset? I'm not sure how to work around/solve this issue.
I understood it wrong: the HPA is not configured, so there are no metrics. I'll look into it.

@@ -253,6 +254,12 @@ func getExtraWatchers(resourceName string, addResourceMetadata *metadata.AddReso
return []string{}
case NamespaceResource:
return []string{}
case HorizontalPodAutoscalerResource:
Contributor

Correct. You can add namespace metadata to the horizontal pod autoscaler. But you should add a HorizontalPodAutoscalerResource block in the getEventMetadataFunc

Member Author

But I didn't find a HorizontalPodAutoscaler type in https://github.com/elastic/elastic-agent-autodiscover/blob/c0ab8b63bf5ac1e926b9a997bfa4ca272a3b4af3/kubernetes/types.go, should it be added there first?

It's not clear to me what the advantage of adding namespace metadata to the HPA resource with this code change is. Namespace information for the metric is already retrieved in

"namespace": p.KeyLabel(mb.ModuleDataKey + ".namespace"),

Contributor

Somehow related to this #42392 (comment)

With the addition of namespace metadata, you will get the labels/annotations from the namespace, not only the name.

I understand that HPA metadata might not be needed, at least not in your scenarios right now.

It's not clear to me what the advantage of adding namespace metadata to the HPA resource is

There are scenarios where administrators search by labels to find what is installed in their cluster, and without this they will lose the HPA info.

The important thing is that we need to make sure we don't break the existing metadata enrichment, e.g. in deployments where an HPA is attached.

Member Author

With the addition of namespace metadata, you will get the labels/annotations from the namespace, not only the name.

Thanks for the clarification, so it would pull any label on the namespace and attach it to the HPA related metrics. I think this is useful to help with discoverability.

I understand that HPA metadata might not be needed, at least not in your scenarios right now.

No, in our case I don't think we will use it, but I think this makes a great addition that we should provide if add_metadata is set to true. That's the behaviour I'd expect.

The important thing is that we need to make sure we don't break the existing metadata enrichment, e.g. in deployments where an HPA is attached.

I'll test it with add_metadata, but I remember I saw a related error in previous tests.

To clarify on how to solve this:

I didn't find a HorizontalPodAutoscaler type in elastic/elastic-agent-autodiscover@c0ab8b6/kubernetes/types.go

Should I open a PR in that library to add

type HorizontalPodAutoscaler = autoscaling.HorizontalPodAutoscaler // autoscaling/v1 API

(Go docs)

and then

		case *kubernetes.HorizontalPodAutoscaler:
			return map[string]mapstr.M{id: generalMetaGen.Generate(HorizontalPodAutoscalerResource, r)}

in getEventMetadataFunc()?

@MichaelKatsoulis
Contributor

@MichaelKatsoulis should I disable integration tests for this metricset? I'm not sure how to work around/solve this issue. I understood it wrong, the HPA is not configured so there are no metrics. I'll look into it.

I believe I know what is needed. You need to edit https://github.com/elastic/beats/blob/main/metricbeat/module/kubernetes/kubernetes.yml and add a horizontalpodautoscaler resource. The resources defined there are deployed during the integration tests.
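A minimal manifest for that could be an autoscaling/v2 HorizontalPodAutoscaler appended to kubernetes.yml. This is only a sketch: the names here are hypothetical, and the scaleTargetRef must point at a Deployment the test setup actually creates:

```yaml
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa            # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment   # must match a Deployment deployed by the tests
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```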

Labels
backport-8.x Automated backport to the 8.x branch with mergify Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team