Metricbeat 8.2.3+ does not report metrics for root-owned processes when running as non-root #37135

Open
AndersonQ opened this issue Nov 15, 2023 · 1 comment
Labels
bug Stalled Team:Elastic-Agent Label for the Agent team

Comments

@AndersonQ
Member

Up to v8.1.3, metricbeat running as non-root would still collect some metrics from processes owned by root. In later versions (at least since 8.2.3) this is no longer the case: processes owned by root are not reported at all.

Technically speaking, running metricbeat as non-root isn't officially supported, because some metrics require root permissions to collect. Nevertheless, versions up to v8.1.3 would still report process metrics for root-owned processes, even though they would not be the complete set of metrics metricbeat can collect from a process.

Currently our quick start guide instructs users to run metricbeat as root.

For confirmed bugs, please report:

  • Version: 8.2.3 (most likely since 8.2.0)
  • Operating System: Linux, not tested on others
  • Discuss Forum URL: N/A
  • Steps to Reproduce:
    • run metricbeat 8.1.3 or earlier as non-root
    • use the system module (see config below)
    • wait for at least one "Non-zero metrics in the last 30s" log line to appear
    • stop metricbeat
    • run metricbeat 8.2.3 or later as non-root
    • use the system module (see config below)
    • wait for at least one "Non-zero metrics in the last 30s" log line to appear
    • compare the process metrics collected. You'll notice v8.2.3 did not collect any metrics for root-owned processes
system module config:
- module: system
 period: 10s
 metricsets:
   - cpu
   - load
   - memory
   - network
   - process_summary
   - socket_summary
   - core
   - diskio
 core.metrics: [percentages, ticks]

- module: system
 period: 1m
 metricsets:
   - filesystem
   - fsstat
 processors:
 - drop_event.when.regexp:
     system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)'

- module: system
 period: 15m
 metricsets:
   - uptime

- module: system
 period: 15m
 metricsets:
   - process
 processes: ['.*']
 process.include_top_n:
   by_cpu: 10      # include top 10 processes by CPU
   by_memory: 10   # include top 10 processes by memory

Investigation

The way system process metrics are collected has been refactored and moved to https://github.com/elastic/elastic-agent-system-metrics.
To simply restore the previous behaviour, the errors related to access permissions can be ignored, as I started doing here. However, it might not be the best solution, because it would leave the user with no explanation of why some metrics are absent or zero.

Most likely a better solution is one of the following: make the metric collection "permission aware", so it won't even try to collect metrics for which it does not have access rights, and document that properly; or at least handle the errors in a way that makes it clear to the user that the collected metrics are incomplete because metricbeat does not have permission to collect the full set.

@botelastic

botelastic bot commented Nov 14, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Nov 14, 2024