
Add watcher for Kernel Events via ebpf #37833

Merged (5 commits) on Feb 2, 2024


@mjwolf (Contributor) commented Feb 1, 2024

Add a global watcher which can be used by clients to receive kernel events via ebpf.

Proposed commit message

This adds a watcher that will watch for Linux kernel events, using ebpf via the ebpfevents library, and send the events to subscribed clients.

By using a single global watcher, multiple clients can subscribe to and receive kernel events without increasing the kernel resources used (e.g. without having multiple ebpf probes/maps).
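A minimal sketch of the fan-out pattern described above, assuming a hypothetical `Event` type and a plain Go channel as the single event source (the real watcher reads records from the ebpfevents loader instead). One goroutine drains the source and copies each event to every subscriber, so adding clients adds no kernel-side resources. Dropping an event for a subscriber whose buffer is full is an assumption of this sketch, not necessarily what the PR does.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a kernel event record.
type Event struct {
	PID  int
	Type string
}

// Watcher fans events from one source out to any number of subscribers.
type Watcher struct {
	mu   sync.Mutex
	subs map[string]chan Event
}

func NewWatcher() *Watcher {
	return &Watcher{subs: make(map[string]chan Event)}
}

// Subscribe registers a client under name and returns a buffered channel of
// events. Re-subscribing under the same name closes the old channel.
func (w *Watcher) Subscribe(name string, buf int) <-chan Event {
	w.mu.Lock()
	defer w.mu.Unlock()
	if old, ok := w.subs[name]; ok {
		close(old)
	}
	ch := make(chan Event, buf)
	w.subs[name] = ch
	return ch
}

// Unsubscribe removes a client and closes its channel.
func (w *Watcher) Unsubscribe(name string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if ch, ok := w.subs[name]; ok {
		close(ch)
		delete(w.subs, name)
	}
}

// Run drains source until it is closed, copying each event to every
// subscriber. A subscriber whose buffer is full misses the event rather
// than stalling the others.
func (w *Watcher) Run(source <-chan Event) {
	for ev := range source {
		w.mu.Lock()
		for _, ch := range w.subs {
			select {
			case ch <- ev:
			default: // buffer full: drop for this subscriber only
			}
		}
		w.mu.Unlock()
	}
}

func main() {
	w := NewWatcher()
	a := w.Subscribe("a", 4)
	b := w.Subscribe("b", 4)

	source := make(chan Event, 2)
	source <- Event{PID: 1, Type: "exec"}
	source <- Event{PID: 2, Type: "exit"}
	close(source)
	w.Run(source) // returns once source is closed and drained

	fmt.Println(len(a), len(b)) // 2 2
}
```

Both subscribers receive both events even though only one source (in the real watcher, one set of probes/maps) exists.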

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Both of these PRs will use this watcher. This code was originally part of both PRs, but since that code is diverging, this PR was created to keep the watcher on a single branch. The PRs will be updated to use this code once it's merged.

@mjwolf requested review from a team as code owners February 1, 2024 21:10
@botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Feb 1, 2024
@mjwolf requested review from mmat11 and efd6 February 1, 2024 21:11
@mergify bot assigned mjwolf Feb 1, 2024
mergify bot (Contributor) commented Feb 1, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR, @mjwolf? 🙏
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, add the backport labels for the needed branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch, where \d is the minor-version digit.

```go
if gWatcher.status == stopped {
	l, err := ebpfevents.NewLoader()
	if err != nil {
		gWatcherErr = fmt.Errorf("init ebpf loader: %w", err)
		// ... (remainder of the diff hunk not shown)
	}
}
```
Contributor

Is it necessary to use a global for this? If it needs to be retained between calls (though why? I don't see it being accessed anywhere), it can be kept in the Watcher.

Contributor Author

I've moved err into the Watcher struct.

This is needed so that subsequent callers of GetWatcher get the same error as the first call. The most likely cause of the error is that the system doesn't support ebpf, so there's no point retrying, but the other callers should still get the error. So the error needs to be kept in a global. But if you know a better way to do this, I can change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, OK. Keeping it in the struct is the conventional approach to make errors sticky. Then you can check the err field on method entry and return the error if it is non-nil. You can also have an Err() error method to return the error if you think it will be helpful (probably not here).

Because concurrency is so prevalent in Go, globals are generally avoided; where one can't be avoided, try to give the singleton only one root (having the error as an associated global essentially makes the singleton doubly rooted).
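One way to follow this advice, sketched under assumptions: the `sync.Once` and the sticky error are both fields of a single `Watcher` value, so the singleton has exactly one package-level root, and every caller of `GetWatcher` (concurrent or later) receives the same initialization error without a retry. `newLoader` is a hypothetical stand-in for loader construction that always fails here, and the `loads` counter exists only to make the run-once behavior observable.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Watcher is the singleton; the Once and the sticky error are fields, not globals.
type Watcher struct {
	once  sync.Once
	err   error
	loads int // demo only: how many times load ran
}

// gWatcher is the only package-level root of the singleton.
var gWatcher Watcher

// newLoader is a hypothetical stand-in for loader construction; it always fails here.
func newLoader() error { return errors.New("ebpf not supported") }

// load performs one-time initialization and records any failure as sticky.
func (w *Watcher) load() {
	w.loads++
	if err := newLoader(); err != nil {
		w.err = fmt.Errorf("init ebpf loader: %w", err)
	}
}

// GetWatcher initializes the watcher at most once and returns its sticky
// error to every caller.
func GetWatcher() (*Watcher, error) {
	gWatcher.once.Do(gWatcher.load)
	if gWatcher.err != nil {
		return nil, gWatcher.err
	}
	return &gWatcher, nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, err := GetWatcher()
			fmt.Println(err)
		}()
	}
	wg.Wait()
	fmt.Println("load ran", gWatcher.loads, "time(s)")
}
```

`sync.Once` guarantees that `load` completes before any `Do` call returns, so reading `gWatcher.err` afterwards needs no extra locking.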


```go
var (
	gWatcherOnce sync.Once
	gWatcherErr  error
	// ...
)
```
Contributor

Why are we using a global error?

```go
ctx, gWatcher.cancel = context.WithCancel(context.Background())

go gWatcher.loader.EventLoop(ctx, records)
go func(ctx context.Context) {
	// ...
}(ctx)
```
Contributor

You can close over ctx, though this is a matter of personal style.

@mjwolf added the Team:Security-Linux Platform label (Linux Platform Team in Security Solution) Feb 1, 2024
@elasticmachine
Collaborator

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

@botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Feb 1, 2024
@mjwolf added the backport-skip label (Skip notification from the automated backport with mergify) Feb 1, 2024
@elasticmachine
Collaborator

💚 Build Succeeded

Build stats

  • Duration: 173 min 59 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

- Remove global error
- Move methods onto struct
- Remove unneeded seccomp policy on arm
@mjwolf requested a review from efd6 February 2, 2024 00:55
@elasticmachine
Collaborator

❕ Build Aborted

A new build is in progress, so the previous in-progress builds have been aborted.

Build stats

  • Start Time: 2024-02-02T00:54:57.164+0000

  • Duration: 6 min 50 sec

Steps errors 1

Error signal
  • Took 0 min 0 sec
  • Description: tar: step failed with error null

@elasticmachine
Collaborator

❕ Build Aborted

Either there was a build timeout or someone aborted the build.

Build stats

  • Duration: 64 min 35 sec

@elasticmachine
Collaborator

💚 Build Succeeded

Build stats

  • Start Time: 2024-02-02T00:55:58.726+0000

  • Duration: 163 min 45 sec

Test stats 🧪

Test Results
  • Failed: 0
  • Passed: 28780
  • Skipped: 2016
  • Total: 30796

💚 Flaky test report

Tests succeeded.

@pierrehilbert added the Team:Elastic-Agent label (Label for the Agent team) Feb 2, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine
Collaborator

💔 Build Failed

Failed CI Steps

cc @mjwolf

@elasticmachine
Collaborator

💚 Build Succeeded

Build stats

  • Start Time: 2024-02-02T16:02:19.483+0000

  • Duration: 164 min 32 sec

Test stats 🧪

Test Results
  • Failed: 0
  • Passed: 28780
  • Skipped: 2016
  • Total: 30796

💚 Flaky test report

Tests succeeded.

@mjwolf merged commit 4567803 into elastic:main Feb 2, 2024
128 of 129 checks passed
@mjwolf deleted the ebpf_event_watcher branch February 2, 2024 19:34
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
Labels
  • backport-skip (Skip notification from the automated backport with mergify)
  • enhancement
  • Team:Elastic-Agent (Label for the Agent team)
  • Team:Security-Linux Platform (Linux Platform Team in Security Solution)

6 participants