
Filebeat does not handle the OS open file limit correctly #34763

Closed
rdner opened this issue Mar 7, 2023 · 2 comments
Labels: bug, Filebeat, Stalled, Team:Elastic-Agent-Data-Plane

Comments

@rdner (Member) commented Mar 7, 2023

I tried to ingest 250 files (1000 lines each) on macOS 13.2.1 with the following configuration:

First run: log input

filebeat.inputs:
  - type: log
    paths:
      - "{{ROOT}}/logs/file*.log"
path.data: "{{ROOT}}/data-log"
# logging:
#   level: debug
output.file:
  rotate_every_kb: 10485760 # 10GB
  path: "{{ROOT}}/out/log"
  filename: output.json

Second run: filestream input

filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    enabled: true
    paths:
      - "{{ROOT}}/logs/file*.log"
path.data: "{{ROOT}}/data-fs"
output.file:
  rotate_every_kb: 10485760 # 10GB
  path: "{{ROOT}}/out/fs"
  filename: output.json
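
For reproducibility, the input corpus can be generated with a small program along these lines (a sketch; the exact file names and line contents are assumptions based on the glob patterns above and the log output below):

// gen_logs.go — a minimal sketch that generates the test corpus:
// 250 files of 1000 lines each under {{ROOT}}/logs (paths assumed).
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	root := os.Getenv("ROOT") // stands in for {{ROOT}} in the configs
	dir := filepath.Join(root, "logs")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		panic(err)
	}
	for i := 0; i < 250; i++ {
		f, err := os.Create(filepath.Join(dir, fmt.Sprintf("file%d.log", i)))
		if err != nil {
			panic(err)
		}
		for line := 0; line < 1000; line++ {
			fmt.Fprintf(f, "file %d line %d\n", i, line)
		}
		f.Close()
	}
}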

I found that with this file count neither the log input nor the filestream input can finish ingesting the files. However, they fail in different ways:

The log input starts logging the following messages and drops the affected files completely, which leads to data loss:

{
  "log.level": "error",
  "@timestamp": "2023-03-06T11:18:45.840+0100",
  "log.logger": "input",
  "log.origin": {
    "file.name": "log/input.go",
    "file.line": 556
  },
  "message": "Harvester could not be started on new file: /Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file777.log, Err: error setting up harvester: Harvester setup failed. Unexpected file opening error: Failed opening /Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file777.log: open /Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file777.log: too many open files",
  "service.name": "filebeat",
  "input_id": "d9d6ec66-5381-4f4c-adba-e5221ed3ad6c",
  "source_file": "/Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file777.log",
  "state_id": "native::102548165-16777230",
  "finished": false,
  "os_id": "102548165-16777230",
  "ecs.version": "1.6.0"
}

The filestream input logs this error:

{
  "log.level": "error",
  "@timestamp": "2023-03-06T15:37:54.647+0100",
  "log.logger": "input.filestream",
  "log.origin": {
    "file.name": "filestream/input.go",
    "file.line": 140
  },
  "message": "File could not be opened for reading: failed opening /Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file58.log: open /Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file58.log: too many open files",
  "service.name": "filebeat",
  "id": "my-filestream-id",
  "source_file": "filestream::my-filestream-id::native::102589522-16777232",
  "path": "/Users/rdner/Projects/es_confs/load-log-vs-fs/logs/file58.log",
  "state-id": "native::102589522-16777232",
  "ecs.version": "1.6.0"
}

and it gets stuck in a loop of re-ingesting data that has already been sent, which leads to data duplication. I believe filestream would not recover from this unless restarted and would keep re-sending all the data again and again. I left it running for 30 minutes on my machine and it duplicated the same messages thousands of times.

This happens only when the number of files being ingested exceeds the OS open file limit, which is only 256 by default on macOS.
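
For anyone reproducing this, the limit the Filebeat process actually inherits can be checked programmatically; a minimal Go sketch using only the standard library:

// rlimit_check.go — prints the process's open file limit, which is
// what Filebeat is bound by (soft limit of 256 by default on macOS).
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	fmt.Printf("open file limit: soft=%d hard=%d\n", rl.Cur, rl.Max)
}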

The expected behaviour would be for Filebeat to queue the files, perhaps ordered by modification date. To do that, we would have to periodically close files we have already ingested so the open file count stays below the limit.
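
A hypothetical sketch of this queueing idea (not Filebeat's actual code; the names and structure here are my own): sort candidate files oldest-first by modification time, and bound the number of simultaneously open files with a counting semaphore kept well below the OS limit:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"sync"
	"time"
)

type candidate struct {
	path string
	mod  time.Time
}

func main() {
	paths, err := filepath.Glob(filepath.Join(os.Getenv("ROOT"), "logs", "file*.log"))
	if err != nil {
		panic(err)
	}

	// Stat once up front and queue oldest-first, as suggested above.
	var queue []candidate
	for _, p := range paths {
		if fi, err := os.Stat(p); err == nil {
			queue = append(queue, candidate{p, fi.ModTime()})
		}
	}
	sort.Slice(queue, func(i, j int) bool { return queue[i].mod.Before(queue[j].mod) })

	sem := make(chan struct{}, 100) // cap open files well under the 256 default
	var wg sync.WaitGroup
	for _, c := range queue {
		wg.Add(1)
		sem <- struct{}{} // blocks until an open-file slot frees up
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }() // slot is released only after Close
			f, err := os.Open(path)
			if err != nil {
				fmt.Fprintln(os.Stderr, err)
				return
			}
			defer f.Close()
			// ... read/harvest the file here ...
		}(c.path)
	}
	wg.Wait()
}

For what it's worth, the log input already has a harvester_limit option that caps concurrent harvesters per input (it defaults to 0, i.e. unlimited), and as far as I can tell filestream has an equivalent setting; the missing piece is that nothing derives a safe value from the OS limit automatically, so users hit this failure mode by default.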

It's not trivial to come up with the right strategy, but we need to handle this OS limit better than we do now.

rdner added the bug and Team:Elastic-Agent-Data-Plane labels on Mar 7, 2023
@elasticmachine (Collaborator) commented

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

rdner added the Filebeat label on Mar 7, 2023
@botelastic (bot) commented Mar 6, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

botelastic bot added the Stalled label on Mar 6, 2024
botelastic bot closed this as completed on Sep 2, 2024