[Filestream] migrate state from previous ID to current #42624

Draft
wants to merge 18 commits into
base: main

belimawr
Contributor

@belimawr belimawr commented Feb 7, 2025

Proposed commit message

This commit enables Filestream inputs to migrate file states from previous IDs to their current ID. This is done by adding a previous_ids entry to the input configuration. We look in the store for all states that match an active file under one of the previous IDs and migrate each such state to the new ID. The migrated old states are marked for removal from the store. States are only migrated if they have the same file identity.
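
The migration described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `state` type, `keyFor`, and `migrate` are hypothetical names, the store is reduced to a plain map, and the only thing taken from the PR is the registry key layout visible in its logs (`filestream::<input ID>::<identity name>::<identity value>`). Skipping files that already have state under the new ID is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// state is a hypothetical, simplified registry entry.
type state struct {
	Offset  int64
	Removed bool // marked for removal; a later cleanup pass would delete it
}

// keyFor builds a registry key in the layout seen in the PR's logs:
// filestream::<input ID>::<identity name>::<identity value>
func keyFor(inputID, identityName, identityValue string) string {
	return strings.Join([]string{"filestream", inputID, identityName, identityValue}, "::")
}

// migrate moves the state of every active file from one of previousIDs to
// newID and returns how many states were migrated. Keys are always built with
// the current identity name, so entries stored under a different file
// identity never match and are left untouched.
func migrate(store map[string]state, newID string, previousIDs []string, identityName string, activeFiles []string) int {
	migrated := 0
	for _, file := range activeFiles {
		newKey := keyFor(newID, identityName, file)
		if _, exists := store[newKey]; exists {
			continue // assumption of this sketch: never overwrite existing state
		}
		for _, oldID := range previousIDs {
			oldKey := keyFor(oldID, identityName, file)
			old, ok := store[oldKey]
			if !ok || old.Removed {
				continue
			}
			store[newKey] = state{Offset: old.Offset}
			old.Removed = true
			store[oldKey] = old
			migrated++
			break
		}
	}
	return migrated
}

func main() {
	store := map[string]state{
		keyFor("first-id", "fingerprint", "6fb3cb6c"): {Offset: 309},
	}
	n := migrate(store, "second-id", []string{"first-id"}, "fingerprint", []string{"6fb3cb6c"})
	fmt.Println("migrated states:", n)
	for key, st := range store {
		fmt.Printf("%s offset=%d removed=%t\n", key, st.Offset, st.Removed)
	}
}
```

Because the lookup key includes the identity name, a state written with a different file identity is simply not found, which mirrors the rule that states are only migrated when the file identity is the same.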

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • Test changing file identity and ID at the same time.
  • Investigate any possible side effects of migrating ID + file identity at the same time.
  • Ensure the integration tests pass on Linux, Mac and Windows

How to test this PR locally

Run the tests (on all platforms: Linux, Mac and Windows)

cd filebeat
mage buildSystemTestBinary
go test -tags=integration -run=TestFilestreamIDMigration -v -count=1 ./tests/integration

Manual test

  1. Create a file larger than 1 KB: docker run -it --rm mingrammer/flog -f rfc5424 -n 15 > /tmp/flog.log

  2. Run Filebeat with the following configuration:

    filebeat.yml

    filebeat.inputs:
      - type: filestream
        id: "first-id"
        paths:
          - /tmp/flog.log
    
    queue.mem:
      flush.timeout: 0s
    
    output.file:
      path: ${path.home}
      filename: "output-file"
      rotate_on_startup: false
    
    filebeat.registry:
      cleanup_interval: 5s
      flush: 1s
    
    logging:
      level: debug
      selectors:
        - input
        - input.filestream
        - input.filestream.prospector
      metrics:
        enabled: false
    

  3. Wait until the file is fully ingested (15 lines in the output file)

  4. Stop Filebeat

  5. Optionally, remove the logs to make the next steps easier: rm -rf logs

  6. Run Filebeat with the following configuration (note the change in id and the new previous_ids):

    filebeat.yml

    filebeat.inputs:
      - type: filestream
        id: "second-id"
        paths:
          - /tmp/flog.log
        previous_ids:
          - "first-id"
    
    queue.mem:
      flush.timeout: 0s
    
    output.file:
      path: ${path.home}
      filename: "output-file"
      rotate_on_startup: false
    
    filebeat.registry:
      cleanup_interval: 5s
      flush: 1s
    
    logging:
      level: debug
      selectors:
        - input
        - input.filestream
        - input.filestream.prospector
      metrics:
        enabled: false
    

  7. Wait until Filebeat has read the file to the end: look for "End of file reached: /tmp/flog.log; Backoff now." in the logs of the second execution.

  8. Ensure no new data was added to the output file

  9. You can also look for log entries like:

    Migrating input ID: 'filestream::first-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be' -> 'filestream::second-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be'
    migrated entry in registry from 'filestream::first-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be' to 'filestream::second-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be'. Cursor: map[offset:309]
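
Steps 3 and 8 both come down to counting lines in the output file. A small helper like the one below can do that; it is only a sketch, it assumes the file output writes one event per line, and the output file path is passed as an argument because the exact file name is not given here. `countLines` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// countLines returns the number of lines in data. A final line without a
// trailing newline is still counted.
func countLines(data []byte) int {
	if len(data) == 0 {
		return 0
	}
	n := bytes.Count(data, []byte{'\n'})
	if data[len(data)-1] != '\n' {
		n++
	}
	return n
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: countlines <output file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("could not read output file:", err)
		return
	}
	fmt.Printf("%d events\n", countLines(data))
}
```

Running it before and after the second Filebeat execution should report the same count if the state was migrated and nothing was re-ingested.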
    

Related issues


Logs

{
  "log.level": "info",
  "@timestamp": "2025-02-07T15:12:34.224-0500",
  "log.logger": "input.filestream.prospector",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).Init.func4",
    "file.name": "filestream/prospector.go",
    "file.line": 214
  },
  "message": "Migrating input ID: 'filestream::first-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be' -> 'filestream::second-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be'",
  "service.name": "filebeat",
  "filestream_id": "second-id",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-02-07T15:12:34.224-0500",
  "log.logger": "input",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*sourceStore).updateIdentifiers",
    "file.name": "input-logfile/store.go",
    "file.line": 273
  },
  "message": "migrated entry in registry from 'filestream::first-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be' to 'filestream::second-id::fingerprint::6fb3cb6c565bdba1354f64a42dd47ef937964019400dd571f25c2cd13a9fb5be'. Cursor: map[offset:309]",
  "service.name": "filebeat",
  "input_type": "filestream",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-02-07T15:12:34.224-0500",
  "log.logger": "input.filestream.prospector",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).Init.func4",
    "file.name": "filestream/prospector.go",
    "file.line": 214
  },
  "message": "Migrating input ID: 'filestream::first-id::fingerprint::db8399294e69089070405b13d4f057672f3852fa8e0f56ce4b6c92398aef1b6a' -> 'filestream::second-id::fingerprint::db8399294e69089070405b13d4f057672f3852fa8e0f56ce4b6c92398aef1b6a'",
  "service.name": "filebeat",
  "filestream_id": "second-id",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-02-07T15:12:34.224-0500",
  "log.logger": "input",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*sourceStore).updateIdentifiers",
    "file.name": "input-logfile/store.go",
    "file.line": 273
  },
  "message": "migrated entry in registry from 'filestream::first-id::fingerprint::db8399294e69089070405b13d4f057672f3852fa8e0f56ce4b6c92398aef1b6a' to 'filestream::second-id::fingerprint::db8399294e69089070405b13d4f057672f3852fa8e0f56ce4b6c92398aef1b6a'. Cursor: map[offset:296]",
  "service.name": "filebeat",
  "input_type": "filestream",
  "ecs.version": "1.6.0"
}
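
To pick the migration entries out of a larger log, the JSON objects above can be decoded as a stream. The following is a sketch only: `logEntry` and `migrationMessages` are hypothetical names, and the filter relies on the message prefix shown in the logs above rather than on any documented contract.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// logEntry holds the few fields of a Filebeat JSON log entry used here.
type logEntry struct {
	Level   string `json:"log.level"`
	Logger  string `json:"log.logger"`
	Message string `json:"message"`
}

// migrationMessages decodes a stream of concatenated JSON objects and returns
// the messages that report a migrated registry entry.
func migrationMessages(raw string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	var out []string
	for {
		var entry logEntry
		err := dec.Decode(&entry)
		if errors.Is(err, io.EOF) {
			return out, nil // clean end of input
		}
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(entry.Message, "migrated entry in registry") {
			out = append(out, entry.Message)
		}
	}
}

func main() {
	raw := `{"log.level":"info","log.logger":"input","message":"migrated entry in registry from 'old' to 'new'. Cursor: map[offset:309]"}
{"log.level":"info","log.logger":"input.filestream.prospector","message":"Migrating input ID: 'old' -> 'new'"}`
	msgs, err := migrationMessages(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```

`json.Decoder` accepts both one-object-per-line and pretty-printed input, so the same function works on the raw log file and on the formatted excerpt above.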

This commit enables Filestream inputs to migrate file states from
previous IDs to their current ID. This is done by adding a
`previous_ids` entry to the input configuration. We look in the store
for all states that match an active file under one of the previous IDs
and migrate each such state to the new ID. The migrated old states are
marked for removal from the store.
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 7, 2025
@belimawr belimawr self-assigned this Feb 7, 2025
@belimawr belimawr added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-skip Skip notification from the automated backport with mergify and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 7, 2025
@belimawr belimawr marked this pull request as ready for review February 10, 2025 20:18
@belimawr belimawr requested a review from a team as a code owner February 10, 2025 20:18
@belimawr belimawr requested review from AndersonQ and rdner February 10, 2025 20:18
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr
Contributor Author

I've handled the accidental file identity migration in cc980a2. Now, if the file identity is different, no entry is migrated from the registry.

@belimawr belimawr requested a review from AndersonQ February 20, 2025 22:15
@pierrehilbert
Collaborator

@belimawr your new test is failing:

=== FAIL: filebeat/tests/integration TestFilestreamIDMigrationDoesNotMigrateFileIdentity (2.74s)
--
  | filestream_test.go:656:

@belimawr
Contributor Author

> @belimawr your new test is failing:
>
> === FAIL: filebeat/tests/integration TestFilestreamIDMigrationDoesNotMigrateFileIdentity (2.74s)
> --
>   | filestream_test.go:656:

Thanks!

I accidentally made the test rely on a hardcoded inode 🤦‍♂️

I'm gonna put it back to draft until I sort out all the tests. I also have some other things failing on Windows 😞
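
One way to avoid a hardcoded inode in a test is to read it from the file at runtime. The sketch below is not taken from this PR: `inodeOf` is a hypothetical helper, and it only works on Unix-like systems because it relies on `syscall.Stat_t`, so a Windows run would need a different approach.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// inodeOf returns the inode number of path. Unix-like systems only: it
// relies on os.FileInfo.Sys() returning a *syscall.Stat_t.
func inodeOf(path string) (uint64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no inode information available for %q", path)
	}
	return uint64(st.Ino), nil
}

func main() {
	f, err := os.CreateTemp("", "flog-*.log")
	if err != nil {
		fmt.Println("could not create temp file:", err)
		return
	}
	defer os.Remove(f.Name())
	defer f.Close()

	ino, err := inodeOf(f.Name())
	if err != nil {
		fmt.Println("could not read inode:", err)
		return
	}
	// A test can build its expected values from ino instead of a fixed number.
	fmt.Println("inode:", ino)
}
```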

@belimawr belimawr marked this pull request as draft February 21, 2025 16:48
@@ -424,6 +424,7 @@ otherwise no tag is added. {issue}42208[42208] {pull}42403[42403]
- The journald input is now generally available. {pull}42107[42107]
- Add metrics for number of events and pages published by HTTPJSON input. {issue}42340[42340] {pull}42442[42442]
- Filestream input now supports migrating state when its ID changes; to enable this, set `previous_ids`. {issue}42472[42472] {pull}42624[42624]
- Add `etw` input fallback to attach an already existing session. {pull}42847[42847]
Member

Is it supposed to be here? It seems unrelated to this PR

Contributor Author

That came in when I merged main into my branch; I'll fix it. Thanks for spotting that!

Contributor Author

Looking at the whole PR diff, the only new entry on the changelog is the one related to this PR. Maybe you reviewed a subset of changes since the last review?
[Screenshot: 2025-02-25_11-25]

Comment on lines +674 to +678
if runtime.GOOS == "windows" {
	t.Logf("[WARN] Could not remove temporary directory '%s': %s", tempDir, err)
} else {
	t.Errorf("could not remove temp dir '%s': %s", tempDir, err)
}
Member

Is it happening that often? I'm wondering if it would be worth retrying within a given timeout, and perhaps accepting a callback that tries to kill the Beat once more. For tests that do not want, do not need, or cannot try to stop the Beat again, the callback can be a no-op. That way this function does not depend on BeatProc but can still attempt to stop the process and retry the cleanup.
Alternatively, it could receive the BeatProc itself, if that wouldn't compromise other tests.
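
The retry-with-callback idea can be sketched as below. This is an illustration of the suggestion only, not code from the test framework: `cleanupWithRetry` and `errStillInUse` are hypothetical names, and `main` simulates a directory that stays locked for two attempts instead of touching the filesystem. In a real test the `remove` function would wrap something like `os.RemoveAll(tempDir)`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStillInUse simulates the failure seen when a process still holds files
// open inside the directory being removed.
var errStillInUse = errors.New("directory still in use")

// cleanupWithRetry calls remove until it succeeds or timeout has elapsed.
// Before each retry it calls stop, if non-nil, so the caller can try once
// more to kill the process holding the files. Tests that cannot or need not
// stop anything pass nil or a no-op.
func cleanupWithRetry(timeout, interval time.Duration, stop func(), remove func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := remove()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("cleanup did not succeed within %s: %w", timeout, err)
		}
		if stop != nil {
			stop()
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	remove := func() error {
		attempts++
		if attempts < 3 {
			return errStillInUse // simulated: the process has not exited yet
		}
		return nil
	}
	stop := func() { fmt.Println("trying to stop the process again") }

	err := cleanupWithRetry(2*time.Second, 50*time.Millisecond, stop, remove)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Passing the stop action as a plain function keeps the cleanup helper independent of any process type, which is the trade-off raised in the comment above.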

Contributor Author

> Is it happening that often?

Often enough. While testing this PR on Windows, every time I added a runtime.GOOS == "windows" check to fix a failure, another would pop up. Together they make the test fail every time.

I'm a bit confused about what the callback you're suggesting would do. One of the ideas behind this integration test framework is to remove as much work as possible from the engineer writing the test, with cleanup being one of the key automations. So adding a callback for the cleanup feels like adding a burden on the engineer writing tests.

@belimawr belimawr requested a review from AndersonQ February 25, 2025 16:25
@belimawr belimawr added the windows-11 Enable builds in the CI for windows-10 label Feb 26, 2025
@belimawr
Contributor Author

Moving it back to draft while we discuss: #36777

@belimawr belimawr marked this pull request as draft February 26, 2025 17:34
Labels
backport-skip Skip notification from the automated backport with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team windows-11 Enable builds in the CI for windows-10
Development

Successfully merging this pull request may close these issues.

Allow changing filestream IDs without duplication by providing the previous ID values
4 participants