Commit

Merge remote-tracking branch 'refs/remotes/upstream/main' into fosb1
khushijain21 committed Dec 31, 2024
2 parents 7f98ce0 + 111a480 commit 225f9d7
Showing 78 changed files with 12,612 additions and 11,301 deletions.
1 change: 1 addition & 0 deletions CHANGELOG-developer.next.asciidoc
@@ -108,6 +108,7 @@ The list below covers the major changes between 7.0.0-rc2 and main only.
- AWS CloudWatch Metrics record previous endTime to use for next collection period and change log.logger from cloudwatch to aws.cloudwatch. {pull}40870[40870]
- Fix flaky test in cel and httpjson inputs of filebeat. {issue}40503[40503] {pull}41358[41358]
- Fix documentation and implementation of raw message handling in Filebeat http_endpoint by removing it. {pull}41498[41498]
- Fix flaky test in filebeat Okta entity analytics provider. {issue}42059[42059] {pull}42123[42123]

==== Added

9 changes: 7 additions & 2 deletions CHANGELOG.next.asciidoc
@@ -52,7 +52,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Fixes filestream logging the error "filestream input with ID 'ID' already exists, this will lead to data duplication[...]" on Kubernetes when using autodiscover. {pull}41585[41585]
- Add kafka compression support for ZSTD.
- Filebeat fails to start if there is any input with a duplicated ID. It logs the duplicated IDs and the offending inputs configurations. {pull}41731[41731]

- The Filestream input only starts to ingest a file when it is >= 1024 bytes in size. This happens because `fingerprint` is the default file identity now. To restore the previous behaviour, set `file_identity.native: ~` and `prospector.scanner.fingerprint.enabled: false`. {issue}40197[40197] {pull}41762[41762]
*Heartbeat*


@@ -197,6 +197,8 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Further rate limiting fix in the Okta provider of the Entity Analytics input. {issue}40106[40106] {pull}41977[41977]
- Fix streaming input handling of invalid or empty websocket messages. {pull}42036[42036]
- Fix awss3 document ID construction when using the CSV decoder. {pull}42019[42019]
- The `_id` generation process for S3 events has been updated to incorporate the LastModified field. This enhancement ensures that the `_id` is unique. {pull}42078[42078]
- Fix Netflow Template Sharing configuration handling. {pull}42080[42080]

*Heartbeat*

@@ -231,7 +233,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Log Cisco Meraki `getDevicePerformanceScores` errors without stopping metrics collection. {pull}41622[41622]
- Don't skip first bucket value in GCP metrics metricset for distribution type metrics {pull}41822[41822]
- Fixed `creation_date` scientific notation output in the `elasticsearch.index` metricset. {pull}42053[42053]

- Fix bug where metricbeat unintentionally triggers Windows ASR. {pull}42177[42177]

*Osquerybeat*

@@ -371,8 +373,10 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Add support for SSL and Proxy configurations for websocket type in streaming input. {pull}41934[41934]
- AWS S3 input registry cleanup for untracked s3 objects. {pull}41694[41694]
- The environment variable `BEATS_AZURE_EVENTHUB_INPUT_TRACING_ENABLED: true` enables internal logs tracer for the azure-eventhub input. {issue}41931[41931] {pull}41932[41932]
- The Filestream input now uses the `fingerprint` file identity by default. The state of files is automatically migrated if the previous file identity was `native` (the default) or `path`. If the `file_identity` is explicitly set, there is no change in behaviour. {issue}40197[40197] {pull}41762[41762]
- Rate limiting operability improvements in the Okta provider of the Entity Analytics input. {issue}40106[40106] {pull}41977[41977]
- Added default values in the streaming input for websocket retries and put a cap on retry wait time to be less than or equal to the maximum defined wait time. {pull}42012[42012]
- Rate limiting fault tolerance improvements in the Okta provider of the Entity Analytics input. {issue}40106[40106] {pull}42094[42094]

*Auditbeat*

@@ -386,6 +390,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]

- Added status to monitor run log report.
- Upgrade node to latest LTS v18.20.3. {pull}40038[40038]
- Add support for RFC7231 methods to http monitors. {pull}41975[41975]

*Metricbeat*

20,875 changes: 10,457 additions & 10,418 deletions NOTICE.txt

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions filebeat/_meta/config/filebeat.global.reference.yml.tmpl
@@ -15,6 +15,8 @@
# batch of events has been published successfully. The default value is 1s.
#filebeat.registry.flush: 1s

# The interval at which to run the registry cleanup
#filebeat.registry.cleanup_interval: 5m

# Starting with Filebeat 7.0, the registry uses a new directory format to store
# Filebeat state. After you upgrade, Filebeat will automatically migrate a 6.x
59 changes: 46 additions & 13 deletions filebeat/_meta/config/filebeat.inputs.reference.yml.tmpl
@@ -303,7 +303,7 @@ filebeat.inputs:
# If enabled, instead of relying on the device ID and inode values when comparing files,
# compare hashes of the given byte ranges in files. A file becomes an ingest target
# when its size grows larger than offset+length (see below). Until then it's ignored.
#prospector.scanner.fingerprint.enabled: false
#prospector.scanner.fingerprint.enabled: true

# If fingerprint mode is enabled, sets the offset from the beginning of the file
# for the byte range used for computing the fingerprint value.
@@ -438,8 +438,9 @@ filebeat.inputs:
#clean_removed: true

# Method to determine if two files are the same or not. By default
# the Beat considers two files the same if their inode and device id are the same.
#file_identity.native: ~
# a fingerprint is generated using the first 1024 bytes of the file.
# If the fingerprints match, the files are considered equal.
#file_identity.fingerprint: ~

# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
@@ -770,25 +771,57 @@ filebeat.inputs:
# Journald input is experimental.
#- type: journald
#enabled: true
#id: service-foo

# You may wish to have separate inputs for each service. You can use
# include_matches.or to specify a list of filter expressions that are
# applied as a logical OR. You may specify filter
#include_matches.match:
#- _SYSTEMD_UNIT=foo.service
# Unique ID among all inputs. If the ID changes, all entries
# will be re-ingested.
#id: my-journald-id

# List of syslog identifiers
#syslog_identifiers: ["audit"]
# Specify paths to read from custom journal files.
# Leave it unset to read the system's journal
# Glob based paths.
#paths:
#- /var/log/custom.journal

# The position to start reading from the journal, valid options are:
# - head: Starts reading at the beginning of the journal.
# - tail: Starts reading at the end of the journal.
# This means that no events will be sent until a new message is written.
# - since: Use also the `since` option to determine when to start reading from.
#seek: head

# A time offset from the current time to start reading from.
# To use since, seek option must be set to since.
#since: -24h

# Collect events from the service and messages about the service,
# including coredumps.
#units: ["docker.service"]
#units:
#- docker.service

# List of syslog identifiers
#syslog_identifiers: ["audit"]

# The list of transports (_TRANSPORT field of journald entries)
#transports: ["audit"]

# Parsers are also supported, here is an example of the multiline
# Filter logs by facilities, they must be specified using their numeric code.
#facilities:
#- 1
#- 2

# You may wish to have separate inputs for each service. You can use
# include_matches.or to specify a list of filter expressions that are
# applied as a logical OR.
#include_matches.match:
#- _SYSTEMD_UNIT=foo.service

# Uses the original hostname of the entry instead of the one
# from the host running journald
#save_remote_hostname: false

# Parsers are also supported, the possible parsers are:
# container, include_message, multiline, ndjson, syslog.
# Here is an example of the multiline
# parser.
#parsers:
#- multiline:
23 changes: 23 additions & 0 deletions filebeat/_meta/config/filebeat.inputs.yml.tmpl
@@ -41,3 +41,26 @@ filebeat.inputs:
#fields:
# level: debug
# review: 1

# journald is an input for collecting logs from Journald
- type: journald

# Unique ID among all inputs. If the ID changes, all entries
# will be re-ingested.
id: my-journald-id

# The position to start reading from the journal, valid options are:
# - head: Starts reading at the beginning of the journal.
# - tail: Starts reading at the end of the journal.
# This means that no events will be sent until a new message is written.
# - since: Use also the `since` option to determine when to start reading from.
#seek: head

# A time offset from the current time to start reading from.
# To use since, seek option must be set to since.
#since: -24h

# Collect events from the service and messages about the service,
# including coredumps.
#units:
#- docker.service
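
# Illustrative sketch, not part of the upstream template: a second journald
# input that starts reading 24 hours in the past, combining the `seek` and
# `since` options described above. The ID below is hypothetical.
#- type: journald
#  id: my-journald-since-id
#  seek: since
#  since: -24h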
11 changes: 11 additions & 0 deletions filebeat/docs/faq.asciidoc
@@ -19,6 +19,10 @@ We do not recommend reading log files from network volumes. Whenever possible, i
send the log files directly from there. Reading files from network volumes (especially on Windows) can have unexpected side
effects. For example, changed file identifiers may result in {beatname_uc} reading a log file from scratch again.

If it is not possible to read from the host, then using the
<<filebeat-input-filestream-file-identity-fingerprint, `fingerprint`>>
file identity is the next best option.

[[filebeat-not-collecting-lines]]
=== {beatname_uc} isn't collecting lines from a file

@@ -71,6 +75,13 @@ By default states are never removed from the registry file. To resolve the inode

You can use <<{beatname_lc}-input-log-clean-removed,`clean_removed`>> for files that are removed from disk. Be aware that `clean_removed` cleans the file state from the registry whenever a file cannot be found during a scan. If the file shows up again later, it will be sent again from scratch.

In addition, you should change the
<<filebeat-input-filestream-file-identity, `file_identity`>> to
<<filebeat-input-filestream-file-identity-fingerprint,
`fingerprint`>>. If you were using `native` (the default) or `path`,
the state of the files will be automatically migrated to
`fingerprint`.
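
As a minimal sketch of the advice above, a filestream input could combine
`clean_removed` with the `fingerprint` file identity as follows. The input ID
and path are hypothetical and only serve as placeholders.

[source,yaml]
----
filebeat.inputs:
- type: filestream
  id: app-logs                 # hypothetical input ID
  paths:
    - /var/log/app/*.log       # hypothetical path
  # Remove the state of files that can no longer be found on disk.
  clean_removed: true
  # Identify files by content hash instead of inode and device ID.
  file_identity.fingerprint: ~
  prospector.scanner.fingerprint.enabled: true
----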

include::filebeat-log-rotation.asciidoc[]

[[windows-file-rotation]]
71 changes: 49 additions & 22 deletions filebeat/docs/inputs/input-filestream-file-options.asciidoc
@@ -150,9 +150,9 @@ The default setting is 10s.
[id="{beatname_lc}-input-{type}-scan-fingerprint"]
===== `prospector.scanner.fingerprint`

Instead of relying on the device ID and inode values when comparing files, compare hashes of the given byte ranges of files.

Enable this option if you're experiencing data loss or data duplication due to unstable file identifiers provided by the file system.
Instead of relying on the device ID and inode values when comparing
files, compare hashes of the given byte ranges of files. This is the
default behaviour for {beatname_uc}.

Following are some scenarios where this can happen:

@@ -542,34 +542,71 @@ indirectly set higher priorities on certain inputs by assigning a higher
limit of harvesters.

[float]
[id="{beatname_lc}-input-{type}-file-identity"]
===== `file_identity`

Different `file_identity` methods can be configured to suit the
environment where you are collecting log messages.

WARNING: Changing `file_identity` methods between runs may result in
duplicated events in the output.
IMPORTANT: Changing `file_identity` is only supported from `native` or
`path` to `fingerprint`. In those cases {beatname_uc} will
automatically migrate the state of the file when {type} starts.

WARNING: Any unsupported change in `file_identity` methods between
runs may result in duplicated events in the output.
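
A supported change might look like the following sketch, which assumes an
input that previously set `file_identity.path` explicitly. The input ID and
path are hypothetical.

[source,yaml]
----
filebeat.inputs:
- type: filestream
  id: my-filestream-id         # hypothetical input ID
  paths:
    - /var/log/*.log           # hypothetical path
  # Previous setting, removed for the migration:
  #file_identity.path: ~
  # New setting; existing state is migrated when the input starts.
  file_identity.fingerprint: ~
  prospector.scanner.fingerprint.enabled: true
----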

[id="{beatname_lc}-input-{type}-file-identity-fingerprint"]
*`fingerprint`*:: The default behaviour of {beatname_uc} is to
identify files based on content by hashing a specific range (0 to 1024
bytes by default).

WARNING: In order to use this file identity option, you must enable
the <<{beatname_lc}-input-filestream-scan-fingerprint,fingerprint
option in the scanner>>. Once this file identity is enabled, changing
the fingerprint configuration (offset, length, or other settings) will
lead to a global re-ingestion of all files that match the paths
configuration of the input.

Please refer to the
<<{beatname_lc}-input-filestream-scan-fingerprint,fingerprint
configuration for details>>.

[source,yaml]
----
file_identity.fingerprint: ~
----
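
The byte range mentioned above can also be written out explicitly. The
following sketch assumes the scanner's `offset` and `length` options and
simply restates the defaults described in this section (the first 1024 bytes
of the file). As the warning notes, changing either value later leads to
re-ingestion of all matching files.

[source,yaml]
----
file_identity.fingerprint: ~
prospector.scanner.fingerprint:
  enabled: true
  offset: 0      # start hashing at the beginning of the file
  length: 1024   # files smaller than offset+length are ignored
----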

*`native`*:: The default behaviour of {beatname_uc} is to differentiate
between files using their inodes and device ids.
*`native`*:: Differentiates between files using their inodes and
device ids.
+
In some cases these values can change during the lifetime of a file.
For example, when using the Linux link:https://en.wikipedia.org/wiki/Logical_Volume_Manager_%28Linux%29[LVM] (Logical Volume Manager), device numbers are allocated dynamically at module load (refer to link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lv#persistent_numbers[Persistent Device Numbers] in the Red Hat Enterprise Linux documentation). To avoid the possibility of data duplication in this case, you can set `file_identity` to `path` rather than `native`.
For example, when using the Linux
link:https://en.wikipedia.org/wiki/Logical_Volume_Manager_%28Linux%29[LVM]
(Logical Volume Manager), device numbers are allocated dynamically at
module load (refer to
link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lv#persistent_numbers[Persistent
Device Numbers] in the Red Hat Enterprise Linux documentation). To
avoid the possibility of data duplication in this case, you can set
`file_identity` to `fingerprint` rather than the default `native`.
+
The states of files generated by `native` file identity can be migrated to `fingerprint`.

[source,yaml]
----
file_identity.native: ~
----

*`path`*:: To identify files based on their paths use this strategy.

+
WARNING: Only use this strategy if your log files are rotated to a folder
outside of the scope of your input or not at all. Otherwise you end up
with duplicated events.

+
WARNING: This strategy does not support renaming files.
If an input file is renamed, {beatname_uc} will read it again if the new path
matches the settings of the input.
+
The states of files generated by `path` file identity can be migrated to `fingerprint`.

[source,yaml]
----
@@ -578,25 +615,14 @@ file_identity.path: ~

*`inode_marker`*:: If the device id changes from time to time, you must use
this method to distinguish files. This option is not supported on Windows.

+
Set the location of the marker file the following way:

[source,yaml]
----
file_identity.inode_marker.path: /logs/.filebeat-marker
----

*`fingerprint`*:: To identify files based on their content byte range.

WARNING: In order to use this file identity option, you must enable the <<{beatname_lc}-input-filestream-scan-fingerprint,fingerprint option in the scanner>>. Once this file identity is enabled, changing the fingerprint configuration (offset, length, or other settings) will lead to a global re-ingestion of all files that match the paths configuration of the input.

Please refer to the <<{beatname_lc}-input-filestream-scan-fingerprint,fingerprint configuration for details>>.

[source,yaml]
----
file_identity.fingerprint: ~
----

[[filestream-log-rotation-support]]
[float]
=== Log rotation
@@ -609,6 +635,7 @@ When reading from rotating files make sure the paths configuration includes
both the active file and all rotated files.

By default, {beatname_uc} is able to track files correctly in the following strategies:

* create: new active file with a unique name is created on rotation
* rename: rotated files are renamed
