-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch k8s input paths to /var/log/pods/* to ingest rotated container logs #6583
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,12 +16,12 @@ Config input for container logs | |
namespace: {{ .Values.kubernetes.namespace }} | ||
use_output: {{ .Values.kubernetes.output }} | ||
streams: | ||
- id: kubernetes-container-logs-${kubernetes.pod.name}-${kubernetes.container.id} | ||
- id: kubernetes-container-logs-${kubernetes.namespace}-${kubernetes.pod.name}-${kubernetes.container.name} | ||
data_stream: | ||
dataset: kubernetes.container_logs | ||
type: logs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As long as this translates to a running Starting with 9.0 we will have fingerprint as a default file identity which is based on the file content, not its filesystem metadata. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this mean that an upgrade from 8.x to 9.x will result in a re-index of all files? That would be very bad. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, we also implemented a feature to migrate the state from the old default ( |
||
paths: | ||
- '/var/log/containers/*${kubernetes.container.id}.log' | ||
- '/var/log/pods/${kubernetes.namespace}_${kubernetes.pod.name}_${kubernetes.pod.uid}/${kubernetes.container.name}/*.log' | ||
prospector.scanner.symlinks: {{ dig "vars" "symlinks" true .Values.kubernetes.containers.logs }} | ||
parsers: | ||
- container: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new ID does not seem to be unique, this will cause problems and lead to data duplication.
Are you sure that
${kubernetes.namespace}-${kubernetes.pod.name}-${kubernetes.container.name}
will always be unique and will make this block be rendered only once?The previous implementation used the
${kubernetes.container.id}
because that's guaranteed to always be unique.The key thing we have to ensure here is that there will never be two Filestream inputs started with the same ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this file is in the helm chart folder, how does upgrade here work? Is there any change an user will upgrade and have the input ID changing? If so, that will lead to data duplication because the new ID will make all files be considered new, thus fully re-ingested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@belimawr some things I have in mind, the combination of
${kubernetes.namespace}-${kubernetes.pod.name}-${kubernetes.container.name}
should always be unique in a k8s cluster, right?!also about the
indeed when we restart shouldn't fingerprint be the same no matter the ID?! that said, if we have to do it then we should do it now that helm chart is not GA yet?! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope so, however I'm not sure. It would be nice to back it with some K8s documentation. I know for sure
${kubernetes.container.id}
is always unique, hence it's the one I've always used/recommended.Yes, however, the input ID is part of the state ID, hence changing the input ID will affect the final state of the file.
Within a single instance of the Filestream input, files are identified by the
file_identity
, in this case it'sfingerpring
, however the Filestream input will only look at states that match it's ID.Here is an example of an registry (file state) entry:
Even if the file is the same, changing the input ID will change the key:
"k": "filestream::my-filestream-id::native::8787338-65029"
.That's actually one of the benefits of Filestream over Log input, because the input ID is part of the state ID, we can have different instances of the input harvesting the same file as long as the input ID is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting from k8s docs https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/#when-to-use-multiple-namespaces
Within a pod spec container names must be unique so once the pod is created the tuple (namespace, pod name, container name) is unique and the input definition does not change with container restarts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking the docs and clarifying @pchila !!
Then ignore the change I suggested.
The only issue that persists is the ID changing when the helm chart is upgraded... That will cause re-ingestion of files, even with fingerprint.