Allow changing filestream IDs without duplication by providing the previous ID values #42472

Open
cmacknz opened this issue Jan 29, 2025 · 3 comments · May be fixed by #42624
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@cmacknz
Member

cmacknz commented Jan 29, 2025

Describe the enhancement:

This relates to the discussion in elastic/elastic-agent#6583 (comment) where we believe we should change the default container input ID. We need a way to do this without duplicating data when this happens. One way to solve this is to simply tell the filestream input what ID it should take over from.

Describe a specific use case for the enhancement or feature:

Provide a way to list previous filestream input IDs such that it will start from where it left off even if the ID changes. For example:

id: container-log-${kubernetes.namespace}-${kubernetes.pod.name}-${kubernetes.container.id}
previous_ids:
 - container-log-${kubernetes.pod.name}-${kubernetes.container.id}
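
A minimal Go sketch of the lookup this implies: try the current ID first, then each previous ID in order, and resume from the first stored offset found. The key layout `filestream::<input ID>::<file key>` and the names `registryKey` and `resumeOffset` are assumptions made for illustration, not the actual registry API.

```go
package main

import "fmt"

// registryKey builds a key under the ASSUMED layout
// "filestream::<input ID>::<file key>". Hypothetical helper.
func registryKey(inputID, fileKey string) string {
	return "filestream::" + inputID + "::" + fileKey
}

// resumeOffset returns the stored offset for fileKey, checking the current
// input ID first and then each previous ID in the order given.
// The boolean is false when no state exists under any of the IDs.
func resumeOffset(offsets map[string]int64, id string, previousIDs []string, fileKey string) (int64, bool) {
	candidates := append([]string{id}, previousIDs...)
	for _, candidate := range candidates {
		if off, ok := offsets[registryKey(candidate, fileKey)]; ok {
			return off, true
		}
	}
	return 0, false
}

func main() {
	// State written while the input still used the old ID.
	offsets := map[string]int64{
		registryKey("container-log-pod-a-c1", "native::42-7"): 1024,
	}
	off, found := resumeOffset(
		offsets,
		"container-log-default-pod-a-c1",
		[]string{"container-log-pod-a-c1"},
		"native::42-7",
	)
	fmt.Println(off, found)
}
```

Checking the current ID before the previous ones means state already written under the new ID is never shadowed by older state.
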
@cmacknz cmacknz added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jan 29, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr
Contributor

Thinking about it, it looks reasonable to implement this logic. I recently merged a PR that migrates the state from one file identity to another, which effectively copies a registry entry into a new one with a different ID. Migrating the input ID is the same thing.

The code for that is here:

cleaner.UpdateIdentifiers(func(v loginp.Value) (string, interface{}) {
	var fm fileMeta
	err := v.UnpackCursorMeta(&fm)
	if err != nil {
		return "", nil
	}
	fd, ok := files[fm.Source]
	if !ok {
		return "", fm
	}
	// Return early (do nothing) if:
	//  - The identifiers are the same
	//  - The old identifier is neither native nor path
	oldIdentifierName := fm.IdentifierName
	if oldIdentifierName == identifierName ||
		!(oldIdentifierName == nativeName || oldIdentifierName == pathName) {
		return "", nil
	}
	// Our current file (source) is in the registry, now we need to ensure
	// this registry entry (resource) actually refers to our file. Sources
	// are identified by path, however as log files rotate the same path
	// can point to different files.
	//
	// So to ensure we're dealing with the resource from our current file,
	// we use the old identifier to generate a registry key for the current
	// file we're trying to migrate, if this key matches with the key in the
	// registry, then we proceed to update the registry.
	registryKey := v.Key()
	oldIdentifier, ok := identifiersMap[oldIdentifierName]
	if !ok {
		// This should never happen, but just in case we properly handle it.
		// If we cannot find the identifier, move on to the next entry
		// some identifiers cannot be migrated
		p.logger.Errorf(
			"old file identity '%s' not found while migrating entry to"+
				"new file identity '%s'. If the file still exists, it will be re-ingested",
			oldIdentifierName,
			identifierName,
		)
		return "", nil
	}
	previousIdentifierKey := newID(oldIdentifier.GetSource(
		loginp.FSEvent{
			NewPath:    fm.Source,
			Descriptor: fd,
		}))
	// If the registry key and the key generated by the old identifier
	// do not match, log it at debug level and do nothing.
	if previousIdentifierKey != registryKey {
		return "", fm
	}
	// The resource matches the file we found in the file system, generate
	// a new registry key and return it alongside the updated meta.
	newKey := newID(p.identifier.GetSource(loginp.FSEvent{NewPath: fm.Source, Descriptor: fd}))
	fm.IdentifierName = identifierName
	p.logger.Infof("registry key: '%s' and previous file identity key: '%s', are the same, migrating. Source: '%s'",
		registryKey, previousIdentifierKey, fm.Source)
	return newKey, fm
})

On a brief analysis it looks pretty reasonable to do, especially because we already have a similar feature.
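
To illustrate how the same callback shape could carry an input ID migration, here is a rough, self-contained sketch. It models only the contract visible in the code above (return a new key to migrate an entry, or an empty string to leave it alone). The `store` type, `inputIDMigrator`, and the `filestream::<input ID>::<rest>` key layout are hypothetical simplifications, not the real `loginp` API.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical, simplified registry entry.
type entry struct {
	Key    string
	Offset int64
}

// store is a toy in-memory registry indexed by entry key.
type store map[string]entry

// updateIdentifiers mimics the callback contract of the quoted code: fn
// returns the new key for an entry, or "" to leave the entry untouched.
// It returns the number of entries that were re-keyed.
func (s store) updateIdentifiers(fn func(e entry) string) int {
	type rename struct{ oldKey, newKey string }
	var renames []rename
	// Collect first so the map is not modified while ranging over it.
	for key, e := range s {
		if newKey := fn(e); newKey != "" && newKey != key {
			renames = append(renames, rename{key, newKey})
		}
	}
	migrated := 0
	for _, r := range renames {
		if _, exists := s[r.newKey]; exists {
			continue // never overwrite state already stored under the new key
		}
		e := s[r.oldKey]
		e.Key = r.newKey
		s[r.newKey] = e
		delete(s, r.oldKey)
		migrated++
	}
	return migrated
}

// inputIDMigrator returns a callback that swaps any of previousIDs for newID
// in the ASSUMED key layout "filestream::<input ID>::<rest>".
func inputIDMigrator(newID string, previousIDs []string) func(entry) string {
	return func(e entry) string {
		for _, prev := range previousIDs {
			oldPrefix := "filestream::" + prev + "::"
			if strings.HasPrefix(e.Key, oldPrefix) {
				return "filestream::" + newID + "::" + strings.TrimPrefix(e.Key, oldPrefix)
			}
		}
		return ""
	}
}

func main() {
	s := store{
		"filestream::old-id::native::42-7": {Key: "filestream::old-id::native::42-7", Offset: 1024},
		"filestream::other::native::1-1":   {Key: "filestream::other::native::1-1", Offset: 5},
	}
	n := s.updateIdentifiers(inputIDMigrator("new-id", []string{"old-id"}))
	fmt.Println(n, s["filestream::new-id::native::42-7"].Offset)
}
```

If two previous IDs both hold state for the same file, this sketch keeps whichever entry it happens to visit first, so a real implementation would need a deterministic rule for that case.
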

@belimawr
Contributor

I'd love a quick-and-dirty POC before we commit to this feature. If we implement it, there will be two changes to the "same" registry entry in 9.0, so I'd like to explore corner cases and possible weird interactions.
