Add draft file-content resource specification #220

Closed

Conversation

miroman9364
Contributor

PR Summary

The purpose of the file-content resource is to ensure the content of a file matches the desired state.

  • If the file does not exist, the file is created.
  • The file's checksum is computed. If a checksum was passed in and both checksums match, processing
    is complete.
  • If no checksum was provided, the file's content is compared to the specified content.
    • If the specified content is provided as YAML, then the YAML parsing rules for blocks are applied.
    • If the text contains escape sequences, the escape sequences are converted.
    • If the eol property is provided, all default escape sequences are replaced with the eol value,
      which can be one of LF, CRLF, or CR. Note: This property is only meaningful with YAML input.
  • If the source content does not match the file, then the file is clobbered and replaced with the source
    content.
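
As a rough illustration of the flow above, here is a minimal Python sketch. It is not the proposed implementation: the function name `ensure_file_content`, the choice of SHA-256 for the checksum, UTF-8 encoding, and creating a missing file with the desired content are all assumptions made for the example. YAML block parsing, escape-sequence conversion, and `eol` handling are left out.

```python
import hashlib
import os
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the checksum (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def ensure_file_content(path: str, content: str, checksum: Optional[str] = None) -> bool:
    """Return True if the file was written, False if it already matched."""
    desired = content.encode("utf-8")

    # Step 1: a missing file is created (here: with the desired content).
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(desired)
        return True

    with open(path, "rb") as f:
        current = f.read()

    if checksum is not None:
        # Step 2: a supplied checksum that matches means nothing to do.
        if sha256_hex(current) == checksum.lower():
            return False
    elif current == desired:
        # Step 3: no checksum supplied, so compare the content directly.
        return False

    # Step 4: mismatch, so clobber the file with the source content.
    with open(path, "wb") as f:
        f.write(desired)
    return True
```

Calling the function twice with the same arguments writes the file the first time and reports no change the second time, which is the idempotent behavior a set operation would need.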

PR Context

This PR provides a starting point to discuss the behavior for the file-content resource implementation.

@michaeltlombardi
Collaborator

The first point notes that the resource creates a file if it doesn't exist - we may want an explicit example case showing that the file can be removed.

Thinking about the eventual schema, is it in scope to define a file that exists but with content we don't care about? Otherwise, the schema would have to require at least one of content and hash.checksum.
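
To make the "at least one of content and hash.checksum" constraint concrete, here is a small Python sketch of the check a schema would need to express. The property names `path`, `content`, and `hash.checksum` come from this thread; the function `validate_properties` and its error messages are hypothetical.

```python
def validate_properties(props: dict) -> None:
    """Raise ValueError unless the instance names a path and some way to check it."""
    if "path" not in props:
        raise ValueError("'path' is required")

    has_content = "content" in props
    has_checksum = "checksum" in props.get("hash", {})

    # Without either property there is nothing to compare the file against.
    if not (has_content or has_checksum):
        raise ValueError("at least one of 'content' or 'hash.checksum' is required")
```

If "a file that exists but with content we don't care about" is in scope, the second check would simply be dropped.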

Thinking about eol, I think that's a writeOnly property - it's a directive that controls how the ingested YAML should be read, but that's handled one level removed (at least) from the resource - so maybe it should munge any non-escaped EOL characters?

@miroman9364
Contributor Author

> Thinking about eol, I think that's a writeOnly property - it's a directive that controls how the ingested YAML should be read, but that's handled one level removed (at least) from the resource - so maybe it should munge any non-escaped EOL characters?

I agree 100% about it being writeOnly.

It's a directive that is specific to YAML -- for handling the case where YAML is going to provide LF, but on Windows, some use cases will need CRLF. I'm still coming up to speed on who/where some processing happens. If the resource is always getting JSON from DSC and not processing the input directly, then it's not possible to determine the difference between a YAML parser inserted LF and an escaped LF coming from the source.

@miroman9364
Contributor Author

> The first point notes that the resource creates a file if it doesn't exist - we may want an explicit example case showing that the file can be removed.

Right now, the missing delete example is intentional, but open for discussion.

I think this resource is literally just for file content. I'm trying to avoid the heavily overloaded resource that is available from the existing Windows File resource. I can imagine different resources for delete, mirror-tree, acl-[Windows|Linux|macOS], maybe something just for file-info.

I'm going to insist that delete is something that the robo-copy resource does by copying empty directories. ;)

@miroman9364
Contributor Author

BTW, I also think compare with match/replace is a separate resource. Right now I am thinking of something that supports an array of sed-like commands that run like a sed pipeline. (Not sed, but sed-like, so not the full sed syntax, and using ECMA 2018 regex, since there's a crate for that.)
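
For readers trying to picture the sed-like pipeline, here is a minimal Python sketch. It is only an illustration of the idea: the `run_pipeline` name and the `(pattern, replacement)` command shape are hypothetical, and Python's `re` module stands in for the ECMA 2018 regex engine mentioned above, so the syntax differs in places.

```python
import re
from typing import Iterable, Tuple


def run_pipeline(text: str, commands: Iterable[Tuple[str, str]]) -> str:
    """Apply each substitution in order, feeding the result to the next one."""
    for pattern, replacement in commands:
        # MULTILINE lets ^ and $ anchor on individual lines, as sed does.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


commands = [
    (r"^server: http://", "server: https://"),
    (r"^port: \d+$", "port: 443"),
]
updated = run_pipeline("server: http://foo.app\nport: 80\n", commands)
```

Each command sees the output of the previous one, so the order of the array matters.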

@michaeltlombardi
Collaborator

> I think this resource is literally just for file content. I'm trying to avoid the heavily overloaded resource that is available from the existing Windows File resource. I can imagine different resources for delete, mirror-tree, acl-[Windows|Linux|macOS], maybe something just for file-info.

I think I disagree - the resources are declarative, and for the most part what they can do they should be able to _un_do. I'm not suggesting that this resource should be made a full replacement for the File resource - though I do think we need one, eventually, because laying down files is a critical part of configuration management (as is removing them, though less frequently) - but the declarative model that _exist represents is a contract for the resource being able to create, update, and delete the instance as needed.

> I also think compare with match/replace is a separate resource. Right now I am thinking of something that supports an array of sed-like commands that run like a sed pipeline.

I'm not sure I follow on this - what declarative state would I model with this resource?

@michaeltlombardi
Collaborator

> If the resource is always getting JSON from DSC and not processing the input directly, then it's not possible to determine the difference between a YAML parser inserted LF and an escaped LF coming from the source.

That's my understanding, so in that case the only coherent model would be to say "turn all EOL sequences into this sequence" or "leave the EOL sequences alone" I think.
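
The two options can be sketched in a few lines of Python. The `normalize_eol` helper is hypothetical; the `LF`, `CRLF`, and `CR` values are the ones listed in the PR summary, and passing `None` stands for "leave the EOL sequences alone".

```python
import re
from typing import Optional

EOL_SEQUENCES = {"LF": "\n", "CRLF": "\r\n", "CR": "\r"}

# CRLF is listed first so it is treated as one line ending, not two.
_ANY_EOL = re.compile(r"\r\n|\r|\n")


def normalize_eol(text: str, eol: Optional[str] = None) -> str:
    """Rewrite every line ending to the requested sequence, or leave text as is."""
    if eol is None:
        return text
    target = EOL_SEQUENCES[eol]
    return _ANY_EOL.sub(lambda _match: target, text)
```

Because the resource only sees the text after parsing, this treats every line ending the same way, whether it came from the YAML parser or from an escape sequence in the source.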

@miroman9364
Contributor Author

Without being able to interact directly with the YAML, I think the value of the eol property is questionable.

How common is the issue of "how to interpret embedded line endings in YAML input"?

If this is something common, it makes sense to create an issue to update the dsc command and make the behavior universal. If it is uncommon, then I think it should be removed completely. After the MVP for file-content is working, there will be a follow-up release that adds the ability to copy the content from a file. This will copy whatever line endings are in the file. Meanwhile, even without an eol property, the existing proposal will handle CRLF escape sequences.

At any rate, I think I need to remove the eol from this proposal.

@miroman9364
Contributor Author

miroman9364 commented Oct 6, 2023

I have been thinking about, "what if I was running this as a command line tool?" One result of that is that I think I should rename the path property as file. The meaning will stay the same.

@miroman9364
Contributor Author

> I think I disagree - the resources are declarative, and for the most part what they can do they should be able to _un_do.

I think there needs to be a resource that provides presence/absence for all file objects, provides basic file info, and some amount of file-system info. I would make this the new file resource.

This resource would work for directories, files, and symbolic links. The file info would be read-only. This is not the resource that would manage file permissions (which basically don't exist on Windows) or ACLs (which are basically unique to each OS, and in some cases to a distribution).

The only real set action for this resource would be to set _exist to false, and delete the file object if necessary. It would be aware of different types of objects and have functionality to act recursively on directory trees. It would also be aware of links and provide the ability to act on the link versus the link target.

If a user needs to remove a file and manipulate the content, then they can have a dependency step that uses the file resource to remove the file.

@SteveL-MSFT
Member

With _exist being used, I would agree that _exist = false should delete the file. In this case, the content and hash don't matter. Otherwise, there's no point in having the property if _exist is always true. It may make sense for a separate FileInfo resource, which handles things like permissions, to also handle existence, with FileContent only ever caring about the content on the assumption that the file exists.
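
A minimal Python sketch of that rule, assuming a single resource handles both existence and content: when the existence flag is false, the file is removed and the content is never consulted. The `set_file_state` function and its return strings are hypothetical, and the `exist` parameter stands in for the `_exist` property.

```python
import os
from typing import Optional


def set_file_state(path: str, exist: bool = True, content: Optional[str] = None) -> str:
    """Enforce the desired state and report what was done."""
    if not exist:
        # content and hash are irrelevant once the file should be absent.
        if os.path.lexists(path):
            os.remove(path)
            return "deleted"
        return "unchanged"

    desired = b"" if content is None else content.encode("utf-8")
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(desired)
        return "created"

    if content is not None:
        with open(path, "rb") as f:
            if f.read() == desired:
                return "unchanged"
        with open(path, "wb") as f:
            f.write(desired)
        return "updated"
    return "unchanged"
```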

@michaeltlombardi
Collaborator

I always think it's a good idea to look at what existing tools use and why:

In more-or-less every case, POSIX is the default and Windows/NTFS is the special case, which makes sense for *Nix-first tools, which is how these were all developed.

Ansible has different modules because it uses on-the-target scripts to get the work done, and it's more coherent to split the implementations this way for their architecture. Ansible's file module (and win_file) create, update, and remove the files. Ansible separates out the content definition for files to copy/win_copy and template/win_template modules. The file module also handles file attributes, ownership, and mode permissions, but win_file does not.

Chef allows you to define a file with its ownership, permissions, and content. It handles POSIX and NTFS with the same resource. This is probably the most "batteries included" implementation, but that means the implementation is much more complex to provide the smoothest user experience.

Puppet uses a resource-provider model, where you swap the backend for a resource (and the resource can define different backends on a per-platform basis, as needed). The file resource handles defining a file with its ownership, permissions, and content. It's roughly on par with Chef for "batteries included" with the caveat that the translation of POSIX permissions is... imperfect, so in practice people ignore them for Windows and use the acl resource instead.

I don't really have a good handle on Salt, but it seems to be able to do content, ownership, and permissions, with the caveat that it ignores ownership and permissions on Windows.

So then the question is: why do these implementations mostly seem to collect these properties together into singular (to the user, even if the implementation is actually multiple discrete code modules) resources? Because their goal is to make declaring state as low-effort for the user as possible. Consider the difference between these hypothetical configs:

Separate Resources

$schema: https://raw.githubusercontent.com/PowerShell/DSC/main/schemas/2023/08/config/document.json

variables:
  FooConfigPath: /etc/xdg/FooBarBaz/foo/foo.config.json

# _exist is implied true for every resource
resources:
  - name: foo config file
    type: DSC.File/Basic
    properties:
      path: "[variables('FooConfigPath')]"
  - name: foo config ownership
    type: DSC.File.Posix/Ownership
    properties:
      path: "[variables('FooConfigPath')]"
      owner: app_foo
      group: app_admin
  - name: foo config permissions
    type: DSC.File.Posix/Permissions
    properties:
      path: "[variables('FooConfigPath')]"
      mode: 644
  - name: foo config content
    type: DSC.File/Content
    properties:
      path: "[variables('FooConfigPath')]"
      content: |
        server:  https://foo.app
        updates:
          automatic:       true
          check_frequency: 30

Single resource

$schema: https://raw.githubusercontent.com/PowerShell/DSC/main/schemas/2023/08/config/document.json

# _exist is implied true for every resource
resources:
  - name: foo config
    type: DSC/File
    properties:
      path:      /etc/xdg/FooBarBaz/foo/foo.config.json
      owner:     app_foo
      group:     app_admin
      mode:      644
      content: |
        server:  https://foo.app
        updates:
          automatic:       true
          check_frequency: 30

The latter is easier for a user to reason about and is less likely to conflict - what happens if I define this configuration?

resources:
  - name: foo config info
    type: DSC.File/Info
    properties:
      path:   /etc/xdg/FooBarBaz/foo/foo.config.json
      _exist: false
  - name: foo config content
    type: DSC.File/Content
    properties:
      path:    /etc/xdg/FooBarBaz/foo/foo.config.json
      content: |
        server:  https://foo.app
        updates:
          automatic:       true
          check_frequency: 30

@miroman9364
Contributor Author

I see how having separate resource configurations is more verbose and more cumbersome for the user. I think there are some missing dependencies, but I'm guessing that is part of the point.

@miroman9364
Contributor Author

@michaeltlombardi can you go into the idea you were tossing around for composition, i.e., resource type File*? Also, whether it's a single resource, or some type of composition, what does _exist mean when a file can exist, but the content doesn't match or the permissions are different? Conversely, what do any of the properties other than the path mean if _exist is set to false on the input?

It might be cumbersome, but in the last example with the delete and the content set, with the addition of a dependency you have purge/replace, while without the dependency you might have undefined behavior.
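
To show why the dependency matters, here is a small Python sketch that orders resources by their declared dependencies. It is a simplification: the resource names are taken from the example configuration above, the `execution_order` function is hypothetical, and `dependsOn` holds plain resource names here rather than the expression syntax a real configuration document would use.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(resources: list) -> list:
    """Return resource names with every dependency ahead of its dependents."""
    graph = {r["name"]: set(r.get("dependsOn", [])) for r in resources}
    return list(TopologicalSorter(graph).static_order())


resources = [
    {"name": "foo config content", "dependsOn": ["foo config info"]},
    {"name": "foo config info"},
]
order = execution_order(resources)
# The delete step always runs first, so the result is purge then replace.
```

Without the `dependsOn` entry, nothing constrains the order of the two resources, which is where the undefined behavior comes from.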

@SteveL-MSFT
Member

@michaeltlombardi my expectation isn't that we'd have lots of small File related resources, but I do think it makes sense to optimize for certain domains of functionality:

  • FileContent (sed, grep, etc...)
  • FileAttribute (ownership, attributes, permissions, existence) - This could handle both Windows/non-Windows by having a nested WindowsPermissions vs PosixPermissions object.
  • FileCopy (recursion, overwrite, etc...)

These somewhat map to different command-line tools that exist on POSIX systems rather than having a single resource with a complicated schema. It also means that in the future, if a better community resource that handles file content comes along, you can just replace that resource usage with the better one.

3 participants