Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for version 2 #24

Merged
merged 8 commits into from
Oct 18, 2024
Merged

Proposal for version 2 #24

merged 8 commits into from
Oct 18, 2024

Conversation

m-mohr
Copy link
Contributor

@m-mohr m-mohr commented Apr 25, 2024

This PR proposes version 2 of this extension and fixes most issues that are open.
It is meant to be a better, more universal and more lightweight approach based on the learnings from v1.
It also reworks the extension so that less duplication happens in assets based on the archtiecture in the authentication extension.
This is heavily breaking but I think this is required to make this a better extension that can fullfil all the various needs.

Ideally we would define the general approach in this PR and then add the actual best practices per provider in separate PRs. The examples that I have right now are more to show how it's meant to work, not to define actual best practices for S3/AWS/Azure. Otherwise we'll probably never finish the PR itself...

Please review the changes carefully:

See the changelog below:

Added

  • storage:schemes, storage:ref and Storage Scheme Object
  • Support the storage extension in Links
  • Support for the Alternate Assets Extension
  • Support for other storage providers, including custom S3 hosts

Changed

  • The extension is a framework for storage providers, it doesn't strictly define the individual providers.
  • The storage providers are grouped in storage:schemes and located in the Item Properties, Collections or Catalog metadata
  • Assets and Links reference the storage schemes by key in storage:ref

Removed

  • storage:platform, storage:region, storage:requester_pays and storage:tier

@m-mohr m-mohr added the enhancement New feature or request label Apr 25, 2024
@m-mohr m-mohr changed the title Propose version 2 of this extension. Proposal for version 2 Apr 25, 2024
@m-mohr m-mohr linked an issue Apr 25, 2024 that may be closed by this pull request
json-schema/schema.json Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to address the open question about the use of URI for the providers. However, it is not blocking either. The rest looks good.

@matthewhanson
Copy link
Member

I like the use of references, which is a slightly different approach that we have done in the past and would probably be a good way to define spectral bands. But that's not relevant here.

I'm not sure why storage:refs should be a list though. Each asset has a single URL which would be for a single storage platform. Alternate assets handle the case of the same asset on different storage providers.
Is there a reason this isn't just a string?

@m-mohr
Copy link
Contributor Author

m-mohr commented Apr 30, 2024

Could be a string, if there's no way something could be hosted with the same URL on different storage providers? Not sure, would something like hosted on AWS US and AWS EU work?

examples/item-naip.json Outdated Show resolved Hide resolved
examples/item-naip.json Outdated Show resolved Hide resolved
@emmanuelmathot
Copy link
Member

Out of discussions with @m-mohr and @matthewhanson :

  1. The platform property should be renamed with the protocol used. For instance "type": "S3"
  2. a new property for extra parameters specific to the protocol (e.g. endpoint_url, requester_pays)

@m-mohr
Copy link
Contributor Author

m-mohr commented Jul 30, 2024

  1. The platform property should be renamed with the protocol used. For instance "type": "S3"

That was your proposal, which I didn't agree with yet. Neither did Matt. We need to clarify the details. Just a type seems not enough to transmit information about the storage provider. I just don't understand how this should work with just a protocol. I'd need a more concrete proposal to understand how this may work.

  1. a new property for extra parameters specific to the protocol (e.g. endpoint_url, requester_pays)

Yes, we talked about the parameters, but we did not agree yet to move the existing properties all there. I understood it more as a free-form object for additional properties, i.e. in addition to what is in this PR.
Also, this might need more standardization per provider to avoid that it's actually diverging between implementations.

Additionally, we discussed whether storage:refs should be an array or string and that the endpoints discussed above should resolve to something useful, i.e. they should usually be the actual API endpoint, not a generic service webpage or so.

@fmigneault
Copy link

Just a type seems not enough to transmit information about the storage provider.

I second this. The type: S3, for example, could be relevant for AWS, Minio, or any other S3-compatible technologies using the same protocol. It doesn't help identify the actual provider.

@m-mohr m-mohr force-pushed the v2 branch 2 times, most recently from b3a21af to 3cc6274 Compare August 12, 2024 09:38
@m-mohr
Copy link
Contributor Author

m-mohr commented Aug 12, 2024

I completely reworked the PR to make a new proposal.
This is more a high-level framework now for various providers based on URI templates and other free-form variables.
It is meant that the extension itself is just a framework and collections best-practices around various providers, but the providers are not actually enforced or so. Everyone can contribute additional documents for providers easily by just providing short documents with platform URI and variables. It feels like otherwise we'll always go through the hell of aligning, standardizing and are always outdated with every iteration in cloud provider evolutions.
Happy to discuss and looking foward to feedback and additional providers. Kept the providers intentionally small as I don't have much experience with others and want to keep this to experienced users.

Ideally we would define the general approach in this PR and then add the actual best practices per provider in separate PRs. The examples that I have right now are more to show how it's meant to work, not to define actual best practices for S3/AWS/Azure. Otherwise we'll probably never finish the PR itself...

@matthewhanson @emmanuelmathot @fmigneault @philvarner

PS: This is my last attempt to make something useful out of this extension. If we can't agree on something in the next weeks I consider this extension to be non-functional for many cases and I'll ditch it for my use cases in favor of a custom extensions per provider.

@m-mohr m-mohr linked an issue Aug 12, 2024 that may be closed by this pull request
Copy link

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just minor comments.

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@j-musial
Copy link

j-musial commented Oct 7, 2024

@philvarner @matthewhanson any updates regarding the reviews?

@m-mohr
Copy link
Contributor Author

m-mohr commented Oct 18, 2024

I just added a type field for validation / discriminator purposes.

@fmigneault @emmanuelmathot and everyone else: Any last minute comments? I think we can merge soon once @matthewhanson made his review.

@matthewhanson
Copy link
Member

This looks great, @m-mohr went over it with me this morning. I like the changes, big improvement over 1.0

Copy link

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice addition 👍

@m-mohr m-mohr merged commit d09b109 into main Oct 18, 2024
2 checks passed
@m-mohr m-mohr deleted the v2 branch October 18, 2024 22:34
@m-mohr
Copy link
Contributor Author

m-mohr commented Oct 18, 2024

If there are still comments, please open issues. I'll release next week. Thanks everyone.

@gadomski
Copy link
Member

@m-mohr coming in late ... feels pretty complex but since this is aimed at data providers and implementors, we can expect a decent level of technical competency. 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
7 participants