Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve URI recognition #1736

Merged
merged 35 commits into from
Oct 24, 2024

Conversation

Seb-MCaw
Copy link
Contributor

@Seb-MCaw Seb-MCaw commented Aug 12, 2024

Currently, there are a number of places where Alire tries to guess the nature of a resource based on its URI (or a path or SCP-style remote location, neither of which is technically a URI). Alire.URI.Scheme provides some of this functionality, but it's not always used, and there are additional checks (e.g. looking for a .git suffix) which are applied subsequently by the calling code (but only sometimes). With a view to adding better support for ssh based private indexes and origins, and making it easier to support other kinds in future (these, for example), this PR attempts to consolidate all of these checks into Alire.URI so that they are applied consistently.

A few other URI-related things have also been changed in hope of improving consistency.


I believe this PR makes the following substantive changes to alr's behaviour:

  • The scheme ssh:// is now recognised, though will generally raise an error unless there are other indications of what the URL points to (e.g. a .git suffix).

  • file: urls are now always:

    • not produced with the form file://relative/path (problematic because officially this is absolute path /path on the host relative), though user inputs should still be recognised as before (i.e. the relative path relative/path) and converted into the form file:relative/path (which is still not technically valid, but at least no longer ambiguous)
    • converted to bare paths before passing to git etc. (which results in more efficient cloning, but not 100% sure if hardlinks will ever cause issues)
    • interpreted as paths in their entirety, i.e. # and ? are interpreted literally (#<commit> fragments were never needed/accepted for user input anyway)
  • bitbucket.org is now recognised as a git-only host (this was previously based on the list in Known_Transformable_Hosts, but is now implemented by Is_Known_Git_Host)

  • A .git/ suffix is now treated the same as .git.

  • Recognition of SCP-style git remotes (git@host:/path) is slightly stricter in some places, such that git@host/with/slash:/path is now always treated as a local path (https://git-scm.com/docs/git-clone#URLS).

  • Alire now adds a git+ prefix instead of .git suffix whenever it needs to make a URL explicitly Git (on the basis that the former is specific to Alire, while the latter modifies the actual URL).

  • git+file: index URLs are now converted to absolute paths (though in practice I don't think this changes anything other than the output of alr index --list, since git clone does the conversion anyway).

  • alr publish now gives The origin cannot use a private remote for ssh:// URLs and anything else recognised as private (unless --for-private-index is supplied), not just those of the form git@host:path

  • alr publishing an origin which might be a git remote (e.g. https://github.com/mosteo/aaa/archive/v0.1.tar.gz) as a source archive now gives a warning rather than an un---force-able error.

  • alr publish file:/some/path/ now always raises an error unless the path ends with a recognised archive file extension (previously there was a bug where the full URL was treated as a relative path, which meant effectively running the publishing procedure for the current working directory, even if it differed from /some/path/)

  • Pins no longer treat https: URLs with any xyz+ prefix as a git remote, only those with git+ (or other identifying features).


There are some other obscure edge cases and behaviours which have changed, so here is my attempt at a complete account of all changes for each subprogram:

  • Alire.Index_On_Disk.Directory.Index_Directory and Alire.Index_On_Disk.Directory.New_Handler
    • Now accept file:/path [previously only accepted file:///path]
  • Alire.Index_On_Disk.Git.Add
    • Now separates commit part from URL before passing to VCSs.Git.Handler.Clone. I believe this is now the only place where the URL fragment is used for a commit (because the UI doesn't have a separate commit argument).
  • Alire.Index_On_Disk.Git.New_Handler
    • Now recognises Git URLs without git+ when they have a .git or .git/ suffix or a known git host
  • Alire.Index_On_Disk.New_Handler
    • Now generates file: URLs not file://
    • Now recognises git+file: URLs [previously only file:] and Windows paths containing @ or + as local indexes for path validation purposes
    • Now recognises Git URLs without git+ when they have a .git or .git/ suffix or a known git host
    • Now converts relative paths to absolute for git+file: URLs
  • Alire.Origins.From_TOML
    • Now recognises git urls without git+ when they have a .git or .git/ suffix or a known git host (unless hashes is specified) [previously always assumed https:// was a source archive, and error on ssh://]
    • ssh:// urls which aren't recognised as a VCS now produce an an error message in line with the equivalent for indexes.
  • Alire.Origins.New_VCS
    • Special handling of /s for file: schemes now also applies to mixed-case equivalents (schemes are case insensitive)
    • No longer adds a .git suffix to https://github.com/repo or https://gitlab.com/repo urls (which I'm pretty sure was redundant)
    • Now recognises ssh:// git urls without git+ when they have a .git suffix or a known git host [previously raised unknown VCS URL]
    • Now converts all file: urls to local paths (file:///some/path, not just file:/some/path)
    • No longer includes fragment as part of URL (except for file: URLs, where it is treated as part of the path)
    • Now treats .git/ suffix identically to .git
  • Alire.Origins.To_TOML
    • Only adds prefixes to VCS origins when they are otherwise ambiguous
  • Alire.Origins.Deployers.*.Deploy
    • Now pass commit as a separate parameter of VCS.*.Handler.Clone [previously concatenated into one parameter with # separator, which was separated again by VCS.*.Handler.Clone]
  • Alire.Origins.Tweaks.Fixed_Origin
    • Now converts file: and git+file: URLs to bare paths
    • Now converts relative paths in git+file: URLs into absolute paths, as it already does for file: URLs (though I don't think it is ever called with such a URL)
  • Alire.Publish.Verify_Origin
    • Now produces The origin cannot use a private remote error for all URIs that appear to be private (not just those of the form git@host:path)
    • No longer permits unknown schemes
    • Recognition of trusted sites now uses the value of Alire.URI.Host (e.g. SCP-style git remotes are recognised)
  • Alire.Publish.Local_Repository
    • Now only rejects private URIs according to Alire.Publish.Verify_Origin (so git+https is no longer considered private and local URIs can be forced)
    • Now resolves ambiguity of URLs by adding git+ prefix, not .git suffix
    • No longer permits unknown schemes
    • Precondition no longer permits file: schemes
  • Alire.Publish.Remote_Origin
    • Error for not specifying a commit when the URI is recognised as a VCS repo is now the same whether or not the URI has a .git suffix (Specifically, URL seems to point to a repository, but no commit was provided.)
    • Raises errors directly for unknown or inappropriate schemes
    • Publishing an origin which might be a git remote (e.g. https://github.com/mosteo/aaa/archive/v0.1.tar.gz) as a source archive now gives a warning rather than a non---force-able error
    • Now treats .git/ suffix identically to .git
  • Alire.Publish.Prepare_Archive
    • The default for the URL confirmation is now No if the URL looks like a git repo.
  • Alire.Roots.Editable.Add_Remote_Pin
    • Now passes commit as a separate parameter of VCS.Git.Handler.Clone [previously concatenated into one parameter with # separator, which was separated again by VCS.Git.Handler.Clone]
  • Alire.User_Pins.Deploy
    • Now passes commit as a separate parameter of VCS.Git.Handler.Clone [previously concatenated into one parameter with # separator, which was separated again by VCS.Git.Handler.Clone]
  • Alire.VCSs.*.Clone
    • Now Take commit as separate parameter, instead of extracting fragment from URL
  • Alire.VCSs.Git.Transform_To_Public
    • Now recognises hosts according to Alire.URI.Host
    • Now resolves ambiguity of URLs by adding git+ prefix, not .git suffix
  • Alire.VCSs.Repo (now called Alire.VCSs.Repo_URL)
    • Now removes both versions of file: scheme (file:// and file:)
    • Now only strips recognised prefixes (git+, hg+ and svn+)
  • Alr.Commands.Publish.Execute
    • The behaviour of URLs with a file: scheme which end with a recognised archive file extension is unchanged
    • URLs with a file: scheme and which do not end with a recognised archive file extension are now treated as a remote origin (as is already the case for git+file:), in line with the docs
      • Previously, due to a bug where the URI is interpreted as a relative path (the debug output includes lines of the form debug: Looking for alire metadata at: /current/working/directory/file:/path/to/repo; presumably this is because the Alire.Roots package expects only paths, but I haven't investigated further):
        • When a git repository containing an Alire workspace was present at any ancestor of the CWD (even if it differed from the path specified in the argument), alr publish file:/path/to/repo (with or without specifying a commit id) performed the normal publishing procedure with that repo's configured remote as the origin url in the index manifest (not file:/path/to/workspace), or raised an error if no remote was configured
        • When no ancestor of the CWD was a git repository containing an Alire workspace, alr publish file:/path/to/repo (with or without specifying a commit id) gave error: No Alire workspace found at file:/path/to/repo
      • Now always gives error: unknown VCS URL: file:/path/to/repo when a commit is specified, and error: Unable to determine archive format from file extension otherwise
  • Alr.Commands.Pin.Execute
    • Now recognises ssh URIs with known git host or a .git/ suffix as remote repos
    • No longer treats http[s] URIs as repositories unless they have a git+ prefix, .git[/] suffix or known host
    • No longer treats local paths ending with .git as remote repos (unless scheme is git+file:)
    • Now gives warning when treating something that looks like a URL as a directory path
  • Alr.Commands.Withing.Add_With_Pin
    • Now recognises ssh URIs with known git host or a .git/ suffix as remote repos
    • No longer treats http[s] URIs as repositories unless they have a git+ prefix, .git[/] suffix or known host
    • No longer treats local paths ending with .git as remote repos (unless scheme is git+file:)
    • Now gives warning when treating something that looks like a URL as a directory path
    • Now gives a confirmation prompt if a directory path doesn't exist

- Consolidate a number of separate ad hoc tests for recognising URIs into Alire.URI.URI_Kind
- Add support for origin URLs with scheme "ssh://"
- Add bitbucket.org to the list of hosts recognised as git only
- Treat ".git/" suffix the same as ".git"
- Change publish command to raise errors on URLs with "file:" scheme (sidestepping a bug where the whole URL was treated as a relative path)
- Change various error messages
- Change handling of some obscure edge cases
@Seb-MCaw Seb-MCaw force-pushed the feat/improve-uri-recognition branch 3 times, most recently from 2764a45 to 4186694 Compare August 13, 2024 10:54
@Seb-MCaw
Copy link
Contributor Author

Seb-MCaw commented Aug 13, 2024

Compilation currently fails with GNAT 10. Versions 11 and later compile fine, so it's presumably a bug with GNAT, but I haven't been able to find a workaround.

Any ideas? If not, is it feasible to bump the minimum version up to 11?

@mosteo
Copy link
Member

mosteo commented Aug 13, 2024

Thanks for the comprehensive write-up. I will have to peruse the changes in this one.

Any ideas? If not, is it feasible to bump the minimum version up to 11?

IIRC GNAT 10 is the one shipped in Ubuntu LTS 20.04, which is the one we use for releases as the oldest supported by GitHub action runners. So until that changes, bumping to 11 is not very attractive.

I will look into finding a workaround, I have faced similar situations in a few occasions.

@Seb-MCaw
Copy link
Contributor Author

There's no rush; I appreciate there's a lot to go through.

I wasn't sure how important supporting GNAT 10 was, but that makes sense. In that case I'll keep looking for a workaround too (I've only tried a few obvious things so far).

@Seb-MCaw
Copy link
Contributor Author

Turned out to be simpler than expected. Not sure what's up with this one, though.

@mosteo
Copy link
Member

mosteo commented Sep 3, 2024

Please mark ready for review once done so I know to check it.

@Seb-MCaw
Copy link
Contributor Author

Seb-MCaw commented Sep 3, 2024

Please mark ready for review once done so I know to check it.

Will do. #1745 (which is ready for review at your convenience) seemed a higher priority, but I'm back to working on this one now.

@mosteo
Copy link
Member

mosteo commented Sep 3, 2024

Ah, I thought #1745 relied on some changes here. Will review that one then.

@Seb-MCaw Seb-MCaw force-pushed the feat/improve-uri-recognition branch 2 times, most recently from 59d8c81 to 090d5ec Compare September 24, 2024 10:53
@Seb-MCaw Seb-MCaw force-pushed the feat/improve-uri-recognition branch from 090d5ec to 6732fcd Compare September 24, 2024 14:56
@jquorning
Copy link
Contributor

jquorning commented Oct 11, 2024

NB: spellcheck reports several problems but can not report it here.

@Seb-MCaw
Copy link
Contributor Author

NB: spellcheck reports several problems but can not report it here.

So it does. Thanks for pointing that out.

@mosteo
Copy link
Member

mosteo commented Oct 15, 2024

Sorry, @Seb-MCaw, I didn't realize this one was marked as ready for review. Whenever you have one ready for merging, it's better that you explicitly request the review, which will mail me so I won't miss it.

@Seb-MCaw
Copy link
Contributor Author

Whenever you have one ready for merging, it's better that you explicitly request the review, which will mail me so I won't miss it.

OK, good to know.

Copy link
Member

@mosteo mosteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of very minor changes, and a few questions to verify my assumptions. Great work!

testsuite/tests/index/git-local/test.py Outdated Show resolved Hide resolved
testsuite/tests/with/equivalent/test.py Show resolved Hide resolved
Comment on lines 139 to 148
-- Formats currently not recognized include:
-- user@host:/path/to/repo.git [returns Unknown]
-- host.name:/path/to/repo.git [returns Unknown]
-- git://host/path/to/repo.git [returns Unknown]
-- ftp://host/path/to/repo.git [returns Unknown]
-- svn://(something) [returns Unknown]
-- https://user:pass@host/repo.git [returns Public_Git]
-- https://user@host/repo.git [returns Public_Git]
-- https://user:pass@host/path [returns Public_Other]
-- https://user@host/path [returns Public_Other]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it on purpose that those Public_Git are not being recognized as Private_Git?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, I have tried to preserve existing behaviour, except where said behaviour was inconsistent. There was previously no distinction between https URLs with or without a userinfo component, so I have kept that the same.

If you want, I could change Alire.URI_Kind to treat http[s] URLs with a non-empty userinfo component as private, but I haven't really looked into whether this would break anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Well, as long as you have this fresh in your mind, I think it's a good opportunity to plug some holes. In particular, the ones returning Public_* could return Private_* and I don't expect it should affect anything. If you find unexpected troubles we could decide on leaving it as-is.

testsuite/tests/clean/cache-files/test.py Show resolved Hide resolved
testsuite/tests/index/git-local/test.py Outdated Show resolved Hide resolved
@Seb-MCaw Seb-MCaw requested a review from mosteo October 18, 2024 12:17
Copy link
Member

@mosteo mosteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this is OK for merging once you give a stab at the Public/Private identification I mention in another comment. Thanks!

@Seb-MCaw Seb-MCaw requested a review from mosteo October 23, 2024 13:06
@mosteo
Copy link
Member

mosteo commented Oct 24, 2024

Thanks, @Seb-MCaw. I see the last changes weren't as light as hoped.

@mosteo mosteo merged commit 12458f6 into alire-project:master Oct 24, 2024
25 checks passed
@Seb-MCaw Seb-MCaw deleted the feat/improve-uri-recognition branch October 24, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants