Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object-name: fix resolution of object names containing curly braces #1844

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

newren
Copy link

@newren newren commented Dec 30, 2024

Maintainer note: these bugs both date back to 2006; neither is a regression in this cycle.

Changes since v1:

  • Covered the ^{...} cases, and added a testcase for those
  • Added a second patch for another bug discovered by the same reporter, where branch:path/to/file/named/major-gaffed is interpreted as a request for a commit (namely affed) rather than a blob. (At least, assuming commit affed exists)

The second patch has some backward compatibility concerns. People used to be able to do e.g. git show ${garbage}-g${hash}. I tightened it to ${valid_refname}-${number}-g${hash}, but do we want to allow e.g. ${valid_refname}-g${hash} (allowing the count to be omitted) or maybe even allow a subset of invalid refnames?

Also for the second patch, while the repository the reporter found the issue in was something else, I found two open source examples:

  • lore.git: git cat-file -t master:random/path/major-gaffed
  • git.git: git cat-file -t super-invalid~///\\[email protected]

cc: Patrick Steinhardt [email protected]
cc: Elijah Newren [email protected]

@newren
Copy link
Author

newren commented Jan 1, 2025

/submit

Copy link

gitgitgadget bot commented Jan 1, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1844/newren/object-name-fix-v1

To fetch this version to local tag pr-1844/newren/object-name-fix-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1844/newren/object-name-fix-v1

Copy link

gitgitgadget bot commented Jan 1, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Elijah Newren via GitGitGadget" <[email protected]> writes:

> From: Elijah Newren <[email protected]>
>
> Given a branch name of 'foo{bar', commands like
>
>     git cat-file -p foo{bar:README.md
>
> should succeed (assuming that branch had a README.md file, of course).
> However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> come after an '@' and be paired, causing 'foo{bar:README.md' to
> entirely miss the ':' and assume there's no object being referenced.
> In short, git would report:
>
>     fatal: Not a valid object name foo{bar:README.md
>
> Change the parsing to only make the assumption of paired curly braces
> immediately after a '@' character appears.

Interesting.  I wonder if this looseness was to ensure that we won't
mistake a colon inside "master^{/title with : a colon}" as a start
of a subpath, instead of asking for a commit with a title that
happens to have a colon in it?

> Add tests for both this and 'foo@@{...}' cases, which an initial version
> of this patch broke.

Thanks for being extra careful here.


> Reported-by: Gabriel Amaral <[email protected]>
> Helped-by: Michael Haggerty <[email protected]>
> Signed-off-by: Elijah Newren <[email protected]>
> ---
>     object-name: fix resolution of object names containing curly braces
>     
>     Maintainer note: this bug dates back to 2006; it is not a regression in
>     this cycle.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1844%2Fnewren%2Fobject-name-fix-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1844/newren/object-name-fix-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1844
>
>  object-name.c       |  8 +++++---
>  t/t1006-cat-file.sh | 17 +++++++++++++++++
>  2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/object-name.c b/object-name.c
> index c892fbe80aa..e92f26b3256 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
>  		return -1;
>  	}
>  	for (cp = name, bracket_depth = 0; *cp; cp++) {
> -		if (*cp == '{')
> +		if (*cp == '@' && *(cp+1) == '{') {
> +			cp++;
>  			bracket_depth++;
> -		else if (bracket_depth && *cp == '}')
> +		} else if (bracket_depth && *cp == '}') {
>  			bracket_depth--;
> -		else if (!bracket_depth && *cp == ':')
> +		} else if (!bracket_depth && *cp == ':') {
>  			break;
> +		}
>  	}
>  	if (*cp == ':') {
>  		struct object_id tree_oid;
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index d36cd7c0863..252485dac78 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success FUNNYNAMES 'setup with curly braches in input' '
> +	git branch "foo{bar" &&
> +	git branch "foo@"
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with curly brace' '
> +	git cat-file -p "foo{bar:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with at-sign' '
> +	git cat-file -p "foo@@{0}:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'setup blobs which are likely to delta' '
>  	test-tool genrandom foo 10240 >foo &&
>  	{ cat foo && echo plus; } >foo-plus &&
>
> base-commit: 92999a42db1c5f43f330e4f2bca4026b5b81576f

Copy link

gitgitgadget bot commented Jan 3, 2025

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Wed, Jan 01, 2025 at 02:53:09AM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <[email protected]>
> 
> Given a branch name of 'foo{bar', commands like
> 
>     git cat-file -p foo{bar:README.md
> 
> should succeed (assuming that branch had a README.md file, of course).
> However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> come after an '@' and be paired, causing 'foo{bar:README.md' to
> entirely miss the ':' and assume there's no object being referenced.
> In short, git would report:
> 
>     fatal: Not a valid object name foo{bar:README.md
> 
> Change the parsing to only make the assumption of paired curly braces
> immediately after a '@' character appears.
> 
> Add tests for both this and 'foo@@{...}' cases, which an initial version
> of this patch broke.

Curious. I was kind of surprised to see that it's perfectly legal to
have branch names with curly braces in them in the first place. Even
something like `foo{bar}` is legal, even though it might be confusing
when one knows the above syntax. But sans your finding, this should be
fine given that curly braces are only interpreted specially when
preceded by '@', and the '@{' sequence is indeed disallowed by
`check_refname_compoment()`.

> diff --git a/object-name.c b/object-name.c
> index c892fbe80aa..e92f26b3256 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
>  		return -1;
>  	}
>  	for (cp = name, bracket_depth = 0; *cp; cp++) {
> -		if (*cp == '{')
> +		if (*cp == '@' && *(cp+1) == '{') {
> +			cp++;
>  			bracket_depth++;
> -		else if (bracket_depth && *cp == '}')
> +		} else if (bracket_depth && *cp == '}') {
>  			bracket_depth--;
> -		else if (!bracket_depth && *cp == ':')
> +		} else if (!bracket_depth && *cp == ':') {
>  			break;
> +		}
>  	}
>  	if (*cp == ':') {
>  		struct object_id tree_oid;

Makes sense. Only the first hunk actually changes anything, the
remaining changes are only required to make us stick to our coding
style.

I wonder though: does this have any impact on '<rev>^{<type>}' and other
syntaxes where we use '^' instead of '@'?

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index d36cd7c0863..252485dac78 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success FUNNYNAMES 'setup with curly braches in input' '
> +	git branch "foo{bar" &&
> +	git branch "foo@"
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with curly brace' '
> +	git cat-file -p "foo{bar:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with at-sign' '
> +	git cat-file -p "foo@@{0}:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'

Do these really need the FUNNYNAMES prereq? The prereq seems to only be
about embedded quotes, tabs and newlines and is disallowed on MinGW. But
I think both '{' and '@' should work alright there, shouldn't they?

Thanks!

Patrick

Copy link

gitgitgadget bot commented Jan 3, 2025

User Patrick Steinhardt <[email protected]> has been added to the cc: list.

Copy link

gitgitgadget bot commented Jan 3, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <[email protected]> writes:

> I wonder though: does this have any impact on '<rev>^{<type>}' and other
> syntaxes where we use '^' instead of '@'?
> ...
> Do these really need the FUNNYNAMES prereq? The prereq seems to only be
> about embedded quotes, tabs and newlines and is disallowed on MinGW. But
> I think both '{' and '@' should work alright there, shouldn't they?

Thanks for a review.  I am too curious how this change interacts
with syntax with {braces} that do not use "@".

Copy link

gitgitgadget bot commented Jan 3, 2025

There are issues in commit f6e6e7a:
Can I get rid of the FUNNYNAME requirement?
Commit checks stopped - the message is too short
Commit not signed off

Copy link

gitgitgadget bot commented Jan 3, 2025

There are issues in commit 4759f9c:
Can I get rid of the FUNNYNAME requirement?
Commit checks stopped - the message is too short
Commit not signed off

@newren newren force-pushed the object-name-fix branch 2 times, most recently from 030d909 to 68bc921 Compare January 3, 2025 23:20
newren added 2 commits January 3, 2025 15:20
Given a branch name of 'foo{bar', commands like

    git cat-file -p foo{bar:README.md

should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:

    fatal: Not a valid object name foo{bar:README.md

Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.

Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
  * 'foo@@{...}'
  * 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'

Reported-by: Gabriel Amaral <[email protected]>
Helped-by: Michael Haggerty <[email protected]>
Signed-off-by: Elijah Newren <[email protected]>
From Documentation/revisions.txt:
    '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
      Output from `git describe`; i.e. a closest tag, optionally
      followed by a dash and a number of commits, followed by a dash, a
      'g', and an abbreviated object name.
which means that output of the format
    ${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expand ${HASH}.  This is fine.  However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid.  That is problematic, since it
breaks things like

    git cat-file -p branchname:path/to/file/named/i-gaffed

which, when commit affed exists, will not return us information about a
file we are looking for but will instead tell us about commit affed.

Similarly, we should probably not treat
    refs/tags/invalid/./../...../// ~^:/?*\\&[}/busted.lock-g049e0ef6
as a request for commit 050e0ef either.

Tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present and valid.

Reported-by: Gabriel Amaral <[email protected]>
Signed-off-by: Elijah Newren <[email protected]>
Copy link

gitgitgadget bot commented Jan 3, 2025

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Jan 1, 2025 at 9:01 AM Junio C Hamano <[email protected]> wrote:
>
> "Elijah Newren via GitGitGadget" <[email protected]> writes:
>
> > From: Elijah Newren <[email protected]>
> >
> > Given a branch name of 'foo{bar', commands like
> >
> >     git cat-file -p foo{bar:README.md
> >
> > should succeed (assuming that branch had a README.md file, of course).
> > However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> > 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> > come after an '@' and be paired, causing 'foo{bar:README.md' to
> > entirely miss the ':' and assume there's no object being referenced.
> > In short, git would report:
> >
> >     fatal: Not a valid object name foo{bar:README.md
> >
> > Change the parsing to only make the assumption of paired curly braces
> > immediately after a '@' character appears.
>
> Interesting.  I wonder if this looseness was to ensure that we won't
> mistake a colon inside "master^{/title with : a colon}" as a start
> of a subpath, instead of asking for a commit with a title that
> happens to have a colon in it?

Yeah, good catch, my changes would for example break parsing
  master^{/object-name:}:t/t1006-cat-file.sh

I'll fix that and add a testcase.

Copy link

gitgitgadget bot commented Jan 3, 2025

User Elijah Newren <[email protected]> has been added to the cc: list.

Copy link

gitgitgadget bot commented Jan 3, 2025

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Jan 3, 2025 at 12:16 AM Patrick Steinhardt <[email protected]> wrote:
>
> On Wed, Jan 01, 2025 at 02:53:09AM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <[email protected]>
> >
> > Given a branch name of 'foo{bar', commands like
> >
> >     git cat-file -p foo{bar:README.md
> >
> > should succeed (assuming that branch had a README.md file, of course).
> > However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> > 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> > come after an '@' and be paired, causing 'foo{bar:README.md' to
> > entirely miss the ':' and assume there's no object being referenced.
> > In short, git would report:
> >
> >     fatal: Not a valid object name foo{bar:README.md
> >
> > Change the parsing to only make the assumption of paired curly braces
> > immediately after a '@' character appears.
> >
> > Add tests for both this and 'foo@@{...}' cases, which an initial version
> > of this patch broke.
>
> Curious. I was kind of surprised to see that it's perfectly legal to
> have branch names with curly braces in them in the first place.

I was surprised too, but apparently they are valid and we have real
world repositories where people have used such bad names.

> Even
> something like `foo{bar}` is legal, even though it might be confusing
> when one knows the above syntax. But sans your finding, this should be
> fine given that curly braces are only interpreted specially when
> preceded by '@', and the '@{' sequence is indeed disallowed by
> `check_refname_compoment()`.
>
> > diff --git a/object-name.c b/object-name.c
> > index c892fbe80aa..e92f26b3256 100644
> > --- a/object-name.c
> > +++ b/object-name.c
> > @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
> >               return -1;
> >       }
> >       for (cp = name, bracket_depth = 0; *cp; cp++) {
> > -             if (*cp == '{')
> > +             if (*cp == '@' && *(cp+1) == '{') {
> > +                     cp++;
> >                       bracket_depth++;
> > -             else if (bracket_depth && *cp == '}')
> > +             } else if (bracket_depth && *cp == '}') {
> >                       bracket_depth--;
> > -             else if (!bracket_depth && *cp == ':')
> > +             } else if (!bracket_depth && *cp == ':') {
> >                       break;
> > +             }
> >       }
> >       if (*cp == ':') {
> >               struct object_id tree_oid;
>
> Makes sense. Only the first hunk actually changes anything, the
> remaining changes are only required to make us stick to our coding
> style.
>
> I wonder though: does this have any impact on '<rev>^{<type>}' and other
> syntaxes where we use '^' instead of '@'?

<type> is pretty limited, so I see no problem there.  However
<rev>^{/<search text>} is problematic, as Junio pointed out.  I've
fixed up the patch and added a testcase to cover all the '^{...}'
cases.

> > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> > index d36cd7c0863..252485dac78 100755
> > --- a/t/t1006-cat-file.sh
> > +++ b/t/t1006-cat-file.sh
> > @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
> >       test_cmp expect actual
> >  '
> >
> > +test_expect_success FUNNYNAMES 'setup with curly braches in input' '
> > +     git branch "foo{bar" &&
> > +     git branch "foo@"
> > +'
> > +
> > +test_expect_success FUNNYNAMES 'object reference with curly brace' '
> > +     git cat-file -p "foo{bar:hello" >actual &&
> > +     git cat-file -p HEAD:hello >expect &&
> > +     test_cmp expect actual
> > +'
> > +
> > +test_expect_success FUNNYNAMES 'object reference with at-sign' '
> > +     git cat-file -p "foo@@{0}:hello" >actual &&
> > +     git cat-file -p HEAD:hello >expect &&
> > +     test_cmp expect actual
> > +'
>
> Do these really need the FUNNYNAMES prereq? The prereq seems to only be
> about embedded quotes, tabs and newlines and is disallowed on MinGW. But
> I think both '{' and '@' should work alright there, shouldn't they?

Oh, I misread the failures.  It turns out the FUNNYNAMES prereq fixed
things in CI on windows for me because the only commit ever created in
the repository is created by a testcase with a FUNNYNAMES prereq.
Since the setup for my tests relied on HEAD existing (because I run
   git branch "foo{bar" HEAD
in a setup test of my own), the tests were failing.  I didn't look
closely enough and assumed that command was failing because Windows
didn't like a branch name with a curly brace, but the real reason it
was failing was because HEAD didn't exist.

I'll tweak one of the earlier setup tests to create a commit so HEAD exists.

Thanks for pointing this out.

@newren
Copy link
Author

newren commented Jan 4, 2025

/submit

Copy link

gitgitgadget bot commented Jan 4, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1844/newren/object-name-fix-v2

To fetch this version to local tag pr-1844/newren/object-name-fix-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1844/newren/object-name-fix-v2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant