Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object-name: fix resolution of object names containing curly braces #1844

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 59 additions & 4 deletions object-name.c
Original file line number Diff line number Diff line change
Expand Up @@ -1271,6 +1271,58 @@ static int peel_onion(struct repository *r, const char *name, int len,
return 0;
}

/*
* Documentation/revisions.txt says:
* '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
* Output from `git describe`; i.e. a closest tag, optionally
* followed by a dash and a number of commits, followed by a dash, a
* 'g', and an abbreviated object name.
*
* which means that the stuff before '-g${HASH}' needs to be a valid
* refname, a dash, and a non-negative integer. This function verifies
* that.
*
* In particular, we do not want to treat
* branchname:path/to/file/named/i-gaffed
* as a request for commit affed.
*
* More generally, we should probably not treat
* 'refs/heads/./../.../ ~^:/?*[////\\\&}/busted.lock-g050e0ef6ead'
* as a request for object 050e0ef6ead either.
*
* We are called with name[len] == '-' and name[len+1] == 'g', i.e.
* we are verifying ${REFNAME}-{INTEGER} part of the name.
*/
static int ref_and_count_parts_valid(const char *name, int len)
{
struct strbuf sb;
const char *cp;
int flags = REFNAME_ALLOW_ONELEVEL;
int ret = 1;

/* Ensure we have at least one digit */
if (!isxdigit(name[len-1]))
return 0;

/* Skip over digits backwards until we get to the dash */
for (cp = name + len - 2; name < cp; cp--) {
if (*cp == '-')
break;
if (!isxdigit(*cp))
return 0;
}
/* Ensure we found the leading dash */
if (*cp != '-')
return 0;

len = cp - name;
strbuf_init(&sb, len);
strbuf_add(&sb, name, len);
ret = !check_refname_format(name, flags);
strbuf_release(&sb);
return ret;
}

static int get_describe_name(struct repository *r,
const char *name, int len,
struct object_id *oid)
Expand All @@ -1284,7 +1336,8 @@ static int get_describe_name(struct repository *r,
/* We must be looking at g in "SOMETHING-g"
* for it to be describe output.
*/
if (ch == 'g' && cp[-1] == '-') {
if (ch == 'g' && cp[-1] == '-' &&
ref_and_count_parts_valid(name, cp - 1 - name)) {
cp++;
len -= cp - name;
return get_short_oid(r,
Expand Down Expand Up @@ -2051,12 +2104,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
return -1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Elijah Newren via GitGitGadget" <[email protected]> writes:

>  	for (cp = name, bracket_depth = 0; *cp; cp++) {
> -		if (*cp == '{')
> +		if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
> +			cp++;
>  			bracket_depth++;

Checking cp[1] before even knowing if cp[0] is the end of the string
(hence cp[1] is an out of bounds access) smells fishy.  If it were
something like ...

	if (cp[0] && strchr("@^", cp[0]) && cp[1] == '{')

... it may be a bit more palatable, perhaps?  At least writing it
this way we can easily scale when we find the third character we
need to special case, hopefully, but again, I do prefer if we can
find a solution that does not have such an intimate knowledge about
"@^", which I just failed to do here X-<.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sat, Jan 4, 2025 at 9:26 AM Junio C Hamano <[email protected]> wrote:
>
> "Elijah Newren via GitGitGadget" <[email protected]> writes:
>
> >       for (cp = name, bracket_depth = 0; *cp; cp++) {
> > -             if (*cp == '{')
> > +             if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
> > +                     cp++;
> >                       bracket_depth++;
>
> Checking cp[1] before even knowing if cp[0] is the end of the string
> (hence cp[1] is an out of bounds access) smells fishy.

We checked *cp in the loop already, so we know cp[0] != '\0'.
Combined with the fact that this is a NUL-terminated string, we
therefore also know that cp[1] is not an out-of-bounds access.

> If it were
> something like ...
>
>         if (cp[0] && strchr("@^", cp[0]) && cp[1] == '{')

Since we know cp[0] != '\0' already, couldn't this be simplified to

    if (strchr("@^", *cp) && cp[1] == '{')

?

I do like this form better though, yes.

> ... it may be a bit more palatable, perhaps?  At least writing it
> this way we can easily scale when we find the third character we
> need to special case, hopefully, but again, I do prefer if we can
> find a solution that does not have such an intimate knowledge about
> "@^", which I just failed to do here X-<.

Yeah, I have failed to come up with an alternative as well.  If I and
others can't come up with something better in a few days, I'll
resubmit with the above change and a comment in the commit message
that we'd prefer something better but were unable to come up with
anything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Elijah Newren via GitGitGadget" <[email protected]> writes:

> Given a branch name of 'foo{bar', commands like
>
>     git cat-file -p foo{bar:README.md
>
> should succeed (assuming that branch had a README.md file, of course).
> However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
> to entirely miss the ':' and assume there's no object being referenced.
> In short, git would report:
>
>     fatal: Not a valid object name foo{bar:README.md

Naïvely, it seems that a solution is to parse from left to right,
i.e., (1) notice there is a colon, (2) see if everything before that
colon resolves to a treeish, and (3) see if everything after it is a
path that appears in the treeish.

 - When we are given foo@{some:thing}, if we did that, we realize
   that "foo@{some" is not a valid tree-ish object name (since "@{"
   cannot appear in a refname) and then can backtrack by realizing
   "foo" is a ref, and @{...} could be a reflog reference (most
   likely a way to spell some sort of timestamp), and try that.

 - Similarly, for foo:path-gaffed, we would notice "foo" is a valid
   tree-ish object name, and if path-gaffed is a path in it, we'd be
   happy.  Or foo may not be a tree-ish, or path-gaffed may not
   exist in that tree-ish.  In which case, we can backtrack and see
   foo:path-g is an allowed prefix in a desribe name.

Now in the above description, I have assumed that an alternative
interpretation kicks in only as a fallback when we backtrack, but
we could make sure we try all possibilities and notice ambiguity if
we wanted to.

In any case, such an updated structure of the parsing code paths
(whether alternative interpretations are treated as fallbacks or
equally plausible candidates subject to disambiguation) would be
a vast departure from what we currently have, so a targeted "fix"
like these two patches attempt would be more appropriate as an
initial approach, I think.

Thanks, will queue, but probably we'd look at in any seriousness
after the 2.48 final gets tagged.

}
for (cp = name, bracket_depth = 0; *cp; cp++) {
if (*cp == '{')
if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
cp++;
bracket_depth++;
else if (bracket_depth && *cp == '}')
} else if (bracket_depth && *cp == '}') {
bracket_depth--;
else if (!bracket_depth && *cp == ':')
} else if (!bracket_depth && *cp == ':') {
break;
}
}
if (*cp == ':') {
struct object_id tree_oid;
Expand Down
31 changes: 30 additions & 1 deletion t/t1006-cat-file.sh
Original file line number Diff line number Diff line change
Expand Up @@ -240,7 +240,8 @@ test_expect_success "setup" '
git config extensions.objectformat $test_hash_algo &&
git config extensions.compatobjectformat $test_compat_hash_algo &&
echo_without_newline "$hello_content" > hello &&
git update-index --add hello
git update-index --add hello &&
git commit -m "add hello file"
'

run_blob_tests () {
Expand Down Expand Up @@ -602,6 +603,34 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
test_cmp expect actual
'

test_expect_success 'setup with curly braches in input' '
git branch "foo{bar" HEAD &&
git branch "foo@" HEAD
'

test_expect_success 'object reference with curly brace' '
git cat-file -p "foo{bar:hello" >actual &&
git cat-file -p HEAD:hello >expect &&
test_cmp expect actual
'

test_expect_success 'object reference with at-sign' '
git cat-file -p "foo@@{0}:hello" >actual &&
git cat-file -p HEAD:hello >expect &&
test_cmp expect actual
'

test_expect_success 'setup with commit with colon' '
git commit-tree -m "testing: just a bunch of junk" HEAD^{tree} >out &&
git branch other $(cat out)
'

test_expect_success 'object reference via commit text search' '
git cat-file -p "other^{/testing:}:hello" >actual &&
git cat-file -p HEAD:hello >expect &&
test_cmp expect actual
'

test_expect_success 'setup blobs which are likely to delta' '
test-tool genrandom foo 10240 >foo &&
{ cat foo && echo plus; } >foo-plus &&
Expand Down
22 changes: 22 additions & 0 deletions t/t6120-describe.sh
Original file line number Diff line number Diff line change
Expand Up @@ -725,4 +725,26 @@ test_expect_success '--exact-match does not show --always fallback' '
test_must_fail git describe --exact-match --always
'

test_expect_success 'avoid being fooled by describe-like filename' '
test_when_finished rm out &&

git rev-parse --short HEAD >out &&
FILENAME=filename-g$(cat out) &&
touch $FILENAME &&
git add $FILENAME &&
git commit -m "Add $FILENAME" &&

git cat-file -t HEAD:$FILENAME >actual &&

echo blob >expect &&
test_cmp expect actual
'

test_expect_success 'do not be fooled by invalid describe format ' '
test_when_finished rm out &&

git rev-parse --short HEAD >out &&
test_must_fail git cat-file -t "refs/tags/super-invalid/./../...../ ~^:/?*[////\\\\\\&}/busted.lock-42-g"$(cat out)
'

test_done
Loading