Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] for testing : Fix 4Gb limit for large files on Git for Windows #2179

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion http-push.c
Original file line number Diff line number Diff line change
Expand Up @@ -370,7 +370,7 @@ static void start_put(struct transfer_request *request)

/* Set it up */
git_deflate_init(&stream, zlib_compression_level);
size = git_deflate_bound(&stream, len + hdrlen);
size = git_deflate_bound(&stream, len + (size_t) hdrlen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since len is already of type size_t, this cast is superfluous. The only real difference this would make was if hdrlen was negative. Which cannot be the case because xsnprintf() is explicitly not allowed to return negative values.

So I'd just drop this hunk.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assume the type of len because it (git_deflate_bound) may be a macro, or a function, and it is then subject to the compile flags.

Also, one of my readings of the various 'standards' suggested that it is within the implementation defined definition to downcast oversize variables if one (of the computation) was the normal 'right' size (e.g. long).

The whole up/down cast cascade and dual default function templates thing made me pedantic/explicit!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assume the type of len because it (git_deflate_bound) may be a macro, or a function, and it is then subject to the compile flags.

Yes we can. Because we declared it ourselves, a couple lines above the shown diff context, as size_t len. So we can very much assume that its type is size_t.

it is within the implementation defined definition to downcast oversize variables if one (of the computation) was the normal 'right' size (e.g. long).

Nope. Arithmetic expressions are always evaluated after upcasting narrower data types to larger ones. A compiler that downcasts is buggy.

You probably think about an example like this:

short result;
long a = 1;
int b = 2;

result = a + b;

This is a bit tricky to understand, as the evaluation of the arithmetic expression a + b really upcasts the int to a long (unless they already have the same bit size). Only after that is the result downcast in order to be assigned to the narrower result.

That example has little relationship to the code under discussion, though, except in the case where git_deflate_bound() is defined as a function whose second parameter is declared with a too-narrow datatype. In which case you simply cannot do anything about it, except replace the function by our custom macro (as you did in another patch in this PR).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the upcast to long. But when long is meant to be the maximum arithmetic calculation, then size_t may be down cast (as per Microsoft's desire that 32 bit code stays unchanged on 64 bit machines, so only (arithmetically) address 32bit range, unless one specially changes to full pointer calculation. As I understand it it's a bit open ended (which standard is your go to reference document?)

It maybe that the commit message needs to be long to explain the ripple through consequences, or we make the code informative. Its one of those 'discursive' areas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assume the type of len

But we can. It is declared as size_t len;. See here:

git/http-push.c

Line 363 in a4b2bb5

size_t len;

strbuf_init(&request->buffer.buf, size);
request->buffer.posn = 0;

Expand Down
6 changes: 3 additions & 3 deletions zlib.c
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ static const char *zerr_to_string(int status)
#define ZLIB_BUF_MAX ((uInt) 1024 * 1024 * 1024) /* 1GB */
static inline uInt zlib_buf_cap(size_t len)
{
return (ZLIB_BUF_MAX < len) ? ZLIB_BUF_MAX : len;
return ((size_t) ZLIB_BUF_MAX < len) ? ZLIB_BUF_MAX : (uInt) len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ZLIB_BUF_MAX is already defined to be uInt, i.e. unsigned? And that cast from size_t to uInt should be implicit.

Therefore I doubt that this hunk is really necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against implicit conversions for implementation defined concerns. One of them has to be probably wrong 😉 One of then is an up cast and the other a down cast, and they are being compared to each other...

Aside: for testing, 1.5Gb is better, as it crosses the thresholds quickly and not at boundaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against implicit conversions for implementation defined concerns.

You can be against that all you want, at the same time you will have to abide by Git's conventions. And those conventions seem to avoid quite hard any unnecessary explicit cast.

For example, when ZLIB_BUF_MAX is compared to len, and they have different bit sizes, the smaller is up-cast to the larger one. That's defined behavior. Any compiler producing different code has a bug.

And do not forget that cluttering your code with unnecessary casts (or for that matter, cluttering your PR with unnecessary hunks) makes it much, much harder to read.

So no, this hunk needs to be dropped, sorry.

}

static void zlib_pre_call(git_zstream *s)
Expand Down Expand Up @@ -116,7 +116,7 @@ int git_inflate(git_zstream *strm, int flush)
zlib_pre_call(strm);
/* Never say Z_FINISH unless we are feeding everything */
status = inflate(&strm->z,
(strm->z.avail_in != strm->avail_in)
((size_t) strm->z.avail_in != strm->avail_in)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. That looks dubious.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nasty implicit conversion issue. They are different sizes and one definitely has wraparound relative to the other (z.avail_in is only 32bit, but strm->avail_in is full 64bit)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show me an explicit example where GCC (or Clang or MSVC) produces incorrect code for a != comparison between variables of different bit sizes. Show me. If you can show that to me, then I am willing to retract my objections to this hunk.

? 0 : flush);
if (status == Z_MEM_ERROR)
die("inflate: out of memory");
Expand Down Expand Up @@ -242,7 +242,7 @@ int git_deflate(git_zstream *strm, int flush)

/* Never say Z_FINISH unless we are feeding everything */
status = deflate(&strm->z,
(strm->z.avail_in != strm->avail_in)
((size_t) strm->z.avail_in != strm->avail_in)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

I think this entire commit can go without breaking anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same - being explicit about implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let's not clutter the diff with unnecessary (and distracting) churn.

? 0 : flush);
if (status == Z_MEM_ERROR)
die("deflate: out of memory");
Expand Down