Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clone the parser with specific depth to reduce network traffic #821

Open
blindFS opened this issue Dec 31, 2024 · 10 comments
Open

Clone the parser with specific depth to reduce network traffic #821

blindFS opened this issue Dec 31, 2024 · 10 comments

Comments

@blindFS
Copy link

blindFS commented Dec 31, 2024

Problem

The size of the whole git history can be large (parser.c changes a lot if tracked)
git clone --depth 1 can reduce a lot of unnecessary network traffic

Describe the solution you'd like

extra option named depth in nickel
and add something like

    .arg("--depth")
    .arg(&self.depth)

to

Command::new("git")
.arg("clone")
.arg(&self.git)
.arg(&tmp_dir)
.status()
.map_err(TopiaryConfigFetchingError::Git)?;

Also, it would be nice to have another topiary-cli subcommand to clean the historical versions of the compiled .so files.

@aspiwack
Copy link
Member

aspiwack commented Jan 6, 2025

Do you see any reason not to clone with depth exactly 1 all the time, rather than adding an option?

@blindFS
Copy link
Author

blindFS commented Jan 6, 2025

Actually after some searching, I think git clone --filter=blob:none would be a generally better idea.

@yannham
Copy link
Member

yannham commented Jan 6, 2025

According to this github blog post, I think we do want a shallow clone (--depth 1). It is the smallest clone possible (only downloading the HEAD commit and the corresponding trees and blobs), while a blobless clone still pulls the all the commits in the history and their corresponding trees, and a tree-less clone still pulls all the commits in the history. So, it seems that (comparing by the size of the cloned repository) one should have shallow <= tree-less <= blob-less.

According to the post, if we just want to download the content of HEAD, then we can use a shallow clone (although it is the more restricted form of the three, as many normal git commands don't work anymore on such repos - however I don't think we care in the case of Topiary).

@blindFS
Copy link
Author

blindFS commented Jan 6, 2025

I think --depth 1 will cause some issues, as it blocks checkout at any commit of the parser that we want. So it forces the formatter query file to be compatible with the latest commit, which is hard to guarantee.

@Xophmeister
Copy link
Member

I think --depth 1 will cause some issues, as it blocks checkout at any commit of the parser that we want. So it forces the formatter query file to be compatible with the latest commit, which is hard to guarantee.

You have to provide a Git rev, so I don't think this would be an issue as you'd always pin to a known working commit.

@blindFS
Copy link
Author

blindFS commented Jan 6, 2025

@Xophmeister That is not the case if --depth 1 is enabled, it always downloads the latest single commit of the given branch.

@yannham
Copy link
Member

yannham commented Jan 6, 2025

Oh, I think you're right, my bad. Shallow clone just works with HEAD, although there are hacky work-arounds to get the range of HEAD down to a specific revision (but then I guess you get everything, including trees and blobs, which makes it noncompetitive). Then maybe tree-less is the way to go then, where checking out the revision will lazily download what's needed?

@Xophmeister
Copy link
Member

Ah, my mistake: You can only do that with branches 😞

@blindFS
Copy link
Author

blindFS commented Jan 6, 2025

tree-less and blob-less are both Ok and similar in size according to my test, actually also close to --depth 1

@yannham
Copy link
Member

yannham commented Jan 6, 2025

Also, I concur with @aspiwack 's suggestion: I would just make any light clone we choose to be the default silently, as users aren't supposed to mess with downloaded repo manually anyway. It's more like a Topiary implementation detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants