Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to optionally skip downloading the contents of a file #79

Merged
merged 2 commits into from
Mar 4, 2024
Merged

Add option to optionally skip downloading the contents of a file #79

merged 2 commits into from
Mar 4, 2024

Conversation

benjamb
Copy link
Contributor

@benjamb benjamb commented Mar 1, 2024

This PR does a couple of things:

1 . The first commit prevents the downloading of entire files into memory by streaming the response and reading small chunks as a time.
2. The second commit adds the configuration option to optionally skip downloading of a remote URLs content.

Closes #76.

Comment on lines 157 to 159
# Download the entire contents as to not break previous behaviour.
for _ in response.iter_content(chunk_size=1024):
pass
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't actually care about downloading the contents (we seemingly don't do any verification of contents), then we could always just drop these lines, with stream=True set the body of the response isn't downloaded automatically.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't as long as the url status is correct.

Copy link
Contributor Author

@benjamb benjamb Mar 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just simplify this MR to just skip the download entirely and drop the read_max_bytes configuration? I'm hesitant to break existing behaviour, so I'll list out a few possible routes this MR could go:

  1. Keep this MR as it is, with read_max_bytes limiting the size of downloaded content.
  2. Opt for a different configuration value, such as skip_downloads, which triggers this new behaviour of skipping the download of the response's body.
  3. Don't bother adding any configurable behaviour and just establish a connection via stream=True and skip any downloads.
    @manuzhang Which would be your preference?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer option 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manuzhang I've implemented option 2 and updated this MR, as well as the original issue title.

@benjamb benjamb changed the title Add option to read max N bytes of a file Add option to optionally skip downloading the contents of a file Mar 4, 2024
@manuzhang manuzhang merged commit fce60c9 into manuzhang:main Mar 4, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to prevent downloading the entirety of a file
2 participants