
Implement zip extract #158

Closed
wants to merge 12 commits into from


@tempusfrangit (Contributor) commented Feb 15, 2024

Implements a zip extraction consumer. This is required to handle dreambooth updates.

In Go, tar and zip extraction times are within ~300 ms of each other for equivalent files (the test case for dreambooth processing), so let's lighten the load and handle zip files directly in pget.

  • Include FileSize in the call to Consume() to support consumers like ZipExtractor that require a size
  • Implement an io.ReaderAt wrapper that converts the multiChanReader, since ZipExtractor requires an io.ReaderAt implementation
  • Implement the ZipExtractor consumer
  • Add better logging when -x is supplied
  • Handle overwrite support in the Consumer
  • Address an import loop
  • Add contentType as an input to the consumer (and an output from Fetch)
  • Support tar and zip extraction in multifile mode
  • Correct an issue with MultiReader: it was not blocking on the bufferedReader ready signal
  • Implement the --unzip (-u) option for selecting the unzip consumer
  • Multifile now uses the --unzip and --extract options
  • README update
  • Debug logging update for unzip and tar extract

Some future consumers (e.g. unzip) will need to know the expected fileSize,
depending on their implementation. This wires up basic support for passing
the fileSize as an argument to Consume; the value is already available
at the time Consume is called.
@tempusfrangit (Contributor Author) commented Feb 15, 2024

Still missing: multifile support for .tar and .zip

multiReader is a reader that implements the ReadAt functionality needed
for some future consumers (e.g. unzip). At a basic level, multiReader
consumes a multiChanReader via the NewmultiReader() function and
returns an io.ReaderAt implementation.

bufferedReader now has a .len() method that reports the content length
once that header is received. Since we do not know the actual content
length until the download starts, a new signal channel indicates that
the download has started, which allows us to read the size of the
bufferedReader.
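The ready-signal pattern can be sketched with a channel that is closed exactly once when the size becomes known. This is an illustration under assumptions; `sizedBuffer` and `markReady` are hypothetical names, not pget's bufferedReader API:

```go
package main

import (
	"fmt"
	"sync"
)

// sizedBuffer: the content length is unknown until headers arrive, so
// len() blocks on a ready channel.
type sizedBuffer struct {
	ready chan struct{}
	once  sync.Once
	size  int64
}

func newSizedBuffer() *sizedBuffer {
	return &sizedBuffer{ready: make(chan struct{})}
}

// markReady records the content length and releases every goroutine
// waiting in len(). Later calls are ignored.
func (b *sizedBuffer) markReady(size int64) {
	b.once.Do(func() {
		b.size = size
		close(b.ready) // the write to size happens-before any receive below
	})
}

// len blocks until markReady has been called, then reports the size.
func (b *sizedBuffer) len() int64 {
	<-b.ready
	return b.size
}

func main() {
	b := newSizedBuffer()
	go b.markReady(1024) // simulates the download starting
	fmt.Println(b.len())
}
```

Closing a channel (rather than sending on it) lets any number of readers wait on the same signal.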

This means reading from multiReader is likely to block more often than
reading from chanmultiReader. MultiReader may be able to implement Seek()
and related functions for reading the data out of strict order.
Implement ZipExtractor consumer
If the consumer is not File or tar-extractor when -x is used, log a warning that the
tar-extractor supersedes the specified consumer.
Make the consumers handle overwriting explicitly. This addresses edge
cases with the tar and zip consumers when extracting files.
Move ConsistentHashingStrategyKey to client instead of config.
`unzip` is the binary used on Linux to extract zip files; let's
stick with names that align more closely with the CLI tools we otherwise
use.
Fetch now returns contentType, and consumers take ContentType as an
argument. This is in preparation for multifile being able to direct
different contentTypes to different consumers in the case of tar/zip
extraction.
Multifile can now extract tar and zip files based upon the content-type.
The -u and -t flags for multifile command control unzip and untar
capabilities respectively.
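Routing by content type could look like the following sketch. It assumes servers report `application/zip` and `application/x-tar` (many serve `application/octet-stream` instead, which falls back here), and the consumer names and boolean switches are illustrative stand-ins for the unzip/untar options, not pget's code:

```go
package main

import (
	"fmt"
	"mime"
)

// consumerFor picks a consumer name for a download's Content-Type.
// mime.ParseMediaType strips parameters and lowercases the media type.
func consumerFor(contentType string, unzip, untar bool) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "file" // unparseable or missing type: write the file as-is
	}
	switch {
	case unzip && mediaType == "application/zip":
		return "unzip"
	case untar && mediaType == "application/x-tar":
		return "tar-extractor"
	default:
		return "file"
	}
}

func main() {
	for _, ct := range []string{"application/zip", "application/x-tar", "application/octet-stream"} {
		fmt.Println(ct, "->", consumerFor(ct, true, true))
	}
}
```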
* MultiReader was not blocking on the buffered reader ready signal
* Unzip now joins the path name to the target instead of incorrectly
  using '+'
* Implement `-u` shorthand for `--unzip`
* Add the `--unzip` option for invoking the unzip consumer
* Multifile mode uses `--extract/-x` and `--unzip/-u` for tar and
  unzip modes respectively

* Improved debug logs for tar and unzip
* Update README
PreRun and PreRunE are mutually exclusive. This moves the handling of the
extract and unzip short-hand options into PreRunE, where we also validate
that -x and -u are not used together.

```go
for _, file := range zipReader.File {
	err := handleFileFromZip(file, destPath, overwrite)
	if err != nil {
	// … (remainder of the diff hunk not shown)
```

Do we ensure that extracted files do not end up outside the intended destination directory?
I assume the zip code checks the archive's size and structure for signs of a corrupted/junk archive...
Same for other archive types.

@tempusfrangit (Contributor Author) commented Feb 21, 2024


Checks for malicious tar and zip archives should be added. I don't want to support non-standard zip (read: extensions) unless there is a real need.

@tempusfrangit marked this pull request as draft February 26, 2024 20:10
@tempusfrangit deleted the implement-zip-extract branch March 1, 2024 16:50