Allow to access WARC filename, record offset and length #8

Merged
merged 1 commit into from
Aug 2, 2019


sebastian-nagel
Contributor

Allow access to the WARC filename and, via ArchiveIterator, to the record offset and length (see #6)

  • introduce an overridable method iterate_records(warc_file_uri, archive_iterator) which iterates over the WARC records and calls process_record(record) for each one
  • document a pitfall: the record offset and length must be read only after the WARC record has been processed
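
As a rough illustration of the two points above, an override of iterate_records might look like the sketch below. It is not the code from this pull request: the class name RecordLocator, the output tuple layout, and the FakeArchiveIterator stand-in are hypothetical, and the sketch assumes the iterator exposes get_record_offset() and get_record_length() the way warcio's ArchiveIterator is understood to.

```python
class FakeArchiveIterator:
    """Hypothetical stand-in for warcio's ArchiveIterator (illustration only)."""

    def __init__(self, payloads):
        self._payloads = payloads
        self._offset = 0
        self._length = 0
        self._next_offset = 0

    def __iter__(self):
        for payload in self._payloads:
            # Track where the current record starts and how long it is.
            self._offset = self._next_offset
            self._length = len(payload)
            self._next_offset += len(payload)
            yield payload

    def get_record_offset(self):
        return self._offset

    def get_record_length(self):
        return self._length


class RecordLocator:
    """Hypothetical job class with an overridable iterate_records method."""

    def process_record(self, record):
        # Placeholder processing: upper-case the payload bytes.
        return record.upper()

    def iterate_records(self, warc_file_uri, archive_iterator):
        for record in archive_iterator:
            result = self.process_record(record)
            # Read offset and length only AFTER the record has been processed.
            # Assumption: with a real reader these values are reliable only
            # once the record has been fully consumed.
            offset = archive_iterator.get_record_offset()
            length = archive_iterator.get_record_length()
            yield warc_file_uri, offset, length, result


rows = list(
    RecordLocator().iterate_records(
        "example.warc.gz", FakeArchiveIterator([b"abc", b"defgh"])
    )
)
```

Here each output row pairs the WARC filename with the offset and length of the record it came from, which is the information needed to fetch that single record again later.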

@sebastian-nagel sebastian-nagel merged commit 7e2f67a into master Aug 2, 2019
@sebastian-nagel sebastian-nagel deleted the cc-pyspark-6-record-offset-length branch August 2, 2019 09:21