Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incomplete attached file problem #1261

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vlasakm
Copy link
Contributor

@vlasakm vlasakm commented Feb 22, 2025

Currently, we only have ttstub_input_read function, which corresponds to Rust's read_exact. This seems to work for most of xetex, but it's not semantically a replacement for buffered reading with fread, which is used in a dvipdfmx.

One of such places is reading a file to be included in a PDF file as attachment. This is done through dvipdfmx special like this:

\special{pdf:fstream @SourceFile (example.txt)}

These reads are done in 1024 byte chunks, and if we read these chunks with read_exact, we are going to fail unless the file size is not exactly divisible by 1024.

Tectonic would report this as:

warning: 1024-byte read failed
caused by: failed to fill whole buffer

To fix this issue in a non-intrusive way, we introduce a new variant of reads - ttstub_input_read_partial, which follows Rust's read semantics, and not read_exact's, and we use it for the dvipdfmx fstream reading.

Likely any place that calls ttstub_input_read in a loop is also a candidate for using ttstub_input_read_partial, and perhaps it would work better for others.

But there definitely are places in xetex that depend on the read_exact semantics. For example, when I naively just tried to make all reads into ttstub_input_read_partial, xetex failed pretty quickly on "undumping the format file", which has a few initial 4 byte reads, but then it tries to undump big string pool which was many kilobytes in size. Rust read however would only return 8196 minus couple of the the 4 bytes, which would fail a check on the TeX side which expects full read. AFAICT xetex uses fread, so this probably just comes down to buffering differences between fread and Rust's Reader / BufReader.

Relates to #935
Fixes #1260

Currently, we only have `ttstub_input_read` function, which corresponds to
Rust's `read_exact`. This seems to work for most of xetex, but it's not
semantically a replacement for buffered reading with `fread`, which is
used in a dvipdfmx.

One of such places is reading a file to be included in a PDF file as
attachment. This is done through dvipdfmx special like this:

    \special{pdf:fstream @sourcefile (example.txt)}

These reads are done in 1024 byte chunks, and if we read these chunks
with read_exact, we are going to fail unless the file size is not
exactly divisible by 1024.

Tectonic would report this as:

    warning: 1024-byte read failed
    caused by: failed to fill whole buffer

To fix this issue in a non-intrusive way, we introduce a new variant of
reads - `ttstub_input_read_partial`, which follows Rust's `read`
semantics, and not `read_exact`'s, and we use it for the dvipdfmx
fstream reading.

Likely any place that calls `ttstub_input_read` in a loop is also a
candidate for using `ttstub_input_read_partial`, and perhaps it would
work better for others.

But there definitely are places in xetex that depend on the `read_exact`
semantics. For example, when I naively just tried to make all reads into
`ttstub_input_read_partial`, xetex failed pretty quickly on "undumping
the format file", which has a few initial 4 byte reads, but then it
tries to undump big string pool which was many kilobytes in size. Rust
`read` however would only return 8196 minus couple of the the 4 bytes,
which would fail a check on the TeX side which expects full read. AFAICT
xetex uses `fread`, so this probably just comes down to buffering
differences between `fread` and Rust's `Reader` / `BufReader`.

Relates to tectonic-typesetting#935
Fixes tectonic-typesetting#1260
Copy link

codecov bot commented Feb 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 46.34%. Comparing base (c2ae25f) to head (e335870).

Files with missing lines Patch % Lines
crates/bridge_core/src/lib.rs 0.00% 13 Missing ⚠️
crates/bridge_core/support/support.c 0.00% 1 Missing ⚠️
crates/pdf_io/pdf_io/dpx-spc_pdfm.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1261      +/-   ##
==========================================
- Coverage   46.35%   46.34%   -0.02%     
==========================================
  Files         184      184              
  Lines       66222    66236      +14     
==========================================
- Hits        30695    30694       -1     
- Misses      35527    35542      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vlasakm
Copy link
Contributor Author

vlasakm commented Feb 22, 2025

I provided more details in #1260 (comment).

@CraftSpider
Copy link
Contributor

Thank you for the fix! At some point I'd like to get a fulldeps test suite set up for testing stuff with dependencies, but for now, do you think there's an easy way to write a test that covers this behavior without it?

@vlasakm
Copy link
Contributor Author

vlasakm commented Feb 23, 2025

At some point I'd like to get a fulldeps test suite set up for testing stuff with dependencies

Sorry, I am not familiar with Tectonic's tests at all. What does fulldeps mean?

do you think there's an easy way to write a test that covers this behavior without it?

I assume you mean to test that pdf:fstream special gets emitted correctly. I posted LaTeX test case here: #1260 (comment), I don't know exactly what our options are with regards to testing, but here is a minimal plain XeTeX test with no external dependencies:

\special{dvipdfmx:config C 0x40} % don't use object streams
\special{dvipdfmx:config z 0}    % don't use compression

\special{pdf:fstream @SourceFile (\jobname.tex)}
\special{pdf:ann width 10bp height 20bp
  <<
    /Type /Annot
    /Subtype /FileAttachment
    /FS <<
      /Type /Filespec
      /F (\jobname.tex)
      /EF << /F @SourceFile >>
    >>
    /Name /PushPin
    /C [0.8 0.2 0.2]
    /T (Source code of this document)
    /Subj (Source code of this document)
    /Contents (Source code of this document)
  >>
}

\bye

It's based on dvipdfmx manual, and embeds the TeX source file as an attachment. Other options make the PDF "readable", so we could check that:

  1. There are no warnings emitted.
  2. That the embedded PDF stream for the source file has the expected length.

@CraftSpider
Copy link
Contributor

Sorry, I am not familiar with Tectonic's tests at all. What does fulldeps mean?

Sorry, jargon borrowed from the Rust compiler. There, the 'fulldeps' are tests that need access to the 'full' compiler, including as an external dependency. Here, I'm using it to imply a project with access to the default bundle or Tectonic.toml usage, which would make writing some test cases much easier.

<snip> but here is a minimal plain XeTeX test with no external dependencies: <snip>

Thanks, that's exactly what I meant! For now using external dependencies in tests is hard, so something that only relies on
XeTeX is perfect. If you want, I can push such a test directly to this branch when I have a minute.

@vlasakm
Copy link
Contributor Author

vlasakm commented Feb 23, 2025

Sorry, I am not familiar with Tectonic's tests at all. What does fulldeps mean?

Sorry, jargon borrowed from the Rust compiler. There, the 'fulldeps' are tests that need access to the 'full' compiler, including as an external dependency. Here, I'm using it to imply a project with access to the default bundle or Tectonic.toml usage, which would make writing some test cases much easier.

I see.

If you want, I can push such a test directly to this branch when I have a minute.

Please do if you have a minute.

It would take me some time to figure out how to test this in Tectonic test suite, and to be honest, don't have much chances now. I promise this well serve as an example for me to write such test myself in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants