[Feature]: Copy line-wrapped paragraphs without line breaks #19343

Open
shivaprsd opened this issue Jan 18, 2025 · 1 comment

shivaprsd commented Jan 18, 2025

Is the feature relevant to the Firefox PDF Viewer?

Yes

Feature description

I have searched past issues and found similar ones, but not one that addresses this exact issue.

Here is the test PDF; it is the same one referenced (long ago) in #7833.

Steps to reproduce (STR)

  1. Open the test PDF in PDF.js viewer
  2. Copy the first few lines of the abstract
  3. Paste it in a text editor

Obtained output

Lines are broken as they are wrapped visually:

This paper considers DoS attacks on DNS wherein attackers flood
the nameservers of a zone to disrupt resolution of resource records
belonging to the zone and consequently, any of its sub-zones. We
propose a minor change in the caching behavior of DNS resolvers
that can significantly alleviate the impact of such attacks [...]

Requested behaviour

It would be useful if, when copying and pasting, the visually wrapped lines were not broken but joined into a single logical paragraph.
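To make the request concrete, here is a minimal sketch of the transformation I have in mind, applied to the copied text after the fact. It is not PDF.js code: `unwrap_paragraphs` is a hypothetical helper, and it assumes that a blank line marks a real paragraph break while a single newline is only a visual wrap.

```python
import re


def unwrap_paragraphs(copied: str) -> str:
    """Join visually wrapped lines into paragraphs.

    Assumption (not taken from PDF.js): a blank line separates paragraphs,
    and a single newline is just a visual line wrap.
    """
    # Split on one or more blank lines to find paragraph boundaries.
    paragraphs = re.split(r"\n\s*\n", copied.strip())
    unwrapped = []
    for paragraph in paragraphs:
        # Drop empty fragments and join the remaining lines with one space.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    return "\n\n".join(unwrapped)


sample = (
    "This paper considers DoS attacks on DNS wherein attackers flood\n"
    "the nameservers of a zone to disrupt resolution of resource records\n"
    "belonging to the zone and consequently, any of its sub-zones."
)
print(unwrap_paragraphs(sample))
```

This sketch does not handle words hyphenated across a line break, and a real fix would presumably have to decide paragraph boundaries from the page layout rather than from blank lines.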

OS and browser

macOS Sonoma 14.6, Firefox Developer Edition 133.0b9 (aarch64), PDF.js 4.8.30 [bde36f2]

Other PDF viewers

Behaviour in macOS Preview

Lines are not broken:

This paper considers DoS attacks on DNS wherein attackers flood the nameservers of a zone to disrupt resolution of resource records belonging to the zone and consequently, any of its sub-zones. We propose a minor change in the caching behavior of DNS resolvers that can significantly alleviate the impact of such attacks [...]

I haven't checked it in Adobe Acrobat Reader.

@shivaprsd
Author

I will also add a small observation, in case it helps:

I saw it mentioned that <div> was replaced by <span> to aid continuous text selection. But now there are also <br role="presentation"> tags between the spans, which seem to be the culprit. When I copied two lines after removing the <br> between them, they came out joined together.

But this is not a solution, as it also removes any spaces actually needed between consecutive words in adjacent lines...
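To illustrate the problem with simply dropping the <br>, here is a rough sketch of the difference between plain concatenation and a join that inserts a space only where one is missing. It is not how PDF.js builds its text layer: `join_line_fragments` is a hypothetical helper, and the assumption that each visual line arrives as one text fragment is mine.

```python
def join_line_fragments(fragments: list[str]) -> str:
    """Join per-line text fragments, adding a space only when needed.

    Assumption (not taken from PDF.js): each fragment holds the text of one
    visual line, with or without trailing whitespace.
    """
    result = ""
    for text in fragments:
        if not text:
            continue  # skip empty fragments entirely
        # Insert a separator only if neither side already provides one.
        if result and not result[-1].isspace() and not text[0].isspace():
            result += " "
        result += text
    return result


fragments = ["attackers flood", "the nameservers"]

# Plain concatenation, roughly what happens once the <br> is removed:
print("".join(fragments))

# Boundary-aware join:
print(join_line_fragments(fragments))
```

The boundary check keeps existing whitespace intact, so a line that already ends in a space does not get a second one.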
