Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement encryption #54

Merged
merged 2 commits into from
Jun 14, 2024
Merged

Conversation

programmarchy
Copy link

@programmarchy programmarchy commented May 23, 2024

Based on PhakornKiong/pdf-lib#dev/DocEncrypt

What?

This PR adds the ability to encrypt a PDF document. It's based on PhakornKiong's PR Hopding#1015, which sadly did not get merged. I've updated his code to be compatible with the latest changes to pdf-lib. Tried to simplify where possible, minimizing changes to PDFDocument and PDFContext.

Why?

I needed PDF encryption on a project. This repo has already merged decryption. Let's have both!

How?

Uses CryptoJS to encrypt streams.

Testing?

I've tested it for my personal use case. I have not implemented unit tests yet. Want some assurance it will get merged before working on unit tests.

New Dependencies?

CryptoJS.

Screenshots

N/A

Suggested Reading?

Yes, also read the Encryption part of the spec.

Anything Else?

Checklist

  • I read CONTRIBUTING.md.
  • I read MAINTAINERSHIP.md#pull-requests.
  • I added/updated unit tests for my changes.
  • I added/updated integration tests for my changes.
  • I ran the integration tests.
  • I tested my changes in Node, Deno, and the browser.
  • I viewed documents produced with my changes in Adobe Acrobat, Foxit Reader, Firefox, and Chrome.
  • I added/updated doc comments for any new/modified public APIs.
  • My changes work for both new and existing PDF files.
  • I ran the linter on my changes.

Based on PhakornKiong/pdf-lib#dev/DocEncrypt
@Sharcoux
Copy link
Collaborator

I'll have a look at it as soon as I have a moment. Thanks very much for driving this here.

package.json Show resolved Hide resolved
@Sharcoux
Copy link
Collaborator

@programmarchy Everything seems alright. I'll be able to merge. I'm just not sure about the change from PDFHeader.forVersion(1, 7); to this.context.header. I don't understand either the original code nor the new one to be honest 😅

@programmarchy
Copy link
Author

@Sharcoux I know that the PDF version dictates which encryption algorithm (represented by V) is used.

But you ask a good question worth investigating. There may be a reason that version 1.7 was hardcoded. For example, maybe pdf-lib can read version 1.4 PDFs, but only supports writing 1.7-compatible PDFs?

If we determine that pdf-lib will only support writing 1.7 PDFs, then we could eliminate the code that handles encryption for other PDF versions.

@Sharcoux
Copy link
Collaborator

Do you have time to run those investigations? Otherwise, we should take the safest approach, which is probably to keep the original code. If that raises problems, we can solve them as they go.

@programmarchy
Copy link
Author

@Sharcoux Agree on taking the safer approach. I'm good only supporting 1.7. So I'll simplify PDFSecurity to only support that version.

Also, this PR has a bug where encrypted PDFs cannot be opened Adobe Acrobat. It's stricter than other readers, so I need to fix that also.

@Sharcoux
Copy link
Collaborator

@Sharcoux Agree on taking the safer approach. I'm good only supporting 1.7. So I'll simplify PDFSecurity to only support that version.

Also, this PR has a bug where encrypted PDFs cannot be opened Adobe Acrobat. It's stricter than other readers, so I need to fix that also.

Maybe you should keep the support for other versions and only add a comment above the part: const header = PDFHeader.forVersion(1, 7); mentioning that investigations are required to know why this is there. This way, we don't lose your work.

@RippleRurigaki
Copy link

RippleRurigaki commented Jun 5, 2024

@programmarchy

But you ask a good question worth investigating. There may be a reason that version 1.7 was hardcoded. For example, maybe pdf-lib can read version 1.4 PDFs, but only supports writing 1.7-compatible PDFs?

If we determine that pdf-lib will only support writing 1.7 PDFs, then we could eliminate the code that handles encryption for other PDF versions.

pdf-lib does not seem to assume anything other than version 1.7.
I can't find any version checks for added functionality or deprecated functionality by PDF version.
Maybe that's why version 1.7 is hardcoded because of the simplification.

Add:

I found the comment mentioned by @Hopding

Hopding#916 (comment)

@Sharcoux
Copy link
Collaborator

@programmarchy

But you ask a good question worth investigating. There may be a reason that version 1.7 was hardcoded. For example, maybe pdf-lib can read version 1.4 PDFs, but only supports writing 1.7-compatible PDFs?
If we determine that pdf-lib will only support writing 1.7 PDFs, then we could eliminate the code that handles encryption for other PDF versions.

pdf-lib does not seem to assume anything other than version 1.7. I can't find any version checks for added functionality or deprecated functionality by PDF version. Maybe that's why version 1.7 is hardcoded because of the simplification.

Add:

I found the comment mentioned by @Hopding

Hopding#916 (comment)

Nice, but then, this means that we should probably keep things the way they were. Can you revert that part and add a comment near the hard codded header linking to Hopding message or this conversation?

@programmarchy
Copy link
Author

I reverted the version header part.

However, there's still the outstanding issue with encrypted PDFs not able to be read by Adobe Acrobat. Still looking into that. Acrobat doesn't seem to output any useful information for troubleshooting. Going to compare with other PDF libs.

@Sharcoux Sharcoux merged commit 215eded into cantoo-scribe:master Jun 14, 2024
1 check passed
@Sharcoux
Copy link
Collaborator

Ah, my bad, I saw your comment too late. I merged but I won't release waiting for your fix.

@Sharcoux
Copy link
Collaborator

@programmarchy Will you still try to make the pdfs be opened with Adobe Acrobet?

@programmarchy
Copy link
Author

@Sharcoux It's on my todo list but I don't have a timeline of when I can finish it. Partly because of my schedule, and partly because I'm a bit stuck, frankly.

I know that qpdf is able to encrypt the same PDFs I tested with and they open successfully. Their encryption code is here: https://github.com/qpdf/qpdf/blob/main/libqpdf/QPDF_encryption.cc but I haven't finished comparing the implementations.

If we can attract others to help solve the problem I think that'd be ideal.

@programmarchy
Copy link
Author

@PhakornKiong Hope it's okay that I am reaching out to you here. We're working on merging in your PDF encryption implementation into this fork of pdf-lib. The implementation works great for most PDF viewers, however, Adobe Acrobat cannot open the file.

Do you happen to have any insight?

image

@RippleRurigaki
Copy link

@programmarchy

Try adding types that do not compress in the object stream.

PDFStreamWriter.ts

      const shouldNotCompress =
        ref === this.context.trailerInfo.Encrypt ||
        object instanceof PDFStream ||
        object instanceof PDFInvalidObject ||
        object instanceof PDFCatalog ||   //Add
        object instanceof PDFPageLeaf ||  //Add
        object instanceof PDFPageTree || //Add
        ref.generationNumber !== 0;

and Import.

Because,In PDF Specifications

In linearized files (see Annex F, "Linearized PDF" and Annex G, "Linearized PDF access strategies"),
the document catalog dictionary,
the linearization dictionary, and page objects shall not appear in an object stream.

Only a simple test, but the error seems to have disappeared.

@RippleRurigaki
Copy link

Ah?

#54 (comment)

Perhaps this should be a separate issue from encryption?
There should be an impact on Object Stream output regardless of encryption.

@programmarchy
Copy link
Author

@RippleRurigaki Yep, that fixed the issue with Acrobat! Nice find... I'll open a new PR.

@Sharcoux
Copy link
Collaborator

Released in version 2.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants