Feature upgrade apache pdfbox #4450
Maybe just provide a separate "beta" version of PP with (only) PDFBox 3, post it in the forums, and let the community test it? Advantage: we will (hopefully) only have to deal with one PDFBox version in the future.
Could you perhaps make it possible to check many files at once? Then I would run everything through it myself.
I added a menu option to create diffs for multiple files. What it does not support is anonymizing the data via mouse click; you might have to do that manually. Please post meaningful deltas in issue #4449. So far, I think the diffs are manageable. They do not look like material differences that would affect the relevant regular expressions.
Perhaps you should also add “Experimental”, as in the XML document menu item.
Because it is a "headline": all supported XML documents should come after it. There is only one at the moment. Menus do not support headlines; they only support deactivated items. The item is only visible when the experimental features are activated. I hope that is enough labelling. We'll remove it in a couple of weeks anyway (or maybe in years... who knows ;-)).
First draft for #4449
It switches to PDFBox 3.0.3 and, in case of errors, falls back to PDFBox 1.8.
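A minimal sketch of the fallback idea described above, not the code in this PR. The `TextExtractor` interface and the `extractWithFallback` method are hypothetical names introduced for illustration; the real extractors would wrap the PDFBox 3.0.3 and PDFBox 1.8 calls respectively.

```java
public class FallbackExtraction {

    /** Hypothetical abstraction over one PDF text extractor; not a PDFBox type. */
    @FunctionalInterface
    public interface TextExtractor {
        String extract(byte[] pdf) throws Exception;
    }

    /**
     * Tries the primary extractor first (assumed: PDFBox 3.0.3) and only
     * uses the fallback (assumed: PDFBox 1.8) if the primary one throws.
     */
    public static String extractWithFallback(byte[] pdf,
                                             TextExtractor primary,
                                             TextExtractor fallback) {
        try {
            return primary.extract(pdf);
        } catch (Exception primaryError) {
            // Log message only; as noted in the description, users rarely see this.
            System.err.println("Primary extractor failed, falling back: " + primaryError);
            try {
                return fallback.extract(pdf);
            } catch (Exception fallbackError) {
                // Both versions failed; surface the fallback error, keep the first as suppressed.
                IllegalStateException e =
                        new IllegalStateException("Both extractors failed", fallbackError);
                e.addSuppressed(primaryError);
                throw e;
            }
        }
    }
}
```

Keeping the two extractors behind one small interface would make it easy to drop the 1.8 path later without touching callers.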
What I am still thinking about: how to learn if and where there are differences between the PDFBox versions.
Right now, I am printing a log message, but users will most likely not notice it, let alone inform us.
I wonder if we should create both versions of the extracted text and, if there are differences, put both into the debug text. At least then we would see which documents differ.
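The "extract with both versions and compare" idea could look roughly like the sketch below. The class name, method name, and section markers are assumptions made for illustration. The line-ending normalization is likewise an assumption, added so that pure `\r\n` versus `\n` differences do not count as deltas.

```java
import java.util.Optional;

public class ExtractionComparison {

    /**
     * Returns a combined debug text if the two extractions differ,
     * or an empty Optional if they are identical after normalizing line endings.
     */
    public static Optional<String> debugTextIfDifferent(String fromPdfBox3, String fromPdfBox1) {
        String newer = normalize(fromPdfBox3);
        String older = normalize(fromPdfBox1);

        if (newer.equals(older)) {
            return Optional.empty();
        }

        // Hypothetical markers so both variants can be told apart in the debug text.
        StringBuilder debug = new StringBuilder();
        debug.append("=== PDFBox 3 ===\n").append(newer).append('\n');
        debug.append("=== PDFBox 1.8 ===\n").append(older).append('\n');
        return Optional.of(debug.toString());
    }

    // Assumption: line-ending differences are not material for the regular expressions.
    private static String normalize(String text) {
        return text.replace("\r\n", "\n");
    }
}
```

With something like this, the debug text only grows for documents where the two versions actually disagree, so those documents stand out when users attach their debug output.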