Unable to get document widgets #74
Comments
It returns |
Let's check how Adobe behaves with the same fields. Maybe we need to add some auto-fix that removes this flag, or something else. We can be sure that Adobe allows filling these fields, and we should too. |
The document has incorrect object indexing in the XRef table. The objects are mistakenly marked as deleted, which makes it impossible to retrieve the position of an object within the document. Our current implementation relies solely on the indices specified in the XRef table, which speeds up document loading and avoids line-by-line reading. Here is an example of the XRef indices from this document:
Considering the severity of the issue and the impact it has on the document's integrity, I recommend treating this particular format as corrupted. However, if it's necessary to support this format, we will need to modify the document reading approach in our module, moving away from relying solely on the XRef indices. @rmhrisk What do you think? |
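To make the "marked as deleted" point concrete, here is a minimal sketch of reading a classic XRef section and separating in-use (`n`) entries from free (`f`) ones. The type and function names are hypothetical and are not this module's API. The sketch only assumes the standard 20-byte entry layout (10-digit offset, 5-digit generation, keyword).

```typescript
// Hypothetical names for illustration only; not the module's actual API.
interface XRefEntry {
  objectNumber: number;
  // Byte offset for in-use entries; next free object number for free entries.
  offset: number;
  generation: number;
  // true when the entry keyword is "f" (free, i.e. treated as deleted).
  free: boolean;
}

function parseXRefSection(text: string): XRefEntry[] {
  const entries: XRefEntry[] = [];
  const lines = text
    .split(/\r\n|\r|\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  let nextObjectNumber = 0;
  for (const line of lines) {
    if (line === "xref") continue;
    // Subsection header: "<first object number> <entry count>".
    const header = /^(\d+)\s+(\d+)$/.exec(line);
    if (header) {
      nextObjectNumber = Number(header[1]);
      continue;
    }
    // Entry: "<10-digit offset> <5-digit generation> <n|f>".
    const entry = /^(\d{10})\s+(\d{5})\s+([nf])$/.exec(line);
    if (!entry) break; // "trailer" or anything else ends the section
    entries.push({
      objectNumber: nextObjectNumber++,
      offset: Number(entry[1]),
      generation: Number(entry[2]),
      free: entry[3] === "f",
    });
  }
  return entries;
}
```

A reader that trusts this table alone has no offset to seek to for any entry where `free` is true, even if the object body is physically present in the file.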
@microshine Just my thoughts about it:
Now we have found a case where the xref table betrayed us: it contains wrong info and leads to an invalid experience (we can't find a field that actually exists). I'll leave some questions about it:
1) Could we add "line-by-line reading" as a fallback method? Is it hard to implement? In Hancock, we do not make random requests for widgets. If we ask for a widget by id, it means we are sure that this widget exists in the document. I think we can keep "line-by-line reading" as a fallback method and use it only in exceptional cases, so it has no bad effect on documents with a good structure.
2) Could we add a review step for the xref table and rebuild it as needed? We have already implemented auto-fixes for documents with some strange formats, etc. Maybe we can do the same for this problem: a method on the document instance that reviews the xref table and creates a new one if it finds a bug. This strategy would let us fix the table and avoid "line-by-line reading" when we start using a document. It could work at least for all documents that have not been signed previously. |
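A minimal sketch of the fallback idea from question 1, assuming two hypothetical lookup functions: one backed by the xref table and one backed by a scan of the file. Neither name comes from the module. The point is only that the slow path runs when the fast path misses, so documents with a good structure never pay for it.

```typescript
// Hypothetical names throughout; this is not the module's actual API.
type Lookup<T> = (objectNumber: number) => T | null;

function withFallback<T>(viaXRef: Lookup<T>, viaScan: Lookup<T>): Lookup<T> {
  // Objects recovered by the slow path, so each one is scanned for only once.
  const recovered = new Map<number, T>();
  return (objectNumber: number): T | null => {
    const direct = viaXRef(objectNumber);
    if (direct !== null) return direct; // well-formed documents stop here
    const cached = recovered.get(objectNumber);
    if (cached !== undefined) return cached;
    const found = viaScan(objectNumber); // slow path, only on a miss
    if (found !== null) recovered.set(objectNumber, found);
    return found;
  };
}
```

One design consequence: a request for an id that truly does not exist still triggers a scan, which is acceptable only if callers (as described for Hancock) ask for ids they expect to exist.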
Supporting line-by-line reading should not be difficult. The main question is how to determine when to apply this approach.
Updating the indices in the XRef table can be quite problematic. It would be easier to suggest re-saving the document. Fortunately, our module provides the capability to resave documents. |
When we can't get an object using xref table |
This way is valid for us, but it would be strange to do it with every document. I think we need some method that performs a review before that. |
I discussed this matter with @Romashine, and we came up with an idea on how it could be implemented. After reading the XRef indices in the document, we can perform a check for deleted objects. If a deleted object doesn't have a preceding version (i.e., it was created as deleted), we can search for that object throughout the original document using its header obj. We will read the found object and use it as a reference within the document. At first glance, the implementation doesn't seem too complicated. I will try to work on it over the weekend. However, there is one challenging aspect to consider. What should be done if the same deleted object appears twice in the document? In this case, it becomes difficult to determine the most recent version because, unfortunately, in PDFs, objects are not always written in the sequence they were created. |
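The search step described above could look roughly like the sketch below. It assumes the caller already has the list of object numbers that were "created as deleted", and that the file has been decoded as a Latin-1 string so that string indices equal byte offsets. All names are hypothetical. Objects whose header appears more than once are reported as ambiguous rather than guessed at, since the sketch has no way to tell which copy is the most recent.

```typescript
// Hypothetical helpers for illustration; not the module's actual API.
interface HeaderHit {
  offset: number; // index of the first digit of the object number
  generation: number;
}

// Finds every "<objectNumber> <generation> obj" header in the raw text.
// Caveat: a naive text search can also match bytes inside stream data.
function findObjectHeaders(raw: string, objectNumber: number): HeaderHit[] {
  // The leading group keeps "12 0 obj" from matching a search for object 2.
  const pattern = new RegExp(`(^|[^0-9])${objectNumber}\\s+(\\d+)\\s+obj\\b`, "g");
  const hits: HeaderHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(raw)) !== null) {
    const prefix = match[1] ?? "";
    hits.push({ offset: match.index + prefix.length, generation: Number(match[2]) });
  }
  return hits;
}

interface Recovery {
  resolved: Map<number, number>; // object number -> recovered offset
  ambiguous: number[]; // header found more than once
}

function recoverObjects(raw: string, freeObjectNumbers: number[]): Recovery {
  const resolved = new Map<number, number>();
  const ambiguous: number[] = [];
  for (const objectNumber of freeObjectNumbers) {
    const hits = findObjectHeaders(raw, objectNumber);
    const first = hits[0];
    if (first !== undefined && hits.length === 1) {
      resolved.set(objectNumber, first.offset);
    } else if (hits.length > 1) {
      ambiguous.push(objectNumber);
    }
    // No hit at all: the object really is absent, so it stays deleted.
  }
  return { resolved, ambiguous };
}
```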
It turns out that the document uses a hybrid XRef. Unfortunately, our current version does not support this XRef format.
I have created a new issue #78 to implement support for this format. |
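For context, a hybrid-reference file keeps a classic xref table but adds an `/XRefStm` entry to the trailer dictionary, pointing at a cross-reference stream. As far as I understand the PDF specification, objects that live only in that stream are listed as free in the classic table, which would explain the "deleted" entries seen here. A minimal detection sketch, with a hypothetical function name and an input that is simply the trailer dictionary as text:

```typescript
// Hypothetical helper; not the module's actual API.
// Returns the byte offset given by /XRefStm, or null if the trailer has none.
function findXRefStmOffset(trailerDictionary: string): number | null {
  const match = /\/XRefStm\s+(\d+)/.exec(trailerDictionary);
  return match ? Number(match[1]) : null;
}
```

A non-null result would be the signal to read the cross-reference stream at that offset before concluding that a free entry is really deleted.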
Washington Vehicle_Vessel Bill of Sale Form (1).pdf @microshine here is another document with the same issue, but for a different reason. Please check this document. |
There is also another problem with the Maryland Bill of Sale for Vehicle Transactions Form VR-181.pdf file, which may also be related to the XRef table. Some signature images can be mispositioned and compressed vertically after signing. It happens if the document has several signature fields assigned to different recipients. Steps to reproduce in Hancock:
|
Can't get widgets from this document
Maryland Bill of Sale for Vehicle Transactions (Form VR-181).pdf
For example, a widget with id `673R` exists in this document, but the document method `getComponentById` called with the value `673` returns `null`.