Unable to get document widgets #74

Bonn2018 · 2023-06-08T16:24:07Z

Can't get widgets from this document
Maryland Bill of Sale for Vehicle Transactions (Form VR-181).pdf

For example, a widget with id 673R exists in this document, but the document method getComponentById called with value 673 returns null.


microshine · 2023-06-09T04:34:50Z

It returns null because this component is marked as removed (type: 'f'). I don't think we should allow updating such components, but maybe it would be better to throw an exception in this case. @Bonn2018 what do you think?

Bonn2018 · 2023-06-09T09:10:50Z

> It returns null because this component is marked as removed (type: 'f'). I don't think we should allow updating such components, but maybe it would be better to throw an exception in this case. @Bonn2018 what do you think?

Let's check how Adobe behaves with the same fields. Maybe we need some auto-fix that removes this flag, or something else. We can be sure that Adobe allows filling these fields, so we should too.

microshine · 2023-06-09T10:59:26Z

The document has incorrect object indexing in the XRef table. The objects are mistakenly marked as deleted objects, making it impossible to retrieve the position of the object within the document. Our current implementation relies solely on the indices specified in the XRef table, which speeds up the document loading process and avoids line-by-line reading.

Here is an example of the XRef entries from this document:

0000205176 00000 n
0000207305 00000 n
0000209026 00000 n
0000210373 00000 n
0000211931 00000 n
0000214166 00000 n
0000214831 00000 n
0000215684 00000 n
0000216733 00000 n
0000216795 00000 n
0000219355 00000 n
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
0000000000 65535 f
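
For context, each line above is one entry of a classic cross-reference table: a 10-digit field, a 5-digit generation number, and a keyword, where n marks an object in use (the first field is its byte offset) and f marks a free entry. A minimal sketch of parsing such lines follows. The XRefEntry type and the parseXRefEntry helper are hypothetical names used only for illustration; they are not part of this module's API.

```typescript
// One entry of a classic cross-reference table (hypothetical type).
interface XRefEntry {
  // Byte offset for "n" entries; next free object number for "f" entries.
  offset: number;
  generation: number;
  // true for "n" (in use), false for "f" (free).
  inUse: boolean;
}

// Parses a line such as "0000205176 00000 n". Returns null when the
// line does not look like an XRef entry.
function parseXRefEntry(line: string): XRefEntry | null {
  const match = /^(\d{10}) (\d{5}) ([nf])\s*$/.exec(line);
  if (match === null) {
    return null;
  }
  const [, offsetText = "", generationText = "", keyword = ""] = match;
  return {
    offset: parseInt(offsetText, 10),
    generation: parseInt(generationText, 10),
    inUse: keyword === "n",
  };
}
```

Applied to the listing above, the first eleven lines would parse as in-use entries and the remaining ones as free entries with generation 65535.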

Considering the severity of the issue and the impact it has on the document's integrity, I recommend treating this particular format as corrupted. However, if it's necessary to support this format, we will need to modify the document reading approach in our module, moving away from relying solely on the XRef indices.

@rmhrisk What do you think?

Bonn2018 · 2023-06-09T11:43:12Z

@microshine Just my thoughts on this.
As I understand it, we are considering two tactics for reading documents:

According to the XRef table
Line-by-line reading

Now we have found a case where the XRef table let us down: it contains wrong information and leads to an invalid experience (we can't find a field that actually exists).
From this, one could quickly conclude that "line-by-line reading" is better because it is safer.
However, "line-by-line reading" is much slower, and we want to avoid it.

I'll leave some questions about it:

1) Could we add "line-by-line reading" as a fallback method? Is it hard to implement?

In Hancock, we do not make arbitrary requests for widgets. If we ask for a widget by id, it means we are sure that this widget exists in the document. I think we can keep "line-by-line reading" as a fallback method and use it only in exceptional cases, so that it has no negative effect on documents with a good structure.

2) Could we add some validation of the XRef table and rebuild it as needed?

We already implement auto-fixes for documents with unusual formats. Maybe we can do the same for this problem: a method on the document instance that reviews the XRef table and creates a new one if it finds a bug. This strategy would let us fix the table and avoid "line-by-line reading" at the start of working with a document. It could work at least for all documents that were not previously signed.

microshine · 2023-06-09T12:06:47Z

> Could we add "line-by-line reading" as a fallback method? Is it hard to implement?

Supporting line-by-line reading should not be difficult. The main question is how to determine when to apply this approach.

> Could we add some validation of the XRef table and rebuild it as needed?

Updating the indices in the XRef table can be quite problematic. It would be easier to suggest re-saving the document. Fortunately, our module provides the capability to resave documents.

Bonn2018 · 2023-06-09T12:55:05Z

> Supporting line-by-line reading should not be difficult. The main question is how to determine when to apply this approach.

When we can't get an object using the XRef table.
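
That trigger could be sketched roughly as follows. This is only an illustration under assumptions: the document is treated as a single-byte-decoded string so that character positions equal byte offsets, and XRefIndex, scanForObject, and findObjectOffset are hypothetical names, not the module's API. A plain text scan can also match header-like bytes inside stream data, which a real implementation would have to rule out.

```typescript
// Hypothetical lookup table built from the XRef section:
// object number -> byte offset of the object header.
type XRefIndex = Map<number, number>;

// Slow path: linear scan for a header such as "673 0 obj" that starts
// at the beginning of the text or right after a line break.
function scanForObject(text: string, objNumber: number): number | null {
  const header = new RegExp(`(?:^|[\\r\\n])${objNumber} \\d+ obj\\b`);
  const match = header.exec(text);
  if (match === null) {
    return null;
  }
  // Skip the line break that the pattern may have consumed.
  return match.index + match[0].indexOf(String(objNumber));
}

// Fast path first; the scan runs only when the XRef index has no
// usable entry for the requested object.
function findObjectOffset(
  text: string,
  xref: XRefIndex,
  objNumber: number,
): number | null {
  const indexed = xref.get(objNumber);
  if (indexed !== undefined) {
    return indexed;
  }
  return scanForObject(text, objNumber);
}
```

With this shape, well-formed documents never pay for the scan, which matches the goal of not slowing down documents with a good structure.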

Bonn2018 · 2023-06-09T13:00:46Z

> Updating the indices in the XRef table can be quite problematic. It would be easier to suggest re-saving the document. Fortunately, our module provides the capability to resave documents.

This approach is valid for us, but it would be strange to do it with every document. I think we need some method that performs a check beforehand.

microshine · 2023-06-10T09:15:03Z

I discussed this matter with @Romashine, and we came up with an idea on how it could be implemented.

After reading the XRef indices in the document, we can perform a check for deleted objects. If a deleted object doesn't have a preceding version (i.e., it was created as deleted), we can search for that object throughout the original document using its header obj. We will read the found object and use it as a reference within the document.

At first glance, the implementation doesn't seem too complicated. I will try to work on it over the weekend.

However, there is one challenging aspect to consider. What should be done if the same deleted object appears twice in the document? In this case, it becomes difficult to determine the most recent version because, unfortunately, in PDFs, objects are not always written in the sequence they were created.
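
One cautious way to handle the duplicate case is to collect every matching header and report the ambiguity instead of guessing which copy is the latest. A rough sketch, assuming the document is available as a single-byte-decoded string; Recovery, findHeaderOffsets, and recoverFreeObject are hypothetical names, not the module's API.

```typescript
// Result of trying to recover an object whose XRef entry is free.
type Recovery =
  | { kind: "found"; offset: number }
  | { kind: "ambiguous"; offsets: number[] }
  | { kind: "missing" };

// Returns the offset of every "<objNumber> <generation> obj" header
// that starts at the beginning of the text or after a line break.
function findHeaderOffsets(text: string, objNumber: number): number[] {
  const header = new RegExp(`(?:^|[\\r\\n])(${objNumber} \\d+ obj)\\b`, "g");
  const offsets: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = header.exec(text)) !== null) {
    const headerText = match[1] ?? "";
    // The header starts where the full match ends minus its own length.
    offsets.push(match.index + match[0].length - headerText.length);
  }
  return offsets;
}

// Accepts a single candidate, but refuses to choose between several.
function recoverFreeObject(text: string, objNumber: number): Recovery {
  const offsets = findHeaderOffsets(text, objNumber);
  const [first] = offsets;
  if (first === undefined) {
    return { kind: "missing" };
  }
  if (offsets.length === 1) {
    return { kind: "found", offset: first };
  }
  return { kind: "ambiguous", offsets };
}
```

Returning the ambiguous case explicitly leaves the decision to the caller, for example to throw an exception or to suggest re-saving the document.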

microshine · 2023-06-10T16:10:57Z

It turns out that the document uses a hybrid XRef. Unfortunately, our current version does not support this XRef format.

trailer
<</Size 1131/Root 1083 0 R/Info 173 0 R/ID[<5822179FD54F55489CDF1CB430BF4866><FF637C10A2A4434BBBED709DCC73345A>]/Prev 219538/XRefStm 1792>>
startxref

I have created a new issue #78 to implement support for this format.
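
For illustration, a hybrid-reference file can be recognized by the /XRefStm entry in the trailer, which gives the byte offset of a cross-reference stream. As far as the PDF specification describes it, objects listed only in that stream appear as free entries in the classic table, which would be consistent with the f entries shown earlier. A minimal sketch of reading the two pointers from the trailer text follows; TrailerPointers, readIntegerEntry, and readTrailerPointers are hypothetical names, and a real parser would read the trailer as a PDF dictionary rather than with a regular expression.

```typescript
// Offsets that tell a reader where to continue looking for objects.
interface TrailerPointers {
  // Offset of the previous cross-reference section (/Prev), if any.
  prev: number | null;
  // Offset of the cross-reference stream (/XRefStm), if any.
  xrefStm: number | null;
}

// Reads an integer entry such as "/Prev 219538" from the trailer text.
function readIntegerEntry(trailer: string, key: string): number | null {
  const match = new RegExp(`/${key}\\s+(\\d+)`).exec(trailer);
  const digits = match?.[1];
  return digits === undefined ? null : parseInt(digits, 10);
}

function readTrailerPointers(trailer: string): TrailerPointers {
  return {
    prev: readIntegerEntry(trailer, "Prev"),
    xrefStm: readIntegerEntry(trailer, "XRefStm"),
  };
}

// A trailer with /XRefStm indicates a hybrid-reference file.
function isHybridTrailer(trailer: string): boolean {
  return readTrailerPointers(trailer).xrefStm !== null;
}
```

For the trailer quoted above, this would yield 219538 for /Prev and 1792 for /XRefStm.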

Bonn2018 · 2023-06-12T21:01:54Z

Washington Vehicle_Vessel Bill of Sale Form (1).pdf

@microshine here is another document with the same issue, but for a different reason. Please check this document.

MarikTar · 2024-03-19T14:44:25Z

There is also another problem with this Maryland Bill of Sale for Vehicle Transactions Form VR-181.pdf file, but it may also be related to the XRef table.

Some signature images can be mispositioned and compressed vertically after signing. It happens if the document has several signature fields assigned to different recipients.

Steps to reproduce in Hancock:

Create a 'me and other' transaction with this document
Assign any two signature fields to the first recipient
Add one more signature field
Assign the new field and the last unassigned signature field to the second recipient
Sign the transaction for both recipients

This was referenced Jun 14, 2023:

Fix issues #81 (Merged)
Issue with PDF Checkbox Behavior in Library #83 (Open)
