Make valid .docx file with equation numbering #64

Open: nOkuda wants to merge 1 commit into base: master

@nOkuda commented Nov 3, 2021

Resolves #62

At the least, I've been able to produce a .docx file that opens in Google Docs. My solution was to remove what appear to be extraneous tags.

@nOkuda (Author) commented Nov 3, 2021

Here are the notes I took while diagnosing the error:

I am using pandoc 1.16.1, pandoc-eqnos 2.5.0, and pandocfilters 1.5.0.

Here's the file I used: test.md.

Here's the command I used: pandoc --filter pandoc-eqnos -o test.docx test.md.

Here's the output: test.docx.

When attempting to open the .docx file in LibreOffice Writer, the error is reported to be on line 1, character 739 of the word/document.xml entity in the .docx file. Unzipping test.docx and opening up word/document.xml shows that this location is the start of an end tag </w:p>.
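The diagnosis step above can also be scripted. The following is a minimal sketch, not part of this PR: it reads word/document.xml out of the archive with the standard library and asks Python's XML parser where it gives up. The names `find_xml_error` and `make_docx` are hypothetical helpers, and the position Python reports will not necessarily match the character count that LibreOffice Writer shows.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_xml_error(docx, entry="word/document.xml", context=40):
    """Return None if `entry` parses, else (line, column, nearby_text)."""
    with zipfile.ZipFile(docx) as archive:
        data = archive.read(entry)
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        # Turn (line, column) into a byte offset so the context can be sliced out.
        lines = data.splitlines(keepends=True)
        offset = sum(len(chunk) for chunk in lines[: line - 1]) + column
        start = max(0, offset - context)
        nearby = data[start : offset + context].decode("utf-8", errors="replace")
        return line, column, nearby
    return None


def make_docx(document_xml):
    """Hypothetical helper: an in-memory archive holding only word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


# A stand-in document with the same defect: <w:r><w:t> left open before </w:p>.
broken = make_docx(
    '<w:document xmlns:w="urn:example"><w:body>'
    "<w:p><w:r><w:t></w:p>"
    "</w:body></w:document>"
)
print(find_xml_error(broken))
```

Passing the path to a real file, for example `find_xml_error("test.docx")`, works the same way because `zipfile.ZipFile` accepts either a path or a file-like object.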

Immediately prior to this end tag, I see <w:bookmarkStart w:id="0" w:name="eq:a"/><w:r><w:t>. Two things are of note here. First, this segment matches this line of code (line 216 in pandoc_eqnos.py at the time of writing). Second, both <w:r> and <w:t> at the end of this segment lack their own closing tags before the </w:p> tag that the error message reported.

However, there are matching </w:t> and </w:r> end tags starting at character 1484. This particular segment, which appears as </w:t></w:r><w:bookmarkEnd w:id="0"/>, looks like this line of code (line 220 in pandoc_eqnos.py at the time of writing).

Another thing I've noticed, which might not be significant, is that both of the segments I've commented on immediately follow the sequence <w:pPr><w:pStyle w:val="FirstParagraph" /></w:pPr>. If I'm interpreting this correctly, this applies a first-paragraph style to some span within the document. I'm not sure why the first-paragraph styling is being applied at the end of the document, but maybe that's expected behavior in .docx files.

My initial guess as to why the error is occurring has to do with how the results of _add_markup interact with the final json data structure that gets passed on to pandoc. Perhaps instead of returning bookmarkstart, AttrMath(*value), and bookmarkend as a list, as they are here (line 221 in pandoc_eqnos.py at the time of writing), they should be precombined into their own pandoc AST node.

I never figured out how to precombine the list, so I decided to take the even easier route of removing the <w:r><w:t> and the </w:t></w:r> from the earlier referenced lines of the code. I was at first reluctant to try this, since I was concerned about the bookmark tags. But on closer inspection, and after running into the example in the BookmarkStart class, I realized that they are self-closing, which means that they don't need to be in any specific relationship with the BookmarkEnd class. (I wonder if that means that the w:id attribute might need to be updated with different values for each equation referenced; a future fix to pandoc-eqnos, I suppose.)
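To illustrate why dropping the run and text tags helps, here is a minimal sketch that checks each raw fragment on its own. The two `OLD_*` strings are the segments quoted from word/document.xml above. `is_well_formed` and `bookmark_pair` are hypothetical helpers, and the per-equation `w:id` counter only illustrates the possible future fix mentioned in the parenthetical; this PR does not implement it.

```python
import itertools
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def is_well_formed(fragment):
    """Wrap a fragment in one paragraph that declares the w: prefix and parse it."""
    try:
        ET.fromstring('<w:p xmlns:w="%s">%s</w:p>' % (W_NS, fragment))
    except ET.ParseError:
        return False
    return True


# Segments as they appear in the broken word/document.xml.
OLD_START = '<w:bookmarkStart w:id="0" w:name="eq:a"/><w:r><w:t>'
OLD_END = '</w:t></w:r><w:bookmarkEnd w:id="0"/>'

_bookmark_ids = itertools.count()


def bookmark_pair(label, ids=_bookmark_ids):
    """Hypothetical helper: self-closing bookmark tags that share a fresh w:id."""
    number = next(ids)
    start = "<w:bookmarkStart w:id=\"%d\" w:name=%s/>" % (number, quoteattr(label))
    end = '<w:bookmarkEnd w:id="%d"/>' % number
    return start, end


# Each old fragment is only valid if both land inside the same paragraph.
print(is_well_formed(OLD_START), is_well_formed(OLD_END))
print(is_well_formed(OLD_START + "x" + OLD_END))

# The stripped fragments are valid wherever each one ends up.
new_start, new_end = bookmark_pair("eq:a")
print(is_well_formed(new_start), is_well_formed(new_end))
```

The old fragments only balance when they end up in the same paragraph, whereas the self-closing bookmark tags parse on their own.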

To generate the .docx file with the modified code, I adapted the piping instructions on the Pandoc filter page, which led to the following command: pandoc -s test.md -t json | python3 <path_to_modified>/pandoc_eqnos.py docx | pandoc -s -f json -o newtest.docx. This yielded newtest.docx. I was happy to see that LibreOffice Writer opened the file without reporting an error, but I was unsatisfied with the formatting. Looking at the word/document.xml in newtest.docx, I noticed that the formatting information for the equation was there, so I uploaded the file to Google Docs and verified that the equation was formatted more pleasantly there.
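For repeated testing, the same three-stage pipe can be driven from Python. This is a minimal sketch assuming `pandoc` and `python3` are on the PATH; `build_commands` and `run_pipeline` are hypothetical names, and the filter path is whatever location holds the modified pandoc_eqnos.py.

```python
import subprocess


def build_commands(markdown_path, filter_script, output_path):
    """Mirror the three stages of the shell pipe as argument lists."""
    return [
        ["pandoc", "-s", markdown_path, "-t", "json"],      # Markdown -> JSON AST
        ["python3", filter_script, "docx"],                 # run the modified filter
        ["pandoc", "-s", "-f", "json", "-o", output_path],  # JSON AST -> .docx
    ]


def run_pipeline(commands):
    """Feed each command's stdout to the next command's stdin."""
    data = None
    for argv in commands:
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data
```

Calling `run_pipeline(build_commands("test.md", filter_path, "newtest.docx"))`, with `filter_path` set to the modified script, should be equivalent to the shell command above. `check=True` makes a failing stage raise instead of silently passing empty input along.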

@ianhbell

Any chance of merging this at some point?

Successfully merging this pull request may close these issues.

Convert markdown to word: “Word experienced an error trying to open the file”