Feature request: Pandoc integration #2491
Comments
@ssddanbrown in response to the last comment over in #2412: indeed, these ostensibly simple things often get more complex very quickly. In terms of workflow, after giving it some thought, perhaps an integration similar to WKHTMLTOPDF would work. The user installs Pandoc manually, following the Pandoc docs for their environment (apt-get install pandoc on Ubuntu, for example), then enables it with a setting such as PANDOC=True. When PANDOC=True there could be some new entries in the export dropdown menu: EPUB and HTML Archive (or something more logically named than HTML Archive). The same content that is pulled for the current export features would then be passed to Pandoc on the local system, and the output returned as a download. Using the same method as WKHTMLTOPDF makes it less mission critical to maintain and allows for some dev experimentation. Similarly, I'd suggest only adding EPUB and HTML Archive rather than replacing the current PDF and HTML export processes, as I'm certainly not confident enough in it to recommend that off the bat. I realise a lot of this is preaching to the choir, but it seems you have plenty of tickets and things on your plate, so I figure the more thought and detail given to a feature request and its use case before making the request, the better. Big thanks for the work on this; it is going to become quite a central part of our EdTech COVID response work.
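A minimal sketch of the workflow described above, written in Python purely for illustration (BookStack itself is a PHP application). The PANDOC flag, the function names and the format list are hypothetical; only the pandoc options `--from`, `--to`, `--standalone` and `--output` are real. Pandoc is invoked only when the flag is set and the binary is found on the PATH.

```python
import os
import shutil
import subprocess

# Hypothetical mapping from export-menu entries to pandoc writer names.
EXTRA_FORMATS = {"epub": "epub", "html": "html"}


def pandoc_enabled(env=None):
    """Return True when the (hypothetical) PANDOC flag is switched on."""
    env = os.environ if env is None else env
    return env.get("PANDOC", "").strip().lower() in {"1", "true", "yes"}


def build_pandoc_command(output_path, fmt):
    """Build a pandoc invocation that reads HTML from standard input."""
    if fmt not in EXTRA_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return [
        "pandoc",
        "--from", "html",
        "--to", EXTRA_FORMATS[fmt],
        "--standalone",
        "--output", output_path,
    ]


def export_page(html, output_path, fmt, env=None):
    """Convert page HTML with the locally installed pandoc, if enabled."""
    if not pandoc_enabled(env):
        raise RuntimeError("Pandoc export is disabled (PANDOC is not true)")
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # The page HTML is piped to pandoc; the result is written to output_path.
    subprocess.run(
        build_pandoc_command(output_path, fmt),
        input=html.encode("utf-8"),
        check=True,
    )
    return output_path
```

Keeping the command construction separate from the call that runs it means the optional dependency is only touched when the flag is on, which mirrors the "install it yourself" approach suggested for WKHTMLTOPDF.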
After further thought, how about simplifying this down to allowing the original Markdown that BookStack uses to be exported? If included in the API, this would allow us to use third-party processing of exported data (like Pandoc) without the extra support burden.
Hi @Maggie0002,
Whoops, sorry, I thought it defaulted to Markdown. I meant an API endpoint to export the WYSIWYG content as is, rather than converting it first to HTML or PDF. I don't see that in the API docs.
That (pages => read) endpoint should give you the HTML that's used when viewing a page. This is pretty much the same as the HTML loaded in the WYSIWYG editor but with a pass to remove some potentially dangerous elements.
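For anyone wanting to try this, reading that HTML from a script might look like the sketch below. It assumes the pages read endpoint lives at `/api/pages/{id}`, that API tokens are sent as an `Authorization: Token <id>:<secret>` header, and that the JSON response carries the rendered content in an `html` field; check the API docs on your own instance before relying on any of that. The function names are made up for the example, and the base URL and credentials are placeholders.

```python
import json
import urllib.request


def build_page_request(base_url, page_id, token_id, token_secret):
    """Build a GET request for a single page (URL and auth scheme are assumptions)."""
    url = f"{base_url.rstrip('/')}/api/pages/{page_id}"
    headers = {"Authorization": f"Token {token_id}:{token_secret}"}
    return urllib.request.Request(url, headers=headers)


def fetch_page_html(base_url, page_id, token_id, token_secret):
    """Fetch one page and return its rendered HTML fragment."""
    request = build_page_request(base_url, page_id, token_id, token_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        page = json.load(response)
    # Assumed response shape: the rendered content sits under the "html" key.
    return page["html"]
```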
Helpful, and interesting, thanks. My understanding, then, is that the export -> html function takes the same HTML seen in the pages -> read endpoint and passes it to a processor that converts pictures etc. into a single embedded HTML file. The API output comes without headers, which presumably is what the HTML processor takes care of (among other things). I will experiment with that endpoint and report back anything useful.
Didn't get very far. It turns out the HTML the API pipes out is missing headings, CSS and all the formatting, so it would be a lot of work to go from there to something usable. Is there a way to access the HTML used by the exporter, but with the original HREF to the images and/or video rather than the embedded images? It would then be fairly simple (in theory) to mirror that page and get it along with the exported content. Wget, for example, has a --mirror option I could experiment with as a lightweight solution.
No way to get that directly, although the main content HTML is what you'd get out of the API; the export just wraps it up in a template with some extra styles. The export uses this template, with these export styles.
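As a rough illustration of that wrapping step, a content fragment from the API can be dropped into a minimal standalone document. This is not BookStack's actual export template or stylesheet: the template, the default style rule and the `wrap_page_html` name are all invented for the example.

```python
from html import escape
from string import Template

# Illustrative stand-in for an export template; not BookStack's real one.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>$title</title>
<style>$styles</style>
</head>
<body>
<h1>$title</h1>
$content
</body>
</html>
""")


def wrap_page_html(title, content_html, styles="body { font-family: sans-serif; }"):
    """Wrap an HTML fragment in a complete document, keeping image links as they are."""
    return PAGE_TEMPLATE.substitute(
        title=escape(title),    # the title is plain text, so escape it
        styles=styles,
        content=content_html,   # the fragment is already HTML, so insert it unchanged
    )
```

Because the fragment is inserted unchanged, any image or video URLs in it stay as links rather than being embedded, which is the form asked about above.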
Having given it some more thought, how would you feel about Pandoc as an optional exporter, similar to how wkhtmltopdf is currently integrated? This wrapper is proving useful: https://github.com/ueberdosis/pandoc It would also help resolve some other issues that I don't think we will find a way around:
Hi @Maggie0002, To be honest, I'd not be very keen. Supporting both of the existing PDF export options has already proved a lot more challenging than hoped and has consumed a lot of my time in the various requests & issues that have arisen from it. The range of conversion formats that Pandoc would open up would worry me, and I think it's optimistic to expect it to solve more issues than it creates as an alternative PDF generator, especially since I believe Pandoc will use WKHTMLtoPDF by default anyway for HTML to PDF conversions.
Hi @ssddanbrown,
I was thinking of Pandoc integration as an optional module. It would add some efficiencies to the various exports by keeping the assets separate as discussed above (and potentially resolve some other outstanding issues), but also provide a bunch of additional options, such as EPUB (#1949), Word doc, video export support (#883; #2412) and a bunch more.
Here are a few shortcuts to try it out:
apt-get install pandoc
or
brew install pandoc
should do the trick (if installing in a Docker container, you may need to install build-essential and/or curl).
Create a Markdown file named test.md, then execute the command:
pandoc test.md -o example2.html --extract-media ./assets
More info relating to this originally discussed in: #2412