Add support for Pandoc style markdown #22

kamalsacranie · 2022-05-27T00:01:54Z

I write all my notes using Pandoc, a powerful document converter. Using Pandoc's Python package would let us solve the backslash-escaping problem that makes math syntax clunky, and would let us delimit math with single and double dollar signs.

There would also be a nice option to use Pandoc-style divs:

::: {data-question=}
## Multi-line front

With markdown preserved within the div that is created
:::

Turns into:

<div data-question="">
    <h2>Multi-line front</h2>
    <p>With markdown ....</p>
</div>

I've implemented this change locally and haven't had any problems. There are some cons:

The project would either have to depend on Pandoc, or
make Pandoc an optional dependency enabled by a boolean flag on the CLI.

There are many pros, however. Markdown files that weren't written for Anki conversion would need fewer changes. In fact, we would be writing pure Pandoc markdown, which gets converted to HTML via an abstract syntax tree.

Just a thought

lukesmurray · 2022-05-27T00:10:26Z

Definitely an interesting idea. I'm a huge fan of pandoc so I understand why it would be so enticing. Would you be open to sharing how you implemented it locally so I can check it out? If it doesn't complicate the project too much I'm open to discussing how we could integrate alternative parsers.

kamalsacranie · 2022-05-27T00:37:13Z

Was quite simple to implement.

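import frontmatter
import pandoc
from bs4 import BeautifulSoup, Tag

# Deck and read_file are assumed to come from the existing script.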

def is_math_class(tag: Tag) -> bool:
    """Check if an HTML tag is a math oriented tag generated by pandoc"""
    try:
        return "math" in tag["class"]
    except KeyError:
        return False

def parse_markdown(
    file: str, deck_title_prefix: str, generate_cloze_model: bool
) -> Deck:
    """Parse a markdown string to an anki deck."""
    metadata, markdown_string = frontmatter.parse(read_file(file))
    doc = pandoc.read(markdown_string)
    html = pandoc.write(doc, format="html", options=["--mathjax"])

    soup = BeautifulSoup(html, "html.parser")

    # Find all the math tags using filter
    math_tags = soup.find_all(is_math_class)
    for tag in math_tags:
        tag.unwrap()  # Done for cleaner html in Anki

    ...

The rest of the script is identical

lukesmurray · 2022-05-27T05:33:13Z

Interesting. On the one hand, we could make this a command-line flag. I would probably call it md-parser and have it accept either python-markdown or pandoc as its value. However, I want to think about the default options we pass to pandoc. I also want to make sure that users' decks don't break if they switch parsers.

As an example, we support multiline questions, which I believe rely on python-markdown-specific syntax.

So while I love how simple this is, it requires a little bit of thought and care before we can go ahead and add it.
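The command-line flag described above could be sketched roughly as follows. This is a minimal illustration using the standard library's argparse, not the project's actual CLI: the flag name `--md-parser` and its two values come from the comment above, while `build_arg_parser`, `SUPPORTED_PARSERS`, and the positional `input` argument are hypothetical names. Defaulting to python-markdown is an assumption meant to keep existing decks unchanged unless a user opts in.

```python
import argparse

# Parser names taken from the discussion; the tuple name itself is hypothetical.
SUPPORTED_PARSERS = ("python-markdown", "pandoc")


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI exposing an --md-parser option (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Convert markdown files to Anki decks."
    )
    parser.add_argument("input", help="Markdown file to convert.")
    parser.add_argument(
        "--md-parser",
        choices=SUPPORTED_PARSERS,
        # Assumed default: keep the current behaviour unless the user opts in.
        default="python-markdown",
        help="Markdown parser used to render card HTML.",
    )
    return parser
```

Calling `build_arg_parser().parse_args(["notes.md", "--md-parser", "pandoc"])` yields a namespace whose `md_parser` attribute is `"pandoc"`; any other value is rejected by argparse before the parser is ever invoked.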

wrvsrx · 2022-09-20T01:21:32Z

Maybe we can add an option to receive the Pandoc AST (output in JSON format) from stdin and operate on it. That would allow converting any input format that Pandoc supports, and would also let us add custom Pandoc filters. I'm trying to make such a change.
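To make the idea concrete, here is a rough sketch of operating on a Pandoc JSON AST using only the standard library. The helper names (`load_ast`, `iter_divs`, `div_attributes`, `question_divs`) are hypothetical, and `SAMPLE_AST` is a hand-written document in the shape Pandoc emits with `-t json`, not real output. It also assumes the attribute from the div example earlier in the thread appears in the AST under the key `data-question`.

```python
import json
from typing import Any, Iterator, TextIO


def load_ast(stream: TextIO) -> dict:
    """Read a Pandoc JSON AST from a text stream such as sys.stdin."""
    return json.load(stream)


def iter_divs(node: Any) -> Iterator[dict]:
    """Yield every Div element found anywhere inside the given AST node."""
    if isinstance(node, dict):
        if node.get("t") == "Div":
            yield node
        for value in node.values():
            yield from iter_divs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_divs(item)


def div_attributes(div: dict) -> dict:
    """Return a Div's key-value attributes as a plain dict."""
    # A Div's content is [[identifier, classes, key_values], blocks].
    (_identifier, _classes, key_values), _blocks = div["c"]
    return dict(key_values)


def question_divs(ast: dict) -> list:
    """Collect Divs carrying a data-question attribute (assumed key name)."""
    return [
        div for div in iter_divs(ast["blocks"])
        if "data-question" in div_attributes(div)
    ]


# Hand-written sample, not real Pandoc output.
SAMPLE_AST = json.loads("""
{
  "pandoc-api-version": [1, 22],
  "meta": {},
  "blocks": [
    {"t": "Para", "c": [{"t": "Str", "c": "Intro"}]},
    {"t": "Div", "c": [
      ["", [], [["data-question", ""]]],
      [{"t": "Header", "c": [2, ["multi-line-front", [], []],
        [{"t": "Str", "c": "Multi-line"}, {"t": "Space"},
         {"t": "Str", "c": "front"}]]}]
    ]}
  ]
}
""")
```

In a script, `load_ast(sys.stdin)` could be fed with something like `pandoc notes.md -t json | python script.py`, so that any input format Pandoc can read arrives in the same structure.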

lukesmurray · 2022-09-22T14:01:31Z

Given that we have multiple people interested in this I'll try to add support for this fairly soon.
