Add support for Pandoc style markdown #22

Open
kamalsacranie opened this issue May 27, 2022 · 5 comments
@kamalsacranie

I write all my notes using Pandoc, a powerful document converter. Using Pandoc's Python package would solve the backslash-escaping problem that makes math syntax clunky, and would let us use single and double dollar signs for math.

It would also give us the option of using Pandoc-style divs:

::: {data-question=}
## Multi-line front

With markdown preserved within the div that is created
:::

Turns into:

<div data-question="">
    <h2>Multi-line front</h2>
    <p>With markdown ....</p>
</div>

I've implemented this change locally and haven't had any problems. There are some cons:

  • The project would either have to depend on Pandoc; or
  • have Pandoc as an optional dependency, enabled by a boolean flag on the CLI.
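
The optional-dependency route could be sketched roughly like this. It is only an illustration: `pandoc_available` and `choose_parser` are hypothetical names that do not exist in the project, and the sketch assumes the Pandoc Python package is importable as `pandoc` and that the existing parser is python-markdown.

```python
from importlib.util import find_spec


def pandoc_available(module_name: str = "pandoc") -> bool:
    """Return True if the (assumed) Pandoc Python package can be imported."""
    # find_spec returns None for a top-level module that is not installed
    return find_spec(module_name) is not None


def choose_parser(use_pandoc: bool, module_name: str = "pandoc") -> str:
    """Map the boolean CLI flag to a parser name (hypothetical helper)."""
    if not use_pandoc:
        return "python-markdown"
    if not pandoc_available(module_name):
        # Fail early with a clear message instead of an ImportError later on
        raise RuntimeError(
            "Pandoc parsing was requested but the pandoc package is not installed"
        )
    return "pandoc"
```

With this shape, users who never pass the flag would not need Pandoc installed at all.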

There are many pros, however. Markdown files that weren't written for Anki conversion would need fewer changes. In fact, we would be writing pure Pandoc markdown, which gets converted to HTML via an abstract syntax tree.

Just a thought

@lukesmurray
Owner

Definitely an interesting idea. I'm a huge fan of pandoc so I understand why it would be so enticing. Would you be open to sharing how you implemented it locally so I can check it out? If it doesn't complicate the project too much I'm open to discussing how we could integrate alternative parsers.

@kamalsacranie
Author

It was quite simple to implement:

import frontmatter
import pandoc
from bs4 import BeautifulSoup, Tag

# Deck and read_file come from the existing project code.


def is_math_class(tag: Tag) -> bool:
    """Check whether an HTML tag is a math-oriented tag generated by Pandoc."""
    # Tag.get returns the default when the tag has no class attribute
    return "math" in tag.get("class", [])


def parse_markdown(
    file: str, deck_title_prefix: str, generate_cloze_model: bool
) -> Deck:
    """Parse a markdown file into an Anki deck."""
    metadata, markdown_string = frontmatter.parse(read_file(file))
    # Parse the markdown into a Pandoc document, then render it as HTML
    doc = pandoc.read(markdown_string)
    html = pandoc.write(doc, format="html", options=["--mathjax"])

    soup = BeautifulSoup(html, "html.parser")

    # Find all the math tags, using is_math_class as the filter
    math_tags = soup.find_all(is_math_class)
    for tag in math_tags:
        tag.unwrap()  # Drop the wrapper tag for cleaner HTML in Anki

    ...

The rest of the script is unchanged.

@lukesmurray
Owner

lukesmurray commented May 27, 2022

Interesting. We could make this a command-line flag. I would probably call it md-parser and have it accept either python-markdown or pandoc as its value. However, I want to think about the default options we pass to pandoc. I also want to make sure that users' decks don't break if they switch parsers.

As an example, we support multiline questions, which I believe rely on python-markdown-specific syntax.

So while I love how simple this is, it requires a little bit of thought and care before we can go ahead and add it.
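
A minimal sketch of what such a flag could look like with `argparse`. The flag name `md-parser` and its two values come from the comment above; the positional `file` argument, the default value, and the `build_arg_parser` helper are assumptions made for illustration and may not match the project's actual CLI.

```python
import argparse

# The two values proposed in the discussion
PARSER_CHOICES = ("python-markdown", "pandoc")


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with an md-parser option (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Convert a markdown file to an Anki deck (sketch)"
    )
    parser.add_argument("file", help="markdown file to convert")
    parser.add_argument(
        "--md-parser",
        choices=PARSER_CHOICES,
        default="python-markdown",  # assumed default, so existing decks keep working
        help="markdown parser used to render the notes",
    )
    return parser
```

Keeping python-markdown as the default would mean that switching parsers is always an explicit choice, which leaves room to decide on the default pandoc options separately.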

@wrvsrx

wrvsrx commented Sep 20, 2022

Maybe we could add an option to receive the Pandoc AST (output in JSON format) from stdin and operate on it. That would let us convert any input format that Pandoc supports, and it would also let us add custom Pandoc filters. I'm trying to make such a change.
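
As a rough sketch of the idea, the snippet below walks a Pandoc JSON AST and collects the Divs that carry a `data-question` attribute, as in the div example earlier in this thread. The helper names `iter_divs`, `question_divs`, and `read_ast` are hypothetical, and the sketch assumes the attribute appears in the AST under the key `data-question`. It only recurses into nested Divs, not into block quotes or lists.

```python
import json
import sys


def iter_divs(blocks: list):
    """Yield (attributes, child_blocks) for each Div, including nested Divs."""
    for block in blocks:
        if block.get("t") == "Div":
            # A Div's content is [[identifier, classes, key-value pairs], blocks]
            (_ident, _classes, key_values), children = block["c"]
            yield dict(key_values), children
            yield from iter_divs(children)


def question_divs(ast: dict) -> list:
    """Return the child blocks of every Div that has a data-question attribute."""
    return [
        children
        for attrs, children in iter_divs(ast["blocks"])
        if "data-question" in attrs
    ]


def read_ast(stream=sys.stdin) -> dict:
    """Load a Pandoc JSON AST from a text stream (stdin by default)."""
    return json.load(stream)
```

A script built on this could call `question_divs(read_ast())` and be fed with something like `pandoc notes.md -t json | python script.py`, where `script.py` is a placeholder name.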

@lukesmurray
Owner

Given that we have multiple people interested in this, I'll try to add support for it fairly soon.
