Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bold/italic in the middle of words does not work: should it? #55

Open
tfeb opened this issue Oct 10, 2015 · 10 comments
Open

Bold/italic in the middle of words does not work: should it? #55

tfeb opened this issue Oct 10, 2015 · 10 comments
Assignees

Comments

@tfeb
Copy link
Contributor

tfeb commented Oct 10, 2015

If you type something like **fr**ozen bl**og** then I think you should get frozen blog: you currently get something with embedded asterisks instead (this is using markdown in frog of course). I don't know what the markdown standard, as far as there is one, says about this.

I think this used to work (at least, when I wrote that markup originally I assume I checked the output!).

This is a non-major issue.

@greghendershott
Copy link
Owner

@greghendershott
Copy link
Owner

Markdown is not markup. Profound, huh? :) Seriously, markdown is an underspecified DWIM system. To get exactly what you mean, sometimes you need to use inline HTML. I think this is one of those times.

So for example <strong>fr</strong>ozen bl<strong>og</strong> is valid markdown and is consistent among virtually all markdown parsers.

@tfeb
Copy link
Contributor Author

tfeb commented Oct 10, 2015

Yes, I agree this is an implementation-dependent thing. The only thing I think would be good is if it was handled consistently (for some value of consistently). What you currently get for **x**y z**q** is <strong>x</strong>y z*<em>q</em>* which I think is weird: either it should look for both asterisks or neither, I think.

If you'd consider a pull request, what do you think it should do here, if anything?

@greghendershott
Copy link
Owner

Oh, I'm sorry. I misunderstood your description of the actual result. Given this:

#lang racket
(require markdown)
(parse-markdown "**fr**ozen bl**og**")

I agree that this actual result is weird:

'((p () (strong () "fr") "ozen bl*" (em () "og") "*"))

I expected it would be:

'((p () (strong () "fr**ozen bl**og")))

I could also understand sticking with the original markdown spec and having the result be:

'((p () (strong () "fr") "ozen bl" (strong () "og")))

Although I understand the spirit of the GHFM idea (what if you have _ or * in a variable name), I think the better, general answer for that is to put code-ish things in single backticks.

Hmmm.

So maybe I should change this. My only misgivings are:

  1. How might it break people's existing stuff (including Frog blog posts and Pollen books).
  2. This part of the parser grammar has been performance sensitive in the past, so I'll need to be careful.

@greghendershott
Copy link
Owner

p.s. In case it wasn't clear, I'm now thinking that the actual result ought to be:

'((p () (strong () "fr") "ozen bl" (strong () "og")))

That single or double underlines or asterisks should always be parsed (unless explicitly escaped with \ of course) even within words.

Which is in line with the original markdown description, and seems to be the BabelMark2 consensus.

@tfeb
Copy link
Contributor Author

tfeb commented Oct 10, 2015

I agree with you on what the result 'should' be. But I think you're right to be worried about breaking things either in terms of performance or (worse) changes to how existing stuff is processed. So perhaps the right thing would be to leave it (I can certainly easily fix the single instance where it is a problem for me by explicit HTML or something else).

@greghendershott
Copy link
Owner

I spent some time on this and have a potential commit.

  • The performance is comparable on all the test cases I have.
  • The result is now in line with the BabelMark2 majority ("consensus" would be putting it too strongly).

So mainly I just need to think about the backward incompatibility, the breakage scenarios.

(Often it's reasonable to deal with this sort of change by saying, "add a parameter", defaulting to the old behavior. But this markdown library is mostly embedded in end-user tools like Frog and Pollen, each of which would need to reflect such a new option to the real user. And it's not the kind of option that's fun to explain to people, or people really want to think about how to set. As a result, I'd love to be able to feel that just changing the default would be OK. If I can....)

@greghendershott
Copy link
Owner

I sat on this for a couple weeks. I just now pushed the commit to a topic branch, at least, for Travis CI to check.

I'm thinking strongly of merging this to master. Although it's not 100% backward compatible with the recent behavior, it's compatible with the earlier versions of this parser, and it's consistent with the BabelMark2 majority.

@greghendershott greghendershott self-assigned this Oct 27, 2015
@greghendershott
Copy link
Owner

Actually I may defer this. There are still corner cases I don't like.

I checked out the relevant section of the CommonMark spec, which goes into rather mind-numbing (but valid) detail. Although its 12 rules -- and 5 more rules to handle when the first 12 are ambiguous -- seem like the ultimate right thing to do, I just don't have time right now to tackle that.

@tfeb
Copy link
Contributor Author

tfeb commented Oct 28, 2015

I'm completely happy with that: I was afraid that there'd be a hideous nightmare of twisty little rules in there somewhere. I've manually fixed the only case I care about, and I kind of agree that it's better to escape to HTML in these cases, because you can then have markup which is clear (even if the CommonMark rules are clear, I bet a human writing or reading the markdown can't reliably predict what happens, which makes them of limited use in practice).

So if you want to punt indefinitely that's fine by me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants