Bold/italic in the middle of words does not work: should it? #55

tfeb · 2015-10-10T11:41:52Z

If you type something like **fr**ozen bl**og** then I think you should get frozen blog: you currently get something with embedded asterisks instead (this is using markdown in frog of course). I don't know what the markdown standard, as far as there is one, says about this.

I think this used to work (at least, when I wrote that markup originally I assume I checked the output!).

This is a non-major issue.

The text was updated successfully, but these errors were encountered:

greghendershott · 2015-10-10T14:50:40Z

The "standard" says yes: http://daringfireball.net/projects/markdown/syntax#em but its test suite doesn't exercise this IIRC.

GHFM says no for underlines but yes for asterisks: https://help.github.com/articles/github-flavored-markdown/#differences-from-traditional-markdown

BabelMark says http://johnmacfarlane.net/babelmark2/?normalize=1&text=**fr**ozen+bl**og**+__fr__ozen+bl__og__+*fr*ozen+bl*og*+_fr_ozen+bl_og_%0A%0A

greghendershott · 2015-10-10T15:05:50Z

Markdown is not markup. Profound, huh? :) Seriously, markdown is an underspecified DWIM system. To get exactly what you mean, sometimes you need to use inline HTML. I think this is one of those times.

So for example frozen blog is valid markdown and is consistent among virtually all markdown parsers.

tfeb · 2015-10-10T15:40:27Z

Yes, I agree this is an implementation-dependent thing. The only thing I think would be good is if it was handled consistently (for some value of consistently). What you currently get for **x**y z**q** is xy z*q* which I think is weird: either it should look for both asterisks or neither, I think.

If you'd consider a pull request, what do you think it should do here, if anything?

greghendershott · 2015-10-10T16:49:47Z

Oh, I'm sorry. I misunderstood your description of the actual result. Given this:

#lang racket
(require markdown)
(parse-markdown "**fr**ozen bl**og**")

I agree that this actual result is weird:

'((p () (strong () "fr") "ozen bl*" (em () "og") "*"))

I expected it would be:

'((p () (strong () "fr**ozen bl**og")))

I could also understand sticking with the original markdown spec and having the result be:

'((p () (strong () "fr") "ozen bl" (strong () "og")))

Although I understand the spirit of the GHFM idea (what if you have _ or * in a variable name), I think the better, general answer for that is to put code-ish things in single backticks.

Hmmm.

So maybe I should change this. My only misgivings are:

How might it break people's existing stuff (including Frog blog posts and Pollen books).
This part of the parser grammar has been performance sensitive in the past, so I'll need to be careful.

greghendershott · 2015-10-10T16:54:13Z

p.s. In case it wasn't clear, I'm now thinking that the actual result ought to be:

'((p () (strong () "fr") "ozen bl" (strong () "og")))

That single or double underlines or asterisks should always be parsed (unless explicitly escaped with \ of course) even within words.

Which is in line with the original markdown description, and seems to be the BabelMark2 consensus.

tfeb · 2015-10-10T18:54:49Z

I agree with you on what the result 'should' be. But I think you're right to be worried about breaking things either in terms of performance or (worse) changes to how existing stuff is processed. So perhaps the right thing would be to leave it (I can certainly easily fix the single instance where it is a problem for me by explicit HTML or something else).

greghendershott · 2015-10-10T19:26:55Z

I spent some time on this and have a potential commit.

The performance is comparable on all the test cases I have.
The result is now in line with the BabelMark2 majority ("consensus" would be putting it too strongly).

So mainly I just need to think about the backward incompatibility, the breakage scenarios.

(Often it's reasonable to deal with this sort of change by saying, "add a parameter", defaulting to the old behavior. But this markdown library is mostly embedded in end-user tools like Frog and Pollen, each of which would need to reflect such a new option to the real user. And it's not the kind of option that's fun to explain to people, or people really want to think about how to set. As a result, I'd love to be able to feel that just changing the default would be OK. If I can....)

greghendershott · 2015-10-27T14:25:46Z

I sat on this for a couple weeks. I just now pushed the commit to a topic branch, at least, for Travis CI to check.

I'm thinking strongly of merging this to master. Although it's not 100% backward compatible with the recent behavior, it's compatible with the earlier versions of this parser, and it's consistent with the BabelMark2 majority.

greghendershott · 2015-10-28T03:19:52Z

Actually I may defer this. There are still corner cases I don't like.

I checked out the relevant section of the CommonMark spec, which goes into rather mind-numbing (but valid) detail. Although its 12 rules -- and 5 more rules to handle when the first 12 are ambiguous -- seem like the ultimate right thing to do, I just don't have time right now to tackle that.

tfeb · 2015-10-28T08:52:02Z

I'm completely happy with that: I was afraid that there'd be a hideous nightmare of twisty little rules in there somewhere. I've manually fixed the only case I care about, and I kind of agree that it's better to escape to HTML in these cases, because you can then have markup which is clear (even if the CommonMark rules are clear, I bet a human writing or reading the markdown can't reliably predict what happens, which makes them of limited use in practice).

So if you want to punt indefinitely that's fine by me!

greghendershott pushed a commit that referenced this issue Oct 27, 2015

Intra-word strong and em. Closes #55

5004e09

greghendershott self-assigned this Oct 27, 2015

greghendershott mentioned this issue Jul 9, 2016

Bold/italic within characters like dashes is not properly rendered #64

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bold/italic in the middle of words does not work: should it? #55

Bold/italic in the middle of words does not work: should it? #55

tfeb commented Oct 10, 2015

greghendershott commented Oct 10, 2015

greghendershott commented Oct 10, 2015

tfeb commented Oct 10, 2015

greghendershott commented Oct 10, 2015

greghendershott commented Oct 10, 2015

tfeb commented Oct 10, 2015

greghendershott commented Oct 10, 2015

greghendershott commented Oct 27, 2015

greghendershott commented Oct 28, 2015

tfeb commented Oct 28, 2015

Bold/italic in the middle of words does not work: should it? #55

Bold/italic in the middle of words does not work: should it? #55

Comments

tfeb commented Oct 10, 2015

greghendershott commented Oct 10, 2015

greghendershott commented Oct 10, 2015

tfeb commented Oct 10, 2015

greghendershott commented Oct 10, 2015

greghendershott commented Oct 10, 2015

tfeb commented Oct 10, 2015

greghendershott commented Oct 10, 2015

greghendershott commented Oct 27, 2015

greghendershott commented Oct 28, 2015

tfeb commented Oct 28, 2015