\mid incorrectly parsed as fence #2503

xworld21 · 2025-01-27T18:48:09Z

The simple

a \mid b \mid c

renders with the wrong spacing, because \mid gets parsed as a fence and is given inner spacing 0, even though it is a binary relation (and LaTeX spaces it as such). More worryingly, MathJax will spell this out as 'absolute value'.

I think LaTeXML should treat \mid (and \mvert) more like RELOP than VERTBAR. This may require some changes to the grammar e.g. around the rule MIDDLE to catch uses like \{ x \in A \mid P(x) \}.
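
For reference, the spacing difference can be reproduced in plain LaTeX. This is a minimal sketch that relies only on the standard definitions: \mid is a relation atom, a bare | is an ordinary atom, and \mathrel{|} forces relation spacing.

```latex
\documentclass{article}
\begin{document}
% \mid is a relation atom (\mathrel): thick space on both sides
$a \mid b \mid c$

% a bare | is an ordinary atom (\mathord): no extra space around it
$a | b | c$

% wrapping | in \mathrel restores the relation spacing
$a \mathrel{|} b \mathrel{|} c$

% set-builder notation, where \mid acts as a "middle bar"
$\{\, x \in A \mid P(x) \,\}$
\end{document}
```

Compiled with pdflatex, the first and third lines should look identical, while the second sets the bars tight against the letters.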


dginev · 2025-01-27T18:51:14Z

Vertbars are actually the least simple operators :> Too much ambiguity. Valid problem of course.

xworld21 · 2025-01-27T18:54:14Z

PS: pleased to report that after assigning role RELOP, VoiceOver reads a \mid b \mid c as 'a divides b divides c', as it should.

xworld21 · 2025-01-27T19:06:14Z

To state the issue a bit more precisely (then I'll shut up!), the grammar should make more use of \mathrel, \mathbin, etc. for disambiguating (based on my unfounded belief that \mid is almost never used as a fence – it would look very odd, and it doesn't resize with \left and \right!).

dginev · 2025-01-27T19:55:47Z

Sometimes that's a chained divides RELOP; sometimes they are fences for an inner product in bra-ket notation:

$$ \langle\phi \mid \boldsymbol{A} \mid \psi\rangle $$

And then we also have the much simpler (but rarer) multiplication of two factors with an intermediate absolute value term:

$$ 2 \left| -x \right| y $$

It's tricky.

xworld21 · 2025-01-27T20:05:11Z

Your last example is not \mid! I think you are actually supporting my point that \mid should not be 'absolute value'. The grammar could be refined to accept \mid inside brakets and set notation (as 'middle bar'), but not confuse it with the absolute value. In fact the grammar already has MIDBAR and MIDDLE production rules, so it would be an easy adjustment.

I would say it's not good that the grammar can change the class (meaning rel, ord, bin) of a character, because that will definitely produce different visual output than LaTeX.

dginev · 2025-01-27T20:18:38Z

Try a quick Google search with site:arxiv.org "\mid" and see how it is used in real-world abstracts. There is the "prescribed" use from LaTeX and the "described" use from large corpora. I am a descriptivist myself (coming from a linguistics mindset), which typically collides with the prescriptive approach most mathematicians have of LaTeX.

There may be some middle ground, who knows :>

xworld21 · 2025-01-30T09:31:53Z

> Try a quick Google search with site:arxiv.org "\mid"

I did, and was I proven wrong... Ok, ok, \mid is used as a vertical bar. I still disagree with a \mid b \mid c potentially being a * abs(b) * c: the spacing around \mid is symmetric, so it is hard to parse it that way even visually. But you are right, this should be addressed as a linguistics problem and handled based on real-world text.

I checked what MathJax does here. It is both better and worse. The MathML will report \mid as a 'divide' operator with no other special treatment (no mrow, mfenced), so the screen reader will read 'a divides b divides c'. However, the accessibility module will say 'a StartAbsoluteValue b EndAbsoluteValue c' and, worse, \mid\mid a \mid\mid will become 'StartAbsoluteValue EndAbsoluteValue a [...]'.

So by using the wrong spacing, LaTeXML is actually helping me notice that the parsing is not doing what I want.

I am not so sure what my question is anymore. Let me rephrase. Say I am uploading a paper on arXiv. How could I go about forcing \mid to be a relation? I guess \lxDefMath is discouraged (it's not documented, and I assume not stable either) and \mathrel{\mid} won't affect the outcome. (For stuff authored with BookML, I can instead encourage \lx* macros because the author is in control there.)

dginev · 2025-01-30T19:26:51Z

> Say I am uploading a paper on arXiv. How could I go about forcing \mid to be a relation?

Alas, that is not something we want authors to be thinking about in the "standard LaTeX ecosystem". One day a sophisticated math grammar (or other AI tool) may be able to understand the intention. The arXiv requirement at present is roughly "a PDF that reads well upon review by the author."

I would probably go about this by first creating a wrapper in latexml.sty following \lxFcn, \lxPunct etc, maybe:

# Sketch of a wrapper that assigns the RELOP role to its math argument
DefConstructor('\lxRel{}', "<ltx:XMWrap role='RELOP'>#1</ltx:XMWrap>",
  requireMath => 1, reversion => '#1', alias => '');

Then wait for that latexml.sty.ltxml to upstream to arXiv (fingers crossed in 2025), then define in the article source:

\usepackage{latexml}
\def\myrelbar{\lxRel{\mid}}
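
Put together, the article side might look like the sketch below. It assumes the \lxRel wrapper proposed above, which is not an existing latexml.sty macro, and it adds a plain \mathrel fallback (an assumption of this sketch) so the source still compiles where latexml.sty or \lxRel is unavailable.

```latex
\documentclass{article}
% Load latexml.sty only if it is installed; it ships with LaTeXML.
\IfFileExists{latexml.sty}{\usepackage{latexml}}{}
% \lxRel is the proposed wrapper, not an existing macro:
% fall back to ordinary relation spacing when nothing defines it.
\providecommand{\lxRel}[1]{\mathrel{#1}}
% Author-level macro for the "divides" bar
\newcommand{\myrelbar}{\lxRel{\mid}}
\begin{document}
$a \myrelbar b \myrelbar c$
\end{document}
```

Under pdflatex the fallback gives the same spacing as a plain \mid. Whether a LaTeXML-side definition of \lxRel takes precedence over the \providecommand depends on the binding being loaded first, so treat that part as untested.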

It may be helpful to expose all/most of the grammar categories as macros, and maybe also dust off the DLMF "semantic macros", but that is also all in @brucemiller's court. Some of these (latexml.sty in particular) would even be nice to have in CTAN.

dginev added enhancement math parsing labels Jan 27, 2025

dginev added this to the Future (if) milestone Jan 27, 2025
