\mid incorrectly parsed as fence #2503

Open
xworld21 opened this issue Jan 27, 2025 · 8 comments

@xworld21
Contributor

The simple expression

a \mid b \mid c

renders as

[image: rendered output, not preserved]

because \mid is parsed as a fence and given zero spacing, even though it is a binary relation (and LaTeX spaces it as one). More worryingly, MathJax reads it out as 'absolute value'.
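To make the spacing point concrete: in standard LaTeX, \mid and | print the same bar glyph, but \mid has math class \mathrel (relation) while | has class \mathord (ordinary), and amsmath's \lvert and \rvert are opening and closing fences. A minimal comparison document, assuming only a standard LaTeX installation with amsmath:

```latex
\documentclass{article}
\usepackage{amsmath} % provides \lvert and \rvert
\begin{document}
% \mid has class \mathrel: TeX inserts a thick space (\thickmuskip) on each side
$a \mid b \mid c$

% | has class \mathord: no extra space, the bars sit right next to a, b, c
$a | b | c$

% \mathrel{|} reproduces the relation spacing of \mid
$a \mathrel{|} b \mathrel{|} c$

% \lvert ... \rvert mark an absolute value as opening/closing fences
$a \lvert b \rvert c$
\end{document}
```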

I think LaTeXML should treat \mid (and \mvert) as a RELOP rather than a VERTBAR. This may require some changes to the grammar, e.g. around the MIDDLE rule, to still catch uses like \{ x \in A \mid P(x) \}.

@dginev
Collaborator

dginev commented Jan 27, 2025

Vertbars are actually the least simple operators :> Too much ambiguity. Valid problem of course.

@xworld21
Contributor Author

PS: Pleased to report that after assigning it the role RELOP, VoiceOver reads a \mid b \mid c as 'a divides b divides c', as it should.

@xworld21
Contributor Author

To state the issue a bit more precisely (then I'll shut up!), the grammar should make more use of \mathrel, \mathbin, etc. for disambiguation. This is based on my unfounded belief that \mid is almost never used as a fence: it would look very odd, and it doesn't resize with \left and \right!
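To illustrate the point about \left and \right: in standard LaTeX, \mid is a math character rather than a delimiter, so authors who want a bar that grows in set-builder notation typically use e-TeX's \middle instead. A minimal sketch (the set and the conditions are arbitrary placeholders):

```latex
\documentclass{article}
\begin{document}
% Plain \mid: fixed size, relation spacing on both sides
\[ \{\, x \in A \mid x > 1 \,\} \]

% e-TeX \middle| grows together with \left and \right;
% \; adds the thick spaces that \mid would otherwise contribute
\[ \left\{\, x \in A \;\middle|\; \frac{x}{2} > 1 \,\right\} \]

% Not valid: \mid is a math character, not a delimiter, so
% \left\mid makes TeX report "Missing delimiter (. inserted)"
\end{document}
```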

@dginev
Collaborator

dginev commented Jan 27, 2025

Sometimes that's a chain of 'divides' RELOPs; sometimes the bars are fences for an inner product in bra-ket notation:

$$ \langle\phi \mid \boldsymbol{A} \mid \psi\rangle $$

And then we also have the much simpler (but rarer) multiplication of two factors with an intermediate absolute value term:

$$ 2 \left| -x \right| y $$

It's tricky.

@xworld21
Contributor Author

Your last example is not \mid! I think you are actually supporting my point that \mid should not be 'absolute value'. The grammar could be refined to accept \mid inside bra-kets and set notation (as a 'middle bar'), without confusing it with the absolute value. In fact, the grammar already has MIDBAR and MIDDLE production rules, so it would be an easy adjustment.

I would say it's not good that the grammar can change the class (rel, ord, bin) of a character, because that will definitely produce visual output different from LaTeX's.

@dginev
Collaborator

dginev commented Jan 27, 2025

Try a quick Google search with site:arxiv.org "\mid" and see how it is used in real-world abstracts. There is the "prescribed" use from LaTeX and the "described" use from large corpora. I am a descriptivist myself (coming from a linguistics mindset), which typically collides with the prescriptive approach most mathematicians take toward LaTeX.

There may be some middle ground, who knows :>

@xworld21
Contributor Author

Try a quick Google search with site:arxiv.org "\mid"

I did, and was I proven wrong... OK, OK, \mid is used as a vertical bar. I still disagree with a \mid b \mid c potentially being read as a * abs(b) * c: it is hard even to see it that way, because the spacing around \mid is symmetric. But you are right, this should be addressed as a linguistics problem and handled based on real-world text.

I checked what MathJax does here. It is both better and worse. The MathML reports \mid as a 'divides' operator with no other special treatment (no mrow or mfenced), so a screen reader reads 'a divides b divides c'. However, the accessibility module says 'a StartAbsoluteValue b EndAbsoluteValue c', and, worse, \mid\mid a \mid\mid becomes 'StartAbsoluteValue EndAbsoluteValue a [...]'.

So by using the wrong spacing, LaTeXML is actually helping me notice that the parsing is not doing what I want.

I am no longer sure what my question is, so let me rephrase. Say I am uploading a paper on arXiv. How could I go about forcing \mid to be a relation? I guess \lxDefMath is discouraged (it's not documented, and I assume not stable either), and \mathrel{\mid} won't affect the outcome. (For material authored with BookML, I can instead encourage \lx* macros, because the author is in control there.)

@dginev
Collaborator

dginev commented Jan 30, 2025

Say I am uploading a paper on arXiv. How could I go about forcing \mid to be a relation?

Alas, that is not something we want authors to be thinking about in the "standard LaTeX ecosystem". One day a sophisticated math grammar (or other AI tool) may be able to understand the intent. At present, the arXiv requirement is roughly "a PDF that reads well when reviewed by the author."

I would probably go about this by first creating a wrapper in latexml.sty, following \lxFcn, \lxPunct, etc. Maybe something like:

```perl
# Proposed: wrap the argument so LaTeXML's math parser treats it as a relation (RELOP)
DefConstructor('\lxRel{}', "<ltx:XMWrap role='RELOP'>#1</ltx:XMWrap>",
  requireMath => 1, reversion => '#1', alias => '');
```

Then wait for that latexml.sty.ltxml change to reach arXiv (fingers crossed for 2025), and define in the article source:

```latex
\usepackage{latexml}
\def\myrelbar{\lxRel{\mid}}
```
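Because \lxRel is only a proposal at this point, an article using it would still need to compile with ordinary pdfLaTeX. A minimal sketch, assuming latexml.sty is installed where LaTeX can find it; the \providecommand fallback for the hypothetical \lxRel simply typesets its argument, so the PDF keeps the usual \mathrel spacing of \mid:

```latex
\documentclass{article}
\usepackage{latexml}
% Hypothetical fallback: if \lxRel is not defined (e.g. with a latexml.sty
% that predates it), just typeset the argument unchanged.
\providecommand\lxRel[1]{#1}
\newcommand\myrelbar{\lxRel{\mid}}
\begin{document}
% In the PDF this looks exactly like a \mid b \mid c
$a \myrelbar b \myrelbar c$
\end{document}
```

Since \providecommand only defines a command that does not exist yet, a latexml.sty or LaTeXML binding that already defines \lxRel would take precedence over the fallback.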

It may be helpful to expose all or most of the grammar categories as macros, and maybe also to dust off the DLMF "semantic macros", but that is all in @brucemiller's court. Some of these (latexml.sty in particular) would even be nice to have on CTAN.
