Ambiguous ampersands are not detected #82

Open
ezequiel-garzon opened this issue May 25, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@ezequiel-garzon

According to the HTML standard, "an ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more ASCII alphanumerics, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section".

Maybe I'm missing something, but shouldn't then something like &ThisAmpersandShouldBeDeemedAmbiguous; raise an error, or a warning? I know it used to, but I've checked both on a Mac with version 23.4.11, as well as on https://validator.w3.org/, and no error or warning is now raised. Thanks in advance.
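For illustration, the definition quoted above can be approximated in a few lines of Python. This is only a sketch, not the checker's actual code: it assumes Python's `html.entities.html5` table is an adequate stand-in for the spec's list of named character references, and the helper name `find_ambiguous_ampersands` is hypothetical. It also ignores context (attribute values, `script`/`style` content) and legacy references that are valid without a trailing semicolon.

```python
import re
from html.entities import html5  # maps names such as "amp;" to their characters

# "&", one or more ASCII alphanumerics, then ";"
_CANDIDATE = re.compile(r"&([A-Za-z0-9]+);")


def find_ambiguous_ampersands(text):
    """Return every '&name;' in text whose name is not a known named reference.

    Hypothetical helper: a simplified reading of the spec definition,
    not the algorithm used by the HTML checker.
    """
    return [
        match.group(0)
        for match in _CANDIDATE.finditer(text)
        if match.group(1) + ";" not in html5
    ]


sample = "a &amp; b &ThisAmpersandShouldBeDeemedAmbiguous; c &copy;"
print(find_ambiguous_ampersands(sample))
```

Under this reading, `&amp;` and `&copy;` pass, while the made-up name from the example is flagged.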

@ezequiel-garzon ezequiel-garzon added the enhancement New feature or request label May 25, 2023
@sideshowbarker sideshowbarker transferred this issue from validator/validator May 29, 2023
@sideshowbarker sideshowbarker added bug Something isn't working and removed enhancement New feature or request labels May 29, 2023
sideshowbarker added a commit that referenced this issue May 29, 2023
Fixes #82

44427c0 broke handling of "ambiguous ampersands" for the case where
we've already examined enough of the string after the ampersand to
conclude that the substring we have so far doesn't match the beginning
of any known named character reference.

This change re-conforms the error-reporting for that case to the
requirements in the HTML spec. Without this change, the expected error
goes unreported in many or most cases where an ampersand doesn't
actually start a character reference.
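
The case described in this commit message can be sketched as follows. This is a hedged illustration in Python, not the parser's Java tokenizer code: the function `check_reference`, its return values, and the use of `html.entities.html5` as the name table are all assumptions made for the example. The point it shows is that the set of candidate names can become empty well before the `;` is reached, and the error still has to be reported in that situation.

```python
import string
from html.entities import html5  # assumed stand-in for the spec's name table

ALNUM = set(string.ascii_letters + string.digits)


def check_reference(text, start):
    """Classify the '&' at text[start] as 'ok', 'ambiguous', or 'other'.

    Hypothetical helper. 'other' covers everything outside the
    "&" + alphanumerics + ";" pattern, including legacy references
    written without a semicolon, which this sketch does not handle.
    """
    assert text[start] == "&"
    candidates = list(html5)  # names such as "amp;" or the legacy "amp"
    consumed = ""
    i = start + 1
    while i < len(text) and text[i] in ALNUM:
        consumed += text[i]
        # Narrow the candidates; the list may become empty early.
        candidates = [name for name in candidates if name.startswith(consumed)]
        i += 1
    if not consumed or i >= len(text) or text[i] != ";":
        return "other"
    if consumed + ";" in candidates:
        return "ok"
    # No match, whether the candidates ran out early or only at the ';'.
    return "ambiguous"


print(check_reference("&nbsp;", 0))
print(check_reference("&nbps;", 0))
```

A misspelling such as `&nbps;` stops matching any known name after three letters, which is the early-bailout path the commit message refers to.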
sideshowbarker added a commit to validator/validator that referenced this issue May 29, 2023
See validator/htmlparser#82
See validator/htmlparser#83

This switches the HTML checker to using the validator-nu branch of the
htmlparser repo until validator/htmlparser#83 is
reviewed and merged into the main branch.
@sideshowbarker
Member

@ezequiel-garzon Thanks much for catching it. It was a great catch because, embarrassingly, it appears we’ve had this bug for almost two years now. The effect is that, for all that time, the HTML checker hasn’t been reporting any errors for almost all cases of invalid named character references.

In other words, when people have accidentally made minor spelling mistakes in otherwise-valid named character references, the HTML checker hasn’t been catching and reporting those mistakes so that they could be fixed.

I’ve fixed this in a feature branch with #83 — and for now, I’ve switched the HTML checker to being built from that branch, and pushed the updates to https://validator.w3.org/nu/

But I’ll keep this issue open until the fix gets merged into the main branch of the HTML parser code here.

@ezequiel-garzon
Author

My pleasure, @sideshowbarker. Thank you for taking care of this and so many other projects. I checked many, many times before reporting as I thought I was doing something wrong.
