Ambiguous ampersands are not detected #82
Fixes #82. 44427c0 broke handling of "ambiguous ampersands" for the case where we've already examined enough of the string after the ampersand to conclude that the substring we have so far doesn't match the beginning of any known named character reference. This change brings the error reporting for that case back into conformance with the requirements in the HTML spec. Without this change, the expected error is not reported in many or most cases where an ampersand doesn't actually start a character reference.
See validator/htmlparser#82 and validator/htmlparser#83. This switches the HTML checker to use the validator-nu branch of the htmlparser repo until validator/htmlparser#83 is reviewed and merged into the main branch.
@ezequiel-garzon Thanks much for catching it. It was a great catch because, embarrassingly, it appears we’ve had this bug for almost two years now, and for that whole time the HTML checker hasn’t been reporting errors for almost all cases of invalid named character references. In other words, when people have, for example, accidentally misspelled an otherwise-valid named character reference, the HTML checker hasn’t caught and reported it so that they could fix the mistake. I’ve fixed this in a feature branch with #83. For now, I’ve switched the HTML checker to being built from that branch and pushed the updates to https://validator.w3.org/nu/. But I’ll keep this issue open until the fix gets merged into the main branch of the HTML parser code here.
My pleasure, @sideshowbarker. Thank you for taking care of this and so many other projects. I checked many, many times before reporting as I thought I was doing something wrong.
According to the HTML standard, "an ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more ASCII alphanumerics, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section".
Maybe I'm missing something, but shouldn't something like
&ThisAmpersandShouldBeDeemedAmbiguous;
raise an error, or a warning? I know it used to, but I've checked both on a Mac with version 23.4.11 and on https://validator.w3.org/, and no error or warning is now raised. Thanks in advance.
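For illustration, the definition quoted above can be checked mechanically. This is only a minimal sketch in Python, not the htmlparser tokenizer logic: it applies the quoted wording literally, using the standard library's `html.entities.html5` table as the list of known names. The function name `find_ambiguous_ampersands` is made up for this example.

```python
import re
from html.entities import html5  # maps names such as "amp;" to their characters

# "&", then one or more ASCII alphanumerics, then ";"
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")


def find_ambiguous_ampersands(text):
    """Return every '&name;' in text whose name is not a known named reference.

    Hypothetical helper: follows the quoted definition literally and does not
    reproduce the tokenizer's state machine.
    """
    return [
        match.group(0)
        for match in _CANDIDATE.finditer(text)
        # Keys in html5 include the trailing semicolon, e.g. "copy;".
        if match.group(1) + ";" not in html5
    ]


sample = "a &amp; b &ThisAmpersandShouldBeDeemedAmbiguous; &copy; &cpoy;"
print(find_ambiguous_ampersands(sample))
# → ['&ThisAmpersandShouldBeDeemedAmbiguous;', '&cpoy;']
```

Numeric references such as `&#123;` are not matched by this sketch, because `#` is not an ASCII alphanumeric.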