Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of "ambiguous ampersands" to report error when expected #83

Closed
wants to merge 27 commits into from

Conversation

sideshowbarker
Copy link
Member

@sideshowbarker sideshowbarker commented May 29, 2023

Fixes #82. 44427c0 broke handling of "ambiguous ampersands" for the case where we've already examined enough of string after the ampersand to conclude that the substring we have so far doesn't match the beginning of any known named character reference.

This change re-conforms the error-reporting for that case to the requirements in the HTML spec. Otherwise, without this change, no error is reported as expected in many or most cases where an ampersand doesn't actually start a character reference.

@sideshowbarker sideshowbarker requested a review from hsivonen May 29, 2023 08:52
sideshowbarker added a commit to validator/validator that referenced this pull request May 29, 2023
See validator/htmlparser#82
See validator/htmlparser#83

This switches the HTML checker to using the validator-nu branch of the
htmlparser repo until validator/htmlparser#83 is
reviewed and merged into the main branch.
@dill0wn
Copy link

dill0wn commented May 30, 2023

This change started triggering a bunch of new errors for us when checking against https://validator.w3.org/nu/.

The lines are all lines similar to this:

<a href="https://foo.example.com?query=q&sort=-1">An Hyperlink!</a>

The"&" in the href's query string triggers this syntax violation.

I just would like to double check that this is bad practice and something we should fix in our code. Because it seems pretty innocuous.

@nschonni
Copy link
Member

@dill0wn that looks like a regression to me. I don't think it should be flagging the query string parameters

@sideshowbarker
Copy link
Member Author

@dill0wn that looks like a regression to me. I don't think it should be flagging the query string parameters

Yeah, it‘s a regression. See also validator/validator#1595

@sideshowbarker sideshowbarker marked this pull request as draft May 31, 2023 08:59
@sideshowbarker
Copy link
Member Author

@dill0wn

I just would like to double check that this is bad practice and something we should fix in our code. Because it seems pretty innocuous.

Per the current spec, it’s not bad practice and not something you need to fix in your code in order for it to be valid.

We changed the parser algorithm in the spec years ago to allow for ampersands in URLs in attribute values, while still disallowing it in other cases where we wanted to disallow it — and then we attempted to change the parser to match the behavior required by the spec. It seems now we got the allow-in-attribute-values behavior done right in the parser, but as a result, somewhere along the lines, we seem to have ended up allowing ampersands everywhere. So we still need to fix that. But in the meantime, I need to revert this change.

@sideshowbarker
Copy link
Member Author

sideshowbarker commented May 31, 2023

I’ve fixed this (reverted the patch in the PR) in the HTML checker sources and pushed the fix to https://validator.w3.org/nu/

sideshowbarker and others added 20 commits June 2, 2023 18:37
to remedy an O(n^2) performance problem when parsing e.g. inline images
#64)

* Mozilla bug 1701828 addendum - More comments for meta charset rewrite for Gecko.

Differential Revision: https://phabricator.services.mozilla.com/D125808

Co-authored-by: Michael[tm] Smith <[email protected]>
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [actions/cache](https://github.com/actions/cache) from 2 to 3.
- [Release notes](https://github.com/actions/cache/releases)
- [Commits](actions/cache@v2...v3)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [actions/setup-java](https://github.com/actions/setup-java) from 2 to 3.
- [Release notes](https://github.com/actions/setup-java/releases)
- [Commits](actions/setup-java@v2...v3)

---
updated-dependencies:
- dependency-name: actions/setup-java
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
See Mozilla bug 1775477.

Co-authored-by: Henri Sivonen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ambiguous ampersands are not detected
6 participants