Fix handling of "ambiguous ampersands" to report error when expected
Fixes #82

44427c0 broke handling of "ambiguous
ampersands" for the case where we've already examined enough of the string
after the ampersand to conclude that the substring we have so far
doesn't match the beginning of any known named character reference.

This change re-conforms the error reporting for that case to the
requirements in the HTML spec. Without this change, the expected error
is not reported in many or most cases where an ampersand doesn't
actually start a character reference.
sideshowbarker committed May 29, 2023
1 parent 9448adc commit 4ef2c05
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions src/nu/validator/htmlparser/impl/Tokenizer.java
@@ -3529,9 +3529,13 @@ private void ensureBufferSpace(int inputLength) throws SAXException {

 if (candidate == -1) {
     // reconsume deals with CR, LF or nul
-    if (c == ';') {
-        errNoNamedCharacterMatch();
-    }
+    // We can go ahead and emit an error here without
+    // needing to do an "if (c == ';')" check, because if
+    // we've gotten here, we've already examined enough of
+    // string after the ampersand to determine that the
+    // substring we've looked at so far doesn't match the
+    // beginning of any known named character references.
+    errNoNamedCharacterMatch();
     emitOrAppendCharRefBuf(returnState);
     if ((returnState & DATA_AND_RCDATA_MASK) == 0) {
         cstart = pos;
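
The behavioral difference in this hunk can be illustrated with a small standalone sketch. This is not the Tokenizer's code: the class name, the four-entry name table, and the helper methods below are all hypothetical, and the sketch models only the "no candidate left" branch shown in the diff, not any other place where the parser might report errors.

```java
public class AmbiguousAmpersandSketch {
    // Hypothetical, tiny stand-in for the named character reference table.
    private static final String[] NAMES = {"amp;", "lt;", "gt;", "quot;"};

    /** True if some known name starts with the characters consumed so far. */
    static boolean hasCandidate(String consumed) {
        for (String name : NAMES) {
            if (name.startsWith(consumed)) {
                return true;
            }
        }
        return false;
    }

    /** Models the old branch: with no candidate left, report only if c is ';'. */
    static boolean reportsErrorBefore(String consumed, char c) {
        if (hasCandidate(consumed + c)) {
            return false; // still matching a known name, nothing to report yet
        }
        return c == ';';
    }

    /** Models the new branch: with no candidate left, always report. */
    static boolean reportsErrorAfter(String consumed, char c) {
        return !hasCandidate(consumed + c);
    }

    public static void main(String[] args) {
        // Input "&ax": the 'x' rules out every name in the table and is not ';'.
        System.out.println("before: " + reportsErrorBefore("a", 'x'));
        System.out.println("after:  " + reportsErrorAfter("a", 'x'));
    }
}
```

In this model, the two versions agree when the character that rules out every candidate is a semicolon, and differ for any other character: only the new version reports an error there.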