Incremental parsing produces two attributes when only one expected. #64

grokys · 2024-10-23T11:42:51Z

This is a bit of an obscure one, but if:

One has an incomplete attribute with a name of more than 4 characters, but no value (i.e. the user is typing the attribute name)
And then adds a dot after it
And incrementally parses
Then two attributes are created instead of 1

The following test reproduces the behavior:

        [Fact]
        public void IncrementalParsingIsSameAsFullParsing_DotAfterAttributeName()
        {
            const string xml = "<A><B Names/></A>";
            var full = Parser.ParseText(xml);

            // Attribute name needs to be >= 5 characters to trigger the bug.
            var textToEdit = "Names";
            var insertIndex = xml.IndexOf(textToEdit) + textToEdit.Length;
            var newXml = xml.Insert(insertIndex, ".");
            var change = new TextChangeRange(new TextSpan(insertIndex, 0), 1);
            var incremental = Parser.ParseIncremental(newXml, new[] { change }, full);
            var fullAfterModification = Parser.ParseText(newXml);

            Assert.Single(fullAfterModification.Root.Elements.Single().Attributes);
            Assert.Single(incremental.Root.Elements.Single().Attributes);
        }

grokys · 2024-10-23T12:04:50Z

The reason this only happens when the attribute has > 4 characters seems to be that Scanner.MaxTokensLookAheadBeyondEOL is 4 characters. This causes _affectedRanges to include a span which prevents the previous Name attribute from being recycled, the span does not filter out the attribute when it's called "Names".

https://github.com/KirillOsenkov/XmlParser/blob/main/src/Microsoft.Language.Xml/Blender.cs#L96

grokys · 2024-10-23T12:10:43Z

Actually, this is more serious than originally thought. The inserted character does not need to be a . - it can be any valid attribute name character. So editing "Names" -> "Namesx" will cause two attributes to be produced, "Names" and "x":

        [Fact]
        public void IncrementalParsingIsSameAsFullParsing_AppendingToAttributeName()
        {
            const string xml = "<A><B Change/></A>";
            var full = Parser.ParseText(xml);

            // Attribute name needs to be >= 5 characters to trigger the bug.
            var textToEdit = "Change";
            var insertIndex = xml.IndexOf(textToEdit) + textToEdit.Length;
            var newXml = xml.Insert(insertIndex, "s");
            var change = new TextChangeRange(new TextSpan(insertIndex, 0), 1);
            var incremental = Parser.ParseIncremental(newXml, new[] { change }, full);
            var fullAfterModification = Parser.ParseText(newXml);

            Assert.Single(fullAfterModification.Root.Elements.Single().Attributes);
            Assert.Single(incremental.Root.Elements.Single().Attributes); // <-- FAILS
        }

grokys · 2024-10-23T12:24:31Z

This seems to be a problem in marking the affected ranges dirty:

Blender.ExpandToParentChain calls root.FindNode(change.Span.Start, includeTrivia: false);
This returns the /> of B
It marks <B> as dirty
But it doesn't mark the Change attribute as dirty

A simple fix is to change:

https://github.com/KirillOsenkov/XmlParser/blob/main/src/Microsoft.Language.Xml/Blender.cs#L45

To

var node = root.FindNode(change.Span.Start - 1, includeTrivia: false);

i.e. mark the node of the character before the change dirty. This fixes the issue, and all other tests still pass but I'm not convinced it's the correct thing to do. I'm not sure I know enough about how this was intended to work to go much further (and there aren't enough incremental parsing tests to allow me to be confident I'll not be breaking something else)

KirillOsenkov · 2024-10-23T22:37:30Z

I'd say I wouldn't worry about breaking stuff, because if we break something later we'll deal with it then.

Honestly I know about as much about incremental parsing as you do.

One thing we could do later is fuzz to randomly mutate a simple XML file, do a full parse and an incremental parse, and compare the results. If they differ, we found a bug.

I usually start with this file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<X>
  <n:T></n:T>
  <X/>
  <A.B></A.B>
  <A B="a"></A>
  <A>&#x03C0;</A>
  <A>a &lt;</A>
  <A><![CDATA[bar]]></A>
  <!-- comment -->
</X>

and then we can select a series of edits, either delete a char or insert a char (and favor chars such as < > " = and /).

For now since you've already built some understanding about this bug, just send a PR and we can merge and move on.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental parsing produces two attributes when only one expected. #64

Incremental parsing produces two attributes when only one expected. #64

grokys commented Oct 23, 2024 •

edited

Loading

grokys commented Oct 23, 2024 •

edited

Loading

grokys commented Oct 23, 2024 •

edited

Loading

grokys commented Oct 23, 2024

KirillOsenkov commented Oct 23, 2024 •

edited

Loading

Incremental parsing produces two attributes when only one expected. #64

Incremental parsing produces two attributes when only one expected. #64

Comments

grokys commented Oct 23, 2024 • edited Loading

grokys commented Oct 23, 2024 • edited Loading

grokys commented Oct 23, 2024 • edited Loading

grokys commented Oct 23, 2024

KirillOsenkov commented Oct 23, 2024 • edited Loading

grokys commented Oct 23, 2024 •

edited

Loading

grokys commented Oct 23, 2024 •

edited

Loading

grokys commented Oct 23, 2024 •

edited

Loading

KirillOsenkov commented Oct 23, 2024 •

edited

Loading