Incorrect <form> tag processing #26

Closed
feeeper opened this issue Oct 19, 2016 · 5 comments

@feeeper
Contributor

feeeper commented Oct 19, 2016

I started working on tests and found a problem (or maybe I'm misunderstanding how the <form> tag is processed).

My test:

Jsonize jsonize = new Jsonize("<html><head></head><body><form></form></body></html>");
var result = jsonize.ParseHtmlAsJsonString(jsonizeConfiguration);

Result JSON string:

{
  "node": "Document",
  "child": [
    {
      "node": "Element",
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "node": "Element",
          "tag": "body",
          "child": [
            {
              "tag": "form"
            },
            {
              "node": "Text",
              "text": "</form>"
            }
          ]
        }
      ]
    }
  ]
}

Am I right that the "Text" node with text "</form>" is incorrect, and that the result should look like this:

{
  "node": "Document",
  "child": [
    {
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "tag": "body",
          "child": [
            {
              "tag": "form"
            }
          ]
        }
      ]
    }
  ]
}
@JackWFinlay
Owner

Nice find. You are correct, the second case should be the proper result. I'll have to address that when I have some time.

Just out of curiosity, what happens when you put some elements inside the form?

@feeeper
Contributor Author

feeeper commented Oct 19, 2016

I inserted an <input> element inside the form and got this result (the input ends up as a sibling of the form, not a child):

{
  "node": "Document",
  "child": [
    {
      "node": "Element",
      "tag": "html",
      "child": [
        {
          "tag": "head"
        },
        {
          "node": "Element",
          "tag": "body",
          "child": [
            {
              "tag": "form"
            },
            {
              "tag": "input"
            },
            {
              "node": "Text",
              "text": "</form>"
            }
          ]
        }
      ]
    }
  ]
}

@JackWFinlay
Owner

Interesting. I'll definitely have to fix that, or you can give it a go if you'd like. Thanks for spotting that. I think it may be an issue with HtmlAgilityPack. We can probably intercept form nodes and do some manual processing of our own.

@feeeper
Contributor Author

feeeper commented Oct 19, 2016

First I'll add a test for this case (today or tomorrow) :) Then I'll try to figure out what the problem is.

@feeeper
Contributor Author

feeeper commented Oct 20, 2016

I've sent a PR for the issue.

PS: Could you add the Hacktoberfest label to this issue, please?

JackWFinlay added a commit that referenced this issue Oct 21, 2016
Fix #26. Exclude 'Form' tag from ElementsFlags
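For illustration, here is a minimal sketch of the kind of change the commit message describes. It assumes the HtmlAgilityPack NuGet package is referenced; the class `FormFlagDemo` and the helper `ParseWithFormAsContainer` are hypothetical names, and this is not the code from the actual commit. The idea is that HtmlAgilityPack keeps special-case parsing flags per tag in the static `HtmlNode.ElementsFlags` dictionary, and versions of that period registered `form` with the `Empty` flag, which would explain why `<form>` came out with no children and `</form>` was left over as a text node. Removing the entry before parsing makes `<form>` behave like an ordinary container.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class FormFlagDemo
{
    // Hypothetical helper: parse HTML after dropping HtmlAgilityPack's
    // special-case flags for "form", so <form> is treated as a normal container.
    public static HtmlDocument ParseWithFormAsContainer(string html)
    {
        // ElementsFlags is static and process-wide: this affects every
        // HtmlDocument parsed afterwards. Removing a missing key is harmless.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    static void Main()
    {
        var doc = ParseWithFormAsContainer(
            "<html><head></head><body><form><input></form></body></html>");

        HtmlNode form = doc.DocumentNode.SelectSingleNode("//form");

        // Expect <input> nested inside <form> and no stray "</form>" text node.
        bool inputNested = form.SelectSingleNode("input") != null;
        bool strayCloseTag = doc.DocumentNode
            .Descendants()
            .Any(n => n.NodeType == HtmlNodeType.Text && n.InnerText.Contains("</form>"));

        Console.WriteLine($"input nested in form: {inputNested}");
        Console.WriteLine($"stray </form> text: {strayCloseTag}");
    }
}
```

Because the flag table is global, a library like Jsonize would typically make this change once (for example, in a static constructor) rather than on every parse; the sketch does it inside the helper only to keep the example self-contained.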