html parsing #200

KeenCN · 2018-12-06T06:23:56Z

Hi, I have a problem when I try to parse an HTML string. Tested at the Python command line:

from pyquery import PyQuery as pq
t = pq('<span class="test">.</span>')
o = t("span.test").html()
print(o)
[ . ]

How do I get the original string?
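
A minimal sketch of one possible workaround, using only the standard library rather than pyquery itself. It assumes the input contained a numeric character reference such as the `&#xe034;` discussed later in this thread. Once a parser has decoded such a reference, the text holds the character itself, so the reference form has to be rebuilt by re-encoding non-ASCII characters. The helper name `to_char_refs` is hypothetical.

```python
import html


def to_char_refs(text: str) -> str:
    """Re-encode every non-ASCII character as a hexadecimal character reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):x};" for c in text)


# html.unescape stands in for what an HTML parser does to the reference.
decoded = html.unescape("&#xe034;")
restored = to_char_refs(decoded)
print(restored)  # &#xe034;
```

If decimal references are acceptable, `decoded.encode("ascii", "xmlcharrefreplace").decode("ascii")` does the same job without a helper and yields `&#57396;`.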

KeenCN · 2018-12-06T09:44:40Z

That's ok, but it's not what I want

from pyquery import PyQuery as pq
s = '<span class="test">.</span>'
s = s.replace("&", "&amp;")
t = pq(s)
o = t("span.test").html()
print(o)
[ . ]

CodingMoeButa · 2021-06-06T05:22:54Z

I have the same problem as you: #218
If it were &lt; being turned into <, the problem would be more serious.

liquancss · 2021-09-29T15:46:13Z

"&#xe034" looks like a kind of icon font which means it has nothing to do with this lib.
There must be a font file(like .woff file) to tell the browser how &#xe034 rendered. Without the corresponding font file or wrong font file, "&#xe034" will looks weird or wrong.
This is commonly used in website to protect secret data(like price) from crawlers which called font encryption.
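
A quick way to check the icon-font claim, sketched with the standard library only: U+E034 falls in the Unicode Private Use Area, where code points have no standard glyph and depend entirely on the font in use. The variable names are illustrative.

```python
import html
import unicodedata

# Decode the character reference into the actual character.
ch = html.unescape("&#xe034;")

# "Co" is the Unicode general category for private-use code points.
category = unicodedata.category(ch)

# The Basic Multilingual Plane's Private Use Area spans U+E000..U+F8FF.
in_bmp_pua = 0xE000 <= ord(ch) <= 0xF8FF

print(hex(ord(ch)), category, in_bmp_pua)  # 0xe034 Co True
```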

jcushman mentioned this issue Aug 2, 2021

Escape entities in html() output #221

Merged
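
For illustration only, and not the code from #221: escaping entities in serialized output generally means turning markup-significant characters in text back into references, so the result can be parsed again without changing meaning. A minimal standard-library sketch of that round trip:

```python
import html

text = "1 < 2 & 3"

# quote=False escapes only &, < and >, which is enough for element content.
escaped = html.escape(text, quote=False)
print(escaped)  # 1 &lt; 2 &amp; 3

# Unescaping the escaped form recovers the original text.
round_tripped = html.unescape(escaped)
print(round_tripped == text)  # True
```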
