HTMLSerializer and HTML entities #197

Closed
danielknell opened this issue Jul 20, 2015 · 4 comments

@danielknell

>>> import html5lib
>>> html5lib.serializer.serialize(html5lib.parse('<p>&nbsp;</p>'))
'<p>\xa0'

At the moment, parsing and then serialising a document converts entities into literal special characters, including things like #00, and there is no way to pass additional entities to xml.sax.saxutils.escape.

I looked into subclassing the serialiser, but the escaping happens in the middle of the serialize() method at:

https://github.com/html5lib/html5lib-python/blob/master/html5lib/serializer/htmlserializer.py#L223

Perhaps the class should define an entities dict so that the standard HTML5 entities and special characters can be passed through, or do the escaping in a method that can be overridden?
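To make the proposal concrete, here is a minimal sketch of what an overridable escaping hook could look like. It is not html5lib's API: the `EntityEscaper` and `NbspEscaper` classes, the `entities` attribute, and the `escape` method are all hypothetical names used for illustration. The only real call is `xml.sax.saxutils.escape`, which accepts an optional dict of extra replacements.

```python
from xml.sax.saxutils import escape as sax_escape


class EntityEscaper:
    """Hypothetical hook, not part of html5lib."""

    # Extra character -> entity replacements, applied after &, < and >.
    entities = {}

    def escape(self, text):
        # saxutils.escape handles &, < and > first, then the extra dict,
        # so the "&" inside "&nbsp;" is not escaped a second time.
        return sax_escape(text, self.entities)


class NbspEscaper(EntityEscaper):
    # A subclass only needs to override the dict to change the output.
    entities = {'\xa0': '&nbsp;'}


default_output = EntityEscaper().escape('a\xa0<b')
nbsp_output = NbspEscaper().escape('a\xa0<b')
```

With this shape, a caller who wants visible entities subclasses and sets `entities`, while the default behaviour stays unchanged.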

@gsnedders
Member

What's your use-case for wanting them escaped?

@gsnedders
Member

Basically, you might want to encode it as an ASCII string (which will obviously escape all non-ASCII characters), or is something like #38 more along the lines of what you want?
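As a rough illustration of the ASCII suggestion, the sketch below uses only the Python standard library on the serialised string from the report, rather than any html5lib option. The `xmlcharrefreplace` error handler turns every non-ASCII character into a numeric character reference. The variable names are assumptions made for the example.

```python
# The serialiser output from the report: a <p> followed by U+00A0.
serialised = '<p>\xa0'

# Encoding to ASCII with xmlcharrefreplace escapes every non-ASCII
# character as a decimal numeric character reference.
ascii_bytes = serialised.encode('ascii', 'xmlcharrefreplace')

# Decode back to str if text rather than bytes is needed.
ascii_text = ascii_bytes.decode('ascii')
```

This yields numeric references such as `&#160;` rather than named entities such as `&nbsp;`, so it makes the character visible in the source but is less readable than a named entity.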

@danielknell
Author

I am building an HTML document using etree Element instances and serialising it with html5lib. Special characters (especially things like the non-breaking space) will be non-obvious to anyone viewing the rendered source.

@gsnedders
Member

Okay, that makes it sound like #38 is more or less what you want — readability of the source given "invisible" characters. Closing this as a dupe of that, then. (If you disagree, feel free to reopen this and comment.)
