HTMLSerializer and HTML entities #197
Comments
What's your use-case for wanting them escaped?
Basically, you might want to encode it as an ASCII string (which will obviously escape all non-ASCII characters), or is something like #38 more along the lines of what you want?
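To illustrate the ASCII suggestion above, here is a minimal sketch using only Python's built-in `str.encode` with the `xmlcharrefreplace` error handler. The sample string and variable names are made up for illustration, and this is not html5lib's own code path; it only shows what "escape all non-ASCII characters" looks like in practice.

```python
# Sample text containing a non-breaking space (U+00A0) and a euro sign (U+20AC).
text = "price:\u00a010\u20ac"

# The "xmlcharrefreplace" error handler swaps every character that cannot be
# encoded as ASCII for a numeric character reference.
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")

print(ascii_bytes)  # b'price:&#160;10&#8364;'
```

Note that this produces numeric references rather than named entities such as `&nbsp;`, so it makes invisible characters visible but not necessarily more readable.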
I am building an HTML document using etree Element instances and serialising it with html5lib. Special characters (especially things like the non-breaking space) will be non-obvious to anyone viewing the rendered source.
Okay, that makes it sound like #38 is more or less what you want: readability of the source given "invisible" characters. Closing this as a dupe of that, then. (If you disagree, feel free to reopen this and comment.)
At the moment, parsing and serialising a document causes entities to be converted into special characters, including things like #00, and there is no way to pass additional entities to xml.sax.saxutils.escape.
I looked into subclassing the serialiser, but the escaping happens in the middle of the serialize() method at:
https://github.com/html5lib/html5lib-python/blob/master/html5lib/serializer/htmlserializer.py#L223
Perhaps the class should define an entities dict to pass through the standard HTML5 entities and special characters, or do the escaping via a method that can be overridden?
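A rough sketch of what that proposal could look like, assuming a hypothetical `SketchSerializer` class with an `entities` attribute and an `escape_text` hook. None of these names exist in html5lib; the only real API used is `xml.sax.saxutils.escape`, which accepts an optional dict of extra replacements.

```python
from xml.sax.saxutils import escape


class SketchSerializer:
    """Hypothetical stand-in for illustration; not html5lib's HTMLSerializer."""

    # Extra replacements applied after the standard &, < and > escaping.
    entities = {"\u00a0": "&nbsp;"}

    def escape_text(self, data):
        # Overridable hook: subclasses can swap in a different escaping strategy.
        return escape(data, self.entities)

    def serialize_text(self, data):
        # Stands in for the step inside serialize() that escapes character data.
        return self.escape_text(data)


class CopyrightSerializer(SketchSerializer):
    # A subclass extends the entity table without touching serialize_text().
    entities = {**SketchSerializer.entities, "\u00a9": "&copy;"}


result = CopyrightSerializer().serialize_text("\u00a9 A\u00a0& B")
print(result)  # &copy; A&nbsp;&amp; B
```

With a hook like this, changing how text is escaped would only need a class attribute or a one-method override, rather than a copy of the whole serialize() method.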