-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rewrite bleach to support html5lib >= 0.99999999 (8 9s) #229
Comments
as the supplied htmllib has XSS vulnerabilities (https://www.sourceclear.com/registry/security/cross-site-scripting-xss-/python/sid-3068) would be nice to see this issue fixed |
Did someone request a CVE for that? |
We need this older version of html5lib because bleach doesn’t yet support versions greater than 0.9999999 (7 9s). It will at some point in the future: mozilla/bleach#229 Weasyprint doesn’t support versions of html5lib lower than 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment) So this commit patches Weasyprint to use the older version.
We need this older version of html5lib because bleach doesn’t yet support versions greater than 0.9999999 (7 9s). It will at some point in the future: mozilla/bleach#229 Weasyprint doesn’t support versions of html5lib lower than 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment) So this commit patches Weasyprint to use the older version.
We need this older version of html5lib because bleach doesn’t yet support versions greater than 0.9999999 (7 9s). It will at some point in the future: mozilla/bleach#229 Weasyprint doesn’t support versions of html5lib lower than 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment) So this commit patches Weasyprint to use the older version.
Besides the security issues it is currently not possible to use bleach together with weasyprint because the latter already migrated to the new html5lib and does not support the old one. That's why I currently cannot use bleach! |
This is the first time I've heard of that XSS. While it's a problem with html5lib, has anyone looked into whether it affects bleach, too? If so, can someone write up a test case and toss that in a pull request? Regarding supporting html5lib 0.99999999 (8 9s) and up, I'm happy to look at pull requests. If anyone is interested in working on it, let me know. |
Not sure, however what they write on sourceclear:
However, that version breaks other things, right? |
I looked into it. The security vulnerability is about characters that need to be quoted for legacy browsers when used in attributes. I think the relevant test in the patch for html5lib is this one: html5lib/html5lib-python@9b8d8eb#diff-42ac3b2a6b7432ce8cc3967afb76ba26L245 If you run that string through bleach right now with 0.9999999 (7 9s), it quotes it correctly. That's because bleach always quotes attributes: Line 430 in 70b1d2d
Given that, I don't think bleach is affected by the XSS in question. Please feel free to double-check this. Regardless, I'm working on updating bleach to use html5lib |
Thanks, Will, for updating bleach to use the new html5lib. I hope that
you are successful and don't have too much work with it. Even if
security is not a problem, I think that anyway it is a thing which needs
to be done sooner or later. For us it is currently just disturbing
(http://bugs.lino-framework.org/tickets/Ticket/1262) but might become
urgent some day.
Luc
…On 20/12/16 00:06, Will Kahn-Greene wrote:
I looked into it. The security vulnerability is about characters that
need to be quoted for legacy browsers when used in attributes.
I think the relevant test in the patch for html5lib is this one:
***@***.***#diff-42ac3b2a6b7432ce8cc3967afb76ba26L245
<html5lib/html5lib-python@9b8d8eb#diff-42ac3b2a6b7432ce8cc3967afb76ba26L245>
If you run that string through bleach right now with 0.9999999 (7 9s),
it quotes it correctly. That's because bleach always quotes attributes:
https://github.com/mozilla/bleach/blob/70b1d2d03a069791a19ab6291b6e885371f6fe0f/bleach/__init__.py#L430
|Python 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609] on
linux2 Type "help", "copyright", "credits" or "license" for more
information. >>> from bleach import clean >>> clean('<span
title=foo\u000bbar>') u'<span title="foo\\u000bbar">' >>> |
Given that, I don't think bleach is affected by the XSS in question.
Please feel free to double-check this.
Regardless, I'm working on updating bleach to use html5lib |>=|
0.9999999 now. I'm not sure how long it'll take.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEM-J_fFKCwA83coPeoxm4cyKxxtpq7aks5rJv_rgaJpZM4KlpIG>.
|
I've been working on this on and off for a couple of months when I get a chance to spend time on it. Yesterday, I hit a problem that is currently a showstopper. I wrote up an issue for it here: I'm hoping I'm horribly wrong and completely misunderstand something. If that's the case, hopefully someone points it out and then I can return to work. If not, then I'll think of another way to implement things. Anyhow, I'm blocked until that's resolved in some manner. |
hi, any timeline for a bleach release with this compatibility fix ? |
There's no timeline for this rather large rewrite of bleach. Last I updated, I mentioned how the changes in the html5lib API cause a number of regressions, so I'm still pretty stuck on that. These are the options I've come up with so far in rough order of the estimated amount of work required: Option 1: Vendor html5lib 0.9999999 and continue with bleach as is This has a number of problems the biggest one in my opinion being that we'd be responsible for security issues. Bleach is probably small enough that we probably won't even find out about security issues. Further, I think this step implicitly involves a fork of html5lib. Because of that, this option is pretty unenthusing. Option 2: Finish up the work and whatever bleach regressions we have--that's the new normal, do a release This has a number of problems. Currently, we regularly get complains about the output of Option 3: Wait for something to change upstream in html5lib I think this will take a long time to happen because it's substantial work upstream and I don't think it's very high on their priority list and I think they're pretty starved for maintenance help, too. Maybe 6 months? Maybe a year? Maybe never? Waiting for changes upstream to happen is probably not a valid option. I don't have time to work on it myself--it's probably a huge project. Option 4: Switch to another library I started looking into whether we could switch from html5lib to something else and didn't get very far, but maybe it's possible. So that's where we're at. I have no idea how long this is going to take. I make no promises on timelines etc. |
I continued investigating option 2. One of the problems here was that in order to understand the scope of the html5lib 0.99999999 (8 9s) API change as it affected Bleach, I had to finish the rewrite. But I didn't want to spend all my time on the rewrite if I was just going to throw it away and go with a different option. Anyhow, I'm mostly done with the rewrite. There will be some regressions in I'm hoping I can wrap this up next week. Who can help test? |
Great news! I wish you success and will do my best for helping test it.
Luc
…On 18/02/17 04:18, Will Kahn-Greene wrote:
I continued investigating option 2. One of the problems here was that in
order to understand the scope of the html5lib 0.99999999 (8 9s) API
change as it affected Bleach, I had to finish the rewrite. But I didn't
want to spend all my time on the rewrite if I was just going to throw it
away and go with a different option.
Anyhow, I'm mostly done with the rewrite. There will be some regressions
in |.clean()| and |.linkify()| output and the Bleach API will change.
I'll have more info when I clean things up and make the branch public.
I'm hoping I can wrap this up next week.
Who can help test?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#229 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEM-J71RuRSJGI0BvLIjvRFnV_obYUkvks5rdlT2gaJpZM4KlpIG>.
|
I'm definitely happy to help test. |
I'm reopening this for testing. @Xof, @lsaffre and anyone else: I landed the changes just now. You can grab a .tar.gz of it with this link: https://github.com/mozilla/bleach/archive/543984c.tar.gz You can install it with pip:
Please take some time before Monday to install it, test it out and comment on your experiences here. If you run into issues, write them up as new issues. I'll work through them as I'm able. Thanks! |
Great! It seems that it works now! See my blog. So for me you can close this ticket. I cannot say whether it introduced any regression because I will only now start using bleach on the battle field. |
I've installed and tested it as far as seeing that the tests pass with python2.7 and python3.5 on Debian. Of course, these html5lib changes ripple outward. The update, not surprisingly, causes django-html_sanitizer to fail: I don't think that should stop you from releasing though. It's already broken with the current html5lib, so anyone wanting to use it as it stands will already to muck around with using old versions of things. |
I also would welcome a release because that would simplify deploying in order to test it on real data. |
I haven't gotten it to break in my testing. Thanks for the fast work! |
Thank you for help with testing! I made a bunch more changes and pushed out a release just now. If there are issues, I'll try to triage them quickly. |
Yes, it works perfectly now for me. For me we can close the ticket. Thanks for your work, Will! |
Changes: mozilla/bleach@v1.4.3...v2.0 Summary: https://github.com/mozilla/bleach/blob/b8b8e4aa9b688fb00f41959e74d8054e461f6aaf/CHANGES This is the change we care about: > * Supports html5lib >= 0.99999999 (8 9s). Means we won’t need to do this workaround in admin any more: > We need [to use] an older version of html5lib because bleach doesn’t > yet support versions greater than 0.9999999 (7 9s). It will at some > point in the future: mozilla/bleach#229 > > Weasyprint doesn’t support versions of html5lib lower than > 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment) > > So this commit patches Weasyprint to use the older version. – quis/WeasyPrint@99718c2
Changes: mozilla/bleach@v1.4.3...v2.0 Summary: https://github.com/mozilla/bleach/blob/b8b8e4aa9b688fb00f41959e74d8054e461f6aaf/CHANGES This is the change we care about: > * Supports html5lib >= 0.99999999 (8 9s). Means we won’t need to do this workaround in admin any more: > We need [to use] an older version of html5lib because bleach doesn’t > yet support versions greater than 0.9999999 (7 9s). It will at some > point in the future: mozilla/bleach#229 > > Weasyprint doesn’t support versions of html5lib lower than > 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment) > > So this commit patches Weasyprint to use the older version. – quis/WeasyPrint@99718c2
This is needed to fix the breaking incompatibility with html5 lib. mozilla/bleach#229
This is needed to fix the breaking incompatibility with html5 lib. mozilla/bleach#229
This is needed to fix the breaking incompatibility with html5 lib. mozilla/bleach#229
I thought someone was going to write up an issue for this, but apparently no one has, yet.
html5lib changed some APIs so we pinned to specific versions of html5lib with some number of 9s.
html5lib 1.0 milestone is pretty close to done. I think they're just waiting on documenting their public APIs. I don't see any indication of what their schedule looks like, but from what I did see, it seems like it's "done" except for the documentation. Given that, we could probably be safe by adding support for html5lib 0.999999999 (that's 9 9s) in addition to the ones we support now.
There's another issue in our tracker about how 0.999 causes some errors. We might want to nix support for that version.
This issue covers adding support for html5lib 0.999999999 (that's 9 9s) and 1.0 when that releases.
The text was updated successfully, but these errors were encountered: