Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flask.Response.make_conditional fails to return 304 due to Etag mismatch #24

Open
danielzen opened this issue May 28, 2021 · 5 comments

Comments

@danielzen
Copy link

This is more a need for technique than a library failure. Due to the necessary changes to modify Etags when returning compressed content, the Etags don't match when calling make_conditional

So, how would one get flask to acknowledge a matching etag on compressed content? Should I intercept and strip the trailing :gzip off the normally immutable request headers?

I'm at a loss. Any recommendations? Thanks in advance.

@ziddey
Copy link

ziddey commented Oct 7, 2021

Just came across this myself.

I'm curious if altering the etag in flask-compress is even necessary, since it's adding accept-encoding to the vary header. Shouldn't this be enough to prevent any cache from matching if the requested encoding does change?

edit: just saw #15
hmm

In the interim, I've added a before_app_request to alter the header in the environ:

    for h in ["HTTP_IF_NONE_MATCH", "HTTP_IF_MATCH"]:
        if h in request.environ:
            request.environ[f"ORIG_{h}"] = request.environ[h]
            request.environ[h] = re.sub(':[^"]+', "", request.environ[h])

This would cause issues if you're using : in your etags, but flask's implementation does not. However, a 304 will then prevent flask-compress from appending the encoding onto the resulting etag... I suppose an after_app_request could be used to restore it if needed.

    if (
        "ORIG_HTTP_IF_NONE_MATCH" in request.environ
        and "etag" in response.headers
        and response.status_code == 304
    ):
        etag = response.headers["etag"].strip('"')
        for orig in request.environ["ORIG_HTTP_IF_NONE_MATCH"].split(","):
            orig = orig.strip()
            if orig.strip('"').startswith(etag):
                response.headers["ETag"] = orig
                break
    return response

Of course, this is just to get around #15 and restore conditional requests. I'm not clear if etags do need to be unique for different encodings and their ramifications-- potential range request disasters if the accepted-encoding changes for some strange reason?

edit: https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.3

So I suppose flask-compress may want to call make_conditional. This would resolve the etag 304 issue, but there are other scenarios that could still have issues:

  • For if-modified-since requests, when flask itself calls make_conditional (e.g. via send_static_file), it will successfully convert to 304 before flask-compress gets to it and will be missing the content-encoding. I'm not sure if this can cause any issues-- I do know that caches should use a 304's cache control headers to update the cache.
  • Range requests get (mis)handled before compression
  • An if-match will reach flask-compress with a 412 if the "matching" altered etag is used and fail the http 200-300 criteria, where it should be a 200..

Basically, all conditional request scenarios need careful consideration.

for reference: https://stackoverflow.com/questions/33947562/is-it-possible-to-send-http-response-using-gzip-and-byte-ranges-at-the-same-time

@ziddey
Copy link

ziddey commented Oct 9, 2021

@alexprengere I've done a little more research on this. I've seen etags with the encoding appended, like flask-compress is currently doing. As well, it's common to just weaken the etag (cloudflare does this). If we use this approach instead, flask/werkzeug's make_conditional should work just fine (for non-range requests...).

Something like:

etag = response.get_etag()
if etag and etag[0]:
    response.set_etag(etag[0], True)

As far as range requests, it looks like it's common to simply not support compression in this scenario. We could just check for a content-range response header:

        if (chosen_algorithm is None or
            response.mimetype not in app.config["COMPRESS_MIMETYPES"] or
            response.status_code < 200 or
            response.status_code >= 300 or
            "Content-Encoding" in response.headers or
            "Content-Range" in response.headers or
            (response.content_length is not None and
             response.content_length < app.config["COMPRESS_MIN_SIZE"])):
            return response

I did see some interesting behavior testing range requests with cloudflare. CF will accept gzip encoding for upstream requests, and will then happily serve a gzipped range response (with an unweakened etag). But for any other encodings (no accept-encoding, or br or whatever else), it will fall back to identity and weaken the etag.

This obviously can still lead to major issues, if say a HEAD request is made (with accept-encoding) to determine the content-length to subsequently make a range request.. But realistically, such responses should be excluded from compression by the user anyway (excluded mimetypes, etc), so it's mostly a non-issue. Either way, range requests are performed on the uncompressed data which can lead to all sorts of surprises.

@alexprengere
Copy link
Collaborator

Thanks a lot for this. Would you be able to work on a PR I can review?

@andreykryazhev
Copy link

andreykryazhev commented Aug 21, 2024

I have the similar issue, I am using flask/werkzeug for serving static files. When browser requests some file (main.js for example) flask-compress adds gzip mark to response's ETag, for example after compressing Etag looks like "1724158821.877386-11899091-378803215:gzip".

But when browser pass this value in If-None-Match header with next request to main.js werkzeug desides that main.js on filesystem is differs from browser main.js (because for werkzeug the valid value is "1724158821.877386-11899091-378803215", not "1724158821.877386-11899091-378803215:gzip") and werkzeug returns status code 200. It leads to flask_compress compresses the result again and so on.

Actually, in my opinion it is not quite flask-compress issue (this is more werkzeug issue since I cannot override ETag validation when serving static files). But may be you have some thoughts how to resolve such issue on your side?
By my opinion one of the option would be a possibility to modify response status code in flask-compress. In this case we would modify it forcefully to 304 or whatever when we needed (for example when used flask-compress cache).

P.S.: I also do not understand how weakeaning the validation can help with resolving this issue.

@ziddey
Copy link

ziddey commented Aug 29, 2024

Apologies for the late response. Since I'm not using If-Range or If-Match conditional requests, my prior solution has been working fine for me.

It works because If-None-Match only performs a weak comparison:

Weak comparison: two entity-tags are equivalent if their opaque-tags match character-by-character, regardless of either or
both being tagged as "weak".

https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.2

Weakening etags when compressing is fairly common, at least for downstream servers. E.g. https://developers.cloudflare.com/cache/reference/etag-headers/#behavior-with-respect-strong-etags-disabled

However, this approach does not work for If-Range or If-Match requests since those require strong comparisons. Again, simply not supporting compression for range requests would solve etag issues for most scenarios here. Besides, such requests are likely natively compressed-- videos etc.

As for the mid-air collision example, this would be implemented outside of flask's native make_conditional (which seems to only be used by send_file?), so although not by the book, one could just work with the weak etag.. Otherwise, we could try to only weaken for send_file, like detecting direct_passthrough or having a Content-Disposition header.

If strong etags for compressed content is desired, flask-compress would need to hook into one of flask/werkzeug's functions:
send_file can take an etag as an argument, or generate one if actually sending a file.
It then calls make_conditional, which ends up generating an etag if it didn't exist in is_resource_modified.

I think the simplest solution would be to:

  • Weaken the etag if compressing (fixes if-not-match)
  • Exclude compression for range requests (flask-compress is out of the picture., and probably already was anyway)
  • Don't weaken if not a send_file-- direct_passthrough / Content-Disposition or whatever (allows any custom conditional implementation to proceed. fixes mid-air collisions).

Actually, in the last scenario, the etag can have the compression algorithm tacked on like it is currently. The user will just need to handle it accordingly when doing any custom conditional checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants