-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support default values in placeholders #5275
base: master
Are you sure you want to change the base?
Conversation
Oooh, hey! So I assume Mercedes-Benz is using Caddy? Awesome! We'd love to hear what it's helping you solve, if that's something you can share. Thanks for working on this, it's definitely a tricky one. The JSON issue is definitely valid. Users do write JSON strings in their configs occasionally. We have the backtick-delimited token syntax in the Caddyfile to make this easier, for example:
I'm thinking that a quick-and-dirty workaround for the replacer would be to reject replacing the default if the key (left side of We do have some placeholders that take an "argument", like |
Oh, this is great -- thank you! While I echo what Francis said, I want to add my 2 cents real quick. So yeah, JSON in Caddy configs is somewhat common, and as Francis mentioned, the backtick string delimiters can be used to make this easier since it interprets those quotes within literally. Additionally, it depends how the placeholders are evaluated in a particular context or place within the config. Many places use (I somewhat regret designing placeholders the way I have, sigh.) Anyway, I think the double-quote workaround is good enough for now -- I don't think placeholders should use quotes within them. That way we don't need to worry about escaping colons. |
I agree with checking for quoted keys, this should be somewhat easy to add. We realized we'd actually love to support placeholders as default values, to be specific something similar to But trying to provide a first approach in this, I stumbled over a whole bunch of open questions around how braces are matched. An example is this (adopted from the unit tests, adding letters to make it easier to understand what's going on):
Can we do any assumptions on what's allowed as placeholder key ("not contains a quote") is already one of those). Also: how escapes get parsed is bugging me a little bit.
I understand I'm doing lots of (sometimes different) interpretation of the semantics here, but I have the feeling there are no well-defined semantics at all.
|
Regarding our use of caddy: I guess there's a bunch of uses over the entire company (we're large...). In our specific use case, caddy is embedded in an in-house authentication gateway solution, somewhat comparable to something like keycloak gatekeeper, but with several integrations for special user directories. In another more boring use case we're using it for hosting some static web content, as it fits our CI/CD pipeline well which is mostly built around creating golang container images. |
I think this is what I'd expect: first open brace, should match until first close brace, regardless of braces inside; as currently placeholders can't be nested. (That may change, as we talk about using placeholders as defaults, and thus recursive parsing. We'll need a limit on the recursion I think!)
Do you mean this is the result of adapting the Caddyfile to JSON?
You are right... I will readily admit the Caddyfile and placeholders are the weakest parts of my design 😅, so I really appreciate this feedback as I think we can make great improvements here! 👍
This should be fine, yeah -- but let's set a limit on max recursion. Maybe 2?
LGTM
Perhaps, yeah, since they're just a literal substitution (the
I think the tests with JSON objects which you've already added is about as good as it gets: other JSON syntax doesn't look like placeholders. Very cool use cases by the way, we're thrilled that M-B is using Caddy! There's a couple other large companies using Caddy in a similar fashion: Stripe (an enterprise sponsor), and the other one I'm not at liberty to say which at the moment. You're in good company though :) |
Exactly this is where I realized there are some issue with the current behavior. it's really hard to reasonably implement "recursive" defaulting like this. Implementation-wise, a "greedy" nesting would be the most simple, and I think also the one easiest to understand (and probably what most people would expect). I'm unsure in how far this might break current use cases, although I have a pretty hard time trying to imagine some.
So no, this happened when I tried adapting for nested defaults.
The recursion doesn't add relevant new cost, and we already have the pretty special "100 unmatched parenthesis" limit. I'd rather align those, and allow a decent number of alternatives. If people start templating the Caddy configuration for some weird use cases, you'll quickly reach larger amounts of nested defaults that don't really hurt. In fact: the code is not 100% complete yet (I pretty much rewrote the scanner/parser part of the function and was pretty surprised how few test cases I broke, still trying to adopt the behavior as much as possible), but I'm pretty sure that recursive parsing will not add new worst-case processing cost compared to the input length than we have already, but of course we add some function calls on the stack.
No kudos to me, they've already been there (and bugged me). ;) Test coverage actually is pretty decent already, otherwise I probably wouldn't have been brave enough to try and rewrite half of the function. You're giving me some confidence that finishing the approach I'm currently testing is reasonable (tomorrow, it's already 7pm local time in Germany). I'll add the changes as a second commit on this PR, so we can easily undo if required. If the code and approach seems fine, I'll add documentation updates, too. |
I pushed some updated code. I'd still consider this work in progress with respect to following:
Currently (on master), we have a pretty weird semantics around handling escapes. We always change I'm sorry for bugging you with that many questions for such few lines of codes, but I'm really hesitating at just changing stuff that might be considered backwards incompatible. |
I think the "too many braces" thing was the because we had a fuzzer that exploded because of it. @mohammed90 being our resident fuzzing expert, how do we re-run that fuzzer to make sure it still behaves okay? Edit: Mohammed is too busy to look into this now, but he linked me to #4170 which has the reproducer. Regarding changing escape semantics, I agree the existing behaviour is messy so I think we should rip off the bandaid and change it even if it might cause breakages. We'll put a prominent breakage warning in the release notes. No worries about the question, your attention to detail is very much appreciated! Thank you! |
@JensErat (Thanks for working on this; I meant to reply earlier, but got delayed with all the holiday things!)
Yeah; as Francis said, our fuzzer caught a problem there. I still question the value of that particular fuzz because our input is trusted (unless a glitchy script generated and deployed a ridiculous config).
Sounds great to me! I never loved my patch anyway.
You're right, and again this was lazy design on my part.
Yeah, I figured we don't need to escape } unless we're inside a placeholder, since it's not significant outside a placeholder. (But the "whether escaped or not" thing might be a bug... hm.)
Oops. 😇 I knew that was a limitation of my current implementation but didn't know if it was needed/useful.
This sounds like a bug fix to me. I think people expect escaping to work like you're describing. We should fix it.
Not at all! This is a very welcomed contribution, and a refreshing update to some very old code. (I didn't rewrite this code with Caddy 2, so this is mostly original pre-v1 logic from ~7 years ago.) PS. You might know this already, but I'm presenting to your company later this month about Caddy! |
f96274d
to
6a0a9bf
Compare
I just completed the code. Getting all the bits together was a little more involved than originally expected, slightly more complex and we definitely should have embedded a parser generator... The new replace function now mostly has these steps (earlier, these phase have been a little more interleaved):
We have lots of new tests now, ie. I also added results for ReplaceAll to cover those empty value handling branches. I guess the easiest approach in checking the new semantics is looking at the unit test inputs and expected outputs. Checking out the benchmarks, my code is generally a little bit slower, but always just a constant factor. I guess most of the extra compute time would already been there if we'd just fix what was considered a bug above. The one exemption where it got much more expensive is the If you agree to the changes, I'd continue with changes the documentation (and get the linter to agree). If there's any feedback or request for changes (I'm sure there are still some rough edges with deeper Caddy knowledge applied than I have), I'm happy to follow up.
Successfully added the optimization. Should reduce the runtime cost searching for closing braces to something like from
I assume both are rather special cases have rather little use. But you never know. :) Honestly -- I just came up with those putting together edge case unit tests...
Would you mind giving me details in private mail (or ask your other contact at Mercedes-Benz to ping me)? Maybe I'll join in if I can make it. |
@JensErat Wow, this is impressive. Thank you very much - our hats off to you! I'll start by replying to some comments, then I'll review the semantics in the tests and go from there.
Do you think it is a negligible constant? Placeholders eval is considered a hot path, so I just want to check :)
Hmm. "No placeholders" is definitely the most common scenario: the vast majority of inputs that support placeholders don't use placeholders, so it's useful for that path to be as fast as possible. How much is the slowdown here? (In my experience I find manual for loops to be faster than even simple regular expressions, but if you're benchmarking I'll trust that.)
That sounds like a super valuable addition to the code! Yes, that'd be fantastic if you have the time. :D Then we won't lose all that work you put into the reasoning/logic.
Yes, absolutely -- I'll send you an email. |
Integrating the modified code, we stumbled over some issue. We're trying to determine a redirect target either based on a custom header value or otherwise the usual What happens now: if if strings.HasPrefix(key, reqHeaderReplPrefix) {
field := key[len(reqHeaderReplPrefix):]
vals := req.Header[textproto.CanonicalMIMEHeaderKey(field)]
// always return true, since the header field might
// be present only in some requests
return strings.Join(vals, ","), true
} The issue is the return statement: no matter whether we actually do find a value, we always respond with "we found a value". The replacer code now has a hard time deciding whether to use the default or not (the empty string well might be the replaced content). The comment doesn't really help me understand why we return true here. I guess this is some kind of legacy defaulting to be able to just discard placeholders of non-existing headers/query paramters/... (we now could do something like Maybe this is (or can be) already completely handled by |
Hmm, you're right, that I think it was meant to make sure that It's definitely tricky and subtle. Not sure what to say. I think it's probably best to make the change here and just document the change, most notably mention that it could change behaviour in places where Edit: Oh, also |
I just realized that #5154 is related, regarding the discussion surrounding |
I just remembered we merged in a change in https://github.com/caddyserver/transform-encoder/blob/44f7460143b76360e74b5c3e8622afd6bc115bdf/formatencoder.go#L120 (it's a plugin, not in the main Caddy repo) which has similar default values support. I think that implementing it here might break it there? I'm not sure. We'll need to test that to see how it behaves. Hopefully it still just works and makes the |
Less than 2 I guess. But thinking about it again, maybe we actually can add a similar optimization again, but we'd have to check for
As an alternative (and less intrusive/breaking change) we might well pass a "default to empty" flag, which we can use (in replacer.go). This feels like adding a little more legacy, though. If we're already adding a kind of breaking change, maybe we should just get everything cleaned up. Changing
to
and then use this to determine what to return, from (kindly ignore the different fields, I was lazy and copy-pasted from the directly following similar code blocks instead of getting the diff):
to
I'd toggle the boolean based on whether the variable has a default or not. |
Hmm, changing For sake of discussion, what would be acceptable as an API change is a new function to replace it called something like |
One trick for adding a function parameter in a non-breaking way is to make it variadic:
(or something like that) Obviously this is not ideal, but if we are concerned about breakage, this gives us time to prepare any plugins that may be using this and forewarn them to support another parameter. Before we go further down this road though, I'd like to talk about that return statement again: if strings.HasPrefix(key, reqHeaderReplPrefix) {
field := key[len(reqHeaderReplPrefix):]
vals := req.Header[textproto.CanonicalMIMEHeaderKey(field)]
// always return true, since the header field might
// be present only in some requests
return strings.Join(vals, ","), true
} So yes, the original intent there was to remove the placeholder in cases where the header isn't present; however, with further experience, I'm not sure if that's always the right behavior, especially if we are implementing default values. If we implement default values, we can always evaluate and remove/replace the placeholder, so we don't always have to return true. If we change that to return |
Of course this precludes invalid placeholder-like text, such a JSON, which should not be replaced at all. Using JSON in header values (which are processed with Just want to make sure that's covered. We probably should have a test for the |
I now:
Further above, we discussed just not applying defaulting to any placeholders where the key has quotes in it. This is implemented and tested here: https://github.com/caddyserver/caddy/pull/5275/files#diff-ee2fe630c0af5186f990380915a1d4b9b2c15b24cb5bab9e13e87dafe700eb88R256-R257 Now it depends on which of the
Just returning false would definitely not be the right thing. In most cases, it's just passing through the boolean result we get from the map lookup. |
@JensErat Thanks for the excellent work. I saw this earlier but was waiting for CI, but now it appears that CI is stuck. Does anyone know how to jiggle it? I don't even see controls to run them manually for this PR anymore. This change is looking good. I'll review it as soon as I have a chance. Busy preparing a presentation for your company :) |
Github Action workflows don't run if a conflict exists (see https://github.com/orgs/community/discussions/26304#discussioncomment-3251303). Resolving the conflict should trigger the CI run. |
Ah, thanks Mohammed. @JensErat After a brief few days vacation, I've returned to the office and am ready to do a final review and merge this, whenever you're ready! 👍 You might be more qualified to resolve the merge conflicts. Let me know if I can help though. |
Another thing I spotted, we might want to remove this caddy/modules/caddyhttp/vars.go Line 169 in e9d95ab
There might be some similar code in other places. Searching for |
Support variable expansion also in placeholders, like in `{unsetPlaceholder:default value}`. Implemented similarly to caddyserver#3682, which provided defaults for environment variable expansion. Closes caddyserver#1793. Signed-off-by: Jens Erat <[email protected]>
If we have a string `\}`, we'd expect this to be interpreted as `}`, similar to how `\}\{` would be interpreted as `}{`. If we optimize early by searching for opening braces, we'd not strip the escape character here. Signed-off-by: Jens Erat <[email protected]>
The old parser logic was quite complicated and does not support nested placeholder evaluation for defaults, as does not support counting closing braces. I assume the given approach using two loops and switch statements is more readable, and changes braces matching behavior to support nested defaulting provided in another commit. Signed-off-by: Jens Erat <[email protected]>
Signed-off-by: Jens Erat <[email protected]>
e5621ee
to
2d8fba4
Compare
I was pretty busy, I'm sorry for the late reply. @francislavoie I have a somewhat hard time understanding how this code is used. It seems there is some relation to the cel matcher; but when I add a test case (which fails) and start debugging I don't yet see this code executed. So it seems we found like two more issues:
|
Yeah, the CEL code is pretty arcane. Sorry about that. It gets called here: https://github.com/mercedes-benz/caddy/blob/66aae7049addd406698b37ef2c8b878617e82981/modules/caddyhttp/celmatcher.go#L207 To explain what happens internally, we hook into the CEL compiler to transform Note we use this regexp to match placeholders in CEL expressions (bottom of placeholderRegexp = regexp.MustCompile(`{([a-zA-Z][\w.-]+)}`) This obviously won't match Edit: It looks like |
The default will allow pretty much anything, including escaped closing braces. I'm pretty sure this is nothing an arbitrary complex/fancy regular expression will be able to support: we'll have to catch anything in braces and analyse it. I guess we can well support it; after all a placeholder with default will have a value at any time. I'll look into that tomorrow. |
@JensErat Oh hey, I just wanted to check in and see if I can help us get the tests passing and merge this in :) |
Hey Jens, I'm going to push a commit where I hope I resolved the merge conflict successfully through a merge. Apologies if this creates a mess after pushing 😆 |
Oops, just kidding, I don't have permission to push to your fork. I guess it's up to you to finish this up 🙏 Let me know how I can help! |
@JensErat I'll be tagging 2.7 beta 1 in the next day or so. I'm happy to get this change in that, or into a later beta release if needed. Are you still interested in finishing this? |
Hi, I'm so sorry for ghosting you here. I was pretty busy with some trips (out of home for 6.5/9 weeks in a row...) and currently preparing to relocate within my city. I definitely want to finish this. I guess most work should actually be done, but the integration in the expression library interrupted me. I guess some help with the library would help at speeding up the remaining tasks. Do you think we can have a chat/call on what's the best approach here? I'll also rebase the branch soon and fix any conflicts. |
No worries @JensErat thanks for following up! No rush, we just wanted to make sure we didn't lose you 😅 I can hop on a call as well to discuss (I can probably provide more context on the CEL stuff). I think Matt can help you schedule a time. I can make it pretty much anytime this week before Friday or next week after Monday. |
@JensErat That's awesome, hope you had a good time on your trips! And I know the busy-ness of moving, it's a lot of work. Hope it goes well for you! Yeah, Francis is one of our primary CEL maintainers, and I am also happy to chat sometime. You can schedule a time with me at https://matt.chat :) |
@JensErat will you have time soon to finalize this? I'd love to get this finished up! I'd be willing to rebase, but we're not able to because this PR is under your mercedes-benz fork which doesn't allow us to commit. |
I'll try to circle back to this change this year and see if I can finish it up :) (Or anyone else is welcome to I suppose.) We'll make sure Jens is credited |
TODO: dynamic env vars with defaults in caddy.json. See caddyserver/caddy#5275
TODO: dynamic env vars with defaults in caddy.json. See caddyserver/caddy#5275
Hey @JensErat , if the CEL stuff is a blocker, let's not worry about that for now. If this change is ready by your definition otherwise, I am happy to give this one last review and merge it in. (We could always add CEL support later.) If I don't hear from you soon, I'll likely close this since I can't push to your branch. But we might salvage the changes and put them into our own branch and merge it in (with credit to you) if that's the case. |
Support variable expansion also in placeholders, like in
{unsetPlaceholder:default value}
. Implemented similarly to #3682, which provided defaults for environment variable expansion.This PR seems to work fine already, but two or three minors should be discussed (see TODO in commit).
{"json"\: "object"}
: is this a case we need to handle? the behavior seems unspecified to me right now.{"json": "object"}
: defaulting to"object"
seems appropriate to me here, can you confirm?{"json": "object"\}
: to me, this looks like a parser bug and should never have gone into the placeholder substitution code?Closes #1793.
Jens Erat [email protected], Mercedes-Benz Tech Innovation GmbH, imprint