-
-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul UDP server #141
Overhaul UDP server #141
Conversation
268b6b1
to
b91d7e8
Compare
You could try running a second Caddy instance instead of nginx for QUIC/H3, I'm curious if it would be less bottlenecked 🤔 |
About 125MB/s bypassing caddy-l4, directly between curl and Caddy. There Caddy burns 230% CPU, but that's over loopback so in practice it just moves the bottleneck to curl which itself is single threaded. For comparison, passing Back to HTTP/3, when doing curl -> caddy-l4 -> caddy, caddy-l4 is driven to about 250% CPU. Wondering how much performance is still on the table, I wrote the simplest possible UDP proxy in Go with no unnecessary allocations, and that runs about 130%. Then I rewrote it in C (using level-triggered epoll) and that runs around 90%. I think the unfortunate reality is that UDP proxying is just going to be inherently slow until the industry catches up to HTTP/3. For bulk data transfers, I imagine UDP hasn't seen nearly as much optimization effort as TCP has. For example, sendfile(2) works with TCP but not with UDP, which means, unlike with TCP, there's no avoiding kernel<->userspace copying for HTTP/3 proxies using standard syscalls. (AFAICT.) As HTTP/3 gains increasing adoption I'm sure this will get more attention, but how long it will take is anyone's guess. |
This is fascinating progress!! Sorry I haven't replied yet -- been a very busy weekend with family, and today I caught up on a bunch of work things that also came in over the weekend (which is unusual). So I'm hoping to look at this more this week, but I want you to know this sounds really great and I'm excited to look at it! |
I should mention that Caddy has two open PRs which I believe should address the "biggest obstacle" (problem 3):
Will be circling back to this in a bit. Releasing Caddy v2.7.4 now... |
@mholt Problem 3 will be fixed by my pr. As it allows udp listeners to expose the underlying socket at the cost of managing the life cycle of udp socket by oneself. (Trivia if using listenerPool with a custom network key and constructor like the new Applying |
@WeidiDeng Gotcha. So, to clarify, both our PRs will be correct (once mine is fixed a little bit with regards to unixgram) and both should be merged (when they're ready)? |
@jtackaberry I've just pushed a commit to Caddy upstream that uses SO_REUSEPORT for UDP sockets (with @WeidiDeng's help) -- do you think that'll fix the last problem you mentioned? |
@jtackaberry With that last problem addressed, is there anything else left before I merge this? |
Sorry for leaving this hanging. I have a bit of free time coming up after the week is over so I'm aiming to test things out again next week (or possibly the weekend). |
No worries, I also left for quite a gap of time (we had a baby). Anyway, yeah, whenever you have a chance, I'll be happy to merge it once you're happpy with things. 💯 |
@jtackaberry Hey, I keep losing track of things, lol. Any interest in working together to finish this up to your satisfaction? |
@mholt You and me both! Still interested in moving this along, just got caught up in a couple other projects in the meantime. I do tend to chase a lot of squirrels. :) I will at least take a couple hours within the next week to kick the tires and get a refreshed look of the lay of the land in light of Caddy's (not-so-)recent changes. |
Ha, sounds good. Looking forward to it 😄 |
Moving up on the to-do list but free time has been an unexpected scarcity. Sorry for being such a flake. Other obligations have ended up decimating my weekends through March. Meanwhile I just wanted to report in that I'm still alive and invested. :) |
Hi ! The PR looks great and would enable to write nice QUIC plugins ! Thanks for that ! What do you think ? Cheers and thanks a lot for the work ! |
This would be cool indeed, as @francoismichel 's work would allow this UDP proxy to be used as a HTTP/3 proxy that multiplexes at the SNI layer (no decryption needed). |
Given where this PR is at, I think it's looking pretty good (without having tested it as thoroughly as @jtackaberry) -- and I might be inclined to just go ahead and merge it soon, with the possibility for follow-up patches later if needed. |
@mholt This PR is very useful and I hope it will be merged soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtackaberry I'll merge what you've got so far, I think it's probably best to get it out there at this point and have people try it and iterate on it. And you can always revisit it when you have time! 👍
Thanks for the great contribution!
Following up from #140, the UDP server implementation has two fatal flaws:
1 is an easy enough fix. 2 is addressed by this PR in the following way:
udpConns
(keyed on downstream client addr:port) for the given UDP server portudpConns
map is checked for that remoteaddr:port
.packetConn
is created for that downstream and a goroutine is created to call the long-running proxy handlerpacketConn
has been updated to include additional fields to allow for communication with the UDP serverpacketConn
to read new packets from the server for this particular downstream, and there's a channel for it to communicate back to the UDP server that the connection has been closed.udpConns
map, which avoids the need to acquire a mutex for each packet.)packetConn
's packet channelpacketConn
now has the concept of an idle timeout. If no new packets are received from the downstream within a period of time (currently hardcoded) then the connection is considered closed. Any pendingRead()
call is immediately returned with io.EOF, which in turn causes the upstream connections to be closed, which allows the handler (and the per-connection goroutine which called it) to terminate.I've tested this with HTTP/3 (with QUIC-enabled nginx and curl). Problem 1 stopped the current master branch dead in its tracks, since it couldn't get past the initial handshake (due to the first packet being 1200 bytes, which got truncated by the 1024 byte buffer).
Even after fixing that in master, the QUIC handshake didn't complete because each new packet from the downstream would be forwarded to the upstream from a different source port, so the QUIC server side saw them as unrelated packets.
With this PR, curl is able to fetch via caddy to the nginx upstream, both short requests as well as bulk downloads. Performance isn't brilliant, it must be said: with all 3 things (curl, caddy, nginx) running on the same box, caddy runs around 200-220% CPU in order to bottleneck nginx, which has a single thread bottleneck so runs at 98%+. The nginx bottleneck caps effective throughput to about 70-80MB/s. Disappointing results from QUIC there, but not Caddy's fault. :)
But it works, which is an already an improvement.
Pending problems:
The last problem is the biggest obstacle right now, and I'm not really sure how to fix it. Here's what happens:
Handler.proxy()
is diligently doingio.Copy()
for upstream->downstream and downstream->upstream (in separate goroutines)io.Copy()
invocations both immediately abort with "use of closed network connection"This isn't a problem for TCP, because we get a separate socket representing the TCP connection that's independent from the listen socket. This isn't the case for UDP, where incoming data from all downstreams is always received over the same socket, so any in-flight handlers proxying data between upstream/downstream depend on this socket continuing to exist.
This is a consequence of how config reloading is done in general within Caddy, so I think I'll need to depend on your suggestions at this point. It's the main thing keeping this PR in draft status -- although even with that problem, the code in the PR still significantly improves UDP proxying behavior with caddy-l4.