-
-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential bug in GRO code for Windows? #2041
Comments
Does the windows run on intel CPU or on ARM CPU ? |
We see a similar issue on Windows on ARM, see summary in last comment on this thread. While I myself don't have access to a Windows on ARM machine, someone at Mozilla does and is able to reproduce the issue. We have not been able to reproduce this on a Windows on ARM VM.
Interesting. I will try reproducing the issue with a larger buffer. For what it is worth, here is MSQUIC's buffer length. |
We see this issue on an x64_86 CPU. |
I thought about this a bit more and I am no longer sure that is a viable possibility unless we receive a lot of other control messages that we don't match on. |
Could the receiver interface be a dual stack (IPv4 and IPv6) one ? I had no chance to test that case when writing the code, maybe we receive ipv4 + ipv6 control messages ? |
Interesting, according to https://bugzilla.mozilla.org/show_bug.cgi?id=1916558#c43, the |
In https://learn.microsoft.com/en-us/windows-hardware/drivers/network/udp-rsc-offload, it is mentioned that this only works on Windows 11, version 24H2 and later. I'll have to investigate what version our customer is using. |
I can see in our logs that this customer did have dual-stack but I don't think it matters here because the way we do dual-stack is by having two sockets, one for IPv4 and one for IPv6. |
This code work fine on our side on Windows 10 and 11. |
I do not see any special case related to this in msquic C code https://github.com/microsoft/msquic/blob/main/src/platform/datapath_winuser.c#L3216 |
@mxinden Do you know by any chance if the affected user is using a dock? We have reports that this only happens when the user is connected via a Dell dock WD19S. It works fine over WiFi and via a separate Ethernet dongle. |
@mxinden could your colleague print the stride value (at L210) before the loop ? it is init with the length of the received packet to handle the case were received packet is not coalesced ? |
The colleague here 👋 I'm not using any dock but only the builtin WiFi, and the stride value is 1280. |
Just to avoid any confusion, the stride value is Looking at pcaps with fosstodon.org that @saschanaz provided, we see >1_500k UDP datagrams over an internet path. >1_500k over an internet path is very unlikely. My suspicion is, that the Windows kernel correctly coalesces the UDP datagrams (< 1_500k) into one >1_500k datagram, but then either does not provide the stride length, or we fail to read the stride length. |
I think you wanted to link this commit mxinden@7ce09c3 |
We have a difference with msquic, they set the max segment size to "u16:MAX - 8", we set it to "u16:MAX", maybe you can try with this (i doubt it will change anything but who knows) https://github.com/microsoft/msquic/blob/e3ffc71a375d85b1216348b2ffa1e13653213e26/src/platform/datapath_winuser.c#L1496 |
@saschanaz when you have time, could you (1) apply the following git diff to your local mozilla-central checkout, (2) build in debug mode, and (3) try whether you can reproduce the issues with fosstodon.org. If you can reproduce the issue, can you provide logs and a pcap? (I wrote the above against 79808865c472858740158ce5cbbf4e0b002cf848, but any commit from today on Also see past Bugzilla comment for log level. diff --git a/third_party/rust/quinn-udp/src/windows.rs b/third_party/rust/quinn-udp/src/windows.rs
index 886f455117ceb..054e267d97921 100644
--- a/third_party/rust/quinn-udp/src/windows.rs
+++ b/third_party/rust/quinn-udp/src/windows.rs
@@ -114,7 +114,7 @@ impl UdpSocketState {
// u32 per
// https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-udp-socket-options.
// Choice of 2^16 - 1 inspired by msquic.
- u16::MAX as u32,
+ u16::MAX as u32 - 8,
);
let now = Instant::now();
@@ -211,6 +211,7 @@ impl UdpSocketState {
let cmsg_iter = unsafe { cmsg::Iter::new(&wsa_msg) };
for cmsg in cmsg_iter {
+ error!("cmsg_iter {} {}", cmsg.cmsg_level, cmsg.cmsg_type);
const UDP_COALESCED_INFO: i32 = WinSock::UDP_COALESCED_INFO as i32;
// [header (len)][data][padding(len + sizeof(data))] -> [header][data][padding]
match (cmsg.cmsg_level, cmsg.cmsg_type) {
@@ -251,6 +252,7 @@ impl UdpSocketState {
ecn: EcnCodepoint::from_bits(ecn_bits as u8),
dst_ip,
};
+ error!("{:?}", meta[0]);
Ok(1)
}
@@ -398,7 +400,7 @@ fn set_socket_option(
pub(crate) const BATCH_SIZE: usize = 1;
// Enough to store max(IP_PKTINFO + IP_ECN, IPV6_PKTINFO + IPV6_ECN) + max(UDP_SEND_MSG_SIZE, UDP_COALESCED_INFO) bytes (header + data) and some extra margin
-const CMSG_LEN: usize = 128;
+const CMSG_LEN: usize = 512;
const OPTION_ON: u32 = 1;
// FIXME this could use [`std::sync::OnceLock`] once the MSRV is bumped to 1.70 and upper @stormshield-damiend the diff above contains your suggestion, namely to use @thomaseizinger this includes your suspicion that Thank you for the help everyone! |
I guess I need a fork to make that work? |
Sent! It still showed a lot of dropped packets with decryption failure though. |
More theories:
|
I managed to set up a Windows 11 VM that uses USB passthrough to a dedicated network card and I can confirm that GRO works there. Any ideas on what I could try to reproduce the issue are welcome :) |
msquic does it with a standard setsockeopt call
Yes we do, but we did not tested in Windows ARM only Intel. |
Thank you @saschanaz for the thorough work! For others, with the above diff the issue still persists. The pcap shows UDP datagrams with up to 20_000 bytes length over an internet link. As expected, the contained QUIC packet(s) fail to decrypt, given that multiple QUIC packets with short headers are disguised as a single large QUIC packet. |
For what it is worth, Firefox Nightly does URO on Windows. While we don't see many coalesced receives, we do see some, e.g. in the 99th percentile 8 or more. |
@nibanks would you happen to have any guidance/hot tips for this stuff? |
Is there a specific problem you need help with? I don't have time to go through the whole thread here. |
@nibanks summary:
Note that this is the same issue as the one discussed on Slack / e-mail last week, see mail UDP Receive Segment Coalescing Offload (URO) on Windows on ARM. |
Per my understanding the URO thing requires 24H2, right? We probably need to make sure that when requesting repro. |
@mxinden Have you checked what the return value of quinn/quinn-udp/src/windows.rs Lines 110 to 118 in f8b8c50
Maybe this actually fails but leaves the driver in a kind of limbo state and we should attempt to unset this on failure? Also, what do people think of adding a safe-guard mechanism like we do for GSO? If the read packet exceeds the interface MTU (I saw y'all have been collaborating on https://github.com/mozilla/mtu), we attempt to unset the socket option. |
It sounds like that would make sense as a last-ditch mitigation, though it'd be nice to understand the root cause better, especially since it sounds like other software might be dodging this issue somehow. Finding the right interface MTU to refer to might also be complex in practice, since it may change over time and/or depending on address. |
I can now consistently reproduce the bug. It is tied to WSL. Installing WSL makes it occur. Disabling WSL "fixes" the issue. Here is a smallish reproducer. On a Windows on ARM device:
Once reproduced, you can "fix" the bug by disabling WSL:
@stormshield-damiend do you have an idea why it might fail with WSL installed, but not without? Note that I am using https://fosstodon.org here which is using Fastly, which consistently sends packet trains (2 or more UDP datagrams back-to-back), which are then coalesced by the client OS. This is not tied to https://fosstodon.org or Fastly. |
Very interesting! Does that also reproduce in a VM? Or only on real hardware? |
We were not able to reproduce on the following machines, despite WSL being installed:
|
I have no idea why this fails when WSL is installed |
On Windows on ARM with Windows Subsystem for Linux (WSL) `WSA_RECVMSG` does not return the segment size of coalesced UDP datagrams. See quinn-rs#2041 for details. While lacking a fix for the root cause, don't enable URO, i.e. don't coalesce UDP datagrams on Windows on ARM.
On Windows on ARM with Windows Subsystem for Linux (WSL) `WSA_RECVMSG` does not return the segment size of coalesced UDP datagrams. See quinn-rs#2041 for details. While lacking a fix for the root cause, don't enable URO, i.e. don't coalesce UDP datagrams on Windows on ARM.
For the record, I am proposing to disable URO on Windows on ARM, at least for now, see #2071. While I can consistently reproduce the bug (see #2041 (comment)), I don't know how to fix the root cause itself. |
@nibanks ^^^ |
We'll take a look after the Thanksgiving break. Everyone's out right now. |
On Windows on ARM with Windows Subsystem for Linux (WSL) `WSA_RECVMSG` does not return the segment size of coalesced UDP datagrams. See quinn-rs#2041 for details. While lacking a fix for the root cause, don't enable URO, i.e. don't coalesce UDP datagrams on Windows on ARM.
On Windows on ARM with Windows Subsystem for Linux (WSL) `WSA_RECVMSG` does not return the segment size of coalesced UDP datagrams. See quinn-rs#2041 for details. While lacking a fix for the root cause, don't enable URO, i.e. don't coalesce UDP datagrams on Windows on ARM.
On Windows on ARM with Windows Subsystem for Linux (WSL) `WSA_RECVMSG` does not return the segment size of coalesced UDP datagrams. See #2041 for details. While lacking a fix for the root cause, don't enable URO, i.e. don't coalesce UDP datagrams on Windows on ARM.
I'd like to propose an additional (automated?) way to disable URO to workaround this. We got another telemetry alert where we received packets on a machine that are bigger than the MTU. I'd like to either have an API where I can disable URO in that case or have quinn-udp disable it for me automatically. Whilst this check might not detect it in every case as there might be UDP packets read in a batch where they are in total less than the MTU, chances are there will be at least a few packets that maxed out the MTU and are read in a single batch by the driver. In this particular bug report, there were 15 of those. Thoughts? |
We have multiple bug reports for URO on Windows, including non-ARM. See: - quinn-rs#2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
Thank you for the additional data @thomaseizinger. Very helpful!
The above heuristic, i.e. to disable URO on >1500 bytes, is in my eyes too fragile to apply across a broad spectrum of use-cases. E.g. in a web-context (latency critical) it might delay connection establishment till the first >1500 bytes packet train is received. Thus I suggest disabling URO by default and adding, as you suggest above, an API to optionally enable it. Implemented in #2092. What do folks think? |
Slightly OT but why do you think that? Does the initial handshake not fit into a UDP packet? Why would the initial handshake send multiple packets that are not filled to the MTU? There will still be a delay of course because the coalesced packets needs to be retransmitted. I would expect the actual handshake to succeed though. At least that was what we saw in the logs: Connection established fine and then "breaks" (once the first packet train arrives). |
We have multiple bug reports for URO on Windows, including non-ARM. See: - quinn-rs#2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
We have multiple bug reports for URO on Windows, including non-ARM. See: - quinn-rs#2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
We have multiple bug reports for URO on Windows, including non-ARM. See: - quinn-rs#2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
We have multiple bug reports for URO on Windows, including non-ARM. See: - quinn-rs#2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
We have multiple bug reports for URO on Windows, including non-ARM. See: - quinn-rs#2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
We have multiple bug reports for URO on Windows, including non-ARM. See: - #2041 - https://bugzilla.mozilla.org/show_bug.cgi?id=1916558 Instead of enabling GRO on Windows by default, this commit changes the default to off, adding a `set_gro` to optionally enable it.
This release disables URO/GRO on Windows entirely due to hardware / driver bugs. Related: quinn-rs/quinn#2041.
We are seeing an issue on a Windows machine where
quinn-udp
gives us packets larger than the configured MTU (2540 and 3810). Here is how we split the packets: https://github.com/firezone/firezone/blob/5437c3e2df001c8c491749c7bff0c063ccdf39e1/rust/socket-factory/src/lib.rs#L210-L255This code has been working on Linux for several months and we have also not had reports that it doesn't work on Windows. Recently however a customer filed a bug report and looking through the logs, I discovered that we receive packets (i.e.
DatagramIn
in the above snippet) that are much larger than expected.Going through
quinn-udp
's source code, I developed two theories and would be interested in hearing your thoughts on it:stride
is being read incorrectly (endianess-issue?) and instead of being smaller thanlen
, it is bigger thanlen
, meaning we never end up splitting the packets despite it being coalesced.quinn/quinn-udp/src/windows.rs
Line 241 in d23e4e4
UDP_COALESCED_INFO
message into it and can't tell us that the buffer is coalesced even though it is.Unfortunately, I've not been able to reproduce this on my own hardware :(
The text was updated successfully, but these errors were encountered: