Fix connection failure on iOS when connecting to a Tailscale IPv4 address from an IPv6-only mobile network such as T-Mobile #98

Open
wants to merge 1 commit into
base: master
from

Conversation

andygrundman

T-Mobile handles IPv4 addresses, including the special 100.x.x.x Tailscale addresses, using 464xlat and NAT64. We need to force resolve the address using AF_INET instead of AF_UNSPEC in order to get a usable address to connect to. More info: https://en.wikipedia.org/wiki/IPv6_transition_mechanism#464XLAT

I am unsure if true native IPv6 currently works properly and I'm also not sure if this patch would break it if so.

@cgutman
Member

cgutman commented Jan 5, 2025

> T-Mobile handles IPv4 addresses, including the special 100.x.x.x Tailscale addresses, using 464xlat and NAT64. We need to force resolve the address using AF_INET instead of AF_UNSPEC in order to get a usable address to connect to.

464XLAT works fine on T-Mobile with the current code. For IPv4-only hosts, it will either synthesize an IPv6 address and communicate via NAT64 or use 464XLAT to appear as an end-to-end IPv4 connection over IPv6. I've used a number of T-Mobile devices over the years to connect to IPv4-only and dual-stack hosts through their network without issue, both using hostnames and IPv4 literals.

This sounds more like a routing or address translation compatibility issue between 464XLAT and Tailscale than an application bug. Are you getting a synthesized NAT64 IPv6 address rather than a 100.x.x.x Tailscale address, and is that then being routed onto T-Mobile's network instead of through Tailscale? That seems like an obvious device bug, since the OS should not synthesize IPv6 addresses for on-link IPv4 prefixes.

> I am unsure if true native IPv6 currently works properly and I'm also not sure if this patch would break it if so.

IPv6 does work for both dual-stack and IPv6-only/NAT64 networks, and this will break it.

Even if we merely preferred IPv4 over IPv6 when both are available, that would degrade performance on devices and networks with correctly working NAT64+464XLAT configurations (by forcing a trip through 464XLAT for an otherwise fully IPv6-capable application) in order to fix a niche scenario on a buggy one. We could explicitly prioritize IPv4 when given an address in 100.64.0.0/10, but I'd need to think about that more to determine whether it's actually the correct thing to do for conventional Carrier-Grade NATs. I suspect it may not be.
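
To make the 100.64.0.0/10 idea concrete, here is a minimal sketch of the range check such a special case would need. `isCgnatAddress` is a hypothetical helper name, not something in the existing code, and whether acting on this check is correct for conventional Carrier-Grade NATs is exactly the open question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

// Hypothetical helper (not from the existing code): returns true if the
// IPv4 address lies in 100.64.0.0/10, the shared address space that both
// carrier-grade NATs and Tailscale draw addresses from.
static bool isCgnatAddress(const struct in_addr* addr) {
    assert(addr != NULL);

    // s_addr is stored in network byte order; convert before masking.
    uint32_t host = ntohl(addr->s_addr);

    // A /10 prefix keeps the top 10 bits: mask 0xFFC00000.
    // 100.64.0.0 is 0x64400000.
    return (host & 0xFFC00000u) == 0x64400000u;
}
```

A caller would parse or resolve the target first (for example with `inet_pton(AF_INET, ...)`) and only consider an IPv4-first policy when this returns true.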

@andygrundman
Author

I neglected to save all my logs from debugging this. I will obtain some more data and we can come up with a better fix.

@andygrundman
Author

Here's what happens when the system only has IPv6 interfaces. RemoteAddr will resolve to this:

Resolved 100.87.206.80 to 2607:7700:0:51::6457:ce50 0

This is the 464XLAT address for the Tailscale CGNAT IPv4 address (the low 32 bits, 6457:ce50, are 100.87.206.80 in hex). It is able to make some v4 HTTP requests to do pairing and to display the host's list of apps. But it dies during RTSP: even though it wants to use rtspenc://100.87.206.80:48010, it will try to connect via connectTcpSocket(&RemoteAddr, ....

I'm thinking that a better fix is to write a bool is464XLATAddress(struct sockaddr_storage* address); and when RemoteAddr is AF_INET6 && is464XLAT(), do the conversion back to IPv4 like my previous patch. Does that sound ok to you?
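
For reference, a minimal sketch of the "conversion back to IPv4" step, assuming the network uses a /96 NAT64 prefix so that the IPv4 address occupies the last four bytes of the synthesized IPv6 address (the RFC 6052 layout for /96 prefixes). `extractEmbeddedIpv4` is a hypothetical name used for illustration, and other prefix lengths would place the bytes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

// Hypothetical helper: copies the last 32 bits of a NAT64-synthesized IPv6
// address into an IPv4 address. Assumes a /96 prefix; this does NOT check
// that the input really is a synthesized address.
static void extractEmbeddedIpv4(const struct in6_addr* v6, struct in_addr* v4) {
    assert(v6 != NULL && v4 != NULL);

    // Both structures hold network byte order, so a plain copy suffices.
    memcpy(&v4->s_addr, &v6->s6_addr[12], sizeof(v4->s_addr));
}
```

Applied to the address in the log line above, the extracted bytes 0x64 0x57 0xce 0x50 correspond to 100.87.206.80.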

@andygrundman
Author

Here's a shot of a packet capture. Hopefully this makes it a bit more obvious that the real problem here is just the local routing on the phone. When it uses the v4 address it is able to route to Tailscale correctly, but the v6 just goes nowhere.
[Image: tailscale-464xlat]

@andygrundman andygrundman marked this pull request as draft January 7, 2025 05:13
@cgutman
Member

cgutman commented Jan 8, 2025

> Resolved 100.87.206.80 to 2607:7700:0:51::6457:ce50 0

Aha yep, 2607:7700::/32 is T-Mobile's address space, so it is indeed blindly converting on-link IPv4 to synthesized IPv6 addresses that the NAT64 gateway can't route anywhere (because 100.64.0.0/10 must not be routed outside a service provider). What device and OS is this?

My guess is that the only reason web requests work is that the HTTP client is using Happy Eyeballs and establishing both IPv4 and IPv6 connections. Without that, it would be very obvious to the device vendor that this scenario was badly broken.

What's so confusing to me is that we explicitly use AF_UNSPEC with getaddrinfo(), which means we should get both IPv4 and IPv6. We even go through all the work to establish TCP connections to ensure the resolved address is actually working (if we get more than one address). Since the IPv6 connection would always go into a black hole and we'd use the other IPv4 address instead in that case, that must mean that we're only getting an IPv6 address back when we attempt to "resolve" an IPv4 address. That also seems crazy.

I'd be curious to see if you remove AI_ADDRCONFIG in resolveHostName() whether that causes getaddrinfo() to also return an IPv4 address and then everything works properly. There have been historical bugs in DNS clients where they would incorrectly prune address families thinking that they were unrouteable.
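
A rough sketch of that experiment, written as a standalone helper rather than a change to resolveHostName() itself (whose exact contents are not shown in this thread). `countResolvedFamilies` is a hypothetical name; it resolves a host with AF_UNSPEC, optionally with AI_ADDRCONFIG, and reports how many IPv4 and IPv6 results come back so the two runs can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

// Hypothetical diagnostic helper: resolves `host` with AF_UNSPEC and counts
// the address families returned. Returns 0 on success or the getaddrinfo()
// error code on failure.
static int countResolvedFamilies(const char* host, bool useAddrConfig,
                                 int* v4Count, int* v6Count) {
    struct addrinfo hints;
    struct addrinfo* results = NULL;

    assert(host != NULL && v4Count != NULL && v6Count != NULL);
    *v4Count = 0;
    *v6Count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // ask for both IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;  // one entry per address, not one per socket type
    hints.ai_flags = useAddrConfig ? AI_ADDRCONFIG : 0;

    int err = getaddrinfo(host, NULL, &hints, &results);
    if (err != 0) {
        return err;
    }

    for (struct addrinfo* p = results; p != NULL; p = p->ai_next) {
        if (p->ai_family == AF_INET) {
            (*v4Count)++;
        } else if (p->ai_family == AF_INET6) {
            (*v6Count)++;
        }
    }

    freeaddrinfo(results);
    return 0;
}
```

Calling it twice on the affected device, once with `useAddrConfig` set and once without, would show whether the flag is what prunes the IPv4 result.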

> I'm thinking that a better fix is to write a bool is464XLATAddress(struct sockaddr_storage* address); and when RemoteAddr is AF_INET6 && is464XLAT(), do the conversion back to IPv4 like my previous patch. Does that sound ok to you?

In a 464XLAT environment, you'd rather not go through 464XLAT. Well-behaved IPv6-supporting applications should never need to interact with 464XLAT since they will use getaddrinfo(), get the synthesized IPv6 address, and communicate directly with the NAT64 gateway over IPv6 to talk to IPv4 hosts.

…cale IPv4 address from an IPv6-only mobile network such as T-Mobile.

T-Mobile handles IPv4 addresses, including the special 100.x.x.x Tailscale addresses, using 464xlat and NAT64. This patch detects the 464XLAT situation and forces the address into AF_INET IPv4 mode, allowing the connection to reach Tailscale.
@andygrundman andygrundman force-pushed the andyg.fix-ios-tmobile-464xlat branch from bc67221 to cd0e47e Compare January 8, 2025 07:32
@andygrundman
Author

andygrundman commented Jan 8, 2025

That was a good idea but it doesn't work, probably because AF_UNSPEC is just AF_INET6 when there is no v4 interface. I did manage to hack together a working patch that should only run in this very specific case. I feel like there's gotta be a way to do this with less code, but that's BSD sockets for you...

The output from the patch looks like this now, err, well it did before I removed some extra debug. This would be a perfect thing to write a unit test for, I will look into that later.

Resolved 100.87.206.80 to 2607:7700:0:51::6457:ce50 0
ipv4only.arpa AAAA returned a synthesized address 2607:7700:0:51::c000:ab
ipv4 is a match for encoded IP 6457ce50
IPv4 address was resolved to synthesized IPv6 address 2607:7700:0:51::6457:ce50, this network might be using 464XLAT
Resolved 100.87.206.80 to 100.87.206.80 0
IPv4 address was restored via AF_INET: 100.87.206.80
Initializing audio stream...
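
The detection logic implied by that output can be sketched roughly as follows. This is not the patch's actual code: the function names are hypothetical, the DNS lookup of ipv4only.arpa is left to the caller, and a /96 NAT64 prefix is assumed. The idea (from RFC 7050) is that ipv4only.arpa only has the well-known IPv4 addresses 192.0.0.170 and 192.0.0.171, so any AAAA answer for it must have been synthesized, and its leading bytes reveal the NAT64 prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

// Hypothetical helper: true if the last 4 bytes of the AAAA answer for
// ipv4only.arpa are one of the well-known addresses 192.0.0.170/171.
static bool endsWithWellKnownIpv4(const struct in6_addr* answer) {
    static const uint8_t wka170[4] = { 192, 0, 0, 170 };
    static const uint8_t wka171[4] = { 192, 0, 0, 171 };

    assert(answer != NULL);
    return memcmp(&answer->s6_addr[12], wka170, 4) == 0 ||
           memcmp(&answer->s6_addr[12], wka171, 4) == 0;
}

// Hypothetical helper: true if `candidate` shares the first 96 bits with a
// synthesized ipv4only.arpa answer, i.e. it looks like it was synthesized
// under the same NAT64 prefix. Assumes a /96 prefix.
static bool isSynthesizedUnderSamePrefix(const struct in6_addr* ipv4OnlyAnswer,
                                         const struct in6_addr* candidate) {
    assert(ipv4OnlyAnswer != NULL && candidate != NULL);

    if (!endsWithWellKnownIpv4(ipv4OnlyAnswer)) {
        // The answer was not synthesized from the well-known addresses,
        // so no prefix can be derived from it.
        return false;
    }
    return memcmp(ipv4OnlyAnswer->s6_addr, candidate->s6_addr, 12) == 0;
}
```

With the addresses from the log, the ipv4only.arpa answer ends in c000:ab (192.0.0.171) and shares its first 96 bits with the resolved address, so the check would report a match.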

@andygrundman andygrundman marked this pull request as ready for review January 8, 2025 07:35