Refactor aro-dnsmasq-pre.sh to not overwrite /etc/resolv.conf #4100

Open. @ventifus wants to merge 1 commit into base: master

ventifus
Collaborator

@ventifus ventifus commented Feb 12, 2025

Which issue this PR addresses:

Fixes ARO-15180

What this PR does / why we need it:

We've been overwriting /etc/resolv.conf. NetworkManager owns this file, and whenever NetworkManager refreshes it we lose our changes. Instead, create a NetworkManager drop-in, /etc/NetworkManager/conf.d/dns-servers.conf, containing the node's IP.
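As a rough illustration of the approach described above, the drop-in could be written along these lines. This is a minimal sketch, not the script from this PR: the function name `write_dns_dropin`, its arguments, and the write-then-rename step are assumptions made for the example. The file contents mirror the `dns-servers.conf` shown later in this conversation.

```shell
#!/bin/bash
# Sketch only: write a NetworkManager drop-in instead of editing /etc/resolv.conf.
# write_dns_dropin and its parameters are hypothetical names, not taken from the PR.
write_dns_dropin() {
    local node_ip="$1"
    local dropin="${2:-/etc/NetworkManager/conf.d/dns-servers.conf}"

    # Write to a temporary file next to the target, then rename it into place,
    # so a partially written drop-in is never visible under the final name.
    local tmp
    tmp="$(mktemp "${dropin}.XXXXXX")" || return 1
    printf '%s\n' \
        '# Added by dnsmasq.service' \
        '[global-dns-domain-*]' \
        "servers=${node_ip}" > "$tmp"
    chmod 0644 "$tmp"
    mv -f "$tmp" "$dropin"
}
```

For example, `write_dns_dropin 10.0.0.7` would produce a three-line file at the default path. How the node's IP is discovered is left out of the sketch.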

Test plan for issue:

Is there any documentation that needs to be updated for this PR?

No, but the change needs to be socialized amongst ARO SRE since it affects how nameservers are managed.

How do you know this will function as expected in production?

Testing is required.

@ventifus
Collaborator Author

/azp run ci,e2e


Azure Pipelines successfully started running 2 pipeline(s).

Contributor

@kimorris27 kimorris27 left a comment

What testing do you think we should do before this merges?

@tsatam
Collaborator

tsatam commented Feb 13, 2025

Do we need to make a corresponding change to what the installer puts down? My understanding is that for new clusters, the changes in our operator won't get applied until the cluster is first upgraded, and the cluster will run with what the installer has until then.

@hawkowl
Collaborator

hawkowl commented Feb 14, 2025

@tsatam When the Operator is first installed, it is set to allow all reconciliations; at the end of the install process that is switched to reconcile only on upgrades. So this will apply to new clusters, at the cost of a reboot plus extra install time, which is why we should also update the installer wrapper.

@ventifus ventifus force-pushed the ventifus/ARO-15180-networkmanager-dns-servers branch from 830431a to 2c2ca5c Compare February 15, 2025 00:14
@ventifus
Collaborator Author

ventifus commented Feb 15, 2025

I've tested this now in a UDR + misconfigured DNS cluster (vnet dns = 172.16.0.0). I set aro.dnsmasq.enabled: "false" and manually edited 99-master-aro-dns and 99-worker-aro-dns to have the new content.

After all nodes rolled out, they have the following config, which looks good:

sh-5.1# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 10.0.0.7
sh-5.1# cat /etc/resolv.conf.dnsmasq
# Generated by NetworkManager
search reddog.microsoft.com
nameserver 172.16.0.0
sh-5.1# cat /etc/NetworkManager/conf.d/dns-servers.conf
# Added by dnsmasq.service
[global-dns-domain-*]
servers=10.0.0.7

I made sure all the cluster operators were healthy, and worker machinesets can scale up.

N.B. even with this change we still end up touching /etc/resolv.conf with dnsmasq.service's

ExecStopPost=/bin/bash -c '/bin/mv /etc/resolv.conf.dnsmasq /etc/resolv.conf; /usr/sbin/restorecon /etc/resolv.conf'

I'll fix that up to delete dns-servers.conf instead.
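A minimal sketch of what that cleanup might look like, assuming the stop hook only needs to remove the drop-in. The helper name `remove_dns_dropin` is hypothetical, and whether NetworkManager also needs a reload to notice the removal is not addressed here.

```shell
#!/bin/bash
# Sketch only: remove the drop-in on service stop instead of moving
# /etc/resolv.conf.dnsmasq back over /etc/resolv.conf.
# remove_dns_dropin is a hypothetical helper name, not taken from the PR.
remove_dns_dropin() {
    local dropin="${1:-/etc/NetworkManager/conf.d/dns-servers.conf}"

    # rm -f succeeds even if the file is already gone, so repeated stops are harmless.
    rm -f -- "$dropin"
}
```

With no argument, the helper targets /etc/NetworkManager/conf.d/dns-servers.conf and leaves /etc/resolv.conf untouched.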

4 participants