dyns DNS update seems slow #6540

Open
joakim-tjernlund opened this issue Jan 25, 2023 · 58 comments

@joakim-tjernlund
Contributor

When connecting/booting a new computer, it may take several minutes for its DNS records to appear.
What minimal log configuration could I use to see the nsupdate calls in sssd?

@sumit-bose
Contributor

Hi,

with debug_level = 6 in the [domain/...] section of sssd.conf you should see the commands sent to nsupdate in the logs.

bye,
Sumit

@joakim-tjernlund
Contributor Author

Here is a partial log with debug_level = 0x0fff:

May 25 19:40:13 se-jocke8-lx.infinera.com systemd[1]: Listening on sssd-kcm.socket.
May 25 19:40:13 se-jocke8-lx.infinera.com systemd[1]: Starting sssd.service...
May 25 19:40:13 se-jocke8-lx.infinera.com sssd[674]: Starting up
May 25 19:40:13 se-jocke8-lx.infinera.com sssd_be[764]: Starting up
May 25 19:40:13 se-jocke8-lx.infinera.com sssd_nss[774]: Starting up
May 25 19:40:13 se-jocke8-lx.infinera.com sssd_pam[775]: Starting up
May 25 19:40:13 se-jocke8-lx.infinera.com systemd[1]: Started sssd.service.
May 25 19:40:14 se-jocke8-lx.infinera.com sssd_be[764]: Backend is offline
May 25 19:40:17 se-jocke8-lx.infinera.com sssd_nss[774]: Enumeration requested but not enabled
May 25 19:40:20 se-jocke8-lx.infinera.com sssd_be[764]: Backend is online
May 25 19:42:14 se-jocke8-lx.infinera.com sssd[2513]: Outgoing update query:
May 25 19:42:14 se-jocke8-lx.infinera.com sssd[2513]: ;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id:  53952
May 25 19:42:14 se-jocke8-lx.infinera.com sssd[2513]: ;; flags:; ZONE: 1, PREREQ: 0, UPDATE: 2, ADDITIONAL: 0
May 25 19:42:14 se-jocke8-lx.infinera.com sssd[2513]: ;; UPDATE SECTION:

Here one can see that it took about 2 minutes after the backend came online before the DNS update was checked/done.

@joakim-tjernlund
Contributor Author

@alexey-tikhonov, could you reopen?

@joakim-tjernlund
Contributor Author

Thanks for reopening.

Another case is pulling the Ethernet cable and connecting to WLAN.
Then it takes almost 16 minutes before I see sssd register the new DNS IP address for my laptop.
There is no sssd output in between.

@joakim-tjernlund
Contributor Author

I added debug_level = 0xffff to [sssd] as well and tried setting dyndns_iface = cscotun0, vpn0, wlan0, eth0,
but there are no log entries in sssd when I pull/connect the eth/wlan interfaces.

Does sssd depend on some kernel config to react to I/F changes? My kernel is custom-built.

@joakim-tjernlund
Contributor Author

Or perhaps it depends on the systemd build config? On Gentoo I have these USE flags:

acl cgroup-hybrid cryptsetup dns-over-tls elfutils gcrypt idn kmod lz4 openssl pam pcre policykit resolvconf seccomp split-usr sysv-utils xkb zstd -apparmor -audit -curl -fido2 -gnuefi -gnutls -homed -http -importd -iptables -lzma -pkcs11 -pwquality -qrcode -selinux -test -tpm -vanilla

@alexey-tikhonov
Member

> Does sssd depend on some kernel config to react on I/F changes? My kernel is custom built

By default, SSSD sets a watch using libnl (netlink), unless this is disabled in sssd.conf (see disable_netlink in man sssd.conf).
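For readers unfamiliar with the netlink side, here is a minimal, dependency-free sketch of such a watch. It uses a raw NETLINK_ROUTE socket rather than libnl, which is what SSSD actually uses. The helper names open_rtnl_watch() and rtm_type_name() are hypothetical. The decoder also explains the numeric types that appear in the logs later in this thread: "netlink Message type: 16" and "24" are RTM_NEWLINK and RTM_NEWROUTE in <linux/rtnetlink.h>.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Map an rtnetlink message type to its name from <linux/rtnetlink.h>. */
const char *rtm_type_name(unsigned short type)
{
    switch (type) {
    case RTM_NEWLINK:  return "RTM_NEWLINK";   /* 16 */
    case RTM_DELLINK:  return "RTM_DELLINK";   /* 17 */
    case RTM_NEWADDR:  return "RTM_NEWADDR";   /* 20 */
    case RTM_DELADDR:  return "RTM_DELADDR";   /* 21 */
    case RTM_NEWROUTE: return "RTM_NEWROUTE";  /* 24 */
    case RTM_DELROUTE: return "RTM_DELROUTE";  /* 25 */
    default:           return "OTHER";
    }
}

/* Open a NETLINK_ROUTE socket subscribed to link, address and IPv4
 * route change notifications. Returns the descriptor, or -1 on error. */
int open_rtnl_watch(void)
{
    struct sockaddr_nl sa;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR |
                   RTMGRP_IPV6_IFADDR | RTMGRP_IPV4_ROUTE;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would recv() on the returned descriptor, walk the buffer with NLMSG_OK()/NLMSG_NEXT(), and pass each nlmsg_type to rtm_type_name().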

@alexey-tikhonov
Member

alexey-tikhonov commented Jun 1, 2023

> but there is no log entries in sssd

Note that currently the netlink watch is set by the main sssd process ("monitor"), so you should check sssd.log, not the domain log. This is subject to change in upcoming versions.

@joakim-tjernlund
Contributor Author

> Does sssd depend on some kernel config to react on I/F changes? My kernel is custom built
>
> By default SSSD sets a watch using libnl (netlink) - if not disabled in sssd.conf (see man sssd.conf :: disable_netlink)

I have netlink. I did notice more log entries in /var/log/sssd, though (I figured all logging had moved to journalctl?).

I can see that sssd notices that interfaces come and go, but that does not trigger nsupdate.
nsupdate seems to run on a timer only.
There is a note in the man pages that a backend offline/online transition should trigger nsupdate, but for
me the backend is always online, even when both the eth and wlan interfaces are down.

I believe dyndns needs to be triggered when an interface goes up/down.

@alexey-tikhonov
Member

> I can see that sssd notices that I/Fs comes and goes but that does not trigger nsupdate.

static void network_status_change_cb(void *cb_data)
->
check_if_online(be_ctx, 1);

So it looks like it only has an effect if sssd_be is currently offline.
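A heavily simplified model makes the consequence of this behaviour explicit. When the backend is already online, a netlink event never reaches the reconnect path, so it never reaches the dyndns update either. The struct and function names are hypothetical and are not SSSD's real data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified backend state (not SSSD's real struct). */
struct be_state {
    bool offline;          /* is the backend currently marked offline? */
    int online_checks_run; /* how many reconnect attempts were started */
};

/* Model of check_if_online(): with the backend already online there is
 * nothing to do, so no reconnect and no online callbacks happen. */
void model_check_if_online(struct be_state *be)
{
    if (!be->offline)
        return;               /* "Backend is already online, nothing to do." */
    be->online_checks_run++;  /* would reconnect; on success the online
                                 callbacks (including dyndns) would run */
}

/* Model of network_status_change_cb(): it only forwards to the check. */
void model_network_status_change_cb(struct be_state *be)
{
    model_check_if_online(be);
}
```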

@aplopez, this is probably something to take into consideration while moving the netlink watch from 'monitor' to 'sssd_be'.

@joakim-tjernlund
Contributor Author

> I can see that sssd notices that I/Fs comes and goes but that does not trigger nsupdate.
>
> static void network_status_change_cb(void *cb_data)
> ->
> check_if_online(be_ctx, 1);
>
> So it looks it only has any effect if sssd_be is currently offline.
>
> @aplopez, this is probably something to take into consideration while moving netlink watch from 'monitor' to 'sssd_be'.

What would the minimum fix be to make nsupdate work for any I/F change?

@alexey-tikhonov
Member

> What would the minmum fix be to make nsupdate work for any I/F change?

At a quick glance, it looks like not all of the code in this area is generalized between the 'ad' and 'ipa' providers.

For the 'ad' provider, network_status_change_cb() -> data_provider_reset_offline() should trigger ad_dyndns_update_send()/ad_dyndns_update_recv(). But a proper solution should be generic for both 'ad' and 'ipa'.
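One possible shape for such a generic hook is sketched below. Each provider registers its own update function, and the provider-independent network-change path runs whatever was registered. All names here (net_change_hooks, hooks_register(), hooks_run(), count_update()) are hypothetical and do not exist in SSSD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider callback type, e.g. an 'ad' or 'ipa' update. */
typedef void (*dyndns_update_fn)(void *pvt);

#define MAX_HOOKS 4

struct net_change_hooks {
    dyndns_update_fn fn[MAX_HOOKS];
    void *pvt[MAX_HOOKS];
    size_t count;
};

/* Register a provider's update function. Returns 0, or -1 if full. */
int hooks_register(struct net_change_hooks *h, dyndns_update_fn fn, void *pvt)
{
    if (h->count >= MAX_HOOKS)
        return -1;
    h->fn[h->count] = fn;
    h->pvt[h->count] = pvt;
    h->count++;
    return 0;
}

/* Called from the generic (provider-independent) network-change path. */
void hooks_run(const struct net_change_hooks *h)
{
    for (size_t i = 0; i < h->count; i++)
        h->fn[i](h->pvt[i]);
}

/* Example callback: counts how often it was triggered. */
void count_update(void *pvt)
{
    (*(int *)pvt)++;
}
```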

To mitigate the problem to some extent, you can decrease dyndns_refresh_interval for a while.

@joakim-tjernlund
Contributor Author

I remembered an observation I made long ago w.r.t. RDNS updates: whenever one gets a new IP address, the RDNS update
uses this new IP address when recreating the RDNS entry. This leaves the old IP's RDNS entry in DNS, and no one can
delete it.
I believe sssd needs to look up the IP currently in DNS before updating DNS with the new IP, so that the RDNS update can delete the old
entry (old IP address) before recreating the RDNS entry.
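Here is a minimal sketch of the lookup-and-compare step proposed above. It assumes IPv4 only and uses getaddrinfo() for brevity. SSSD itself resolves through c-ares (see the "c-ares channel" lines in a later log). getaddrinfo() goes through NSS and may be answered from a cache rather than the DNS server, so a real fix would more likely query DNS directly. current_ipv4() and ptr_needs_cleanup() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve 'host' to its first IPv4 address and write it as text into
 * 'out'. Returns false if resolution or formatting fails.
 * NOTE: getaddrinfo() goes through NSS and may be answered from a
 * cache; it only stands in for a direct DNS query here. */
bool current_ipv4(const char *host, char *out, socklen_t outlen)
{
    struct addrinfo hints, *res = NULL;
    const struct sockaddr_in *sin;
    bool ok;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* A records only, for brevity */
    hints.ai_socktype = SOCK_STREAM; /* avoid duplicates per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return false;

    sin = (const struct sockaddr_in *)res->ai_addr;
    ok = inet_ntop(AF_INET, &sin->sin_addr, out, outlen) != NULL;
    freeaddrinfo(res);
    return ok;
}

/* True if the address published in DNS differs from the new local
 * address, i.e. the PTR record of the published address is stale and
 * must be deleted explicitly. Unparsable input -> false (do nothing). */
bool ptr_needs_cleanup(const char *published, const char *local)
{
    struct in_addr a, b;

    if (inet_pton(AF_INET, published, &a) != 1 ||
        inet_pton(AF_INET, local, &b) != 1)
        return false;
    return a.s_addr != b.s_addr;
}
```

For example, ptr_needs_cleanup("192.168.1.20", "10.210.72.56") returns true, where 192.168.1.20 is a made-up previous LAN address. In that case the update would also have to delete the PTR record of the published address, not just the one for the new address.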

@joakim-tjernlund
Contributor Author

> I can see that sssd notices that I/Fs comes and goes but that does not trigger nsupdate.
>
> static void network_status_change_cb(void *cb_data)
> ->
> check_if_online(be_ctx, 1);
>
> So it looks it only has any effect if sssd_be is currently offline.
>
> @aplopez, this is probably something to take into consideration while moving netlink watch from 'monitor' to 'sssd_be'.

Is the "move from monitor to sssd_be" in the near future?

@andreboscatto
Contributor

At this point we are going over the entire backlog (~350 open issues), as you probably noticed (I would like to thank you for your patience, for being an active member and for replying to old queries!), triaging and prioritizing them accordingly.

The work you mentioned has been triaged and is being considered, but again, we don't know the full picture yet, so it is hard to tell.

Please bear in mind that contributions are more than welcome, and we are happy to assist :)

@joakim-tjernlund
Contributor Author

> At this point we are going over the entire backlog (~350 opened issues), as you probably noticed (I would like to thank you for your patience, being an active member and replying old queries!), triaging and prioritizing them accordingly.
>
> This work you mentioned is triaged and being considered, but again, we don't know the full picture yet, so it is hard to tell.
>
> Please bear in mind that contributions are more than welcome, and we are happy to assist :)

It would be nice if this issue could get some priority; it causes confusion here when DNS is not updated in a timely fashion.

@joakim-tjernlund
Contributor Author

Does this have a priority/planned status now?

@joakim-tjernlund
Contributor Author

@aplopez , is this issue on your TODO list?

@aplopez
Contributor

aplopez commented Mar 14, 2024

> @aplopez , is this issue on your TODO list?

@joakim-tjernlund Yes, it is on my radar and it is confirmed work, but we don't have a firm schedule for this yet. Contributions are welcome.

@joakim-tjernlund
Contributor Author

Thanks

@aplopez
Contributor

aplopez commented May 13, 2024

@joakim-tjernlund Are you still facing this issue? I see 2.9.0 (published in May 2023) incorporated several fixes that could be related.

@joakim-tjernlund
Contributor Author

> @joakim-tjernlund Are you still facing this issue? I see 2.9.0 (published in May 2023) incorporated several fixes that could be related.

Yes I still see this.

@joakim-tjernlund
Contributor Author

Question: Should the recent "[BACKENDS: Move the netlink watching to the backends]" have any impact on this issue?

@aplopez
Contributor

aplopez commented May 14, 2024

> Question: Should the recent "[BACKENDS: Move the netlink watching to the backends]" have any impact on this issue?

As I don't know yet what causes this problem, I cannot say if that commit will have any impact. My guess is that it won't, unless the problem you see is exactly with the D-Bus communications between the monitor and the backend.

@alexey-tikhonov
Member

> As I don't know yet what causes this problem

IIUC, the issue is "by design": see #6540 (comment)

We just need to force nsupdate even if backend is currently online.

@aplopez
Contributor

aplopez commented Sep 2, 2024

@joakim-tjernlund Could you please provide the SSSD logs with debug_level = 9 from a startup where your problem happens? I don't mean the syslog messages as above, but the logs in /var/log/sssd/sssd.log.

@joakim-tjernlund
Contributor Author

It's been a while now; is there anything I can test?

@joakim-tjernlund
Contributor Author

sssd_infinera.com.log

By luck I got a log that may be useful. One extra thing to note:

(2024-11-19 16:32:09): [be[infinera.com]] [be_nsupdate_create_fwd_msg] (0x0400): [RID#27]  -- Begin nsupdate message -- 

update delete se-jocke9-lx.infinera.com. in A
update add se-jocke9-lx.infinera.com. 3600 in A 10.210.72.56
send
update delete se-jocke9-lx.infinera.com. in AAAA
update add se-jocke9-lx.infinera.com. 3600 in AAAA 2a02:1420:1:72:765d:22ff:feba:7634
send
 -- End nsupdate message -- 
(2024-11-19 16:32:09): [be[infinera.com]] [child_handler_setup] (0x2000): [RID#27] Setting up signal handler up for pid [3913]
(2024-11-19 16:32:09): [be[infinera.com]] [child_handler_setup] (0x2000): [RID#27] Signal handler set up for pid [3913]
(2024-11-19 16:32:09): [be[infinera.com]] [_write_pipe_handler] (0x0400): [RID#27] All data has been sent!
(2024-11-19 16:32:09): [be[infinera.com]] [nsupdate_child_stdin_done] (0x1000): [RID#27] Sending nsupdate data complete
(2024-11-19 16:32:09): [be[infinera.com]] [child_sig_handler] (0x1000): [RID#27] Waiting for child [3913].
(2024-11-19 16:32:09): [be[infinera.com]] [child_sig_handler] (0x0100): [RID#27] child [3913] finished successfully.
(2024-11-19 16:32:09): [be[infinera.com]] [be_nsupdate_done] (0x0200): [RID#27] nsupdate child status: 0
(2024-11-19 16:32:09): [be[infinera.com]] [nsupdate_msg_create_common] (0x0200): [RID#27] Creating update message for auto-discovered realm.
(2024-11-19 16:32:09): [be[infinera.com]] [be_nsupdate_create_ptr_msg] (0x0400): [RID#27]  -- Begin nsupdate message -- 

update delete 56.72.210.10.in-addr.arpa. in PTR
update add 56.72.210.10.in-addr.arpa. 3600 in PTR se-jocke9-lx.infinera.com.
send
update delete 4.3.6.7.a.b.e.f.f.f.2.2.d.5.6.7.2.7.0.0.1.0.0.0.0.2.4.1.2.0.a.2.ip6.arpa. in PTR
update add 4.3.6.7.a.b.e.f.f.f.2.2.d.5.6.7.2.7.0.0.1.0.0.0.0.2.4.1.2.0.a.2.ip6.arpa. 3600 in PTR se-jocke9-lx.infinera.com.
send
 -- End nsupdate message -- 

Here sssd deletes the existing PTR records before adding the new ones, but only for the reverse names of the current addresses. This does not work when the IP address has changed since the last update (e.g. undocking a laptop, which then moves from LAN to WiFi), so sssd
leaves stale PTR records for the old address behind, which no one but an admin can delete.
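The sketch below illustrates the gap. It builds reverse names in the same form as the log above; for 10.210.72.56 it yields 56.72.210.10.in-addr.arpa. When the previously published address is known and differs, it first emits a delete for the stale PTR record. reverse_name() and ptr_update_msg() are hypothetical helpers, and the TTL 3600 is copied from the log. Comparing the two addresses as strings is a simplification: two spellings of the same IPv6 address would compare unequal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build the reverse-lookup (PTR) name of an IPv4 or IPv6 address.
 * Returns false for an invalid address or a too-small buffer. */
bool reverse_name(const char *addr, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    static const char v6_suffix[] = "ip6.arpa.";
    unsigned char b[16];
    size_t pos = 0;

    if (inet_pton(AF_INET, addr, b) == 1) {
        /* a.b.c.d -> d.c.b.a.in-addr.arpa. */
        int n = snprintf(out, outlen, "%d.%d.%d.%d.in-addr.arpa.",
                         b[3], b[2], b[1], b[0]);
        return n > 0 && (size_t)n < outlen;
    }
    if (inet_pton(AF_INET6, addr, b) != 1)
        return false;

    /* 32 nibbles, least significant first, 2 chars each, plus suffix. */
    if (outlen < 32 * 2 + sizeof(v6_suffix))
        return false;
    for (int i = 15; i >= 0; i--) {
        out[pos++] = hex[b[i] & 0x0f];
        out[pos++] = '.';
        out[pos++] = hex[b[i] >> 4];
        out[pos++] = '.';
    }
    memcpy(out + pos, v6_suffix, sizeof(v6_suffix)); /* includes NUL */
    return true;
}

/* Build nsupdate commands that replace the PTR record of 'new_addr'
 * and, if the previously published 'old_addr' (may be NULL) differs,
 * first delete the stale PTR record of the old address. */
bool ptr_update_msg(const char *fqdn, const char *old_addr,
                    const char *new_addr, char *out, size_t outlen)
{
    char rev_new[80], rev_old[80];
    size_t used = 0;
    int n;

    if (outlen == 0 || !reverse_name(new_addr, rev_new, sizeof(rev_new)))
        return false;
    out[0] = '\0';

    if (old_addr != NULL && strcmp(old_addr, new_addr) != 0) {
        if (!reverse_name(old_addr, rev_old, sizeof(rev_old)))
            return false;
        /* separate "send": the old name may live in another reverse zone */
        n = snprintf(out, outlen, "update delete %s in PTR\nsend\n", rev_old);
        if (n < 0 || (size_t)n >= outlen)
            return false;
        used = (size_t)n;
    }

    n = snprintf(out + used, outlen - used,
                 "update delete %s in PTR\n"
                 "update add %s 3600 in PTR %s.\n"
                 "send\n",
                 rev_new, rev_new, fqdn);
    return n > 0 && (size_t)n < outlen - used;
}
```

With old_addr set to NULL, the output matches the IPv4 part of the message logged above. With a differing old address, the stale delete comes first, in its own update.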

@joakim-tjernlund
Contributor Author

ping?

@alexey-tikhonov
Member

> sssd_infinera.com.log

Just a side note:

(2024-09-24 13:01:47): [be[infinera.com]] [server_loop] (0x3f7c0): Entering main loop under uid=0 (euid=0) : gid=0 (egid=0) with SECBIT_KEEP_CAPS = 0 and following capabilities:
                   CAP_CHOWN: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
            CAP_DAC_OVERRIDE: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
         CAP_DAC_READ_SEARCH: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
                  CAP_SETGID: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
                  CAP_SETUID: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*

The above is wrong: 'sssd_be' shouldn't have any capabilities.
This is not relevant to this ticket, of course.
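The same thing can be checked for a running sssd_be from the outside. The kernel exposes each process's capability sets as hex masks in /proc/<pid>/status (CapInh, CapPrm, CapEff, CapBnd, CapAmb; see proc(5)). Below is a small sketch of decoding such a line; the helper names are hypothetical. The five capabilities listed in the log above (CAP_CHOWN=0, CAP_DAC_OVERRIDE=1, CAP_DAC_READ_SEARCH=2, CAP_SETGID=6, CAP_SETUID=7) correspond to the mask 0xc7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <linux/capability.h>

/* Parse a "<field>:\t<hex mask>" line from /proc/<pid>/status,
 * e.g. "CapEff:\t00000000000000c7". Returns false if the line is for
 * another field or carries no number. */
bool parse_cap_line(const char *line, const char *field,
                    unsigned long long *mask)
{
    size_t len = strlen(field);
    const char *start;
    char *end;

    if (strncmp(line, field, len) != 0 || line[len] != ':')
        return false;
    start = line + len + 1;
    *mask = strtoull(start, &end, 16);   /* skips the leading tab */
    return end != start;
}

/* Read a capability mask (e.g. "CapEff", "CapBnd") of process 'pid'. */
bool read_cap_mask(int pid, const char *field, unsigned long long *mask)
{
    char path[64], line[256];
    bool found = false;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return false;
    while (!found && fgets(line, sizeof(line), f) != NULL)
        found = parse_cap_line(line, field, mask);
    fclose(f);
    return found;
}

/* Capability numbers (CAP_*) are bit positions in the mask. */
bool has_cap(unsigned long long mask, int cap)
{
    return (mask >> cap) & 1ULL;
}
```

For example, read_cap_mask(<pid of sssd_be>, "CapEff", &m) would be expected to yield 0 if, as stated above, sssd_be should have no capabilities.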

@joakim-tjernlund
Contributor Author

> sssd_infinera.com.log
>
> Just a side note:
>
> (2024-09-24 13:01:47): [be[infinera.com]] [server_loop] (0x3f7c0): Entering main loop under uid=0 (euid=0) : gid=0 (egid=0) with SECBIT_KEEP_CAPS = 0 and following capabilities:
>                    CAP_CHOWN: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
>             CAP_DAC_OVERRIDE: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
>          CAP_DAC_READ_SEARCH: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
>                   CAP_SETGID: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
>                   CAP_SETUID: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
>
> -- is wrong. 'sssd_be' shouldn't have any capabilities. This is not relevant to this ticket, of course.

Correct, I am still running sssd the old way, as there were (and still are) some rough edges. That has no impact on this problem, though.

@alexey-tikhonov
Member

> There is an note in man pages backend offline/online should trigger nsupdate

Where exactly is it in the man pages? So far I only find the periodic task in the code...

@joakim-tjernlund
Contributor Author

> There is an note in man pages backend offline/online should trigger nsupdate
>
> Where is it exactly in the man pages? In the code I only find periodic task so far...

I cannot remember where, 1.5 years later.

@alexey-tikhonov
Member

@joakim-tjernlund, can you build and try a test package with a patch from #7798?

@joakim-tjernlund
Contributor Author

> @joakim-tjernlund, can you build and try a test package with a patch from #7798?

Jan 16 13:04:45 se-jocke9-lx.infinera.com systemd[1]: sssd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://gentoo.org/support/
░░ 
░░ The unit sssd.service has entered the 'failed' state with result 'exit-code'.
Jan 16 13:04:45 se-jocke9-lx.infinera.com systemd[1]: Failed to start System Security Services Daemon.
░░ Subject: A start job for unit sssd.service has failed
░░ Defined-By: systemd
░░ Support: https://gentoo.org/support/
░░ 
░░ A start job for unit sssd.service has finished with a failure.
░░ 
░░ The job identifier is 14506 and the job result is failed.

and

  *  (2025-01-16 13:08:21): [be[infinera.com]] [sdap_select_principal_from_keytab_sync] (0x0020): Failed to get principal from keytab (sss_atomic_read_s() failed), see ldap_child.log (pid = 370329) for details.
   *  (2025-01-16 13:08:21): [be[infinera.com]] [ad_set_sdap_options] (0x0040): Cannot set the SASL-related options
   *  (2025-01-16 13:08:21): [be[infinera.com]] [sssm_ad_init] (0x0020): Unable to init AD id options
   *  (2025-01-16 13:08:21): [be[infinera.com]] [dp_module_run_constructor] (0x0010): Module [ad] constructor failed [5]: Input/output error
********************** BACKTRACE DUMP ENDS HERE *********************************

(2025-01-16 13:08:21): [be[infinera.com]] [dp_target_init] (0x0010): Unable to load module ad
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2025-01-16 13:08:21): [be[infinera.com]] [dp_load_module] (0x0020): Unable to create DP module.
   *  (2025-01-16 13:08:21): [be[infinera.com]] [dp_target_init] (0x0010): Unable to load module ad
********************** BACKTRACE DUMP ENDS HERE *********************************

(2025-01-16 13:08:21): [be[infinera.com]] [be_process_init] (0x0010): Unable to setup data provider [1432158209]: Internal Error
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2025-01-16 13:08:21): [be[infinera.com]] [dp_load_targets] (0x0020): Unable to load target [id] [80]: Accessing a corrupted shared library.
   *  (2025-01-16 13:08:21): [be[infinera.com]] [dp_init] (0x0020): Unable to initialize DP targets [1432158209]: Internal Error
   *  (2025-01-16 13:08:21): [be[infinera.com]] [dp_terminate_active_requests] (0x0400): Terminating active data provider requests
   *  (2025-01-16 13:08:21): [be[infinera.com]] [be_process_init] (0x0010): Unable to setup data provider [1432158209]: Internal Error

This happens even without your patch; master fails here. My last master build was on Dec 12.

@alexey-tikhonov
Member

> [be[infinera.com]] [sdap_select_principal_from_keytab_sync] (0x0020): Failed to get principal from keytab (sss_atomic_read_s() failed), see ldap_child.log (pid = 370329) for details.

What is in 'ldap_child.log'? Most probably a capabilities mismatch (sssd.service vs. ldap_child).

Btw, you can also cherry-pick the patch on top of your current sources.

@joakim-tjernlund
Contributor Author

> [be[infinera.com]] [sdap_select_principal_from_keytab_sync] (0x0020): Failed to get principal from keytab (sss_atomic_read_s() failed), see ldap_child.log (pid = 370329) for details.
>
> What is in 'ldap_child.log'? Most probably capabilities mismatch (sssd.service, ldap_child).

It is empty.

> Btw, you can also cherry pick patch on top of your current sources.

I added it as an external patch; that is easier for me when just testing.

@joakim-tjernlund
Contributor Author

Just to be clear: all I did was rebuild master the same way as always, and that gave this error.

@joakim-tjernlund
Contributor Author

./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --datarootdir=/usr/share --disable-dependency-tracking --disable-silent-rules --disable-static --docdir=/usr/share/doc/sssd-9999 --htmldir=/usr/share/doc/sssd-9999/html --with-sysroot=/ --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sbindir=/usr/sbin --with-pid-path=/run --with-plugin-path=/usr/lib64/sssd --enable-pammoddir=//lib64/security --with-ldb-lib-dir=/usr/lib64/samba/ldb --with-db-path=/var/lib/sss/db --with-gpo-cache-path=/var/lib/sss/gpo_cache --with-pubconf-path=/var/lib/sss/pubconf --with-pipe-path=/var/lib/sss/pipes --with-mcache-path=/var/lib/sss/mc --with-secrets-db-path=/var/lib/sss/secrets --with-log-path=/var/log/sssd --with-kcm --enable-kcm-renewal --with-os=gentoo --disable-rpath --disable-static --disable-valgrind --with-samba --enable-cifs-idmap-plugin --without-selinux --enable-krb5-locator-plugin --disable-pac-responder --with-nfsv4-idmapd-plugin --enable-nls --with-libnl --with-manpages --without-sudo --with-autofs --with-ssh --without-oidc-child --without-passkey --without-subid --disable-systemtap --without-python2-bindings --with-python3-bindings --with-initscript=systemd --with-extended-enumeration-support --with-systemdunitdir=/usr/lib/systemd/system

@alexey-tikhonov
Member

alexey-tikhonov commented Jan 16, 2025

> [be[infinera.com]] [sdap_select_principal_from_keytab_sync] (0x0020): Failed to get principal from keytab (sss_atomic_read_s() failed), see ldap_child.log (pid = 370329) for details.
>
> What is in 'ldap_child.log'? Most probably capabilities mismatch (sssd.service, ldap_child).
>
> It is empty

In this case, what is the output of:

  • systemctl cat sssd | grep CapabilityBoundingSet
  • getcap /usr/libexec/sssd/ldap_child

I wonder if I missed an update in 'make install' while recently reducing the set of required capabilities...

> Btw, you can also cherry pick patch on top of your current sources.
>
> I added as an external patch, it is easier for me that way when just testing

Let me know the result.

@joakim-tjernlund
Contributor Author

joakim-tjernlund commented Jan 16, 2025

> [be[infinera.com]] [sdap_select_principal_from_keytab_sync] (0x0020): Failed to get principal from keytab (sss_atomic_read_s() failed), see ldap_child.log (pid = 370329) for details.
>
> What is in 'ldap_child.log'? Most probably capabilities mismatch (sssd.service, ldap_child).
>
> It is empty
>
> In this case, what is the output of:
>
> * `systemctl cat sssd | grep CapabilityBoundingSet`
> * `getcap /usr/libexec/sssd/ldap_child`

systemctl cat sssd | grep CapabilityBoundingSet
# Warning: sssd.service changed on disk, the version systemd has loaded is outdated.
# This output shows the current version of the unit's original fragment and drop-in files.
# If fragments or drop-ins were added or removed, they are not properly reflected in this output.
# Run 'systemctl daemon-reload' to reload units.
CapabilityBoundingSet= CAP_SETGID CAP_SETUID CAP_DAC_READ_SEARCH 
se-jocke9-lx sssd # getcap /usr/libexec/sssd/ldap_child
/usr/libexec/sssd/ldap_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep

> I wonder if I missed an update in 'make install' while reducing set of required capabilities recently...
>
> Btw, you can also cherry pick patch on top of your current sources.
>
> I added as an external patch, it is easier for me that way when just testing
>
> Let me know the result.

I will, once I get sssd working again.

@alexey-tikhonov
Member

> /usr/libexec/sssd/ldap_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep

That's wrong. It should be 'cap_dac_read_search=p'; see sssd/Makefile.am, line 5541 in 85784e7:

-$(SETCAP) cap_dac_read_search=p $(DESTDIR)$(sssdlibexecdir)/ldap_child

@joakim-tjernlund
Contributor Author

But I am still running sssd as root

@alexey-tikhonov
Member

alexey-tikhonov commented Jan 16, 2025

> But I am still running sssd as root

Doesn't matter.

The 'CapabilityBoundingSet' of your sssd.service lacks some of the capabilities required by your 'ldap_child', hence 'ldap_child' can't start.
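Below is a simplified model of why this blocks ldap_child. It follows the execve() rules in capabilities(7), under "Safety checking for capability-dumb binaries". A file with "=ep" capabilities must receive its whole permitted set, and the bounding set limits what it can receive. So CAP_CHOWN and CAP_DAC_OVERRIDE missing from CapabilityBoundingSet makes execve() fail with EPERM. Assuming the kernel applies this check independently of the process running as root (an assumption here), that would be consistent with "Doesn't matter" above. Ambient capabilities and securebits are ignored, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/capability.h>

#define CAPBIT(c) (UINT64_C(1) << (c))

/* Simplified model of the execve() capability rules from
 * capabilities(7). For a file with capabilities:
 *   P'(permitted) = (P(inheritable) & F(inheritable)) |
 *                   (F(permitted) & P(bounding))
 * and if the file's effective bit is set ("=ep") but P'(permitted)
 * lacks any capability from F(permitted), execve() fails with EPERM.
 * Ambient capabilities and securebits are not modelled. */
bool exec_fails_eperm(uint64_t file_permitted, uint64_t file_inheritable,
                      bool file_effective,
                      uint64_t proc_bounding, uint64_t proc_inheritable)
{
    uint64_t new_permitted = (proc_inheritable & file_inheritable) |
                             (file_permitted & proc_bounding);

    return file_effective && (file_permitted & ~new_permitted) != 0;
}
```

With the values from this thread, the installed "cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep" against a bounding set of CAP_SETGID CAP_SETUID CAP_DAC_READ_SEARCH fails. The intended "cap_dac_read_search=p" does not.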

Btw, why "still running sssd as root"?

@joakim-tjernlund
Contributor Author

Hmm, it looks like my reply never made it into GitHub, odd.

Anyway, I have fixed my sssd and am running it with your patch.
I tried pulling my LAN cable to let WiFi kick in and connect.
DNS did NOT change, though.

@alexey-tikhonov
Member

> Anyway, I have fixed my sssd and running with your patch.
> Tried pulling may LAN cable to let WiFi kick in and connect.
> DNS did NOT change though.

Could you please provide domain log with debug_level=9 that covers this moment?

@joakim-tjernlund
Contributor Author

sssd_infinera.com.log

@alexey-tikhonov
Member

> sssd_infinera.com.log

Argh...

[message_type] (0x0200): netlink Message type: 24
[route_msg_debug_print] (0x1000): route idx 4 flags 0 family 2 addr 0.0.0.0/0
[check_if_online] (0x2000): There is an online check already running.
[be_run_unconditional_online_cb] (0x0400): Running unconditional online callbacks.
[check_if_online_delayed] (0x2000): Backend is already online, nothing to do.
[be_ptask_online_cb] (0x0400): Back end is online
[be_ptask_enable] (0x0080): Task [Dyndns update]: already enabled

@alexey-tikhonov
Member

Another attempt: I've updated the patch in #7798.
@joakim-tjernlund, could you please try this version?

@joakim-tjernlund
Contributor Author

sssd_infinera.com.log-2.txt

Nope

@alexey-tikhonov
Member

> sssd_infinera.com.log-2.txt

(18:54:55): [message_type] (0x0200): netlink Message type: 16
(18:54:55): [link_msg_handler] (0x1000): netlink link message: iface idx 4 (wlan0) flags 0x1003 (broadcast,multicast,up)
(18:54:56): [watched_file_inotify_cb] (0x1000): Received inotify notification for /etc/resolv.conf
(18:54:56): [watch_update_resolv] (0x0400): Reloading /etc/resolv.conf.
(18:54:56): [recreate_ares_channel] (0x0100): Initializing new c-ares channel
(18:54:56): [recreate_ares_channel] (0x0100): Destroying the old c-ares channel
(18:54:56): [check_if_online] (0x2000): There is an online check already running.
(18:54:56): [be_run_unconditional_online_cb] (0x0400): Running unconditional online callbacks.
(18:54:56): [check_if_online_delayed] (0x2000): Backend is already online, nothing to do.
(18:54:56): [be_ptask_execute] (0x0400): Task [Dyndns update]: executing task, timeout 60 seconds
(18:54:56): [ad_dyndns_update_send] (0x0400): Performing update
(18:54:56): [sdap_id_op_connect_step] (0x4000): reusing cached connection
(18:54:56): [sdap_id_conn_data_not_idle] (0x4000): Marking connection as not idle
(18:54:56): [sdap_id_op_connect_step] (0x4000): reusing cached connection
(18:54:56): [sdap_id_conn_data_not_idle] (0x4000): Marking connection as not idle
(18:54:56): [sss_get_dualstack_addresses] (0x0080): find_iface_by_addr failed: 2:[No such file or directory]
(18:54:56): [sdap_dyndns_add_ldap_conn] (0x0080): sss_get_dualstack_addresses failed: 2:[No such file or directory]
(18:54:56): [sdap_dyndns_get_addrs_done] (0x0040): Can't get addresses from LDAP connection
...
(18:54:59): [message_type] (0x0200): netlink Message type: 24
(18:54:59): [route_msg_debug_print] (0x1000): route idx 4 flags 0 family 2 addr 0.0.0.0/0
(18:54:59): [check_if_online] (0x2000): There is an online check already running.
(18:55:00): [be_run_unconditional_online_cb] (0x0400): Running unconditional online callbacks.
(18:55:00): [check_if_online_delayed] (0x2000): Backend is already online, nothing to do.
(18:55:00): [be_ptask_execute] (0x0400): Task [Dyndns update]: executing task, timeout 60 seconds
(18:55:00): [ad_dyndns_update_send] (0x0400): Performing update
(18:55:00): [ad_dyndns_update_send] (0x0200): Last periodic update ran recently or timer in progress, not scheduling another update
(18:55:00): [be_ptask_done] (0x0400): Task [Dyndns update]: finished successfully
(18:55:00): [be_ptask_schedule] (0x0400): Task [Dyndns update]: scheduling task 345 seconds from last execution time [1737655245]
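Below is a hypothetical model of the guard behind the "Last periodic update ran recently or timer in progress" message. The struct, the field names and the 60-second gap are illustrative assumptions, not SSSD's actual logic or values. It shows one plausible reading of the log above. The attempt at 18:54:56 failed (no interface found for the connection's address) but still counts as recent. The retry 4 seconds later at 18:55:00 is therefore skipped, and the next run is left to the periodic schedule.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a "ran recently or timer in progress" guard.
 * Names and the min_gap value are illustrative, not SSSD's. */
struct dyndns_guard {
    time_t last_update;      /* start time of the last update attempt */
    bool timer_in_progress;  /* a scheduled update is already pending */
    time_t min_gap;          /* assumed minimum seconds between attempts */
};

/* Returns true (and records the attempt) if an update may start now.
 * The attempt is recorded at start, regardless of its outcome, so a
 * retry shortly after a failed attempt is suppressed. */
bool dyndns_try_start(struct dyndns_guard *g, time_t now)
{
    if (g->timer_in_progress || now - g->last_update < g->min_gap)
        return false;
    g->last_update = now;
    return true;
}
```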

@alexey-tikhonov
Member

alexey-tikhonov commented Jan 23, 2025

It looks like the approach of forcing a dyndns update right on a netlink event is wrong.
The update should only run when sssd_be is back online.

So in general the existing code is more or less correct: it already enables the dyndns task when the backend switches online; the problem is the timer value...

@joakim-tjernlund
Contributor Author

> Looks like approach to force dyndns update right on netlink event is wrong. It should only run when sssd_be is back online.

Right, you would have to be a bit smarter when using netlink, since sssd would have to check whether the I/F went UP or DOWN etc.
before trying to update DNS.

> So in general existing code is more or less correct - it already enables dyndns task when backend switches online, the problem is the timer value...

So a potential fix would be to set a low timer when netlink signals a new link?

@alexey-tikhonov
Member

> Looks like approach to force dyndns update right on netlink event is wrong. It should only run when sssd_be is back online.
>
> right, you would have to be a bit smarter if using netlink since sssd would have to check if I/F went UP or DOWN etc. before trying to update DNS.

The problem is the default value of dyndns_iface: "Use the IP addresses of the interface which is used for AD LDAP connection".
So 'sssd_be' first needs to realize it's offline and reconnect using the new interface; only after that can the dyndns update task update the record.
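With that default, the interface is derived from the local address of the LDAP connection. Below is a sketch of such a lookup with getifaddrs(3); iface_for_ipv4() is a hypothetical helper and handles IPv4 only. It returns ENOENT when no interface owns the address. That is errno 2, the same "No such file or directory" as in the find_iface_by_addr failure in the earlier log. A plausible cause is that the connection's local address had already disappeared with the old link, but that is an inference, not something confirmed in this thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Find the interface that currently owns the IPv4 address 'addr'
 * (e.g. the local address of the LDAP connection) and copy its name
 * into 'ifname'. Returns 0, ENOENT if no interface has that address,
 * EINVAL for an unparsable address, or errno from getifaddrs(). */
int iface_for_ipv4(const char *addr, char ifname[IF_NAMESIZE])
{
    struct in_addr want;
    struct ifaddrs *list, *ifa;
    int ret = ENOENT;

    if (inet_pton(AF_INET, addr, &want) != 1)
        return EINVAL;
    if (getifaddrs(&list) != 0)
        return errno;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            strncpy(ifname, ifa->ifa_name, IF_NAMESIZE - 1);
            ifname[IF_NAMESIZE - 1] = '\0';
            ret = 0;
            break;
        }
    }
    freeifaddrs(list);
    return ret;
}
```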

> So in general existing code is more or less correct - it already enables dyndns task when backend switches online, the problem is the timer value...
>
> And a potential fix would be to set a low timer when netlink signals a new link?

The triggering event is "sssd_be switching online".
So IMO the proper solution is to "force sssd_be to go offline if netlink says that the iface used for the current connection went down".
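Below is a sketch of the decision such a change would need. conn_iface_went_down() is hypothetical, and the caller is assumed to know the ifindex of the interface carrying the LDAP connection. A pulled cable is assumed to show up as an RTM_NEWLINK with IFF_RUNNING cleared (loss of carrier) while IFF_UP stays set, so the sketch requires both flags. For example, the wlan0 message in the earlier log has flags 0x1003 (broadcast, multicast, up): IFF_UP is set but IFF_RUNNING is not, so under this rule that link would not yet count as usable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <net/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: does this rtnetlink message mean that the
 * interface carrying the current LDAP connection (conn_ifindex) is
 * gone or no longer usable? If so, the backend could be forced
 * offline, so that the regular "back online" path (reconnect, then
 * dyndns update) runs. */
bool conn_iface_went_down(const struct nlmsghdr *nlh, int conn_ifindex)
{
    const struct ifinfomsg *ifi;
    const unsigned int usable = IFF_UP | IFF_RUNNING;

    if (nlh->nlmsg_type != RTM_NEWLINK && nlh->nlmsg_type != RTM_DELLINK)
        return false;                  /* not a link message */
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
        return false;                  /* truncated message */

    ifi = NLMSG_DATA(nlh);
    if (ifi->ifi_index != conn_ifindex)
        return false;                  /* some other interface */
    if (nlh->nlmsg_type == RTM_DELLINK)
        return true;                   /* interface removed */
    /* No carrier clears IFF_RUNNING while IFF_UP may stay set,
     * so require both flags for the link to count as usable. */
    return (ifi->ifi_flags & usable) != usable;
}
```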

As a test, could you set ldap_connection_expire_timeout to, say, 30 seconds and see if it works as expected:

  • link1 is down, link2 is up
  • within (0..30) seconds 'sssd_be' drops current LDAP connection and reconnects
  • once reconnected, nsupdate is triggered immediately?

@joakim-tjernlund
Contributor Author

> As a test, could you set ldap_connection_expire_timeout to, say, 30 seconds and see if it works as expected:
>
> * link1 is down, link2 is up
> * within (0..30) seconds 'sssd_be' drops current LDAP connection and reconnects
> * once reconnected, nsupdate is triggered immediately?

That seems to work; DNS changes within 30 seconds.
I did have (for unknown reasons) ldap_enumeration_refresh_timeout = 1800, which I had
to remove before the new timeout worked.

@joakim-tjernlund
Contributor Author

An automatic change of the timeout would be nice.
