FreeIPA server configuration failed - named starting failed - pipe() failed: Too many open files #653

Open
BrianMer opened this issue Feb 5, 2025 · 4 comments · May be fixed by #656

BrianMer commented Feb 5, 2025

Hi,

I just tried to install FreeIPA via Podman, and it keeps failing.

After debugging for a while, it seems like it is related to named not (re)starting.

In journalctl -xeu named:

Feb 05 14:10:21 freeipa.XXX.lan named[8729]: ../../../../lib/isc/unix/socket.c:3560: unexpected error:
Feb 05 14:10:21 freeipa.XXX.lan named[8729]: pipe() failed: Too many open files
Feb 05 14:10:21 freeipa.XXX.lan named[8729]: ../../../../lib/isc/unix/socket.c:3458: fatal error:
Feb 05 14:10:21 freeipa.XXX.lan named[8729]: epoll_wait() failed: Invalid argument
Feb 05 14:10:21 freeipa.XXX.lan named[8729]: exiting (due to fatal error in library)
Feb 05 14:10:21 freeipa.XXX.lan named[8729]: ../../../../lib/isc/unix/socket.c:3560: unexpected error:
Feb 05 14:10:21 freeipa.XXX.lan named[8729]: pipe() failed: Too many open files

After checking sysctl, ulimit, and /etc/security/limits.conf and finding nothing, I saw that a line had been added to /etc/systemd/system.conf: DefaultLimitNOFILE=1024.

I tried increasing the value and also commenting out the line; in both cases the named service then starts successfully.
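A quick way to reproduce these checks is the following sketch (the commands are standard; the systemctl line only produces output on a host or container running systemd, and the values vary by distribution):

```shell
# Show the open-file limits the current shell session is running with.
echo "soft limit: $(ulimit -S -n)"
echo "hard limit: $(ulimit -H -n)"
# DefaultLimitNOFILE from /etc/systemd/system.conf (and any drop-ins)
# shows up in the systemd manager properties, when systemd is present:
systemctl show 2>/dev/null | grep NOFILE || true
```

In the failing container, the 1024 value from /etc/systemd/system.conf would surface here as the limit inherited by services such as named.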

It seems to be related to commit 3d8437d.

Would it be possible to comment out the line, or to increase the value, to fix the installation?


adelton commented Feb 5, 2025

Our tests in https://github.com/freeipa/freeipa-container/actions pass, so before rushing to make any changes we'd really need to understand what is different about your setup.

What host OS and version do you use, what podman version is it, and what OS and version is in the container? Is podman running rootless or rootful? What is the exact podman run command and which parameters do you use?

Can you run

docker=podman tests/run-partial-tests.sh Dockerfile.<the-container-os-you-use>

to see if that passes?


BrianMer commented Feb 6, 2025

Hi,

Thank you for the (very) quick response.

The server (host) runs on Debian 12, podman is version 4.3.1, and I run Rocky 9 in the container.

Podman is running rootful (due to Ansible become: yes in the playbook that installs every service on the system).

The podman command is issued via the Ansible containers.podman.podman_container module, but it translates to:
sudo podman run --name freeipa-server --hostname freeipa.XXXX.lan --sysctl net.ipv6.conf.all.disable_ipv6=0 --sysctl net.ipv6.conf.lo.disable_ipv6=0 --cap-add CAP_SYS_TIME --env IPA_SERVER_INSTALL_OPTS="--unattended --ds-password='XXXXXXXX' --admin-password='XXXXXXXXX' --ip-address=XX.XX.XX.XX --dns=127.0.0.1 --domain=XXXX.lan --realm=XXXX.LAN --setup-dns --ntp-server=XX.XX.XX.X --no-reverse --no-forwarders" --restart=no --network XXXXX-XXX_XXXXXX --ip XX.XX.XX.XX --volume /srv/lib/freeipa-data:/data --detach=True localhost/freeipa-server:rocky9

The server is disconnected from the Internet, which is why you see the localhost/freeipa-server:rocky9 image name, but it is the official one from docker.io.

Finally, running your command leads to the same error (except for the IPv6 one, but IPv6 is disabled in our context):

Named service failed to start (CalledProcessError(Command ['/bin/systemctl', 'restart', 'named.service'] returned non-zero exit status 1: 'Job for named.service failed because the control process exited with error code.\nSee "systemctl status named.service" and "journalctl -xeu named.service" for details.\n'))
named service failed to start
[...]
Invalid IP address fe80::fcf7:a2ff:fe66:e9e0 for ipa.example.test.: cannot use link-local IP address fe80::fcf7:a2ff:fe66:e9e0
CalledProcessError(Command ['/bin/systemctl', 'restart', 'ipa.service'] returned non-zero exit status 1: 'Job for ipa.service failed because the control process exited with error code.\nSee "systemctl status ipa.service" and "journalctl -xeu ipa.service" for details.\n')
The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information

ipa.service   loaded failed failed Identity, Policy, Audit
named.service loaded failed failed Berkeley Internet Name Domain (DNS)
× ipa.service - Identity, Policy, Audit
     Loaded: loaded (/usr/lib/systemd/system/ipa.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2025-02-06 09:06:02 UTC; 1s ago
    Process: 9163 ExecStart=/usr/sbin/ipactl start (code=exited, status=1/FAILURE)
   Main PID: 9163 (code=exited, status=1/FAILURE)
        CPU: 3.289s

Feb 06 09:06:02 ipa.example.test ipactl[9163]: Hint: You can use --ignore-service-failure option for forced start in case that a non-critical service failed
Feb 06 09:06:02 ipa.example.test ipactl[9163]: Aborting ipactl
Feb 06 09:06:02 ipa.example.test ipactl[9163]: Starting Directory Service
Feb 06 09:06:02 ipa.example.test ipactl[9163]: Starting krb5kdc Service
Feb 06 09:06:02 ipa.example.test ipactl[9163]: Starting kadmin Service
Feb 06 09:06:02 ipa.example.test ipactl[9163]: Starting named Service
Feb 06 09:06:02 ipa.example.test systemd[1]: ipa.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 09:06:02 ipa.example.test systemd[1]: ipa.service: Failed with result 'exit-code'.
Feb 06 09:06:02 ipa.example.test systemd[1]: Failed to start Identity, Policy, Audit.
Feb 06 09:06:02 ipa.example.test systemd[1]: ipa.service: Consumed 3.289s CPU time.
× named.service - Berkeley Internet Name Domain (DNS)
     Loaded: loaded (/usr/lib/systemd/system/named.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2025-02-06 09:05:57 UTC; 6s ago
    Process: 9175 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else echo "Checking of zone files is disabled"; fi (code=exited, status=0/SUCCESS)
    Process: 9177 ExecStart=/usr/sbin/named -u named -c ${NAMEDCONF} $OPTIONS (code=exited, status=1/FAILURE)
        CPU: 3.092s

Feb 06 09:05:57 ipa.example.test named[9178]: pipe() failed: Too many open files
Feb 06 09:05:57 ipa.example.test named[9178]: ../../../../lib/isc/unix/socket.c:3458: fatal error:
Feb 06 09:05:57 ipa.example.test named[9178]: epoll_wait() failed: Invalid argument
Feb 06 09:05:57 ipa.example.test named[9178]: exiting (due to fatal error in library)
Feb 06 09:05:57 ipa.example.test named[9178]: ../../../../lib/isc/unix/socket.c:3560: unexpected error:
Feb 06 09:05:57 ipa.example.test named[9178]: pipe() failed: Too many open files
Feb 06 09:05:57 ipa.example.test systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE
Feb 06 09:05:57 ipa.example.test systemd[1]: named.service: Failed with result 'exit-code'.
Feb 06 09:05:57 ipa.example.test systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).
Feb 06 09:05:57 ipa.example.test systemd[1]: named.service: Consumed 3.092s CPU time.


adelton commented Feb 7, 2025

Could you please check whether running

mkdir /etc/systemd/system.conf.d
( echo "[Manager]" ; echo "DefaultLimitNOFILE=1024:524288" ) >> /etc/systemd/system.conf.d/DefaultLimitNOFILE.conf

in the container (or putting this into RUN clauses in the Dockerfile and rebuilding the image) fixes the problem for you?
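For anyone who wants to see what the drop-in ends up containing before baking it into an image, here is the same recipe run against a scratch directory (the mktemp directory stands in for /etc/systemd; in the image the target path would be /etc/systemd/system.conf.d/DefaultLimitNOFILE.conf):

```shell
# Build the systemd drop-in in a scratch directory instead of /etc/systemd,
# so the result can be inspected without touching the system.
dir=$(mktemp -d)
mkdir "$dir/system.conf.d"
( echo "[Manager]" ; echo "DefaultLimitNOFILE=1024:524288" ) >> "$dir/system.conf.d/DefaultLimitNOFILE.conf"
# The drop-in sets the soft limit to 1024 and the hard limit to 524288:
cat "$dir/system.conf.d/DefaultLimitNOFILE.conf"
```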


adelton commented Feb 7, 2025

To sum up the history of us setting DefaultLimitNOFILE in the FreeIPA container images:

There used to be an issue https://bugzilla.redhat.com/show_bug.cgi?id=1656519 where certmonger was blindly calling fcntl for all possible file descriptors. That led to timeouts in containers because docker sets the limit to a very high number (or to the maximum?):

$ docker run --rm registry.fedoraproject.org/fedora ulimit -n
1073741816

so certmonger was looping for a very long time.

For comparison, on a typical Fedora host, the limit is

$ systemctl show | grep NOFILE
DefaultLimitNOFILE=524288
DefaultLimitNOFILESoft=1024

and podman sets more sensible defaults as well:

$ podman run --rm registry.fedoraproject.org/fedora ulimit -n
524288
$ sudo podman run --rm registry.fedoraproject.org/fedora ulimit -n
1048576

The certmonger code has since been changed (https://pagure.io/certmonger/pull-request/130#request_diff), but keeping the FreeIPA container's hard limit lower than docker's 1073741816 still seems like a reasonable security setup. Of course, forcing it to be just 1024 for both the soft and the hard limit might have been too restrictive, even if we did not see ill effects until this report.

That's why I'm trying to find out whether having the soft/hard limits match what we have on a typical host would be the best approach going forward.
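The point of a split soft/hard setting can be sanity-checked in any shell: an unprivileged process may raise its own soft limit, but only up to the hard limit, so something like 1024:524288 keeps a conservative default while letting daemons such as named request more (a generic sketch, not specific to the FreeIPA image):

```shell
# An unprivileged process can raise its soft NOFILE limit up to the hard limit.
hard=$(ulimit -H -n)
echo "hard limit: $hard"
# Raise the soft limit to the hard limit and read it back (same subshell):
new_soft=$(ulimit -S -n "$hard" && ulimit -S -n)
echo "soft limit after raise: $new_soft"
```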
