Intermittent failure of testsock #23

Open
ralphlange opened this issue Oct 19, 2021 · 6 comments
@ralphlange
Contributor

Description
In local builds on a VM, some tests fail consistently:

testsock.t .....
not ok  5 - Recv'd -1(11) [0, 0, 0, 0]
not ok  6 - src (<>) == send_addr (127.0.0.1:35007)
not ok  8 - Recv'd -1 [0, 0, 0, 0]
not ok  9 - src (<>) == sender_addr (127.0.0.1:52034)
Dubious, test returned 1 (wstat 256, 0x100)
Failed 4/33 subtests

Information:

  • PVXS Version or Git commit ID: 0.2.1
  • EPICS Base Version: 7.0.6.1
  • libevent Version: 2.1.8-5.el8
  • EPICS_HOST_ARCH: linux-x86_64
  • Host OS: RHEL 8.4
  • Compiler version: gcc version 8.4.1 20200928 (Red Hat 8.4.1-1) (GCC)
@ralphlange
Contributor Author

It's not consistent, I have just seen all tests pass. (Just running them again.) Flap, flap.

@mdavidsaver
Member

This is one of two (and only two!) spurious test failures I see with PVXS. Both are seemingly related to apparent winsock-specific synchronization oddities. This test (test_udp()) sets up two UDP sockets on one thread and uses one to send a packet to the other. It appears that, even though bind() has succeeded, sometimes the socket buffer for the second isn't ready by the time sendto() is called on the first.
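For readers who want to see the pattern in isolation, here is a minimal Python sketch of what the paragraph above describes. It is not the PVXS test code: the function name `udp_loopback_once` and the payload are made up for illustration. It binds two UDP sockets on loopback from one thread, sends from one to the other, and immediately tries a non-blocking receive. If the 11 in the failing output is a Linux errno value (an assumption), it would correspond to EAGAIN, which Python surfaces as `BlockingIOError`.

```python
import socket


def udp_loopback_once(payload=b"ping"):
    """Send one datagram between two loopback UDP sockets on a single thread.

    Returns (data, src, sender_addr). data and src are None when the
    non-blocking receive found nothing queued yet.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for any free port.
        receiver.bind(("127.0.0.1", 0))
        sender.bind(("127.0.0.1", 0))
        receiver.setblocking(False)

        sender.sendto(payload, receiver.getsockname())
        try:
            # The assumption under test: the datagram is already readable.
            data, src = receiver.recvfrom(1024)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing was queued at the time of the call.
            data, src = None, None
        return data, src, sender.getsockname()
    finally:
        receiver.close()
        sender.close()
```

On most runs the receive succeeds and `src` matches the sender's address. The `None` branch is the case that would look like the failing subtests.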

The other failure I sometimes see originates with the libevent compatibility version of socketpair(), which, as I think about it now, is doing something similar with two TCP sockets on one thread.
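To illustrate what "something similar with two TCP sockets on one thread" can look like, here is a rough Python sketch of emulating socketpair() over loopback TCP. It is not libevent's implementation. The helper name `tcp_socketpair` is hypothetical, and the peer-address check is just one plausible safeguard.

```python
import socket


def tcp_socketpair():
    """Build a connected pair of TCP sockets over loopback from one thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose
        listener.listen(1)

        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # connect() completes against the listen backlog, so accept()
            # can be called afterwards on the same thread.
            client.connect(listener.getsockname())
            server, peer = listener.accept()
        except OSError:
            client.close()
            raise

        # Make sure the accepted connection really is our own client.
        if peer != client.getsockname():
            server.close()
            client.close()
            raise ConnectionError("accepted an unexpected peer")
        return client, server
    finally:
        listener.close()
```

As with the UDP case, everything depends on one thread's calls being ordered the way the code expects: connect() has to have completed before accept() is called.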

@mdavidsaver mdavidsaver added the bug Something isn't working label Oct 19, 2021
@ralphlange
Contributor Author

Still there, with PVXS 1.0.0 and EPICS Base 7.0.7 on RHEL 8.5

@mdavidsaver mdavidsaver changed the title from "Test failures under RHEL8" to "Intermittent failure of testsock" Apr 14, 2023
@mdavidsaver
Member

I think that the core (apparently incorrect) assumption I make in testsock is that the RX buffering behind a UDP socket is 100% ready after an apparently successful bind() and maybe an IP_ADD_MEMBERSHIP, so that e.g. a sequence of bind(), sendto(), and recvfrom() can proceed without blocking.
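One way to avoid relying on that assumption is to wait, with a bounded timeout, until the receiving socket is readable before calling recvfrom(). The Python sketch below shows the idea only. It is not taken from the PVXS test suite, and the helper name `recv_with_deadline` and the 5 second default are arbitrary choices for illustration.

```python
import select
import socket


def recv_with_deadline(sock, bufsize=1024, timeout=5.0):
    """Wait up to `timeout` seconds for a datagram, then receive it.

    Raises TimeoutError if the socket never becomes readable.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise TimeoutError("no datagram arrived within %.2f s" % timeout)
    return sock.recvfrom(bufsize)
```

A test written this way tolerates a short delay between sendto() and the datagram becoming readable, at the cost of a slower failure when the packet is genuinely lost.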

@mdavidsaver
Member

Attempting a fix with 5897fe2. I can't reliably trigger the failure, so I don't know if this will be sufficient.

@mdavidsaver
Member

Well, now a different error. Seems to be less frequent than the previous ones.

  testsock.tap ..... 
  not ok 39 -  ret<0 RX3 expected error ret=14 err=11
