socket.gaierror: [Errno -2] Name or service not known #1198

Open
jeremielalanne opened this issue Oct 15, 2024 · 3 comments

jeremielalanne commented Oct 15, 2024

Context

  • Using Azure IoT Edge 1.4
  • OS and version used: Ubuntu 20.04
  • Python version: 3.9

Description of the issue

Hello,

I have been having this issue for a long time now, and only with Python modules. Most of the time the module cannot start after a hard shutdown, that is, when it is stopped without executing the shutdown function (for example when the device is unplugged). After many retries it finally succeeds, but sometimes that takes hours of restarts.

Code sample exhibiting the issue

main.py

import asyncio
import signal
import sys
import time

##################################
# Main function / Env settler
##################################
# NOTE: log(), create_client(), run_module() and stop_event are defined elsewhere in main.py.
def main():
    """ Main function.
    Is used to set up the whole module.
    """

    # Compare version tuples: comparing sys.version as a string misorders e.g. "3.10" and "3.5.3"
    if sys.version_info < (3, 5, 3):
        raise Exception( "The module requires python 3.5.3+. Current version of Python: %s" % sys.version )
    log("IoT Hub Client for Python" )
    time.sleep(5)

    # NOTE: Client is implicitly connected due to the handler being set on it
    client = create_client()

    # Define a handler to clean up when the module is terminated by Edge
    def module_termination_handler(signum, frame):
        log("IoTHubClient stopped by Edge")
        stop_event.set()

    # Set the Edge termination handler
    signal.signal(signal.SIGTERM, module_termination_handler)

    # Run
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(run_module(client))
    except Exception as e:
        log("Unexpected error %s " % e)
        raise
    finally:
        log("Shutting down IoT Hub Client...")
        loop.run_until_complete(client.shutdown())
        loop.stop()
        loop.close()


if __name__ == "__main__":
    main()

As suggested in another post, I tried adding time.sleep(5), but it didn't work.

Console log of the issue

Before adding the time.sleep(5)

2024-10-15T06:14:46.631787      - IoT Hub Client for Python
Subscribe for input failed.  Not enabling feature
Traceback (most recent call last):
2024-10-15T06:14:47.166569      - Exception occured in create client: {ex}
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 33, in handle_result
    return await callback.completion()
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/async_adapter.py", line 94, in completion
    return await self.future
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-2, 'Name or service not known')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./main.py", line 510, in <module>
    main()
  File "./main.py", line 482, in main
    client = create_client()
  File "./main.py", line 367, in create_client
    client.on_message_received = receive_message_handler
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 740, in on_message_received
    self._generic_receive_handler_setter("on_message_received", constant.INPUT_MSG, value)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 483, in _generic_receive_handler_setter
    fut.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 106, in _enable_feature
    await handle_result(callback)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 37, in handle_result
    raise exceptions.ConnectionFailedError(message="Could not connect to IoTHub", cause=e)
azure.iot.device.exceptions.ConnectionFailedError: ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xffffa9431d90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xffffa9431f90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>

After adding the time.sleep(5)

2024-10-15T06:43:55.443170      - IoT Hub Client for Python
Subscribe for input failed.  Not enabling feature
2024-10-15T06:44:02.129833      - Exception occured in create client: {ex}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 33, in handle_result
    return await callback.completion()
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/async_adapter.py", line 94, in completion
    return await self.future
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./main.py", line 511, in <module>
    main()
  File "./main.py", line 483, in main
    client = create_client()
  File "./main.py", line 367, in create_client
    client.on_message_received = receive_message_handler
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 740, in on_message_received
    self._generic_receive_handler_setter("on_message_received", constant.INPUT_MSG, value)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 483, in _generic_receive_handler_setter
    fut.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 106, in _enable_feature
    await handle_result(callback)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 37, in handle_result
    raise exceptions.ConnectionFailedError(message="Could not connect to IoTHub", cause=e)
azure.iot.device.exceptions.ConnectionFailedError: ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xffff83500f50>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xffff83500e90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
@jeremielalanne
Author

I tried to make the failure reproducible by disabling the shutdown functions and the termination signal handler.
I read somewhere that one useful test is to ping the hostname before creating the client, since the failure might be caused by edgeHub being unreachable. After a few restarts, however, it failed even after a successful ping, this time with a different error, 'Connection refused':

Subscribe for input failed.  Not enabling feature
2024-10-15T08:14:25.873468      - Exception occured in create client: {ex}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 33, in handle_result
    return await callback.completion()
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/async_adapter.py", line 94, in completion
    return await self.future
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./main.py", line 521, in <module>
    main()
  File "./main.py", line 490, in main
    client = create_client()
  File "./main.py", line 367, in create_client
    client.on_message_received = receive_message_handler
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 740, in on_message_received
    self._generic_receive_handler_setter("on_message_received", constant.INPUT_MSG, value)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 483, in _generic_receive_handler_setter
    fut.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 106, in _enable_feature
    await handle_result(callback)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 37, in handle_result
    raise exceptions.ConnectionFailedError(message="Could not connect to IoTHub", cause=e)
azure.iot.device.exceptions.ConnectionFailedError: ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xffffa010be90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xffffa010bc10>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
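
A successful ping only shows that the host answers ICMP; it does not show that anything is listening on the MQTT port. The two failure modes in the logs above (name resolution failing, then the TCP connection being refused) can be told apart with a direct probe before the client is created. This is a minimal sketch, assuming the module talks to edgeHub on port 8883 as the tracebacks show. `probe_endpoint` and `wait_for_endpoint` are hypothetical helper names, not part of the SDK:

```python
import socket
import time


def probe_endpoint(host, port, timeout=3.0):
    """Return "ok", "dns" (name does not resolve) or "tcp" (resolved but cannot connect)."""
    try:
        # Same resolution step that raised socket.gaierror in the first traceback.
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    try:
        # Same connect step that raised ConnectionRefusedError in the later tracebacks.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "tcp"


def wait_for_endpoint(host, port, attempts=10, delay=2.0, sleep=time.sleep):
    """Poll until the endpoint accepts TCP connections; return True on success."""
    for attempt in range(attempts):
        status = probe_endpoint(host, port)
        if status == "ok":
            return True
        print(f"endpoint not ready ({status}), attempt {attempt + 1}/{attempts}")
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_for_endpoint(hostname, 8883)` before `create_client()` would log whether a given restart failed on DNS or on the TCP connect. The hostname is an assumption here: in an IoT Edge module it is usually taken from the `IOTEDGE_GATEWAYHOSTNAME` environment variable. A successful probe narrows the diagnosis but cannot guarantee that the later MQTT connect succeeds.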

@gbe-tuv

gbe-tuv commented Jan 20, 2025

We are experiencing a similar issue.
In our custom IoT Edge module, establishing the connection succeeds, but the exception described above occurs after about 24 hours of container runtime, during renewal of the SAS token.

Issue with more details:
#1202

@jelalanne

It seems that I forgot to post my workaround :)
I simply added a retry loop to the create_client function.

import time

from azure.iot.device.aio import IoTHubModuleClient

# NOTE: log(), the three handler functions and GPIO are defined or imported elsewhere in main.py.
def create_client():
    retries = 8
    for attempt in range(retries):
        try:
            log("Trying to connect to IoT Edge environment")
            client = IoTHubModuleClient.create_from_edge_environment()
            # Setting a handler implicitly connects the client, so this is where the failure surfaces
            client.on_message_received = receive_message_handler
            client.on_twin_desired_properties_patch_received = receive_twin_patch_handler
            client.on_method_request_received = method_request_handler
            break
        except Exception as ex:
            log(f"ERROR: Attempt {attempt + 1} failed: {ex}")
            if attempt < retries - 1:
                log("New try in {0} seconds.".format(2 ** attempt))
                time.sleep(2 ** attempt)  # Exponential delay: 1, 2, 4, ... seconds
            else:
                log("Maximum number of attempts reached. Raising exception to higher level.")
                # Cleanup if failure occurs
                GPIO.cleanup()
                raise  # Re-raise with the original traceback

    return client

It rarely gets to the 5th try.
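
The same retry idea can be pulled out of create_client into a small reusable helper. This is a sketch only, not the code used in the workaround: `retry_with_backoff` is a hypothetical name, and the `max_delay` cap and the injectable `sleep` parameter are additions made here for illustration and testing.

```python
import time


def retry_with_backoff(operation, retries=8, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    Waits base_delay * 2**attempt seconds (capped at max_delay) between tries
    and re-raises the last exception if every attempt fails.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return operation()
        except Exception as ex:
            if attempt == retries - 1:
                raise  # Out of attempts: propagate with the original traceback
            delay = min(base_delay * 2 ** attempt, max_delay)
            print(f"Attempt {attempt + 1} failed: {ex}; retrying in {delay:g} s")
            sleep(delay)
```

With the defaults the waits are 1, 2, 4, 8, 16, 32 and 60 seconds, so eight failed attempts spend about two minutes sleeping before the exception is raised. The client-creation steps from the workaround would be wrapped in a function and passed as `operation`; any cleanup such as GPIO.cleanup() would then go in an `except` around the call.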
