Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosquitto_loop_stop sometimes does not join the spawned thread #3207

Open
Vzaa opened this issue Jan 29, 2025 · 0 comments
Open

mosquitto_loop_stop sometimes does not join the spawned thread #3207

Vzaa opened this issue Jan 29, 2025 · 0 comments
Labels
Status: Available No one has claimed responsibility for resolving this issue.

Comments

@Vzaa
Copy link

Vzaa commented Jan 29, 2025

Hello,

EDIT: I've just noticed that I overlooked #2905 and #2377 which reports the same cases as one of the cases below where mosquitto_loop_stop can sometimes return MOSQ_ERR_INVAL

In an application that uses mosquitto's threaded client interface (Linux with pthreads), I've noticed mosquitto_loop_stop can sometimes return MOSQ_ERR_INVAL after calling mosquitto_disconnect and mosquitto_loop_stop in order, in which case the spawned thread is not joined and in an application that does issues many mosquitto_loop_start and mosquitto_loop_stops in its lifetime this ends up in memory leaks accumulating over time.

Not being familiar with the internals, I took a briefish look at it (with some additional insight from a colleague who is more familiar with mosquitto), and noticed mosquitto_loop_stop is probably prone to racy behavior and the change in 0d1837e that addressed #2242 is relevant.

From what I figured, the thread exit is initiated by mosquitto_disconnect, and while exiting in mosquitto__thread_main the spawned thread sets the handle's threaded to mosq_ts_none. Meanwhile in the other thread, mosquitto_loop_stop checks for mosq->threaded != mosq_ts_self. From what I can tell, if the spawned thread exits before this check mosquitto_loop_stop will always return MOSQ_ERR_INVAL and not join the thread, which will always result in memory leaks. Which is triggered by the race situation from calling disconnect and then stop.

While looking into this in a test application, I've ran into a another case where mosquitto_loop_stop returns MOSQ_ERR_INVAL. In a test application (added below) that does the below in a loop:

  1. mosquitto_loop_start
  2. mosquitto_connect_async
  3. wait for mosquitto connect callback
  4. mosquitto_disconnect
  5. mosquitto_loop_stop

In rare cases mosquitto_disconnect can return MOSQ_ERR_NO_CONN where the thread is already exited and mosquitto_loop_stop once again returns MOSQ_ERR_INVAL and the thread is not joined and the resources are leaked. I couldn't figure out why mosquitto_disconnect returns MOSQ_ERR_NO_CONN in this case.


I've managed to reproduce both cases on both the current master c85313d and the latest tagged release v2.0.20 on Ubuntu 22.04(gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0).

I built mosquitto with the cmake options:

cmake -DWITH_TLS=OFF -DWITH_CJSON=OFF -DWITH_WEBSOCKETS=OFF -DDOCUMENTATION=OFF -DWITH_THREADING=ON -DCMAKE_BUILD_TYPE=Release -DWITH_STATIC_LIBRARIES=ON

After which I start the broker with pretty much the default config

# adjust as needed
log_dest stderr
log_type error
# log_type warning
# log_type notice
# log_type information

# If mosquitto is started as root then it will change to this user.
# If the "mosquitto" user does not exist it will try to change to
# the "nobody" user.
# If mosquitto is not started as root then this is ignored.
user mosquitto

allow_anonymous true

With the broker running in the background. Running the below test application with stdout pointed to null with > /dev/null, in stderr, in error cases it can either print Failed to stop loop case 1, disc: The client is not currently connected., stop: Invalid arguments provided. or Failed to stop loop case 2, stop: Invalid arguments provided.. Case 1, disconnect returns NO_CONN, is a lot rare but it does seem to occur in my environment.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <inttypes.h>
#include <unistd.h>
#include <assert.h>
#include <sys/eventfd.h>
#include "mosquitto.h"

static int efd;

static void write_eventfd()
{
	int rc;
	uint64_t one = 1;
	rc = write(efd, &one, sizeof(one));
	assert(rc >= 0);
}

static void read_eventfd()
{
	uint64_t v;
	ssize_t len;
	len = read(efd, &v, sizeof(v));
	assert(len == sizeof(v));
	assert(1 == v);
}

static void connect_cb(struct mosquitto *mosq, void *obj, int rc)
{
	printf("%s\n", __func__);
	if (rc != 0) {
		fprintf(stderr, "%s: rc: %d\n", __func__, rc);
	}
	write_eventfd();
}

int inner(void)
{
	int rc;

	struct mosquitto *client = NULL;

	client = mosquitto_new("testclient", true, NULL);
	if (!client) {
		goto bail;
	}

	mosquitto_connect_callback_set(client, connect_cb);
	rc = mosquitto_loop_start(client);
	if (rc != MOSQ_ERR_SUCCESS) {
		goto bail;
	}

	rc = mosquitto_connect_async(client, "localhost", 1883, 60);
	if (rc != MOSQ_ERR_SUCCESS) {
		goto bail;
	}
	printf("wait connection\n");
	read_eventfd();
	printf("connected\n");

	int disc_rc = mosquitto_disconnect(client);
#if 0 /* setting to 1 will almost always trigger case 2 below */
	sleep(1)
#endif
	int stop_rc = mosquitto_loop_stop(client, false);
	if (stop_rc != MOSQ_ERR_SUCCESS) {
		if (disc_rc != MOSQ_ERR_SUCCESS) {
			fprintf(stderr, "Failed to stop loop case 1, disc: %s, stop: %s\n",
				mosquitto_strerror(disc_rc), mosquitto_strerror(stop_rc));
		} else {
			fprintf(stderr, "Failed to stop loop case 2, stop: %s\n",
				mosquitto_strerror(stop_rc));
		}
	}

	mosquitto_destroy(client);

	return 0;
bail:
	if (client) {
		mosquitto_destroy(client);
	}
	return -1;
}

int main(int argc, char *argv[])
{
	int rc;

	efd = eventfd(0, 0);
	if (efd == -1) {
		goto bail;
	}

	rc = mosquitto_lib_init();
	if (rc != MOSQ_ERR_SUCCESS) {
		goto bail;
	}

	for (int i = 0; i < 10000; i++) {
		rc = inner();
		if (rc != 0) {
			fprintf(stderr, "other failure\n");
		}
	}

	mosquitto_lib_cleanup();
	close(efd);
	return EXIT_SUCCESS;
bail:
	if (efd > 0) {
		close(efd);
	}
	return EXIT_FAILURE;
}

I'm sorry if this was a bit too wordy. Thank you for the project!

@github-actions github-actions bot added the Status: Available No one has claimed responsibility for resolving this issue. label Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Available No one has claimed responsibility for resolving this issue.
Projects
None yet
Development

No branches or pull requests

1 participant