Hello,
EDIT: I've just noticed that I overlooked #2905 and #2377, which report the same behavior as one of the cases below, where mosquitto_loop_stop can sometimes return MOSQ_ERR_INVAL.
In an application that uses mosquitto's threaded client interface (Linux with pthreads), I've noticed that mosquitto_loop_stop can sometimes return MOSQ_ERR_INVAL when mosquitto_disconnect and mosquitto_loop_stop are called in that order. In that case the spawned thread is not joined, and in an application that issues many mosquitto_loop_start and mosquitto_loop_stop calls over its lifetime, the leaked memory accumulates over time.
Not being familiar with the internals, I took a brief look at it (with some additional insight from a colleague who is more familiar with mosquitto) and noticed that mosquitto_loop_stop is probably prone to racy behavior, and that the change in 0d1837e that addressed #2242 is relevant.
From what I figured, the thread exit is initiated by mosquitto_disconnect, and while exiting in mosquitto__thread_main the spawned thread sets the handle's threaded field to mosq_ts_none. Meanwhile, in the other thread, mosquitto_loop_stop checks for mosq->threaded != mosq_ts_self. From what I can tell, if the spawned thread exits before this check, mosquitto_loop_stop will always return MOSQ_ERR_INVAL and not join the thread, which will always leak memory. This is triggered by the race between calling disconnect and then stop.
While looking into this, I ran into another case where mosquitto_loop_stop returns MOSQ_ERR_INVAL. It occurs in a test application (added below) that does the following in a loop:
mosquitto_loop_start
mosquitto_connect_async
wait for mosquitto connect callback
mosquitto_disconnect
mosquitto_loop_stop
In rare cases mosquitto_disconnect returns MOSQ_ERR_NO_CONN, the thread has already exited, and mosquitto_loop_stop once again returns MOSQ_ERR_INVAL, so the thread is not joined and its resources are leaked. I couldn't figure out why mosquitto_disconnect returns MOSQ_ERR_NO_CONN in this case.
I've managed to reproduce both cases on the current master (c85313d) and on the latest tagged release (v2.0.20), on Ubuntu 22.04 with gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0.
I built mosquitto with cmake, after which I start the broker with pretty much the default config:
# adjust as needed
log_dest stderr
log_type error
# log_type warning
# log_type notice
# log_type information
# If mosquitto is started as root then it will change to this user.
# If the "mosquitto" user does not exist it will try to change to
# the "nobody" user.
# If mosquitto is not started as root then this is ignored.
user mosquitto
allow_anonymous true
With the broker running in the background, I run the test application below with stdout redirected to /dev/null (> /dev/null). In error cases it prints one of the following to stderr: "Failed to stop loop case 1, disc: The client is not currently connected., stop: Invalid arguments provided." or "Failed to stop loop case 2, stop: Invalid arguments provided.". Case 1, where disconnect returns NO_CONN, is a lot rarer, but it does occur in my environment.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <inttypes.h>
#include <unistd.h>
#include <assert.h>
#include <sys/eventfd.h>
#include "mosquitto.h"

/* eventfd used to signal the main thread from the connect callback */
static int efd;

static void write_eventfd()
{
    int rc;
    uint64_t one = 1;

    rc = write(efd, &one, sizeof(one));
    assert(rc >= 0);
}

static void read_eventfd()
{
    uint64_t v;
    ssize_t len;

    len = read(efd, &v, sizeof(v));
    assert(len == sizeof(v));
    assert(1 == v);
}

static void connect_cb(struct mosquitto *mosq, void *obj, int rc)
{
    printf("%s\n", __func__);
    if (rc != 0) {
        fprintf(stderr, "%s: rc: %d\n", __func__, rc);
    }
    write_eventfd();
}

/* One start / connect / disconnect / stop cycle */
int inner(void)
{
    int rc;
    struct mosquitto *client = NULL;

    client = mosquitto_new("testclient", true, NULL);
    if (!client) {
        goto bail;
    }
    mosquitto_connect_callback_set(client, connect_cb);
    rc = mosquitto_loop_start(client);
    if (rc != MOSQ_ERR_SUCCESS) {
        goto bail;
    }
    rc = mosquitto_connect_async(client, "localhost", 1883, 60);
    if (rc != MOSQ_ERR_SUCCESS) {
        goto bail;
    }
    printf("wait connection\n");
    read_eventfd();
    printf("connected\n");
    int disc_rc = mosquitto_disconnect(client);
#if 0 /* setting to 1 will almost always trigger case 2 below */
    sleep(1);
#endif
    int stop_rc = mosquitto_loop_stop(client, false);
    if (stop_rc != MOSQ_ERR_SUCCESS) {
        if (disc_rc != MOSQ_ERR_SUCCESS) {
            fprintf(stderr, "Failed to stop loop case 1, disc: %s, stop: %s\n",
                    mosquitto_strerror(disc_rc), mosquitto_strerror(stop_rc));
        } else {
            fprintf(stderr, "Failed to stop loop case 2, stop: %s\n",
                    mosquitto_strerror(stop_rc));
        }
    }
    mosquitto_destroy(client);
    return 0;
bail:
    if (client) {
        mosquitto_destroy(client);
    }
    return -1;
}

int main(int argc, char *argv[])
{
    int rc;

    efd = eventfd(0, 0);
    if (efd == -1) {
        goto bail;
    }
    rc = mosquitto_lib_init();
    if (rc != MOSQ_ERR_SUCCESS) {
        goto bail;
    }
    /* Repeat the cycle many times to hit the rare failure cases */
    for (int i = 0; i < 10000; i++) {
        rc = inner();
        if (rc != 0) {
            fprintf(stderr, "other failure\n");
        }
    }
    mosquitto_lib_cleanup();
    close(efd);
    return EXIT_SUCCESS;
bail:
    if (efd > 0) {
        close(efd);
    }
    return EXIT_FAILURE;
}
I'm sorry if this was a bit too wordy. Thank you for the project!