Program crashes with connection pooling enabled #141
Comments
Do you have your sample code that I can try to help track down the problem? |
On 16/05/2023 14:16, Arnar Bjarni Arnarson wrote:
I have connection pooling enabled with the Microsoft ODBC Driver for
SQL Server.
I have a multithreaded application where each thread except the main
thread opens and closes connections in sequence.
It seems that the same pooled connection is being closed twice in the
pool search.
This causes the program to crash in close_pooled_connection with the
error displayed as: double free detected in tcache 2
Without connection pooling, there is no issue.
I'm using C++ with another library, SQLAPI++, but following the
problem led me to the close_pooled_connection function here.
I have tried isolating the connection code within my program with a
mutex, since I thought it was a thread safety issue originally, but
the issue seems to occur even with isolation.
My odbcinst.ini file:
Do you have a way I can reproduce the problem? Then I will see if I can
find a fix for it.
|
Yes, sorry for not including that in the original post. Here is a sample program that creates multiple threads, each with a connection. It also uses the library I mentioned before.
I've run something similar to this (tweaked number of thread pools / threads / iterations per thread / added mutexes) a hundred times, and almost every single time it crashes due to this error within 15 minutes. I have not noticed any particular pattern regarding when it crashes; it seems rather random, but it's always in the same code path.
I have closed source software in which I discovered this, where it generally crashes after around 5 minutes.

#include <SQLAPI.h>  // main SQLAPI++ header
#include <algorithm>
#include <chrono>
#include <clocale>   // setlocale()
#include <cstdlib>   // rand()
#include <iostream>
#include <thread>
#include <vector>

using namespace std;

void connect_task(int pool_id, int thread_id) {
    for (int j = 0; j < 1; j++) {
        SAConnection con;
        try {
            con.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}", "someuser", "somepass", SA_ODBC_Client);
            // hold the connection for a random few milliseconds before
            // returning it to the pool (rand() is not thread safe, but
            // it is good enough for this repro)
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
            if (con.isConnected()) {
                con.Disconnect();
            }
        }
        catch (SAException& x) {
            try {
                con.Rollback();
            }
            catch (SAException&) {
            }
            cout << "Error text: " << x.ErrText().GetMultiByteChars() << endl;
        }
        catch (...) {
            cout << "unknown exception" << endl;
        }
    }
}

int main(int argc, char* argv[])
{
    setlocale(LC_ALL, "");
    for (int j = 0; j < 1000; j++) {
        vector<std::thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.emplace_back(connect_task, j, i);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
        }
        for (auto& t : threads) {
            t.join();
        }
    }
    return 0;
}
|
Ok, I can reproduce it; it looks like a race condition once the DM
starts to expire connections from the pool. Will see what I can work out.
|
OK, so I think I can see the problem. It's one where I think I need some
opinions. The basic issue is that SQLAPI issues multiple calls to
SQLAllocEnv and then does pooling. As it stands, the DM tries to cope
with multiple envs plus pooling, but it's basically failing. The failure
starts once the pool entries start to time out. Once they time out, the
DM starts to disconnect them, but the disconnect gets mixed up between
the various env handles. There is a lot of code in there to try and
allow pooling, multiple envs and multiple drivers to be mixed and
matched. Windows seems to avoid the problem by forcing a shared
environment handle when pooling is used.
I can add shared handles, but the DM also tries very hard to avoid
people reporting that there is still memory allocated and in use when
the program exits, so it's all back to counting usages.
There is a lot of code around pooling that looks a lot more complex than
it probably needs to be. We have things like pooling timeouts and usage
counts that I think Windows avoids. I think a lot of this can be
simplified if we implement shared env handles when pooling is enabled.
Part of the reported problem is a combination of release_env being
called when connections reach the pooling timeout value.
What do folk think about this, and suggestions for directions of travel?
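To make the failure mode concrete, here is a minimal sketch of the pattern being described (my reconstruction, not SQLAPI's actual code): pooling enabled, but each worker allocating and freeing its own environment handle.

#include <sql.h>
#include <sqlext.h>

void worker(void)
{
    SQLHENV env;
    SQLHDBC dbc;

    /* process-wide pooling attribute, set with a null handle */
    SQLSetEnvAttr(SQL_NULL_HANDLE, SQL_ATTR_CONNECTION_POOLING,
                  (SQLPOINTER) SQL_CP_ONE_PER_HENV, 0);

    /* each worker gets its own env, so the DM has to reconcile the
       pool across several environment handles at once */
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    /* ... SQLDriverConnect, work, SQLDisconnect (connection returns
       to the pool) ... */

    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);  /* env released while its
                                            connections may still sit
                                            in the pool */
}

With a single shared env, as Windows forces when pooling is on, the pool only ever belongs to one environment, which is the simplification being suggested here.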
|
Hello, I helped locate a memory leak in unixODBC about a year ago while
using my multithreaded server with ODBC connection pooling. I am no longer
using that multithreaded server with connection pooling, but I recall that
I had no problem using the ODBC API following the Microsoft ODBC docs'
advice for connection pooling. I had a stress test in those days, so I
could check with Valgrind that the memory leak was gone, and there was no
error on exiting the program after running thousands of requests and
exercising the pool heavily.
You must set SQL_ATTR_CONNECTION_POOLING and share one hEnv among threads;
in my case I used a global variable hEnv:

rc = SQLSetEnvAttr( NULL, SQL_ATTR_CONNECTION_POOLING,
                    (SQLPOINTER)SQL_CP_ONE_PER_HENV, 0 );
if ( rc != SQL_SUCCESS )
    std::cerr << "[ODBC] Failed to set connection pooling option" << "\n";

rc = SQLAllocHandle( SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv );
if ( rc != SQL_SUCCESS ) {
    std::cerr << "[ODBC] Failed to allocate shared environment" << "\n";
    throw std::runtime_error("Cannot initialize ODBC");
}

I don't know if this sheds some light on what may be happening with that
SQLAPI library, but I did not have any problem using the ODBC API and
threads, with each thread managing its own connection using the shared
hEnv; this hEnv was freed on program exit.
Regards,
Martin
|
On 17/05/2023 12:50, Martín Córdova wrote:
> You must set SQL_ATTR_CONNECTION_POOLING and a shared hEnv among
> threads, in my case I used a global variable hEnv:
Yep, in this case the lib is using multiple Envs.
|
I remember that the only limitation, relative to the ODBC docs, was that
unixODBC does not support SQL_CP_ONE_PER_DRIVER; the constant does not
exist in the unixODBC headers.
Reference:
https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlsetenvattr-function?view=sql-server-ver16
|
Ok, so I have been looking at this, and have found and fixed a couple of
problems in the DM that the test code shows. I have not put that back in
git yet, but it's on the ftp site now if the OP wants to try it.
However, I think there is also a race condition in SQLAPI that is
causing fatal problems the DM is unlikely to be able to catch. I see the
application, via the lib, calling SQLFreeHandle on an env handle, and
then using the same (now freed) handle to call SQLAllocHandle and
SQLDriverConnect. It's possible that it gets away with this (more by
luck than judgement) on Windows because of the shared env handle used
with pooling. I have a snippet of a DM log that shows the problem
happening:
[ODBC][1701236][1684403048.206901][SQLFreeHandle.c][220]
        Entry:
            Handle Type = 1
            Input Handle = 0x7f7014003a50
[ODBC][1701236][1684403048.207035][SQLAllocHandle.c][395]
        Entry:
            Handle Type = 2
            Input Handle = 0x7f7014003a50
[ODBC][1701236][1684403048.207059][SQLAllocHandle.c][511]
        Exit:[SQL_SUCCESS]
            Output Handle = 0x7f6ffc000d40
[ODBC][1701236][1684403048.207093][SQLDriverConnect.c][751]
        Entry:
            Connection = 0x7f6ffc000d40
I am assuming that, because of the threaded nature of the application,
the code in SQLFreeHandle has not yet got to the point of releasing the
handle, so the handle validation code hasn't had a chance to mark the
handle as invalid, and it isn't caught on the following calls. But I
think it's a question to ask the SQLAPI folk. As it's a closed source
commercial lib, I guess it's best that the OP asks them the question in
the first instance.
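In API terms, the trace corresponds to something like the following (a sketch of the misuse as I read the log, not SQLAPI's actual source):

#include <sql.h>
#include <sqlext.h>

int main(void)
{
    SQLHENV env;
    SQLHDBC dbc;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, 0);

    SQLFreeHandle(SQL_HANDLE_ENV, env);        /* env released here... */

    /* ...but a racing thread still holds the old value and carries on */
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc); /* alloc on a freed env */

    return 0;
}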
|
Thank you very much for the quick responses. I can raise the concerns to the maintainer of SQLAPI. I'll try the updated version later today. |
Thanks. The version on ftp now will catch this condition and report to
stderr that it has happened, before returning SQL_INVALID_HANDLE.
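From the application side, the misuse should then surface as an error return rather than a crash; roughly (a fragment continuing the earlier sketch, assuming direct ODBC calls rather than SQLAPI):

SQLHDBC dbc;
SQLRETURN ret = SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc); /* env already freed */
if (ret == SQL_INVALID_HANDLE) {
    /* the DM has rejected the stale env handle and written a
       diagnostic to stderr, instead of crashing in the pool code */
}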
|
That's great, I'll make sure to turn on tracing to try to spot it. |
On 18/05/2023 11:38, Arnar Bjarni Arnarson wrote:
That's great, I'll make sure to turn on tracing to try to spot it.
It will still catch and display the error without tracing, though it
will also happen with tracing on. Part of the problem is that, being a
race condition, adding tracing may alter the order these things happen
in and when they happen.
|
I've tried it, and now it no longer crashes, but it seems as if connection pooling isn't really active. Before the fix, "connecting" to the database usually took 0 ms since the connection was already alive; now, with the same settings, connecting seems to take 7-13 ms, which is the same speed I saw before enabling connection pooling. I feel like the changes you described here should not have serious performance impacts, so a new connection is being established each time. Any idea why that would happen after these changes? I have the same config as in the original post, and I have it in both /etc/odbcinst.ini and /usr/local/etc/odbcinst.ini. |
On 19/05/2023 15:11, Arnar Bjarni Arnarson wrote:
What I see the test application (via the lib) doing is: it creates an
environment, then creates a couple of connections on that env, then
disconnects them so they return to the pool, then releases the
environment. What I have changed is that releasing the application
environment now also releases its pooled connections. It then creates a
new application env and a few connections, closes them and releases the
env. So there is some pooling going on, but releasing the environment
then clears that collection of connections out.
Now, it may be that pooled connections could be safely shared between
application environments, but the change I made prevents this from
happening at the moment, as it was partly this that was causing the
crash.
I suspect (but can't know, as it's closed source) that SQLAPI gets away
with mixing and matching application environments on Windows because the
environment is shared. I did try adding shared environments with usage
counting, but the lib runs the usage count down to zero, so the DM would
release the env as it does now; there is no real advantage.
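So the lifecycle with the fix looks roughly like this (a sketch of what the lib appears to be doing, not its actual code):

#include <sql.h>
#include <sqlext.h>

void batch(void)
{
    SQLHENV env;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);

    /* ... connect / disconnect: connections return to this env's pool ... */

    SQLFreeHandle(SQL_HANDLE_ENV, env);
    /* with the change, this release also drops the env's pooled
       connections, so the next batch starts with an empty pool and
       pays the full connect cost again (hence the 7-13 ms connects) */
}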
|
Try the one there now (on ftp); this implements a shared env handle when
pooling is used. It does mean that the env and pooled connections will
still be there on program exit, but it can't release these if the shared
env is in place, as there is no real end point.
|
I will try it on Monday morning as I'm on holiday right now. I can avoid the usage count going down by maintaining an extra SAConnection instance as a means of "Keep Alive" for the pool. It worked with the sample code at least. |
The new version you uploaded works as well. No extra connection object required on my end to keep the pool alive. |
OK, so the latest version creates a single env if pooling is in use; I
think this matches what happens on Windows. But the one downside is that
it's then not possible to close things down before exit. Given that the
shared env stuff is enabled by defining SHARED_POOLED_ENV in
drivermanager.h, that you have a solution to the same problem, and that
my instinct is always to do the least possible to address problems, I
think I would leave it undefined; you can either make your own build or
use your extra connection object.
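For reference, making your own build with the shared-env behaviour would mean enabling the define locally, something like this (a sketch; whether anything beyond the header needs touching is an assumption):

/* in DriverManager/drivermanager.h, before rebuilding and reinstalling */
#define SHARED_POOLED_ENV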
I think the main body of changes, however, does address the original
problem that was causing the seg fault.
Just what change did you make to the test code to generate the extra
connection? I would be interested in seeing if it fixes the use-after-
release problem I saw in the SQLAPI lib. The shared env does hide this,
but doesn't address the underlying issue. I am guessing the Windows
shared env has hidden the problem from the lib maintainer up to now.
Will leave it a bit and then push back into git, unless anyone else has
a view on this?
|
Just change the main function like so:

int main(int argc, char* argv[])
{
    setlocale(LC_ALL, "");

    // hold one connection open for the whole run so the pool's usage
    // count never drops to zero
    SAConnection keepAlive;
    keepAlive.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}", "someuser", "somepass", SA_ODBC_Client);

    for (int j = 0; j < 1000; j++) {
        vector<std::thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.emplace_back(connect_task, j, i);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
        }
        for (auto& t : threads) {
            t.join();
        }
    }

    keepAlive.Disconnect();
    return 0;
}
|
On 22/05/2023 11:44, Arnar Bjarni Arnarson wrote:
SAConnection keepAlive;
keepAlive.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}","someuser","somepass", SA_ODBC_Client);
Thanks.
|
I have pushed back the changes that prevent the seg fault with SQLAPI
and this set of operations.
|
With the keepalive connection changes in my company's software, we tried running a stress test with v2.3.11 and v2.3.12pre. |
I am interested in this fix. Is there an estimate when v2.3.12 will be released? Thanks! |
Not at the moment, but the code is there to be used.
|
I have connection pooling enabled with the Microsoft ODBC Driver for SQL Server.
I have a multithreaded application where each thread except the main thread opens and closes connections in sequence.
It seems that the same pooled connection is being closed twice in the pool search.
This causes the program to crash in close_pooled_connection with the error displayed as: double free detected in tcache 2
Without connection pooling, there is no issue.
I'm using C++ with another library, SQLAPI++, but following the problem led me to the close_pooled_connection function here.
I have tried isolating the connection code within my program with a mutex, since I thought it was a thread safety issue originally, but the issue seems to occur even with isolation.
My odbcinst.ini file:
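(The file contents were not captured in this thread. For context, a typical unixODBC odbcinst.ini with pooling enabled for this driver looks something like the sketch below; the driver path and timeout value are assumptions, not the reporter's actual settings.)

[ODBC]
Pooling = Yes

[ODBC Driver 17 for SQL Server]
Description = Microsoft ODBC Driver 17 for SQL Server
Driver = /opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.so
UsageCount = 1
CPTimeout = 60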
I've tried unixODBC versions 2.3.9, 2.3.11 and 2.3.12pre; the issue is present in all of them.
I'm on Ubuntu 22.04, but I tested on RHEL9 as well; the issue is present on both.