Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Program crashes with connection pooling enabled #141

Open
Tagl opened this issue May 16, 2023 · 25 comments
Open

Program crashes with connection pooling enabled #141

Tagl opened this issue May 16, 2023 · 25 comments

Comments

@Tagl
Copy link

Tagl commented May 16, 2023

I have connection pooling enabled with the Microsoft ODBC Driver for SQL Server.
I have a multithreaded application where each thread except the main thread opens and closes connections in sequence.
It seems that the same pooled connection is being closed twice in the pool search.
This causes the program to crash in close_pooled_connection with the error displayed as: double free detected in tcache 2
Without connection pooling, there is no issue.

I'm using C++ with another library, SQLAPI++, but following the problem led me to the closed_pooled_connection function here.
I have tried isolating the connection code within my program with a mutex, since I thought it was a thread safety issue originally, but the issue seems to occur even with isolation.

My odbcinst.ini file:

[ODBC]
Pooling=Yes

[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.1.1
UsageCount=1
CPTimeout=120
#0  0x00007ffff6a8e88d in ?? () from /opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.2.1
No symbol table info available.
#1  0x00007ffff6a9410f in SQLFreeHandle () from /opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.2.1
No symbol table info available.
#2  0x00007ffff6eda700 in close_pooled_connection (ptr=0x7fffec036a40) at SQLConnect.c:2924
        ret = 0
        conn = 0x7fffec036a68
#3  0x00007ffff6edaefa in search_for_pool (connection=0x7fffa40008e0, server_name=0x0, name_length1=0, user_name=0x0, name_length2=0, authentication=0x0, name_length3=0,
    connect_string=0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****", connect_string_length=-3, pooh=0x7fff657cb018, retrying=0) at SQLConnect.c:3290
        ret = -11168
        conn_match = 1
        current_time = 1684236577
        dead = 0
        ptrh = 0x7ffff0007010
        prevh = 0x0
        match_head = 0x7ffff0007010
        ptre = 0x7fffec036a40
        preve = 0x7fffe0036a80
        has_checked = 0
#4  0x00007ffff6ee2166 in SQLDriverConnect (hdbc=0x7fffa40008e0, hwnd=0x0, conn_str_in=0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****", len_conn_str_in=-3,
    conn_str_out=0x7fff657d1810 "", conn_str_out_max=1024, ptr_conn_str_out=0x7fff657d17cc, driver_completion=0) at SQLDriverConnect.c:905
        retpool = 0
        retrying = 0
        wait_begin = 1684236577
        connection = 0x7fffa40008e0
        con_struct = {count = 5, list = 0x7fffa405c5b0}
        driver = 0x0
        dsn = 0x0
        filedsn = 0x0
        tsavefile = 0x0
        savefile = '\000' <repeats 1000 times>
        lib_name = '\000' <repeats 1000 times>
        driver_name = '\000' <repeats 1000 times>
        ret_from_connect = 0
        s1 = '\000' <repeats 2047 times>
        local_conn_str_in = '\000' <repeats 2047 times>
        local_out_conection = '\000' <repeats 2047 times>
        save_filedsn = 0x0
        warnings = 0
        pooh = 0x0
#5  0x00007ffff6ee1a16 in SQLDriverConnectA (hdbc=0x7fffa40008e0, hwnd=0x0, conn_str_in=0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****", len_conn_str_in=-3,
    conn_str_out=0x7fff657d1810 "", conn_str_out_max=1024, ptr_conn_str_out=0x7fff657d17cc, driver_completion=0) at SQLDriverConnect.c:669
No locals.
#6  0x000055555557af8d in IodbcConnection::Connect (this=0x7fffa40337c0, sDBString=..., sUserID=..., sPassword=..., fHandler=0x0) at odbcClient.cpp:526
        s = {m_pchData = 0x7fffa40129e0 "Server=****;Database=****;Driver={ODBC Driver 17 for SQL Server};UID=****;PWD=****"}
        OutConnectionString = '\000' <repeats 664 times>...
        StringLength2 = 0
        sOption = {m_pchData = 0x5555555983d0 <_saInitData+48> ""}
        uDriverCompletion = 0
        __PRETTY_FUNCTION__ = "virtual void IodbcConnection::Connect(const SAString&, const SAString&, const SAString&, saConnectionHandler_t)"
        hdbc = 0x7fffa40008e0
#7  0x000055555555f90c in SAConnection::Connect (this=0x7fff657d1cf0, sDBString=..., sUserID=..., sPassword=..., eSAClient=SA_ODBC_Client, fHandler=0x0) at SQLAPI.cpp:1622
        pISAConnection = 0x7fffa40337c0
#8  0x0000555555559ca0 in connect_task (pool_id=10, thread_id=208) at sqltest.cpp:34
        random_wait = {__r = 140735945232960}
        con = {<SAOptions> = {_vptr.SAOptions = 0x555555597060 <vtable for SAConnection+16>, m_pOptions = 0x7fffa405c5e0}, m_pSAPI = 0x7ffff0000d10, m_pCommandsMutex = 0x7fffa4033800, m_pCommands = 0x0,
          m_eIsolationLevel = SA_LevelUnknown, m_eAutoCommit = SA_AutoCommitUnknown, nReserved = 0}
        j = 0
#9  0x000055555555c363 in std::__invoke_impl<void, void (*)(int, int), int, int> (__f=@0x5555555b8a10: 0x555555559b69 <connect_task(int, int)>) at /usr/include/c++/11/bits/invoke.h:61
No locals.
#10 0x000055555555c2a0 in std::__invoke<void (*)(int, int), int, int> (__fn=@0x5555555b8a10: 0x555555559b69 <connect_task(int, int)>) at /usr/include/c++/11/bits/invoke.h:96
No locals.
#11 0x000055555555c1bf in std::thread::_Invoker<std::tuple<void (*)(int, int), int, int> >::_M_invoke<0ul, 1ul, 2ul> (this=0x5555555b8a08) at /usr/include/c++/11/bits/std_thread.h:253
No locals.
#12 0x000055555555c158 in std::thread::_Invoker<std::tuple<void (*)(int, int), int, int> >::operator() (this=0x5555555b8a08) at /usr/include/c++/11/bits/std_thread.h:260
No locals.
#13 0x000055555555c138 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(int, int), int, int> > >::_M_run (this=0x5555555b8a00) at /usr/include/c++/11/bits/std_thread.h:211
No locals.
--Type <RET> for more, q to quit, c to continue without paging--c
#14 0x00007ffff7e5d2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#15 0x00007ffff7b06b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488346848, 1118859342958286079, 140734896088640, 0, 140737348921424, 140737488347200, -1119084726718049025, -1118841299216771841}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#16 0x00007ffff7b98a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I've tried unixODBC versions 2.3.9, 2.3.11 and 2.3.12pre, issue is present in all of them.
I'm on Ubuntu 22.04, but tested on RHEL9 as well, issue present on both.

@lurcher
Copy link
Owner

lurcher commented May 16, 2023

Do you have your sample code that I can try to help track down the problem?

@lurcher
Copy link
Owner

lurcher commented May 16, 2023 via email

@Tagl
Copy link
Author

Tagl commented May 16, 2023

Yes, sorry for not including that in the original post. Here is a sample program that creates multiple threads, each with a connection.
It also uses the library I mentioned before.
I've run something similar to this (tweaked number of thread pools / threads / iterations per thread / added mutexes) a hundred times and almost every single time it crashes due to this error within 15 minutes.
I have not noticed any particular pattern regarding when it crashes, seems rather random, but it's always in the same code path.
I have closed source software in which I discovered this where it generally crashes after around 5 minutes.

#include <SQLAPI.h> // main SQLAPI++ header

#include <algorithm>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

using namespace std;

void connect_task(int pool_id, int thread_id) {
    for (int j = 0; j < 1; j++) {
        SAConnection con;
        try {
            con.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}", "someuser", "somepass", SA_ODBC_Client);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);

            if (con.isConnected()) {
                con.Disconnect();
            }
        }
        catch (SAException& x)
        {
            try
            {
                con.Rollback();
            }
            catch (SAException&)
            {
            }
            cout << "Error text: " << x.ErrText().GetMultiByteChars() << endl;
        }
        catch (...)
        {
            cout << "unknown exception" << endl;
        }
    }
}

int main(int argc, char* argv[])
{
    setlocale(LC_ALL, "");

    for (int j = 0; j < 1000; j++) {
        vector<std::thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.emplace_back(connect_task, j, i);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
        }
        for (auto& t : threads) {
            t.join();
        }
    }
        return 0;
}

@lurcher
Copy link
Owner

lurcher commented May 16, 2023 via email

@lurcher
Copy link
Owner

lurcher commented May 17, 2023 via email

@mcordova1967
Copy link

mcordova1967 commented May 17, 2023 via email

@lurcher
Copy link
Owner

lurcher commented May 17, 2023 via email

@mcordova1967
Copy link

mcordova1967 commented May 17, 2023 via email

@lurcher
Copy link
Owner

lurcher commented May 18, 2023 via email

@Tagl
Copy link
Author

Tagl commented May 18, 2023

Thank you very much for the quick responses.
I can raise the concerns to the maintainer of SQLAPI.

I'll try the updated version later today.

@lurcher
Copy link
Owner

lurcher commented May 18, 2023 via email

@Tagl
Copy link
Author

Tagl commented May 18, 2023

That's great, I'll make sure to turn on tracing to try to spot it.

@lurcher
Copy link
Owner

lurcher commented May 18, 2023 via email

@Tagl
Copy link
Author

Tagl commented May 19, 2023

I've tried it and now it no longer crashes but it seems as if connection pooling isn't really active. Before the fix, "connecting" to the database usually took 0ms since the connection was already alive, now with the same settings connecting seems to take 7-13ms which is the same speed I saw before enabling connection pooling.

I feel like the changes you described here should not have serious performance impacts so a new connection is being established each time. Any idea why that would happen after these changes?

I have the same config as in the original post and I have it both in /etc/odbcinst.ini and /usr/local/etc/odbcinst.ini

@lurcher
Copy link
Owner

lurcher commented May 19, 2023 via email

@lurcher
Copy link
Owner

lurcher commented May 20, 2023 via email

@Tagl
Copy link
Author

Tagl commented May 20, 2023

I will try it on Monday morning as I'm on holiday right now. I can avoid the usage count going down by maintaining an extra SAConnection instance as a means of "Keep Alive" for the pool. Worked with the sample code at least.

@Tagl
Copy link
Author

Tagl commented May 22, 2023

Try the one there now (on ftp) this implements a shared env handle when pooling is used. It does mean that the env and pooled connection will be there on program exit, but it cant release these if the shard env is in place as there is no real end.

The new version you uploaded works as well. No extra connection object required on my end to keep the pool alive.

@lurcher
Copy link
Owner

lurcher commented May 22, 2023 via email

@Tagl
Copy link
Author

Tagl commented May 22, 2023

Just what change did you make to the test code to generate the extra
connection? I would be interested in seeing if it fixes the use after
release problem I saw in the SQLAPI lib. The shared env does hid this,
but doesn't address the underlying issue. I am guessing the windows
shared env has hidden the problem up to now to the lib maintainer.

Just change the main function like so:

int main(int argc, char* argv[])
{
    setlocale(LC_ALL, "");
    SAConnection keepAlive;
    keepAlive.Connect("Server=someserver;Database=somedb;Driver={ODBC Driver 17 for SQL Server}", "someuser", "somepass", SA_ODBC_Client);
    for (int j = 0; j < 1000; j++) {
        vector<std::thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.emplace_back(connect_task, j, i);
            auto random_wait = std::chrono::milliseconds(rand() % 100);
            this_thread::sleep_for(random_wait);
        }
        for (auto& t : threads) {
            t.join();
        }
    }
    keepAlive.Disconnect();
    return 0;
}

@lurcher
Copy link
Owner

lurcher commented May 22, 2023 via email

@lurcher
Copy link
Owner

lurcher commented May 31, 2023 via email

@Tagl
Copy link
Author

Tagl commented Jun 2, 2023

With the keepalive connection changes in my company's software we tried running a stress test with v2.3.11 and v2.3.12pre.
Both ran without crashing but I saw the memory leak mentioned before on 2.3.11 and the test was cut short due to the machine running out of memory. I also ran 2.3.12pre without keepalive changes and it did not crash.
Both the leak and the crash seems to be completely resolved in 2.3.12pre.
Thanks for the quick responses, issue is resolved.

@dertin
Copy link

dertin commented Jul 2, 2023

I am interested in this fix. Is there an estimate when v2.3.12 will be released? Thanks!

@lurcher
Copy link
Owner

lurcher commented Jul 2, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants