Skip to content

Commit

Permalink
fix(profiling): fix data race when accessing span for thread [backpor…
Browse files Browse the repository at this point in the history
…t 2.16] (#11210)

Backport 64b3374 from #11167 to 2.16.

The ThreadSpanLinks singleton holds the active span (if one exists) for
a given thread ID. The `get_active_span_from_thread_id` member function
returns a pointer to the active span for a thread. The `link_span`
member
function sets the active span for a thread.
`get_active_span_from_thread_id`
accesses the map of spans under a mutex, but returns the pointer after
releasing the mutex, meaning `link_span` can modify the members of the
Span while the caller of `get_active_span_from_thread_id` is reading
them.

Fix this by returning a copy of the `Span`. Use a `std::optional` to
wrap
the return value of `get_active_span_from_thread_id`, rather than
returning a pointer. We want to tell whether or not there actually was a
span associated with the thread, but returning a pointer would require
us to heap allocate the copy of the Span.

I added a simplistic regression test which fails reliably without this
fix
when built with the thread sanitizer enabled. Output like:

```
WARNING: ThreadSanitizer: data race (pid=2971510)
  Read of size 8 at 0x7b2000004080 by thread T2:
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__invoke_other, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe46e)
    #4 std::__invoke_result<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>::type std::__invoke<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*&&)()) <null> (thread_span_links+0xe2fe)
    #5 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe1cf)
    #6 std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> >::operator()() <null> (thread_span_links+0xe0f6)
    #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (*)()> > >::_M_run() <null> (thread_span_links+0xdf40)
    #8 <null> <null> (libstdc++.so.6+0xd6df3)

  Previous write of size 8 at 0x7b2000004080 by thread T1 (mutexes: write M47):
    #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:823 (libtsan.so.0+0x42313)
    #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:815 (libtsan.so.0+0x42313)
    #2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> > const&) <null> (libstdc++.so.6+0x1432b4)
    #3 get() <null> (thread_span_links+0xb570)
    #4 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) <null> (thread_span_links+0xe525)
    #5 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) <null> (thread_span_links+0xe3b5)
    #6 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (thread_span_links+0xe242)
    #7 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() <null> (thread_span_links+0xe158)
[ ... etc ... ]
```

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Nick Ripley <[email protected]>
Co-authored-by: Taegyun Kim <[email protected]>
  • Loading branch information
3 people authored Oct 30, 2024
1 parent 4079dab commit 61ef93b
Show file tree
Hide file tree
Showing 7 changed files with 95 additions and 7 deletions.
6 changes: 6 additions & 0 deletions ddtrace/internal/datadog/profiling/stack_v2/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -117,3 +117,9 @@ if(LIB_INSTALL_DIR)
ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
RUNTIME DESTINATION ${LIB_INSTALL_DIR})
endif()

if(BUILD_TESTING)
enable_testing()
add_subdirectory(test)
endif()

Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

#include <memory>
#include <mutex>
#include <optional>
#include <stdint.h>
#include <string>
#include <unordered_map>
Expand Down Expand Up @@ -36,7 +37,7 @@ class ThreadSpanLinks
ThreadSpanLinks& operator=(ThreadSpanLinks const&) = delete;

void link_span(uint64_t thread_id, uint64_t span_id, uint64_t local_root_span_id, std::string span_type);
const Span* get_active_span_from_thread_id(uint64_t thread_id);
const std::optional<Span> get_active_span_from_thread_id(uint64_t thread_id);
void reset();

static void postfork_child();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ StackRenderer::render_thread_begin(PyThreadState* tstate,
ddup_push_threadinfo(sample, static_cast<int64_t>(thread_id), static_cast<int64_t>(native_id), name);
ddup_push_walltime(sample, thread_state.wall_time_ns, 1);

const Span* active_span = ThreadSpanLinks::get_instance().get_active_span_from_thread_id(thread_id);
if (active_span != nullptr) {
const std::optional<Span> active_span = ThreadSpanLinks::get_instance().get_active_span_from_thread_id(thread_id);
if (active_span) {
ddup_push_span_id(sample, active_span->span_id);
ddup_push_local_root_span_id(sample, active_span->local_root_span_id);
ddup_push_trace_type(sample, std::string_view(active_span->span_type));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

#include <iostream>
#include <mutex>
#include <optional>
#include <stdint.h>
#include <string>

Expand All @@ -19,15 +20,17 @@ ThreadSpanLinks::link_span(uint64_t thread_id, uint64_t span_id, uint64_t local_
thread_id_to_span[thread_id]->span_type = span_type;
}

const Span*
const std::optional<Span>
ThreadSpanLinks::get_active_span_from_thread_id(uint64_t thread_id)
{
std::lock_guard<std::mutex> lock(mtx);

if (thread_id_to_span.find(thread_id) == thread_id_to_span.end()) {
return nullptr;
std::optional<Span> span;
auto it = thread_id_to_span.find(thread_id);
if (it != thread_id_to_span.end()) {
span = *(it->second);
}
return thread_id_to_span[thread_id].get();
return span;
}

void
Expand Down
25 changes: 25 additions & 0 deletions ddtrace/internal/datadog/profiling/stack_v2/test/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
FetchContent_Declare(
googletest
GIT_REPOSITORY https://github.com/google/googletest.git
GIT_TAG release-1.11.0)
set(gtest_force_shared_crt
ON
CACHE BOOL "" FORCE)
set(INSTALL_GTEST
OFF
CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(googletest)
include(GoogleTest)
include(AnalysisFunc)

function(dd_wrapper_add_test name)
add_executable(${name} ${ARGN})
target_include_directories(${name} PRIVATE ../include)
target_link_libraries(${name} PRIVATE gmock gtest_main _stack_v2)
add_ddup_config(${name})

gtest_discover_tests(${name})
endfunction()

# Add the tests
dd_wrapper_add_test(thread_span_links thread_span_links.cpp)
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
#include <gtest/gtest.h>

#include <string>
#include <thread>

#include "thread_span_links.hpp"

static void
get()
{
for (int i = 0; i < 100; i++) {
std::string span_type;
for (int j = 0; j < i; j++) {
span_type.append("a");
}
Datadog::ThreadSpanLinks::get_instance().link_span(42, 1, 2, span_type);
}
}

static std::string
set()
{
std::string s;
for (int i = 0; i < 100; i++) {
auto thing = Datadog::ThreadSpanLinks::get_instance().get_active_span_from_thread_id(42);
if (!thing) {
continue;
}
s = thing->span_type;
}
return s;
}

TEST(ThreadSpanLinksConcurrency, GetSetRace)
{
std::thread t1(get);
std::thread t2(set);
t1.join();
t2.join();
}

int
main(int argc, char** argv)
{
::testing::InitGoogleTest(&argc, argv);
(void)(::testing::GTEST_FLAG(death_test_style) = "threadsafe");
return RUN_ALL_TESTS();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
fixes:
- |
profiling: fix a data race where span information associated with a thread
was read and updated concurrently, leading to segfaults

0 comments on commit 61ef93b

Please sign in to comment.