feat: Add TaskContext and Data as task argument #40
Conversation
Walkthrough
This pull request introduces several enhancements to the Spider project's architecture, focusing on data management, task context, and storage interactions. Key modifications include the addition of new header files, updates to library linking, and changes to how tasks receive a TaskContext and Data arguments.
Actionable comments posted: 5
🧹 Nitpick comments (20)
src/spider/client/Data.hpp (1)
102-124
: build method: Data creation and storage addition.
• The switch over DataSource is correct.
• Good job raising exceptions on storage failures.
• Consider verifying that the data size is not too large for the store in case of resource limitations.
src/spider/worker/task_executor.cpp (2)
86-89
: Require storage_url presence.
Returning cCmdArgParseErr if not present is appropriate. Consider also logging a descriptive error when “storage_url” is missing, to aid debugging.
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
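A minimal sketch of the suggestion above, assuming a simple key-value view of the parsed arguments; `require_storage_url` and the map-based interface are hypothetical stand-ins for the executor's real argument handling:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: surface a descriptive error when --storage_url is
// missing, rather than silently failing with a bare parse-error code.
inline std::string require_storage_url(
        std::map<std::string, std::string> const& args
) {
    auto const it = args.find("storage_url");
    if (it == args.end() || it->second.empty()) {
        // In the real executor this would log the message and return
        // cCmdArgParseErr; here we throw to surface it.
        throw std::invalid_argument(
                "Missing required argument --storage_url: cannot connect to "
                "data or metadata storage"
        );
    }
    return it->second;
}
```

The point is that the failure message names the missing argument, so a misconfigured worker is diagnosable from its logs alone.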
109-124
: Connect to data and metadata storage.
• Good error handling with cStorageErr.
• Consider logging success messages to confirm a successful connection.
src/spider/worker/FunctionManager.hpp (4)
48-48
: Function alias uses TaskContext
We see that Function now requires a TaskContext as its first parameter. This is a beneficial approach for context-awareness, but ensure we handle concurrency concerns if TaskContext is shared across threads.
287-288
: Potential concurrency concerns
We retrieve a shared pointer to DataStorage. If concurrent tasks share DataStorage, ensure that all accesses remain thread-safe.
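To illustrate the concern, here is a sketch of a storage class whose accessors are serialized with a mutex; the class and its methods are illustrative only, not Spider's DataStorage interface:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Illustrative only: if concurrent tasks share one storage object, its
// internal state needs synchronization, e.g. a mutex guarding every access.
class InMemoryDataStorage {
public:
    void put(std::string const& key, std::string value) {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_store[key] = std::move(value);
    }

    bool get(std::string const& key, std::string& out) const {
        std::lock_guard<std::mutex> lock{m_mutex};
        auto const it = m_store.find(key);
        if (it == m_store.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

private:
    mutable std::mutex m_mutex;  // mutable so const readers can lock
    std::map<std::string, std::string> m_store;
};
```

Whether this guard belongs inside DataStorage or in its callers is the design question the comment raises.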
295-302
: Argument size check
We compare std::tuple_size_v - 1 to object.via.array.size. Ensure that the logic remains accurate if more parameters are added to TaskContext in future expansions.
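The arity check can be sketched as follows; `TaskContext`, `add`, and `arg_count_matches` are stand-ins, with `received` playing the role of object.via.array.size:

```cpp
#include <cassert>
#include <cstddef>

struct TaskContext {};  // stand-in for spider::TaskContext

// User-facing arity of a task function: total parameters minus the
// leading TaskContext, mirroring the tuple_size - 1 comparison above.
template <class R, class... Params>
constexpr std::size_t task_arity(R (*)(Params...)) {
    static_assert(sizeof...(Params) >= 1, "tasks must take TaskContext first");
    return sizeof...(Params) - 1;
}

inline int add(TaskContext /*ctx*/, int a, int b) { return a + b; }

// received plays the role of the deserialized argument-array size.
inline bool arg_count_matches(std::size_t received) {
    return task_arity(&add) == received;
}
```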
313-342
: Argument parsing flow
The code rigorously checks for data retrieval errors and gracefully returns error responses. This is good practice. However, consider clarifying the error messages to specify which argument failed to parse, if possible.
src/spider/core/DataImpl.hpp (1)
1-2
: Header guard
The header guard is standard. Ensure the naming convention (SPIDER_CORE_DATAIMPL_HPP) matches the project's guidelines.
tests/worker/worker-test.cpp (3)
4-5
: Header location
These relative includes reference ../../src/spider/client/Data.hpp and TaskContext.hpp. Confirm that test includes remain aligned with the broader project structure for future relocations.
14-14
: Error test function
Accepting TaskContext provides an avenue for more context-aware error handling. For example, logging metadata about the failing task. Consider leveraging the context.
26-26
: Task registration
The data_test function is now registered. This looks correct. Make sure to add associated integration tests for data flows.
src/spider/core/TaskContextImpl.hpp (3)
6-8
: Confirm correct storage includes
We reference DataStorage.hpp and MetadataStorage.hpp. Double-check whether forward declarations would suffice; prefer them if only pointers are used.
20-22
: get_data_store
Returning a shared pointer from inside the TaskContext fosters concurrency safety. Confirm that all references to m_data_store are read/written safely.
24-27
: get_metadata_store
Similar to get_data_store, ensure that metadata manipulations are properly synchronized if multiple tasks can access them simultaneously.
src/spider/storage/DataStorage.hpp (1)
29-29
: Please add documentation for set_data_locality
The new method follows good design patterns, but would benefit from documentation explaining:
- Purpose and expected behaviour
- Parameter requirements
- Possible error conditions
Consider adding a comment block:
/**
 * Sets the locality information for the specified data.
 * @param data The data object containing locality information to be set
 * @return StorageErr indicating success or specific error condition
 */
src/spider/worker/TaskExecutor.hpp (2)
Line range hint
37-52
: Add storage_url validation
While the storage_url parameter is properly added, consider validating the URL format to prevent runtime issues.
template <class... Args>
TaskExecutor(
        boost::asio::io_context& context,
        std::string const& func_name,
        std::string const& storage_url,
        std::vector<std::string> const& libs,
        absl::flat_hash_map<
                boost::process::v2::environment::key,
                boost::process::v2::environment::value> const& environment,
        Args&&... args
)
        : m_read_pipe(context),
          m_write_pipe(context) {
+   if (storage_url.empty()) {
+       throw std::invalid_argument("storage_url cannot be empty");
+   }
    std::vector<std::string> process_args{"--func", func_name, "--storage_url", storage_url, "--libs"};
Line range hint
82-92
: Ensure consistent storage_url validation
The same storage_url validation should be applied to this constructor overload for consistency.
TaskExecutor(
        boost::asio::io_context& context,
        std::string const& func_name,
        std::string const& storage_url,
        std::vector<std::string> const& libs,
        absl::flat_hash_map<
                boost::process::v2::environment::key,
                boost::process::v2::environment::value> const& environment,
        std::vector<msgpack::sbuffer> const& args_buffers
)
        : m_read_pipe(context),
          m_write_pipe(context) {
+   if (storage_url.empty()) {
+       throw std::invalid_argument("storage_url cannot be empty");
+   }
    std::vector<std::string> process_args{"--func", func_name, "--storage_url", storage_url, "--libs"};
tests/integration/test_scheduler_worker.py (2)
141-141
: Avoid hard-coded IP addresses in tests.
Consider using a configuration variable or environment variable for the driver address to make tests more flexible and maintainable.
- driver = Driver(id=uuid.uuid4(), addr="127.0.0.1")
+ driver = Driver(id=uuid.uuid4(), addr=os.getenv("TEST_DRIVER_ADDR", "127.0.0.1"))
194-202
: Consider implementing a more robust wait mechanism.
Using a fixed sleep time of 2 seconds could make the test flaky. Consider implementing a polling mechanism with timeout.
+ def wait_for_task_completion(self, storage, task_id, timeout=10):
+     start_time = time.time()
+     while time.time() - start_time < timeout:
+         state = get_task_state(storage, task_id)
+         if state in ["success", "fail"]:
+             return state
+         time.sleep(0.1)
+     raise TimeoutError(f"Task {task_id} did not complete within {timeout} seconds")
+
  def test_data_job(self, scheduler_worker, storage, data_job):
      task = data_job
-     # Wait for 2 seconds and check task output
-     time.sleep(2)
-     state = get_task_state(storage, task.id)
+     state = self.wait_for_task_completion(storage, task.id)
      assert state == "success"
      outputs = get_task_outputs(storage, task.id)
      assert len(outputs) == 1
      assert outputs[0].value == msgpack.packb(2).decode("utf-8")
tests/worker/test-TaskExecutor.cpp (1)
127-174
: Improve test robustness and remove redundant checks
Consider these improvements:
- Verify data cleanup success by attempting to retrieve the deleted data
- Remove redundant if condition around REQUIRE as has_value() is already verified
- if (optional_result.has_value()) {
-     REQUIRE(3 == optional_result.value());
- }
+ REQUIRE(3 == optional_result.value());
  // Clean up
  REQUIRE(data_storage->remove_data(data.get_id()).success());
+ // Verify cleanup
+ Data retrieved_data{data.get_id(), ""};
+ REQUIRE(!data_storage->get_data(data.get_id(), &retrieved_data).success());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
src/spider/CMakeLists.txt (2 hunks)
src/spider/client/Data.hpp (5 hunks)
src/spider/client/Driver.hpp (2 hunks)
src/spider/client/TaskContext.hpp (3 hunks)
src/spider/core/Data.hpp (1 hunks)
src/spider/core/DataImpl.hpp (1 hunks)
src/spider/core/TaskContextImpl.hpp (1 hunks)
src/spider/storage/DataStorage.hpp (1 hunks)
src/spider/storage/MysqlStorage.cpp (1 hunks)
src/spider/storage/MysqlStorage.hpp (1 hunks)
src/spider/worker/FunctionManager.hpp (6 hunks)
src/spider/worker/TaskExecutor.hpp (5 hunks)
src/spider/worker/task_executor.cpp (7 hunks)
src/spider/worker/worker.cpp (3 hunks)
tests/CMakeLists.txt (2 hunks)
tests/integration/client.py (2 hunks)
tests/integration/test_scheduler_worker.py (3 hunks)
tests/worker/test-FunctionManager.cpp (3 hunks)
tests/worker/test-TaskExecutor.cpp (4 hunks)
tests/worker/worker-test.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/worker/task_executor.cpp
[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
tests/worker/test-FunctionManager.cpp
[error] 39-39: syntax error
(syntaxError)
tests/worker/test-TaskExecutor.cpp
[error] 56-56: syntax error
(syntaxError)
🔇 Additional comments (70)
src/spider/client/TaskContext.hpp (9)
4-7
: Includes look good.
No concerns identified; the added headers and utilities seem appropriate for the new functionality.
12-13
: Confirmed usage of Serializer and Data headers.
These includes are necessary for serialization support and data management.
19-22
: Forward declarations verified.
Declaring DataStorage, MetadataStorage, and TaskContextImpl here avoids unnecessary header dependencies in this file.
45-50
: Validate coverage of template method get_data_builder().
The template function to retrieve a Data builder is well-defined. Please confirm callers handle potential exceptions from the underlying storage operations, if any.
131-132
: Default constructor usage.
Having a default constructor maintains flexibility for creating empty TaskContext objects, but be mindful of usage in contexts where storage must be initialized.
133-140
: New constructor sets up storage references.
The constructor consistently sets internal references to data and metadata stores. Great for ensuring the TaskContext is readily available with storage resources.
141-143
: Private getters for internal usage.
Providing accessors to the private shared pointers centralizes how they're retrieved, preserving encapsulation. No issues observed.
145-146
: Private members to manage data & metadata store.
These shared pointers are appropriately encapsulated. Just ensure references remain valid throughout the TaskContext's lifetime.
148-148
: Friend declaration for core::TaskContextImpl.
Allowing TaskContextImpl to construct fully-initialized TaskContext objects is consistent with the design.
src/spider/client/Data.hpp (11)
4-8
: Includes and standard library usage are fine.
No issues with the included standard library and utility headers.
11-17
: External dependencies validated.
The usage of Boost UUID, MsgPack, and DataStorage references is consistent with the class’s needs for serialization and storage.
48-51
: Unpack logic in get() method.
Please ensure that the packed data is always valid, as msgpack::unpack can throw on corrupted data. A try/catch or a check might be advisable if corruption is possible.
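A sketch of the guarded-deserialization pattern being suggested; std::stoi stands in for msgpack::unpack in this illustration, since both throw on malformed input:

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Pattern suggested for Data::get(): unpacking can throw on a corrupted
// buffer, so a wrapper can translate the exception into an empty optional
// instead of propagating it to the task.
inline std::optional<int> safe_unpack(std::string const& buffer) {
    try {
        return std::stoi(buffer);  // stand-in for the real unpacker
    } catch (std::exception const&) {
        return std::nullopt;  // treat the payload as corrupted
    }
}
```

Whether to swallow or rethrow depends on how callers are expected to react to corrupted data; the wrapper just makes the decision explicit.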
61-65
: set_locality: Potential concurrency considerations.
If multiple threads set locality simultaneously, ensure DataStorage’s thread safety. If it is guaranteed thread-safe, no further issues.
77-81
: set_locality builder method.
Straightforward assignment. Ensure documentation clarifies that callers must invoke build() afterwards to finalize the object.
90-93
: set_cleanup_func in builder.
The design of storing a cleanup function is flexible. Confirm that the function is invoked at the correct time in the data object’s lifecycle.
153-154
: Default constructor provides flexibility.
No immediate issues. This can be convenient for scenarios requiring placeholder Data objects.
156-159
: Private constructor for Data with unique_ptr implementation.
Ensuring the Data object can only be created via build or internal logic helps maintain consistency.
160-160
: get_impl() method.
Returning a const reference to the unique_ptr is a neat approach. Just ensure that external code does not store this reference longer than the Data object’s scope.
162-164
: Internal Data pointers.
Storing the implementation pointer as a unique_ptr is correct to enforce ownership semantics. The shared_ptr to DataStorage also aligns with your usage pattern.
165-165
: Friend class declaration for core::DataImpl.
Enables that class to manage and access internal pointers for advanced logic. No issues noted.
src/spider/worker/task_executor.cpp (6)
5-5
: Include &lt;memory&gt; for smart pointers.
No concerns here.
20-26
: Include TaskContext and storage headers.
These includes are expected for creating and using TaskContext, DataStorage, and MetadataStorage.
43-47
: storage_url argument.
Ensuring users can pass a storage URL is crucial. Confirm that it’s documented in usage instructions.
62-66
: New error codes for clarity.
Defining separate codes clarifies error handling logic.
80-80
: Reading func_name from args.
No issues. If an invalid function name is provided, we handle it in subsequent checks.
162-163
: Create TaskContext with data & metadata store references.
Ensures tasks can access the correct storage within the executor.
tests/worker/test-FunctionManager.cpp (18)
2-6
: Includes align with test coverage.
Boost UUID and Catch2 are relevant for testing.
8-10
: Boost UUID usage.
Boost for UUID is appropriate; random generation for test objects is valid.
13-16
: Client and core headers included.
Connecting Data, Driver, TaskContext, and TaskContextImpl for test coverage is consistent.
19-19
: StorageTestHelper usage.
This header is used to create test storage, ensuring coverage for multiple storage backends.
22-23
: int_test function with TaskContext.
The function simply adds two integers but takes a context parameter to match the new function signature. Fine for demonstrations.
26-27
: tuple_ret_test function with TaskContext.
Returning a tuple from a function that now includes a TaskContext parameter. Straightforward approach.
31-33
: data_test function to confirm Data usage.
Retrieves an int from spider::Data. This clarifies that data is read properly from storage.
37-37
: SPIDER_WORKER_REGISTER_TASK for data_test.
Ensures the new data_test is discoverable by the FunctionManager.
39-49
: First test block: Register and run function with POD inputs.
• Great approach for verifying the new TaskContext usage in manager calls.
• The function is fetched by name and arguments are validated.
🧰 Tools
🪛 cppcheck (2.10-2)
[error] 39-39: syntax error
(syntaxError)
58-58
: Ensure function pointer is not null.
Ensures retrieval of a valid function from the manager.
63-63
: Invoke the function with valid arguments.
This tests correct argument packaging and function invocation with new context usage.
70-70
: Wrong number of arguments test.
Properly checks for error handling in the invocation.
81-81
: Wrong type of arguments test.
Again, confirms the correctness of error handling.
90-100
: Second test block: tuple return with context.
Demonstrates returning a tuple while referencing the new TaskContext signature.
107-107
: Assertion for correct tuple content.
Ensures returned data is accurate.
114-118
: Third test block: data inputs.
Expands coverage to confirm data retrieval from storage is integrated with the new context-based tasks.
119-125
: Create and store data in the driver context.
Ensures data is placed in the store, ready to be fetched.
126-147
: Invoke data_test function.
• Confirms that the function retrieves the correct int value.
• Ensures data cleanup from storage.
src/spider/worker/FunctionManager.hpp (5)
275-279
: Validation of TaskContext
These static_assert checks are helpful for ensuring the first argument is TaskContext. This is a good compile-time safety check.
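The shape of such a compile-time check can be sketched like this; TaskContext and the task functions are stand-ins, and the real FunctionManager check may differ in detail:

```cpp
#include <cassert>
#include <type_traits>

struct TaskContext {};  // stand-in for spider::TaskContext

// Reject, at compile time, any task whose first parameter is not
// TaskContext, mirroring the static_assert praised above.
template <class R, class First, class... Rest>
constexpr bool takes_context_first(R (*)(First, Rest...)) {
    using Bare = std::remove_cv_t<std::remove_reference_t<First>>;
    return std::is_same_v<Bare, TaskContext>;
}

inline int good_task(TaskContext, int x) { return x; }
inline int bad_task(int x, TaskContext) { return x; }

// At registration time this becomes a static_assert:
static_assert(takes_context_first(&good_task));
```

Because the predicate is constexpr, a mis-declared task fails the build rather than misbehaving at runtime.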
9-9
: Check for consistent library usage
The addition of &lt;memory&gt; is appropriate here, but please confirm that we consistently use smart pointers in all areas referencing memory management to avoid potential ownership confusion.
✅ Verification successful
Smart pointer usage is consistent throughout the codebase
The codebase demonstrates a robust and consistent use of smart pointers (std::shared_ptr and std::unique_ptr) for memory management. Raw pointer usage is minimal and appears only in test utilities or specific DLL loading scenarios. The inclusion of &lt;memory&gt; in FunctionManager.hpp aligns with the project's memory management practices.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching for raw pointer usage or unorthodox memory allocation
rg -A 5 -E "new |delete "
Length of output: 424
Script:
#!/bin/bash
# Search for raw pointer usage patterns
rg "new\s+\w+[\s\*]" --type cpp
# Search for smart pointer usage
rg "std::(unique_ptr|shared_ptr|weak_ptr)" --type cpp
# Search for delete operations
rg "\bdelete\b" --type cpp
# Look for memory header inclusion
rg "#include\s+[<\"]memory[>\"]" --type cpp
Length of output: 25455
388-393
: Function registration placeholders
The usage of std::placeholders ensures our parameters are forwarded in order, but confirm that the _1 (TaskContext) and _2 (ArgsBuffer) ordering remains correct in all usage cases.
✅ Verification successful
Let me gather more information about the parameter ordering by examining the FunctionInvoker implementation and its usage.
Let me try a different approach to find the FunctionInvoker implementation.
Parameter ordering is correctly enforced through static assertions
The implementation enforces the correct parameter ordering through compile-time checks:
- The static_assert verifies that TaskContext is the first parameter of the function
- The apply method signature explicitly defines the order as (function, TaskContext, ArgsBuffer)
- The placeholders _1 and _2 correctly map to TaskContext and ArgsBuffer respectively
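A minimal sketch of the binding pattern under discussion, with stand-in types (ArgsBuffer replaces msgpack::sbuffer, and the invoker is simplified) showing how _1 and _2 map to TaskContext and the argument buffer:

```cpp
#include <cassert>
#include <functional>
#include <string>

struct TaskContext {};           // stand-in for spider::TaskContext
using ArgsBuffer = std::string;  // stand-in for msgpack::sbuffer

// Generic invoker: the concrete function is bound up front, leaving the
// context and argument buffer to be supplied at call time.
inline int invoke(int (*fn)(TaskContext, int), TaskContext ctx,
                  ArgsBuffer const& args) {
    return fn(ctx, std::stoi(args));  // trivial stand-in deserialization
}

inline int doubler(TaskContext /*ctx*/, int x) { return 2 * x; }

inline std::function<int(TaskContext, ArgsBuffer const&)> make_bound() {
    using namespace std::placeholders;
    // _1 -> TaskContext, _2 -> ArgsBuffer, matching the verified ordering.
    return std::bind(&invoke, &doubler, _1, _2);
}
```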
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching for register_function usage to confirm parameter order is correct
rg -A 3 "register_function"
Length of output: 1220
Script:
#!/bin/bash
# Search for FunctionInvoker class implementation and usage
ast-grep --pattern 'struct FunctionInvoker {
$$$
}'
# Also search for any apply method implementations
ast-grep --pattern 'apply($$$) {
$$$
}'
Length of output: 99
Script:
#!/bin/bash
# Search for FunctionInvoker template class and its apply method
rg -A 10 "FunctionInvoker"
# Also search specifically for the apply method with more context
rg -A 10 "apply.*TaskContext.*ArgsBuffer"
Length of output: 2890
22-24
: Ensure correct headers referencing
You are including multiple client-side headers (Data.hpp, task.hpp, TaskContext.hpp). Verify that these dependencies align with the worker domain, so we do not expose client details unnecessarily.
✅ Verification successful
Let me gather more information about the actual usage of these headers in the worker domain.
Let me check the actual usage of these headers in the worker code to understand if they're truly necessary.
Based on the analysis of the codebase, I can now provide a final response:
The client header dependencies are necessary and correctly used
The client headers (Data.hpp, task.hpp, TaskContext.hpp) are essential in the worker domain as they define the interfaces that both client and worker components need to share. The FunctionManager uses these interfaces to:
- Handle task execution context via TaskContext
- Process data objects via the Data template
- Manage task-related structures
The core implementations (DataImpl, TaskContextImpl) properly extend these shared interfaces while keeping implementation details separate. This is a common pattern where interfaces are shared between components while implementations remain isolated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
#!/bin/bash
# Searching for direct references to the client directory from the worker code to confirm minimal coupling
rg -A 3 "src/spider/client"
Length of output: 880
Script:
#!/bin/bash
# Check the content of FunctionManager.hpp to understand how these headers are used
cat src/spider/worker/FunctionManager.hpp
# Also check for any other worker files that might be using these client headers
rg -l "client/Data\.hpp|client/task\.hpp|client/TaskContext\.hpp" src/spider/worker/
Length of output: 14562
Script:
#!/bin/bash
# Check how these client headers are used in task_executor.cpp
cat src/spider/worker/task_executor.cpp
# Check if there are any core interfaces that should be used instead
fd -e hpp -e cpp -t f . src/spider/core/ -x cat {} \;
Length of output: 21072
269-270
: New apply method signature
The apply method now expects a TaskContext. This is consistent with the updated function alias. Confirm that external callers have updated their invocation approach accordingly.
src/spider/core/DataImpl.hpp (2)
12-24
: Utility class design
DataImpl centralizes creation and retrieval of Data. This design pattern is beneficial for keeping data management logic in one place.
7-8
: Redundant includes?
We see both "../client/Data.hpp" and "../core/Data.hpp" which may create confusion if multiple Data classes exist. Validate if both are required.
✅ Verification successful
Let me gather more information about the actual usage of these Data classes.
Both includes are required - they serve different purposes
The includes are not redundant. The file structure shows that:
- client/Data.hpp defines the public templated Data&lt;T&gt; class that users interact with
- core/Data.hpp defines the internal implementation class core::Data that handles the underlying storage
- DataImpl.hpp bridges between these two, providing implementation details and factory methods
The includes are necessary as DataImpl.hpp needs both the public interface and the core implementation to fulfill its bridging role.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching references of Data.hpp in core and client to confirm usage
rg -A 5 "Data\.hpp"
Length of output: 7716
Script:
#!/bin/bash
# Check the content of both Data.hpp files to understand their relationship
echo "=== Content of client/Data.hpp ==="
cat src/spider/client/Data.hpp
echo -e "\n=== Content of core/Data.hpp ==="
cat src/spider/core/Data.hpp
# Check DataImpl.hpp implementation to see how these are used
echo -e "\n=== Content of core/DataImpl.hpp ==="
cat src/spider/core/DataImpl.hpp
Length of output: 7410
tests/worker/worker-test.cpp (2)
9-9
: Signature update
sum_test now expects a TaskContext, consistent with the new function signature across the codebase. Consider logging context usage to aid debugging.
18-20
: data_test function
This function simply retrieves and returns the data's integer value. Ensure there are tests verifying correct behaviour if the underlying storage is empty or invalid.
src/spider/core/TaskContextImpl.hpp (1)
1-2
: Header guard check
Header guard naming is consistent with other files.
src/spider/core/Data.hpp (1)
22-22
: Excellent optimization of get_value()
The change to return a const reference improves performance by avoiding unnecessary string copies. This is particularly beneficial when dealing with large strings.
Let's verify the usage of get_value() to ensure proper lifetime management:
✅ Verification successful
Based on the search results, I can generate the final response as the usage patterns are clear: get_value() is used safely throughout the codebase.
The const reference return type is safe and well-handled
The codebase demonstrates proper lifetime management of the returned reference:
- Used in comparison operations where the reference is immediately consumed
- Stored in const references with proper lifetime scope
- Properly handled when passing to storage operations
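A sketch of the optimization described, with an illustrative Data class rather than the project's core::Data:

```cpp
#include <cassert>
#include <string>

// Illustrative only: returning std::string const& instead of std::string
// avoids a copy per call. The trade-off is lifetime: the reference must
// not outlive the owning Data object.
class Data {
public:
    explicit Data(std::string value) : m_value(std::move(value)) {}

    // const reference: repeated calls share one buffer, no copies made
    [[nodiscard]] auto get_value() const -> std::string const& {
        return m_value;
    }

private:
    std::string m_value;
};
```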
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for get_value() usage to verify proper handling of the returned reference
rg --type cpp "get_value\(\)" -B 2 -A 2
Length of output: 7669
tests/CMakeLists.txt (1)
34-39
: Well-structured CMake dependency additions
The addition of spider_client_lib dependencies is properly structured with clear multi-line formatting. This improves readability and maintainability.
Also applies to: 54-59
src/spider/storage/DataStorage.hpp (1)
29-29
: Verify implementation in derived storage classes
The new pure virtual method requires implementation in all concrete storage classes.
✅ Verification successful
Pure virtual method set_data_locality is properly implemented
The verification shows that MySqlDataStorage, the only concrete class deriving from DataStorage, correctly implements the pure virtual method set_data_locality with the required signature:
auto set_data_locality(Data const& data) -> StorageErr override;
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for classes deriving from DataStorage and verify set_data_locality implementation
ast-grep --pattern 'class $_ : public DataStorage {
$$$
auto set_data_locality(Data const& data) -> StorageErr $$$
$$$
}'
Length of output: 129
Script:
#!/bin/bash
# First, let's find all files that might contain DataStorage derived classes
fd -e hpp -e cpp | xargs rg "class.*:.*DataStorage" -l
# Then let's examine the content of these files to check implementations
fd -e hpp -e cpp | xargs rg "class.*:.*DataStorage" -A 50
Length of output: 3838
src/spider/CMakeLists.txt (2)
13-13
: LGTM: Header additions align with PR objectives
The addition of DataImpl.hpp and TaskContextImpl.hpp headers corresponds to the new TaskContext and Data components being introduced.
Also applies to: 17-17
88-93
: Verify the necessity of spider_client_lib dependency
While adding spider_client_lib enables the task executor to use client functionality, this creates tighter coupling between components. Consider if this dependency can be minimized or if an interface abstraction would be more appropriate.
✅ Verification successful
Let me gather more information about the task executor's dependencies and usage.
Let me check the actual usage of client-related code in the task executor files we found.
Let me check the TaskContext.hpp to understand the dependency better.
The spider_client_lib dependency is necessary and correctly used
The dependency on spider_client_lib is justified as the task executor requires TaskContext, which is a core interface for tasks to interact with Spider functionalities. The coupling is intentional and appropriate because:
- TaskContext is used as a mandatory first parameter in the function execution interface
- It provides essential functionality like abort operations and data storage access
- The implementation follows a clean separation where TaskContextImpl (core) provides the actual implementation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if spider_client_lib is actually used in task executor
# Look for client library includes in task executor sources
echo "Checking task executor sources for client library usage:"
for file in $(echo "${SPIDER_TASK_EXECUTOR_SOURCES}" | tr ' ' '\n'); do
if [ -n "$file" ]; then
echo "Checking $file:"
rg -l "spider/client" "$file" || true
fi
done
Length of output: 148
Script:
#!/bin/bash
# First, let's find the task executor source files
fd "task.*executor" --type f
# Then check for any client-related includes or usage
rg -l "client" --type cpp --type hpp
# Also check CMakeLists.txt files for context
fd "CMakeLists.txt" --type f --exec rg -l "spider_client"
Length of output: 385
Script:
#!/bin/bash
# Check TaskExecutor files for client usage
echo "=== Checking TaskExecutor.cpp ==="
rg "client" "src/spider/worker/TaskExecutor.cpp"
echo -e "\n=== Checking TaskExecutor.hpp ==="
rg "client" "src/spider/worker/TaskExecutor.hpp"
echo -e "\n=== Checking task_executor.cpp ==="
rg "client" "src/spider/worker/task_executor.cpp"
# Also check what spider_client_lib provides
echo -e "\n=== Checking spider_client_lib definition ==="
rg "add_library.*spider_client" src/spider/CMakeLists.txt -A 5
Length of output: 1151
Script:
#!/bin/bash
# Check the TaskContext.hpp content
echo "=== Checking TaskContext.hpp ==="
rg -A 10 "class TaskContext" src/spider/client/TaskContext.hpp
# Check if TaskContext is used elsewhere in worker
echo -e "\n=== Checking TaskContext usage in worker ==="
rg "TaskContext" src/spider/worker/
# Check if there are any other client dependencies in worker
echo -e "\n=== Checking other client dependencies in worker ==="
rg -l "spider/client" src/spider/worker/
Length of output: 2040
src/spider/client/Driver.hpp (2)
10-12
: LGTM: Required includes for serialization support
The includes are appropriately added to support the new data builder functionality with serialization capabilities.
57-61
: LGTM: Well-designed data builder method
The get_data_builder template method is well-constrained with the Serializable concept and follows the builder pattern, promoting type safety and clean object construction.
✅ Verification successful
Let me verify the Packable concept to ensure the Serializable constraint is properly defined.
Let me check the Data class implementation to verify the Builder pattern is properly implemented.
Let me check for the Builder class definition, as it might be defined separately or using a different pattern.
The get_data_builder template method is correctly implemented and well-constrained
The verification confirms that:
- The Serializable concept is properly defined with both SerializableImpl and DeSerializableImpl constraints
- The Data class includes a well-implemented Builder pattern with proper type safety
- The Builder class provides fluent interface methods for locality and cleanup configuration
- The msgpack serialization/deserialization is correctly integrated
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the Serializable concept definition exists
# and check its constraints
echo "Checking for Serializable concept definition:"
rg -A 5 "concept\s+Serializable" || \
ast-grep --pattern 'concept Serializable { $$$_ }'
Length of output: 673
Script:
#!/bin/bash
# Check for Packable concept definition
echo "Checking for Packable concept definition:"
rg -A 5 "concept\s+Packable" || \
ast-grep --pattern 'concept Packable { $$$_ }'
# Check for any usage of Data::Builder with Serializable
echo -e "\nChecking Data::Builder usage with Serializable:"
rg "Data<.*>::Builder"
Length of output: 783
Script:
#!/bin/bash
# Check Data class implementation
echo "Checking Data class implementation:"
rg -A 10 "class\s+Data\s*(<.*>)?\s*\{" || \
ast-grep --pattern 'class Data { $$$_ }'
# Check Builder class implementation within Data
echo -e "\nChecking Builder class implementation:"
ast-grep --pattern 'class Data {
$$$
class Builder {
$$$
}
$$$
}'
Length of output: 1495
Script:
#!/bin/bash
# Check for Builder class or struct definition in Data files
echo "Checking for Builder definition in Data files:"
rg -A 10 "class\s+Builder|struct\s+Builder" src/spider/client/Data.hpp src/spider/core/Data.hpp
# Check for any other Builder-related code in Data files
echo -e "\nChecking for Builder-related code:"
rg "Builder" src/spider/client/Data.hpp src/spider/core/Data.hpp
Length of output: 1486
src/spider/worker/TaskExecutor.hpp (1)
118-119
: LGTM: Proper NOLINT directive placement
The NOLINTEND directive is properly placed to match its corresponding NOLINTBEGIN.
tests/integration/client.py (1)
41-51
: LGTM! Well-structured data classes.
The Driver
and Data
classes are properly defined with appropriate field types and clear purpose.
src/spider/storage/MysqlStorage.hpp (1)
91-91
: LGTM! Method declaration follows class conventions.
The set_data_locality
method is properly declared with appropriate parameter types and return type.
tests/integration/test_scheduler_worker.py (2)
11-17
: LGTM! Well-organized imports.
The imports are properly organized and include all necessary components for the new data-related functionality.
135-164
: LGTM! Well-structured test fixture.
The fixture properly sets up the test environment with appropriate cleanup in the teardown phase.
tests/worker/test-TaskExecutor.cpp (3)
Line range hint 56-80
: Test case properly validates task execution with storage integration
The test case follows good testing practices by:
- Including storage integration in the test setup
- Validating both execution success and result correctness
🧰 Tools
🪛 cppcheck (2.10-2)
[error] 56-56: syntax error
(syntaxError)
81-103
: Test case properly validates argument validation error handling
The test effectively verifies:
- Error detection for incorrect argument count
- Specific error type validation
104-125
: Test case properly validates task execution failure handling
The test effectively verifies:
- Task execution failure detection
- Specific error type validation
src/spider/worker/worker.cpp (1)
240-247
: Constructor call properly integrates storage URL parameter
The TaskExecutor construction is correctly updated to include the storage URL parameter, maintaining consistency with the interface changes.
static auto create_task_context(
        std::shared_ptr<DataStorage> const& data_storage,
        std::shared_ptr<MetadataStorage> const& metadata_storage
) -> TaskContext {
    return TaskContext{data_storage, metadata_storage};
}
💡 Codebase verification
Null pointer validation needed in TaskContext construction
Based on the code analysis, there are no explicit null pointer checks when constructing TaskContext objects. Since these shared pointers are crucial for the task execution context, I recommend adding validation to prevent potential null pointer issues.
- Add null pointer validation in the create_task_context method in src/spider/core/TaskContextImpl.hpp
- Consider throwing a meaningful exception (e.g., std::invalid_argument) if either data_storage or metadata_storage is null
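A minimal standalone sketch of the suggested validation. The DataStorage and MetadataStorage stubs and the TaskContext layout below are hypothetical stand-ins for illustration, not Spider's real types:

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for Spider's storage interfaces.
struct DataStorage {};
struct MetadataStorage {};

struct TaskContext {
    std::shared_ptr<DataStorage> data_storage;
    std::shared_ptr<MetadataStorage> metadata_storage;
};

// Sketch: reject null storages before constructing the context,
// as the review recommends.
auto create_task_context(
        std::shared_ptr<DataStorage> const& data_storage,
        std::shared_ptr<MetadataStorage> const& metadata_storage
) -> TaskContext {
    if (nullptr == data_storage || nullptr == metadata_storage) {
        throw std::invalid_argument("TaskContext requires non-null storage pointers");
    }
    return TaskContext{data_storage, metadata_storage};
}
```

Throwing at construction keeps every later use of the context free of null checks, since a successfully built TaskContext is guaranteed to hold live storages.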
🔗 Analysis chain
create_task_context
The static method create_task_context cleanly constructs a contextual environment. Confirm that the usage covers potential error handling if data_storage or metadata_storage is null.
Let me check the implementation details and usage context more thoroughly.
Let me check the TaskContext implementation and constructor to see how it handles null pointers.
Let me try one final check to see the constructor implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching for create_task_context usage to see if we pass null pointers
rg "create_task_context"
Length of output: 627
Script:
#!/bin/bash
# Check TaskContext constructor and implementation
ast-grep --pattern 'class TaskContext {
$$$
}'
# Check the actual usage in test and implementation files
rg -A 5 "create_task_context"
Length of output: 2623
Script:
#!/bin/bash
# Check TaskContext implementation
ast-grep --pattern 'struct TaskContext {
$$$
}'
# Check TaskContextImpl implementation
ast-grep --pattern 'class TaskContextImpl {
$$$
}'
# Look for any null checks in the relevant files
rg -A 3 "if.*null|nullptr" src/spider/core/
Length of output: 152
Script:
#!/bin/bash
# Look for TaskContext constructor implementation
rg -B 2 -A 5 "TaskContext::"
# Check for any assertions or validations in the header
rg -B 2 -A 5 "TaskContext\{" src/spider/core/
Length of output: 1180
def remove_data(conn, data: Data):
    cursor = conn.cursor()

    cursor.execute("DELETE FROM data WHERE id = %s", (data.id.bytes,))
    conn.commit()
    cursor.close()
🛠️ Refactor suggestion
Add error handling for data removal.
The function should handle potential database errors during deletion.
def remove_data(conn, data: Data):
cursor = conn.cursor()
-
- cursor.execute("DELETE FROM data WHERE id = %s", (data.id.bytes,))
- conn.commit()
- cursor.close()
+ try:
+ cursor.execute("DELETE FROM data WHERE id = %s", (data.id.bytes,))
+ conn.commit()
+ except mysql.connector.Error as e:
+ conn.rollback()
+ raise RuntimeError(f"Failed to remove data: {e}")
+ finally:
+ cursor.close()
def add_driver(conn, driver: Driver):
    cursor = conn.cursor()

    cursor.execute(
        "INSERT INTO drivers (id, address) VALUES (%s, %s)", (driver.id.bytes, driver.addr)
    )

    conn.commit()
    cursor.close()
🛠️ Refactor suggestion
Add error handling for database operations.
The function should handle potential database errors that could occur during insertion.
def add_driver(conn, driver: Driver):
cursor = conn.cursor()
-
- cursor.execute(
- "INSERT INTO drivers (id, address) VALUES (%s, %s)", (driver.id.bytes, driver.addr)
- )
-
- conn.commit()
- cursor.close()
+ try:
+ cursor.execute(
+ "INSERT INTO drivers (id, address) VALUES (%s, %s)",
+ (driver.id.bytes, driver.addr)
+ )
+ conn.commit()
+ except mysql.connector.Error as e:
+ conn.rollback()
+ raise RuntimeError(f"Failed to add driver: {e}")
+ finally:
+ cursor.close()
Committable suggestion skipped: line range outside the PR's diff.
def add_driver_data(conn, driver: Driver, data: Data):
    cursor = conn.cursor()

    cursor.execute("INSERT INTO data (id, value) VALUES (%s, %s)", (data.id.bytes, data.value))
    cursor.execute(
        "INSERT INTO data_ref_driver (driver_id, id) VALUES (%s, %s)",
        (driver.id.bytes, data.id.bytes),
    )

    conn.commit()
    cursor.close()
🛠️ Refactor suggestion
Improve transaction handling for multiple database operations.
The function performs two database operations that should be atomic. If the second operation fails, the first operation should be rolled back.
def add_driver_data(conn, driver: Driver, data: Data):
cursor = conn.cursor()
-
- cursor.execute("INSERT INTO data (id, value) VALUES (%s, %s)", (data.id.bytes, data.value))
- cursor.execute(
- "INSERT INTO data_ref_driver (driver_id, id) VALUES (%s, %s)",
- (driver.id.bytes, data.id.bytes),
- )
-
- conn.commit()
- cursor.close()
+ try:
+ cursor.execute("INSERT INTO data (id, value) VALUES (%s, %s)",
+ (data.id.bytes, data.value))
+ cursor.execute(
+ "INSERT INTO data_ref_driver (driver_id, id) VALUES (%s, %s)",
+ (driver.id.bytes, data.id.bytes),
+ )
+ conn.commit()
+ except mysql.connector.Error as e:
+ conn.rollback()
+ raise RuntimeError(f"Failed to add driver data: {e}")
+ finally:
+ cursor.close()
Committable suggestion skipped: line range outside the PR's diff.
auto MySqlDataStorage::set_data_locality(Data const& data) -> StorageErr {
    try {
        std::unique_ptr<sql::PreparedStatement> const delete_statement(
                m_conn->prepareStatement("DELETE FROM `data_locality` WHERE `id` = ?")
        );
        sql::bytes id_bytes = uuid_get_bytes(data.get_id());
        delete_statement->setBytes(1, &id_bytes);
        delete_statement->executeUpdate();
        std::unique_ptr<sql::PreparedStatement> const insert_statement(m_conn->prepareStatement(
                "INSERT INTO `data_locality` (`id`, `address`) VALUES(?, ?)"
        ));
        for (std::string const& addr : data.get_locality()) {
            insert_statement->setBytes(1, &id_bytes);
            insert_statement->setString(2, addr);
            insert_statement->executeUpdate();
        }
        std::unique_ptr<sql::PreparedStatement> const hard_locality_statement(
                m_conn->prepareStatement("UPDATE `data` SET `hard_locality` = ? WHERE `id` = ?")
        );
        hard_locality_statement->setBoolean(1, data.is_hard_locality());
        hard_locality_statement->setBytes(2, &id_bytes);
        hard_locality_statement->executeUpdate();
    } catch (sql::SQLException& e) {
        m_conn->rollback();
        return StorageErr{StorageErrType::OtherErr, e.what()};
    }
    m_conn->commit();
    return StorageErr{};
}
🛠️ Refactor suggestion
Enhance data locality update robustness
Consider these improvements for better error handling and safety:
- Validate data existence before updates
- Check update operation success
- Use const for SQL statement strings
auto MySqlDataStorage::set_data_locality(Data const& data) -> StorageErr {
try {
+ // Verify data exists
+ std::unique_ptr<sql::PreparedStatement> const check_statement(
+ m_conn->prepareStatement("SELECT 1 FROM `data` WHERE `id` = ?")
+ );
+ sql::bytes id_bytes = uuid_get_bytes(data.get_id());
+ check_statement->setBytes(1, &id_bytes);
+ std::unique_ptr<sql::ResultSet> const check_result(check_statement->executeQuery());
+ if (check_result->rowsCount() == 0) {
+ m_conn->rollback();
+ return StorageErr{
+ StorageErrType::KeyNotFoundErr,
+ fmt::format("no data with id {}", boost::uuids::to_string(data.get_id()))
+ };
+ }
+
std::unique_ptr<sql::PreparedStatement> const delete_statement(
- m_conn->prepareStatement("DELETE FROM `data_locality` WHERE `id` = ?")
+ m_conn->prepareStatement("DELETE FROM `data_locality` WHERE `id` = ?")
);
- sql::bytes id_bytes = uuid_get_bytes(data.get_id());
delete_statement->setBytes(1, &id_bytes);
- delete_statement->executeUpdate();
+ int const deleted_rows = delete_statement->executeUpdate();
+ if (deleted_rows < 0) {
+ m_conn->rollback();
+ return StorageErr{StorageErrType::OtherErr, "Failed to delete existing locality"};
+ }
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 3
🧹 Nitpick comments (4)
tests/worker/test-FunctionManager.cpp (2)
22-33: Document intended usage of TaskContext parameter
The TaskContext parameter is currently unused in all test functions. If this parameter is intended for future use or required by design, please add documentation explaining its purpose and expected future usage.
-auto int_test(spider::TaskContext const& /*context*/, int const x, int const y) -> int {
+// TODO: TaskContext will be used in future to [describe intended usage]
+auto int_test(spider::TaskContext const& context, int const x, int const y) -> int {
-auto int_test(spider::TaskContext const& /*context*/, int const x, int const y) -> int { +// TODO: TaskContext will be used in future to [describe intended usage] +auto int_test(spider::TaskContext const& context, int const x, int const y) -> int {
90-113
: Add error test cases for tuple return functionThe test case for tuple returns only covers the happy path. Consider adding test cases for:
- Wrong number of arguments
- Wrong argument types
- Invalid string input
tests/worker/test-TaskExecutor.cpp (2)
Line range hint 58-127: Test cases properly adapted for storage integration
The modifications consistently integrate the storage URL parameter across all test cases while maintaining the original test coverage and assertions. However, consider adding test cases for storage-specific failure scenarios.
Consider adding test cases for:
- Invalid storage URL
- Storage connection failures
- Storage operation failures
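As a standalone illustration of the invalid-storage-URL case, here is a toy connection check. ConnectResult and connect_storage are hypothetical; Spider's real StorageErr type and URL format are not assumed:

```cpp
#include <string>

// Toy stand-in for a storage connection result
// (Spider's StorageErr carries richer error information).
struct ConnectResult {
    bool success;
    std::string error;
};

// Hypothetical URL check mirroring what an invalid-storage-URL test
// would exercise: reject anything not shaped like "scheme://host...".
auto connect_storage(std::string const& url) -> ConnectResult {
    auto const pos = url.find("://");
    if (pos == std::string::npos || pos == 0 || pos + 3 >= url.size()) {
        return ConnectResult{false, "malformed storage url: " + url};
    }
    return ConnectResult{true, ""};
}
```

A test for connection or operation failures would follow the same pattern: drive the executor with a deliberately bad URL (or a stopped database) and assert that it reports failure instead of succeeding or crashing.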
170-172: Remove redundant value check
The has_value() check is redundant as it's already verified by the REQUIRE statement at line 169. Suggested fix:
- if (optional_result.has_value()) {
-     REQUIRE(3 == optional_result.value());
- }
+ REQUIRE(3 == optional_result.value());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/spider/worker/TaskExecutor.hpp
(4 hunks)tests/worker/test-FunctionManager.cpp
(3 hunks)tests/worker/test-TaskExecutor.cpp
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spider/worker/TaskExecutor.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-FunctionManager.cpp
[error] 39-39: syntax error
(syntaxError)
tests/worker/test-TaskExecutor.cpp
[error] 58-58: syntax error
(syntaxError)
🔇 Additional comments (2)
tests/worker/test-FunctionManager.cpp (1)
35-37
: LGTM! Function registration is complete
All test functions are properly registered using the SPIDER_WORKER_REGISTER_TASK macro.
tests/worker/test-TaskExecutor.cpp (1)
2-2
: LGTM: Header additions are well-organized
The new headers appropriately support the Data and Storage functionality being added to the TaskExecutor tests.
Also applies to: 12-14, 17-22
TEMPLATE_LIST_TEST_CASE(
        "Register and run function with data inputs",
        "[core][storage]",
        spider::test::StorageTypeList
) {
    auto [unique_metadata_storage, unique_data_storage] = spider::test::
            create_storage<std::tuple_element_t<0, TestType>, std::tuple_element_t<1, TestType>>();

    std::shared_ptr<spider::core::MetadataStorage> const metadata_storage
            = std::move(unique_metadata_storage);
    std::shared_ptr<spider::core::DataStorage> const data_storage = std::move(unique_data_storage);

    msgpack::sbuffer buffer;
    msgpack::pack(buffer, 3);
    spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
    boost::uuids::random_generator gen;
    boost::uuids::uuid const driver_id = gen();
    spider::core::Driver const driver{driver_id, "127.0.0.1"};
    REQUIRE(metadata_storage->add_driver(driver).success());
    REQUIRE(data_storage->add_driver_data(driver_id, data).success());

    spider::TaskContext const context
            = spider::core::TaskContextImpl::create_task_context(data_storage, metadata_storage);

    spider::core::FunctionManager const& manager = spider::core::FunctionManager::get_instance();

    spider::core::Function const* function = manager.get_function("data_test");

    spider::core::ArgsBuffer const args_buffers = spider::core::create_args_buffers(data.get_id());
    msgpack::sbuffer const result = (*function)(context, args_buffers);
    REQUIRE(3 == spider::core::response_get_result<int>(result).value_or(0));

    REQUIRE(data_storage->remove_data(data.get_id()).success());
}
🛠️ Refactor suggestion
Enhance data test coverage
Consider adding the following test scenarios:
- Attempting to access non-existent data
- Handling invalid data IDs
- Testing data removal with invalid IDs
- Verifying behaviour when storage operations fail
Example test case:
// Test accessing non-existent data
spider::core::ArgsBuffer const invalid_args_buffers = spider::core::create_args_buffers("invalid-id");
msgpack::sbuffer const error_result = (*function)(context, invalid_args_buffers);
auto error = spider::core::response_get_error(error_result);
REQUIRE(error.has_value());
TEMPLATE_LIST_TEST_CASE(
        "Task execute data argument",
        "[worker][storage]",
        spider::test::StorageTypeList
) {
    auto [unique_metadata_storage, unique_data_storage] = spider::test::
            create_storage<std::tuple_element_t<0, TestType>, std::tuple_element_t<1, TestType>>();
    std::shared_ptr<spider::core::MetadataStorage> const metadata_storage
            = std::move(unique_metadata_storage);
    std::shared_ptr<spider::core::DataStorage> const data_storage = std::move(unique_data_storage);

    // Create driver and data
    msgpack::sbuffer buffer;
    msgpack::pack(buffer, 3);
    spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
    boost::uuids::random_generator gen;
    boost::uuids::uuid const driver_id = gen();
    spider::core::Driver const driver{driver_id, "127.0.0.1"};
    REQUIRE(metadata_storage->add_driver(driver).success());
    REQUIRE(data_storage->add_driver_data(driver_id, data).success());

    absl::flat_hash_map<
            boost::process::v2::environment::key,
            boost::process::v2::environment::value> const environment_variable
            = get_environment_variable();

    boost::asio::io_context context;

    spider::worker::TaskExecutor executor{
            context,
            "data_test",
            spider::test::cStorageUrl,
            get_libraries(),
            environment_variable,
            data.get_id()
    };
    context.run();
    executor.wait();
    REQUIRE(executor.succeed());
    std::optional<int> const optional_result = executor.get_result<int>();
    REQUIRE(optional_result.has_value());
    if (optional_result.has_value()) {
        REQUIRE(3 == optional_result.value());
    }

    // Clean up
    REQUIRE(data_storage->remove_data(data.get_id()).success());
}
🛠️ Refactor suggestion
Add test coverage for data retrieval failures
The template test case only covers the happy path. Consider adding test cases for data retrieval failures.
Add test cases for:
- Non-existent data ID
- Corrupted data
- Storage access failures during data retrieval
Would you like me to provide example test case implementations for these scenarios?
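One way to sketch the non-existent-data path without a live backend. FakeDataStorage below is a hypothetical in-memory stand-in used only to show what the negative test asserts, not Spider's DataStorage API:

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy in-memory store standing in for DataStorage.
class FakeDataStorage {
public:
    void add(std::string const& id, std::string value) { m_store[id] = std::move(value); }

    // Missing ids surface as std::nullopt, which the executor would
    // turn into a task-level error rather than a crash.
    auto get(std::string const& id) const -> std::optional<std::string> {
        auto const it = m_store.find(id);
        if (it == m_store.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> m_store;
};
```

A real test would pass a freshly generated (never stored) data id to the executor and assert that it fails with a data-retrieval error instead of succeeding.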
msgpack::sbuffer buffer;
msgpack::pack(buffer, 3);
spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
Add error handling for data serialization
The MsgPack serialization could throw exceptions but lacks try-catch blocks.
Suggested fix:
msgpack::sbuffer buffer;
- msgpack::pack(buffer, 3);
- spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
+ try {
+ msgpack::pack(buffer, 3);
+ spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
+ } catch (const msgpack::type_error& e) {
+ FAIL("Failed to serialize data: " << e.what());
+ }
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/worker/test-TaskExecutor.cpp (3)
Line range hint 58-127: Add storage-related validation to existing test cases
While the storage URL parameter has been added, the test cases don't verify storage interactions. Consider adding assertions to validate that:
- The storage connection is established successfully
- Resources are properly cleaned up after test execution
Example addition for the success test case:
 TEST_CASE("Task execute success", "[worker][storage]") {
+    // Verify storage setup
+    REQUIRE(spider::test::verify_storage_connection(spider::test::cStorageUrl));
     // ... existing test code ...
+    // Verify cleanup
+    REQUIRE(executor.cleanup_storage_resources().success());
 }
168-172: Remove redundant null check
The has_value() check is redundant after the REQUIRE(optional_result.has_value()). The value access can be simplified:
 REQUIRE(optional_result.has_value());
- if (optional_result.has_value()) {
-     REQUIRE(3 == optional_result.value());
- }
+ REQUIRE(3 == optional_result.value());
174-176: Enhance cleanup with RAII pattern
Consider using RAII to ensure cleanup occurs even if the test fails. This prevents resource leaks in the storage system.
+class CleanupGuard {
+public:
+    CleanupGuard(std::shared_ptr<spider::core::DataStorage> storage,
+                 spider::core::Data::Id id)
+            : storage_(storage), id_(id) {}
+    ~CleanupGuard() {
+        if (storage_) {
+            storage_->remove_data(id_);
+        }
+    }
+private:
+    std::shared_ptr<spider::core::DataStorage> storage_;
+    spider::core::Data::Id id_;
+};
+
+// Create cleanup guard at the start
+CleanupGuard cleanup_guard(data_storage, data.get_id());
-// Clean up
-REQUIRE(data_storage->remove_data(data.get_id()).success());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/worker/test-TaskExecutor.cpp
(4 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp
[error] 58-58: syntax error
(syntaxError)
🔇 Additional comments (2)
tests/worker/test-TaskExecutor.cpp (2)
129-176
: Add test coverage for error scenarios
The template test case only covers the happy path. Consider adding test cases for:
- Invalid data ID
- Corrupted data
- Storage access failures
141-143
: Add error handling for data serialization
The MsgPack serialization could throw exceptions but lacks try-catch blocks.
Description
- Add spider::TaskContext as the first argument for a task. TaskContext itself is not yet implemented.
- Add spider::Data to task. Tasks can use Data as task arguments and access their values. However, creating a data builder is not implemented yet.

Validation performed

Summary by CodeRabbit
New Features
- Introduced Data, Driver, and TaskContext classes.
- Enhanced the DataStorage class.
- Updated FunctionManager and TaskExecutor to utilize task contexts.

Bug Fixes
Tests