
feat: Add TaskContext and Data as task argument #40

Merged · 18 commits · Dec 21, 2024

Conversation

sitaowang1998
Collaborator

@sitaowang1998 sitaowang1998 commented Dec 21, 2024

Description

  • Add spider::TaskContext as the first argument for a task. TaskContext itself is not yet implemented.
  • Add spider::Data to tasks. A task can take Data as arguments and access their values. However, the data builder is not yet implemented.

Validation performed

  • GitHub workflows pass.
  • All unit tests pass in devcontainer.
  • All integration tests pass in devcontainer.

Summary by CodeRabbit

  • New Features

    • Introduced new header files for enhanced core functionality.
    • Added methods for improved data handling and error management in the Data, Driver, and TaskContext classes.
    • New functionality for managing data locality in the DataStorage class.
    • Enhanced task execution with a new parameter for storage URL.
    • Expanded capabilities of the FunctionManager and TaskExecutor to utilize task contexts.
    • New data classes and functions for driver and data management in integration tests.
  • Bug Fixes

    • Improved error handling in various classes related to data and task execution.
  • Tests

    • Expanded test coverage for new functionalities, including data management and task execution scenarios.


coderabbitai bot commented Dec 21, 2024

Walkthrough

This pull request introduces several enhancements to the Spider project's architecture, focusing on data management, task context, and storage interactions. Key modifications include the addition of new header files, updates to library linking, and significant changes to the Data, Driver, and TaskContext classes. The integration of a TaskContext parameter in various components enhances task execution and data handling capabilities. Furthermore, new methods for managing data locality and improved error handling mechanisms are implemented across multiple files, ensuring a more robust and flexible architecture.

Changes

File Change Summary
src/spider/CMakeLists.txt Added headers for core/DataImpl.hpp and core/TaskContextImpl.hpp, updated library linking for spider_task_executor to include spider_client_lib
src/spider/client/Data.hpp, src/spider/client/Driver.hpp, src/spider/client/TaskContext.hpp Enhanced data and task context management, added new methods for data builder and context creation, improved error handling
src/spider/core/Data.hpp, src/spider/core/DataImpl.hpp, src/spider/core/TaskContextImpl.hpp Introduced new implementation classes, modified data retrieval methods, added static utility methods for task context management
src/spider/storage/DataStorage.hpp, src/spider/storage/MysqlStorage.hpp/.cpp Added set_data_locality method to support data locality configuration in storage classes
src/spider/worker/FunctionManager.hpp, src/spider/worker/TaskExecutor.hpp, src/spider/worker/task_executor.cpp, src/spider/worker/worker.cpp Updated function and task execution signatures to include TaskContext, added storage URL handling in task execution
tests/CMakeLists.txt Added spider_client_lib as a dependency for test targets
Test files Updated test cases to accommodate new context and data management patterns, including new fixtures and methods for driver and data management


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (20)
src/spider/client/Data.hpp (1)

102-124: build method: Data creation and storage addition.
• The switch over DataSource is correct.
• Good job raising exceptions on storage failures.
• Consider verifying that the data size does not exceed the store's limits before insertion, to guard against resource exhaustion.

src/spider/worker/task_executor.cpp (2)

86-89: Require storage_url presence.
Returning cCmdArgParseErr when it is absent is appropriate. Consider also logging a descriptive error when "storage_url" is missing, to aid debugging.

🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


109-124: Connect to data and metadata storage.
• Good error handling with cStorageErr.
• Consider logging success messages to confirm a successful connection.

src/spider/worker/FunctionManager.hpp (4)

48-48: Function alias uses TaskContext
We see that Function now requires a TaskContext as its first parameter. This is a beneficial approach for context-awareness, but ensure we handle concurrency concerns if TaskContext is shared across threads.


287-288: Potential concurrency concerns
We retrieve a shared pointer to DataStorage. If concurrent tasks share DataStorage, ensure that all accesses remain thread-safe.


295-302: Argument size check
We compare the declared parameter count minus one (std::tuple_size_v of the parameter tuple, excluding the leading TaskContext) to object.via.array.size. Ensure that the logic remains accurate if more parameters are added to TaskContext in future expansions.


313-342: Argument parsing flow
The code strongly checks for data retrieval errors and gracefully returns error responses. This is a good practice. However, consider clarifying the error messages to specify which argument fails to parse if possible.

src/spider/core/DataImpl.hpp (1)

1-2: Header guard
The header guard is standard. Ensure the naming convention (SPIDER_CORE_DATAIMPL_HPP) matches the project's guidelines.

tests/worker/worker-test.cpp (3)

4-5: Header location
These relative includes reference ../../src/spider/client/Data.hpp and TaskContext.hpp. Confirm that test includes remain aligned with the broader project structure for future relocations.


14-14: Error test function
Accepting TaskContext provides an avenue for more context-aware error handling. For example, logging metadata about the failing task. Consider leveraging the context.


26-26: Task registration
The data_test function is now registered. This looks correct. Make sure to add associated integration tests for data flows.

src/spider/core/TaskContextImpl.hpp (3)

6-8: Confirm correct storage includes
We reference DataStorage.hpp and MetadataStorage.hpp—double-check that the forward declarations are not sufficient, or otherwise prefer them if you only use pointers.


20-22: get_data_store
Returning a shared pointer from inside the TaskContext fosters concurrency safety. Confirm that all references to m_data_store are read/written safely.


24-27: get_metadata_store
Similar to get_data_store, ensure that metadata manipulations are properly synchronized if multiple tasks can access them simultaneously.

src/spider/storage/DataStorage.hpp (1)

29-29: Please add documentation for set_data_locality

The new method follows good design patterns, but would benefit from documentation explaining:

  • Purpose and expected behaviour
  • Parameter requirements
  • Possible error conditions

Consider adding a comment block:

/**
 * Sets the locality information for the specified data.
 * @param data The data object containing locality information to be set
 * @return StorageErr indicating success or specific error condition
 */
src/spider/worker/TaskExecutor.hpp (2)

Line range hint 37-52: Add storage_url validation

While the storage_url parameter is properly added, consider validating the URL format to prevent runtime issues.

 template <class... Args>
 TaskExecutor(
         boost::asio::io_context& context,
         std::string const& func_name,
         std::string const& storage_url,
         std::vector<std::string> const& libs,
         absl::flat_hash_map<
                 boost::process::v2::environment::key,
                 boost::process::v2::environment::value> const& environment,
         Args&&... args
 )
         : m_read_pipe(context),
           m_write_pipe(context) {
+    if (storage_url.empty()) {
+        throw std::invalid_argument("storage_url cannot be empty");
+    }
     std::vector<std::string>
             process_args{"--func", func_name, "--storage_url", storage_url, "--libs"};

Line range hint 82-92: Ensure consistent storage_url validation

The same storage_url validation should be applied to this constructor overload for consistency.

 TaskExecutor(
         boost::asio::io_context& context,
         std::string const& func_name,
         std::string const& storage_url,
         std::vector<std::string> const& libs,
         absl::flat_hash_map<
                 boost::process::v2::environment::key,
                 boost::process::v2::environment::value> const& environment,
         std::vector<msgpack::sbuffer> const& args_buffers
 )
         : m_read_pipe(context),
           m_write_pipe(context) {
+    if (storage_url.empty()) {
+        throw std::invalid_argument("storage_url cannot be empty");
+    }
     std::vector<std::string>
             process_args{"--func", func_name, "--storage_url", storage_url, "--libs"};
tests/integration/test_scheduler_worker.py (2)

141-141: Avoid hard-coded IP addresses in tests.

Consider using a configuration variable or environment variable for the driver address to make tests more flexible and maintainable.

-    driver = Driver(id=uuid.uuid4(), addr="127.0.0.1")
+    driver = Driver(id=uuid.uuid4(), addr=os.getenv("TEST_DRIVER_ADDR", "127.0.0.1"))

194-202: Consider implementing a more robust wait mechanism.

Using a fixed sleep time of 2 seconds could make the test flaky. Consider implementing a polling mechanism with timeout.

+    def wait_for_task_completion(self, storage, task_id, timeout=10):
+        start_time = time.time()
+        while time.time() - start_time < timeout:
+            state = get_task_state(storage, task_id)
+            if state in ["success", "fail"]:
+                return state
+            time.sleep(0.1)
+        raise TimeoutError(f"Task {task_id} did not complete within {timeout} seconds")
+
     def test_data_job(self, scheduler_worker, storage, data_job):
         task = data_job
-        # Wait for 2 seconds and check task output
-        time.sleep(2)
-        state = get_task_state(storage, task.id)
+        state = self.wait_for_task_completion(storage, task.id)
         assert state == "success"
         outputs = get_task_outputs(storage, task.id)
         assert len(outputs) == 1
         assert outputs[0].value == msgpack.packb(2).decode("utf-8")
tests/worker/test-TaskExecutor.cpp (1)

127-174: Improve test robustness and remove redundant checks

Consider these improvements:

  1. Verify data cleanup success by attempting to retrieve the deleted data
  2. Remove redundant if condition around REQUIRE as has_value() is already verified
- if (optional_result.has_value()) {
-     REQUIRE(3 == optional_result.value());
- }
+ REQUIRE(3 == optional_result.value());

  // Clean up
  REQUIRE(data_storage->remove_data(data.get_id()).success());
+ // Verify cleanup
+ Data retrieved_data{data.get_id(), ""};
+ REQUIRE(!data_storage->get_data(data.get_id(), &retrieved_data).success());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a896a8 and 5e87551.

📒 Files selected for processing (20)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/client/Data.hpp (5 hunks)
  • src/spider/client/Driver.hpp (2 hunks)
  • src/spider/client/TaskContext.hpp (3 hunks)
  • src/spider/core/Data.hpp (1 hunks)
  • src/spider/core/DataImpl.hpp (1 hunks)
  • src/spider/core/TaskContextImpl.hpp (1 hunks)
  • src/spider/storage/DataStorage.hpp (1 hunks)
  • src/spider/storage/MysqlStorage.cpp (1 hunks)
  • src/spider/storage/MysqlStorage.hpp (1 hunks)
  • src/spider/worker/FunctionManager.hpp (6 hunks)
  • src/spider/worker/TaskExecutor.hpp (5 hunks)
  • src/spider/worker/task_executor.cpp (7 hunks)
  • src/spider/worker/worker.cpp (3 hunks)
  • tests/CMakeLists.txt (2 hunks)
  • tests/integration/client.py (2 hunks)
  • tests/integration/test_scheduler_worker.py (3 hunks)
  • tests/worker/test-FunctionManager.cpp (3 hunks)
  • tests/worker/test-TaskExecutor.cpp (4 hunks)
  • tests/worker/worker-test.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/worker/task_executor.cpp

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

tests/worker/test-FunctionManager.cpp

[error] 39-39: syntax error

(syntaxError)

tests/worker/test-TaskExecutor.cpp

[error] 56-56: syntax error

(syntaxError)

🔇 Additional comments (70)
src/spider/client/TaskContext.hpp (9)

4-7: Includes look good.
No concerns identified; the added headers and utilities seem appropriate for the new functionality.


12-13: Confirmed usage of Serializer and Data headers.
These includes are necessary for serialization support and data management.


19-22: Forward declarations verified.
Declaring DataStorage, MetadataStorage, and TaskContextImpl here avoids unnecessary header dependencies in this file.


45-50: Validate coverage of template method get_data_builder().
The template function to retrieve a Data builder is well-defined. Please confirm callers handle potential exceptions from the underlying storage operations, if any.


131-132: Default constructor usage.
Having a default constructor maintains flexibility for creating empty TaskContext objects, but be mindful of usage in contexts where storage must be initialized.


133-140: New constructor sets up storage references.
The constructor consistently sets internal references to data and metadata stores. Great for ensuring the TaskContext is readily available with storage resources.


141-143: Private getters for internal usage.
Providing accessors to the private shared pointers centralizes how they're retrieved, preserving encapsulation. No issues observed.


145-146: Private members to manage data & metadata store.
These shared pointers are appropriately encapsulated. Just ensure references remain valid throughout the TaskContext's lifetime.


148-148: Friend declaration for core::TaskContextImpl.
Allowing TaskContextImpl to construct fully-initialized TaskContext objects is consistent with the design.

src/spider/client/Data.hpp (11)

4-8: Includes and standard library usage are fine.
The standard library headers included here are all appropriate for this class.


11-17: External dependencies validated.
The usage of Boost UUID, MsgPack, and DataStorage references is consistent with the class’s needs for serialization and storage.


48-51: Unpack logic in get() method.
Please ensure that the packed data is always valid, as msgpack::unpack can throw on corrupted data. A try/catch or a check might be advisable if corruption is possible.


61-65: set_locality: Potential concurrency considerations.
If multiple threads set locality simultaneously, ensure DataStorage’s thread safety. If it is guaranteed thread-safe, no further issues.


77-81: set_locality builder method.
Straightforward assignment. Ensure the documentation clarifies that callers must invoke build() afterwards to finalize the Data object.


90-93: set_cleanup_func in builder.
The design of storing a cleanup function is flexible. Confirm that the function is invoked at the correct time in the data object’s lifecycle.


153-154: Default constructor provides flexibility.
No immediate issues. This can be convenient for scenarios requiring placeholder Data objects.


156-159: Private constructor for Data with unique_ptr implementation.
Ensuring the Data object can only be created via build or internal logic helps maintain consistency.


160-160: get_impl() method.
Returning a const reference to the unique_ptr is a neat approach. Just ensure that external code does not store this reference longer than the Data object’s scope.


162-164: Internal Data pointers.
Storing the implementation pointer as a unique_ptr is correct to enforce ownership semantics. The shared_ptr to DataStorage also aligns with your usage pattern.


165-165: Friend class declaration for core::DataImpl.
Enables that class to manage and access internal pointers for advanced logic. No issues noted.

src/spider/worker/task_executor.cpp (6)

5-5: Include for smart pointers.
No concerns here.


20-26: Include TaskContext and storage headers.
These includes are expected for creating and using TaskContext, DataStorage, and MetadataStorage.


43-47: storage_url argument.
Ensuring users can pass a storage URL is crucial. Confirm that it’s documented in usage instructions.


62-66: New error codes for clarity.
Defining separate codes clarifies error handling logic.


80-80: Reading func_name from args.
No issues. If an invalid function name is provided, we handle it in subsequent checks.


162-163: Create TaskContext with data & metadata store references.
Ensures tasks can access the correct storage within the executor.

tests/worker/test-FunctionManager.cpp (18)

2-6: Includes align with test coverage.
Boost UUID and Catch2 are relevant for testing.


8-10: Boost UUID usage.
Boost for UUID is appropriate; random generation for test objects is valid.


13-16: Client and core headers included.
Connecting Data, Driver, TaskContext, and TaskContextImpl for test coverage is consistent.


19-19: StorageTestHelper usage.
This header is used to create test storage, ensuring coverage for multiple storage backends.


22-23: int_test function with TaskContext.
The function simply adds two integers but takes a context parameter to match the new function signature. Fine for demonstrations.


26-27: tuple_ret_test function with TaskContext.
Returning a tuple from a function that now includes a TaskContext parameter. Straightforward approach.


31-33: data_test function to confirm Data usage.
Retrieves an int from spider::Data. This clarifies that data is read properly from storage.


37-37: SPIDER_WORKER_REGISTER_TASK for data_test.
Ensures the new data_test is discoverable by the FunctionManager.


39-49: First test block: Register and run function with POD inputs.
• Great approach for verifying the new TaskContext usage in manager calls.
• The function is fetched by name and arguments are validated.

🧰 Tools
🪛 cppcheck (2.10-2)

[error] 39-39: syntax error

(syntaxError)


58-58: Ensure function pointer is not null.
Ensures retrieval of a valid function from the manager.


63-63: Invoke the function with valid arguments.
This tests correct argument packaging and function invocation with new context usage.


70-70: Wrong number of arguments test.
Properly checks for error handling in the invocation.


81-81: Wrong type of arguments test.
Again, confirms the correctness of error handling.


90-100: Second test block: tuple return with context.
Demonstrates returning a tuple while referencing the new TaskContext signature.


107-107: Assertion for correct tuple content.
Ensures returned data is accurate.


114-118: Third test block: data inputs.
Expands coverage to confirm data retrieval from storage is integrated with the new context-based tasks.


119-125: Create and store data in the driver context.
Ensures data is placed in the store, ready to be fetched.


126-147: Invoke data_test function.
• Confirms that the function retrieves the correct int value.
• Ensures data cleanup from storage.

src/spider/worker/FunctionManager.hpp (5)

275-279: Validation of TaskContext
These static_assert checks are helpful for ensuring the first argument is TaskContext. This is a good compile-time safety check.


9-9: Check for consistent library usage
The addition of <memory> is appropriate here, but please confirm that we consistently use smart pointers in all areas referencing memory management to avoid potential ownership confusion.

✅ Verification successful

Smart pointer usage is consistent throughout the codebase
The codebase demonstrates a robust and consistent use of smart pointers (std::shared_ptr and std::unique_ptr) for memory management. Raw pointer usage is minimal and appears only in test utilities or specific DLL loading scenarios. The inclusion of <memory> in FunctionManager.hpp aligns with the project's memory management practices.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for raw pointer usage or unorthodox memory allocation
rg -A 5 -E "new |delete " 

Length of output: 424


Script:

#!/bin/bash
# Search for raw pointer usage patterns
rg "new\s+\w+[\s\*]" --type cpp

# Search for smart pointer usage
rg "std::(unique_ptr|shared_ptr|weak_ptr)" --type cpp

# Search for delete operations
rg "\bdelete\b" --type cpp

# Look for memory header inclusion
rg "#include\s+[<\"]memory[>\"]" --type cpp

Length of output: 25455


388-393: Function registration placeholders
The usage of std::placeholders ensures our parameters are forwarded in order, but confirm that the _1 (TaskContext) and _2 (ArgsBuffer) ordering remains correct in all usage cases.

✅ Verification successful

Let me gather more information about the parameter ordering by examining the FunctionInvoker implementation and its usage.


Let me try a different approach to find the FunctionInvoker implementation.


Parameter ordering is correctly enforced through static assertions

The implementation enforces the correct parameter ordering through compile-time checks:

  • The static_assert verifies that TaskContext is the first parameter of the function
  • The apply method signature explicitly defines the order as (function, TaskContext, ArgsBuffer)
  • The placeholders _1 and _2 correctly map to TaskContext and ArgsBuffer respectively
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for register_function usage to confirm parameter order is correct
rg -A 3 "register_function"

Length of output: 1220


Script:

#!/bin/bash
# Search for FunctionInvoker class implementation and usage
ast-grep --pattern 'struct FunctionInvoker {
  $$$
}'

# Also search for any apply method implementations
ast-grep --pattern 'apply($$$) {
  $$$
}'

Length of output: 99


Script:

#!/bin/bash
# Search for FunctionInvoker template class and its apply method
rg -A 10 "FunctionInvoker"

# Also search specifically for the apply method with more context
rg -A 10 "apply.*TaskContext.*ArgsBuffer"

Length of output: 2890


22-24: Ensure correct headers referencing
You are including multiple client-side headers (Data.hpp, task.hpp, TaskContext.hpp). Verify that these dependencies align with the worker domain, so we do not expose client details unnecessarily.

✅ Verification successful

Let me gather more information about the actual usage of these headers in the worker domain.


Let me check the actual usage of these headers in the worker code to understand if they're truly necessary.


Based on the analysis of the codebase, I can now provide a final response:

The client header dependencies are necessary and correctly used
The client headers (Data.hpp, task.hpp, TaskContext.hpp) are essential in the worker domain as they define the interfaces that both client and worker components need to share. The FunctionManager uses these interfaces to:

  • Handle task execution context via TaskContext
  • Process data objects via the Data template
  • Manage task-related structures

The core implementations (DataImpl, TaskContextImpl) properly extend these shared interfaces while keeping implementation details separate. This is a common pattern where interfaces are shared between components while implementations remain isolated.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for direct references to the client directory from the worker code to confirm minimal coupling
rg -A 3 "src/spider/client" 

Length of output: 880


Script:

#!/bin/bash
# Check the content of FunctionManager.hpp to understand how these headers are used
cat src/spider/worker/FunctionManager.hpp

# Also check for any other worker files that might be using these client headers
rg -l "client/Data\.hpp|client/task\.hpp|client/TaskContext\.hpp" src/spider/worker/

Length of output: 14562


Script:

#!/bin/bash
# Check how these client headers are used in task_executor.cpp
cat src/spider/worker/task_executor.cpp

# Check if there are any core interfaces that should be used instead
fd -e hpp -e cpp -t f . src/spider/core/ -x cat {} \; 

Length of output: 21072


269-270: New apply method signature
The apply method now expects a TaskContext. This is consistent with the updated function alias. Confirm that external callers have updated their invocation approach accordingly.

src/spider/core/DataImpl.hpp (2)

12-24: Utility class design
DataImpl centralizes creation and retrieval of Data. This design pattern is beneficial for keeping data management logic in one place.


7-8: Redundant includes?
We see both "../client/Data.hpp" and "../core/Data.hpp" which may create confusion if multiple Data classes exist. Validate if both are required.

✅ Verification successful

Let me gather more information about the actual usage of these Data classes.


Both includes are required - they serve different purposes

The includes are not redundant. The file structure shows that:

  • client/Data.hpp defines the public templated Data<T> class that users interact with
  • core/Data.hpp defines the internal implementation class core::Data that handles the underlying storage
  • DataImpl.hpp bridges between these two, providing implementation details and factory methods

The includes are necessary as DataImpl.hpp needs both the public interface and the core implementation to fulfill its bridging role.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching references of Data.hpp in core and client to confirm usage
rg -A 5 "Data\.hpp"

Length of output: 7716


Script:

#!/bin/bash
# Check the content of both Data.hpp files to understand their relationship
echo "=== Content of client/Data.hpp ==="
cat src/spider/client/Data.hpp
echo -e "\n=== Content of core/Data.hpp ==="
cat src/spider/core/Data.hpp

# Check DataImpl.hpp implementation to see how these are used
echo -e "\n=== Content of core/DataImpl.hpp ==="
cat src/spider/core/DataImpl.hpp

Length of output: 7410

tests/worker/worker-test.cpp (2)

9-9: Signature update
sum_test now expects a TaskContext. This is consistent with the new function signature across the codebase. Good practice to display the context usage in logs or debugging.


18-20: data_test function
This function simply retrieves and returns the data's integer value. Ensure there are tests verifying correct behaviour if the underlying storage is empty or invalid.

src/spider/core/TaskContextImpl.hpp (1)

1-2: Header guard check
Header guard naming is consistent with other files.

src/spider/core/Data.hpp (1)

22-22: Excellent optimization of get_value()

The change to return a const reference improves performance by avoiding unnecessary string copies. This is particularly beneficial when dealing with large strings.

Let's verify the usage of get_value() to ensure proper lifetime management:

✅ Verification successful

Based on the search results, I can generate the final response as the usage patterns are clear. The get_value() is used safely throughout the codebase:

The const reference return type is safe and well-handled

The codebase demonstrates proper lifetime management of the returned reference:

  • Used in comparison operations where the reference is immediately consumed
  • Stored in const references with proper lifetime scope
  • Properly handled when passing to storage operations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for get_value() usage to verify proper handling of the returned reference
rg --type cpp "get_value\(\)" -B 2 -A 2

Length of output: 7669

tests/CMakeLists.txt (1)

34-39: Well-structured CMake dependency additions

The addition of spider_client_lib dependencies is properly structured with clear multi-line formatting. This improves readability and maintainability.

Also applies to: 54-59

src/spider/storage/DataStorage.hpp (1)

29-29: Verify implementation in derived storage classes

The new pure virtual method requires implementation in all concrete storage classes.

✅ Verification successful

Pure virtual method set_data_locality is properly implemented

The verification shows that MySqlDataStorage, the only concrete class deriving from DataStorage, correctly implements the pure virtual method set_data_locality with the required signature:

auto set_data_locality(Data const& data) -> StorageErr override;
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for classes deriving from DataStorage and verify set_data_locality implementation
ast-grep --pattern 'class $_ : public DataStorage {
  $$$
  auto set_data_locality(Data const& data) -> StorageErr $$$
  $$$
}'

Length of output: 129


Script:

#!/bin/bash
# First, let's find all files that might contain DataStorage derived classes
fd -e hpp -e cpp | xargs rg "class.*:.*DataStorage" -l

# Then let's examine the content of these files to check implementations
fd -e hpp -e cpp | xargs rg "class.*:.*DataStorage" -A 50

Length of output: 3838

src/spider/CMakeLists.txt (2)

13-13: LGTM: Header additions align with PR objectives

The addition of DataImpl.hpp and TaskContextImpl.hpp headers corresponds to the new TaskContext and Data components being introduced.

Also applies to: 17-17


88-93: Verify the necessity of spider_client_lib dependency

While adding spider_client_lib enables the task executor to use client functionality, this creates tighter coupling between components. Consider if this dependency can be minimized or if an interface abstraction would be more appropriate.

✅ Verification successful

Let me gather more information about the task executor's dependencies and usage.


Let me check the actual usage of client-related code in the task executor files we found.


Let me check the TaskContext.hpp to understand the dependency better.


The spider_client_lib dependency is necessary and correctly used

The dependency on spider_client_lib is justified as the task executor requires TaskContext, which is a core interface for tasks to interact with Spider functionalities. The coupling is intentional and appropriate because:

  • TaskContext is used as a mandatory first parameter in the function execution interface
  • It provides essential functionality like abort operations and data storage access
  • The implementation follows a clean separation where TaskContextImpl (core) provides the actual implementation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if spider_client_lib is actually used in task executor
# Look for client library includes in task executor sources

echo "Checking task executor sources for client library usage:"
for file in $(echo "${SPIDER_TASK_EXECUTOR_SOURCES}" | tr ' ' '\n'); do
  if [ -n "$file" ]; then
    echo "Checking $file:"
    rg -l "spider/client" "$file" || true
  fi
done

Length of output: 148


Script:

#!/bin/bash
# First, let's find the task executor source files
fd "task.*executor" --type f

# Then check for any client-related includes or usage
rg -l "client" --type cpp --type hpp

# Also check CMakeLists.txt files for context
fd "CMakeLists.txt" --type f --exec rg -l "spider_client"

Length of output: 385


Script:

#!/bin/bash
# Check TaskExecutor files for client usage
echo "=== Checking TaskExecutor.cpp ==="
rg "client" "src/spider/worker/TaskExecutor.cpp"

echo -e "\n=== Checking TaskExecutor.hpp ==="
rg "client" "src/spider/worker/TaskExecutor.hpp"

echo -e "\n=== Checking task_executor.cpp ==="
rg "client" "src/spider/worker/task_executor.cpp"

# Also check what spider_client_lib provides
echo -e "\n=== Checking spider_client_lib definition ==="
rg "add_library.*spider_client" src/spider/CMakeLists.txt -A 5

Length of output: 1151


Script:

#!/bin/bash
# Check the TaskContext.hpp content
echo "=== Checking TaskContext.hpp ==="
rg -A 10 "class TaskContext" src/spider/client/TaskContext.hpp

# Check if TaskContext is used elsewhere in worker
echo -e "\n=== Checking TaskContext usage in worker ==="
rg "TaskContext" src/spider/worker/

# Check if there are any other client dependencies in worker
echo -e "\n=== Checking other client dependencies in worker ==="
rg -l "spider/client" src/spider/worker/

Length of output: 2040

src/spider/client/Driver.hpp (2)

10-12: LGTM: Required includes for serialization support

The includes are appropriately added to support the new data builder functionality with serialization capabilities.


57-61: LGTM: Well-designed data builder method

The get_data_builder template method is well-constrained with the Serializable concept and follows the builder pattern, promoting type safety and clean object construction.

✅ Verification successful

Let me verify the Packable concept to ensure the Serializable constraint is properly defined.


Let me check the Data class implementation to verify the Builder pattern is properly implemented.


Let me check for the Builder class definition, as it might be defined separately or using a different pattern.


The get_data_builder template method is correctly implemented and well-constrained

The verification confirms that:

  • The Serializable concept is properly defined with both SerializableImpl and DeSerializableImpl constraints
  • The Data class includes a well-implemented Builder pattern with proper type safety
  • The Builder class provides fluent interface methods for locality and cleanup configuration
  • The msgpack serialization/deserialization is correctly integrated
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the Serializable concept definition exists
# and check its constraints

echo "Checking for Serializable concept definition:"
rg -A 5 "concept\s+Serializable" || \
ast-grep --pattern 'concept Serializable { $$$_ }'

Length of output: 673


Script:

#!/bin/bash
# Check for Packable concept definition
echo "Checking for Packable concept definition:"
rg -A 5 "concept\s+Packable" || \
ast-grep --pattern 'concept Packable { $$$_ }'

# Check for any usage of Data::Builder with Serializable
echo -e "\nChecking Data::Builder usage with Serializable:"
rg "Data<.*>::Builder"

Length of output: 783


Script:

#!/bin/bash
# Check Data class implementation
echo "Checking Data class implementation:"
rg -A 10 "class\s+Data\s*(<.*>)?\s*\{" || \
ast-grep --pattern 'class Data { $$$_ }'

# Check Builder class implementation within Data
echo -e "\nChecking Builder class implementation:"
ast-grep --pattern 'class Data {
  $$$
  class Builder {
    $$$
  }
  $$$
}'

Length of output: 1495


Script:

#!/bin/bash
# Check for Builder class or struct definition in Data files
echo "Checking for Builder definition in Data files:"
rg -A 10 "class\s+Builder|struct\s+Builder" src/spider/client/Data.hpp src/spider/core/Data.hpp

# Check for any other Builder-related code in Data files
echo -e "\nChecking for Builder-related code:"
rg "Builder" src/spider/client/Data.hpp src/spider/core/Data.hpp

Length of output: 1486

src/spider/worker/TaskExecutor.hpp (1)

118-119: LGTM: Proper NOLINT directive placement

The NOLINTEND directive is properly placed to match its corresponding NOLINTBEGIN.

tests/integration/client.py (1)

41-51: LGTM! Well-structured data classes.

The Driver and Data classes are properly defined with appropriate field types and clear purpose.

src/spider/storage/MysqlStorage.hpp (1)

91-91: LGTM! Method declaration follows class conventions.

The set_data_locality method is properly declared with appropriate parameter types and return type.

tests/integration/test_scheduler_worker.py (2)

11-17: LGTM! Well-organized imports.

The imports are properly organized and include all necessary components for the new data-related functionality.


135-164: LGTM! Well-structured test fixture.

The fixture properly sets up the test environment with appropriate cleanup in the teardown phase.

tests/worker/test-TaskExecutor.cpp (3)

Line range hint 56-80: Test case properly validates task execution with storage integration

The test case follows good testing practices by:

  • Including storage integration in the test setup
  • Validating both execution success and result correctness
🧰 Tools
🪛 cppcheck (2.10-2)

[error] 56-56: syntax error

(syntaxError)


81-103: Test case properly validates argument validation error handling

The test effectively verifies:

  • Error detection for incorrect argument count
  • Specific error type validation

104-125: Test case properly validates task execution failure handling

The test effectively verifies:

  • Task execution failure detection
  • Specific error type validation
src/spider/worker/worker.cpp (1)

240-247: Constructor call properly integrates storage URL parameter

The TaskExecutor construction is correctly updated to include the storage URL parameter, maintaining consistency with the interface changes.

Comment on lines +13 to +18
static auto create_task_context(
std::shared_ptr<DataStorage> const& data_storage,
std::shared_ptr<MetadataStorage> const& metadata_storage
) -> TaskContext {
return TaskContext{data_storage, metadata_storage};
}

💡 Codebase verification

Null pointer validation needed in TaskContext construction

Based on the code analysis, there are no explicit null pointer checks when constructing TaskContext objects. Since these shared pointers are crucial for the task execution context, I recommend adding validation to prevent potential null pointer issues.

  • Add null pointer validation in create_task_context method in src/spider/core/TaskContextImpl.hpp
  • Consider throwing a meaningful exception (e.g., std::invalid_argument) if either data_storage or metadata_storage is null
🔗 Analysis chain

create_task_context
The static method create_task_context cleanly constructs a contextual environment. Confirm that the usage covers potential error handling if data_storage or metadata_storage is null.


Let me check the implementation details and usage context more thoroughly.


Let me check the TaskContext implementation and constructor to see how it handles null pointers.


Let me try one final check to see the constructor implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching for create_task_context usage to see if we pass null pointers
rg "create_task_context"

Length of output: 627


Script:

#!/bin/bash
# Check TaskContext constructor and implementation
ast-grep --pattern 'class TaskContext {
  $$$
}'

# Check the actual usage in test and implementation files
rg -A 5 "create_task_context"

Length of output: 2623


Script:

#!/bin/bash
# Check TaskContext implementation
ast-grep --pattern 'struct TaskContext {
  $$$
}'

# Check TaskContextImpl implementation
ast-grep --pattern 'class TaskContextImpl {
  $$$
}'

# Look for any null checks in the relevant files
rg -A 3 "if.*null|nullptr" src/spider/core/

Length of output: 152


Script:

#!/bin/bash
# Look for TaskContext constructor implementation
rg -B 2 -A 5 "TaskContext::" 

# Check for any assertions or validations in the header
rg -B 2 -A 5 "TaskContext\{" src/spider/core/

Length of output: 1180

Comment on lines +196 to +201
def remove_data(conn, data: Data):
cursor = conn.cursor()

cursor.execute("DELETE FROM data WHERE id = %s", (data.id.bytes,))
conn.commit()
cursor.close()

🛠️ Refactor suggestion

Add error handling for data removal.

The function should handle potential database errors during deletion.

 def remove_data(conn, data: Data):
     cursor = conn.cursor()
-
-    cursor.execute("DELETE FROM data WHERE id = %s", (data.id.bytes,))
-    conn.commit()
-    cursor.close()
+    try:
+        cursor.execute("DELETE FROM data WHERE id = %s", (data.id.bytes,))
+        conn.commit()
+    except mysql.connector.Error as e:
+        conn.rollback()
+        raise RuntimeError(f"Failed to remove data: {e}")
+    finally:
+        cursor.close()

Comment on lines +172 to +181
def add_driver(conn, driver: Driver):
cursor = conn.cursor()

cursor.execute(
"INSERT INTO drivers (id, address) VALUES (%s, %s)", (driver.id.bytes, driver.addr)
)

conn.commit()
cursor.close()


🛠️ Refactor suggestion

Add error handling for database operations.

The function should handle potential database errors that could occur during insertion.

 def add_driver(conn, driver: Driver):
     cursor = conn.cursor()
-
-    cursor.execute(
-        "INSERT INTO drivers (id, address) VALUES (%s, %s)", (driver.id.bytes, driver.addr)
-    )
-
-    conn.commit()
-    cursor.close()
+    try:
+        cursor.execute(
+            "INSERT INTO drivers (id, address) VALUES (%s, %s)", 
+            (driver.id.bytes, driver.addr)
+        )
+        conn.commit()
+    except mysql.connector.Error as e:
+        conn.rollback()
+        raise RuntimeError(f"Failed to add driver: {e}")
+    finally:
+        cursor.close()

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +183 to +194
def add_driver_data(conn, driver: Driver, data: Data):
cursor = conn.cursor()

cursor.execute("INSERT INTO data (id, value) VALUES (%s, %s)", (data.id.bytes, data.value))
cursor.execute(
"INSERT INTO data_ref_driver (driver_id, id) VALUES (%s, %s)",
(driver.id.bytes, data.id.bytes),
)

conn.commit()
cursor.close()


🛠️ Refactor suggestion

Improve transaction handling for multiple database operations.

The function performs two database operations that should be atomic. If the second operation fails, the first operation should be rolled back.

 def add_driver_data(conn, driver: Driver, data: Data):
     cursor = conn.cursor()
-
-    cursor.execute("INSERT INTO data (id, value) VALUES (%s, %s)", (data.id.bytes, data.value))
-    cursor.execute(
-        "INSERT INTO data_ref_driver (driver_id, id) VALUES (%s, %s)",
-        (driver.id.bytes, data.id.bytes),
-    )
-
-    conn.commit()
-    cursor.close()
+    try:
+        cursor.execute("INSERT INTO data (id, value) VALUES (%s, %s)", 
+            (data.id.bytes, data.value))
+        cursor.execute(
+            "INSERT INTO data_ref_driver (driver_id, id) VALUES (%s, %s)",
+            (driver.id.bytes, data.id.bytes),
+        )
+        conn.commit()
+    except mysql.connector.Error as e:
+        conn.rollback()
+        raise RuntimeError(f"Failed to add driver data: {e}")
+    finally:
+        cursor.close()

Committable suggestion skipped: line range outside the PR's diff.
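The suggestion above targets mysql.connector; the same atomic rollback-on-error shape can be sketched with the stdlib sqlite3 module (table names mirror the test schema, the simplified signature is hypothetical):

```python
import sqlite3


def add_driver_data(conn, driver_id: bytes, data_id: bytes, value: str) -> None:
    """Insert a data row and its driver reference atomically."""
    cursor = conn.cursor()
    try:
        cursor.execute("INSERT INTO data (id, value) VALUES (?, ?)", (data_id, value))
        cursor.execute(
            "INSERT INTO data_ref_driver (driver_id, id) VALUES (?, ?)",
            (driver_id, data_id),
        )
        conn.commit()
    except sqlite3.Error as e:
        conn.rollback()  # undo the first INSERT if the second one fails
        raise RuntimeError(f"Failed to add driver data: {e}") from e
    finally:
        cursor.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id BLOB PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE data_ref_driver (driver_id BLOB, id BLOB)")

add_driver_data(conn, b"driver-1", b"data-1", "payload")
assert conn.execute("SELECT COUNT(*) FROM data").fetchone()[0] == 1

# Re-inserting the same primary key fails; the transaction is rolled
# back, so no partial state is left behind.
try:
    add_driver_data(conn, b"driver-2", b"data-1", "other")
except RuntimeError:
    pass
assert conn.execute("SELECT COUNT(*) FROM data_ref_driver").fetchone()[0] == 1
```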

Comment on lines +1478 to +1506
auto MySqlDataStorage::set_data_locality(Data const& data) -> StorageErr {
try {
std::unique_ptr<sql::PreparedStatement> const delete_statement(
m_conn->prepareStatement("DELETE FROM `data_locality` WHERE `id` = ?")
);
sql::bytes id_bytes = uuid_get_bytes(data.get_id());
delete_statement->setBytes(1, &id_bytes);
delete_statement->executeUpdate();
std::unique_ptr<sql::PreparedStatement> const insert_statement(m_conn->prepareStatement(
"INSERT INTO `data_locality` (`id`, `address`) VALUES(?, ?)"
));
for (std::string const& addr : data.get_locality()) {
insert_statement->setBytes(1, &id_bytes);
insert_statement->setString(2, addr);
insert_statement->executeUpdate();
}
std::unique_ptr<sql::PreparedStatement> const hard_locality_statement(
m_conn->prepareStatement("UPDATE `data` SET `hard_locality` = ? WHERE `id` = ?")
);
hard_locality_statement->setBoolean(1, data.is_hard_locality());
hard_locality_statement->setBytes(2, &id_bytes);
hard_locality_statement->executeUpdate();
} catch (sql::SQLException& e) {
m_conn->rollback();
return StorageErr{StorageErrType::OtherErr, e.what()};
}
m_conn->commit();
return StorageErr{};
}

🛠️ Refactor suggestion

Enhance data locality update robustness

Consider these improvements for better error handling and safety:

  1. Validate data existence before updates
  2. Check update operation success
  3. Use const for SQL statement strings
 auto MySqlDataStorage::set_data_locality(Data const& data) -> StorageErr {
     try {
+        // Verify data exists
+        std::unique_ptr<sql::PreparedStatement> const check_statement(
+            m_conn->prepareStatement("SELECT 1 FROM `data` WHERE `id` = ?")
+        );
+        sql::bytes id_bytes = uuid_get_bytes(data.get_id());
+        check_statement->setBytes(1, &id_bytes);
+        std::unique_ptr<sql::ResultSet> const check_result(check_statement->executeQuery());
+        if (check_result->rowsCount() == 0) {
+            m_conn->rollback();
+            return StorageErr{
+                StorageErrType::KeyNotFoundErr,
+                fmt::format("no data with id {}", boost::uuids::to_string(data.get_id()))
+            };
+        }
+
         std::unique_ptr<sql::PreparedStatement> const delete_statement(
-                m_conn->prepareStatement("DELETE FROM `data_locality` WHERE `id` = ?")
+                m_conn->prepareStatement("DELETE FROM `data_locality` WHERE `id` = ?")
         );
-        sql::bytes id_bytes = uuid_get_bytes(data.get_id());
         delete_statement->setBytes(1, &id_bytes);
-        delete_statement->executeUpdate();
+        int const deleted_rows = delete_statement->executeUpdate();
+        if (deleted_rows < 0) {
+            m_conn->rollback();
+            return StorageErr{StorageErrType::OtherErr, "Failed to delete existing locality"};
+        }

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
tests/worker/test-FunctionManager.cpp (2)

22-33: Document intended usage of TaskContext parameter

The TaskContext parameter is currently unused in all test functions. If this parameter is intended for future use or required by design, please add documentation explaining its purpose and expected future usage.

-auto int_test(spider::TaskContext const& /*context*/, int const x, int const y) -> int {
+// TODO: TaskContext will be used in future to [describe intended usage]
+auto int_test(spider::TaskContext const& context, int const x, int const y) -> int {

90-113: Add error test cases for tuple return function

The test case for tuple returns only covers the happy path. Consider adding test cases for:

  • Wrong number of arguments
  • Wrong argument types
  • Invalid string input
tests/worker/test-TaskExecutor.cpp (2)

Line range hint 58-127: Test cases properly adapted for storage integration

The modifications consistently integrate the storage URL parameter across all test cases while maintaining the original test coverage and assertions. However, consider adding test cases for storage-specific failure scenarios.

Consider adding test cases for:

  • Invalid storage URL
  • Storage connection failures
  • Storage operation failures

170-172: Remove redundant value check

The has_value() check is redundant as it's already verified by the REQUIRE statement at line 169.

Suggested fix:

-    if (optional_result.has_value()) {
-        REQUIRE(3 == optional_result.value());
-    }
+    REQUIRE(3 == optional_result.value());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e87551 and 1508038.

📒 Files selected for processing (3)
  • src/spider/worker/TaskExecutor.hpp (4 hunks)
  • tests/worker/test-FunctionManager.cpp (3 hunks)
  • tests/worker/test-TaskExecutor.cpp (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spider/worker/TaskExecutor.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-FunctionManager.cpp

[error] 39-39: syntax error

(syntaxError)

tests/worker/test-TaskExecutor.cpp

[error] 58-58: syntax error

(syntaxError)

🔇 Additional comments (2)
tests/worker/test-FunctionManager.cpp (1)

35-37: LGTM! Function registration is complete

All test functions are properly registered using the SPIDER_WORKER_REGISTER_TASK macro.

tests/worker/test-TaskExecutor.cpp (1)

2-2: LGTM: Header additions are well-organized

The new headers appropriately support the Data and Storage functionality being added to the TaskExecutor tests.

Also applies to: 12-14, 17-22

Comment on lines +114 to +147
TEMPLATE_LIST_TEST_CASE(
"Register and run function with data inputs",
"[core][storage]",
spider::test::StorageTypeList
) {
auto [unique_metadata_storage, unique_data_storage] = spider::test::
create_storage<std::tuple_element_t<0, TestType>, std::tuple_element_t<1, TestType>>();

std::shared_ptr<spider::core::MetadataStorage> const metadata_storage
= std::move(unique_metadata_storage);
std::shared_ptr<spider::core::DataStorage> const data_storage = std::move(unique_data_storage);

msgpack::sbuffer buffer;
msgpack::pack(buffer, 3);
spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
boost::uuids::random_generator gen;
boost::uuids::uuid const driver_id = gen();
spider::core::Driver const driver{driver_id, "127.0.0.1"};
REQUIRE(metadata_storage->add_driver(driver).success());
REQUIRE(data_storage->add_driver_data(driver_id, data).success());

spider::TaskContext const context
= spider::core::TaskContextImpl::create_task_context(data_storage, metadata_storage);

spider::core::FunctionManager const& manager = spider::core::FunctionManager::get_instance();

spider::core::Function const* function = manager.get_function("data_test");

spider::core::ArgsBuffer const args_buffers = spider::core::create_args_buffers(data.get_id());
msgpack::sbuffer const result = (*function)(context, args_buffers);
REQUIRE(3 == spider::core::response_get_result<int>(result).value_or(0));

REQUIRE(data_storage->remove_data(data.get_id()).success());
}

🛠️ Refactor suggestion

Enhance data test coverage

Consider adding the following test scenarios:

  • Attempting to access non-existent data
  • Handling invalid data IDs
  • Testing data removal with invalid IDs
  • Verifying behaviour when storage operations fail

Example test case:

// Test accessing non-existent data
spider::core::ArgsBuffer const invalid_args_buffers = spider::core::create_args_buffers("invalid-id");
msgpack::sbuffer const error_result = (*function)(context, invalid_args_buffers);
auto error = spider::core::response_get_error(error_result);
REQUIRE(error.has_value());

Comment on lines +129 to +176
TEMPLATE_LIST_TEST_CASE(
"Task execute data argument",
"[worker][storage]",
spider::test::StorageTypeList
) {
auto [unique_metadata_storage, unique_data_storage] = spider::test::
create_storage<std::tuple_element_t<0, TestType>, std::tuple_element_t<1, TestType>>();
std::shared_ptr<spider::core::MetadataStorage> const metadata_storage
= std::move(unique_metadata_storage);
std::shared_ptr<spider::core::DataStorage> const data_storage = std::move(unique_data_storage);

// Create driver and data
msgpack::sbuffer buffer;
msgpack::pack(buffer, 3);
spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
boost::uuids::random_generator gen;
boost::uuids::uuid const driver_id = gen();
spider::core::Driver const driver{driver_id, "127.0.0.1"};
REQUIRE(metadata_storage->add_driver(driver).success());
REQUIRE(data_storage->add_driver_data(driver_id, data).success());

absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor executor{
context,
"data_test",
spider::test::cStorageUrl,
get_libraries(),
environment_variable,
data.get_id()
};
context.run();
executor.wait();
REQUIRE(executor.succeed());
std::optional<int> const optional_result = executor.get_result<int>();
REQUIRE(optional_result.has_value());
if (optional_result.has_value()) {
REQUIRE(3 == optional_result.value());
}

// Clean up
REQUIRE(data_storage->remove_data(data.get_id()).success());
}

🛠️ Refactor suggestion

Add test coverage for data retrieval failures

The template test case only covers the happy path. Consider adding test cases for data retrieval failures.

Add test cases for:

  • Non-existent data ID
  • Corrupted data
  • Storage access failures during data retrieval

Would you like me to provide example test case implementations for these scenarios?

Comment on lines +141 to +143
msgpack::sbuffer buffer;
msgpack::pack(buffer, 3);
spider::core::Data const data{std::string{buffer.data(), buffer.size()}};

⚠️ Potential issue

Add error handling for data serialization

The MsgPack serialization could throw exceptions but lacks try-catch blocks.

Suggested fix:

     msgpack::sbuffer buffer;
-    msgpack::pack(buffer, 3);
-    spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
+    try {
+        msgpack::pack(buffer, 3);
+        spider::core::Data const data{std::string{buffer.data(), buffer.size()}};
+    } catch (const msgpack::type_error& e) {
+        FAIL("Failed to serialize data: " << e.what());
+    }

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/worker/test-TaskExecutor.cpp (3)

Line range hint 58-127: Add storage-related validation to existing test cases

While the storage URL parameter has been added, the test cases don't verify storage interactions. Consider adding assertions to validate that:

  • The storage connection is established successfully
  • Resources are properly cleaned up after test execution

Example addition for the success test case:

 TEST_CASE("Task execute success", "[worker][storage]") {
+    // Verify storage setup
+    REQUIRE(spider::test::verify_storage_connection(spider::test::cStorageUrl));
     
     // ... existing test code ...
     
+    // Verify cleanup
+    REQUIRE(executor.cleanup_storage_resources().success());
 }

168-172: Remove redundant null check

The has_value() check is redundant after the REQUIRE(optional_result.has_value()). The value access can be simplified.

     REQUIRE(optional_result.has_value());
-    if (optional_result.has_value()) {
-        REQUIRE(3 == optional_result.value());
-    }
+    REQUIRE(3 == optional_result.value());

174-176: Enhance cleanup with RAII pattern

Consider using RAII to ensure cleanup occurs even if the test fails. This prevents resource leaks in the storage system.

+    class CleanupGuard {
+    public:
+        CleanupGuard(std::shared_ptr<spider::core::DataStorage> storage, 
+                     spider::core::Data::Id id) 
+            : storage_(storage), id_(id) {}
+        ~CleanupGuard() {
+            if (storage_) {
+                storage_->remove_data(id_);
+            }
+        }
+    private:
+        std::shared_ptr<spider::core::DataStorage> storage_;
+        spider::core::Data::Id id_;
+    };
+    
+    // Create cleanup guard at the start
+    CleanupGuard cleanup_guard(data_storage, data.get_id());
-    // Clean up
-    REQUIRE(data_storage->remove_data(data.get_id()).success());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1508038 and 9b3098b.

📒 Files selected for processing (1)
  • tests/worker/test-TaskExecutor.cpp (4 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp

[error] 58-58: syntax error

(syntaxError)

🔇 Additional comments (2)
tests/worker/test-TaskExecutor.cpp (2)

129-176: Add test coverage for error scenarios

The template test case only covers the happy path. Consider adding test cases for:

  • Invalid data ID
  • Corrupted data
  • Storage access failures

141-143: Add error handling for data serialization

The MsgPack serialization could throw exceptions but lacks try-catch blocks.

@sitaowang1998 sitaowang1998 changed the title feat: Add TaskContext and Data in task feat: Add TaskContext and Data as task argument Dec 21, 2024
@sitaowang1998 sitaowang1998 merged commit 03d043a into y-scope:main Dec 21, 2024
5 checks passed
@sitaowang1998 sitaowang1998 deleted the client-task branch December 21, 2024 16:58