test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

sitaowang1998 · 2024-11-10T17:46:01Z

Description

As title.

Validation performed

Pass all github workflows
Pass all unit tests on ubuntu focal devcontainer.

Summary by CodeRabbit

Release Notes

New Features
- Simplified Task class constructor, requiring only the function name.
- Added methods to access task inputs and outputs.
- New methods for retrieving driver and scheduler addresses in MetadataStorage.
- Introduced utility functions for comparing tasks and task graphs.
Bug Fixes
- Enhanced error handling in database operations.
Tests
- New test suite for MetadataStorage functionality.
- Updated test cases for improved structure and clarity.
- Added new utility functions for comparing various data structures related to tasks and task graphs.

coderabbitai · 2024-11-10T17:46:08Z

Walkthrough

The pull request introduces significant modifications to several classes within the spider project, particularly focusing on the Task, TaskGraph, and MetadataStorage classes. Key changes include the removal of the TaskCreatorType enum and associated parameters from the Task class, updates to method signatures and return types in the TaskGraph class, and enhancements to the MetadataStorage class with new methods for driver and scheduler management. Additionally, new utility functions for comparing task-related data structures have been added, along with new tests to validate the updated functionalities.

Changes

File	Change Summary
`src/spider/core/Task.hpp`	Removed `TaskCreatorType` enum, simplified `Task` constructor, removed `get_creator_type()` and `get_creator_id()` methods, added `get_inputs()` and `get_outputs()` methods.
`src/spider/core/TaskGraph.hpp`	Renamed `add_dependencies` to `add_dependency`, changed `get_task` return type to `std::optional<Task const>`, added overload for `get_task` returning `std::optional<Task>`.
`src/spider/storage/MetadataStorage.hpp`	Added `get_driver()` method, renamed `get_child_task()` to `get_child_tasks()` with updated return type, changed `heartbeat_timeout` parameter type from `float` to `double`, added `get_scheduler_addr()`.
`src/spider/storage/MysqlStorage.cpp`	Altered `client_id` column type in the `jobs` table, added `get_driver()` and `get_scheduler_addr()` methods, updated `add_job` logic for task dependencies, enhanced error handling, and refined SQL queries.
`src/spider/storage/MysqlStorage.hpp`	Added `get_driver()` and `get_scheduler_addr()` methods, renamed `get_child_task()` to `get_child_tasks()`, updated `heartbeat_timeout` parameter type, renamed `fetch_task()` to `fetch_full_task()`.
`tests/CMakeLists.txt`	Added new test source files: `storage/test-MetadataStorage.cpp`, `utils/CoreTaskUtils.hpp`, and `utils/CoreTaskUtils.cpp`.
`tests/storage/StorageTestHelper.hpp`	Changed include guard, added type aliases for storage types.
`tests/storage/test-DataStorage.cpp`	Updated test case declarations to use `TEMPLATE_LIST_TEST_CASE`, enhanced NOLINT comments.
`tests/storage/test-MetadataStorage.cpp`	Introduced new test suite for `MetadataStorage` with multiple test cases validating its functionalities.
`tests/utils/CoreDataUtils.hpp`	Changed namespace from `spider::core` to `spider::test`, updated type qualification for `data_equal` function.
`tests/utils/CoreTaskUtils.cpp`	Added utility functions for comparing various task-related data structures.
`tests/utils/CoreTaskUtils.hpp`	Introduced new header file defining utility functions for comparing `Task`, `TaskInput`, `TaskOutput`, and `TaskGraph` objects.

Possibly related PRs

feat: Add MySql support for storage backend #20: The changes in the main PR regarding the Task class and its methods are related to the modifications in the MySqlMetadataStorage and MySqlDataStorage classes in PR feat: Add MySql support for storage backend #20, which also involve changes to how tasks and data are managed within the storage backend.
test: Add tests for data storage #21: The new tests introduced in PR test: Add tests for data storage #21 for the MySqlDataStorage class directly relate to the changes made in the Task class in the main PR, as they validate the functionality of data storage that includes tasks and their management.

Suggested reviewers

kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (18)

tests/utils/CoreDataUtils.hpp (1)
Line range hint 7-30: Consider optimizing property access with structured bindings.

While the implementation is correct, we could make it more efficient by reducing repeated getter calls.

Here's a suggested optimization:
-inline auto data_equal(core::Data const& d1, core::Data const& d2) -> bool {
-    if (d1.get_id() != d2.get_id()) {
-        return false;
-    }
-
-    if (d1.get_key() != d2.get_key()) {
-        return false;
-    }
-
-    if (d1.get_locality() != d2.get_locality()) {
-        return false;
-    }
-
-    if (d1.is_hard_locality() != d2.is_hard_locality()) {
-        return false;
-    }
-
-    if (d1.get_value() != d2.get_value()) {
-        return false;
-    }
-
-    return true;
+inline auto data_equal(core::Data const& d1, core::Data const& d2) -> bool {
+    auto [id1, key1, locality1, hard_locality1, value1] = 
+        std::tuple{d1.get_id(), d1.get_key(), d1.get_locality(), 
+                  d1.is_hard_locality(), d1.get_value()};
+    auto [id2, key2, locality2, hard_locality2, value2] = 
+        std::tuple{d2.get_id(), d2.get_key(), d2.get_locality(), 
+                  d2.is_hard_locality(), d2.get_value()};
+    
+    return id1 == id2 && key1 == key2 && locality1 == locality2 && 
+           hard_locality1 == hard_locality2 && value1 == value2;
}
tests/storage/StorageTestHelper.hpp (1)
Line range hint 23-54: Consider improving test environment management

Two suggestions for the storage creation functions:

Add cleanup functionality to ensure test data is removed after each test

Move the hardcoded database credentials to a configuration file or environment variables

Example improvement:
+ // Load credentials from environment or config
+ static auto get_storage_url() {
+     auto* url = std::getenv("SPIDER_TEST_DB_URL");
+     return url ? url : cStorageUrl;  // Fall back to default if not set
+ }

  template <class T>
  requires std::derived_from<T, core::DataStorage>
  auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
      std::unique_ptr<core::DataStorage> storage = std::make_unique<T>();
-     REQUIRE(storage->connect(spider::test::cStorageUrl).success());
+     REQUIRE(storage->connect(get_storage_url()).success());
      REQUIRE(storage->initialize().success());
      return storage;
  }
src/spider/storage/MetadataStorage.hpp (1)

56-57: LGTM: Well-structured method for scheduler address retrieval.

The new method follows the established pattern and provides essential functionality for distributed system coordination.

Consider adding a brief comment documenting the expected format of the address string (e.g., IPv4, IPv6, hostname) to ensure consistent usage across implementations.
src/spider/core/TaskGraph.hpp (2)
51-56: Document pointer validity guarantees

The method now returns a pointer to a map element. Please document the lifetime/validity guarantees of the returned pointer, particularly in relation to map modifications.
+    /**
+     * Retrieves a const pointer to a task by its ID.
+     * @note The returned pointer is valid until the next modification of the task graph.
+     */
     [[nodiscard]] auto get_task(boost::uuids::uuid id) const -> std::optional<Task const*> {
58-62: Document usage guidelines for mutable task access

The non-const overload allows direct modification of tasks, which could potentially break graph invariants. Consider:

Documenting safe modification patterns

Adding runtime checks for critical task properties
+    /**
+     * Retrieves a mutable pointer to a task by its ID.
+     * @warning Modify task properties with caution to maintain graph consistency.
+     *          The task ID must not be changed.
+     */
     [[nodiscard]] auto get_task(boost::uuids::uuid id) -> std::optional<Task*> {
src/spider/storage/MysqlStorage.hpp (1)
66-66: Add documentation for the enhanced functionality

The rename from fetch_task to fetch_full_task suggests expanded functionality. Please add a brief comment explaining what constitutes a "full" task fetch.
-    auto fetch_full_task(std::unique_ptr<sql::ResultSet> const& res) -> Task;
+    /// Fetches a task with all its associated data (inputs, outputs, etc.) from the result set
+    auto fetch_full_task(std::unique_ptr<sql::ResultSet> const& res) -> Task;
src/spider/core/Task.hpp (1)
135-138: Consider marking vectors as const in class definition.

The new getter methods returning const references are good for performance. Consider also marking the vectors as const in the class definition if they're not meant to be modified after construction.
private:
    boost::uuids::uuid m_id;
    std::string m_function_name;
    TaskState m_state = TaskState::Pending;
    float m_timeout = 0;
-   std::vector<TaskInput> m_inputs;
-   std::vector<TaskOutput> m_outputs;
+   const std::vector<TaskInput> m_inputs;
+   const std::vector<TaskOutput> m_outputs;
tests/storage/test-DataStorage.cpp (2)
38-38: Consider adding more specific failure messages.

While the data comparison is correct, adding a custom message would help debug test failures.
-    REQUIRE(spider::test::data_equal(data, result));
+    REQUIRE(spider::test::data_equal(data, result), "Retrieved data does not match stored data");
48-51: Apply the same improvements as the first test case.

For consistency with the first test case:

The test structure improvements are good

Consider adding the same custom failure message for data comparison
-    REQUIRE(spider::test::data_equal(data, result));
+    REQUIRE(spider::test::data_equal(data, result), "Retrieved data by key does not match stored data");
Also applies to: 67-67

🧰 Tools

🪛 cppcheck

[error] 48-48: syntax error

(syntaxError)
tests/storage/test-MetadataStorage.cpp (3)
23-63: Consider improving test reliability for timing-dependent assertions.

The test case effectively validates driver heartbeat functionality but has two potential reliability issues:

The cDuration constant lacks units specification

Using sleep_for in tests can lead to flaky behaviour in CI environments

Consider these improvements:
-    constexpr double cDuration = 100;
+    constexpr std::chrono::milliseconds cHeartbeatTimeout{100};
Also, consider implementing a mock time provider to avoid relying on actual sleep durations.

65-104: Enhance test coverage and maintainability.

The scheduler state test is thorough but could be improved in several areas:

Magic values should be named constants

Missing test cases for invalid state transitions

Limited set of state values tested

Consider these improvements:
+    // At the namespace level
+    namespace {
+    constexpr int kDefaultSchedulerPort = 3306;
+    constexpr char kNormalState[] = "normal";
+    constexpr char kRecoveryState[] = "recovery";
+    }

-    constexpr int cPort = 3306;
+    constexpr int cPort = kDefaultSchedulerPort;

-    REQUIRE(state_res == "normal");
+    REQUIRE(state_res == kNormalState);
Also consider adding test cases for:

Invalid state transitions

Invalid state string values

Concurrent state updates

🧰 Tools

🪛 cppcheck

[error] 65-65: syntax error

(syntaxError)

106-110: Fix typo in test case name.

There's a typo in the test case name: "Job add, get ane remove" should be "Job add, get and remove".
tests/utils/CoreTaskUtils.cpp (2)
22-24: Simplify float_equal function using std::abs

For better clarity and to handle negative differences appropriately, consider using std::abs in the float_equal function.

Apply this diff to simplify the function:
+#include <cmath>

 auto float_equal(float f1, float f2) -> bool {
-    return f1 - f2 < cEpsilon && f2 - f1 < cEpsilon;
+    return std::abs(f1 - f2) < cEpsilon;
 }
62-63: Use size_t for loop counter to match vector size type

Changing the loop variable i from int to size_t avoids potential signed/unsigned comparison issues.

Apply this diff:
-        for (int i = 0; i < v1.size(); ++i) {
+        for (size_t i = 0; i < v1.size(); ++i) {
src/spider/storage/MysqlStorage.cpp (4)

305-328: Consider removing unnecessary transaction management in read-only method get_driver

The method get_driver performs a read-only operation but uses m_conn->rollback() and m_conn->commit(). For read-only operations, transaction management may not be necessary and can be omitted to improve performance.

503-514: Use column names instead of indices for clarity

In the functions fetch_task_input and fetch_task_output, hard-coded column indices like res->getString(3) are used. Using column names instead of indices improves readability and reduces the risk of errors if the SQL query changes.

Also applies to: 524-533

Line range hint 767-781: Consider removing unnecessary transaction management in read-only method get_task

The get_task method performs a read-only operation but includes m_conn->rollback() and m_conn->commit(). Omitting transaction management in read-only methods can enhance performance.

Line range hint 1147-1164: Ensure consistent error handling in data retrieval methods

In get_data and get_data_by_key, when no data is found, m_conn->rollback() is called before returning an error. Since these methods are read-only, consider whether transaction rollback is necessary.

Also applies to: 1190-1208

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d6e5fa and 7ab7a61.

📒 Files selected for processing (12)

src/spider/core/Task.hpp (2 hunks)
src/spider/core/TaskGraph.hpp (1 hunks)
src/spider/storage/MetadataStorage.hpp (2 hunks)
src/spider/storage/MysqlStorage.cpp (23 hunks)
src/spider/storage/MysqlStorage.hpp (2 hunks)
tests/CMakeLists.txt (1 hunks)
tests/storage/StorageTestHelper.hpp (3 hunks)
tests/storage/test-DataStorage.cpp (9 hunks)
tests/storage/test-MetadataStorage.cpp (1 hunks)
tests/utils/CoreDataUtils.hpp (2 hunks)
tests/utils/CoreTaskUtils.cpp (1 hunks)
tests/utils/CoreTaskUtils.hpp (1 hunks)

✅ Files skipped from review due to trivial changes (1)

tests/utils/CoreTaskUtils.hpp

🧰 Additional context used

🪛 cppcheck

tests/storage/test-DataStorage.cpp

[error] 48-48: syntax error

(syntaxError)

tests/storage/test-MetadataStorage.cpp

[error] 65-65: syntax error

(syntaxError)

tests/utils/CoreTaskUtils.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (20)

tests/CMakeLists.txt (1)

3-7: LGTM! The new test files are properly integrated.

The addition of metadata storage tests and task utility files aligns well with the PR objectives. The files are correctly organized in their respective directories (storage/ and utils/) and properly integrated into the SPIDER_TEST_SOURCES variable.

tests/utils/CoreDataUtils.hpp (3)

1-2: LGTM! Header guards are properly implemented.

The header guard naming convention follows the project standards.

5-5: Appropriate namespace change for test utilities.

Moving this utility to the spider::test namespace better reflects its purpose as a testing component and maintains clear separation from core functionality.

31-33: LGTM! Proper file closure with clear comments.

The namespace and header guard closings are well-documented.

tests/storage/StorageTestHelper.hpp (2)

1-2: LGTM: Include guard naming aligns with directory structure

The update from SPIDER_TEST_ to SPIDER_TESTS_ in the include guard better reflects the actual directory structure and maintains consistency.

Also applies to: 57-57

7-7: Consider making storage type lists extensible

While the type aliases improve readability, hardcoding MySQL as the only storage type might limit testing flexibility. Consider:

Making these lists configurable or extensible for future storage backends
Adding a comment explaining why MySQL is currently the only supported backend

Let's check if other storage backends exist:

Also applies to: 19-21

src/spider/storage/MetadataStorage.hpp (3)

54-54: LGTM: Enhanced precision for timeout values.

The change from float to double provides better precision for timeout values, which is important for accurate scheduling and monitoring.

#!/bin/bash
# Verify all timeout-related methods use double
rg 'timeout.*float|float.*timeout'

29-29: LGTM: Well-designed getter method for driver information.

The new method follows a consistent pattern with other getters in the interface and provides essential functionality for driver management.

✅ Verification successful

Method is properly implemented in MySQL backend

The get_driver method is correctly implemented in MysqlStorage.cpp with a matching signature and follows the interface contract. The implementation includes proper SQL query preparation and error handling.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the implementation exists in MySQL backend
ast-grep --pattern 'auto get_driver(boost::uuids::uuid id, std::string* addr) -> StorageErr'

Length of output: 695

Script:

#!/bin/bash
# Search for get_driver implementation in MySQL storage
rg -A 5 "get_driver.*uuid.*addr.*StorageErr" src/

# Search for get_driver in MySQL storage class definition
ast-grep --pattern 'class MysqlStorage {
  $$$
  auto get_driver($$$) -> StorageErr $$$
  $$$
}'

Length of output: 1813

47-48: LGTM: Improved method signature for handling multiple child tasks.

The rename from get_child_task to get_child_tasks and the change to return a vector better represents the one-to-many relationship between tasks.

✅ Verification successful

The method signature change has been properly implemented across the codebase

The verification confirms that:

The pure virtual method get_child_tasks is correctly declared in MetadataStorage.hpp
The implementation in MySqlMetadataStorage properly overrides the method with the same signature
The method is being used correctly in tests and throughout the codebase
No legacy get_child_task (singular) calls remain

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify all implementations have been updated
rg -A 2 'get_child_task[s]?\('

Length of output: 695

Script:

#!/bin/bash
# Search for get_child_task or get_child_tasks with proper pattern matching
rg -A 2 'get_child_tasks?\s*\(' --type cpp

# Also search for class implementations of MetadataStorage
ast-grep --pattern 'class $_ : public MetadataStorage {
  $$$
}'

Length of output: 2191

src/spider/storage/MysqlStorage.hpp (4)

34-34: LGTM: Method signature is well-designed

The method declaration follows good C++ practices with proper error handling and output parameter design.

58-58: LGTM: Well-structured method signature

The method follows consistent patterns with other address retrieval methods in the class, properly separating address and port parameters.

55-55: Verify caller compatibility with double precision

The change from float to double improves precision but may affect existing callers.

#!/bin/bash
# Search for calls to heartbeat_timeout to verify type compatibility
rg "heartbeat_timeout\s*\([^)]*\)" --type cpp

51-51: Verify task collection handling in implementation

The method signature change from single task to vector is appropriate. Please ensure the implementation properly handles:

Memory management for Task objects
Error cases (empty results, partial failures)

src/spider/core/Task.hpp (1)

104-107: Verify the impact of removing creator-related parameters.

The constructor simplification looks good, but we should verify that removing the creator-related parameters doesn't break existing task creation workflows.

Also applies to: 109-114

✅ Verification successful

Let's search for more context about Task usage patterns and any related types that might have been affected by the constructor changes.

Let's search for more context about the Task class definition and its usage patterns, particularly focusing on instantiation and any remaining references to the removed creator-related code.

Constructor changes are safe and backwards compatible

The simplified Task constructor changes appear to be safe based on the following evidence:

No TaskCreator references exist in the codebase
Task instantiation patterns in tests show only function name being used: Task{"child"}, Task{"p1"}
TaskGraph integration shows compatibility with the new constructor through add_child_task
The constructor maintains core functionality (ID generation and function name) while removing unused creator-related parameters

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for Task constructor usage patterns to ensure all call sites are updated

# Test: Search for Task constructor calls
rg -A 2 'Task\(' --type cpp

# Test: Search for potential creator-related usages that might need updating
rg -i 'creator.*task' --type cpp

Length of output: 546

Script:

#!/bin/bash
# Search for any references to Task class and its constructors in implementation files
rg -A 5 'Task\s+\w+\s*\(' --type cpp

# Search for any Task instantiation patterns
rg -A 3 'new\s+Task\s*\(' --type cpp

# Search for any Task-related enums or types that might have been affected
rg -A 5 'enum.*Task' --type cpp

# Look for any Task factory or creation patterns
rg -A 5 'create.*Task' --type cpp

Length of output: 397

Script:

#!/bin/bash
# Search for any Task class usage in header files
rg -A 5 'class.*Task' --type cpp

# Search for any Task object instantiation or usage
rg -A 3 'Task\s+[a-zA-Z_][a-zA-Z0-9_]*\s*[{=]' --type cpp

# Look for any removed TaskCreatorType references
rg -A 3 'TaskCreator' --type cpp

# Search for any Task-related method implementations
ast-grep --pattern 'Task::$_($$$) {
  $$$
}'

Length of output: 3065

tests/storage/test-DataStorage.cpp (4)

1-1: LGTM! Enhanced NOLINT comments improve code quality guidelines.

The NOLINT comments now include additional checks against non-const global variables and C-style arrays, which better aligns with modern C++ practices.

Also applies to: 141-141

19-23: LGTM! Improved test case flexibility with TEMPLATE_LIST_TEST_CASE.

The change to use TEMPLATE_LIST_TEST_CASE with DataStorageTypeList allows for better test parameterization.

109-112: LGTM! Consistent test structure with task reference test case.

The test follows the same robust pattern as the task reference test case, properly handling error cases and prerequisites.

77-80: LGTM! Verify task creation parameters.

The test structure is good, using proper error cases and prerequisites. However, let's verify the task creation parameters are consistent with the updated Task class (removal of TaskCreatorType).

Also applies to: 90-90

✅ Verification successful

The previous ast-grep pattern might have failed due to syntax. Let's try with ripgrep to get a better understanding of Task usage and definition in the codebase.

Let's verify the actual test implementation to see how Tasks are created and if there are any remaining TaskCreatorType references.

Task constructor usage is consistent with current implementation

The test is using the correct Task constructor signature (Task{"func"}), which matches the current Task class implementation that accepts a function name. There are no references to the removed TaskCreatorType, and the Task creation is properly implemented throughout the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Task constructor usage consistency
# Test: Search for Task constructor calls to ensure they match the new signature
ast-grep --pattern 'Task($_)'

Length of output: 31

Script:

#!/bin/bash
# Search for Task constructor calls with ripgrep for better coverage
rg "Task\(" -A 2

# Also search for Task class definition to understand the constructor signature
rg "class Task" -A 5

Length of output: 2263

Script:

#!/bin/bash
# Let's examine the test file to see how Task is being used
rg "Task" tests/storage/test-DataStorage.cpp -A 2

# Also check for any TaskCreatorType references that might have been removed
rg "TaskCreatorType" -A 2

Length of output: 432

tests/storage/test-MetadataStorage.cpp (1)

1-20: LGTM: File structure and includes are well-organized.

The file includes all necessary headers and uses appropriate NOLINT pragmas to suppress specific warnings that are acceptable in test files.

src/spider/storage/MysqlStorage.cpp (1)

955-962: Validate the heartbeat timeout calculation

In the heartbeat_timeout method, the timeout is converted using cMillisecondToMicrosecond, which is set to 1000. Since there are 1,000,000 microseconds in a second, ensure that the conversion accurately reflects the intended timeout duration.

coderabbitai · 2024-11-10T17:52:38Z

src/spider/core/TaskGraph.hpp

+    void add_dependency(boost::uuids::uuid parent, boost::uuids::uuid child) {
        m_dependencies.emplace_back(parent, child);
    }


🛠️ Refactor suggestion

Consider adding task existence validation

The method should verify that both parent and child tasks exist in m_tasks before adding the dependency to prevent orphaned dependencies.

void add_dependency(boost::uuids::uuid parent, boost::uuids::uuid child) { + if (!m_tasks.contains(parent) || !m_tasks.contains(child)) { + throw std::invalid_argument("Both parent and child tasks must exist"); + } m_dependencies.emplace_back(parent, child); }

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-10T17:52:38Z

tests/storage/test-MetadataStorage.cpp

+    std::unique_ptr<spider::core::MetadataStorage> storage
+            = spider::test::create_metadata_storage<TestType>();
+
+    boost::uuids::random_generator gen;
+    boost::uuids::uuid const job_id = gen();
+
+    // Create a complicated task graph
+    boost::uuids::uuid const client_id = gen();
+    spider::core::Task child_task{"child"};
+    spider::core::Task parent_1{"p1"};
+    spider::core::Task parent_2{"p2"};
+    parent_1.add_input(spider::core::TaskInput{"1", "float"});
+    parent_1.add_input(spider::core::TaskInput{"2", "float"});
+    parent_2.add_input(spider::core::TaskInput{"3", "int"});
+    parent_2.add_input(spider::core::TaskInput{"4", "int"});
+    parent_1.add_output(spider::core::TaskOutput{"float"});
+    parent_2.add_output(spider::core::TaskOutput{"int"});
+    child_task.add_input(spider::core::TaskInput{parent_1.get_id(), 0, "float"});
+    child_task.add_input(spider::core::TaskInput{parent_2.get_id(), 0, "int"});
+    child_task.add_output(spider::core::TaskOutput{"float"});
+    spider::core::TaskGraph graph;
+    // Add task and dependencies to task graph in wrong order
+    graph.add_task(child_task);
+    graph.add_task(parent_1);
+    graph.add_task(parent_2);
+    graph.add_dependency(parent_2.get_id(), child_task.get_id());
+    graph.add_dependency(parent_1.get_id(), child_task.get_id());
+
+    // Get head tasks should succeed
+    absl::flat_hash_set<boost::uuids::uuid> heads = graph.get_head_tasks();
+    REQUIRE(2 == heads.size());
+    REQUIRE(heads.contains(parent_1.get_id()));
+    REQUIRE(heads.contains(parent_2.get_id()));
+
+    // Submit a simple job
+    boost::uuids::uuid const simple_job_id = gen();
+    spider::core::Task const simple_task{"simple"};
+    spider::core::TaskGraph simple_graph;
+    simple_graph.add_task(simple_task);
+
+    heads = simple_graph.get_head_tasks();
+    REQUIRE(1 == heads.size());
+    REQUIRE(heads.contains(simple_task.get_id()));
+
+    // Submit job should success
+    REQUIRE(storage->add_job(job_id, client_id, graph).success());
+    REQUIRE(storage->add_job(simple_job_id, client_id, simple_graph).success());
+
+    // Get job id for non-existent client id should return empty vector
+    std::vector<boost::uuids::uuid> job_ids;
+    REQUIRE(storage->get_jobs_by_client_id(gen(), &job_ids).success());
+    REQUIRE(job_ids.empty());
+
+    // Get job id for client id should get correct value;
+    REQUIRE(storage->get_jobs_by_client_id(client_id, &job_ids).success());
+    REQUIRE(2 == job_ids.size());
+    REQUIRE(
+            ((job_ids[0] == job_id && job_ids[1] == simple_job_id)
+             || (job_ids[0] == simple_job_id && job_ids[1] == job_id))
+    );
+
+    // Get task graph should succeed
+    spider::core::TaskGraph graph_res{};
+    REQUIRE(storage->get_task_graph(job_id, &graph_res).success());
+    REQUIRE(spider::test::task_graph_equal(graph, graph_res));
+    spider::core::TaskGraph simple_graph_res{};
+    REQUIRE(storage->get_task_graph(simple_job_id, &simple_graph_res).success());
+    REQUIRE(spider::test::task_graph_equal(simple_graph, simple_graph_res));
+
+    // Get task should succeed
+    spider::core::Task task_res{""};
+    REQUIRE(storage->get_task(child_task.get_id(), &task_res).success());
+    REQUIRE(spider::test::task_equal(child_task, task_res));
+
+    // Get child tasks should succeed
+    std::vector<spider::core::Task> tasks;
+    REQUIRE(storage->get_child_tasks(parent_1.get_id(), &tasks).success());
+    REQUIRE(1 == tasks.size());
+    REQUIRE(spider::test::task_equal(child_task, tasks[0]));
+    tasks.clear();
+
+    // Get parent tasks should succeed
+    REQUIRE(storage->get_parent_tasks(child_task.get_id(), &tasks).success());
+    REQUIRE(2 == tasks.size());
+    REQUIRE(
+            ((spider::test::task_equal(tasks[0], parent_1)
+              && spider::test::task_equal(tasks[1], parent_2))
+             || (spider::test::task_equal(tasks[0], parent_2)
+                 && spider::test::task_equal(tasks[1], parent_1)))
+    );
+}


💡 Codebase verification

🛠️ Refactor suggestion

The review comment is accurate - test coverage needs improvement

The test file's name suggests it covers job removal ("Job add, get ane[sic] remove"), but despite remove_job being a required virtual method in the MetadataStorage interface, there are no tests validating this functionality. Additionally, the review correctly identifies missing test cases for error scenarios and the need to refactor the complex task graph setup.

Key findings:

No tests for remove_job despite being part of the interface

Missing error scenario tests (invalid graphs, duplicates)

Complex setup code that could benefit from helper functions

Test name is misleading as it doesn't test removal functionality

🔗 Analysis chain

Consider improving test structure and coverage.

While the test thoroughly validates the happy path for job management, there are several areas for improvement:

Complex task graph setup could be moved to helper functions

Missing test cases for error scenarios

No validation of job removal despite being mentioned in the test name

Consider extracting task graph setup to a helper function:

namespace spider::test { spider::core::TaskGraph create_test_task_graph() { spider::core::Task child_task{"child"}; spider::core::Task parent_1{"p1"}; spider::core::Task parent_2{"p2"}; // ... setup tasks ... spider::core::TaskGraph graph; // ... add tasks and dependencies ... return graph; } }

Also, consider adding test cases for:

Job removal functionality

Adding jobs with invalid task graphs

Adding duplicate jobs

Race conditions in concurrent job submissions

Let me gather more information about the test coverage for job removal functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Verify if job removal functionality exists in MetadataStorage rg -A 5 "remove_job|delete_job" src/spider/storage/MetadataStorage.hpp

Length of output: 570

Script:

#!/bin/bash # Check if there are any tests for remove_job in the test files rg -B 2 -A 5 "remove_job|delete_job" tests/ # Look for the complete test file to understand the test scope cat tests/storage/test-MetadataStorage.cpp

Length of output: 8168

coderabbitai · 2024-11-10T17:52:38Z

tests/utils/CoreTaskUtils.cpp

+    if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(
+                graph_1.get_dependencies(),
+                graph_1.get_dependencies()
+        ))


⚠️ Potential issue

Fix comparison in task_graph_equal function

The function currently compares graph_1.get_dependencies() with itself instead of graph_2.get_dependencies(). This prevents accurate comparison between two different task graphs.

Apply this diff to correct the comparison:

if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>( graph_1.get_dependencies(), - graph_1.get_dependencies() + graph_2.get_dependencies() ))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(

graph_1.get_dependencies(),

graph_1.get_dependencies()

))

if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(

graph_1.get_dependencies(),

graph_2.get_dependencies()

))

coderabbitai · 2024-11-10T17:52:38Z

src/spider/storage/MysqlStorage.cpp

+            std::optional<Task const*> const task_option = task_graph.get_task(task_id);
            if (!task_option.has_value()) {
                m_conn->rollback();
                return StorageErr{StorageErrType::KeyNotFoundErr, "Task graph inconsistent"};
            }
-            Task const& task = task_option.value();
-            this->add_task(job_id_bytes, task);
+            Task const* task = task_option.value();


🛠️ Refactor suggestion

Refactor duplicated code in add_job to a helper function

The code blocks at lines 427-432 and 444-449 are identical. Refactoring this duplicated code into a helper function would enhance maintainability and reduce redundancy.

Also applies to: 444-449

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)

tests/storage/test-MetadataStorage.cpp (5)
1-1: Consider documenting the rationale for suppressed warnings.

Please add a brief comment explaining why each warning is being suppressed. This helps future maintainers understand the reasoning behind these suppressions.

27-27: Consider using named constants for better readability.

The magic numbers in the test could be more descriptive:

cDuration = 100 could indicate the unit (milliseconds/seconds)

The sleep duration of 1 second could be a named constant
-    constexpr double cDuration = 100;
+    // Duration in milliseconds for heartbeat timeout
+    constexpr double cHeartbeatTimeoutMs = 100;

-    std::this_thread::sleep_for(std::chrono::seconds(1));
+    constexpr auto cHeartbeatWaitTime = std::chrono::seconds(1);
+    std::this_thread::sleep_for(cHeartbeatWaitTime);
Also applies to: 47-47

75-75: Consider using enums or constants for scheduler states and ports.

The hardcoded values could be more maintainable:

Port 3306 should be a named constant

Scheduler states could be defined as enums or constants
+    // Common ports used in tests
+    namespace test_ports {
+        constexpr int kMySqlPort = 3306;
+    }
+
+    // Scheduler states
+    namespace scheduler_states {
+        constexpr char kNormal[] = "normal";
+        constexpr char kRecovery[] = "recovery";
+    }

-    constexpr int cPort = 3306;
+    constexpr int cPort = test_ports::kMySqlPort;

-    REQUIRE(state_res == "normal");
+    REQUIRE(state_res == scheduler_states::kNormal);
Also applies to: 94-94

117-138: Consider extracting task graph setup to helper functions.

The task graph creation logic is complex and could be moved to helper functions for better reusability and readability.
namespace spider::test {
    spider::core::TaskGraph create_complex_task_graph() {
        spider::core::Task child_task{"child"};
        spider::core::Task parent_1{"p1"};
        spider::core::Task parent_2{"p2"};
        
        // Setup inputs/outputs
        setup_parent_task(parent_1, {"1", "2"}, "float");
        setup_parent_task(parent_2, {"3", "4"}, "int");
        setup_child_task(child_task, parent_1, parent_2);
        
        // Create graph with dependencies
        spider::core::TaskGraph graph;
        add_tasks_with_dependencies(graph, {child_task, parent_1, parent_2});
        return graph;
    }
}
106-209: Consider adding more error test cases.

While the happy path is well tested, consider adding test cases for:

Adding jobs with invalid task graphs (cycles, disconnected components)

Adding duplicate jobs

Removing non-existent jobs

Race conditions in concurrent job submissions
tests/utils/CoreTaskUtils.cpp (1)
63-64: Use size_t for loop index to match container size type

The loop index i is declared as int, whereas v1.size() returns a size_t. Using different integer types may lead to potential warnings or issues related to signed versus unsigned comparisons.

Apply this diff to correct the index type:
-for (int i = 0; i < v1.size(); ++i) {
+for (size_t i = 0; i < v1.size(); ++i) {

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7ab7a61 and ef9fd9a.

📒 Files selected for processing (2)

tests/storage/test-MetadataStorage.cpp (1 hunks)
tests/utils/CoreTaskUtils.cpp (1 hunks)

🧰 Additional context used

🪛 cppcheck

tests/storage/test-MetadataStorage.cpp

[error] 65-65: syntax error

(syntaxError)

tests/utils/CoreTaskUtils.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (2)

tests/storage/test-MetadataStorage.cpp (1)

202-208: Well done on implementing job removal tests!

The test now properly validates the remove_job functionality, addressing the concerns from previous reviews. It verifies both the successful removal and the persistence of unrelated jobs.

tests/utils/CoreTaskUtils.cpp (1)

1-187: Overall, the code changes are well-implemented

The utility functions for comparing task-related data structures are correctly defined, and the logic is sound. The previous issue regarding the comparison of dependencies in task_graph_equal has been resolved.

🧰 Tools

🪛 cppcheck

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

sitaowang1998 added 8 commits November 8, 2024 19:20

test: Add driver heartbeat test

dc019a2

test: Add scheduler tests

ca5bbeb

style: Fix lint

cbb09ba

style: Fix lint

fee117f

test: Use template list test case to remove reduency

b049834

style: Fix lint

48266ae

test: Add task graph tests and bug fix

4de62c7

style: Fix lint

7ab7a61

sitaowang1998 changed the title ~~tests: Add unit tests for metadata storage and fix bugs in MySQL storage backend~~ test: Add unit tests for metadata storage and fix bugs in MySQL storage backend Nov 10, 2024

coderabbitai bot reviewed Nov 10, 2024

View reviewed changes

test: Add task graph removal test

ef9fd9a

coderabbitai bot reviewed Nov 10, 2024

View reviewed changes

sitaowang1998 merged commit 7bfdc37 into y-scope:main Nov 10, 2024
3 checks passed

sitaowang1998 deleted the metadatastorage_test branch November 10, 2024 18:28

This was referenced Nov 29, 2024

feat: Add client interface. #16

Merged

feat: Add message pipe #27

Merged

feat: Add task executor #28

Merged

feat: Add scheduler scheduling #31

Merged

This was referenced Dec 7, 2024

feat: Add variable size message sending and receiving through socket #32

Merged

feat: Add scheduler policy and data stroage garbage collection #34

Merged

feat: Add worker client that submit tasks and get next task from schedulers #35

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

sitaowang1998 commented Nov 10, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 10, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Nov 10, 2024

coderabbitai bot Nov 10, 2024

coderabbitai bot Nov 10, 2024

coderabbitai bot Nov 10, 2024

coderabbitai bot left a comment

test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

Conversation

sitaowang1998 commented Nov 10, 2024 • edited by coderabbitai bot Loading

Description

Validation performed

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Nov 10, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 10, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 10, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 10, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 10, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

sitaowang1998 commented Nov 10, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 10, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)