Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

Merged
merged 9 commits into from
Nov 10, 2024

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Nov 10, 2024

Description

As title.

Validation performed

  • Pass all github workflows
  • Pass all unit tests on ubuntu focal devcontainer.

Summary by CodeRabbit

Release Notes

  • New Features

    • Simplified Task class constructor, requiring only the function name.
    • Added methods to access task inputs and outputs.
    • New methods for retrieving driver and scheduler addresses in MetadataStorage.
    • Introduced utility functions for comparing tasks and task graphs.
  • Bug Fixes

    • Enhanced error handling in database operations.
  • Tests

    • New test suite for MetadataStorage functionality.
    • Updated test cases for improved structure and clarity.
    • Added new utility functions for comparing various data structures related to tasks and task graphs.

Copy link

coderabbitai bot commented Nov 10, 2024

Walkthrough

The pull request introduces significant modifications to several classes within the spider project, particularly focusing on the Task, TaskGraph, and MetadataStorage classes. Key changes include the removal of the TaskCreatorType enum and associated parameters from the Task class, updates to method signatures and return types in the TaskGraph class, and enhancements to the MetadataStorage class with new methods for driver and scheduler management. Additionally, new utility functions for comparing task-related data structures have been added, along with new tests to validate the updated functionalities.

Changes

File Change Summary
src/spider/core/Task.hpp Removed TaskCreatorType enum, simplified Task constructor, removed get_creator_type() and get_creator_id() methods, added get_inputs() and get_outputs() methods.
src/spider/core/TaskGraph.hpp Renamed add_dependencies to add_dependency, changed get_task return type to std::optional<Task const*>, added overload for get_task returning std::optional<Task*>.
src/spider/storage/MetadataStorage.hpp Added get_driver() method, renamed get_child_task() to get_child_tasks() with updated return type, changed heartbeat_timeout parameter type from float to double, added get_scheduler_addr().
src/spider/storage/MysqlStorage.cpp Altered client_id column type in the jobs table, added get_driver() and get_scheduler_addr() methods, updated add_job logic for task dependencies, enhanced error handling, and refined SQL queries.
src/spider/storage/MysqlStorage.hpp Added get_driver() and get_scheduler_addr() methods, renamed get_child_task() to get_child_tasks(), updated heartbeat_timeout parameter type, renamed fetch_task() to fetch_full_task().
tests/CMakeLists.txt Added new test source files: storage/test-MetadataStorage.cpp, utils/CoreTaskUtils.hpp, and utils/CoreTaskUtils.cpp.
tests/storage/StorageTestHelper.hpp Changed include guard, added type aliases for storage types.
tests/storage/test-DataStorage.cpp Updated test case declarations to use TEMPLATE_LIST_TEST_CASE, enhanced NOLINT comments.
tests/storage/test-MetadataStorage.cpp Introduced new test suite for MetadataStorage with multiple test cases validating its functionalities.
tests/utils/CoreDataUtils.hpp Changed namespace from spider::core to spider::test, updated type qualification for data_equal function.
tests/utils/CoreTaskUtils.cpp Added utility functions for comparing various task-related data structures.
tests/utils/CoreTaskUtils.hpp Introduced new header file defining utility functions for comparing Task, TaskInput, TaskOutput, and TaskGraph objects.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sitaowang1998 sitaowang1998 changed the title tests: Add unit tests for metadata storage and fix bugs in MySQL storage backend test: Add unit tests for metadata storage and fix bugs in MySQL storage backend Nov 10, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (18)
tests/utils/CoreDataUtils.hpp (1)

Line range hint 7-30: Consider optimizing property access with structured bindings.

While the implementation is correct, we could make it more efficient by reducing repeated getter calls.

Here's a suggested optimization:

-inline auto data_equal(core::Data const& d1, core::Data const& d2) -> bool {
-    if (d1.get_id() != d2.get_id()) {
-        return false;
-    }
-
-    if (d1.get_key() != d2.get_key()) {
-        return false;
-    }
-
-    if (d1.get_locality() != d2.get_locality()) {
-        return false;
-    }
-
-    if (d1.is_hard_locality() != d2.is_hard_locality()) {
-        return false;
-    }
-
-    if (d1.get_value() != d2.get_value()) {
-        return false;
-    }
-
-    return true;
+inline auto data_equal(core::Data const& d1, core::Data const& d2) -> bool {
+    auto [id1, key1, locality1, hard_locality1, value1] = 
+        std::tuple{d1.get_id(), d1.get_key(), d1.get_locality(), 
+                  d1.is_hard_locality(), d1.get_value()};
+    auto [id2, key2, locality2, hard_locality2, value2] = 
+        std::tuple{d2.get_id(), d2.get_key(), d2.get_locality(), 
+                  d2.is_hard_locality(), d2.get_value()};
+    
+    return id1 == id2 && key1 == key2 && locality1 == locality2 && 
+           hard_locality1 == hard_locality2 && value1 == value2;
}
tests/storage/StorageTestHelper.hpp (1)

Line range hint 23-54: Consider improving test environment management

Two suggestions for the storage creation functions:

  1. Add cleanup functionality to ensure test data is removed after each test
  2. Move the hardcoded database credentials to a configuration file or environment variables

Example improvement:

+ // Load credentials from environment or config
+ static auto get_storage_url() {
+     auto* url = std::getenv("SPIDER_TEST_DB_URL");
+     return url ? url : cStorageUrl;  // Fall back to default if not set
+ }

  template <class T>
  requires std::derived_from<T, core::DataStorage>
  auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
      std::unique_ptr<core::DataStorage> storage = std::make_unique<T>();
-     REQUIRE(storage->connect(spider::test::cStorageUrl).success());
+     REQUIRE(storage->connect(get_storage_url()).success());
      REQUIRE(storage->initialize().success());
      return storage;
  }
src/spider/storage/MetadataStorage.hpp (1)

56-57: LGTM: Well-structured method for scheduler address retrieval.

The new method follows the established pattern and provides essential functionality for distributed system coordination.

Consider adding a brief comment documenting the expected format of the address string (e.g., IPv4, IPv6, hostname) to ensure consistent usage across implementations.

src/spider/core/TaskGraph.hpp (2)

51-56: Document pointer validity guarantees

The method now returns a pointer to a map element. Please document the lifetime/validity guarantees of the returned pointer, particularly in relation to map modifications.

+    /**
+     * Retrieves a const pointer to a task by its ID.
+     * @note The returned pointer is valid until the next modification of the task graph.
+     */
     [[nodiscard]] auto get_task(boost::uuids::uuid id) const -> std::optional<Task const*> {

58-62: Document usage guidelines for mutable task access

The non-const overload allows direct modification of tasks, which could potentially break graph invariants. Consider:

  1. Documenting safe modification patterns
  2. Adding runtime checks for critical task properties
+    /**
+     * Retrieves a mutable pointer to a task by its ID.
+     * @warning Modify task properties with caution to maintain graph consistency.
+     *          The task ID must not be changed.
+     */
     [[nodiscard]] auto get_task(boost::uuids::uuid id) -> std::optional<Task*> {
src/spider/storage/MysqlStorage.hpp (1)

66-66: Add documentation for the enhanced functionality

The rename from fetch_task to fetch_full_task suggests expanded functionality. Please add a brief comment explaining what constitutes a "full" task fetch.

-    auto fetch_full_task(std::unique_ptr<sql::ResultSet> const& res) -> Task;
+    /// Fetches a task with all its associated data (inputs, outputs, etc.) from the result set
+    auto fetch_full_task(std::unique_ptr<sql::ResultSet> const& res) -> Task;
src/spider/core/Task.hpp (1)

135-138: Consider marking vectors as const in class definition.

The new getter methods returning const references are good for performance. Consider also marking the vectors as const in the class definition if they're not meant to be modified after construction.

private:
    boost::uuids::uuid m_id;
    std::string m_function_name;
    TaskState m_state = TaskState::Pending;
    float m_timeout = 0;
-   std::vector<TaskInput> m_inputs;
-   std::vector<TaskOutput> m_outputs;
+   const std::vector<TaskInput> m_inputs;
+   const std::vector<TaskOutput> m_outputs;
tests/storage/test-DataStorage.cpp (2)

38-38: Consider adding more specific failure messages.

While the data comparison is correct, adding a custom message would help debug test failures.

-    REQUIRE(spider::test::data_equal(data, result));
+    REQUIRE(spider::test::data_equal(data, result), "Retrieved data does not match stored data");

48-51: Apply the same improvements as the first test case.

For consistency with the first test case:

  1. The test structure improvements are good
  2. Consider adding the same custom failure message for data comparison
-    REQUIRE(spider::test::data_equal(data, result));
+    REQUIRE(spider::test::data_equal(data, result), "Retrieved data by key does not match stored data");

Also applies to: 67-67

🧰 Tools
🪛 cppcheck

[error] 48-48: syntax error

(syntaxError)

tests/storage/test-MetadataStorage.cpp (3)

23-63: Consider improving test reliability for timing-dependent assertions.

The test case effectively validates driver heartbeat functionality but has two potential reliability issues:

  1. The cDuration constant lacks units specification
  2. Using sleep_for in tests can lead to flaky behaviour in CI environments

Consider these improvements:

-    constexpr double cDuration = 100;
+    constexpr std::chrono::milliseconds cHeartbeatTimeout{100};

Also, consider implementing a mock time provider to avoid relying on actual sleep durations.


65-104: Enhance test coverage and maintainability.

The scheduler state test is thorough but could be improved in several areas:

  1. Magic values should be named constants
  2. Missing test cases for invalid state transitions
  3. Limited set of state values tested

Consider these improvements:

+    // At the namespace level
+    namespace {
+    constexpr int kDefaultSchedulerPort = 3306;
+    constexpr char kNormalState[] = "normal";
+    constexpr char kRecoveryState[] = "recovery";
+    }

-    constexpr int cPort = 3306;
+    constexpr int cPort = kDefaultSchedulerPort;

-    REQUIRE(state_res == "normal");
+    REQUIRE(state_res == kNormalState);

Also consider adding test cases for:

  • Invalid state transitions
  • Invalid state string values
  • Concurrent state updates
🧰 Tools
🪛 cppcheck

[error] 65-65: syntax error

(syntaxError)


106-110: Fix typo in test case name.

There's a typo in the test case name: "Job add, get ane remove" should be "Job add, get and remove".

tests/utils/CoreTaskUtils.cpp (2)

22-24: Simplify float_equal function using std::abs

For better clarity and to handle negative differences appropriately, consider using std::abs in the float_equal function.

Apply this diff to simplify the function:

+#include <cmath>

 auto float_equal(float f1, float f2) -> bool {
-    return f1 - f2 < cEpsilon && f2 - f1 < cEpsilon;
+    return std::abs(f1 - f2) < cEpsilon;
 }

62-63: Use size_t for loop counter to match vector size type

Changing the loop variable i from int to size_t avoids potential signed/unsigned comparison issues.

Apply this diff:

-        for (int i = 0; i < v1.size(); ++i) {
+        for (size_t i = 0; i < v1.size(); ++i) {
src/spider/storage/MysqlStorage.cpp (4)

305-328: Consider removing unnecessary transaction management in read-only method get_driver

The method get_driver performs a read-only operation but uses m_conn->rollback() and m_conn->commit(). For read-only operations, transaction management may not be necessary and can be omitted to improve performance.


503-514: Use column names instead of indices for clarity

In the functions fetch_task_input and fetch_task_output, hard-coded column indices like res->getString(3) are used. Using column names instead of indices improves readability and reduces the risk of errors if the SQL query changes.

Also applies to: 524-533


Line range hint 767-781: Consider removing unnecessary transaction management in read-only method get_task

The get_task method performs a read-only operation but includes m_conn->rollback() and m_conn->commit(). Omitting transaction management in read-only methods can enhance performance.


Line range hint 1147-1164: Ensure consistent error handling in data retrieval methods

In get_data and get_data_by_key, when no data is found, m_conn->rollback() is called before returning an error. Since these methods are read-only, consider whether transaction rollback is necessary.

Also applies to: 1190-1208

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9d6e5fa and 7ab7a61.

📒 Files selected for processing (12)
  • src/spider/core/Task.hpp (2 hunks)
  • src/spider/core/TaskGraph.hpp (1 hunks)
  • src/spider/storage/MetadataStorage.hpp (2 hunks)
  • src/spider/storage/MysqlStorage.cpp (23 hunks)
  • src/spider/storage/MysqlStorage.hpp (2 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/storage/StorageTestHelper.hpp (3 hunks)
  • tests/storage/test-DataStorage.cpp (9 hunks)
  • tests/storage/test-MetadataStorage.cpp (1 hunks)
  • tests/utils/CoreDataUtils.hpp (2 hunks)
  • tests/utils/CoreTaskUtils.cpp (1 hunks)
  • tests/utils/CoreTaskUtils.hpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/utils/CoreTaskUtils.hpp
🧰 Additional context used
🪛 cppcheck
tests/storage/test-DataStorage.cpp

[error] 48-48: syntax error

(syntaxError)

tests/storage/test-MetadataStorage.cpp

[error] 65-65: syntax error

(syntaxError)

tests/utils/CoreTaskUtils.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (20)
tests/CMakeLists.txt (1)

3-7: LGTM! The new test files are properly integrated.

The addition of metadata storage tests and task utility files aligns well with the PR objectives. The files are correctly organized in their respective directories (storage/ and utils/) and properly integrated into the SPIDER_TEST_SOURCES variable.

tests/utils/CoreDataUtils.hpp (3)

1-2: LGTM! Header guards are properly implemented.

The header guard naming convention follows the project standards.


5-5: Appropriate namespace change for test utilities.

Moving this utility to the spider::test namespace better reflects its purpose as a testing component and maintains clear separation from core functionality.


31-33: LGTM! Proper file closure with clear comments.

The namespace and header guard closings are well-documented.

tests/storage/StorageTestHelper.hpp (2)

1-2: LGTM: Include guard naming aligns with directory structure

The update from SPIDER_TEST_ to SPIDER_TESTS_ in the include guard better reflects the actual directory structure and maintains consistency.

Also applies to: 57-57


7-7: Consider making storage type lists extensible

While the type aliases improve readability, hardcoding MySQL as the only storage type might limit testing flexibility. Consider:

  1. Making these lists configurable or extensible for future storage backends
  2. Adding a comment explaining why MySQL is currently the only supported backend

Let's check if other storage backends exist:

Also applies to: 19-21

src/spider/storage/MetadataStorage.hpp (3)

54-54: LGTM: Enhanced precision for timeout values.

The change from float to double provides better precision for timeout values, which is important for accurate scheduling and monitoring.

#!/bin/bash
# Verify all timeout-related methods use double
rg 'timeout.*float|float.*timeout'

29-29: LGTM: Well-designed getter method for driver information.

The new method follows a consistent pattern with other getters in the interface and provides essential functionality for driver management.

✅ Verification successful

Method is properly implemented in MySQL backend

The get_driver method is correctly implemented in MysqlStorage.cpp with a matching signature and follows the interface contract. The implementation includes proper SQL query preparation and error handling.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the implementation exists in MySQL backend
ast-grep --pattern 'auto get_driver(boost::uuids::uuid id, std::string* addr) -> StorageErr'

Length of output: 695


Script:

#!/bin/bash
# Search for get_driver implementation in MySQL storage
rg -A 5 "get_driver.*uuid.*addr.*StorageErr" src/

# Search for get_driver in MySQL storage class definition
ast-grep --pattern 'class MysqlStorage {
  $$$
  auto get_driver($$$) -> StorageErr $$$
  $$$
}'

Length of output: 1813


47-48: LGTM: Improved method signature for handling multiple child tasks.

The rename from get_child_task to get_child_tasks and the change to return a vector better represents the one-to-many relationship between tasks.

✅ Verification successful

The method signature change has been properly implemented across the codebase

The verification confirms that:

  • The pure virtual method get_child_tasks is correctly declared in MetadataStorage.hpp
  • The implementation in MySqlMetadataStorage properly overrides the method with the same signature
  • The method is being used correctly in tests and throughout the codebase
  • No legacy get_child_task (singular) calls remain
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify all implementations have been updated
rg -A 2 'get_child_task[s]?\(' 

Length of output: 695


Script:

#!/bin/bash
# Search for get_child_task or get_child_tasks with proper pattern matching
rg -A 2 'get_child_tasks?\s*\(' --type cpp

# Also search for class implementations of MetadataStorage
ast-grep --pattern 'class $_ : public MetadataStorage {
  $$$
}'

Length of output: 2191

src/spider/storage/MysqlStorage.hpp (4)

34-34: LGTM: Method signature is well-designed

The method declaration follows good C++ practices with proper error handling and output parameter design.


58-58: LGTM: Well-structured method signature

The method follows consistent patterns with other address retrieval methods in the class, properly separating address and port parameters.


55-55: Verify caller compatibility with double precision

The change from float to double improves precision but may affect existing callers.

#!/bin/bash
# Search for calls to heartbeat_timeout to verify type compatibility
rg "heartbeat_timeout\s*\([^)]*\)" --type cpp

51-51: Verify task collection handling in implementation

The method signature change from single task to vector is appropriate. Please ensure the implementation properly handles:

  1. Memory management for Task objects
  2. Error cases (empty results, partial failures)
src/spider/core/Task.hpp (1)

104-107: Verify the impact of removing creator-related parameters.

The constructor simplification looks good, but we should verify that removing the creator-related parameters doesn't break existing task creation workflows.

Also applies to: 109-114

✅ Verification successful

Let's search for more context about Task usage patterns and any related types that might have been affected by the constructor changes.


Let's search for more context about the Task class definition and its usage patterns, particularly focusing on instantiation and any remaining references to the removed creator-related code.


Constructor changes are safe and backwards compatible

The simplified Task constructor changes appear to be safe based on the following evidence:

  • No TaskCreator references exist in the codebase
  • Task instantiation patterns in tests show only function name being used: Task{"child"}, Task{"p1"}
  • TaskGraph integration shows compatibility with the new constructor through add_child_task
  • The constructor maintains core functionality (ID generation and function name) while removing unused creator-related parameters
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for Task constructor usage patterns to ensure all call sites are updated

# Test: Search for Task constructor calls
rg -A 2 'Task\(' --type cpp

# Test: Search for potential creator-related usages that might need updating
rg -i 'creator.*task' --type cpp

Length of output: 546


Script:

#!/bin/bash
# Search for any references to Task class and its constructors in implementation files
rg -A 5 'Task\s+\w+\s*\(' --type cpp

# Search for any Task instantiation patterns
rg -A 3 'new\s+Task\s*\(' --type cpp

# Search for any Task-related enums or types that might have been affected
rg -A 5 'enum.*Task' --type cpp

# Look for any Task factory or creation patterns
rg -A 5 'create.*Task' --type cpp

Length of output: 397


Script:

#!/bin/bash
# Search for any Task class usage in header files
rg -A 5 'class.*Task' --type cpp

# Search for any Task object instantiation or usage
rg -A 3 'Task\s+[a-zA-Z_][a-zA-Z0-9_]*\s*[{=]' --type cpp

# Look for any removed TaskCreatorType references
rg -A 3 'TaskCreator' --type cpp

# Search for any Task-related method implementations
ast-grep --pattern 'Task::$_($$$) {
  $$$
}'

Length of output: 3065

tests/storage/test-DataStorage.cpp (4)

1-1: LGTM! Enhanced NOLINT comments improve code quality guidelines.

The NOLINT comments now include additional checks against non-const global variables and C-style arrays, which better aligns with modern C++ practices.

Also applies to: 141-141


19-23: LGTM! Improved test case flexibility with TEMPLATE_LIST_TEST_CASE.

The change to use TEMPLATE_LIST_TEST_CASE with DataStorageTypeList allows for better test parameterization.


109-112: LGTM! Consistent test structure with task reference test case.

The test follows the same robust pattern as the task reference test case, properly handling error cases and prerequisites.


77-80: LGTM! Verify task creation parameters.

The test structure is good, using proper error cases and prerequisites. However, let's verify the task creation parameters are consistent with the updated Task class (removal of TaskCreatorType).

Also applies to: 90-90

✅ Verification successful

The previous ast-grep pattern might have failed due to syntax. Let's try with ripgrep to get a better understanding of Task usage and definition in the codebase.


Let's verify the actual test implementation to see how Tasks are created and if there are any remaining TaskCreatorType references.


Task constructor usage is consistent with current implementation

The test is using the correct Task constructor signature (Task{"func"}), which matches the current Task class implementation that accepts a function name. There are no references to the removed TaskCreatorType, and the Task creation is properly implemented throughout the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Task constructor usage consistency
# Test: Search for Task constructor calls to ensure they match the new signature
ast-grep --pattern 'Task($_)'

Length of output: 31


Script:

#!/bin/bash
# Search for Task constructor calls with ripgrep for better coverage
rg "Task\(" -A 2

# Also search for Task class definition to understand the constructor signature
rg "class Task" -A 5

Length of output: 2263


Script:

#!/bin/bash
# Let's examine the test file to see how Task is being used
rg "Task" tests/storage/test-DataStorage.cpp -A 2

# Also check for any TaskCreatorType references that might have been removed
rg "TaskCreatorType" -A 2

Length of output: 432

tests/storage/test-MetadataStorage.cpp (1)

1-20: LGTM: File structure and includes are well-organized.

The file includes all necessary headers and uses appropriate NOLINT pragmas to suppress specific warnings that are acceptable in test files.

src/spider/storage/MysqlStorage.cpp (1)

955-962: Validate the heartbeat timeout calculation

In the heartbeat_timeout method, the timeout is converted using cMillisecondToMicrosecond, which is set to 1000. Since there are 1,000,000 microseconds in a second, ensure that the conversion accurately reflects the intended timeout duration.

Comment on lines +47 to 49
void add_dependency(boost::uuids::uuid parent, boost::uuids::uuid child) {
m_dependencies.emplace_back(parent, child);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding task existence validation

The method should verify that both parent and child tasks exist in m_tasks before adding the dependency to prevent orphaned dependencies.

 void add_dependency(boost::uuids::uuid parent, boost::uuids::uuid child) {
+    if (!m_tasks.contains(parent) || !m_tasks.contains(child)) {
+        throw std::invalid_argument("Both parent and child tasks must exist");
+    }
     m_dependencies.emplace_back(parent, child);
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 111 to 201
std::unique_ptr<spider::core::MetadataStorage> storage
= spider::test::create_metadata_storage<TestType>();

boost::uuids::random_generator gen;
boost::uuids::uuid const job_id = gen();

// Create a complicated task graph
boost::uuids::uuid const client_id = gen();
spider::core::Task child_task{"child"};
spider::core::Task parent_1{"p1"};
spider::core::Task parent_2{"p2"};
parent_1.add_input(spider::core::TaskInput{"1", "float"});
parent_1.add_input(spider::core::TaskInput{"2", "float"});
parent_2.add_input(spider::core::TaskInput{"3", "int"});
parent_2.add_input(spider::core::TaskInput{"4", "int"});
parent_1.add_output(spider::core::TaskOutput{"float"});
parent_2.add_output(spider::core::TaskOutput{"int"});
child_task.add_input(spider::core::TaskInput{parent_1.get_id(), 0, "float"});
child_task.add_input(spider::core::TaskInput{parent_2.get_id(), 0, "int"});
child_task.add_output(spider::core::TaskOutput{"float"});
spider::core::TaskGraph graph;
// Add task and dependencies to task graph in wrong order
graph.add_task(child_task);
graph.add_task(parent_1);
graph.add_task(parent_2);
graph.add_dependency(parent_2.get_id(), child_task.get_id());
graph.add_dependency(parent_1.get_id(), child_task.get_id());

// Get head tasks should succeed
absl::flat_hash_set<boost::uuids::uuid> heads = graph.get_head_tasks();
REQUIRE(2 == heads.size());
REQUIRE(heads.contains(parent_1.get_id()));
REQUIRE(heads.contains(parent_2.get_id()));

// Submit a simple job
boost::uuids::uuid const simple_job_id = gen();
spider::core::Task const simple_task{"simple"};
spider::core::TaskGraph simple_graph;
simple_graph.add_task(simple_task);

heads = simple_graph.get_head_tasks();
REQUIRE(1 == heads.size());
REQUIRE(heads.contains(simple_task.get_id()));

// Submit job should success
REQUIRE(storage->add_job(job_id, client_id, graph).success());
REQUIRE(storage->add_job(simple_job_id, client_id, simple_graph).success());

// Get job id for non-existent client id should return empty vector
std::vector<boost::uuids::uuid> job_ids;
REQUIRE(storage->get_jobs_by_client_id(gen(), &job_ids).success());
REQUIRE(job_ids.empty());

// Get job id for client id should get correct value;
REQUIRE(storage->get_jobs_by_client_id(client_id, &job_ids).success());
REQUIRE(2 == job_ids.size());
REQUIRE(
((job_ids[0] == job_id && job_ids[1] == simple_job_id)
|| (job_ids[0] == simple_job_id && job_ids[1] == job_id))
);

// Get task graph should succeed
spider::core::TaskGraph graph_res{};
REQUIRE(storage->get_task_graph(job_id, &graph_res).success());
REQUIRE(spider::test::task_graph_equal(graph, graph_res));
spider::core::TaskGraph simple_graph_res{};
REQUIRE(storage->get_task_graph(simple_job_id, &simple_graph_res).success());
REQUIRE(spider::test::task_graph_equal(simple_graph, simple_graph_res));

// Get task should succeed
spider::core::Task task_res{""};
REQUIRE(storage->get_task(child_task.get_id(), &task_res).success());
REQUIRE(spider::test::task_equal(child_task, task_res));

// Get child tasks should succeed
std::vector<spider::core::Task> tasks;
REQUIRE(storage->get_child_tasks(parent_1.get_id(), &tasks).success());
REQUIRE(1 == tasks.size());
REQUIRE(spider::test::task_equal(child_task, tasks[0]));
tasks.clear();

// Get parent tasks should succeed
REQUIRE(storage->get_parent_tasks(child_task.get_id(), &tasks).success());
REQUIRE(2 == tasks.size());
REQUIRE(
((spider::test::task_equal(tasks[0], parent_1)
&& spider::test::task_equal(tasks[1], parent_2))
|| (spider::test::task_equal(tasks[0], parent_2)
&& spider::test::task_equal(tasks[1], parent_1)))
);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

🛠️ Refactor suggestion

The review comment is accurate - test coverage needs improvement

The test file's name suggests it covers job removal ("Job add, get ane[sic] remove"), but despite remove_job being a required virtual method in the MetadataStorage interface, there are no tests validating this functionality. Additionally, the review correctly identifies missing test cases for error scenarios and the need to refactor the complex task graph setup.

Key findings:

  • No tests for remove_job despite being part of the interface
  • Missing error scenario tests (invalid graphs, duplicates)
  • Complex setup code that could benefit from helper functions
  • Test name is misleading as it doesn't test removal functionality
🔗 Analysis chain

Consider improving test structure and coverage.

While the test thoroughly validates the happy path for job management, there are several areas for improvement:

  1. Complex task graph setup could be moved to helper functions
  2. Missing test cases for error scenarios
  3. No validation of job removal despite being mentioned in the test name

Consider extracting task graph setup to a helper function:

namespace spider::test {
    spider::core::TaskGraph create_test_task_graph() {
        spider::core::Task child_task{"child"};
        spider::core::Task parent_1{"p1"};
        spider::core::Task parent_2{"p2"};
        // ... setup tasks ...
        spider::core::TaskGraph graph;
        // ... add tasks and dependencies ...
        return graph;
    }
}

Also, consider adding test cases for:

  • Job removal functionality
  • Adding jobs with invalid task graphs
  • Adding duplicate jobs
  • Race conditions in concurrent job submissions

Let me gather more information about the test coverage for job removal functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify if job removal functionality exists in MetadataStorage
rg -A 5 "remove_job|delete_job" src/spider/storage/MetadataStorage.hpp

Length of output: 570


Script:

#!/bin/bash
# Check if there are any tests for remove_job in the test files
rg -B 2 -A 5 "remove_job|delete_job" tests/

# Look for the complete test file to understand the test scope
cat tests/storage/test-MetadataStorage.cpp

Length of output: 8168

Comment on lines 125 to 128
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(
graph_1.get_dependencies(),
graph_1.get_dependencies()
))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix comparison in task_graph_equal function

The function currently compares graph_1.get_dependencies() with itself instead of graph_2.get_dependencies(). This prevents accurate comparison between two different task graphs.

Apply this diff to correct the comparison:

 if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(
             graph_1.get_dependencies(),
-            graph_1.get_dependencies()
+            graph_2.get_dependencies()
         ))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(
graph_1.get_dependencies(),
graph_1.get_dependencies()
))
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(
graph_1.get_dependencies(),
graph_2.get_dependencies()
))

Comment on lines +427 to +432
std::optional<Task const*> const task_option = task_graph.get_task(task_id);
if (!task_option.has_value()) {
m_conn->rollback();
return StorageErr{StorageErrType::KeyNotFoundErr, "Task graph inconsistent"};
}
Task const& task = task_option.value();
this->add_task(job_id_bytes, task);
Task const* task = task_option.value();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor duplicated code in add_job to a helper function

The code blocks at lines 427-432 and 444-449 are identical. Refactoring this duplicated code into a helper function would enhance maintainability and reduce redundancy.

Also applies to: 444-449

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
tests/storage/test-MetadataStorage.cpp (5)

1-1: Consider documenting the rationale for suppressed warnings.

Please add a brief comment explaining why each warning is being suppressed. This helps future maintainers understand the reasoning behind these suppressions.


27-27: Consider using named constants for better readability.

The magic numbers in the test could be more descriptive:

  • cDuration = 100 could indicate the unit (milliseconds/seconds)
  • The sleep duration of 1 second could be a named constant
-    constexpr double cDuration = 100;
+    // Duration in milliseconds for heartbeat timeout
+    constexpr double cHeartbeatTimeoutMs = 100;

-    std::this_thread::sleep_for(std::chrono::seconds(1));
+    constexpr auto cHeartbeatWaitTime = std::chrono::seconds(1);
+    std::this_thread::sleep_for(cHeartbeatWaitTime);

Also applies to: 47-47


75-75: Consider using enums or constants for scheduler states and ports.

The hardcoded values could be more maintainable:

  • Port 3306 should be a named constant
  • Scheduler states could be defined as enums or constants
+    // Common ports used in tests
+    namespace test_ports {
+        constexpr int kMySqlPort = 3306;
+    }
+
+    // Scheduler states
+    namespace scheduler_states {
+        constexpr char kNormal[] = "normal";
+        constexpr char kRecovery[] = "recovery";
+    }

-    constexpr int cPort = 3306;
+    constexpr int cPort = test_ports::kMySqlPort;

-    REQUIRE(state_res == "normal");
+    REQUIRE(state_res == scheduler_states::kNormal);

Also applies to: 94-94


117-138: Consider extracting task graph setup to helper functions.

The task graph creation logic is complex and could be moved to helper functions for better reusability and readability.

namespace spider::test {
    spider::core::TaskGraph create_complex_task_graph() {
        spider::core::Task child_task{"child"};
        spider::core::Task parent_1{"p1"};
        spider::core::Task parent_2{"p2"};
        
        // Setup inputs/outputs
        setup_parent_task(parent_1, {"1", "2"}, "float");
        setup_parent_task(parent_2, {"3", "4"}, "int");
        setup_child_task(child_task, parent_1, parent_2);
        
        // Create graph with dependencies
        spider::core::TaskGraph graph;
        add_tasks_with_dependencies(graph, {child_task, parent_1, parent_2});
        return graph;
    }
}

106-209: Consider adding more error test cases.

While the happy path is well tested, consider adding test cases for:

  • Adding jobs with invalid task graphs (cycles, disconnected components)
  • Adding duplicate jobs
  • Removing non-existent jobs
  • Race conditions in concurrent job submissions
tests/utils/CoreTaskUtils.cpp (1)

63-64: Use size_t for loop index to match container size type

The loop index i is declared as int, whereas v1.size() returns a size_t. Using different integer types may lead to potential warnings or issues related to signed versus unsigned comparisons.

Apply this diff to correct the index type:

-for (int i = 0; i < v1.size(); ++i) {
+for (size_t i = 0; i < v1.size(); ++i) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7ab7a61 and ef9fd9a.

📒 Files selected for processing (2)
  • tests/storage/test-MetadataStorage.cpp (1 hunks)
  • tests/utils/CoreTaskUtils.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck
tests/storage/test-MetadataStorage.cpp

[error] 65-65: syntax error

(syntaxError)

tests/utils/CoreTaskUtils.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (2)
tests/storage/test-MetadataStorage.cpp (1)

202-208: Well done on implementing job removal tests!

The test now properly validates the remove_job functionality, addressing the concerns from previous reviews. It verifies both the successful removal and the persistence of unrelated jobs.

tests/utils/CoreTaskUtils.cpp (1)

1-187: Overall, the code changes are well-implemented

The utility functions for comparing task-related data structures are correctly defined, and the logic is sound. The previous issue regarding the comparison of dependencies in task_graph_equal has been resolved.

🧰 Tools
🪛 cppcheck

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant