-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23
test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23
Conversation
WalkthroughThe pull request introduces significant modifications to several classes within the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (18)
tests/utils/CoreDataUtils.hpp (1)
Line range hint
7-30
: Consider optimizing property access with structured bindings.While the implementation is correct, we could make it more efficient by reducing repeated getter calls.
Here's a suggested optimization:
-inline auto data_equal(core::Data const& d1, core::Data const& d2) -> bool { - if (d1.get_id() != d2.get_id()) { - return false; - } - - if (d1.get_key() != d2.get_key()) { - return false; - } - - if (d1.get_locality() != d2.get_locality()) { - return false; - } - - if (d1.is_hard_locality() != d2.is_hard_locality()) { - return false; - } - - if (d1.get_value() != d2.get_value()) { - return false; - } - - return true; +inline auto data_equal(core::Data const& d1, core::Data const& d2) -> bool { + auto [id1, key1, locality1, hard_locality1, value1] = + std::tuple{d1.get_id(), d1.get_key(), d1.get_locality(), + d1.is_hard_locality(), d1.get_value()}; + auto [id2, key2, locality2, hard_locality2, value2] = + std::tuple{d2.get_id(), d2.get_key(), d2.get_locality(), + d2.is_hard_locality(), d2.get_value()}; + + return id1 == id2 && key1 == key2 && locality1 == locality2 && + hard_locality1 == hard_locality2 && value1 == value2; }tests/storage/StorageTestHelper.hpp (1)
Line range hint
23-54
: Consider improving test environment managementTwo suggestions for the storage creation functions:
- Add cleanup functionality to ensure test data is removed after each test
- Move the hardcoded database credentials to a configuration file or environment variables
Example improvement:
+ // Load credentials from environment or config + static auto get_storage_url() { + auto* url = std::getenv("SPIDER_TEST_DB_URL"); + return url ? url : cStorageUrl; // Fall back to default if not set + } template <class T> requires std::derived_from<T, core::DataStorage> auto create_data_storage() -> std::unique_ptr<core::DataStorage> { std::unique_ptr<core::DataStorage> storage = std::make_unique<T>(); - REQUIRE(storage->connect(spider::test::cStorageUrl).success()); + REQUIRE(storage->connect(get_storage_url()).success()); REQUIRE(storage->initialize().success()); return storage; }src/spider/storage/MetadataStorage.hpp (1)
56-57
: LGTM: Well-structured method for scheduler address retrieval.The new method follows the established pattern and provides essential functionality for distributed system coordination.
Consider adding a brief comment documenting the expected format of the address string (e.g., IPv4, IPv6, hostname) to ensure consistent usage across implementations.
src/spider/core/TaskGraph.hpp (2)
51-56
: Document pointer validity guaranteesThe method now returns a pointer to a map element. Please document the lifetime/validity guarantees of the returned pointer, particularly in relation to map modifications.
+ /** + * Retrieves a const pointer to a task by its ID. + * @note The returned pointer is valid until the next modification of the task graph. + */ [[nodiscard]] auto get_task(boost::uuids::uuid id) const -> std::optional<Task const*> {
58-62
: Document usage guidelines for mutable task accessThe non-const overload allows direct modification of tasks, which could potentially break graph invariants. Consider:
- Documenting safe modification patterns
- Adding runtime checks for critical task properties
+ /** + * Retrieves a mutable pointer to a task by its ID. + * @warning Modify task properties with caution to maintain graph consistency. + * The task ID must not be changed. + */ [[nodiscard]] auto get_task(boost::uuids::uuid id) -> std::optional<Task*> {src/spider/storage/MysqlStorage.hpp (1)
66-66
: Add documentation for the enhanced functionalityThe rename from
fetch_task
tofetch_full_task
suggests expanded functionality. Please add a brief comment explaining what constitutes a "full" task fetch.- auto fetch_full_task(std::unique_ptr<sql::ResultSet> const& res) -> Task; + /// Fetches a task with all its associated data (inputs, outputs, etc.) from the result set + auto fetch_full_task(std::unique_ptr<sql::ResultSet> const& res) -> Task;src/spider/core/Task.hpp (1)
135-138
: Consider marking vectors as const in class definition.The new getter methods returning const references are good for performance. Consider also marking the vectors as const in the class definition if they're not meant to be modified after construction.
private: boost::uuids::uuid m_id; std::string m_function_name; TaskState m_state = TaskState::Pending; float m_timeout = 0; - std::vector<TaskInput> m_inputs; - std::vector<TaskOutput> m_outputs; + const std::vector<TaskInput> m_inputs; + const std::vector<TaskOutput> m_outputs;tests/storage/test-DataStorage.cpp (2)
38-38
: Consider adding more specific failure messages.While the data comparison is correct, adding a custom message would help debug test failures.
- REQUIRE(spider::test::data_equal(data, result)); + REQUIRE(spider::test::data_equal(data, result), "Retrieved data does not match stored data");
48-51
: Apply the same improvements as the first test case.For consistency with the first test case:
- The test structure improvements are good
- Consider adding the same custom failure message for data comparison
- REQUIRE(spider::test::data_equal(data, result)); + REQUIRE(spider::test::data_equal(data, result), "Retrieved data by key does not match stored data");Also applies to: 67-67
🧰 Tools
🪛 cppcheck
[error] 48-48: syntax error
(syntaxError)
tests/storage/test-MetadataStorage.cpp (3)
23-63
: Consider improving test reliability for timing-dependent assertions.The test case effectively validates driver heartbeat functionality but has two potential reliability issues:
- The
cDuration
constant lacks units specification- Using
sleep_for
in tests can lead to flaky behaviour in CI environmentsConsider these improvements:
- constexpr double cDuration = 100; + constexpr std::chrono::milliseconds cHeartbeatTimeout{100};Also, consider implementing a mock time provider to avoid relying on actual sleep durations.
65-104
: Enhance test coverage and maintainability.The scheduler state test is thorough but could be improved in several areas:
- Magic values should be named constants
- Missing test cases for invalid state transitions
- Limited set of state values tested
Consider these improvements:
+ // At the namespace level + namespace { + constexpr int kDefaultSchedulerPort = 3306; + constexpr char kNormalState[] = "normal"; + constexpr char kRecoveryState[] = "recovery"; + } - constexpr int cPort = 3306; + constexpr int cPort = kDefaultSchedulerPort; - REQUIRE(state_res == "normal"); + REQUIRE(state_res == kNormalState);Also consider adding test cases for:
- Invalid state transitions
- Invalid state string values
- Concurrent state updates
🧰 Tools
🪛 cppcheck
[error] 65-65: syntax error
(syntaxError)
106-110
: Fix typo in test case name.There's a typo in the test case name: "Job add, get ane remove" should be "Job add, get and remove".
tests/utils/CoreTaskUtils.cpp (2)
22-24
: Simplifyfloat_equal
function usingstd::abs
For better clarity and to handle negative differences appropriately, consider using
std::abs
in thefloat_equal
function.Apply this diff to simplify the function:
+#include <cmath> auto float_equal(float f1, float f2) -> bool { - return f1 - f2 < cEpsilon && f2 - f1 < cEpsilon; + return std::abs(f1 - f2) < cEpsilon; }
62-63
: Usesize_t
for loop counter to match vector size typeChanging the loop variable
i
fromint
tosize_t
avoids potential signed/unsigned comparison issues.Apply this diff:
- for (int i = 0; i < v1.size(); ++i) { + for (size_t i = 0; i < v1.size(); ++i) {src/spider/storage/MysqlStorage.cpp (4)
305-328
: Consider removing unnecessary transaction management in read-only methodget_driver
The method
get_driver
performs a read-only operation but usesm_conn->rollback()
andm_conn->commit()
. For read-only operations, transaction management may not be necessary and can be omitted to improve performance.
503-514
: Use column names instead of indices for clarityIn the functions
fetch_task_input
andfetch_task_output
, hard-coded column indices likeres->getString(3)
are used. Using column names instead of indices improves readability and reduces the risk of errors if the SQL query changes.Also applies to: 524-533
Line range hint
767-781
: Consider removing unnecessary transaction management in read-only methodget_task
The
get_task
method performs a read-only operation but includesm_conn->rollback()
andm_conn->commit()
. Omitting transaction management in read-only methods can enhance performance.
Line range hint
1147-1164
: Ensure consistent error handling in data retrieval methodsIn
get_data
andget_data_by_key
, when no data is found,m_conn->rollback()
is called before returning an error. Since these methods are read-only, consider whether transaction rollback is necessary.Also applies to: 1190-1208
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (12)
src/spider/core/Task.hpp
(2 hunks)src/spider/core/TaskGraph.hpp
(1 hunks)src/spider/storage/MetadataStorage.hpp
(2 hunks)src/spider/storage/MysqlStorage.cpp
(23 hunks)src/spider/storage/MysqlStorage.hpp
(2 hunks)tests/CMakeLists.txt
(1 hunks)tests/storage/StorageTestHelper.hpp
(3 hunks)tests/storage/test-DataStorage.cpp
(9 hunks)tests/storage/test-MetadataStorage.cpp
(1 hunks)tests/utils/CoreDataUtils.hpp
(2 hunks)tests/utils/CoreTaskUtils.cpp
(1 hunks)tests/utils/CoreTaskUtils.hpp
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/utils/CoreTaskUtils.hpp
🧰 Additional context used
🪛 cppcheck
tests/storage/test-DataStorage.cpp
[error] 48-48: syntax error
(syntaxError)
tests/storage/test-MetadataStorage.cpp
[error] 65-65: syntax error
(syntaxError)
tests/utils/CoreTaskUtils.cpp
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
🔇 Additional comments (20)
tests/CMakeLists.txt (1)
3-7
: LGTM! The new test files are properly integrated.
The addition of metadata storage tests and task utility files aligns well with the PR objectives. The files are correctly organized in their respective directories (storage/ and utils/) and properly integrated into the SPIDER_TEST_SOURCES variable.
tests/utils/CoreDataUtils.hpp (3)
1-2
: LGTM! Header guards are properly implemented.
The header guard naming convention follows the project standards.
5-5
: Appropriate namespace change for test utilities.
Moving this utility to the spider::test
namespace better reflects its purpose as a testing component and maintains clear separation from core functionality.
31-33
: LGTM! Proper file closure with clear comments.
The namespace and header guard closings are well-documented.
tests/storage/StorageTestHelper.hpp (2)
1-2
: LGTM: Include guard naming aligns with directory structure
The update from SPIDER_TEST_
to SPIDER_TESTS_
in the include guard better reflects the actual directory structure and maintains consistency.
Also applies to: 57-57
7-7
: Consider making storage type lists extensible
While the type aliases improve readability, hardcoding MySQL as the only storage type might limit testing flexibility. Consider:
- Making these lists configurable or extensible for future storage backends
- Adding a comment explaining why MySQL is currently the only supported backend
Let's check if other storage backends exist:
Also applies to: 19-21
src/spider/storage/MetadataStorage.hpp (3)
54-54
: LGTM: Enhanced precision for timeout values.
The change from float
to double
provides better precision for timeout values, which is important for accurate scheduling and monitoring.
#!/bin/bash
# Verify all timeout-related methods use double
rg 'timeout.*float|float.*timeout'
29-29
: LGTM: Well-designed getter method for driver information.
The new method follows a consistent pattern with other getters in the interface and provides essential functionality for driver management.
✅ Verification successful
Method is properly implemented in MySQL backend
The get_driver
method is correctly implemented in MysqlStorage.cpp
with a matching signature and follows the interface contract. The implementation includes proper SQL query preparation and error handling.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the implementation exists in MySQL backend
ast-grep --pattern 'auto get_driver(boost::uuids::uuid id, std::string* addr) -> StorageErr'
Length of output: 695
Script:
#!/bin/bash
# Search for get_driver implementation in MySQL storage
rg -A 5 "get_driver.*uuid.*addr.*StorageErr" src/
# Search for get_driver in MySQL storage class definition
ast-grep --pattern 'class MysqlStorage {
$$$
auto get_driver($$$) -> StorageErr $$$
$$$
}'
Length of output: 1813
47-48
: LGTM: Improved method signature for handling multiple child tasks.
The rename from get_child_task
to get_child_tasks
and the change to return a vector better represents the one-to-many relationship between tasks.
✅ Verification successful
The method signature change has been properly implemented across the codebase
The verification confirms that:
- The pure virtual method
get_child_tasks
is correctly declared inMetadataStorage.hpp
- The implementation in
MySqlMetadataStorage
properly overrides the method with the same signature - The method is being used correctly in tests and throughout the codebase
- No legacy
get_child_task
(singular) calls remain
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify all implementations have been updated
rg -A 2 'get_child_task[s]?\('
Length of output: 695
Script:
#!/bin/bash
# Search for get_child_task or get_child_tasks with proper pattern matching
rg -A 2 'get_child_tasks?\s*\(' --type cpp
# Also search for class implementations of MetadataStorage
ast-grep --pattern 'class $_ : public MetadataStorage {
$$$
}'
Length of output: 2191
src/spider/storage/MysqlStorage.hpp (4)
34-34
: LGTM: Method signature is well-designed
The method declaration follows good C++ practices with proper error handling and output parameter design.
58-58
: LGTM: Well-structured method signature
The method follows consistent patterns with other address retrieval methods in the class, properly separating address and port parameters.
55-55
: Verify caller compatibility with double precision
The change from float to double improves precision but may affect existing callers.
#!/bin/bash
# Search for calls to heartbeat_timeout to verify type compatibility
rg "heartbeat_timeout\s*\([^)]*\)" --type cpp
51-51
: Verify task collection handling in implementation
The method signature change from single task to vector is appropriate. Please ensure the implementation properly handles:
- Memory management for Task objects
- Error cases (empty results, partial failures)
src/spider/core/Task.hpp (1)
104-107
: Verify the impact of removing creator-related parameters.
The constructor simplification looks good, but we should verify that removing the creator-related parameters doesn't break existing task creation workflows.
Also applies to: 109-114
✅ Verification successful
Let's search for more context about Task usage patterns and any related types that might have been affected by the constructor changes.
Let's search for more context about the Task class definition and its usage patterns, particularly focusing on instantiation and any remaining references to the removed creator-related code.
Constructor changes are safe and backwards compatible
The simplified Task constructor changes appear to be safe based on the following evidence:
- No TaskCreator references exist in the codebase
- Task instantiation patterns in tests show only function name being used:
Task{"child"}
,Task{"p1"}
- TaskGraph integration shows compatibility with the new constructor through
add_child_task
- The constructor maintains core functionality (ID generation and function name) while removing unused creator-related parameters
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for Task constructor usage patterns to ensure all call sites are updated
# Test: Search for Task constructor calls
rg -A 2 'Task\(' --type cpp
# Test: Search for potential creator-related usages that might need updating
rg -i 'creator.*task' --type cpp
Length of output: 546
Script:
#!/bin/bash
# Search for any references to Task class and its constructors in implementation files
rg -A 5 'Task\s+\w+\s*\(' --type cpp
# Search for any Task instantiation patterns
rg -A 3 'new\s+Task\s*\(' --type cpp
# Search for any Task-related enums or types that might have been affected
rg -A 5 'enum.*Task' --type cpp
# Look for any Task factory or creation patterns
rg -A 5 'create.*Task' --type cpp
Length of output: 397
Script:
#!/bin/bash
# Search for any Task class usage in header files
rg -A 5 'class.*Task' --type cpp
# Search for any Task object instantiation or usage
rg -A 3 'Task\s+[a-zA-Z_][a-zA-Z0-9_]*\s*[{=]' --type cpp
# Look for any removed TaskCreatorType references
rg -A 3 'TaskCreator' --type cpp
# Search for any Task-related method implementations
ast-grep --pattern 'Task::$_($$$) {
$$$
}'
Length of output: 3065
tests/storage/test-DataStorage.cpp (4)
1-1
: LGTM! Enhanced NOLINT comments improve code quality guidelines.
The NOLINT comments now include additional checks against non-const global variables and C-style arrays, which better aligns with modern C++ practices.
Also applies to: 141-141
19-23
: LGTM! Improved test case flexibility with TEMPLATE_LIST_TEST_CASE.
The change to use TEMPLATE_LIST_TEST_CASE with DataStorageTypeList allows for better test parameterization.
109-112
: LGTM! Consistent test structure with task reference test case.
The test follows the same robust pattern as the task reference test case, properly handling error cases and prerequisites.
77-80
: LGTM! Verify task creation parameters.
The test structure is good, using proper error cases and prerequisites. However, let's verify the task creation parameters are consistent with the updated Task class (removal of TaskCreatorType).
Also applies to: 90-90
✅ Verification successful
The previous ast-grep pattern might have failed due to syntax. Let's try with ripgrep to get a better understanding of Task usage and definition in the codebase.
Let's verify the actual test implementation to see how Tasks are created and if there are any remaining TaskCreatorType references.
Task constructor usage is consistent with current implementation
The test is using the correct Task constructor signature (Task{"func"}
), which matches the current Task class implementation that accepts a function name. There are no references to the removed TaskCreatorType, and the Task creation is properly implemented throughout the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify Task constructor usage consistency
# Test: Search for Task constructor calls to ensure they match the new signature
ast-grep --pattern 'Task($_)'
Length of output: 31
Script:
#!/bin/bash
# Search for Task constructor calls with ripgrep for better coverage
rg "Task\(" -A 2
# Also search for Task class definition to understand the constructor signature
rg "class Task" -A 5
Length of output: 2263
Script:
#!/bin/bash
# Let's examine the test file to see how Task is being used
rg "Task" tests/storage/test-DataStorage.cpp -A 2
# Also check for any TaskCreatorType references that might have been removed
rg "TaskCreatorType" -A 2
Length of output: 432
tests/storage/test-MetadataStorage.cpp (1)
1-20
: LGTM: File structure and includes are well-organized.
The file includes all necessary headers and uses appropriate NOLINT pragmas to suppress specific warnings that are acceptable in test files.
src/spider/storage/MysqlStorage.cpp (1)
955-962
: Validate the heartbeat timeout calculation
In the heartbeat_timeout
method, the timeout is converted using cMillisecondToMicrosecond
, which is set to 1000
. Since there are 1,000,000 microseconds in a second, ensure that the conversion accurately reflects the intended timeout duration.
void add_dependency(boost::uuids::uuid parent, boost::uuids::uuid child) { | ||
m_dependencies.emplace_back(parent, child); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding task existence validation
The method should verify that both parent and child tasks exist in m_tasks before adding the dependency to prevent orphaned dependencies.
void add_dependency(boost::uuids::uuid parent, boost::uuids::uuid child) {
+ if (!m_tasks.contains(parent) || !m_tasks.contains(child)) {
+ throw std::invalid_argument("Both parent and child tasks must exist");
+ }
m_dependencies.emplace_back(parent, child);
}
Committable suggestion skipped: line range outside the PR's diff.
std::unique_ptr<spider::core::MetadataStorage> storage | ||
= spider::test::create_metadata_storage<TestType>(); | ||
|
||
boost::uuids::random_generator gen; | ||
boost::uuids::uuid const job_id = gen(); | ||
|
||
// Create a complicated task graph | ||
boost::uuids::uuid const client_id = gen(); | ||
spider::core::Task child_task{"child"}; | ||
spider::core::Task parent_1{"p1"}; | ||
spider::core::Task parent_2{"p2"}; | ||
parent_1.add_input(spider::core::TaskInput{"1", "float"}); | ||
parent_1.add_input(spider::core::TaskInput{"2", "float"}); | ||
parent_2.add_input(spider::core::TaskInput{"3", "int"}); | ||
parent_2.add_input(spider::core::TaskInput{"4", "int"}); | ||
parent_1.add_output(spider::core::TaskOutput{"float"}); | ||
parent_2.add_output(spider::core::TaskOutput{"int"}); | ||
child_task.add_input(spider::core::TaskInput{parent_1.get_id(), 0, "float"}); | ||
child_task.add_input(spider::core::TaskInput{parent_2.get_id(), 0, "int"}); | ||
child_task.add_output(spider::core::TaskOutput{"float"}); | ||
spider::core::TaskGraph graph; | ||
// Add task and dependencies to task graph in wrong order | ||
graph.add_task(child_task); | ||
graph.add_task(parent_1); | ||
graph.add_task(parent_2); | ||
graph.add_dependency(parent_2.get_id(), child_task.get_id()); | ||
graph.add_dependency(parent_1.get_id(), child_task.get_id()); | ||
|
||
// Get head tasks should succeed | ||
absl::flat_hash_set<boost::uuids::uuid> heads = graph.get_head_tasks(); | ||
REQUIRE(2 == heads.size()); | ||
REQUIRE(heads.contains(parent_1.get_id())); | ||
REQUIRE(heads.contains(parent_2.get_id())); | ||
|
||
// Submit a simple job | ||
boost::uuids::uuid const simple_job_id = gen(); | ||
spider::core::Task const simple_task{"simple"}; | ||
spider::core::TaskGraph simple_graph; | ||
simple_graph.add_task(simple_task); | ||
|
||
heads = simple_graph.get_head_tasks(); | ||
REQUIRE(1 == heads.size()); | ||
REQUIRE(heads.contains(simple_task.get_id())); | ||
|
||
// Submit job should success | ||
REQUIRE(storage->add_job(job_id, client_id, graph).success()); | ||
REQUIRE(storage->add_job(simple_job_id, client_id, simple_graph).success()); | ||
|
||
// Get job id for non-existent client id should return empty vector | ||
std::vector<boost::uuids::uuid> job_ids; | ||
REQUIRE(storage->get_jobs_by_client_id(gen(), &job_ids).success()); | ||
REQUIRE(job_ids.empty()); | ||
|
||
// Get job id for client id should get correct value; | ||
REQUIRE(storage->get_jobs_by_client_id(client_id, &job_ids).success()); | ||
REQUIRE(2 == job_ids.size()); | ||
REQUIRE( | ||
((job_ids[0] == job_id && job_ids[1] == simple_job_id) | ||
|| (job_ids[0] == simple_job_id && job_ids[1] == job_id)) | ||
); | ||
|
||
// Get task graph should succeed | ||
spider::core::TaskGraph graph_res{}; | ||
REQUIRE(storage->get_task_graph(job_id, &graph_res).success()); | ||
REQUIRE(spider::test::task_graph_equal(graph, graph_res)); | ||
spider::core::TaskGraph simple_graph_res{}; | ||
REQUIRE(storage->get_task_graph(simple_job_id, &simple_graph_res).success()); | ||
REQUIRE(spider::test::task_graph_equal(simple_graph, simple_graph_res)); | ||
|
||
// Get task should succeed | ||
spider::core::Task task_res{""}; | ||
REQUIRE(storage->get_task(child_task.get_id(), &task_res).success()); | ||
REQUIRE(spider::test::task_equal(child_task, task_res)); | ||
|
||
// Get child tasks should succeed | ||
std::vector<spider::core::Task> tasks; | ||
REQUIRE(storage->get_child_tasks(parent_1.get_id(), &tasks).success()); | ||
REQUIRE(1 == tasks.size()); | ||
REQUIRE(spider::test::task_equal(child_task, tasks[0])); | ||
tasks.clear(); | ||
|
||
// Get parent tasks should succeed | ||
REQUIRE(storage->get_parent_tasks(child_task.get_id(), &tasks).success()); | ||
REQUIRE(2 == tasks.size()); | ||
REQUIRE( | ||
((spider::test::task_equal(tasks[0], parent_1) | ||
&& spider::test::task_equal(tasks[1], parent_2)) | ||
|| (spider::test::task_equal(tasks[0], parent_2) | ||
&& spider::test::task_equal(tasks[1], parent_1))) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
🛠️ Refactor suggestion
The review comment is accurate - test coverage needs improvement
The test file's name suggests it covers job removal ("Job add, get ane[sic] remove"), but despite remove_job
being a required virtual method in the MetadataStorage interface, there are no tests validating this functionality. Additionally, the review correctly identifies missing test cases for error scenarios and the need to refactor the complex task graph setup.
Key findings:
- No tests for
remove_job
despite being part of the interface - Missing error scenario tests (invalid graphs, duplicates)
- Complex setup code that could benefit from helper functions
- Test name is misleading as it doesn't test removal functionality
🔗 Analysis chain
Consider improving test structure and coverage.
While the test thoroughly validates the happy path for job management, there are several areas for improvement:
- Complex task graph setup could be moved to helper functions
- Missing test cases for error scenarios
- No validation of job removal despite being mentioned in the test name
Consider extracting task graph setup to a helper function:
namespace spider::test {
spider::core::TaskGraph create_test_task_graph() {
spider::core::Task child_task{"child"};
spider::core::Task parent_1{"p1"};
spider::core::Task parent_2{"p2"};
// ... setup tasks ...
spider::core::TaskGraph graph;
// ... add tasks and dependencies ...
return graph;
}
}
Also, consider adding test cases for:
- Job removal functionality
- Adding jobs with invalid task graphs
- Adding duplicate jobs
- Race conditions in concurrent job submissions
Let me gather more information about the test coverage for job removal functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify if job removal functionality exists in MetadataStorage
rg -A 5 "remove_job|delete_job" src/spider/storage/MetadataStorage.hpp
Length of output: 570
Script:
#!/bin/bash
# Check if there are any tests for remove_job in the test files
rg -B 2 -A 5 "remove_job|delete_job" tests/
# Look for the complete test file to understand the test scope
cat tests/storage/test-MetadataStorage.cpp
Length of output: 8168
tests/utils/CoreTaskUtils.cpp
Outdated
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>( | ||
graph_1.get_dependencies(), | ||
graph_1.get_dependencies() | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix comparison in task_graph_equal
function
The function currently compares graph_1.get_dependencies()
with itself instead of graph_2.get_dependencies()
. This prevents accurate comparison between two different task graphs.
Apply this diff to correct the comparison:
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>(
graph_1.get_dependencies(),
- graph_1.get_dependencies()
+ graph_2.get_dependencies()
))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>( | |
graph_1.get_dependencies(), | |
graph_1.get_dependencies() | |
)) | |
if (!vector_sort_equal<std::pair<boost::uuids::uuid, boost::uuids::uuid>>( | |
graph_1.get_dependencies(), | |
graph_2.get_dependencies() | |
)) |
std::optional<Task const*> const task_option = task_graph.get_task(task_id); | ||
if (!task_option.has_value()) { | ||
m_conn->rollback(); | ||
return StorageErr{StorageErrType::KeyNotFoundErr, "Task graph inconsistent"}; | ||
} | ||
Task const& task = task_option.value(); | ||
this->add_task(job_id_bytes, task); | ||
Task const* task = task_option.value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor duplicated code in add_job
to a helper function
The code blocks at lines 427-432 and 444-449 are identical. Refactoring this duplicated code into a helper function would enhance maintainability and reduce redundancy.
Also applies to: 444-449
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
tests/storage/test-MetadataStorage.cpp (5)
1-1
: Consider documenting the rationale for suppressed warnings.Please add a brief comment explaining why each warning is being suppressed. This helps future maintainers understand the reasoning behind these suppressions.
27-27
: Consider using named constants for better readability.The magic numbers in the test could be more descriptive:
cDuration = 100
could indicate the unit (milliseconds/seconds)- The sleep duration of 1 second could be a named constant
- constexpr double cDuration = 100; + // Duration in milliseconds for heartbeat timeout + constexpr double cHeartbeatTimeoutMs = 100; - std::this_thread::sleep_for(std::chrono::seconds(1)); + constexpr auto cHeartbeatWaitTime = std::chrono::seconds(1); + std::this_thread::sleep_for(cHeartbeatWaitTime);Also applies to: 47-47
75-75
: Consider using enums or constants for scheduler states and ports.The hardcoded values could be more maintainable:
- Port 3306 should be a named constant
- Scheduler states could be defined as enums or constants
+ // Common ports used in tests + namespace test_ports { + constexpr int kMySqlPort = 3306; + } + + // Scheduler states + namespace scheduler_states { + constexpr char kNormal[] = "normal"; + constexpr char kRecovery[] = "recovery"; + } - constexpr int cPort = 3306; + constexpr int cPort = test_ports::kMySqlPort; - REQUIRE(state_res == "normal"); + REQUIRE(state_res == scheduler_states::kNormal);Also applies to: 94-94
117-138
: Consider extracting task graph setup to helper functions.The task graph creation logic is complex and could be moved to helper functions for better reusability and readability.
namespace spider::test { spider::core::TaskGraph create_complex_task_graph() { spider::core::Task child_task{"child"}; spider::core::Task parent_1{"p1"}; spider::core::Task parent_2{"p2"}; // Setup inputs/outputs setup_parent_task(parent_1, {"1", "2"}, "float"); setup_parent_task(parent_2, {"3", "4"}, "int"); setup_child_task(child_task, parent_1, parent_2); // Create graph with dependencies spider::core::TaskGraph graph; add_tasks_with_dependencies(graph, {child_task, parent_1, parent_2}); return graph; } }
106-209
: Consider adding more error test cases.While the happy path is well tested, consider adding test cases for:
- Adding jobs with invalid task graphs (cycles, disconnected components)
- Adding duplicate jobs
- Removing non-existent jobs
- Race conditions in concurrent job submissions
tests/utils/CoreTaskUtils.cpp (1)
63-64
: Usesize_t
for loop index to match container size typeThe loop index
i
is declared asint
, whereasv1.size()
returns asize_t
. Using different integer types may lead to potential warnings or issues related to signed versus unsigned comparisons.Apply this diff to correct the index type:
-for (int i = 0; i < v1.size(); ++i) { +for (size_t i = 0; i < v1.size(); ++i) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
tests/storage/test-MetadataStorage.cpp
(1 hunks)tests/utils/CoreTaskUtils.cpp
(1 hunks)
🧰 Additional context used
🪛 cppcheck
tests/storage/test-MetadataStorage.cpp
[error] 65-65: syntax error
(syntaxError)
tests/utils/CoreTaskUtils.cpp
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
🔇 Additional comments (2)
tests/storage/test-MetadataStorage.cpp (1)
202-208
: Well done on implementing job removal tests!
The test now properly validates the remove_job
functionality, addressing the concerns from previous reviews. It verifies both the successful removal and the persistence of unrelated jobs.
tests/utils/CoreTaskUtils.cpp (1)
1-187
: Overall, the code changes are well-implemented
The utility functions for comparing task-related data structures are correctly defined, and the logic is sound. The previous issue regarding the comparison of dependencies in task_graph_equal
has been resolved.
🧰 Tools
🪛 cppcheck
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 106-106: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
Description
As title.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
Task
class constructor, requiring only the function name.MetadataStorage
.Bug Fixes
Tests
MetadataStorage
functionality.