-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add scheduler scheduling #31
Conversation
Warning Rate limit exceeded@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 48 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe pull request introduces significant changes to the Spider project, focusing on the addition of a new scheduling policy and associated metadata management. A new Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (16)
src/spider/scheduler/SchedulerPolicy.hpp (3)
23-29
: Consider passingboost::uuids::uuid
by constant referencePassing
boost::uuids::uuid
byconst
reference inschedule_next
could improve performance by avoiding unnecessary copies.
30-30
: Consider passingboost::uuids::uuid
by constant referenceSimilarly, passing
boost::uuids::uuid
byconst
reference incleanup_job
could enhance performance by preventing unnecessary copying.
10-11
: Reduce header dependencies by forward declaring classesForward declaring
core::DataStorage
andcore::MetadataStorage
instead of including their headers can minimize compilation dependencies and improve build times.src/spider/scheduler/FifoPolicy.cpp (1)
39-43
: Prefer using!expression
overfalse == expression
for clarityReplacing instances of
false == expression
with!expression
enhances readability and aligns with common C++ conventions.Apply these changes:
- if (false == data_store->get_data(data_id, &data).success()) { + if (!data_store->get_data(data_id, &data).success()) {- if (false == data.is_hard_locality()) { + if (!data.is_hard_locality()) {- if (false == metadata_store->get_task_job_id(task_id, &job_id).success()) { + if (!metadata_store->get_task_job_id(task_id, &job_id).success()) {- if (false == metadata_store->get_job_metadata(job_id, &job_metadata).success()) { + if (!metadata_store->get_job_metadata(job_id, &job_metadata).success()) {Also applies to: 44-46, 89-95, 103-109
tests/worker/worker-test.cpp (1)
3-4
: Remove redundant include or confirm its necessityThe inclusion of
"../../src/spider/worker/FunctionManager.hpp"
seems to be uncommented now. Ensure that it's required for this test file; otherwise, consider removing it to reduce dependencies.If the
FunctionManager
is used, then this change is appropriate.src/spider/core/JobMetadata.hpp (1)
14-21
: Use member initializer list consistentlyEnsure all member variables are initialized using the member initializer list to improve performance and clarity.
Confirm that no assignments occur within the constructor body.
src/spider/storage/MetadataStorage.hpp (1)
36-36
: LGTM: Well-designed interface extensionsThe new pure virtual methods:
- Follow consistent error handling patterns
- Maintain interface consistency
- Use appropriate parameter types
- Have clear, single responsibilities
Consider adding documentation comments to describe the expected behaviour and error conditions.
Also applies to: 45-46
src/spider/scheduler/FifoPolicy.hpp (2)
18-27
: Consider documenting thread safety guaranteesThe class design looks solid with appropriate use of final and override keywords. However, please document thread safety guarantees as this class will likely be used in a concurrent context.
Add documentation comments like:
/** * FIFO scheduling policy implementation. * Thread safety: [document guarantees here] */ class FifoPolicy final : public SchedulerPolicy {
29-30
: Consider atomic operations or mutex protectionThe member variables storing task-job mappings and timestamps might need protection in a concurrent environment. Consider:
- Adding a mutex to protect concurrent access
- Documenting the synchronization requirements for derived implementations
src/spider/CMakeLists.txt (1)
95-105
: Consider adding install target for spider_schedulerThe executable is properly set up with sources and dependencies, but missing an install configuration.
Add install configuration similar to other executables:
target_link_libraries( spider_scheduler PRIVATE Boost::headers absl::flat_hash_map spdlog::spdlog ) + install(TARGETS spider_scheduler + RUNTIME DESTINATION bin)src/spider/storage/MysqlStorage.hpp (2)
40-40
: Consider documenting return valuesThe new method would benefit from documentation explaining possible storage errors.
Add documentation:
+ /** + * Retrieves job metadata for the given job ID + * @param id The job UUID to look up + * @param job Pointer to JobMetadata object to populate + * @return StorageErr::Success on successful retrieval + * StorageErr::NotFound if job doesn't exist + */ auto get_job_metadata(boost::uuids::uuid id, JobMetadata* job) -> StorageErr override;
49-49
: Consider documenting return valuesSimilar to get_job_metadata, this method would benefit from documentation.
Add documentation:
+ /** + * Retrieves the job ID associated with a task + * @param id The task UUID to look up + * @param job_id Pointer to UUID to populate with job ID + * @return StorageErr::Success on successful retrieval + * StorageErr::NotFound if task doesn't exist + */ auto get_task_job_id(boost::uuids::uuid id, boost::uuids::uuid* job_id) -> StorageErr override;tests/scheduler/test-SchedulerPolicy.cpp (1)
135-135
: Fix comment accuracyThe comment incorrectly states "hard locality" for a soft locality test.
- // Submit task with hard locality + // Submit task with soft localitytests/storage/test-MetadataStorage.cpp (3)
3-21
: Consider adding header guardsWhile the includes are well-organized by category (STL, external, project), consider adding header guards to prevent potential multiple inclusion issues, especially since this is a test file that might be included elsewhere.
+#ifndef SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP +#define SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP // existing includes... +#endif // SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP
147-149
: Consider using steady_clock for more precise timing testsFor testing time-related functionality,
std::chrono::steady_clock
might be more appropriate thansystem_clock
as it's monotonic and not affected by system time adjustments.- std::chrono::system_clock::time_point const job_creation_time - = std::chrono::system_clock::now(); + std::chrono::steady_clock::time_point const job_creation_time + = std::chrono::steady_clock::now();
177-185
: Enhance test coverage with edge casesThe metadata verification is good but could be more comprehensive. Consider adding test cases for:
- Jobs with empty task graphs
- Jobs with maximum allowed tasks
- Jobs with circular dependencies (should fail)
- Jobs with duplicate task IDs (should fail)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (20)
src/spider/CMakeLists.txt
(3 hunks)src/spider/core/JobMetadata.hpp
(1 hunks)src/spider/scheduler/FifoPolicy.cpp
(1 hunks)src/spider/scheduler/FifoPolicy.hpp
(1 hunks)src/spider/scheduler/SchedulerPolicy.hpp
(1 hunks)src/spider/scheduler/scheduler.cpp
(1 hunks)src/spider/storage/MetadataStorage.hpp
(3 hunks)src/spider/storage/MysqlStorage.cpp
(8 hunks)src/spider/storage/MysqlStorage.hpp
(3 hunks)tests/.clang-format
(1 hunks)tests/CMakeLists.txt
(1 hunks)tests/scheduler/test-SchedulerPolicy.cpp
(1 hunks)tests/storage/StorageTestHelper.hpp
(1 hunks)tests/storage/test-DataStorage.cpp
(1 hunks)tests/storage/test-MetadataStorage.cpp
(4 hunks)tests/utils/CoreTaskUtils.cpp
(1 hunks)tests/worker/test-FunctionManager.cpp
(1 hunks)tests/worker/test-MessagePipe.cpp
(1 hunks)tests/worker/test-TaskExecutor.cpp
(1 hunks)tests/worker/worker-test.cpp
(1 hunks)
✅ Files skipped from review due to trivial changes (8)
- src/spider/scheduler/scheduler.cpp
- tests/worker/test-MessagePipe.cpp
- tests/storage/StorageTestHelper.hpp
- tests/storage/test-DataStorage.cpp
- tests/utils/CoreTaskUtils.cpp
- tests/worker/test-FunctionManager.cpp
- tests/.clang-format
- tests/worker/test-TaskExecutor.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/scheduler/FifoPolicy.cpp
[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 107-107: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
tests/scheduler/test-SchedulerPolicy.cpp
[error] 73-73: syntax error
(syntaxError)
🔇 Additional comments (7)
src/spider/storage/MysqlStorage.cpp (1)
722-746
: 🛠️ Refactor suggestion
Ensure accurate time parsing in parse_timestamp
When parsing timestamps, consider handling time zones and potential parsing errors to avoid inconsistencies or exceptions.
Would you like assistance in enhancing the parse_timestamp
function to handle time zones and error cases?
src/spider/storage/MetadataStorage.hpp (1)
10-10
: LGTM: Include placement is appropriate
The JobMetadata.hpp include is correctly placed with other core includes and is necessary for the new method parameter.
src/spider/scheduler/FifoPolicy.hpp (1)
4-14
: LGTM: Well-organized includes
The includes are properly organized:
- Standard library headers
- Third-party headers (abseil, boost)
- Project-specific headers
tests/CMakeLists.txt (2)
11-11
: LGTM: Test file properly integrated
The scheduler policy test file is correctly added to SPIDER_TEST_SOURCES.
25-29
: LGTM: Well-structured test source organization
The new SPIDER_TEST_SCHEDULER_SOURCES variable follows the established pattern used for worker sources, maintaining consistency in the build system.
src/spider/CMakeLists.txt (1)
88-94
: LGTM: Well-organized scheduler source files
The scheduler source files are properly organized and cached internally for access from other scopes.
tests/storage/test-MetadataStorage.cpp (1)
223-223
: Verify complete cleanup after job removal
While the basic removal test is good, consider adding verification that all associated data (tasks, dependencies, metadata) is properly cleaned up from storage.
✅ Verification successful
Let me gather more information about the cleanup implementation in MySQL storage.
Let me try a different approach to check the implementation details.
Cleanup verification is properly implemented
The MySQL schema shows comprehensive cascading delete constraints that ensure complete cleanup:
- Tasks are automatically deleted when a job is removed (
task_job_id
foreign key withON DELETE CASCADE
) - All related data is properly cleaned up through cascading deletes:
- Task inputs and outputs
- Task dependencies
- Task instances
- Data references
The implementation in MysqlStorage::remove_job
is sufficient as the database schema handles the complete cleanup automatically.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential orphaned data after job removal
# Search for any references to job cleanup in the codebase
rg -A 5 "remove_job|cleanup.*job|delete.*job" --type cpp
Length of output: 5990
Script:
#!/bin/bash
# Check MySQL schema and cleanup implementation
rg -A 10 "CREATE TABLE.*jobs" --type sql
rg -A 10 "CREATE TABLE.*tasks" --type sql
rg -A 10 "FOREIGN KEY.*CASCADE" --type sql
Length of output: 578
Script:
#!/bin/bash
# Check MySQL implementation for cascading deletes and task cleanup
rg -A 10 "DELETE.*FROM.*jobs" --type cpp
rg -A 10 "DELETE.*FROM.*tasks" --type cpp
rg -A 10 "ON DELETE CASCADE" --type cpp
Length of output: 8795
src/spider/storage/MysqlStorage.cpp
Outdated
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point { | ||
std::tm time_date{}; | ||
std::stringstream ss{timestamp}; | ||
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S"); | ||
return std::chrono::system_clock::from_time_t(std::mktime(&time_date)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential exceptions when parsing timestamps
The parse_timestamp
function may throw an exception if the timestamp format doesn't match. Consider adding error handling to manage parsing failures gracefully.
Apply this change to handle parsing errors:
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
std::tm time_date{};
std::stringstream ss{timestamp};
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
+ if (ss.fail()) {
+ throw std::runtime_error("Failed to parse timestamp: " + timestamp);
+ }
return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point { | |
std::tm time_date{}; | |
std::stringstream ss{timestamp}; | |
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S"); | |
return std::chrono::system_clock::from_time_t(std::mktime(&time_date)); | |
} | |
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point { | |
std::tm time_date{}; | |
std::stringstream ss{timestamp}; | |
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S"); | |
if (ss.fail()) { | |
throw std::runtime_error("Failed to parse timestamp: " + timestamp); | |
} | |
return std::chrono::system_clock::from_time_t(std::mktime(&time_date)); | |
} |
src/spider/core/JobMetadata.hpp
Outdated
[[nodiscard]] auto get_id() -> boost::uuids::uuid { return m_id; } | ||
|
||
[[nodiscard]] auto get_client_id() -> boost::uuids::uuid { return m_client_id; } | ||
|
||
[[nodiscard]] auto get_creation_time() -> std::chrono::system_clock::time_point { | ||
return m_creation_time; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Mark getter methods as const
Marking the getter methods as const
ensures they do not modify the object's state and conveys the intended usage.
Apply this diff to make the methods const
:
[[nodiscard]] auto get_id() -> boost::uuids::uuid {
+ return m_id;
+}
[[nodiscard]] auto get_client_id() -> boost::uuids::uuid {
+ return m_client_id;
}
[[nodiscard]] auto get_creation_time() -> std::chrono::system_clock::time_point {
return m_creation_time;
}
After modification:
[[nodiscard]] auto get_id() const -> boost::uuids::uuid { return m_id; }
[[nodiscard]] auto get_client_id() const -> boost::uuids::uuid { return m_client_id; }
[[nodiscard]] auto get_creation_time() const -> std::chrono::system_clock::time_point {
return m_creation_time;
}
REQUIRE(metadata_store->add_job(job_id_1, client_id, graph_1).success()); | ||
REQUIRE(metadata_store->set_task_state(task_1.get_id(), spider::core::TaskState::Ready) | ||
.success()); | ||
std::this_thread::sleep_for(std::chrono::seconds(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider removing sleep and adding error handling
The sleep in the first test case seems unnecessary, and the second test case is missing error handling for add_data.
- std::this_thread::sleep_for(std::chrono::seconds(1));
And in the soft locality test:
- data_store->add_data(data);
+ REQUIRE(data_store->add_data(data).success());
Also applies to: 140-140
REQUIRE(task_id == task.get_id()); | ||
} | ||
|
||
// REQUIRE(metadata_store->remove_job(job_id).success()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncomment cleanup code
The cleanup code is commented out, which could leave test data in the storage.
- // REQUIRE(metadata_store->remove_job(job_id).success());
+ REQUIRE(metadata_store->remove_job(job_id).success());
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// REQUIRE(metadata_store->remove_job(job_id).success()); | |
REQUIRE(metadata_store->remove_job(job_id).success()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
tests/scheduler/test-SchedulerPolicy.cpp (2)
135-135
: Fix incorrect commentThe comment states "hard locality" but the test is actually setting up soft locality.
- // Submit task with hard locality + // Submit task with soft locality
1-162
: Consider adding more edge casesThe test suite covers the basic scheduling scenarios well. Consider adding tests for:
- Multiple tasks with the same locality
- Tasks with multiple locality constraints
- Edge cases around locality string formats
🧰 Tools
🪛 cppcheck (2.10-2)
[error] 73-73: syntax error
(syntaxError)
src/spider/storage/MysqlStorage.cpp (1)
709-718
: Consider making timestamp format configurableThe timestamp format is hardcoded. Consider making it configurable or defining it as a constant to improve maintainability.
+namespace { +constexpr char const* const cTimestampFormat = "%Y-%m-%d %H:%M:%S"; +} auto parse_timestamp(std::string const& timestamp ) -> std::optional<std::chrono::system_clock::time_point> { std::tm time_date{}; std::stringstream ss{timestamp}; - ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S"); + ss >> std::get_time(&time_date, cTimestampFormat); if (ss.fail()) { return std::nullopt; } return std::chrono::system_clock::from_time_t(std::mktime(&time_date)); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/spider/core/JobMetadata.hpp
(1 hunks)src/spider/storage/MysqlStorage.cpp
(8 hunks)tests/scheduler/test-SchedulerPolicy.cpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spider/core/JobMetadata.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/scheduler/test-SchedulerPolicy.cpp
[error] 73-73: syntax error
(syntaxError)
🔇 Additional comments (3)
tests/scheduler/test-SchedulerPolicy.cpp (1)
49-49
: Remove unnecessary sleep
The sleep between task submissions is not required and could make the test flaky. The FIFO order is guaranteed by the database transaction order, not by the timing of submissions.
- std::this_thread::sleep_for(std::chrono::seconds(1));
src/spider/storage/MysqlStorage.cpp (2)
54-54
: Schema changes improve data integrity and flexibility
The changes to the schema are well thought out:
- Adding NOT NULL constraints to timestamp columns
- Changing address format to VARCHAR(40) for more flexible address formats
- Consistent use of CURRENT_TIMESTAMP defaults
Also applies to: 69-69, 123-123, 141-141
724-760
: Well-implemented job metadata retrieval
The implementation of get_job_metadata and get_task_job_id is robust:
- Proper error handling for database operations
- Consistent transaction management
- Clear error messages with formatted strings
Also applies to: 852-876
spider::core::Data data{"value"}; | ||
data.set_hard_locality(true); | ||
data.set_locality({"127.0.0.1"}); | ||
REQUIRE(data_store->add_data(data).success()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for data store operations
The data store operation should be checked for errors to ensure the test fails appropriately if the setup fails.
- REQUIRE(data_store->add_data(data).success());
+ auto result = data_store->add_data(data);
+ REQUIRE_MESSAGE(result.success(),
+ fmt::format("Failed to add data: {}", result.message()));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
REQUIRE(data_store->add_data(data).success()); | |
auto result = data_store->add_data(data); | |
REQUIRE_MESSAGE(result.success(), | |
fmt::format("Failed to add data: {}", result.message())); |
Description
Add scheduler policy interface and fifo scheduling policy.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests
Documentation