feat: Add scheduler scheduling #31

sitaowang1998 · 2024-12-04T08:31:59Z

Description

Add a scheduler policy interface and a FIFO scheduling policy.

Validation performed

lint:check
Unit tests pass in the devcontainer with MySQL set up as the storage backend.

Summary by CodeRabbit

Release Notes

New Features
- Introduced a FIFO scheduling policy to manage task execution efficiently.
- Added job metadata management capabilities, allowing better tracking of job details.
- Enhanced database schema for improved data integrity in job and task management.
Bug Fixes
- Improved error handling for job and task metadata retrieval.
Tests
- Added unit tests for the new FIFO scheduling policy, ensuring correct task scheduling based on locality constraints.
- Expanded test coverage for job metadata handling, validating successful retrieval and removal of job metadata.
Documentation
- Updated configuration files for consistent code formatting across the project.

coderabbitai · 2024-12-04T08:32:07Z

Warning

Rate limit exceeded

@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 48 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 93481f8 and 7725428.

Walkthrough

The pull request introduces significant changes to the Spider project, focusing on the addition of a new scheduling policy and associated metadata management. A new SchedulerPolicy class and its derived FifoPolicy class are implemented, along with a JobMetadata class to encapsulate job-related information. Modifications to the CMakeLists.txt reflect the inclusion of new source files and adjustments in linking Boost libraries. Additionally, unit tests are added to validate the new scheduling functionalities, while existing test files undergo reorganization of include statements.

Changes

File Path	Change Summary
src/spider/CMakeLists.txt	Added `SPIDER_SCHEDULER_SOURCES` variable, created `spider_scheduler` executable, updated linking for Boost libraries.
src/spider/core/JobMetadata.hpp	Introduced `JobMetadata` class with constructors and getter methods for job metadata.
src/spider/scheduler/FifoPolicy.cpp	Implemented `FifoPolicy` class with methods for scheduling and job cleanup.
src/spider/scheduler/FifoPolicy.hpp	Defined `FifoPolicy` class and its methods for scheduling and cleanup operations.
src/spider/scheduler/SchedulerPolicy.hpp	Created abstract `SchedulerPolicy` class with pure virtual methods for scheduling policies.
src/spider/scheduler/scheduler.cpp	Added main function with an empty body for the scheduler executable.
src/spider/storage/MetadataStorage.hpp	Added pure virtual methods for job metadata retrieval and task job ID retrieval.
src/spider/storage/MysqlStorage.cpp	Enhanced database schema and added methods for job metadata management.
src/spider/storage/MysqlStorage.hpp	Added methods for job metadata retrieval and task job ID retrieval.
tests/.clang-format	Introduced clang-format configuration file for C++ code formatting rules.
tests/CMakeLists.txt	Added new test source file for scheduler tests and organized test sources.
tests/scheduler/test-SchedulerPolicy.cpp	Added unit tests for FIFO scheduling policy with various locality constraints.
tests/storage/StorageTestHelper.hpp	Reordered include statements without functional changes.
tests/storage/test-DataStorage.cpp	Reorganized include statements; no functional changes.
tests/storage/test-MetadataStorage.cpp	Expanded test cases for job metadata retrieval and reorganized include directives.
tests/utils/CoreTaskUtils.cpp	Reordered include statements without functional changes.
tests/worker/test-FunctionManager.cpp	Reordered include statements without functional changes.
tests/worker/test-MessagePipe.cpp	Reordered include statements without functional changes.
tests/worker/test-TaskExecutor.cpp	Reordered include statements without functional changes.
tests/worker/worker-test.cpp	Activated inclusion of `FunctionManager.hpp` header in tests.

Possibly related PRs

chore: Add boilerplate CMake project and Taskfile tasks to lint CMake files. #7: Modifications to the CMakeLists.txt file, focusing on CMake configuration, indicating a direct connection.
build: Add CMake config and libraries #13: Changes to the CMakeLists.txt file, adding configurations for libraries, relating to source file organization.
feat: Add MySql support for storage backend #20: Introduces MySQL support, relating to modifications enhancing project structure and linking libraries.
test: Add tests for data storage #21: Adds tests for data storage, connecting to changes involving new source files and functionalities.
test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23: Focus on enhancing metadata storage and fixing bugs in MySQL backend, aligning with structural changes.
feat: Add function manager for register and run function by function name #25: Introduction of a function manager, relating to enhancements in project structure and task management.
feat: Add message pipe #27: Addition of message pipe functionality, connecting with improvements in communication and execution mechanisms.
feat: Add task executor #28: Introduction of a task executor, directly relating to enhancements in organization of source files and new executables.

Suggested reviewers

kirkrodrigues


coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (16)

src/spider/scheduler/SchedulerPolicy.hpp (3)

23-29: Consider passing boost::uuids::uuid by constant reference

Passing boost::uuids::uuid by const reference in schedule_next could improve performance by avoiding unnecessary copies.

30-30: Consider passing boost::uuids::uuid by constant reference

Similarly, passing boost::uuids::uuid by const reference in cleanup_job could enhance performance by preventing unnecessary copying.

10-11: Reduce header dependencies by forward declaring classes

Forward declaring core::DataStorage and core::MetadataStorage instead of including their headers can minimize compilation dependencies and improve build times.
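A minimal sketch of how the three SchedulerPolicy.hpp suggestions could combine, assuming the storage types are only referenced through std::shared_ptr so that forward declarations are enough. The exact schedule_next parameters and return type are hypothetical, and Uuid stands in for boost::uuids::uuid to keep the snippet self-contained. Note that boost::uuids::uuid is a 16-byte trivially copyable type, so the copy being avoided is small; the change mostly buys consistency.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <type_traits>

// Stand-in for boost::uuids::uuid (assumption, to keep the sketch self-contained).
using Uuid = std::array<std::uint8_t, 16>;

namespace spider::core {
// Forward declarations: the header only names these types via std::shared_ptr,
// so their full definitions are not needed here.
class DataStorage;
class MetadataStorage;
}  // namespace spider::core

namespace spider::scheduler {
class SchedulerPolicy {
public:
    virtual ~SchedulerPolicy() = default;

    // Hypothetical signature: returns the id of the next task for this worker, if any.
    // Ids are taken by const reference, as the review suggests.
    virtual auto schedule_next(
            std::shared_ptr<core::MetadataStorage> metadata_store,
            std::shared_ptr<core::DataStorage> data_store,
            Uuid const& worker_id,
            std::string const& worker_addr
    ) -> std::optional<Uuid> = 0;

    // Hypothetical signature: drops any per-job state the policy keeps.
    virtual auto cleanup_job(Uuid const& job_id) -> void = 0;
};
}  // namespace spider::scheduler
```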
src/spider/scheduler/FifoPolicy.cpp (1)
39-43: Prefer using !expression over false == expression for clarity

Replacing instances of false == expression with !expression enhances readability and aligns with common C++ conventions.

Apply these changes:
- if (false == data_store->get_data(data_id, &data).success()) {
+ if (!data_store->get_data(data_id, &data).success()) {
- if (false == data.is_hard_locality()) {
+ if (!data.is_hard_locality()) {
- if (false == metadata_store->get_task_job_id(task_id, &job_id).success()) {
+ if (!metadata_store->get_task_job_id(task_id, &job_id).success()) {
- if (false == metadata_store->get_job_metadata(job_id, &job_metadata).success()) {
+ if (!metadata_store->get_job_metadata(job_id, &job_metadata).success()) {
Also applies to: 44-46, 89-95, 103-109
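The locality checks above feed the FIFO selection. A simplified, self-contained sketch of that idea, assuming "FIFO" means "task from the earliest-created job first" and that only hard locality can exclude a worker (soft locality is treated as a preference only). ReadyTask, satisfies_locality, and schedule_next are hypothetical names, not Spider's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a ready task.
struct ReadyTask {
    int task_id;
    std::chrono::system_clock::time_point job_creation_time;
    bool hard_locality;                 // true if the task must run on a listed node
    std::vector<std::string> locality;  // addresses holding the task's input data
};

// Returns true when the worker at worker_addr may run the task.
auto satisfies_locality(ReadyTask const& task, std::string const& worker_addr) -> bool {
    if (!task.hard_locality) {
        return true;  // soft locality never excludes a worker in this sketch
    }
    return std::find(task.locality.begin(), task.locality.end(), worker_addr)
           != task.locality.end();
}

// FIFO: among eligible tasks, pick the one whose job was created earliest.
auto schedule_next(std::vector<ReadyTask> const& tasks, std::string const& worker_addr)
        -> std::optional<int> {
    std::optional<int> best_id;
    std::chrono::system_clock::time_point best_time{};
    for (auto const& task : tasks) {
        if (!satisfies_locality(task, worker_addr)) {
            continue;
        }
        // Strict comparison: on equal creation times the earlier task in input order wins.
        if (!best_id.has_value() || task.job_creation_time < best_time) {
            best_id = task.task_id;
            best_time = task.job_creation_time;
        }
    }
    return best_id;
}
```

Because the comparison is strict, jobs with identical creation times keep their input order rather than being reordered arbitrarily.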
tests/worker/worker-test.cpp (1)

3-4: Remove redundant include or confirm its necessity

The inclusion of "../../src/spider/worker/FunctionManager.hpp" seems to be uncommented now. Ensure that it's required for this test file; otherwise, consider removing it to reduce dependencies.

If the FunctionManager is used, then this change is appropriate.

src/spider/core/JobMetadata.hpp (1)

14-21: Use member initializer list consistently

Ensure all member variables are initialized using the member initializer list to improve performance and clarity.

Confirm that no assignments occur within the constructor body.
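A sketch of a JobMetadata that sets every member in the initializer list and exposes const getters. Uuid again stands in for boost::uuids::uuid, and the three-argument constructor is an assumption inferred from the getters shown in the PR, not the file's actual contents.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>

using Uuid = std::array<std::uint8_t, 16>;  // stand-in for boost::uuids::uuid

class JobMetadata {
public:
    JobMetadata() = default;

    // Every member is initialized in the initializer list; the body stays empty.
    JobMetadata(Uuid id, Uuid client_id, std::chrono::system_clock::time_point creation_time)
            : m_id{id},
              m_client_id{client_id},
              m_creation_time{creation_time} {}

    // Const getters can be called on const objects and cannot modify state.
    [[nodiscard]] auto get_id() const -> Uuid { return m_id; }

    [[nodiscard]] auto get_client_id() const -> Uuid { return m_client_id; }

    [[nodiscard]] auto get_creation_time() const -> std::chrono::system_clock::time_point {
        return m_creation_time;
    }

private:
    Uuid m_id{};
    Uuid m_client_id{};
    std::chrono::system_clock::time_point m_creation_time;
};
```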

src/spider/storage/MetadataStorage.hpp (1)

36-36: LGTM: Well-designed interface extensions

The new pure virtual methods:
- Follow consistent error handling patterns
- Maintain interface consistency
- Use appropriate parameter types
- Have clear, single responsibilities

Consider adding documentation comments to describe the expected behaviour and error conditions.

Also applies to: 45-46
src/spider/scheduler/FifoPolicy.hpp (2)
18-27: Consider documenting thread safety guarantees

The class design looks solid with appropriate use of final and override keywords. However, please document thread safety guarantees as this class will likely be used in a concurrent context.

Add documentation comments like:
/**
 * FIFO scheduling policy implementation.
 * Thread safety: [document guarantees here]
 */
class FifoPolicy final : public SchedulerPolicy {
29-30: Consider atomic operations or mutex protection

The member variables storing task-job mappings and timestamps might need protection in a concurrent environment. Consider:
- Adding a mutex to protect concurrent access
- Documenting the synchronization requirements for derived implementations
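One way to apply the mutex suggestion, sketched as a standalone class. JobCache, its methods, and the string ids are hypothetical; they only mirror the kind of task-to-job and job-to-timestamp state the comment describes. A single std::mutex guards both maps, and const readers lock it through a mutable member.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical cache of scheduler state; ids are plain strings for brevity.
class JobCache {
public:
    void add_task(std::string const& task_id, std::string const& job_id,
                  std::chrono::system_clock::time_point job_creation_time) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_task_job[task_id] = job_id;
        m_job_time[job_id] = job_creation_time;
    }

    [[nodiscard]] auto get_job(std::string const& task_id) const -> std::optional<std::string> {
        std::lock_guard<std::mutex> const lock{m_mutex};
        auto const it = m_task_job.find(task_id);
        if (it == m_task_job.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Removes every entry that belongs to job_id, under the same lock as the readers.
    void cleanup_job(std::string const& job_id) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        for (auto it = m_task_job.begin(); it != m_task_job.end();) {
            if (it->second == job_id) {
                it = m_task_job.erase(it);
            } else {
                ++it;
            }
        }
        m_job_time.erase(job_id);
    }

    [[nodiscard]] auto num_jobs() const -> std::size_t {
        std::lock_guard<std::mutex> const lock{m_mutex};
        return m_job_time.size();
    }

private:
    mutable std::mutex m_mutex;  // guards both maps below
    std::unordered_map<std::string, std::string> m_task_job;
    std::unordered_map<std::string, std::chrono::system_clock::time_point> m_job_time;
};
```

A single coarse lock keeps both maps consistent with each other; finer-grained locking would only pay off if profiling showed contention.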
src/spider/CMakeLists.txt (1)
95-105: Consider adding install target for spider_scheduler

The executable is properly set up with sources and dependencies, but missing an install configuration.

Add install configuration similar to other executables:
target_link_libraries(
    spider_scheduler
    PRIVATE
        Boost::headers
        absl::flat_hash_map
        spdlog::spdlog
)
+ install(TARGETS spider_scheduler
+         RUNTIME DESTINATION bin)
src/spider/storage/MysqlStorage.hpp (2)
40-40: Consider documenting return values

The new method would benefit from documentation explaining possible storage errors.

Add documentation:
+    /**
+     * Retrieves job metadata for the given job ID
+     * @param id The job UUID to look up
+     * @param job Pointer to JobMetadata object to populate
+     * @return StorageErr::Success on successful retrieval
+     *         StorageErr::NotFound if job doesn't exist
+     */
     auto get_job_metadata(boost::uuids::uuid id, JobMetadata* job) -> StorageErr override;
49-49: Consider documenting return values

Similar to get_job_metadata, this method would benefit from documentation.

Add documentation:
+    /**
+     * Retrieves the job ID associated with a task
+     * @param id The task UUID to look up
+     * @param job_id Pointer to UUID to populate with job ID
+     * @return StorageErr::Success on successful retrieval
+     *         StorageErr::NotFound if task doesn't exist
+     */
     auto get_task_job_id(boost::uuids::uuid id, boost::uuids::uuid* job_id) -> StorageErr override;
tests/scheduler/test-SchedulerPolicy.cpp (1)
135-135: Fix comment accuracy

The comment incorrectly states "hard locality" for a soft locality test.
-    // Submit task with hard locality
+    // Submit task with soft locality
tests/storage/test-MetadataStorage.cpp (3)
3-21: Consider adding header guards

While the includes are well-organized by category (STL, external, project), consider adding header guards to prevent potential multiple inclusion issues, especially since this is a test file that might be included elsewhere.
+#ifndef SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP
+#define SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP

// existing includes...

+#endif // SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP
147-149: Consider using steady_clock for more precise timing tests

For testing time-related functionality, std::chrono::steady_clock might be more appropriate than system_clock as it's monotonic and not affected by system time adjustments.
-    std::chrono::system_clock::time_point const job_creation_time
-            = std::chrono::system_clock::now();
+    std::chrono::steady_clock::time_point const job_creation_time
+            = std::chrono::steady_clock::now();
177-185: Enhance test coverage with edge cases

The metadata verification is good but could be more comprehensive. Consider adding test cases for:
- Jobs with empty task graphs
- Jobs with the maximum allowed number of tasks
- Jobs with circular dependencies (should fail)
- Jobs with duplicate task IDs (should fail)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c346c5 and 289a151.

📒 Files selected for processing (20)

src/spider/CMakeLists.txt (3 hunks)
src/spider/core/JobMetadata.hpp (1 hunks)
src/spider/scheduler/FifoPolicy.cpp (1 hunks)
src/spider/scheduler/FifoPolicy.hpp (1 hunks)
src/spider/scheduler/SchedulerPolicy.hpp (1 hunks)
src/spider/scheduler/scheduler.cpp (1 hunks)
src/spider/storage/MetadataStorage.hpp (3 hunks)
src/spider/storage/MysqlStorage.cpp (8 hunks)
src/spider/storage/MysqlStorage.hpp (3 hunks)
tests/.clang-format (1 hunks)
tests/CMakeLists.txt (1 hunks)
tests/scheduler/test-SchedulerPolicy.cpp (1 hunks)
tests/storage/StorageTestHelper.hpp (1 hunks)
tests/storage/test-DataStorage.cpp (1 hunks)
tests/storage/test-MetadataStorage.cpp (4 hunks)
tests/utils/CoreTaskUtils.cpp (1 hunks)
tests/worker/test-FunctionManager.cpp (1 hunks)
tests/worker/test-MessagePipe.cpp (1 hunks)
tests/worker/test-TaskExecutor.cpp (1 hunks)
tests/worker/worker-test.cpp (1 hunks)

✅ Files skipped from review due to trivial changes (8)

src/spider/scheduler/scheduler.cpp
tests/worker/test-MessagePipe.cpp
tests/storage/StorageTestHelper.hpp
tests/storage/test-DataStorage.cpp
tests/utils/CoreTaskUtils.cpp
tests/worker/test-FunctionManager.cpp
tests/.clang-format
tests/worker/test-TaskExecutor.cpp

🧰 Additional context used

🪛 cppcheck (2.10-2)

src/spider/scheduler/FifoPolicy.cpp

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 107-107: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

tests/scheduler/test-SchedulerPolicy.cpp

[error] 73-73: syntax error

(syntaxError)

🔇 Additional comments (7)

src/spider/storage/MysqlStorage.cpp (1)

722-746: 🛠️ Refactor suggestion

Ensure accurate time parsing in parse_timestamp

When parsing timestamps, consider handling time zones and potential parsing errors to avoid inconsistencies or exceptions.

Would you like assistance in enhancing the parse_timestamp function to handle time zones and error cases?
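A hedged sketch of a parse_timestamp that reports parse failures and avoids std::mktime's local-time interpretation, assuming the MySQL session returns timestamps in UTC. Instead of the non-standard timegm, it converts the parsed calendar date to days since the Unix epoch with Howard Hinnant's days_from_civil algorithm. cTimestampFormat is an illustrative constant name.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <iomanip>
#include <optional>
#include <sstream>
#include <string>

namespace {
constexpr char const* cTimestampFormat = "%Y-%m-%d %H:%M:%S";

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
constexpr auto days_from_civil(long long y, unsigned m, unsigned d) -> long long {
    y -= m <= 2 ? 1 : 0;
    long long const era = (y >= 0 ? y : y - 399) / 400;
    auto const yoe = static_cast<unsigned>(y - era * 400);                 // [0, 399]
    unsigned const doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    unsigned const doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}
}  // namespace

// Parses "YYYY-MM-DD HH:MM:SS" as UTC; returns std::nullopt if the text does not match.
// Assumes system_clock's epoch is the Unix epoch (guaranteed since C++20).
auto parse_timestamp(std::string const& timestamp)
        -> std::optional<std::chrono::system_clock::time_point> {
    std::tm time_date{};
    std::istringstream ss{timestamp};
    ss >> std::get_time(&time_date, cTimestampFormat);
    if (ss.fail()) {
        return std::nullopt;  // get_time sets failbit instead of throwing
    }
    long long const days = days_from_civil(
            time_date.tm_year + 1900LL,
            static_cast<unsigned>(time_date.tm_mon + 1),
            static_cast<unsigned>(time_date.tm_mday)
    );
    std::chrono::seconds const since_epoch{
            days * 86400LL + time_date.tm_hour * 3600LL + time_date.tm_min * 60LL
            + time_date.tm_sec
    };
    return std::chrono::system_clock::time_point{since_epoch};
}
```

Returning std::optional lets callers map a malformed row to a storage error instead of propagating an exception out of the storage layer.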

src/spider/storage/MetadataStorage.hpp (1)

10-10: LGTM: Include placement is appropriate

The JobMetadata.hpp include is correctly placed with other core includes and is necessary for the new method parameter.

src/spider/scheduler/FifoPolicy.hpp (1)

4-14: LGTM: Well-organized includes

The includes are properly organized:
- Standard library headers
- Third-party headers (abseil, boost)
- Project-specific headers

tests/CMakeLists.txt (2)

11-11: LGTM: Test file properly integrated

The scheduler policy test file is correctly added to SPIDER_TEST_SOURCES.

25-29: LGTM: Well-structured test source organization

The new SPIDER_TEST_SCHEDULER_SOURCES variable follows the established pattern used for worker sources, maintaining consistency in the build system.

src/spider/CMakeLists.txt (1)

88-94: LGTM: Well-organized scheduler source files

The scheduler source files are properly organized and cached internally for access from other scopes.

tests/storage/test-MetadataStorage.cpp (1)

223-223: Verify complete cleanup after job removal

While the basic removal test is good, consider adding verification that all associated data (tasks, dependencies, metadata) is properly cleaned up from storage.

✅ Verification successful

Let me gather more information about the cleanup implementation in MySQL storage.

Let me try a different approach to check the implementation details.

Cleanup verification is properly implemented

The MySQL schema shows comprehensive cascading delete constraints that ensure complete cleanup:

Tasks are automatically deleted when a job is removed (task_job_id foreign key with ON DELETE CASCADE)
All related data is properly cleaned up through cascading deletes:
- Task inputs and outputs
- Task dependencies
- Task instances
- Data references

The implementation in MysqlStorage::remove_job is sufficient as the database schema handles the complete cleanup automatically.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential orphaned data after job removal

# Search for any references to job cleanup in the codebase
rg -A 5 "remove_job|cleanup.*job|delete.*job" --type cpp

Length of output: 5990

Script:

#!/bin/bash
# Check MySQL schema and cleanup implementation
rg -A 10 "CREATE TABLE.*jobs" --type sql
rg -A 10 "CREATE TABLE.*tasks" --type sql
rg -A 10 "FOREIGN KEY.*CASCADE" --type sql

Length of output: 578

Script:

#!/bin/bash
# Check MySQL implementation for cascading deletes and task cleanup
rg -A 10 "DELETE.*FROM.*jobs" --type cpp
rg -A 10 "DELETE.*FROM.*tasks" --type cpp
rg -A 10 "ON DELETE CASCADE" --type cpp

Length of output: 8795

coderabbitai · 2024-12-04T08:38:03Z

src/spider/storage/MysqlStorage.cpp

+auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
+    std::tm time_date{};
+    std::stringstream ss{timestamp};
+    ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
+    return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
+}


⚠️ Potential issue

Handle timestamp parsing failures

If the timestamp does not match the expected format, std::get_time sets failbit on the stream rather than throwing, so parse_timestamp silently converts a partially filled std::tm into a wrong time point. Consider adding error handling to manage parsing failures gracefully.

Apply this change to handle parsing errors:

 auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
     std::tm time_date{};
     std::stringstream ss{timestamp};
     ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
+    if (ss.fail()) {
+        throw std::runtime_error("Failed to parse timestamp: " + timestamp);
+    }
     return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
 }


coderabbitai · 2024-12-04T08:38:03Z

src/spider/core/JobMetadata.hpp

+    [[nodiscard]] auto get_id() -> boost::uuids::uuid { return m_id; }
+
+    [[nodiscard]] auto get_client_id() -> boost::uuids::uuid { return m_client_id; }
+
+    [[nodiscard]] auto get_creation_time() -> std::chrono::system_clock::time_point {
+        return m_creation_time;
+    }


🛠️ Refactor suggestion

Mark getter methods as const

Marking the getter methods as const ensures they do not modify the object's state and conveys the intended usage.

Apply this diff to make the methods const:

-    [[nodiscard]] auto get_id() -> boost::uuids::uuid { return m_id; }
+    [[nodiscard]] auto get_id() const -> boost::uuids::uuid { return m_id; }
-    [[nodiscard]] auto get_client_id() -> boost::uuids::uuid { return m_client_id; }
+    [[nodiscard]] auto get_client_id() const -> boost::uuids::uuid { return m_client_id; }
-    [[nodiscard]] auto get_creation_time() -> std::chrono::system_clock::time_point {
+    [[nodiscard]] auto get_creation_time() const -> std::chrono::system_clock::time_point {
         return m_creation_time;
     }

coderabbitai · 2024-12-04T08:38:03Z

tests/scheduler/test-SchedulerPolicy.cpp

+    REQUIRE(metadata_store->add_job(job_id_1, client_id, graph_1).success());
+    REQUIRE(metadata_store->set_task_state(task_1.get_id(), spider::core::TaskState::Ready)
+                    .success());
+    std::this_thread::sleep_for(std::chrono::seconds(1));


🛠️ Refactor suggestion

Consider removing sleep and adding error handling

The sleep in the first test case seems unnecessary, and the second test case is missing error handling for add_data.

- std::this_thread::sleep_for(std::chrono::seconds(1));

And in the soft locality test:

-    data_store->add_data(data);
+    REQUIRE(data_store->add_data(data).success());

Also applies to: 140-140

coderabbitai · 2024-12-04T08:38:03Z

tests/scheduler/test-SchedulerPolicy.cpp

+        REQUIRE(task_id == task.get_id());
+    }
+
+    // REQUIRE(metadata_store->remove_job(job_id).success());


⚠️ Potential issue

Uncomment cleanup code

The cleanup code is commented out, which could leave test data in the storage.

-    // REQUIRE(metadata_store->remove_job(job_id).success());
+    REQUIRE(metadata_store->remove_job(job_id).success());
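Even when uncommented, the cleanup line is skipped if an earlier REQUIRE fails. A sketch of a scope guard that runs cleanup on scope exit, assuming the test framework aborts a failed REQUIRE by throwing an exception (as Catch2 does by default). ScopeGuard is a hypothetical helper, not part of the project.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical helper: runs a callback when the enclosing scope exits,
// whether normally or because an exception unwinds the stack.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> on_exit) : m_on_exit{std::move(on_exit)} {}

    ScopeGuard(ScopeGuard const&) = delete;
    auto operator=(ScopeGuard const&) -> ScopeGuard& = delete;
    ScopeGuard(ScopeGuard&&) = delete;
    auto operator=(ScopeGuard&&) -> ScopeGuard& = delete;

    // Destructors are implicitly noexcept, so the callback must not throw.
    ~ScopeGuard() {
        if (m_on_exit) {
            m_on_exit();
        }
    }

private:
    std::function<void()> m_on_exit;
};
```

In the test it would be declared right after the job is added, e.g. `ScopeGuard const cleanup{[&] { metadata_store->remove_job(job_id); }};`, so the job is removed even when a later REQUIRE fails. The guard ignores remove_job's result, because a destructor has no good way to report a failure.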


coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)

tests/scheduler/test-SchedulerPolicy.cpp (2)
135-135: Fix incorrect comment

The comment states "hard locality" but the test is actually setting up soft locality.
-    // Submit task with hard locality
+    // Submit task with soft locality
1-162: Consider adding more edge cases

The test suite covers the basic scheduling scenarios well. Consider adding tests for:
- Multiple tasks with the same locality
- Tasks with multiple locality constraints
- Edge cases around locality string formats

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 73-73: syntax error

(syntaxError)
src/spider/storage/MysqlStorage.cpp (1)
709-718: Consider making timestamp format configurable

The timestamp format is hardcoded. Consider making it configurable or defining it as a constant to improve maintainability.
+namespace {
+constexpr char const* const cTimestampFormat = "%Y-%m-%d %H:%M:%S";
+}

auto parse_timestamp(std::string const& timestamp
) -> std::optional<std::chrono::system_clock::time_point> {
    std::tm time_date{};
    std::stringstream ss{timestamp};
-    ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
+    ss >> std::get_time(&time_date, cTimestampFormat);
    if (ss.fail()) {
        return std::nullopt;
    }
    return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 289a151 and 93481f8.

📒 Files selected for processing (3)

src/spider/core/JobMetadata.hpp (1 hunks)
src/spider/storage/MysqlStorage.cpp (8 hunks)
tests/scheduler/test-SchedulerPolicy.cpp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/spider/core/JobMetadata.hpp

🧰 Additional context used

🪛 cppcheck (2.10-2)

tests/scheduler/test-SchedulerPolicy.cpp

[error] 73-73: syntax error

(syntaxError)

🔇 Additional comments (3)

tests/scheduler/test-SchedulerPolicy.cpp (1)

49-49: Remove unnecessary sleep

The sleep between task submissions is not required and could make the test flaky. The FIFO order is guaranteed by the database transaction order, not by the timing of submissions.

-    std::this_thread::sleep_for(std::chrono::seconds(1));

src/spider/storage/MysqlStorage.cpp (2)

54-54: Schema changes improve data integrity and flexibility

The changes to the schema are well thought out:
- Adding NOT NULL constraints to timestamp columns
- Changing the address column to VARCHAR(40) to accommodate more address formats
- Consistent use of CURRENT_TIMESTAMP defaults

Also applies to: 69-69, 123-123, 141-141

724-760: Well-implemented job metadata retrieval

The implementation of get_job_metadata and get_task_job_id is robust:
- Proper error handling for database operations
- Consistent transaction management
- Clear error messages with formatted strings

Also applies to: 852-876

coderabbitai · 2024-12-04T21:31:22Z

tests/scheduler/test-SchedulerPolicy.cpp

+    spider::core::Data data{"value"};
+    data.set_hard_locality(true);
+    data.set_locality({"127.0.0.1"});
+    REQUIRE(data_store->add_data(data).success());


🛠️ Refactor suggestion

Add error handling for data store operations

The data store operation should be checked for errors to ensure the test fails appropriately if the setup fails.

-    REQUIRE(data_store->add_data(data).success());
+    auto result = data_store->add_data(data);
+    REQUIRE_MESSAGE(result.success(),
+                    fmt::format("Failed to add data: {}", result.message()));


sitaowang1998 added 10 commits December 3, 2024 21:42

- Add creation time in job metadata (7b87f9c)
- Add clang format for tests (49c0ee3)
- Make SchedulerPolicy pure virtual (7e5eb95)
- Fix bug in mysql table creation (954416d)
- Add fifo job scheduler (2248eb1)
- Add scheduler test (d826b4f)
- Fix scheduler (54e23a5)
- Fix clang tidy (7e3dfb6)
- Fix clang tidy (cd01beb)
- Fix clang tidy (289a151)

coderabbitai bot reviewed Dec 4, 2024


sitaowang1998 added 3 commits December 4, 2024 21:25

- Add error handle for mysql timestamp parsing (d614d9b)
- Mark JobMetadata getter as const (c10a764)
- Add more check for scheduler test (93481f8)

coderabbitai bot reviewed Dec 4, 2024


- Fix typo in comment (7725428)

sitaowang1998 merged commit d307a2a into y-scope:main Dec 5, 2024
2 checks passed

sitaowang1998 deleted the scheduler branch December 5, 2024 00:26
