Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add scheduler scheduling #31

Merged
merged 14 commits into from
Dec 5, 2024
Merged

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Dec 4, 2024

Description

Add scheduler policy interface and fifo scheduling policy.

Validation performed

  • lint:check
  • Unit test pass in devcontainer with mysql setup as storage.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a FIFO scheduling policy to manage task execution efficiently.
    • Added job metadata management capabilities, allowing better tracking of job details.
    • Enhanced database schema for improved data integrity in job and task management.
  • Bug Fixes

    • Improved error handling for job and task metadata retrieval.
  • Tests

    • Added unit tests for the new FIFO scheduling policy, ensuring correct task scheduling based on locality constraints.
    • Expanded test coverage for job metadata handling, validating successful retrieval and removal of job metadata.
  • Documentation

    • Updated configuration files for consistent code formatting across the project.

Copy link

coderabbitai bot commented Dec 4, 2024

Warning

Rate limit exceeded

@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 93481f8 and 7725428.

Walkthrough

The pull request introduces significant changes to the Spider project, focusing on the addition of a new scheduling policy and associated metadata management. A new SchedulerPolicy class and its derived FifoPolicy class are implemented, along with a JobMetadata class to encapsulate job-related information. Modifications to the CMakeLists.txt reflect the inclusion of new source files and adjustments in linking Boost libraries. Additionally, unit tests are added to validate the new scheduling functionalities, while existing test files undergo reorganization of include statements.

Changes

File Path Change Summary
src/spider/CMakeLists.txt Added SPIDER_SCHEDULER_SOURCES variable, created spider_scheduler executable, updated linking for Boost libraries.
src/spider/core/JobMetadata.hpp Introduced JobMetadata class with constructors and getter methods for job metadata.
src/spider/scheduler/FifoPolicy.cpp Implemented FifoPolicy class with methods for scheduling and job cleanup.
src/spider/scheduler/FifoPolicy.hpp Defined FifoPolicy class and its methods for scheduling and cleanup operations.
src/spider/scheduler/SchedulerPolicy.hpp Created abstract SchedulerPolicy class with pure virtual methods for scheduling policies.
src/spider/scheduler/scheduler.cpp Added main function with an empty body for the scheduler executable.
src/spider/storage/MetadataStorage.hpp Added pure virtual methods for job metadata retrieval and task job ID retrieval.
src/spider/storage/MysqlStorage.cpp Enhanced database schema and added methods for job metadata management.
src/spider/storage/MysqlStorage.hpp Added methods for job metadata retrieval and task job ID retrieval.
tests/.clang-format Introduced clang-format configuration file for C++ code formatting rules.
tests/CMakeLists.txt Added new test source file for scheduler tests and organized test sources.
tests/scheduler/test-SchedulerPolicy.cpp Added unit tests for FIFO scheduling policy with various locality constraints.
tests/storage/StorageTestHelper.hpp Reordered include statements without functional changes.
tests/storage/test-DataStorage.cpp Reorganized include statements; no functional changes.
tests/storage/test-MetadataStorage.cpp Expanded test cases for job metadata retrieval and reorganized include directives.
tests/utils/CoreTaskUtils.cpp Reordered include statements without functional changes.
tests/worker/test-FunctionManager.cpp Reordered include statements without functional changes.
tests/worker/test-MessagePipe.cpp Reordered include statements without functional changes.
tests/worker/test-TaskExecutor.cpp Reordered include statements without functional changes.
tests/worker/worker-test.cpp Activated inclusion of FunctionManager.hpp header in tests.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (16)
src/spider/scheduler/SchedulerPolicy.hpp (3)

23-29: Consider passing boost::uuids::uuid by constant reference

Passing boost::uuids::uuid by const reference in schedule_next could improve performance by avoiding unnecessary copies.


30-30: Consider passing boost::uuids::uuid by constant reference

Similarly, passing boost::uuids::uuid by const reference in cleanup_job could enhance performance by preventing unnecessary copying.


10-11: Reduce header dependencies by forward declaring classes

Forward declaring core::DataStorage and core::MetadataStorage instead of including their headers can minimize compilation dependencies and improve build times.

src/spider/scheduler/FifoPolicy.cpp (1)

39-43: Prefer using !expression over false == expression for clarity

Replacing instances of false == expression with !expression enhances readability and aligns with common C++ conventions.

Apply these changes:

- if (false == data_store->get_data(data_id, &data).success()) {
+ if (!data_store->get_data(data_id, &data).success()) {
- if (false == data.is_hard_locality()) {
+ if (!data.is_hard_locality()) {
- if (false == metadata_store->get_task_job_id(task_id, &job_id).success()) {
+ if (!metadata_store->get_task_job_id(task_id, &job_id).success()) {
- if (false == metadata_store->get_job_metadata(job_id, &job_metadata).success()) {
+ if (!metadata_store->get_job_metadata(job_id, &job_metadata).success()) {

Also applies to: 44-46, 89-95, 103-109

tests/worker/worker-test.cpp (1)

3-4: Remove redundant include or confirm its necessity

The inclusion of "../../src/spider/worker/FunctionManager.hpp" seems to be uncommented now. Ensure that it's required for this test file; otherwise, consider removing it to reduce dependencies.

If the FunctionManager is used, then this change is appropriate.

src/spider/core/JobMetadata.hpp (1)

14-21: Use member initializer list consistently

Ensure all member variables are initialized using the member initializer list to improve performance and clarity.

Confirm that no assignments occur within the constructor body.

src/spider/storage/MetadataStorage.hpp (1)

36-36: LGTM: Well-designed interface extensions

The new pure virtual methods:

  • Follow consistent error handling patterns
  • Maintain interface consistency
  • Use appropriate parameter types
  • Have clear, single responsibilities

Consider adding documentation comments to describe the expected behaviour and error conditions.

Also applies to: 45-46

src/spider/scheduler/FifoPolicy.hpp (2)

18-27: Consider documenting thread safety guarantees

The class design looks solid with appropriate use of final and override keywords. However, please document thread safety guarantees as this class will likely be used in a concurrent context.

Add documentation comments like:

/**
 * FIFO scheduling policy implementation.
 * Thread safety: [document guarantees here]
 */
class FifoPolicy final : public SchedulerPolicy {

29-30: Consider atomic operations or mutex protection

The member variables storing task-job mappings and timestamps might need protection in a concurrent environment. Consider:

  1. Adding a mutex to protect concurrent access
  2. Documenting the synchronization requirements for derived implementations
src/spider/CMakeLists.txt (1)

95-105: Consider adding install target for spider_scheduler

The executable is properly set up with sources and dependencies, but missing an install configuration.

Add install configuration similar to other executables:

target_link_libraries(
    spider_scheduler
    PRIVATE
        Boost::headers
        absl::flat_hash_map
        spdlog::spdlog
)
+ install(TARGETS spider_scheduler
+         RUNTIME DESTINATION bin)
src/spider/storage/MysqlStorage.hpp (2)

40-40: Consider documenting return values

The new method would benefit from documentation explaining possible storage errors.

Add documentation:

+    /**
+     * Retrieves job metadata for the given job ID
+     * @param id The job UUID to look up
+     * @param job Pointer to JobMetadata object to populate
+     * @return StorageErr::Success on successful retrieval
+     *         StorageErr::NotFound if job doesn't exist
+     */
     auto get_job_metadata(boost::uuids::uuid id, JobMetadata* job) -> StorageErr override;

49-49: Consider documenting return values

Similar to get_job_metadata, this method would benefit from documentation.

Add documentation:

+    /**
+     * Retrieves the job ID associated with a task
+     * @param id The task UUID to look up
+     * @param job_id Pointer to UUID to populate with job ID
+     * @return StorageErr::Success on successful retrieval
+     *         StorageErr::NotFound if task doesn't exist
+     */
     auto get_task_job_id(boost::uuids::uuid id, boost::uuids::uuid* job_id) -> StorageErr override;
tests/scheduler/test-SchedulerPolicy.cpp (1)

135-135: Fix comment accuracy

The comment incorrectly states "hard locality" for a soft locality test.

-    // Submit task with hard locality
+    // Submit task with soft locality
tests/storage/test-MetadataStorage.cpp (3)

3-21: Consider adding header guards

While the includes are well-organized by category (STL, external, project), consider adding header guards to prevent potential multiple inclusion issues, especially since this is a test file that might be included elsewhere.

+#ifndef SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP
+#define SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP

// existing includes...

+#endif // SPIDER_TESTS_STORAGE_TEST_METADATA_STORAGE_HPP

147-149: Consider using steady_clock for more precise timing tests

For testing time-related functionality, std::chrono::steady_clock might be more appropriate than system_clock as it's monotonic and not affected by system time adjustments.

-    std::chrono::system_clock::time_point const job_creation_time
-            = std::chrono::system_clock::now();
+    std::chrono::steady_clock::time_point const job_creation_time
+            = std::chrono::steady_clock::now();

177-185: Enhance test coverage with edge cases

The metadata verification is good but could be more comprehensive. Consider adding test cases for:

  • Jobs with empty task graphs
  • Jobs with maximum allowed tasks
  • Jobs with circular dependencies (should fail)
  • Jobs with duplicate task IDs (should fail)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c346c5 and 289a151.

📒 Files selected for processing (20)
  • src/spider/CMakeLists.txt (3 hunks)
  • src/spider/core/JobMetadata.hpp (1 hunks)
  • src/spider/scheduler/FifoPolicy.cpp (1 hunks)
  • src/spider/scheduler/FifoPolicy.hpp (1 hunks)
  • src/spider/scheduler/SchedulerPolicy.hpp (1 hunks)
  • src/spider/scheduler/scheduler.cpp (1 hunks)
  • src/spider/storage/MetadataStorage.hpp (3 hunks)
  • src/spider/storage/MysqlStorage.cpp (8 hunks)
  • src/spider/storage/MysqlStorage.hpp (3 hunks)
  • tests/.clang-format (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/scheduler/test-SchedulerPolicy.cpp (1 hunks)
  • tests/storage/StorageTestHelper.hpp (1 hunks)
  • tests/storage/test-DataStorage.cpp (1 hunks)
  • tests/storage/test-MetadataStorage.cpp (4 hunks)
  • tests/utils/CoreTaskUtils.cpp (1 hunks)
  • tests/worker/test-FunctionManager.cpp (1 hunks)
  • tests/worker/test-MessagePipe.cpp (1 hunks)
  • tests/worker/test-TaskExecutor.cpp (1 hunks)
  • tests/worker/worker-test.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (8)
  • src/spider/scheduler/scheduler.cpp
  • tests/worker/test-MessagePipe.cpp
  • tests/storage/StorageTestHelper.hpp
  • tests/storage/test-DataStorage.cpp
  • tests/utils/CoreTaskUtils.cpp
  • tests/worker/test-FunctionManager.cpp
  • tests/.clang-format
  • tests/worker/test-TaskExecutor.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/scheduler/FifoPolicy.cpp

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 107-107: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

tests/scheduler/test-SchedulerPolicy.cpp

[error] 73-73: syntax error

(syntaxError)

🔇 Additional comments (7)
src/spider/storage/MysqlStorage.cpp (1)

722-746: 🛠️ Refactor suggestion

Ensure accurate time parsing in parse_timestamp

When parsing timestamps, consider handling time zones and potential parsing errors to avoid inconsistencies or exceptions.

Would you like assistance in enhancing the parse_timestamp function to handle time zones and error cases?

src/spider/storage/MetadataStorage.hpp (1)

10-10: LGTM: Include placement is appropriate

The JobMetadata.hpp include is correctly placed with other core includes and is necessary for the new method parameter.

src/spider/scheduler/FifoPolicy.hpp (1)

4-14: LGTM: Well-organized includes

The includes are properly organized:

  1. Standard library headers
  2. Third-party headers (abseil, boost)
  3. Project-specific headers
tests/CMakeLists.txt (2)

11-11: LGTM: Test file properly integrated

The scheduler policy test file is correctly added to SPIDER_TEST_SOURCES.


25-29: LGTM: Well-structured test source organization

The new SPIDER_TEST_SCHEDULER_SOURCES variable follows the established pattern used for worker sources, maintaining consistency in the build system.

src/spider/CMakeLists.txt (1)

88-94: LGTM: Well-organized scheduler source files

The scheduler source files are properly organized and cached internally for access from other scopes.

tests/storage/test-MetadataStorage.cpp (1)

223-223: Verify complete cleanup after job removal

While the basic removal test is good, consider adding verification that all associated data (tasks, dependencies, metadata) is properly cleaned up from storage.

✅ Verification successful

Let me gather more information about the cleanup implementation in MySQL storage.


Let me try a different approach to check the implementation details.


Cleanup verification is properly implemented

The MySQL schema shows comprehensive cascading delete constraints that ensure complete cleanup:

  • Tasks are automatically deleted when a job is removed (task_job_id foreign key with ON DELETE CASCADE)
  • All related data is properly cleaned up through cascading deletes:
    • Task inputs and outputs
    • Task dependencies
    • Task instances
    • Data references

The implementation in MysqlStorage::remove_job is sufficient as the database schema handles the complete cleanup automatically.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential orphaned data after job removal

# Search for any references to job cleanup in the codebase
rg -A 5 "remove_job|cleanup.*job|delete.*job" --type cpp

Length of output: 5990


Script:

#!/bin/bash
# Check MySQL schema and cleanup implementation
rg -A 10 "CREATE TABLE.*jobs" --type sql
rg -A 10 "CREATE TABLE.*tasks" --type sql
rg -A 10 "FOREIGN KEY.*CASCADE" --type sql

Length of output: 578


Script:

#!/bin/bash
# Check MySQL implementation for cascading deletes and task cleanup
rg -A 10 "DELETE.*FROM.*jobs" --type cpp
rg -A 10 "DELETE.*FROM.*tasks" --type cpp
rg -A 10 "ON DELETE CASCADE" --type cpp

Length of output: 8795

Comment on lines 709 to 714
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
std::tm time_date{};
std::stringstream ss{timestamp};
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle potential exceptions when parsing timestamps

The parse_timestamp function may throw an exception if the timestamp format doesn't match. Consider adding error handling to manage parsing failures gracefully.

Apply this change to handle parsing errors:

auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
    std::tm time_date{};
    std::stringstream ss{timestamp};
    ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
+   if (ss.fail()) {
+       throw std::runtime_error("Failed to parse timestamp: " + timestamp);
+   }
    return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
std::tm time_date{};
std::stringstream ss{timestamp};
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}
auto parse_timestamp(std::string const& timestamp) -> std::chrono::system_clock::time_point {
std::tm time_date{};
std::stringstream ss{timestamp};
ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
if (ss.fail()) {
throw std::runtime_error("Failed to parse timestamp: " + timestamp);
}
return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}

Comment on lines 23 to 29
[[nodiscard]] auto get_id() -> boost::uuids::uuid { return m_id; }

[[nodiscard]] auto get_client_id() -> boost::uuids::uuid { return m_client_id; }

[[nodiscard]] auto get_creation_time() -> std::chrono::system_clock::time_point {
return m_creation_time;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Mark getter methods as const

Marking the getter methods as const ensures they do not modify the object's state and conveys the intended usage.

Apply this diff to make the methods const:

 [[nodiscard]] auto get_id() -> boost::uuids::uuid {
+    return m_id;
+}

 [[nodiscard]] auto get_client_id() -> boost::uuids::uuid {
+    return m_client_id;
 }

 [[nodiscard]] auto get_creation_time() -> std::chrono::system_clock::time_point {
     return m_creation_time;
 }

After modification:

[[nodiscard]] auto get_id() const -> boost::uuids::uuid { return m_id; }

[[nodiscard]] auto get_client_id() const -> boost::uuids::uuid { return m_client_id; }

[[nodiscard]] auto get_creation_time() const -> std::chrono::system_clock::time_point {
    return m_creation_time;
}

REQUIRE(metadata_store->add_job(job_id_1, client_id, graph_1).success());
REQUIRE(metadata_store->set_task_state(task_1.get_id(), spider::core::TaskState::Ready)
.success());
std::this_thread::sleep_for(std::chrono::seconds(1));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider removing sleep and adding error handling

The sleep in the first test case seems unnecessary, and the second test case is missing error handling for add_data.

-    std::this_thread::sleep_for(std::chrono::seconds(1));

And in the soft locality test:

-    data_store->add_data(data);
+    REQUIRE(data_store->add_data(data).success());

Also applies to: 140-140

REQUIRE(task_id == task.get_id());
}

// REQUIRE(metadata_store->remove_job(job_id).success());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Uncomment cleanup code

The cleanup code is commented out, which could leave test data in the storage.

-    // REQUIRE(metadata_store->remove_job(job_id).success());
+    REQUIRE(metadata_store->remove_job(job_id).success());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// REQUIRE(metadata_store->remove_job(job_id).success());
REQUIRE(metadata_store->remove_job(job_id).success());

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
tests/scheduler/test-SchedulerPolicy.cpp (2)

135-135: Fix incorrect comment

The comment states "hard locality" but the test is actually setting up soft locality.

-    // Submit task with hard locality
+    // Submit task with soft locality

1-162: Consider adding more edge cases

The test suite covers the basic scheduling scenarios well. Consider adding tests for:

  • Multiple tasks with the same locality
  • Tasks with multiple locality constraints
  • Edge cases around locality string formats
🧰 Tools
🪛 cppcheck (2.10-2)

[error] 73-73: syntax error

(syntaxError)

src/spider/storage/MysqlStorage.cpp (1)

709-718: Consider making timestamp format configurable

The timestamp format is hardcoded. Consider making it configurable or defining it as a constant to improve maintainability.

+namespace {
+constexpr char const* const cTimestampFormat = "%Y-%m-%d %H:%M:%S";
+}

auto parse_timestamp(std::string const& timestamp
) -> std::optional<std::chrono::system_clock::time_point> {
    std::tm time_date{};
    std::stringstream ss{timestamp};
-    ss >> std::get_time(&time_date, "%Y-%m-%d %H:%M:%S");
+    ss >> std::get_time(&time_date, cTimestampFormat);
    if (ss.fail()) {
        return std::nullopt;
    }
    return std::chrono::system_clock::from_time_t(std::mktime(&time_date));
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 289a151 and 93481f8.

📒 Files selected for processing (3)
  • src/spider/core/JobMetadata.hpp (1 hunks)
  • src/spider/storage/MysqlStorage.cpp (8 hunks)
  • tests/scheduler/test-SchedulerPolicy.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spider/core/JobMetadata.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/scheduler/test-SchedulerPolicy.cpp

[error] 73-73: syntax error

(syntaxError)

🔇 Additional comments (3)
tests/scheduler/test-SchedulerPolicy.cpp (1)

49-49: Remove unnecessary sleep

The sleep between task submissions is not required and could make the test flaky. The FIFO order is guaranteed by the database transaction order, not by the timing of submissions.

-    std::this_thread::sleep_for(std::chrono::seconds(1));
src/spider/storage/MysqlStorage.cpp (2)

54-54: Schema changes improve data integrity and flexibility

The changes to the schema are well thought out:

  • Adding NOT NULL constraints to timestamp columns
  • Changing address format to VARCHAR(40) for more flexible address formats
  • Consistent use of CURRENT_TIMESTAMP defaults

Also applies to: 69-69, 123-123, 141-141


724-760: Well-implemented job metadata retrieval

The implementation of get_job_metadata and get_task_job_id is robust:

  • Proper error handling for database operations
  • Consistent transaction management
  • Clear error messages with formatted strings

Also applies to: 852-876

spider::core::Data data{"value"};
data.set_hard_locality(true);
data.set_locality({"127.0.0.1"});
REQUIRE(data_store->add_data(data).success());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for data store operations

The data store operation should be checked for errors to ensure the test fails appropriately if the setup fails.

-    REQUIRE(data_store->add_data(data).success());
+    auto result = data_store->add_data(data);
+    REQUIRE_MESSAGE(result.success(), 
+        fmt::format("Failed to add data: {}", result.message()));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
REQUIRE(data_store->add_data(data).success());
auto result = data_store->add_data(data);
REQUIRE_MESSAGE(result.success(),
fmt::format("Failed to add data: {}", result.message()));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant