Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add worker client that submit tasks and get next task from schedulers #35

Merged
merged 8 commits into from
Dec 12, 2024

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Dec 12, 2024

Description

Add get normal schedulers in metadata storage. Add logic to update outputs, update dependent tasks inputs and states in metadata storage task_finish. Add worker client that submit tasks and get next task from scheduler.

Validation performed

  • GitHub workflows pass
  • Unit tests for metadata storage in devcontainer

Summary by CodeRabbit

  • New Features

    • Introduced Driver and Scheduler classes for enhanced task and driver management.
    • Added functionality to manage task completion and retrieval of active schedulers.
    • New test case for task completion handling.
  • Bug Fixes

    • Updated method signatures for adding drivers and schedulers to improve encapsulation.
  • Tests

    • Refactored test cases to utilize Driver and Scheduler objects.
    • Added new test case for verifying task completion functionality.

Copy link

coderabbitai bot commented Dec 12, 2024

Walkthrough

The pull request introduces modifications to the CMakeLists.txt file and various source files within the Spider project. Key changes include the addition of new header and source files for core, worker, and client modules, expanding their functionalities. The Driver and Scheduler classes are introduced, along with updates to the MetadataStorage class methods to accept these new classes. Additionally, test cases have been revised to reflect these changes, ensuring a consistent object-oriented approach throughout the project.

Changes

File Path Change Summary
src/spider/CMakeLists.txt Added new header and source files: core/Driver.hpp, worker/WorkerClient.hpp, worker/WorkerClient.cpp, client/Driver.hpp, client/Job.hpp.
src/spider/core/Driver.hpp Introduced classes Driver and Scheduler with respective constructors and accessor methods.
src/spider/storage/MetadataStorage.hpp Updated add_driver to accept a Driver object; added add_scheduler and get_active_scheduler methods; modified task_finish method signature.
src/spider/storage/MysqlStorage.cpp Refactored add_driver and add_scheduler methods to accept Driver and Scheduler objects; added get_active_scheduler method; modified task_finish.
src/spider/storage/MysqlStorage.hpp Updated method signatures for add_driver, add_scheduler, and task_finish to reflect the new object-oriented design.
src/spider/worker/WorkerClient.cpp Implemented WorkerClient class with methods for task completion and retrieval of the next task.
src/spider/worker/WorkerClient.hpp Introduced WorkerClient class with methods for task management; deleted copy and move constructors to ensure unique ownership.
tests/scheduler/test-SchedulerPolicy.cpp Updated test cases to add drivers using Driver objects instead of separate parameters.
tests/storage/test-DataStorage.cpp Modified test cases to utilize Driver objects for adding drivers to metadata storage.
tests/storage/test-MetadataStorage.cpp Updated test cases to use Driver and Scheduler objects; added a new test case for task completion handling.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
src/spider/storage/MysqlStorage.cpp (1)

984-1052: Consider refactoring to eliminate code duplication in task_finish

The loops for updating task outputs and inputs share similar structures. Refactoring these into a helper function could reduce duplication and improve maintainability.

Apply this refactoring suggestion:

// Create a helper function to update outputs and inputs
+void update_task_data(std::unique_ptr<sql::PreparedStatement>& statement, const std::vector<TaskOutput>& outputs, const sql::bytes& task_id_bytes) {
+    for (size_t i = 0; i < outputs.size(); ++i) {
+        const TaskOutput& output = outputs[i];
+        const std::optional<std::string>& value = output.get_value();
+        if (value.has_value()) {
+            statement->setString(1, value.value());
+        } else {
+            statement->setNull(1, sql::DataType::VARCHAR);
+        }
+        const std::optional<boost::uuids::uuid>& data_id = output.get_data_id();
+        if (data_id.has_value()) {
+            sql::bytes data_id_bytes = uuid_get_bytes(data_id.value());
+            statement->setBytes(2, &data_id_bytes);
+        } else {
+            statement->setNull(2, sql::DataType::BINARY);
+        }
+        statement->setBytes(3, &task_id_bytes);
+        statement->setUInt(4, i);
+        statement->executeUpdate();
+    }
+}

// Use the helper function for outputs
 update_task_data(output_statement, outputs, task_id_bytes);

// Use the helper function for inputs
 update_task_data(input_statement, outputs, task_id_bytes);
src/spider/core/Driver.hpp (1)

31-34: Optimise Scheduler accessors by returning const references

Returning const references for get_addr() and possibly get_id() improves efficiency by preventing copies.

Apply these diffs:

-    [[nodiscard]] auto get_addr() const -> std::string { return m_addr; }
+    [[nodiscard]] auto get_addr() const -> const std::string& { return m_addr; }

-    [[nodiscard]] auto get_id() const -> boost::uuids::uuid { return m_id; }
+    [[nodiscard]] auto get_id() const -> const boost::uuids::uuid& { return m_id; }
src/spider/storage/MetadataStorage.hpp (1)

51-52: Consider adding documentation for the outputs parameter.

The addition of TaskOutput vector parameter is well-designed, but documenting the expected format and handling of outputs would improve maintainability.

tests/storage/test-MetadataStorage.cpp (1)

228-284: Review the new task finish test case implementation

The test case effectively validates:

  1. Task state transitions
  2. Parent-child task relationships
  3. Output propagation

However, consider adding test coverage for edge cases:

  1. Task finish with invalid outputs
  2. Concurrent task completions
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4e988b and b691e37.

📒 Files selected for processing (10)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/core/Driver.hpp (1 hunks)
  • src/spider/storage/MetadataStorage.hpp (3 hunks)
  • src/spider/storage/MysqlStorage.cpp (5 hunks)
  • src/spider/storage/MysqlStorage.hpp (3 hunks)
  • src/spider/worker/WorkerClient.cpp (1 hunks)
  • src/spider/worker/WorkerClient.hpp (1 hunks)
  • tests/scheduler/test-SchedulerPolicy.cpp (3 hunks)
  • tests/storage/test-DataStorage.cpp (4 hunks)
  • tests/storage/test-MetadataStorage.cpp (5 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/worker/WorkerClient.cpp

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (19)
src/spider/worker/WorkerClient.cpp (2)

26-35: Constructor implementation looks correct and efficient

The member variables are properly initialized using the initializer list, and std::move is used where appropriate.


37-44: task_finish method is well-implemented

The method correctly updates task status and retrieves the next task.

src/spider/storage/MysqlStorage.cpp (2)

Line range hint 295-309: Refactored add_driver method enhances encapsulation

Accepting a Driver object instead of separate parameters improves code clarity and maintainability.


Line range hint 310-328: add_scheduler method updated effectively

The change to accept a Scheduler object aligns with the updated design and promotes better encapsulation.

src/spider/worker/WorkerClient.hpp (1)

17-48: WorkerClient class design is sound and follows best practices

The class correctly manages resources and prevents copying and moving, ensuring unique ownership.

src/spider/storage/MetadataStorage.hpp (2)

9-9: LGTM! Include addition is appropriate.

The addition of Driver.hpp is necessary for the new method signatures and follows the existing include organization pattern.


29-32: Well-structured API changes for driver and scheduler management.

The changes improve encapsulation by using proper object-oriented design:

  • Using Driver object instead of separate parameters reduces coupling
  • New scheduler methods align with the PR's objective to enhance scheduler management
src/spider/CMakeLists.txt (2)

13-13: LGTM! Core header addition is appropriate.

The Driver.hpp header is correctly placed in SPIDER_CORE_HEADERS, maintaining the project's organization.


52-53: Verify WorkerClient implementation completeness.

The addition of WorkerClient files aligns with the PR objectives. Let's verify the implementation includes all necessary components.

✅ Verification successful

WorkerClient implementation is complete and well-structured

The implementation includes all essential components:

  • Proper task lifecycle management through task_finish method
  • Task scheduling via get_next_task method
  • Robust error handling and scheduler communication
  • Appropriate storage interfaces for both data and metadata
  • Thread-safe design with deleted copy/move operations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify WorkerClient implementation completeness

# Check for essential methods in WorkerClient
ast-grep --pattern 'class WorkerClient {
  $$$
  submit_task($_) {
    $$$
  }
  get_next_task($_) {
    $$$
  }
  $$$
}'

Length of output: 123


Script:

#!/bin/bash
# Let's check the actual content of WorkerClient files
echo "=== WorkerClient.hpp ==="
cat src/spider/worker/WorkerClient.hpp

echo -e "\n=== WorkerClient.cpp ==="
cat src/spider/worker/WorkerClient.cpp

Length of output: 4968

src/spider/storage/MysqlStorage.hpp (3)

14-14: LGTM! Include addition is appropriate.

The Driver.hpp include is necessary and follows the existing include organization.


55-56: LGTM! Method signature properly overrides base class.

The task_finish method correctly implements the interface change to handle task outputs.


35-38: Verify MySQL implementation completeness.

The method declarations properly override the base class. Let's verify the implementation handles all cases correctly.

✅ Verification successful

Let me gather more information about the base class to ensure the implementations match the expected behavior.


Let me try a different approach to verify the base class and implementation details.


Implementation appears complete and well-structured

The MySQL implementation properly handles all required operations:

  • add_driver: Correctly inserts driver data with proper error handling for duplicates
  • add_scheduler: Handles both driver and scheduler table insertions in a single transaction
  • get_active_scheduler: Implements proper JOIN query to fetch active schedulers with their details

The implementations include:

  • Proper transaction handling
  • Error handling with appropriate StorageErr types
  • Consistent data type conversions
  • Complete CRUD operations matching the interface requirements
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify MySQL implementation completeness

# Check for complete implementation of new methods
rg -A 5 "MySqlMetadataStorage::add_driver|MySqlMetadataStorage::add_scheduler|MySqlMetadataStorage::get_active_scheduler" --type cpp

Length of output: 1875


Script:

#!/bin/bash
# Check base class method signatures and any relevant documentation
ast-grep --pattern 'class Storage {
  $$$
  virtual auto add_driver(Driver const& driver) -> StorageErr = 0;
  virtual auto add_scheduler(Scheduler const& scheduler) -> StorageErr = 0;
  virtual auto get_active_scheduler(std::vector<Scheduler>* schedulers) -> StorageErr = 0;
  $$$
}'

# Check for any test cases that might reveal expected behavior
rg -l "TEST.*MySqlMetadataStorage" --type cpp

Length of output: 334


Script:

#!/bin/bash
# Check Storage base class definition
rg -A 10 "class Storage" --type hpp

# Check complete implementation of these methods
rg -A 15 "MySqlMetadataStorage::(add_driver|add_scheduler|get_active_scheduler)" --type cpp

# Look for any related test files
fd "mysql.*test" --type f

Length of output: 4404

tests/scheduler/test-SchedulerPolicy.cpp (2)

16-16: LGTM: Core Driver header inclusion

The addition of the Driver header aligns with the encapsulation of driver-related functionality.


93-93: LGTM: Driver object encapsulation

The update to use the Driver class for driver creation improves code organization and type safety.

Also applies to: 138-138

tests/storage/test-DataStorage.cpp (2)

10-10: LGTM: Driver header inclusion

The addition of the Driver header supports the new driver management approach.


28-28: LGTM: Consistent Driver object usage

The update to use the Driver class across all test cases ensures consistency in driver management.

Also applies to: 174-175

tests/storage/test-MetadataStorage.cpp (3)

15-15: LGTM: Driver header inclusion

The addition of the Driver header supports the encapsulation of driver-related functionality.


185-186: Verify the removal of time-based assertions

The commented-out time assertions suggest potential timing issues. Consider either:

  1. Implementing more reliable time-based testing
  2. Using a different approach to validate job creation timing
✅ Verification successful

Let me gather more context about the test file and related implementations.


Time-based assertions were appropriately removed

The commented-out assertions were attempting to verify job creation timestamps with a 1-second delta tolerance, which is inherently flaky. The removal is justified as:

  • These assertions could lead to intermittent test failures
  • The remaining assertions adequately verify the critical aspects (job ID, client ID, and task graph equality)
  • No other time-based assertions are present in the test suite
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for other time-based assertions in test files
rg "time_delta|creation_time" tests/

Length of output: 474


Script:

#!/bin/bash
# Check the full test context and implementation
rg -B 5 -A 5 "job_creation_time" tests/storage/test-MetadataStorage.cpp

# Check for other timing-related tests or assertions
ast-grep --pattern 'REQUIRE($$$time$$$)'

Length of output: 1213


258-270: Verify task state transition handling

The test validates that completing one parent task doesn't prematurely update child task states. Consider adding assertions to verify:

  1. Parent task's final state
  2. Task output persistence

Comment on lines 46 to 97
auto WorkerClient::get_next_task() -> std::optional<boost::uuids::uuid> {
// Get schedulers
std::vector<core::Scheduler> schedulers;
if (!m_metadata_store->get_active_scheduler(&schedulers).success()) {
return std::nullopt;
}
std::random_device random_device;
std::default_random_engine rng{random_device()};
std::ranges::shuffle(schedulers, rng);

std::vector<boost::asio::ip::tcp::endpoint> endpoints;
std::ranges::transform(
schedulers,
std::back_inserter(endpoints),
[](core::Scheduler const& scheduler) {
return boost::asio::ip::tcp::endpoint{
boost::asio::ip::make_address(scheduler.get_addr()),
static_cast<unsigned short>(scheduler.get_port())
};
}
);
try {
// Create socket to scheduler
boost::asio::ip::tcp::socket socket(m_context);
boost::asio::connect(socket, endpoints);

scheduler::ScheduleTaskRequest const request{m_worker_id, m_worker_addr};
msgpack::sbuffer request_buffer;
msgpack::pack(request_buffer, request);

core::send_message(socket, request_buffer);

// Receive response
std::optional<msgpack::sbuffer> const optional_response_buffer
= core::receive_message(socket);
if (!optional_response_buffer.has_value()) {
return std::nullopt;
}
msgpack::sbuffer const& response_buffer = optional_response_buffer.value();

scheduler::ScheduleTaskResponse response;
msgpack::object_handle const response_handle
= msgpack::unpack(response_buffer.data(), response_buffer.size());
response_handle.get().convert(response);

return response.get_task_id();
} catch (boost::system::system_error const& e) {
return std::nullopt;
} catch (std::runtime_error const& e) {
return std::nullopt;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Empty scheduler list is properly handled, but connection attempt could be optimized

The code already handles the case of no active schedulers through two mechanisms:

  • Early return with std::nullopt if get_active_scheduler fails
  • Exception handling that returns std::nullopt if the connection attempt fails

However, it would be more efficient to add an explicit check before the connection attempt:

if (endpoints.empty()) {
    return std::nullopt;
}
🔗 Analysis chain

Ensure robust handling when no schedulers are available

If there are no active schedulers, the method will attempt to connect with an empty endpoints list. Confirm that this behaviour is acceptable and consider handling this case explicitly.

Run the following script to check for potential issues when no schedulers are available:


Let me gather more information about the handling of empty lists in boost::asio::connect.


Let me check for tests in the correct directory structure and look for any error handling patterns.


Let me check the test files specifically for WorkerClient tests and empty scheduler scenarios.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify behaviour when no schedulers are returned.

# Test: Check the handling of empty schedulers list in `get_next_task`
rg -A 5 'get_active_scheduler' src/spider/worker/WorkerClient.cpp

Length of output: 308


Script:

#!/bin/bash
# Check if there are any tests covering this scenario
rg -l "get_next_task.*empty" test/
rg -l "boost::asio::connect.*empty" test/

# Look for connect documentation or similar usage
ast-grep --pattern 'boost::asio::connect($_, $_)'

Length of output: 224


Script:

#!/bin/bash
# Find test directories
fd -t d "test"

# Look for any boost::asio connect usage patterns
rg -A 5 "boost::asio::connect" 

# Check for any error handling patterns with empty containers
rg "endpoints\.empty\(\)"

Length of output: 2310


Script:

#!/bin/bash
# Look for WorkerClient tests
fd "test.*WorkerClient" tests/

# Check test content if found
rg -A 10 "get_next_task" tests/worker/

# Look at metadata store interface
rg -A 5 "get_active_scheduler" src/

Length of output: 2407

🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

Comment on lines 17 to 18
[[nodiscard]] auto get_addr() const -> std::string { return m_addr; }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Optimise get_addr() by returning a const reference

Returning a const std::string& avoids unnecessary copying of the address string.

Apply this diff:

-    [[nodiscard]] auto get_addr() const -> std::string { return m_addr; }
+    [[nodiscard]] auto get_addr() const -> const std::string& { return m_addr; }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[[nodiscard]] auto get_addr() const -> std::string { return m_addr; }
[[nodiscard]] auto get_addr() const -> const std::string& { return m_addr; }

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
src/spider/core/Driver.hpp (2)

11-13: Add class documentation and improve constructor parameter semantics.

Consider adding documentation to explain the class's purpose and responsibility. Also, make the move semantics more explicit in the constructor.

+/**
+ * Represents a worker node in the Spider system.
+ * Encapsulates the identity and network location of a worker.
+ */
 class Driver {
 public:
-    Driver(boost::uuids::uuid const id, std::string addr) : m_id{id}, m_addr{std::move(addr)} {}
+    /**
+     * Constructs a Driver instance.
+     * @param id Unique identifier for the driver
+     * @param addr Network address (ownership transferred)
+     */
+    explicit Driver(boost::uuids::uuid const id, std::string&& addr) : m_id{id}, m_addr{std::move(addr)} {}

11-41: Consider extracting a common interface for network entities.

Both Driver and Scheduler share common network entity characteristics (ID and address). Consider introducing an abstract base class or interface to represent this common behaviour, which could benefit future extensions and maintain consistency.

Example interface:

class NetworkEntity {
public:
    virtual ~NetworkEntity() = default;
    [[nodiscard]] virtual auto get_id() const -> boost::uuids::uuid const& = 0;
    [[nodiscard]] virtual auto get_addr() const -> std::string const& = 0;
};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b691e37 and 97d46a2.

📒 Files selected for processing (2)
  • src/spider/core/Driver.hpp (1 hunks)
  • src/spider/worker/WorkerClient.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/worker/WorkerClient.cpp

[performance] 88-88: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (1)
src/spider/core/Driver.hpp (1)

1-9: LGTM! Header structure and includes are well-organized.

The file structure follows best practices with proper header guards and minimal necessary includes.


class Driver {
public:
Driver(boost::uuids::uuid const id, std::string addr) : m_id{id}, m_addr{std::move(addr)} {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add address validation in constructor.

Consider validating that the address isn't empty to ensure valid driver instances.

-    Driver(boost::uuids::uuid const id, std::string addr) : m_id{id}, m_addr{std::move(addr)} {}
+    Driver(boost::uuids::uuid const id, std::string addr) {
+        if (addr.empty()) {
+            throw std::invalid_argument("Driver address cannot be empty");
+        }
+        m_id = id;
+        m_addr = std::move(addr);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Driver(boost::uuids::uuid const id, std::string addr) : m_id{id}, m_addr{std::move(addr)} {}
Driver(boost::uuids::uuid const id, std::string addr) {
if (addr.empty()) {
throw std::invalid_argument("Driver address cannot be empty");
}
m_id = id;
m_addr = std::move(addr);
}

Comment on lines +24 to +30
class Scheduler {
public:
Scheduler(boost::uuids::uuid const id, std::string addr, int port)
: m_id{id},
m_addr{std::move(addr)},
m_port{port} {}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add documentation and parameter validation for Scheduler class.

The Scheduler class needs documentation and parameter validation.

+/**
+ * Represents a scheduler node in the Spider system.
+ * Manages task distribution and scheduling operations.
+ */
 class Scheduler {
 public:
+    /**
+     * Constructs a Scheduler instance.
+     * @param id Unique identifier for the scheduler
+     * @param addr Network address (ownership transferred)
+     * @param port Network port number (must be between 1 and 65535)
+     * @throws std::invalid_argument if port is invalid or address is empty
+     */
-    Scheduler(boost::uuids::uuid const id, std::string addr, int port)
-            : m_id{id},
-              m_addr{std::move(addr)},
-              m_port{port} {}
+    Scheduler(boost::uuids::uuid const id, std::string&& addr, int port) {
+        if (addr.empty()) {
+            throw std::invalid_argument("Scheduler address cannot be empty");
+        }
+        if (port <= 0 || port > 65535) {
+            throw std::invalid_argument("Invalid port number");
+        }
+        m_id = id;
+        m_addr = std::move(addr);
+        m_port = port;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class Scheduler {
public:
Scheduler(boost::uuids::uuid const id, std::string addr, int port)
: m_id{id},
m_addr{std::move(addr)},
m_port{port} {}
/**
* Represents a scheduler node in the Spider system.
* Manages task distribution and scheduling operations.
*/
class Scheduler {
public:
/**
* Constructs a Scheduler instance.
* @param id Unique identifier for the scheduler
* @param addr Network address (ownership transferred)
* @param port Network port number (must be between 1 and 65535)
* @throws std::invalid_argument if port is invalid or address is empty
*/
Scheduler(boost::uuids::uuid const id, std::string&& addr, int port) {
if (addr.empty()) {
throw std::invalid_argument("Scheduler address cannot be empty");
}
if (port <= 0 || port > 65535) {
throw std::invalid_argument("Invalid port number");
}
m_id = id;
m_addr = std::move(addr);
m_port = port;
}

Comment on lines +26 to +35
WorkerClient::WorkerClient(
boost::uuids::uuid const worker_id,
std::string worker_addr,
std::shared_ptr<core::DataStorage> data_store,
std::shared_ptr<core::MetadataStorage> metadata_store
)
: m_worker_id{worker_id},
m_worker_addr{std::move(worker_addr)},
m_data_store(std::move(data_store)),
m_metadata_store(std::move(metadata_store)) {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Declare and initialize m_context member variable

The member variable m_context is used in the get_next_task method but is neither declared nor initialized in the WorkerClient class. This omission will lead to compilation errors due to the undefined reference to m_context. Please declare m_context as a member variable of type boost::asio::io_context.

Add the following code to WorkerClient.hpp inside the WorkerClient class:

private:
    boost::asio::io_context m_context;

Comment on lines +64 to +70
[](core::Scheduler const& scheduler) {
return boost::asio::ip::tcp::endpoint{
boost::asio::ip::make_address(scheduler.get_addr()),
static_cast<unsigned short>(scheduler.get_port())
};
}
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate port numbers to prevent potential truncation

Casting scheduler.get_port() to unsigned short without validation may result in truncation or unexpected behaviour if the port number exceeds the valid range of 0 to 65,535. To ensure robust network communication, validate that the port number is within the acceptable range before casting.

Apply this diff to add port number validation:

return boost::asio::ip::tcp::endpoint{
    boost::asio::ip::make_address(scheduler.get_addr()),
-   static_cast<unsigned short>(scheduler.get_port())
+   static_cast<unsigned short>(
+       (scheduler.get_port() >= 0 && scheduler.get_port() <= 65535)
+           ? scheduler.get_port()
+           : throw std::out_of_range("Port number out of valid range")
+   )
};
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[](core::Scheduler const& scheduler) {
return boost::asio::ip::tcp::endpoint{
boost::asio::ip::make_address(scheduler.get_addr()),
static_cast<unsigned short>(scheduler.get_port())
};
}
);
[](core::Scheduler const& scheduler) {
return boost::asio::ip::tcp::endpoint{
boost::asio::ip::make_address(scheduler.get_addr()),
static_cast<unsigned short>(
(scheduler.get_port() >= 0 && scheduler.get_port() <= 65535)
? scheduler.get_port()
: throw std::out_of_range("Port number out of valid range")
)
};
}
);

@sitaowang1998 sitaowang1998 merged commit 38297fc into y-scope:main Dec 12, 2024
4 checks passed
@sitaowang1998 sitaowang1998 deleted the worker-client branch December 12, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant