feat: Add MySql support for storage backend #20

sitaowang1998 · 2024-11-05T00:39:24Z

Description

Add MySql implementation for data and metadata storage.

Validation performed

task:lint

Summary by CodeRabbit

New Features
- Enhanced task and data management functionalities in the database layer.
- Added methods for adding drivers and tasks, managing task graphs, and improved error handling.
- Introduced new methods for task management in TaskGraph and DataStorage.
Bug Fixes
- Improved connection error handling in database operations.
Refactor
- Updated method signatures for better parameter handling, transitioning to pointer-based management for certain methods.
Documentation
- Updated comments and method descriptions for clarity and maintainability.

…ctory explicitly

This reverts commit 4b52fb4.

coderabbitai · 2024-11-05T00:39:32Z

Warning

Rate limit exceeded

@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 3 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6c68a95 and 8b84f98.

Walkthrough

This pull request includes several changes across multiple files. The GitHub Actions workflow for code linting has been updated to only support the "ubuntu-latest" operating system. The FindMariaDBClient.cmake file has been deleted, while FindMariaDBClientCpp.cmake has been modified to enhance include directory handling. The CMakeLists.txt for the Spider project has adjusted library visibility and added dependencies. Multiple classes in the Spider core have been updated with new constructors, methods, and member variables to improve functionality and error handling, particularly in task and data management.

Changes

File Path	Change Summary
`.github/workflows/code-linting-checks.yaml`	Updated to only support "ubuntu-latest"; removed macOS-specific steps.
`cmake/Modules/FindMariaDBClient.cmake`	Deleted file that located the MariaDB client library.
`cmake/Modules/FindMariaDBClientCpp.cmake`	Enhanced include directory handling by adding new paths for headers.
`src/spider/CMakeLists.txt`	Adjusted library visibility for `spider_core` and added `fmt::fmt` as a private dependency; created an alias for `spider_worker`.
`src/spider/core/Data.hpp`	Added a constructor and new member variables for locality management; introduced new getter and setter methods.
`src/spider/core/Error.hpp`	Added `OtherErr` to `StorageErrType` enum; included a boolean conversion operator for `StorageErr`.
`src/spider/core/Task.hpp`	Introduced new setter methods in `TaskInput` and `TaskOutput`; modified `TaskInstance` to a struct with new member variables; added a new constructor in `Task`.
`src/spider/core/TaskGraph.hpp`	Added constructors and new methods for task management, including adding tasks and dependencies; marked `get_id` with `[[nodiscard]]`.
`src/spider/storage/DataStorage.hpp`	Updated method signatures to simplify parameters, removing UUID from `connect` and changing `get_data` to use a pointer.
`src/spider/storage/MetadataStorage.hpp`	Simplified `connect` method; added new methods for driver management and updated several method signatures to use pointers instead of references.
`src/spider/storage/MysqlStorage.cpp`	Introduced a new enumeration for MariaDB errors; added methods for managing tasks and drivers; enhanced error handling in database operations.
`src/spider/storage/MysqlStorage.hpp`	Removed copy/move constructors; updated method signatures to simplify parameters and introduced new methods for driver management.

Possibly related PRs

build: Add CMake config and libraries #13: Modifications to the CMake configuration, including the addition of the MariaDBClient library, relevant to the updates on the CMake build system and library management.
feat: Add storage interface #14: Introduction of a storage interface that includes methods for handling data storage, which may relate to the overall project structure and dependencies managed in the main PR.
build: Split core into spearate library #15: Separation of core components into a distinct library, enhancing the project's modularity, aligning with the main PR's focus on improving the CMake configuration and build process.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 18

🧹 Outside diff range and nitpick comments (15)

src/spider/core/Error.hpp (1)

15-16: Consider replacing OtherErr with specific error types

While ConstraintViolationErr is well-defined, using a catch-all OtherErr type might obscure specific error conditions. Consider identifying and adding explicit error types for known MySQL-specific errors that could occur.
.github/workflows/code-linting-checks.yaml (1)
Line range hint 39-41: Remove redundant macOS-specific step.

Since macOS has been removed from the matrix configuration, this conditional step for installing coreutils on macOS should be removed as it will never be executed.

Apply this diff to remove the redundant step:
-      - if: "matrix.os == 'macos-latest'"
-        name: "Install coreutils (for md5sum)"
-        run: "brew install coreutils"
src/spider/core/Data.hpp (1)

35-35: Consider a more descriptive name for the boolean flag

While the implementation is correct, the name is_hard_locality could be more descriptive about its purpose in the context of MySQL storage.

Consider renaming to something more specific like is_locality_strict or has_strict_locality_constraints.
src/spider/storage/MetadataStorage.hpp (2)
21-27: Consider consolidating driver addition methods

The two add_driver methods could be simplified by providing a default port value in the three-parameter version, eliminating the need for two separate methods.
-virtual auto add_driver(boost::uuids::uuid id, std::string const& addr) -> StorageErr = 0;
-virtual auto add_driver(boost::uuids::uuid id, std::string const& addr, int port) -> StorageErr = 0;
+virtual auto add_driver(boost::uuids::uuid id, std::string const& addr, int port = 3306) -> StorageErr = 0;
45-49: Document scheduler state management

The new scheduler state management methods introduce important orchestration capabilities. Please add documentation comments to describe:

Expected state format/values

State persistence guarantees

Error handling expectations
+    /**
+     * Retrieves the current state of a scheduler
+     * @param id The UUID of the scheduler
+     * @param state Output parameter for the scheduler state
+     * @return StorageErr indicating success or specific error condition
+     */
     virtual auto get_scheduler_state(boost::uuids::uuid id, std::string* state) -> StorageErr = 0;

+    /**
+     * Updates the state of a scheduler
+     * @param id The UUID of the scheduler
+     * @param state The new state to set
+     * @return StorageErr indicating success or specific error condition
+     */
     virtual auto set_scheduler_state(boost::uuids::uuid id, std::string const& state) -> StorageErr = 0;
src/spider/storage/MysqlStorage.hpp (3)
22-46: Standardize class naming convention

The class name MySqlMetadataStorage uses inconsistent capitalization (MySql). Consider renaming to MySQLMetadataStorage to match common MySQL naming conventions.

Also, the pointer-based interface changes look good and provide better null handling capabilities.

Line range hint 55-72: Align MysqlDataStorage implementation with metadata storage patterns

Class naming should be consistent with MySQL conventions (consider MySQLDataStorage)

Connection member should use smart pointer:
-    sql::Connection* m_conn = nullptr;
+    std::unique_ptr<sql::Connection> m_conn{nullptr};
The pointer-based interface changes are well-designed and consistent with the metadata storage implementation.

Thread safety concerns are valid and need to be addressed

The codebase analysis reveals that the MySQL storage classes maintain connection state (m_conn) that's shared across multiple methods, but lack any synchronization mechanisms. The implementation shows:

Direct member access to m_conn across multiple methods

Connection state modifications (connect/close operations)

No mutex or other synchronization primitives

SQL operations that could be interleaved if called concurrently

Key areas requiring synchronization:

Connection management (connect/close methods)

All database operations using m_conn

State-changing operations (add/remove/update methods)

Recommendation: Add a mutex member to protect all operations that access or modify the connection state and database operations.

🔗 Analysis chain

Line range hint 22-72: Verify thread safety requirements

The storage classes handle shared state through database connections. Please verify if concurrent access is a requirement and consider adding appropriate synchronization if needed.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for concurrent access patterns in the codebase
rg -l "std::thread|std::async|std::future" | while read -r file; do
  echo "=== $file ==="
  rg -A 3 "MysqlStorage|MySqlMetadataStorage" "$file"
done
Length of output: 59

Script:
#!/bin/bash
# Let's check for any threading or async patterns in header files
rg -t hpp "thread|async|future|mutex|concurrent|parallel"

# Also check if there are any comments or documentation about threading
rg -g "*.{hpp,cpp,md}" -i "thread.*safe|concurrent.*access"

# Check if the storage is used in any scheduler or worker contexts
rg -A 3 "class.*Storage.*\{" 
Length of output: 1529

Script:
#!/bin/bash
# Let's check how these storage classes are being used
rg -g "*.{cpp,hpp}" -B 3 -A 3 "MySqlMetadataStorage|MysqlDataStorage"

# Check for any scheduler or worker implementations
rg -g "*.{cpp,hpp}" "class.*(Scheduler|Worker|Engine)"

# Look for any initialization or usage patterns
rg -g "*.{cpp,hpp}" "new MySqlMetadataStorage|new MysqlDataStorage|make_.*Storage"
Length of output: 17369
src/spider/core/TaskGraph.hpp (3)
18-23: Consider optimizing UUID generator creation.

The random generator is created each time a TaskGraph is constructed. Consider making it static to avoid repeated initialization.
-    TaskGraph() {
-        boost::uuids::random_generator gen;
-        m_id = gen();
-    }
+    TaskGraph() {
+        static boost::uuids::random_generator gen;
+        m_id = gen();
+    }
43-51: Enhance method documentation.

While the method is well-implemented, the comment could be more descriptive about user responsibilities regarding dependency management.
-    // User is responsible to add the dependencies
+    /**
+     * Adds a task to the graph without any dependencies.
+     * Note: Users must explicitly add dependencies using add_dependencies() if needed.
+     * @param task The task to add
+     * @return true if task was added, false if task ID already exists
+     */
92-101: Consider caching head tasks for better performance.

The current implementation recalculates head tasks on every call. For large graphs with frequent access, consider maintaining a cached set of head tasks that's updated when dependencies change.

This would require:

Adding a member variable for cached head tasks

Updating it in add_task and add_dependencies methods

Adding a flag to track if cache is dirty

Would you like me to provide a detailed implementation?
cmake/Modules/FindMariaDBClientCpp.cmake (2)
108-109: Consider validating the existence of include subdirectories

The addition of these include directories is necessary for the MariaDB C++ connector headers. However, to make the module more robust, consider validating that these subdirectories exist before adding them to the target properties.

Here's a suggested improvement:
     set_target_properties(
         MariaDBClientCpp::MariaDBClientCpp
         PROPERTIES
             INTERFACE_INCLUDE_DIRECTORIES
                 "${MariaDBClientCpp_INCLUDE_DIR}"
-                "${MariaDBClientCpp_INCLUDE_DIR}/conncpp"
-                    "${MariaDBClientCpp_INCLUDE_DIR}/conncpp/compat"
+                $<$<BOOL:$<AND:
+                    $<BOOL:EXISTS:${MariaDBClientCpp_INCLUDE_DIR}/conncpp>,
+                    $<BOOL:EXISTS:${MariaDBClientCpp_INCLUDE_DIR}/conncpp/compat>>>:
+                    ${MariaDBClientCpp_INCLUDE_DIR}/conncpp;${MariaDBClientCpp_INCLUDE_DIR}/conncpp/compat>
     )
108-109: Fix indentation inconsistency

The indentation of the include paths is inconsistent. The second path has an extra level of indentation.

Apply this formatting fix:
     set_target_properties(
         MariaDBClientCpp::MariaDBClientCpp
         PROPERTIES
             INTERFACE_INCLUDE_DIRECTORIES
                 "${MariaDBClientCpp_INCLUDE_DIR}"
                 "${MariaDBClientCpp_INCLUDE_DIR}/conncpp"
-                    "${MariaDBClientCpp_INCLUDE_DIR}/conncpp/compat"
+                "${MariaDBClientCpp_INCLUDE_DIR}/conncpp/compat"
     )
src/spider/core/Task.hpp (2)

40-42: Consider adding validation and thread safety mechanisms

The new setter methods could benefit from:

Input validation to ensure data consistency

Thread safety mechanisms if the TaskInput objects are shared between threads

Documentation explaining the thread safety guarantees

Line range hint 53-74: Consider extracting common functionality

The setter methods and member variables in TaskInput and TaskOutput are nearly identical. Consider:

Creating a common base class or trait

Using templates to reduce code duplication

Documenting why these are separate classes if the functionality must remain duplicated

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8491fb3 and a1986e3.

📒 Files selected for processing (12)

.github/workflows/code-linting-checks.yaml (1 hunks)
cmake/Modules/FindMariaDBClient.cmake (0 hunks)
cmake/Modules/FindMariaDBClientCpp.cmake (1 hunks)
src/spider/CMakeLists.txt (1 hunks)
src/spider/core/Data.hpp (2 hunks)
src/spider/core/Error.hpp (2 hunks)
src/spider/core/Task.hpp (4 hunks)
src/spider/core/TaskGraph.hpp (4 hunks)
src/spider/storage/DataStorage.hpp (1 hunks)
src/spider/storage/MetadataStorage.hpp (1 hunks)
src/spider/storage/MysqlStorage.cpp (1 hunks)
src/spider/storage/MysqlStorage.hpp (3 hunks)

💤 Files with no reviewable changes (1)

cmake/Modules/FindMariaDBClient.cmake

🧰 Additional context used

🪛 cppcheck

src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 20-20: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (13)

src/spider/core/Error.hpp (1)

29-29: LGTM! Well-implemented boolean conversion operator

The explicit bool operator is a good addition that enables convenient error checking while preventing implicit conversions. The implementation correctly evaluates the error state.

src/spider/CMakeLists.txt (2)

28-28: LGTM: Appropriate visibility for fmt dependency

The fmt library is correctly set as PRIVATE since it's likely used only for implementation details and not exposed through public headers.

23-26: Review library visibility changes carefully

The change from PRIVATE to PUBLIC visibility for these dependencies means that any project linking against spider_core will also need to link against these libraries. This has several implications:

Boost, abseil, and MariaDBClientCpp become part of spider_core's public API
Version conflicts could arise in consumer projects
ABI compatibility becomes more complex to maintain

Consider keeping these as PRIVATE unless they are explicitly needed in public headers.

Let's verify if these dependencies are used in public headers:

✅ Verification successful

PUBLIC visibility is justified for these dependencies

The verification shows that these libraries are indeed used in public header files:

Boost is used in multiple public headers including TaskGraph.hpp, Task.hpp, Data.hpp, and storage headers
abseil (absl) is used in TaskGraph.hpp
MariaDB/MySQL is used in MysqlStorage.hpp

Since these dependencies are part of the public interface through header files, marking them as PUBLIC in CMakeLists.txt is the correct approach. This ensures that projects consuming the spider_core library will have access to the required dependencies.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if these dependencies are used in public headers
# Expected: Headers that include these dependencies

# Search for Boost includes
echo "Checking Boost usage in headers:"
rg -l "include.*boost" --type hpp

# Search for abseil includes
echo "Checking abseil usage in headers:"
rg -l "include.*absl" --type hpp

# Search for MariaDB includes
echo "Checking MariaDB usage in headers:"
rg -l "include.*mariadb" --type hpp

Length of output: 423

Script:

#!/bin/bash
# Description: Check if these dependencies are used in public headers
# Expected: Headers that include these dependencies

# First, find header files
echo "Finding header files:"
fd -e h -e hpp

# Then search for includes in those files
echo -e "\nChecking Boost usage in headers:"
rg -l "include.*boost"

echo -e "\nChecking abseil usage in headers:"
rg -l "include.*absl"

echo -e "\nChecking MariaDB usage in headers:"
rg -l "include.*mariadb"

# Also check for public headers specifically
echo -e "\nListing potential public headers:"
fd -e h -e hpp . src/spider/include/

Length of output: 1203

src/spider/storage/DataStorage.hpp (2)

24-24: Consider implications of using pointer parameter in get_data()

While changing from reference to pointer parameter allows for null checks, it introduces potential null pointer risks. Please ensure:

All implementations handle null pointer cases
Documentation clearly states if null is a valid input
Callers are updated to handle potential null cases

#!/bin/bash
# Search for get_data implementations to verify null handling
ast-grep --pattern 'get_data(boost::uuids::uuid $_, Data* $_) {
  $$$
}'

Consider using std::optional instead of a raw pointer to make the nullable semantics more explicit and safer.

19-19: Verify impact of removing UUID from connect() method

The simplified signature improves the interface, but we should ensure this change doesn't break existing connection tracking functionality.

✅ Verification successful

Let me search for more specific patterns to understand the impact of this change.

Removing UUID from connect() method is safe and improves the interface

The verification shows that the simplified connect() method signature is already consistently implemented across the codebase:

MysqlDataStorage and MySqlMetadataStorage implementations use only the URL parameter
No existing code relies on UUID for connection tracking
Connection errors are properly handled through StorageErr return type

The change aligns with the current implementation and usage patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing connect() method calls to verify impact
rg -A 2 "->connect\(" --type cpp

Length of output: 74

Script:

#!/bin/bash
# Search for connect method implementations and usages with broader patterns
rg "connect\(" --type cpp -A 3

# Search for storage class implementations
fd -e hpp -e cpp -e h . | xargs rg "class.*Storage.*:" -A 3

# Search for interface implementations
ast-grep --pattern 'class $_ : public DataStorage {
  $$$
}'

Length of output: 3681

src/spider/core/Data.hpp (2)

20-23: Consider adding input validation to the new constructor

The constructor accepts external IDs and strings without validation. This could lead to inconsistent state if invalid UUIDs or empty strings are provided.

Consider adding validation:

 Data(boost::uuids::uuid id, std::string key, std::string value)
-            : m_id(id),
-              m_key(std::move(key)),
-              m_value(std::move(value)) {}
+            : m_id(id),
+              m_key(std::move(key)),
+              m_value(std::move(value)) {
+    if (boost::uuids::uuid{} == id) {
+        throw std::invalid_argument("UUID cannot be null");
+    }
+    if (value.empty()) {
+        throw std::invalid_argument("Value cannot be empty");
+    }
+}

31-33: LGTM! Well-implemented getter method

The implementation follows best practices with const reference return type and [[nodiscard]] attribute.

src/spider/storage/MetadataStorage.hpp (1)

34-41: Verify TaskInstance compatibility

The change from Task to TaskInstance in get_task_timeout suggests a more detailed tracking mechanism. Ensure all implementations properly handle the additional instance-specific data.

#!/bin/bash
# Search for TaskInstance usage patterns
rg -A 5 "TaskInstance.*get_task_timeout"

src/spider/storage/MysqlStorage.hpp (1)

5-7: LGTM: Appropriate use of MariaDB C++ connector

The choice of MariaDB's C++ connector (conncpp) is well-suited for this C++ codebase, providing type-safe database operations.

src/spider/core/TaskGraph.hpp (4)

5-5: LGTM! Required includes are properly added.

The new includes support the added functionality for UUID generation and hash set operations.

Also applies to: 7-7

26-26: LGTM! Good optimization.

Storing the task ID locally avoids multiple getter calls and improves readability.

Also applies to: 32-32

57-57: LGTM! Proper use of [[nodiscard]].

The method correctly prevents accidental ignoring of the return value.

109-109: LGTM! Proper member variable declaration.

The m_id member is correctly declared as private and follows the class's naming convention.

coderabbitai · 2024-11-05T00:53:01Z

.github/workflows/code-linting-checks.yaml

@@ -20,7 +20,7 @@ jobs:
  lint:
    strategy:
      matrix:
-        os: ["macos-latest", "ubuntu-latest"]
+        os: ["ubuntu-latest"]


💡 Codebase verification

Dropping macOS CI checks requires attention to development dependencies

The codebase shows active macOS support requirements:

Library installation tasks specifically for macOS in dep-tasks.yaml

Platform-specific build configurations in CMake files

Dedicated macOS scripts directory at tools/scripts/macOS

Removing macOS from the CI matrix while maintaining macOS-specific development dependencies could lead to undetected issues for macOS developers. Consider either:

Restoring macOS CI checks to ensure cross-platform compatibility

Or documenting the decision to not verify macOS builds if that's intentional

🔗 Analysis chain

Verify if dropping macOS CI checks aligns with development needs.

The simplification to use only Ubuntu for CI is reasonable for reducing resource usage. However, please confirm that this change won't impact developers working on macOS systems.

Let's verify if there are any macOS-specific development requirements:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Description: Check for macOS-specific code or configurations # Test: Search for macOS-specific patterns in the codebase rg -i "darwin|macos|apple|osx" --type yaml --type cmake

Length of output: 373

Script:

#!/bin/bash # Check for any platform-specific build or development setup cat dep-tasks.yaml

Length of output: 436

Script:

#!/bin/bash # Check if there are any active development branches or recent commits mentioning macOS git log --grep="macos\|darwin\|osx" --since="6 months ago" --pretty=format:"%h %s"

Length of output: 84

Script:

#!/bin/bash # Look for any platform-specific development documentation fd -e md -e txt | xargs rg -i "macos|darwin|development.*environment|prerequisites"

Length of output: 161

coderabbitai · 2024-11-05T00:53:01Z

src/spider/core/Data.hpp

+    void set_locality(std::vector<std::string>& locality) { m_locality = locality; }
+
+    void set_hard_locality(bool hard) { m_hard_locality = hard; }


🛠️ Refactor suggestion

Improve setter methods for better safety and clarity

The current implementation has several potential improvements:

set_locality takes a non-const reference which is unusual for setters

No validation of the locality vector

Missing documentation about the purpose of these settings in the MySQL storage context

Consider this improvement:

- void set_locality(std::vector<std::string>& locality) { m_locality = locality; } + void set_locality(std::vector<std::string> locality) { m_locality = std::move(locality); }

Also, please add documentation comments explaining:

What locality means in this context

When to use hard locality vs soft locality

Any constraints on the locality vector contents

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-05T00:53:01Z

src/spider/core/Data.hpp

+    std::vector<std::string> m_locality;
+    bool m_hard_locality = false;


💡 Codebase verification

Add documentation for locality-related member variables

These members are used in MySQL storage implementation for managing data locality:

m_locality: Stores a list of locality identifiers retrieved from MySQL storage

m_hard_locality: Controls locality behaviour (likely for enforcement settings)

Please add documentation above these member variables to explain:

The purpose and format of locality identifiers stored in the vector

The specific impact of the hard locality flag on data storage/retrieval

Any constraints or assumptions about these values

🔗 Analysis chain

Add documentation for new member variables

The new member variables need documentation explaining their role in MySQL storage implementation.

Let's verify how these members are used:

Let me gather more context about how these members are used.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for usage of locality in the codebase rg -l "m_locality" --type cpp rg -l "set_locality" --type cpp

Length of output: 147

Script:

#!/bin/bash # Check the implementation details rg "m_locality" -A 5 -B 5 --type cpp rg "set_locality" -A 5 -B 5 --type cpp

Length of output: 2797

coderabbitai · 2024-11-05T00:53:01Z

src/spider/storage/MetadataStorage.hpp

+    virtual auto get_task_graph(boost::uuids::uuid id, TaskGraph* task_graph) -> StorageErr = 0;
+    virtual auto get_task_graphs(std::vector<boost::uuids::uuid>* task_graphs) -> StorageErr = 0;


💡 Codebase verification

Null pointer checks needed in MySqlMetadataStorage implementation

The implementation of get_task_graph and get_task_graphs in MySqlMetadataStorage does not validate the pointer parameters before use. This could lead to undefined behaviour if null pointers are passed. Add null checks at the start of these methods:

src/spider/storage/MysqlStorage.cpp: Add null check for task_graph parameter in get_task_graph

src/spider/storage/MysqlStorage.cpp: Add null check for task_graphs parameter in get_task_graphs

Example pattern to follow:

if (task_graph == nullptr) { return StorageErr::InvalidArgument; }

🔗 Analysis chain

Verify null pointer handling in implementations

The change from references to pointers for output parameters is good for explicit ownership, but ensure all implementations include proper null checks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for implementations to verify null checks ast-grep --pattern 'if ($pointer == nullptr) { $$$ }'

Length of output: 195

Script:

#!/bin/bash # Search for concrete implementations of MetadataStorage rg -l "class.*:.*public.*MetadataStorage" # Then search for get_task_graph implementations ast-grep --pattern 'auto get_task_graph(boost::uuids::uuid id, TaskGraph* task_graph)' # And get_task_graphs implementations ast-grep --pattern 'auto get_task_graphs(std::vector<boost::uuids::uuid>* task_graphs)'

Length of output: 258

Script:

#!/bin/bash # Check implementation in MysqlStorage rg -A 10 "get_task_graph.*TaskGraph\*" src/spider/storage/MysqlStorage.hpp rg -A 10 "get_task_graph.*TaskGraph\*" src/spider/storage/MysqlStorage.cpp rg -A 10 "get_task_graphs.*vector.*uuid" src/spider/storage/MysqlStorage.hpp rg -A 10 "get_task_graphs.*vector.*uuid" src/spider/storage/MysqlStorage.cpp

Length of output: 3205

coderabbitai · 2024-11-05T00:53:01Z

src/spider/storage/MysqlStorage.hpp

+    set_scheduler_state(boost::uuids::uuid id, std::string const& state) -> StorageErr override;
+
+private:
+    sql::Connection* m_conn = nullptr;


🛠️ Refactor suggestion

Consider using smart pointer for database connection

The raw pointer m_conn could lead to memory leaks. Consider using std::unique_ptr for automatic resource management.

- sql::Connection* m_conn = nullptr; + std::unique_ptr<sql::Connection> m_conn{nullptr};

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-05T00:53:02Z

src/spider/storage/MysqlStorage.cpp

+                m_conn->prepareStatement("UPDATE divers SET (heartbeat) VALUES "
+                                         "(CURRENT_TIMESTAMP()) WHERE id = ?")


⚠️ Potential issue

Typo in Table Name and Incorrect SQL UPDATE Syntax

The SQL statement references divers, but the correct table name is drivers. Additionally, the UPDATE syntax is incorrect.

Please correct the table name and fix the SQL syntax:

-m_conn->prepareStatement("UPDATE divers SET (heartbeat) VALUES (CURRENT_TIMESTAMP()) WHERE id = ?") +m_conn->prepareStatement("UPDATE drivers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

m_conn->prepareStatement("UPDATE divers SET (heartbeat) VALUES "

"(CURRENT_TIMESTAMP()) WHERE id = ?")

m_conn->prepareStatement("UPDATE drivers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?")

coderabbitai · 2024-11-05T00:53:02Z

src/spider/storage/MysqlStorage.cpp

+                m_conn->prepareStatement("SELECT id, key, value, hard_locality "
+                                         "FROM schedulers WHERE id = ?")
+        );
+        sql::bytes id_bytes = uuid_get_bytes(id);


⚠️ Potential issue

Incorrect Table Name in 'get_data' Method

The SELECT query is retrieving data from the schedulers table instead of the data table.

Please correct the table name:

-m_conn->prepareStatement("SELECT id, key, value, hard_locality FROM schedulers WHERE id = ?") +m_conn->prepareStatement("SELECT id, key, value, hard_locality FROM data WHERE id = ?")

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-05T00:53:03Z

src/spider/storage/MysqlStorage.cpp

+                m_conn->prepareStatement("INSERT INTO data_ref_driver (id, "
+                                         "task_id) VALUES(?, ?)")


⚠️ Potential issue

Mismatch in Column Name for 'data_ref_driver' Table

The INSERT statement uses task_id instead of driver_id for the data_ref_driver table.

Please correct the column name:

-m_conn->prepareStatement("INSERT INTO data_ref_driver (id, task_id) VALUES(?, ?)") +m_conn->prepareStatement("INSERT INTO data_ref_driver (id, driver_id) VALUES(?, ?)")

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-05T00:53:03Z

src/spider/storage/MysqlStorage.cpp

+    id BINARY(16) NOT NULL,
+    task_id BINARY(16) NOT NULL,
+    start_time TIMESTAMP NOT NULL,
+    CONSTRAINT instance_task_id FOREIGN KEY (task_id) REFERENCES task (id) ON UPDATE NO ACTION ON DELETE CASCADE,


⚠️ Potential issue

Incorrect Foreign Key Reference to Non-Existent Table 'task'

The foreign key constraint references a non-existent table task. The correct table name should be tasks.

Please update the foreign key reference:

-CONSTRAINT instance_task_id FOREIGN KEY (task_id) REFERENCES task (id) ON UPDATE NO ACTION ON DELETE CASCADE, +CONSTRAINT instance_task_id FOREIGN KEY (task_id) REFERENCES tasks (id) ON UPDATE NO ACTION ON DELETE CASCADE,

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-05T00:53:03Z

src/spider/storage/MysqlStorage.cpp

+                m_conn->prepareStatement("DELETE FROM data_ref_driver "
+                                         "WHERE id = ? AND task_id = ?")


⚠️ Potential issue

Incorrect Column Name in 'remove_driver_reference' Method

The DELETE statement incorrectly references task_id instead of driver_id in the WHERE clause.

Please amend the column name:

-m_conn->prepareStatement("DELETE FROM data_ref_driver WHERE id = ? AND task_id = ?") +m_conn->prepareStatement("DELETE FROM data_ref_driver WHERE id = ? AND driver_id = ?")

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)

src/spider/core/Data.hpp (2)
20-23: Add documentation and parameter validation

While the constructor implementation is correct, consider:

Adding documentation explaining when to use this constructor vs others

Adding validation for the UUID parameter (e.g., check for nil UUID)
+    /**
+     * Constructs a Data object with a pre-existing UUID
+     * @param id The UUID to use (must not be nil)
+     * @param key The key associated with this data
+     * @param value The value to store
+     * @throws std::invalid_argument if id is nil
+     */
     Data(boost::uuids::uuid id, std::string key, std::string value)
-            : m_id(id),
+            : m_id(id.is_nil() ? throw std::invalid_argument("UUID cannot be nil") : id),
               m_key(std::move(key)),
               m_value(std::move(value)) {}
37-37: Optimize set_locality with move semantics

The current implementation copies the locality vector. Consider using move semantics for better performance with large vectors.
-    void set_locality(std::vector<std::string> const& locality) { m_locality = locality; }
+    void set_locality(std::vector<std::string> locality) { m_locality = std::move(locality); }
src/spider/storage/MysqlStorage.cpp (2)
168-172: Consider using reinterpret_cast instead of C-style cast.

While the C-style cast works, using reinterpret_cast would be more idiomatic in C++ and makes the casting intention clearer.
-    return {(char*)id.data(), id.size()};
+    return {reinterpret_cast<char*>(id.data()), id.size()};
752-753: Replace magic number with named constant.

The value 0.0001 is used as a threshold for timeout comparison. This should be defined as a named constant for better maintainability.
+    constexpr float cMinTimeoutThreshold = 0.0001f;
     // ...
-                "tasks.id WHERE tasks.timeout > 0.0001 AND TIMESTAMPDIFF(MICROSECOND, "
+                "tasks.id WHERE tasks.timeout > cMinTimeoutThreshold AND TIMESTAMPDIFF(MICROSECOND, "

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a1986e3 and 7db8925.

📒 Files selected for processing (2)

src/spider/core/Data.hpp (2 hunks)
src/spider/storage/MysqlStorage.cpp (1 hunks)

🧰 Additional context used

🪛 cppcheck

src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 20-20: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (4)

src/spider/core/Data.hpp (2)

9-9: LGTM! Necessary include for std::vector

The vector include is properly placed and required for the new locality feature.

45-46: Verify MySQL storage implementation details

Let's verify the locality implementation matches MySQL storage requirements.

✅ Verification successful

The MySQL implementation shows proper handling of the locality fields, with:

Appropriate schema definition including hard_locality boolean column and separate data_locality table
Complete CRUD operations for both m_hard_locality and m_locality vector
Foreign key constraints ensuring referential integrity

The locality implementation matches MySQL storage requirements

The verification confirms:

hard_locality is stored as a BOOL column in the main data table
m_locality values are stored in a separate data_locality table with proper foreign key relationship
Storage operations handle both fields correctly during insert and select operations
Schema design follows database normalization principles

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for MySQL-specific locality handling
echo "Checking MySQL storage implementation:"
rg -A 5 "locality" --type cpp | grep -i "mysql"

echo -e "\nChecking storage interface requirements:"
ast-grep --pattern 'class $storage : public Storage {
  $$$
  locality($$$) {
    $$$
  }
  $$$
}'

Length of output: 5280

src/spider/storage/MysqlStorage.cpp (2)

32-166: LGTM! Well-structured table definitions and error codes.

The SQL table definitions are well-organized with appropriate constraints, indexes, and data types. The MariadbErr enum provides good coverage of common error scenarios.

🧰 Tools

🪛 cppcheck

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

383-383: ⚠️ Potential issue

Fix incorrect iteration over heads set.

The code attempts to iterate over heads before adding any elements to it, which will result in no tasks being processed.

-        for (boost::uuids::uuid const task_id : heads) {
+        for (boost::uuids::uuid const& task_id : task_graph.get_head_tasks()) {

Likely invalid or redundant comment.

coderabbitai · 2024-11-05T05:29:00Z

src/spider/storage/MysqlStorage.cpp

+auto MySqlMetadataStorage::update_heartbeat(boost::uuids::uuid id) -> StorageErr {
+    try {
+        std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement(
+                "UPDATE divers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?"


⚠️ Potential issue

Fix typo in table name.

The table name 'divers' should be 'drivers'.

- "UPDATE divers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?" + "UPDATE drivers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"UPDATE divers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?"

"UPDATE drivers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?"

coderabbitai · 2024-11-05T05:29:01Z

src/spider/storage/MysqlStorage.cpp

+        while (res->next()) {
+            locality.emplace_back(res->getString(1));
+        }


⚠️ Potential issue

Fix incorrect ResultSet usage in locality fetching.

The code uses res instead of locality_res when fetching locality data, which will result in incorrect data being retrieved.

- while (res->next()) { - locality.emplace_back(res->getString(1)); + while (locality_res->next()) { + locality.emplace_back(locality_res->getString(1));

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-05T05:29:01Z

src/spider/storage/MysqlStorage.cpp

+auto string_to_task_state(std::string const& state) -> spider::core::TaskState {
+    if (state == "pending") {
+        return spider::core::TaskState::Pending;
+    }
+    if (state == "ready") {
+        return spider::core::TaskState::Ready;
+    }
+    if (state == "running") {
+        return spider::core::TaskState::Running;
+    }
+    if (state == "success") {
+        return spider::core::TaskState::Succeed;
+    }
+    if (state == "fail") {
+        return spider::core::TaskState::Failed;
+    }
+    if (state == "cancel") {
+        return spider::core::TaskState::Canceled;
+    }
+    return spider::core::TaskState::Pending;
+}


🛠️ Refactor suggestion

Add logging for unhandled task state.

The default case in string_to_task_state silently returns TaskState::Pending without any warning. This could mask potential issues if an invalid state string is provided.

if (state == "cancel") { return spider::core::TaskState::Canceled; } + // Log warning for unhandled state + std::cerr << "Warning: Unknown task state '" << state << "', defaulting to Pending" << std::endl; return spider::core::TaskState::Pending;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto string_to_task_state(std::string const& state) -> spider::core::TaskState {

if (state == "pending") {

return spider::core::TaskState::Pending;

}

if (state == "ready") {

return spider::core::TaskState::Ready;

}

if (state == "running") {

return spider::core::TaskState::Running;

}

if (state == "success") {

return spider::core::TaskState::Succeed;

}

if (state == "fail") {

return spider::core::TaskState::Failed;

}

if (state == "cancel") {

return spider::core::TaskState::Canceled;

}

return spider::core::TaskState::Pending;

}

auto string_to_task_state(std::string const& state) -> spider::core::TaskState {

if (state == "pending") {

return spider::core::TaskState::Pending;

}

if (state == "ready") {

return spider::core::TaskState::Ready;

}

if (state == "running") {

return spider::core::TaskState::Running;

}

if (state == "success") {

return spider::core::TaskState::Succeed;

}

if (state == "fail") {

return spider::core::TaskState::Failed;

}

if (state == "cancel") {

return spider::core::TaskState::Canceled;

}

// Log warning for unhandled state

std::cerr << "Warning: Unknown task state '" << state << "', defaulting to Pending" << std::endl;

return spider::core::TaskState::Pending;

}

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

src/spider/storage/MysqlStorage.cpp (2)
47-166: Consider adding indexes for frequently queried columns.

While the schema design is solid, consider adding indexes for columns frequently used in WHERE clauses or JOIN conditions, such as:

tasks.state for querying ready tasks

task_instances.start_time for timeout queries

data.key for key-based lookups (already indexed)

🧰 Tools

🪛 cppcheck

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

168-172: Consider using reinterpret_cast instead of C-style cast.

While the NOLINT directive suppresses the warning, using reinterpret_cast would be more idiomatic in C++ and makes the casting intention clearer.
-    return {(char*)id.data(), id.size()};
+    return {reinterpret_cast<char*>(id.data()), id.size()};

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7db8925 and 6c68a95.

📒 Files selected for processing (1)

src/spider/storage/MysqlStorage.cpp (1 hunks)

🧰 Additional context used

🪛 cppcheck

src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 20-20: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (1)

src/spider/storage/MysqlStorage.cpp (1)

3-42: LGTM! Well-organized includes and error code definitions.

The includes are logically grouped, and the MariadbErr enum provides clear error code definitions for MariaDB-specific errors.

🧰 Tools

🪛 cppcheck

[performance] 20-20: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

coderabbitai · 2024-11-05T05:43:47Z

src/spider/storage/MysqlStorage.cpp

+        // Tasks must be added in graph order to avoid the dangling reference.
+        absl::flat_hash_set<boost::uuids::uuid> heads;
+        std::deque<boost::uuids::uuid> queue;
+        // First go ver all heads
+        for (boost::uuids::uuid const task_id : heads) {
+            std::optional<Task> const task_option = task_graph.get_task(task_id);
+            if (!task_option.has_value()) {
+                m_conn->rollback();
+                return StorageErr{StorageErrType::KeyNotFoundErr, "Task graph inconsistent"};
+            }
+            Task const& task = task_option.value();
+            this->add_task(job_id_bytes, task);
+            for (boost::uuids::uuid const id : task_graph.get_child_tasks(task_id)) {
+                queue.push_back(id);
+            }
+        }


⚠️ Potential issue

Fix empty heads set in task graph traversal.

The heads set is empty when used in the loop at line 383, which means no tasks will be processed initially. This could lead to incomplete task graph processing.

- absl::flat_hash_set<boost::uuids::uuid> heads; + absl::flat_hash_set<boost::uuids::uuid> heads = task_graph.get_head_tasks(); std::deque<boost::uuids::uuid> queue; - // First go ver all heads + // First go over all heads for (boost::uuids::uuid const task_id : heads) {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Tasks must be added in graph order to avoid the dangling reference.

absl::flat_hash_set<boost::uuids::uuid> heads;

std::deque<boost::uuids::uuid> queue;

// First go ver all heads

for (boost::uuids::uuid const task_id : heads) {

std::optional<Task> const task_option = task_graph.get_task(task_id);

if (!task_option.has_value()) {

m_conn->rollback();

return StorageErr{StorageErrType::KeyNotFoundErr, "Task graph inconsistent"};

}

Task const& task = task_option.value();

this->add_task(job_id_bytes, task);

for (boost::uuids::uuid const id : task_graph.get_child_tasks(task_id)) {

queue.push_back(id);

}

}

// Tasks must be added in graph order to avoid the dangling reference.

absl::flat_hash_set<boost::uuids::uuid> heads = task_graph.get_head_tasks();

std::deque<boost::uuids::uuid> queue;

// First go over all heads

for (boost::uuids::uuid const task_id : heads) {

std::optional<Task> const task_option = task_graph.get_task(task_id);

if (!task_option.has_value()) {

m_conn->rollback();

return StorageErr{StorageErrType::KeyNotFoundErr, "Task graph inconsistent"};

}

Task const& task = task_option.value();

this->add_task(job_id_bytes, task);

for (boost::uuids::uuid const id : task_graph.get_child_tasks(task_id)) {

queue.push_back(id);

}

}

sitaowang1998 added 12 commits November 1, 2024 14:54

storage: Add mysql storage implementation

fc3f9e6

style: Format code

3e9df9f

fix: Try satisfy clang-tidy

6b3ab0c

fix: Finally beat clang-tidy

0015645

fix: Try fix error in github workflow

b94ed8d

fix: Try fix the file not found for clang-tidy

4b52fb4

fix: Try fix the file not found for clang-tidy by adding include dire…

59f8450

…ctory explicitly

Revert "fix: Try fix the file not found for clang-tidy"

7240d4c

This reverts commit 4b52fb4.

fix: Fix the cmake include directory

2660e3a

build: Remove unnecessary include

d6bef0f

fix: Use fmt::fmt instead of format library from c++20

e490e31

ci: Try remove macOS to see if linux passes

a1986e3

coderabbitai bot reviewed Nov 5, 2024

View reviewed changes

fix: Fix table set syntax and some typos

7db8925

coderabbitai bot reviewed Nov 5, 2024

View reviewed changes

fix: Fix wrong variable and some typos

6c68a95

coderabbitai bot reviewed Nov 5, 2024

View reviewed changes

fix: Fix the vairiable not initialized

8b84f98

sitaowang1998 merged commit 41df760 into y-scope:main Nov 5, 2024
2 checks passed

sitaowang1998 deleted the storage_mysql branch November 5, 2024 06:08

This was referenced Nov 6, 2024

test: Add tests for data storage #21

Merged

test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23

Merged

This was referenced Nov 29, 2024

feat: Add client interface. #16

Merged

feat: Add scheduler scheduling #31

Merged

This was referenced Dec 7, 2024

feat: Add variable size message sending and receiving through socket #32

Merged

feat: Add scheduler policy and data stroage garbage collection #34

Merged

feat: Add worker client that submit tasks and get next task from schedulers #35

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add MySql support for storage backend #20

feat: Add MySql support for storage backend #20

sitaowang1998 commented Nov 5, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 5, 2024 •

edited

Loading

Rate limit exceeded

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot Nov 5, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 5, 2024

		void set_locality(std::vector<std::string>& locality) { m_locality = locality; }

		void set_hard_locality(bool hard) { m_hard_locality = hard; }

		std::vector<std::string> m_locality;
		bool m_hard_locality = false;

		virtual auto get_task_graph(boost::uuids::uuid id, TaskGraph* task_graph) -> StorageErr = 0;
		virtual auto get_task_graphs(std::vector<boost::uuids::uuid>* task_graphs) -> StorageErr = 0;

		m_conn->prepareStatement("UPDATE divers SET (heartbeat) VALUES "
		"(CURRENT_TIMESTAMP()) WHERE id = ?")

	m_conn->prepareStatement("UPDATE divers SET (heartbeat) VALUES "
	"(CURRENT_TIMESTAMP()) WHERE id = ?")
	m_conn->prepareStatement("UPDATE drivers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?")

		m_conn->prepareStatement("INSERT INTO data_ref_driver (id, "
		"task_id) VALUES(?, ?)")

		m_conn->prepareStatement("DELETE FROM data_ref_driver "
		"WHERE id = ? AND task_id = ?")

	"UPDATE divers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?"
	"UPDATE drivers SET heartbeat = CURRENT_TIMESTAMP() WHERE id = ?"

feat: Add MySql support for storage backend #20

feat: Add MySql support for storage backend #20

Conversation

sitaowang1998 commented Nov 5, 2024 • edited by coderabbitai bot Loading

Description

Validation performed

Summary by CodeRabbit

coderabbitai bot commented Nov 5, 2024 • edited Loading

Rate limit exceeded

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

sitaowang1998 commented Nov 5, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 5, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)