Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Add tests for data storage #21

Merged
merged 28 commits into from
Nov 8, 2024

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Nov 6, 2024

Description

Add unit tests for data storage.

To run unit test, setup mysql on local machine. Change cStorageUrl in tests/storage/StorageTestHelper.hpp to correct url. Then compile and run unit test.

Validation performed

  • task:lint
  • unit tests pass on ubuntu jammy

Summary by CodeRabbit

Release Notes

  • New Features

    • Integrated the Catch2 testing framework for enhanced testing capabilities.
    • Added new methods in DataStorage for data retrieval and management.
    • Introduced a new constructor in the Data class for flexible object creation.
    • Established a new unit testing executable with associated source files.
    • Expanded linting tasks to include test files for improved code quality.
  • Bug Fixes

    • Improved SQL syntax correctness in table creation and queries.
  • Documentation

    • Added guidelines for setting up a storage backend for unit tests.
  • Chores

    • Updated installation scripts to include necessary libraries and improve error handling.

Copy link

coderabbitai bot commented Nov 6, 2024

Warning

Rate limit exceeded

@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 21 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 1e3f1c5 and 5929c68.

Walkthrough

The pull request introduces significant modifications across various files, primarily focusing on enhancing the build configuration with the addition of the Catch2 testing framework and refining library dependencies in CMakeLists.txt. The linting tasks have been expanded to include test files. Changes to the DataStorage and MetadataStorage classes enforce non-copyable designs while introducing new methods for data management. SQL syntax improvements are made in MysqlStorage.cpp, and new testing functionalities are established in the tests directory. Overall, the updates improve testing capabilities, code quality, and data handling.

Changes

File Change Summary
CMakeLists.txt Added Catch2 subdirectory; modified Abseil inclusion; retained library dependency checks.
lint-tasks.yaml Updated linting tasks to include test files in G_TEST_DIR; modified clang-tidy and format tasks.
src/spider/CMakeLists.txt Updated source and header file configurations for spider_core; simplified library addition logic.
src/spider/core/Data.hpp Added new constructor for Data class using boost::uuids::uuid.
src/spider/core/Error.hpp Replaced operator bool() with success() method in StorageErr struct.
src/spider/storage/DataStorage.hpp Deleted copy/move constructors and assignment operators; added get_data_by_key and remove_data.
src/spider/storage/MetadataStorage.hpp Added default constructor; deleted copy/move constructors; renamed methods for job management.
src/spider/storage/MysqlStorage.cpp Added backticks to SQL statements; modified method signatures for MySqlDataStorage.
src/spider/storage/MysqlStorage.hpp Updated constructors and methods for MySqlMetadataStorage and MySqlDataStorage.
taskfile.yaml Added G_TEST_DIR variable for test directory specification.
tests/CMakeLists.txt Created unitTest executable; defined SPIDER_TEST_SOURCES for testing files.
tests/storage/StorageTestHelper.hpp Introduced helper functions and cStorageUrl for database connection in tests.
tests/storage/test-DataStorage.cpp Added comprehensive test cases for DataStorage functionality.
tests/utils/CoreDataUtils.hpp Added data_equal function for comparing Data objects.
src/spider/core/TaskGraph.hpp Removed m_id member variable and related methods; simplified constructor.
cmake/Modules/FindMariaDBClientCpp.cmake Consolidated include directory settings for MariaDBClientCpp.
tools/scripts/lib_install/macOS/install-lib.sh Updated installation script to include mariadb-connector-c.
tools/scripts/lib_install/mariadb-connector-cpp.sh Enhanced cmake command with additional parameters for installation.
docs/tests.md Added documentation for setting up a MySQL backend for unit tests.
tools/scripts/lib_install/linux/install-lib.sh Added installation for libmariadb-dev with elevated privileges check.

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (14)
tests/CMakeLists.txt (1)

1-7: Consider separating header files and improving variable configuration

A few suggestions to improve the test sources configuration:

  1. Header files should not be listed in SOURCES variable
  2. CACHE INTERNAL is typically unnecessary for test sources
  3. Consider using separate variables for headers and sources

Here's the suggested improvement:

-set(SPIDER_TEST_SOURCES
-    storage/test-DataStorage.cpp
-    storage/StorageTestHelper.hpp
-    utils/CoreDataUtils.hpp
-    CACHE INTERNAL
-    "spider test source files"
-)
+set(SPIDER_TEST_SOURCES
+    storage/test-DataStorage.cpp
+)
+
+set(SPIDER_TEST_HEADERS
+    storage/StorageTestHelper.hpp
+    utils/CoreDataUtils.hpp
+)
tests/utils/CoreDataUtils.hpp (3)

1-1: Consider using project-relative includes

The relative path with multiple parent directories (../../) can become difficult to maintain if the directory structure changes. Consider using project-relative includes by adding the source directory to the include path in CMake.

-#include "../../src/spider/core/Data.hpp"
+#include "spider/core/Data.hpp"

5-27: Add function documentation

The function would benefit from documentation explaining its purpose, parameters, and return value. Consider adding a descriptive comment block.

+/**
+ * Compares two Data objects for equality.
+ *
+ * @param d1 First Data object to compare
+ * @param d2 Second Data object to compare
+ * @return true if all fields are equal, false otherwise
+ */
 inline auto data_equal(Data const& d1, Data const& d2) -> bool {

5-27: Consider a more concise implementation

The current implementation, while clear, could be more concise by combining conditions. However, this is a trade-off between readability and conciseness.

 inline auto data_equal(Data const& d1, Data const& d2) -> bool {
-    if (d1.get_id() != d2.get_id()) {
-        return false;
-    }
-
-    if (d1.get_key() != d2.get_key()) {
-        return false;
-    }
-
-    if (d1.get_locality() != d2.get_locality()) {
-        return false;
-    }
-
-    if (d1.is_hard_locality() != d2.is_hard_locality()) {
-        return false;
-    }
-
-    if (d1.get_value() != d2.get_value()) {
-        return false;
-    }
-
-    return true;
+    return d1.get_id() == d2.get_id()
+        && d1.get_key() == d2.get_key()
+        && d1.get_locality() == d2.get_locality()
+        && d1.is_hard_locality() == d2.is_hard_locality()
+        && d1.get_value() == d2.get_value();
 }
src/spider/CMakeLists.txt (1)

4-13: Improve header list organization

The header organization is logical, but we could enhance readability with grouped components.

Consider this organization:

 set(SPIDER_CORE_HEADERS
+    # Core components
     core/Error.hpp
     core/Data.hpp
     core/Task.hpp
     core/TaskGraph.hpp
+    # Storage components
     storage/MetadataStorage.hpp
     storage/DataStorage.hpp
     storage/MysqlStorage.hpp
     CACHE INTERNAL
     "spider core header files"
)
src/spider/storage/DataStorage.hpp (1)

26-27: Consider enhancing method documentation and error handling

While the new methods add valuable functionality, they would benefit from:

  1. Documentation specifying:
    • Behaviour when data pointer is null in get_data_by_key
    • Error handling for non-existent keys/IDs
    • Ownership semantics for the data parameter
  2. Consider returning std::optional<Data> or expected<Data, StorageErr> instead of using an output parameter for get_data_by_key
src/spider/core/Data.hpp (2)

16-17: Add documentation for the new constructor.

The implementation looks good, but could benefit from documentation explaining its purpose and use cases.

Add a documentation comment:

+    /**
+     * Constructs a Data object with a specific UUID and value.
+     * @param id The UUID to use for this data
+     * @param value The value to store
+     */
     Data(boost::uuids::uuid id, std::string value) : m_id(id), m_value(std::move(value)) {}

16-17: Consider validating the UUID parameter.

The constructor accepts any UUID without validation. Consider checking for nil UUIDs to prevent potential issues.

Example implementation:

     Data(boost::uuids::uuid id, std::string value) : m_id(id), m_value(std::move(value)) {
+        if (id.is_nil()) {
+            throw std::invalid_argument("UUID cannot be nil");
+        }
     }
src/spider/storage/MetadataStorage.hpp (1)

15-19: LGTM! Proper enforcement of unique ownership semantics.

The deletion of copy/move constructors and assignment operators while providing a default constructor enforces proper RAII principles and prevents accidental sharing of storage resources.

Consider documenting these ownership semantics in a comment block above the class declaration to make it clear to users of this interface that implementations should maintain exclusive ownership of their resources.

tests/storage/test-DataStorage.cpp (2)

14-21: Add documentation for the helper function.

While the function is well-implemented with proper type constraints and memory management, it would benefit from documentation explaining its purpose and requirements.

Add documentation like this:

 namespace {
+/**
+ * Creates a new instance of a DataStorage-derived class.
+ * 
+ * @tparam T The concrete storage class type that derives from DataStorage
+ * @return std::unique_ptr<spider::core::DataStorage> Smart pointer to the created storage instance
+ */
 template <class T>
 requires std::derived_from<T, spider::core::DataStorage>
 auto create_data_storage() -> std::unique_ptr<spider::core::DataStorage> {

22-46: Consider adding RAII-style cleanup.

While the test case covers the core functionality well, consider using a fixture or RAII pattern to ensure proper cleanup even if the test fails midway.

Here's a suggested approach:

class StorageFixture {
    std::unique_ptr<spider::core::DataStorage> storage;
public:
    StorageFixture() {
        storage = create_data_storage<TestType>();
        storage->connect(spider::test::cStorageUrl);
        storage->initialize();
    }
    ~StorageFixture() {
        // Cleanup logic here
    }
    auto get_storage() -> spider::core::DataStorage& { return *storage; }
};

TEMPLATE_TEST_CASE("Add, get and remove data", "[storage]", spider::core::MySqlDataStorage) {
    StorageFixture fixture;
    auto& storage = fixture.get_storage();
    // Rest of the test...
}
🧰 Tools
🪛 cppcheck

[error] 22-22: syntax error

(syntaxError)

lint-tasks.yaml (1)

73-76: Consider optimizing the clang-format tasks

The separate clang-format task for the test directory could be combined with the source directory task to reduce duplication.

  cpp-format-fix:
    sources: *cpp_format_src_files
    deps: ["cpp-configs", "venv"]
    cmds:
      - task: "clang-format"
        vars:
          FLAGS: "-i"
-         SRC_DIR: "{{.G_SRC_SPIDER_DIR}}"
-      - task: "clang-format"
-        vars:
-          FLAGS: "-i"
-          SRC_DIR: "{{.G_TEST_DIR}}"
+         DIRS: "{{.G_SRC_SPIDER_DIR}} {{.G_TEST_DIR}}"

You'll also need to update the clang-format task definition accordingly:

  clang-format:
    internal: true
    requires:
-     vars: ["FLAGS", "SRC_DIR"]
+     vars: ["FLAGS", "DIRS"]
    cmd: |-
      . "{{.G_LINT_VENV_DIR}}/bin/activate"
-     find "{{.SRC_DIR}}" \
+     find {{.DIRS}} \
        -type f \
        \( -iname "*.cpp" -o -iname "*.h" -o -iname "*.hpp" \) \
        -print0 | \
          xargs -0 --no-run-if-empty clang-format {{.FLAGS}} -Werror
src/spider/storage/MysqlStorage.cpp (2)

116-124: Consider specifying index length for the key column.

For the UNIQUE KEY on the key column in the data table, consider specifying an index length to optimize performance and storage space, especially since MySQL has limitations on index sizes.

Apply this diff:

-    UNIQUE KEY (`key`) USING BTREE,
+    UNIQUE KEY (`key`(64)) USING BTREE,

166-170: Replace C-style cast with reinterpret_cast.

While the current code is marked with NOLINT, consider using reinterpret_cast for better type safety and clearer intent.

Apply this diff:

-    return {(char const*)id.data(), id.size()};
+    return {reinterpret_cast<char const*>(id.data()), id.size()};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 41df760 and b0232a1.

📒 Files selected for processing (14)
  • CMakeLists.txt (1 hunks)
  • lint-tasks.yaml (5 hunks)
  • src/spider/CMakeLists.txt (1 hunks)
  • src/spider/core/Data.hpp (1 hunks)
  • src/spider/core/Error.hpp (1 hunks)
  • src/spider/storage/DataStorage.hpp (2 hunks)
  • src/spider/storage/MetadataStorage.hpp (1 hunks)
  • src/spider/storage/MysqlStorage.cpp (40 hunks)
  • src/spider/storage/MysqlStorage.hpp (2 hunks)
  • taskfile.yaml (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/storage/StorageTestHelper.hpp (1 hunks)
  • tests/storage/test-DataStorage.cpp (1 hunks)
  • tests/utils/CoreDataUtils.hpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/storage/StorageTestHelper.hpp
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

tests/storage/test-DataStorage.cpp

[error] 22-22: syntax error

(syntaxError)

🔇 Additional comments (16)
src/spider/core/Error.hpp (1)

29-29: Excellent improvement to error handling clarity!

The replacement of the implicit boolean operator with an explicit success() method is a positive change that:

  • Makes success checks more self-documenting
  • Prevents accidental misuse through the [[nodiscard]] attribute
  • Aligns with modern C++ best practices

Let's verify the impact on existing code:

✅ Verification successful

Change is safe to proceed with

The search results show that StorageErr is primarily used as a return type in method declarations, particularly in storage-related interfaces and implementations. No direct boolean operator usage was found in conditional statements, suggesting that the code already uses explicit error checking rather than implicit boolean conversions.

The change from boolean operator to explicit success() method is localized to the following files that include Error.hpp:

  • tests/storage/test-DataStorage.cpp
  • src/spider/storage/MysqlStorage.hpp
  • src/spider/storage/MetadataStorage.hpp
  • src/spider/storage/DataStorage.hpp
  • src/spider/storage/MysqlStorage.cpp
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for potential boolean operator usage that needs updating
# Look for common patterns where StorageErr might be used in conditions

# Search for if statements with StorageErr
rg "if\s*\(\s*\w+\s*\)" --type cpp | rg -i "storage.*err"

# Search for while statements with StorageErr
rg "while\s*\(\s*\w+\s*\)" --type cpp | rg -i "storage.*err"

# Search for ternary operations with StorageErr
rg "\?\s*.*\:" --type cpp | rg -i "storage.*err"

Length of output: 256


Script:

#!/bin/bash
# Let's try a different approach to find potential usages of StorageErr in conditions

# Search for StorageErr usage in conditional contexts
rg -t cpp "if.*StorageErr|while.*StorageErr" -A 2

# Search for direct usage of StorageErr variables
rg -t cpp "StorageErr\s+\w+" -A 2

# Look for files that include Error.hpp to analyze potential impacts
rg -t cpp "#include.*Error\.hpp"

Length of output: 4769

taskfile.yaml (1)

14-14: LGTM! The test directory variable is well-structured.

The addition of G_TEST_DIR follows existing naming conventions and path structures, properly supporting the test infrastructure enhancement.

Let's verify the variable usage in the referenced lint tasks:

✅ Verification successful

Variable usage verified and properly integrated

The G_TEST_DIR variable is correctly referenced across multiple tasks in lint-tasks.yaml:

  • Used in file pattern matching for C++ linting
  • Set as SRC_DIR for static analysis
  • Set as TEST_DIR for test-related tasks
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify G_TEST_DIR usage in lint-tasks.yaml
# Expected: Should find references to G_TEST_DIR in linting configurations

rg -A 3 "G_TEST_DIR" lint-tasks.yaml

Length of output: 677

src/spider/CMakeLists.txt (2)

Line range hint 1-35: Overall CMake configuration looks good

The CMake configuration properly supports the testing infrastructure while maintaining clear component organization and dependency management. The separation of sources, headers, and their respective visibilities is well-structured.


2-2: Verify completeness of SPIDER_CORE_SOURCES

The core library appears to contain only a single source file. Given that the headers list includes multiple components (Core, Storage), we should verify that all necessary implementation files are included.

✅ Verification successful

SPIDER_CORE_SOURCES is correctly defined

The source list is complete and properly structured. The worker.cpp file is intentionally separated into SPIDER_WORKER_SOURCES as it belongs to the worker executable target, not the core library. The core library appropriately contains only the MySQL storage implementation, which aligns with the PR's focus on data storage testing.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential missing source files
# Look for .cpp files that might need to be included in SPIDER_CORE_SOURCES

echo "Searching for potential source files in core/ and storage/ directories:"
fd -e cpp --type f . src/spider/core src/spider/storage

Length of output: 246


Script:

#!/bin/bash
# Let's verify if there are any other source files in the spider directory
# that might need to be included
fd -e cpp --type f . src/spider

# Also check if there are any other CMakeLists.txt files that might
# append to SPIDER_CORE_SOURCES
fd CMakeLists.txt src/spider

# Look at the content of any found CMakeLists.txt to see if they modify SPIDER_CORE_SOURCES
fd CMakeLists.txt src/spider -x cat {}

Length of output: 1228

src/spider/storage/DataStorage.hpp (2)

13-17: Well-implemented Rule of Five!

The explicit deletion of copy/move operations enforces proper RAII semantics and prevents accidental resource duplication, which is crucial for a storage interface.


Line range hint 13-27: Verify interface implementation completeness

Let's ensure all derived classes properly implement the new interface and respect the non-copyable semantics.

✅ Verification successful

Let me generate additional verification scripts to check for any other derived classes and ensure complete implementation coverage.


Interface implementation and usage patterns are properly aligned

The verification confirms:

  • MySqlDataStorage is the only implementation of DataStorage and properly implements all required methods including get_data_by_key and remove_data
  • Non-copyable semantics are respected throughout the codebase, with proper deletion of copy constructors/assignments
  • Usage patterns show only safe handling via std::unique_ptr in test code
  • No dangerous copy operations or raw pointer usage detected
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify implementation of new methods and usage of DataStorage

# Check for derived classes that need to implement the new methods
echo "Checking for DataStorage implementations..."
ast-grep --pattern 'class $_ : public spider::core::DataStorage'

# Look for potential copy attempts that would now fail
echo "Checking for potential copy operations..."
rg "DataStorage\(" --type cpp

# Verify all pure virtual methods are implemented
echo "Checking for get_data_by_key implementations..."
rg "get_data_by_key.*StorageErr" --type cpp

echo "Checking for remove_data implementations..."
rg "remove_data.*StorageErr" --type cpp

Length of output: 1892


Script:

#!/bin/bash
# Search for any other storage implementations we might have missed
echo "Checking for other storage implementations..."
fd -e hpp -e cpp -e h | xargs rg "class.*Storage.*public"

# Check for any DataStorage object instantiations or assignments
echo "Checking DataStorage usage patterns..."
rg "std::.*<.*DataStorage" --type cpp
rg "= new.*Storage" --type cpp
rg "make_.*<.*Storage>" --type cpp

# Verify the complete interface implementation in MySqlDataStorage
echo "Checking MySqlDataStorage interface implementation..."
ast-grep --pattern 'class MySqlDataStorage : public spider::core::DataStorage {
  $$$
}'

Length of output: 1325

src/spider/storage/MetadataStorage.hpp (1)

15-19: Verify impact on existing storage implementations.

The changes to ownership semantics might affect existing implementations of this interface.

✅ Verification successful

No impact on existing implementations - ownership semantics already aligned

The existing MySqlMetadataStorage implementation already follows the same ownership semantics as the base class, with copy and move operations explicitly deleted. The implementation maintains a raw pointer to a connection but properly manages its lifetime through connect() and close() methods.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all classes implementing MetadataStorage to verify compatibility
# with the new ownership semantics

# Search for class declarations that inherit from MetadataStorage
ast-grep --pattern 'class $_ : $$$public MetadataStorage$$$'

# Also check for any direct instantiations of MetadataStorage subclasses
# that might be doing copies or moves
rg -l "new \w+MetadataStorage|= \w+MetadataStorage|std::make_\w+<\w+MetadataStorage>"

Length of output: 150


Script:

#!/bin/bash
# Let's try a different approach to find implementations and usage

# Find files that include MetadataStorage.hpp
rg -l "MetadataStorage\.hpp"

# Look for class definitions that might inherit from MetadataStorage
# using a broader pattern
rg -l "class.*MetadataStorage"

# Search for any usage of MetadataStorage
rg -l "MetadataStorage"

# Look for potential factory methods or constructors
rg -A 5 "MetadataStorage::|->MetadataStorage"

Length of output: 13395


Script:

#!/bin/bash
# Let's examine the MySqlStorage implementation to verify ownership semantics

# Check MysqlStorage.hpp for its class definition and member variables
cat src/spider/storage/MysqlStorage.hpp

# Look for any copy/move operations in MysqlStorage
rg -A 2 "MySqlMetadataStorage\(.*\)" src/spider/storage/MysqlStorage.hpp

Length of output: 4892

tests/storage/test-DataStorage.cpp (2)

1-13: LGTM! Well-organized includes and proper lint suppressions.

The includes are logically organized and the NOLINT suppressions are appropriately scoped to specific rules.


1-79: Overall implementation is solid and well-tested.

The test cases provide good coverage of the DataStorage functionality, including error cases and data integrity checks. The suggested improvements are mainly for maintainability and don't impact the correctness of the tests.

Note: The cppcheck syntax error can be safely ignored as it appears to be a false positive related to template syntax parsing.

🧰 Tools
🪛 cppcheck

[error] 22-22: syntax error

(syntaxError)

CMakeLists.txt (2)

138-139: LGTM! Good choice of testing framework.

The addition of Catch2 is appropriate for C++ unit testing and aligns well with the PR's objective of adding data storage tests.


136-136: Verify if abseil-cpp is required in default build target.

The removal of EXCLUDE_FROM_ALL means abseil-cpp will now be built by default, which could impact build times. Please confirm if this is necessary for the test framework or if it can remain excluded from the default target.

✅ Verification successful

Abseil library is required in default build target

The removal of EXCLUDE_FROM_ALL is necessary as abseil-cpp is used in core components:

  • TaskGraph uses absl::flat_hash_map and absl::flat_hash_set for essential data structures
  • MysqlStorage depends on TaskGraph and uses abseil collections
  • These components are tested in tests/storage/test-DataStorage.cpp

Since these are core data structures used in both the main application and tests, abseil should be included in the default build target.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if abseil is used in test files
echo "Checking abseil usage in test files..."
rg -l "absl::" tests/

Length of output: 108


Script:

#!/bin/bash
# Let's check for abseil usage in the entire codebase
echo "Checking abseil usage across the codebase..."
rg -l "absl::" .

# Also check for any direct includes of abseil headers
echo -e "\nChecking abseil header includes..."
rg -l "#include.*absl/" .

# Check if there are any CMake dependencies on abseil
echo -e "\nChecking CMake dependencies on abseil..."
rg -l "abseil" . --type cmake

Length of output: 540


Script:

#!/bin/bash
# Let's examine the actual usage in these files to understand if it's essential
echo "Checking MysqlStorage.cpp abseil usage..."
rg "absl::" src/spider/storage/MysqlStorage.cpp -B 2 -A 2

echo -e "\nChecking TaskGraph.hpp abseil usage..."
rg "absl::" src/spider/core/TaskGraph.hpp -B 2 -A 2

# Check if these files are used in tests
echo -e "\nChecking if these files are referenced in tests..."
rg "MysqlStorage|TaskGraph" tests/ -l

Length of output: 1284

src/spider/storage/MysqlStorage.hpp (2)

22-27: Well-implemented Rule of Five declarations!

The special member functions are properly declared to prevent object slicing and ensure unique ownership semantics. The virtual destructor is correctly marked as override.


63-68: Proper implementation of Rule of Five!

The special member functions are correctly declared to prevent object slicing and ensure unique ownership semantics.

lint-tasks.yaml (3)

53-55: Excellent addition of test files to format checking!

The inclusion of test files in the formatting checks will help maintain consistent code style throughout the codebase.


89-91: Well-structured addition of test files to static analysis!

The inclusion ensures test files receive the same level of static analysis as source files, maintaining code quality standards across the entire codebase.


137-137: Excellent implementation of the clang-tidy task!

The approach of combining source and test directories in a single find command is efficient and well-structured. This serves as a good example for the format-fix task optimization suggested earlier.

Also applies to: 140-140

Comment on lines +9 to +12
add_executable(unitTest)
target_sources(unitTest PRIVATE ${SPIDER_TEST_SOURCES})
target_link_libraries(unitTest PRIVATE Catch2::Catch2WithMain)
target_link_libraries(unitTest PRIVATE spider_core)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Catch2 submodule needs initialization

The Catch2 dependency is properly configured in CMake via submodules, but the submodule hasn't been initialized. The developer needs to run:

git submodule update --init --recursive

before building the tests.

  • The .gitmodules file correctly references Catch2 at submodules/Catch2
  • The - prefix in git submodule status indicates the submodule is not initialized
  • The empty Catch2 directory confirms this state
🔗 Analysis chain

Verify Catch2 dependency setup

The test executable configuration looks good, following CMake best practices with proper visibility modifiers and dependencies.

Let's verify the Catch2 setup:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Catch2 is properly configured in the project

# Check if Catch2 is included in the main CMakeLists.txt
echo "Checking Catch2 configuration..."
rg "add_subdirectory.*Catch2" CMakeLists.txt

# Verify Catch2 submodule exists
echo "Checking Catch2 submodule..."
if [ -d "submodules/Catch2" ]; then
    echo "Catch2 submodule found"
    cat submodules/Catch2/CMakeLists.txt | head -n 5
else
    echo "Warning: Catch2 submodule not found"
fi

Length of output: 426


Script:

#!/bin/bash
# Let's verify how Catch2 is being found/configured in the project

# Check for FetchContent usage for Catch2
echo "Checking for FetchContent usage..."
rg "FetchContent.*Catch2" -A 5

# Check for find_package usage for Catch2
echo "Checking for find_package usage..."
rg "find_package.*Catch2"

# Look for any Catch2 related configuration
echo "Checking for any Catch2 related config..."
rg "Catch2" CMakeLists.txt

Length of output: 374


Script:

#!/bin/bash
# Let's check the git submodules configuration and status

# Check .gitmodules file
echo "Checking .gitmodules configuration..."
cat .gitmodules

# Check git submodule status
echo -e "\nChecking git submodule status..."
git submodule status

# List contents of submodules directory
echo -e "\nListing submodules directory..."
ls -la submodules/

Length of output: 1063

Comment on lines +16 to +18
add_library(spider_core)
target_sources(spider_core PRIVATE ${SPIDER_CORE_SOURCES})
target_sources(spider_core PUBLIC ${SPIDER_CORE_HEADERS})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Library type should be explicitly defined

The removal of static/shared configuration in src/spider/CMakeLists.txt conflicts with the project-wide build configuration. The main CMakeLists.txt still manages the SPIDER_USE_STATIC_LIBS option which affects multiple dependencies (Boost, spdlog) and has platform-specific handling. The library type should be explicitly set based on this global setting.

  • In src/spider/CMakeLists.txt, restore the conditional library type:
if(SPIDER_USE_STATIC_LIBS)
    add_library(spider_core STATIC)
else()
    add_library(spider_core SHARED)
endif()
🔗 Analysis chain

Consider preserving build flexibility

The removal of the static/shared library condition might limit build configuration options. Was this intentional?


Let me check the content of these CMake files to better understand the library configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other components that might depend on library type

echo "Checking for potential library type dependencies:"
rg -l "SPIDER_USE_STATIC_LIBS" .

echo "Checking for library references in test configurations:"
fd CMakeLists.txt -t f tests/

Length of output: 332


Script:

#!/bin/bash
# Check the main CMakeLists.txt for static library configuration
echo "=== Main CMakeLists.txt ==="
rg "SPIDER_USE_STATIC_LIBS" CMakeLists.txt -A 3 -B 3

echo -e "\n=== Test CMakeLists.txt ==="
cat tests/CMakeLists.txt

echo -e "\n=== Previous spider library configuration ==="
git log -p src/spider/CMakeLists.txt | grep -A 10 "add_library"

Length of output: 4035

Comment on lines 48 to 76
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
) {
std::unique_ptr<spider::core::DataStorage> storage = create_data_storage<TestType>();
REQUIRE(storage->connect(spider::test::cStorageUrl).success());
REQUIRE(storage->initialize().success());

// Add data
spider::core::Data const data{"key", "value"};
REQUIRE(storage->add_data(data).success());

// Add data with same key again should fail
spider::core::Data const data_same_key{"key", "value2"};
REQUIRE(spider::core::StorageErrType::DuplicateKeyErr == storage->add_data(data_same_key).type);

// Get data should match
spider::core::Data result{"temp"};
REQUIRE(storage->get_data_by_key("key", &result).success());
REQUIRE(spider::core::data_equal(data, result));

// Remove data should succeed
REQUIRE(storage->remove_data(data.get_id()).success());

// Get data should fail
REQUIRE(spider::core::StorageErrType::KeyNotFoundErr
== storage->get_data_by_key("key", &result).type);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Reduce code duplication between test cases.

The setup and teardown code is duplicated between the two test cases. Consider extracting common test setup into shared helper functions or using the fixture pattern suggested earlier.

Example approach:

namespace {
// Test helpers
template<typename T>
void verify_duplicate_key_error(spider::core::DataStorage& storage, 
                              spider::core::Data const& duplicate_data) {
    REQUIRE(spider::core::StorageErrType::DuplicateKeyErr == 
            storage.add_data(duplicate_data).type);
}

template<typename T>
void verify_data_retrieval(spider::core::DataStorage& storage,
                          spider::core::Data const& expected) {
    spider::core::Data result{"temp"};
    REQUIRE(storage.get_data(expected.get_id(), &result).success());
    REQUIRE(spider::core::data_equal(expected, result));
}
// ... more helpers ...
}

@@ -52,13 +58,21 @@ class MySqlMetadataStorage : public MetadataStorage {
auto fetch_task(std::unique_ptr<sql::ResultSet> const& res) -> Task;
};

class MysqlDataStorage : public DataStorage {
class MySqlDataStorage : public DataStorage {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

MySQL capitalization is inconsistent in filenames but consistent in class names

The codebase shows:

  • Class names consistently use "MySql" capitalization (e.g., MySqlDataStorage, MySqlMetadataStorage)
  • File names inconsistently use "Mysql" (e.g., MysqlStorage.hpp, MysqlStorage.cpp)

To maintain consistency:

  • Rename MysqlStorage.hpp to MySqlStorage.hpp
  • Rename MysqlStorage.cpp to MySqlStorage.cpp
  • Update corresponding include statements and header guards
🔗 Analysis chain

Consider standardizing MySQL capitalization

The class name uses "MySql" while the conventional spelling is "MySQL". Consider standardizing the capitalization across the codebase.

Additionally, the new methods get_data_by_key and remove_data are well-integrated with the existing interface.

Let's verify the MySQL capitalization across the codebase:

Also applies to: 74-75

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for different MySQL capitalizations to ensure consistency
echo "Searching for MySQL capitalization variants..."
rg -i "mysql" --type cpp

Length of output: 6177

Comment on lines 1106 to 1107
m_conn->prepareStatement("DELETE FROM `data_ref_task` WHERE "
"`id` = ? AND `task_i`d = ?")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in SQL query.

There's a typo in the column name task_id where a backtick is incorrectly placed.

Apply this diff:

-                                         "`id` = ? AND `task_i`d = ?")
+                                         "`id` = ? AND `task_id` = ?")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
m_conn->prepareStatement("DELETE FROM `data_ref_task` WHERE "
"`id` = ? AND `task_i`d = ?")
m_conn->prepareStatement("DELETE FROM `data_ref_task` WHERE "
"`id` = ? AND `task_id` = ?")

Comment on lines 842 to 845
std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement(
"SELECT `id` FROM `driver`s WHERE TIMESTAMPDIFF(MICROSECOND, "
"`heartbeat`, CURRENT_TIMESTAMP()) > ?"
));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in SQL query.

There's a typo in the table name drivers where a backtick is incorrectly placed.

Apply this diff:

-                "SELECT `id` FROM `driver`s WHERE TIMESTAMPDIFF(MICROSECOND, "
+                "SELECT `id` FROM `drivers` WHERE TIMESTAMPDIFF(MICROSECOND, "
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement(
"SELECT `id` FROM `driver`s WHERE TIMESTAMPDIFF(MICROSECOND, "
"`heartbeat`, CURRENT_TIMESTAMP()) > ?"
));
std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement(
"SELECT `id` FROM `drivers` WHERE TIMESTAMPDIFF(MICROSECOND, "
"`heartbeat`, CURRENT_TIMESTAMP()) > ?"
));

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
src/spider/storage/MysqlStorage.cpp (5)

152-163: Consider reordering table creation statements for better clarity.

The table creation order is critical due to foreign key constraints. Consider grouping related tables together and adding comments to explain the dependencies.

Apply this diff:

 std::array<char const* const, 11> const cCreateStorage = {
-        cCreateDriverTable,  // drivers table must be created before data_ref_driver
-        cCreateSchedulerTable,
-        cCreateTaskTable,  // tasks table must be created before data_ref_task
-        cCreateDataTable,  // data table must be created before task_outputs
-        cCreateDataLocalityTable,
-        cCreateDataRefDriverTable,
-        cCreateDataRefTaskTable,
-        cCreateTaskOutputTable,  // task_outputs table must be created before task_inputs
-        cCreateTaskInputTable,
-        cCreateTaskDependencyTable,
-        cCreateTaskInstanceTable,
+        // Core tables
+        cCreateDriverTable,
+        cCreateSchedulerTable,
+        cCreateTaskTable,
+        cCreateDataTable,
+        
+        // Data locality and references
+        cCreateDataLocalityTable,
+        cCreateDataRefDriverTable,
+        cCreateDataRefTaskTable,
+        
+        // Task relationships
+        cCreateTaskOutputTable,
+        cCreateTaskInputTable,
+        cCreateTaskDependencyTable,
+        cCreateTaskInstanceTable,
 };

108-113: Consider using DATETIME instead of TIMESTAMP for task_instances.

The TIMESTAMP data type in MySQL has a Y2K38 limitation (will overflow in January 2038). For future-proofing, consider using DATETIME which has a much larger range.

Apply this diff:

 char const* const cCreateTaskInstanceTable = R"(CREATE TABLE IF NOT EXISTS `task_instances` (
     `id` BINARY(16) NOT NULL,
     `task_id` BINARY(16) NOT NULL,
-    `start_time` TIMESTAMP NOT NULL,
+    `start_time` DATETIME(6) NOT NULL,
     CONSTRAINT `instance_task_id` FOREIGN KEY (`task_id`) REFERENCES `tasks` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE,
     PRIMARY KEY (`id`)
 ))";

Line range hint 1022-1062: Potential SQL injection vulnerability in get_data_by_key method.

While the code uses prepared statements which generally prevent SQL injection, the key parameter should be validated to ensure it doesn't contain malicious patterns.

Add input validation:

 auto MySqlDataStorage::get_data_by_key(std::string const& key, Data* data) -> StorageErr {
+    // Validate key length and characters
+    if (key.length() > 64 || key.find_first_not_of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_") != std::string::npos) {
+        return StorageErr{StorageErrType::InvalidInputErr, "Invalid key format"};
+    }
     try {
         std::unique_ptr<sql::PreparedStatement> statement(

Line range hint 978-1020: Enhance error handling with more specific error types.

The error handling could be more granular to help with debugging and monitoring.

Consider adding specific error types for common scenarios:

// Add to StorageErrType enum
enum class StorageErrType {
    // ... existing types ...
    DataNotFoundErr,
    DataCorruptedErr,
    LocalityNotFoundErr,
    // ... more specific types ...
};

Line range hint 300-313: Consider extracting helper methods to improve readability.

The add_task method is quite long and handles multiple concerns. Consider extracting the task input and output handling into separate methods.

Example refactoring:

void MySqlMetadataStorage::add_task_inputs(sql::bytes const& task_id_bytes, Task const& task);
void MySqlMetadataStorage::add_task_outputs(sql::bytes const& task_id_bytes, Task const& task);

void MySqlMetadataStorage::add_task(sql::bytes job_id, Task const& task) {
    add_task_base(job_id, task);
    add_task_inputs(task_id_bytes, task);
    add_task_outputs(task_id_bytes, task);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b0232a1 and 52a1e17.

📒 Files selected for processing (1)
  • src/spider/storage/MysqlStorage.cpp (40 hunks)
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (8)
tests/storage/StorageTestHelper.hpp (1)

32-45: Reduce code duplication in create_storage function

The connection and initialization logic is duplicated from the single-storage creation functions. Consider reusing the existing helper functions.

Suggested refactor:

 template <class M, class D>
 requires std::derived_from<M, core::MetadataStorage> && std::derived_from<D, core::DataStorage>
 auto create_storage(
 ) -> std::tuple<std::unique_ptr<core::MetadataStorage>, std::unique_ptr<core::DataStorage>> {
-    std::unique_ptr<core::MetadataStorage> metadata_storage{
-            static_cast<core::MetadataStorage*>(new M())
-    };
-    REQUIRE(metadata_storage->connect(spider::test::cStorageUrl).success());
-    REQUIRE(metadata_storage->initialize().success());
-    std::unique_ptr<core::DataStorage> data_storage{static_cast<core::DataStorage*>(new D())};
-    REQUIRE(data_storage->connect(spider::test::cStorageUrl).success());
-    REQUIRE(data_storage->initialize().success());
+    auto metadata_storage = create_metadata_storage<M>();
+    auto data_storage = create_data_storage<D>();
     return std::make_tuple(std::move(metadata_storage), std::move(data_storage));
 }
src/spider/storage/MetadataStorage.hpp (2)

15-19: LGTM: Appropriate ownership semantics established

The explicit deletion of copy/move operations enforces single ownership, which is crucial for storage interfaces managing unique resources like database connections.

This design choice prevents accidental resource sharing and potential connection management issues in a distributed system.


Line range hint 1-38: Consider adding interface documentation

While the interface is well-designed, it would benefit from:

  • Class-level documentation explaining the storage abstraction
  • Method-level documentation describing error conditions and threading guarantees

Would you like me to provide a template for the documentation?

src/spider/core/TaskGraph.hpp (2)

Line range hint 6-8: Clean up unused boost::uuid headers

The boost::uuid headers are only used for Task IDs now, but not for TaskGraph. Consider moving these includes to Task.hpp if they're not needed here.

-#include <boost/uuid/random_generator.hpp>
-#include <boost/uuid/uuid.hpp>

Line range hint 1-105: Consider documenting the TaskGraph lifecycle

With the removal of TaskGraph identification, it would be valuable to document:

  1. The expected lifecycle of TaskGraph instances
  2. How TaskGraph instances interact with the data storage layer
  3. Whether multiple TaskGraph instances can exist simultaneously

This will help maintain architectural clarity as the system evolves.

src/spider/storage/MysqlStorage.cpp (3)

174-177: Consider using reinterpret_cast instead of C-style cast.

While the C-style cast is marked with NOLINT directives, using reinterpret_cast would be more idiomatic in C++ and makes the casting intention more explicit.

-    return {(char const*)id.data(), id.size()};
+    return {reinterpret_cast<char const*>(id.data()), id.size()};

Line range hint 525-585: Consider adding indexes to optimize task graph retrieval.

The get_task_graph method performs multiple JOIN operations that could benefit from proper indexing. While there are some indexes defined in the table creation statements, additional indexes might improve query performance, especially for large task graphs.

Consider adding the following indexes:

  1. Composite index on tasks(job_id, id) to optimize the initial task selection
  2. Composite index on task_inputs(task_id, position) to optimize input retrieval
  3. Composite index on task_outputs(task_id, position) to optimize output retrieval

Line range hint 1047-1087: Reduce code duplication in data retrieval methods.

The locality retrieval logic is duplicated between get_data and get_data_by_key. Consider extracting this common functionality into a private helper method.

Create a private helper method:

private:
    auto fetch_locality(boost::uuids::uuid const& id, Data* data) -> void {
        sql::bytes id_bytes = uuid_get_bytes(id);
        std::unique_ptr<sql::PreparedStatement> locality_statement(
                m_conn->prepareStatement("SELECT `address` FROM `data_locality` WHERE `id` = ?")
        );
        locality_statement->setBytes(1, &id_bytes);
        std::unique_ptr<sql::ResultSet> const locality_res(locality_statement->executeQuery());
        std::vector<std::string> locality;
        while (locality_res->next()) {
            locality.emplace_back(locality_res->getString(1));
        }
        if (!locality.empty()) {
            data->set_locality(locality);
        }
    }

Then update both methods to use this helper:

// In get_data and get_data_by_key after setting data properties
fetch_locality(id, data);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 52a1e17 and e9e5e40.

📒 Files selected for processing (6)
  • src/spider/core/TaskGraph.hpp (1 hunks)
  • src/spider/storage/MetadataStorage.hpp (2 hunks)
  • src/spider/storage/MysqlStorage.cpp (41 hunks)
  • src/spider/storage/MysqlStorage.hpp (2 hunks)
  • tests/storage/StorageTestHelper.hpp (1 hunks)
  • tests/storage/test-DataStorage.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

tests/storage/test-DataStorage.cpp

[error] 41-41: syntax error

(syntaxError)

🔇 Additional comments (9)
src/spider/storage/MetadataStorage.hpp (1)

30-38: LGTM: Enhanced job management with proper ownership

The introduction of client_id and job-centric operations improves the data model by establishing clear ownership and hierarchical relationships.

Let's verify consistent UUID usage across the codebase:

✅ Verification successful

UUID usage is consistent across the codebase

The verification confirms consistent usage of boost::uuids::uuid throughout the codebase:

  • All UUID declarations use the same type boost::uuids::uuid
  • Proper boost UUID headers are included where needed
  • UUID generation is consistently handled using boost::uuids::random_generator
  • The hierarchical relationship (client -> job -> task) maintains consistent UUID usage for IDs

The changes to the job management interface maintain type consistency with the rest of the system.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent UUID type usage across the codebase
# Expect: All UUIDs should use boost::uuids::uuid type

# Search for UUID parameter declarations
rg -t cpp -t hpp "uuid|UUID" -A 2 -B 2

# Search specifically for boost UUID includes
rg -t cpp -t hpp "boost/uuid"

Length of output: 126


Script:

#!/bin/bash
# Let's try with correct file extensions and more specific patterns
# Search for UUID usage patterns
rg -t h -t cpp -t c -t hpp "boost::uuids::uuid" -A 2 -B 2

# Search for boost UUID includes
rg -t h -t cpp -t c -t hpp "#include.*boost.*uuid" 

# Search for any UUID-related declarations
rg -t h -t cpp -t c -t hpp "uuid|UUID" -A 2 -B 2 --ignore-case

Length of output: 259


Script:

#!/bin/bash
# Let's try without file type restrictions and use fd to find relevant files first
# Find all C++ related files
fd -e cpp -e h -e c --type f

# Then use these commands on all files
rg "boost::uuids::uuid" 
rg "#include.*boost.*uuid"
rg "class.*UUID|struct.*UUID|uuid_t|UUID|uuid" --ignore-case

Length of output: 34867

src/spider/core/TaskGraph.hpp (1)

18-18: Verify the impact of removing TaskGraph identification

The change from a UUID-generating constructor to a default constructor removes the ability to uniquely identify TaskGraph instances. This could impact distributed processing or persistence scenarios.

Let's verify if TaskGraph identification is used elsewhere in the codebase:

src/spider/storage/MysqlStorage.hpp (5)

22-27: LGTM: Proper implementation of special member functions.

The class correctly implements the Rule of Five with appropriate defaulting and deletion of special member functions. This enforces proper resource management and prevents unintended copying.


68-73: LGTM: Consistent implementation of special member functions.

The special member functions mirror the implementation in MySqlMetadataStorage, maintaining consistency across the codebase.


66-66: Previous comment about MySQL capitalization inconsistency is still applicable.

Please refer to the previous review comment about standardizing MySQL capitalization in file names and class names.


79-80: Consider adding transaction support for data operations.

The new methods enhance the data management capabilities. However, consider wrapping these operations in transactions to ensure atomicity, especially when used in conjunction with reference management methods.

#!/bin/bash
# Description: Check for transaction handling in storage operations
# Test: Look for transaction-related calls in implementation
rg "BEGIN|COMMIT|ROLLBACK" --type cpp

34-42: Verify job management changes across the codebase.

The renaming from task_graph to job and the addition of client_id parameter represent significant interface changes.

✅ Verification successful

Let me search for these patterns to verify the changes properly. The previous search didn't provide enough context to determine if the interface changes were properly implemented across the codebase.


Interface changes have been properly implemented across the codebase

The search results confirm that:

  • No occurrences of old method names (add_task_graph or get_task_graphs) were found
  • The new add_job interface with client_id parameter is consistently defined in the base MetadataStorage class and implemented in MySqlMetadataStorage
  • The only caller found in tests is already using the new interface correctly with both job_id and client_id parameters
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent usage of new job management methods
# Test: Search for old method names to ensure complete migration
rg "add_task_graph|get_task_graphs" --type cpp

# Test: Verify all callers have been updated with client_id
rg "add_job.*client_id.*task_graph" --type cpp

Length of output: 932


Script:

#!/bin/bash
# Search for old method names with better context
rg "add_task_graph|get_task_graphs" -B 2 -A 2 --type cpp

# Search for add_job method calls
rg "add_job\(" -B 2 -A 2 --type cpp

# Search for Storage interface definition
fd "Storage.hpp" --type f --exec cat {}

Length of output: 10706

src/spider/storage/MysqlStorage.cpp (1)

123-131: Consider the implications of UNIQUE KEY on a nullable column.

The key column in the data table is nullable and has a UNIQUE KEY constraint. In MySQL/MariaDB, NULL values are considered distinct in UNIQUE indexes, allowing multiple NULL values. This might lead to unexpected behaviour if your application logic assumes uniqueness across all values including NULL.

Consider one of these approaches:

  1. Make the key column NOT NULL if uniqueness is required for all records
  2. Add a CHECK constraint to ensure either all keys are unique or NULL
  3. Handle NULL keys explicitly in the application logic
✅ Verification successful

UNIQUE KEY constraint on nullable column is properly handled

The code shows proper handling of NULL values for the key column:

  • The Data class uses std::optional<std::string> for the key field
  • In MysqlStorage.cpp, NULL keys are explicitly handled during insertion:
    if (key.has_value()) {
        statement->setString(2, key.value());
    } else {
        statement->setNull(2, sql::DataType::VARCHAR);
    }

The design appears intentional, allowing both unique non-NULL keys and multiple NULL keys, which is a valid pattern for optional unique identifiers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are any NULL keys in the data table
# Test: Search for NULL handling in the codebase
rg -A 5 'isNull.*key|key.*isNull'

Length of output: 35


Script:

#!/bin/bash
# Let's check how the key column is used in SQL queries and data operations
rg -A 5 "INSERT.*data.*key|SELECT.*data.*key|UPDATE.*data.*key"

# Also check for any key-related method implementations
ast-grep --pattern 'key($$$)'

Length of output: 658


Script:

#!/bin/bash
# Let's examine the actual usage of the data table in the code
rg -A 10 "CREATE TABLE.*data"

Length of output: 3129


Script:

#!/bin/bash
# Check for any key validation or handling in the storage classes
ast-grep --pattern 'class MySqlDataStorage {
  $$$
}'

Length of output: 55


Script:

#!/bin/bash
# Let's check how the data object is handled in the code
rg -A 10 "get_key|set_key" 

# Also check for any NULL handling in SQL statements
rg -A 5 "WHERE.*key.*NULL|key.*IS.*NULL"

# Look for data class definition
ast-grep --pattern 'class Data {
  $$$
}'

Length of output: 2252

tests/storage/test-DataStorage.cpp (1)

74-74: ⚠️ Potential issue

Adjust macro syntax for std::tuple in TEMPLATE_TEST_CASE

When using std::tuple in the TEMPLATE_TEST_CASE macro, ensure that the syntax is correct to prevent compilation errors. The closing parenthesis ) should be placed correctly.

Apply the following diff to adjust the syntax:

-TEMPLATE_TEST_CASE(
-        "Add and remove data reference for task",
-        "[storage]",
-        (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-) {
+TEMPLATE_TEST_CASE(
+    "Add and remove data reference for task",
+    "[storage]",
+    (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)) {

Make sure the extra closing parenthesis is added after the template parameter list.

Likely invalid or redundant comment.

Comment on lines 16 to 21
auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
std::unique_ptr<core::DataStorage> storage{static_cast<core::DataStorage*>(new T())};
REQUIRE(storage->connect(spider::test::cStorageUrl).success());
REQUIRE(storage->initialize().success());
return storage;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use std::make_unique instead of raw new

Replace raw new operations with std::make_unique for better exception safety and cleaner code.

Example refactor for create_data_storage:

 auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
-    std::unique_ptr<core::DataStorage> storage{static_cast<core::DataStorage*>(new T())};
+    auto storage = std::make_unique<T>();
     REQUIRE(storage->connect(spider::test::cStorageUrl).success());
     REQUIRE(storage->initialize().success());
     return storage;
 }

Apply similar changes to create_metadata_storage.

Also applies to: 25-30

Comment on lines 1 to 2
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
#define SPIDER_TEST_STORAGETETHELPER_HPP
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in header guard macro name

The header guard macro contains a typo: STORAGETETHELPER should be STORAGETESTHELPER.

Apply this diff:

-#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
-#define SPIDER_TEST_STORAGETETHELPER_HPP
+#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP
+#define SPIDER_TEST_STORAGETESTHELPER_HPP
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
#define SPIDER_TEST_STORAGETETHELPER_HPP
#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP
#define SPIDER_TEST_STORAGETESTHELPER_HPP

Comment on lines +11 to +12
char const* const cStorageUrl
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Move database credentials to environment variables

Hardcoding database credentials in source code is a security risk and reduces test maintainability. Consider using environment variables or a test configuration file.

Suggested approach:

-char const* const cStorageUrl
-        = "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
+char const* const cStorageUrl = []() {
+    auto* url = std::getenv("SPIDER_TEST_DB_URL");
+    if (url == nullptr) {
+        throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
+    }
+    return url;
+}();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
char const* const cStorageUrl
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
char const* const cStorageUrl = []() {
auto* url = std::getenv("SPIDER_TEST_DB_URL");
if (url == nullptr) {
throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
}
return url;
}();

Comment on lines +103 to +107
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
) {
auto [metadata_storage, data_storage] = spider::test::
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct syntax error in TEMPLATE_TEST_CASE declaration

There is a syntax error in the TEMPLATE_TEST_CASE starting at line 103. The macro usage might be incorrect due to misplaced parentheses.

Apply the following diff to fix the syntax:

-TEMPLATE_TEST_CASE(
-        "Add and remove data reference for driver",
-        "[storage]",
-        (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-) {
+TEMPLATE_TEST_CASE(
+    "Add and remove data reference for driver",
+    "[storage]",
+    (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)) {

Ensure that all parentheses are correctly balanced.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
) {
auto [metadata_storage, data_storage] = spider::test::
TEMPLATE_TEST_CASE(
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)) {
auto [metadata_storage, data_storage] = spider::test::

Comment on lines +41 to +45
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix syntax error in TEMPLATE_TEST_CASE macro usage

There is a syntax error at line 41 due to the incorrect use of the TEMPLATE_TEST_CASE macro. The macro parameters should not include a line break between the parameters and the opening brace {.

Apply the following diff to correct the syntax:

-TEMPLATE_TEST_CASE(
-        "Add, get and remove data with key",
-        "[storage]",
-        spider::core::MySqlDataStorage
-) {
+TEMPLATE_TEST_CASE(
+    "Add, get and remove data with key",
+    "[storage]",
+    spider::core::MySqlDataStorage) {

Ensure that the closing parenthesis ) is directly before the opening brace {.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
) {
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage) {
🧰 Tools
🪛 cppcheck

[error] 41-41: syntax error

(syntaxError)

Comment on lines 117 to 119
std::cout << metadata_storage->add_driver(driver_id, "127.0.0.1").description << std::endl;
// REQUIRE(metadata_storage->add_driver(driver_id, "127.0.0.1").success());

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Replace std::cout with REQUIRE to assert operation success

Using std::cout to output error descriptions in test cases is not appropriate. Instead, you should assert the success of add_driver using REQUIRE to ensure the test behaves correctly.

Apply the following diff to correct this:

-std::cout << metadata_storage->add_driver(driver_id, "127.0.0.1").description << std::endl;
-// REQUIRE(metadata_storage->add_driver(driver_id, "127.0.0.1").success());
+auto result = metadata_storage->add_driver(driver_id, "127.0.0.1");
+REQUIRE(result.success());

This change assigns the result to a variable and then checks for success using REQUIRE.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::cout << metadata_storage->add_driver(driver_id, "127.0.0.1").description << std::endl;
// REQUIRE(metadata_storage->add_driver(driver_id, "127.0.0.1").success());
auto result = metadata_storage->add_driver(driver_id, "127.0.0.1");
REQUIRE(result.success());

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (10)
tests/storage/StorageTestHelper.hpp (2)

18-49: Reduce code duplication in storage creation functions

The storage creation logic is duplicated across functions. Consider extracting the common connection and initialization pattern into a helper function.

+template <typename T>
+requires std::derived_from<T, core::DataStorage> || std::derived_from<T, core::MetadataStorage>
+auto create_and_init_storage() -> std::unique_ptr<T> {
+    std::unique_ptr<T> storage{new T()};
+    REQUIRE(storage->connect(spider::test::cStorageUrl).success());
+    REQUIRE(storage->initialize().success());
+    return storage;
+}
+
 template <class T>
 requires std::derived_from<T, core::DataStorage>
 auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
-    std::unique_ptr<core::DataStorage> storage{static_cast<core::DataStorage*>(new T())};
-    REQUIRE(storage->connect(spider::test::cStorageUrl).success());
-    REQUIRE(storage->initialize().success());
-    return storage;
+    return create_and_init_storage<T>();
 }

3-3: Document reasons for lint suppressions

Please add comments explaining why each lint rule is being suppressed:

  • cert-err58-cpp
  • cppcoreguidelines-avoid-do-while
  • readability-function-cognitive-complexity

This helps maintainers understand if these suppressions are still necessary in the future.

Also applies to: 53-53

src/spider/storage/MetadataStorage.hpp (1)

15-19: Good RAII design with explicit copy/move semantics!

The explicit deletion of copy/move operations is appropriate for a storage interface. This prevents accidental copying of storage handles and enforces proper resource management.

Consider documenting these design decisions in a comment block above the class declaration to explain the rationale behind making the class non-copyable/movable.

src/spider/core/TaskGraph.hpp (1)

Line range hint 1-108: Consider adding documentation for task lifecycle management.

The class would benefit from documentation explaining:

  • The ownership model for tasks
  • The lifecycle of tasks within the graph
  • Constraints on task dependencies (e.g., cycles)

Add class-level documentation like this:

 namespace spider::core {
+/**
+ * @brief Manages a directed acyclic graph of tasks and their dependencies.
+ *
+ * TaskGraph maintains relationships between tasks without managing their unique
+ * identifiers. Tasks are stored in a flat map and their dependencies are tracked
+ * separately. The graph structure enforces parent-child relationships while
+ * preventing cycles.
+ */
 class TaskGraph {
src/spider/storage/MysqlStorage.hpp (1)

22-27: LGTM! Consider adding class documentation.

The special member functions follow the Rule of Five correctly, preventing accidental copying or moving of the storage object. This is appropriate for a database connection wrapper.

Consider adding a class-level documentation comment explaining the purpose and lifecycle of MySqlMetadataStorage.

src/spider/storage/MysqlStorage.cpp (5)

123-131: Consider adding an index on the value column.

The data table has a UNIQUE KEY on the key column but no index on the value column. If there are queries that search by value, consider adding an index to improve query performance.


525-528: Consider consistent error handling.

The error handling in get_task_graph checks for empty result set after executing the query, but it would be more consistent to check if the job exists before fetching its tasks.

Consider adding a job existence check:

+        // Check if job exists
+        std::unique_ptr<sql::PreparedStatement> job_statement(
+                m_conn->prepareStatement("SELECT 1 FROM `jobs` WHERE `id` = ?")
+        );
+        job_statement->setBytes(1, &id_bytes);
+        std::unique_ptr<sql::ResultSet> const job_res(job_statement->executeQuery());
+        if (job_res->rowsCount() == 0) {
+            m_conn->commit();
+            return StorageErr{
+                    StorageErrType::KeyNotFoundErr,
+                    fmt::format("no job with id %s", boost::uuids::to_string(id))
+            };
+        }
+
         std::unique_ptr<sql::PreparedStatement> task_statement(

106-112: Consider adding a composite index for task dependencies.

The task_dependencies table has separate indexes on parent and child columns. For queries that join on both columns, a composite index might be more efficient.

Consider adding a composite index:

     `parent` BINARY(16) NOT NULL,
     `child` BINARY(16) NOT NULL,
-    KEY (`parent`) USING BTREE,
-    KEY (`child`) USING BTREE,
+    KEY (`parent`, `child`) USING BTREE,
     CONSTRAINT `task_dep_parent` FOREIGN KEY (`parent`) REFERENCES `tasks` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE,

951-951: Fix typo in comment.

There's a typo in the comment: "voilated" should be "violated".

-        // Need to initialize metadata storage first so that foreign constraint is not voilated
+        // Need to initialize metadata storage first so that foreign constraint is not violated

Line range hint 1047-1087: Consider using a transaction with SELECT FOR UPDATE.

The get_data_by_key method might be susceptible to race conditions if multiple transactions attempt to read and modify the same data concurrently. Consider using SELECT ... FOR UPDATE to ensure data consistency.

         std::unique_ptr<sql::PreparedStatement> statement(
-                m_conn->prepareStatement("SELECT `id`, `key`, `value`, `hard_locality` "
-                                         "FROM `data` WHERE `key` = ?")
+                m_conn->prepareStatement("SELECT `id`, `key`, `value`, `hard_locality` "
+                                         "FROM `data` WHERE `key` = ? FOR UPDATE")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 52a1e17 and 1f1445b.

📒 Files selected for processing (6)
  • src/spider/core/TaskGraph.hpp (1 hunks)
  • src/spider/storage/MetadataStorage.hpp (2 hunks)
  • src/spider/storage/MysqlStorage.cpp (41 hunks)
  • src/spider/storage/MysqlStorage.hpp (2 hunks)
  • tests/storage/StorageTestHelper.hpp (1 hunks)
  • tests/storage/test-DataStorage.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp

[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

tests/storage/test-DataStorage.cpp

[error] 45-45: syntax error

(syntaxError)

🔇 Additional comments (10)
tests/storage/StorageTestHelper.hpp (1)

1-54: Verify CMake integration

Let's verify this header is properly included in the test build configuration.

✅ Verification successful

CMake integration is properly configured

The header file is correctly included in the test build:

  • Listed in SPIDER_TEST_SOURCES in tests/CMakeLists.txt
  • Actively used by test-DataStorage.cpp which is also part of the build
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the file is included in CMake configuration
rg -l "StorageTestHelper\.hpp" CMakeLists.txt tests/CMakeLists.txt

# Check which test files are using this helper
rg -l "StorageTestHelper\.hpp" tests/

Length of output: 185


Script:

#!/bin/bash
# Check how StorageTestHelper.hpp is included in CMakeLists.txt
rg "StorageTestHelper\.hpp" tests/CMakeLists.txt -C 5

# Check if test-DataStorage.cpp is included in the build
rg "test-DataStorage\.cpp" tests/CMakeLists.txt

Length of output: 310

src/spider/storage/MetadataStorage.hpp (1)

30-38: Verify implementations of the new job-based interface

The shift from task-graph to job-based model with client tracking is a significant architectural change. This requires careful verification of all implementing classes.

Let's verify the implementations:

✅ Verification successful

Let me gather more information about the implementation details in the MySQL storage class.


Implementation verification completed - all methods properly implemented

The MySqlMetadataStorage class has correctly implemented all the required methods from the interface:

  • add_job: Properly handles job and client IDs with appropriate SQL statements
  • get_task_graph: Implements task graph retrieval with proper error handling
  • get_jobs_by_client_id: Correctly queries jobs using client ID
  • remove_job: Implements job removal with proper transaction handling
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for classes implementing MetadataStorage
echo "Checking for MetadataStorage implementations..."
rg "class.*:.*public.*MetadataStorage"

# Verify if all pure virtual methods are implemented
echo "Checking for add_job implementations..."
ast-grep --pattern 'auto add_job(boost::uuids::uuid job_id, boost::uuids::uuid client_id, TaskGraph const& task_graph)'

echo "Checking for get_jobs_by_client_id implementations..."
ast-grep --pattern 'auto get_jobs_by_client_id(boost::uuids::uuid client_id, std::vector<boost::uuids::uuid>* job_ids)'

Length of output: 685


Script:

#!/bin/bash
# Check the implementation details in MysqlStorage
echo "Checking MySQL implementation details..."
rg -A 10 "auto MySqlMetadataStorage::(add_job|get_task_graph|get_jobs_by_client_id|remove_job)" src/spider/storage/MysqlStorage.cpp

# Look for any old task-graph related methods that might need cleanup
echo "Checking for old task-graph methods..."
rg "task_graph" src/spider/storage/MysqlStorage.hpp

Length of output: 2627

src/spider/core/TaskGraph.hpp (2)

17-17: LGTM: Constructor simplification aligns with modern C++ practices.

The change to use = default removes unnecessary initialization and improves clarity.


17-17: Verify the impact of removing TaskGraph ID management.

The removal of the ID management functionality (m_id and get_id()) might affect code that previously relied on these features.

✅ Verification successful

The removal of TaskGraph ID management is safe

The search results show that TaskGraph itself doesn't expose or manage any ID functionality. Instead, the storage layer (MysqlStorage) handles all ID management externally, using UUIDs for jobs, clients, and tasks. The TaskGraph class is only used as a data structure to store and manage tasks and their relationships.

Key findings:

  • No direct usage of get_id() method was found in the codebase
  • All UUID handling is done at the storage layer, not within TaskGraph
  • TaskGraph is used primarily as a container for tasks and their relationships
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to get_id() method of TaskGraph
# to ensure all callers have been updated

# Look for direct method calls
rg -A 2 "TaskGraph.*get_id\(\)" 

# Look for stored TaskGraph IDs
rg -A 2 "TaskGraph.*uuid" 

Length of output: 61


Script:

#!/bin/bash
# Let's try a broader search for any references to TaskGraph and ID/id
rg -A 2 "TaskGraph.*[iI][dD]"

# Also search for any files that might be using TaskGraph
fd -e hpp -e cpp -e h -e c | xargs rg "TaskGraph" -l

# Look for any potential test files related to TaskGraph
fd -e test.cpp -e Test.cpp -e test.hpp -e Test.hpp | xargs rg "TaskGraph" -A 2

Length of output: 4299

src/spider/storage/MysqlStorage.hpp (4)

68-73: LGTM! Special member functions are properly defined.

The implementation of special member functions follows best practices for resource management.


66-66: Duplicate comment: MySQL capitalization inconsistency.

Referring to the previous review comment about MySQL capitalization inconsistency in filenames vs class names.


79-80: New methods enhance the storage interface.

The addition of get_data_by_key and remove_data methods completes the CRUD operations set, providing more flexibility in data management.

Let's verify the implementation of these methods:

#!/bin/bash
# Check for consistent error handling in the implementation
echo "Checking error handling patterns..."
rg "get_data_by_key|remove_data.*StorageErr" --type cpp -A 5

34-42: API changes look good but verify impact.

The renaming from task_graph to job terminology (e.g., add_task_graph → add_job) and the addition of client_id parameter improve the API clarity.

Let's verify the impact of these API changes:

✅ Verification successful

Let me gather more information about the API changes by searching for the old task_graph related methods.


API changes are consistent and properly implemented

The codebase shows a well-coordinated transition from task_graph to job terminology. The old add_task_graph method is completely removed, and the new add_job method is consistently used across the codebase. The TaskGraph type is still used internally as a parameter and return type, which is correct since it represents the graph structure of tasks within a job.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for uses of old method names to ensure all calls are updated
echo "Checking for old method calls..."
rg "add_task_graph|get_task_graphs" --type cpp

# Search for new method usage patterns
echo "Checking new method usage..."
rg "add_job|get_jobs_by_client_id" --type cpp

Length of output: 4330


Script:

#!/bin/bash
# Search for old task_graph method declarations and implementations
ast-grep --pattern 'add_task_graph($_) -> StorageErr'
ast-grep --pattern 'get_task_graphs($_) -> StorageErr'

# Search for any remaining task_graph references
rg "task_graph" --type cpp -A 2

Length of output: 5017

src/spider/storage/MysqlStorage.cpp (1)

159-171: LGTM: Table creation order handles foreign key dependencies correctly.

The order of table creation in cCreateStorage array properly handles foreign key dependencies:

  1. drivers before data_ref_driver
  2. jobs before tasks
  3. tasks before data_ref_task
  4. data before task_outputs
  5. task_outputs before task_inputs
tests/storage/test-DataStorage.cpp (1)

20-43: Test case "Add, get and remove data" is well-implemented

The test case accurately verifies the basic data storage operations, and the code is correct.

Comment on lines 1 to 2
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
#define SPIDER_TEST_STORAGETETHELPER_HPP
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in header guard macro name

The header guard contains a typo: "STORAGETET" should be "STORAGETESTHELPER"

-#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
-#define SPIDER_TEST_STORAGETETHELPER_HPP
+#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP
+#define SPIDER_TEST_STORAGETESTHELPER_HPP
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
#define SPIDER_TEST_STORAGETETHELPER_HPP
#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP
#define SPIDER_TEST_STORAGETESTHELPER_HPP

Comment on lines +15 to +16
char const* const cStorageUrl
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Security: Move database credentials to environment variables

Storing database credentials in source code is a security risk. Consider:

  1. Moving credentials to environment variables
  2. Using a non-root test user with minimal privileges
  3. Using a strong password
-char const* const cStorageUrl
-        = "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
+char const* const cStorageUrl = []() {
+    auto const* url = std::getenv("SPIDER_TEST_DB_URL");
+    if (url == nullptr) {
+        throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
+    }
+    return url;
+}();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
char const* const cStorageUrl
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
char const* const cStorageUrl = []() {
auto const* url = std::getenv("SPIDER_TEST_DB_URL");
if (url == nullptr) {
throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
}
return url;
}();

Comment on lines +106 to +111
TEMPLATE_TEST_CASE(
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
) {
auto [metadata_storage, data_storage] = spider::test::
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix syntax error in TEMPLATE_TEST_CASE macro

An extra closing parenthesis at line 111 is causing a syntax error. Please adjust the macro to fix the issue.

Apply this diff:

 TEMPLATE_TEST_CASE(
         "Add and remove data reference for driver",
         "[storage]",
         (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-)
-) {
+){
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
TEMPLATE_TEST_CASE(
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
) {
auto [metadata_storage, data_storage] = spider::test::
TEMPLATE_TEST_CASE(
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
){
auto [metadata_storage, data_storage] = spider::test::

Comment on lines +74 to +79
TEMPLATE_TEST_CASE(
"Add and remove data reference for task",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
) {
auto [metadata_storage, data_storage] = spider::test::
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix syntax error in TEMPLATE_TEST_CASE macro

An extra closing parenthesis at line 79 is causing a syntax error. Correct the macro to ensure proper syntax.

Apply this diff:

 TEMPLATE_TEST_CASE(
         "Add and remove data reference for task",
         "[storage]",
         (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-)
-) {
+){
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
TEMPLATE_TEST_CASE(
"Add and remove data reference for task",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
) {
auto [metadata_storage, data_storage] = spider::test::
TEMPLATE_TEST_CASE(
"Add and remove data reference for task",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
){
auto [metadata_storage, data_storage] = spider::test::

Comment on lines +45 to +50
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
) {
std::unique_ptr<spider::core::DataStorage> storage
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix syntax error due to extra closing parenthesis

There is an extra closing parenthesis at line 50, causing a syntax error in the TEMPLATE_TEST_CASE macro. Ensure that the parentheses and braces are correctly matched.

Apply this diff to fix the syntax error:

 TEMPLATE_TEST_CASE(
         "Add, get and remove data with key",
         "[storage]",
         spider::core::MySqlDataStorage
-)
-) {
+){
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
) {
std::unique_ptr<spider::core::DataStorage> storage
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
){
std::unique_ptr<spider::core::DataStorage> storage
🧰 Tools
🪛 cppcheck

[error] 45-45: syntax error

(syntaxError)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
tests/storage/StorageTestHelper.hpp (2)

36-47: Refactor Suggestion: Reduce Code Duplication in Storage Creation

The connection and initialization logic is duplicated across all three functions. Consider extracting this common pattern into a helper function.

Here's a suggested refactor:

template <typename T>
auto connect_and_initialize(std::unique_ptr<T>& storage) {
    REQUIRE(storage->connect(spider::test::cStorageUrl).success());
    REQUIRE(storage->initialize().success());
}

template <class M, class D>
requires std::derived_from<M, core::MetadataStorage> && std::derived_from<D, core::DataStorage>
auto create_storage() {
    auto metadata_storage = std::make_unique<M>();
    auto data_storage = std::make_unique<D>();
    
    connect_and_initialize(metadata_storage);
    connect_and_initialize(data_storage);
    
    return std::make_tuple(std::move(metadata_storage), std::move(data_storage));
}

20-25: Consider Adding Error Context to Storage Creation

The REQUIRE statements could benefit from more descriptive error messages to aid in debugging test failures.

Example improvement:

 template <class T>
 requires std::derived_from<T, core::DataStorage>
 auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
     std::unique_ptr<core::DataStorage> storage = std::make_unique<T>();
-    REQUIRE(storage->connect(spider::test::cStorageUrl).success());
-    REQUIRE(storage->initialize().success());
+    REQUIRE(storage->connect(spider::test::cStorageUrl).success(), 
+            "Failed to connect to storage at " << spider::test::cStorageUrl);
+    REQUIRE(storage->initialize().success(),
+            "Failed to initialize storage");
     return storage;
 }

Also applies to: 29-34

tests/storage/test-DataStorage.cpp (3)

20-43: Consider enhancing test data and RAII usage.

The test case effectively covers CRUD operations and error conditions. Consider these improvements:

  1. Use const auto for better type deduction:
-    std::unique_ptr<spider::core::DataStorage> storage
+    const auto storage
  1. Use more meaningful test data to improve test documentation:
-    spider::core::Data const data{"value"};
+    spider::core::Data const data{"test_payload_1"};  // Descriptive test data

133-134: Fix incorrect comment.

The comment refers to "task reference" but this is actually removing a driver reference.

-    // Remove task reference
+    // Remove driver reference

130-131: Fix incorrect comment.

The comment refers to "task reference" but this is actually adding a driver reference.

-    // Add task reference
+    // Add driver reference
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1f1445b and 67f108a.

📒 Files selected for processing (2)
  • tests/storage/StorageTestHelper.hpp (1 hunks)
  • tests/storage/test-DataStorage.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck
tests/storage/test-DataStorage.cpp

[error] 45-45: syntax error

(syntaxError)

🔇 Additional comments (4)
tests/storage/StorageTestHelper.hpp (1)

15-16: ⚠️ Potential issue

Security Risk: Database Credentials Should Not Be Hardcoded

Storing database credentials in source code poses several security risks:

  1. Credentials are exposed in version control
  2. Using root access for tests is unnecessarily privileged
  3. The password 'password' is extremely weak

Please implement this more secure approach:

-char const* const cStorageUrl
-        = "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
+char const* const cStorageUrl = []() {
+    auto const* url = std::getenv("SPIDER_TEST_DB_URL");
+    if (url == nullptr) {
+        throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
+    }
+    return url;
+}();

Also, consider:

  1. Creating a dedicated test user with minimal privileges
  2. Using environment-specific configuration for different CI/CD stages
  3. Adding setup instructions to the README

Likely invalid or redundant comment.

tests/storage/test-DataStorage.cpp (3)

1-17: LGTM! Well-organized file structure.

The includes are properly organized, and the NOLINT pragmas are appropriately scoped to specific rules.


45-72: Extract common test setup and verification logic.

This test case shares similar setup and verification patterns with the first test case. Consider extracting common logic into test helpers:

namespace {
    template<typename T>
    auto create_and_verify_data(std::unique_ptr<T>& storage, 
                               std::string_view key = "") {
        spider::core::Data data{key.empty() ? "test_data" : 
                                            std::string(key), "test_value"};
        REQUIRE(storage->add_data(data).success());
        return data;
    }
}
🧰 Tools
🪛 cppcheck

[error] 45-45: syntax error

(syntaxError)


1-138: Overall excellent test coverage and implementation.

The test suite provides comprehensive coverage of data storage operations, including:

  • Basic CRUD operations
  • Key-based operations
  • Task and driver reference management
  • Error conditions and edge cases

While there are some minor improvements suggested above, the implementation is solid and follows testing best practices.

🧰 Tools
🪛 cppcheck

[error] 45-45: syntax error

(syntaxError)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
tests/storage/test-DataStorage.cpp (2)

28-31: Enhance error scenario coverage.

While basic error cases are covered, consider adding these additional scenarios:

  1. Adding data with invalid input (empty values, special characters)
  2. Concurrent access scenarios
  3. Resource cleanup after failures
  4. Edge cases for reference management

Would you like assistance in implementing these additional test cases?

Also applies to: 58-59, 83-84, 93-94, 123-124


1-138: Improve test suite organization and documentation.

Consider these enhancements to improve maintainability:

  1. Add a test suite description using Catch2's SCENARIO macro
  2. Group related test cases using Catch2's TEST_SUITE macro
  3. Add documentation explaining the test strategy and coverage goals

Example structure:

TEST_SUITE("DataStorage") {
    SCENARIO("Data operations with MySQL storage") {
        // Existing test cases here
    }
}
🧰 Tools
🪛 cppcheck

[error] 45-45: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 67f108a and 4322f85.

📒 Files selected for processing (1)
  • tests/storage/test-DataStorage.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck
tests/storage/test-DataStorage.cpp

[error] 45-45: syntax error

(syntaxError)

🔇 Additional comments (1)
tests/storage/test-DataStorage.cpp (1)

1-17: Well-organized includes and appropriate NOLINT usage.

The file structure demonstrates good organization with logical grouping of includes and specific NOLINT pragmas.

Comment on lines +20 to +43
TEMPLATE_TEST_CASE("Add, get and remove data", "[storage]", spider::core::MySqlDataStorage) {
std::unique_ptr<spider::core::DataStorage> storage
= spider::test::create_data_storage<TestType>();

// Add data
spider::core::Data const data{"value"};
REQUIRE(storage->add_data(data).success());

// Add data with same id again should fail
spider::core::Data const data_same_id{data.get_id(), "value2"};
REQUIRE(spider::core::StorageErrType::DuplicateKeyErr == storage->add_data(data_same_id).type);

// Get data should match
spider::core::Data result{"temp"};
REQUIRE(storage->get_data(data.get_id(), &result).success());
REQUIRE(spider::core::data_equal(data, result));

// Remove data should succeed
REQUIRE(storage->remove_data(data.get_id()).success());

// Get data should fail
REQUIRE(spider::core::StorageErrType::KeyNotFoundErr
== storage->get_data(data.get_id(), &result).type);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Test cases need structural improvements.

While the test coverage is comprehensive, there are several areas for improvement:

  1. The test cases contain duplicated setup code for storage creation
  2. Error scenarios could benefit from more descriptive failure messages
  3. The TEMPLATE_TEST_CASE macro formatting is inconsistent

Consider implementing these improvements:

  1. Extract common setup into a fixture:
template<typename T>
class StorageTestFixture {
protected:
    std::unique_ptr<spider::core::DataStorage> storage;
    
    void SetUp() {
        storage = spider::test::create_data_storage<T>();
    }
};
  1. Add descriptive messages to REQUIRE statements:
REQUIRE(storage->add_data(data).success(), 
        "Adding data with unique ID should succeed");
  1. Standardize TEMPLATE_TEST_CASE formatting:
TEMPLATE_TEST_CASE("Test name", "[storage]", TestType) {
    // Test body
}

Also applies to: 45-72, 74-104, 106-135

…based on it

libmariadbcpp.so depends on libmariadb.so, but does not set the rpath
correctly, so the built library cannot find libmariadb.so. Install
libmariadb using apt and brew and build libmariadbcpp.so with system
installed libmariadb.so.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
tools/scripts/lib_install/macOS/install-lib.sh (1)

13-14: Consider pinning package versions for reproducibility.

To ensure consistent development environments across the team, consider specifying versions for all packages. This helps prevent unexpected issues from package updates.

 brew install \
-  boost \
-  fmt \
-  spdlog \
-  mariadb-connector-c
+  [email protected] \
+  [email protected] \
+  [email protected] \
+  [email protected]
tools/scripts/lib_install/linux/install-dev.sh (2)

23-23: Add a space after the package name.

The addition of libmariadb-dev is appropriate for supporting MySQL-based storage tests. However, please add a space after the package name for consistency.

-    libmariadb-dev\
+    libmariadb-dev \

Line range hint 1-26: Consider organizing packages into logical groups.

To improve maintainability, consider grouping the packages by their purpose. For example:

  1. Build tools (cmake, gcc, g++, etc.)
  2. System libraries (libssl-dev, libcurl4, etc.)
  3. Database libraries (libmariadb-dev)
  4. Programming languages (python3 and related packages)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4322f85 and 47feb1e.

📒 Files selected for processing (4)
  • cmake/Modules/FindMariaDBClientCpp.cmake (1 hunks)
  • tools/scripts/lib_install/linux/install-dev.sh (1 hunks)
  • tools/scripts/lib_install/macOS/install-lib.sh (1 hunks)
  • tools/scripts/lib_install/mariadb-connector-cpp.sh (1 hunks)
🔇 Additional comments (3)
tools/scripts/lib_install/macOS/install-lib.sh (1)

13-14: LGTM! Addition of MariaDB connector aligns with testing requirements.

The addition of mariadb-connector-c is appropriate for supporting the new data storage tests.

tools/scripts/lib_install/mariadb-connector-cpp.sh (1)

65-65: Verify RPM layout compatibility with Debian-based systems

The script uses dpkg for package management but configures cmake with INSTALL_LAYOUT=RPM. This might cause inconsistencies in file locations.

Let's verify the impact of RPM layout on Debian systems:

cmake/Modules/FindMariaDBClientCpp.cmake (1)

107-107: Excellent consolidation of include directories!

The change improves code organization by combining the include directory paths into a single property declaration, which is the preferred approach in CMake. This modification maintains the same functionality while reducing code duplication.

Comment on lines +64 to +65
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM ..
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add directory existence check before cmake configuration

The script should verify that /usr/local exists and is writable before proceeding with the cmake configuration to prevent installation failures.

Add these checks before the cmake command:

 # Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually
+if [ ! -d "/usr/local" ]; then
+  ${privileged_command_prefix} mkdir -p /usr/local
+elif [ ! -w "/usr/local" ]; then
+  echo "Error: /usr/local is not writable"
+  exit 1
+fi
 cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM ..
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM ..
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually
if [ ! -d "/usr/local" ]; then
${privileged_command_prefix} mkdir -p /usr/local
elif [ ! -w "/usr/local" ]; then
echo "Error: /usr/local is not writable"
exit 1
fi
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM ..

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
tools/scripts/lib_install/linux/install-lib.sh (1)

10-15: LGTM! Consider adding error handling for sudo check.

The privilege checking mechanism is well-implemented. However, consider adding error handling in case the sudo check fails.

 if [ ${EUID:-$(id -u)} -ne 0 ] ; then
-  sudo echo "Script can elevate privileges."
+  if ! sudo echo "Script can elevate privileges."; then
+    echo "Error: This script requires sudo privileges to install packages."
+    exit 1
+  fi
   privileged_command_prefix="${privileged_command_prefix} sudo"
 fi
docs/tests.md (1)

16-24: Improve the build instructions clarity.

The build instructions should be more detailed and include prerequisites.

 ## Build and run unit tests
 
-To build the unit tests, run the following commands in project root directory.
+Prerequisites:
+- CMake version X.Y or higher
+- C++ compiler supporting C++17
+
+To build the unit tests, run the following commands in the project root directory:
 
 ```shell
+# Configure the build
 cmake -S . -B build
+
+# Build the unit test target
 cmake --build build --target unitTest --parallel
+
+# Run the tests
 ./build/tests/unitTest

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 LanguageTool</summary>

[uncategorized] ~18-~18: You might be missing the article “the” here.
Context: ...it tests, run the following commands in project root directory.  ```shell cmake -S . -B...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

</details>

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 47feb1ea5bdb541b8255a93a0235dcc14b869e56 and 0ae30cc0706679e7c6191c067c48a4a7bd0b862b.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/tests.md` (1 hunks)
* `tools/scripts/lib_install/linux/install-lib.sh` (1 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>docs/tests.md</summary>

[uncategorized] ~10-~10: You might be missing the article “the” here.
Context: ...d  1. Start a Mysql database running in background. 2. Set the password for `root` or any ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

---

[uncategorized] ~11-~11: You might be missing the article “a” here.
Context: ... `root` or any create another user with password 3. Create an empty database. 4. Set the...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

---

[uncategorized] ~18-~18: You might be missing the article “the” here.
Context: ...it tests, run the following commands in project root directory.  ```shell cmake -S . -B...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

---

[uncategorized] ~26-~26: You might be missing the article “the” here.
Context: ...``  If the tests show error message for connection function below, revisit the [Setup stor...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (1)</summary>

<details>
<summary>tools/scripts/lib_install/linux/install-lib.sh (1)</summary>

`16-18`: **LGTM! Package installation follows security best practices.**

The package installation is properly implemented with:
- Non-interactive frontend to prevent hanging
- Minimal installation using --no-install-recommends
- Package list update before installation

</details>

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

docs/tests.md Outdated
Comment on lines 3 to 15
## Setup storage backend

Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage
backend available to run the unit tests.

### Setup Mysql as storage backend

1. Start a Mysql database running in background.
2. Set the password for `root` or any create another user with password
3. Create an empty database.
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>`

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance the storage backend setup instructions.

The setup instructions could be more comprehensive. Consider the following improvements:

 ## Setup storage backend
 
 Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage
 backend available to run the unit tests.
 
 ### Setup Mysql as storage backend
+
+Prerequisites:
+- MySQL/MariaDB version X.Y or higher
+- MySQL client tools
 
-1. Start a Mysql database running in background.
-2. Set the password for `root` or any create another user with password
-3. Create an empty database.
+1. Start a MySQL database running in the background.
+2. Set up database access:
+   ```sql
+   ALTER USER 'root'@'localhost' IDENTIFIED BY 'your_password';
+   -- OR create a new user
+   CREATE USER 'spider_test'@'localhost' IDENTIFIED BY 'your_password';
+   GRANT ALL PRIVILEGES ON *.* TO 'spider_test'@'localhost';
+   ```
+3. Create an empty database:
+   ```sql
+   CREATE DATABASE spider_test;
+   ```
 4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to
    `jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>`
+
+   Where:
+   - `<db_name>`: Name of the database (e.g., spider_test)
+   - `<usr>`: MySQL username
+   - `<pwd>`: MySQL password
🧰 Tools
🪛 LanguageTool

[uncategorized] ~10-~10: You might be missing the article “the” here.
Context: ...d 1. Start a Mysql database running in background. 2. Set the password for root or any ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~11-~11: You might be missing the article “a” here.
Context: ... root or any create another user with password 3. Create an empty database. 4. Set the...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

docs/tests.md Outdated
Comment on lines 26 to 32
If the tests show error message for connection function below, revisit
the [Setup storage backend](#setup-storage-backend) and double check if `cStroageUrl` is correctly
set.

```c++
REQUIRE( storage->connect(spider::test::cStorageUrl).success() )
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo and enhance troubleshooting guidance.

There's a typo in the variable name and the error handling could be more informative.

-If the tests show error message for connection function below, revisit
-the [Setup storage backend](#setup-storage-backend) and double check if `cStroageUrl` is correctly
+If the tests show an error message for the connection function below, revisit
+the [Setup storage backend](#setup-storage-backend) and double check if `cStorageUrl` is correctly
 set.
 
 ```c++
 REQUIRE( storage->connect(spider::test::cStorageUrl).success() )

+Common connection issues:
+- MySQL service not running
+- Incorrect database name
+- Wrong username or password
+- MySQL server not listening on the specified port


<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 LanguageTool</summary>

[uncategorized] ~26-~26: You might be missing the article “the” here.
Context: ...``  If the tests show error message for connection function below, revisit the [Setup stor...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit -->

docs/tests.md Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to call this testing.md.

docs/tests.md Outdated
@@ -0,0 +1,32 @@
# Tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Tests
# Testing

docs/tests.md Outdated
@@ -0,0 +1,32 @@
# Tests

## Setup storage backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Setup storage backend
## Set up a storage backend

"Set up" is the verb, "setup" is the noun.

docs/tests.md Outdated
Comment on lines 5 to 6
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage
backend available to run the unit tests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage
backend available to run the unit tests.
Spider relies on a fault-tolerant storage backend to store metadata and data. Spider's unit tests
also require this storage backend.

docs/tests.md Outdated
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage
backend available to run the unit tests.

### Setup Mysql as storage backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Setup Mysql as storage backend
### Set up MySQL as storage backend

docs/tests.md Outdated
Comment on lines 10 to 11
1. Start a Mysql database running in background.
2. Set the password for `root` or any create another user with password
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Start a Mysql database running in background.
2. Set the password for `root` or any create another user with password
1. Start a MySQL database.
2. Set the password for `root` or create another user with a password.

docs/tests.md Outdated
Comment on lines 13 to 14
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>`
4. Set `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to
`jdbc:mariadb://localhost:3306/<db_name>?user=<db_user>&password=<db_password>`
* Replace `db_name`, `db_user`, and `db_password` with the values used in steps 2 & 3.

docs/tests.md Outdated

## Build and run unit tests

To build the unit tests, run the following commands in project root directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To build the unit tests, run the following commands in project root directory.
To build and run the unit tests, run the following commands in the project's root directory.

docs/tests.md Outdated
Comment on lines 26 to 28
If the tests show error message for connection function below, revisit
the [Setup storage backend](#setup-storage-backend) and double check if `cStorageUrl` is correctly
set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If the tests show error message for connection function below, revisit
the [Setup storage backend](#setup-storage-backend) and double check if `cStorageUrl` is correctly
set.
If the tests show an error message for the connection function below, revisit the
[Set up storage backend](#set-up-a-storage-backend) section and check if `cStorageUrl` is set
correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants