-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Add tests for data storage #21
Conversation
sql::bytes have different constructors when we pass in char * and char const* together with the length. When we pass in char *, the length it stored internally is -length we passed in. To get the correct behavior. i.e. having a valid byte array with positive range, we must pass in char const*.
…eign key constraint
Warning Rate limit exceeded@sitaowang1998 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 21 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe pull request introduces significant modifications across various files, primarily focusing on enhancing the build configuration with the addition of the Catch2 testing framework and refining library dependencies in Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (14)
tests/CMakeLists.txt (1)
1-7
: Consider separating header files and improving variable configurationA few suggestions to improve the test sources configuration:
- Header files should not be listed in SOURCES variable
- CACHE INTERNAL is typically unnecessary for test sources
- Consider using separate variables for headers and sources
Here's the suggested improvement:
-set(SPIDER_TEST_SOURCES - storage/test-DataStorage.cpp - storage/StorageTestHelper.hpp - utils/CoreDataUtils.hpp - CACHE INTERNAL - "spider test source files" -) +set(SPIDER_TEST_SOURCES + storage/test-DataStorage.cpp +) + +set(SPIDER_TEST_HEADERS + storage/StorageTestHelper.hpp + utils/CoreDataUtils.hpp +)tests/utils/CoreDataUtils.hpp (3)
1-1
: Consider using project-relative includesThe relative path with multiple parent directories (
../../
) can become difficult to maintain if the directory structure changes. Consider using project-relative includes by adding the source directory to the include path in CMake.-#include "../../src/spider/core/Data.hpp" +#include "spider/core/Data.hpp"
5-27
: Add function documentationThe function would benefit from documentation explaining its purpose, parameters, and return value. Consider adding a descriptive comment block.
+/** + * Compares two Data objects for equality. + * + * @param d1 First Data object to compare + * @param d2 Second Data object to compare + * @return true if all fields are equal, false otherwise + */ inline auto data_equal(Data const& d1, Data const& d2) -> bool {
5-27
: Consider a more concise implementationThe current implementation, while clear, could be more concise by combining conditions. However, this is a trade-off between readability and conciseness.
inline auto data_equal(Data const& d1, Data const& d2) -> bool { - if (d1.get_id() != d2.get_id()) { - return false; - } - - if (d1.get_key() != d2.get_key()) { - return false; - } - - if (d1.get_locality() != d2.get_locality()) { - return false; - } - - if (d1.is_hard_locality() != d2.is_hard_locality()) { - return false; - } - - if (d1.get_value() != d2.get_value()) { - return false; - } - - return true; + return d1.get_id() == d2.get_id() + && d1.get_key() == d2.get_key() + && d1.get_locality() == d2.get_locality() + && d1.is_hard_locality() == d2.is_hard_locality() + && d1.get_value() == d2.get_value(); }src/spider/CMakeLists.txt (1)
4-13
: Improve header list organizationThe header organization is logical, but we could enhance readability with grouped components.
Consider this organization:
set(SPIDER_CORE_HEADERS + # Core components core/Error.hpp core/Data.hpp core/Task.hpp core/TaskGraph.hpp + # Storage components storage/MetadataStorage.hpp storage/DataStorage.hpp storage/MysqlStorage.hpp CACHE INTERNAL "spider core header files" )src/spider/storage/DataStorage.hpp (1)
26-27
: Consider enhancing method documentation and error handlingWhile the new methods add valuable functionality, they would benefit from:
- Documentation specifying:
- Behaviour when
data
pointer is null inget_data_by_key
- Error handling for non-existent keys/IDs
- Ownership semantics for the
data
parameter- Consider returning
std::optional<Data>
orexpected<Data, StorageErr>
instead of using an output parameter forget_data_by_key
src/spider/core/Data.hpp (2)
16-17
: Add documentation for the new constructor.The implementation looks good, but could benefit from documentation explaining its purpose and use cases.
Add a documentation comment:
+ /** + * Constructs a Data object with a specific UUID and value. + * @param id The UUID to use for this data + * @param value The value to store + */ Data(boost::uuids::uuid id, std::string value) : m_id(id), m_value(std::move(value)) {}
16-17
: Consider validating the UUID parameter.The constructor accepts any UUID without validation. Consider checking for nil UUIDs to prevent potential issues.
Example implementation:
Data(boost::uuids::uuid id, std::string value) : m_id(id), m_value(std::move(value)) { + if (id.is_nil()) { + throw std::invalid_argument("UUID cannot be nil"); + } }src/spider/storage/MetadataStorage.hpp (1)
15-19
: LGTM! Proper enforcement of unique ownership semantics.The deletion of copy/move constructors and assignment operators while providing a default constructor enforces proper RAII principles and prevents accidental sharing of storage resources.
Consider documenting these ownership semantics in a comment block above the class declaration to make it clear to users of this interface that implementations should maintain exclusive ownership of their resources.
tests/storage/test-DataStorage.cpp (2)
14-21
: Add documentation for the helper function.While the function is well-implemented with proper type constraints and memory management, it would benefit from documentation explaining its purpose and requirements.
Add documentation like this:
namespace { +/** + * Creates a new instance of a DataStorage-derived class. + * + * @tparam T The concrete storage class type that derives from DataStorage + * @return std::unique_ptr<spider::core::DataStorage> Smart pointer to the created storage instance + */ template <class T> requires std::derived_from<T, spider::core::DataStorage> auto create_data_storage() -> std::unique_ptr<spider::core::DataStorage> {
22-46
: Consider adding RAII-style cleanup.While the test case covers the core functionality well, consider using a fixture or RAII pattern to ensure proper cleanup even if the test fails midway.
Here's a suggested approach:
class StorageFixture { std::unique_ptr<spider::core::DataStorage> storage; public: StorageFixture() { storage = create_data_storage<TestType>(); storage->connect(spider::test::cStorageUrl); storage->initialize(); } ~StorageFixture() { // Cleanup logic here } auto get_storage() -> spider::core::DataStorage& { return *storage; } }; TEMPLATE_TEST_CASE("Add, get and remove data", "[storage]", spider::core::MySqlDataStorage) { StorageFixture fixture; auto& storage = fixture.get_storage(); // Rest of the test... }🧰 Tools
🪛 cppcheck
[error] 22-22: syntax error
(syntaxError)
lint-tasks.yaml (1)
73-76
: Consider optimizing the clang-format tasksThe separate clang-format task for the test directory could be combined with the source directory task to reduce duplication.
cpp-format-fix: sources: *cpp_format_src_files deps: ["cpp-configs", "venv"] cmds: - task: "clang-format" vars: FLAGS: "-i" - SRC_DIR: "{{.G_SRC_SPIDER_DIR}}" - - task: "clang-format" - vars: - FLAGS: "-i" - SRC_DIR: "{{.G_TEST_DIR}}" + DIRS: "{{.G_SRC_SPIDER_DIR}} {{.G_TEST_DIR}}"You'll also need to update the
clang-format
task definition accordingly:clang-format: internal: true requires: - vars: ["FLAGS", "SRC_DIR"] + vars: ["FLAGS", "DIRS"] cmd: |- . "{{.G_LINT_VENV_DIR}}/bin/activate" - find "{{.SRC_DIR}}" \ + find {{.DIRS}} \ -type f \ \( -iname "*.cpp" -o -iname "*.h" -o -iname "*.hpp" \) \ -print0 | \ xargs -0 --no-run-if-empty clang-format {{.FLAGS}} -Werrorsrc/spider/storage/MysqlStorage.cpp (2)
116-124
: Consider specifying index length for thekey
column.For the UNIQUE KEY on the
key
column in thedata
table, consider specifying an index length to optimize performance and storage space, especially since MySQL has limitations on index sizes.Apply this diff:
- UNIQUE KEY (`key`) USING BTREE, + UNIQUE KEY (`key`(64)) USING BTREE,
166-170
: Replace C-style cast with reinterpret_cast.While the current code is marked with NOLINT, consider using
reinterpret_cast
for better type safety and clearer intent.Apply this diff:
- return {(char const*)id.data(), id.size()}; + return {reinterpret_cast<char const*>(id.data()), id.size()};
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
CMakeLists.txt
(1 hunks)lint-tasks.yaml
(5 hunks)src/spider/CMakeLists.txt
(1 hunks)src/spider/core/Data.hpp
(1 hunks)src/spider/core/Error.hpp
(1 hunks)src/spider/storage/DataStorage.hpp
(2 hunks)src/spider/storage/MetadataStorage.hpp
(1 hunks)src/spider/storage/MysqlStorage.cpp
(40 hunks)src/spider/storage/MysqlStorage.hpp
(2 hunks)taskfile.yaml
(1 hunks)tests/CMakeLists.txt
(1 hunks)tests/storage/StorageTestHelper.hpp
(1 hunks)tests/storage/test-DataStorage.cpp
(1 hunks)tests/utils/CoreDataUtils.hpp
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/storage/StorageTestHelper.hpp
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
tests/storage/test-DataStorage.cpp
[error] 22-22: syntax error
(syntaxError)
🔇 Additional comments (16)
src/spider/core/Error.hpp (1)
29-29
: Excellent improvement to error handling clarity!
The replacement of the implicit boolean operator with an explicit success()
method is a positive change that:
- Makes success checks more self-documenting
- Prevents accidental misuse through the [[nodiscard]] attribute
- Aligns with modern C++ best practices
Let's verify the impact on existing code:
✅ Verification successful
Change is safe to proceed with
The search results show that StorageErr is primarily used as a return type in method declarations, particularly in storage-related interfaces and implementations. No direct boolean operator usage was found in conditional statements, suggesting that the code already uses explicit error checking rather than implicit boolean conversions.
The change from boolean operator to explicit success()
method is localized to the following files that include Error.hpp:
- tests/storage/test-DataStorage.cpp
- src/spider/storage/MysqlStorage.hpp
- src/spider/storage/MetadataStorage.hpp
- src/spider/storage/DataStorage.hpp
- src/spider/storage/MysqlStorage.cpp
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for potential boolean operator usage that needs updating
# Look for common patterns where StorageErr might be used in conditions
# Search for if statements with StorageErr
rg "if\s*\(\s*\w+\s*\)" --type cpp | rg -i "storage.*err"
# Search for while statements with StorageErr
rg "while\s*\(\s*\w+\s*\)" --type cpp | rg -i "storage.*err"
# Search for ternary operations with StorageErr
rg "\?\s*.*\:" --type cpp | rg -i "storage.*err"
Length of output: 256
Script:
#!/bin/bash
# Let's try a different approach to find potential usages of StorageErr in conditions
# Search for StorageErr usage in conditional contexts
rg -t cpp "if.*StorageErr|while.*StorageErr" -A 2
# Search for direct usage of StorageErr variables
rg -t cpp "StorageErr\s+\w+" -A 2
# Look for files that include Error.hpp to analyze potential impacts
rg -t cpp "#include.*Error\.hpp"
Length of output: 4769
taskfile.yaml (1)
14-14
: LGTM! The test directory variable is well-structured.
The addition of G_TEST_DIR
follows existing naming conventions and path structures, properly supporting the test infrastructure enhancement.
Let's verify the variable usage in the referenced lint tasks:
✅ Verification successful
Variable usage verified and properly integrated
The G_TEST_DIR
variable is correctly referenced across multiple tasks in lint-tasks.yaml
:
- Used in file pattern matching for C++ linting
- Set as
SRC_DIR
for static analysis - Set as
TEST_DIR
for test-related tasks
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify G_TEST_DIR usage in lint-tasks.yaml
# Expected: Should find references to G_TEST_DIR in linting configurations
rg -A 3 "G_TEST_DIR" lint-tasks.yaml
Length of output: 677
src/spider/CMakeLists.txt (2)
Line range hint 1-35
: Overall CMake configuration looks good
The CMake configuration properly supports the testing infrastructure while maintaining clear component organization and dependency management. The separation of sources, headers, and their respective visibilities is well-structured.
2-2
: Verify completeness of SPIDER_CORE_SOURCES
The core library appears to contain only a single source file. Given that the headers list includes multiple components (Core, Storage), we should verify that all necessary implementation files are included.
✅ Verification successful
SPIDER_CORE_SOURCES is correctly defined
The source list is complete and properly structured. The worker.cpp
file is intentionally separated into SPIDER_WORKER_SOURCES
as it belongs to the worker executable target, not the core library. The core library appropriately contains only the MySQL storage implementation, which aligns with the PR's focus on data storage testing.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential missing source files
# Look for .cpp files that might need to be included in SPIDER_CORE_SOURCES
echo "Searching for potential source files in core/ and storage/ directories:"
fd -e cpp --type f . src/spider/core src/spider/storage
Length of output: 246
Script:
#!/bin/bash
# Let's verify if there are any other source files in the spider directory
# that might need to be included
fd -e cpp --type f . src/spider
# Also check if there are any other CMakeLists.txt files that might
# append to SPIDER_CORE_SOURCES
fd CMakeLists.txt src/spider
# Look at the content of any found CMakeLists.txt to see if they modify SPIDER_CORE_SOURCES
fd CMakeLists.txt src/spider -x cat {}
Length of output: 1228
src/spider/storage/DataStorage.hpp (2)
13-17
: Well-implemented Rule of Five!
The explicit deletion of copy/move operations enforces proper RAII semantics and prevents accidental resource duplication, which is crucial for a storage interface.
Line range hint 13-27
: Verify interface implementation completeness
Let's ensure all derived classes properly implement the new interface and respect the non-copyable semantics.
✅ Verification successful
Let me generate additional verification scripts to check for any other derived classes and ensure complete implementation coverage.
Interface implementation and usage patterns are properly aligned
The verification confirms:
- MySqlDataStorage is the only implementation of DataStorage and properly implements all required methods including
get_data_by_key
andremove_data
- Non-copyable semantics are respected throughout the codebase, with proper deletion of copy constructors/assignments
- Usage patterns show only safe handling via std::unique_ptr in test code
- No dangerous copy operations or raw pointer usage detected
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify implementation of new methods and usage of DataStorage
# Check for derived classes that need to implement the new methods
echo "Checking for DataStorage implementations..."
ast-grep --pattern 'class $_ : public spider::core::DataStorage'
# Look for potential copy attempts that would now fail
echo "Checking for potential copy operations..."
rg "DataStorage\(" --type cpp
# Verify all pure virtual methods are implemented
echo "Checking for get_data_by_key implementations..."
rg "get_data_by_key.*StorageErr" --type cpp
echo "Checking for remove_data implementations..."
rg "remove_data.*StorageErr" --type cpp
Length of output: 1892
Script:
#!/bin/bash
# Search for any other storage implementations we might have missed
echo "Checking for other storage implementations..."
fd -e hpp -e cpp -e h | xargs rg "class.*Storage.*public"
# Check for any DataStorage object instantiations or assignments
echo "Checking DataStorage usage patterns..."
rg "std::.*<.*DataStorage" --type cpp
rg "= new.*Storage" --type cpp
rg "make_.*<.*Storage>" --type cpp
# Verify the complete interface implementation in MySqlDataStorage
echo "Checking MySqlDataStorage interface implementation..."
ast-grep --pattern 'class MySqlDataStorage : public spider::core::DataStorage {
$$$
}'
Length of output: 1325
src/spider/storage/MetadataStorage.hpp (1)
15-19
: Verify impact on existing storage implementations.
The changes to ownership semantics might affect existing implementations of this interface.
✅ Verification successful
No impact on existing implementations - ownership semantics already aligned
The existing MySqlMetadataStorage
implementation already follows the same ownership semantics as the base class, with copy and move operations explicitly deleted. The implementation maintains a raw pointer to a connection but properly manages its lifetime through connect()
and close()
methods.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all classes implementing MetadataStorage to verify compatibility
# with the new ownership semantics
# Search for class declarations that inherit from MetadataStorage
ast-grep --pattern 'class $_ : $$$public MetadataStorage$$$'
# Also check for any direct instantiations of MetadataStorage subclasses
# that might be doing copies or moves
rg -l "new \w+MetadataStorage|= \w+MetadataStorage|std::make_\w+<\w+MetadataStorage>"
Length of output: 150
Script:
#!/bin/bash
# Let's try a different approach to find implementations and usage
# Find files that include MetadataStorage.hpp
rg -l "MetadataStorage\.hpp"
# Look for class definitions that might inherit from MetadataStorage
# using a broader pattern
rg -l "class.*MetadataStorage"
# Search for any usage of MetadataStorage
rg -l "MetadataStorage"
# Look for potential factory methods or constructors
rg -A 5 "MetadataStorage::|->MetadataStorage"
Length of output: 13395
Script:
#!/bin/bash
# Let's examine the MySqlStorage implementation to verify ownership semantics
# Check MysqlStorage.hpp for its class definition and member variables
cat src/spider/storage/MysqlStorage.hpp
# Look for any copy/move operations in MysqlStorage
rg -A 2 "MySqlMetadataStorage\(.*\)" src/spider/storage/MysqlStorage.hpp
Length of output: 4892
tests/storage/test-DataStorage.cpp (2)
1-13
: LGTM! Well-organized includes and proper lint suppressions.
The includes are logically organized and the NOLINT suppressions are appropriately scoped to specific rules.
1-79
: Overall implementation is solid and well-tested.
The test cases provide good coverage of the DataStorage functionality, including error cases and data integrity checks. The suggested improvements are mainly for maintainability and don't impact the correctness of the tests.
Note: The cppcheck syntax error can be safely ignored as it appears to be a false positive related to template syntax parsing.
🧰 Tools
🪛 cppcheck
[error] 22-22: syntax error
(syntaxError)
CMakeLists.txt (2)
138-139
: LGTM! Good choice of testing framework.
The addition of Catch2 is appropriate for C++ unit testing and aligns well with the PR's objective of adding data storage tests.
136-136
: Verify if abseil-cpp is required in default build target.
The removal of EXCLUDE_FROM_ALL means abseil-cpp will now be built by default, which could impact build times. Please confirm if this is necessary for the test framework or if it can remain excluded from the default target.
✅ Verification successful
Abseil library is required in default build target
The removal of EXCLUDE_FROM_ALL is necessary as abseil-cpp is used in core components:
- TaskGraph uses
absl::flat_hash_map
andabsl::flat_hash_set
for essential data structures - MysqlStorage depends on TaskGraph and uses abseil collections
- These components are tested in
tests/storage/test-DataStorage.cpp
Since these are core data structures used in both the main application and tests, abseil should be included in the default build target.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if abseil is used in test files
echo "Checking abseil usage in test files..."
rg -l "absl::" tests/
Length of output: 108
Script:
#!/bin/bash
# Let's check for abseil usage in the entire codebase
echo "Checking abseil usage across the codebase..."
rg -l "absl::" .
# Also check for any direct includes of abseil headers
echo -e "\nChecking abseil header includes..."
rg -l "#include.*absl/" .
# Check if there are any CMake dependencies on abseil
echo -e "\nChecking CMake dependencies on abseil..."
rg -l "abseil" . --type cmake
Length of output: 540
Script:
#!/bin/bash
# Let's examine the actual usage in these files to understand if it's essential
echo "Checking MysqlStorage.cpp abseil usage..."
rg "absl::" src/spider/storage/MysqlStorage.cpp -B 2 -A 2
echo -e "\nChecking TaskGraph.hpp abseil usage..."
rg "absl::" src/spider/core/TaskGraph.hpp -B 2 -A 2
# Check if these files are used in tests
echo -e "\nChecking if these files are referenced in tests..."
rg "MysqlStorage|TaskGraph" tests/ -l
Length of output: 1284
src/spider/storage/MysqlStorage.hpp (2)
22-27
: Well-implemented Rule of Five declarations!
The special member functions are properly declared to prevent object slicing and ensure unique ownership semantics. The virtual destructor is correctly marked as override.
63-68
: Proper implementation of Rule of Five!
The special member functions are correctly declared to prevent object slicing and ensure unique ownership semantics.
lint-tasks.yaml (3)
53-55
: Excellent addition of test files to format checking!
The inclusion of test files in the formatting checks will help maintain consistent code style throughout the codebase.
89-91
: Well-structured addition of test files to static analysis!
The inclusion ensures test files receive the same level of static analysis as source files, maintaining code quality standards across the entire codebase.
137-137
: Excellent implementation of the clang-tidy task!
The approach of combining source and test directories in a single find command is efficient and well-structured. This serves as a good example for the format-fix task optimization suggested earlier.
Also applies to: 140-140
add_executable(unitTest) | ||
target_sources(unitTest PRIVATE ${SPIDER_TEST_SOURCES}) | ||
target_link_libraries(unitTest PRIVATE Catch2::Catch2WithMain) | ||
target_link_libraries(unitTest PRIVATE spider_core) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Catch2 submodule needs initialization
The Catch2 dependency is properly configured in CMake via submodules, but the submodule hasn't been initialized. The developer needs to run:
git submodule update --init --recursive
before building the tests.
- The
.gitmodules
file correctly references Catch2 atsubmodules/Catch2
- The
-
prefix in git submodule status indicates the submodule is not initialized - The empty Catch2 directory confirms this state
🔗 Analysis chain
Verify Catch2 dependency setup
The test executable configuration looks good, following CMake best practices with proper visibility modifiers and dependencies.
Let's verify the Catch2 setup:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify Catch2 is properly configured in the project
# Check if Catch2 is included in the main CMakeLists.txt
echo "Checking Catch2 configuration..."
rg "add_subdirectory.*Catch2" CMakeLists.txt
# Verify Catch2 submodule exists
echo "Checking Catch2 submodule..."
if [ -d "submodules/Catch2" ]; then
echo "Catch2 submodule found"
cat submodules/Catch2/CMakeLists.txt | head -n 5
else
echo "Warning: Catch2 submodule not found"
fi
Length of output: 426
Script:
#!/bin/bash
# Let's verify how Catch2 is being found/configured in the project
# Check for FetchContent usage for Catch2
echo "Checking for FetchContent usage..."
rg "FetchContent.*Catch2" -A 5
# Check for find_package usage for Catch2
echo "Checking for find_package usage..."
rg "find_package.*Catch2"
# Look for any Catch2 related configuration
echo "Checking for any Catch2 related config..."
rg "Catch2" CMakeLists.txt
Length of output: 374
Script:
#!/bin/bash
# Let's check the git submodules configuration and status
# Check .gitmodules file
echo "Checking .gitmodules configuration..."
cat .gitmodules
# Check git submodule status
echo -e "\nChecking git submodule status..."
git submodule status
# List contents of submodules directory
echo -e "\nListing submodules directory..."
ls -la submodules/
Length of output: 1063
add_library(spider_core) | ||
target_sources(spider_core PRIVATE ${SPIDER_CORE_SOURCES}) | ||
target_sources(spider_core PUBLIC ${SPIDER_CORE_HEADERS}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Library type should be explicitly defined
The removal of static/shared configuration in src/spider/CMakeLists.txt
conflicts with the project-wide build configuration. The main CMakeLists.txt
still manages the SPIDER_USE_STATIC_LIBS
option which affects multiple dependencies (Boost, spdlog) and has platform-specific handling. The library type should be explicitly set based on this global setting.
- In
src/spider/CMakeLists.txt
, restore the conditional library type:
if(SPIDER_USE_STATIC_LIBS)
add_library(spider_core STATIC)
else()
add_library(spider_core SHARED)
endif()
🔗 Analysis chain
Consider preserving build flexibility
The removal of the static/shared library condition might limit build configuration options. Was this intentional?
Let me check the content of these CMake files to better understand the library configuration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other components that might depend on library type
echo "Checking for potential library type dependencies:"
rg -l "SPIDER_USE_STATIC_LIBS" .
echo "Checking for library references in test configurations:"
fd CMakeLists.txt -t f tests/
Length of output: 332
Script:
#!/bin/bash
# Check the main CMakeLists.txt for static library configuration
echo "=== Main CMakeLists.txt ==="
rg "SPIDER_USE_STATIC_LIBS" CMakeLists.txt -A 3 -B 3
echo -e "\n=== Test CMakeLists.txt ==="
cat tests/CMakeLists.txt
echo -e "\n=== Previous spider library configuration ==="
git log -p src/spider/CMakeLists.txt | grep -A 10 "add_library"
Length of output: 4035
tests/storage/test-DataStorage.cpp
Outdated
TEMPLATE_TEST_CASE( | ||
"Add, get and remove data with key", | ||
"[storage]", | ||
spider::core::MySqlDataStorage | ||
) { | ||
std::unique_ptr<spider::core::DataStorage> storage = create_data_storage<TestType>(); | ||
REQUIRE(storage->connect(spider::test::cStorageUrl).success()); | ||
REQUIRE(storage->initialize().success()); | ||
|
||
// Add data | ||
spider::core::Data const data{"key", "value"}; | ||
REQUIRE(storage->add_data(data).success()); | ||
|
||
// Add data with same key again should fail | ||
spider::core::Data const data_same_key{"key", "value2"}; | ||
REQUIRE(spider::core::StorageErrType::DuplicateKeyErr == storage->add_data(data_same_key).type); | ||
|
||
// Get data should match | ||
spider::core::Data result{"temp"}; | ||
REQUIRE(storage->get_data_by_key("key", &result).success()); | ||
REQUIRE(spider::core::data_equal(data, result)); | ||
|
||
// Remove data should succeed | ||
REQUIRE(storage->remove_data(data.get_id()).success()); | ||
|
||
// Get data should fail | ||
REQUIRE(spider::core::StorageErrType::KeyNotFoundErr | ||
== storage->get_data_by_key("key", &result).type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Reduce code duplication between test cases.
The setup and teardown code is duplicated between the two test cases. Consider extracting common test setup into shared helper functions or using the fixture pattern suggested earlier.
Example approach:
namespace {
// Test helpers
template<typename T>
void verify_duplicate_key_error(spider::core::DataStorage& storage,
spider::core::Data const& duplicate_data) {
REQUIRE(spider::core::StorageErrType::DuplicateKeyErr ==
storage.add_data(duplicate_data).type);
}
template<typename T>
void verify_data_retrieval(spider::core::DataStorage& storage,
spider::core::Data const& expected) {
spider::core::Data result{"temp"};
REQUIRE(storage.get_data(expected.get_id(), &result).success());
REQUIRE(spider::core::data_equal(expected, result));
}
// ... more helpers ...
}
@@ -52,13 +58,21 @@ class MySqlMetadataStorage : public MetadataStorage { | |||
auto fetch_task(std::unique_ptr<sql::ResultSet> const& res) -> Task; | |||
}; | |||
|
|||
class MysqlDataStorage : public DataStorage { | |||
class MySqlDataStorage : public DataStorage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
MySQL capitalization is inconsistent in filenames but consistent in class names
The codebase shows:
- Class names consistently use "MySql" capitalization (e.g.,
MySqlDataStorage
,MySqlMetadataStorage
) - File names inconsistently use "Mysql" (e.g.,
MysqlStorage.hpp
,MysqlStorage.cpp
)
To maintain consistency:
- Rename
MysqlStorage.hpp
toMySqlStorage.hpp
- Rename
MysqlStorage.cpp
toMySqlStorage.cpp
- Update corresponding include statements and header guards
🔗 Analysis chain
Consider standardizing MySQL capitalization
The class name uses "MySql" while the conventional spelling is "MySQL". Consider standardizing the capitalization across the codebase.
Additionally, the new methods get_data_by_key
and remove_data
are well-integrated with the existing interface.
Let's verify the MySQL capitalization across the codebase:
Also applies to: 74-75
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for different MySQL capitalizations to ensure consistency
echo "Searching for MySQL capitalization variants..."
rg -i "mysql" --type cpp
Length of output: 6177
src/spider/storage/MysqlStorage.cpp
Outdated
m_conn->prepareStatement("DELETE FROM `data_ref_task` WHERE " | ||
"`id` = ? AND `task_i`d = ?") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in SQL query.
There's a typo in the column name task_id
where a backtick is incorrectly placed.
Apply this diff:
- "`id` = ? AND `task_i`d = ?")
+ "`id` = ? AND `task_id` = ?")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
m_conn->prepareStatement("DELETE FROM `data_ref_task` WHERE " | |
"`id` = ? AND `task_i`d = ?") | |
m_conn->prepareStatement("DELETE FROM `data_ref_task` WHERE " | |
"`id` = ? AND `task_id` = ?") |
src/spider/storage/MysqlStorage.cpp
Outdated
std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement( | ||
"SELECT `id` FROM `driver`s WHERE TIMESTAMPDIFF(MICROSECOND, " | ||
"`heartbeat`, CURRENT_TIMESTAMP()) > ?" | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in SQL query.
There's a typo in the table name drivers
where a backtick is incorrectly placed.
Apply this diff:
- "SELECT `id` FROM `driver`s WHERE TIMESTAMPDIFF(MICROSECOND, "
+ "SELECT `id` FROM `drivers` WHERE TIMESTAMPDIFF(MICROSECOND, "
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement( | |
"SELECT `id` FROM `driver`s WHERE TIMESTAMPDIFF(MICROSECOND, " | |
"`heartbeat`, CURRENT_TIMESTAMP()) > ?" | |
)); | |
std::unique_ptr<sql::PreparedStatement> statement(m_conn->prepareStatement( | |
"SELECT `id` FROM `drivers` WHERE TIMESTAMPDIFF(MICROSECOND, " | |
"`heartbeat`, CURRENT_TIMESTAMP()) > ?" | |
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
src/spider/storage/MysqlStorage.cpp (5)
152-163
: Consider reordering table creation statements for better clarity.The table creation order is critical due to foreign key constraints. Consider grouping related tables together and adding comments to explain the dependencies.
Apply this diff:
std::array<char const* const, 11> const cCreateStorage = { - cCreateDriverTable, // drivers table must be created before data_ref_driver - cCreateSchedulerTable, - cCreateTaskTable, // tasks table must be created before data_ref_task - cCreateDataTable, // data table must be created before task_outputs - cCreateDataLocalityTable, - cCreateDataRefDriverTable, - cCreateDataRefTaskTable, - cCreateTaskOutputTable, // task_outputs table must be created before task_inputs - cCreateTaskInputTable, - cCreateTaskDependencyTable, - cCreateTaskInstanceTable, + // Core tables + cCreateDriverTable, + cCreateSchedulerTable, + cCreateTaskTable, + cCreateDataTable, + + // Data locality and references + cCreateDataLocalityTable, + cCreateDataRefDriverTable, + cCreateDataRefTaskTable, + + // Task relationships + cCreateTaskOutputTable, + cCreateTaskInputTable, + cCreateTaskDependencyTable, + cCreateTaskInstanceTable, };
108-113
: Consider using DATETIME instead of TIMESTAMP for task_instances.The TIMESTAMP data type in MySQL has a Y2K38 limitation (will overflow in January 2038). For future-proofing, consider using DATETIME which has a much larger range.
Apply this diff:
char const* const cCreateTaskInstanceTable = R"(CREATE TABLE IF NOT EXISTS `task_instances` ( `id` BINARY(16) NOT NULL, `task_id` BINARY(16) NOT NULL, - `start_time` TIMESTAMP NOT NULL, + `start_time` DATETIME(6) NOT NULL, CONSTRAINT `instance_task_id` FOREIGN KEY (`task_id`) REFERENCES `tasks` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE, PRIMARY KEY (`id`) ))";
Line range hint
1022-1062
: Potential SQL injection vulnerability in get_data_by_key method.While the code uses prepared statements which generally prevent SQL injection, the key parameter should be validated to ensure it doesn't contain malicious patterns.
Add input validation:
auto MySqlDataStorage::get_data_by_key(std::string const& key, Data* data) -> StorageErr { + // Validate key length and characters + if (key.length() > 64 || key.find_first_not_of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_") != std::string::npos) { + return StorageErr{StorageErrType::InvalidInputErr, "Invalid key format"}; + } try { std::unique_ptr<sql::PreparedStatement> statement(
Line range hint
978-1020
: Enhance error handling with more specific error types.The error handling could be more granular to help with debugging and monitoring.
Consider adding specific error types for common scenarios:
// Add to StorageErrType enum enum class StorageErrType { // ... existing types ... DataNotFoundErr, DataCorruptedErr, LocalityNotFoundErr, // ... more specific types ... };
Line range hint
300-313
: Consider extracting helper methods to improve readability.The add_task method is quite long and handles multiple concerns. Consider extracting the task input and output handling into separate methods.
Example refactoring:
void MySqlMetadataStorage::add_task_inputs(sql::bytes const& task_id_bytes, Task const& task); void MySqlMetadataStorage::add_task_outputs(sql::bytes const& task_id_bytes, Task const& task); void MySqlMetadataStorage::add_task(sql::bytes job_id, Task const& task) { add_task_base(job_id, task); add_task_inputs(task_id_bytes, task); add_task_outputs(task_id_bytes, task); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
src/spider/storage/MysqlStorage.cpp
(40 hunks)
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (8)
tests/storage/StorageTestHelper.hpp (1)
32-45
: Reduce code duplication in create_storage functionThe connection and initialization logic is duplicated from the single-storage creation functions. Consider reusing the existing helper functions.
Suggested refactor:
template <class M, class D> requires std::derived_from<M, core::MetadataStorage> && std::derived_from<D, core::DataStorage> auto create_storage( ) -> std::tuple<std::unique_ptr<core::MetadataStorage>, std::unique_ptr<core::DataStorage>> { - std::unique_ptr<core::MetadataStorage> metadata_storage{ - static_cast<core::MetadataStorage*>(new M()) - }; - REQUIRE(metadata_storage->connect(spider::test::cStorageUrl).success()); - REQUIRE(metadata_storage->initialize().success()); - std::unique_ptr<core::DataStorage> data_storage{static_cast<core::DataStorage*>(new D())}; - REQUIRE(data_storage->connect(spider::test::cStorageUrl).success()); - REQUIRE(data_storage->initialize().success()); + auto metadata_storage = create_metadata_storage<M>(); + auto data_storage = create_data_storage<D>(); return std::make_tuple(std::move(metadata_storage), std::move(data_storage)); }src/spider/storage/MetadataStorage.hpp (2)
15-19
: LGTM: Appropriate ownership semantics establishedThe explicit deletion of copy/move operations enforces single ownership, which is crucial for storage interfaces managing unique resources like database connections.
This design choice prevents accidental resource sharing and potential connection management issues in a distributed system.
Line range hint
1-38
: Consider adding interface documentationWhile the interface is well-designed, it would benefit from:
- Class-level documentation explaining the storage abstraction
- Method-level documentation describing error conditions and threading guarantees
Would you like me to provide a template for the documentation?
src/spider/core/TaskGraph.hpp (2)
Line range hint
6-8
: Clean up unused boost::uuid headersThe boost::uuid headers are only used for Task IDs now, but not for TaskGraph. Consider moving these includes to Task.hpp if they're not needed here.
-#include <boost/uuid/random_generator.hpp> -#include <boost/uuid/uuid.hpp>
Line range hint
1-105
: Consider documenting the TaskGraph lifecycleWith the removal of TaskGraph identification, it would be valuable to document:
- The expected lifecycle of TaskGraph instances
- How TaskGraph instances interact with the data storage layer
- Whether multiple TaskGraph instances can exist simultaneously
This will help maintain architectural clarity as the system evolves.
src/spider/storage/MysqlStorage.cpp (3)
174-177
: Consider using reinterpret_cast instead of C-style cast.While the C-style cast is marked with NOLINT directives, using
reinterpret_cast
would be more idiomatic in C++ and makes the casting intention more explicit.- return {(char const*)id.data(), id.size()}; + return {reinterpret_cast<char const*>(id.data()), id.size()};
Line range hint
525-585
: Consider adding indexes to optimize task graph retrieval.The
get_task_graph
method performs multiple JOIN operations that could benefit from proper indexing. While there are some indexes defined in the table creation statements, additional indexes might improve query performance, especially for large task graphs.Consider adding the following indexes:
- Composite index on
tasks(job_id, id)
to optimize the initial task selection- Composite index on
task_inputs(task_id, position)
to optimize input retrieval- Composite index on
task_outputs(task_id, position)
to optimize output retrieval
Line range hint
1047-1087
: Reduce code duplication in data retrieval methods.The locality retrieval logic is duplicated between
get_data
andget_data_by_key
. Consider extracting this common functionality into a private helper method.Create a private helper method:
private: auto fetch_locality(boost::uuids::uuid const& id, Data* data) -> void { sql::bytes id_bytes = uuid_get_bytes(id); std::unique_ptr<sql::PreparedStatement> locality_statement( m_conn->prepareStatement("SELECT `address` FROM `data_locality` WHERE `id` = ?") ); locality_statement->setBytes(1, &id_bytes); std::unique_ptr<sql::ResultSet> const locality_res(locality_statement->executeQuery()); std::vector<std::string> locality; while (locality_res->next()) { locality.emplace_back(locality_res->getString(1)); } if (!locality.empty()) { data->set_locality(locality); } }Then update both methods to use this helper:
// In get_data and get_data_by_key after setting data properties fetch_locality(id, data);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
src/spider/core/TaskGraph.hpp
(1 hunks)src/spider/storage/MetadataStorage.hpp
(2 hunks)src/spider/storage/MysqlStorage.cpp
(41 hunks)src/spider/storage/MysqlStorage.hpp
(2 hunks)tests/storage/StorageTestHelper.hpp
(1 hunks)tests/storage/test-DataStorage.cpp
(1 hunks)
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
tests/storage/test-DataStorage.cpp
[error] 41-41: syntax error
(syntaxError)
🔇 Additional comments (9)
src/spider/storage/MetadataStorage.hpp (1)
30-38
: LGTM: Enhanced job management with proper ownership
The introduction of client_id and job-centric operations improves the data model by establishing clear ownership and hierarchical relationships.
Let's verify consistent UUID usage across the codebase:
✅ Verification successful
UUID usage is consistent across the codebase
The verification confirms consistent usage of boost::uuids::uuid
throughout the codebase:
- All UUID declarations use the same type
boost::uuids::uuid
- Proper boost UUID headers are included where needed
- UUID generation is consistently handled using
boost::uuids::random_generator
- The hierarchical relationship (client -> job -> task) maintains consistent UUID usage for IDs
The changes to the job management interface maintain type consistency with the rest of the system.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent UUID type usage across the codebase
# Expect: All UUIDs should use boost::uuids::uuid type
# Search for UUID parameter declarations
rg -t cpp -t hpp "uuid|UUID" -A 2 -B 2
# Search specifically for boost UUID includes
rg -t cpp -t hpp "boost/uuid"
Length of output: 126
Script:
#!/bin/bash
# Let's try with correct file extensions and more specific patterns
# Search for UUID usage patterns
rg -t h -t cpp -t c -t hpp "boost::uuids::uuid" -A 2 -B 2
# Search for boost UUID includes
rg -t h -t cpp -t c -t hpp "#include.*boost.*uuid"
# Search for any UUID-related declarations
rg -t h -t cpp -t c -t hpp "uuid|UUID" -A 2 -B 2 --ignore-case
Length of output: 259
Script:
#!/bin/bash
# Let's try without file type restrictions and use fd to find relevant files first
# Find all C++ related files
fd -e cpp -e h -e c --type f
# Then use these commands on all files
rg "boost::uuids::uuid"
rg "#include.*boost.*uuid"
rg "class.*UUID|struct.*UUID|uuid_t|UUID|uuid" --ignore-case
Length of output: 34867
src/spider/core/TaskGraph.hpp (1)
18-18
: Verify the impact of removing TaskGraph identification
The change from a UUID-generating constructor to a default constructor removes the ability to uniquely identify TaskGraph instances. This could impact distributed processing or persistence scenarios.
Let's verify if TaskGraph identification is used elsewhere in the codebase:
src/spider/storage/MysqlStorage.hpp (5)
22-27
: LGTM: Proper implementation of special member functions.
The class correctly implements the Rule of Five with appropriate defaulting and deletion of special member functions. This enforces proper resource management and prevents unintended copying.
68-73
: LGTM: Consistent implementation of special member functions.
The special member functions mirror the implementation in MySqlMetadataStorage, maintaining consistency across the codebase.
66-66
: Previous comment about MySQL capitalization inconsistency is still applicable.
Please refer to the previous review comment about standardizing MySQL capitalization in file names and class names.
79-80
: Consider adding transaction support for data operations.
The new methods enhance the data management capabilities. However, consider wrapping these operations in transactions to ensure atomicity, especially when used in conjunction with reference management methods.
#!/bin/bash
# Description: Check for transaction handling in storage operations
# Test: Look for transaction-related calls in implementation
rg "BEGIN|COMMIT|ROLLBACK" --type cpp
34-42
: Verify job management changes across the codebase.
The renaming from task_graph to job and the addition of client_id parameter represent significant interface changes.
✅ Verification successful
Let me search for these patterns to verify the changes properly. The previous search didn't provide enough context to determine if the interface changes were properly implemented across the codebase.
Interface changes have been properly implemented across the codebase
The search results confirm that:
- No occurrences of old method names (
add_task_graph
orget_task_graphs
) were found - The new
add_job
interface withclient_id
parameter is consistently defined in the baseMetadataStorage
class and implemented inMySqlMetadataStorage
- The only caller found in tests is already using the new interface correctly with both job_id and client_id parameters
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent usage of new job management methods
# Test: Search for old method names to ensure complete migration
rg "add_task_graph|get_task_graphs" --type cpp
# Test: Verify all callers have been updated with client_id
rg "add_job.*client_id.*task_graph" --type cpp
Length of output: 932
Script:
#!/bin/bash
# Search for old method names with better context
rg "add_task_graph|get_task_graphs" -B 2 -A 2 --type cpp
# Search for add_job method calls
rg "add_job\(" -B 2 -A 2 --type cpp
# Search for Storage interface definition
fd "Storage.hpp" --type f --exec cat {}
Length of output: 10706
src/spider/storage/MysqlStorage.cpp (1)
123-131
: Consider the implications of UNIQUE KEY on a nullable column.
The key
column in the data
table is nullable and has a UNIQUE KEY constraint. In MySQL/MariaDB, NULL values are considered distinct in UNIQUE indexes, allowing multiple NULL values. This might lead to unexpected behaviour if your application logic assumes uniqueness across all values including NULL.
Consider one of these approaches:
- Make the
key
column NOT NULL if uniqueness is required for all records - Add a CHECK constraint to ensure either all keys are unique or NULL
- Handle NULL keys explicitly in the application logic
✅ Verification successful
UNIQUE KEY constraint on nullable column is properly handled
The code shows proper handling of NULL values for the key
column:
- The
Data
class usesstd::optional<std::string>
for the key field - In
MysqlStorage.cpp
, NULL keys are explicitly handled during insertion:if (key.has_value()) { statement->setString(2, key.value()); } else { statement->setNull(2, sql::DataType::VARCHAR); }
The design appears intentional, allowing both unique non-NULL keys and multiple NULL keys, which is a valid pattern for optional unique identifiers.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if there are any NULL keys in the data table
# Test: Search for NULL handling in the codebase
rg -A 5 'isNull.*key|key.*isNull'
Length of output: 35
Script:
#!/bin/bash
# Let's check how the key column is used in SQL queries and data operations
rg -A 5 "INSERT.*data.*key|SELECT.*data.*key|UPDATE.*data.*key"
# Also check for any key-related method implementations
ast-grep --pattern 'key($$$)'
Length of output: 658
Script:
#!/bin/bash
# Let's examine the actual usage of the data table in the code
rg -A 10 "CREATE TABLE.*data"
Length of output: 3129
Script:
#!/bin/bash
# Check for any key validation or handling in the storage classes
ast-grep --pattern 'class MySqlDataStorage {
$$$
}'
Length of output: 55
Script:
#!/bin/bash
# Let's check how the data object is handled in the code
rg -A 10 "get_key|set_key"
# Also check for any NULL handling in SQL statements
rg -A 5 "WHERE.*key.*NULL|key.*IS.*NULL"
# Look for data class definition
ast-grep --pattern 'class Data {
$$$
}'
Length of output: 2252
tests/storage/test-DataStorage.cpp (1)
74-74
:
Adjust macro syntax for std::tuple
in TEMPLATE_TEST_CASE
When using std::tuple
in the TEMPLATE_TEST_CASE
macro, ensure that the syntax is correct to prevent compilation errors. The closing parenthesis )
should be placed correctly.
Apply the following diff to adjust the syntax:
-TEMPLATE_TEST_CASE(
- "Add and remove data reference for task",
- "[storage]",
- (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-) {
+TEMPLATE_TEST_CASE(
+ "Add and remove data reference for task",
+ "[storage]",
+ (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)) {
Make sure the extra closing parenthesis is added after the template parameter list.
Likely invalid or redundant comment.
tests/storage/StorageTestHelper.hpp
Outdated
auto create_data_storage() -> std::unique_ptr<core::DataStorage> { | ||
std::unique_ptr<core::DataStorage> storage{static_cast<core::DataStorage*>(new T())}; | ||
REQUIRE(storage->connect(spider::test::cStorageUrl).success()); | ||
REQUIRE(storage->initialize().success()); | ||
return storage; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use std::make_unique instead of raw new
Replace raw new operations with std::make_unique for better exception safety and cleaner code.
Example refactor for create_data_storage:
auto create_data_storage() -> std::unique_ptr<core::DataStorage> {
- std::unique_ptr<core::DataStorage> storage{static_cast<core::DataStorage*>(new T())};
+ auto storage = std::make_unique<T>();
REQUIRE(storage->connect(spider::test::cStorageUrl).success());
REQUIRE(storage->initialize().success());
return storage;
}
Apply similar changes to create_metadata_storage.
Also applies to: 25-30
tests/storage/StorageTestHelper.hpp
Outdated
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP | ||
#define SPIDER_TEST_STORAGETETHELPER_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in header guard macro name
The header guard macro contains a typo: STORAGETETHELPER
should be STORAGETESTHELPER
.
Apply this diff:
-#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
-#define SPIDER_TEST_STORAGETETHELPER_HPP
+#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP
+#define SPIDER_TEST_STORAGETESTHELPER_HPP
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP | |
#define SPIDER_TEST_STORAGETETHELPER_HPP | |
#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP | |
#define SPIDER_TEST_STORAGETESTHELPER_HPP |
char const* const cStorageUrl | ||
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move database credentials to environment variables
Hardcoding database credentials in source code is a security risk and reduces test maintainability. Consider using environment variables or a test configuration file.
Suggested approach:
-char const* const cStorageUrl
- = "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
+char const* const cStorageUrl = []() {
+ auto* url = std::getenv("SPIDER_TEST_DB_URL");
+ if (url == nullptr) {
+ throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
+ }
+ return url;
+}();
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
char const* const cStorageUrl | |
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password"; | |
char const* const cStorageUrl = []() { | |
auto* url = std::getenv("SPIDER_TEST_DB_URL"); | |
if (url == nullptr) { | |
throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set"); | |
} | |
return url; | |
}(); |
"Add and remove data reference for driver", | ||
"[storage]", | ||
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | ||
) { | ||
auto [metadata_storage, data_storage] = spider::test:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct syntax error in TEMPLATE_TEST_CASE
declaration
There is a syntax error in the TEMPLATE_TEST_CASE
starting at line 103. The macro usage might be incorrect due to misplaced parentheses.
Apply the following diff to fix the syntax:
-TEMPLATE_TEST_CASE(
- "Add and remove data reference for driver",
- "[storage]",
- (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-) {
+TEMPLATE_TEST_CASE(
+ "Add and remove data reference for driver",
+ "[storage]",
+ (std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)) {
Ensure that all parentheses are correctly balanced.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"Add and remove data reference for driver", | |
"[storage]", | |
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | |
) { | |
auto [metadata_storage, data_storage] = spider::test:: | |
TEMPLATE_TEST_CASE( | |
"Add and remove data reference for driver", | |
"[storage]", | |
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)) { | |
auto [metadata_storage, data_storage] = spider::test:: |
TEMPLATE_TEST_CASE( | ||
"Add, get and remove data with key", | ||
"[storage]", | ||
spider::core::MySqlDataStorage | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix syntax error in TEMPLATE_TEST_CASE
macro usage
There is a syntax error at line 41 due to the incorrect use of the TEMPLATE_TEST_CASE
macro. The macro parameters should not include a line break between the parameters and the opening brace {
.
Apply the following diff to correct the syntax:
-TEMPLATE_TEST_CASE(
- "Add, get and remove data with key",
- "[storage]",
- spider::core::MySqlDataStorage
-) {
+TEMPLATE_TEST_CASE(
+ "Add, get and remove data with key",
+ "[storage]",
+ spider::core::MySqlDataStorage) {
Ensure that the closing parenthesis )
is directly before the opening brace {
.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
TEMPLATE_TEST_CASE( | |
"Add, get and remove data with key", | |
"[storage]", | |
spider::core::MySqlDataStorage | |
) { | |
TEMPLATE_TEST_CASE( | |
"Add, get and remove data with key", | |
"[storage]", | |
spider::core::MySqlDataStorage) { |
🧰 Tools
🪛 cppcheck
[error] 41-41: syntax error
(syntaxError)
tests/storage/test-DataStorage.cpp
Outdated
std::cout << metadata_storage->add_driver(driver_id, "127.0.0.1").description << std::endl; | ||
// REQUIRE(metadata_storage->add_driver(driver_id, "127.0.0.1").success()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace std::cout
with REQUIRE
to assert operation success
Using std::cout
to output error descriptions in test cases is not appropriate. Instead, you should assert the success of add_driver
using REQUIRE
to ensure the test behaves correctly.
Apply the following diff to correct this:
-std::cout << metadata_storage->add_driver(driver_id, "127.0.0.1").description << std::endl;
-// REQUIRE(metadata_storage->add_driver(driver_id, "127.0.0.1").success());
+auto result = metadata_storage->add_driver(driver_id, "127.0.0.1");
+REQUIRE(result.success());
This change assigns the result to a variable and then checks for success using REQUIRE
.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
std::cout << metadata_storage->add_driver(driver_id, "127.0.0.1").description << std::endl; | |
// REQUIRE(metadata_storage->add_driver(driver_id, "127.0.0.1").success()); | |
auto result = metadata_storage->add_driver(driver_id, "127.0.0.1"); | |
REQUIRE(result.success()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (10)
tests/storage/StorageTestHelper.hpp (2)
18-49
: Reduce code duplication in storage creation functionsThe storage creation logic is duplicated across functions. Consider extracting the common connection and initialization pattern into a helper function.
+template <typename T> +requires std::derived_from<T, core::DataStorage> || std::derived_from<T, core::MetadataStorage> +auto create_and_init_storage() -> std::unique_ptr<T> { + std::unique_ptr<T> storage{new T()}; + REQUIRE(storage->connect(spider::test::cStorageUrl).success()); + REQUIRE(storage->initialize().success()); + return storage; +} + template <class T> requires std::derived_from<T, core::DataStorage> auto create_data_storage() -> std::unique_ptr<core::DataStorage> { - std::unique_ptr<core::DataStorage> storage{static_cast<core::DataStorage*>(new T())}; - REQUIRE(storage->connect(spider::test::cStorageUrl).success()); - REQUIRE(storage->initialize().success()); - return storage; + return create_and_init_storage<T>(); }
3-3
: Document reasons for lint suppressionsPlease add comments explaining why each lint rule is being suppressed:
- cert-err58-cpp
- cppcoreguidelines-avoid-do-while
- readability-function-cognitive-complexity
This helps maintainers understand if these suppressions are still necessary in the future.
Also applies to: 53-53
src/spider/storage/MetadataStorage.hpp (1)
15-19
: Good RAII design with explicit copy/move semantics!The explicit deletion of copy/move operations is appropriate for a storage interface. This prevents accidental copying of storage handles and enforces proper resource management.
Consider documenting these design decisions in a comment block above the class declaration to explain the rationale behind making the class non-copyable/movable.
src/spider/core/TaskGraph.hpp (1)
Line range hint
1-108
: Consider adding documentation for task lifecycle management.The class would benefit from documentation explaining:
- The ownership model for tasks
- The lifecycle of tasks within the graph
- Constraints on task dependencies (e.g., cycles)
Add class-level documentation like this:
namespace spider::core { +/** + * @brief Manages a directed acyclic graph of tasks and their dependencies. + * + * TaskGraph maintains relationships between tasks without managing their unique + * identifiers. Tasks are stored in a flat map and their dependencies are tracked + * separately. The graph structure enforces parent-child relationships while + * preventing cycles. + */ class TaskGraph {src/spider/storage/MysqlStorage.hpp (1)
22-27
: LGTM! Consider adding class documentation.The special member functions follow the Rule of Five correctly, preventing accidental copying or moving of the storage object. This is appropriate for a database connection wrapper.
Consider adding a class-level documentation comment explaining the purpose and lifecycle of MySqlMetadataStorage.
src/spider/storage/MysqlStorage.cpp (5)
123-131
: Consider adding an index on thevalue
column.The
data
table has a UNIQUE KEY on thekey
column but no index on thevalue
column. If there are queries that search byvalue
, consider adding an index to improve query performance.
525-528
: Consider consistent error handling.The error handling in
get_task_graph
checks for empty result set after executing the query, but it would be more consistent to check if the job exists before fetching its tasks.Consider adding a job existence check:
+ // Check if job exists + std::unique_ptr<sql::PreparedStatement> job_statement( + m_conn->prepareStatement("SELECT 1 FROM `jobs` WHERE `id` = ?") + ); + job_statement->setBytes(1, &id_bytes); + std::unique_ptr<sql::ResultSet> const job_res(job_statement->executeQuery()); + if (job_res->rowsCount() == 0) { + m_conn->commit(); + return StorageErr{ + StorageErrType::KeyNotFoundErr, + fmt::format("no job with id %s", boost::uuids::to_string(id)) + }; + } + std::unique_ptr<sql::PreparedStatement> task_statement(
106-112
: Consider adding a composite index for task dependencies.The
task_dependencies
table has separate indexes onparent
andchild
columns. For queries that join on both columns, a composite index might be more efficient.Consider adding a composite index:
`parent` BINARY(16) NOT NULL, `child` BINARY(16) NOT NULL, - KEY (`parent`) USING BTREE, - KEY (`child`) USING BTREE, + KEY (`parent`, `child`) USING BTREE, CONSTRAINT `task_dep_parent` FOREIGN KEY (`parent`) REFERENCES `tasks` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE,
951-951
: Fix typo in comment.There's a typo in the comment: "voilated" should be "violated".
- // Need to initialize metadata storage first so that foreign constraint is not voilated + // Need to initialize metadata storage first so that foreign constraint is not violated
Line range hint
1047-1087
: Consider using a transaction with SELECT FOR UPDATE.The
get_data_by_key
method might be susceptible to race conditions if multiple transactions attempt to read and modify the same data concurrently. Consider usingSELECT ... FOR UPDATE
to ensure data consistency.std::unique_ptr<sql::PreparedStatement> statement( - m_conn->prepareStatement("SELECT `id`, `key`, `value`, `hard_locality` " - "FROM `data` WHERE `key` = ?") + m_conn->prepareStatement("SELECT `id`, `key`, `value`, `hard_locality` " + "FROM `data` WHERE `key` = ? FOR UPDATE")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
src/spider/core/TaskGraph.hpp
(1 hunks)src/spider/storage/MetadataStorage.hpp
(2 hunks)src/spider/storage/MysqlStorage.cpp
(41 hunks)src/spider/storage/MysqlStorage.hpp
(2 hunks)tests/storage/StorageTestHelper.hpp
(1 hunks)tests/storage/test-DataStorage.cpp
(1 hunks)
🧰 Additional context used
🪛 cppcheck
src/spider/storage/MysqlStorage.cpp
[performance] 87-87: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
tests/storage/test-DataStorage.cpp
[error] 45-45: syntax error
(syntaxError)
🔇 Additional comments (10)
tests/storage/StorageTestHelper.hpp (1)
1-54
: Verify CMake integration
Let's verify this header is properly included in the test build configuration.
✅ Verification successful
CMake integration is properly configured
The header file is correctly included in the test build:
- Listed in
SPIDER_TEST_SOURCES
in tests/CMakeLists.txt - Actively used by test-DataStorage.cpp which is also part of the build
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the file is included in CMake configuration
rg -l "StorageTestHelper\.hpp" CMakeLists.txt tests/CMakeLists.txt
# Check which test files are using this helper
rg -l "StorageTestHelper\.hpp" tests/
Length of output: 185
Script:
#!/bin/bash
# Check how StorageTestHelper.hpp is included in CMakeLists.txt
rg "StorageTestHelper\.hpp" tests/CMakeLists.txt -C 5
# Check if test-DataStorage.cpp is included in the build
rg "test-DataStorage\.cpp" tests/CMakeLists.txt
Length of output: 310
src/spider/storage/MetadataStorage.hpp (1)
30-38
: Verify implementations of the new job-based interface
The shift from task-graph to job-based model with client tracking is a significant architectural change. This requires careful verification of all implementing classes.
Let's verify the implementations:
✅ Verification successful
Let me gather more information about the implementation details in the MySQL storage class.
Implementation verification completed - all methods properly implemented
The MySqlMetadataStorage class has correctly implemented all the required methods from the interface:
add_job
: Properly handles job and client IDs with appropriate SQL statementsget_task_graph
: Implements task graph retrieval with proper error handlingget_jobs_by_client_id
: Correctly queries jobs using client IDremove_job
: Implements job removal with proper transaction handling
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for classes implementing MetadataStorage
echo "Checking for MetadataStorage implementations..."
rg "class.*:.*public.*MetadataStorage"
# Verify if all pure virtual methods are implemented
echo "Checking for add_job implementations..."
ast-grep --pattern 'auto add_job(boost::uuids::uuid job_id, boost::uuids::uuid client_id, TaskGraph const& task_graph)'
echo "Checking for get_jobs_by_client_id implementations..."
ast-grep --pattern 'auto get_jobs_by_client_id(boost::uuids::uuid client_id, std::vector<boost::uuids::uuid>* job_ids)'
Length of output: 685
Script:
#!/bin/bash
# Check the implementation details in MysqlStorage
echo "Checking MySQL implementation details..."
rg -A 10 "auto MySqlMetadataStorage::(add_job|get_task_graph|get_jobs_by_client_id|remove_job)" src/spider/storage/MysqlStorage.cpp
# Look for any old task-graph related methods that might need cleanup
echo "Checking for old task-graph methods..."
rg "task_graph" src/spider/storage/MysqlStorage.hpp
Length of output: 2627
src/spider/core/TaskGraph.hpp (2)
17-17
: LGTM: Constructor simplification aligns with modern C++ practices.
The change to use = default
removes unnecessary initialization and improves clarity.
17-17
: Verify the impact of removing TaskGraph ID management.
The removal of the ID management functionality (m_id and get_id()) might affect code that previously relied on these features.
✅ Verification successful
The removal of TaskGraph ID management is safe
The search results show that TaskGraph itself doesn't expose or manage any ID functionality. Instead, the storage layer (MysqlStorage) handles all ID management externally, using UUIDs for jobs, clients, and tasks. The TaskGraph class is only used as a data structure to store and manage tasks and their relationships.
Key findings:
- No direct usage of
get_id()
method was found in the codebase - All UUID handling is done at the storage layer, not within TaskGraph
- TaskGraph is used primarily as a container for tasks and their relationships
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to get_id() method of TaskGraph
# to ensure all callers have been updated
# Look for direct method calls
rg -A 2 "TaskGraph.*get_id\(\)"
# Look for stored TaskGraph IDs
rg -A 2 "TaskGraph.*uuid"
Length of output: 61
Script:
#!/bin/bash
# Let's try a broader search for any references to TaskGraph and ID/id
rg -A 2 "TaskGraph.*[iI][dD]"
# Also search for any files that might be using TaskGraph
fd -e hpp -e cpp -e h -e c | xargs rg "TaskGraph" -l
# Look for any potential test files related to TaskGraph
fd -e test.cpp -e Test.cpp -e test.hpp -e Test.hpp | xargs rg "TaskGraph" -A 2
Length of output: 4299
src/spider/storage/MysqlStorage.hpp (4)
68-73
: LGTM! Special member functions are properly defined.
The implementation of special member functions follows best practices for resource management.
66-66
: Duplicate comment: MySQL capitalization inconsistency.
Referring to the previous review comment about MySQL capitalization inconsistency in filenames vs class names.
79-80
: New methods enhance the storage interface.
The addition of get_data_by_key and remove_data methods completes the CRUD operations set, providing more flexibility in data management.
Let's verify the implementation of these methods:
#!/bin/bash
# Check for consistent error handling in the implementation
echo "Checking error handling patterns..."
rg "get_data_by_key|remove_data.*StorageErr" --type cpp -A 5
34-42
: API changes look good but verify impact.
The renaming from task_graph to job terminology (e.g., add_task_graph → add_job) and the addition of client_id parameter improve the API clarity.
Let's verify the impact of these API changes:
✅ Verification successful
Let me gather more information about the API changes by searching for the old task_graph related methods.
API changes are consistent and properly implemented
The codebase shows a well-coordinated transition from task_graph to job terminology. The old add_task_graph
method is completely removed, and the new add_job
method is consistently used across the codebase. The TaskGraph
type is still used internally as a parameter and return type, which is correct since it represents the graph structure of tasks within a job.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for uses of old method names to ensure all calls are updated
echo "Checking for old method calls..."
rg "add_task_graph|get_task_graphs" --type cpp
# Search for new method usage patterns
echo "Checking new method usage..."
rg "add_job|get_jobs_by_client_id" --type cpp
Length of output: 4330
Script:
#!/bin/bash
# Search for old task_graph method declarations and implementations
ast-grep --pattern 'add_task_graph($_) -> StorageErr'
ast-grep --pattern 'get_task_graphs($_) -> StorageErr'
# Search for any remaining task_graph references
rg "task_graph" --type cpp -A 2
Length of output: 5017
src/spider/storage/MysqlStorage.cpp (1)
159-171
: LGTM: Table creation order handles foreign key dependencies correctly.
The order of table creation in cCreateStorage
array properly handles foreign key dependencies:
drivers
beforedata_ref_driver
jobs
beforetasks
tasks
beforedata_ref_task
data
beforetask_outputs
task_outputs
beforetask_inputs
tests/storage/test-DataStorage.cpp (1)
20-43
: Test case "Add, get and remove data" is well-implemented
The test case accurately verifies the basic data storage operations, and the code is correct.
tests/storage/StorageTestHelper.hpp
Outdated
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP | ||
#define SPIDER_TEST_STORAGETETHELPER_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in header guard macro name
The header guard contains a typo: "STORAGETET" should be "STORAGETESTHELPER"
-#ifndef SPIDER_TEST_STORAGETETHELPER_HPP
-#define SPIDER_TEST_STORAGETETHELPER_HPP
+#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP
+#define SPIDER_TEST_STORAGETESTHELPER_HPP
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#ifndef SPIDER_TEST_STORAGETETHELPER_HPP | |
#define SPIDER_TEST_STORAGETETHELPER_HPP | |
#ifndef SPIDER_TEST_STORAGETESTHELPER_HPP | |
#define SPIDER_TEST_STORAGETESTHELPER_HPP |
char const* const cStorageUrl | ||
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Move database credentials to environment variables
Storing database credentials in source code is a security risk. Consider:
- Moving credentials to environment variables
- Using a non-root test user with minimal privileges
- Using a strong password
-char const* const cStorageUrl
- = "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
+char const* const cStorageUrl = []() {
+ auto const* url = std::getenv("SPIDER_TEST_DB_URL");
+ if (url == nullptr) {
+ throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
+ }
+ return url;
+}();
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
char const* const cStorageUrl | |
= "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password"; | |
char const* const cStorageUrl = []() { | |
auto const* url = std::getenv("SPIDER_TEST_DB_URL"); | |
if (url == nullptr) { | |
throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set"); | |
} | |
return url; | |
}(); |
TEMPLATE_TEST_CASE( | ||
"Add and remove data reference for driver", | ||
"[storage]", | ||
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | ||
) { | ||
auto [metadata_storage, data_storage] = spider::test:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix syntax error in TEMPLATE_TEST_CASE
macro
An extra closing parenthesis at line 111 is causing a syntax error. Please adjust the macro to fix the issue.
Apply this diff:
TEMPLATE_TEST_CASE(
"Add and remove data reference for driver",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-)
-) {
+){
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
TEMPLATE_TEST_CASE( | |
"Add and remove data reference for driver", | |
"[storage]", | |
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | |
) { | |
auto [metadata_storage, data_storage] = spider::test:: | |
TEMPLATE_TEST_CASE( | |
"Add and remove data reference for driver", | |
"[storage]", | |
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | |
){ | |
auto [metadata_storage, data_storage] = spider::test:: |
TEMPLATE_TEST_CASE( | ||
"Add and remove data reference for task", | ||
"[storage]", | ||
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | ||
) { | ||
auto [metadata_storage, data_storage] = spider::test:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix syntax error in TEMPLATE_TEST_CASE
macro
An extra closing parenthesis at line 79 is causing a syntax error. Correct the macro to ensure proper syntax.
Apply this diff:
TEMPLATE_TEST_CASE(
"Add and remove data reference for task",
"[storage]",
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>)
-)
-) {
+){
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
TEMPLATE_TEST_CASE( | |
"Add and remove data reference for task", | |
"[storage]", | |
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | |
) { | |
auto [metadata_storage, data_storage] = spider::test:: | |
TEMPLATE_TEST_CASE( | |
"Add and remove data reference for task", | |
"[storage]", | |
(std::tuple<spider::core::MySqlMetadataStorage, spider::core::MySqlDataStorage>) | |
){ | |
auto [metadata_storage, data_storage] = spider::test:: |
TEMPLATE_TEST_CASE( | ||
"Add, get and remove data with key", | ||
"[storage]", | ||
spider::core::MySqlDataStorage | ||
) { | ||
std::unique_ptr<spider::core::DataStorage> storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix syntax error due to extra closing parenthesis
There is an extra closing parenthesis at line 50, causing a syntax error in the TEMPLATE_TEST_CASE
macro. Ensure that the parentheses and braces are correctly matched.
Apply this diff to fix the syntax error:
TEMPLATE_TEST_CASE(
"Add, get and remove data with key",
"[storage]",
spider::core::MySqlDataStorage
-)
-) {
+){
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
TEMPLATE_TEST_CASE( | |
"Add, get and remove data with key", | |
"[storage]", | |
spider::core::MySqlDataStorage | |
) { | |
std::unique_ptr<spider::core::DataStorage> storage | |
TEMPLATE_TEST_CASE( | |
"Add, get and remove data with key", | |
"[storage]", | |
spider::core::MySqlDataStorage | |
){ | |
std::unique_ptr<spider::core::DataStorage> storage |
🧰 Tools
🪛 cppcheck
[error] 45-45: syntax error
(syntaxError)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
tests/storage/StorageTestHelper.hpp (2)
36-47
: Refactor Suggestion: Reduce Code Duplication in Storage CreationThe connection and initialization logic is duplicated across all three functions. Consider extracting this common pattern into a helper function.
Here's a suggested refactor:
template <typename T> auto connect_and_initialize(std::unique_ptr<T>& storage) { REQUIRE(storage->connect(spider::test::cStorageUrl).success()); REQUIRE(storage->initialize().success()); } template <class M, class D> requires std::derived_from<M, core::MetadataStorage> && std::derived_from<D, core::DataStorage> auto create_storage() { auto metadata_storage = std::make_unique<M>(); auto data_storage = std::make_unique<D>(); connect_and_initialize(metadata_storage); connect_and_initialize(data_storage); return std::make_tuple(std::move(metadata_storage), std::move(data_storage)); }
20-25
: Consider Adding Error Context to Storage CreationThe REQUIRE statements could benefit from more descriptive error messages to aid in debugging test failures.
Example improvement:
template <class T> requires std::derived_from<T, core::DataStorage> auto create_data_storage() -> std::unique_ptr<core::DataStorage> { std::unique_ptr<core::DataStorage> storage = std::make_unique<T>(); - REQUIRE(storage->connect(spider::test::cStorageUrl).success()); - REQUIRE(storage->initialize().success()); + REQUIRE(storage->connect(spider::test::cStorageUrl).success(), + "Failed to connect to storage at " << spider::test::cStorageUrl); + REQUIRE(storage->initialize().success(), + "Failed to initialize storage"); return storage; }Also applies to: 29-34
tests/storage/test-DataStorage.cpp (3)
20-43
: Consider enhancing test data and RAII usage.The test case effectively covers CRUD operations and error conditions. Consider these improvements:
- Use
const auto
for better type deduction:- std::unique_ptr<spider::core::DataStorage> storage + const auto storage
- Use more meaningful test data to improve test documentation:
- spider::core::Data const data{"value"}; + spider::core::Data const data{"test_payload_1"}; // Descriptive test data
133-134
: Fix incorrect comment.The comment refers to "task reference" but this is actually removing a driver reference.
- // Remove task reference + // Remove driver reference
130-131
: Fix incorrect comment.The comment refers to "task reference" but this is actually adding a driver reference.
- // Add task reference + // Add driver reference
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
tests/storage/StorageTestHelper.hpp
(1 hunks)tests/storage/test-DataStorage.cpp
(1 hunks)
🧰 Additional context used
🪛 cppcheck
tests/storage/test-DataStorage.cpp
[error] 45-45: syntax error
(syntaxError)
🔇 Additional comments (4)
tests/storage/StorageTestHelper.hpp (1)
15-16
:
Security Risk: Database Credentials Should Not Be Hardcoded
Storing database credentials in source code poses several security risks:
- Credentials are exposed in version control
- Using root access for tests is unnecessarily privileged
- The password 'password' is extremely weak
Please implement this more secure approach:
-char const* const cStorageUrl
- = "jdbc:mariadb://localhost:3306/spider_test?user=root&password=password";
+char const* const cStorageUrl = []() {
+ auto const* url = std::getenv("SPIDER_TEST_DB_URL");
+ if (url == nullptr) {
+ throw std::runtime_error("SPIDER_TEST_DB_URL environment variable not set");
+ }
+ return url;
+}();
Also, consider:
- Creating a dedicated test user with minimal privileges
- Using environment-specific configuration for different CI/CD stages
- Adding setup instructions to the README
Likely invalid or redundant comment.
tests/storage/test-DataStorage.cpp (3)
1-17
: LGTM! Well-organized file structure.
The includes are properly organized, and the NOLINT pragmas are appropriately scoped to specific rules.
45-72
: Extract common test setup and verification logic.
This test case shares similar setup and verification patterns with the first test case. Consider extracting common logic into test helpers:
namespace {
template<typename T>
auto create_and_verify_data(std::unique_ptr<T>& storage,
std::string_view key = "") {
spider::core::Data data{key.empty() ? "test_data" :
std::string(key), "test_value"};
REQUIRE(storage->add_data(data).success());
return data;
}
}
🧰 Tools
🪛 cppcheck
[error] 45-45: syntax error
(syntaxError)
1-138
: Overall excellent test coverage and implementation.
The test suite provides comprehensive coverage of data storage operations, including:
- Basic CRUD operations
- Key-based operations
- Task and driver reference management
- Error conditions and edge cases
While there are some minor improvements suggested above, the implementation is solid and follows testing best practices.
🧰 Tools
🪛 cppcheck
[error] 45-45: syntax error
(syntaxError)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
tests/storage/test-DataStorage.cpp (2)
28-31
: Enhance error scenario coverage.While basic error cases are covered, consider adding these additional scenarios:
- Adding data with invalid input (empty values, special characters)
- Concurrent access scenarios
- Resource cleanup after failures
- Edge cases for reference management
Would you like assistance in implementing these additional test cases?
Also applies to: 58-59, 83-84, 93-94, 123-124
1-138
: Improve test suite organization and documentation.Consider these enhancements to improve maintainability:
- Add a test suite description using Catch2's SCENARIO macro
- Group related test cases using Catch2's TEST_SUITE macro
- Add documentation explaining the test strategy and coverage goals
Example structure:
TEST_SUITE("DataStorage") { SCENARIO("Data operations with MySQL storage") { // Existing test cases here } }🧰 Tools
🪛 cppcheck
[error] 45-45: syntax error
(syntaxError)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
tests/storage/test-DataStorage.cpp
(1 hunks)
🧰 Additional context used
🪛 cppcheck
tests/storage/test-DataStorage.cpp
[error] 45-45: syntax error
(syntaxError)
🔇 Additional comments (1)
tests/storage/test-DataStorage.cpp (1)
1-17
: Well-organized includes and appropriate NOLINT usage.
The file structure demonstrates good organization with logical grouping of includes and specific NOLINT pragmas.
TEMPLATE_TEST_CASE("Add, get and remove data", "[storage]", spider::core::MySqlDataStorage) { | ||
std::unique_ptr<spider::core::DataStorage> storage | ||
= spider::test::create_data_storage<TestType>(); | ||
|
||
// Add data | ||
spider::core::Data const data{"value"}; | ||
REQUIRE(storage->add_data(data).success()); | ||
|
||
// Add data with same id again should fail | ||
spider::core::Data const data_same_id{data.get_id(), "value2"}; | ||
REQUIRE(spider::core::StorageErrType::DuplicateKeyErr == storage->add_data(data_same_id).type); | ||
|
||
// Get data should match | ||
spider::core::Data result{"temp"}; | ||
REQUIRE(storage->get_data(data.get_id(), &result).success()); | ||
REQUIRE(spider::core::data_equal(data, result)); | ||
|
||
// Remove data should succeed | ||
REQUIRE(storage->remove_data(data.get_id()).success()); | ||
|
||
// Get data should fail | ||
REQUIRE(spider::core::StorageErrType::KeyNotFoundErr | ||
== storage->get_data(data.get_id(), &result).type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Test cases need structural improvements.
While the test coverage is comprehensive, there are several areas for improvement:
- The test cases contain duplicated setup code for storage creation
- Error scenarios could benefit from more descriptive failure messages
- The TEMPLATE_TEST_CASE macro formatting is inconsistent
Consider implementing these improvements:
- Extract common setup into a fixture:
template<typename T>
class StorageTestFixture {
protected:
std::unique_ptr<spider::core::DataStorage> storage;
void SetUp() {
storage = spider::test::create_data_storage<T>();
}
};
- Add descriptive messages to REQUIRE statements:
REQUIRE(storage->add_data(data).success(),
"Adding data with unique ID should succeed");
- Standardize TEMPLATE_TEST_CASE formatting:
TEMPLATE_TEST_CASE("Test name", "[storage]", TestType) {
// Test body
}
Also applies to: 45-72, 74-104, 106-135
…based on it libmariadbcpp.so depends on libmariadb.so, but does not set the rpath correctly, so the built library cannot find libmariadb.so. Install libmariadb using apt and brew and build libmariadbcpp.so with system installed libmariadb.so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
tools/scripts/lib_install/macOS/install-lib.sh (1)
13-14
: Consider pinning package versions for reproducibility.To ensure consistent development environments across the team, consider specifying versions for all packages. This helps prevent unexpected issues from package updates.
brew install \ - boost \ - fmt \ - spdlog \ - mariadb-connector-c + [email protected] \ + [email protected] \ + [email protected] \ + [email protected]tools/scripts/lib_install/linux/install-dev.sh (2)
23-23
: Add a space after the package name.The addition of
libmariadb-dev
is appropriate for supporting MySQL-based storage tests. However, please add a space after the package name for consistency.- libmariadb-dev\ + libmariadb-dev \
Line range hint
1-26
: Consider organizing packages into logical groups.To improve maintainability, consider grouping the packages by their purpose. For example:
- Build tools (cmake, gcc, g++, etc.)
- System libraries (libssl-dev, libcurl4, etc.)
- Database libraries (libmariadb-dev)
- Programming languages (python3 and related packages)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
cmake/Modules/FindMariaDBClientCpp.cmake
(1 hunks)tools/scripts/lib_install/linux/install-dev.sh
(1 hunks)tools/scripts/lib_install/macOS/install-lib.sh
(1 hunks)tools/scripts/lib_install/mariadb-connector-cpp.sh
(1 hunks)
🔇 Additional comments (3)
tools/scripts/lib_install/macOS/install-lib.sh (1)
13-14
: LGTM! Addition of MariaDB connector aligns with testing requirements.
The addition of mariadb-connector-c
is appropriate for supporting the new data storage tests.
tools/scripts/lib_install/mariadb-connector-cpp.sh (1)
65-65
: Verify RPM layout compatibility with Debian-based systems
The script uses dpkg
for package management but configures cmake with INSTALL_LAYOUT=RPM
. This might cause inconsistencies in file locations.
Let's verify the impact of RPM layout on Debian systems:
cmake/Modules/FindMariaDBClientCpp.cmake (1)
107-107
: Excellent consolidation of include directories!
The change improves code organization by combining the include directory paths into a single property declaration, which is the preferred approach in CMake. This modification maintains the same functionality while reducing code duplication.
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually | ||
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add directory existence check before cmake configuration
The script should verify that /usr/local
exists and is writable before proceeding with the cmake configuration to prevent installation failures.
Add these checks before the cmake command:
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually
+if [ ! -d "/usr/local" ]; then
+ ${privileged_command_prefix} mkdir -p /usr/local
+elif [ ! -w "/usr/local" ]; then
+ echo "Error: /usr/local is not writable"
+ exit 1
+fi
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM ..
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually | |
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM .. | |
# Setting USE_SYSTEM_INSTALLED_LIB mess up the install prefix, so set it manually | |
if [ ! -d "/usr/local" ]; then | |
${privileged_command_prefix} mkdir -p /usr/local | |
elif [ ! -w "/usr/local" ]; then | |
echo "Error: /usr/local is not writable" | |
exit 1 | |
fi | |
cmake -DUSE_SYSTEM_INSTALLED_LIB=ON -DCMAKE_INSTALL_LIBDIR=/usr/local -DINSTALL_LAYOUT=RPM .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
tools/scripts/lib_install/linux/install-lib.sh (1)
10-15
: LGTM! Consider adding error handling for sudo check.The privilege checking mechanism is well-implemented. However, consider adding error handling in case the sudo check fails.
if [ ${EUID:-$(id -u)} -ne 0 ] ; then - sudo echo "Script can elevate privileges." + if ! sudo echo "Script can elevate privileges."; then + echo "Error: This script requires sudo privileges to install packages." + exit 1 + fi privileged_command_prefix="${privileged_command_prefix} sudo" fidocs/tests.md (1)
16-24
: Improve the build instructions clarity.The build instructions should be more detailed and include prerequisites.
## Build and run unit tests -To build the unit tests, run the following commands in project root directory. +Prerequisites: +- CMake version X.Y or higher +- C++ compiler supporting C++17 + +To build the unit tests, run the following commands in the project root directory: ```shell +# Configure the build cmake -S . -B build + +# Build the unit test target cmake --build build --target unitTest --parallel + +# Run the tests ./build/tests/unitTest<details> <summary>🧰 Tools</summary> <details> <summary>🪛 LanguageTool</summary> [uncategorized] ~18-~18: You might be missing the article “the” here. Context: ...it tests, run the following commands in project root directory. ```shell cmake -S . -B... (AI_EN_LECTOR_MISSING_DETERMINER_THE) </details> </details> </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used: CodeRabbit UI** **Review profile: CHILL** <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 47feb1ea5bdb541b8255a93a0235dcc14b869e56 and 0ae30cc0706679e7c6191c067c48a4a7bd0b862b. </details> <details> <summary>📒 Files selected for processing (2)</summary> * `docs/tests.md` (1 hunks) * `tools/scripts/lib_install/linux/install-lib.sh` (1 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🪛 LanguageTool</summary> <details> <summary>docs/tests.md</summary> [uncategorized] ~10-~10: You might be missing the article “the” here. Context: ...d 1. Start a Mysql database running in background. 2. Set the password for `root` or any ... (AI_EN_LECTOR_MISSING_DETERMINER_THE) --- [uncategorized] ~11-~11: You might be missing the article “a” here. Context: ... `root` or any create another user with password 3. Create an empty database. 4. Set the... (AI_EN_LECTOR_MISSING_DETERMINER_A) --- [uncategorized] ~18-~18: You might be missing the article “the” here. Context: ...it tests, run the following commands in project root directory. ```shell cmake -S . -B... (AI_EN_LECTOR_MISSING_DETERMINER_THE) --- [uncategorized] ~26-~26: You might be missing the article “the” here. Context: ...`` If the tests show error message for connection function below, revisit the [Setup stor... (AI_EN_LECTOR_MISSING_DETERMINER_THE) </details> </details> </details> <details> <summary>🔇 Additional comments (1)</summary> <details> <summary>tools/scripts/lib_install/linux/install-lib.sh (1)</summary> `16-18`: **LGTM! Package installation follows security best practices.** The package installation is properly implemented with: - Non-interactive frontend to prevent hanging - Minimal installation using --no-install-recommends - Package list update before installation </details> </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
docs/tests.md
Outdated
## Setup storage backend | ||
|
||
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage | ||
backend available to run the unit tests. | ||
|
||
### Setup Mysql as storage backend | ||
|
||
1. Start a Mysql database running in background. | ||
2. Set the password for `root` or any create another user with password | ||
3. Create an empty database. | ||
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to | ||
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance the storage backend setup instructions.
The setup instructions could be more comprehensive. Consider the following improvements:
## Setup storage backend
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage
backend available to run the unit tests.
### Setup Mysql as storage backend
+
+Prerequisites:
+- MySQL/MariaDB version X.Y or higher
+- MySQL client tools
-1. Start a Mysql database running in background.
-2. Set the password for `root` or any create another user with password
-3. Create an empty database.
+1. Start a MySQL database running in the background.
+2. Set up database access:
+ ```sql
+ ALTER USER 'root'@'localhost' IDENTIFIED BY 'your_password';
+ -- OR create a new user
+ CREATE USER 'spider_test'@'localhost' IDENTIFIED BY 'your_password';
+ GRANT ALL PRIVILEGES ON *.* TO 'spider_test'@'localhost';
+ ```
+3. Create an empty database:
+ ```sql
+ CREATE DATABASE spider_test;
+ ```
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>`
+
+ Where:
+ - `<db_name>`: Name of the database (e.g., spider_test)
+ - `<usr>`: MySQL username
+ - `<pwd>`: MySQL password
🧰 Tools
🪛 LanguageTool
[uncategorized] ~10-~10: You might be missing the article “the” here.
Context: ...d 1. Start a Mysql database running in background. 2. Set the password for root
or any ...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~11-~11: You might be missing the article “a” here.
Context: ... root
or any create another user with password 3. Create an empty database. 4. Set the...
(AI_EN_LECTOR_MISSING_DETERMINER_A)
docs/tests.md
Outdated
If the tests show error message for connection function below, revisit | ||
the [Setup storage backend](#setup-storage-backend) and double check if `cStroageUrl` is correctly | ||
set. | ||
|
||
```c++ | ||
REQUIRE( storage->connect(spider::test::cStorageUrl).success() ) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo and enhance troubleshooting guidance.
There's a typo in the variable name and the error handling could be more informative.
-If the tests show error message for connection function below, revisit
-the [Setup storage backend](#setup-storage-backend) and double check if `cStroageUrl` is correctly
+If the tests show an error message for the connection function below, revisit
+the [Setup storage backend](#setup-storage-backend) and double check if `cStorageUrl` is correctly
set.
```c++
REQUIRE( storage->connect(spider::test::cStorageUrl).success() )
+Common connection issues:
+- MySQL service not running
+- Incorrect database name
+- Wrong username or password
+- MySQL server not listening on the specified port
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 LanguageTool</summary>
[uncategorized] ~26-~26: You might be missing the article “the” here.
Context: ...`` If the tests show error message for connection function below, revisit the [Setup stor...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
</details>
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
docs/tests.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to call this testing.md
.
docs/tests.md
Outdated
@@ -0,0 +1,32 @@ | |||
# Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Tests | |
# Testing |
docs/tests.md
Outdated
@@ -0,0 +1,32 @@ | |||
# Tests | |||
|
|||
## Setup storage backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Setup storage backend | |
## Set up a storage backend |
"Set up" is the verb, "setup" is the noun.
docs/tests.md
Outdated
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage | ||
backend available to run the unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage | |
backend available to run the unit tests. | |
Spider relies on a fault-tolerant storage backend to store metadata and data. Spider's unit tests | |
also require this storage backend. |
docs/tests.md
Outdated
Spider relies on a fault-tolerant storage to store metadata and data. Unit tests need the storage | ||
backend available to run the unit tests. | ||
|
||
### Setup Mysql as storage backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Setup Mysql as storage backend | |
### Set up MySQL as storage backend |
docs/tests.md
Outdated
1. Start a Mysql database running in background. | ||
2. Set the password for `root` or any create another user with password |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Start a Mysql database running in background. | |
2. Set the password for `root` or any create another user with password | |
1. Start a MySQL database. | |
2. Set the password for `root` or create another user with a password. |
docs/tests.md
Outdated
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to | ||
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4. Set the `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to | |
`jdbc:mariadb://localhost:3306/<db_name>?user=<usr>&password=<pwd>` | |
4. Set `cStorageUrl` in `tests/storage/StorageTestHelper.hpp` to | |
`jdbc:mariadb://localhost:3306/<db_name>?user=<db_user>&password=<db_password>` | |
* Replace `db_name`, `db_user`, and `db_password` with the values used in steps 2 & 3. |
docs/tests.md
Outdated
|
||
## Build and run unit tests | ||
|
||
To build the unit tests, run the following commands in project root directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To build the unit tests, run the following commands in project root directory. | |
To build and run the unit tests, run the following commands in the project's root directory. |
docs/tests.md
Outdated
If the tests show error message for connection function below, revisit | ||
the [Setup storage backend](#setup-storage-backend) and double check if `cStorageUrl` is correctly | ||
set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the tests show error message for connection function below, revisit | |
the [Setup storage backend](#setup-storage-backend) and double check if `cStorageUrl` is correctly | |
set. | |
If the tests show an error message for the connection function below, revisit the | |
[Set up storage backend](#set-up-a-storage-backend) section and check if `cStorageUrl` is set | |
correctly. |
Description
Add unit tests for data storage.
To run unit test, setup mysql on local machine. Change
cStorageUrl
intests/storage/StorageTestHelper.hpp
to correct url. Then compile and run unit test.Validation performed
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
Chores