Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build(core-clp): Remove boost::iostream dependency. #450

Merged
merged 5 commits into from
Nov 15, 2024

Conversation

LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Jun 18, 2024

Description

PR #445 has implemented our own memory-mapped file. With our implementation, we can now remove boost::iostream dependency from clp and unit tests.
This PR makes necessary changes to replace boost memory-mapped files with clp::ReadOnlyMemoryMappedFile, and remove the boost::iostream dependency from the cmake files of clp, clg, clo, make-dictionaries-readable, and unitTest. Since boost::iostream is still used in clp-s and glt, it is not yet fully removed from our library dependency.

Validation performed

Ensure all unit tests passed.

Summary by CodeRabbit

  • New Features

    • Introduced a custom ReadOnlyMemoryMappedFile class for improved memory mapping functionality.
  • Bug Fixes

    • Simplified error handling and resource management for memory-mapped files in the Segment class.
  • Tests

    • Updated tests to utilize the new ReadOnlyMemoryMappedFile class, enhancing the encapsulation of memory mapping functionality.
  • Documentation

    • Added necessary includes for the new memory mapping class in the relevant test files.

@kirkrodrigues
Copy link
Member

Do you want to try merging main to see if Centos 9 resolves the workflow failure?

Copy link
Contributor

coderabbitai bot commented Nov 13, 2024

Walkthrough

The pull request modifies several CMakeLists.txt files across different components, primarily removing the Boost::iostreams library from the linked libraries. Additionally, it updates the handling of memory-mapped files in Segment.cpp and Segment.hpp, replacing boost::iostreams::mapped_file_source with std::optional<ReadOnlyMemoryMappedFile>. The test file test-StreamingCompression.cpp transitions to using the custom ReadOnlyMemoryMappedFile class instead of Boost's memory mapping utilities.

Changes

File Path Change Summary
components/core/src/clp/clg/CMakeLists.txt Removed Boost::iostreams from target_link_libraries. Updated CLG_SOURCES (no specific changes).
components/core/src/clp/clo/CMakeLists.txt Removed Boost::iostreams from target_link_libraries. No changes to source files.
components/core/src/clp/clp/CMakeLists.txt Removed Boost::iostreams from target_link_libraries. Updated CLP_SOURCES (no specific changes).
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt Removed Boost::iostreams from target_link_libraries. Updated MAKE_DICTIONARIES_READABLE_SOURCES (no specific changes).
components/core/src/clp/streaming_archive/reader/Segment.cpp Updated memory management for memory-mapped files, simplified error handling, and replaced close() with reset().
components/core/src/clp/streaming_archive/reader/Segment.hpp Changed type of m_memory_mapped_segment_file from boost::iostreams::mapped_file_source to std::optional<ReadOnlyMemoryMappedFile>. Added #include <optional>.
components/core/tests/test-StreamingCompression.cpp Replaced Boost's iostreams::mapped_file with clp::ReadOnlyMemoryMappedFile. Updated memory mapping logic. Added header for ReadOnlyMemoryMappedFile.hpp.

Suggested reviewers

  • kirkrodrigues

Warning

Rate limit exceeded

@LinZhihao-723 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 28 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 33b21f1 and 1912615.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 33b21f1 and 1912615.

📒 Files selected for processing (1)
  • components/core/src/clp/streaming_archive/reader/Segment.cpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_archive/reader/Segment.cpp

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
components/core/src/clp/streaming_archive/reader/Segment.cpp (2)

Line range hint 46-56: Enhance error handling with specific failure details

Consider capturing and logging the specific reason for memory mapping failure to aid in debugging.

     m_memory_mapped_segment_file.emplace(segment_path);
     if (false == m_memory_mapped_segment_file.has_value()) {
         SPDLOG_ERROR(
-                "streaming_archive::reader:Segment: Unable to memory map the compressed "
-                "segment with path: {}",
-                segment_path.c_str()
+                "streaming_archive::reader:Segment: Unable to memory map the compressed "
+                "segment with path: {} (errno: {}, message: {})",
+                segment_path.c_str(),
+                errno,
+                std::strerror(errno)
         );
         return ErrorCode_Failure;
     }

Replace boost::filesystem with std::filesystem in multiple files

The verification shows extensive usage of boost::filesystem across the codebase, particularly in:

  • File operations (exists, file_size, remove)
  • Path manipulations (path, parent_path, weakly_canonical)
  • Directory operations (is_directory, create_directory, directory_iterator)

The migration to std::filesystem will require changes in:

  • Header includes
  • Namespace usage
  • Some API adjustments (e.g., boost::filesystem::filesystem_error to std::filesystem::filesystem_error)
🔗 Analysis chain

Line range hint 8-8: Replace boost::filesystem with std::filesystem

Since the PR's objective is to remove boost dependencies, consider replacing boost::filesystem with std::filesystem (available since C++17).

-#include <boost/filesystem.hpp>
+#include <filesystem>

Also update any boost::filesystem namespace usage to std::filesystem.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for remaining boost::filesystem usage
rg "boost::filesystem" --type cpp

Length of output: 14835

components/core/tests/test-StreamingCompression.cpp (3)

4-4: Consider replacing boost::filesystem with std::filesystem

While the PR aims to remove boost dependencies, boost::filesystem is still used for file cleanup operations. Consider using std::filesystem (C++17) instead:

-#include <boost/filesystem.hpp>
+#include <filesystem>

 // ... rest of the file ...

-        boost::filesystem::remove(compressed_file_path);
+        std::filesystem::remove(compressed_file_path);

Also applies to: 232-232


Line range hint 19-23: Consider using std::vector for safer memory management

Replace manual memory management with std::vector to prevent potential memory leaks and improve exception safety:

-    char* uncompressed_data = new char[uncompressed_data_size];
-    for (size_t i = 0; i < uncompressed_data_size; ++i) {
-        uncompressed_data[i] = (char)('a' + (i % 26));
-    }
+    std::vector<char> uncompressed_data(uncompressed_data_size);
+    std::generate(uncompressed_data.begin(), uncompressed_data.end(),
+                 [n = 0]() mutable { return (char)('a' + ((n++) % 26)); });

Line range hint 1-233: Consider using parameterized tests to reduce code duplication

The test file contains significant repetition in the decompression verification steps. Consider using Catch2's parameterized testing features to reduce duplication:

SCENARIO("StreamingCompression decompression verification") {
    std::vector<size_t> chunk_sizes = {
        uncompressed_data_size / 100,
        uncompressed_data_size / 50,
        uncompressed_data_size / 25,
        // ... other sizes
    };
    
    for (const auto& chunk_size : chunk_sizes) {
        GIVEN("A chunk of size " + std::to_string(chunk_size)) {
            REQUIRE(ErrorCode_Success ==
                    decompressor.get_decompressed_stream_region(
                        uncompressed_bytes,
                        decompressed_data,
                        chunk_size));
            REQUIRE(memcmp(uncompressed_data, decompressed_data, chunk_size) == 0);
            memset(decompressed_data, 0, uncompressed_data_size);
            uncompressed_bytes += chunk_size;
        }
    }
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 53c4f52 and 2238406.

📒 Files selected for processing (7)
  • components/core/src/clp/clg/CMakeLists.txt (1 hunks)
  • components/core/src/clp/clo/CMakeLists.txt (1 hunks)
  • components/core/src/clp/clp/CMakeLists.txt (1 hunks)
  • components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1 hunks)
  • components/core/src/clp/streaming_archive/reader/Segment.cpp (3 hunks)
  • components/core/src/clp/streaming_archive/reader/Segment.hpp (2 hunks)
  • components/core/tests/test-StreamingCompression.cpp (2 hunks)
🔇 Additional comments (12)
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1)

Line range hint 48-53: LGTM! Boost::iostreams dependency successfully removed.

The removal of Boost::iostreams while maintaining other necessary Boost components aligns well with the PR objective.

Let's verify that all Boost::iostreams usage has been replaced in the source files:

✅ Verification successful

Boost::iostreams successfully removed, only required Boost dependencies remain

The verification confirms that:

  • No traces of Boost::iostreams or mapped_file_source remain in the codebase
  • Only necessary Boost components are present:
    • boost/filesystem.hpp for path handling
    • boost/program_options.hpp for command-line parsing
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining Boost::iostreams usage in source files

# Search for any remaining boost::iostreams includes or usage
echo "Checking for boost::iostreams includes..."
rg -l "boost.*iostreams" components/core/src/clp/make_dictionaries_readable/

# Search for mapped_file_source specifically
echo "Checking for mapped_file_source usage..."
rg -l "mapped_file_source" components/core/src/clp/make_dictionaries_readable/

Length of output: 341


Script:

#!/bin/bash
# Let's try a more comprehensive search for any Boost I/O related code

# Search for any Boost I/O related patterns with context
echo "=== Searching for Boost I/O patterns ==="
rg -i "boost.*(io|stream|mapped)" components/core/src/clp/make_dictionaries_readable/ -C 2

# Check the implementation files for memory mapping
echo -e "\n=== Checking memory mapping implementation ==="
fd -e cpp -e hpp . components/core/src/clp/make_dictionaries_readable/ --exec cat {}

# Look for any remaining Boost includes
echo -e "\n=== Checking all Boost includes ==="
rg "#include.*boost" components/core/src/clp/make_dictionaries_readable/

Length of output: 13969

components/core/src/clp/streaming_archive/reader/Segment.hpp (2)

5-5: LGTM! Proper header inclusion for the new dependencies.

The new includes are correctly ordered and necessary for the transition from boost::iostreams to the custom memory-mapped file implementation.

Also applies to: 10-10


56-56: Excellent choice using std::optional for the memory-mapped file.

The transition from boost::iostreams to std::optional is a good modernization that makes the nullable state explicit and provides better ownership semantics.

Let's verify the interface compatibility between the old and new implementations:

✅ Verification successful

Let me gather more information about the ReadOnlyMemoryMappedFile implementation.


Interface compatibility verified successfully

The change from boost::iostreams::mapped_file_source to std::optional<ReadOnlyMemoryMappedFile> is fully compatible. The new ReadOnlyMemoryMappedFile class provides:

  • Memory-mapped file functionality through native POSIX mmap
  • A get_view() method that returns a view of the mapped memory
  • Proper RAII management with move/copy operations explicitly deleted
  • Proper error handling through OperationFailed exception
  • Special handling for empty files

The std::optional wrapper appropriately handles the case when the file mapping isn't initialized, which was previously implicit in the boost implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that ReadOnlyMemoryMappedFile provides equivalent functionality to boost::iostreams::mapped_file_source

# Search for the ReadOnlyMemoryMappedFile interface
echo "Searching for ReadOnlyMemoryMappedFile interface..."
rg -t hpp "class.*ReadOnlyMemoryMappedFile"

# Search for usage patterns of the old boost implementation to ensure all cases are covered
echo "Searching for previous boost::iostreams usage patterns..."
rg -B 2 -A 2 "boost::iostreams::mapped_file_source" 

Length of output: 2911


Script:

#!/bin/bash
# Search for ReadOnlyMemoryMappedFile implementation and usage
rg -g "*.{cpp,hpp,h}" "ReadOnlyMemoryMappedFile" -B 3 -A 3

# Search for memory mapped file related methods in Segment class
ast-grep --pattern 'class Segment {
  $$$
}'

Length of output: 12015

components/core/src/clp/streaming_archive/reader/Segment.cpp (2)

36-44: LGTM: Proper error handling and resource cleanup

The error handling for previously opened memory mapped files is well implemented with appropriate warning messages and cleanup.


66-68: LGTM: Proper resource cleanup

The cleanup sequence is correct, ensuring proper release of resources.

components/core/src/clp/clg/CMakeLists.txt (2)

Line range hint 1-132: Verify memory mapping implementation transition

The source files list includes ReadOnlyMemoryMappedFile, which replaces Boost's memory mapping functionality. Let's ensure all relevant files have been updated.

#!/bin/bash
# Description: Verify memory mapping implementation transition
# Expected: All files should use the new ReadOnlyMemoryMappedFile class

# Find files that might use memory mapping
rg -l "mapped_file|mmap" components/core/src/clp/clg/

# Check usage of ReadOnlyMemoryMappedFile
ast-grep --pattern 'ReadOnlyMemoryMappedFile'

Line range hint 133-143: LGTM! Boost::iostreams successfully removed.

The removal of Boost::iostreams while retaining other necessary Boost components aligns with the PR objectives.

Let's verify that no source files still include Boost iostreams headers:

✅ Verification successful

Boost::iostreams dependency removal verified successfully

The verification confirms that:

  • No remaining Boost::iostreams includes or usage in the clg directory
  • No direct usage of the old mapped_file or mmap functionality
  • ReadOnlyMemoryMappedFile is properly referenced in CMakeLists.txt
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining boost/iostreams includes
# Expected: No matches as the dependency has been removed

# Search for boost/iostreams includes
rg -l "boost/iostreams" components/core/src/clp/clg/

# Search for boost::iostreams namespace usage
rg -l "boost::iostreams" components/core/src/clp/clg/

Length of output: 803


Script:

#!/bin/bash
# Let's check for any remaining boost/iostreams includes or usage specifically in the clg directory
rg -l "boost/iostreams|boost::iostreams" components/core/src/clp/clg/

# Also check for any direct usage of mapped_file or mmap in clg directory
rg -l "mapped_file|mmap" components/core/src/clp/clg/

# Let's see how the new ReadOnlyMemoryMappedFile is being used in clg
rg -l "ReadOnlyMemoryMappedFile" components/core/src/clp/clg/

Length of output: 232

components/core/src/clp/clo/CMakeLists.txt (1)

161-161: LGTM! Successfully removed Boost::iostreams dependency.

The removal of Boost::iostreams from the target libraries aligns with the PR objective and follows the implementation of the custom memory-mapped file solution.

components/core/src/clp/clp/CMakeLists.txt (3)

174-174: LGTM! Boost::iostreams successfully removed.

The removal of Boost::iostreams while retaining other necessary Boost components aligns well with the PR objectives.


Line range hint 1-183: Consider updating dependency documentation

Since this represents a significant dependency change, consider updating the project's documentation to reflect the removal of Boost::iostreams and the transition to the custom memory-mapped file implementation.

#!/bin/bash
# Description: Check for documentation files that might need updates

# Look for common documentation files
fd -e md -e rst --exec rg -l "boost.*iostreams|dependencies|build requirements"

Line range hint 174-183: Verify remaining Boost dependencies

Let's verify that the remaining Boost dependencies (filesystem, program_options) aren't indirectly pulling in iostreams.

✅ Verification successful

No indirect Boost::iostreams dependencies found in clp target

The review of the codebase shows that while Boost::iostreams is used in some components, it's only used for its memory-mapped file functionality (mapped_file) and not for any streaming or filtering capabilities. The CMake dependencies for the clp target are safe as they don't indirectly pull in any other Boost::iostreams features.

  • The only Boost::iostreams usage is in:
    • ZstdDecompressor (clp_s component)
    • Segment reader (glt component)
    • LogtypeTableManager (glt component)

All these uses are isolated to memory mapping functionality and don't affect the clp target's dependencies.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for indirect Boost::iostreams dependencies

# Search for any remaining boost::iostreams usage
rg -l "boost.*iostreams|boost/iostreams" --type cpp --type hpp

# Check CMake files for indirect Boost dependencies
fd -e cmake -e txt -x rg -l "Boost::iostreams|boost.*iostreams"

Length of output: 992


Script:

#!/bin/bash
# Let's check the content of the CMake files that mention Boost::iostreams
echo "=== components/core/src/glt/glt/CMakeLists.txt ==="
cat components/core/src/glt/glt/CMakeLists.txt

echo -e "\n=== components/core/src/clp_s/CMakeLists.txt ==="
cat components/core/src/clp_s/CMakeLists.txt

echo -e "\n=== components/core/CMakeLists.txt ==="
cat components/core/CMakeLists.txt

# Also check if there are any direct includes of boost/iostreams in the codebase
echo -e "\n=== Checking for direct includes of boost/iostreams ==="
rg "include.*boost/iostreams" --type cpp --type hpp

Length of output: 33947


Script:

#!/bin/bash
# Let's check for any direct includes of boost/iostreams with proper file extensions
rg "include.*boost/iostreams" -t cpp -t h -t hpp -t cc

# Also check for any usage of boost::iostreams namespace
rg "boost::iostreams::" -t cpp -t h -t hpp -t cc

Length of output: 161


Script:

#!/bin/bash
# Search for boost/iostreams includes and usage without file type filters
rg "include.*boost/iostreams"

# Search for boost::iostreams namespace usage
rg "boost::iostreams::"

# Also check if there are any files that might be using iostreams indirectly
rg -l "iostream|istream|ostream|filtering_stream|filtering_streambuf"

Length of output: 4821

components/core/tests/test-StreamingCompression.cpp (1)

153-156: LGTM! Clean RAII-style memory mapping implementation

The new implementation using ReadOnlyMemoryMappedFile follows modern C++ practices:

  • RAII-style resource management
  • Proper const correctness
  • Clean view-based access to mapped memory

@LinZhihao-723 LinZhihao-723 changed the title core-clp: Remove boost::iostream dependency. refactor(core): Remove boost::iostream dependency. Nov 14, 2024
@LinZhihao-723 LinZhihao-723 changed the title refactor(core): Remove boost::iostream dependency. build(core): Remove boost::iostream dependency. Nov 14, 2024
@LinZhihao-723
Copy link
Member Author

@kirkrodrigues Yeah Centos 9 resolves the workflow failure. Can u check the commit message?

@LinZhihao-723 LinZhihao-723 self-assigned this Nov 14, 2024
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

build(core-clp): Remove `boost::iostream` dependency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants