Problems running bench with 1M files #199

Closed
HanSolo opened this issue Feb 1, 2024 · 6 comments · Fixed by #200

Comments

@HanSolo

HanSolo commented Feb 1, 2024

I downloaded the 1M dataset and adjusted the code in DownloadHelper and Bench to match the new file names. However, when I run the bench example with ./mvnw -Pjdk11 compile exec:exec@bench, it stops after a while with a RuntimeException.
RuntimeException_executing_bench_on_1M_dataset.txt
machine_info.txt

Is there anything specific one needs to adjust to make the bench example run with the 1M dataset?

@jbellis
Owner

jbellis commented Feb 1, 2024

Exception in thread "main" java.lang.RuntimeException: File sizes greater than 2GB are not supported on Windows--contributions welcome

Longer explanation: the easiest way to make a reasonably performant RandomAccessReader class is to use mmap, but Java's standard mmap only supports segments up to 2GB. Hence the comments here:

/**
 * Simple sample implementation of RandomAccessReader.
 * It provides a bare minimum to run against disk in reasonable time.
 * Does not handle files above 2 GB.
 */
public class SimpleMappedReader implements RandomAccessReader {

We plugged in a native library to work around this in MMapReader, but that library does not support Windows.

So your options include (from easiest to hardest):

  1. Use macOS or Linux to test large graphs
  2. Update SimpleMappedReader to segment large files into < 2GB chunks
  3. Implement a RandomAccessReader using standard I/O that matches mmap's performance
  4. Implement Windows support for the native library we use https://github.com/indeedeng/util/tree/main/mmap
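
As an illustration of option 2, here is a minimal sketch of segmenting a file into several mappings that each stay under the 2GB limit. It is not JVector code: the class name ChunkedMappedFile, its methods, and the chunkSize parameter are hypothetical, and it does not implement the RandomAccessReader interface. It only shows the offset arithmetic a chunked SimpleMappedReader would need, including reads that straddle two segments.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: maps a file as a series of segments no larger than chunkSize. */
final class ChunkedMappedFile {
    private final MappedByteBuffer[] segments;
    private final long chunkSize;
    private final long fileSize;

    ChunkedMappedFile(Path path, long chunkSize) throws IOException {
        // FileChannel.map rejects sizes above Integer.MAX_VALUE, so each chunk must fit in an int.
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in (0, Integer.MAX_VALUE]");
        }
        this.chunkSize = chunkSize;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            this.fileSize = channel.size();
            int count = (int) ((fileSize + chunkSize - 1) / chunkSize);
            this.segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * chunkSize;
                long length = Math.min(chunkSize, fileSize - start);
                // A mapping stays valid after the channel that created it is closed.
                segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
            }
        }
    }

    /** Reads one byte at an absolute file offset. */
    byte readByte(long offset) {
        if (offset < 0 || offset >= fileSize) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return segments[(int) (offset / chunkSize)].get((int) (offset % chunkSize));
    }

    /** Fills dest starting at an absolute file offset, crossing segment boundaries as needed. */
    void readFully(long offset, byte[] dest) {
        if (offset < 0 || offset + dest.length > fileSize) {
            throw new IndexOutOfBoundsException("offset " + offset + ", length " + dest.length);
        }
        int copied = 0;
        while (copied < dest.length) {
            long position = offset + copied;
            // duplicate() gives an independent position, so the shared segment is never mutated.
            ByteBuffer view = segments[(int) (position / chunkSize)].duplicate();
            view.position((int) (position % chunkSize));
            int n = Math.min(dest.length - copied, view.remaining());
            view.get(dest, copied, n);
            copied += n;
        }
    }
}
```

A real reader would likely pick a chunk size near 1GB and add typed reads (for example floats) on top of readFully; whether this approach matches the native library's performance is not something this sketch establishes.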

@HanSolo
Author

HanSolo commented Feb 1, 2024

Yep, I know that, but I'm running on Linux x64, which is why I was surprised by the 2GB limit.

@jbellis
Owner

jbellis commented Feb 1, 2024

        try {
            return new MMapReaderSupplier(path);
        } catch (UnsatisfiedLinkError|NoClassDefFoundError e) {
            if (Files.size(path) > Integer.MAX_VALUE) {
                throw new RuntimeException("File sizes greater than 2GB are not supported on Windows--contributions welcome");
            }

            return new SimpleMappedReaderSupplier(path);
        }

Looks like for some reason it can't find the native mmap library, so it falls back to SimpleMappedReader and hits the 2GB check.
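
To make this kind of fallback easier to diagnose, the error could carry the underlying linkage failure instead of assuming Windows. The sketch below is only an illustration of that idea, not the change made in the linked PR; the class NativeFallback, the method loadOrFallback, and the message wording are all hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: try a native-backed implementation, fall back with a clear reason. */
final class NativeFallback {
    static <T> T loadOrFallback(Supplier<T> nativeImpl, Supplier<T> fallback, long fileSize) {
        try {
            return nativeImpl.get();
        } catch (UnsatisfiedLinkError | NoClassDefFoundError e) {
            if (fileSize > Integer.MAX_VALUE) {
                // Keep the original error as the cause so the stack trace shows why loading failed.
                throw new RuntimeException(
                        "Native mmap library could not be loaded (" + e
                                + "), and the fallback reader does not handle files above 2GB", e);
            }
            System.err.println("Native mmap library unavailable, using fallback reader: " + e);
            return fallback.get();
        }
    }
}
```

With the cause attached, a report like the one above would show whether the library was missing from the classpath or failed to link on the current platform.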

@HanSolo
Author

HanSolo commented Feb 1, 2024

Will do, thx for the hint 👍🏻

@jkni
Collaborator

jkni commented Feb 1, 2024

Thanks for the report @HanSolo. This identifies an issue with JVector and not your setup. I'll push a PR shortly that provides better diagnostic output and fixes the issue (which I'll link here).

jkni linked a pull request on Feb 1, 2024 that will close this issue
@jkni
Collaborator

jkni commented Feb 1, 2024

Issue reproduced and resolved locally by the linked PR. If this PR doesn't resolve your issue, it should at least provide a clearer reason for the fallback.

jkni closed this as completed in #200 on Feb 1, 2024