
Bloom filter occupied 90% of memory on server with 836 GB available RAM #99

Open
ethany21 opened this issue Nov 27, 2024 · 0 comments

@ethany21

I have 12 TB of text files to dedup (each file is 4 GB, containing about 652050487 tokens), and for bff I configured expected-ngram-count to be 2001794995090 and fp-rate to be 0.01.

When I started bff, the Bloom filter occupied 90% of RAM and the system soon nearly crashed. Would it be better to divide the files into smaller groups and run each group sequentially?
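For context, a quick estimate with the textbook optimal Bloom filter sizing formula (a generic sketch of the math, not necessarily bff's exact allocation logic) shows why these parameters cannot fit in 836 GB:

```python
import math

def bloom_size_bytes(n: int, p: float) -> float:
    """Optimal Bloom filter size in bytes for n items at
    false-positive rate p: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -n * math.log(p) / (math.log(2) ** 2)
    return bits / 8

n = 2_001_794_995_090  # expected-ngram-count from this report
p = 0.01               # fp-rate from this report

print(f"~{bloom_size_bytes(n, p) / 1e12:.2f} TB")  # ~2.40 TB
```

At these parameters the filter alone wants roughly 2.4 TB, about three times the available RAM, so something has to give: a higher fp-rate, a smaller expected-ngram-count, or splitting the input into groups run sequentially (with the caveat that duplicates spanning different groups would then go undetected).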
