I have 12 TB of text files to deduplicate
(each file is about 4 GB, roughly 652,050,487 tokens per file).
For bff, I set expected-ngram-count to 2001794995090 and fp-rate to 0.01.
When I started bff, the Bloom filter occupied 90% of RAM and the system nearly crashed.
Would it be better to divide the files into smaller groups and run each group sequentially?
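For context on why the filter is so large: a minimal sketch below applies the standard Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2, to the parameters above. This is an assumption about how the filter is sized (the function names are illustrative, not bff's API), but with n ≈ 2.0e12 n-grams and p = 0.01 it implies a bit array in the multi-terabyte range, which would explain the RAM exhaustion.

```python
import math

def bloom_filter_size_bytes(expected_items: int, fp_rate: float) -> int:
    """Optimal Bloom filter bit-array size (in bytes) for the given capacity
    and false-positive rate, via the standard formula m = -n*ln(p)/(ln 2)^2.
    Illustrative only; bff's internal sizing may differ in detail."""
    bits = -expected_items * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)

def optimal_num_hashers(expected_items: int, size_bytes: int) -> int:
    """Optimal number of hash functions, k = (m/n) * ln 2."""
    return max(1, round((size_bytes * 8 / expected_items) * math.log(2)))

# Parameters from the report: ~2e12 expected n-grams, 1% false-positive rate.
n = 2_001_794_995_090
p = 0.01

size = bloom_filter_size_bytes(n, p)
print(f"bit array: {size / 1e12:.2f} TB ({size / 2**40:.2f} TiB), "
      f"{optimal_num_hashers(n, size)} hash functions")
# -> roughly 2.4 TB for the bit array alone, far beyond typical system RAM.
```

Under this sizing assumption, the filter's memory footprint is driven by expected-ngram-count and fp-rate rather than by how the input files are grouped, so splitting the input only helps if each run uses a correspondingly smaller expected-ngram-count.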