While we're at it, make the creation of the CEs an iterable so we don't need to keep the entire thing in memory. MH.export_multiple_to_single_hdf5 appears to already be set up to handle this. #29
dkoslicki committed Mar 24, 2020
1 parent 0472879 commit 0f3da00
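The change described in the commit message comes down to the difference between `Pool.map` and `Pool.imap` in Python's `multiprocessing` module. The sketch below is illustrative only: `sketch_name` and `compare_map_and_imap` are hypothetical names, not part of the repository. `map` blocks until every result is ready and returns one list, while `imap` returns an iterator that yields results in input order as they become available, so a consumer can start writing before every sketch is finished.

```python
from itertools import repeat
from multiprocessing import Pool


def sketch_name(args):
    """Hypothetical worker: takes a (file_name, k) tuple, returns (file_name, first k characters)."""
    file_name, k = args
    return file_name, file_name[:k]


def compare_map_and_imap(file_names, k, processes=2):
    """Run the same jobs through Pool.map and Pool.imap and report what each returns."""
    with Pool(processes=processes) as pool:
        # map blocks until every result is ready and returns them all as one list
        eager = pool.map(sketch_name, zip(file_names, repeat(k)))
        # imap returns an iterator that yields results in input order as they arrive
        lazy = pool.imap(sketch_name, zip(file_names, repeat(k)))
        # Drain the iterator inside the with-block: leaving the block calls pool.terminate()
        same = list(lazy) == eager
    return type(eager).__name__, type(lazy).__name__, same


if __name__ == "__main__":
    print(compare_map_and_imap(["genome_a.fa", "genome_b.fa", "genome_c.fa"], 6))
```

`imap` only bounds memory if the consumer keeps up: results that finish ahead of the consumer are buffered inside the iterator until they are requested. If output order does not matter, `imap_unordered` drops the ordering guarantee, and a larger `chunksize` argument can reduce inter-process overhead when there are many small inputs.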
Showing 1 changed file with 6 additions and 3 deletions (9 changes):
scripts/MakeStreamingDNADatabase.py
@@ -78,13 +78,16 @@ def main():
    if verbose:
        print("Creating Min Hash Sketches")
    pool = Pool(processes=num_threads)
    genome_sketches = pool.map(make_minhash_star, zip(file_names, repeat(max_h), repeat(prime), repeat(ksize)))
    pool.close()
    #genome_sketches = pool.map(make_minhash_star, zip(file_names, repeat(max_h), repeat(prime), repeat(ksize)))
    # use imap so we get an iterable instead, that way we can immediately start writing to file and don't need to keep
    # the entire genome sketches in memory
    genome_sketches = pool.imap(make_minhash_star, zip(file_names, repeat(max_h), repeat(prime), repeat(ksize)))
    #pool.close()
    # Export all the sketches
    if verbose:
        print("Exporting sketches")
    MH.export_multiple_to_single_hdf5(genome_sketches, out_file)

    pool.close()
    # Initialize the creation of the TST
    M = MakeTSTNew(out_file, streaming_database_file)
    if verbose:
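The diff hands the `imap` iterator straight to `MH.export_multiple_to_single_hdf5` and calls `pool.close()` only after the export. A minimal standard-library sketch of that ordering follows. `toy_sketch`, `write_sketches`, and `build` are hypothetical names, and the JSON-lines writer stands in for the HDF5 exporter, whose internals are not shown in this commit. It assumes the exporter walks its input once, front to back, which the commit message suggests but does not confirm.

```python
import io
import json
from itertools import repeat
from multiprocessing import Pool


def toy_sketch(args):
    """Hypothetical stand-in for make_minhash_star: (name, ksize) -> (name, sorted distinct k-mers)."""
    name, ksize = args
    kmers = sorted({name[i:i + ksize] for i in range(len(name) - ksize + 1)})
    return name, kmers


def write_sketches(sketches, out):
    """Hypothetical exporter: pull one sketch at a time from any iterable and write it."""
    count = 0
    for name, kmers in sketches:
        # Only the current sketch is referenced here; earlier ones can be garbage-collected.
        out.write(json.dumps({"name": name, "kmers": kmers}) + "\n")
        count += 1
    return count


def build(names, ksize, out, num_threads=2):
    """Sketch in parallel and stream results to `out`, closing the pool after the export."""
    pool = Pool(processes=num_threads)
    try:
        sketches = pool.imap(toy_sketch, zip(names, repeat(ksize)))
        return write_sketches(sketches, out)
    finally:
        pool.close()  # accept no new tasks; workers exit once queued work is done
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    buffer = io.StringIO()  # in-memory stand-in for the output file
    print(build(["ACGTACGT", "TTGCA"], 3, buffer), "sketches written")
    print(buffer.getvalue(), end="")
```

`close()` only stops new submissions; `join()` then waits for the workers to exit, so the sketch calls both after the writer has drained the iterator. If an exporter instead called `len()` on its argument or indexed into it, the `imap` iterator would raise a `TypeError`, because it supports only iteration.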