Testing environment #14
…n MakeStreamingDNADatabase.py and fixed (empty kmer issue).
…of truncating and taking rev-comps
Some unit tests are in. The MinHash module tests check the validity of results. The query module tests only check for code-breaking errors at this point, since there are a lot of FIXMEs and TODOs. Will need to:
Will be tagging this as help wanted and assigning everyone, since all are welcome to contribute. SOP for creating a new branch:
git checkout master
git pull origin master # make sure code is up to date
git checkout -b <some_feature_branch_name> # create a new branch implementing a new testing feature
# add your new feature
git add <new_files> # stage any new files; commit -a only picks up files Git already tracks
git commit -a # commit your contributions
git push origin <some_feature_branch_name> # push your changes to your feature branch
# then request a code review before merging to master
Note: while I assigned everyone, this is mainly a QOL (quality of life) issue: things that will make our future contributions easier, but that should not distract from the main projects, i.e. work on it as time permits.
… testing, and other testing issues. Further work on #14 will happen here.
… not __ for functions in pool.map() #14
@dkoslicki Make sure
Ground truth
Well, that looks pretty nice to me!
Switched to canonical k-mers to sanity-check things; the results are basically unchanged. So we'll stick with canonical k-mers for the ground truth, as they are much more straightforward to understand.
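For reference, a canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement, so a sequence and its reverse complement yield the same k-mer set. The following is a minimal sketch of that standard definition, not CMash's own code: the function names are hypothetical, and skipping windows that contain non-ACGT bases is an assumption made here for simplicity.

```python
# Minimal sketch, not CMash's implementation; helper names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """The lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(seq: str, k: int) -> set:
    """Canonical k-mers of seq; windows with non-ACGT bases are skipped (assumption)."""
    seq = seq.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            result.add(canonical(kmer))
    return result


print(sorted(canonical_kmers("AAACCG", 3)))  # ['AAA', 'AAC', 'ACC', 'CCG']
# "CGGTTT" is the reverse complement of "AAACCG"; both strands give the same set:
print(canonical_kmers("CGGTTT", 3) == canonical_kmers("AAACCG", 3))  # True
```

Because both strands map to the same set, containment computed on canonical k-mers does not depend on which strand a genome file happens to record.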
…all it just like StreamingQueryDNADatabase.py #14
Note to self @dkoslicki: something odd is happening at small k-mer sizes: using
seems correct, but
returns accurate small k-mer size results...
>>> import CMash.GroundTruth as G
>>> training_database_file = "/home/dkoslicki/Desktop/CMash/tests/script_tests/TrainingDatabase.h5"
>>> query_file1 = "/home/dkoslicki/Desktop/CMash/tests/Organisms/taxid_1192839_4_genomic.fna.gz"
>>> query_file2 = "/home/dkoslicki/Desktop/CMash/tests/Organisms/taxid_562_8705_genomic.fna.gz"
>>> g = G.TrueContainment(training_database_file, "4-6-1")
>>> kmers1 = g.training_file_to_ksize_to_kmers[query_file1][4]  # k=4 k-mers of query_file1
>>> kmers2 = g.training_file_to_ksize_to_kmers[query_file2][4]  # k=4 k-mers of query_file2
>>> len(kmers1.intersection(kmers2)) / float(len(kmers1))  # query_file1 as denominator
1.0
>>> len(kmers1.intersection(kmers2)) / float(len(kmers2))  # query_file2 as denominator
0.3056179775280899
And the
Oh yeah, and
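A quick arithmetic check on the two ratios above (an outside reading, not part of the original note): since the first ratio is 1.0, the intersection equals query_file1's 4-mer set, so the second ratio is |set1| / |set2|. Over ACGT there are only 4^4 = 256 distinct 4-mers, 136 of them canonical, and 0.3056179775280899 is the float nearest 136/445. If that reading holds, query_file2's k=4 set has at least 445 entries, more than plain 4-mers allow, which would suggest it contains something other than 4-mers. This is a hypothesis to check, not a confirmed diagnosis; the helper names below are illustrative.

```python
from fractions import Fraction


def num_kmers(k: int) -> int:
    """Distinct k-mers over the alphabet ACGT."""
    return 4 ** k


def num_canonical_kmers(k: int) -> int:
    """Distinct canonical k-mers.

    K-mers pair up with their reverse complements; for even k, 4 ** (k // 2)
    of them are their own reverse complement (palindromes) and count once.
    """
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (4 ** k + palindromes) // 2


# Second ratio reported above (query_file2's k-mer count as denominator).
ratio = 0.3056179775280899
# Closest fraction with a denominator of at most 1000.
as_fraction = Fraction(ratio).limit_denominator(1000)

print(num_kmers(4), num_canonical_kmers(4))  # 256 136
print(as_fraction)  # 136/445
```

Since 136/445 is already in lowest terms, the denominator set's size must be a multiple of 445, which is why even one 4-mer set of that size looks suspicious.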
… make sure that the installation worked correctly (i.e. don't call with python from the scripts directory, instead call the Bioconda installed version) #14
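The commit note above suggests testing the installed entry points rather than running `python` on files in the scripts directory. A minimal sketch of that check using the standard library's `shutil.which`; `find_installed_script` is a hypothetical helper, and it assumes (unverified here) that the Bioconda package puts scripts such as MakeStreamingDNADatabase.py on PATH.

```python
import os
import shutil
from typing import Optional


def find_installed_script(name: str) -> Optional[str]:
    """Absolute path of the executable `name` as resolved through PATH, or None.

    Going through PATH (e.g. the bin/ directory of a Bioconda environment)
    exercises the installed copy instead of a file in the repo's scripts/ folder.
    """
    found = shutil.which(name)
    return os.path.abspath(found) if found else None


# Assumption: the installed package exposes this script on PATH.
script = find_installed_script("MakeStreamingDNADatabase.py")
if script is None:
    print("MakeStreamingDNADatabase.py not on PATH; is CMash installed in this environment?")
else:
    print("Using installed script:", script)
```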
Regarding direction of containment, I think the committed way is best:
set2 as denom:
But clearly something is up with
Now to test on a "real" metagenome...
Will create a new issue for the ground truth containment computation so it will be easier to track progress on this.
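The direction of containment matters because the measure is asymmetric, as the 1.0 versus 0.3056 pair earlier in the thread shows. A minimal sketch, assuming plain Python sets of k-mers and the |A ∩ B| / |B| convention (second set as denominator); `containment` is a hypothetical helper, not CMash's API.

```python
def containment(a: set, b: set) -> float:
    """|a intersect b| / |b|: the fraction of b's k-mers that also occur in a.

    Putting the second set in the denominator mirrors the "set2 as denom"
    convention; swap the arguments for the other direction.
    """
    if not b:
        raise ValueError("containment is undefined when the denominator set is empty")
    return len(a & b) / len(b)


small = {"AAA", "AAC", "ACC"}
large = {"AAA", "AAC", "ACC", "CCG", "CGA", "GAT"}

print(containment(large, small))  # 1.0: every k-mer of `small` is in `large`
print(containment(small, large))  # 0.5: only half of `large` is in `small`
```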
Current tests are end-to-end integration tests that make sure the scripts execute successfully. Much more testing could be done, including:
tests folder (lots can be copied from CMash/MinHash.py)
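The existing tests are end-to-end checks that scripts run without error. A minimal sketch of that style of test with the standard library's `subprocess` and `unittest`; `command_succeeds` and `EndToEndSmokeTest` are hypothetical names, and the single command listed is a placeholder rather than a real CMash invocation.

```python
import subprocess
import sys
import unittest
from typing import Sequence


def command_succeeds(cmd: Sequence[str], timeout: float = 600.0) -> bool:
    """Run cmd and return True if it exits with status 0; echo stderr on failure."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        sys.stderr.write(result.stderr)
    return result.returncode == 0


class EndToEndSmokeTest(unittest.TestCase):
    """Each entry is a full command line that is expected to exit cleanly."""

    # Placeholder command; replace with real CMash script invocations and test data.
    COMMANDS = [
        [sys.executable, "-c", "print('ok')"],
    ]

    def test_commands_exit_cleanly(self):
        for cmd in self.COMMANDS:
            with self.subTest(cmd=cmd):
                self.assertTrue(command_succeeds(cmd))
```

Run it with `python -m unittest` (or pytest) from the tests folder. Capturing stderr keeps passing runs quiet while still surfacing the traceback when a script fails.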