-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathblaster_plan.txt
64 lines (53 loc) · 3.45 KB
/
blaster_plan.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
## Planned Structure of blast directory with new blaster script
D blast
D blast-2.2.17 # The blast software
L current # A soft link to the current version of the blast software, in case we use more than one
D db # The blast database directory
D FASTA # db subdirectory containing fasta formatted database files
D bin # Directory containing all my scripts
D blaster # subdirectory containing the blaster program
F blaster.rb # the main program for slicing genomes into 40mers,
# screening, blasting, and binning probe candidates
F config.yaml # config file for blaster.rb and subprograms
D lib # Directory containing Ruby class files for subprograms of blaster.rb
D test # Directory containing Ruby Test::Unit test files for methods of blaster and classes
D data # Directory containing input data used by blaster.rb
F bacteria.yaml # Structured list of bacteria to blast, each identified by NC_ID, genus, and species
D genome_fastas # Directory containing all fasta files for genomes to blast
F NC_XXXXXXX.fna # each bacterium will have one of these fasta files named by ID
D results # Directory containing output of blaster.rb
D NC_XXXXXXX_genus_species_strain # each organism will have a name directory for its results
F NC_XXXXXXX_human_bin # bin file for human matches (if it has a good match to human)
F NC_XXXXXXX_other_bin # bin file for other matches
#(if no good match to human but good match to something else)
F NC_XXXXXXX_genus_bin # bin file for genus matches (only good matches to same genus and species)
F NC_XXXXXXX_species_bin # bin file for species matches (only good match to same species)
F NC_XXXXXXX_blast_candidates.fasta # fasta file of 40mers created after screen
F NC_XXXXXXX_blast_candidates_after_human.fasta # fasta file of 40mers remaining after human bin pulled out
D logs # log files of runs of blaster.rb
F run_log_<datetime>.log # Each run will generate a log id'ed by date and time of start
USAGE (some details could vary):
Place fasta files properly named into genome_fastas directory.
Place bacteria.yaml file containing formatted list of bacteria to run into data directory.
Check and possibly edit config.yaml for conditions
Change directory (cd) to blast/bin/blaster
Run the program with "./blaster.rb"
Observe progress by using tail on the blast results files & reading command line feedback.
When it completes, examine bin files in the results folders.
Status:
blaster.rb basically done
bacterium.rb bin_human and later methods to write, needs lots of tests
blast_hash.rb tests to fill in
my_dna.rb tests complete
blast_flora.rb created from BlastFlora class pulled out of blaster.rb, has some test assertions, could use more.
Fixed several scoping bugs related to classes needing require statements, attr_accessors and formatting of some references.
Might look in later and see if attr_accessors can be trimmed back to attr_readers
Installed blast and dbs on laptop for use in testing
Installed git on sgtc-rails
copied project over to sgtc-rails using git
3/12/08 bug fixing and running to test - got it running blaster.rb up to first blast results, including dir creation and file generation
Tasks:
Download genomes
Fill in bacteria.yaml
Parse the "other" blast results
Flesh out other tests