Commit

Merge pull request #181 from naobservatory/harmon_change_gz
Convert gold-standard gzipped output to raw files
willbradshaw authored Feb 5, 2025
2 parents e4a5bc7 + f72138b commit 1b37a8a
Showing 22 changed files with 430 additions and 1 deletion.
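A conversion like the one this commit describes could be reproduced with a short script. The following is only a minimal sketch using the Python standard library, not the method the authors used. The function name `gunzip_tree` and the `keep_original` flag are hypothetical, and the sketch keeps the `.gz` originals by default because the diff does not make clear which of them were deleted.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root: Path, keep_original: bool = True) -> list[Path]:
    """Decompress every *.gz file under `root`, writing each result next to its source.

    Hypothetical helper for illustration; not part of the repository.
    """
    written = []
    for gz_path in sorted(root.rglob("*.gz")):
        # Drop only the final suffix: read_counts.tsv.gz -> read_counts.tsv
        out_path = gz_path.with_suffix("")
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        if not keep_original:
            gz_path.unlink()
        written.append(out_path)
    return written
```

Calling `gunzip_tree(Path("test-data/gold-standard-results"))` would write a raw file beside each gzipped one; passing `keep_original=False` would also delete the compressed copies.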
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@
- Numerous changes to column names in viral hits TSV, mainly to improve clarity
- Updated mislabeled processes
- Added instructions for what to do should you run out of API requests for containers
- Unzipped gold standard reference output in `test-data/gold-standard-results`

# v2.7.0.2
- Updated `pipeline-version.txt`
3 changes: 3 additions & 0 deletions test-data/gold-standard-results/bracken_reports_merged.tsv
@@ -0,0 +1,3 @@
name taxid rank kraken_assigned_reads added_reads new_est_reads fraction_total_reads sample ribosomal
Viruses 10239 D 1 0 1 1.00000 gold_standard TRUE
Viruses 10239 D 41 0 41 1.00000 gold_standard FALSE
Binary file not shown.
78 changes: 78 additions & 0 deletions test-data/gold-standard-results/kraken_reports_merged.tsv
@@ -0,0 +1,78 @@
pc_reads_total n_reads_clade n_reads_direct n_minimizers_total n_minimizers_distinct rank taxid name sample ribosomal
95.65 22 22 0 0 U 0 unclassified gold_standard TRUE
4.35 1 0 21 4 R 1 root gold_standard TRUE
4.35 1 0 21 4 D 10239 Viruses gold_standard TRUE
4.35 1 0 4 3 D1 2559587 Riboviria gold_standard TRUE
4.35 1 0 4 3 K 2732396 Orthornavirae gold_standard TRUE
4.35 1 0 4 3 P 2497569 Negarnaviricota gold_standard TRUE
4.35 1 0 4 3 P1 2497571 Polyploviricotina gold_standard TRUE
4.35 1 0 4 3 C 2497576 Ellioviricetes gold_standard TRUE
4.35 1 0 4 3 O 1980410 Bunyavirales gold_standard TRUE
4.35 1 0 4 3 F 1980416 Peribunyaviridae gold_standard TRUE
4.35 1 0 4 3 G 11572 Orthobunyavirus gold_standard TRUE
4.35 1 0 4 3 S 3052437 Orthobunyavirus schmallenbergense gold_standard TRUE
4.35 1 1 4 3 S1 159150 Shamonda virus gold_standard TRUE
33.87 21 21 0 0 U 0 unclassified gold_standard FALSE
66.13 41 0 1001 986 R 1 root gold_standard FALSE
66.13 41 0 1001 986 D 10239 Viruses gold_standard FALSE
61.29 38 0 990 977 D1 2559587 Riboviria gold_standard FALSE
61.29 38 0 989 976 K 2732396 Orthornavirae gold_standard FALSE
32.26 20 0 747 737 P 2732406 Kitrinoviricota gold_standard FALSE
20.97 13 0 616 616 C 2732461 Alsuviricetes gold_standard FALSE
20.97 13 0 616 616 O 2732544 Martellivirales gold_standard FALSE
20.97 13 0 616 616 F 675071 Virgaviridae gold_standard FALSE
20.97 13 0 616 616 G 12234 Tobamovirus gold_standard FALSE
9.68 6 6 279 279 S 12239 Pepper mild mottle virus gold_standard FALSE
4.84 3 3 173 173 S 12235 Cucumber green mottle mosaic virus gold_standard FALSE
3.23 2 2 79 79 S 12241 Tobacco mild green mosaic virus gold_standard FALSE
1.61 1 1 45 45 S 12242 Tobacco mosaic virus gold_standard FALSE
1.61 1 1 30 30 S 12253 Tomato mosaic virus gold_standard FALSE
11.29 7 0 131 121 C 2732463 Tolucaviricetes gold_standard FALSE
11.29 7 0 131 121 O 2732548 Tolivirales gold_standard FALSE
11.29 7 0 131 121 F 39738 Tombusviridae gold_standard FALSE
11.29 7 0 131 121 F1 2560077 Procedovirinae gold_standard FALSE
11.29 7 0 131 121 G 1911601 Gammacarmovirus gold_standard FALSE
11.29 7 0 131 121 S 3048200 Gammacarmovirus melonis gold_standard FALSE
11.29 7 7 131 121 S1 11987 Melon necrotic spot virus gold_standard FALSE
27.42 17 0 240 237 P 2732408 Pisuviricota gold_standard FALSE
16.13 10 0 136 136 C 2732507 Stelpaviricetes gold_standard FALSE
16.13 10 0 136 136 O 2732551 Stellavirales gold_standard FALSE
16.13 10 0 136 136 F 39733 Astroviridae gold_standard FALSE
9.68 6 0 55 55 G 249588 Mamastrovirus gold_standard FALSE
4.84 3 3 25 25 S 1239565 Mamastrovirus 1 gold_standard FALSE
3.23 2 0 24 24 S 1239570 Mamastrovirus 6 gold_standard FALSE
3.23 2 2 24 24 S1 568715 Astrovirus MLB1 gold_standard FALSE
1.61 1 0 6 6 G1 526119 unclassified Mamastrovirus gold_standard FALSE
1.61 1 1 5 5 S 1389204 Feline astrovirus 2 gold_standard FALSE
6.45 4 0 67 67 F1 352926 unclassified Astroviridae gold_standard FALSE
6.45 4 4 66 66 S 1868658 Human astrovirus gold_standard FALSE
11.29 7 0 103 100 C 2732506 Pisoniviricetes gold_standard FALSE
11.29 7 0 103 100 O 464095 Picornavirales gold_standard FALSE
6.45 4 0 64 64 F 12058 Picornaviridae gold_standard FALSE
4.84 3 0 56 56 F1 2946635 Kodimesavirinae gold_standard FALSE
4.84 3 0 56 56 G 194960 Kobuvirus gold_standard FALSE
4.84 3 0 55 55 S 72149 Kobuvirus aichi gold_standard FALSE
4.84 3 3 55 55 S1 1313215 aichivirus A1 gold_standard FALSE
1.61 1 0 4 4 F1 2946640 Paavivirinae gold_standard FALSE
1.61 1 0 4 4 G 138954 Parechovirus gold_standard FALSE
1.61 1 0 4 4 S 1803956 Parechovirus A gold_standard FALSE
1.61 1 1 2 2 S1 12063 parechovirus A1 gold_standard FALSE
4.84 3 0 39 36 F 232795 Dicistroviridae gold_standard FALSE
4.84 3 0 39 36 F1 336635 unclassified Dicistroviridae gold_standard FALSE
4.84 3 3 39 36 S 1776109 Goose dicistrovirus gold_standard FALSE
1.61 1 0 2 2 P 2732405 Duplornaviricota gold_standard FALSE
1.61 1 0 2 2 C 2732459 Resentoviricetes gold_standard FALSE
1.61 1 0 2 2 O 2732541 Reovirales gold_standard FALSE
1.61 1 0 2 2 F 2946186 Sedoreoviridae gold_standard FALSE
1.61 1 0 2 2 G 10912 Rotavirus gold_standard FALSE
1.61 1 1 2 2 S 28875 Rotavirus A gold_standard FALSE
4.84 3 0 10 8 D1 2732004 Varidnaviria gold_standard FALSE
4.84 3 0 10 8 K 2732005 Bamfordvirae gold_standard FALSE
4.84 3 0 10 8 P 2732007 Nucleocytoviricota gold_standard FALSE
4.84 3 0 10 8 C 2732525 Pokkesviricetes gold_standard FALSE
4.84 3 0 10 8 O 2732527 Chitovirales gold_standard FALSE
4.84 3 0 10 8 F 10240 Poxviridae gold_standard FALSE
4.84 3 0 10 8 F1 10241 Chordopoxvirinae gold_standard FALSE
4.84 3 0 10 8 G 2733297 Oryzopoxvirus gold_standard FALSE
4.84 3 0 10 8 G1 2788403 unclassified Oryzopoxvirus gold_standard FALSE
4.84 3 3 10 8 S 67082 BeAn 58058 virus gold_standard FALSE
Binary file not shown.
31 changes: 31 additions & 0 deletions test-data/gold-standard-results/merged_blast_filtered.tsv
@@ -0,0 +1,31 @@
qseqid sseqid sgi staxid qlen evalue bitscore qcovs length pident mismatch gapopen sstrand qstart qend sstart send bitscore_rank_dense bitscore_fraction
SRR12204734.102846486 gi|1799084385|gb|MN030571.1| 1799084385 2699435 141 9.47e-30 134 100 141 83.688 23 0 minus 1 141 3990 3850 1 1.0
SRR12204734.102846486 gi|2571040051|gb|OR130732.1| 2571040051 3071813 141 9.47e-30 134 100 141 83.688 23 0 minus 1 141 3990 3850 2 1.0
SRR12204734.132645528 gi|2510069797|gb|ON398705.1| 2510069797 3049538 128 2.89e-59 231 100 128 99.219 1 0 minus 1 128 4697 4570 1 1.0
SRR12204734.136705306 gi|1799084385|gb|MN030571.1| 1799084385 2699435 140 1.53e-47 193 98 137 91.971 11 0 minus 4 140 1036 900 1 1.0
SRR12204734.136705306 gi|2571040051|gb|OR130732.1| 2571040051 3071813 140 1.53e-47 193 98 137 91.971 11 0 minus 4 140 1036 900 2 1.0
SRR12204734.146348965 gi|1799084385|gb|MN030571.1| 1799084385 2699435 142 7.38e-31 137 71 101 92.079 3 1 minus 1 101 6359 6264 1 1.0
SRR12204734.146348965 gi|2571040051|gb|OR130732.1| 2571040051 3071813 142 7.38e-31 137 71 101 92.079 3 1 minus 1 101 6760 6665 2 1.0
SRR12204734.146348965 gi|1430400470|gb|MF927778.1| 1430400470 2268895 142 2.69e-20 102 72 102 85.294 11 2 minus 1 102 9379 9282 2 0.7445255474452555
SRR12204734.158389317 gi|1799084385|gb|MN030571.1| 1799084385 2699435 140 1.95e-56 222 99 138 95.652 6 0 minus 1 138 1669 1532 1 1.0
SRR12204734.158389317 gi|2571040051|gb|OR130732.1| 2571040051 3071813 140 1.95e-56 222 99 138 95.652 6 0 minus 1 138 1669 1532 2 1.0
SRR12204848.140127907 gi|1799084385|gb|MN030571.1| 1799084385 2699435 143 1.58e-42 176 100 143 88.811 16 0 minus 1 143 1854 1712 1 1.0
SRR12204848.140127907 gi|2571040051|gb|OR130732.1| 2571040051 3071813 143 1.58e-42 176 100 143 88.811 16 0 minus 1 143 1854 1712 2 1.0
SRR12204848.15085913 gi|1799084385|gb|MN030571.1| 1799084385 2699435 142 2.04e-36 156 72 103 94.175 4 2 plus 41 142 1938 2039 1 1.0
SRR12204848.15085913 gi|2571040051|gb|OR130732.1| 2571040051 3071813 142 2.04e-36 156 72 103 94.175 4 2 plus 41 142 1938 2039 2 1.0
SRR12204848.28156033 gi|1799084385|gb|MN030571.1| 1799084385 2699435 136 1.90e-46 189 99 135 91.852 11 0 plus 1 135 1633 1767 1 1.0
SRR12204848.28156033 gi|2571040051|gb|OR130732.1| 2571040051 3071813 136 1.90e-46 189 99 135 91.852 11 0 plus 1 135 1633 1767 2 1.0
SRR12204848.31434798 gi|1799084385|gb|MN030571.1| 1799084385 2699435 147 4.47e-53 211 100 147 92.517 11 0 plus 1 147 1180 1326 1 1.0
SRR12204848.31434798 gi|2571040051|gb|OR130732.1| 2571040051 3071813 147 4.47e-53 211 100 147 92.517 11 0 plus 1 147 1180 1326 2 1.0
SRR12204849.140297134 gi|2510069797|gb|ON398705.1| 2510069797 3049538 148 5.71e-67 257 100 148 97.973 3 0 plus 1 148 2798 2945 1 1.0
SRR12204849.142401718 gi|2294264610|gb|MW678777.1| 2294264610 32630 144 9.31e-60 233 100 144 95.833 6 0 minus 1 144 2225 2082 1 1.0
SRR12204849.142401718 gi|2571040051|gb|OR130732.1| 2571040051 3071813 144 4.33e-58 228 100 144 95.139 7 0 minus 1 144 5510 5367 2 0.9785407725321889
SRR12204849.147306033 gi|2450664545|dbj|LC723624.1| 2450664545 2973485 146 4.37e-63 244 99 144 97.222 4 0 plus 3 146 499 642 1 1.0
SRR12204849.16071359 gi|2784409127|gb|PQ072823.1| 2784409127 32630 138 1.51e-42 176 99 139 89.928 10 3 plus 1 137 63 199 1 1.0
SRR12204850.120094795 gi|380719094|gb|JQ281544.1| 380719094 1163660 148 1.24e-63 246 100 148 96.622 5 0 minus 1 148 646 499 1 1.0
SRR12204850.140730292 gi|1799084385|gb|MN030571.1| 1799084385 2699435 139 1.16e-53 213 100 139 94.245 8 0 plus 1 139 2108 2246 1 1.0
SRR12204850.140730292 gi|2571040051|gb|OR130732.1| 2571040051 3071813 139 1.16e-53 213 100 139 94.245 8 0 plus 1 139 2108 2246 2 1.0
SRR12204850.28709236 gi|380719094|gb|JQ281544.1| 380719094 1163660 148 3.51e-49 198 95 140 92.143 11 0 plus 3 142 4290 4429 1 1.0
SRR12204850.8190301 gi|380719094|gb|JQ281544.1| 380719094 1163660 147 2.69e-50 202 99 145 91.724 12 0 minus 1 145 5481 5337 1 1.0
SRR12204850.89255206 gi|1799084385|gb|MN030571.1| 1799084385 2699435 148 4.51e-53 211 99 147 92.517 11 0 minus 2 148 553 407 1 1.0
SRR12204850.89255206 gi|2571040051|gb|OR130732.1| 2571040051 3071813 148 4.51e-53 211 99 147 92.517 11 0 minus 2 148 553 407 2 1.0
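The `bitscore_fraction` values in the table above are consistent with dividing each hit's `bitscore` by the highest `bitscore` recorded for the same `qseqid` (for example, 102 / 137 and 228 / 233). That reading is inferred from the data, not taken from the pipeline's code. A minimal sketch under that assumption, with the hypothetical helper name `add_bitscore_fraction`:

```python
from collections import defaultdict


def add_bitscore_fraction(rows):
    """Return copies of `rows` with bitscore_fraction = bitscore / best bitscore per qseqid.

    Assumed definition, inferred from the table values; not the pipeline's own code.
    """
    best = defaultdict(float)
    for row in rows:
        best[row["qseqid"]] = max(best[row["qseqid"]], float(row["bitscore"]))
    return [
        {**row, "bitscore_fraction": float(row["bitscore"]) / best[row["qseqid"]]}
        for row in rows
    ]


# Four hits copied from the table (only the columns the calculation needs).
rows = [
    {"qseqid": "SRR12204734.146348965", "sseqid": "gi|1799084385|gb|MN030571.1|", "bitscore": 137},
    {"qseqid": "SRR12204734.146348965", "sseqid": "gi|1430400470|gb|MF927778.1|", "bitscore": 102},
    {"qseqid": "SRR12204849.142401718", "sseqid": "gi|2294264610|gb|MW678777.1|", "bitscore": 233},
    {"qseqid": "SRR12204849.142401718", "sseqid": "gi|2571040051|gb|OR130732.1|", "bitscore": 228},
]

for row in add_bitscore_fraction(rows):
    print(row["qseqid"], row["bitscore"], row["bitscore_fraction"])
```

The sketch does not attempt to reproduce `bitscore_rank_dense`, whose tie-handling cannot be determined from the table alone.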
Binary file not shown.
2 changes: 2 additions & 0 deletions test-data/gold-standard-results/read_counts.tsv
@@ -0,0 +1,2 @@
sample n_reads_single n_read_pairs
gold_standard 330 165
Binary file removed test-data/gold-standard-results/read_counts.tsv.gz
Binary file not shown.
109 changes: 109 additions & 0 deletions test-data/gold-standard-results/subset_qc_adapter_stats.tsv
@@ -0,0 +1,109 @@
position pc_adapters file adapter stage sample
1 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
2 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
3 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
4 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
5 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
6 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
7 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
8 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
9 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
12 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
17 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
22 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
27 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
32 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
37 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
42 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
47 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
52 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
57 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
62 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
67 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
72 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
77 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
82 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
87 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
92 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
97 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
102 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
107 0 gold_standard_interleaved illumina_universal_adapter raw gold_standard
112 1.1428571428571428 gold_standard_interleaved illumina_universal_adapter raw gold_standard
117 4.571428571428571 gold_standard_interleaved illumina_universal_adapter raw gold_standard
122 5.714285714285714 gold_standard_interleaved illumina_universal_adapter raw gold_standard
127 5.714285714285714 gold_standard_interleaved illumina_universal_adapter raw gold_standard
132 8.38095238095238 gold_standard_interleaved illumina_universal_adapter raw gold_standard
137 10.857142857142858 gold_standard_interleaved illumina_universal_adapter raw gold_standard
140 12.380952380952381 gold_standard_interleaved illumina_universal_adapter raw gold_standard
1 0 gold_standard_interleaved polya raw gold_standard
2 0 gold_standard_interleaved polya raw gold_standard
3 0 gold_standard_interleaved polya raw gold_standard
4 0 gold_standard_interleaved polya raw gold_standard
5 0 gold_standard_interleaved polya raw gold_standard
6 0 gold_standard_interleaved polya raw gold_standard
7 0 gold_standard_interleaved polya raw gold_standard
8 0 gold_standard_interleaved polya raw gold_standard
9 0 gold_standard_interleaved polya raw gold_standard
12 0.38095238095238093 gold_standard_interleaved polya raw gold_standard
17 0.8571428571428571 gold_standard_interleaved polya raw gold_standard
22 1.0476190476190477 gold_standard_interleaved polya raw gold_standard
27 1.4285714285714286 gold_standard_interleaved polya raw gold_standard
32 1.8095238095238095 gold_standard_interleaved polya raw gold_standard
37 1.9047619047619047 gold_standard_interleaved polya raw gold_standard
42 2.0952380952380953 gold_standard_interleaved polya raw gold_standard
47 2.380952380952381 gold_standard_interleaved polya raw gold_standard
52 2.380952380952381 gold_standard_interleaved polya raw gold_standard
57 2.380952380952381 gold_standard_interleaved polya raw gold_standard
62 2.380952380952381 gold_standard_interleaved polya raw gold_standard
67 2.380952380952381 gold_standard_interleaved polya raw gold_standard
72 2.666666666666667 gold_standard_interleaved polya raw gold_standard
77 2.857142857142857 gold_standard_interleaved polya raw gold_standard
82 2.857142857142857 gold_standard_interleaved polya raw gold_standard
87 2.857142857142857 gold_standard_interleaved polya raw gold_standard
92 2.857142857142857 gold_standard_interleaved polya raw gold_standard
97 2.857142857142857 gold_standard_interleaved polya raw gold_standard
102 2.857142857142857 gold_standard_interleaved polya raw gold_standard
107 2.857142857142857 gold_standard_interleaved polya raw gold_standard
112 2.857142857142857 gold_standard_interleaved polya raw gold_standard
117 2.857142857142857 gold_standard_interleaved polya raw gold_standard
122 2.857142857142857 gold_standard_interleaved polya raw gold_standard
127 2.857142857142857 gold_standard_interleaved polya raw gold_standard
132 2.857142857142857 gold_standard_interleaved polya raw gold_standard
137 2.857142857142857 gold_standard_interleaved polya raw gold_standard
140 2.857142857142857 gold_standard_interleaved polya raw gold_standard
1 1.4285714285714286 gold_standard_interleaved polyg raw gold_standard
2 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
3 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
4 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
5 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
6 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
7 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
8 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
9 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
12 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
17 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
22 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
27 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
32 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
37 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
42 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
47 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
52 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
57 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
62 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
67 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
72 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
77 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
82 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
87 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
92 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
97 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
102 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
107 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
112 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
117 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
122 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
127 1.9047619047619047 gold_standard_interleaved polyg raw gold_standard
132 2.2857142857142856 gold_standard_interleaved polyg raw gold_standard
137 2.380952380952381 gold_standard_interleaved polyg raw gold_standard
140 2.380952380952381 gold_standard_interleaved polyg raw gold_standard
Binary file not shown.
3 changes: 3 additions & 0 deletions test-data/gold-standard-results/subset_qc_basic_stats.tsv
@@ -0,0 +1,3 @@
percent_gc mean_seq_len n_reads_single n_read_pairs percent_duplicates n_bases_approx per_base_sequence_quality per_sequence_quality_scores per_base_sequence_content per_sequence_gc_content per_base_n_content sequence_length_distribution sequence_duplication_levels overrepresented_sequences adapter_content stage sample
46 150.24761904761905 210 105 4.285714285714278 31500 pass pass fail fail pass warn pass fail fail raw gold_standard
47 145.31764705882352 170 85 7.058823529411768 24700 pass pass fail fail pass warn pass fail pass cleaned gold_standard
Binary file not shown.