Improve how 'subtype_query' is defined
Brings the config structure in line with all of the other config values,
but this necessitates a more complicated usage of
`resolve_config_value`. Hopefully the added comments are instructive to
people in the future!

This also greatly improves the error handling if we forget to add the
appropriate subtype to the config, or leave out 'subtype_query'
altogether.

Inspired by <#104 (comment)>
and <#146 (comment)>
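
To illustrate the lookup described above, here is a minimal sketch of resolving a value keyed by `subtype/segment/time` patterns. It is not the repository's `resolve_config_value`, whose implementation is not shown in this commit; the function name `lookup_by_wildcards` and its matching rules are assumptions made for illustration only.

```python
# Hypothetical stand-in for the repo's `resolve_config_value`; the real
# helper's matching rules may differ.
def lookup_by_wildcards(config, key, wildcards):
    """Return config[key] resolved against subtype/segment/time wildcards."""
    if key not in config:
        raise KeyError(f"config is missing the top-level key {key!r}")
    value = config[key]
    if not isinstance(value, dict):
        return value  # a scalar applies to every build

    wanted = (wildcards["subtype"], wildcards["segment"], wildcards["time"])
    for pattern, result in value.items():
        parts = pattern.split("/")
        # A pattern part matches if it is '*' or equals the wildcard value.
        if len(parts) == 3 and all(p == "*" or p == w for p, w in zip(parts, wanted)):
            return result
    raise ValueError(f"no entry in config[{key!r}] matches {'/'.join(wanted)}")


config = {
    "subtype_query": {
        "h5n1-cattle-outbreak/*/*": "genoflu in 'B3.13'",
        "h5n1-d1.1/*/*": "genoflu in 'D1.1'",
    }
}

# Only the subtype is known this early, so segment and time are passed as '*'.
query = lookup_by_wildcards(
    config, "subtype_query", {"subtype": "h5n1-d1.1", "segment": "*", "time": "*"}
)
print(query)  # genoflu in 'D1.1'
```

Under these assumed rules, passing `'*'` for segment and time means a key such as `h5n1/ha/*` would never match, which mirrors the requirement that the query must not vary by segment or time. A missing subtype or a missing `subtype_query` key raises a descriptive error rather than a bare `KeyError` from nested dictionary indexing.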
jameshadfield committed Mar 6, 2025
1 parent 77796cc commit 7205647
Showing 3 changed files with 20 additions and 14 deletions.
11 changes: 6 additions & 5 deletions genome-focused/config.yaml
@@ -45,12 +45,13 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/h5n1-clades.tsv # use H5N1 clades
 description: config/{subtype}/description.md
 
-## Subtype query - this structure is different from all other config parameters, requiring
-# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
-# augur filter's --query argument.
+## Subtype query - note that you cannot vary this based on segment/time wildcards
+# as the filtering to subtype is independent (upstream) of those wildcards.
+# Note also that you aren't limited to the "subtype" metadata field - any valid
+# `augur filter --query` argument is ok.
 subtype_query:
-  "h5n1-cattle-outbreak": "genoflu in 'B3.13'"
-  "h5n1-d1.1": "genoflu in 'D1.1'"
+  "h5n1-cattle-outbreak/*/*": "genoflu in 'B3.13'"
+  "h5n1-d1.1/*/*": "genoflu in 'D1.1'"
 
 #### Rule-specific parameters ####
 # The formatting here represents the three-tiered nature of the avian-flu build which
7 changes: 5 additions & 2 deletions rules/main.smk
@@ -183,7 +183,9 @@ rule filter_sequences_by_subtype:
     output:
         sequences = "results/{subtype}/{segment}/sequences.fasta",
     params:
-        subtypes=lambda w: config['subtype_query'][w.subtype],
+        # We don't have all wildcards set here (too early!) so we need to manually specify them for `resolve_config_value`
+        # (Note that we do have w.segment set, but we deliberately don't use it as the query must not vary by segment)
+        subtypes = lambda w: resolve_config_value('subtype_query')({'subtype': w.subtype, 'segment': '*', 'time': '*'})
     shell:
         """
         augur filter \
@@ -199,7 +201,8 @@ rule filter_metadata_by_subtype:
     output:
         metadata = "results/{subtype}/metadata.tsv",
     params:
-        subtypes= lambda w: config['subtype_query'][w.subtype],
+        # We don't have all wildcards set here (too early!) so we need to manually specify them for `resolve_config_value`
+        subtypes = lambda w: resolve_config_value('subtype_query')({'subtype': w.subtype, 'segment': '*', 'time': '*'})
     shell:
         """
         augur filter \
16 changes: 9 additions & 7 deletions segment-focused/config.yaml
@@ -51,14 +51,16 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/{subtype}-clades.tsv
 description: config/description_gisaid.md
 
-## Subtype query - this structure is different from all other config parameters, requiring
-# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
-# augur filter's --query argument.
+## Subtype query - note that you cannot vary this based on segment/time wildcards
+# as the filtering to subtype is independent (upstream) of those wildcards.
+# Note also that you aren't limited to the "subtype" metadata field - any valid
+# `augur filter --query` argument is ok.
 subtype_query:
-  "h5nx": "subtype in ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9']"
-  "h5n1": "subtype in ['h5n1']"
-  "h7n9": "subtype in ['h7n9']"
-  "h9n2": "subtype in ['h9n2']"
+  "h5nx/*/*": "subtype in ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9']"
+  "h5n1/*/*": "subtype in ['h5n1']"
+  "h7n9/*/*": "subtype in ['h7n9']"
+  "h9n2/*/*": "subtype in ['h9n2']"
+
 
 #### Rule-specific parameters ####
 # The formatting here represents the three-tiered nature of the avian-flu build which
