Improve how 'subtype_query' is defined
Brings the config structure in line with all of the other config values,
but this necessitates a helper function as this config value is accessed
before all the wildcards are set (because it's so early in the
pipeline).

This also greatly improves the error handling if we forget to add the
appropriate subtype to the config, or leave out 'subtype_query'
altogether.

Inspired by <#104 (comment)>
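
To illustrate the error-handling point in the message above, here is a minimal sketch contrasting a direct dictionary index with a checked lookup. It assumes `config` is a plain dict; `old_lookup` and `checked_lookup` are hypothetical names used only for illustration. The pipeline itself goes through `resolve_config_value`, whose internals are not part of this commit, so the messages below are assumptions and not the repository's actual output.

```python
# Hypothetical illustration; not code from the repository.
config = {
    "subtype_query": {
        "h5n1/*/*": "subtype in ['h5n1']",
    }
}


def old_lookup(config, subtype):
    # Direct indexing: a missing block or subtype surfaces as a bare KeyError.
    return config["subtype_query"][subtype]


def checked_lookup(config, subtype):
    # One way a helper could explain what is missing (assumed behaviour).
    if "subtype_query" not in config:
        raise ValueError("config is missing the required 'subtype_query' block")
    key = f"{subtype}/*/*"
    queries = config["subtype_query"]
    if key not in queries:
        raise ValueError(
            f"no 'subtype_query' entry for subtype {subtype!r}; "
            f"expected a key such as {key!r}, found {sorted(queries)}"
        )
    return queries[key]


if __name__ == "__main__":
    print(checked_lookup(config, "h5n1"))
    try:
        checked_lookup(config, "h7n9")
    except ValueError as err:
        print(f"error: {err}")
```

The checked version names both the missing subtype and the keys that are present, which is the kind of feedback a bare `KeyError: 'h7n9'` does not give.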
jameshadfield committed Mar 6, 2025
1 parent 77796cc commit 43531bb
Showing 4 changed files with 30 additions and 14 deletions.
11 changes: 6 additions & 5 deletions genome-focused/config.yaml
@@ -45,12 +45,13 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/h5n1-clades.tsv # use H5N1 clades
 description: config/{subtype}/description.md

-## Subtype query - this structure is different from all other config parameters, requiring
-# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
-# augur filter's --query argument.
+## Subtype query - note that you cannot vary this based on segment/time wildcards
+# as the filtering to subtype is independent (upstream) of those wildcards.
+# Note also that you aren't limited to the "subtype" metadata field - any valid
+# `augur filter --query` argument is ok.
 subtype_query:
-  "h5n1-cattle-outbreak": "genoflu in 'B3.13'"
-  "h5n1-d1.1": "genoflu in 'D1.1'"
+  "h5n1-cattle-outbreak/*/*": "genoflu in 'B3.13'"
+  "h5n1-d1.1/*/*": "genoflu in 'D1.1'"

 #### Rule-specific parameters ####
 # The formatting here represents the three-tiered nature of the avian-flu build which
13 changes: 13 additions & 0 deletions rules/config.smk
@@ -247,3 +247,16 @@ def expand_target_patterns():
         targets.append(target)

     return targets
+
+def resolve_subtype_query():
+    """
+    Returns a Snakemake Input function to resolve the `subtype_query` config parameter.
+    NOTE: This is subtly different from `resolve_config_value` as this config value is used
+    in the snakemake pipeline before we have {segment} or {time} wildcards set up, or if we do
+    have (some of) these wildcards set up then we explicitly do not use them. In other words,
+    the subtype query value does not vary across segment or time wildcards.
+    """
+    def resolve(wildcards):
+        return resolve_config_value('subtype_query')({"subtype": wildcards.subtype, 'segment': '*', 'time': '*'})
+    return resolve
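
The closure pattern used by `resolve_subtype_query` can be sketched in isolation. The `resolve_config_value` below is a simplified, hypothetical stand-in: the repository's real implementation is not shown in this diff and may behave differently. The sketch assumes keys of the form `subtype/segment/time`, where `*` in a key matches any value, and uses `SimpleNamespace` in place of Snakemake's wildcards object.

```python
from types import SimpleNamespace

# Hypothetical config, shaped like the 'subtype_query' block in this commit.
config = {
    "subtype_query": {
        "h5nx/*/*": "subtype in ['h5n1', 'h5n2']",
        "h7n9/*/*": "subtype in ['h7n9']",
    }
}


def resolve_config_value(name):
    """Simplified stand-in: returns a function resolving config[name] for given wildcards."""
    def resolve(wildcards):
        if name not in config:
            raise ValueError(f"config is missing {name!r}")
        wanted = (wildcards["subtype"], wildcards["segment"], wildcards["time"])
        for pattern, value in config[name].items():
            parts = pattern.split("/")
            # A key matches when each part is '*' or equals the requested value.
            if len(parts) == 3 and all(p in ("*", w) for p, w in zip(parts, wanted)):
                return value
        raise ValueError(f"no {name!r} entry matches {'/'.join(wanted)!r}")
    return resolve


def resolve_subtype_query():
    """Returns a function of wildcards that ignores any segment/time wildcards."""
    def resolve(wildcards):
        # Only the subtype is read; segment and time are pinned to '*'.
        pinned = {"subtype": wildcards.subtype, "segment": "*", "time": "*"}
        return resolve_config_value("subtype_query")(pinned)
    return resolve


if __name__ == "__main__":
    query = resolve_subtype_query()
    # The segment wildcard is present but has no effect on the result.
    print(query(SimpleNamespace(subtype="h7n9", segment="ha")))  # subtype in ['h7n9']
```

Because segment and time are pinned to `*`, a key such as `h7n9/ha/*` would never match in this sketch, which mirrors the config comment that the query cannot vary by segment or time. Returning a function (instead of calling it at parse time) lets Snakemake supply the wildcards when each job is evaluated.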
4 changes: 2 additions & 2 deletions rules/main.smk
@@ -183,7 +183,7 @@ rule filter_sequences_by_subtype:
     output:
         sequences = "results/{subtype}/{segment}/sequences.fasta",
     params:
-        subtypes=lambda w: config['subtype_query'][w.subtype],
+        subtypes = resolve_subtype_query(),
     shell:
         """
         augur filter \
@@ -199,7 +199,7 @@ rule filter_metadata_by_subtype:
     output:
         metadata = "results/{subtype}/metadata.tsv",
     params:
-        subtypes= lambda w: config['subtype_query'][w.subtype],
+        subtypes= resolve_subtype_query(),
     shell:
         """
         augur filter \
16 changes: 9 additions & 7 deletions segment-focused/config.yaml
@@ -51,14 +51,16 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/{subtype}-clades.tsv
 description: config/description_gisaid.md

-## Subtype query - this structure is different from all other config parameters, requiring
-# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
-# augur filter's --query argument.
+## Subtype query - note that you cannot vary this based on segment/time wildcards
+# as the filtering to subtype is independent (upstream) of those wildcards.
+# Note also that you aren't limited to the "subtype" metadata field - any valid
+# `augur filter --query` argument is ok.
 subtype_query:
-  "h5nx": "subtype in ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9']"
-  "h5n1": "subtype in ['h5n1']"
-  "h7n9": "subtype in ['h7n9']"
-  "h9n2": "subtype in ['h9n2']"
+  "h5nx/*/*": "subtype in ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9']"
+  "h5n1/*/*": "subtype in ['h5n1']"
+  "h7n9/*/*": "subtype in ['h7n9']"
+  "h9n2/*/*": "subtype in ['h9n2']"
+

 #### Rule-specific parameters ####
 # The formatting here represents the three-tiered nature of the avian-flu build which
