Shift subtype query to config yaml
This reduces the conditional logic used to special-case the workflows
which wanted to filter on GenoFLU by shifting the entire query argument
into the config YAML. A deliberate side effect is that future builds
which want to analyse different subtypes, such as H7N6 (#108), can be
set up using only a config overlay.
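
As a rough illustration of the config-overlay idea, the sketch below assumes an overlay is merged key by key into the base config. The `merge_config` helper, the `h7n6` key, and its query string are hypothetical and not part of this commit; a plain recursive dict merge stands in for whatever layering the workflow actually uses.

```python
def merge_config(base, overlay):
    """Recursively merge `overlay` into a copy of `base` (hypothetical stand-in for config layering)."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of overwriting.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Subset of the segment-focused config added in this commit.
base_config = {
    "subtype_query": {
        "h5n1": "subtype in ['h5n1']",
        "h7n9": "subtype in ['h7n9']",
    },
}

# Hypothetical overlay: the 'h7n6' key and its query are illustrative only.
overlay_config = {
    "subtype_query": {
        "h7n6": "subtype in ['h7n6']",
    },
}

config = merge_config(base_config, overlay_config)
```

Under these assumptions the existing queries are preserved and the new subtype is looked up the same way as the others. A real overlay would presumably also need to declare the new subtype under 'builds', as the config comment in this commit notes.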
jameshadfield committed Mar 4, 2025
1 parent 8962a97 commit 2a47663
Showing 4 changed files with 20 additions and 23 deletions.
5 changes: 5 additions & 0 deletions genome-focused-d1.1/config.yaml
@@ -34,6 +34,11 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/h5n1-clades.tsv # use H5N1 clades
 description: config/{subtype}/description.md
 
+## Subtype query - this structure is different from all other config parameters, requiring
+# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
+# augur filter's --query argument.
+subtype_query:
+  "h5n1-d1.1": "genoflu in 'D1.1'"
 
 #### Rule-specific parameters ####
 filter:
5 changes: 5 additions & 0 deletions genome-focused/config.yaml
@@ -43,6 +43,11 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/h5n1-clades.tsv # use H5N1 clades
 description: config/{subtype}/description_{subtype}.md
 
+## Subtype query - this structure is different from all other config parameters, requiring
+# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
+# augur filter's --query argument.
+subtype_query:
+  "h5n1-cattle-outbreak": "genoflu in 'B3.13'"
 
 #### Rule-specific parameters ####
 # The formatting here represents the three-tiered nature of the avian-flu build which
25 changes: 2 additions & 23 deletions rules/main.smk
@@ -32,27 +32,6 @@ rule test_target:
     """
     input: "auspice/avian-flu_h5n1_ha_all-time.json"
 
-def subtypes_by_subtype_wildcard(wildcards):
-
-    # TODO - this function does more than strictly subtype filtering as certain builds filter to
-    # GenoFLU constellation, and in the future this may be expanded. We should rename the function!
-    # TODO XXX - move to configs (started in https://github.com/nextstrain/avian-flu/pull/104 but
-    # We should make the entire query config-definable)
-    if wildcards.subtype == 'h5n1-d1.1':
-        return "genoflu in 'D1.1'"
-    elif wildcards.subtype == 'h5n1-cattle-outbreak':
-        return "genoflu in 'B3.13'"
-
-    db = {
-        'h5nx': ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9'],
-        'h5n1': ['h5n1'],
-        'h7n9': ['h7n9'],
-        'h9n2': ['h9n2'],
-    }
-    assert wildcards.subtype in db, (f"Subtype {wildcards.subtype!r} is not defined in the snakemake function "
-        "`subtypes_by_subtype_wildcard` -- is there a typo in the subtype you are targetting?")
-    return(f"subtype in [{', '.join([repr(s) for s in db[wildcards.subtype]])}]")
-
 class InvalidConfigError(Exception):
     pass
 
@@ -206,7 +185,7 @@ rule filter_sequences_by_subtype:
     output:
         sequences = "results/{subtype}/{segment}/sequences.fasta",
     params:
-        subtypes=subtypes_by_subtype_wildcard,
+        subtypes=lambda w: config['subtype_query'][w.subtype],
     shell:
         """
         augur filter \
@@ -222,7 +201,7 @@ rule filter_metadata_by_subtype:
     output:
         metadata = "results/{subtype}/metadata.tsv",
     params:
-        subtypes=subtypes_by_subtype_wildcard,
+        subtypes= lambda w: config['subtype_query'][w.subtype],
     shell:
         """
         augur filter \
8 changes: 8 additions & 0 deletions segment-focused/config.yaml
@@ -51,6 +51,14 @@ dropped_strains: config/{subtype}/dropped_strains_{subtype}.txt
 clades_file: clade-labeling/{subtype}-clades.tsv
 description: config/description_gisaid.md
 
+## Subtype query - this structure is different from all other config parameters, requiring
+# a key for each of the subtypes defined above in 'builds'. The string will be supplied to
+# augur filter's --query argument.
+subtype_query:
+  "h5nx": "subtype in ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9']"
+  "h5n1": "subtype in ['h5n1']"
+  "h7n9": "subtype in ['h7n9']"
+  "h9n2": "subtype in ['h9n2']"
 
 #### Rule-specific parameters ####
 # The formatting here represents the three-tiered nature of the avian-flu build which
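
With the lookup now a plain dictionary access, a subtype missing from `subtype_query` would most likely surface as a `KeyError`, whereas the removed function asserted with an explanatory message. One possible way to keep a descriptive error is sketched below. This is an assumption, not something in this commit: `resolve_subtype_query` is a hypothetical helper, and `InvalidConfigError` simply mirrors the class name already defined in rules/main.smk.

```python
class InvalidConfigError(Exception):
    """Mirrors the exception class of the same name in rules/main.smk."""
    pass


def resolve_subtype_query(config, subtype):
    """Return the augur filter --query string for `subtype` (hypothetical helper).

    Raises InvalidConfigError with a descriptive message if the config
    has no matching entry under 'subtype_query'.
    """
    queries = config.get("subtype_query", {})
    if subtype not in queries:
        raise InvalidConfigError(
            f"No 'subtype_query' entry for subtype {subtype!r}; "
            f"known subtypes: {sorted(queries)}"
        )
    return queries[subtype]


# Example config using one of the entries added in this commit.
example_config = {"subtype_query": {"h5n1-d1.1": "genoflu in 'D1.1'"}}
query = resolve_subtype_query(example_config, "h5n1-d1.1")
```

In a rule this could replace the inline lambda, for example `subtypes=lambda w: resolve_subtype_query(config, w.subtype)`, at the cost of one extra function in the Snakefile.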
