diff --git a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
index 20621772806..63db201db30 100644
--- a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
+++ b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
@@ -93,9 +93,7 @@
    - This workflow needs to be run with the `filter_set_name` input from `GvsCreateFilterSet` step.
    - This workflow does not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.
 1. `GvsCalculatePrecisionAndSensitivity` workflow
-   - You will need to have "Storage Object View" access granted for your @pmi-ops proxy group on the `gs://broad-dsp-spec-ops/gvs/truth` directory
-   - This workflow needs to be run with the control sample chr20 vcfs from `GvsExtractCallset` step which were placed in the `output_gcs_dir`.
-   - This workflow does not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.
+   - Please see the detailed instructions for running the Precision and Sensitivity workflow [here](../../tieout/AoU_PRECISION_SENSITIVITY.md).
 1. `GvsCallsetCost` workflow
    - This workflow calculates the total BigQuery cost of generating this callset (which is not represented in the Terra UI total workflow cost) using the above GVS workflows; it's used to calculate the cost as a whole and by sample.
diff --git a/scripts/variantstore/tieout/AoU_PRECISION_SENSITIVITY.md b/scripts/variantstore/tieout/AoU_PRECISION_SENSITIVITY.md
index 21a504b06ba..892b0df5df7 100644
--- a/scripts/variantstore/tieout/AoU_PRECISION_SENSITIVITY.md
+++ b/scripts/variantstore/tieout/AoU_PRECISION_SENSITIVITY.md
@@ -14,8 +14,16 @@
 ## Precision and Sensitivity
 1. Use the GvsCalculatePrecisionAndSensitivity wdl to calculate the precision and sensitivity.
+    1. You will need to have "Storage Object View" access granted for your @pmi-ops proxy group on the `gs://broad-dsp-spec-ops/gvs/truth` directory
+    1. This workflow does not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.
+
+
+The wdl takes several inputs as described below. Pro tip: it can be useful to look at the inputs from prior successful
+runs of the precision and sensitivity workflow *as a model* for new runs, being careful to update values as necessary.
+The inputs `sample_names`, `truth_vcfs`, `truth_vcf_indexes`, `truth_beds`, and `truth_fasta` are parallel arrays that
+should all be updated in lockstep with one another. `input_vcf_fofn` will need to be generated anew for every unique upstream
+run of `GvsExtractCallset`, while `chromosome` and `ref_fasta` will likely remain the same for every run.
-The wdl takes several inputs:
 **input_vcf_fofn** - A FOFN (file of file names) of output VCFs for control samples generated by `GvsExtractCallSet`. These need not be subsetted down to the chromosome. The FOFN should contain the full cloud paths to the VCFs, not just the file names.
@@ -45,7 +53,7 @@ gsutil cp vcfs_fofn.txt /p_and_s/vcfs_fofn.txt
 [ "NA12878", \
 "NA24385"
 ```
-**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzing the samples in `sample_names`.
+**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzing the samples in `sample_names`. Note this
 ```
 [ "gs://broad-gotc-test-storage/gvs/truth/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz", \