Unnecessary warning during liftover of each variant? #25

Open
jjfarrell opened this issue Dec 21, 2024 · 3 comments

Comments

@jjfarrell

When running +liftover from b37 to b38, a warning is displayed for each variant (N=17697744). If I ignore the warnings and let it continue, the liftover runs to completion and creates a b38 VCF. The command line points to both references with -s and -f, so the warning is not expected. The liftover does seem to be swapping alleles and rejecting variants. Any ideas on how to eliminate this warning?
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr22:40552657
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr22:40554777
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr22:40559560
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr22:40568384
Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position chr22:40611472
Lines total/swapped/reference added/rejected: 17697744/10897/8803/2421
Merging 9 temporary files
Cleaning
Done

@freeseek
Owner

You need both the b37 (-s) and the b38 (-f) references as input, or BCFtools/liftover will not be able to handle some variants properly. For SNPs this might not be too important (though some will be lost because of this), but it is still highly recommended to provide both options. I could add a fix to make sure the warning is not displayed too many times.

@jjfarrell
Author

But I am using both options (-s and -f):

bcftools +liftover --no-version $VCF37 -- \
  -s /restricted/projectnb/casa/ref/human_g1k_v37_decoy.fasta \
  -f $REF_DIR/GRCh38_full_analysis_set_plus_decoy_hla.fa \
  -c $REF_DIR/GRCh37_to_GRCh38.chain.gz \
  --reject $VCF_reject \
  --reject-type b \
  --write-src | \
  bcftools sort -Oz -o $VCF38

The warning is generated by the code below, from liftover.c. What is this code trying to do? What are maximally extended alleles?

static int find_reference(bcf1_t *rec, const bcf_hdr_t *hdr, const char *ref, int is_snp, kstring_t *str) {
    int i, swap = -1;
    int ref_len = strlen(ref);
    for (i = 0; i < rec->n_allele; i++) {
        if (rec->d.allele[i][0] == '*') continue;
        int len = strlen(rec->d.allele[i]);
        if (len != ref_len) continue;
        if (ref_len == 1 ? (ref[0] == rec->d.allele[i][0]) : !strncasecmp(ref + 1, rec->d.allele[i] + 1, len - 2)) {
            if (swap >= 0) {
                // two matches can only happen if alleles were not maximally extended
                fprintf(stderr,
                        "Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the "
                        "reference allele at position %s:%" PRIhts_pos "\n",
                        bcf_seqname(hdr, rec), rec->pos + 1);
                break;
            } else {
                swap = i;
            }
        }
    }
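
Read literally, the quoted loop prints the warning as soon as a second allele passes the comparison against `ref`. The standalone sketch below mirrors only that comparison on plain C strings, to make it easier to see which allele sets trigger it. `count_matching_alleles` is a hypothetical helper, not part of liftover.c, and the snippet does not show how the plugin builds `ref` or whether this function is reached when -s is supplied, so this is an illustration under those assumptions rather than a description of the plugin's behavior.

```c
#include <assert.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical helper (not in liftover.c): counts how many alleles pass the
 * same comparison as the quoted loop. The quoted code emits its warning as
 * soon as a second allele matches. Assumes non-empty allele strings. */
static int count_matching_alleles(const char *ref, const char *const *alleles, int n_allele) {
    assert(ref != NULL && ref[0] != '\0' && alleles != NULL);
    int ref_len = (int)strlen(ref);
    int matches = 0;
    for (int i = 0; i < n_allele; i++) {
        if (alleles[i][0] == '*') continue; /* skipped, as in the quoted loop */
        int len = (int)strlen(alleles[i]);
        if (len != ref_len) continue;       /* only same-length alleles are compared */
        /* Length 1: exact, case-sensitive base comparison.
         * Longer: case-insensitive comparison of the interior bases only,
         * so the first and last base are ignored. */
        int hit = (ref_len == 1) ? (ref[0] == alleles[i][0])
                                 : !strncasecmp(ref + 1, alleles[i] + 1, (size_t)(len - 2));
        if (hit) matches++;
    }
    return matches;
}
```

For example, with `ref` set to `ACGT` and the alleles `ACGT` and `TCGA`, both alleles pass the check because only the interior `CG` is compared. With a two-base `ref`, the comparison covers zero bases, so every two-base allele passes. Either case would reach the warning branch in the quoted code.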

@freeseek
Owner

freeseek commented Jan 6, 2025

Can you show me some examples of variants that are giving you this warning?
