Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: Too many blocks, the phasing process must be wrong. #47

Closed
maudeardin opened this issue Jan 9, 2025 · 7 comments
Closed

ERROR: Too many blocks, the phasing process must be wrong. #47

maudeardin opened this issue Jan 9, 2025 · 7 comments

Comments

@maudeardin
Copy link

Hi,

I'm working on FFPE tumour samples (RNAseq and WES) and I'm getting an error during the phasing process.

Num of hete small variant is 26 in HLA_DQB1.
11 blocks after phasing.
Start link blocks with database...
Num of all possible haps is 1024.
ERROR: Too many blocks, the phasing process must be wrong.

This error leads to missing files during the annotation process, resulting in no results.

Do you have a suggestion for bypassing the phasing process for problematic HLA genes?

Many thanks for your help

start annotation...
parameter:	sample:D240219	dir:specHLA//D240219	pop:Unknown	wxs:exon	G_nom:0
specHLA//D240219/hla.allele.1.HLA_DQB1.fasta: No such file or directory
specHLA//D240219/hla.allele.1.HLA_DQB1.fasta: No such file or directory
BLAST engine error: Warning: Sequence contains no data 
specHLA//D240219/hla.allele.2.HLA_DQB1.fasta: No such file or directory
specHLA//D240219/hla.allele.2.HLA_DQB1.fasta: No such file or directory
BLAST engine error: Warning: Sequence contains no data 
mpileup: invalid option -- 't'

Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

Input options:
  -6, --illumina1.3+      quality is in the Illumina-1.3+ encoding
  -A, --count-orphans     do not discard anomalous read pairs
  -b, --bam-list FILE     list of input BAM filenames, one per line
  -B, --no-BAQ            disable BAQ (per-Base Alignment Quality)
  -C, --adjust-MQ INT     adjust mapping quality; recommended:50, disable:0 [0]
  -d, --max-depth INT     max per-file depth; avoids excessive memory usage [8000]
  -E, --redo-BAQ          recalculate BAQ on the fly, ignore existing BQs
  -f, --fasta-ref FILE    faidx indexed reference sequence file
  -G, --exclude-RG FILE   exclude read groups listed in FILE
  -l, --positions FILE    skip unlisted positions (chr pos) or regions (BED)
  -q, --min-MQ INT        skip alignments with mapQ smaller than INT [0]
  -Q, --min-BQ INT        skip bases with baseQ/BAQ smaller than INT [13]
  -r, --region REG        region in which pileup is generated
  -R, --ignore-RG         ignore RG tags (one BAM = one sample)
  --rf, --incl-flags STR|INT  required flags: include reads with any of the mask bits set []
  --ff, --excl-flags STR|INT  filter flags: skip reads with any of the mask bits set
                                            [UNMAP,SECONDARY,QCFAIL,DUP]
  -x, --ignore-overlaps   disable read-pair overlap detection
  -X, --customized-index  use customized index files

Output options:
  -o, --output FILE        write output to FILE [standard output]
  -O, --output-BP          output base positions on reads, current orientation
      --output-BP-5        output base positions on reads, 5' to 3' orientation
  -M, --output-mods        output base modifications
  -s, --output-MQ          output mapping quality
      --output-QNAME       output read names
      --output-extra STR   output extra read fields and read tag values
      --output-sep CHAR    set the separator character for tag lists [,]
      --output-empty CHAR  set the no value character for tag lists [*]
      --no-output-ins      skip insertion sequence after +NUM
                           Use twice for complete insertion removal
      --no-output-ins-mods don't display base modifications within insertions
      --no-output-del      skip deletion sequence after -NUM
                           Use twice for complete deletion removal
      --no-output-ends     remove ^MQUAL and $ markup in sequence column
      --reverse-del        use '#' character for deletions on the reverse strand
  -a                       output all positions (including zero depth)
  -a -a (or -aa)           output absolutely all positions, including unused ref. sequences

Generic options:
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
      --verbosity INT
               Set level of verbosity

Note that using "samtools mpileup" to generate BCF or VCF files has been
removed.  To output these formats, please use "bcftools mpileup" instead.
No such file or directory
Start G group resolution annotation...
The region with low read depth is masked by N. The cutoff is specified by -k.
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.1.HLA_A.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.2.HLA_A.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.1.HLA_B.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.2.HLA_B.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.1.HLA_C.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.2.HLA_C.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.1.HLA_DPA1.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.2.HLA_DPA1.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.1.HLA_DPB1.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.2.HLA_DPB1.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.1.HLA_DQA1.fasta
The ratio of N (masked) is 0.0 for the allele specHLA//D240219/hla.allele.2.HLA_DQA1.fasta
Traceback (most recent call last):
  File "SpecHLA/script/whole/g_group_annotation.py", line 244, in <module>
    g_ann.main()
  File "SpecHLA/script/whole/g_group_annotation.py", line 146, in main
    n_ratio = count_n_ratio(infer_hap_file) # cal the ratio of N in the fasta
  File "SpecHLA/script/whole/g_group_annotation.py", line 23, in count_n_ratio
    with open(fasta_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'specHLA//D240219/hla.allele.1.HLA_DQB1.fasta'
Clean output dir.
# version: IPD-IMGT/HLA 3.58.0
Sample	HLA_A_1	HLA_A_2	HLA_B_1	HLA_B_2	HLA_C_1	HLA_C_2	HLA_DPA1_1	HLA_DPA1_2	HLA_DPB1_1	HLA_DPB1_2	HLA_DQA1_1	HLA_DQA1_2	HLA_DQB1_1	HLA_DQB1_2	HLA_DRB1_1	HLA_DRB1_2
D240219 is done.
@wshuai294
Copy link
Collaborator

Hi, the issue seems derive from previous steps. Could you please send me the log before these:

Num of hete small variant is 26 in HLA_DQB1.
11 blocks after phasing.
Start link blocks with database...
Num of all possible haps is 1024.
ERROR: Too many blocks, the phasing process must be wrong.

@maudeardin
Copy link
Author

Hi, please find attached the complete log execution.

specHLA_D240219.log

@wshuai294
Copy link
Collaborator

Hi,

First, it is highly recommended to align the raw reads onto human reference like hg19 or hg38, and use the givenscript/ExtractHLAread.sh to extract only HLA-related reads before running SpecHLA. Otherwise, it will make the process slow and possibly harm the typing performance.

Second, to solve this issue, you can simply change the number from 512 to a very large number like 5120000 in the line 1592 of the script script/phase_variants.py.

    if len(all_poss) > 512:
        print ("ERROR: Too many blocks, the phasing process must be wrong.")
        print ("-----------------------------------------------------------")
        sys.exit(1)

@maudeardin
Copy link
Author

Hi,

Thanks for your reply. It does solve the problem.
I'm now having an error with samtools mpileup during the annotation step.

Do you have any suggestions on how to generate the pileup file?

`
start annotation...

parameter: sample:D240219 dir:specHLA//D240219 pop:Unknown wxs:exon G_nom:0

mpileup: invalid option -- 't'

Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

Input options:
-6, --illumina1.3+ quality is in the Illumina-1.3+ encoding
-A, --count-orphans do not discard anomalous read pairs
-b, --bam-list FILE list of input BAM filenames, one per line
-B, --no-BAQ disable BAQ (per-Base Alignment Quality)
-C, --adjust-MQ INT adjust mapping quality; recommended:50, disable:0 [0]
-d, --max-depth INT max per-file depth; avoids excessive memory usage [8000]
-E, --redo-BAQ recalculate BAQ on the fly, ignore existing BQs
-f, --fasta-ref FILE faidx indexed reference sequence file
-G, --exclude-RG FILE exclude read groups listed in FILE
-l, --positions FILE skip unlisted positions (chr pos) or regions (BED)
-q, --min-MQ INT skip alignments with mapQ smaller than INT [0]
-Q, --min-BQ INT skip bases with baseQ/BAQ smaller than INT [13]
-r, --region REG region in which pileup is generated
-R, --ignore-RG ignore RG tags (one BAM = one sample)
--rf, --incl-flags STR|INT required flags: include reads with any of the mask bits set []
--ff, --excl-flags STR|INT filter flags: skip reads with any of the mask bits set
[UNMAP,SECONDARY,QCFAIL,DUP]
-x, --ignore-overlaps disable read-pair overlap detection
-X, --customized-index use customized index files

Output options:
-o, --output FILE write output to FILE [standard output]
-O, --output-BP output base positions on reads, current orientation
--output-BP-5 output base positions on reads, 5' to 3' orientation
-M, --output-mods output base modifications
-s, --output-MQ output mapping quality
--output-QNAME output read names
--output-extra STR output extra read fields and read tag values
--output-sep CHAR set the separator character for tag lists [,]
--output-empty CHAR set the no value character for tag lists [*]
--no-output-ins skip insertion sequence after +NUM
Use twice for complete insertion removal
--no-output-ins-mods don't display base modifications within insertions
--no-output-del skip deletion sequence after -NUM
Use twice for complete deletion removal
--no-output-ends remove ^MQUAL and $ markup in sequence column
--reverse-del use '#' character for deletions on the reverse strand
-a output all positions (including zero depth)
-a -a (or -aa) output absolutely all positions, including unused ref. sequences

Generic options:
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
--verbosity INT
Set level of verbosity

Note that using "samtools mpileup" to generate BCF or VCF files has been
removed. To output these formats, please use "bcftools mpileup" instead.
`

@wshuai294
Copy link
Collaborator

It seems you are using a wrong version of samtools. It should be SamTools-1.3.1. Make sure to use the samtools in SpecHLA's env. Also, you can install this version of samtools in your system. But this may lead to other bugs. So the first the solution is much better.

@maudeardin
Copy link
Author

Hi,
I was using the samtools provided in the conda environment of the tools which wasn't 1.3.1.
Updating samtools to the version 1.3.1 solved the problem and now SpecHLA runs without any errors

Thanks for your help

@wshuai294
Copy link
Collaborator

Happy to hear this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants