I just figured out that seqtk cutN tolerates non-N bases in its output. In the example below you can see that it reports the range 229-610, which is not exclusively made of Ns. Worse, it reports overlapping ranges, which confuses tools. Setting a high penalty with the -p option apparently solves the problem.
Because I misunderstood how seqtk cutN works, the regions marked in pink in the dotplots may be overly broad. To resolve this issue I need to either:
set -p to a value that I know is always high enough, or
replace seqtk with an awk script.
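For the second option, here is a minimal sketch of what such an awk script could look like. The function name `n_runs_bed` is hypothetical, and the script assumes plain FASTA input without Windows line endings. It reports every maximal run of N (or n) as a 0-based, half-open interval, which appears to be the convention of the seqtk output in this report. It has not been checked against seqtk on real assemblies.

```shell
# Hypothetical helper (not part of seqtk): print every maximal run of N/n
# in a FASTA file as a tab-separated, 0-based, half-open interval.
n_runs_bed() {
  awk '
    BEGIN { start = -1 }            # start < 0 means "not inside a run of Ns"
    function emit() {               # close the currently open run, if any
      if (start >= 0) print name "\t" start "\t" pos
      start = -1
    }
    /^>/ {                          # header line: finish the previous sequence
      emit()
      name = substr($1, 2)          # sequence name without ">" or description
      pos = 0
      next
    }
    {                               # sequence line: scan base by base
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "N" || c == "n") {
          if (start < 0) start = pos
        } else {
          emit()
        }
        pos++
      }
    }
    END { emit() }                  # a run may extend to the end of the file
  ' "$@"
}
```

It can be run as `n_runs_bed test.fa`. Because a run is only closed by a non-N base, a new header or the end of the file, the reported ranges contain only Ns and cannot overlap. Scanning base by base is slow in awk, so this is probably only practical for small to medium genomes.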
Command used and terminal output
cat > test.fa <<__END__
>test
TNNNNNNNNNNNNNNNNTTATTTAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCACTTTTAATTNN
NNNNNNNNNNNCTATTTAATCCTTCTTTTTCTTTAATCTTAAAATTATCNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNTTATANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNTAAGATT
TATANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNTTATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNATNNNNNNNNNNNNNNNNNNNATTNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNAGCTCTTTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNTAAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAATA
ATTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
__END__
$ seqtk cutN -g -n 1 test.fa
test 1 17
test 26 166
test 178 191
test 229 610
test 612 631
test 229 665
test 675 740
test 745 776
test 783 831
$ seqtk cutN -g -n 1 -p 10000 test.fa
test 1 17
test 26 166
test 178 191
test 229 304
test 309 396
test 397 413
test 424 495
test 499 610
test 612 631
test 634 665
test 675 740
test 745 776
test 783 831
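To see whether a given set of ranges contains overlaps like the ones in the first run above, a small check along the following lines may help. The function name `bed_overlaps` is hypothetical. It assumes three whitespace-separated columns (name, start, end) and treats the end coordinate as exclusive.

```shell
# Hypothetical helper: list intervals that overlap an earlier interval
# on the same sequence. Reads files given as arguments, or standard input.
bed_overlaps() {
  sort -k1,1 -k2,2n -k3,3n "$@" |
    awk '
      $1 != name { name = $1; end = -1 }   # new sequence: reset furthest end seen
      $2 < end   { print "overlap:", $0 }  # starts before an earlier interval ends
      $3 > end   { end = $3 }              # remember the furthest end so far
    '
}
```

For example, `seqtk cutN -g -n 1 test.fa | bed_overlaps` should print nothing when the ranges are disjoint. Adjacent ranges such as 309-396 followed by 396-400 would not count as overlapping under the half-open assumption.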
See <nf-core/pairgenomealign#18> for details.
After this change it should no longer be necessary to sort the ranges.
(The need to sort was a symptom of this issue, which the sorting command
was sweeping under the carpet without me realising it.)