-B A2G output format explanation #46

Open
ckuenne opened this issue Aug 23, 2021 · 2 comments

Comments

ckuenne commented Aug 23, 2021

Hi,

I'm back with more questions. Seems like I don't really understand the output format when using the "-B A2G" parameter.

cmd help:

-B <READ-TAG>         Tag reads by base substitution.
                      Count non-reference base substitution per read and stratify.
                      Requires stranded library type.
                      (Format for T to C mismatch: T2C; use ',' to separate substitutions)
                      Default: none
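
For anyone scripting around this option: a minimal sketch of checking a `-B` value before launching a run, based only on the format given in the help text above (`T2C` style, `,` as separator). The function name `parse_base_subs` is hypothetical, this is not JACUSA2's own parser, and rejecting identical bases such as `A2A` is an assumption that a substitution needs two different bases.

```python
import re

# One substitution in the form shown in the help text: X2Y with X, Y in {A, C, G, T}.
_SUB_RE = re.compile(r"^([ACGT])2([ACGT])$")


def parse_base_subs(spec):
    """Split a comma-separated -B value into (reference, read_base) pairs.

    Raises ValueError for anything that does not look like X2Y,
    or where X and Y are the same base (assumed to be invalid).
    """
    pairs = []
    for item in spec.split(","):
        match = _SUB_RE.match(item.strip())
        if match is None or match.group(1) == match.group(2):
            raise ValueError(f"not a valid base substitution: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


print(parse_base_subs("A2G"))
print(parse_base_subs("A2G,T2C"))
```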

The manual has this to say:

6.3 Adding read/base stratification
Read stratification or partitioning based on base substitution(s) can be enabled by adding “-B <BASE-SUB>” to your JACUSA2 run statement. <BASE-SUB> defines the base substitution X2Y : X, Y ∈ {A, C, G, T}, X ≠ Y of interest, where X is the reference base and Y is a base call from some read. A stranded library type must be provided for each condition because otherwise X2Y cannot be determined unambiguously from the sequencing data. Multiple base substitutions can be provided by separating them with a “,”.
For each site, the output will consist of at least one line that represents the total, unstratified reads. The “info” column will contain the field “tag=*”, indicating that the total reads are shown. If a read with the wanted base substitution, A2G for example, is encountered, all sites that are covered by this read will have an additional line of output, and the “info” column will have a value of “tag=A2G”.
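
The layout described in the manual excerpt (one `tag=*` line per site, plus one extra line per substitution tag) can be read back with a few lines of Python. This is only a sketch: `group_by_site` is a hypothetical helper, the column names are taken from a JACUSA2 `call-2` output header, and it assumes that the "info" column holds `key=value` fields separated by `;` and that a missing tag means the unstratified line.

```python
import csv
import io
from collections import defaultdict


def group_by_site(handle):
    """Return {(contig, start, end, strand): {tag: row_dict}} from tab-separated output."""
    header = None
    sites = defaultdict(dict)
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        if fields[0].startswith("#"):
            # The column header line starts with "#contig"; other "#" lines are skipped.
            if fields[0] == "#contig":
                header = ["contig"] + fields[1:]
            continue
        if header is None:
            raise ValueError("no '#contig' header line found before the data")
        row = dict(zip(header, fields))
        # Assumption: info is a ';'-separated list of key=value fields.
        tag = "*"
        for entry in row["info"].split(";"):
            if entry.startswith("tag="):
                tag = entry[len("tag="):]
        key = (row["contig"], row["start"], row["end"], row["strand"])
        sites[key][tag] = row
    return sites


# Two lines for one site: the total line and the A2G-stratified line.
HEADER = ("#contig start end name score strand bases11 bases12 bases13 bases14 "
          "bases21 bases22 bases23 bases24 info filter ref").split()
ROW_ALL = ("chr1 3386985 3386986 call-2 1.086141859 - 6,0,0,0 5,0,0,0 8,0,0,0 "
           "18,0,2,0 4,0,0,0 7,0,0,0 12,0,0,0 3,0,0,0 tag=* * A").split()
ROW_A2G = ("chr1 3386985 3386986 call-2 * - 1,0,0,0 1,0,0,0 2,0,0,0 4,0,2,0 "
           "* 2,0,0,0 * * tag=A2G * A").split()
sample = "\n".join("\t".join(r) for r in (HEADER, ROW_ALL, ROW_A2G))

sites = group_by_site(io.StringIO(sample))
for site, by_tag in sites.items():
    print(site, sorted(by_tag))
```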

call:
java -Xmx55g -jar /mnt/software/x86_64/packages/jacusa/2.0.1/jacusa.jar call-2 -a D,M,Y -filterNM 5 -s -c 2 -T 1 -P FR-SECONDSTRAND -B A2G -p 16 -r x/j2.a2g ./star_ht_1/igv/ht_1.bam,./star_ht_2/igv/ht_2.bam,./star_ht_3/igv/ht_3.bam,./star_ht_4/igv/ht_4.bam ./star_m3d-ht_1/igv/m3d-ht_1.bam,./star_m3d-ht_2/igv/m3d-ht_2.bam,./star_m3d-ht_3/igv/m3d-ht_3.bam,./star_m3d-ht_4/igv/m3d-ht_4.bam

example of j2.a2g:

#contig	start	end	name	score	strand	bases11	bases12	bases13	bases14	bases21	bases22	bases23	bases24	info	filter	ref		
chr1	3386985	3386986	call-2	1.086141859	-	6,0,0,0	5,0,0,0	8,0,0,0	18,0,2,0	4,0,0,0	7,0,0,0	12,0,0,0	3,0,0,0	tag=*	*	A
chr1	3386985	3386986	call-2	*	-	1,0,0,0	1,0,0,0	2,0,0,0	4,0,2,0	*	2,0,0,0	*	*	tag=A2G	*	A

bases1* = 4 replicates of the reference condition (ht)
bases2* = 4 replicates of the tribe condition (m3d-ht, = modified A2G)

So the first line per variant is the "normal" jacusa output with ACGT coverage and the second should be only the A2G modifications? But how do I read that second line?

@ryrl9703

I have the same question. Can anyone explain?

@piechottam
Collaborator

The first line "tag=*" contains ALL reads.
The second line "tag=A2G" contains ONLY reads with A->G substitutions.
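
To make that concrete with the two lines from the question, here is a rough sketch that puts the `tag=*` and `tag=A2G` counts side by side per replicate. It rests on assumptions not confirmed in this thread: that each bases column is ordered A,C,G,T, and that a `*` in a bases column means no tagged reads cover the site in that replicate. Going by the manual excerpt quoted in the question, the tagged counts would come from reads that carry an A->G substitution somewhere along the read, not necessarily at this position, which would explain tagged reads that still show A here.

```python
def parse_counts(cell):
    """Turn 'a,c,g,t' into a tuple of ints; '*' is treated as all zeros (assumption)."""
    if cell == "*":
        return (0, 0, 0, 0)
    return tuple(int(x) for x in cell.split(","))


def summarize(names, all_cells, tagged_cells):
    """Per replicate: (name, total depth, depth from tagged reads, tagged G calls)."""
    rows = []
    for name, all_cell, tagged_cell in zip(names, all_cells, tagged_cells):
        total = parse_counts(all_cell)
        tagged = parse_counts(tagged_cell)
        # Index 2 is G under the assumed A,C,G,T column order.
        rows.append((name, sum(total), sum(tagged), tagged[2]))
    return rows


names = ["bases11", "bases12", "bases13", "bases14",
         "bases21", "bases22", "bases23", "bases24"]
# Values copied from the tag=* and tag=A2G lines in the question.
all_reads = ["6,0,0,0", "5,0,0,0", "8,0,0,0", "18,0,2,0",
             "4,0,0,0", "7,0,0,0", "12,0,0,0", "3,0,0,0"]
a2g_reads = ["1,0,0,0", "1,0,0,0", "2,0,0,0", "4,0,2,0",
             "*", "2,0,0,0", "*", "*"]

rows = summarize(names, all_reads, a2g_reads)
for name, depth, tagged_depth, tagged_g in rows:
    print(f"{name}: {tagged_depth} of {depth} base calls come from A2G-tagged reads; "
          f"{tagged_g} of those are G at this site")
```

Under these assumptions, bases14 would read as: 20 base calls in total, 6 of them from A2G-tagged reads, and 2 of those 6 are G at this site.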
