What does spurious chain sharing mean? #56
Hi there, thanks for trying conga! Great question. With 10x VDJ data, it's not uncommon to see one of the chains (alpha or beta) from a high-frequency (i.e., highly expanded) clonotype paired with multiple other, diverse chains at much lower counts. In other words, two clonotypes with very different clone sizes share one of their chains (share in the sense of exact identity at the nucleotide level). Our interpretation is that there is some "leakage" (maybe ambient RNA) of the high-frequency transcripts, and these TCR transcripts get encapsulated in droplets they don't really belong to.

So in the filtering step, the alpha-beta pairings are sorted by the number of cells each pairing was observed in, and then we go through that list in decreasing order of cell count, looking for chains that we've already seen paired with a different partner at a much higher count. There's some logic to allow for clonotypes with two in-frame alpha chains. In all of this, clonal expansion has been accounted for: we are operating only at the level of unique alpha-beta sequence pairs (each of which might occur in many individual droplets/barcodes).

Can you tell me a bit more about your data? That is a very large number of pairings! Is this from the new ChromiumX, or have multiple filtered_contig files been combined? Are these invariant T cells, where we might expect an unusually high level of exact chain sharing? I've found that it's best to apply this filtering analysis at the level of individual 10x runs, since that's the situation where we run into this chain sharing.

Happy to elaborate if any of this is unclear, and to help debug if possible. If you're comfortable sharing your contigs file, I could process it on my end and see if there's anything funny going on. Or you could share the full log file.
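A minimal Python sketch of the ordering logic described above, written for illustration and not taken from conga. The `fraction` cutoff standing in for "much higher count" is a hypothetical parameter with an arbitrary example value, and the special handling of clonotypes with two in-frame alpha chains is left out entirely.

```python
def filter_spurious_pairings(pair_counts, fraction=0.2):
    """Split unique (alpha, beta) pairings into kept and likely-spurious sets.

    pair_counts maps (alpha_nt, beta_nt) -> number of cells/barcodes.
    `fraction` is a hypothetical cutoff (not a conga default): a pairing is
    flagged when its count is below fraction * the count of a higher-count,
    already-kept pairing that shares one of its chains.
    """
    # Visit unique pairings in decreasing order of cell count.
    ordered = sorted(pair_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_alpha, top_beta = {}, {}  # chain -> count of its highest kept pairing
    kept, flagged = {}, {}
    for (alpha, beta), count in ordered:
        # An earlier pairing with this alpha (or beta) must have had a different
        # partner, because each (alpha, beta) key occurs only once.
        shared = [c for c in (top_alpha.get(alpha), top_beta.get(beta)) if c is not None]
        if any(count < fraction * c for c in shared):
            flagged[(alpha, beta)] = count
            continue
        kept[(alpha, beta)] = count
        top_alpha.setdefault(alpha, count)
        top_beta.setdefault(beta, count)
    return kept, flagged


pairs = {("A1", "B1"): 100, ("A4", "B1"): 50, ("A3", "B3"): 5, ("A1", "B2"): 2}
kept, flagged = filter_spurious_pairings(pairs)
print(flagged)  # {('A1', 'B2'): 2}
```

In this toy input, ("A4", "B1") survives because 50 cells is not far below the 100 cells of ("A1", "B1"), while ("A1", "B2") is flagged because its alpha chain was already seen in a pairing 50 times larger. Only kept pairings update the per-chain counts, so a flagged pairing cannot in turn flag others; that is a design choice of this sketch.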
Wow, thank you for this very quick and detailed response! Let me explain a bit about my data. Right now I have 40 samples sequenced with 10x immune profiling, and I ran cellranger VDJ on each of the samples. The RNA data goes through the standard Cell Ranger, integration, and QC steps to become a single Seurat object, which is then written to h5 per the instructions on GitHub. This means I have 40 contig files and one h5-format counts file covering all 40 samples. Based on #28, I tried merging them together with the
The barcodes in my Seurat object have the sample ID as a prefix, followed by an underscore, the cell barcode, and a "-1" suffix (e.g. A1_ACCTGGATAT-1). So I called this function like this:
However, it printed an error message after reporting the presence of repeats.
So instead I concatenated all of the contig files into one, after rewriting the barcodes into the same format. This naming is consistent with the integrated Seurat object, which I think is why the pairing succeeded.
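For reference, a minimal Python sketch of this kind of concatenation. It assumes each sample's Cell Ranger filtered_contig_annotations.csv has the standard barcode, contig_id and raw_clonotype_id columns and that all files share one header. Prefixing contig_id and raw_clonotype_id as well as the barcode is an assumption here, not a documented conga requirement: Cell Ranger numbers clonotypes per run (clonotype1, clonotype2, ...), so those IDs repeat across samples. Keep in mind the advice in this thread that the chain-sharing filter is best applied to individual 10x runs, and a merged table like this mixes runs.

```python
import csv
import io

# Per-run IDs that may collide across samples. Prefixing contig_id and
# raw_clonotype_id (not just barcode) is an assumption of this sketch.
ID_COLUMNS = ("barcode", "contig_id", "raw_clonotype_id")


def prefix_ids(row, sample_id, delim="_"):
    """Return a copy of one contig row with sample-prefixed IDs, e.g. A1_ACCTGGATAT-1."""
    new_row = dict(row)
    for col in ID_COLUMNS:
        value = new_row.get(col)
        # Leave empty values and the literal string 'None' (unassigned contigs) as-is.
        if value and value != "None":
            new_row[col] = f"{sample_id}{delim}{value}"
    return new_row


def concat_contig_tables(tables):
    """tables: dict of sample_id -> file-like object with one sample's contig CSV."""
    rows, fieldnames = [], None
    for sample_id, handle in tables.items():
        reader = csv.DictReader(handle)
        fieldnames = fieldnames or reader.fieldnames
        rows.extend(prefix_ids(row, sample_id) for row in reader)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

With real files, each handle would come from `open(path, newline="")`, and the returned string can be written to a single combined CSV.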
However, since the majority of pairings is removed, I am guessing that simply concatenating the filtered contig files together might not be the right thing to do. I would love to share the data, but let me double-check with my PI first. I am thinking a few of the contig files should be enough for testing. In the meantime, any thoughts on which step I did wrong based on my description? Thanks so much!
Hi, I can share a few annotation files with you, maybe over email? Is ***@***.*** your email address?
Yes, that email address is correct.
Take care,
Phil
Just sent it, thanks!
Hi, I have been using this package and have had great success with it. However, I am currently running into an error I have never run into before. When I run the make_10x_clones_file_batch command, I get the error mentioned in this thread: TypeError: '<' not supported between instances of 'str' and 'float'. I am reaching out in the hope that this was solved and that there may be a solution. Much appreciated!
Hi there,
Wow, thank you for the quick reply. Happy to provide more information. Here is the full error message:

TypeError Traceback (most recent call last)
File /media/cui-lab/Data_temp/Ryan_Brown/Pipelines/CoNGA_Pipeline/conga/conga/tcrdist/make_10x_clones_file.py:763, in make_10x_clones_file_batch(metadata_file, organism, clones_file, replace_batch_id, strip_batch_id_location, add_batch_id_location, batch_id_delim, stringent, **kwargs)
File /media/cui-lab/Data_temp/Ryan_Brown/Pipelines/CoNGA_Pipeline/conga/conga/tcrdist/make_10x_clones_file.py:538, in setup_filtered_clonotype_dicts(clonotype2tcrs, clonotype2barcodes, min_repeat_count_fraction, verbose)
TypeError: '<' not supported between instances of 'float' and 'str'
Great, thanks for that. It looks to me like this could be due to empty values in the which you can just apply to the corresponding line of your code (if you don't want to update the repo), or try pulling the new version of the code from GitHub. Let me know if that fixes it.
Thanks so much! Removing the NA values fixed the error!
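The message in this traceback is what Python raises when sorting values that mix strings with NaN. pandas reads empty CSV cells as NaN, which is a float, so a column with missing entries can end up holding both types. A minimal sketch of the failure and of the NA-dropping fix, using plain Python values as stand-ins; the exact column and line in conga are not shown in this thread, and `drop_missing` is a hypothetical helper.

```python
import math


def drop_missing(values):
    """Hypothetical helper: remove NaN placeholders (pandas' representation of empty cells)."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


column = ["TRAV1-2", math.nan, "TRAV12-1"]  # strings plus one missing value

try:
    sorted(column)
except TypeError as err:
    print(err)  # '<' not supported between instances of 'float' and 'str'

print(sorted(drop_missing(column)))  # ['TRAV1-2', 'TRAV12-1']
```

In a pandas workflow, the equivalent step is dropping or filling the missing entries (for example with `DataFrame.dropna`) before the data reaches code that sorts them.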
Hi! Thanks for developing this amazing tool. I have a conceptual question regarding the filtering step.
In the paper, you mentioned:
This is also reflected by the default stringent=True for the make_10x_clones_file function.
However, this filtering step results in the majority of my paired data being dropped.
Setting stringent=False would obviously circumvent this problem, but I wonder what it actually means and whether something is seriously wrong with my data, haha. From the output, I am assuming there are duplicates, but wouldn't clonal expansion also be counted as duplicates and thus be removed? Anyway, it would be great to know exactly what this stringent criterion is filtering for.
Thanks in advance!
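On the clonal-expansion question: as the maintainer's answer in this thread explains, the filter operates on unique alpha-beta pairs, not on individual barcodes. A minimal Python sketch of that collapsing step, assuming each barcode has already been reduced to a single (alpha, beta) call; the barcodes and chain strings are placeholders, and this is an illustration rather than conga's own code.

```python
from collections import Counter


def count_unique_pairings(barcode_to_pair):
    """Collapse per-barcode (alpha, beta) calls into unique pairings with cell counts."""
    # An expanded clone contributes many barcodes but only one key in the result.
    return Counter(barcode_to_pair.values())


calls = {
    "A1_AAAC-1": ("a1", "b1"),
    "A1_AAAG-1": ("a1", "b1"),
    "A1_AAAT-1": ("a1", "b1"),  # same clone, three cells
    "A1_CCCA-1": ("a2", "b2"),
}
print(count_unique_pairings(calls))  # Counter({('a1', 'b1'): 3, ('a2', 'b2'): 1})
```

The three cells of the expanded clone become one pairing with a count of 3, so expansion alone does not create "duplicates" for the filter; only distinct pairings that share an identical chain are compared against each other.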