What does spurious chain sharing mean? #56

chuckzzzz · 2022-11-22T21:45:39Z

Hi! Thanks for developing this amazing tool. I have a conceptual question regarding the filtering step.

In the paper, you mentioned:

Here, by default, the 10x cellranger clonotype definitions are filtered to remove spurious chain sharing and merge split clonotypes (for example, due to partial recovery of a second TCRα transcript).

This is also reflected by the default stringent=True for the make_10x_clones_file function.

However, this filtering step results in the majority of my paired data being dropped.

repeat?? 1598 36 ('TRAV12-1*01', 'TRAJ41*01', 'CVVNGNSGYALNF', 'tgtgtggtgaacggcaattccgggtatgcactcaacttc') ('TRAV12-2*01', 'TRAJ41*01', 'CVVNGNSGYALNF', 'tgtgtggtgaacggcaattccgggtatgcactcaacttc')
...
old_unpaired_barcodes: 44 old_paired_barcodes: 131357 new_stringent_paired_barcodes: 238

Setting stringent=False would obviously circumvent this problem, but I wonder what this actually means and whether something is seriously wrong with my data haha? From the output, I am assuming there are duplicates, but wouldn't clonal expansion also be considered duplication and thus be removed? Anyway, it would be cool to know what exactly this stringent criterion is filtering for.

Thanks in advance!

phbradley · 2022-11-25T15:05:00Z

Hi there, thanks for trying conga! Great question. With 10x VDJ data, it's not uncommon to see pairings between one of the chains (alpha or beta) in a high-frequency (i.e., highly expanded) clonotype and multiple other diverse chains at much lower counts. In other words, you see that two clonotypes with very different clone sizes share one of their chains (share in the sense of exact identity at the nucleotide level). Our interpretation of this is that there is some "leakage" (maybe ambient RNA) of the high-frequency transcripts, and these TCR transcripts get encapsulated in droplets that they don't really belong to. So in the filtering step, the alpha-beta pairings are sorted by the number of cells each pairing was observed in, and then we go through that list in decreasing order of cell count, looking for chains that we've already seen paired with a different partner at a much higher count. There's some logic to allow for clonotypes with two in-frame alpha chains. In all of this, clonal expansion has been accounted for, and we are just operating at the level of unique alpha-beta sequence pairs (each might occur in many individual droplets/barcodes).
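The ordering-and-lookup idea described above can be illustrated with a small sketch. This is not conga's actual code: the function name `flag_spurious_pairings`, the `min_count_fraction` parameter, its value of 0.25, and the toy chain labels are all hypothetical, and the special handling of clonotypes with two in-frame alpha chains is left out.

```python
def flag_spurious_pairings(pair_counts, min_count_fraction=0.25):
    """Split alpha-beta pairings into kept and flagged lists.

    pair_counts maps (alpha_chain, beta_chain) -> number of cells.
    min_count_fraction is a hypothetical threshold, not conga's default.
    """
    largest_count_seen = {}  # chain -> cell count of the biggest kept pairing using it
    kept, flagged = [], []

    # Walk the pairings from the most expanded to the least expanded.
    for (alpha, beta), count in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
        # A pairing is suspicious if one of its chains was already seen
        # with a different partner at a much higher cell count.
        suspicious = any(
            chain in largest_count_seen
            and count < min_count_fraction * largest_count_seen[chain]
            for chain in (alpha, beta)
        )
        if suspicious:
            flagged.append((alpha, beta))
        else:
            kept.append((alpha, beta))
            for chain in (alpha, beta):
                largest_count_seen.setdefault(chain, count)
    return kept, flagged


# Toy data: alpha_1 and beta_1 belong to a big clone and also show up
# in a handful of cells with other partners.
pair_counts = {
    ("alpha_1", "beta_1"): 200,
    ("alpha_1", "beta_2"): 3,
    ("alpha_3", "beta_3"): 50,
    ("alpha_4", "beta_1"): 2,
    ("alpha_5", "beta_5"): 1,
}
kept, flagged = flag_spurious_pairings(pair_counts)
print("kept:", kept)
print("flagged:", flagged)
```

Note that the expanded clone itself (200 cells) is kept: expansion only enters through the per-pairing cell counts, and what gets flagged are the low-count pairings that reuse one of its chains.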

Can you tell me a bit more about your data? That is a very large number of pairings! Is this from the new ChromiumX, or have multiple filtered_contig files been combined? Are these invariant T cells where we might expect an unusually high level of exact chain sharing? I've found that it's best to apply this filtering analysis at the level of individual 10x runs, since that's the situation where we run into this chain sharing.

Happy to elaborate if any of this is unclear, and to help debug if possible. If you were comfortable sharing your contigs file, I could process it on my end and see if there's anything funny going on. Or you could share the full log file. [email protected]

chuckzzzz · 2022-11-25T20:37:24Z

Wow. Thank you for this very quick and detailed response!

Yes, let me explain a bit about my data. Right now I have 40 samples sequenced with 10X immune profiling, and I ran cellranger VDJ on each of the samples. The RNA part goes through the standard cellranger, integration, and QC steps to become a single Seurat object. Then it is written to h5 per the instructions on GitHub. This means I have 40 contig files and one h5 format counts file for all 40 samples.

Based on #28, I tried merging them together with the make_10x_clones_file_batch function but it failed. I created the meta file in csv format like this:

file	batch_id
{path_to_A1_filtered_contig_annotations.csv}	A1
{path_to_A2_filtered_contig_annotations.csv}	A2
...	...
{path_to_H5_filtered_contig_annotations.csv}	H5

The barcodes in my Seurat object have the sample ID as a prefix, followed by an underscore, and end with "-1" (e.g. A1_ACCTGGATAT-1). So I called the function like this:

make_10x_clones_file_batch({path_to_meta_file}, "human", clones_file, add_batch_id_location="prefix", batch_id_delim="_")

However, it spits out an error message after reporting the presence of repeats.

repeat?? 3 93 ('TRBV11-1*01', 'TRBJ2-5*01', 'CASSLLSDSLEETQYF', 'tgtgccagcagcttattgagtgactccttagaagagacccagtacttc') ('TRBV11-3*01', 'TRBJ2-5*01', 'CASSLLSDSLEETQYF', 'tgtgccagcagcttattgagtgactccttagaagagacccagtacttc')
repeat?? 1836 41 ('TRAV12-1*01', 'TRAJ41*01', 'CVVNGNSGYALNF', 'tgtgtggtgaacggcaattccgggtatgcactcaacttc') ('TRAV12-2*01', 'TRAJ41*01', 'CVVNGNSGYALNF', 'tgtgtggtgaacggcaattccgggtatgcactcaacttc')
repeat?? 1 2 ('TRBV12-4*01', 'TRBJ2-2*01', 'CASSEREGLTGELFF', 'tgtgccagcagtgagcgggagggcctaaccggggagctgtttttt') ('TRBV6-1*01', 'TRBJ2-2*01', 'CASSEREGLTGELFF', 'tgtgccagcagtgagcgggagggcctaaccggggagctgtttttt')

TypeError                                 Traceback (most recent call last)
<ipython-input-8-61e27aef550d> in <module>()
----> 1 make_10x_clones_file_batch("test/sample_meta.csv", organism, clones_file, add_batch_id_location="prefix", batch_id_delim="_")

conga/tcrdist/make_10x_clones_file.py in make_10x_clones_file_batch(metadata_file, organism, clones_file, replace_batch_id, strip_batch_id_location, add_batch_id_location, batch_id_delim, stringent, **kwargs)
    761 
    762     if stringent:
--> 763         clonotype2tcrs, clonotype2barcodes = setup_filtered_clonotype_dicts( clonotype2tcrs, clonotype2barcodes )
    764 
    765 

conga/tcrdist/make_10x_clones_file.py in setup_filtered_clonotype_dicts(clonotype2tcrs, clonotype2barcodes, min_repeat_count_fraction, verbose)
    536     pairs_tuple2clonotypes = {}
    537     ab_counts = Counter() # for diagnostics
--> 538     for (clone_size, cid) in reversed( sorted( (len(y), x) for x,y in clonotype2barcodes.items() ) ):
    539         if cid not in clonotype2tcrs:
    540             #print('WHOAH missing tcrs for clonotype', clone_size, cid, clonotype2barcodes[cid])

TypeError: '<' not supported between instances of 'str' and 'float'

So I tried concatenating all of these contig files together after changing the barcodes to the same format. This naming is consistent with the integrated Seurat object. I think that's why the pairing is successful.

old_unpaired_barcodes: 44 old_paired_barcodes: 131357 new_stringent_paired_barcodes: 238

However, since the majority is removed, I am guessing that simply concatenating the filtered contig files together might not be the right thing to do. I would love to share the data, but let me double-check with my PI first. I am thinking a few of the contig files should be enough for testing.

In the meantime, any thoughts on which step I got wrong, based on my description? Thanks so much!

chuckzzzz · 2022-11-30T16:29:34Z

Hi I can share a few annotation files with you, maybe over email? Is [email protected] your email address?

phbradley · 2022-12-03T20:04:06Z

Yes, that email address is correct. Take care, Phil

chuckzzzz · 2022-12-07T20:47:06Z

Just sent it thanks!

TheRaspberryFox · 2023-12-27T22:10:55Z

Hi,

I have been using this package and have had great success with it.

However, I am currently running into an error that I have never run into before. When I run the make_10x_clones_file_batch command, I get the error mentioned in this thread:

TypeError: '<' not supported between instances of 'str' and 'float'

I am reaching out in the hope that this was solved and there may be a solution.

Much Appreciated!

phbradley · 2023-12-27T22:31:01Z

Hi there,
Thanks for the kind words and for letting us know about this issue. I have not seen a new error like this yet, so it might help to get a bit more context. Could you share a bit more of the error message, like the line where it happened? It sounds like it could be a missing field in the CSV file, since then you get an 'na' (NaN) value in the dataframe, which I think pandas treats as a float.
Take care,
Phil
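For readers hitting the same message, here is a minimal sketch of how a NaN key could produce it. It assumes the clonotype IDs end up as dictionary keys and that one of them is NaN (a float), as suggested above. The dictionary contents are made up, and the sort simply mimics the `(len(y), x)` tuples in the traceback quoted earlier in this thread.

```python
# Clonotype id -> barcodes. One id is NaN, standing in for an empty CSV field.
clonotype2barcodes = {
    "clonotype1": ["AAAC-1"],
    float("nan"): ["AAAG-1"],
}

try:
    # Tuples are compared element by element. Both clone sizes are 1, so
    # Python falls through to comparing the ids, i.e. a str against a float.
    sorted((len(barcodes), cid) for cid, barcodes in clonotype2barcodes.items())
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Which type is named first in the message depends on which element lands on the left side of the comparison, which would explain why the two tracebacks in this thread list 'str' and 'float' in opposite orders.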

TheRaspberryFox · 2023-12-27T22:46:28Z

Wow, thank you for the quick reply. Happy to provide more information.

Here is the full error message:

TypeError Traceback (most recent call last)
Cell In[13], line 1
----> 1 make_10x_clones_file_batch(metadata_file = "CoNGA_metadata.csv", organism = "mouse", clones_file = "clones.tsv", strip_batch_id_location = 'prefix', add_batch_id_location = 'prefix', stringent= True)

File /media/cui-lab/Data_temp/Ryan_Brown/Pipelines/CoNGA_Pipeline/conga/conga/tcrdist/make_10x_clones_file.py:763, in make_10x_clones_file_batch(metadata_file, organism, clones_file, replace_batch_id, strip_batch_id_location, add_batch_id_location, batch_id_delim, stringent, **kwargs)
754 clonotype2tcrs, clonotype2barcodes = read_tcr_data_batch( organism,
755 metadata_file,
756 replace_batch_id,
(...)
759 batch_id_delim,
760 **kwargs )
762 if stringent:
--> 763 clonotype2tcrs, clonotype2barcodes = setup_filtered_clonotype_dicts( clonotype2tcrs, clonotype2barcodes )
766 _make_clones_file( organism, clones_file, clonotype2tcrs, clonotype2barcodes )

File /media/cui-lab/Data_temp/Ryan_Brown/Pipelines/CoNGA_Pipeline/conga/conga/tcrdist/make_10x_clones_file.py:538, in setup_filtered_clonotype_dicts(clonotype2tcrs, clonotype2barcodes, min_repeat_count_fraction, verbose)
536 pairs_tuple2clonotypes = {}
537 ab_counts = Counter() # for diagnostics
--> 538 for (clone_size, cid) in reversed( sorted( (len(y), x) for x,y in clonotype2barcodes.items() ) ):
539 if cid not in clonotype2tcrs:
540 #print('WHOAH missing tcrs for clonotype', clone_size, cid, clonotype2barcodes[cid])
541 continue

TypeError: '<' not supported between instances of 'float' and 'str'

phbradley · 2023-12-28T14:01:15Z

Great, thanks for that. It looks to me like this could be due to empty values in the raw_clonotype_id column of the contigs csv file. I just checked in a change:

84c0ea7

which you can just apply to the corresponding line of your code (if you don't want to update the repo), or try pulling the new version of the code from github. Let me know if that fixes it.

TheRaspberryFox · 2023-12-28T19:42:33Z

Thanks so much!

Removing the NA values fixed the error!
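A minimal sketch of that kind of cleanup, assuming the problem rows are contigs with an empty raw_clonotype_id field. It uses only the standard library, the helper name `drop_rows_missing_clonotype` is hypothetical, and the in-memory CSV is a toy stand-in for a real filtered_contig_annotations.csv (which has many more columns). It is not the change made in the commit mentioned above.

```python
import csv
import io

# Toy stand-in for a contigs file; the last row has no raw_clonotype_id.
contigs_csv = io.StringIO(
    "barcode,chain,cdr3,raw_clonotype_id\n"
    "AAAC-1,TRA,CVVNGNSGYALNF,clonotype1\n"
    "AAAC-1,TRB,CASSLLSDSLEETQYF,clonotype1\n"
    "AAAG-1,TRB,CASSEREGLTGELFF,\n"
)


def drop_rows_missing_clonotype(handle, column="raw_clonotype_id"):
    """Return only the rows whose clonotype id field is non-empty."""
    reader = csv.DictReader(handle)
    return [row for row in reader if (row.get(column) or "").strip()]


clean_rows = drop_rows_missing_clonotype(contigs_csv)
print(len(clean_rows), "rows kept")
```

With pandas, the equivalent step would be `dropna(subset=["raw_clonotype_id"])` on the loaded dataframe before writing it back out. Whether dropping those contigs is appropriate for a given dataset is a separate question.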
