Preprocessing #25

Open
mbruhns opened this issue Feb 20, 2022 · 1 comment
mbruhns (Collaborator) commented Feb 20, 2022

I just found this paper: DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires

On data preprocessing, they write:

Methods

Data curation

TCR sequencing files were collected as raw tsv/csv formatted files (Supplementary Fig. 1) from the various sources cited within the manuscript. Sequencing files were parsed to take the amino acid sequence of the CDR3 after removing unproductive sequences. Clones with different nucleotide sequences but the same amino acid sequence were aggregated together under one amino acid sequence and their reads were summed to determine their relative abundance. Within the parsing code, we additionally specified to ignore sequences that used non-IUPAC letters (*,X,O) and removed sequences that were greater than 40 amino acids in length. For the purpose of the algorithm, the maximum length can be altered but we chose 40 as we did not expect any real sequences to be longer than this length.
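To make the quoted steps concrete, here is a minimal sketch of how they could look in plain Python. It is not the DeepTCR parsing code. The input layout (dicts with the keys `cdr3_aa`, `reads`, and `productive`) and the function name `curate_clones` are assumptions for illustration; how a sequencing file marks unproductive sequences depends on the export format, so the `productive` flag is a stand-in.

```python
from collections import defaultdict

# Letters the quoted Methods text says to ignore, and its length cutoff.
EXCLUDED_LETTERS = set("*XO")
MAX_LENGTH = 40


def curate_clones(rows, max_length=MAX_LENGTH):
    """Aggregate clones by CDR3 amino acid sequence; return relative abundances.

    `rows` is an iterable of dicts with the hypothetical keys
    'cdr3_aa' (str), 'reads' (int) and 'productive' (bool).
    """
    reads_per_seq = defaultdict(int)
    for row in rows:
        seq = row["cdr3_aa"]
        if not row["productive"]:
            continue  # drop sequences flagged as unproductive (assumed flag)
        if not seq or len(seq) > max_length:
            continue  # drop empty or overly long CDR3 sequences
        if EXCLUDED_LETTERS & set(seq):
            continue  # drop sequences containing *, X or O
        # Clones with different nucleotide sequences but the same amino acid
        # sequence land on the same key, so their reads are summed here.
        reads_per_seq[seq] += row["reads"]

    total = sum(reads_per_seq.values())
    if total == 0:
        return {}
    return {seq: reads / total for seq, reads in reads_per_seq.items()}
```

Called on a list of parsed rows, this returns a dict mapping each remaining amino acid sequence to its share of the total reads. The cutoff stays a parameter because the quoted text notes that the maximum length can be altered.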

We don't have anything like this yet, do we? This is fairly biology-heavy; do you happen to know, for example, what unproductive sequences are? After a quick search I found this paper: Non-productive human TCR beta chain genes represent V-D-J diversity before selection upon function: insight into biased usage of TCRBD and TCRBJ genes and diversity of CDR3 region length.

This one might also be interesting as a general overview:
Recent advances in T-cell receptor repertoire analysis: Bridging the gap with multimodal single-cell RNA sequencing

donEnno (Owner) commented Feb 21, 2022

No, we don't have anything like that yet. However, the dataset was already preprocessed to some extent in Hannover, i.e. potential sequencing errors were removed via thresholding. After skimming the abstract and the introduction, I can't say exactly what unproductive sequences mean in our context. That said, the paper deals with αβ T cells, not γδ T cells.
I can ask Nicola whether anything comes to mind on that.

I'll put it on the literature list.
