On the Calculation Formula of TPM #16

Open
Hiroyuki24 opened this issue Nov 14, 2024 · 1 comment

@Hiroyuki24

Hello,

I greatly appreciate the creation of a TPM calculation tool specifically for prokaryotes.
I'm writing to raise a concern regarding the TPM values calculated by FADU.

I tested FADU with both the provided test data and my own RNA-seq data. The tool ran without issues, and I obtained a results table that includes TPM values. However, when I summed the TPM values for all genes, the total did not reach one million. TPM stands for transcripts per million, so the TPM values of all genes in a sample should sum to one million. Upon reviewing the FADU code, I found that the formula used to calculate TPM is incorrect.
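The check described above can be sketched in Julia, the language of the FADU snippet quoted in this issue. `tpm_sum_is_valid` is a hypothetical helper, not part of FADU, and the relative tolerance is an assumption: TPM values stored as `Float32` will rarely sum to exactly one million.

```julia
# Hypothetical helper (not part of FADU): returns true if a vector of TPM
# values sums to one million within a relative tolerance.
function tpm_sum_is_valid(tpm_values::AbstractVector{<:Real}; rtol::Real = 1e-4)
    # Convert to Float64 before summing to limit Float32 rounding error.
    total = sum(Float64.(tpm_values))
    return isapprox(total, 1_000_000.0; rtol = rtol)
end

println(tpm_sum_is_valid([250_000.0, 250_000.0, 500_000.0]))  # true
println(tpm_sum_is_valid([125_000.0, 300_000.0]))             # false
```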

The correct calculation of TPM involves the following steps:

  1. Normalize the read count for each gene by its length to get counts per kilobase.
  2. Sum the length-normalized counts.
  3. Divide the length-normalized count by the total sum and multiply by one million.
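The three steps above can be sketched as a small Julia function. This is an illustration of the standard TPM definition, not FADU's implementation; the function name `tpm` and the inputs (raw read counts per gene and gene lengths in base pairs) are assumptions made for the example.

```julia
# Sketch of the standard TPM calculation (not FADU's code).
# counts:  raw reads assigned to each gene
# lengths: gene lengths in base pairs (assumed positive)
function tpm(counts::AbstractVector{<:Real}, lengths::AbstractVector{<:Real})
    length(counts) == length(lengths) ||
        throw(ArgumentError("counts and lengths must have the same length"))
    # Step 1: normalize each count by gene length in kilobases (reads per kilobase).
    rpk = counts ./ (lengths ./ 1000)
    # Step 2: sum the length-normalized counts over all genes.
    total_rpk = sum(rpk)
    # Step 3: divide each value by that sum and scale to one million.
    return rpk ./ total_rpk .* 1_000_000
end

counts  = [100, 200, 300]
lengths = [1000, 2000, 500]
tpm_values = tpm(counts, lengths)
println(tpm_values)  # [125000.0, 125000.0, 750000.0]
```

Because every gene is divided by the same sum of length-normalized counts, the results add up to one million by construction, up to floating-point rounding.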

However, the current FADU code is as follows:
```julia
function calc_tpm(len::UInt, totalcounts::Float32, feat_counts::Float32)
    """Calculate TPM score for current feature."""
    return @fastmath(feat_counts * 1000 / len) * 1000000 / totalcounts
end
```

It appears that the total read count is calculated first, from counts that have not been length-normalized, and length normalization is applied only afterward to each feature.
This seems to be the formula for calculating FPKM, not TPM.
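Writing the `calc_tpm` expression out makes the comparison explicit. The symbols are introduced here for illustration: $c_i$ for `feat_counts`, $L_i$ for `len` in base pairs, and $N$ for `totalcounts`. This assumes, as this issue reads the code, that `feat_counts` holds a gene's raw read count and `totalcounts` is the sum of raw counts. Under that reading the expression is the FPKM formula, whereas TPM divides by the sum of length-normalized counts:

```math
\begin{aligned}
\texttt{calc\_tpm}_i &= \frac{c_i \cdot 10^3}{L_i} \cdot \frac{10^6}{N} = \frac{c_i \cdot 10^9}{L_i \, N} = \mathrm{FPKM}_i, \qquad N = \sum_j c_j \\
\mathrm{TPM}_i &= \frac{c_i / L_i}{\sum_j c_j / L_j} \cdot 10^6 = \frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j} \cdot 10^6
\end{aligned}
```

Summing the second line over all genes gives exactly $10^6$, while $\sum_i \mathrm{FPKM}_i = \frac{10^9}{N} \sum_j c_j / L_j$ generally does not.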

Could you please verify if there is an error in the TPM calculation formula?

adkinsrs self-assigned this Nov 17, 2024
@adkinsrs
Member

adkinsrs commented Nov 18, 2024

I think it's fine, but will double-check.

Using this as a source: https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-021-02936-w

The feat_counts value passed to calc_tpm has already been normalized for length by the time compute_alignment_feature_ratio has run. The align_feat_ratio returned by that function is factored into the final feat_counts value as each feature overlap is processed in process_feature_overlaps.

I should probably add some better documentation to explain that the "feat_counts" variable is normalized while being calculated, since the current variable name can be misleading.
