On the Calculation Formula of TPM #16

Hiroyuki24 · 2024-11-14T03:33:24Z

Hello,

I greatly appreciate the creation of a TPM calculation tool specifically for prokaryotes.
I'm writing to raise a concern regarding the TPM values calculated by FADU.

I tested FADU with both the provided test data and my own RNA-seq data. The tool ran without issues, and I obtained a results table including TPM values. However, when I summed the TPM values for all genes, the total did not reach one million. TPM (transcripts per million) represents the amount of each transcript per million, so the sum of TPMs over all genes should be one million. Upon reviewing the FADU code, I found that the calculation formula for TPM is incorrect.

The correct calculation of TPM involves the following steps:

1. Normalize the read count for each gene by its length to get counts per kilobase.
2. Sum the length-normalized counts over all genes.
3. Divide each gene's length-normalized count by that total and multiply by one million.
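The three steps above can be sketched in Julia as follows. This is a minimal illustration, not FADU code: the function name `tpm_from_counts` and the example counts and lengths are hypothetical, and it assumes raw (not yet length-normalized) counts with lengths given in base pairs.

```julia
# Hypothetical helper, not part of FADU: TPM from raw counts and feature lengths (bp).
function tpm_from_counts(counts::AbstractVector{<:Real}, lengths::AbstractVector{<:Real})
    length(counts) == length(lengths) ||
        throw(ArgumentError("counts and lengths must have the same length"))
    # Step 1: counts per kilobase for each gene.
    per_kb = counts ./ (lengths ./ 1000)
    # Step 2: sum of the length-normalized counts over all genes.
    total = sum(per_kb)
    total > 0 || throw(ArgumentError("all counts are zero"))
    # Step 3: scale so that the values sum to one million.
    return per_kb ./ total .* 1_000_000
end

counts = [100.0, 200.0, 300.0]   # made-up read counts
lengths = [1000, 2000, 1500]      # made-up gene lengths in bp
tpm = tpm_from_counts(counts, lengths)
println(sum(tpm))
```

Because step 3 divides by the sum computed in step 2, the returned values add up to one million (up to floating-point rounding) regardless of sequencing depth, which is the property I checked on the FADU output.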

However, the current FADU code is as follows:
function calc_tpm(len::UInt, totalcounts::Float32, feat_counts::Float32)
    """Calculate TPM score for current feature."""
    return @fastmath(feat_counts * 1000 / len) * 1000000 / totalcounts
end

It appears that the total read count is computed first, and the length normalization is applied afterwards.
This seems to be the formula for calculating FPKM, not TPM.
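To make the comparison concrete, here are the two formulas side by side. The symbols are my own notation, not FADU's: $c_i$ is the read (or fragment) count of feature $i$, $\ell_i$ its length in base pairs, and $N = \sum_j c_j$ the total count. The match with `calc_tpm` holds only under the assumption that `feat_counts` and `totalcounts` are raw counts, which the code snippet alone does not confirm.

```latex
\mathrm{FPKM}_i = \frac{c_i \cdot 10^{3} \cdot 10^{6}}{\ell_i \, N},
\qquad
\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \cdot 10^{6}
```

In the TPM form the denominator is the sum of the length-normalized counts, so $\sum_i \mathrm{TPM}_i = 10^{6}$ by construction. In the FPKM form the denominator is the raw total $N$, so the sum over features depends on the length distribution and is generally not one million.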

Could you please verify if there is an error in the TPM calculation formula?


adkinsrs · 2024-11-18T14:16:43Z

I think it's fine, but will double-check.

Using this as source https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-021-02936-w

The feat_counts used in the calc_tpm step was already normalized for length when compute_alignment_feature_ratio was run. The align_feat_ratio output from that function is factored into the final feat_counts operation when each feature overlap is processed in process_feature_overlaps.

I should probably add some better documentation to explain that the "feat_counts" variable is normalized while being calculated, since the current variable name can be misleading.

adkinsrs self-assigned this Nov 17, 2024
