Overview
Reconstructing the phylogeny of somatic evolution is challenging because of resource constraints, the influence of various biological factors, and technical limitations. Many methods concentrate on single-cell sequencing data. Although generally more accurate than bulk-sequencing data, single-cell data can be limiting because of the computational complexity of its analysis and its potential for technical artefacts.
We integrate two simulation software tools to facilitate the modelling of evolutionary histories between multiple metastatic sites within a patient. The first tool is used to generate phylogenetic trees, providing a framework that represents the true evolutionary histories both at a multi-tumour and individual cell level. Subsequently, the second tool generates single-cell tumour sequences based on the true cell-cell trees. By combining these two software tools, cells are assigned to specific tumours, and therefore simulated under a structured population.
Our study evaluates the viability of pooling single-cell data into consensus sequences by comparing their accuracy in reconstructing the multi-tumour tree against that of pseudobulk data. This approach addresses the challenge of obtaining multi-tumour level evolutionary histories from single-cell data. We aim to provide guidance for researchers choosing a sequencing analysis method or looking to trace multi-tumour evolution. Under various biological conditions, we simulate single-cell tumour sequences from a predefined multi-tumour tree and its corresponding cell-cell tree replicates. We then construct consensus sequences by pooling cell sequences that share a tumour lineage and, for comparison, construct pseudobulk data. Tree distances between the initial and reconstructed trees show that consensus sequences do not perform as well as the pseudobulk dataset.
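To make the pooling step concrete, here is a minimal sketch of collapsing single-cell sequences into one consensus sequence per tumour. It assumes aligned, equal-length sequences and a per-site majority vote, which may differ from the rule used in this repository. The second function is only one plausible reading of "pseudobulk" (the per-site fraction of cells that differ from a reference). All names (`consensus_sequences`, `pseudobulk_frequencies`, `cells`, `tumour_of`, `reference`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_tumour(cells, tumour_of):
    """Collect the sequences of all cells sampled from the same tumour."""
    groups = defaultdict(list)
    for cell_id, seq in cells.items():
        groups[tumour_of[cell_id]].append(seq)
    return groups


def consensus_sequences(cells, tumour_of):
    """Majority-vote consensus per tumour (assumed pooling rule).

    Ties go to the base seen first in that alignment column, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    consensus = {}
    for tumour, seqs in group_by_tumour(cells, tumour_of).items():
        # zip(*seqs) walks the alignment column by column.
        consensus[tumour] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def pseudobulk_frequencies(cells, tumour_of, reference):
    """Per-site fraction of cells differing from the reference (assumption)."""
    freqs = {}
    for tumour, seqs in group_by_tumour(cells, tumour_of).items():
        freqs[tumour] = [
            sum(seq[i] != ref_base for seq in seqs) / len(seqs)
            for i, ref_base in enumerate(reference)
        ]
    return freqs


# Toy data: five cells from two tumours, four aligned sites.
cells = {"c1": "AACG", "c2": "AATG", "c3": "AATG", "c4": "GACG", "c5": "GACT"}
tumour_of = {"c1": "T1", "c2": "T1", "c3": "T1", "c4": "T2", "c5": "T2"}
reference = "AACG"

consensus = consensus_sequences(cells, tumour_of)
frequencies = pseudobulk_frequencies(cells, tumour_of, reference)
```

In this toy case the consensus for `T2` drops the variant carried by only one of its two cells, while the frequency representation keeps it at 0.5. That loss of minority signal is one way a consensus sequence can carry less information than a pseudobulk profile.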
We explore the process of reconstructing the true multi-tumour tree by integrating existing data, compiled as sets of replicates. First, we perform tree reconstruction using a species estimation method on single-cell data. Additionally, we explore supertree construction as well as the use of concatenated sequences, leveraging pooled data. Through this analysis, we find that using single-cell data directly or utilising pseudobulk data for reconstructing multi-tumour evolution yields the best results.
Our motivation is to inform a researcher's choice of sequencing strategy when analysing metastatic progression. Specifically, we consider a scenario where a researcher aims to map the evolutionary history among several tumours or metastatic sites within a patient. One option is to obtain single-cell sequencing (scSEQ) data by selecting individual cells from each metastatic site. However, to depict the evolutionary history of the sites, rather than just the cells themselves, these cells must first be appropriately grouped. Alternatively, the researcher could opt for bulk sequencing (bulkSEQ), where the cells within a resection are sequenced together as an admixture.
From these considerations, several questions emerge:
• Perhaps the researcher already possesses scSEQ data and is now seeking the optimal method to infer a multi-tumour phylogeny. Could pooling the scSEQ data into consensus sequences infer the phylogeny more accurately than simply using bulkSEQ?
• Conversely, perhaps the researcher has yet to acquire sequencing data for inferring this evolutionary history. Is the investment in scSEQ data acquisition worthwhile, or is bulk sequencing sufficient for their needs?
With evolutionary history in mind, our primary focus is on recovering the multi-tumour phylogeny, which drives our preference for pooled scSEQ data. This choice is motivated by concerns about potential interleaving that could arise if we were to rely solely on a phylogeny that displays the evolutionary history of individual cells as a fixed template for the evolutionary history of its higher population counterpart. In other words, the genealogy between individual cells a researcher samples may not correctly represent the genealogy of their corresponding metastatic sites. The individual cells are unlikely to cluster or evolve in perfect alignment with the progression depicted by a multi-tumour phylogeny.
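The interleaving concern can be illustrated with a small check of whether each tumour's cells form a single clade in a cell-level tree. This is a sketch under stated assumptions: the tree is rooted and written as nested tuples of cell names, and the function names, tree, and tumour labels are all hypothetical rather than taken from this repository.

```python
def collect_clades(tree):
    """Return (leaves under this node, leaf sets of every clade below it)."""
    if isinstance(tree, str):  # a leaf is a single cell name
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade rooted at this internal node
    return leaves, found


def monophyletic_tumours(tree, tumour_of):
    """Map each tumour to True if its cells form exactly one clade."""
    _, all_clades = collect_clades(tree)
    clade_set = set(all_clades)
    cells_by_tumour = {}
    for cell, tumour in tumour_of.items():
        cells_by_tumour.setdefault(tumour, set()).add(cell)
    return {
        tumour: frozenset(cells) in clade_set
        for tumour, cells in cells_by_tumour.items()
    }


# Toy cell tree: cell b1 sits inside tumour A's lineage (interleaving),
# while tumour C's cells group together cleanly.
cell_tree = ((("a1", "a2"), ("a3", "b1")), ("b2", ("c1", "c2")))
tumour_of = {
    "a1": "A", "a2": "A", "a3": "A",
    "b1": "B", "b2": "B",
    "c1": "C", "c2": "C",
}

status = monophyletic_tumours(cell_tree, tumour_of)
```

Here tumours `A` and `B` are not monophyletic, so collapsing this cell tree to one tip per tumour would not give a well-defined multi-tumour tree.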
Using Latin hypercube sampling (LHS) to generate a dataframe of parameter combinations, we introduce variability across multiple parameters within our simulations. Each biological or technical parameter has a range of possible values, and together these ranges form the dimensions of a high-dimensional grid. This grid is referred to as the sample space.
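Assuming LHS here refers to Latin hypercube sampling, the idea can be sketched in plain Python. Each parameter's range is split into as many equal-width strata as there are samples, one value is drawn from each stratum, and the strata are shuffled independently per parameter. The parameter names and bounds below are hypothetical placeholders, not the values used in these simulations.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples parameter combinations by Latin hypercube sampling.

    bounds maps a parameter name to its (low, high) range.
    Returns a list of dicts, one per sampled combination.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniform draw inside each of n_samples equal-width strata.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [
        {name: columns[name][i] for name in bounds} for i in range(n_samples)
    ]


# Hypothetical parameters and ranges, for illustration only.
example_bounds = {
    "mutation_rate": (1e-8, 1e-6),
    "dropout_rate": (0.0, 0.5),
}
design = latin_hypercube(10, example_bounds, seed=42)
```

Each row of `design` is one simulation setting, and the rows can be loaded into a dataframe. Library implementations also exist, such as `scipy.stats.qmc.LatinHypercube`.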
We determine that manipulation of the VCF-PB data allows for similar tree reconstruction accuracy to that of scSEQ estimation methods.