Cluster centroids #21

Open
wants to merge 1 commit into
base: main

Conversation

awohns
Owner

awohns commented Mar 1, 2022

Adds code to run Louvain community detection and compute cluster centroids from the results.
One issue is that clusters with more than roughly 10,000 nodes are difficult to handle, because creating the genotype matrix is extremely expensive (10,000 nodes × 1,000,000 sites). For now, a max_cluster_size parameter randomly subsamples large clusters.
More generally, we get around the expense of creating huge genotype matrices by using tskit.TreeSequence.simplify(), which quickly cuts the tree sequence down to the nodes of interest, making it easier to deal with.
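
To make the cost and the subsampling idea concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the function names `estimate_matrix_bytes` and `subsample_clusters`, the dict-of-node-lists cluster representation, and the one-byte-per-entry assumption are all hypothetical and chosen only for illustration.

```python
import random


def estimate_matrix_bytes(num_nodes, num_sites, bytes_per_entry=1):
    """Size in bytes of a dense nodes-by-sites genotype matrix.

    bytes_per_entry=1 is an assumption (one small integer per genotype).
    """
    return num_nodes * num_sites * bytes_per_entry


def subsample_clusters(clusters, max_cluster_size, seed=None):
    """Cap every cluster at max_cluster_size nodes.

    clusters maps a cluster id to an iterable of node ids (hypothetical layout).
    Oversized clusters are sampled uniformly without replacement.
    """
    rng = random.Random(seed)
    capped = {}
    for cluster_id, nodes in clusters.items():
        nodes = list(nodes)
        if len(nodes) > max_cluster_size:
            nodes = rng.sample(nodes, max_cluster_size)
        capped[cluster_id] = sorted(nodes)
    return capped


clusters = {0: range(25), 1: [100, 101, 102]}
capped = subsample_clusters(clusters, max_cluster_size=10, seed=1)
print({k: len(v) for k, v in capped.items()})  # {0: 10, 1: 3}

# 10,000 nodes x 1,000,000 sites at one byte per entry, in GB
print(estimate_matrix_bytes(10_000, 1_000_000) / 1e9)  # 10.0
```

Small clusters pass through unchanged, so only the clusters that would produce an oversized matrix lose nodes. Passing a seed keeps the subsample reproducible between runs.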

awohns force-pushed the cluster_centroids branch from 271422d to 02ead76 (March 1, 2022 05:12)
awohns force-pushed the cluster_centroids branch from 02ead76 to 2600564 (March 1, 2022 05:17)
Owner Author

awohns commented Mar 1, 2022

On second thought, this might be better as two different functions: one to do the clustering and another to compute the centroids.
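
A rough sketch of what that split could look like, in plain Python. Everything here is hypothetical: the names `detect_clusters` and `compute_centroids`, the list-of-lists genotype layout, and the definition of a centroid as the per-site mean over a cluster's nodes are assumptions. The clustering step is a stand-in that uses connected components (union-find) rather than Louvain, only so the example runs without extra dependencies.

```python
def detect_clusters(num_nodes, edges):
    """Stand-in clustering step: connected components via union-find.

    A real implementation would run Louvain community detection here;
    only the interface (graph in, list of node lists out) is the point.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = {}
    for node in range(num_nodes):
        clusters.setdefault(find(node), []).append(node)
    return list(clusters.values())


def compute_centroids(clusters, genotypes):
    """Centroid of each cluster, assumed to be the per-site mean genotype.

    genotypes[node] is a list with one allele value per site.
    """
    centroids = []
    for nodes in clusters:
        num_sites = len(genotypes[nodes[0]])
        centroids.append(
            [sum(genotypes[n][s] for n in nodes) / len(nodes) for s in range(num_sites)]
        )
    return centroids


edges = [(0, 1), (1, 2), (3, 4)]
genotypes = [[0, 1, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]]

clusters = detect_clusters(5, edges)
centroids = compute_centroids(clusters, genotypes)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Keeping the two steps separate means the clustering output can be inspected, cached, or subsampled before any genotype data is touched, and the centroid function can be reused with clusters from another method.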
