WinPCA. A package for windowed principal component analysis.

MoritzBlumer/winpca

WinPCA

A package for windowed principal component analysis. WinPCA performs principal component analyses (PCA) in sliding windows along chromosomes. It accepts either hard-called genotypes (input: VCF or TSV) or genotype likelihoods (input: VCF, TSV or BEAGLE) encoding biallelic SNPs. WinPCA uses scikit-allel to perform PCAs on genotype data and PCAngsd methods for genotype likelihood (GL, PL) data.
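
To illustrate the general idea of a windowed PCA, here is a minimal NumPy sketch. It is not WinPCA's implementation (which relies on scikit-allel and PCAngsd); the function name `windowed_pc1`, its parameters, and the toy data are assumptions made for this example only. The sketch slides a fixed-size window along the variant positions, centers each variant across samples, and takes PC 1 from a singular value decomposition.

```python
import numpy as np


def windowed_pc1(genotypes, positions, window_size, step):
    """Compute per-sample PC 1 scores in sliding windows (illustrative only).

    genotypes: (n_variants, n_samples) array of alternate allele counts (0/1/2)
    positions: (n_variants,) sorted array of variant positions
    Returns (window midpoints, scores of shape (n_windows, n_samples)).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    positions = np.asarray(positions)
    mids, scores = [], []
    start = positions[0]
    while start <= positions[-1]:
        in_window = (positions >= start) & (positions < start + window_size)
        block = genotypes[in_window]
        if block.shape[0] >= 2:  # skip windows with too few variants
            # center each variant (row) across samples
            centered = block - block.mean(axis=1, keepdims=True)
            # SVD of the samples x variants matrix; the first left singular
            # vector scaled by the first singular value gives the PC 1 scores
            u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
            mids.append(start + window_size / 2)
            scores.append(u[:, 0] * s[0])
        start += step
    return np.array(mids), np.array(scores)


# Toy data (hypothetical): 200 variants, 6 samples in two groups
positions = np.arange(200) * 100 + 1
genotypes = np.zeros((200, 6))
genotypes[:, 3:] = 2  # samples 3-5 are homozygous alternate everywhere
mids, scores = windowed_pc1(genotypes, positions, window_size=5000, step=2500)
print(scores.shape)  # one row of PC 1 scores per window
```

In this toy case the two sample groups fall on opposite sides of PC 1 in every window. Because the sign of a principal component is arbitrary, the orientation of PC 1 can flip from one window to the next in a sketch like this.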

WinPCA can aid the initial exploration of new datasets because no prior grouping of input samples is necessary to visualize genetic structure. It has also been used to identify chromosome-scale inversions in cichlids, to visualize the recombination landscape in a species cross (Fig. 2), and to identify ancestry tracts in a hybrid mouse (Fig. 6).

[Figure: example_pca_plot]

Installation

Dependencies

Please ensure that the following dependencies are installed and accessible from your current shell environment. Python packages: numpy, pandas, numba, scikit-allel, plotly:

mamba install numpy pandas numba scikit-allel plotly

Additionally, running WinPCA on genotype likelihood (GL/PL) data requires PCAngsd (installation instructions included).
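
A quick way to confirm that the Python packages listed above are visible from the current environment is sketched below. This is a convenience check written for this README, not part of WinPCA; the helper name `missing_packages` is hypothetical. Note that scikit-allel is imported under the module name `allel`. PCAngsd is not covered by this check.

```python
import importlib.util

# Map of package name (as installed) to importable module name
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "numba": "numba",
    "scikit-allel": "allel",
    "plotly": "plotly",
}


def missing_packages(required=REQUIRED):
    """Return the names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
print("missing:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can be installed with the `mamba install` command shown above.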

Obtain WinPCA

git clone https://github.com/MoritzBlumer/winpca.git  # clone GitHub repository
chmod +x winpca/winpca                                # make executable

Quick start

Minimal command line to visualize PC 1 along a chromosome (using GT data from a VCF):

# windowed PCA with default settings
winpca pca VCF_PATH CHROM_NAME:1-CHROM_SIZE PREFIX

# make a plot of principal component 1 and color by inversion state
winpca chromplot PREFIX CHROM_NAME:1-CHROM_SIZE -m METADATA_PATH -g METADATA_COLUMN_NAME

Please refer to the help messages (winpca {method} -h) or to the wiki for the full documentation, file format specifications, more use cases and a tutorial to produce the above plot.
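
For scripted use, the two quick-start commands can be assembled from Python, as in the sketch below. It uses only the argument order and the `-m`/`-g` options shown in the quick start; the helper name `winpca_commands` and all file names, the chromosome name and size, and the metadata column are hypothetical placeholders. The sketch only builds and prints the commands; it does not run WinPCA.

```python
import shlex


def winpca_commands(vcf_path, chrom, chrom_size, prefix, metadata_path, group_column):
    """Build the quick-start 'pca' and 'chromplot' commands as argument lists."""
    # Region string in the CHROM_NAME:1-CHROM_SIZE form used in the quick start
    region = f"{chrom}:1-{chrom_size}"
    pca_cmd = ["winpca", "pca", vcf_path, region, prefix]
    plot_cmd = [
        "winpca", "chromplot", prefix, region,
        "-m", metadata_path,
        "-g", group_column,
    ]
    return pca_cmd, plot_cmd


# Hypothetical inputs for illustration
pca_cmd, plot_cmd = winpca_commands(
    vcf_path="cohort.vcf.gz",
    chrom="chr1",
    chrom_size=40000000,
    prefix="chr1_run",
    metadata_path="metadata.tsv",
    group_column="inversion_state",
)
print(shlex.join(pca_cmd))
print(shlex.join(plot_cmd))
```

Assuming `winpca` is on the `PATH`, each list could be passed to `subprocess.run(cmd, check=True)` to execute the corresponding step.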


Preprint

Blumer LM, Good JM & Durbin R (2025). WinPCA: A package for windowed principal component analysis. arXiv, 2501.11982.

Contact

Moritz Blumer: [email protected]
