This repository contains a selection of my projects in bioinformatics and data analysis.
I describe the goals of each project below:
-PCA.ipynb: Implements principal component analysis (PCA) from first principles to cluster RNA-seq data, and compares the result to a k-means clustering, also implemented from first principles.
-t-SNE.ipynb: Implements t-SNE from first principles to cluster RNA-seq data. Compares the results to those from PCA, whose linear projection to 2D space fails to capture the structure of this data.
-RNA-seq classification.ipynb: Uses a second-order Markov model to determine which of two populations each RNA-seq read came from.
-Clustering.ipynb: Implements hard k-means and mixture model algorithms to cluster RNA-seq data according to cell identity.
-Regression_optimization.ipynb: Given RNA-seq data for genes with oscillating expression patterns, finds the parameters of the best-fitting regression model by maximum likelihood.
-NMF.ipynb: Finds groupings of co-expressed genes (batteries) using non-negative matrix factorization. Then, finds moonlighting genes which are expressed in more than one battery.
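
As a rough illustration of the hard k-means procedure used in PCA.ipynb and Clustering.ipynb, here is a minimal sketch in plain Python. It is not the code from the notebooks: the function name `kmeans`, the toy 2D points, and the choice to seed the centroids with the first k points are all assumptions made for this example (random or k-means++ initialization is more common in practice).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Hard k-means (Lloyd's algorithm) on a list of equal-length tuples.

    Assumption for this sketch: the first k points seed the centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point is assigned to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2D points
# (stand-ins for cells, not real RNA-seq measurements).
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
toy_centroids, toy_labels = kmeans(toy_points, k=2)
print(toy_labels)  # [0, 0, 0, 1, 1, 1]
```

Each point is assigned to exactly one cluster, which is what makes this "hard" k-means. A mixture model replaces that assignment with per-cluster membership probabilities.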