Bioinformatics learning and data analysis tips and tricks. Please contribute and get in touch!
- RNA-seq notes
- scRNA-seq notes
- ChIP-seq notes
- Methylation notes
- scATAC-seq notes
- SNP notes
- Hi-C tools
- Hi-C data
- scHi-C notes
- Cancer notes
- Immuno notes
See MDmisc notes for other programming and genomics-related notes.
- Historical perspective on genome sequencing technologies. From the landmark 1953 Watson and Crick publication through sequencing of nucleic acids, shotgun sequencing (Messing, Sanger), Human Genome Project (HGP) & Celera Genomics, milestones in genome assembly, Next and Third generation sequencing, single molecule sequencing (SMRT, ONT, PacBio), long-read genome assemblers (FALCON, Canu). Box 1 - companies developing sequencing technologies, Box 2 - more details on HGP, Box 3 - bioinformatics tools for genome assembly.
Paper
Giani, Alice Maria, Guido Roberto Gallo, Luca Gianfranceschi, and Giulio Formenti. “Long Walk to Genomics: History and Current Approaches to Genome Sequencing and Assembly.” Computational and Structural Biotechnology Journal, November 2019, S2001037019303277. https://doi.org/10.1016/j.csbj.2019.11.002.
- The complete assembly of a human genome (haploid CHM13 cell line). 3.055 billion base pairs, no gaps for all 22 autosomes plus chrX, new genes. Resolving ribosomal DNA (rDNA) sequences. PacBio, Oxford Nanopore, other technologies. The Telomere-to-Telomere (T2T) consortium, UCSC, NCBI PRJNA559484, GitHub with download links to FASTA, gff3, liftover chains.
Paper
Nurk, Sergey, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger et al. "The complete sequence of a human genome." bioRxiv (2021). https://doi.org/10.1101/2021.05.26.445798
- Milestones in Genomic Sequencing by Nature, 2000-2021 period, interactive infographics
- Awesome-Bioinformatics - A curated list of awesome Bioinformatics libraries and software, by Daniel Cook and community-contributed
- awosome-bioinformatics - A curated list of resources for learning bioinformatics
- awesome-cancer-variant-databases - A community-maintained repository of cancer clinical knowledge bases and databases focused on cancer variants, by Sean Davis and community-maintained
- awesome-single-cell - List of software packages for single-cell data analysis, including RNA-seq, ATAC-seq, etc. By Sean Davis and community-maintained
- awesome-alternative-splicing - Alternative splicing resources
- awesome-10x-genomics - List of tools and resources related to the 10x genomics GEMCode/Chromium system, by Johan Dahlberg and community-contributed
- awesome-multi-omics - List of software packages for multi-omics analysis, by Mike Love and community-maintained
- awesome-bioinformatics-benchmarks - A curated list of bioinformatics benchmarking papers and resources
- awesome_genome_browsers - genome browsers and genomic visualization tools. By David McGaughey
- awesome-expression-browser - A curated list of software and resources for exploring and visualizing (browsing) expression data, and more. By Federico Marini and community-maintained
- awesome-genome-visualization - A list of interesting genome browser or genome-browser-like implementations. By Colin Diesh
- awesome-microbes - List of software packages (and the people developing these methods) for microbiome (16S), metagenomics (WGS, shotgun sequencing), and pathogen identification/detection/characterization. By Steve Tsang and community-contributed
- algorithmsInBioinformatics - Bioinformatics algorithms: Needleman-Wunsch, Feng-Doolittle, Gotoh and Nussinov implemented in Python. By Joachim Wolff, with lecture notes. Also, rklib - Rabin-Karp implementation of sequence substring search for DNA/RNA, and lv89 - C implementation of the Landau-Vishkin algorithm to compute the edit distance between two strings.
- sandbox.bio - an interactive bedtools tutorial developed by the Quinlan Lab
- biotools - A massive collection of references on the topics of bioinformatics, sequencing technologies, programming, machine learning, and more. By John Didion
- For all your seq... DNA & RNA - Illumina flyer with infographics of all sequencing-by-synthesis technologies. RNA and DNA versions
- List of software/websites/databases/other stuff for genome engineering
- multimodal-scRNA-seq - Figure depicting the breadth of multimodal scRNA-seq technologies. References to technology-specific papers
- SequencEnG - Hierarchical summary of 66 sequencing technologies, computational algorithms, references to papers.
  - Zhang, Y., Manjunath, M., Kim, Y., Heintz, J., and Song, J.S. (2019). SequencEnG: an interactive knowledge base of sequencing techniques. Bioinformatics 35, 1438–1440.
- OmniPath: intra- & intercellular signaling knowledge - a database of molecular biology prior knowledge, combines data from more than 100 resources. Interface to R, Python, more.
- The Leek group guide to genomics papers - Jeff Leek's recommended list of genomics papers
- Unix, R and python tools for genomics and data science - links and references to many computational biology resources, by Ming Tang
- bioinformatics-one-liners - collection of bioinformatics-genomics bash one-liners, using awk, sed etc., by Ming Tang
- RNA-seq-analysis - RNAseq analysis notes, by Ming Tang
- ChIP-seq-analysis - ChIP-seq analysis notes, by Ming Tang
- DNA-seq-analysis - Notes on whole exome and whole genome sequencing analysis, by Ming Tang
- DNA-methylation-analysis - DNA methylation analysis notes from Ming Tang
- scATAC-seq-analysis-notes - single-cell ATAC-seq notes, by Ming Tang
- List of bioinformatics tools developed by International Human Epigenome Consortium (IHEC) researchers - tools for all types of genomic analyses
- Learning Nextflow in 2020 - Materials to learn Nextflow
- Nextflow Tower - management of Nextflow data pipelines. Video overview
- nf-core - a framework for Nextflow-based pipeline creation, community-driven. Integrated with Conda, Docker, Biocontainers. Scalable to a cloud level. Pipeline assemblers: Flowcraft, Pipeliner. nf-core GitHub. All pipelines are on the nf-core hub.
  - Ewels, Philip, Alexander Peltzer, Sven Fillinger, Johannes Alneberg, Harshil Patel, Andreas Wilm, Maxime Garcia, Paolo Di Tommaso, and Sven Nahnsen. “Nf-Core: Community Curated Bioinformatics Pipelines.” Preprint. Bioinformatics, April 16, 2019.
- bcbio-nextgen - Validated, scalable, community developed variant calling and RNA-seq analysis. Documentation
- 2020-GGG298 - Course materials for GGG298 - Tools to support data-intensive research, by Titus Brown. Unix, Conda, Snakemake, project organization, Git, Slurm, R/Rmarkdown
- biomedicalresearch2021 - Course Materials for EN.601.452 / AS.020.415 Computational Biomedical Research & Advanced Biomedical Research, by Michael Schatz. Links to other courses, papers.
- RNA-seq theory and analysis - theory and practical guidelines for RNA-seq data analysis, by Skyler Kuhn
- Data science for economists - from R, tidyverse to GitHub, web scraping, Docker, Google Cloud and more. By Grant McDermott
- Reproducible research and data analysis with Linux containers and Nextflow pipelines, GitHub
- Bioinformatics Coffee Hour - Short lessons from the FAS Informatics coffee hour: data science, command line, R basics, Snakemake
- The Biostar Handbook: 2nd Edition - a bioinformatics survival guide. A practical overview of the data analysis methods of bioinformatics, from Unix/command line to the analysis of each type of sequencing data
- Beginner's Handbook to Next Generation Sequencing by GenoHub - omics technologies, experimental descriptions
- Computational Genomics with R book by Altuna Akalin. From R basics to different types of bioinformatics analyses. GitHub
- The Bioinformatics algorithms web site. Videos and Texts covering topics from biology to genome assembly algorithms, motif finding, sequence comparisons and more bioinformatics tasks and solutions. By Phillip Compeau and Pavel Pevzner
- Algorithms for DNA Sequencing - Ben Langmead's course material. Slides on GitHub, Python/Colab code examples, YouTube videos
- Applied Computational Genomics Course at UU by Aaron Quinlan. All slides on Google Drive, YouTube videos for each lecture
- Introduction to Computational Biology course by Mike Love. GitHub
- Data Analysis in Genome Biology course by Thomas Girke. Bioinformatics of NGS data analysis. GitHub
- Bioconductor for Genomic Data Science course by Kasper Hansen. Includes videos, code examples and lecture material. GitHub
- BPA-CSIRO Workshops - Cancer Genomics, Introduction to Next Generation Sequencing Hands-on Workshop. Links to topic-oriented GitHub repositories, PDF handouts. GitHub
- The Bioconductor 2018 Workshop Compilation, editors - Levi Waldron, Sean Davis, Marcel Ramos, Lori Shepherd, Martin Morgan. Task-oriented workshops covering a range of genomics/Bioconductor analyses. GitHub
- CSAMA - Course material for CSAMA: Statistical Data Analysis for Genome Scale Biology, by Bioconductor team. Lectures, Labs
- JHU EN.601.749: Computational Genomics: Applied Comparative Genomics course, by Michael Schatz. Links to similar courses there. Other courses by Michael
- JHU Data Science lab - several data science courses, links to topic-specific GitHub repositories, other resources
- CSE 549 - Introduction to Computational Biology by Steven Skiena. Includes video lectures. Another course: CSE 519 - Data Science
- DIYtranscriptomics.github.io - Course material for the "do-it-yourself" RNA-seq course, by Daniel Beiting. GitHub
- Informatics for RNA-seq: A web resource for analysis on the cloud by Griffith lab. All aspects of RNA-seq analysis
- RNA-seqlopedia - one long page of all steps of RNA-seq data analysis, from molecular biology to computational analysis, very detailed
- BaRC Hot Topics - lecture slides and handouts on all genomics topics, from Unix to microarray, sequencing, genomics and statistics
- Courses by Dr. Raghu Machiraju. Topics: data visualization, biomedical informatics, computer graphics, linked from the homepage. CSE5599-BMI7830 - biomedical informatics. CSE5544 – Introduction to Data Visualization
- alignment-and-variant-calling-tutorial - basic walk-throughs for alignment and variant calling from NGS sequencing data, PDF lecture, by Erik Garrison.
- Oxford Nanopore sequencing tutorial using prokaryotic genomes. Supplementary material - walkthrough, tools, data, VM image available.
Paper
Salazar, Alex N., Franklin L. Nobrega, Christine Anyansi, Cristian Aparicio-Maldonado, Ana Rita Costa, Anna C. Haagsma, Anwar Hiralal, et al. “[An Educational Guide for Nanopore Sequencing in the Classroom](https://doi.org/10.1371/journal.pcbi.1007314).” PLOS Computational Biology, (January 23, 2020)
- Machine Learning in Genomics - Fall 2019 by Manolis Kellis. Course website
- Free online training in bioinformatics and biostatistics! by David Tabb. Various topics beyond genomics
- Bioconductor Workshop 2: RNA Seq and ChIP Seq Analysis - 6-hour workshop on RNA-seq and ChIP-seq technology and analysis by Levi Waldron and others
- Differential Splicing Analysis with RNA-Seq: Current Applications, Approaches, & Limitations - 1-hour overview of differential splicing analysis
- NHGRI_Genomics2016 - "Current Topics in Genome Analysis 2016" course - A lecture series covering contemporary areas in genomics and bioinformatics, slides
- MIT_SysBiol2014 - "Foundations of Computational and Systems Biology", slides
- Regulatory Genomics and Epigenomics - series of genomics-oriented talks by Simons Institute. 26 videos, ~30min each.
- Foundations of Data Science — Spring 2016 - course from UC Berkeley. Instructor: John DeNero. Video and slides. This course is accompanied by other "connector" courses from UC Berkeley
- A Roadmap to the Living Genome by John Stamatoyannopoulos. An overview of cell type-specific (epi)genomic landscapes, visualization and analysis techniques, association and enrichment of disease-associated variants in regulatory regions.
- RNA-Seq Methods and Algorithms - short video course by Harold Pimentel, pseudoalignment, kallisto, sleuth, practical
- Integrating ENCODE Data With Your Research: An Interactive Survey of ENCODE Tools and Resources - set of short videos about ENCODE data and functionality
- Bioinfo-core.org - the community of Bioinformatics Core Facilities, ISMB workshops, slides, resources about Cores.
- List of bioinformatics core facilities at bioinfo-core.org
- Dragon, Julie A., Chris Gates, Shannan Ho Sui, John N. Hutchinson, R. Krishna Murthy Karuturi, Alper Kucukural, Shawn Polson, et al. “Bioinformatics Core Survey Highlights the Challenges Facing Data Analysis Facilities.” Journal of Biomolecular Techniques, June 2020 - Bioinformatics core facility considerations. Responsibilities, financial models, software/data, reporting/accreditation, challenges and concerns, future.
- Kallioniemi, O., L. Wessels, and A. Valencia. “On the Organization of Bioinformatics Core Services in Biology-Based Research Institutes.” Bioinformatics, (May 15, 2011) - 1-page Bioinformatics core recommendations.
- Lewitter, Fran, Michael Rebhan, Brent Richter, and David Sexton. “The Need for Centralization of Computational Biology Resources.” PLoS Computational Biology, (June 26, 2009) - Bioinformatics core as the center of computational resources, advantages and disadvantages. Questions to consider: IT and computational infrastructure, keeping up with science, training and education, funding model, hiring, evaluation, external affiliates of the core, outreach.
- Richter, Brent G., and David P. Sexton. “Managing and Analyzing Next-Generation Sequence Data.” PLoS Computational Biology, (June 26, 2009) - Sequencing data computational analysis and storage, skills for data analysis (Unix, scripting, parallel computing, network, databases, biology/genomics, connect science with (novel) software solutions).
- Lewitter, Fran, and Michael Rebhan. “Establishing a Successful Bioinformatics Core Facility Team.” PLoS Computational Biology, (June 26, 2009) - Considerations for successful bioinformatics core development (objectives, personnel, prioritization/time management, staying connected with research trends, outreach). Slides from the ISMB 2008 BoF on best practices in running bioinformatics cores.