Data organisation with Spreadsheets (DC)

Questions:

  • What are basic principles for using spreadsheets for good data organization?

Objectives:

  • Describe best practices for organizing data so computers can make the best use of data sets.

Keypoints:

  • Good data organization is the foundation of any research project.

Starting with R (DC)

Learning Objectives

  • Describe the purpose of the RStudio Script, Console, Environment, and Plots panes.

  • Organize files and directories for a set of analyses as an R project, and understand the purpose of the working directory.

  • Use the built-in RStudio help interface to search for more information on R functions.

  • Demonstrate how to provide sufficient information for troubleshooting with the R user community.

Starting with data (DC)

Learning Objectives

  • Describe what a data frame is.
  • Load external data from a .csv file into a data frame.
  • Summarize the contents of a data frame.
  • Describe what a factor is.
  • Convert between strings and factors.
  • Reorder and rename factors.
  • Change how character strings are handled in a data frame.
  • Export and save data.

Manipulating and analyzing data with dplyr (DC)

Learning Objectives

  • Describe the purpose of the dplyr and tidyr packages.

  • Select certain columns in a data frame with the dplyr function select.

  • Select certain rows in a data frame according to filtering conditions with the dplyr function filter.

  • Link the output of one dplyr function to the input of another function with the 'pipe' operator %>%.

  • Add new columns to a data frame that are functions of existing columns with mutate.

  • Use the split-apply-combine concept for data analysis.

  • Use summarize, group_by, and count to split a data frame into groups of observations, apply summary statistics for each group, and then combine the results.

  • Describe the wide and long table formats and the purposes for which each is useful.

  • Describe what key-value pairs are.

  • Reshape a data frame from long to wide format and back with the spread/pivot_wider and gather/pivot_longer commands from the tidyr package.

Data visualization (DC)

Learning Objectives

  • Produce scatter plots, boxplots, and time series plots using ggplot.
  • Set universal plot settings.
  • Describe what faceting is and apply faceting in ggplot.
  • Modify the aesthetics of an existing ggplot plot (including axis labels and color).
  • Build complex and customized plots from data in a data frame.

Joining tables

Learning Objectives

At the end of this section, students should understand

  • the need for, and the concept of, table joins,
  • the differences between the types of joins,
  • the importance of keys in joins,
  • circumstances leading to the appearance of missing values,
  • the implications of using non-unique keys.

Reproducible research

Learning Objectives

  • Understand the concept of reproducible research and reproducible documents.
  • Understand the process by which a source document is compiled into a final report.
  • Generate a reproducible report in HTML or PDF from an R Markdown document using RStudio.

Bioinformatics

Learning Objectives

  • Learning about the wider context of bioinformatics and omics data analysis
  • First exposure to the Bioconductor project
  • Notions of experimental design
  • Omics data containers - theory
  • The SummarizedExperiment class

Additional programming concepts (optional)

Learning Objectives

Learn programming concepts, including

  • how to handle conditions
  • how to iterate over data structures
  • good coding practice
  • code re-use through functions