Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A function is added in each of the R scripts of isomir_commonality fo… #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 1 addition & 10 deletions projects/tewari/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,16 +21,7 @@ Ideally:
* isomiRSEA considers only iso_5p:+/-1
* sRNAbench labels some sequences as `mv` for isomiRs, these are lost for now in the conversion to GFF3 format.


# Methods

* Counts were normalized with edgeR and only counts > 1 were kept for each library
* Logic used to select a method to collapse replicates for each lab
* IDR was studied but if was difficult for us to find the best parameters. See how results varies when two main paraters change: https://github.com/lpantano/mypubs/blob/master/code-blog/idr/good_param_testing.png
* We played with the dispersion values for DESeq2 but nothing outstanding came to us to decide what isomiRs where well replicated or not.
* For now, we decided using isomiRs detected in the 4 replicates in each lab to plot reproducibility among labs using the same protocol.

# Analysis reproducibility
# reproducibility

The conversion to GFF3 is done by the script: `scripts/pre_cmd.sh`. This will generate the `expression_cpunts.tsv.gz` files for each tool showed here and script in `R` are the responsible to reproduce the figures:

Expand Down
Binary file removed projects/tewari/figures/replicates/bcbio.png
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified projects/tewari/figures/replicates/bcbio_libsize.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file removed projects/tewari/figures/replicates/isomirsea.png
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified projects/tewari/figures/replicates/isomirsea_libsize.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file removed projects/tewari/figures/replicates/mirge.png
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified projects/tewari/figures/replicates/mirge_libsize.png
Binary file removed projects/tewari/figures/replicates/srnabench.png
Diff not rendered.
Binary file modified projects/tewari/figures/replicates/srnabench_libsize.png
35 changes: 1 addition & 34 deletions projects/tewari/minutes.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,6 @@
project road map: https://github.com/miRTop/incubator/projects/2

## 08-30-2018

People: Lorena, Aarun, Ioannis

* We presented the variation in the output of IDR package when changing two of the main paramters. See [Methods](https://github.com/miRTop/incubator/tree/master/projects/tewari#methods)
* We presented an example of figure when visualizing the [commonality among replicates and labs](https://github.com/miRTop/incubator/blob/master/projects/tewari/figures/labs/bcbio.png)

### The discussion was focused on finding a method to flag sequences as reproducible or not:

* One strategy was to let Ioannis to contact IDR authors to ask for advice about how to adapt the code to our question
* Another strategy was to let Lorena find some statistician to join the collaboration to help in this particular step

### Ready to do points:

* Lorena has to send FASTA file of all sequences to Ioannis to get the features that will be used for the machine learning analysis once we can classify sequence into reproducible or not.
* Aarun will update plots and divided by low/medium/high expression


## 08-09-2018
## 08-090-2018

We presented the data and results until now and discussed the following points:

Expand All @@ -29,21 +11,6 @@ We presented the data and results until now and discussed the following points:
* Repeat plots separating between low/medium/high
* kmer analysis to explain difference among labs

### Papers related to reproducibility and small RNA bias by Ioannis

The metric is called IDR (irreproducible discovery rate) and you can find it here:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/

http://cran.r-project.org/web/packages/idr/index.htm

These are the papers for the bias:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0126049

https://www.ncbi.nlm.nih.gov/pubmed/22647250


### Questions for next meeting

* what to do with isomiRs that map equally to other mature miRNAs
Expand Down
62 changes: 8 additions & 54 deletions projects/tewari/r_code/functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -60,70 +60,24 @@ summarize_isomir = . %>% filter_labs %>%
n_isomirs_sum = sum(n_isomirs),
counts_sum = sum(counts))

summarize_isomirs_by_lab = . %>% filter_labs %>%
filter(iso_5p != 0 | iso_3p != 0 | !!sym(add_col) != 0 | !!sym(snp_col) != 0) %>%
select(iso_5p, iso_3p, !!sym(add_col), !!sym(snp_col),
miRNA, replicate, lab, value, replicate, lib_method_simple) %>%
gather(isomir_type, size,
-value, -miRNA, -lab, -replicate, -lib_method_simple ) %>%
filter(size != 0) %>%
group_by(miRNA, isomir_type, lab, replicate, lib_method_simple, size) %>%
summarise(counts = sum(value)) %>%
group_by(isomir_type, miRNA, lab, lib_method_simple, size) %>%
summarise(reps = length(replicate), counts = sum(counts)) %>%
group_by(isomir_type, miRNA, reps, lib_method_simple, size) %>%
summarise(nlabs = length(lab), counts = sum(counts)) %>%
group_by(nlabs, reps, lib_method_simple, isomir_type) %>%
summarise(n_isomirs = n(), counts = sum(counts)) %>%
group_by(reps, lib_method_simple, isomir_type) %>%
arrange(isomir_type, lib_method_simple, desc(nlabs), reps) %>%
mutate(n_isomirs_cum = cumsum(n_isomirs)/sum(n_isomirs),
counts_cum = cumsum(counts)/sum(counts),
n_isomirs_sum = sum(n_isomirs),
counts_sum = sum(counts))

plot_summarize_isomir_by_lab = function(df) {
plot_grid(
ggplot(filter(df, nlabs == 1), aes(x = n_isomirs_sum,
y = log10(counts_sum),
shape = as.factor(reps))) +
geom_point() +
scale_shape_discrete("filter:n_replicates") +
theme(axis.text.x = element_text(angle = 45, hjust = 0.5, vjust = 0.5)) +
xlab("number of unique isomiRs") +
ylab("counts of isomiRs") +
facet_grid(lib_method_simple~isomir_type),
ggplot(df, aes(group=as.factor(reps),
color=as.factor(reps),
x = n_isomirs_cum*100,
y = counts_cum*100,
shape=as.factor(nlabs))) +
geom_line() +
geom_point() +
scale_color_brewer("common:n_replicates", palette = "Set2") +
scale_size_continuous("filter:min_counts", range = c(1,2),
breaks = c(0, 1, 2, 3, 4)) +
scale_shape_discrete("laboratory") +
facet_grid(lib_method_simple~isomir_type),
nrow = 2,
align = "v")
}

plot_summarize_isomir = function(df) {
plot_grid(
ggplot(filter(df, reps == 1), aes(x = n_isomirs_sum,
y = log10(counts_sum),
shape = as.factor(lab))) +
ggplot(filter(df, reps == 1), aes(x = n_isomirs_sum, y = counts_sum,
shape = as.factor(lab),
size = min_counts)) +
geom_point() +
scale_shape_discrete("filter:lab") +
scale_size_continuous("filter:min_counts", range = c(1,2),
breaks = c(0, 1, 2, 3, 4)) +
scale_shape_discrete("filter:min_counts") +
theme(axis.text.x = element_text(angle = 45, hjust = 0.5, vjust = 0.5)) +
xlab("number of unique isomiRs") +
ylab("counts of isomiRs") +
facet_grid(lib_method_simple~isomir_type),
ggplot(df, aes(color=as.factor(reps),
x = n_isomirs_cum*100,
y = counts_cum*100,
shape=as.factor(lab))) +
shape=as.factor(lab),
size=min_counts)) +
geom_point() +
scale_color_brewer("common:n_replicates", palette = "Set2") +
scale_size_continuous("filter:min_counts", range = c(1,2),
Expand Down
116 changes: 45 additions & 71 deletions projects/tewari/r_code/isomir_commonality/bcbio_isomir_commonality.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ library(tidyverse)
library(ggplot2)
library(cowplot)
library(edgeR)
library(matrixStats)
source("r_code/functions.R")
theme_set(theme_bw(base_size = 14))

Expand All @@ -14,16 +15,7 @@ dds = DGEList(complete[, 13:ncol(complete)])
dds = calcNormFactors(dds)
counts = cpm(dds, normalized.lib.sizes = TRUE)

pilot = cbind(complete[, 1], counts) %>%
gather(sample, value, -UID) %>%
mutate(sample = gsub("-mirbase-ready", "", sample)) %>%
filter(sample %in% meta_pilot[["fixed_name"]]) %>%
left_join(complete[,1:12]) %>%
filter(value >= 1) %>%
mutate(Variant = ifelse(is.na(Variant), "Reference", Variant)) %>%
left_join(meta_pilot, by = c("sample" = "fixed_name"))


###
dds$samples %>% rownames_to_column("sample") %>%
mutate(sample = gsub("-mirbase-ready", "", sample)) %>%
filter(sample %in% meta_pilot[["fixed_name"]]) %>%
Expand All @@ -33,72 +25,54 @@ dds$samples %>% rownames_to_column("sample") %>%
facet_grid(lab~lib_method_simple) +
ggsave("figures/replicates/bcbio_libsize.png", width = 7, height = 9)

pilot %>%
library("matrixStats")
new_average<- cbind(counts, average_val = rowMeans2(counts))
probs <- c(0.25, 0.5, 0.75)
pre_pilot = cbind(complete[, 1], new_average)

counts_avg <- data.matrix(new_average[, "average_val"])
cQuans<-colQuantiles(counts_avg, probs=probs)
minQ <- cQuans[[1]]
maxQ <- cQuans[[3]]
all_exprn = cbind(complete[, 1], counts)
low_exprn<-subset(pre_pilot, pre_pilot[,"average_val"] <= minQ)
medium_exprn<-subset(pre_pilot, pre_pilot[,"average_val"] >minQ & pre_pilot[,"average_val"] < maxQ)
high_exprn<-subset(pre_pilot, pre_pilot[,"average_val"] >= maxQ)


exprn = function(fn_exrp, fname)
{
filename3 = paste("figures/replicates/bcbio_counts_per_isomir_type_",fname,".jpg",sep="")
filename2 = paste("figures/replicates/bcbio_",fname,".jpg",sep="")

#pilot = cbind(complete[, 1], counts) %>%
pilot = fn_exrp %>%
gather(sample, value, -UID) %>%
mutate(sample = gsub("-mirbase-ready", "", sample)) %>%
filter(sample %in% meta_pilot[["fixed_name"]]) %>%
left_join(complete[,1:12]) %>%
filter(value >= 1) %>%
mutate(Variant = ifelse(is.na(Variant), "Reference", Variant)) %>%
left_join(meta_pilot, by = c("sample" = "fixed_name"))

lapply(2:3, function(x){
filter(pilot, value >= x) %>%
summarize_isomir %>%
plot_summarize_isomir +
ggsave("figures/replicates/bcbio.png", width = 11, height = 9)
mutate(min_counts = x)
}) %>% bind_rows() %>%
plot_summarize_isomir +
ggsave(filename2, width = 9, height = 9)

pilot %>%
summarize_isomirs_by_lab() %>%
plot_summarize_isomir_by_lab() +
ggsave("figures/labs/bcbio.png", width = 11, height = 9)

pilot %>% expression_isomirs_by_lab_protocol_isomir %>%
ggplot(aes(x=lab,y=counts,fill=as.factor(reps))) +
geom_boxplot() + scale_y_log10() +
facet_grid(lib_method_simple~isomir_type) +
ggsave("figures/replicates/bcbio_counts_per_isomir_type.png",
width = 9, height = 9)

### test #######################################################################
ggsave(filename3, width = 9, height = 9)
rm(pilot)
}

# pilot %>% filter(!is.na(lib_method_simple), lab != "Lab1",
# Variant == "Reference", value > 0) %>%
# group_by(miRNA, lab, replicate, lib_method_simple) %>%
# summarise(counts = sum(value)) %>%
# group_by(miRNA, lab, lib_method_simple) %>%
# summarise(reps = length(replicate), counts = sum(counts)) %>%
# group_by(lab, reps, lib_method_simple) %>%
# summarise(n_isomirs = n(), counts = sum(counts)) %>%
# group_by(lab, lib_method_simple) %>%
# arrange(lib_method_simple, lab, desc(reps)) %>%
# mutate(n_mirs_cum = cumsum(n_isomirs)/sum(n_isomirs),
# counts_cum = cumsum(counts)/sum(counts)) %>%
# ggplot(aes(color=as.factor(reps), x=n_mirs_cum, y=counts_cum,
# shape=as.factor(lab))) +
# geom_point() +
# scale_color_brewer("common:n_replicates", palette = "Set2") +
# scale_shape_discrete("laboratory") +
# facet_wrap(~lib_method_simple) +
# xlab("% of sequences detected compared to a single replicate") +
# ylab("% of counts detected compared to a single replicated;")
#
#
# pilot %>% plot_isoadd_position_by_protocol_by_lab(., "iso_add")
#
# pilot %>% plot_isoadd_position_by_protocol_by_lab(., "iso_5p")
#
# pilot %>% plot_isoadd_position_by_protocol_by_lab(., "iso_3p")
#
# iso = "iso_5p"
# filter(pilot, !is.na(lib_method_simple), lab != "Lab1") %>%
# filter(!!sym(iso) != 0) %>%
# select(!!sym(iso),
# miRNA, replicate, lab, value, replicate, lib_method_simple) %>%
# gather(isomir_type, size,
# -value, -miRNA, -lab, -replicate, -lib_method_simple ) %>%
# filter(size != 0) %>%
# group_by(miRNA, lab, replicate, lib_method_simple, size) %>%
# summarise(counts = sum(value)) %>%
# group_by(miRNA, lab, lib_method_simple, size) %>%
# summarise(reps = length(replicate), counts = sum(counts)) %>%
# group_by(lab, reps, lib_method_simple, size) %>%
# summarise(n_isomirs = n(), counts = sum(counts)) %>%
# group_by(lab, lib_method_simple, size) %>%
# arrange(size, lib_method_simple, lab, desc(reps)) %>%
# mutate(n_isomirs_cum = cumsum(n_isomirs),
# counts_cum = cumsum(counts)) %>%
# ggplot(aes(x = reps, y = n_isomirs_cum, group = lab, color = lab)) +
# geom_point() +
# geom_line() +
# facet_grid(size~lib_method_simple)
exprn(all_exprn, "all_exprn")
exprn(high_exprn, "high_exprn")
exprn(medium_exprn, "medium_exprn")
exprn(low_exprn, "low_exprn")
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ library(tidyverse)
library(ggplot2)
library(cowplot)
library(edgeR)
library(matrixStats)
source("r_code/functions.R")

theme_set(theme_bw(base_size = 11))
Expand All @@ -15,17 +16,6 @@ dds = DGEList(complete[, 13:ncol(complete)])
dds = calcNormFactors(dds)
counts = cpm(dds, normalized.lib.sizes = TRUE)

pilot = cbind(complete[, 1], counts) %>%
gather(sample, value, -UID) %>%
mutate(sample = gsub("_tagMir-all", "", sample)) %>%
filter(sample %in% meta_pilot[["fixed_name"]]) %>%
left_join(complete[,1:12]) %>%
filter(value >= 1) %>%
mutate(Variant = ifelse(is.na(Variant), "Reference", Variant)) %>%
left_join(meta_pilot, by = c("sample" = "fixed_name")) %>%
filter(abs(iso_5p)<4, abs(iso_3p)<4, abs(iso_add)<4 )


dds$samples %>% rownames_to_column("sample") %>%
mutate(sample = gsub("_tagMir-all", "", sample)) %>%
filter(sample %in% meta_pilot[["fixed_name"]]) %>%
Expand All @@ -35,18 +25,53 @@ dds$samples %>% rownames_to_column("sample") %>%
facet_grid(lab~lib_method_simple) +
ggsave("figures/replicates/isomirsea_libsize.png", width = 7, height = 9)

lapply(2:3, function(x){

new_average<- cbind(counts, average_val = rowMeans2(counts))
probs <- c(0.25, 0.5, 0.75)
pre_pilot = cbind(complete[, 1], new_average)

counts_avg <- data.matrix(new_average[, "average_val"])
cQuans<-colQuantiles(counts_avg, probs=probs)
minQ <- cQuans[[1]]
maxQ <- cQuans[[3]]
all_exprn = cbind(complete[, 1], counts)
low_exprn<-subset(pre_pilot, pre_pilot[,"average_val"] <= minQ)
medium_exprn<-subset(pre_pilot, pre_pilot[,"average_val"] >minQ & pre_pilot[,"average_val"] < maxQ)
high_exprn<-subset(pre_pilot, pre_pilot[,"average_val"] >= maxQ)

exprn = function(fn_exrp, fname)
{
filename3 = paste("figures/replicates/isomirsea_counts_per_isomir_type_",fname,".jpg",sep="")
filename2 = paste("figures/replicates/isomirsea_",fname,".jpg",sep="")
#filename2 = "figures/replicates/isomirsea_all_exprn.jpg"
#pilot = cbind(complete[, 1], counts) %>%
pilot = fn_exrp %>%
gather(sample, value, -UID) %>%
mutate(sample = gsub("_tagMir-all", "", sample)) %>%
filter(sample %in% meta_pilot[["fixed_name"]]) %>%
left_join(complete[,1:12]) %>%
filter(value >= 1) %>%
mutate(Variant = ifelse(is.na(Variant), "Reference", Variant)) %>%
left_join(meta_pilot, by = c("sample" = "fixed_name")) %>%
filter(abs(iso_5p)<4, abs(iso_3p)<4, abs(iso_add)<4 )

lapply(2:3, function(x){
filter(pilot, value >= x) %>%
summarize_isomir %>%
mutate(min_counts = x)
}) %>% bind_rows() %>%
summarize_isomir %>%
mutate(min_counts = x)
}) %>% bind_rows() %>%
plot_summarize_isomir +
ggsave("figures/replicates/isomirsea.png", width = 9, height = 9)
ggsave(filename2, width = 9, height = 9)


pilot %>% expression_isomirs_by_lab_protocol_isomir %>%
pilot %>% expression_isomirs_by_lab_protocol_isomir %>%
ggplot(aes(x=lab,y=counts,fill=as.factor(reps))) +
geom_boxplot() + scale_y_log10() +
facet_grid(lib_method_simple~isomir_type) +
ggsave("figures/replicates/isomirsea_counts_per_isomir_type.png",
width = 9, height = 9)
ggsave(filename3, width = 9, height = 9)
rm(pilot)
}

exprn(all_exprn, "all_exprn")
exprn(high_exprn, "high_exprn")
exprn(medium_exprn, "medium_exprn")
exprn(low_exprn, "low_exprn")
Loading