Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beta diversity plot examples #111

Open
antagomir opened this issue Nov 21, 2021 · 12 comments
Open

Beta diversity plot examples #111

antagomir opened this issue Nov 21, 2021 · 12 comments
Assignees

Comments

@antagomir
Copy link
Member

Add an example showing how pairs of points can be connected on beta diversity map.

@antagomir
Copy link
Member Author

Was this done already, I have some memories..?

@antagomir
Copy link
Member Author

@jepasan could you make this as a warm-up? Just suggest one solution first, we can take it from there. Make a PR to OMA beta diversity chapter. At least the "hitchip1006" demo data set from miaTime package has 1000+ individuals, and multiple time points for some of those individuals. So the beta diversity plot (PCoA) can be done for the 1000+ individuals, then connect points with lines for one or more individuals on that map. If unclear, we can chat more in Slack.

@Daenarys8
Copy link
Contributor

Can this be sufficient?

library(scater)
library(dplyr)
library(miaTime)
data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- transformAssay(tse, method = "relabundance")
tse <- runMDS(tse, FUN = vegan::vegdist, method = "bray", name = "PCoA_BC",
              assay.type = "relabundance", na.rm = TRUE)

meta_data <- as.data.frame(colData(tse))

print(colnames(meta_data))  


pcoa_coords <- as.data.frame(reducedDim(tse, "PCoA_BC")) 
pcoa_coords$subject <- meta_data$subject
pcoa_coords$SampleID <- meta_data$SampleID
pcoa_coords$timee <- meta_data$`time`  

subjects_with_multiple_entries <- meta_data %>%
    group_by(subject) %>%
    summarize(num_entries = n(), .groups = "drop") %>%
    filter(num_entries > 1) %>%
    pull(subject)

subset_data_by_subjects <- meta_data %>%
    filter(subject %in% subjects_with_multiple_entries)

merged_data <- subset_data_by_subjects %>%
    inner_join(pcoa_coords, by = "subject") %>%
    arrange(subject, `timee`)

# Plot with SampleID in legend mapped to subject
p <- ggplot(merged_data, aes(x = V1, y = V2)) + 
    geom_point(aes(color = as.factor(sample)), alpha = 0.6, size = 2) + 
    geom_line(aes(group = subject, color = as.factor(sample)), size = 0.7, alpha = 0.5) +  
    labs(
        title = "PCoA with Bray-Curtis dissimilarity",
        x = "PC1",
        y = "PC2",
        color = "Sample ID"
    ) +
    theme_minimal() +
    theme(
        legend.position = "bottom",
        legend.text = element_text(size = 6),        
        legend.title = element_text(size = 7),      
        legend.key.size = unit(0.4, "cm"),           
        legend.spacing.x = unit(0.1, 'cm'),        
        legend.spacing.y = unit(0.1, 'cm')           
    ) +
    guides(color = guide_legend(nrow = 4, byrow = TRUE)) 

Rplot-samples

@Daenarys8
Copy link
Contributor

Perhaps just for 5-10 individuals?

@TuomasBorman
Copy link
Contributor

Too complex for training purpose. Try this idea:


p <- plotReducedDim(tse, colour_by = "patient", ...)
p + geom_line(aes(group = patient))

@Daenarys8
Copy link
Contributor

p <- plotReducedDim(tse, colour_by = "patient", ...)
p + geom_line(aes(group = patient))

This also works but the line just goes through all the points. I am looking to see if just points with multiple sample-timepoints can get distinct lines passing through them
Rplotline

@TuomasBorman
Copy link
Contributor

My code was just an example, I did not test it. Does this work in your case

data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- tse[, 1000:1020]
tse <- transformAssay(tse, method = "relabundance")
tse <- runMDS(tse, FUN = getDissimilarity, method = "bray", assay.type = "relabundance", na.rm = TRUE)

p <- plotReducedDim(tse, "MDS", colour_by = "subject")
p + geom_line(aes(group = .data[["colour_by"]]))

@Daenarys8
Copy link
Contributor

My code was just an example, I did not test it. Does this work in your case

data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- tse[, 1000:1020]
tse <- transformAssay(tse, method = "relabundance")
tse <- runMDS(tse, FUN = getDissimilarity, method = "bray", assay.type = "relabundance", na.rm = TRUE)

p <- plotReducedDim(tse, "MDS", colour_by = "subject")
p + geom_line(aes(group = .data[["colour_by"]]))

Yes, this links the required points perfectly. thanks

@TuomasBorman
Copy link
Contributor

Can you check if you can add arrows that point from timepoint 1 to timepoint 2....

That might be more tricky, might require incorporating additional data that is not already present in the object.

ggplot_build() function can help you to check the data present

@TuomasBorman
Copy link
Contributor

Or just use shape_by for time points and then use it similarly as colour_by

That might be better and simpler approach

@Daenarys8
Copy link
Contributor

for now i have this but i don't think it is representing the flow of time, I will explore a little bit.
Rplot01arrow

@antagomir
Copy link
Member Author

I think (not sure) that time points are plotted in the order that they occur in the data. If you first sort samples by time as a processing step, then do the plotting, perhaps it works without extra tricks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

5 participants