Last updated: 2021-07-03
Checks: 7 0
Knit directory: wildlife-bacteria/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20210129)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version af6ad5b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/_footer.html
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Ignored: output/plots/.DS_Store
Ignored: output/plots/QC/.DS_Store
Ignored: output/plots/boxplots_select_taxa/.DS_Store
Ignored: output/plots/heatmaps/.DS_Store
Ignored: output/plots/maps/.DS_Store
Ignored: output/plots/tax_prev_abund/.DS_Store
Untracked files:
Untracked: ENA_docs/
Untracked: NCBI_data/
Untracked: data/dada2/
Untracked: data/dada2_tois/
Untracked: data/taxa_trees/
Untracked: data/tmp/
Untracked: output/beta-div-statistics.txt
Untracked: output/supp_table_pos.xlsx
Untracked: tmp/
Unstaged changes:
Modified: RData/amp_dec_bact.RData
Modified: RData/amp_raw_bact.RData
Modified: RData/ps_dec_bact.RData
Modified: RData/ps_raw_bact.RData
Modified: README.md
Deleted: analysis/site-map.Rmd
Deleted: analysis/tois.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/microbiome-viz.Rmd
) and HTML (docs/microbiome-viz.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | af6ad5b | siobhon-egan | 2021-07-03 | general update |
html | e926c9d | siobhon-egan | 2021-07-02 | fix conflicts |
html | 33466dd | siobhon-egan | 2021-07-01 | update inc accession no |
html | 5f4c86d | siobhon-egan | 2021-05-23 | restructure pages |
Rmd | 0d602a3 | siobhon-egan | 2021-05-23 | updates to pages |
html | 0d602a3 | siobhon-egan | 2021-05-23 | updates to pages |
html | 9ecc095 | siobhon-egan | 2021-04-24 | Build site. |
html | a69dea3 | siobhon-egan | 2021-04-24 | Build site. |
Rmd | 6ebcc5d | siobhon-egan | 2021-04-24 | updates |
html | 6ebcc5d | siobhon-egan | 2021-04-24 | updates |
html | a0173d4 | siobhon-egan | 2021-02-26 | Build site. |
Rmd | d502188 | siobhon-egan | 2021-02-26 | microbiome visualization |
Visualization and analysis of microbiome data.
As this page contains extensive microbiome analysis and some parts are computationally heavy only a selection of figures have been included in the rendered webpage.
Note this was run using R version 4.0.3 and RStudo version 1.4. See R info for full details of R session.
See Phyloseq page for details on data cleaning.
Install libraries if required
Only need to run this code once.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# phyloseq
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')
#tidyverse
install. packages("tidyverse")
#ampvis2
install.packages("remotes")
remotes::install_github("MadsAlbertsen/ampvis2")
#ampvis2extras
install.packages("BiocManager")
BiocManager::install("kasperskytte/ampvis2extras")
#ggpubr
install.packages("ggpubr")
#agricolae
install.packages("agricolae")
install.packages("remotes")
remotes::install_github("DanielSprockett/reltools")
devtools::install_github('jsilve24/philr')
#decontam
BiocManager::install("decontam")
library(decontam)
BiocManager::install("FelixErnst/mia")
# Microbota process
remotes::install_github("YuLab-SMU/MicrobiotaProcess")
Load libraries
pkgs <- c("qiime2R", "phyloseq", "tidyverse", "ampvis2", "ampvis2extras",
"ggpubr", "agricolae", "plotly", "viridis", "cowplot", "MicrobeR",
"microbiome", "reshape", "decontam", "data.table", "ape", "DESeq2",
"vegan", "microbiomeutilities", "knitr", "tibble", "dplyr",
"patchwork", "Biostrings", "mia", "eulerr", "MicrobiotaProcess")
lapply(pkgs, require, character.only = TRUE)
Load data as imported and filtered as per phyloseq
# phyloseq raw data
load("Rdata/ps_raw_bact.RData")
# phyloseq decontam data
load("Rdata/ps_dec_bact.RData")
# ampvis2 raw data
load("Rdata/amp_raw_bact.RData")
# ampvis2 decontam data
load("Rdata/amp_dec_bact.RData")
Subset decontam objects to remove controls and create obj for each sample category Subset ampvis2 object based on sample category
# remove controls from phyloseq decontam data
ps_bact_samp = subset_samples(ps_dec_bact, !SampleType=="Control")
# sample category
ps_bact_bl = subset_samples(ps_bact_samp, SampleCategory=="Blood")
ps_bact_tis = subset_samples(ps_bact_samp, SampleCategory=="Tissue")
ps_bact_tick = subset_samples(ps_bact_samp, SampleCategory=="Tick")
# remove controls from ampvis2 decontam data
amp_samp <- amp_subset_samples(amp_dec_bact,
!SampleType %in% c("Control"),
RemoveAbsents = TRUE)
# sample category
amp_bl<- amp_subset_samples(amp_samp,
SampleCategory %in% c("Blood"),
RemoveAbsents = TRUE)
amp_tis<- amp_subset_samples(amp_samp,
SampleCategory %in% c("Tissue"),
RemoveAbsents = TRUE)
amp_tick<- amp_subset_samples(amp_samp,
SampleCategory %in% c("Tick"),
RemoveAbsents = TRUE)
Set colours for Sample Category
ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")
Plotting of rarefaction curves using ampvis2 - separating out by sample type. Use raw ASV count data for inspection of sequencing depth (using raw data here not decon data)
# Subset blood samples from raw data
amp_raw_bl0 <- amp_subset_samples(amp_raw_bact,
SampleCategory %in% c("Blood"),
minreads = 0,
RemoveAbsents = TRUE)
rarecurveblraw = amp_rarecurve(amp_raw_bl0, stepsize = 100)
rarecurveblraw1 = rarecurveblraw + ylab(label = "Number of ASVs") +
theme(axis.text.x = element_text(angle = 45, size=10, vjust = 0.5)) + ggtitle("Blood") + theme_bw()
# Subset Tick samples form raw data
amp_raw_tick0 <- amp_subset_samples(amp_raw_bact,
SampleCategory %in% c("Tick"),
minreads = 0,
RemoveAbsents = TRUE)
rarecurvetickraw = amp_rarecurve(amp_raw_tick0, stepsize = 100)
rarecurvetickraw1 = rarecurvetickraw + ylab(label = "Number of ASVs") +
theme(axis.text.x = element_text(angle = 45, size=10, vjust = 0.5)) + ggtitle("Tick") + theme_bw()
# Subset Tissue samples form raw data
amp_raw_tis100 <- amp_subset_samples(amp_raw_bact,
SampleCategory %in% c("Tissue"),
minreads = 100,
RemoveAbsents = TRUE)
rarecurvetisraw = amp_rarecurve(amp_raw_tis100, stepsize = 100)
rarecurvetisraw1 = rarecurvetisraw + ylab(label = "Number of ASVs") +
theme(axis.text.x = element_text(angle = 45, size=10, vjust = 0.5)) + ggtitle("Tissue") + theme_bw()
# Merge into one figure
rarecurve <- ggarrange(rarecurveblraw1, rarecurvetickraw1, rarecurvetisraw1,
labels = c("A", "B", "C"),
ncol = 1, nrow = 3)
ggsave("rarecurve.pdf", plot = rarecurve, path = "output/plots", width = 25, height = 40, units = "cm")
Make using alpha diversity plots with statistical values using microbiomeutilities
obs_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory",
index = "observed",
group.colors = ColSampCat,
label.format="p.format",
stats = TRUE)
obs_alpha_plot = obs_alpha_plot + ylab("No. Observed ASVs") + xlab("") + labs(title = "Observed")
chao1_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory",
index = "chao1",
group.colors = ColSampCat,
label.format="p.format",
stats = TRUE)
chao1_alpha_plot = chao1_alpha_plot + ylab("Chao1 Index") + xlab("") + labs(title = "Chao1")
shan_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory",
index = "diversity_shannon",
group.colors = ColSampCat,
label.format="p.format",
stats = TRUE)
shan_alpha_plot = shan_alpha_plot + ylab("Shannon Index") + xlab("") + labs(title = "Shannon")
invsimp_alpha_plot <- plot_diversity_stats(ps_bact_samp, group = "SampleCategory",
index = "diversity_inverse_simpson",
group.colors = ColSampCat,
label.format="p.format",
stats = TRUE)
invsimp_alpha_plot = invsimp_alpha_plot + ylab("Inverse Simpson Index") + xlab("") + labs(title = "Inverse Simpson")
#Merge all four alpha div plots into one figure
alphadiv_wp <- ggarrange(obs_alpha_plot, chao1_alpha_plot, shan_alpha_plot, invsimp_alpha_plot,
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
#ggsave("alphadiv_withpvalues.pdf", plot = alphadiv_wp, path = "output/plots", width = 30, height = 30, units = "cm")
alphadiv_wp
EXTRA - Tick samples only
Repeat but just on tick data for supplementary information Make using alpha diversity plots with statistical values using microbiomeutilities
# subset tick species
ps_bact_tick_sub = subset_samples(ps_bact_tick, !tick_sp_short=="Ixhol;Ixtri")
tickspcols = c("#000004FF", "#781C6DFF", "#BB3754FF", "#ED6925FF", "#FCB519FF", "#c5cc00")
obs_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short",
index = "observed",
group.colors = tickspcols,
label.format="p.format",
stats = TRUE)
obs_alpha_plot_tk = obs_alpha_plot_tk + ylab("No. Observed ASVs") + xlab("") + labs(title = "Observed")
chao1_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short",
index = "chao1",
group.colors = tickspcols,
label.format="p.format",
stats = TRUE)
chao1_alpha_plot_tk = chao1_alpha_plot_tk + ylab("Chao1 Index") + xlab("") + labs(title = "Chao1")
shan_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short",
index = "diversity_shannon",
group.colors = tickspcols,
label.format="p.format",
stats = TRUE)
shan_alpha_plot_tk = shan_alpha_plot_tk + ylab("Shannon Index") + xlab("") + labs(title = "Shannon")
invsimp_alpha_plot_tk <- plot_diversity_stats(ps_bact_tick_sub, group = "tick_sp_short",
index = "diversity_inverse_simpson",
group.colors = tickspcols,
label.format="p.format",
stats = TRUE)
invsimp_alpha_plot_tk = invsimp_alpha_plot_tk + ylab("Inverse Simpson Index") + xlab("") + labs(title = "Inverse Simpson")
#Merge all four alpha div plots into one figure
alphadiv_wp_tk <- ggarrange(obs_alpha_plot_tk, chao1_alpha_plot_tk, shan_alpha_plot_tk, invsimp_alpha_plot_tk,
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
#ggsave("alphadiv_withpvalues_tk.pdf", plot = alphadiv_wp_tk, path = "output/plots", width = 30, height = 30, units = "cm")
alphadiv_wp_tk
Using fast_melt()
command to get an idea of most prevalent taxa and distribution
Plot Phylum - all samples
Melt phyloseq object to be able to group by taxa levels and filter for abundance and sample info.
# Using R code for taxa_summary.R function available here http://evomics.org/phyloseq/taxa_summary-r/
source("code/taxa_summary.R", local = TRUE)
##### All sample data
# Melt
mdt = fast_melt(ps_bact_samp)
# group by phyla
samp_phyla = mdt %>%
group_by(Phylum) %>%
summarize(sum = sum(count))
#### Blood samples
# Melt
mdt_bl = fast_melt(ps_bact_bl)
# group by family
bl_family = mdt_bl %>%
group_by(Family) %>%
summarize(sum = sum(count))
#### Tick samples
# Melt
mdt_tick = fast_melt(ps_bact_tick)
# group by family
tick_family = mdt_tick %>%
group_by(Family) %>%
summarize(sum = sum(count))
#### Tissue samples
# Tissue
mdt_tis = fast_melt(ps_bact_tis)
# group by family
tis_family = mdt_tis %>%
group_by(Family) %>%
summarize(sum = sum(count))
Make phyla plot for all samples
source("code/taxa_summary.R", local = TRUE)
# Melt
mdt = fast_melt(ps_bact_samp)
prevdt = mdt[, list(Prevalence = sum(count > 0),
TotalCounts = sum(count)),
by = TaxaID]
addPhylum = unique(copy(mdt[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
#Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
Make plot for phyla
top_phyla1 = ggplot(prevdt[showPhyla],
mapping = aes(Prevalence, TotalCounts, color = Phylum)) +
geom_point(size = 2.5, alpha = 0.75) +
scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Overall")
ggsave("top_phyla_all.pdf", plot = top_phyla1, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")
Plot Phylum - by sample type
source("code/taxa_summary.R", local = TRUE)
### Blood
mdt_bl = fast_melt(ps_bact_bl)
prevdt = mdt_bl[, list(Prevalence = sum(count > 0),
TotalCounts = sum(count)),
by = TaxaID]
addPhylum = unique(copy(mdt_bl[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
# Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
# Make plot for phyla
top_phyla_bl = ggplot(prevdt[showPhyla],
mapping = aes(Prevalence, TotalCounts, color = Phylum)) +
geom_point(size = 2.5, alpha = 0.75) +
scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Blood")
ggsave("top_phyla_blood.pdf", plot = top_phyla_bl, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")
### Tick
mdt_tick = fast_melt(ps_bact_tick)
prevdt = mdt_tick[, list(Prevalence = sum(count > 0),
TotalCounts = sum(count)),
by = TaxaID]
addPhylum = unique(copy(mdt_tick[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
# Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
# Make plot for phyla
top_phyla_tick = ggplot(prevdt[showPhyla],
mapping = aes(Prevalence, TotalCounts, color = Phylum)) +
geom_point(size = 2.5, alpha = 0.75) +
scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Tick")
ggsave("top_phyla_tick.pdf", plot = top_phyla_tick, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")
### Tissue
mdt_tis = fast_melt(ps_bact_tis)
prevdt = mdt_tis[, list(Prevalence = sum(count > 0),
TotalCounts = sum(count)),
by = TaxaID]
addPhylum = unique(copy(mdt_tis[, list(TaxaID, Phylum)]))
# Join by TaxaID
setkey(prevdt, TaxaID)
setkey(addPhylum, TaxaID)
prevdt <- addPhylum[prevdt]
# Top 12 phyla
showPhyla = prevdt[, sum(TotalCounts), by = Phylum][order(-V1)][1:12]$Phylum
setkey(prevdt, Phylum)
# Make plot for phyla
top_phyla_tis = ggplot(prevdt[showPhyla],
mapping = aes(Prevalence, TotalCounts, color = Phylum)) +
geom_point(size = 2.5, alpha = 0.75) +
scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE) + ylab("Abundance") + labs(title = "Tissue")
ggsave("top_phyla_tissue.pdf", plot = top_phyla_tis, path = "output/plots/tax_prev_abund", width = 15, height = 10, units = "cm")
Merge all four phyla plots
top_phyla_merge <- ggarrange(top_phyla1, top_phyla_bl, top_phyla_tick, top_phyla_tis,
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
ggsave("top_phyla_merge.pdf", plot = top_phyla_merge, path = "output/plots/tax_prev_abund", width = 30, height = 20, units = "cm")
Plot Family - blood samples
source("code/taxa_summary.R", local = TRUE)
mdt_bl = fast_melt(ps_bact_bl)
prevdt_bl = mdt_bl[, list(Prevalence = sum(count > 0),
TotalCounts = sum(count)),
by = TaxaID]
addFamily_bl = unique(copy(mdt_bl[, list(TaxaID, Family)]))
# Join by TaxaID
setkey(prevdt_bl, TaxaID)
setkey(addFamily_bl, TaxaID)
prevdt_bl <- addFamily_bl[prevdt_bl]
#Top 12 Family
showFamily_bl = prevdt_bl[, sum(TotalCounts), by = Family][order(-V1)][1:12]$Family
setkey(prevdt_bl, Family)
Make plot for phyla
top_Family_bl = ggplot(prevdt_bl[showFamily_bl],
mapping = aes(Prevalence, TotalCounts, color = Family)) +
geom_point(size = 2.5, alpha = 0.75) +
scale_y_log10() + theme_bw() + scale_color_viridis(discrete=TRUE)
ggsave("top_phyla.pdf", plot = top_phyla, path = "output/plots", width = 40, height = 15, units = "cm")
# Alternative plot
top_phyla3 = top_phyla1 + facet_wrap(~Phylum)
ggsave("top_phyla3.pdf", plot = top_phyla3, path = "output/plots", width = 40, height = 30, units = "cm")
Make frequency heat map using ampvis2
Aggregate taxa, showing 40 most abundant by Family species, facet by sample type
# Relative abundance
heatmap_rel <- amp_heatmap(amp_samp,
facet_by = "SampleCategory",
group_by = "species",
tax_aggregate = "Family",
tax_show = 40,
normalise = TRUE,
plot_values = FALSE,
plot_values_size = 3,
round = 0, color_vector = c("white", "#e5ba52", "#ab7ca3", "#9d02d7", "#0030bf"), plot_colorscale = "log10") +
theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1),
axis.text.y = element_text(size=10),
legend.position="right")
heatmap_rel
# Count of sequences
heatmap_count <- amp_heatmap(amp_samp,
facet_by = "SampleCategory",
group_by = "species",
tax_aggregate = "Family",
tax_show = 40,
normalise = FALSE,
plot_values = FALSE,
plot_values_size = 3,
round = 0, color_vector = c("white", "#e5ba52", "#ab7ca3", "#9d02d7", "#0030bf"), plot_colorscale = "log10") + theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1), axis.text.y = element_text(size=10), legend.position="right")
Save PDF of plots
ggsave("heatmap_rel.pdf", plot = heatmap_rel, path = "output/plots", width = 40, height = 25, units = "cm")
ggsave("heatmap_count.pdf", plot = heatmap_count, path = "output/plots", width = 40, height = 25, units = "cm")
Heat map with more detail for tick samples only. Aggregate taxa, showing 40 most abundant by Family species, facet by sample type
amp_tick <- amp_subset_samples(amp_samp, SampleCategory %in% c("Tick"))
amp_tick_sub <- amp_subset_samples(amp_tick,
tick_sp_short %in%
c("Ixhol", "Ixtri", "Ixtas", "Ixant", "Ixaus", "Amtri"))
# Relative abundance
hmap_rel_tick <- amp_heatmap(amp_tick_sub,
facet_by = "tick_sp_short",
group_by = "instar",
tax_aggregate = "Family",
tax_show = 15,
normalise = TRUE,
plot_values = FALSE,
plot_values_size = 3,
round = 0, color_vector = c("white", "#e5ba52", "#ab7ca3", "#9d02d7", "#0030bf"), plot_colorscale = "log10") +
theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1),
axis.text.y = element_text(size=10),
legend.position="right")
hmap_rel_tick
Save plot
ggsave("heatmap_rel_tick.pdf", plot = hmap_rel_tick, path = "output/plots", width = 40, height = 25, units = "cm")
# Subset black rat and bandicoot
ps_bact_blsub = subset_samples(ps_bact_bl, species !="Quenda")
ps_bact_blsub = subset_samples(ps_bact_blsub, species !="Swamp rat")
# aggregate to family lebel
ps_bact_blsubF <- aggregate_taxa(ps_bact_blsub, "Family")
# create a gradient color palette for abundance
grad_ab <- colorRampPalette(c("#faf3dd","#f7d486" ,"#5e6472"))
grad_ab_pal <- grad_ab(10)
# create a color palette for varaibles of interest
meta_colors <- list(c("Black rat" = "#FFC857",
"Brush tail possum" = "#05B083",
"Long-nosed bandicoot" = "#5e6472",
"Bush rat" = "blue",
"Chuditch" = "pink"),
c("Blood" = "#7a255d",
"Tick" = "#9fd0cb",
"Tissue"="#7566ff"))
# add labels for pheatmap to detect
names(meta_colors) <- c("species", "SampleCategory")
heatmap_2 <- plot_taxa_heatmap(ps_bact_blsubF,
subset.top = 20,
VariableA = c("species","SampleCategory"),
heatcolors = grad_ab_pal, #rev(brewer.pal(6, "RdPu")),
transformation = "log10",
cluster_rows = T,
cluster_cols = F,
show_colnames = F,
annotation_colors=meta_colors #, show_rownames = F)
Plotting selected taxa of interest following microbiomeutilities
Aggregate taxonomy to best family level and select taxa of interest
ps_bact_sampF <- aggregate_taxa(ps_bact_samp, "Family")
# Identify top 40 taxa
top_taxa(ps_bact_sampF, 40)
select.taxa <- c("Anaplasmataceae",
"Midichloriaceae",
"Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
"Bartonellaceae",
"Mycobacteriaceae",
"Rickettsiaceae",
"Borreliaceae",
"Francisellaceae",
"Mycoplasmataceae")
Now plot selected taxa with statistical analysis
plot_select_taxa <- plot_listed_taxa(ps_bact_sampF, select.taxa,
group= "SampleCategory",
group.colors = ColSampCat,
add.violin = TRUE,
violin.opacity = 0.3,
dot.opacity = 0.25,
box.opacity = 0.25,
panel.arrange= "wrap")
# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_sampF)$SampleCategory)
plot_select_taxa1 <- plot_select_taxa + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test",
size = 3) + xlab("Sample type") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
# Save plot as PDF
#ggsave("plot_select_taxa.pdf", plot = plot_select_taxa1, path = "output/plots", width = 30, height = 30, units = "cm")
plot_select_taxa1
Repeat process - this time among host species
# identify number of species
n_distinct(ps_bact_samp@sam_data$species)
# set colours
speciescol = viridis(10, option = "B")
# Make plot
plot_select_taxa_sp <- plot_listed_taxa(ps_bact_sampF, select.taxa,
group= "species",
group.colors = speciescol,
add.violin = TRUE,
violin.opacity = 0.3,
dot.opacity = 0.25,
box.opacity = 0.25,
panel.arrange= "wrap")
# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_sampF)$species)
plot_select_taxa_sp1 <- plot_select_taxa_sp + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test",
size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_sp3 = plot_select_taxa_sp + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_sp4 = plot_select_taxa_sp1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
# Save plot as PDF
ggsave("plot_select_taxa_sp1.pdf", plot = plot_select_taxa_sp3, path = "output/plots", width = 30, height = 30, units = "cm")
ggsave("plot_select_taxa_sp2.pdf", plot = plot_select_taxa_sp4, path = "output/plots", width = 30, height = 30, units = "cm")
Now rotate through this to separate sample types and species
1. Blood
ps_bact_blF <- aggregate_taxa(ps_bact_bl, "Family")
# Identify top 40 taxa
select.taxabl <- c("Anaplasmataceae",
"Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
"Bartonellaceae",
"Mycobacteriaceae",
"Mycoplasmataceae")
# identify number of species
n_distinct(ps_bact_bl@sam_data$species)
# set colours
speciesblcol = viridis(7, option = "D")
# Now plot selected taxa with statistical analysis
plot_select_taxa_bl <- plot_listed_taxa(ps_bact_blF, select.taxabl,
group= "species",
group.colors = speciesblcol,
add.violin = TRUE,
violin.opacity = 0.3,
dot.opacity = 0.25,
box.opacity = 0.25,
panel.arrange= "wrap")
# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_blF)$species)
plot_select_taxa_bl1 <- plot_select_taxa_bl + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test",
size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_bl3 = plot_select_taxa_bl + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_bl4 = plot_select_taxa_bl1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
# Save plot as PDF
ggsave("plot_select_taxa_bl1.pdf", plot = plot_select_taxa_bl3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("plot_select_taxa_bl2.pdf", plot = plot_select_taxa_bl4, path = "output/plots", width = 30, height = 20, units = "cm")
2. Ticks
ps_bact_tickF <- aggregate_taxa(ps_bact_tick, "Family")
# Identify top 40 taxa
select.taxa.tick <- c("Anaplasmataceae",
"Midichloriaceae",
"Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
"Bartonellaceae",
"Mycobacteriaceae",
"Rickettsiaceae",
"Francisellaceae")
# identify number of species
n_distinct(ps_bact_tick@sam_data$species)
# set colours
speciestickcol = viridis(8, option = "D")
# Now plot selected taxa with statistical analysis
plot_select_taxa_tick <- plot_listed_taxa(ps_bact_tickF, select.taxa.tick,
group= "species",
group.colors = speciestickcol,
add.violin = TRUE,
violin.opacity = 0.3,
dot.opacity = 0.25,
box.opacity = 0.25,
panel.arrange= "wrap")
# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_tickF)$species)
plot_select_taxa_tick1 <- plot_select_taxa_tick + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test",
size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_tick3 = plot_select_taxa_tick + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_tick4 = plot_select_taxa_tick1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
# Save plot as PDF
ggsave("plot_select_taxa_tick1.pdf", plot = plot_select_taxa_tick3, path = "output/plots", width = 30, height = 30, units = "cm")
ggsave("plot_select_taxa_tick2.pdf", plot = plot_select_taxa_tick4, path = "output/plots", width = 30, height = 30, units = "cm")
3. Tissue
ps_bact_tisF <- aggregate_taxa(ps_bact_tis, "Family")
# Identify top 40 taxa
select.taxa.tis <- c("Anaplasmataceae",
"Midichloriaceae",
"Bacteria_Proteobacteria_Gammaproteobacteria_Legionellales_Coxiellaceae",
"Bartonellaceae",
"Mycobacteriaceae",
"Borreliaceae",
"Mycoplasmataceae")
# identify number of species
n_distinct(ps_bact_tis@sam_data$species)
# set colours
speciestiscol = viridis(9, option = "D")
# Now plot selected taxa with statistical analysis
plot_select_taxa_tis <- plot_listed_taxa(ps_bact_tisF, select.taxa.tis,
group= "species",
group.colors = speciestiscol,
add.violin = TRUE,
violin.opacity = 0.3,
dot.opacity = 0.25,
box.opacity = 0.25,
panel.arrange= "wrap")
# Add statistical analysis
comps <- make_pairs(sample_data(ps_bact_tisF)$species)
plot_select_taxa_tis1 <- plot_select_taxa_tis + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test",
size = 3) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_tis3 = plot_select_taxa_tis + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_tis4 = plot_select_taxa_tis1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
# Save plot as PDF
ggsave("plot_select_taxa_tis1.pdf", plot = plot_select_taxa_tis3, path = "output/plots", width = 30, height = 30, units = "cm")
ggsave("plot_select_taxa_tis2.pdf", plot = plot_select_taxa_tis4, path = "output/plots", width = 30, height = 30, units = "cm")
More detailed info on Anaplasmataceae family and tick species
# Subset Ixodes holocyclus
Ixhol = subset_samples(ps_bact_tick, tick_sp_short == "Ixhol")
IxholF <- aggregate_taxa(Ixhol, "Family")
# Select just Anaplasmataceae
select.taxa_ana <- c("Anaplasmataceae")
# identify number of species
n_distinct(IxholF@sam_data$instar)
# set colours
instarcol = viridis(4, option = "D")
# Now plot selected taxa with statistical analysis
ixhol_ana <- plot_listed_taxa(IxholF, select.taxa_ana,
group= "instar",
group.colors = instarcol,
add.violin = TRUE,
violin.opacity = 0.3,
dot.opacity = 0.25,
box.opacity = 0.25,
panel.arrange= "wrap")
# Add statistical analysis
comps <- make_pairs(sample_data(IxholF)$instar)
ixhol_ana1 <- ixhol_ana + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test",
size = 3) + xlab("Instar") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
### For visualization saving option with and without p values
# Customise plot
plot_select_taxa_bl3 = plot_select_taxa_bl + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
plot_select_taxa_bl4 = plot_select_taxa_bl1 + theme(axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
# Save plot as PDF
ggsave("plot_select_taxa_bl1.pdf", plot = plot_select_taxa_bl3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("plot_select_taxa_bl2.pdf", plot = plot_select_taxa_bl4, path = "output/plots", width = 30, height = 20, units = "cm")
Plot top taxa using microbiomeutilities and include statistics to compare means
All samples
Plot for all samples and separate by sample type, include top 9 taxa (family level)
ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")
top_taxa_all <- plot_taxa_boxplot(ps_bact_samp,
taxonomic.level = "Family",
top.otu = 9,
group = "SampleCategory",
add.violin= TRUE,
title = "Top nine family",
keep.other = FALSE,
group.order = c("Blood","Tick","Tissue"),
group.colors = ColSampCat,
dot.size = 1)
#print(pn + theme_biome_utils())
# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_samp)$SampleCategory)
print(comps)
top_taxa_all1 <- top_taxa_all + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test", size = 3)
top_taxa_all2 = top_taxa_all1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")
#Save plot
#ggsave("top_taxa_all2.pdf", plot = top_taxa_all2, path = "output/plots", width = 30, height = 30, units = "cm")
top_taxa_all2
Now plot separate sample categories and separate by species
1. Blood
# set colours
speciesblcol = viridis(7, option = "D")
top_taxa_bl <- plot_taxa_boxplot(ps_bact_bl,
taxonomic.level = "Family",
top.otu = 6,
group = "species",
add.violin= TRUE,
title = "Blood",
keep.other = FALSE,
group.colors = speciesblcol,
group.order = c("Black rat","Brush tail possum","Bush rat","Chuditch","Long-nosed bandicoot","Quenda","Swamp rat"),
dot.size = 1)
# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_bl)$species)
print(comps)
top_taxa_bl1 <- top_taxa_bl + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test", size = 3)
top_taxa_bl2 = top_taxa_bl1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")
# Customise plot - make 2 (1 with and one without p values)
top_taxa_bl3 = top_taxa_bl + theme_biome_utils()
top_taxa_bl3 = top_taxa_bl3 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
top_taxa_bl4 = top_taxa_bl1 + theme_biome_utils()
top_taxa_bl4 = top_taxa_bl4 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
# Save plot as PDF
ggsave("top_taxa_bl1.pdf", plot = top_taxa_bl3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("top_taxa_bl2.pdf", plot = top_taxa_bl4, path = "output/plots", width = 30, height = 20, units = "cm")
2. Tick
#set colours
speciestickcol = viridis(8, option = "D")
top_taxa_tick <- plot_taxa_boxplot(ps_bact_tick,
taxonomic.level = "Family",
top.otu = 6,
group = "species",
add.violin= TRUE,
title = "Tick",
keep.other = FALSE,
group.colors = speciestickcol,
group.order = c("Black rat","Brown antechinus", "Brush tail possum","Bush rat","Chuditch","Long-nosed bandicoot","Quenda", "Rabbit"),
dot.size = 1)
# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_tick)$species)
print(comps)
top_taxa_tick1 <- top_taxa_tick + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test", size = 3)
top_taxa_tick2 = top_taxa_tick1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")
# Customise plot - make 2 (1 with and one without p values)
top_taxa_tick3 = top_taxa_tick + theme_biome_utils()
top_taxa_tick3 = top_taxa_tick3 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
top_taxa_tick4 = top_taxa_tick1 + theme_biome_utils()
top_taxa_tick4 = top_taxa_tick4 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
# Save plot as PDF
ggsave("top_taxa_tick1.pdf", plot = top_taxa_tick3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("top_taxa_tick2.pdf", plot = top_taxa_tick4, path = "output/plots", width = 30, height = 20, units = "cm")
3. Tissue
speciestiscol = viridis(9, option = "D")
top_taxa_tis <- plot_taxa_boxplot(ps_bact_tis,
taxonomic.level = "Family",
top.otu = 6,
group = "species",
add.violin= TRUE,
title = "Tissue",
keep.other = FALSE,
group.colors = speciestiscol,
group.order = c("Black rat","Brown antechinus", "Brush tail possum","Bush rat","Chuditch","Deer", "Long-nosed bandicoot","Quenda", "Rabbit", "Swamp rat"),
dot.size = 1)
# Statistical analysis (non-parametric)
comps <- make_pairs(sample_data(ps_bact_tis)$species)
print(comps)
top_taxa_tis1 <- top_taxa_tis + stat_compare_means(
comparisons = comps,
label = "p.format",
tip.length = 0.05,
method = "wilcox.test", size = 3)
top_taxa_tis2 = top_taxa_tis1 + scale_y_continuous(labels = scales::percent) + theme_biome_utils() + theme(legend.position="none")
# Customise plot - make 2 (1 with and one without p values)
top_taxa_tis3 = top_taxa_tis + theme_biome_utils()
top_taxa_tis3 = top_taxa_tis3 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
top_taxa_tis4 = top_taxa_tis1 + theme_biome_utils()
top_taxa_tis4 = top_taxa_tis4 + theme(legend.position="none", axis.text.x = element_text(angle=45, size=8, vjust = 0.5)) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + xlab("Species") + ylab("Relative abundance (%)") + scale_y_continuous(labels = scales::percent)
# Save plot as PDF
ggsave("top_taxa_tis1.pdf", plot = top_taxa_tis3, path = "output/plots", width = 30, height = 20, units = "cm")
ggsave("top_taxa_tis2.pdf", plot = top_taxa_tis4, path = "output/plots", width = 30, height = 20, units = "cm")
Compare abundance vs prevalence for top taxa from Long-nosed bandicoot blood samples microbiomeutelities
ps_bact_bl = subset_samples(ps_bact_samp, SampleCategory == "Blood")
asv_ps <- microbiome::transform(ps_bact_bl, "compositional")
# Select healthy samples
asv_ps <- subset_samples(asv_ps, species=="Long-nosed bandicoot")
asv_ps <- core(asv_ps,detection = 0.001, prevalence = 0.05) # reduce size for example
asv_ps <- format_to_besthit(asv_ps)
set.seed(14285)
p_v <- plot_abund_prev(asv_ps,
label.core = TRUE,
color = "Phylum", # NA or "blue"
mean.abund.thres = 0.01,
mean.prev.thres = 0.99,
dot.opacity = 0.7,
label.size = 3,
label.opacity = 1.0,
nudge.label=-0.15,
bs.iter=9, # increase for actual analysis e.g. 999
size = 29, # increase to match your nsamples(asv_ps)
replace = TRUE,
label.color="#5f0f40")
prev_abund <- p_v +
geom_vline(xintercept = 0.25, lty="dashed", alpha=0.7) +
geom_hline(yintercept = 0.005,lty="dashed", alpha=0.7) +
scale_color_viridis(discrete=TRUE)
prev_abund
SampleCategory
Make venn diagram to show overlap between blood, tick and tissue samples
# how many samples in each category
table(meta(ps_bact_samp)$SampleCategory, useNA = "always")
# transform data
ps_bact_samprel <- microbiome::transform(ps_bact_samp, "compositional")
# assign best hit
ps_bact_samprel.f <- format_to_besthit(ps_bact_samprel)
# Make a list of SampleCategory
sample_cat <- unique(as.character(meta(ps_bact_samprel.f)$SampleCategory))
print(sample_cat)
Format to best hit and then Re-Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list. Loop through for three different abundance/prevalence situations
option a
list_core <- c() # an empty object to store information
for (n in sample_cat){ # for each variable n in SampleCategory
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(ps_bact_samprel.f, SampleCategory == n) # Choose sample from SampleCategory by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in at least 50% samples
prevalence = 0.05)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
vennpl1 = plot(venn(list_core),
fills = ColSampCat)
option b
list_core <- c() # an empty object to store information
for (n in sample_cat){ # for each variable n in SampleCategory
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(ps_bact_samprel.f, SampleCategory == n) # Choose sample from SampleCategory by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in at least 0.1% samples
prevalence = 0.001)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
vennpl2 = plot(venn(list_core),
fills = ColSampCat)
option c
#formt to best hit
list_core <- c() # an empty object to store information
for (n in sample_cat){ # for each variable n in SampleCategory
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(ps_bact_samprel.f, SampleCategory == n) # Choose sample from SampleCategory by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.0001, # 0.0001 in at least 50% samples
prevalence = 0.05)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
vennpl3 = plot(venn(list_core),
fills = ColSampCat)
Merge into one figure
vennplot <- ggarrange(vennpl1, vennpl2, vennpl3,
labels = c("A", "B", "C"),
ncol = 3, nrow = 1)
#ggsave("vennplot.pdf", plot = vennplot, path = "output/plots", width = 25, height = 10, units = "cm")
vennplot
species
by SampleCategory
Set colours so they are the same for each species
# Use viridis package to retrieve 4 colours for species
viridis(4, option = "D")
# Then lighten colours using https://projects.susielu.com/viz-palette
spcols <- c("Black rat"="#723281", "Brush tail possum"="#6396BE", "Chuditch"="#6EEAA8", "Long-nosed bandicoot"="#FFFF63")
BLOOD
Make venn diagram to show overlap top species for blood samples.
# how many samples in each category
table(meta(ps_bact_bl)$species, useNA = "always")
# subset species
blood_sub <- subset_samples(ps_bact_bl, species %in% c("Black rat", "Brush tail possum", "Chuditch", "Long-nosed bandicoot"))
# transform data
blood_subrel <- microbiome::transform(blood_sub, "compositional")
# assign best hit
blood_subrel.f <- format_to_besthit(blood_subrel)
# Make a list of SampleCategory
species_cat <- unique(as.character(meta(blood_subrel.f)$species))
print(species_cat)
Format to best best and then Re-Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list. Loop through for three different abundance/prevalence situations - using option a patameters
list_core <- c() # an empty object to store information
for (n in species_cat){ # for each variable n in SampleCategory
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(blood_subrel.f, species == n) # Choose sample from SampleCategory by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in at least 50% samples
prevalence = 0.05)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
list_core[[n]] <- core_m # add to a list core taxa for each group.
#c
}
blspcols <- c("#723281", "#6396BE", "#6EEAA8", "#FFFF63")
vennpl_bl = plot(venn(list_core),
fills = blspcols)
Tissue
Make venn diagram to show overlap top species for tissue samples.
# how many samples in each category
table(meta(ps_bact_tis)$species, useNA = "always")
# subset species
tissue_sub <- subset_samples(ps_bact_tis, species %in% c("Black rat", "Brush tail possum", "Chuditch", "Long-nosed bandicoot"))
# transform data
tissue_subrel <- microbiome::transform(tissue_sub, "compositional")
# assign best hit
tissue_subrel.f <- format_to_besthit(tissue_subrel)
# Make a list of SampleCategory
species_cat <- unique(as.character(meta(tissue_subrel.f)$species))
print(species_cat)
Format to best best and then Re-Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list. Loop through for three different abundance/prevalence situations - using option a patameters
list_core <- c() # an empty object to store information
for (n in species_cat){ # for each variable n in SampleCategory
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(tissue_subrel.f, species == n) # Choose sample from SampleCategory by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in atleast 90% samples
prevalence = 0.05)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
tisspcols <- c("#6EEAA8", "#6396BE", "#723281", "#FFFF63")
vennpl_tis = plot(venn(list_core),
fills = tisspcols)
Merge into one figure
vennplot_bltis <- ggarrange(vennpl_bl, vennpl_tis,
labels = c("Blood", "Tissue"),
ncol = 2, nrow = 1)
ggsave("vennplot_bltis.pdf", plot = vennplot_bltis, path = "output/plots", width = 25, height = 10, units = "cm")
tick_species
Make venn diagram to show overlap top species for blood samples.
# how many samples in each category
table(meta(ps_bact_tick)$tick_species, useNA = "always")
# subset species
tick_sub <- subset_samples(ps_bact_tick, tick_species %in% c("Amblyomma triguttatum", "Ixodes holocyclus", "Ixodes tasmani", "Ixodes trichosuri"))
# transform data
tick_subrel <- microbiome::transform(tick_sub, "compositional")
# assign best hit
tick_subrel.f <- format_to_besthit(tick_subrel)
# Make a list of SampleCategory
species_cat <- unique(as.character(meta(tick_subrel.f)$tick_species))
print(species_cat)
Format to best best and then Re-Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list. Loop through for three different abundance/prevalence situations - using option a patameters
list_core <- c() # an empty object to store information
for (n in species_cat){ # for each variable n in SampleCategory
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(tick_subrel.f, tick_species == n) # Choose sample from SampleCategory by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in atleast 90% samples
prevalence = 0.05)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each SampleCategory
list_core[[n]] <- core_m # add to a list core taxa for each group.
#c
}
tickspcolssub = c("#BB3754FF","#c5cc00", "#FCB519FF", "#D3D3D3")
#tickspcols <- c("#723281", "#6396BE", "#6EEAA8", "#FFFF63")
vennpl_tick = plot(venn(list_core),
fills = tickspcolssub)
Save figure
ggsave("vennpl_tick.pdf", plot = vennpl_tick, path = "output/plots", width = 15, height = 10, units = "cm")
You will need to have a play around with the difference options: ordination method, data transformation and distance measure to see which one best explains the data. So I suggest try to keep good code here with lots of notes.
Ordination plots by documentation. Plots PCA
, RDA
, CA
, CCS
, DCA
, NMDS
, PCOA
, and MMDS
. Note when performing NMDS do not transform data - i.e. transform = "none"
. For PCoA and MMDS need to include a distmeasure
option. Valid options outlined here (note some require a phylogenetic tree). Using transform = “Hellinger” and distmeasure = “bray” and “jaccard” for PCOA and MMDS
All samples, and sort by sample type.
#Filter out samples with <1000 reads after QC
amp_samp1000 <- amp_subset_samples(amp_samp,
minreads = 1000,
RemoveAbsents = TRUE)
# CCA Canonical Correspondence Analysis (considered the constrained version of CA)
ord_CCA <- amp_ordinate(amp_samp1000,
type = "CCA",
constrain = "SampleCategory",
transform = "Hellinger",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# CA Correspondence Analysis
ord_CA <- amp_ordinate(amp_samp1000,
type = "CA",
constrain = "SampleCategory",
transform = "Hellinger",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# PCA Principal Components Analysis
ord_PCA <- amp_ordinate(amp_samp1000,
type = "PCA",
constrain = "SampleCategory",
transform = "Hellinger",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# NMDS non-metric Multidimensional Scaling
ord_NMDS <- amp_ordinate(amp_samp1000,
type = "NMDS",
constrain = "SampleCategory",
transform = "Hellinger",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# RDA Redundancy Analysis (considered the constrained version of PCA)
ord_RDA <- amp_ordinate(amp_samp1000,
type = "RDA",
constrain = "SampleCategory",
transform = "Hellinger",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# PCOA Principal Coordinates Analysis Jaccard
ord_PCOA_jac <- amp_ordinate(amp_samp1000,
type = "PCOA",
constrain = "SampleCategory",
distmeasure = "jaccard",
transform = "none",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# PCOA Principal Coordinates Analysis Bray
ord_PCOA_bray <- amp_ordinate(amp_samp1000,
type = "PCOA",
constrain = "SampleCategory",
distmeasure = "bray",
transform = "none",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# MMDS metric Multidimensional Scaling jaccard
ord_MMDS_jac <- amp_ordinate(amp_samp1000,
type = "MMDS",
constrain = "SampleCategory",
distmeasure = "jaccard",
transform = "none",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
# MMDS metric Multidimensional Scaling Bray
ord_MMDS_bray <- amp_ordinate(amp_samp1000,
type = "MMDS",
constrain = "SampleCategory",
distmeasure = "bray",
transform = "none",
sample_color_by = "SampleCategory",
sample_shape_by = "species",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
Customise and save plots
ord_CCA_plot = ord_CCA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_CCA.pdf", plot = ord_CCA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_CA_plot <- ord_CA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_CA.pdf", plot = ord_CA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_RDA_plot = ord_RDA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_RDA.pdf", plot = ord_RDA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_PCA_plot <- ord_PCA$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_PCA.pdf", plot = ord_PCA_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_NMDS_plot <- ord_NMDS$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_NMDS.pdf", plot = ord_NMDS_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_PCOA_bray_plot <- ord_PCOA_bray$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_PCOA_bray.pdf", plot = ord_PCOA_bray_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_PCOA_jac_plot <- ord_PCOA_jac$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_PCOA_jac.pdf", plot = ord_PCOA_jac_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_MMDS_bray_plot <- ord_MMDS_bray$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_MMDS_bray.pdf", plot = ord_MMDS_bray_plot, path = "output/plots", width = 15, height = 15, units = "cm")
ord_MMDS_jac_plot <- ord_MMDS_jac$plot + scale_shape_manual(values=seq(0,10)) + scale_colour_manual(values = c("#7a255d", "#9fd0cb", "#7566ff"))
ggsave("ord_MMDS_jac.pdf", plot = ord_MMDS_jac_plot, path = "output/plots", width = 15, height = 15, units = "cm")
# Merge the two constrained analysis methods into one figrue
ordination_CCA_RDA <- ggarrange(ord_CCA_plot, ord_RDA_plot,
labels = c("A", "B"),
ncol = 2, nrow = 1)
ggsave("ordination_CCA_RDA.pdf", plot = ordination_CCA_RDA, path = "output/plots", width = 30, height = 10, units = "cm")
Repeat the constrained analysis on subsetted host data
#Subset black rat samples and filter out samples with <1000 reads after QC
amp_BR1000 <- amp_subset_samples(amp_samp,
species %in% c("Black rat"),
minreads = 1000,
RemoveAbsents = TRUE)
# Canonical Correspondence Analysis (considered the constrained version of CA)
BRord_CCA <- amp_ordinate(amp_BR1000,
type = "CCA",
constrain = "SampleType",
transform = "Hellinger",
sample_color_by = "SampleType",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleType",
detailed_output = TRUE)
# RDA Redundancy Analysis (considered the constrained version of PCA)
BRord_RDA <- amp_ordinate(amp_BR1000,
type = "RDA",
constrain = "SampleCategory",
transform = "Hellinger",
sample_color_by = "SampleCategory",
sample_colorframe = TRUE,
sample_colorframe_label = "SampleCategory",
detailed_output = TRUE)
Constrained ordination of tick species
# Use tick dataset and filter out samples with <1000 reads after QC
amp_tick1000 <- amp_subset_samples(amp_tick,
minreads = 1000,
RemoveAbsents = TRUE)
# Canonical Correspondence Analysis (considered the constrained version of CA)
tickord_CCA <- amp_ordinate(amp_tick1000,
type = "CCA",
constrain = "tick_sp_short",
transform = "Hellinger",
sample_color_by = "tick_sp_short",
sample_colorframe = TRUE,
sample_colorframe_label = "tick_sp_short",
detailed_output = TRUE)
# RDA Redundancy Analysis (considered the constrained version of PCA)
tickord_RDA <- amp_ordinate(amp_tick1000,
type = "RDA",
constrain = "tick_sp_short",
transform = "Hellinger",
sample_color_by = "tick_sp_short",
sample_colorframe = TRUE,
sample_colorframe_label = "tick_sp_short",
detailed_output = TRUE)
Customise plots and combine to one figure
# Use viridis to get colours
# tickspcols = viridis(7, option = "B")
# optimise colours
tickspcols = c("#000004FF", "#330A5FFF", "#781C6DFF", "#BB3754FF", "#ED6925FF", "#FCB519FF", "#c5cc00")
tickord_CCApl = tickord_CCA$plot + scale_colour_manual(values = tickspcols) + theme(legend.position="none")
tickord_RDApl = tickord_RDA$plot + scale_colour_manual(values = tickspcols) + theme(legend.position="none")
# Merge plots and save
tick_ord <- ggarrange(tickord_CCApl, tickord_RDApl,
labels = c("A", "B"),
ncol = 2, nrow = 1)
ggsave("tick_ord.pdf", plot = tick_ord, path = "output/plots", width = 25, height = 15, units = "cm")
Zoom in on tick species for CCA ordination
# Use tick dataset and filter out samples with <1000 reads after QC
amp_ticksub1000 <- amp_subset_samples(amp_tick,
tick_sp_short %in% c("Ixtas", "Ixhol", "Ixtri", "Ixhol;Ixtri"),
minreads = 1000,
RemoveAbsents = TRUE)
# Canonical Correspondence Analysis (considered the constrained version of CA)
tickordsub_CCA <- amp_ordinate(amp_ticksub1000,
type = "CCA",
constrain = "tick_sp_short",
transform = "Hellinger",
sample_color_by = "tick_sp_short",
sample_colorframe = TRUE,
sample_colorframe_label = "tick_sp_short",
detailed_output = TRUE)
tickordsub_RDA <- amp_ordinate(amp_ticksub1000,
type = "RDA",
constrain = "tick_sp_short",
transform = "Hellinger",
sample_color_by = "tick_sp_short",
sample_colorframe = TRUE,
sample_colorframe_label = "tick_sp_short",
detailed_output = TRUE)
Customise plots and combine to one figure
# Use viridis to get colours
# tickspcols = viridis(7, option = "B")
# optimise colours
tickspcols2 = c("#BB3754FF", "#ED6925FF", "#FCB519FF", "#c5cc00")
tickordsub_CCApl = tickordsub_CCA$plot + scale_colour_manual(values = tickspcols2) + theme(legend.position="none")
tickordsub_RDApl = tickordsub_RDA$plot + scale_colour_manual(values = tickspcols2) + theme(legend.position="none")
# Merge plots and save
ticksub_ord <- ggarrange(tickordsub_CCApl, tickordsub_RDApl,
labels = c("A", "B"),
ncol = 2, nrow = 1)
ggsave("ticksub_ord.pdf", plot = ticksub_ord, path = "output/plots", width = 25, height = 15, units = "cm")
Ordination using the phyloseq package
#prune samples with low reads
ps_BR1 = prune_samples(sample_sums(ps_BR)>=1000, ps_BR)
bray_dist = phyloseq::distance(ps_BR1, method="bray", weighted=F)
BR_bray_PCoAordination = ordinate(ps_BR1, method="PCoA", distance=bray_dist)
BR_bray_PCoA <- plot_ordination(ps_BR1, BR_bray_PCoAordination, color="SampleCategory") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_BR1)$SampleCategory)
#prune samples with low reads
ps_BR1 = prune_samples(sample_sums(ps_BR)>=1000, ps_BR)
eu_dist = phyloseq::distance(ps_BR1, method="euclidean", weighted=F)
BR_eu_PCoAordination = ordinate(ps_BR1, method="PCoA", distance=eu_dist)
BR_eu_PCoA <- plot_ordination(ps_BR1, BR_eu_PCoAordination, color="SampleCategory") + theme(aspect.ratio=1)
# Statistical analysis
adonis(eu_dist ~ sample_data(ps_BR1)$SampleCategory)
phyloseq ordination - tick samples
#prune samples with low reads
ps_tick = prune_samples(sample_sums(ps_bact_tick)>=1000, ps_bact_tick)
bray_dist = phyloseq::distance(ps_tick, method="bray", weighted=F)
tick_PCoAordination = ordinate(ps_tick, method="PCoA", distance=bray_dist)
tick_PCoAordination2 = plot_ordination(ps_tick, tick_PCoAordination, color="tick_sp_short", shape="instar") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_tick)$species)
phyloseq ordination - Ix. holocyclus samples
ixhol = subset_samples(ps_bact_tick, tick_sp_short=="Ixhol")
#prune samples with low reads
ps_tick = prune_samples(sample_sums(ixhol)>=1000, ixhol)
bray_dist = phyloseq::distance(ps_tick, method="bray", weighted=F)
tick_PCoAordination = ordinate(ps_tick, method="PCoA", distance=bray_dist)
tick_PCoAordination2 = plot_ordination(ps_tick, tick_PCoAordination, color="tick_sp_short", shape="instar") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_tick)$species)
phyloseq ordination - Ix. holocyclus, Ix. tasmani and Ix. trichosuri samples
ps_bact_tick_sub = subset_samples(ps_bact_tick, tick_sp_short=="Ixhol" | tick_sp_short=="Ixtri" | tick_sp_short=="Ixtas" | tick_sp_short=="Ixhol;Ixtri")
# prune samples with less than 1000 reads
ps_bact_tick_sub_pr = prune_samples(sample_sums(ps_bact_tick_sub)>=1000, ps_bact_tick_sub)
bray_dist = phyloseq::distance(ps_bact_tick_sub_pr, method="bray", weighted=F)
tick_PCoAordination = ordinate(ps_bact_tick_sub_pr, method="PCoA", distance=bray_dist)
tick_PCoAordination2 = plot_ordination(ps_bact_tick_sub_pr, tick_PCoAordination, color="tick_sp_short", shape="instar") + theme(aspect.ratio=1)
adonis(bray_dist ~ sample_data(ps_bact_tick_sub_pr)$tick_sp_short)
See tutorial here
Overall statistical analysis Differences by sample type - using both ANOVA and MANOVA methods
# prune samples with less than 1000 reads
ps_bact_pr = prune_samples(sample_sums(ps_bact_samp)>=1000, ps_bact_samp)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)
# samples x SampleCategory as input
permanova <- adonis(t(otu) ~ SampleCategory,
data = meta, permutations=999, method = "bray")
## statistics
print(as.data.frame(permanova$aov.tab)["SampleCategory", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$SampleCategory)
### ANOVA - are groups different
anova(betadisper(dist, meta$SampleCategory))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$SampleCategory), pairwise = TRUE)
Statistical comparison within sample types statistical analysis Differences by sample type - using both ANOVA and MANOVA methods
# prune samples with less than 1000 reads
ps_bact_bl_pr = prune_samples(sample_sums(ps_bact_bl)>=1000, ps_bact_bl)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_bl_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)
# samples x species as input
permanova <- adonis(t(otu) ~ species,
data = meta, permutations=999, method = "bray")
## statistics
print(as.data.frame(permanova$aov.tab)["species", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$species)
### ANOVA - are groups different
anova(betadisper(dist, meta$species))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$species), pairwise = TRUE)
# prune samples with less than 1000 reads
ps_bact_tick_pr = prune_samples(sample_sums(ps_bact_tick)>=1000, ps_bact_tick)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tick_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)
# samples x species as input
permanova <- adonis(t(otu) ~ species,
data = meta, permutations=999, method = "bray")
## statistics
print(as.data.frame(permanova$aov.tab)["species", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$species)
### ANOVA - are groups different
anova(betadisper(dist, meta$species))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$species), pairwise = TRUE)
# prune samples with less than 1000 reads
ps_bact_tis_pr = prune_samples(sample_sums(ps_bact_tis)>=1000, ps_bact_tis)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tis_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)
# samples x species as input
permanova <- adonis(t(otu) ~ species,
data = meta, permutations=999, method = "bray")
## statistics
print(as.data.frame(permanova$aov.tab)["species", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$species)
### ANOVA - are groups different
anova(betadisper(dist, meta$species))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$species), pairwise = TRUE)
Statistical comparison between tick species Differences by sample type - using both ANOVA and MANOVA methods
# prune samples with less than 1000 reads
ps_bact_tick_pr = prune_samples(sample_sums(ps_bact_tick)>=1000, ps_bact_tick)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tick_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)
# samples x tick species as input
permanova <- adonis(t(otu) ~ tick_sp_short,
data = meta, permutations=999, method = "bray")
## statistics
print(as.data.frame(permanova$aov.tab)["tick_sp_short", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$tick_sp_short)
### ANOVA - are groups different
anova(betadisper(dist, meta$tick_sp_short))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$tick_sp_short), pairwise = TRUE)
# subset select tick species
ps_bact_tick_sub = subset_samples(ps_bact_tick, tick_sp_short=="Ixhol" | tick_sp_short=="Ixtri" | tick_sp_short=="Ixtas" | tick_sp_short=="Ixhol;Ixtri")
# prune samples with less than 1000 reads
ps_bact_tick_sub_pr = prune_samples(sample_sums(ps_bact_tick_sub)>=1000, ps_bact_tick_sub)
# Transform data to hellinger
pseq.rel <- microbiome::transform(ps_bact_tick_sub_pr, "hellinger")
# Pick relative abundances (compositional) and sample metadata
otu <- abundances(pseq.rel)
meta <- meta(pseq.rel)
# samples x tick species as input
permanova <- adonis(t(otu) ~ tick_sp_short,
data = meta, permutations=999, method = "bray")
## statistics
print(as.data.frame(permanova$aov.tab)["tick_sp_short", "Pr(>F)"])
dist <- vegdist(t(otu))
mod <- betadisper(dist, meta$tick_sp_short)
### ANOVA - are groups different
anova(betadisper(dist, meta$tick_sp_short))
### PERMANOVA - which groups different
permutest(betadisper(dist, meta$tick_sp_short), pairwise = TRUE)
Documentation here
Create matrix of beta diversity between samples.
betadiv <- amp_betadiv(amp_subset,
distmeasure = "bray",
label_by = "Instar",
show_values = FALSE,
values_size = 3
)
betadiv
Function from MicrobiotaProcess.
Beta diversity metrics can assess the differences between microbial communities. It can be visualized with PCA or PCoA, this can also be visualized with hierarchical clustering. MicrobiotaProcess also implements the analysis based on ggtree.
## All samples - detailed, include species and SampleCategory
clust_all <- get_clust(obj=ps_bact_samp, distmethod="euclidean",
method="hellinger", hclustmethod="average")
speciescol = viridis(10, option = "D")
# circular layout
clust_all_plot <- ggclust(obj=clust_all ,
layout = "circular",
pointsize=1,
fontsize=0,
factorNames=c("species","SampleCategory")) + scale_color_manual(values=speciescol) + scale_shape_manual(values=c(17, 15, 16)) + ggtitle("Hierarchical Cluster of All Samples (euclidean)")
# Save plot
ggsave("clust_all_plot.pdf", plot = clust_all_plot , path = "output/plots", width = 25, height = 25, units = "cm")
### All samples - simple version, only colour SampleCategory
ColSampCat = c("#7a255d", "#9fd0cb", "#7566ff")
# circular layout
clust_all_plot2 <- ggclust(obj=clust_all ,
layout = "circular",
pointsize=1,
fontsize=0,
factorNames=c("SampleCategory")) + scale_color_manual(values=ColSampCat) + ggtitle("Hierarchical Cluster of All Samples (euclidean)")
# Save plot
ggsave("clust_all_plot2.pdf", plot = clust_all_plot2 , path = "output/plots", width = 25, height = 25, units = "cm")
# Blood samples
clust_bl <- get_clust(obj=ps_bact_bl, distmethod="euclidean",
method="hellinger", hclustmethod="average")
speciesblcol = viridis(7, option = "D")
# circular layout
clust_bl_plot <- ggclust(obj=clust_bl ,
layout = "circular",
pointsize=3,
fontsize=0,
factorNames=c("species")) + scale_color_manual(values=speciesblcol) + ggtitle("Hierarchical Cluster of Blood Samples (euclidean)")
# Save plot
ggsave("clust_bl_plot.pdf", plot = clust_bl_plot , path = "output/plots", width = 25, height = 25, units = "cm")
## Tick samples
clust_tick <- get_clust(obj=ps_bact_tick, distmethod="euclidean",
method="hellinger", hclustmethod="average")
speciestickcol = viridis(8, option = "D")
# circular layout
clust_tick_plot <- ggclust(obj=clust_tick ,
layout = "circular",
pointsize=2,
fontsize=0,
factorNames=c("species","tick_sp_short")) + scale_color_manual(values=speciestickcol) + scale_shape_manual(values=c(17, 8, 25, 18, 7, 15, 16)) + ggtitle("Hierarchical Cluster of Tick Samples (euclidean)")
# Save plot
ggsave("clust_tick_plot.pdf", plot = clust_tick_plot , path = "output/plots", width = 25, height = 25, units = "cm")
### Tissue samples
clust_tis <- get_clust(obj=ps_bact_tis, distmethod="euclidean",
method="hellinger", hclustmethod="average")
speciestiscol = viridis(9, option = "D")
## circular layout
clust_tis_plot <- ggclust(obj=clust_tis ,
layout = "circular",
pointsize=3,
fontsize=0,
factorNames=c("species")) + scale_color_manual(values=speciestiscol) + ggtitle("Hierarchical Cluster of Tissue Samples (euclidean)")
# Save plot
ggsave("clust_tis_plot.pdf", plot = clust_tis_plot , path = "output/plots", width = 25, height = 25, units = "cm")
Load additional libraries
library(ggridges)
library(scales)
library(ggtree)
Simple tree from tutorial here
# group by genus
taxa_of_interest_g = tax_glom(taxa_of_interest, "Genus")
# plot tree
tree1 = plot_tree(taxa_of_interest_g, color="species", shape ="SampleCategory", size="abundance")
From tutorial here
tree2 = ggtree(taxa_of_interest, layout='circular') + geom_text2(aes(subset=!isTip, label=label), hjust=-.2, size=4) +
geom_tiplab(aes(label=Genus), hjust=-.3) +
geom_point(aes(x=x+hjust, color=species, shape=Family),na.rm=TRUE) +
scale_size_continuous(trans=log_trans(5)) +
theme(legend.position="right") + scale_shape_manual(values=c(17, 15, 16, 0, 1, 2, 10))
library(ggtreeExtra)
library(ggtree)
library(phyloseq)
library(dplyr)
data("GlobalPatterns")
GP <- GlobalPatterns
GP <- prune_taxa(taxa_sums(GP) > 600, GP)
sample_data(GP)$human <- get_variable(GP, "SampleType") %in%
c("Feces", "Skin")
mergedGP <- merge_samples(GP, "SampleType")
mergedGP <- rarefy_even_depth(taxa_of_interest,rngseed=394582)
mergedGP <- tax_glom(mergedGP,"Order")
melt_simple <- psmelt(taxa_of_interest) %>%
filter(Abundance < 120) %>%
select(OTU, val=Abundance)
p <- ggtree(taxa_of_interest, layout="fan", open.angle=10) +
geom_tippoint(mapping=aes(color=Family),
size=1.5,
show.legend=FALSE)
p <- rotate_tree(p, -90)
p <- tree2 +
geom_fruit(
data=melt_simple,
geom=geom_boxplot,
mapping = aes(
y=OTU,
x=val,
group=label,
fill=Family,
),
size=.2,
outlier.size=0.5,
outlier.stroke=0.08,
outlier.shape=21,
axis.params=list(
axis = "x",
text.size = 1.8,
hjust = 1,
vjust = 0.5,
nbreak = 3,
),
grid.params=list()
)
p <- p +
scale_fill_discrete(
name="Family",
guide=guide_legend(keywidth=0.8, keyheight=0.8, ncol=1)
) +
theme(
legend.title=element_text(size=9), # The title of legend
legend.text=element_text(size=7) # The label text of legend, the sizes should be adjust with dpi.
)
p
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] MicrobiotaProcess_1.2.2 eulerr_6.1.0
[3] Biostrings_2.58.0 XVector_0.30.0
[5] patchwork_1.1.1 knitr_1.31
[7] microbiomeutilities_1.00.12 vegan_2.5-7
[9] lattice_0.20-41 permute_0.9-5
[11] DESeq2_1.30.1 SummarizedExperiment_1.20.0
[13] Biobase_2.50.0 MatrixGenerics_1.2.1
[15] matrixStats_0.58.0 GenomicRanges_1.42.0
[17] GenomeInfoDb_1.26.2 IRanges_2.24.1
[19] S4Vectors_0.28.1 BiocGenerics_0.36.0
[21] ape_5.4-1 data.table_1.14.0
[23] decontam_1.10.0 reshape_0.8.8
[25] microbiome_1.13.9 MicrobeR_0.3.2
[27] cowplot_1.1.1 viridis_0.5.1
[29] viridisLite_0.3.0 plotly_4.9.3
[31] agricolae_1.3-3 ggpubr_0.4.0
[33] ampvis2extras_0.1.5 ampvis2_2.6.7
[35] forcats_0.5.1 stringr_1.4.0
[37] dplyr_1.0.5 purrr_0.3.4
[39] readr_1.4.0 tidyr_1.1.2
[41] tibble_3.1.0 ggplot2_3.3.3
[43] tidyverse_1.3.0 phyloseq_1.34.0
[45] qiime2R_0.99.4 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] bit64_4.0.5 multcomp_1.4-16 DelayedArray_0.16.2
[4] rpart_4.1-15 RCurl_1.98-1.2 doParallel_1.0.16
[7] generics_0.1.0 TH.data_1.0-10 RSQLite_2.2.3
[10] combinat_0.0-8 bit_4.0.4 xml2_1.3.2
[13] lubridate_1.7.10 httpuv_1.5.5 assertthat_0.2.1
[16] xfun_0.21 hms_1.0.0 jquerylib_0.1.3
[19] evaluate_0.14 promises_1.2.0.1 Rmisc_1.5
[22] fansi_0.4.2 progress_1.2.2 dbplyr_2.1.0
[25] readxl_1.3.1 igraph_1.2.6 DBI_1.1.1
[28] geneplotter_1.68.0 htmlwidgets_1.5.3 ellipsis_0.3.1
[31] backports_1.2.1 picante_1.8.2 annotate_1.68.0
[34] libcoin_1.0-8 vctrs_0.3.7 abind_1.4-5
[37] cachem_1.0.4 withr_2.4.1 checkmate_2.0.0
[40] treeio_1.14.3 prettyunits_1.1.1 cluster_2.1.1
[43] lazyeval_0.2.2 crayon_1.4.1 genefilter_1.72.1
[46] pkgconfig_2.0.3 zCompositions_1.3.4 nlme_3.1-152
[49] nnet_7.3-15 rlang_0.4.10 questionr_0.7.4
[52] lifecycle_1.0.0 miniUI_0.1.1.1 sandwich_3.0-0
[55] modelr_0.1.8 cellranger_1.1.0 rprojroot_2.0.2
[58] Matrix_1.3-2 aplot_0.0.6 phangorn_2.5.5
[61] carData_3.0-4 zoo_1.8-8 Rhdf5lib_1.12.1
[64] reprex_1.0.0 base64enc_0.1-3 pheatmap_1.0.12
[67] whisker_0.4 png_0.1-7 bitops_1.0-6
[70] rhdf5filters_1.2.0 blob_1.2.1 coin_1.4-1
[73] jpeg_0.1-8.1 rstatix_0.7.0 DECIPHER_2.18.1
[76] ggsignif_0.6.1 rtk_0.2.6.1 klaR_0.6-15
[79] scales_1.1.1 memoise_2.0.0 magrittr_2.0.1
[82] plyr_1.8.6 gghalves_0.1.1 zlibbioc_1.36.0
[85] compiler_4.0.3 philr_1.16.0 RColorBrewer_1.1-2
[88] ggstar_1.0.1 cli_2.3.1 ade4_1.7-16
[91] htmlTable_2.1.0 Formula_1.2-4 MASS_7.3-53.1
[94] mgcv_1.8-34 tidyselect_1.1.0 stringi_1.5.3
[97] highr_0.8 yaml_2.2.1 locfit_1.5-9.4
[100] latticeExtra_0.6-29 ggrepel_0.9.1 grid_4.0.3
[103] sass_0.3.1 fastmatch_1.1-0 tools_4.0.3
[106] rio_0.5.26 rstudioapi_0.13 foreach_1.5.1
[109] foreign_0.8-81 git2r_0.28.0 gridExtra_2.3
[112] Rtsne_0.15 digest_0.6.27 rvcheck_0.1.8
[115] BiocManager_1.30.10 shiny_1.6.0 quadprog_1.5-8
[118] Rcpp_1.0.6 car_3.0-10 broom_0.7.5
[121] later_1.1.0.1 httr_1.4.2 AnnotationDbi_1.52.0
[124] colorspace_2.0-0 rvest_0.3.6 XML_3.99-0.5
[127] fs_1.5.0 truncnorm_1.0-8 splines_4.0.3
[130] tidytree_0.3.3 multtest_2.46.0 xtable_1.8-4
[133] jsonlite_1.7.2 ggtree_2.4.1 AlgDesign_1.2.0
[136] modeltools_0.2-23 R6_2.5.0 Hmisc_4.5-0
[139] NADA_1.6-1.1 pillar_1.5.1 htmltools_0.5.1.1
[142] mime_0.10 glue_1.4.2 fastmap_1.1.0
[145] DT_0.17 BiocParallel_1.24.1 ggnet_0.1.0
[148] codetools_0.2-18 mvtnorm_1.1-1 utf8_1.2.1
[151] bslib_0.2.4.9001 network_1.16.1 curl_4.3
[154] gtools_3.8.2 zip_2.1.1 openxlsx_4.2.3
[157] survival_3.2-7 rmarkdown_2.7.2 biomformat_1.18.0
[160] munsell_0.5.0 rhdf5_2.34.0 GenomeInfoDbData_1.2.4
[163] iterators_1.0.13 labelled_2.7.0 haven_2.3.1
[166] reshape2_1.4.4 gtable_0.3.0